Hi Stanislaw,
[from my previous post...] If it's 3, then where is the effect of the constant factor (s = 2.5), given that we just count the values which are not 0?
Administrator

> If it's 3, then where is the effect of constant factor (s=2.5), because we

Cheers, S.
_______________________________________________
Carrot2developers mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/carrot2developers
Hello Stanislaw,
So is it like this? Let's assume this condition: term 1 appears in the title of document 3 and document 7, so its count there is multiplied by the constant factor s = 2.5. The matrix then looks like this:

tf weighting

   d1    d2    d3    d4    d5    d6    d7
[ 0.00  0.00  2.50  1.00  0.00  0.00  2.50 ] -> term 1, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00  1.00  0.00  0.00  0.00  1.00  0.00 ] -> term 2, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00  1.00  0.00  0.00  0.00  1.00  0.00 ] -> term 3, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00  0.00  0.00  0.00  1.00  0.00  0.00 ] -> term 4, idf = log(N/dfi) = log(7/2) = 0.544
[ 0.00  0.00  1.00  1.00  0.00  0.00  0.00 ] -> term 5, idf = log(N/dfi) = log(7/2) = 0.544

And then tf . idf weighting; for example, we multiply 2.5 * 0.368 = 0.92.

   d1     d2     d3     d4     d5     d6     d7
[ 0.000  0.000  0.920  0.368  0.000  0.000  0.920 ]
[ 0.368  0.368  0.000  0.000  0.000  0.368  0.000 ]
[ 0.368  0.368  0.000  0.000  0.000  0.368  0.000 ]
[ 0.544  0.000  0.000  0.000  0.544  0.000  0.000 ]
[ 0.000  0.000  0.544  0.544  0.000  0.000  0.000 ]

Are the two matrices above right? What if a tf . idf weight comes out greater than 1.00, is that permitted? And is a normalization step needed because of this problem (tf . idf weights greater than 1.00)?

Thanks.
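For anyone who wants to check the numbers in the post above, here is a minimal Python sketch that recomputes both matrices. It assumes the logarithm is base 10 (which reproduces 0.368 and 0.544) and that df counts the non-zero entries in a row, so the title factor changes tf but not df. The names `counts`, `in_title`, `tf_matrix` and `idf` are made up for this illustration and are not Carrot2 API.

```python
import math

N_DOCS = 7
TITLE_BOOST = 2.5  # the constant factor s from the post

# Raw term counts before the title boost (rows: terms 1..5, columns: d1..d7).
counts = [
    [0, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
]

# Zero-based (term, document) pairs where the term occurs in the title:
# term 1 in d3 and d7, as assumed in the post.
in_title = {(0, 2), (0, 6)}


def tf_matrix(counts, in_title, boost):
    """Multiply the count by `boost` wherever the term occurs in a title."""
    return [
        [c * boost if (i, j) in in_title else float(c) for j, c in enumerate(row)]
        for i, row in enumerate(counts)
    ]


def idf(row, n_docs):
    """log10(N / df), where df is the number of non-zero entries in the row."""
    df = sum(1 for value in row if value != 0)
    return math.log10(n_docs / df)


tf = tf_matrix(counts, in_title, TITLE_BOOST)
idfs = [idf(row, N_DOCS) for row in tf]
tfidf = [[value * weight for value in row] for row, weight in zip(tf, idfs)]

for row in tfidf:
    print("[ " + "  ".join(f"{value:.3f}" for value in row) + " ]")
```

The unrounded product for term 1 in d3 is about 0.91995, which matches the 0.92 in the post once rounded.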
Administrator

Hi,
It is very likely to happen even without weighting, e.g. if a term appears twice in the same document. So multiplying by 2.5 is no different from multiple occurrences of the same word in one document. That's why for some applications you need to normalize the columns of the term-document (t-d) matrix.

> d1 d2 d3 d4 d5 d6 d7

Looks ok.

> And is normalization process needed because of this(the tf . idf weight is

It's needed because otherwise longer documents would get higher values in the matrix (simply because they have more words), and that's sometimes undesirable.

Cheers, S.
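To illustrate the column normalization mentioned in the reply, here is a small Python sketch. The reply does not say which norm to use, so scaling each document column to unit Euclidean length is an assumption here, and `normalize_columns` is a hypothetical helper, not a Carrot2 function. The input is the tf . idf matrix from the question.

```python
import math


def normalize_columns(matrix):
    """Scale every column (document vector) to unit Euclidean length.

    All-zero columns are left untouched to avoid dividing by zero.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [row[:] for row in matrix]
    for j in range(n_cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        if norm > 0:
            for i in range(n_rows):
                result[i][j] = matrix[i][j] / norm
    return result


# tf . idf matrix from the question (rows: terms 1..5, columns: d1..d7).
tfidf = [
    [0.000, 0.000, 0.920, 0.368, 0.000, 0.000, 0.920],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.544, 0.000, 0.000, 0.000, 0.544, 0.000, 0.000],
    [0.000, 0.000, 0.544, 0.544, 0.000, 0.000, 0.000],
]

normalized = normalize_columns(tfidf)

for row in normalized:
    print("[ " + "  ".join(f"{value:.3f}" for value in row) + " ]")
```

After this step no entry can exceed 1.00, and a document's column no longer grows just because the document contains more words or more boosted terms.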