[from my previous post...] If it's 3, then where is the effect of the constant factor (s = 2.5), given that we only count the values which are not 0?
So is it like this? Let's assume this condition: term 1 appears in the document title of document 3 and document 7, so its tf there is multiplied by the constant factor s = 2.5. The matrix then looks like this:

tf weighting

  d1   d2   d3   d4   d5   d6   d7
[ 0.00 0.00 2.50 1.00 0.00 0.00 2.50 ] > term 1, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] > term 2, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] > term 3, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 0.00 0.00 0.00 1.00 0.00 0.00 ] > term 4, idf = log(N/dfi) = log(7/2) = 0.544
[ 0.00 0.00 1.00 1.00 0.00 0.00 0.00 ] > term 5, idf = log(N/dfi) = log(7/2) = 0.544

And then the tf . idf weighting; for example, we multiply 2.5 * 0.368 = 0.92 (what if a value like this is greater than 1.00, is that permitted?)

  d1    d2    d3    d4    d5    d6    d7
[ 0.000 0.000 0.920 0.368 0.000 0.000 0.920 ]
[ 0.368 0.368 0.000 0.000 0.000 0.368 0.000 ]
[ 0.368 0.368 0.000 0.000 0.000 0.368 0.000 ]
[ 0.544 0.000 0.000 0.000 0.544 0.000 0.000 ]
[ 0.000 0.000 0.544 0.544 0.000 0.000 0.000 ]

Are the two matrices above right? And what if a tf . idf weight is greater than 1.00, is that permitted? And is a normalization process needed because of this problem (the tf . idf weight being greater than 1.00)?

Thanks.
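For reference, the arithmetic in the two matrices above can be reproduced with a short script. This is only a sketch under the assumptions stated in the post: raw counts of 0 or 1, a title factor s = 2.5 applied to term 1 in d3 and d7, and a base-10 logarithm (which is what log(7/3) = 0.368 implies). The names `counts`, `in_title` and `tf_idf` are made up for illustration.

```python
import math

# Raw term counts (rows = terms 1..5, columns = documents d1..d7), as in the post.
counts = [
    [0, 0, 1, 1, 0, 0, 1],  # term 1
    [1, 1, 0, 0, 0, 1, 0],  # term 2
    [1, 1, 0, 0, 0, 1, 0],  # term 3
    [1, 0, 0, 0, 1, 0, 0],  # term 4
    [0, 0, 1, 1, 0, 0, 0],  # term 5
]

S = 2.5  # constant factor for a term that appears in the document title

# Hypothetical bookkeeping: 0-based (term, document) pairs where the term is
# in the title. Term 1 in d3 and d7, as assumed in the post.
in_title = {(0, 2), (0, 6)}


def tf_idf(counts, in_title, s):
    """Return the tf.idf matrix, rounded to three decimals."""
    n_docs = len(counts[0])
    weighted = []
    for i, row in enumerate(counts):
        # df counts documents with a non-zero entry, so the title boost
        # changes tf but leaves df (and therefore idf) untouched.
        df = sum(1 for c in row if c > 0)
        idf = math.log10(n_docs / df)
        tf = [c * s if (i, j) in in_title else float(c) for j, c in enumerate(row)]
        weighted.append([round(t * idf, 3) for t in tf])
    return weighted


for row in tf_idf(counts, in_title, S):
    print(row)
```

The first printed row is `[0.0, 0.0, 0.92, 0.368, 0.0, 0.0, 0.92]`, which matches the first row of the tf . idf matrix above.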
It is very likely to happen even without weighting, e.g. if a term appears twice in the same document. So multiplying by 2.5 is no different from multiple occurrences of the same word in one document. That's why for some applications you need to normalize the columns of the td matrix.

> d1 d2 d3 d4 d5 d6 d7

Looks ok.

> And is normalization process needed because of this(the tf . idf weight is

It's needed because otherwise, longer documents would get higher values in the matrix (because they simply have more words), and that's sometimes undesirable.

Cheers,
S.
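To illustrate the column normalization mentioned above, here is a minimal sketch that scales every document column of the tf . idf matrix to unit Euclidean (L2) length. The choice of the L2 norm is an assumption; the reply only says to normalize the columns, and other norms are possible. The function name `normalize_columns` is made up for this example.

```python
import math


def normalize_columns(matrix):
    """Scale each column (one document vector) to unit Euclidean length.

    Assumption: L2 normalization; the thread does not name a specific norm.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        if norm == 0.0:
            continue  # an all-zero column stays all-zero
        for i in range(n_rows):
            result[i][j] = matrix[i][j] / norm
    return result


# tf.idf matrix from the question (rows = terms 1..5, columns = d1..d7).
tfidf = [
    [0.000, 0.000, 0.920, 0.368, 0.000, 0.000, 0.920],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.544, 0.000, 0.000, 0.000, 0.544, 0.000, 0.000],
    [0.000, 0.000, 0.544, 0.544, 0.000, 0.000, 0.000],
]

normalized = normalize_columns(tfidf)
for row in normalized:
    print([round(v, 3) for v in row])
```

After this step every column has length 1, so no single entry can exceed 1.00 regardless of the title factor or of how many times a term occurs in a document.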
