Matrix multiplication after tf.idf weighting with constant factor(s=2.5)?


zivelian
Hello Stanislaw,
[...from my previous post...]
Stanislaw Osinski wrote
>
> If it's 3, then where is the effect of constant factor (s=2.5), because we
> just count values which are not 0?
>
> So where is the effect of constant factor(s = 2.5)?


There is no effect on the idf (as it depends on document counts and not term
occurrences), the effect is only on tf -- you count each occurrence of a
word in document's title as 2.5 (or whatever value for s you assume).

Cheers,

S.
So is it like this?

Let us assume this condition: term 1 appears in the titles of document 3 and document 7, so its tf in those two documents is multiplied by the constant factor s = 2.5. The matrix then looks like this:

tf weighting

    d1   d2   d3   d4   d5   d6   d7
[ 0.00 0.00 2.50 1.00 0.00 0.00 2.50 ] --> term 1, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] --> term 2, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] --> term 3, idf = log(N/dfi) = log(7/3) = 0.368
[ 1.00 0.00 0.00 0.00 1.00 0.00 0.00 ] --> term 4, idf = log(N/dfi) = log(7/2) = 0.544
[ 0.00 0.00 1.00 1.00 0.00 0.00 0.00 ] --> term 5, idf = log(N/dfi) = log(7/2) = 0.544

Then comes the tf.idf weighting. For example, we multiply 2.5 * 0.368 = 0.92. (What if such a value were greater than 1.00? Is that permitted?)

     d1    d2    d3    d4    d5    d6    d7
[ 0.000 0.000 0.920 0.368 0.000 0.000 0.920 ]
[ 0.368 0.368 0.000 0.000 0.000 0.368 0.000 ]
[ 0.368 0.368 0.000 0.000 0.000 0.368 0.000 ]
[ 0.544 0.000 0.000 0.000 0.544 0.000 0.000 ]
[ 0.000 0.000 0.544 0.544 0.000 0.000 0.000 ]
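The arithmetic behind the two matrices can be checked with a small script. This is only a sketch under the assumptions of this post: each term occurs at most once per document, a title occurrence counts as s instead of 1, and idf uses a base-10 logarithm (which is what gives 0.368 and 0.544). The names `counts`, `title_hits`, `boosted_tf`, `idf` and `tf_idf` are made up for illustration and are not taken from any library.

```python
import math

# Raw occurrence counts: rows are terms 1..5, columns are documents d1..d7.
counts = [
    [0, 0, 1, 1, 0, 0, 1],  # term 1
    [1, 1, 0, 0, 0, 1, 0],  # term 2
    [1, 1, 0, 0, 0, 1, 0],  # term 3
    [1, 0, 0, 0, 1, 0, 0],  # term 4
    [0, 0, 1, 1, 0, 0, 0],  # term 5
]

# Assumed (term index, document index) pairs where the term is in the title:
# term 1 in d3 and d7, using 0-based indices.
title_hits = {(0, 2), (0, 6)}

S = 2.5  # constant factor for title occurrences


def boosted_tf(counts, title_hits, s):
    """Count a title occurrence as s, any other occurrence as 1."""
    return [
        [c * s if (i, j) in title_hits else float(c) for j, c in enumerate(row)]
        for i, row in enumerate(counts)
    ]


def idf(counts):
    """idf = log10(N / df), where df is the number of documents containing the term."""
    n_docs = len(counts[0])
    return [math.log10(n_docs / sum(1 for c in row if c > 0)) for row in counts]


def tf_idf(counts, title_hits, s):
    """Multiply every boosted tf value by the idf of its term (row)."""
    tf = boosted_tf(counts, title_hits, s)
    weights = idf(counts)
    return [[v * w for v in row] for row, w in zip(tf, weights)]


if __name__ == "__main__":
    for row in tf_idf(counts, title_hits, S):
        print("[ " + " ".join(f"{v:.3f}" for v in row) + " ]")
```

Note that the boost changes only the tf values. The document frequency of term 1 is still 3, so its idf stays at log(7/3).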

Are the two matrices above correct?

And what if a tf.idf weight is greater than 1.00? Is that permitted?

And is a normalization step needed because of this problem (a tf.idf weight greater than 1.00)?
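To make the normalization question concrete, here is a sketch of one common option: scaling each document column to unit Euclidean length. This is my assumption of what "normalization" could mean here, not a statement of what the algorithm actually does. The function name `normalize_columns` is made up for illustration.

```python
import math

# The tf.idf matrix from above: rows are terms 1..5, columns are d1..d7.
tfidf = [
    [0.000, 0.000, 0.920, 0.368, 0.000, 0.000, 0.920],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.368, 0.368, 0.000, 0.000, 0.000, 0.368, 0.000],
    [0.544, 0.000, 0.000, 0.000, 0.544, 0.000, 0.000],
    [0.000, 0.000, 0.544, 0.544, 0.000, 0.000, 0.000],
]


def normalize_columns(matrix):
    """Divide each column by its Euclidean length; all-zero columns stay zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    norms = [
        math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        for j in range(n_cols)
    ]
    return [
        [matrix[i][j] / norms[j] if norms[j] > 0 else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    for row in normalize_columns(tfidf):
        print("[ " + " ".join(f"{v:.3f}" for v in row) + " ]")
```

With this kind of scaling no entry can end up above 1.00, whatever value of s is used, because each entry is divided by a length that is at least as large as the entry itself.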



Thanks.