Hello Stanislaw,

Stanislaw Osinski wrote

>

> And when exactly did you multiply the term weight by this constant factor (s = 2.5)? There are 3 options:

>

> 1. before tf-idf weighting scheme or

> 2. after tf-idf weighting scheme or

> 3. after column length normalization

>

> Which one?

As far as I can remember, it was option 1.

Cheers,

S.

If I choose option 1 (before the tf-idf weighting scheme), how do I calculate the tf-idf weight for terms that have been scaled (multiplied) by the constant factor s = 2.5?

Let's assume this situation: term 1 appears in the title of document 3 and document 7, so its entries there are multiplied by the constant factor s = 2.5. The matrix then looks like this:

d1 d2 d3 d4 d5 d6 d7

[ 0.00 0.00 **2.50** 1.00 0.00 0.00 **2.50** ] --> term 1, weight = log(N/**dfi**) = log(7/**?**) = ...

[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] --> term 2, weight = log(N/dfi) = log(7/3) = 0.368

[ 1.00 1.00 0.00 0.00 0.00 1.00 0.00 ] --> term 3, weight = log(N/dfi) = log(7/3) = 0.368

[ 1.00 0.00 0.00 0.00 1.00 0.00 0.00 ] --> term 4, weight = log(N/dfi) = log(7/2) = 0.544

[ 0.00 0.00 1.00 1.00 0.00 0.00 0.00 ] --> term 5, weight = log(N/dfi) = log(7/2) = 0.544
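The weights for terms 2 to 5 can be reproduced with a short Python sketch. It assumes the logarithm is base 10 (which matches 0.368 and 0.544) and counts dfi as the number of documents with a nonzero entry; the names `A`, `idf_log10` and `nonzero_count` are made up for this illustration. For these four rows every entry is 0 or 1, so the row sum and the nonzero count agree, and only term 1 is ambiguous.

```python
import math

# Term-document matrix from the example: rows = terms 1..5, columns = d1..d7.
# Term 1 already has the title boost s = 2.5 applied in d3 and d7.
A = [
    [0.0, 0.0, 2.5, 1.0, 0.0, 0.0, 2.5],  # term 1
    [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # term 2
    [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # term 3
    [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # term 4
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # term 5
]
N = len(A[0])  # number of documents, 7


def nonzero_count(row):
    """Number of documents in which the term occurs at all."""
    return sum(1 for w in row if w != 0)


def idf_log10(df, n_docs):
    """Inverse document frequency, log base 10 (assumed from the values above)."""
    return math.log10(n_docs / df)


# idf for terms 2..5, where row sum and nonzero count coincide.
idfs = [idf_log10(nonzero_count(row), N) for row in A[1:]]
for term_no, idf in enumerate(idfs, start=2):
    print(f"term {term_no}: idf = {idf:.3f}")
# term 2: idf = 0.368
# term 3: idf = 0.368
# term 4: idf = 0.544
# term 5: idf = 0.544
```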

What is the value of **dfi** in term 1's weight?

Is it 6 (the sum of the row, 0+0+2.5+1+0+0+2.5), so that term 1's weight = log(7/6) = 0.067?

Or

Is it 3 (there are 3 nonzero values in term 1's row), so that term 1's weight = log(7/3) = 0.368?

Or

Is it something else?

Which one?
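To make the two candidates concrete, here is a small Python sketch that computes both for term 1, again assuming a base-10 logarithm. The variable names are invented for this illustration. Under the usual textbook definition, document frequency is the number of documents containing the term, which corresponds to the nonzero count (3); whether the original implementation did it this way is not confirmed in this thread.

```python
import math

S = 2.5  # title boost from the discussion above
term1 = [0.0, 0.0, S, 1.0, 0.0, 0.0, S]  # term 1 row, boost applied in d3 and d7
N = len(term1)  # 7 documents

# Candidate A: dfi taken as the sum of the (boosted) row entries.
df_sum = sum(term1)
# Candidate B: dfi taken as the number of documents with a nonzero entry.
df_count = sum(1 for w in term1 if w != 0)

idf_sum = math.log10(N / df_sum)
idf_count = math.log10(N / df_count)

print(f"dfi = {df_sum:g}: idf = {idf_sum:.3f}")    # dfi = 6: idf = 0.067
print(f"dfi = {df_count}: idf = {idf_count:.3f}")  # dfi = 3: idf = 0.368

# Resulting tf-idf row for term 1 under each candidate (boosted tf times idf).
row_sum_variant = [w * idf_sum for w in term1]
row_count_variant = [w * idf_count for w in term1]
print([round(w, 3) for w in row_sum_variant])
print([round(w, 3) for w in row_count_variant])
```

With the nonzero count, the boost only scales the term frequency and leaves the idf unchanged. With the row sum, a larger boost also lowers the idf, which partly cancels the boost.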

Thanks.