How to count the score in the STC algorithm

Jumadi
I read that the formula for computing the score in the STC algorithm is
s(B) = |B| · f(|P|)

but the scores I see in Carrot's results are fractional (non-integer) numbers, whereas the formula above is a multiplication.


Thanks,
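
The formula in the question can be made concrete with a minimal Java sketch. Here |B| is the number of documents in base cluster B and |P| is the effective length of its phrase P (the number of non-stopwords, as in Carrot2's baseClusterScore JavaDoc). The original STC paper (Zamir and Etzioni) describes f as penalizing single-word phrases, growing linearly for two to six words and staying constant for longer phrases. The exact values used here (0.5 for a single word, f(n) = n from two to six words) are hypothetical and chosen only for illustration. The sketch also shows that a product can still be fractional: f is real-valued, so |B| · f(|P|) need not be an integer.

```java
// Minimal sketch of the original STC base cluster score s(B) = |B| * f(|P|).
// The shape of f follows the paper's description; the constants are
// HYPOTHETICAL, chosen only for illustration.
final class PaperStcScore {

    // Hypothetical weight for single-word phrases (they are penalized).
    static final double SINGLE_WORD_FACTOR = 0.5;
    // Length after which f stops growing.
    static final int MAX_LINEAR_LENGTH = 6;

    /** f(|P|): maps the effective phrase length to a real-valued weight. */
    static double f(int effectivePhraseLength) {
        if (effectivePhraseLength < 1) {
            throw new IllegalArgumentException("phrase length must be at least 1");
        }
        if (effectivePhraseLength == 1) {
            return SINGLE_WORD_FACTOR; // single words are penalized
        }
        // Linear for 2..MAX_LINEAR_LENGTH words, constant for longer phrases.
        return Math.min(effectivePhraseLength, MAX_LINEAR_LENGTH);
    }

    /** s(B) = |B| * f(|P|), with |B| = number of documents in base cluster B. */
    static double score(int documentCount, int effectivePhraseLength) {
        return documentCount * f(effectivePhraseLength);
    }

    public static void main(String[] args) {
        // One-word phrase in 3 documents: 3 * 0.5 = 1.5, a fractional product.
        System.out.println(score(3, 1));
        // Three-word phrase in 4 documents: 4 * 3 = 12.0.
        System.out.println(score(4, 3));
        // Nine-word phrase in 4 documents: f is capped, so 4 * 6 = 24.0.
        System.out.println(score(4, 9));
    }
}
```

Running main prints 1.5, 12.0 and 24.0, one per line.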

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: How to count the score in the STC algorithm

Dawid Weiss-2
Be more specific: which score are you talking about? The base cluster's
score? If so, the JavaDoc of baseClusterScore describes that function
in full detail. It probably deviates from the original paper, as it has
been tuned over time to give better results.

Quote:

       * Calculates base cluster score.
       * <p>
       * The boost is calculated as a Gaussian function of density around the "optimum"
       * expected phrase length (average) and "tolerance" towards shorter and longer phrases
       * (standard deviation). You can draw this score multiplier's characteristic with
       * gnuplot:
       * <pre>
       * reset
       *
       * set xrange [0:10]
       * set yrange [0:]
       * set samples 11
       * set boxwidth 1 absolute
       *
       * set xlabel "Phrase length"
       * set ylabel "Score multiplier"
       *
       * set border 3
       * set key noautotitles
       *
       * set grid
       *
       * set xtics border nomirror 1
       * set ytics border nomirror
       * set ticscale 1.0
       * show tics
       *
       * set size ratio .5
       *
       * # Base cluster boost function.
       * boost(x) = exp(-(x - optimal) * (x - optimal) / (2 * tolerance * tolerance))
       *
       * plot optimal=2, tolerance=2, boost(x) with histeps title "optimal=2, tolerance=2", \
       *      optimal=2, tolerance=4, boost(x) with histeps title "optimal=2, tolerance=4", \
       *      optimal=2, tolerance=6, boost(x) with histeps title "optimal=2, tolerance=6"
       *
       * pause -1
       * </pre>
       * One-word phrases can be given a fixed boost if
       * {@link #singleTermBoost} is greater than zero.
       *
       * @param phraseLength Effective phrase length (number of non-stopwords).
       * @param documentCount Number of documents this phrase occurred in.
       * @return Returns the base cluster score calculated as a function of the number of
       *         documents the phrase occurred in and a function of the effective length of
       *         the phrase.
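
The quoted JavaDoc can be sketched in Java roughly as follows. The boost method translates the gnuplot boost(x) function directly, and the parameters optimal, tolerance and singleTermBoost mirror the names in the quote. The quote does not say how documentCount enters the score, so this sketch assumes the paper's plain |B|; Carrot2's actual implementation may use a different function of documentCount. The main method tabulates the same three curves as the gnuplot script (optimal = 2, tolerance = 2, 4, 6) for phrase lengths 0 to 10.

```java
import java.util.Locale;

// Sketch of a Gaussian length boost in the spirit of the quoted
// baseClusterScore JavaDoc. ASSUMPTION: the document-count factor is plain
// documentCount (the paper's |B|); Carrot2's real code may differ.
final class GaussianBaseClusterScore {

    /** boost(x) = exp(-(x - optimal)^2 / (2 * tolerance^2)), as in the gnuplot script. */
    static double boost(int phraseLength, double optimal, double tolerance) {
        double d = phraseLength - optimal;
        return Math.exp(-(d * d) / (2.0 * tolerance * tolerance));
    }

    /**
     * Hypothetical score: documentCount times a length boost. One-word phrases
     * get the fixed singleTermBoost when it is greater than zero (per the JavaDoc).
     */
    static double baseClusterScore(int phraseLength, int documentCount,
                                   double optimal, double tolerance,
                                   double singleTermBoost) {
        double lengthBoost = (phraseLength == 1 && singleTermBoost > 0)
                ? singleTermBoost
                : boost(phraseLength, optimal, tolerance);
        return documentCount * lengthBoost;
    }

    public static void main(String[] args) {
        // Same curves as the gnuplot script: optimal = 2, tolerance = 2, 4, 6.
        double[] tolerances = {2, 4, 6};
        for (int length = 0; length <= 10; length++) {
            StringBuilder row = new StringBuilder("length " + length + ":");
            for (double tolerance : tolerances) {
                // Locale.ROOT keeps '.' as the decimal separator.
                row.append(String.format(Locale.ROOT, " %.3f", boost(length, 2, tolerance)));
            }
            System.out.println(row);
        }
        // Four-word phrase in 5 documents (optimal = 2, tolerance = 2):
        // 5 * exp(-0.5), roughly 3.03, so the score is fractional.
        System.out.println(baseClusterScore(4, 5, 2, 2, 0));
    }
}
```

Because the Gaussian boost lies in (0, 1], a score computed this way is usually a non-integer even though it is still a product, which would be consistent with the fractional values mentioned in the question.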

Dawid

On Sun, Jul 14, 2013 at 4:07 PM, Jumadi Jumadi <[hidden email]> wrote:
> I read that the formula for computing the score in the STC algorithm is
> s(B) = |B| · f(|P|)
>
> but the scores I see in Carrot's results are fractional (non-integer)
> numbers, whereas the formula above is a multiplication.
>
>
> Thanks,
