Optimum Parameters for Lingo

seyfullahd
Hi again :)

I wonder what the optimum values are for the two backbone thresholds, minClusterSize and desiredClusterCountBase. Can their default values be accepted as the best values for Lingo when the purpose is web search results clustering? I suppose they can, since they are the defaults, but I wanted to ask anyway. I am asking this question so that I can be sure which parameters I should use in my experiments when comparing the results of the current Lingo with the results of my own version.


I also have a second question.

In the Workbench application, we can check/uncheck parameters or change their values.
My question is: when we change a parameter's value, does the algorithm run again from the beginning, or what? By changing a parameter's value in the Workbench, can we really experiment with what happens when that parameter takes a given value? Or, to experiment with the effects of changing parameter values, do we actually have to change the code and rebuild the JARs and the Workbench? If it depends on the parameter, my question is specifically about minClusterSize and desiredClusterCountBase.

Thanks in advance,

Seyfullah

Re: Optimum Parameters for Lingo

Stanislaw Osinski-3
Hi,

> I wonder what the optimum values are for the two backbone thresholds,
> minClusterSize and desiredClusterCountBase. Can their default values be
> accepted as the best values for Lingo when the purpose is web search results
> clustering? I suppose they can, since they are the defaults, but I wanted to
> ask anyway. I am asking this question so that I can be sure which parameters
> I should use in my experiments when comparing the results of the current
> Lingo with the results of my own version.

There is no single optimum set of parameters. Unfortunately (from the research point of view), they are user-specific: some users may prefer fewer clusters and some may prefer more, and the same goes for cluster sizes. I'm not sure which algorithm you'll be comparing with, but if it's a modified version of Lingo, then maybe it would be enough to use the same values for both algorithms?
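A minimal sketch of that suggestion, assuming both the stock and the modified algorithm can be driven from one attribute map. The names AttributeDrivenRun and SameSettingsComparison are hypothetical, and the keys and values below are placeholders, not verified Carrot2 attribute keys or Lingo defaults:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: runs one algorithm with the given attributes and
// returns some number to compare (e.g. a quality score or a cluster count).
interface AttributeDrivenRun {
    int run(Map<String, Object> attributes);
}

class SameSettingsComparison {
    // One read-only attribute map, built once and handed to both algorithms,
    // so a difference in output cannot come from different settings.
    // Key names and values are placeholders (assumption), not real defaults.
    static Map<String, Object> sharedAttributes() {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("desiredClusterCountBase", 30);
        attributes.put("minClusterSize", 2);
        return Collections.unmodifiableMap(attributes);
    }

    // Runs the baseline and the modified algorithm on identical settings.
    static Map<String, Integer> compare(AttributeDrivenRun baseline,
                                        AttributeDrivenRun modified) {
        Map<String, Object> attributes = sharedAttributes();
        Map<String, Integer> results = new LinkedHashMap<>();
        results.put("baseline", baseline.run(attributes));
        results.put("modified", modified.run(attributes));
        return results;
    }
}
```

In a real experiment the two AttributeDrivenRun lambdas would wrap calls into stock Lingo and into the modified copy; making the map unmodifiable guards against one run quietly changing the settings seen by the other.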

 
> I also have a second question.
>
> In the Workbench application, we can check/uncheck parameters or change
> their values.
> My question is: when we change a parameter's value, does the algorithm run
> again from the beginning, or what?

Yes, clustering is performed from scratch when you change parameters.

 
> By changing a parameter's value in the Workbench, can we really experiment
> with what happens when that parameter takes a given value? Or, to experiment
> with the effects of changing parameter values, do we actually have to change
> the code and rebuild the JARs and the Workbench? If it depends on the
> parameter, my question is specifically about minClusterSize and
> desiredClusterCountBase.

If you're changing the parameters of the default Lingo algorithm, then you don't need to update any code. For a modified algorithm, I'd strongly recommend writing a simple Java class that runs the tests for you and saves the results to a log file. This would be both simpler to code and more effective to use.
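A minimal sketch of such a class, using only the standard library. ClusteringRun is a hypothetical hook: in a real benchmark it would cluster your test documents with the given desiredClusterCountBase and minClusterSize and return whatever measure you compare. The sweep tries every combination and writes one tab-separated line per run:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook around the algorithm under test (stock or modified Lingo).
interface ClusteringRun {
    // Returns a score for one run with the given settings (assumption).
    double evaluate(int desiredClusterCountBase, int minClusterSize);
}

class ParameterSweep {
    // Runs every combination of the two parameters and logs one TSV line each.
    static List<String> sweep(ClusteringRun run, int[] countBases, int[] minSizes,
                              Path log) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("desiredClusterCountBase\tminClusterSize\tscore");
        for (int base : countBases) {
            for (int min : minSizes) {
                double score = run.evaluate(base, min);
                lines.add(base + "\t" + min + "\t" + score);
            }
        }
        // Creates the file or overwrites a previous log at the same path (UTF-8).
        Files.write(log, lines);
        return lines;
    }
}
```

For example, `ParameterSweep.sweep(run, new int[] {20, 30, 40}, new int[] {1, 2, 3}, Paths.get("lingo-sweep.tsv"))` would run nine configurations and leave a file that opens directly in a spreadsheet.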

Cheers,

S.


------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Optimum Parameters for Lingo

seyfullahd
Stanislaw Osinski-3 wrote:
> I'm not sure which algorithm you'll be comparing with, but if it's a modified
> version of Lingo, then maybe it would be enough to use the same values for
> both algorithms?
Yeah, thanks :)

Stanislaw Osinski-3 wrote:
>> I also have a second question.
>> In the Workbench application, we can check/uncheck parameters or change
>> their values.
>> My question is: when we change a parameter's value, does the algorithm run
>> again from the beginning, or what?
>
> Yes, clustering is performed from scratch when you change parameters.

Great! :)


Stanislaw Osinski-3 wrote:
> For a modified algorithm, I'd strongly recommend writing a simple Java class
> that runs the tests for you and saves the results to a log file. This would
> be both simpler to code and more effective to use.

I've never thought about this before! Thanks! :) By the way, is there an example test class for such a purpose in the current code? Maybe I could borrow from it :)

Cheers!

Seyfullah

Re: Optimum Parameters for Lingo

Stanislaw Osinski-3

> I've never thought about this before! Thanks! :) By the way, is there an
> example test class for such a purpose in the current code? Maybe I could
> borrow from it :)

Take a look at this class:

/carrot2-examples/examples/org/carrot2/examples/research/ClusteringQualityBenchmark.java

It's a very simple benchmark, but should be enough to start with.

S.


Re: Optimum Parameters for Lingo

seyfullahd
Stanislaw Osinski-3 wrote:
> Take a look at this class:
>
> /carrot2-examples/examples/org/carrot2/examples/research/ClusteringQualityBenchmark.java
>
> It's a very simple benchmark, but should be enough to start with.
>
> S.
Thanks Stanislaw,

I had actually already started with that class :)

I've already done all my experiments and obtained the results using that class. But I was not changing the parameters through that "carrot's jar client" class.

I suppose you mean that I could run this class many times, changing the parameters each time while keeping the same JAR, instead of changing the parameters in the API, rebuilding the JAR and running the class with the new JAR every time.
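A minimal sketch of that idea, assuming the benchmark's main method receives settings as key=value command-line arguments (the class name ArgsToAttributes and the key=value convention are hypothetical). The same compiled JAR can then be rerun with different settings, and the resulting map handed to whatever call the benchmark uses to start clustering:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ArgsToAttributes {
    // Turns arguments like "desiredClusterCountBase=20" into an attribute map,
    // so each run of the same JAR can use different settings without a rebuild.
    static Map<String, Object> parse(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + arg);
            }
            attributes.put(arg.substring(0, eq), convert(arg.substring(eq + 1)));
        }
        return attributes;
    }

    // Tries Integer, then Double, then Boolean; anything else stays a String.
    private static Object convert(String value) {
        try { return Integer.valueOf(value); } catch (NumberFormatException ignored) { }
        try { return Double.valueOf(value); } catch (NumberFormatException ignored) { }
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
            return Boolean.valueOf(value);
        }
        return value;
    }
}
```

Converting numbers to Integer or Double matters because a typed attribute generally expects a number rather than the raw string from the command line.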

I remember that we can get an instance of LingoClusteringAlgorithm by giving it an attribute set in which we specify the parameters. I suppose I should use it that way, right? I now remember that there is an example of that usage, too. I will find out how and use it that way.

This wasn't meant to be a question, but it turned into one by the time I finished writing, which is OK :)

Thank you very much

Seyfullah

Re: Optimum Parameters for Lingo

Dawid Weiss-2
> I remember that we can get an instance of LingoClusteringAlgorithm by
> giving it an attribute set in which we specify the parameters. I suppose I
> should use it that way, right? I now remember that there is an example of
> that usage, too. I will find out how and use it that way.

Those extra generated "builder" and "descriptor" classes are bound to
your class definition. So if you have modified the code (for example,
added different attributes), then you will either have to run with two
different JARs or create your own algorithm class, separate from the
existing one, so that you can use both independently.

As for parameter passing: this is indeed done via a Map<String,Object>.
There are helper classes, generated at compilation time, that help you
keep the code in sync with the actual types and attributes. See the
UsingAttributes.java example; it shows a number of different ways of
passing attributes. I would suggest using the builder pattern.
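To illustrate why a builder is preferable to raw map keys, here is a hand-written sketch of what such a helper does. It is not the generated Carrot2 class (those are generated at compile time, and UsingAttributes.java shows their actual usage); the class name LingoAttributesBuilder and the key strings are hypothetical placeholders:

```java
import java.util.Map;

// Hypothetical, hand-written builder in the spirit of the generated helpers.
// The key strings are placeholders, NOT the real Carrot2 attribute keys.
class LingoAttributesBuilder {
    static final String DESIRED_CLUSTER_COUNT_BASE = "desiredClusterCountBase";
    static final String MIN_CLUSTER_SIZE = "minClusterSize";

    private final Map<String, Object> attributes;

    private LingoAttributesBuilder(Map<String, Object> attributes) {
        this.attributes = attributes;
    }

    // Writes straight into the caller's map, which can then be passed on.
    static LingoAttributesBuilder on(Map<String, Object> attributes) {
        return new LingoAttributesBuilder(attributes);
    }

    // Typed setters: a wrong value type or a misspelled setter fails at
    // compile time, whereas a misspelled raw string key may go unnoticed.
    LingoAttributesBuilder desiredClusterCountBase(int value) {
        attributes.put(DESIRED_CLUSTER_COUNT_BASE, value);
        return this;
    }

    LingoAttributesBuilder minClusterSize(int value) {
        attributes.put(MIN_CLUSTER_SIZE, value);
        return this;
    }

    Map<String, Object> map() {
        return attributes;
    }
}
```

Usage: `LingoAttributesBuilder.on(attributes).desiredClusterCountBase(25).minClusterSize(3);` fills an existing map in one chained statement.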

Dawid
