the number of clusters is adjustable?

cy163
Hi all,

I wonder if the number of output clusters is adjustable for the Lingo clustering algorithm and for the STC algorithm.

Thanks

cy163

Re: the number of clusters is adjustable?

Dawid Weiss-2
The number of clusters is automatically determined by algorithms. You
can specify the "maximum" for the number of clusters, but this is
similar to just truncating their number in post-processing.
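
To illustrate what "truncating in post-processing" amounts to, here is a minimal sketch that uses only the JDK. The `ClusterTruncation` class, the `truncate` helper, and the use of plain strings as stand-ins for clusters are assumptions made for illustration; none of them is part of the Carrot2 API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Carrot2: caps a cluster list after clustering.
final class ClusterTruncation {

    // Returns at most maxClusters leading elements; the input list is not modified.
    static <T> List<T> truncate(List<T> clusters, int maxClusters) {
        if (maxClusters < 0) {
            throw new IllegalArgumentException("maxClusters must be non-negative");
        }
        int end = Math.min(maxClusters, clusters.size());
        return new ArrayList<>(clusters.subList(0, end));
    }

    public static void main(String[] args) {
        // Stand-in labels; a real program would hold the clusters returned by the algorithm.
        List<String> clusters = Arrays.asList("Data Mining", "Web Search", "Software", "Other Topics");
        System.out.println(truncate(clusters, 2)); // prints [Data Mining, Web Search]
    }
}
```

The point of the sketch is that a cap of this kind only discards clusters that were already computed; it does not change how the algorithm groups the documents.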

Dawid

On Sat, May 8, 2010 at 9:37 AM, cy163 <[hidden email]> wrote:

>
> Hi ALL ,
>
>
> I wonder if the number of output clusters is adjustable for the Lingo
> clustering algorithm and for the STC algorithm.
>
> Thanks
>
> cy163

------------------------------------------------------------------------------

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: the number of clusters is adjustable?

Stanislaw Osinski
Administrator

> The number of clusters is automatically determined by algorithms. You
> can specify the "maximum" for the number of clusters, but this is
> similar to just truncating their number in post-processing.

For Lingo you can set the desired number of clusters using the following attribute:

http://download.carrot2.org/head/manual/#section.attribute.LingoClusteringAlgorithm.desiredClusterCountBase

However, there is no one-to-one correspondence between the value of this attribute and the number of clusters. The number of clusters created by the algorithm increases with the cluster count base, but not linearly.
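
A minimal sketch of setting this attribute, assuming the string key from the manual link above and a plain map of attribute names to values. Only the JDK is used so that the snippet runs on its own; `LingoAttributes` and `lingoAttributes` are hypothetical names, and handing the map to a controller requires Carrot2 on the classpath, so that step appears only as a comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: collects Lingo tuning attributes under their string keys.
final class LingoAttributes {

    // Attribute key as it appears in the Carrot2 manual link above.
    static final String DESIRED_CLUSTER_COUNT_BASE =
            "LingoClusteringAlgorithm.desiredClusterCountBase";

    // Builds an attribute map that requests the given cluster count base.
    static Map<String, Object> lingoAttributes(int clusterCountBase) {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put(DESIRED_CLUSTER_COUNT_BASE, clusterCountBase);
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = lingoAttributes(8);
        System.out.println(attributes);
        // With Carrot2 on the classpath, this map would then be passed to the
        // controller's process(...) call together with the algorithm class.
    }
}
```

Because the value is only a base, trying a few different values and comparing the resulting cluster counts is probably the most practical way to tune it.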

S.




Re: the number of clusters is adjustable?

cy163
Thank you for your response.

I am using Carrot2 v3.1.0. Is the attribute "LingoClusteringAlgorithm.desiredClusterCountBase" available in that version?

Could you please show me an example code snippet that uses it?

Thanks

Re: the number of clusters is adjustable?

cy163
I searched and found the following code:

    SimpleController controller = new SimpleController();
    Map<String, Object> attributes = new HashMap<String, Object>();

    attributes.put(AttributeNames.QUERY, terminos);
    attributes.put(AttributeNames.RESULTS, 500);

    attributes.put("LingoClusteringAlgorithm.factorizationFactory", LocalNonnegativeMatrixFactorizationFactory.class);
    attributes.put("LingoClusteringAlgorithm.titleWordsBoost", 8.5);
    attributes.put("LingoClusteringAlgorithm.desiredClusterCountBase", 8);
    attributes.put("LingoClusteringAlgorithm.clusterMergingThreshold", 0.55);

    ProcessingResult result = controller.process(attributes, org.carrot2.source.yahoo.YahooDocumentSource.class, LingoClusteringAlgorithm.class);



from
http://rankeodebusquedas.googlecode.com/svn-history/r74/trunk/ClusterX/src/java/aplicacion/Buscador.java



What do the following mean?

(0) LocalNonnegativeMatrixFactorizationFactory.class
(1) AttributeNames.RESULTS
(2) LingoClusteringAlgorithm.titleWordsBoost
(3) LingoClusteringAlgorithm.clusterMergingThreshold

Are they available in Carrot2 v3.1.0?

Does clusterMergingThreshold have an impact on the number of resulting clusters?

Carrot2 clustering Chinese documents

cy163
I have to cluster some Chinese documents with Carrot2 v3.1.0.

I use the following line in my program:

attribute.put(Attribute.ACTIVE_LANGUAGE, LANGUAGECODE.SIMPLE_CHINESE);

Besides this line, is there anything else I need to set for Carrot2 v3.1.0 to process Chinese text properly?

Thanks

Re: the number of clusters is adjustable?

Stanislaw Osinski
Administrator
In reply to this post by cy163

> I searched and found the following code:
>
>     SimpleController controller = new SimpleController();
>     Map<String, Object> attributes = new HashMap<String, Object>();
>
>     attributes.put(AttributeNames.QUERY, terminos);
>     attributes.put(AttributeNames.RESULTS, 500);
>
>     attributes.put("LingoClusteringAlgorithm.factorizationFactory", LocalNonnegativeMatrixFactorizationFactory.class);
>     attributes.put("LingoClusteringAlgorithm.titleWordsBoost", 8.5);
>     attributes.put("LingoClusteringAlgorithm.desiredClusterCountBase", 8);
>     attributes.put("LingoClusteringAlgorithm.clusterMergingThreshold", 0.55);
>
>     ProcessingResult result = controller.process(attributes, org.carrot2.source.yahoo.YahooDocumentSource.class, LingoClusteringAlgorithm.class);
>
> from
> http://rankeodebusquedas.googlecode.com/svn-history/r74/trunk/ClusterX/src/java/aplicacion/Buscador.java

You'll find the same code in Carrot2:

http://download.carrot2.org/stable/carrot2-java-api-3.3.0.zip

Or as source code:

http://fisheye3.atlassian.com/browse/carrot2/trunk/applications/carrot2-examples/src/

Incidentally, I recommend that you switch to version 3.3.0 of Carrot2; it has improved clustering performance and an improved controller API.

> What do the following mean?
>
> (0) LocalNonnegativeMatrixFactorizationFactory.class
> (1) AttributeNames.RESULTS
> (2) LingoClusteringAlgorithm.titleWordsBoost
> (3) LingoClusteringAlgorithm.clusterMergingThreshold

These are all different tuning attributes; see the manual for a description: http://download.carrot2.org/head/manual/#section.component.lingo

> Are they available in Carrot2 v3.1.0?

Yes, but as I said, for best results, try the newest version.

> Does clusterMergingThreshold have an impact on the number of resulting clusters?

Yes: the more aggressive the merging, the fewer clusters. See the manual for a complete description: http://download.carrot2.org/head/manual/#section.attribute.LingoClusteringAlgorithm.clusterMergingThreshold
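
The relationship between merging and cluster count can be illustrated with a toy example. The sketch below is not Lingo's implementation: it assumes that clusters are plain sets of document ids, that two clusters are merged when the share of common documents (relative to the smaller cluster) reaches a threshold, and that merging is greedy. `ToyClusterMerging`, `overlap`, and `merge` are hypothetical names. Under these assumptions, a lower threshold merges more aggressively and leaves fewer clusters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only; the overlap measure and greedy strategy are assumptions.
final class ToyClusterMerging {

    // Fraction of the smaller cluster's documents that also occur in the other cluster.
    static double overlap(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / Math.min(a.size(), b.size());
    }

    // Greedily merges any pair whose overlap reaches the threshold, until no pair qualifies.
    static List<Set<Integer>> merge(List<Set<Integer>> clusters, double threshold) {
        List<Set<Integer>> result = new ArrayList<>();
        for (Set<Integer> c : clusters) {
            result.add(new HashSet<>(c)); // copy so the input stays untouched
        }
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int i = 0; i < result.size() && !merged; i++) {
                for (int j = i + 1; j < result.size() && !merged; j++) {
                    if (overlap(result.get(i), result.get(j)) >= threshold) {
                        // Fold cluster j into cluster i, then rescan from the start.
                        result.get(i).addAll(result.remove(j));
                        merged = true;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Integer>> clusters = new ArrayList<>();
        clusters.add(new HashSet<>(Arrays.asList(1, 2, 3, 4)));
        clusters.add(new HashSet<>(Arrays.asList(3, 4, 5, 6)));
        clusters.add(new HashSet<>(Arrays.asList(7, 8)));
        System.out.println(merge(clusters, 0.9).size()); // prints 3: no pair overlaps enough
        System.out.println(merge(clusters, 0.5).size()); // prints 2: the first two are merged
    }
}
```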

S.



Re: Carrot2 clustering Chinese documents

Stanislaw Osinski
Administrator
In reply to this post by cy163

> I have to cluster some Chinese documents with Carrot2 v3.1.0.
>
> I use the following line in my program:
>
> attribute.put(Attribute.ACTIVE_LANGUAGE, LANGUAGECODE.SIMPLE_CHINESE);
>
> Besides this line, is there anything else I need to set for Carrot2 v3.1.0 to process Chinese text properly?

Yes, that should work for version 3.1.0. If you switch to the newest version (3.3.0), here's a complete example of clustering non-English content:

http://fisheye3.atlassian.com/browse/carrot2/trunk/applications/carrot2-examples/src/org/carrot2/examples/clustering/ClusteringNonEnglishContent.java?r=HEAD

S.

