Number of clusters and number of input documents

Gaurang Patel
Hi,

I modified carrot-examples.ClusteringDataFromDocumentSources to make it work on Google search results, using GoogleDocumentSource.class as the source of documents.

I have a list of questions as follows:

1) Every time I run it I get 15, 16, or 17 clusters, no matter what the query term is. Does this depend on some parameter? How can I make the clustering engine return a varying number of clusters depending on the input documents? Does the number of clusters depend only on the number of input documents?

2) How can I find out how many documents (snippets) are considered in a clustering run? What does the following setting do? attributes.put(AttributeNames.RESULTS, 1000);


Regards,
Gaurang Patel

Re: Number of clusters and number of input documents

Dawid Weiss-2
> 1) Every time I am getting either 15/16/17 clusters, no matter what the query
> term is. Does this depend on some sort of parameter? How can I achieve the

Each algorithm has its own parameters. STC has a fixed maximum
number of clusters to return. See the attribute documentation for
each algorithm.

> documents. Does the number of clusters depend only on the number of input
> documents?

No, it depends on their content too.
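
To illustrate both points (a fixed maximum and the dependence on content), here is a minimal toy sketch. It is not STC and does not use any Carrot2 classes; the class name ClusterCountSketch, the method cluster, and the maxClusters parameter are all made up for this example. It groups snippets by shared words and keeps at most maxClusters of the largest groups, so two inputs of the same size can produce different cluster counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration only: this is NOT the STC algorithm and not Carrot2 code.
class ClusterCountSketch {

    // Groups snippets by shared word, keeps words shared by at least two
    // snippets, and returns at most maxClusters of the largest groups.
    static Map<String, Set<Integer>> cluster(List<String> snippets, int maxClusters) {
        Map<String, Set<Integer>> byWord = new LinkedHashMap<>();
        for (int i = 0; i < snippets.size(); i++) {
            for (String word : snippets.get(i).toLowerCase().split("\\W+")) {
                if (word.length() < 4) {
                    continue; // crude filter for very short words
                }
                byWord.computeIfAbsent(word, w -> new LinkedHashSet<>()).add(i);
            }
        }

        // A word becomes a cluster candidate only if several snippets share it.
        List<Map.Entry<String, Set<Integer>>> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : byWord.entrySet()) {
            if (e.getValue().size() >= 2) {
                candidates.add(e);
            }
        }

        // Largest groups first; List.sort is stable, so ties keep input order.
        candidates.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));

        // Apply the fixed upper bound on the number of clusters.
        Map<String, Set<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> e : candidates) {
            if (result.size() == maxClusters) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> varied = Arrays.asList(
                "apple pie recipe", "apple tart recipe", "java compiler flags",
                "java virtual machine", "python virtual environment");
        List<String> uniform = Arrays.asList(
                "apple pie", "apple tart", "apple cider", "apple sauce", "apple juice");

        // Same number of input snippets, different content.
        System.out.println(cluster(varied, 15).keySet());
        System.out.println(cluster(uniform, 15).keySet());
        // A lower cap truncates the list of clusters.
        System.out.println(cluster(varied, 2).keySet());
    }
}
```

Both inputs contain five snippets, yet the first yields four groups and the second only one; with a cap of 2 the first input is cut down to two groups. A real algorithm is far more involved, but the cap plays the same role.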

> 2) How can I know how many documents(snippets) are being considered in the
> clustering run? What does the following set?
> attributes.put(AttributeNames.RESULTS, 1000);

This is the upper limit on the number of documents to retrieve from a
document source. Google will not return that many documents anyway
(I think their maximum is much, much lower).
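
A rough sketch of how such a cap behaves, using a hypothetical stand-in for a document source (ResultsCapSketch, fetch, and the limit of 64 are invented for illustration; this is not the Carrot2 or Google API, and 64 is not Google's actual limit). The number of documents that reach the clustering step is the smallest of the requested count, the source's own limit, and the number of matching documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document source; not the Carrot2 or Google API.
class ResultsCapSketch {

    // Returns at most `requested` documents, further limited by what the
    // source is willing to serve (sourceLimit) and by what actually matches.
    static List<String> fetch(List<String> matching, int requested, int sourceLimit) {
        int n = Math.min(requested, Math.min(sourceLimit, matching.size()));
        return new ArrayList<>(matching.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> matching = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            matching.add("snippet " + i);
        }

        // Ask for 1000, but the (arbitrary) source limit here is 64.
        List<String> docs = fetch(matching, 1000, 64);

        // The size of the returned list is what would actually be clustered.
        System.out.println("requested=1000, fetched=" + docs.size());
    }
}
```

In Carrot2 itself, the corresponding number should be the size of the document list in the processing result (result.getDocuments().size(), assuming the 3.x ProcessingResult API), rather than the value passed as AttributeNames.RESULTS.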

Dawid
