[carrot2] Simple Doubt


[carrot2] Simple Doubt

cccefet
This post has NOT been accepted by the mailing list yet.
I started using Carrot2 a little while ago, and I have a question that I think is simple.
I noticed that, when clustering documents, some documents appear in more than one cluster.
How can I prevent this? I want each document to belong to one and only one cluster.

Re: [carrot2] Simple Doubt

cccefet
This post has NOT been accepted by the mailing list yet.
Oops, I forgot. This is my clustering method:

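import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

import org.carrot2.clustering.lingo.LingoClusteringAlgorithm;
import org.carrot2.core.Cluster;
import org.carrot2.core.Controller;
import org.carrot2.core.ControllerFactory;
import org.carrot2.core.Document;
import org.carrot2.core.LanguageCode;
import org.carrot2.core.ProcessingResult;
import org.carrot2.core.attribute.AttributeNames;

public List<Cluster> clustering(Bloco bloco) {
    /* Split the stored texts into one Carrot2 document per text. */
    StringTokenizer st = new StringTokenizer(bloco.getTextos().toString(), BlocoMySqlDao.SEPARATOR);

    List<Document> documents = new ArrayList<Document>();
    while (st.hasMoreTokens()) {
        String s = st.nextToken();
        documents.add(new Document(null, s, LanguageCode.PORTUGUESE));
    }

    /* A controller to manage the processing pipeline. */
    Controller controller = ControllerFactory.createSimple();

    /* Input data for clustering, list of Documents in this case. */
    Map<String, Object> attributes = new HashMap<String, Object>();
    attributes.put(AttributeNames.DOCUMENTS, documents);

    /* Perform clustering */
    ProcessingResult result = controller.process(attributes, LingoClusteringAlgorithm.class);

    /* Clusters created by Carrot2. */
    List<Cluster> clusters = result.getClusters();

    return clusters;
}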

Re: [carrot2] Simple Doubt

cccefet
Anyone?
Please, I really need help.
=(

Re: [carrot2] Simple Doubt

Stanislaw Osinski
Administrator
> Anyone?
> Please, I really need help.

Hi there,

Your previous messages did not reach us because you had not subscribed to the mailing list. A question similar to yours has already been asked and answered, e.g. here:


Cheers,

Staszek

------------------------------------------------------------------------------
This SF.net Dev2Dev email is sponsored by:

Show off your parallel programming skills.
Enter the Intel(R) Threading Challenge 2010.
http://p.sf.net/sfu/intel-thread-sfd
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: [carrot2] Simple Doubt

cccefet
Hi man, first of all, thanks for your answer.
But this link points to this same forum, doesn't it?

http://project.carrot2.org/forum.html#nabble-td5504564%7Ca5506345


Re: [carrot2] Simple Doubt

Stanislaw Osinski
Administrator
> Hi man, first of all, thanks for your answer.
> But this link points to this same forum, doesn't it?

A question similar to yours has already been asked and answered e.g. here:


Cheers,

Staszek



Re: [carrot2] Simple Doubt

cccefet
Dude, sorry for my ignorance. I am inexperienced and perhaps not understanding what you mean, but the link you pasted points to this forum.
If you could paste the link to the forum thread with the answer, or better yet, tell me how to configure this in Carrot2, that would be helpful.

Again, thanks for the reply, and sorry for the inconvenience.

Re: [carrot2] Simple Doubt

Stanislaw Osinski
Administrator
> Dude, sorry for my ignorance. I am inexperienced and perhaps not understanding what you mean, but the link you pasted points to this forum.

Oops, my bad, sorry. Here's the link I wanted to paste:


The above forum thread contains the answer. 


S.


Re: [carrot2] Simple Doubt

cccefet
First, I would like to thank you for your help. I managed to do more or less what I wanted by adding the following code before returning the clusters:

/* Removing common documents */
for (Cluster cluster : clusters) {
    for (Document document : cluster.getAllDocuments()) {
        for (Cluster auxCluster : clusters) {
            if (!cluster.equals(auxCluster)) {
                if (auxCluster.getAllDocuments().contains(document)) {
                    auxCluster.getAllDocuments().remove(document);
                }
            }
        }
    }
}


As you can see, with this removal of common documents, each document stays in the first cluster in which it appears.
I read in another post that the ideal approach would be to remove documents according to the similarity between the cluster and the document. So here is my second question:

How do I get this value, i.e. the score or similarity between a cluster and a document?

Again, thanks for your patience with a beginner like me.

Re: [carrot2] Simple Doubt

Stanislaw Osinski
Administrator
> First, I would like to thank you for your help. I managed to do more or less what I wanted by adding the following code before returning the clusters:

> /* Removing common documents */
> for (Cluster cluster : clusters) {
>     for (Document document : cluster.getAllDocuments()) {
>         for (Cluster auxCluster : clusters) {
>             if (!cluster.equals(auxCluster)) {
>                 if (auxCluster.getAllDocuments().contains(document)) {
>                     auxCluster.getAllDocuments().remove(document);
>                 }
>             }
>         }
>     }
> }

Looks correct, though it could probably be implemented more efficiently with some sort of document -> clusters index / map structure.
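A minimal sketch of the document -> clusters index idea, written against plain Java stand-ins rather than the Carrot2 API: the hypothetical `SimpleCluster` type plays the role of `Cluster` (its `documents` list corresponds to what `getAllDocuments()` returns), and documents are represented by string ids. The index is built in one pass over all cluster/document pairs instead of the nested loop over every pair of clusters. Each document is then kept in the first cluster that lists it, the same rule the loop above applies, but the result goes into a separate map, so the original cluster lists are never modified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstClusterWins {

    /** Hypothetical stand-in for Carrot2's Cluster: a label plus document ids. */
    static final class SimpleCluster {
        final String label;
        final List<String> documents;

        SimpleCluster(String label, List<String> documents) {
            this.label = label;
            this.documents = documents;
        }
    }

    /** Builds a document -> clusters index in a single pass, preserving cluster order. */
    static Map<String, List<SimpleCluster>> index(List<SimpleCluster> clusters) {
        Map<String, List<SimpleCluster>> byDocument = new LinkedHashMap<>();
        for (SimpleCluster cluster : clusters) {
            for (String doc : cluster.documents) {
                byDocument.computeIfAbsent(doc, d -> new ArrayList<>()).add(cluster);
            }
        }
        return byDocument;
    }

    /** Keeps each document only in the first cluster (in result order) that contains it. */
    static Map<String, SimpleCluster> assignToFirst(List<SimpleCluster> clusters) {
        Map<String, SimpleCluster> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, List<SimpleCluster>> e : index(clusters).entrySet()) {
            assignment.put(e.getKey(), e.getValue().get(0));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<SimpleCluster> clusters = List.of(
                new SimpleCluster("Futebol", List.of("d1", "d2")),
                new SimpleCluster("Copa", List.of("d2", "d3")));
        System.out.println("d2 appears in " + index(clusters).get("d2").size() + " clusters");
        assignToFirst(clusters).forEach((doc, c) -> System.out.println(doc + " -> " + c.label));
    }
}
```

The index also shows directly which documents are shared (any entry whose list has more than one cluster), which is useful if a different tie-breaking rule is wanted later.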
 
> How can I see a common removal until the cluster that appears first in the document remains with him.

I don't follow this bit, I'm afraid.
 
> I read in another post that the ideal approach would be to remove documents according to the similarity between the cluster and the document. So here is my second question:
>
> How do I get this value, i.e. the score or similarity between a cluster and a document?

The cluster-document similarity in Carrot2 is very straightforward: a document assigned to a cluster must contain all (or most) of the cluster's labels, though some exceptions may occur due to cluster merging. As a result, Carrot2's built-in similarity measure would not discriminate between the different document assignments very well for you. What I would try instead is keeping each document in the highest-scoring cluster and discarding it from the other clusters.
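A minimal sketch of the "keep it in the highest-scoring cluster" suggestion, again using a hypothetical stand-in type rather than the Carrot2 API. The `score` field of `ScoredCluster` is assumed to play the role of the value returned by `Cluster.getScore()`, which may be `null` when no score is available; the sketch treats a missing score as the lowest possible one. Ties go to the cluster that comes first in the result list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HighestScoreWins {

    /** Hypothetical stand-in for Carrot2's Cluster: label, score and document ids. */
    static final class ScoredCluster {
        final String label;
        final Double score; // may be null; treated as the lowest possible score
        final List<String> documents;

        ScoredCluster(String label, Double score, List<String> documents) {
            this.label = label;
            this.score = score;
            this.documents = documents;
        }

        double scoreOrMin() {
            return score == null ? Double.NEGATIVE_INFINITY : score;
        }
    }

    /** Maps each document id to the highest-scoring cluster that contains it. */
    static Map<String, ScoredCluster> assignToBest(List<ScoredCluster> clusters) {
        Map<String, ScoredCluster> best = new LinkedHashMap<>();
        for (ScoredCluster cluster : clusters) {
            for (String doc : cluster.documents) {
                ScoredCluster current = best.get(doc);
                // Strict '>' keeps the earlier cluster when scores are equal.
                if (current == null || cluster.scoreOrMin() > current.scoreOrMin()) {
                    best.put(doc, cluster);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<ScoredCluster> clusters = List.of(
                new ScoredCluster("Futebol", 0.4, List.of("d1", "d2")),
                new ScoredCluster("Copa do Mundo", 0.9, List.of("d2", "d3")));
        assignToBest(clusters).forEach((doc, c) -> System.out.println(doc + " -> " + c.label));
    }
}
```

Building a separate document -> cluster map, rather than removing documents from each cluster's list, keeps the original clustering result intact, so the full (overlapping) clusters can still be shown elsewhere if needed.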

Cheers,

Staszek
