Clustering Lucene Results

Hider, Sandy

Hi All,
I read the following in the FAQ:

Can I perform clustering without specifying the query?
Yes. While the query is usually very helpful to get rid of the obvious meanings related to the documents in the search results set, it is not obligatory -- the clustering algorithms will cope without the query.

I am trying to do something similar. I have hits from several Lucene queries that I am already tracking in my application. I would like Carrot2 to categorize them, and to do so without needing a query. I am trying to figure out the best way to do this.

Here is what I came up with:

- Extend LuceneLocalInputComponent and override pushResults().
- Make pushResults() create RawDocuments from my current data set.
- Do not pass a query variable to the LocalFilterComponent.
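Sandy's plan boils down to turning hits the application already holds into documents and handing them over with no query. Below is a minimal, library-free sketch of that assembly step, assuming hits from several queries should be merged and deduplicated by an id. `TrackedHit`, `ClusterDoc`, `ClusteringRequest` and `buildRequest` are hypothetical names, not Carrot2 classes; the real document type in this Carrot2 version is RawDocument, whose API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuerylessDocumentAssembly {

    // Hypothetical stand-in for a hit the application already tracks (not a Carrot2 class).
    record TrackedHit(String id, String title, String body, String url) {}

    // Hypothetical stand-in for the document handed to the clusterer (RawDocument in this thread).
    record ClusterDoc(String title, String snippet, String url) {}

    // Hypothetical request object: documents plus an optional query, where null means "no query".
    record ClusteringRequest(List<ClusterDoc> documents, String query) {}

    /** Merges hits from several queries, dropping duplicate ids and keeping first-seen order. */
    static ClusteringRequest buildRequest(List<List<TrackedHit>> hitsPerQuery) {
        Map<String, ClusterDoc> unique = new LinkedHashMap<>();
        for (List<TrackedHit> hits : hitsPerQuery) {
            for (TrackedHit hit : hits) {
                unique.putIfAbsent(hit.id(), new ClusterDoc(hit.title(), hit.body(), hit.url()));
            }
        }
        // No query is attached: clustering has to work from the documents alone.
        return new ClusteringRequest(new ArrayList<>(unique.values()), null);
    }

    public static void main(String[] args) {
        List<TrackedHit> first = List.of(
            new TrackedHit("1", "Lucene scoring", "How Lucene scores hits", "http://example.com/1"),
            new TrackedHit("2", "Index formats", "Segments and postings", "http://example.com/2"));
        List<TrackedHit> second = List.of(
            new TrackedHit("2", "Index formats", "Segments and postings", "http://example.com/2"),
            new TrackedHit("3", "Query parsing", "Syntax of the query parser", "http://example.com/3"));

        ClusteringRequest request = buildRequest(List.of(first, second));
        System.out.println(request.documents().size() + " documents, query = " + request.query());
    }
}
```

Deduplicating matters here because the same document can be a hit for more than one of the tracked queries, and feeding it twice would bias the clusters toward it.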

Will this work?

Thanks in advance,

Sandy

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Clustering Lucene Results

Dawid Weiss-2

I assume you want to do this in code (a much easier way is to use the DCS, the
Document Clustering Server, and simply push the documents to be clustered to it).
If you want to do it in code, look at the direct document feed example:

http://fisheye3.atlassian.com/browse/carrot2/trunk/carrot2/applications/carrot2-demo-api-example/src/org/carrot2/apiexample/DirectDocumentFeedExample.java?r=HEAD
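For the DCS route, the documents first have to be serialized. Below is a minimal sketch, assuming the DCS accepts input in the Carrot2 XML format (a `searchresult` root holding an optional `query` and `document` elements with `title`, `snippet` and `url`); check the exact format and the request parameter name against the documentation of your DCS version. The `Doc` record and the `toXml` helper are hypothetical, and the sketch only builds the XML string; it does not send it anywhere.

```java
import java.util.List;

public class Carrot2XmlFeed {

    // Hypothetical document holder; field names are illustrative.
    record Doc(String title, String snippet, String url) {}

    // Escapes the five XML special characters; null becomes an empty string.
    static String escape(String s) {
        if (s == null) return "";
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&apos;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** Serializes documents in the assumed Carrot2 XML layout; no query element when query is null. */
    static String toXml(List<Doc> docs, String query) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<searchresult>\n");
        if (query != null) {
            xml.append("  <query>").append(escape(query)).append("</query>\n");
        }
        for (int i = 0; i < docs.size(); i++) {
            Doc d = docs.get(i);
            xml.append("  <document id=\"").append(i).append("\">\n")
               .append("    <title>").append(escape(d.title())).append("</title>\n")
               .append("    <snippet>").append(escape(d.snippet())).append("</snippet>\n")
               .append("    <url>").append(escape(d.url())).append("</url>\n")
               .append("  </document>\n");
        }
        return xml.append("</searchresult>\n").toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("Lucene & Solr", "Search <engines>", "http://example.com/a"),
            new Doc("Clustering", "Grouping search results", "http://example.com/b"));
        System.out.print(toXml(docs, null));
    }
}
```

Omitting the query element mirrors the FAQ answer quoted in the first message: the query helps but is not required.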

Dawid


Re: Clustering Lucene Results

bradpitt
Entering *:* brings back all the documents from the Lucene index.

However, it brings back only the titles (no snippets). Why?

Instead of using terms from both the title and content fields, the clusters appear to be formed from the words in the titles alone. I haven't specified title or any other field as the "default" search field while building the index.

Carrot2 simply asks me for the title, content and URL fields in the source_lucene_attributes.xml file.

Now that I am getting back all the documents, what would be the best way to get Carrot2 to cluster on the title AND content without using a query term?
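One way to check the symptom described above (titles but no snippets) is to see whether the content field name configured for Carrot2 actually exists in the stored documents. Below is a minimal sketch, assuming the stored fields of each hit have already been read into a `Map<String, String>`; the field names `title`, `body` and `content` are hypothetical. Note that in Lucene a field must be stored, not only indexed, for its value to be read back from a hit.

```java
import java.util.List;
import java.util.Map;

public class ContentFieldCheck {

    /** Counts stored documents whose configured content field is missing or blank. */
    static long countMissingContent(List<Map<String, String>> storedDocs, String contentField) {
        return storedDocs.stream()
            .map(doc -> doc.get(contentField))
            .filter(value -> value == null || value.isBlank())
            .count();
    }

    public static void main(String[] args) {
        // Hypothetical stored fields: this index keeps the text under "body", not "content".
        List<Map<String, String>> docs = List.of(
            Map.of("title", "Lucene scoring", "body", "How Lucene scores hits"),
            Map.of("title", "Index formats", "body", "Segments and postings"));

        System.out.println("missing under 'content': " + countMissingContent(docs, "content"));
        System.out.println("missing under 'body': " + countMissingContent(docs, "body"));
    }
}
```

If the count equals the number of documents, the configured content field name probably does not match the index, which would leave only the titles for the clustering algorithm to work with.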

Re: Clustering Lucene Results

Dawid Weiss-2
Probably because your Lucene field names do not match the default configuration
in Carrot2; they must either match it or be mapped explicitly. Look at this
example to see how custom field names can be handled:

ClusteringDataFromLuceneWithCustomFields.java
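The idea behind custom field names, sketched without Carrot2 classes: the input component has to know which Lucene field holds the title, the content and the URL, and a name that does not match silently yields an empty value. The `FieldMapping` record, its default names, `MappedDoc` and `map` are all hypothetical; the ClusteringDataFromLuceneWithCustomFields.java example is where to see how Carrot2 itself handles this.

```java
import java.util.Map;

public class FieldMappingSketch {

    // Hypothetical mapping from the roles a clusterer needs to the field names of a given index.
    record FieldMapping(String titleField, String contentField, String urlField) {
        // Hypothetical default names, for illustration only (not Carrot2's actual defaults).
        static FieldMapping defaults() {
            return new FieldMapping("title", "snippet", "url");
        }
    }

    record MappedDoc(String title, String content, String url) {}

    // A field name that is absent from the stored document maps to "", silently dropping that part.
    static MappedDoc map(Map<String, String> stored, FieldMapping m) {
        return new MappedDoc(
            stored.getOrDefault(m.titleField(), ""),
            stored.getOrDefault(m.contentField(), ""),
            stored.getOrDefault(m.urlField(), ""));
    }

    public static void main(String[] args) {
        // Hypothetical index that stores its text under "body".
        Map<String, String> stored = Map.of(
            "title", "Lucene scoring",
            "body", "How Lucene scores hits",
            "url", "http://example.com/1");

        System.out.println(map(stored, FieldMapping.defaults()));                  // content comes back empty
        System.out.println(map(stored, new FieldMapping("title", "body", "url"))); // content is filled in
    }
}
```

With the default names, only the title and URL survive, which matches the "titles only" behavior reported above; pointing the content role at the index's real field name restores the text.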

Dawid

(In reply to bradpitt, Fri, Jul 9, 2010 at 10:54 PM)