Clustering only shows "Other Topics"

spraja
Hi, I am using carrot2 version 3.5.1 with Lucene 3.1.0.

While searching with Lucene works fine, the Carrot2 Workbench shows only "Other Topics" for any search term(s). I have about 30 PDF documents indexed, and they cover topics including energy, information retrieval, etc.

Please let me know whether this is expected behavior. Thanks a lot.

Cheers,
Raja

Re: Clustering only shows "Other Topics"

Stanislaw Osinski
Administrator
Hi,

Make sure you choose the right fields in the "Document content field" and "Document title field" combo boxes (http://download.carrot2.org/head/manual/#section.getting-started.lucene). Also, the fields you choose must be stored fields (in Lucene terms).

Cheers,

Staszek

On Tue, Jul 26, 2011 at 18:15, spraja <[hidden email]> wrote:
Hi, I am using carrot2 version 3.5.1 with Lucene 3.1.0.

While searching with Lucene works fine, the Carrot2Workbench only shows
"Other Topics" only for any search term(s). I do have about 30 pdf documents
indexed and they cover topics including energy, information retrieval etc.

Please guide me if this is an acceptable behavior. Thanks a lot.

Cheers,
Raja

--
View this message in context: http://carrot2-users-and-developers-forum.607571.n2.nabble.com/Clustering-only-shows-Other-Topics-tp6622792p6622792.html
Sent from the Carrot2 Users and Developers Forum mailing list archive at Nabble.com.

------------------------------------------------------------------------------
Magic Quadrant for Content-Aware Data Loss Prevention
Research study explores the data loss prevention market. Includes in-depth
analysis on the changes within the DLP market, and the criteria used to
evaluate the strengths and weaknesses of these DLP solutions.
http://www.accelacomm.com/jaw/sfnl/114/51385063/
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers



Re: Clustering only shows "Other Topics"

spraja
Hi Staszek,

That worked like magic.  Thanks! You are the mannnnn ! :)

For the sake of other beginners like me: I used Apache PDFBox to index the PDF files. LucenePDFDocument reads a file's metadata and content and produces the Lucene Document object in a single convenience method, LucenePDFDocument.getDocument( file ). By default, however, it indexes the "contents" field without storing it, so I had to change that part of the LucenePDFDocument source code to force it to store the contents.

Cheers,
Raja
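
For anyone else landing on this thread, here is a rough sketch of why an indexed-but-not-stored field can leave the clustering step with nothing to work on. It is a toy model in plain Java and deliberately does not use Lucene or Carrot2: ToyField, retrieve and clusteringInput are hypothetical names made up for illustration. In Lucene 3.x itself, whether a field's text is kept is decided per field by Field.Store.YES versus Field.Store.NO. The sketch assumes that clustering only sees text that can be read back from the index, which is how I understand the advice above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these classes are Lucene or Carrot2 APIs. */
final class StoredFieldDemo {

    /** A field value plus a flag saying whether the index keeps the original text. */
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;

        ToyField(String name, String value, boolean stored) {
            this.name = name;
            this.value = value;
            this.stored = stored;
        }
    }

    /** Simulates reading a document back from the index: only stored fields survive. */
    static Map<String, String> retrieve(List<ToyField> indexedFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (ToyField field : indexedFields) {
            if (field.stored) {
                result.put(field.name, field.value);
            }
        }
        return result;
    }

    /** The text a clusterer would receive for the chosen content field ("" if absent). */
    static String clusteringInput(List<ToyField> indexedFields, String contentField) {
        String text = retrieve(indexedFields).get(contentField);
        return text == null ? "" : text;
    }

    public static void main(String[] args) {
        // Before the fix: "contents" is searchable in the real index, but not stored.
        List<ToyField> before = new ArrayList<>();
        before.add(new ToyField("path", "/docs/energy.pdf", true));
        before.add(new ToyField("contents", "wind and solar energy", false));

        // After the fix: the same field is also stored.
        List<ToyField> after = new ArrayList<>();
        after.add(new ToyField("path", "/docs/energy.pdf", true));
        after.add(new ToyField("contents", "wind and solar energy", true));

        System.out.println("before: [" + clusteringInput(before, "contents") + "]");
        System.out.println("after:  [" + clusteringInput(after, "contents") + "]");
    }
}
```

With the "contents" field unstored, the text handed to the clusterer is empty even though searching still works, which would be consistent with every result ending up in "Other Topics". Once the field is stored, the full text comes back.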