Using the Workbench with Lucene

Using the Workbench with Lucene

GilbertoFragoso
Hi,

I downloaded Carrot2 Workbench 3.5.2 (win 32 version) and I'm attempting to
use it to cluster Lucene-indexed docs.  For the tests I've tried Lucene 3.4.0 and
3.0.3, indexing 100 sample files with an IndexFiles demo program included in
the distribution.  The demo program creates documents for indexing with
three fields, "path", "modified", and "contents", and I tested that the
resulting indexes are searchable.

To set the Workbench to use Lucene, I've pretty much followed
http://download.carrot2.org/head/manual/#section.getting-started.lucene, but
when I hit "Process" I get the error:

    Processing error: Attribute binding failed: Could not get field value
    org.carrot2.source.lucene.LuceneDocumentSource#analyzer
    Attribute binding failed: Could not get field value org.carrot2.source.lucene.LuceneDocumentSource#analyzer

I've tried using different clustering algorithms and different entries in
the SimpleFieldMapper section of the Search tab, but keep getting the same
error.  Can you suggest some settings for this section?  Or do you think it
could be something else, maybe a Lucene incompatibility with this version of
the Workbench?  I can post/email the source for the java files as well as
screenshots of the workbench.  Please let me know how to proceed.

thanks,
Gilberto


------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Using the Workbench with Lucene

Dawid Weiss-2
Hi Gilberto,

Workbench 3.5.2 comes with built-in Lucene 3.1.0 - I wonder if this
can affect what you're observing. Can you try with this exact version?

We should be upgrading to the newest Lucene too, I filed an issue for this here:
http://issues.carrot2.org/browse/CARROT-868

Anyway, try with Lucene 3.1.0, please. If it doesn't work, let me know. Thanks,

Dawid


On Thu, Oct 6, 2011 at 3:19 AM, Fragoso, Gilberto (NIH/NCI) [E]
<[hidden email]> wrote:

> Hi,
>
> I downloaded Carrot2 Workbench 3.5.2 (win 32 version) and I'm attempting to
> use it to cluster Lucene-indexed docs.  For the tests I've tried Lucene 3.4.0 and
> 3.0.3, indexing 100 sample files with an IndexFiles demo program included in
> the distribution.  The demo program creates documents for indexing with
> three fields, "path", "modified", and "contents", and I tested that the
> resulting indexes are searchable.
>
> To set the Workbench to use Lucene, I've pretty much followed
> http://download.carrot2.org/head/manual/#section.getting-started.lucene, but
> when I hit "Process" I get the error:
>
>    Processing error: Attribute binding failed: Could not get field value
>    org.carrot2.source.lucene.LuceneDocumentSource#analyzer
>    Attribute binding failed: Could not get field value org.carrot2.source.lucene.LuceneDocumentSource#analyzer
>
> I've tried using different clustering algorithms and different entries in
> the SimpleFieldMapper section of the Search tab, but keep getting the same
> error.  Can you suggest some settings for this section?  Or do you think it
> could be something else, maybe a Lucene incompatibility with this version of
> the Workbench?  I can post/email the source for the java files as well as
> screenshots of the workbench.  Please let me know how to proceed.
>
> thanks,
> Gilberto

Re: Using the Workbench with Lucene

GilbertoFragoso
Hi Dawid,

I tried Lucene 3.1.0 and it didn't work; the error message is the same as before.

I noticed that Carrot2's LuceneDocumentSource uses Version.LUCENE_30 for the StandardAnalyzer, so I set my indexer test program to use the same version (the search test program I'm using requires the same version).  In the Workbench, I tested the SimpleFieldMapper section of the Search tab with these values: context fragments = 1 through 3, content field = contents, title field = path, url field = path.
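
Note that the Version constant passed to StandardAnalyzer selects analyzer behavior; the on-disk index format is determined by the Lucene jar that wrote the index. A minimal sketch for comparing an index that opens with one that does not, assuming that a Lucene 3.x segments_N file begins with a four-byte big-endian format marker (that layout is an assumption here, and the class and method names are hypothetical, not part of Lucene or Carrot2):

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class SegmentsFormatPeek {
    /** Reads the first four bytes of a file as a big-endian signed int. */
    public static int readLeadingInt(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            // DataInputStream.readInt() is big-endian by definition.
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentsFormatPeek <path-to-segments_N>");
            return;
        }
        // Assumption: this value is the index format marker. The sketch does
        // not interpret it; it only makes two indexes comparable.
        System.out.println("leading int: " + readLeadingInt(args[0]));
    }
}
```

Running this against the segments_N file of each test index and comparing the printed values would show whether the indexes were really written in the same format, regardless of the Version constant used in the indexer.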

Thanks,
Gilberto

-----Original Message-----
From: Dawid Weiss [mailto:[hidden email]]
Sent: Thursday, October 06, 2011 3:52 AM
To: Carrot2-developers
Subject: Re: [C2-devel] Using the Workbench with Lucene

Hi Gilberto,

Workbench 3.5.2 comes with built-in Lucene 3.1.0 - I wonder if this
can affect what you're observing. Can you try with this exact version?

We should be upgrading to the newest Lucene too, I filed an issue for this here:
http://issues.carrot2.org/browse/CARROT-868

Anyway, try with Lucene 3.1.0, please. If it doesn't work, let me know. Thanks,

Dawid



Re: Using the Workbench with Lucene

Dawid Weiss-2
Ok, thanks. I'll see what's going on.

Dawid

On Thu, Oct 6, 2011 at 7:22 PM, Fragoso, Gilberto (NIH/NCI) [E]
<[hidden email]> wrote:

> Hi Dawid,
>
> I tried Lucene 3.1.0 and it didn't work, the error message is the same one as before.
>
> I noticed that Carrot2's LuceneDocumentSource was using Version.LUCENE_30 for the StandardAnalyzer, so I set my indexer test program to the same (the search test program I'm using requires the same version).  In the Workbench, I tested the SimpleFieldMapper section in the Search tab with values context fragments = 1 through 3, content field = contents, title field = path, url field = path.
>
> Thanks,
> Gilberto

Re: Using the Workbench with Lucene

Dawid Weiss-2
Gilberto,

The problem is with the version used. I've just tried indexing using
Lucene 3.0.3 and Workbench works fine. It does have problems
reading newer indexes though. What is it that you're trying to
achieve? I see a few options:

1) use Lucene 3.0.3 to index your files and then Workbench to open the index,
2) use Lucene in whatever version you like and feed documents to
either DCS or programmatically to the Carrot2 API,
3) index your documents using SOLR and use Workbench's Solr source
(should work regardless of the version used).
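
Option 2 could be sketched roughly as follows, assuming the DCS accepts documents in the Carrot2 XML input format (a searchresult root with a query element and document elements carrying title, url and snippet). The class and method names are hypothetical and only the Java standard library is used; the field values would come from whatever stored fields your own Lucene search code returns.

```java
import java.util.ArrayList;
import java.util.List;

class CarrotXmlSketch {
    /** Escapes the characters that are special in XML text content. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds an XML string from documents given as {title, url, snippet}
     * triples. The element names follow the assumed Carrot2 XML format.
     */
    public static String toXml(String query, List<String[]> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<searchresult>\n");
        sb.append("  <query>").append(escape(query)).append("</query>\n");
        int id = 0;
        for (String[] d : docs) {
            sb.append("  <document id=\"").append(id++).append("\">\n");
            sb.append("    <title>").append(escape(d[0])).append("</title>\n");
            sb.append("    <url>").append(escape(d[1])).append("</url>\n");
            sb.append("    <snippet>").append(escape(d[2])).append("</snippet>\n");
            sb.append("  </document>\n");
        }
        sb.append("</searchresult>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for "path" and "contents"
        // fields read from an index with any Lucene version.
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"doc1.txt", "file:///docs/doc1.txt", "first sample text"});
        docs.add(new String[] {"doc2.txt", "file:///docs/doc2.txt", "second sample text"});
        System.out.print(toXml("sample query", docs));
    }
}
```

Because the documents are read with your own Lucene code, the index format no longer has to match the Lucene version bundled with the Workbench.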

I will update the Lucene version used in our master branch tomorrow,
so if you can wait, fetch the updated binaries tomorrow. If you watch
the Jira issue I mentioned, you'll know once it's fixed.

Dawid

On Thu, Oct 6, 2011 at 8:25 PM, Dawid Weiss
<[hidden email]> wrote:

> Ok, thanks. I'll see what's going on.
>
> Dawid

Re: Using the Workbench with Lucene

GilbertoFragoso
Dawid,

No luck with Lucene 3.0.3.  I also modified my test indexer so that the fields are "snippet", "url", and "title", with "path" included as well.  The resulting index is searchable, and the test search program returns the correct values for the various fields (I excluded "snippet" from the report).  But setting the Workbench to use these fields gave me the same error as before.  I could try with 3.4 when you update the binaries.  It would take me a few days to set up SOLR, but I might have to go there.  Would you mind if I sent you the test indexer source (maybe you'd find what I'm doing wrong in there)?

Thanks,
Gilberto

-----Original Message-----
From: Dawid Weiss [mailto:[hidden email]]
Sent: Thursday, October 06, 2011 2:53 PM
To: Carrot2-developers
Subject: Re: [C2-devel] Using the Workbench with Lucene

Gilberto,

The problem is with the version used. I've just tried indexing using
Lucene 3.0.3 and Workbench works fine. It does have problems
reading newer indexes though. What is it that you're trying to
achieve? I see a few options:

1) use Lucene 3.0.3 to index your files and then Workbench to open the index,
2) use Lucene in whatever version you like and feed documents to
either DCS or programmatically to Carrot2 API.
3) index your documents using SOLR and use Workbench's Solr source
(should work regardless of the version used),

I will update the Lucene version used in our master branch tomorrow,
so if you can wait, fetch the updated binaries tomorrow. If you watch
the Jira issue I mentioned, you'll know once it's fixed.

Dawid


Re: Using the Workbench with Lucene

Dawid Weiss-2
Yes -- please send me the source code of the indexer. And a ZIP of a
small index that causes this, if you can. Thanks!

Dawid

On Thu, Oct 6, 2011 at 10:06 PM, Fragoso, Gilberto (NIH/NCI) [E]
<[hidden email]> wrote:

> Dawid,
>
> No luck with lucene 3.0.3.  Plus modified my test indexer so the fields would be "snippet", "url", "title", and including "path" as well.  The resulting index is searchable and the test search program returns the correct values for the various fields (excluding "snippet" from report).  But setting the workbench to use these fields gave me the same error as before.  I could try with 3.4 when you update the binaries.  It would take me a few days to set up SOLR but might have to go there.  Would you mind me sending you the test indexer source (maybe you'd find what I'm doing wrong in there)?
>
> Thanks,
> Gilberto
>>>> All the data continuously generated in your IT infrastructure contains a
>>>> definitive record of customers, application performance, security
>>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>>> sense of it. Business sense. IT sense. Common sense.
>>>> http://p.sf.net/sfu/splunk-d2dcopy1
>>>> _______________________________________________
>>>> Carrot2-developers mailing list
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/carrot2-developers
>>>>
>>>>
>>>

Re: Using the Workbench with Lucene

GilbertoFragoso
Hi Dawid,

Here's the latest copy of the test indexer; I also got rid of the "path" field.  I've stripped out the non-3.0.3 code to make it more readable (it began life as the 3.4.0 demo indexer).  A .zip of the directory holding the index files is also attached (renamed with a .z extension to get past our filters).

Thanks,
Gilberto

-----Original Message-----
From: Dawid Weiss [mailto:[hidden email]]
Sent: Thursday, October 06, 2011 4:34 PM
To: Carrot2-developers
Subject: Re: [C2-devel] Using the Workbench with Lucene

Yes -- please send me the source code of the indexer. And a ZIP of a
small index that causes this, if you can. Thanks!

Dawid

Attachments:
IndexFiles-cleanedUpCommentedCode.java (4K)
LocalLucene.z (100K)

Re: Using the Workbench with Lucene

Dawid Weiss-2
I can't reproduce the error you're getting, which is awkward. The index
you sent me is not correct, but the Workbench doesn't complain about it
- it just returns zero documents, that's all.

The problem with the IndexFiles class is that fields that are to be
clustered must be both analyzed and stored. So, in your code, you
should change how you add fields so that the Field constructor receives:

org.apache.lucene.document.Field.Store.YES,
org.apache.lucene.document.Field.Index.ANALYZED));

Also, the Field constructor that accepts a stream doesn't store the
content by default. You must load the file content into a string first
and then add it to the index (stored, analyzed).
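
The "load the file content into a string first" step needs nothing beyond
the JDK. Below is a minimal sketch, not the code from the attached indexer:
the class name FileText and the method readFully are hypothetical, UTF-8 is
an assumed encoding, and java.nio.file.Files requires Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical helper: reads a whole text file into a String. */
final class FileText {
    private FileText() {
        // Static helper only; not meant to be instantiated.
    }

    /**
     * Returns the full content of the given file decoded as UTF-8.
     * UTF-8 is an assumption; use whatever encoding the files are really in.
     */
    static String readFully(File file) throws IOException {
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The returned string could then be passed to the String-valued Field
constructor together with Field.Store.YES and Field.Index.ANALYZED, so
that the text is both stored and analyzed.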

Frankly, I think Solr or ElasticSearch may be an easier way for you to
create an index, since they handle many things automatically -- MIME
type detection and text content extraction, among other things. But if
you stick with that Lucene example, things should also work fine. I
really have no idea why you're getting that exception in the first
place -- I can't reproduce it in my working environment (and I work on
Windows, Mac and Ubuntu Linux boxes).

Dawid

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2dcopy2
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers