Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

6 messages

Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

OldSkoolMark
I'm using Solr 3.3.0, which has Carrot2 3.5.0 baked in. I have tried both leaving the default lexical resource directory (carrot.lexicalResourceDir) unchanged in conf/solrconfig.xml and using an absolute path to the same location, with the same result: neither of conf/clustering/carrot2/{stopwords.en,stoplabels.en} is being honored.

Is this an issue with the version(s) I'm using, or am I missing something?

Thanks,

Mark


------------------------------------------------------------------------------
The demand for IT networking professionals continues to grow, and the
demand for specialized networking skills is growing even more rapidly.
Take a complimentary Learning@Cisco Self-Assessment and learn
about Cisco certifications, training, and career opportunities.
http://p.sf.net/sfu/cisco-dev2dev
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

Dawid Weiss-2
Hi Mark,

I did the following:

1) downloaded Solr 3.3.0
2) cd example
3) started Solr and indexed the demo content as in the README.TXT

Now, you can peek at where clustering resources are read from if you
enable logging. Java's default handler is capped at INFO, so I simply
copied the default logging properties from the JRE, placed them in the
example/ folder (attached), and then executed:

java -Djava.util.logging.config.file=logging.properties -Dsolr.clustering.enabled=true -jar start.jar

4) The default Carrot2 resource location is
example/solr/conf/clustering/carrot2/; I copied stopwords.en there.
5) When I execute http://localhost:8983/solr/clustering?q=*:*&rows=10,
I can see in the log file that:

FINE: Looking for stopwords.en in solr/./conf/clustering/carrot2
Oct 28, 2011 9:23:29 AM org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine$1 getAll
INFO: stopwords.en loaded from solr/./conf/clustering/carrot2
Oct 28, 2011 9:23:29 AM org.carrot2.util.resource.ResourceLookup getFirst
FINE: getFirst():
        stopwords.en
        - 1 hit from: org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine$1@7eaa2ef2
                - org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine$1$1@44a9a32c
        - 0 hits [not scanned] from: org.carrot2.util.resource.ClassLoaderLocator [class loader: java.net.FactoryURLClassLoader@4b142196]
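The FINE-level lines in this excerpt only show up once the JDK logging threshold is lowered. The sketch below applies an assumed equivalent of the attached logging.properties from Java code via LogManager.readConfiguration; the property values are an assumption (the attachment is not reproduced in the thread), and the class name ClusteringDebugLogging is hypothetical. The same property lines could instead be saved to logging.properties and passed with -Djava.util.logging.config.file, as in the command above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Hypothetical helper: an assumed logging setup, not the actual attached file.
final class ClusteringDebugLogging {
    // Root stays at INFO; the console handler and the Carrot2 / Solr clustering
    // logger hierarchies are lowered to FINE so resource lookups get logged.
    static final String CONFIG = String.join("\n",
            "handlers = java.util.logging.ConsoleHandler",
            ".level = INFO",
            "java.util.logging.ConsoleHandler.level = FINE",
            "java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter",
            "org.carrot2.level = FINE",
            "org.apache.solr.handler.clustering.level = FINE");

    static void install() {
        try {
            // Properties files are read as ISO-8859-1.
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        install();
        Logger lookup = Logger.getLogger("org.carrot2.util.resource.ResourceLookup");
        System.out.println("FINE enabled for Carrot2: " + lookup.isLoggable(Level.FINE));
    }
}
```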

And everything works as expected: adding stopwords there and
restarting Solr (lexical resources are read only once, since loading them is a costly
process) results in a modified set of cluster labels.
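Because lexical resources are read once and then reused, edits made while Solr is running stay invisible until a restart. A minimal sketch of that load-once behaviour, using a hypothetical LoadOnceStopwords class (an illustration of the caching pattern, not Solr's or Carrot2's actual code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class: shows load-once caching semantics only.
final class LoadOnceStopwords {
    private final Path file;
    private Set<String> cached; // filled on first use and never refreshed

    LoadOnceStopwords(Path file) {
        this.file = file;
    }

    synchronized Set<String> get() {
        if (cached == null) {
            try {
                Set<String> words = new LinkedHashSet<>();
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String word = line.trim();
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
                cached = Collections.unmodifiableSet(words);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cached;
    }

    // Returns the sets seen [after first load, after editing the file, after a "restart"].
    static List<Set<String>> demo() {
        try {
            Path file = Files.createTempFile("stopwords", ".en");
            Files.write(file, Arrays.asList("foo"), StandardCharsets.UTF_8);
            LoadOnceStopwords running = new LoadOnceStopwords(file);
            Set<String> first = running.get();
            Files.write(file, Arrays.asList("foo", "bar"), StandardCharsets.UTF_8); // edit while running
            Set<String> stale = running.get();                     // still the cached copy
            Set<String> fresh = new LoadOnceStopwords(file).get(); // new instance re-reads the file
            Files.delete(file);
            return Arrays.asList(first, stale, fresh);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[foo], [foo], [foo, bar]]
    }
}
```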

Let me know if the above doesn't work for you,

Dawid

On Fri, Oct 28, 2011 at 3:41 AM, Mark Rosenberg <[hidden email]> wrote:


Attachment: logging.properties (3K)

Re: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

OldSkoolMark
Hi Dawid,

Thanks so much for your help. My Solr configuration is multicore, but with your suggestion I can see in the log that my stopwords.en and stoplabels.en files are being loaded. However, I don't see any change in the results, even when I put something as drastic as (?i).* into stoplabels.en. I have also placed each word of one of the cluster labels into stopwords.en. That also has no effect.
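Why a catch-all pattern makes a good sanity check: assuming, as this test implies, that each line of stoplabels.en is a regular expression that has to match the whole cluster label, (?i).* matches every label, so a clustering run that really used this file would be expected to drop every label. The hypothetical StopLabelFilter below sketches that matching rule with java.util.regex; the labels in main are made-up examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: assumes one regex per line, matched against the whole label.
final class StopLabelFilter {
    private final List<Pattern> patterns = new ArrayList<>();

    StopLabelFilter(List<String> stoplabelLines) {
        for (String line : stoplabelLines) {
            String regex = line.trim();
            if (!regex.isEmpty()) {
                patterns.add(Pattern.compile(regex));
            }
        }
    }

    boolean isStopLabel(String label) {
        for (Pattern p : patterns) {
            if (p.matcher(label).matches()) { // whole-label match, not a substring search
                return true;
            }
        }
        return false;
    }

    List<String> filter(List<String> labels) {
        List<String> kept = new ArrayList<>();
        for (String label : labels) {
            if (!isStopLabel(label)) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("Data Mining", "Open Source", "Search Engine");
        System.out.println(new StopLabelFilter(Arrays.asList("(?i).*")).filter(labels)); // prints []
        System.out.println(new StopLabelFilter(Arrays.asList("(?i)open source")).filter(labels)); // prints [Data Mining, Search Engine]
    }
}
```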

Any further words of wisdom?

Thanks,

Mark



-----Original Message-----
From: Dawid Weiss [mailto:[hidden email]]
Sent: Friday, October 28, 2011 12:25 AM
To: Carrot2-developers
Subject: Re: [C2-devel] Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir


Re: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

OldSkoolMark
Hi Dawid,

I dug a little deeper... It's a combination of user error and maybe http://issues.carrot2.org/browse/CARROT-827. First, my service URL was .../search, not .../clustering. However, fixing that error didn't change the symptoms.

When I run http://localhost:8983/solr/useractivity/clustering?q=*%3A*&version=2.2&start=0&rows=100&indent=on in Firefox, my stopwords and stoplabels are reflected in the results.
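In a multicore setup the core name is part of the request path, and the next path segment selects which request handler from that core's solrconfig.xml processes the query. The sketch below, with a hypothetical ClusteringUrl helper, builds URLs like the one above (leaving out the version, start and indent parameters) and shows that the .../search vs .../clustering mix-up comes down to a single path segment. URLEncoder.encode(String, Charset) needs Java 10 or newer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds <solrBase>/<core>/<handler>?q=...&rows=...
final class ClusteringUrl {
    static String build(String solrBase, String core, String handler, String query, int rows) {
        // URLEncoder leaves '*' as-is and encodes ':' as %3A.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return solrBase + "/" + core + "/" + handler + "?q=" + q + "&rows=" + rows;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        // The service URL first used in the thread (.../search):
        System.out.println(build(base, "useractivity", "search", "*:*", 100));
        // The clustering handler:
        System.out.println(build(base, "useractivity", "clustering", "*:*", 100));
        // prints http://localhost:8983/solr/useractivity/clustering?q=*%3A*&rows=100
    }
}
```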

Perhaps my 64-bit Ubuntu installation of the Carrot2 Workbench is bad. Are the SWT exceptions in the attached Workbench log to be expected?

Thanks again,

Mark

carrotWB.log

Re: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

Dawid Weiss-2
Hi Mark,

I guess there may be some misunderstanding: when you're using the
Workbench to cluster Solr search results, the results are _not_
clustered on the Solr side (only the search results are fetched
from Solr; they are then clustered in the Workbench). This also
applies to lexical resources: the Workbench has a separate copy of all
lexical resources, so if you modify them on the Solr side and fetch results
for clustering in the Workbench, you won't see any difference.

What is your goal? Perhaps we can point you at a simple workflow that
will work for you. What I would suggest for now is to:

1) set up Solr without any clustering, in a normal setup (multicore or not);
2) point the Workbench at that Solr instance, enter your query and cluster
your search results;
3) modify the _Workbench's_ lexical resources, then select the 'reload lexical
resources' algorithm parameter in the tuning parameters view; after
each change to the lexical resources, this will re-cluster your input with
the new set of stopwords/stoplabels;
4) once you get the desired results, copy the lexical resources over to
Solr and enable clustering; the results should be identical to what you
have in the Workbench (given the same Carrot2 version, of course).
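Step 4 is a plain file copy. A minimal sketch, assuming you pass in the Workbench's lexical resource directory (its location depends on the installation and is not given in this thread) and the Solr clustering resource directory (example/solr/conf/clustering/carrot2/ by default, as described earlier; presumably the matching directory of the core in a multicore setup). CopyLexicalResources is a hypothetical helper, and only stopwords.en and stoplabels.en are copied:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: copies tuned lexical resources from the Workbench to Solr.
final class CopyLexicalResources {
    // Copies each named file from fromDir into toDir, overwriting older copies,
    // and returns the paths written.
    static List<Path> copy(Path fromDir, Path toDir, List<String> names) {
        List<Path> written = new ArrayList<>();
        try {
            Files.createDirectories(toDir);
            for (String name : names) {
                Path target = toDir.resolve(name);
                Files.copy(fromDir.resolve(name), target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // args[0]: the Workbench's lexical resource directory (install-specific)
        // args[1]: Solr's clustering resource directory, e.g. example/solr/conf/clustering/carrot2
        List<Path> written = copy(Paths.get(args[0]), Paths.get(args[1]),
                Arrays.asList("stopwords.en", "stoplabels.en"));
        written.forEach(p -> System.out.println("copied " + p));
        // Lexical resources are read only once, so Solr needs a restart to pick these up.
        System.out.println("Restart Solr so the clustering engine re-reads the files.");
    }
}
```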

Dawid

On Fri, Oct 28, 2011 at 8:40 PM, OldSkoolMark <[hidden email]> wrote:


RE: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

OldSkoolMark

Hi Dawid,

My bad. Thanks for clarifying the workflow. Now I understand what I was seeing. Your proposed workflow works fine for me.

Best Regards,

Mark

From: JIRA [hidden email] [via Carrot2 Users and Developers Forum] [mailto:ml-node+[hidden email]]
Sent: Friday, October 28, 2011 11:57 AM
To: Mark Rosenberg
Subject: Re: Carrot2 3.5.0 and Solr 3.3.0 not honoring carrot.lexicalResourceDir

 
