Tuning Lingo parameters

Tuning Lingo parameters

milos
Hello,

1) I am using the 3.0 webapp and I would like to play with some Lingo
parameters described in Section 7.12 of your documentation. Where do I
have to change these parameters? In some config file in the webapp tree, or
in the source code of some class, followed by a recompile?

2) If I would like to add support for a language that is not supported
(Serbian in my case), what am I supposed to do? Probably put a Serbian
stoplist and stemmer somewhere (where?), and how do I activate that new
language (this is probably related to question 1)?

3) Is it possible to use only a stoplist and no stemming (I don't have a
Serbian stemmer)?

4) Is it possible to have two Lucene sources (pointing to two different
indexes) and to set different language parameters (and possibly different
Lingo parameters) for each source?

Thanks,
Milos


------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Tuning Lingo parameters

Stanislaw Osinski
Administrator
Hi Milos,
 
> 1) I am using the 3.0 webapp and I would like to play with some Lingo
> parameters described in Section 7.12 of your documentation. Where do I
> have to change these parameters? In some config file in the webapp tree, or
> in the source code of some class, followed by a recompile?

To be honest, the quickest way to experiment with different attribute values would be to use the Document Clustering Workbench (http://project.carrot2.org/download-workbench-linux-64bit.html) -- you can change the values in a GUI and observe the results in real time.

When you've come up with the attribute values you like, you'd need to transfer them to an XML config file in the webapp. For the next release we're planning an export wizard in Workbench for this, but for now you'd need to do that manually. You'll find some hints here: http://download.carrot2.org/head/manual/#section.advanced-topics.customizing-applications.adding-source-to-webapp. During the weekend, I'll add a section to the manual devoted specifically to passing Lingo parameters to the webapp.
 
> 2) If I would like to add support for a language that is not supported
> (Serbian in my case), what am I supposed to do? Probably put a Serbian
> stoplist and stemmer somewhere (where?), and how do I activate that new
> language (this is probably related to question 1)?

Currently, you can't add new languages without recompiling the code, I'm afraid... All you need to do is add a constant for Serbian in org.carrot2.text.linguistic.LanguageCode, and then add stopwords.rb (assuming "rb" is the ISO suffix) and stoplabels.rb, both UTF-8 encoded, to the src-resources dir.

Incidentally, if you could share the stopwords and stoplabels you create, that would be great, we could include them in the next release.
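
As a rough illustration of the recipe above, here is a minimal, self-contained Java sketch. It is not the actual org.carrot2.text.linguistic.LanguageCode source: the enum name, its constructor and the two resource-name helpers are hypothetical and only show the idea of pairing a new constant with stopwords.<suffix> and stoplabels.<suffix> files. The reply hedges on the suffix; the sketch uses "sr", which is the ISO 639-1 code for Serbian.

```java
/**
 * Hypothetical stand-in for a language enum. This is NOT the real
 * Carrot2 LanguageCode class; names and methods are assumptions.
 */
enum LanguageCodeSketch {
    ENGLISH("en"),
    GERMAN("de"),
    // The newly added constant; "sr" is the ISO 639-1 code for Serbian.
    SERBIAN("sr");

    private final String isoCode;

    LanguageCodeSketch(String isoCode) {
        this.isoCode = isoCode;
    }

    String isoCode() {
        return isoCode;
    }

    /** Name of the stopword resource for this language, e.g. "stopwords.sr". */
    String stopwordsResource() {
        return "stopwords." + isoCode;
    }

    /** Name of the stop-label resource for this language, e.g. "stoplabels.sr". */
    String stoplabelsResource() {
        return "stoplabels." + isoCode;
    }
}
```

With a layout like this, adding a language comes down to one new enum constant plus two UTF-8 text files named after its suffix, followed by a rebuild.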
 
> 3) Is it possible to use only a stoplist and no stemming (I don't have a
> Serbian stemmer)?

Yes, if a stemmer is not found, an identity stemmer (essentially no stemming) will be used.
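
The fallback described above can be pictured with a small Java sketch. The class and method names are hypothetical and do not come from the Carrot2 code base; the sketch only assumes that stemmers are looked up by language code and that a missing entry yields a function returning each word unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical stemmer registry; not part of the Carrot2 API. */
final class StemmerLookupSketch {
    private final Map<String, UnaryOperator<String>> stemmers = new HashMap<>();

    /** Registers a stemmer for the given language code. */
    void register(String isoCode, UnaryOperator<String> stemmer) {
        stemmers.put(isoCode, stemmer);
    }

    /**
     * Returns the registered stemmer, or an identity function
     * (word in, same word out) when none is registered.
     */
    UnaryOperator<String> forLanguage(String isoCode) {
        return stemmers.getOrDefault(isoCode, UnaryOperator.identity());
    }
}
```

A language with only a stoplist would then still be processed; its words would simply not be conflated into stems.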
 
> 4) Is it possible to have two Lucene sources (pointing to two different
> indexes) and to set different language parameters (and possibly different
> Lingo parameters) for each source?

Ah, it's not possible out of the box at the moment. If you'd like to achieve this behaviour for the webapp, you'd need to slightly modify org.carrot2.webapp.QueryProcessorServlet to add the appropriate language to the attributes map, based on the requested document source id.
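
The kind of change suggested above might look roughly like the following Java sketch. It does not reproduce org.carrot2.webapp.QueryProcessorServlet: the class name, the "language" attribute key and the source ids in the usage note are all assumptions made for illustration, and the real key depends on the Carrot2 version in use.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that picks a language per document source id. */
final class PerSourceLanguageSketch {
    // Assumed attribute key; check the real key in your Carrot2 version.
    static final String LANGUAGE_KEY = "language";

    private final Map<String, String> languageBySource = new HashMap<>();
    private final String defaultLanguage;

    PerSourceLanguageSketch(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** Associates a document source id with a language code. */
    void map(String sourceId, String language) {
        languageBySource.put(sourceId, language);
    }

    /**
     * Returns a copy of the request attributes with the language for the
     * requested source added; unknown sources get the default language.
     */
    Map<String, Object> withLanguage(String sourceId, Map<String, Object> attributes) {
        Map<String, Object> result = new HashMap<>(attributes);
        result.put(LANGUAGE_KEY, languageBySource.getOrDefault(sourceId, defaultLanguage));
        return result;
    }
}
```

For example, mapping a hypothetical "lucene-sr" source to "sr" and leaving "lucene-en" on the default would give each index its own language setting; other per-source attributes could be added to the map in the same way.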

Cheers,

S.


Re: Tuning Lingo parameters

milos
Hello,

> During the weekend, I'll add a section to the manual devoted specifically
> to passing Lingo parameters to the webapp.

I haven't seen any new instructions...

I have a question about Workbench. In both the 32-bit and 64-bit versions for
Linux I cannot set the Lucene title, URL and content fields from the app,
since there are no fields to do that. I can only set the index directory and
the type of Analyzer used. Why is that so? What am I doing wrong?

Sincerely, Milos





Re: Tuning Lingo parameters

Dawid Weiss-2

> I have a question about Workbench. In both 32bit and 64bit versions for
> Linux I cannot set Lucene title, url and content fields from the app since
> there are no fields to do that. I can only set the index directory and the
> type of Analyzer used. Why is that so? What am I doing wrong?

These attributes are optional; you have to toggle their visibility in the GUI.
See attached picture.

D.


Attachment: file2008-08-02-09.08.102008-11-18-11.55.152008-11-18-12.36.312009-03-04-12.30.18.png (15K)

Re: Tuning Lingo parameters

Stanislaw Osinski
Administrator
In reply to this post by milos
>> During the weekend, I'll add a section to the manual devoted specifically
>> to passing Lingo parameters to the webapp.
>
> I haven't seen any new instructions...

Ooops, it must have slipped my mind. It's already there:

http://download.carrot2.org/head/manual/#section.advanced-topics.customizing-applications.customizing-lingo-for-webapp

Cheers,

S.
