Lingo algorithm and word frequency

Bogdan94202
Hi,

Do you have a more detailed description of what Lingo is doing?
I am aiming at one particular, rather simple thing: getting the most frequent words used across all documents. Would Lingo, given some specific attribute configuration, give me that?
For example, if I set "Phrase label boost" to 0, would the end result actually be something like "the top X words by usage in the documents I analyzed"?

Best regards,
Bogdan

Re: Lingo algorithm and word frequency

Dawid Weiss-2
Lingo is described in our research papers and in Staszek's thesis --
see the "publications" link on the Carrot2 Web site. What you're after
is not directly or indirectly available from the algorithm, however.
You can use the preprocessing context to get the word counts in
documents... but I don't think it'll be possible without touching the
code.

Dawid

On Thu, Dec 24, 2009 at 5:11 PM, Bogdan94202 <[hidden email]> wrote:

>
> Hi,
>
> Do you have a more detailed description of what Lingo is doing?
> I am aiming at one particular, rather simple, thing - to get the most
> frequent words used across all documents. Would Lingo, based on some
> specific attribute configuration, give me that?
> For example if I set "Phrase label boost" to 0, would the end result
> actually be something like "which are the top X words by usage in the
> documents I analyzed"?
>
> Best regards,
> Bogdan
> --
> View this message in context: http://n2.nabble.com/Lingo-algorithm-and-word-frequency-tp4213682p4213682.html
> Sent from the Carrot2 Users and Developers Forum mailing list archive at Nabble.com.

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Lingo algorithm and word frequency

Stanislaw Osinski
Administrator
Hi,

You can use the PreprocessingPipeline class for that:

http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/PreprocessingPipeline.html

The code is fairly straightforward:

final List<Document> documents = ...;
final String query = ...;
final PreprocessingPipeline preprocessingPipeline = new PreprocessingPipeline();
final PreprocessingContext context = preprocessingPipeline.preprocess(documents, query);

Then you can obtain quite a lot of interesting information from the members of the PreprocessingContext, e.g. the term frequencies (tf) stored in AllWords:

http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/PreprocessingContext.html
http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/PreprocessingContext.AllWords.html
http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/PreprocessingContext.AllWords.html#tf
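
To get from term frequencies to "the top X words", something like the following rough sketch might do. It assumes that AllWords exposes the words and their total counts as two parallel arrays, a char[][] of word images and an int[] tf; the "image" field name and both array types are assumptions, so please check them against the Javadoc above. The TopWords class and the topWords method are hypothetical helpers written in plain Java; they are not part of Carrot2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Picks the N most frequent words from two parallel arrays:
 * images[i] holds the characters of word i, tf[i] its total count.
 * Hypothetical helper, not a Carrot2 class.
 */
class TopWords {
    static List<String> topWords(char[][] images, int[] tf, int n) {
        // Sort word indices by descending frequency.
        // Arrays.sort on objects is stable, so ties keep their index order.
        Integer[] order = new Integer[tf.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> tf[i]).reversed());

        // Format the first n entries as "word=count".
        List<String> result = new ArrayList<>();
        for (int k = 0; k < Math.min(n, order.length); k++) {
            result.add(new String(images[order[k]]) + "=" + tf[order[k]]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data standing in for the arrays assumed to live in
        // context.allWords (image and tf); replace with the real ones.
        char[][] images = {
            "cluster".toCharArray(), "data".toCharArray(), "search".toCharArray()
        };
        int[] tf = { 4, 9, 6 };
        System.out.println(topWords(images, tf, 2)); // prints [data=9, search=6]
    }
}
```

With a real PreprocessingContext, the two arrays would presumably be passed in place of the made-up data in main. Sorting an index array rather than the words themselves keeps the two arrays aligned without copying them.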

You can see LingoClusteringAlgorithm for a full usage example:

http://fisheye3.atlassian.com/browse/carrot2/trunk/core/carrot2-algorithm-lingo/src/org/carrot2/clustering/lingo/LingoClusteringAlgorithm.java?r=trunk#l155

Cheers,

Staszek


On Thu, Dec 24, 2009 at 22:38, Dawid Weiss <[hidden email]> wrote:
> Lingo is described in our research papers and in Staszek's thesis --
> see the "publications" link on the Carrot2 Web site. What you're after
> is not directly or indirectly available from the algorithm, however.
> You can use the preprocessing context to get the word counts in
> documents... but I don't think it'll be possible without touching the
> code.
>
> Dawid
