Carrot2 Labelling Manually


Carrot2 Labelling Manually

sachith
I am modifying the Carrot2 source code for a project. Can we manually supply the set of probable labels to LINGO and see how the documents are clustered around them, instead of letting the algorithm find the most probable phrases on its own via PhraseExtractor.java?

--
Regards

Sachith Sri Ram Kothur

Birla Institute of Technology & Science, Pilani 


_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Carrot2 Labelling Manually

Stanislaw Osinski
Administrator
> I am modifying the Carrot2 source code for a project. I need to know whether we can manually input the set of probable labels to LINGO and see how the documents are clustered around them? As opposed to the algorithm finding the most probable phrases on its own via the PhraseExtractor.java file.

This is doable, but not easy with the current architecture. You'd have to modify CompletePreprocessingPipeline (https://github.com/carrot2/carrot2/blob/master/core/carrot2-util-text/src/org/carrot2/text/preprocessing/pipeline/CompletePreprocessingPipeline.java) and replace PhraseExtractor with your own implementation that returns only the phrases you'd like to cluster around.

Stanislaw
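
To make the idea of "returning only the phrases you'd like to cluster around" concrete, here is a rough, self-contained sketch. It does not touch the Carrot2 API at all: FixedLabelAssigner and its methods are hypothetical names made up for illustration, and it groups documents by exact phrase match, which is much cruder than what Lingo does with its own label-to-document matching. Treat it as a picture of the input and output, not as a replacement for PhraseExtractor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT a Carrot2 class: assigns documents to a fixed,
// user-supplied list of label phrases by exact token-sequence match.
class FixedLabelAssigner {

    // Lower-cases the text and splits it on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the phrase occurs in the document as a contiguous token sequence.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // Maps each label to the indices of the documents that contain it.
    // A document may end up under several labels, or under none.
    static Map<String, List<Integer>> assign(List<String> labels, List<String> documents) {
        Map<String, List<Integer>> clusters = new LinkedHashMap<>();
        for (String label : labels) {
            List<String> phrase = tokenize(label);
            List<Integer> members = new ArrayList<>();
            for (int d = 0; d < documents.size(); d++) {
                if (containsPhrase(tokenize(documents.get(d)), phrase)) {
                    members.add(d);
                }
            }
            clusters.put(label, members);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("data mining", "clustering");
        List<String> docs = List.of(
            "Data mining for text",
            "Clustering search results",
            "Text clustering and data mining");
        // prints {data mining=[0, 2], clustering=[1, 2]}
        System.out.println(assign(labels, docs));
    }
}
```

In a real modification, the fixed phrase list would have to be fed into the preprocessing pipeline so that the rest of Lingo can score documents against it; the exact-match step above only stands in for that.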


Re: Carrot2 Labelling Manually

sachith
Yeah.
So to introduce my own selection of phrases, I need to find the word indices, term frequencies and tfbydocument of each word in the phrases. Do I need to write new functions to compute all of these, or can I use any built-in Carrot2 functions?

On Wed, Nov 19, 2014 at 2:21 PM, Stanislaw Osinski <[hidden email]> wrote:
> > I am modifying the Carrot2 source code for a project. I need to know whether we can manually input the set of probable labels to LINGO and see how the documents are clustered around them? As opposed to the algorithm finding the most probable phrases on its own via the PhraseExtractor.java file.
>
> This is doable, but not easy with the current architecture. You'd have to modify the https://github.com/carrot2/carrot2/blob/master/core/carrot2-util-text/src/org/carrot2/text/preprocessing/pipeline/CompletePreprocessingPipeline.java to replace PhraseExtractor with your own implementation that returns only the phrases you'd like to cluster around.
>
> Stanislaw

--
Regards

Sachith Sri Ram Kothur

Birla Institute of Technology & Science, Pilani 



Re: Carrot2 Labelling Manually

Dawid Weiss
> Do I need to write a new functions to compute all these? or can I use any inbuilt carrot functions ?

You can use or modify Lingo's preprocessing context, which seems to have
what you need. Your questions are very generic; look at the source
code and try to analyze it first. Specific questions yield more
specific answers.

Dawid
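
For readers unsure what the three quantities asked about (word indices, term frequencies, per-document term frequencies) would look like for a hand-picked phrase list, here is a small standalone sketch. It is not Carrot2 code: PhraseStats, compute and countOccurrences are hypothetical names, the field names only echo the terms used in this thread, and the dense int[][] layout is an assumption chosen for readability. The arrays in Lingo's preprocessing context may well be laid out differently, so check the source before relying on any of this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Carrot2 code: per-phrase statistics for a fixed phrase list.
class PhraseStats {
    final int[][] wordIndices;  // wordIndices[p][w]: vocabulary index of word w of phrase p, -1 if unseen
    final int[] tf;             // tf[p]: occurrences of phrase p across all documents
    final int[][] tfByDocument; // tfByDocument[p][d]: occurrences of phrase p in document d (dense layout)

    PhraseStats(int[][] wordIndices, int[] tf, int[][] tfByDocument) {
        this.wordIndices = wordIndices;
        this.tf = tf;
        this.tfByDocument = tfByDocument;
    }

    // Counts (possibly overlapping) contiguous occurrences of the phrase in the document.
    static int countOccurrences(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + phrase.size() <= doc.size(); i++) {
            if (doc.subList(i, i + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    static PhraseStats compute(List<List<String>> docs, List<List<String>> phrases) {
        // Vocabulary indices are assigned in order of first appearance in the documents.
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String word : doc) {
                vocabulary.putIfAbsent(word, vocabulary.size());
            }
        }
        int[][] wordIndices = new int[phrases.size()][];
        int[] tf = new int[phrases.size()];
        int[][] tfByDocument = new int[phrases.size()][docs.size()];
        for (int p = 0; p < phrases.size(); p++) {
            List<String> phrase = phrases.get(p);
            wordIndices[p] = new int[phrase.size()];
            for (int w = 0; w < phrase.size(); w++) {
                wordIndices[p][w] = vocabulary.getOrDefault(phrase.get(w), -1);
            }
            for (int d = 0; d < docs.size(); d++) {
                tfByDocument[p][d] = countOccurrences(docs.get(d), phrase);
                tf[p] += tfByDocument[p][d];
            }
        }
        return new PhraseStats(wordIndices, tf, tfByDocument);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("data", "mining", "and", "data", "mining", "tools"),
            List.of("text", "mining"));
        List<List<String>> phrases = List.of(
            List.of("data", "mining"),
            List.of("text", "mining"));
        PhraseStats s = compute(docs, phrases);
        System.out.println(Arrays.deepToString(s.wordIndices));  // prints [[0, 1], [4, 1]]
        System.out.println(Arrays.toString(s.tf));               // prints [2, 1]
        System.out.println(Arrays.deepToString(s.tfByDocument)); // prints [[2, 0], [0, 1]]
    }
}
```

The sketch works on already tokenized documents; stemming, case folding and stop-word handling are left out on purpose, and whatever the preprocessing context applies to words would need to be applied to the manual phrases too so that the indices line up.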
