stoplabels and stopwords: still wrong labels appear

stoplabels and stopwords: still wrong labels appear

hotfefone
Hi!

I have this XML file:
saet.xml

I added the word "device" to the stopword list.
I also added the same word to the stoplabel list in several forms:
(?i)(device|method...)
(?i)device.*
(?i)device .*
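
A quick way to see what the two complete patterns above can and cannot catch is to test them in isolation. The sketch below assumes that a stop-label pattern must match the whole label; that is an assumption about Carrot2's behaviour, not something stated in this thread. It also uses Python's `re` module as a stand-in for the Java regular expressions Carrot2 reads, and the sample labels are copied from the cluster lists reported later in the thread.

```python
import re

# The two complete stop-label patterns quoted in the post.
STOP_LABEL_PATTERNS = [r"(?i)device.*", r"(?i)device .*"]

# Labels copied from the cluster lists reported in this thread.
LABELS = [
    "DEVICE for OBTAINING..",
    "INDUCTION HEATING DEVICE",
    "METHOD AND DEVICE",
    "Steel",
]


def is_stop_label(label, patterns=STOP_LABEL_PATTERNS):
    """Return True if any pattern matches the whole label.

    Whole-label matching is an assumption made for this sketch.
    """
    return any(re.fullmatch(p, label) for p in patterns)


for label in LABELS:
    verdict = "filtered" if is_stop_label(label) else "kept"
    print(f"{label!r}: {verdict}")
```

Under that assumption, only labels that begin with "device" are filtered; a label that merely contains or ends with the word would need a pattern such as `(?i).*device.*`.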

Under "Label filtering", the options
"Remove leading and trailing stop words"
"Remove stop labels"
are both checked.

One of the resulting clusters is still "DEVICE for...".

Where did I go wrong?

Thank you for your help!

UPDATE:
I get this result when I set
Title word boost = 0


Re: stoplabels and stopwords: still wrong labels appear

Stanislaw Osinski
Administrator
hotfefone wrote:
Power of posting..?
I closed and relaunched the carrot2-workbench.exe file and now it seems to work

You don't need to restart the Workbench; just check the "Reload lexical resources" checkbox.


S.

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers
RE: stoplabels and stopwords: still wrong labels appear

hotfefone
Dear Stanislaw, thank you for your help. I already had "Reload lexical.." checked.

I might be wrong, but the results seem to differ depending on whether I set the attributes one by one or directly open a previously saved attributes XML file.

Thanks for any help.


Date: Fri, 15 Jul 2011 07:48:34 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: stoplabels and stopwords: still wrong labels appear

[...]
Re: stoplabels and stopwords: still wrong labels appear

Stanislaw Osinski
Administrator
Hello,

Would you be able to write down a procedure to reproduce this bug? E.g. 1) run Workbench, 2) run clustering using X document source, 3) change attribute Z to z and attribute Y to y, .... n) clusters are: ... but should be ... .

Thanks!

Staszek

On Sun, Jul 17, 2011 at 16:34, hotfefone <[hidden email]> wrote:
[...]
Re: stoplabels and stopwords: still wrong labels appear

hotfefone
Here is what I did:
1) run Workbench
2) open xml file saet.xml
3) "Process" (NB: preset attributes are different from default)
Results (by the way, how can I export the list of clusters?):
Heating (4)
Specifically Thrust blocks... (3)
Steel (3)
INDUCTION HEATING (2)
MULTICRYSTALLINE SEMICONDUCTOR (2)
TEMPERATURES (2)
Welding (2)

4) Save current attributes to algorithm-lingo-attributes_110719.xml
5) Open the same attributes file (automatic re-processing)
Results:
INDUCTION (5)
METHOD AND DEVICE (5)
Localized INDUCTION ... (3)
DEVICE for OBTAINING.. (2)
INDUCTION HEATING DEVICE (2)
Tool (2)
Other Topics (2)

The second list contains clusters whose labels include stopwords and stoplabels (e.g. method, device, tools...).

I hope this helps.
Thanks!
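
The difference between the two runs above can be checked mechanically. The following is a minimal sketch, not part of Carrot2: it takes the labels exactly as reported in steps 3 and 5 and flags those containing a stop term as a whole word. The set `STOP_TERMS` is hypothetical, since the actual stopword and stoplabel files are not shown in this thread; it uses the words mentioned in the post, with "tool" in the singular as an assumption.

```python
import re

# Cluster labels as reported after step 3 ("Process"); document counts omitted.
RUN_AFTER_PROCESS = [
    "Heating",
    "Specifically Thrust blocks...",
    "Steel",
    "INDUCTION HEATING",
    "MULTICRYSTALLINE SEMICONDUCTOR",
    "TEMPERATURES",
    "Welding",
]

# Cluster labels as reported after step 5 (re-opening the saved attributes).
RUN_AFTER_REOPEN = [
    "INDUCTION",
    "METHOD AND DEVICE",
    "Localized INDUCTION ...",
    "DEVICE for OBTAINING..",
    "INDUCTION HEATING DEVICE",
    "Tool",
    "Other Topics",
]

# Hypothetical stop terms, chosen for illustration only.
STOP_TERMS = {"method", "device", "tool"}


def offending_labels(labels, stop_terms=STOP_TERMS):
    """Return the labels that contain at least one stop term as a whole word."""
    flagged = []
    for label in labels:
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", label)}
        if words & stop_terms:
            flagged.append(label)
    return flagged


print("after Process:", offending_labels(RUN_AFTER_PROCESS))
print("after re-open:", offending_labels(RUN_AFTER_REOPEN))
```

With these inputs the first run yields no offending labels, while the second yields four, which matches the observation that filtering is not applied after re-opening the saved attributes file.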
Re: stoplabels and stopwords: still wrong labels appear

Stanislaw Osinski
Administrator
Hi,

Thanks for the reproduction procedure; it is indeed a bug: http://issues.carrot2.org/browse/CARROT-827
Add yourself as a watcher on that issue to get notified when it's fixed.

Cheers,

Staszek


On Tue, Jul 19, 2011 at 14:26, hotfefone <[hidden email]> wrote:
[...]
2) open xml file
http://carrot2-users-and-developers-forum.607571.n2.nabble.com/file/n6598602/saet.xml
[...]
4) Save current attributes to
http://carrot2-users-and-developers-forum.607571.n2.nabble.com/file/n6598602/algorithm-lingo-attributes_110719.xml
[...]

--
View this message in context: http://carrot2-users-and-developers-forum.607571.n2.nabble.com/stoplabels-and-stopwords-still-wrong-labels-appear-tp6587084p6598602.html
Sent from the Carrot2 Users and Developers Forum mailing list archive at Nabble.com.
