Stop words

Stop words

carrotuser
Whenever I cluster, I end up with some cluster labels that I don't want to see. I put those words in stoplabels.en, but they still come up. What am I doing wrong?
For example, I put these patterns in stoplabels.en:
(?i)Firefox (4.0|4.0.1|3.6).*
(?i)FF

Why do these labels still show up when I cluster in the Carrot2 Workbench?
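
As a side illustration of what the two patterns above would and would not cover, here is a minimal sketch. It assumes that each line of stoplabels.en is a regular expression that has to match a candidate label as a whole; that matching rule is an assumption here, not something stated in this thread. Python's re module stands in for the Java regex engine, and is_stop_label is a hypothetical helper, not part of Carrot2.

```python
import re

# Patterns copied from the post above, one regular expression per line
# as they would appear in stoplabels.en.
STOP_LABEL_PATTERNS = [
    r"(?i)Firefox (4.0|4.0.1|3.6).*",
    r"(?i)FF",
]


def is_stop_label(label, patterns=STOP_LABEL_PATTERNS):
    """Return True if the WHOLE label matches any pattern (assumed semantics)."""
    return any(re.fullmatch(pattern, label) for pattern in patterns)


# Hypothetical candidate labels, for illustration only.
for label in ["Firefox 4.0.1", "firefox 3.6 crash", "Firefox", "FF", "FF crash"]:
    print(label, "->", is_stop_label(label))
```

Under that assumption, a bare "Firefox" label would not be filtered, because the first pattern requires a space and a version number, and "FF crash" would not be filtered either, because the second pattern covers only the two letters. The unescaped dots also match any character, so "4.0" would match "4x0" as well.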

Re: Stop words

Dawid Weiss-2
1) Which algorithm are you using?
2) Can you provide an input XML that results in junk cluster labels?

Dawid

On Tue, Jun 21, 2011 at 11:28 PM, brinda <[hidden email]> wrote:

> Whenever I cluster, I end up getting some clusters which I don't want to see.
> I put those words in stoplabels.en but it still comes up? What am I doing
> wrong?
> For ex, I put these words in stoplabels.en
> (?i)Firefox (4.0|4.0.1|3.6).*
> (?1)FF
>
> Why am I still getting it in the Carrot2 workbench on clustering?

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Stop words

carrotuser
I am using the Lingo algorithm for clustering. Here is the XML file.
I am also attaching my stoplabels.en.
I don't want labels like Firefox or FF to come up, but they still show up.


On Tue, Jun 21, 2011 at 10:59 PM, Dawid Weiss [via Carrot2 Users and Developers Forum] <[hidden email]> wrote:

> 1) Which algorithm are you using?
> 2) Can you provide an input XML that results in junk cluster labels?
>
> Dawid

Attachments: qqvasked.xml (113K), stoplabels.en (1K)

Re: Stop words

Dawid Weiss-2

I did this:

1) launched the Workbench from the command line (trunk version, but this shouldn't matter);
2) picked XML as the search source and loaded your XML;
3) cleared the query field so as to avoid giving a hint to the clustering engine;
4) clustered the input; I got the top cluster label reading: "Firefox 4.0.1";
5) selected "reload lexical resources" on the attributes view, updated workspace/stoplabels.en and re-clustered; interestingly, the label remained the same.

I then re-ran the same, but with verbose logging (from the console), and it worked just fine. This was a hint -- the problem is that the launcher does not resolve the working dir properly and resolves to something else for some reason.

What platform (operating system) are you working on? The temporary workaround is to launch the Workbench from the console -- this should make the current working dir point at the right location. An alternative workaround is to locate where the "workspace" folder was created (on my machine this was ~/Documents) and copy your resources there. This is uglier, however.

I filed an error report here:

http://issues.carrot2.org/browse/CARROT-822

Let me know if this helped, and track the above issue to find out when/if we find a way to fix it.
Dawid
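
The working-directory problem described in this message can be illustrated with a small sketch. It only shows how the same relative path ends up in different places depending on the directory it is resolved against; the two directories below and the resolve_against helper are hypothetical, chosen for illustration, and this is not the Workbench's actual lookup code.

```python
from pathlib import PurePosixPath


def resolve_against(working_dir, relative_path):
    """Join a relative resource path onto a given working directory."""
    return str(PurePosixPath(working_dir) / relative_path)


RESOURCE = "workspace/stoplabels.en"

# Hypothetical locations, for illustration only.
from_console = resolve_against("/opt/carrot2-workbench", RESOURCE)
from_launcher = resolve_against("/Users/me/Documents", RESOURCE)

print(from_console)   # the copy edited by the user
print(from_launcher)  # a different file, possibly missing or holding defaults
```

If the edited stoplabels.en lives under the first directory but the process resolves paths against the second, the edits are never read, which would be consistent with the label staying the same after a reload.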



Re: Stop words

carrotuser
I am using Mac OS X 10.6.7 and the Carrot2 Workbench. Will it work on a Windows machine?
Is it not possible to use the Workbench instead of the console? I like the Workbench interface.



Re: Stop words

Dawid Weiss-2
I'll take a look at it on the Mac and let you know.

Dawid

On Mon, Jun 27, 2011 at 6:26 PM, brinda <[hidden email]> wrote:

> I am using Mac 10.6.7 and I am using Carrot2 Workbench. Will it work with
> Windows machine?
> Is it not possible to use Workbench instead of console? I like the workbench
> interface
>
>
> On Mon, Jun 27, 2011 at 3:06 AM, Dawid Weiss [via Carrot2 Users and
> Developers Forum] <[hidden email]> wrote:
>>
>> I filed an error report here:
>> http://issues.carrot2.org/browse/CARROT-822

Re: Stop words

Dawid Weiss-2
In reply to this post by carrotuser
> Is it not possible to use Workbench instead of console? I like the workbench
> interface

I didn't suggest using the console -- I suggested _starting_ Workbench
from the console to correct the current working directory, that's it.
Of course on a Mac things do get a little bit more complex because
workbench is an app bundle. So... your problem is indeed related to
the bug I mentioned before -- current working directory is not
pointing where the resources are and the defaults are picked up
instead.

I don't have a solution for you right now (other than using Windows).
I'll provide a fix for this tomorrow by simply enforcing the workspace
folder in the installation directory. Once this is in trunk, you can
get a fresh build of the Workbench from the download area. Subscribe
to the JIRA issue above to be notified about the progress.

Dawid
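
The idea of enforcing the workspace folder in the installation directory can be sketched as follows. This is only an illustration of anchoring the lookup to a fixed directory instead of the current working directory; find_resource and the temporary directory standing in for the installation directory are hypothetical, and the actual fix in Carrot2 may look quite different.

```python
import tempfile
from pathlib import Path


def find_resource(install_dir, name):
    """Look for a resource only under <install_dir>/workspace.

    Returns the path if the file exists, otherwise None (a caller
    could then fall back to built-in defaults).
    """
    candidate = Path(install_dir) / "workspace" / name
    return candidate if candidate.is_file() else None


# Demo with a throwaway directory standing in for the installation directory.
with tempfile.TemporaryDirectory() as install_dir:
    workspace = Path(install_dir) / "workspace"
    workspace.mkdir()
    (workspace / "stoplabels.en").write_text("(?i)FF\n")

    found = find_resource(install_dir, "stoplabels.en")
    missing = find_resource(install_dir, "stoplabels.de")
    print(found is not None, missing is None)
```

Because the lookup no longer depends on where the process was started, launching from an app bundle or from the console would read the same file.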


Re: Stop words

Dawid Weiss-2
This issue should be resolved in the head build. The installation's
workspace directory will be used to pick up resources. You can
download the head build from here:

http://download.carrot2.org/head/

I'd wait an hour or so -- these ZIPs are rsynced from our build server
and it takes a while to propagate. These are the ones for MacOSX:

http://download.carrot2.org/head/carrot2-workbench-macosx.cocoa.x86-3.6.0-dev.zip
http://download.carrot2.org/head/carrot2-workbench-macosx.cocoa.x86_64-3.6.0-dev.zip

Let us know if this worked for you.
Dawid

Re: Stop words

carrotuser
I shall try it out and get back to you.
Thank you so much for being so helpful! Makes it easier :)

On Tue, Jun 28, 2011 at 12:56 AM, JIRA [hidden email] [via Carrot2 Users and Developers Forum] <[hidden email]> wrote:

> This issue should be resolved in the head build. The installation's
> workspace directory will be used to pick up resources. You can
> download the head build from here:
>
> http://download.carrot2.org/head/
