cluster labels redundance

reinhard
Hi,

Another remark about the webapp and Lingo.
It has displayed cluster labels such as

Österreich Österreich
Starnightclub Party und Lounge und Lounge

These labels contain redundant repeated words.
I would prefer labels such as

Österreich
Starnightclub Party und Lounge

Where does this redundancy come from?
I guess it could simply be filtered out, couldn't it?

Best regards
reinhard
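
For illustration only, the kind of post-processing asked about above could look like the following minimal sketch. It is not part of Carrot2 or Lingo; the function name `collapse_repeats` is hypothetical, and it assumes that words are whitespace-separated and that an immediately repeated word sequence is always unwanted.

```python
def collapse_repeats(label: str) -> str:
    """Drop immediately repeated word sequences, e.g. 'a b b' -> 'a b'.

    Hypothetical helper for illustration; not a Carrot2 API.
    """
    words = label.split()
    changed = True
    while changed:
        changed = False
        # Try the longest possible repeated phrase first.
        for size in range(len(words) // 2, 0, -1):
            for start in range(len(words) - 2 * size + 1):
                first = [w.casefold() for w in words[start:start + size]]
                second = [w.casefold() for w in words[start + size:start + 2 * size]]
                if first == second:
                    # Remove the second copy of the repeated phrase.
                    del words[start + size:start + 2 * size]
                    changed = True
                    break
            if changed:
                break
    return " ".join(words)


print(collapse_repeats("Österreich Österreich"))
print(collapse_repeats("Starnightclub Party und Lounge und Lounge"))
```

Note that this would also collapse legitimate repetitions (for example a name like "New York New York"), and it only hides the symptom; it does not explain where the duplication comes from.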

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: cluster labels redundance

Dawid Weiss-2
If it were simple, we would probably have done it already. This looks more
like a bug. Please provide the input for which these labels are shown; it
will help us debug the cause of this redundancy. By "the input" I mean
either the XML or a URL to the online demo that shows this.

Dawid

On Mon, Mar 15, 2010 at 12:58 PM, reinhard schwab
<[hidden email]> wrote:

> hi,
>
> another remark to the webapp and lingo.
> it has displayed cluster labels such as
>
> Österreich Österreich
> Starnightclub Party und Lounge und Lounge
>
> there is some redundance.
> i would prefer labels such as
>
> Österreich
> Starnightclub Party und Lounge
>
> where is this redundance coming from?
> i guess, it can be simply filtered or?
>
> best regards
> reinhard

Re: cluster labels redundance

Stanislaw Osinski
Administrator

> If it were simple, we would have done it, probably. This looks more
> like a bug. Provide the input for which these labels are shown, it
> will help us debug the cause of this redundancy. By "the input" I mean
> either the XML or an URL to the on-line demo which shows this.

Well, it may actually be a missing feature rather than a bug. We'd need to add a label filter (http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/filter/ILabelFilter.html) that would remove labels containing duplicate words.

S.
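
As a rough illustration of the filtering idea described above, here is a minimal standalone sketch that drops candidate labels containing duplicate words. It deliberately does not implement the `ILabelFilter` interface, since its methods are not shown in this thread; the names `has_duplicate_words` and `filter_labels` are hypothetical, and whitespace tokenization with case-insensitive comparison is an assumption.

```python
from collections import Counter


def has_duplicate_words(label: str) -> bool:
    """Return True if any word occurs more than once in the label."""
    counts = Counter(word.casefold() for word in label.split())
    return any(n > 1 for n in counts.values())


def filter_labels(labels):
    """Keep only the labels that contain no repeated word (hypothetical helper)."""
    return [label for label in labels if not has_duplicate_words(label)]


candidates = [
    "Österreich Österreich",
    "Österreich",
    "Starnightclub Party und Lounge und Lounge",
    "Starnightclub Party und Lounge",
]
kept = filter_labels(candidates)
print(kept)
```

A filter like this only helps if a clean variant of the label is also among the candidates; otherwise the cluster would lose its label entirely.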

Re: cluster labels redundance

Dawid Weiss-2
I'd look at the input first -- where does the duplication come from?
Knowing the root of the problem is certainly not going to hurt.

Dawid

On Mon, Mar 15, 2010 at 12:58 PM, Stanislaw Osinski
<[hidden email]> wrote:

>
>> If it were simple, we would have done it, probably. This looks more
>> like a bug. Provide the input for which these labels are shown, it
>> will help us debug the cause of this redundancy. By "the input" I mean
>> either the XML or an URL to the on-line demo which shows this.
>
> Well, it may actually be a missing feature rather than a bug. We'd need to
> add a label filter
> (http://download.carrot2.org/head/javadoc/org/carrot2/text/preprocessing/filter/ILabelFilter.html)
> that would remove labels containing duplicate words.
>
> S.

Re: cluster labels redundance

reinhard
I will send you a private email.

reinhard

Dawid Weiss wrote:

> I'd look at the input first -- where does the duplication come from?
> Knowing the root of the problem is certainly not going to hurt.
>
> Dawid