Fwd: Bug in org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

Stanislaw Osinski
Administrator
Hi Sergio,

Good catch! I must have broken the code when changing the loop structure or something...

Stanislaw


On Sun, Jan 18, 2015 at 9:06 PM, Dawid Weiss <[hidden email]> wrote:
Hi Sergio,

Thanks for the report and apologies for the belated reply.

I filed this issue:
http://issues.carrot2.org/browse/CARROT-1081

Will try to address it as soon as possible. Thanks!

Dawid

On Wed, Jan 14, 2015 at 4:31 PM, Sergio Ricardo de Melo Queiroz
<[hidden email]> wrote:
> Hello,
>
> I was running BisectingKMeansClusteringAlgorithm on some documents and saw
> that, contrary to expectations, it did not return hard clusterings, i.e.,
> the same document appeared in multiple clusters. I looked at the code and
> figured out that the problem was likely due to a bug at line 442, where
> the following block starts:
>
> if (it < iterations - 1)
> {
>     previousResult = result;
>     result = Lists.newArrayList();
>     for (int i = 0; i < partitions; i++)
>     {
>         result.add(new IntArrayList(selected.columns()));
>     }
> }
>
> Because of this condition, the result list is not initialized anew in the
> last iteration, so the last iteration adds elements to the partitions
> of the iteration before it. I removed the "if" (so that the code inside it
> executes in every iteration) and the algorithm started behaving as
> expected.
>
> Hope you can fix it in the repository.
>
> Yours,
>
> Sergio Queiroz
> http://www.cin.ufpe.br/~srmq/
>
> ------------------------------------------------------------------------------
> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
> GigeNET is offering a free month of service with a new server in Ashburn.
> Choose from 2 high performing configs, both with 100TB of bandwidth.
> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
> http://p.sf.net/sfu/gigenet
> _______________________________________________
> Carrot2-developers mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/carrot2-developers
>
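The effect described in the report can be reproduced with a small standalone sketch. This is not the Carrot2 source: the class `ResetGuardDemo`, its methods, and the toy assignment rule `(doc + it) % partitions` are all hypothetical, and it assumes the reset block runs before documents are assigned in each iteration, as the report implies. Plain `java.util` lists stand in for `Lists.newArrayList()` and `IntArrayList`.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetGuardDemo {
    /** Creates one empty list per partition. */
    static List<List<Integer>> newPartitions(int partitions) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            lists.add(new ArrayList<>());
        }
        return lists;
    }

    /**
     * Hypothetical stand-in for the assignment loop. In iteration 'it' each
     * document goes to partition (doc + it) % partitions, so assignments
     * change between iterations. When guardReset is true, the partition
     * lists are re-created only if it < iterations - 1, mirroring the
     * condition quoted in the report.
     */
    static List<List<Integer>> assign(int docs, int partitions, int iterations, boolean guardReset) {
        List<List<Integer>> result = newPartitions(partitions);
        for (int it = 0; it < iterations; it++) {
            if (!guardReset || it < iterations - 1) {
                result = newPartitions(partitions);
            }
            for (int doc = 0; doc < docs; doc++) {
                result.get((doc + it) % partitions).add(doc);
            }
        }
        return result;
    }

    /** Sums partition sizes; a hard clustering has exactly one membership per document. */
    static int totalMemberships(List<List<Integer>> result) {
        int total = 0;
        for (List<Integer> partition : result) {
            total += partition.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("guarded reset:       " + assign(6, 3, 3, true));
        System.out.println("unconditional reset: " + assign(6, 3, 3, false));
    }
}
```

With 6 documents, 3 partitions and 3 iterations, the guarded variant ends with 12 memberships (each document sits in two partitions), while the unconditional reset ends with 6, one per document.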


Re: Fwd: Bug in org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

Sergio Ricardo de Melo Queiroz
Thank you guys!


Sergio

2015-01-18 19:25 GMT-03:00 Stanislaw Osinski <[hidden email]>:
Hi Sergio,

Good catch! I must have broken the code when changing the loop structure or something...

Stanislaw


