Nearest Neighbor Algorithm is able to reduce the number of Other Topics cluster members in STC

Jumadi
Existing research on Suffix Tree Clustering (STC) shows that the algorithm still places a large number of text documents in the Other Topics cluster, even though many of these documents are relevant to documents in the existing clusters. Each document in the Other Topics cluster therefore needs to be compared with all documents in the existing clusters to determine how similar they are. A document from Other Topics can then be classified into one particular cluster using the cosine similarity function, computed on Vector Space Model (VSM) representations that are built from term frequency and document frequency. The Nearest Neighbor method uses the resulting similarity scores in the classification step to choose the destination cluster for each Other Topics document. The main criterion for the destination cluster is that it contains the highest number of members with the highest similarity to the document being moved. Moving documents out of Other Topics in this way reduces the number of members in that cluster.
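
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the research and not part of Carrot2: the function names, the smoothed IDF formula, the `k` and `min_similarity` parameters, and the reading of "highest number of members with the highest similarity" as a majority vote among the k most similar clustered documents are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: tf * (ln(n / df) + 1); other variants exist.
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reassign_other_topics(clusters, other_topics, k=3, min_similarity=0.1):
    """Return a destination label (or None) for each Other Topics document.

    clusters: dict mapping cluster label -> list of tokenized documents.
    other_topics: list of tokenized documents.
    """
    labelled = [(label, doc) for label, docs in clusters.items() for doc in docs]
    vectors = tfidf_vectors([doc for _, doc in labelled] + list(other_topics))
    clustered = vectors[:len(labelled)]
    result = []
    for vec in vectors[len(labelled):]:
        # Rank every clustered document by similarity to this document.
        scored = sorted(
            ((cosine(vec, cv), label) for cv, (label, _) in zip(clustered, labelled)),
            key=lambda pair: pair[0],
            reverse=True,
        )
        neighbours = [(s, label) for s, label in scored[:k] if s >= min_similarity]
        if not neighbours:
            result.append(None)  # nothing similar enough: stays in Other Topics
            continue
        votes = Counter(label for _, label in neighbours)
        top = max(votes.values())
        tied = {label for label, count in votes.items() if count == top}
        # Ties are broken in favour of the single most similar neighbour.
        result.append(next(label for _, label in neighbours if label in tied))
    return result


clusters = {
    "suffix tree": [
        ["suffix", "tree", "clustering", "algorithm"],
        ["suffix", "tree", "construction", "linear", "time"],
    ],
    "search engine": [
        ["web", "search", "engine", "results"],
        ["search", "engine", "ranking", "results"],
    ],
}
other_topics = [
    ["tree", "based", "clustering", "of", "suffix", "arrays"],
    ["cooking", "pasta", "recipe"],
]
print(reassign_other_topics(clusters, other_topics))
```

In this toy data the first Other Topics document shares terms with both "suffix tree" documents and is moved there, while the second shares no terms with any cluster and stays in Other Topics. The `min_similarity` cut-off is an added safeguard for the sketch; the original description does not mention one.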

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers
Re: Nearest Neighbor Algorithm is able to reduce the number of Other Topics cluster members in STC

Dawid Weiss
One of the assumptions in STC is that you get a clear relationship
between a cluster's label and the documents it contains -- the cluster
label will occur in all the documents. When you're merging base
clusters you are partially giving up this assumption, but you
can still explain the cluster's content by showing all the labels from
the merged clusters. When you append documents using VSM this relationship
will be, for the most part, gone.

Don't get me wrong -- I like the idea, but I like being able to
understand the relationship between the cluster label and documents
even more.
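
To make the label-to-document relationship concrete, here is a small sketch that measures how many documents in a cluster actually contain the cluster's label. It is an illustration only: `label_coverage` is a hypothetical helper, and plain case-insensitive substring matching is an assumed simplification that ignores the word-level phrase handling a real STC implementation would use.

```python
def label_coverage(label, documents):
    """Fraction of documents whose text contains the label phrase."""
    if not documents:
        return 0.0
    phrase = label.lower()
    hits = sum(1 for text in documents if phrase in text.lower())
    return hits / len(documents)


# A cluster as STC would form it: every document contains the label.
stc_cluster = [
    "Suffix tree clustering groups search snippets",
    "Building a suffix tree in linear time",
]
# The same cluster after one document is appended by vector similarity alone.
extended_cluster = stc_cluster + ["Clustering with suffix arrays and trees"]

print(label_coverage("suffix tree", stc_cluster))
print(label_coverage("suffix tree", extended_cluster))
```

Coverage is complete for the original cluster and drops once a document is appended that is similar in vocabulary but does not contain the label phrase, which is the loss of explainability described above.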

Dawid

On Mon, Nov 4, 2013 at 2:07 AM, Jumadi masjum <[hidden email]> wrote:

> [...]

Re: Nearest Neighbor Algorithm is able to reduce the number of Other Topics cluster members in STC

Jumadi
I see. Thanks.

When I tested the classification results, I checked each Other Topics document that had been moved to an existing cluster against the name (label) of its destination cluster. By manual (human) judgment, only 28% of the moved documents actually belonged in the cluster they were moved to.

Someday I'll try the other one. Thanks for your attention.


On 4 November 2013 15:39, Dawid Weiss <[hidden email]> wrote:
> [...]