Text clustering within a single document

I have an MS Word document containing 25012 paragraph marks.
Each paragraph mark ends a phrase of 8 to 20 words.

I would like to acquire software that will automatically cluster these phrases into categories of similar type. By similar, I mean that the phrases placed in a group are related in meaning in some basic way.

I would like each cluster of phrases to have a self-generated topic name, just as Carrot2 does for large quantities of documents.  The only difference in this case is that I want the clustering to be done on a large number of phrases within the same document.

Is this possible, and if so, how can I purchase the software?  If it is not possible, then the only workaround I can think of is to write a macro that will create a file name for each of the 25012 phrases and then use Carrot2 to cluster the 25012 file names.
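
To make the workaround concrete, here is a minimal sketch in Python rather than a Word macro. It assumes the document has first been saved as plain text with one phrase per line, and it writes each phrase to its own small file instead of encoding the phrase in the file name. The function names and the phrase_NNNNN.txt naming scheme are made up for illustration, and how Carrot2 would read the resulting files is not covered here.

```python
from pathlib import Path


def split_phrases(text: str) -> list[str]:
    """Return one phrase per non-empty line (one line per paragraph mark)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_phrase_files(phrases: list[str], out_dir: Path) -> list[Path]:
    """Write each phrase to its own numbered .txt file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Zero-pad the counter so the files sort in document order.
    width = len(str(len(phrases)))
    paths = []
    for i, phrase in enumerate(phrases, start=1):
        # Hypothetical naming scheme: phrase_00001.txt, phrase_00002.txt, ...
        path = out_dir / f"phrase_{i:0{width}d}.txt"
        path.write_text(phrase + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

With the Word file exported as, say, phrases.txt (a hypothetical name), calling write_phrase_files(split_phrases(Path("phrases.txt").read_text(encoding="utf-8")), Path("phrase_files")) would produce one file per phrase.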

Thoughts?  Feedback?

Thanks much,