LINGO and STC

serghiño80

Another question, following up on my previous mail:

Is clustering affected if we use the character "_" to separate the words in the data?

Example:

Document 1: rey_juan_carlos
Document 2: juan_carlos
Document 3: juan_carlos_rey

Does the character "_" affect how documents are classified within the clusters?


------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: LINGO and STC

Stanislaw Osinski
Administrator

> Another question, following up on my previous mail:
>
> Is clustering affected if we use the character "_" to separate the words in the data?
>
> Example:
>
> Document 1: rey_juan_carlos
> Document 2: juan_carlos
> Document 3: juan_carlos_rey
>
> Does the character "_" affect how documents are classified within the clusters?

Yes -- rey_juan_carlos (with underscores) will be treated as one word (token / feature), while rey juan carlos (with spaces) will be treated as three separate words. With underscores, your three example documents therefore become three different tokens with no words in common. To answer your previous question -- letter case does not matter for either of the clustering algorithms.

Cheers,

S.
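
To illustrate the point above, here is a minimal sketch of why the separator matters. It does not use Carrot2's actual tokenizer: the `tokenize` and `shared_tokens` helpers are hypothetical, and the sketch simply assumes that tokens are split on whitespace and lowercased (reflecting the statement that case does not matter).

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase, then split on whitespace only.

    The underscore is NOT a separator here, so "rey_juan_carlos"
    stays a single token.
    """
    return text.lower().split()


def shared_tokens(docs):
    """Return the set of tokens that occur in every document."""
    token_sets = [set(tokenize(doc)) for doc in docs]
    return set.intersection(*token_sets)


# The three example documents from the question, joined with underscores.
docs_underscore = ["rey_juan_carlos", "juan_carlos", "juan_carlos_rey"]

# The same documents with spaces instead of underscores.
docs_space = [doc.replace("_", " ") for doc in docs_underscore]

# With underscores, each document is one distinct token: nothing is shared.
print(shared_tokens(docs_underscore))     # set()

# With spaces, "juan" and "carlos" occur in all three documents.
print(sorted(shared_tokens(docs_space)))  # ['carlos', 'juan']
```

Under these assumptions, the underscore-joined documents give a clustering algorithm no overlapping features to group on, while the space-separated versions share "juan" and "carlos".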
