About language...

About language...

jredondo
Is the phrase discovery algorithm in the feature extraction phase of LINGO
language independent?

Thanks...


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: About language...

Stanislaw Osinski
Administrator

> Is the phrase discovery algorithm in the feature extraction phase of LINGO
> language independent?

It should work well for all Indo-European languages. It won't work that well with e.g. Chinese or Japanese.

Staszek




Re: About language...

jredondo
Thank you, buddy... Good to know.


Re: About language...

jredondo
Staszek,

How different (or improved) is the current implementation of LINGO (in
the current Carrot2 release) from the description given in your master's
thesis?

Thanks again.

Jorge


Re: About language...

Stanislaw Osinski
Administrator

> How different (or improved) is the current implementation of LINGO (in
> the current Carrot2 release) from the description given in your master's
> thesis?

The general idea is the same; the differences are:

* More matrix factorizations to choose from (http://project.carrot2.org/publications/osinski04-dimensionality.pdf)
* Some simple tuning options (stop words, stop labels)

S.




About user's query...

jredondo
Hi again...
I'm studying the thesis in which LINGO is proposed. Very interesting for us!
Thanks for publishing it. We would like to optimize LINGO to process not
web searches but Spanish text entries coming from large public polls with
more or less open questions. So...

I have noticed that none of the four main phases of the algorithm
(preprocessing, feature extraction, cluster label induction and cluster
content discovery) has anything to do with the user's query. Not even the
cluster score depends on it.
What is the precise role of the user's query in LINGO's clustering results?
As it is presented in the thesis, there seems to be no relation at all.
Since the clustering actually changes for different queries, I suppose that
I am wrong and that I am missing something important about the
relation between queries and clusters. What is it?

Thank you guys.

Jorge.









Re: About user's query...

Stanislaw Osinski
Administrator

> I have noticed that none of the four main phases of the algorithm
> (preprocessing, feature extraction, cluster label induction and cluster
> content discovery) has anything to do with the user's query. Not even the
> cluster score depends on it.
> What is the precise role of the user's query in LINGO's clustering results?
> As it is presented in the thesis, there seems to be no relation at all.
> Since the clustering actually changes for different queries, I suppose that
> I am wrong and that I am missing something important about the
> relation between queries and clusters. What is it?

Currently, Lingo does not create labels that consist only of words contained in the query. Obviously, you could go further and e.g. exclude query words from the term-document (TD) matrix. As an alternative to treating query words specially, you could just ignore terms whose tf/df exceeds a certain threshold (which would hold for query words most of the time too).
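
The two suggestions above can be sketched as follows. This is only an illustration, not Lingo's code: it assumes whitespace tokenization, and it reads "tf/df exceeds a certain threshold" as "the term occurs in more than a given fraction of the documents", which is an assumption. The function name `select_terms` and the parameter `max_df_ratio` are made up for the example.

```python
from collections import Counter


def select_terms(documents, query="", max_df_ratio=1.0):
    """Return the sorted vocabulary to keep as rows of a term-document matrix.

    Hypothetical helper, not part of Carrot2: drops query words and any term
    whose document-frequency ratio exceeds max_df_ratio.
    """
    tokenized = [set(doc.lower().split()) for doc in documents]
    query_words = set(query.lower().split())

    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in doc)

    n_docs = len(documents)
    return sorted(
        term
        for term, count in df.items()
        if term not in query_words and count / n_docs <= max_df_ratio
    )


docs = [
    "apple pie recipe",
    "apple computer history",
    "apple orchard tour",
    "computer history museum",
]

# Option 1: exclude the query word "apple" explicitly.
print(select_terms(docs, query="apple"))
# Option 2: no query at all, only a frequency cut-off ("apple" is in 3 of 4 docs).
print(select_terms(docs, max_df_ratio=0.5))
```

In this toy corpus both calls yield the same vocabulary, because the query word is also the most frequent term.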

S.


Re: About user's query...

jredondo
>> I have noticed that none of the four main phases of the algorithm
>> (preprocessing, feature extraction, cluster label induction and cluster
>> content discovery) has anything to do with the user's query. Not even the
>> cluster score depends on it.
>> What is the precise role of the user's query in LINGO's clustering
>> results? As it is presented in the thesis, there seems to be no relation
>> at all.
>> Since the clustering actually changes for different queries, I suppose
>> that I am wrong and that I am missing something important about the
>> relation between queries and clusters. What is it?
>
> Currently, Lingo does not create labels that consist only of words
> contained in the query. Obviously, you could go further and e.g. exclude
> query words from the term-document (TD) matrix. As an alternative to
> treating query words specially, you could just ignore terms whose tf/df
> exceeds a certain threshold (which would hold for query words most of the
> time too).
>
> S.

Thanks for your answer.
I still have a doubt.
As I said, I have noticed that none of the four main phases of Lingo has
anything to do with the user's query. Not even the cluster score is
calculated using information derived from the query.
To confirm that, I ran a clustering without a query over a text
document corpus (via an XML source). Then I ran the clustering again over
the same corpus, this time with a query whose words do not appear in
any label from the previous clustering. The new clusters are identical to
the previous ones.
This seems to confirm my hypothesis about what Carrot2 does with the user's query:
1) Feed it into the search engines that give Carrot2 the snippets (the
"corpus" over which it will do clustering).
2) Exclude from the cluster labels those that consist only of query words.
(not sure how... how?)

And nothing else.

Is this right?

This is important for us because, now that we have seen that the clusters are
identical for different queries (that do not match any of the cluster
labels), we realize that we need a full-text search engine over the corpus
in order to keep the documents that are "closer" to the user's query and
discard those that presumably have nothing to do with it.
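
The pre-filtering step described above could look roughly like the following sketch. It is a toy stand-in for a real full-text search engine, under stated assumptions: documents are split on whitespace, and the score is a simple TF-IDF-weighted overlap with the query words. The names `filter_by_query` and `min_score` are hypothetical.

```python
import math
from collections import Counter


def filter_by_query(documents, query, min_score=0.0):
    """Keep documents whose query-overlap score exceeds min_score, best first."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(documents)
    query_words = set(query.lower().split())

    # Document frequency of each term, used for the inverse-frequency weight.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for doc, tokens in zip(documents, tokenized):
        tf = Counter(tokens)
        score = sum(
            tf[word] * math.log(1 + n_docs / df[word])
            for word in query_words
            if word in df
        )
        if score > min_score:
            scored.append((score, doc))

    # Highest score first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


answers = [
    "public transport is too expensive",
    "more parks and green areas",
    "transport should run at night",
    "lower taxes for small business",
]

# Only the surviving documents would then be passed on to the clustering step.
print(filter_by_query(answers, "transport"))
```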

Thank you very much again...

Jorge









Re: About user's query...

Stanislaw Osinski-3

> This seems to confirm my hypothesis about what Carrot2 does with the user's query:
> 1) Feed it into the search engines that give Carrot2 the snippets (the
> "corpus" over which it will do clustering).
> 2) Exclude from the cluster labels those that consist only of query words.
> (not sure how... how?)
>
> This is important for us because, now that we have seen that the clusters are
> identical for different queries (that do not match any of the cluster
> labels), we realize that we need a full-text search engine over the corpus
> in order to keep the documents that are "closer" to the user's query and
> discard those that presumably have nothing to do with it.

That is correct. You will observe a difference only if the query-less result contained a cluster with a label consisting of query words only.
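
The label rule described above can be illustrated with a short sketch. It is not the Carrot2 implementation: it assumes labels and the query are split on whitespace and compared case-insensitively, and `drop_query_only_labels` is a hypothetical name.

```python
def drop_query_only_labels(labels, query):
    """Remove labels whose words are all contained in the query."""
    query_words = set(query.lower().split())
    return [
        label
        for label in labels
        # Keep the label unless its word set is a subset of the query words.
        if not set(label.lower().split()) <= query_words
    ]


labels = ["Data Mining", "Mining Software", "Mining"]

# A query that covers some labels completely changes the result...
print(drop_query_only_labels(labels, "data mining"))
# ...while a query unrelated to every label leaves the result untouched.
print(drop_query_only_labels(labels, "clustering"))
```

The second call shows the case from the experiment: a query whose words match no label produces the same labels as a run without a query.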

Staszek

