Query Text expression

Query Text expression

Carlos S. Zamudio-2-3
Hi,

I've asked this before, but I need to ask once again, since I don't
think I understand yet how Carrot2 handles the query text. I am using
Carrot2 (version 3 now) to cluster search results, but because I do some
specialized handling of searches, I don't use the built-in search API
that Carrot2 provides. Instead, I take advantage of Carrot2's ability to
accept a list of Documents to be clustered directly (I'm using the
LingoClusteringAlgorithm).

My question is how to deal with the query text when the query is a
complex boolean expression. Since each search engine can have a
slightly different expression syntax and conventions, I wonder if
Carrot2 can extract the proper information from the query to help
clustering. Is there a specific query syntax that Carrot2 would accept,
for example Yahoo's or Lucene's?

Thank you for any suggestions/clarifications on this topic.

Carlos S. Zamudio


_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers
Re: Query Text expression

Stanislaw Osinski
Administrator
> My question is how to deal with the query text when the query is a
> complex boolean expression. Since each search engine can have a
> slightly different expression syntax and conventions, I wonder if
> Carrot2 can extract the proper information from the query to help
> clustering. Is there a specific query syntax that Carrot2 would accept,
> for example Yahoo's or Lucene's?
>
> Thank you for any suggestions/clarifications on this topic.

Hi Carlos,

If you're feeding documents directly to Carrot2 algorithms, the interpretation of the query will depend on the clustering algorithm.

Lingo simply disallows labels that consist only of words found in the query (see: http://fisheye3.atlassian.com/browse/carrot2/branches/3.0-proto/core/carrot2-util-text/src/org/carrot2/text/preprocessing/filter/QueryLabelFilter.java?r=trunk#l1); no complex query analysis is performed. So the simplest strategy would be to pass all words that occurred in your original query to Lingo, separated by spaces. In theory, we could try to exploit the structure of the boolean expression (e.g. to allow some labels that would be blocked by the simple approach above), but I'm not sure if that's worth the effort...
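
To make the rule concrete, here is a minimal sketch of "reject a label when all of its words occur in the query". It is not the code of QueryLabelFilter; the class and method names are hypothetical, and the case-insensitive comparison is an assumption made for the illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: NOT Carrot2's QueryLabelFilter; all names here are hypothetical. */
class QueryWordLabelCheck {
    /** Splits the query on whitespace and lower-cases each word (assumed behaviour). */
    static Set<String> queryWords(String query) {
        Set<String> words = new HashSet<>();
        for (String w : query.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns true if every word of the label also occurs in the query. */
    static boolean consistsOnlyOfQueryWords(String label, Set<String> queryWords) {
        for (String w : label.trim().split("\\s+")) {
            if (!queryWords.contains(w.toLowerCase(Locale.ROOT))) {
                return false; // at least one word is new, so the label would be kept
            }
        }
        return true; // nothing but query words, so the label would be rejected
    }
}
```

With the query "AIG bailout money", a candidate label such as "AIG Bailout" would be rejected under this rule, while "Bailout Plan" would be kept because "plan" is not a query word.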

When it comes to STC, I'm not sure how exactly the query is used -- Dawid?

Cheers,

S.

Re: Query Text expression

Dawid Weiss-2

> When it comes to STC, I'm not sure how exactly the query is used -- Dawid?

It's very similar -- the words that appear in the query are banned from output
labels as far as I can remember.

Dawid

Re: Query Text expression

savannah_beckett
In reply to this post by Stanislaw Osinski
How about keyword boost factors? My query is "AIG^0.9 bailout^0.3 money^0.7"; the higher the boost factor, the more important the keyword. Does Lingo understand this, or does it discard it? Or does it treat the number as a keyword, which is what I am worried about?

Your link fisheye3.atlassian.com is a dead link.
Thanks.

Re: Query Text expression

Stanislaw Osinski
Administrator
> How about keyword boost factors? My query is "AIG^0.9 bailout^0.3 money^0.7"; the higher the boost factor, the more important the keyword. Does Lingo understand this, or does it discard it? Or does it treat the number as a keyword, which is what I am worried about?

You're right, Lingo does not understand boosts and thinks the number is part of the keywords (see: http://fisheye3.atlassian.com/browse/carrot2/trunk/core/carrot2-util-text/src/org/carrot2/text/preprocessing/LanguageModelStemmer.java?r=trunk#l231), which may degrade the clustering quality. A quick workaround would be like this:

1. Fetch documents from your document source (call controller with only the document source in the pipeline), passing the original query.

2. Strip all special syntax, such as boost factors, from the query, leaving just the keywords (and any numbers that are genuine search terms rather than part of the syntax). Then call clustering, passing the cleaned query and the documents you fetched in step 1.
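
Step 2 could be sketched roughly as follows. This is only an illustration that assumes a Lucene-like syntax (boosts written as ^number, + and - prefixes, AND/OR/NOT operators, parentheses and double quotes); the class QueryCleaner and its clean method are hypothetical and not part of Carrot2. Excluded terms are kept as plain words here, in line with the earlier advice to pass all words of the original query; that choice is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Carrot2; assumes a Lucene-like query syntax. */
class QueryCleaner {
    /** Reduces a query to its plain keywords, separated by single spaces. */
    static String clean(String query) {
        List<String> keywords = new ArrayList<>();
        // Treat parentheses and double quotes as plain separators.
        String flattened = query.replaceAll("[()\"]", " ");
        for (String token : flattened.trim().split("\\s+")) {
            // Drop a trailing boost such as ^0.9 or ^2.
            String word = token.replaceAll("\\^[0-9]*\\.?[0-9]*$", "");
            // Drop leading + or - prefixes but keep the word itself.
            word = word.replaceAll("^[+-]+", "");
            boolean operator = word.equals("AND") || word.equals("OR") || word.equals("NOT");
            if (!word.isEmpty() && !operator) {
                keywords.add(word);
            }
        }
        return String.join(" ", keywords);
    }
}
```

For example, QueryCleaner.clean("AIG^0.9 bailout^0.3 money^0.7") gives "AIG bailout money", which is the string you would then pass as the query in the clustering call.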

That's the quick hack. In the future, I'd like to do it in a more systematic way (http://issues.carrot2.org/browse/CARROT-484). Meanwhile, if you were able to contribute the query cleaning code you write, that would be much appreciated!

Cheers,

S.

Re: Query Text expression

savannah_beckett
I guess this may be a repeated question, but does Lingo understand boolean queries at all? I am afraid that Lingo doesn't know that a phrase after a minus sign is one I don't want, and that it treats it as one I do want.
E.g. for "AIG bailout money -government", does Lingo treat "government" as a keyword that I want to appear in the results, because it doesn't understand the meaning of the minus?
Thanks.

Re: Query Text expression

Stanislaw Osinski
Administrator
> I guess this may be a repeated question, but does Lingo understand boolean queries at all?

No, Lingo does not perform any analysis of the query besides splitting it into keywords based on white space.
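
As a small illustration of what a plain whitespace split does to query syntax (a sketch of the behaviour described above, not Carrot2's actual tokenizer; the class name is hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo of whitespace-only splitting; not Carrot2 code. */
class WhitespaceSplit {
    /** Splits a query on runs of whitespace and does nothing else. */
    static List<String> keywords(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

Under this splitting, "AIG^0.9 bailout^0.3 money^0.7" becomes the three tokens AIG^0.9, bailout^0.3 and money^0.7, and "-government" stays a single token with the minus attached, so any boost or operator characters travel along with the word.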
 
> I am afraid that Lingo doesn't know that a phrase after a minus sign is one I don't want, and that it treats it as one I do want.
> E.g. for "AIG bailout money -government", does Lingo treat "government" as a keyword that I want to appear in the results, because it doesn't understand the meaning of the minus?

Lingo does not understand the minus. However, this should not be a problem because the search engine should understand it and not return documents with that word. If there is no such word in the results, Lingo will not create such a label either.

Cheers,

S.
