Google as a data source

Google as a data source

spok
Hi,

To my understanding, it is possible to use Google as a data source, at least in carrot2 3.9.x (this can be read in the manual). However, when I download the snapshot, Google can't be selected ...

What has to be done?

Best regards

spok

Re: Google as a data source

Dawid Weiss
Where does it state so in the manual? If it does, it's an error -- we
used to support Google back when it had an API but that was deprecated
a good while ago. We still use
http://code.google.com/apis/ajaxsearch/documentation/ to prefetch the
top search results for the webapp (see GoogleDocumentSource class in
the source code) and this seems to still work, but I don't know when
even this is going to vanish.

If you want to use GoogleDocumentSource you can, but it can only fetch
32 search results -- hardly enough for any sensible clustering...

Dawid

On Sat, Oct 12, 2013 at 3:01 PM, spok <[hidden email]> wrote:

> Hi,
>
> To my understanding, it is possible to use Google as a data source, at least
> in carrot2 3.9.x (this can be read in the manual). However, when I download
> the snapshot, Google can't be selected ...
>
> What has to be done?
>
> Best regards
>
> spok

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Google as a data source

Jack Park
I notice that ajax search says it's deprecated and passes you to a
link which is now bad, but which has this root:
https://developers.google.com/custom-search/
I'm just guessing that the game has dramatically changed.

On Sat, Oct 12, 2013 at 11:51 AM, Dawid Weiss <[hidden email]> wrote:

> Where does it state so in the manual? If it does, it's an error -- we
> used to support Google back when it had an API but that was deprecated
> a good while ago. We still use
> http://code.google.com/apis/ajaxsearch/documentation/ to prefetch the
> top search results for the webapp (see GoogleDocumentSource class in
> the source code) and this seems to still work, but I don't know when
> even this is going to vanish.
>
> If you want to use GoogleDocumentSource you can, but it can only fetch
> 32 search results -- hardly enough for any sensible clustering...
>
> Dawid

Re: Google as a data source

spok
In reply to this post by Dawid Weiss
I understood the following parts of the Carrot2 3.9.0 manual to mean that it should be possible:

4.2.1 Clustering results from common search engines

To try Carrot2 clustering on results from common search engines, such as Google, or Bing, you can either: ...

12.6 Google Web Search

Searches the web using Google. ....

Re: Google as a data source

spok
In reply to this post by Jack Park
You can find the documentation for Google Custom Search Engine (CSE) here:

www.google.com/cse/docs/

It's rather easy to use, so perhaps it could be integrated into carrot2 if the former way is no longer possible?

spok

Re: Google as a data source

Dawid Weiss
This is no longer valid -- I'll correct the manual. As for Google, you
could plug in a custom search feed, sure. I don't have the incentive
to do it since, for example, Microsoft gives a much nicer API for Bing
with a sensible limit for personal use (5000 requests monthly, I
believe). If you write it, send a pull request via GitHub and we'll
consider adding it to the codebase.

@Jack - yeah, you could see Google's open API gradually going from
open through limited to pretty much proprietary (or non-existent). I
guess I don't blame them -- such APIs are only a marginal source of
income and they're probably a major source of abuse (if not limited
somehow).

Dawid


On Sat, Oct 12, 2013 at 10:27 PM, spok <[hidden email]> wrote:

> You can find the documentation for Google Custom Search Engine (CSE) here:
>
> www.google.com/cse/docs/
>
> It's rather easy to use, so perhaps it could be integrated into carrot2 if
> the former way is no longer possible?
>
> spok

Re: Google as a data source

Dawid Weiss
Oh, one more thing --
http://download.carrot2.org/head/manual/index.html#section.component.google

this component is what I mentioned -- it is functional, but it'll only
fetch a very small set of top search results. It's practically not
suitable for clustering search results on its own. We use it in
combination with Comcepta's aggregating search engine (but we have an
agreement with them; you'll soon hit the free API limits if you try to
use it the same way we do).

In general perhaps you should explain what you're after -- maybe there
are other ways to achieve it than Google.

Dawid


On Sun, Oct 13, 2013 at 8:38 AM, Dawid Weiss <[hidden email]> wrote:

> This is no longer valid -- I'll correct the manual. As for Google, you
> could plug in a custom search feed, sure. I don't have the incentive
> to do it since, for example, Microsoft gives a much nicer API for Bing
> with a sensible limit for personal use (5000 requests monthly, I
> believe). If you write it, send a pull request via GitHub and we'll
> consider adding it to the codebase.
>
> @Jack - yeah, you could see Google's open API gradually going from
> open through limited to pretty much proprietary (or non-existent). I
> guess I don't blame them -- such APIs are only a marginal source of
> income and they're probably a major source of abuse (if not limited
> somehow).
>
> Dawid

Re: Google as a data source

spok
In reply to this post by Dawid Weiss
I have a Bing key for 5000 requests per month, which is enough of course, but - in the past - I found a reasonable number of searches where Bing doesn't deliver as many hits as Google does ...

So I would be very interested in combining a custom search with carrot2.

How can this be done?

Spok

Re: Google as a data source

Dawid Weiss
You need to write an implementation of IDocumentSource which will
fetch documents using Google's custom search API. Then you can use
this as an algorithm directly or combine Bing and your Google data
source into a meta-search engine (deduplicating search results based
on the URL, for example).

Dawid
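
The URL-based deduplication suggested above could be sketched roughly as follows. This is plain Java and not Carrot2 code: the SearchResultMerger class, its Hit type and the normalizeUrl heuristic are hypothetical names made up for illustration, and a real meta-search source would wrap something like this inside an IDocumentSource implementation that first fetches the hits from each engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Carrot2): merges hits from several engines. */
final class SearchResultMerger {

    /** Minimal stand-in for a search hit: title, snippet and URL. */
    static final class Hit {
        final String title;
        final String snippet;
        final String url;

        Hit(String title, String snippet, String url) {
            this.title = title;
            this.snippet = snippet;
            this.url = url;
        }
    }

    /**
     * Builds a comparison key: lower-cased, without the http(s) scheme,
     * a leading "www." and trailing slashes. This is a deliberately crude
     * heuristic (an assumption); lower-casing the path can, in rare cases,
     * merge two distinct pages.
     */
    static String normalizeUrl(String url) {
        String key = url.trim().toLowerCase(Locale.ROOT);
        key = key.replaceFirst("^https?://", "");
        key = key.replaceFirst("^www\\.", "");
        key = key.replaceFirst("/+$", "");
        return key;
    }

    /** Walks the lists in order and keeps the first hit seen for each URL key. */
    @SafeVarargs
    static List<Hit> merge(List<Hit>... sources) {
        // LinkedHashMap preserves insertion order, so earlier sources win
        // and the relative ranking within each source is kept.
        Map<String, Hit> unique = new LinkedHashMap<>();
        for (List<Hit> source : sources) {
            for (Hit hit : source) {
                unique.putIfAbsent(normalizeUrl(hit.url), hit);
            }
        }
        return new ArrayList<>(unique.values());
    }
}
```

Calling SearchResultMerger.merge(bingHits, googleHits) returns one list in which a page reported by both engines appears only once, with the copy from the first list kept. Whether the first engine's title and snippet should win, or the two should be combined, is a design choice this sketch does not try to settle.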

On Sun, Oct 13, 2013 at 4:41 PM, spok <[hidden email]> wrote:

> I have a Bing key for 5000 requests per month, which is enough of course, but
> - in the past - I found a reasonable number of searches where Bing doesn't
> deliver as many hits as Google does ...
>
> So I would be very interested in combining a custom search with carrot2.
>
> How can this be done?
>
> Spok

Re: Google as a data source

spok
Unfortunately I don't know Java ...

Couldn't integrating a Google CSE as a data source become part of the carrot2 roadmap?

Would be great, and I think others would appreciate this, too.

spok

Re: Google as a data source

Dawid Weiss
> Couldn't integrating a Google CSE as a data source become part of the
> carrot2 roadmap?
>
> Would be great, and I think others would appreciate this, too.

It could, but there's so much work to do that I don't think it'll
realistically help you if I said yes -- we can look at it in a few
months from now...  Also there's a problem of who would be the target
of such an extension -- the CSE is primarily targeted at commercial
customers anyway, quoting:

"Paid Users of Google Site Search can also use the XML API to retrieve
XML results and customize their display."

So we wouldn't even be able to test this without purchasing a license
ourselves. If you need this functionality I think you should hire a
freelancer who knows Java to write it (and give that person access to
the CSE search results stream).

Dawid


Re: Google as a data source

Dawid Weiss
Oh, and there's always the legal aspect of this too; I just peeked at
their terms of license:

"You may not in any way frame, cache or modify the Results produced by
Google, except as otherwise agreed to between You and Google."

Full text at:
https://support.google.com/customsearch/answer/1714300?hl=en&ref_topic=1717070

It really is kind of vague whether "cache or modify" would apply to
clustering search results on-line and displaying clusters alongside
their search results... We want no lawsuits from a company that could
probably afford to buy the entire country we live in :)

Dawid



On Mon, Oct 14, 2013 at 8:20 AM, Dawid Weiss <[hidden email]> wrote:

>> Couldn't integrating a Google CSE as a data source become part of the
>> carrot2 roadmap?
>>
>> Would be great, and I think others would appreciate this, too.
>
> It could, but there's so much work to do that I don't think it'll
> realistically help you if I said yes -- we can look at it in a few
> months from now...  Also there's a problem of who would be the target
> of such an extension -- the CSE is primarily targeted at commercial
> customers anyway, quoting:
>
> "Paid Users of Google Site Search can also use the XML API to retrieve
> XML results and customize their display."
>
> So we wouldn't even be able to test this without purchasing a license
> ourselves. If you need this functionality I think you should hire a
> freelancer who knows Java to write it (and give that person access to
> the CSE search results stream).
>
> Dawid
