Re: Welcome to the "Carrot2-developers" mailing list


Re: Welcome to the "Carrot2-developers" mailing list

zimzaz
Hi,

I have a mirror of Wikipedia set up on a private server and I would like to be
able to run the same clustering search on it as the one carried out at
search.carrot2.org.  Also, I would like to be able to pass a list of
document ids to a MediaWiki plugin (Collection).  What is a good strategy
for accomplishing this?

Fred Zimmerman
------------------------------------------------------------------------------
BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
http://p.sf.net/sfu/rim-devcon-copy2
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Welcome to the "Carrot2-developers" mailing list

Dawid Weiss-2
The search at search.carrot2.org is using Microsoft Bing with a site
restriction. In other words, you won't be able to do the same on an
intranet unless you already have an intranet search engine which can
provide search results in XML or JSON formats that can be digested by
Carrot2 DocumentSources, explained here:

http://download.carrot2.org/head/manual/index.html#section.architecture.input-xml

If I understand the other question correctly, you want to pass a list
of ID-identified documents for clustering and then pass the clustered
sets of IDs somewhere else, right? Use the document clustering server
(DCS) as a web service and write a snippet of code in your programming
language of choice that will:

1) fetch the documents for the query from your intranet's search engine,
2) convert them into Carrot2 input XML and pass that to the DCS for clustering,
3) retrieve the result and pass the IDs from the clustered sets of documents
to MediaWiki or elsewhere.
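
A minimal Python sketch of the three steps above, in case it helps. The DCS URL, the `dcs.c2stream` and `dcs.output.format` parameter names, and the input/output XML layout (`searchresult`, `document`, `group`, `refid`) are assumptions based on a default DCS install, so check them against the manual for your version. The shape of the `docs` dictionaries is hypothetical and stands in for whatever your search engine returns.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET


def build_carrot2_xml(query, docs):
    """Step 2a: wrap search hits in Carrot2 input XML (layout assumed)."""
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    for doc in docs:
        el = ET.SubElement(root, "document", id=str(doc["id"]))
        ET.SubElement(el, "title").text = doc.get("title", "")
        ET.SubElement(el, "url").text = doc.get("url", "")
        ET.SubElement(el, "snippet").text = doc.get("snippet", "")
    return ET.tostring(root, encoding="unicode")


def post_to_dcs(xml_text, dcs_url="http://localhost:8080/dcs/rest"):
    """Step 2b: send the XML to the DCS as a multipart/form-data POST.

    The URL and field names are assumptions; adjust to your DCS setup.
    """
    boundary = uuid.uuid4().hex
    fields = {"dcs.c2stream": xml_text, "dcs.output.format": "XML"}
    body = "".join(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f'{value}\r\n'
        for name, value in fields.items()
    ) + f"--{boundary}--\r\n"
    request = urllib.request.Request(
        dcs_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def cluster_ids(result_xml):
    """Step 3: map each top-level cluster label to the document IDs in it."""
    root = ET.fromstring(result_xml)
    clusters = {}
    for group in root.findall("group"):
        label = group.findtext("title/phrase", default="(unlabeled)")
        clusters[label] = [d.get("refid") for d in group.findall("document")]
    return clusters
```

Step 1 (fetching hits from the intranet search engine) is left out because it depends entirely on that engine. With hits in hand, `cluster_ids(post_to_dcs(build_carrot2_xml(query, docs)))` would give a label-to-IDs mapping that can be handed to MediaWiki or anything else.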

Dawid


Re: Welcome to the "Carrot2-developers" mailing list

zimzaz
1) Can Solr/Nutch provide the search results in the right format?
2) yes, that's what I want to do with the answer set, and I think I understand how to do it. Perfect!

If anyone else has already done this I would be happy to borrow your code ... ;-)



Re: Welcome to the "Carrot2-developers" mailing list

Dawid Weiss-2
> 1) Can Solr/Nutch provide the search results in the right format?

You can integrate Carrot2 algorithms into Solr directly. Solr will
then emit document clusters as part of the query response.
http://download.carrot2.org/head/manual/index.html#section.solr

> 2) yes, that's what I want to do with the answer set, and I think I
> understand how to do it. Perfect!

If you integrate with Solr directly, you won't need DCS.
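
A rough Python sketch of what the client side could look like in that case. The `/clustering` handler path, the `clustering=true` parameter, and the `clusters` / `labels` / `docs` keys in the JSON response are assumptions modelled on Solr's example configuration, so verify them against your own solrconfig.xml and a real response.

```python
from urllib.parse import urlencode


def clustering_url(base, query, rows=100):
    """Build a query URL for a Solr handler with clustering enabled.

    The parameter names are assumptions; adjust to your Solr config.
    """
    params = {"q": query, "rows": rows, "wt": "json", "clustering": "true"}
    return f"{base}?{urlencode(params)}"


def ids_by_cluster(solr_response):
    """Map each cluster's label(s) to the document IDs Solr put in it.

    Expects the decoded JSON response as a dict (assumed response shape).
    """
    result = {}
    for cluster in solr_response.get("clusters", []):
        label = ", ".join(cluster.get("labels", []))
        result[label] = list(cluster.get("docs", []))
    return result
```

The response would be fetched from `clustering_url(...)` with any HTTP client and decoded with `json.loads` before being passed to `ids_by_cluster`; the resulting ID lists are what you would hand to the MediaWiki plugin.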

> If anyone else has already done this I would be happy to borrow your code
> ... ;-)

The Solr integration code is in the Solr/Lucene repo, but feel free to
just grab the latest distro; you should have it there.

Dawid
