Can the webapp dynamically load an XML file as a source?

Evan Cooperman
Hi,

I'm trying to use the dcs to cluster some snippets from my site.  I'm then trying to save the XML returned from the dcs to use as input to the webapp for display.  I have a few questions about this:

1. Is this possible?  Is the XML output from the dcs what the webapp needs to display a visualization?
2. If the answer to #1 is yes, is there a way to load the webapp in a way that dynamically defines what XML file to load to display the visualization?  Or is there a way to load the most recent file in a folder?  Or a whole folder?

I'm trying to find the easiest way to integrate the carrot visualization into our application without having to re-write it.

Any help would be greatly appreciated!

TIA,

Evan

Re: Can the webapp dynamically load an XML file as a source?

Stanislaw Osinski
Administrator
Hi,

> I'm trying to use the dcs to cluster some snippets from my site.  I'm then
> trying to save the XML returned from the dcs to use as input to the webapp
> for display.  I have a few questions about this:
>
> 1. Is this possible?  Is the XML output from the dcs what the webapp needs
> to display a visualization?
> 2. If the answer to #1 is yes, is there a way to load the webapp in a way
> that dynamically defines what XML file to load to display the visualization?
> Or is there a way to load the most recent file in a folder?  Or a whole
> folder?
>
> I'm trying to find the easiest way to integrate the carrot visualization
> into our application without having to re-write it.

The DCS produces the results exactly in the format required by the visualization (http://download.carrot2.org/stable/manual/#section.architecture.output-xml). The easiest implementation route would probably be to ignore the webapp and directly use the visualization and its JavaScript API. It shouldn't be too difficult to put together a simple static HTML+JavaScript app that would:

1. Post your data to DCS for clustering
2. Feed the result XML into the visualization.
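
Step 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the field names (`dcs.c2stream`, `dcs.output.format`, `dcs.algorithm`) and the endpoint URL are the ones that appear in the PHP examples in this thread, while `buildDcsFields` and `clusterWithDcs` are hypothetical helper names. Step 2 is left to the JavaScript API reference shipped with the visualization.

```javascript
// Endpoint URL as used in the PHP examples in this thread (local DCS).
const DCS_URL = "http://localhost:8080/carrot2-dcs/dcs/rest";

// Hypothetical helper: collects the form fields for one clustering request.
// The field names are the ones visible in the $fields dump in this thread.
function buildDcsFields(inputXml, algorithm) {
  const fields = {
    "dcs.c2stream": inputXml,   // documents to cluster, as Carrot2 XML
    "dcs.output.format": "XML", // ask for XML back
  };
  if (algorithm) {
    fields["dcs.algorithm"] = algorithm; // e.g. "lingo"
  }
  return fields;
}

// Hypothetical helper for step 1: POSTs the fields as multipart/form-data
// and resolves to the raw result XML as a string.
async function clusterWithDcs(inputXml, algorithm) {
  const form = new FormData();
  const fields = buildDcsFields(inputXml, algorithm);
  for (const [name, value] of Object.entries(fields)) {
    form.append(name, value);
  }
  const response = await fetch(DCS_URL, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error("DCS request failed with HTTP " + response.status);
  }
  return response.text();
}
```

If the page and the DCS are served from different origins, the browser's same-origin policy may block the request, so serving both from the same host is the simpler setup.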

If you don't mind the open source branding (logo) on the visualization, take a look here:

http://download.carrotsearch.com/circles/demo/

and download the ZIP with the visualization binaries, JavaScript API examples and reference.

Also, you may want to check another visualization we'll be adding to Carrot2 shortly:

http://download.carrotsearch.com/foamtree/demo/

Cheers,

S.

------------------------------------------------------------------------------
Colocation vs. Managed Hosting
A question and answer guide to determining the best fit
for your organization - today and in the future.
http://p.sf.net/sfu/internap-sfd2d
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Can the webapp dynamically load an XML file as a source?

Evan Cooperman
Stanislaw Osinski wrote
> The DCS produces the results exactly in the format required by the
> visualization
> (http://download.carrot2.org/stable/manual/#section.architecture.output-xml).
> The easiest implementation route would probably be to ignore the webapp and
> directly use the visualization and its JavaScript API. It shouldn't be too
> difficult to put together a simple static HTML+JavaScript app that would:
>
> 1. Post your data to DCS for clustering
> 2. Feed the result XML into the visualization.

Thanks for your quick response.  Everything seems to be working well apart from one point of integration between my app and carrot: I'm having trouble with step 2.  Is there a way to retrieve the full resulting XML via the PHP API (Carrot2.php)?  All I see are helper methods that let you loop through clusters and documents.  Here is my code:

$processor = new Carrot2Processor('http://localhost:8080/carrot2-dcs/dcs/rest');
$job = new Carrot2Job();
$job->addDocument('test.xml', '<searchresult> <query>broadcasting</query> <document> <title>Document 1 Test</title> <snippet>This is a test of the emergency broadcasting service.</snippet> </document> <document> <title>Document 2 Title</title> <snippet>Document 2 Content.</snippet> <url>http://document.url/2</url> </document> <document> <title>Document 3 Title</title> <snippet>Document 3 Content.</snippet> <url>http://document.url/3</url> </document> </searchresult>'); // this is the example XML provided by carrot
$job->setAttribute("results", 20);

// Perform clustering
try {
    $result = $processor->cluster($job);
}
catch (Carrot2Exception $e) {
    echo 'An error occurred during processing: ' . $e->getMessage();
    exit(10);
}


At this point I'm having trouble accessing the resulting XML, though I can use the provided methods to do so.  But what I really want is the full XML.

Re: Can the webapp dynamically load an XML file as a source?

Stanislaw Osinski
Administrator

> At this point I'm having trouble accessing the resulting XML, though I can
> use the provided methods to do so.  But what I really want is the full XML.

Take a look at the cluster(Carrot2Job $job) method in Carrot2.php. Towards the end you have:

    return $this->extractResponse($response); 

The $response variable is the XML as a string, so if you omit the call to $this->extractResponse(), the method will be returning the raw XML.

For the next release, I'll refactor Carrot2.php to include a method for fetching raw XML.

Cheers,

S.


Re: Can the webapp dynamically load an XML file as a source?

Evan Cooperman
Stanislaw Osinski wrote
> Take a look at the cluster(Carrot2Job $job) method in Carrot2.php. Towards
> the end you have:
>
>     return $this->extractResponse($response);
>
> The $response variable is the XML as a string, so if you omit the call to
> $this->extractResponse(), the method will be returning the raw XML.
>
> For the next release, I'll refactor Carrot2.php to include a method for
> fetching raw XML.

Great, and thanks again for your quick answer.

I tried removing the $this->extractResponse() from $response and I seem to be getting back the same XML I put in the document in the job I sent to be processed. There are no errors so I feel like I'm still missing something from my input, maybe an attribute?  Here is the relevant code:

$algorithm = "lingo";
$job = new Carrot2Job();
$job->addDocument(
'test.xml', '<searchresult> <query>broadcasting</query> <document> <title>Document 1 Test</title> <snippet>This is a test of the emergency broadcasting service.</snippet> </document> <document> <title>Document 2 Title</title> <snippet>Document 2 Content.</snippet> <url>http://document.url/2</url> </document> <document> <title>Document 3 Title</title> <snippet>Document 3 Content.</snippet> <url>http://document.url/3</url> </document> </searchresult>'
);
$job->setAlgorithm($algorithm);
$xml = $processor->getClusterXML($job); // (getClusterXML is the new method I created to return the response only, rather than hijacking the existing method)

Here's a dump of the fields that go into the $fields variable used by CURL when posting (CURLOPT_POSTFIELDS     => $fields):

array(3) { ["dcs.output.format"]=> string(3) "XML" ["dcs.c2stream"]=> string(738) " <searchresult> <query>broadcasting</query> <document> <title>Document 1 Test</title> <snippet>This is a test of the emergency broadcasting service.</snippet> </document> <document> <title>Document 2 Title</title> <snippet>Document 2 Content.</snippet> <url>http://document.url/2</url> </document> <document> <title>Document 3 Title</title> <snippet>Document 3 Content.</snippet> <url>http://document.url/3</url> </document> </searchresult> " ["dcs.algorithm"]=> string(5) "lingo" }


I have a feeling it may have to do with dcs.c2stream?

Thanks so much for your help.

Evan


Re: Can the webapp dynamically load an XML file as a source?

Stanislaw Osinski
Administrator

> I tried removing the $this->extractResponse() from $response and I seem to
> be getting back the same XML I put in the document in the job I sent to be
> processed. There are no errors so I feel like I'm still missing something
> from my input, maybe an attribute?
 
The code looks good to me, I'd have to run it locally to do some debugging really.

But before I do that, can you try running the attached code against your DCS? I modified Carrot2.php to return the raw XML as part of the response object and changed example.php to display the XML. I tried it with Carrot2 DCS 3.4.2 and it seems to work as expected.

Cheers,

S.


dcs.zip (8K) Download Attachment

Re: Can the webapp dynamically load an XML file as a source?

Evan Cooperman
Stanislaw Osinski wrote
> The code looks good to me, I'd have to run it locally to do some debugging
> really.
>
> But before I do that, can you try running the attached code against your
> DCS? I modified Carrot2.php to return the raw XML as part of the response
> object and changed example.php to display the XML. I tried it with Carrot2
> DCS 3.4.2 and it seems to work as expected.

This worked great for me, thanks!  I think part of my problem was that I was trying to feed XML as the text of a document rather than just text.

Once I updated Carrot2.php with the version you gave me and changed the addExampleDocuments method in your example to use documents with content rather than URL-based documents, the clustering seems to be working fine, and I *hopefully* won't have any more questions for you :)
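
For the static HTML+JavaScript route suggested earlier in the thread, the same rule would apply: each snippet goes in as plain text inside its own <document> element, not as a nested XML file. The sketch below shows one way to build that input in the <searchresult> format used in the examples above. `escapeXml` and `buildSearchResultXml` are hypothetical helper names, not part of any Carrot2 API.

```javascript
// Escapes the characters that have special meaning in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps plain-text snippets in the <searchresult>
// format shown earlier in this thread.
// Each entry in `documents` is { title, snippet, url } with url optional.
function buildSearchResultXml(query, documents) {
  const parts = ["<searchresult>"];
  if (query) {
    parts.push("  <query>" + escapeXml(query) + "</query>");
  }
  for (const doc of documents) {
    parts.push("  <document>");
    parts.push("    <title>" + escapeXml(doc.title) + "</title>");
    parts.push("    <snippet>" + escapeXml(doc.snippet) + "</snippet>");
    if (doc.url) {
      parts.push("    <url>" + escapeXml(doc.url) + "</url>");
    }
    parts.push("  </document>");
  }
  parts.push("</searchresult>");
  return parts.join("\n");
}

const inputXml = buildSearchResultXml("broadcasting", [
  { title: "Document 1 Test", snippet: "This is a test of the emergency broadcasting service." },
  { title: "Document 2 Title", snippet: "Document 2 Content.", url: "http://document.url/2" },
]);
console.log(inputXml);
```

Because the snippet text is escaped, any markup it happens to contain is clustered as text instead of being read as document structure.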

Thanks for all the help!

Evan

Re: Can the webapp dynamically load an XML file as a source?

Evan Cooperman
As a side note, it appears that the XML returned from the getXml() method is not in the form needed to feed into the webapp: every '\n' has to be replaced with '' first. Maybe the getXml() method should handle this?

Re: Can the webapp dynamically load an XML file as a source?

Dawid Weiss-2
Evan,

can you tell us a bit what you do with these XMLs? Because a character
sequence \n should in no way interfere with XMLs posted to the webapp
(and in fact it shouldn't even appear on the output since in XML \n
does not need to be escaped).

I assume there is something awkward going on in PHP, but I can't tell
off the top of my head what it can be.

Dawid

On Wed, Mar 16, 2011 at 1:48 AM, evancooperman <[hidden email]> wrote:

> As a side note, it appears that the XML returned from the getXml() method is
> not in the proper form needed to feed into the webapp, there needs to be a
> replacement done so '\n' is converted to '' - maybe the getXml method should
> handle this?


Re: Can the webapp dynamically load an XML file as a source?

Evan Cooperman
Dawid Weiss-2 wrote
> Evan,
>
> can you tell us a bit what you do with these XMLs? Because a character
> sequence \n should in no way interfere with XMLs posted to the webapp
> (and in fact it shouldn't even appear on the output since in XML \n
> does not need to be escaped).
>
> I assume there is something awkward going on in PHP, but I can't tell
> off the top of my head what it can be.

Hey Dawid,

I've created a partial that will cluster an array of text snippets and display the results of that clustering in a new page via the javascript API.  The partial uses a method I created as shown here:

function clusterXML($search_text, $snippets = array()) {
    $processor = new Carrot2Processor('http://localhost:8080/carrot2-dcs/dcs/rest');

    $job = new Carrot2Job();
    $job->setAlgorithm('lingo');
    $job->setAttributes(array (
          'TermDocumentMatrixBuilder.termWeighting' => 'org.carrot2.text.vsm.LinearTfIdfTermWeighting'
    ));
   
    // Add snippets to the job as documents so they can be clustered
    if (!count($snippets)) {
        addExampleDocuments($job);        
    } else {
        foreach($snippets as $title => $snippet) {
            $job->addDocument($title, $snippet);
        }
    }

    if (isset($search_text)) {
        $job->setQuery($search_text); // set the query as a hint for the clustering algorithm (optional)
    }

    $result = $processor->cluster($job);
    return $result->getXml();
}

We then use this method in this manner:

$xml = clusterXML('data mining'); // TODO 1: replace data mining with the actual search string
.
.
.
<script type="text/javascript">
    // remove all newlines so the XML is interpreted properly as a JavaScript string
    var xml = '<?= preg_replace("/\n/", '', $xml) ?>';
    var circles = new CarrotSearchCircles({....
</script>
.
.
.


This is working properly for me but not without the preg_replace statement above.  Hope this helps.
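
A likely explanation, offered as an assumption rather than something confirmed in this thread: the XML itself is fine, but it is being pasted between single quotes in a JavaScript string literal, and a raw line break inside such a literal is a syntax error. The sketch below reproduces that with a hypothetical stand-in for the DCS output and compares stripping newlines with emitting a JSON-encoded literal instead.

```javascript
// Hypothetical stand-in for the XML string returned by the DCS.
const resultXml = "<searchresult>\n  <group id=\"0\"/>\n</searchresult>";

// Returns true when `source` parses as JavaScript, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // compiles the source without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Pasting the XML between single quotes leaves raw line breaks in the literal.
const naive = "var xml = '" + resultXml + "';";

// Stripping newlines first, which is what the preg_replace call does.
const stripped = "var xml = '" + resultXml.replace(/\n/g, "") + "';";

// Emitting a JSON string literal, which escapes newlines and quotes.
const encoded = "var xml = " + JSON.stringify(resultXml) + ";";

console.log(parses(naive));    // false
console.log(parses(stripped)); // true
console.log(parses(encoded));  // true
```

Stripping newlines would still break if the XML ever contained a single quote, whereas the encoded form keeps the string intact. On the PHP side, json_encode($xml) would be the analogous call.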


Evan