text data in excel cells

hotfefone
Hi!
I am new to Carrot2 and know only the basics of XML.
I have an Excel file with a structure like "ID, attribute1, text" (20,000-30,000 rows).
I would like to cluster the text in the "text" column around relevant/frequent words (which I don't know in advance).

I tried to save the file as XML and use it in Carrot2, but I was not able to produce any results.

The user manual suggests creating an XSLT file, but I have no idea how to do that.

Before I start studying XML and XSLT, I would like to get some advice from you.

Thank you in advance for your help.

Re: text data in excel cells

Stanislaw Osinski
Administrator
Hi,

I think the easiest approach would be to:

1. Save the file as CSV
2. Write a simple piece of code in your favourite programming language that would convert the CSV into the XML format required by Carrot2:


Then you should be able to cluster the XML file using e.g. Carrot2 Document Clustering Workbench.
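
As a rough illustration of step 2, here is a minimal Python sketch. It assumes the CSV has a header row with the columns "ID", "attribute1" and "text", and it uses the element names that appear in the example later in this thread (searchresult, query, document, title, url, snippet). Placing query inside searchresult, numbering the documents with integers, and storing the original ID in the title are assumptions made for the sketch, not something confirmed by the Carrot2 documentation quoted here. The function name and the attribute1 values in the sample are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_carrot2_xml(csv_text, query=""):
    """Build a <searchresult> XML string from CSV text (columns: ID, attribute1, text)."""
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader):
        # Assumption: document ids are sequential integers starting at 0.
        doc = ET.SubElement(root, "document", id=str(number))
        # Keep the original spreadsheet ID in the title so it stays visible.
        ET.SubElement(doc, "title").text = row["ID"]
        ET.SubElement(doc, "url").text = ""
        ET.SubElement(doc, "snippet").text = row["text"]
    return ET.tostring(root, encoding="unicode")


# Two sample rows; the attribute1 values are placeholders.
sample_csv = (
    "ID,attribute1,text\n"
    "C101A,x,METHOD FOR CASTING STEEL INTO A CONTINUOUS CASTING MOLD\n"
    "C101B,y,PENDULUM SHEAR FOR CONTINUOUS CASTING INSTALLATION\n"
)
xml_text = csv_to_carrot2_xml(sample_csv)
print(xml_text)
```

For a real file, the text of the exported CSV would be read from disk and passed in instead of sample_csv. Building the tree with ElementTree rather than concatenating strings means characters such as & and < in the text column get escaped automatically.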

Cheers,

Staszek


On Tue, Jun 28, 2011 at 12:44, hotfefone <[hidden email]> wrote:
--
View this message in context: http://carrot2-users-and-developers-forum.607571.n2.nabble.com/text-data-in-excel-cells-tp6524369p6524369.html
Sent from the Carrot2 Users and Developers Forum mailing list archive at Nabble.com.

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers



Re: text data in excel cells

hotfefone
Thank you for your suggestion.
I managed to replicate the Carrot2 structure suggested here, with some edits. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<query></query>
<searchresult>
<document id="C101A">
<title>default</title>
<url></url>
<snippet>METHOD FOR CASTING STEEL INTO A CONTINUOUS CASTING MOLD</snippet>
</document>
<document id="C101B">
<title>default</title>
<url></url>
<snippet>PENDULUM SHEAR FOR CONTINUOUS CASTING INSTALLATION</snippet>
</document>
...
</searchresult>

Running Carrot2 on my preliminary sample (250 document IDs) does not return any results.
I do not know whether "query" should be filled in in the XML file (I tried adding a word related to my texts, but again got no results).
My "title" field is always "default", since I put my text field in "snippet".
"url" is always empty.

Any help would be highly appreciated! Thanks!

Re: text data in excel cells

JIRA dawid.weiss@cs.put.poznan.pl
> <?xml version="1.0" encoding="UTF-8"?>
> <query></query>
> <searchresult>

This is not valid XML. Make sure your input XML is valid (for example, by trying
to display it in a browser).
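
The same check can be done from a script. The sketch below uses Python's standard xml.etree.ElementTree parser to test whether a string is well-formed XML. An XML document must have exactly one root element, so a document with query and searchresult as top-level siblings is rejected. The function name and both sample strings are illustrative, and moving query inside searchresult is an assumption about the intended layout.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# Two top-level elements, as in the quoted snippet: not well-formed.
two_roots = "<query></query>\n<searchresult>\n</searchresult>"
# A single root element that contains the query: well-formed.
one_root = "<searchresult>\n<query></query>\n</searchresult>"

print(well_formedness_error(two_roots))
print(well_formedness_error(one_root))
```

The first call prints a parser error and the second prints None.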

Dawid


RE: text data in excel cells

hotfefone
Thank you, Dawid.
The next error is:

Processing error: Could not process query: For input string: "CA101A"
Could not process query: For input string: "CA101A"

I suppose I have to change all the <document id="..."> values to numbers.

In the meantime, many thanks!
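
One way to do that renumbering without losing the original IDs is sketched below in Python. It assumes that the "For input string" message means the id attribute is being parsed as an integer, which the error text suggests but the thread does not confirm. The function name and the sample input are illustrative.

```python
import xml.etree.ElementTree as ET


def renumber_document_ids(xml_text):
    """Replace every <document id="..."> with a sequential integer.

    Returns the rewritten XML and a dict mapping each new id to the original id.
    """
    root = ET.fromstring(xml_text)
    original_ids = {}
    for number, doc in enumerate(root.iter("document")):
        original_ids[str(number)] = doc.get("id")
        doc.set("id", str(number))
    return ET.tostring(root, encoding="unicode"), original_ids


# Small sample with non-numeric ids, modelled on the example in this thread.
sample_xml = (
    "<searchresult>"
    '<document id="C101A"><snippet>first text</snippet></document>'
    '<document id="C101B"><snippet>second text</snippet></document>'
    "</searchresult>"
)
renumbered_xml, id_map = renumber_document_ids(sample_xml)
print(renumbered_xml)
print(id_map)
```

Keeping the mapping makes it possible to trace each clustered document back to its row in the spreadsheet.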


Date: Thu, 30 Jun 2011 08:59:11 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: text data in excel cells

> <?xml version="1.0" encoding="UTF-8"?>
> <query></query>
> <searchresult>

This is not a valid XML. Make sure your input XML is valid (by trying
to display it in a browser, for example).

Dawid
