|
Hi!
I am new to Carrot2 and know only the basics of XML. I have an Excel file structured as "ID, attribute1, text" (20,000-30,000 rows). I would like to cluster the text in the "text" column around relevant/frequent words (I don't know them ex ante). I tried saving the file as XML and using it in Carrot2, but I was not able to produce any result. The user manual suggests creating an XSLT file, but I have no idea how to do that. Before starting to study XML and XSLT, I would like some advice from you. Thank you in advance for your help.
|
Administrator
|
Hi,

I think the easiest approach would be to:

1. Save the file as CSV.
2. Write a simple piece of code in your favourite programming language that would convert the CSV into the XML format required by Carrot2.

Then you should be able to cluster the XML file using e.g. Carrot2 Document Clustering Workbench.
Cheers,
Staszek

On Tue, Jun 28, 2011 at 12:44, hotfefone <[hidden email]> wrote:
> Hi!

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers
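
For step 2 of the reply above, a conversion script could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the CSV has a header row with the columns "ID", "attribute1" and "text" as described in the first post, the element names (searchresult, query, document, title, url, snippet) are the ones that appear elsewhere in this thread, and documents get sequential integer ids because a later post reports an error with non-numeric ids. The function name csv_to_carrot2_xml is made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_carrot2_xml(csv_file, query=""):
    """Build a <searchresult> element from CSV rows (columns: ID, attribute1, text)."""
    root = ET.Element("searchresult")
    # Assumption: <query> sits inside the single root element.
    ET.SubElement(root, "query").text = query
    for number, row in enumerate(csv.DictReader(csv_file)):
        # Sequential integer id; the original ID is kept in <title> for reference.
        doc = ET.SubElement(root, "document", id=str(number))
        ET.SubElement(doc, "title").text = row["ID"]
        ET.SubElement(doc, "url").text = ""
        ET.SubElement(doc, "snippet").text = row["text"]
    return root


if __name__ == "__main__":
    # Two sample rows taken from the texts quoted in this thread.
    sample = io.StringIO(
        "ID,attribute1,text\n"
        "C101A,x,METHOD FOR CASTING STEEL INTO A CONTINUOUS CASTING MOLD\n"
        "C101B,y,PENDULUM SHEAR FOR CONTINUOUS CASTING INSTALLATION\n"
    )
    print(ET.tostring(csv_to_carrot2_xml(sample), encoding="unicode"))
```

For a real export, open the CSV file with open(path, newline="", encoding="utf-8") and write the result with ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True).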
|
Thank you for your suggestion.

I managed to replicate the Carrot2 structure suggested here, with some edits. Here is an example:

    <?xml version="1.0" encoding="UTF-8"?>
    <query></query>
    <searchresult>
      <document id="C101A">
        <title>default</title>
        <url></url>
        <snippet>METHOD FOR CASTING STEEL INTO A CONTINUOUS CASTING MOLD</snippet>
      </document>
      <document id="C101B">
        <title>default</title>
        <url></url>
        <snippet>PENDULUM SHEAR FOR CONTINUOUS CASTING INSTALLATION</snippet>
      </document>
      ...
    </searchresult>

Running Carrot2 on my preliminary sample (250 document IDs) does not return any results.

- I do not know whether "query" should be filled in the XML file (I tried adding a word related to my texts, but again got no results).
- My "title" field is always "default", since I put my text field in "snippet".
- "url" is always empty.

Any help would be highly appreciated! Thanks!
|
> <?xml version="1.0" encoding="UTF-8"?>
> <query></query>
> <searchresult>

This is not valid XML. Make sure your input XML is valid (by trying to display it in a browser, for example).

Dawid
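
Besides opening the file in a browser, the same check can be scripted. The Python sketch below uses the standard library parser to test whether a string is well-formed XML. The BROKEN sample mirrors the quoted fragment, where query and searchresult are both top-level elements. The FIXED sample assumes the intended layout has query nested inside the single searchresult root, which is an assumption and not something stated in this reply. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

# Two top-level elements (<query> and <searchresult>): not well-formed.
BROKEN = """<?xml version="1.0" encoding="UTF-8"?>
<query></query>
<searchresult>
  <document id="1"><title>default</title><url></url><snippet>PENDULUM SHEAR</snippet></document>
</searchresult>"""

# Assumed fix: a single root element with <query> nested inside it.
FIXED = """<?xml version="1.0" encoding="UTF-8"?>
<searchresult>
  <query></query>
  <document id="1"><title>default</title><url></url><snippet>PENDULUM SHEAR</snippet></document>
</searchresult>"""


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, otherwise report the parser error."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError as error:
        print("Not well-formed:", error)
        return False


if __name__ == "__main__":
    print("broken:", is_well_formed(BROKEN))
    print("fixed:", is_well_formed(FIXED))
```

Note that this only checks well-formedness. It does not verify that the file matches what Carrot2 expects.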
|
Thank you Dawid.
The next error is:

Processing error: Could not process query: For input string: "CA101A"

I suppose I have to turn all the <document id="..."> values into numbers. In the meantime, many thanks!

Date: Thu, 30 Jun 2011 08:59:11 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: text data in excel cells

> This is not a valid XML. Make sure your input XML is valid (by trying to display it in a browser, for example).
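
The wording 'For input string: "CA101A"' is what Java prints when it fails to parse a number, so the guess above that ids must be numeric looks plausible, although nothing in this thread confirms it. Under that assumption, the Python sketch below renumbers the document ids in an already well-formed file and returns a lookup table back to the original ids. The function name renumber_document_ids is made up for this example.

```python
import xml.etree.ElementTree as ET


def renumber_document_ids(root):
    """Give every <document> a sequential integer id; return {new_id: original_id}."""
    original_ids = {}
    for number, doc in enumerate(root.iter("document")):
        original_ids[number] = doc.get("id")
        doc.set("id", str(number))
    return original_ids


if __name__ == "__main__":
    # Small sample with <query> nested in the root (assumed layout).
    root = ET.fromstring(
        "<searchresult><query></query>"
        '<document id="C101A"><title>default</title><url></url>'
        "<snippet>METHOD FOR CASTING STEEL INTO A CONTINUOUS CASTING MOLD</snippet></document>"
        '<document id="C101B"><title>default</title><url></url>'
        "<snippet>PENDULUM SHEAR FOR CONTINUOUS CASTING INSTALLATION</snippet></document>"
        "</searchresult>"
    )
    print(renumber_document_ids(root))
    print(ET.tostring(root, encoding="unicode"))
```

Keeping the returned mapping makes it possible to trace clustered documents back to the rows of the original spreadsheet.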
http://carrot2-users-and-developers-forum.607571.n2.nabble.com/text-data-in-excel-cells-tp6524369p6533876.html
|
