New document sources in the WebApp - Need Help

New document sources in the WebApp - Need Help

Nuno A
Hello All.
 
First of all, I would like to thank the Carrot2 authors for their great work.
 
 
I have a few questions.
 
I'm trying to develop new Document Sources using the Carrot2 framework (version 3.x, revision 3282).
 
These Document Sources will gather results in Portuguese from Google, Yahoo and some Portuguese search engines (SEs), such as Sapo.
 
I've already implemented a class called "SapoDocumentSource", which extends the "MultipageSearchEngine" class, queries the Sapo SE using "HttpUtils.doGet", and then parses the HTML results directly using the open-source "HTMLParser" Java library.
 
Next, I developed a new console Java application, based on the example "ClusteringDataFromDocumentSource.java", to view the results and the clusters produced by Lingo, and it's working OK.
 
 
Now I'm trying to integrate the new source into the Carrot2 WebApp.
 
I've already managed to add a new tab and icon for the new source in the WebApp, but when I press the Search button, it just says "Loading" in both the cluster and result page frames, and the "Loading" message doesn't disappear.
 
 
The Tomcat logs (I'm using version 1.6) say this:
 
catalina.out :
 
[Fatal Error] :195:27: XML document structures must start and end within the same entity.
[Fatal Error] :195:27: XML document structures must start and end within the same entity.
 
c2-carrot2-webapp-30-full.log :
 
2009-03-01 16:34:54,308,[ERROR],[http-8080-2],org.carrot2.util.xsltfilter.XSLTFilterServletResponse,XSLT filter error: An unhandled exception occurred.
javax.servlet.ServletException: java.lang.reflect.InvocationTargetException
 at org.carrot2.webapp.QueryProcessorServlet.doGet(QueryProcessorServlet.java:200)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:120)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.simpleframework.xml.load.Conduit.invoke(Conduit.java:222)
 at org.simpleframework.xml.load.Conduit.persist(Conduit.java:178)
 at org.simpleframework.xml.load.Schema.persist(Schema.java:190)
 at org.simpleframework.xml.load.Composite.write(Composite.java:643)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:206)
 at org.simpleframework.xml.load.CompositeInlineList.write(CompositeInlineList.java:257)
 at org.simpleframework.xml.load.CompositeInlineList.write(CompositeInlineList.java:235)
 at org.simpleframework.xml.load.Composite.writeElement(Composite.java:833)
 at org.simpleframework.xml.load.Composite.writeElements(Composite.java:728)
 at org.simpleframework.xml.load.Composite.write(Composite.java:666)
 at org.simpleframework.xml.load.Composite.write(Composite.java:644)
 at org.simpleframework.xml.load.Composite.writeElement(Composite.java:833)
 at org.simpleframework.xml.load.Composite.writeElements(Composite.java:728)
 at org.simpleframework.xml.load.Composite.write(Composite.java:666)
 at org.simpleframework.xml.load.Composite.write(Composite.java:644)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:206)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:183)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:161)
 at org.simpleframework.xml.load.Persister.write(Persister.java:769)
 at org.simpleframework.xml.load.Persister.write(Persister.java:751)
 at org.simpleframework.xml.load.Persister.write(Persister.java:732)
 at org.simpleframework.xml.load.Persister.write(Persister.java:848)
 at org.carrot2.webapp.QueryProcessorServlet.handleSearchRequest(QueryProcessorServlet.java:334)
 at org.carrot2.webapp.QueryProcessorServlet.doGet(QueryProcessorServlet.java:195)
 ... 20 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
 at org.carrot2.core.Document.beforeSerialization(Document.java:258)
 ... 48 more
2009-03-01 16:34:54,325,[ERROR],[http-8080-2],org.carrot2.util.xsltfilter.XSLTFilterServletResponse,XSLT filter error: Error applying stylesheet.
javax.xml.transform.TransformerException: Input parsing exception.
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:423)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.finishResponse(XSLTFilterServletResponse.java:294)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:134)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
 at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:402)
 ... 17 more
---------
org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
 at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:402)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.finishResponse(XSLTFilterServletResponse.java:294)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:134)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
2009-03-01 16:34:55,041,[INFO],[http-8080-1],queryLog,lingo,sapo,100,3404,cluster
2009-03-01 16:34:55,046,[ERROR],[http-8080-1],org.carrot2.util.xsltfilter.XSLTFilterServletResponse,XSLT filter error: An unhandled exception occurred.
javax.servlet.ServletException: java.lang.reflect.InvocationTargetException
 at org.carrot2.webapp.QueryProcessorServlet.doGet(QueryProcessorServlet.java:200)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:120)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.simpleframework.xml.load.Conduit.invoke(Conduit.java:222)
 at org.simpleframework.xml.load.Conduit.persist(Conduit.java:178)
 at org.simpleframework.xml.load.Schema.persist(Schema.java:190)
 at org.simpleframework.xml.load.Composite.write(Composite.java:643)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:206)
 at org.simpleframework.xml.load.CompositeInlineList.write(CompositeInlineList.java:257)
 at org.simpleframework.xml.load.CompositeInlineList.write(CompositeInlineList.java:235)
 at org.simpleframework.xml.load.Composite.writeElement(Composite.java:833)
 at org.simpleframework.xml.load.Composite.writeElements(Composite.java:728)
 at org.simpleframework.xml.load.Composite.write(Composite.java:666)
 at org.simpleframework.xml.load.Composite.write(Composite.java:644)
 at org.simpleframework.xml.load.Composite.writeElement(Composite.java:833)
 at org.simpleframework.xml.load.Composite.writeElements(Composite.java:728)
 at org.simpleframework.xml.load.Composite.write(Composite.java:666)
 at org.simpleframework.xml.load.Composite.write(Composite.java:644)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:206)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:183)
 at org.simpleframework.xml.load.Traverser.write(Traverser.java:161)
 at org.simpleframework.xml.load.Persister.write(Persister.java:769)
 at org.simpleframework.xml.load.Persister.write(Persister.java:751)
 at org.simpleframework.xml.load.Persister.write(Persister.java:732)
 at org.simpleframework.xml.load.Persister.write(Persister.java:848)
 at org.carrot2.webapp.QueryProcessorServlet.handleSearchRequest(QueryProcessorServlet.java:334)
 at org.carrot2.webapp.QueryProcessorServlet.doGet(QueryProcessorServlet.java:195)
 ... 20 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
 at org.carrot2.core.Document.beforeSerialization(Document.java:258)
 ... 48 more
2009-03-01 16:34:55,054,[ERROR],[http-8080-1],org.carrot2.util.xsltfilter.XSLTFilterServletResponse,XSLT filter error: Error applying stylesheet.
javax.xml.transform.TransformerException: Input parsing exception.
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:423)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.finishResponse(XSLTFilterServletResponse.java:294)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:134)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
 at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:402)
 ... 17 more
---------
org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
 at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.processWithXslt(XSLTFilterServletResponse.java:402)
 at org.carrot2.util.xsltfilter.XSLTFilterServletResponse.finishResponse(XSLTFilterServletResponse.java:294)
 at org.carrot2.util.xsltfilter.XSLTFilter.doFilter(XSLTFilter.java:134)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at com.planetj.servlet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:222)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)
 
 
When I run a query that returns no results, the WebApp works correctly and shows "no clusters found" and "Your query returned no documents. Please try a more general query." in the page frames.
 
 
What am I doing wrong? Am I forgetting something?
 
 
Thanks for your help.
 
 
Regards,
 
Nuno A.
 

------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: New document sources in the WebApp - Need Help

Stanislaw Osinski
Administrator
Hello,

> I've already managed to add a new tab and icon for the new source in the WebApp, but when I press the Search button, it just says "Loading" in both the cluster and result page frames, and the "Loading" message doesn't disappear.
>
> The Tomcat logs (I'm using version 1.6) say this:
>
> c2-carrot2-webapp-30-full.log :

Looking at the logs I've found this:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
 at org.carrot2.core.Document.beforeSerialization(Document.java:258)

Looking at the code (Document.java):

    private void beforeSerialization()
    {
        synchronized (fields)
        {
            title = (String) fields.get(TITLE);
            snippet = (String) fields.get(SUMMARY);
            url = (String) fields.get(CONTENT_URL);
            sources = (List<String>) fields.get(SOURCES); // <----------------- line 258

            // Wrapper iterates over the whole map, so we need to synchronize
            // to avoid concurrent modification exceptions in setters
            otherFieldsForSerialization = MapUtils.asHashMap(SimpleXmlWrappers
                .wrap(fields));
        }
        otherFieldsForSerialization.remove(TITLE);
        otherFieldsForSerialization.remove(SUMMARY);
        otherFieldsForSerialization.remove(CONTENT_URL);
        otherFieldsForSerialization.remove(SOURCES);
    }

The only cast to List in this method is the one for the sources field, so for some reason some document's SOURCES field ends up being a String instead of a List, which is causing the exception. Can you try using a debugger or some logging to find out what your document source puts into that field?

Cheers,

Staszek
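
A minimal sketch of the kind of check suggested above, using a plain map instead of the real Carrot2 classes. The class name FieldTypeCheck, the key strings, and both helper methods are hypothetical and only illustrate the idea: print the runtime type of every field, and reproduce the cast that fails during serialization.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldTypeCheck {
    // Hypothetical stand-ins for the Document field keys (not the real Carrot2 constants).
    static final String TITLE = "title";
    static final String SOURCES = "sources";

    /** Lists each field with the simple name of its runtime type. */
    public static String describe(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            Object value = e.getValue();
            sb.append(e.getKey()).append('=')
              .append(value == null ? "null" : value.getClass().getSimpleName());
        }
        return sb.toString();
    }

    /** Performs the same kind of cast as the failing line; true if it throws. */
    @SuppressWarnings("unchecked")
    public static boolean sourcesCastFails(Map<String, Object> fields) {
        try {
            List<String> sources = (List<String>) fields.get(SOURCES);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put(TITLE, "Some title");
        fields.put(SOURCES, "Sapo"); // a bare String where a List is expected
        System.out.println(describe(fields));
        System.out.println("cast fails: " + sourcesCastFails(fields));
    }
}
```

Logging the output of a helper like describe for each document, just before it is handed to the WebApp, should show which field carries an unexpected type.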


Re: New document sources in the WebApp - Need Help

Nuno A
Hello Stanislaw,

You were right, problem solved! I was inserting a String instead of a List of Strings for the Document.SOURCES field.

The WebApp is working fine now. Thanks.

Regards,

Nuno A.
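
For readers hitting the same exception, the fix described above can be sketched as follows. This is an illustration only: it uses a plain map in place of the Carrot2 Document, and the class SourcesFieldFix, the key string, and the putSource helper are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SourcesFieldFix {
    // Hypothetical stand-in for the key the thread refers to as Document.SOURCES.
    static final String SOURCES = "sources";

    /** Stores the source name wrapped in a list, never as a bare String. */
    public static void putSource(Map<String, Object> fields, String sourceName) {
        List<String> sources = new ArrayList<>();
        sources.add(sourceName);
        fields.put(SOURCES, sources);
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        putSource(fields, "Sapo");

        // The cast that failed in the stack trace now succeeds.
        @SuppressWarnings("unchecked")
        List<String> sources = (List<String>) fields.get(SOURCES);
        System.out.println(sources); // prints [Sapo]
    }
}
```

Routing every write to the field through one helper keeps the stored type consistent, so the serialization step always finds the List it expects.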