White spaces are required between publicId and systemId


Bogdan94202
I am facing this issue when trying to use Carrot2 with Google Desktop.
What could be the reason, and how can I fix it?
Thanks in advance,
Bogdan

eclipse.buildId=unknown
java.version=1.6.0_10-rc
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=win32, ARCH=x86, WS=win32, NL=en_US
Command-line arguments:  -os win32 -ws win32 -arch x86 -data workspace


Error
Sun Dec 13 12:46:29 EET 2009
Processing error: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: White spaces are required between publicId and systemId.

org.carrot2.core.ProcessingException: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: White spaces are required between publicId and systemId.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.carrot2.util.ExceptionUtils.wrapAs(Unknown Source)
at org.carrot2.source.SimpleSearchEngine.process(Unknown Source)
at org.carrot2.core.ControllerUtils.performProcessing(Unknown Source)
at org.carrot2.core.ControllerUtils.performProcessing(Unknown Source)
at org.carrot2.core.CachingController$CachedDataFactory.createEntry(Unknown Source)
at net.sf.ehcache.constructs.blocking.SelfPopulatingCache.get(SelfPopulatingCache.java:71)
at org.carrot2.core.CachingController$CachedProcessingComponent.process(Unknown Source)
at org.carrot2.core.ControllerUtils.performProcessing(Unknown Source)
at org.carrot2.core.ControllerUtils.performProcessing(Unknown Source)
at org.carrot2.core.CachingController.processInternal(Unknown Source)
at org.carrot2.core.CachingController.process(Unknown Source)
at org.carrot2.workbench.core.ui.SearchJob.run(Unknown Source)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Caused by: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: White spaces are required between publicId and systemId.
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at org.carrot2.source.xml.XmlDocumentSourceHelper.getCarrot2XmlStream(Unknown Source)
at org.carrot2.source.xml.XmlDocumentSourceHelper.loadProcessingResult(Unknown Source)
at org.carrot2.source.xml.XmlDocumentSourceHelper.loadProcessingResult(Unknown Source)
at org.carrot2.source.xml.RemoteXmlSimpleSearchEngineBase.fetchSearchResponse(Unknown Source)
at org.carrot2.source.google.GoogleDesktopDocumentSource.fetchSearchResponse(Unknown Source)
... 12 more
Caused by: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: White spaces are required between publicId and systemId.
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(Unknown Source)
... 19 more
Caused by: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: White spaces are required between publicId and systemId.
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
... 20 more

Re: White spaces are required between publicId and systemId

Dawid Weiss-2
This looks like an XML parsing error. Can you check whether the XML returned
by Google Desktop validates? If it does not, file a bug with them.
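
One way to run such a check, as a minimal sketch: save the response and feed it to the JDK's own JAXP parser. The names `WellFormedCheck` and `firstError` are made up for illustration and are not part of Carrot2. Strictly speaking, this tests well-formedness, not validity against a schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Carrot2.
class WellFormedCheck {
    /** Returns null if the text parses, otherwise "line:column: message". */
    static String firstError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstError("<results><result/></results>")); // well-formed input
        System.out.println(firstError("<results><result></results>"));  // mismatched tags
    }
}
```

The reported line and column should point at the first place where the parser gives up, which is usually enough to spot the offending construct in the saved response.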

Dawid

------------------------------------------------------------------------------
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: White spaces are required between publicId and systemId

Bogdan94202
How can I get that XML? Any idea?

Re: White spaces are required between publicId and systemId

Dawid Weiss-2
The Carrot2 source code constructs URLs to the Google Desktop API, so
setting a breakpoint there is one possibility. Another is to read the
Google Desktop API documentation.
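
If a debugger is not at hand, another way to see what the parser receives is to fetch the same URL outside Carrot2 and print the raw bytes before any XML parsing. The sketch below is only an illustration: the URL passed on the command line is whatever Carrot2 constructed (copied from a breakpoint or a log), and `RawResponseDump` and `head` are hypothetical names. Decoding as UTF-8 is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper, not part of Carrot2.
class RawResponseDump {
    /** Reads at most maxBytes from the stream and decodes them as UTF-8 (an assumption). */
    static String head(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        // Stop at end of stream or once maxBytes have been collected.
        while (out.size() < maxBytes
                && (n = in.read(buf, 0, Math.min(buf.length, maxBytes - out.size()))) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the search URL that Carrot2 built; no particular URL is assumed here.
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            System.out.println(head(in, 2048));
        }
    }
}
```

Looking at the first couple of kilobytes is typically enough to tell whether the response begins with the expected root element or with something else, such as a DOCTYPE declaration.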

D.


Re: White spaces are required between publicId and systemId

Bogdan94202
I downloaded the Carrot2 source code, and when I launch the Workbench in debug mode the Search view says: "No active sources". What could be wrong?

Re: White spaces are required between publicId and systemId

Bogdan94202
And the log says:

java.lang.RuntimeException: Could not load attribute metadata from: org.carrot2.source.SimpleSearchEngine.xml
at org.carrot2.util.attribute.BindableDescriptorBuilder.getBindableMetadata(BindableDescriptorBuilder.java:203)
at org.carrot2.util.attribute.BindableDescriptorBuilder.buildMetadataForBindableHierarchy(BindableDescriptorBuilder.java:163)
at org.carrot2.util.attribute.BindableDescriptorBuilder.buildDescriptor(BindableDescriptorBuilder.java:78)
at org.carrot2.util.attribute.BindableDescriptorBuilder.buildDescriptor(BindableDescriptorBuilder.java:46)
at org.carrot2.core.ProcessingComponentDescriptor.getBindableDescriptor(ProcessingComponentDescriptor.java:268)
at org.carrot2.core.ProcessingComponentDescriptor.getBindableDescriptor(ProcessingComponentDescriptor.java:249)
at org.carrot2.workbench.core.WorkbenchCorePlugin.scanSuites(WorkbenchCorePlugin.java:324)
at org.carrot2.workbench.core.WorkbenchCorePlugin.start(WorkbenchCorePlugin.java:97)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl$1.run(BundleContextImpl.java:783)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.startActivator(BundleContextImpl.java:774)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.start(BundleContextImpl.java:755)
at org.eclipse.osgi.framework.internal.core.BundleHost.startWorker(BundleHost.java:352)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.start(AbstractBundle.java:280)
at org.eclipse.osgi.framework.util.SecureAction.start(SecureAction.java:408)
at org.eclipse.core.runtime.internal.adaptor.EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:111)
at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass(ClasspathManager.java:449)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.findLocalClass(DefaultClassLoader.java:211)
at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:381)
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:457)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:398)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:105)
at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
at org.eclipse.osgi.internal.loader.BundleLoader.loadClass(BundleLoader.java:326)
at org.eclipse.osgi.framework.internal.core.BundleHost.loadClass(BundleHost.java:231)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadClass(AbstractBundle.java:1193)
at org.eclipse.core.internal.registry.osgi.RegistryStrategyOSGI.createExecutableExtension(RegistryStrategyOSGI.java:160)
at org.eclipse.core.internal.registry.ExtensionRegistry.createExecutableExtension(ExtensionRegistry.java:874)
at org.eclipse.core.internal.registry.ConfigurationElement.createExecutableExtension(ConfigurationElement.java:243)
at org.eclipse.core.internal.registry.ConfigurationElementHandle.createExecutableExtension(ConfigurationElementHandle.java:51)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:189)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
at org.eclipse.equinox.launcher.Main.run(Main.java:1311)
at org.eclipse.equinox.launcher.Main.main(Main.java:1287)

Re: White spaces are required between publicId and systemId

Bogdan94202
Just found the reason: for many plugins (algorithm, source, core, etc.) XML files were missing from SVN, so I had to copy them over from the Workbench binary distribution.

Re: White spaces are required between publicId and systemId

Bogdan94202
Now moving on to the original issue, the XML parsing error.
Google Desktop returns content which starts with:
<!--
Content-type: fix-mhtml

 -->

<results count="3703">
<result>
<category>file</category>
<doc_id>394459</doc_id>
...

which appears to be invalid according to the Apache parser.
Any idea how to fix it? (There is no way to change it on the Google Desktop side.)
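
For what it is worth, a root element, with or without a leading comment, is well-formed XML, so a fragment like the one above should not trigger this message by itself. In the JDK's bundled parser the message is raised while scanning a `<!DOCTYPE ... PUBLIC ...>` declaration whose public and system identifiers are not separated by whitespace. One possibility, which this thread never confirms, is that the response seen by Carrot2 differed from the one inspected by hand. The sketch below contrasts the two cases; both input strings are invented for illustration and are not actual Google Desktop output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only; the inputs are invented, not real Google Desktop output.
class DoctypeSpacingDemo {
    /** Returns the parser's error message, or null if the input is well-formed. */
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // keep stderr quiet
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String commentThenRoot =
            "<!-- some comment -->\n<results count=\"1\"><result/></results>";
        // No whitespace between the public identifier and the system identifier.
        String badDoctype =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\""
            + "\"http://www.w3.org/TR/html4/strict.dtd\"><html></html>";

        System.out.println("comment + root: " + parseError(commentThenRoot));
        System.out.println("bad DOCTYPE:    " + parseError(badDoctype));
    }
}
```

The second input fails during the scan of the DOCTYPE declaration itself, before any DTD is fetched, so the demo does not need network access.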

Re: White spaces are required between publicId and systemId

Dawid Weiss-2
In reply to this post by Bogdan94202
They were not missing, they are generated -- in the source distribution run:

ant jar

for example, and they will be generated for you.

Dawid


Re: White spaces are required between publicId and systemId

Dawid Weiss-2
In reply to this post by Bogdan94202
Can you send the XML result verbatim, please? This looks like valid XML to me.

Dawid

On Sun, Dec 13, 2009 at 7:42 PM, Bogdan94202 <[hidden email]> wrote:

>
> Now moving on to the original issue, the XML parsing error.
> Google Desktop returns content which starts with:
> <!--
> Content-type: fix-mhtml
>
>  -->
>
> <results count="3703">
> <result>
> <category>file</category>
> <doc_id>394459</doc_id>
> ...
>
> which appears to be invalid according to the Apache parser.
> Any idea how to fix it? (There is no way to change it on the Google Desktop side.)

Re: White spaces are required between publicId and systemId

Stanislaw Osinski
Administrator
In reply to this post by Dawid Weiss-2

> They were not missing, they are generated -- in the source distribution run:
>
> ant jar
>
> for example, and they will be generated for you.

Also, the manual contains a detailed description of how to run Workbench in Eclipse:

http://download.carrot2.org/head/manual/#section.advanced-topics.running-in-eclipse.workbench

Cheers,

Staszek



Re: White spaces are required between publicId and systemId

Bogdan94202
In reply to this post by Dawid Weiss-2

> Can you send the XML result verbatim, please? This looks like a valid XML to me.

Unfortunately, I cannot send the complete XML, as it would expose some confidential info.
Anyway, I gave up on Google Desktop and went with Solr.

It seems to work with Solr now, but I observe that the clustering is based on the titles and not on the content. Is this expected? How can I get the clustering to use the content as well?
(I used Solr with Solr Cell to index some MS PowerPoint files; their content goes into the "attr_content" field of the Solr/Lucene index.)

Re: White spaces are required between publicId and systemId

Bogdan94202
In reply to this post by Stanislaw Osinski
> Also, the manual contains a detailed description of how to run Workbench in Eclipse:

Thanks Staszek!
It seems I proved the principle "4 hours debugging 'saves' 2 min reading the manual" ;)

Re: White spaces are required between publicId and systemId

Dawid Weiss-2
In reply to this post by Bogdan94202
> Unfortunately, I cannot send the complete XML as it would expose some
> confidential info.

You can send it to me off-list. If it did not work, it would be
beneficial for the project to understand the source of the problem. I
can assure you we won't distribute the files any further.

Dawid
