Token exhilar exceeds length of provided text sized 3801

OldSkoolMark
Having some trouble clustering my data ... These symptoms are similar to some problems that were fixed last year. Possible regression? Suggestions on how to proceed? Thanks in advance!

https://issues.apache.org/jira/browse/SOLR-1883
https://issues.apache.org/jira/browse/SOLR-1404

Nov 7, 2011 8:15:35 AM org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine cluster
SEVERE: Carrot2 clustering failed
org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token exhilar exceeds length of provided text sized 3801
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:475)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:379)
        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:303)
        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:124)
        at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token exhilar exceeds length of provided text sized 3801
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:468)
        ... 27 more
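
For readers unfamiliar with this exception: as far as I understand the message, the highlighter was handed a token ("exhilar", presumably a stemmed form) whose start or end offset lies beyond the end of the 3801-character field text it was asked to highlight. Below is a minimal Python sketch of that kind of bounds check. It is an illustration only, not Lucene's code; the function name, the sample text and the shifted offsets are all hypothetical.

```python
from typing import List, Optional, Tuple

# (term, start_offset, end_offset), the same information a token stream carries
Token = Tuple[str, int, int]


def first_invalid_token(text: str, tokens: List[Token]) -> Optional[Token]:
    """Return the first token whose offsets point past the end of text, else None."""
    for term, start, end in tokens:
        if start > len(text) or end > len(text):
            return (term, start, end)
    return None


# Hypothetical stored field value (20 characters long).
stored = "An exhilarating ride"

# Offsets computed against the stored text itself: all within bounds.
ok_tokens = [("an", 0, 2), ("exhilar", 3, 15), ("ride", 16, 20)]

# Offsets computed against a slightly longer variant of the text
# (assumption for illustration: three extra leading characters).
bad_tokens = [("an", 3, 5), ("exhilar", 6, 18), ("ride", 19, 23)]
```

With these inputs, `first_invalid_token(stored, ok_tokens)` finds nothing, while `first_invalid_token(stored, bad_tokens)` reports the token ending at offset 23 in a 20-character text. The point is only that the offsets and the text being highlighted must describe the same string.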

Here is the relevant portion of my solrconfig.

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="${solr.clustering.enabled:false}"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">default</str>
      <bool name="clustering.results">true</bool>
     
      <str name="carrot.title">title</str>
      <str name="carrot.url">url</str>
      <str name="carrot.snippet">description</str>
      <bool name="carrot.produceSummary">true</bool>
      <bool name="carrot.outputSubClusters">false</bool>

      <str name="defType">edismax</str>
      <str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      </str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>     
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>

Also, here is my data-config.xml, as my data is in an sqlite3 DB.

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.sqlite.JDBC"
              name="user-ds"
              url="jdbc:sqlite://home/wowmobile/public_fm/public/sisa/public/fmedia/db/smarthomegroupservices.db"
              user=""
              password=""/>
 <document>
    <entity name="useractivity" dataSource="user-ds" transformer="HTMLStripTransformer"
            query="SELECT id,user_id,content_type_id,content_id FROM user_activity WHERE content_type_id=6">
      <field column="user_id" name="user_id" />
      <entity name="webcontent"
              query="SELECT description,title,url FROM wa_feed_items WHERE id IS NOT NULL AND id=${useractivity.content_id}">
        <field column="description" name="description" />
        <field column="title" name="title" />
        <field column="url" name="url" />
      </entity>
    </entity>
  </document>
</dataConfig>

schema.xml has the standard description and title fields.
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="description" type="text_general" indexed="true" stored="true"/>

Re: Token exhilar exceeds length of provided text sized 3801

Dawid Weiss-2
Which Solr version are you using? Can you provide a repeatable example
(index + query, for instance)?

Dawid

On Mon, Nov 7, 2011 at 5:36 PM, OldSkoolMark <[hidden email]> wrote:

> [...]


Re: Token exhilar exceeds length of provided text sized 3801

OldSkoolMark
Sorry about the duplicate messages. The forum software threw an exception and claimed to reject them.

Anyway, I'm using Solr 3.4.0. I've attached my index (index.tar.gz). The query I use is:

http://localhost:8983/solr/useractivity/clustering
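
For anyone who wants to repeat the request from a script rather than a browser, here is a rough Python sketch. It makes some assumptions that are not stated in this thread: that the handler accepts the usual `q`, `rows` and `wt=json` parameters, and that the JSON response carries a top-level `clusters` list whose entries have a `labels` field. The helper names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Handler URL as given in this post.
BASE_URL = "http://localhost:8983/solr/useractivity/clustering"


def clustering_url(base_url, query="*:*", rows=10):
    """Build the request URL; q, rows and wt=json are assumed parameters."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


def cluster_labels(response):
    """Collect the label list of each cluster (assumed response layout)."""
    return [cluster.get("labels", []) for cluster in response.get("clusters", [])]


def fetch_cluster_labels(base_url=BASE_URL):
    """Send the request and return the cluster labels; needs a running Solr."""
    with urlopen(clustering_url(base_url)) as reply:
        return cluster_labels(json.load(reply))
```

Calling `fetch_cluster_labels()` against the index above should reproduce the failure on the server side if the offending documents fall within the first 10 rows, since the handler defaults in the config already enable clustering.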

Thanks so much for your help!

Mark

Re: Token exhilar exceeds length of provided text sized 3801

OldSkoolMark
One other bit of hopefully useful data. Here is one of the offending documents, as reported by Luke:

exhilar.txt

Thanks again.

Mark