Carrot2 Webapp Performance

milos
Dear colleagues,

I intend to use a slightly modified Carrot2 webapp in my commercial
software, so I am interested in its performance. Do you have
any numbers, graphs or white papers about Carrot2 webapp performance? Are
there any performance tips concerning Carrot2 on Tomcat and Tomcat itself?

I'll use only the Lingo clustering algorithm and will have only one Lucene
data source. Is it possible to change the code to reflect that
fact, and what do I have to do (if that would speed up the webapp)? Which
classes do I have to check in that case?

Is it better to have an Apache+Tomcat combination or just Tomcat stand-alone,
keeping in mind that my webapp only receives queries and displays
clustered results from a Lucene index? I suppose that almost all traffic is
non-static pages; is that true?

There are a lot of questions, but I hope you can help me (as you always have),
at least in an iterative conversation.

Sincerely, Milos


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Carrot2 Webapp Performance

Dawid Weiss-2

> software, so I am interested in its performance. Do you have
> any numbers, graphs or white papers about Carrot2 webapp performance? Are
> there any performance tips concerning Carrot2 on Tomcat and Tomcat itself?

We don't have these, unfortunately. If you'd like to contribute them back
(analysis with JMeter, Grinder or something like this), it would be great. We
consider the Web application merely a demo of the technology and we are not
fine-tuning it for performance (although it should do fine). The most expensive
bits will certainly be clustering and XSLT processing -- depending on your load,
you may need to distribute the workload across a number of machines anyway.

> I'll use only the Lingo clustering algorithm and will have only one Lucene
> data source. Is it possible to change the code to reflect that
> fact, and what do I have to do (if that would speed up the webapp)? Which
> classes do I have to check in that case?

You can remove any input/algorithm descriptors that are not relevant to you. If
you want further customizations, you'll have to delve into the Web application
code and modify it by hand.

> Is it better to have an Apache+Tomcat combination or just Tomcat stand-alone,
> keeping in mind that my webapp only receives queries and displays
> clustered results from a Lucene index? I suppose that almost all traffic is
> non-static pages; is that true?

Not everything is dynamic (CSS, images, etc.), but there is certainly some
overhead for serving static content. If I may suggest something, I would go with
the 3.0 line (it is currently on the branch; we will release it as soon as
we resolve a few issues that are still open). The 3.0 release introduces a
wholly different processing pipeline, but for you the most important thing is
that it is optimized for Web performance (CSS sprites, many optimizations in the
JS code). Compare its loading speed with the 2.x line:

http://demo.carrot2.org/demo-3.0/

The docs are scarce at the moment, so you'll have to work with the source code.

D.


Re: Carrot2 Webapp Performance

milos
Hello,
thank you for your answer.

I did the following:

1) removed all files from the algorithms folder except the Lingo-related ones
2) removed all sources from inputs except input-lucene.bsh
3) removed all unnecessary tab-related templates from common.xsl, page.xsl
and customize.xsl

It seems to be more responsive now!

But you are right - the most critical part is XSLT processing.
Since I only have to query the Lucene index and cluster the results, without
any need for tabs, facets and other stuff, the question is: how can I remove
XSLT from the webapp altogether and use QueryProcessorServlet and maybe some
other classes to just write plain HTML? Is it possible to do that while
retaining only the parts related to clustering and caching search results?
Which classes would I have to change, and which new ones would I have to write?

In fact, the main question is: do you have a class diagram or some other
sketch of the webapp's architecture to start from in order to
remove XSLT processing? (I am using version 2.x.)

Best regards, Milos


Re: Carrot2 Webapp Performance

Stanislaw Osinski
> 1) removed all files from the algorithms folder except the Lingo-related ones
> 2) removed all sources from inputs except input-lucene.bsh
> 3) removed all unnecessary tab-related templates from common.xsl, page.xsl
> and customize.xsl
>
> It seems to be more responsive now!
>
> But you are right - the most critical part is XSLT processing.

Out of curiosity -- what's your testing method? Have you measured how Lingo clustering time relates to the webapp overhead (XSLT etc)? If I were to guess I'd say that Lingo processing takes at least an order of magnitude longer.
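To answer this with numbers rather than impressions, one option is to time each phase in isolation. Below is a minimal Java sketch, assuming you can call the clustering step and the rendering (XSLT) step separately from a small test harness; PhaseTimer and medianMillis are hypothetical names, not part of Carrot2.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs a task repeatedly and returns the median
 * wall-clock time in milliseconds.
 */
final class PhaseTimer {
    // Written on every run so the JIT cannot discard the task's result.
    private static volatile Object sink;

    private PhaseTimer() {}

    static double medianMillis(Supplier<?> task, int warmupRuns, int measuredRuns) {
        if (warmupRuns < 0 || measuredRuns <= 0) {
            throw new IllegalArgumentException("need warmupRuns >= 0 and measuredRuns > 0");
        }
        // Warm-up iterations give the JIT a chance to compile hot paths.
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }
        double[] samples = new double[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            sink = task.get();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(samples);
        int mid = measuredRuns / 2;
        // Odd count: middle sample; even count: mean of the two middle samples.
        return measuredRuns % 2 == 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0;
    }
}
```

For example, medianMillis(() -> cluster(docs), 5, 21) and medianMillis(() -> render(clusters), 5, 21), where cluster and render stand in for your own calls, give two medians you can compare directly. The median after a warm-up is used because single JVM measurements are easily skewed by JIT compilation and GC pauses.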

 
> Since I only have to query the Lucene index and cluster the results, without
> any need for tabs, facets and other stuff, the question is: how can I remove
> XSLT from the webapp altogether and use QueryProcessorServlet and maybe some
> other classes to just write plain HTML?

Maybe in this case it would be easier to write the application from scratch using the Carrot2 API? Then you'd have full control over markup generation. However, the markup for the cluster tree is rather complicated, so if you decide to generate it directly from Java, your code may end up messy.
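If you go the from-scratch route, the core of generating a cluster tree directly from Java is a recursive walk that emits nested lists. The sketch below uses a hypothetical, simplified ClusterNode class (not the Carrot2 cluster API) and plain ul/li markup with HTML escaping; the webapp's real cluster-tree markup presumably carries considerably more structure than this.

```java
import java.util.List;

/** Hypothetical, simplified cluster model (not the Carrot2 API). */
final class ClusterNode {
    final String label;
    final int documentCount;
    final List<ClusterNode> subclusters;

    ClusterNode(String label, int documentCount, List<ClusterNode> subclusters) {
        this.label = label;
        this.documentCount = documentCount;
        this.subclusters = List.copyOf(subclusters);
    }
}

/** Renders a cluster tree as nested <ul> lists, escaping labels for HTML. */
final class ClusterTreeHtml {
    private ClusterTreeHtml() {}

    static String render(List<ClusterNode> clusters) {
        StringBuilder out = new StringBuilder();
        appendList(clusters, out);
        return out.toString();
    }

    private static void appendList(List<ClusterNode> clusters, StringBuilder out) {
        if (clusters.isEmpty()) {
            return; // leaf: no nested list
        }
        out.append("<ul>");
        for (ClusterNode c : clusters) {
            out.append("<li>")
               .append(escape(c.label))
               .append(" (").append(c.documentCount).append(')');
            appendList(c.subclusters, out); // recurse into subclusters
            out.append("</li>");
        }
        out.append("</ul>");
    }

    /** Escapes the characters that are significant in HTML text and attributes. */
    static String escape(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '&': b.append("&amp;"); break;
                case '"': b.append("&quot;"); break;
                case '\'': b.append("&#39;"); break;
                default: b.append(ch);
            }
        }
        return b.toString();
    }
}
```

Escaping cluster labels matters here because they are built from document text and may contain characters such as < or &.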
 
> Is it possible to do that while retaining only the parts related to
> clustering and caching search results? Which classes would I have to
> change, and which new ones would I have to write?

The 2.x line webapp wasn't very cleanly coded; that's why we improved it in the 3.x line Dawid mentioned earlier. The latter is also built on XSLT, though.

Before you delve into removing XSLT, I'd double-check whether it really slows things down that much (e.g. check whether template caching is enabled in web.xml, etc.).

Cheers,

Staszek


Re: Carrot2 Webapp Performance

milos
Hello,

> Out of curiosity -- what's your testing method? Have you measured how
> Lingo clustering time relates to the webapp overhead (XSLT etc)? If I
> were to guess I'd say that Lingo processing takes at least an order of
> magnitude longer.

I didn't test it scientifically at all! It is just my subjective
impression :)

> Before you delve into removing XSLT, I'd double-check whether it really
> slows things down that much (e.g. check whether template caching is
> enabled in web.xml, etc.).

OK then. Since you provide a nice clustering tree, I'll stick with your
webapp.
I hope to test the performance before commercial use and send you
a report on it.
I have just 3 more performance questions:

1) Is clustering implemented faster in version 3?
2) As I understand it, enabling template caching will speed up XSLT
processing; is that true?
3) If I reduce the number of pages to be clustered from 100 to 50, that
will speed up the response, but how does that affect the quality/number of
clusters?

Regards, Milos



Re: Carrot2 Webapp Performance

Stanislaw Osinski
> 1) Is clustering implemented faster in version 3?

The default settings are probably slightly slower (200 ms per 100 docs on our build server, 100 ms per 100 docs if you use native matrix libraries -- we have binaries for certain platforms and compilation instructions for others). However, the new version should produce better results (e.g. better document-to-cluster assignment) and has a lot of tuning parameters you can change to trade quality for speed.
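The per-request timings above also allow a rough capacity estimate, which ties in with Dawid's earlier remark about distributing the workload. A minimal sketch, assuming clustering is CPU-bound, dominates request time and scales linearly across cores; CapacityEstimate is a hypothetical helper, the target load is a made-up input, and real figures should come from a load test.

```java
/**
 * Back-of-envelope capacity estimate. Assumes clustering is CPU-bound,
 * dominates request time, and scales linearly across cores.
 */
final class CapacityEstimate {
    private CapacityEstimate() {}

    /** Clustering requests per second one core can handle at the given latency. */
    static double requestsPerSecondPerCore(double millisPerRequest) {
        if (millisPerRequest <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return 1000.0 / millisPerRequest;
    }

    /** Cores needed to sustain the target load, rounded up to a whole core. */
    static int coresNeeded(double targetRequestsPerSecond, double millisPerRequest) {
        return (int) Math.ceil(targetRequestsPerSecond / requestsPerSecondPerCore(millisPerRequest));
    }
}
```

With the 200 ms figure, one core sustains about 5 clustering requests per second, so a hypothetical target of 50 requests per second would need roughly 10 cores; at 100 ms (native matrix libraries), roughly 5.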

If you'd like to experiment with clustering and your Lucene index, you can download the 3.x line GUI application from here:

http://builds.carrot2.org/download/C3WORKBENCH-COMMIT/artifacts/latest/Carrot2-Workbench-Binaries
 
> 2) As I understand it, enabling template caching will speed up XSLT
> processing; is that true?

If template caching is set to false, the XSLT templates are read / parsed on every request (good for development). If it's set to true, XSLT templates are read / parsed only once (good for production).
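The difference between the two modes can be illustrated with the standard JAXP API: a compiled javax.xml.transform.Templates object is thread-safe and can be reused across requests, while a Transformer is not thread-safe and is created per request. A minimal sketch, assuming stylesheets are identified by a name; StylesheetCache is a hypothetical class, not the 2.x webapp's code, whose own switch is the web.xml setting mentioned earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stylesheet cache illustrating development vs. production mode. */
final class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();
    private final boolean cachingEnabled;

    StylesheetCache(boolean cachingEnabled) {
        this.cachingEnabled = cachingEnabled;
    }

    /** Returns compiled templates, re-parsing every time if caching is off. */
    Templates templates(String name, String xslSource) throws TransformerConfigurationException {
        if (!cachingEnabled) {
            return compile(xslSource); // development: picks up edits immediately
        }
        Templates cached = compiled.get(name);
        if (cached == null) {
            cached = compile(xslSource);
            // Another thread may have compiled it first; keep a single instance.
            Templates previous = compiled.putIfAbsent(name, cached);
            if (previous != null) {
                cached = previous;
            }
        }
        return cached;
    }

    /** Applies the named stylesheet to an XML string; one Transformer per call. */
    String transform(String name, String xslSource, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates(name, xslSource).newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // TransformerFactory is not guaranteed to be thread-safe, so compilation is serialized.
    private synchronized Templates compile(String xslSource) throws TransformerConfigurationException {
        return factory.newTemplates(new StreamSource(new StringReader(xslSource)));
    }
}
```

In production mode only the first request per stylesheet pays the parse-and-compile cost; later requests just create a lightweight Transformer from the shared Templates.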
 
> 3) If I reduce the number of pages to be clustered from 100 to 50, that
> will speed up the response, but how does that affect the quality/number of
> clusters?

This will have a large impact on clustering time -- the time should be significantly lower. When it comes to results -- you'll simply get fewer clusters. Again, this is something you can experiment with using the GUI application I mentioned above.

Cheers,

Staszek


Re: Carrot2 Webapp Performance

Dawid Weiss-2

> If template caching is set to false, the XSLT templates are read / parsed on
> every request (good for development). If it's set to true, XSLT templates
> are read / parsed only once (good for production)

Just a short follow-up: you can drop XSLT entirely and write your
own serializers that hook into the webapp (2.x line). This will be a pain to
write and maintain, though, so before you commit to it, test with template
caching and a modern XSLT processor (Saxon and XSLTC).
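To compare processors, the JAXP factory can be chosen explicitly. A minimal sketch: XsltProcessors is a hypothetical helper that tries a named TransformerFactory implementation (for Saxon this is net.sf.saxon.TransformerFactoryImpl, which must be on the classpath) and falls back to the JDK default, which in current JDKs is XSLTC-based. Setting the standard javax.xml.transform.TransformerFactory system property selects the implementation globally without code changes.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

/** Hypothetical helper for benchmarking different XSLT processors. */
final class XsltProcessors {
    private XsltProcessors() {}

    /**
     * Returns a factory of the named implementation if it can be loaded,
     * otherwise the JDK default factory.
     */
    static TransformerFactory factoryOrDefault(String factoryClassName) {
        try {
            // A null class loader means the current thread's context class loader is used.
            return TransformerFactory.newInstance(factoryClassName, null);
        } catch (TransformerFactoryConfigurationError e) {
            // Implementation not on the classpath (or not instantiable): use the default.
            return TransformerFactory.newInstance();
        }
    }
}
```

Timing the same cached stylesheet with each factory gives a like-for-like comparison before deciding whether custom serializers are worth the maintenance cost.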

D.


Re: Carrot2 Webapp Performance

milos
In reply to this post by Stanislaw Osinski
OK. Now I have all the information I need about what to do.

Thanks to you and Dawid,
Milos
