Document Search/Classification Engine?

jklondon
We are building an application that will take user data (letters, invoices,
etc.) and let users apply simple classification tags to it.

I am wondering if we could use this technology to facilitate either of the
following:

- Search within PDFs
- Tag suggestions

Any thoughts or help appreciated.

Rav
Re: Document Search/Classification Engine?

Dawid Weiss-2
- For search within PDFs, look at Apache Tika's integration with Apache
Lucene/Solr.
- For tag suggestions (where the tag set is bounded and defined), I'd
use classifiers as found in Apache Mahout; if you have an unbounded
set of tags, this can boil down to collaborative filtering (again,
Mahout).
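
To make the two suggestions concrete, here is a minimal sketch in plain Python (standard library only). It does not use Tika, Lucene, Solr, or Mahout. It assumes the PDF text has already been extracted (for example by Tika), and it stands in a toy inverted index for the search engine and a small multinomial naive Bayes model for the bounded-tag classifier. The names `InvertedIndex` and `TagSuggester`, the sample documents, and the tags are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index: maps each token to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # `text` is assumed to be plain text already extracted from the PDF.
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query token (AND search)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


class TagSuggester:
    """Multinomial naive Bayes with add-one smoothing over a fixed tag set."""

    def __init__(self):
        self.tag_doc_counts = Counter()               # training documents per tag
        self.tag_word_counts = defaultdict(Counter)   # word counts per tag
        self.vocabulary = set()

    def train(self, text, tag):
        tokens = tokenize(text)
        self.tag_doc_counts[tag] += 1
        self.tag_word_counts[tag].update(tokens)
        self.vocabulary.update(tokens)

    def suggest(self, text, top_n=1):
        """Return up to top_n tags ranked by log-probability, best first."""
        tokens = tokenize(text)
        total_docs = sum(self.tag_doc_counts.values())
        vocab_size = len(self.vocabulary) + 1  # +1 reserves mass for unseen words
        scores = {}
        for tag, doc_count in self.tag_doc_counts.items():
            word_counts = self.tag_word_counts[tag]
            total_words = sum(word_counts.values())
            score = math.log(doc_count / total_docs)  # log prior of the tag
            for token in tokens:
                score += math.log((word_counts[token] + 1) / (total_words + vocab_size))
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical sample data, for illustration only.
index = InvertedIndex()
index.add("doc1", "Invoice for electricity, due 30 June")
index.add("doc2", "Letter from the bank about your mortgage")
print(index.search("electricity invoice"))

suggester = TagSuggester()
suggester.train("invoice amount due payment electricity", "bills")
suggester.train("invoice total payment gas", "bills")
suggester.train("letter bank mortgage statement", "banking")
print(suggester.suggest("payment due for gas invoice"))
```

A real deployment would get ranking, stemming, and persistence from Lucene/Solr and trained, evaluated models from Mahout. The sketch only shows the shape of the two problems: token lookup for search, and per-tag word statistics for suggestion.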

Dawid

On Mon, Jul 5, 2010 at 6:06 PM, jklondon <[hidden email]> wrote:

>
> We are building an application that will take user data (letters, invoices,
> etc.) and let users apply simple classification tags to it.
>
> I am wondering if we could use this technology to facilitate either of the
> following:
>
> - Search within PDFs
> - Tag suggestions
>
> Any thoughts or help appreciated.
>
> Rav
> --
> View this message in context: http://carrot2-users-and-developers-forum.607571.n2.nabble.com/Document-Search-Classification-Engine-tp5256669p5256669.html
> Sent from the Carrot2 Users and Developers Forum mailing list archive at Nabble.com.

_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers
Re: Document Search/Classification Engine?

jklondon
Dawid, thanks for your input; I will check out those technologies.

Ravi

On 6 July 2010 08:09, Dawid Weiss <[hidden email]> wrote:

> - For search within PDFs, look at Apache Tika's integration with Apache
> Lucene/Solr.
> - For tag suggestions (where the tag set is bounded and defined), I'd
> use classifiers as found in Apache Mahout; if you have an unbounded
> set of tags, this can boil down to collaborative filtering (again,
> Mahout).
>
> Dawid