Preprocessing Context Data Structures

Preprocessing Context Data Structures

seyfullahd
Hello,

I want to make sure I understand the preprocessing context data structures correctly. I have read all the javadoc comments about them, but I still don't feel I understand them completely.

Please confirm or correct me :)

* AllWords holds information about all the words
   - that appear in enough documents to pass dfThreshold (threshold defined in CaseNormalizer)
* AllPhrases holds information about all the phrases
   - that appear in enough documents to pass dfThreshold (threshold defined in PhraseExtractor)

And

* AllLabels
   - before the minClusterSize threshold is applied: featureIndex holds (somehow) the indices of the words and phrases in AllWords and AllPhrases that passed LabelFilterProcessor
   - after the minClusterSize threshold is applied: featureIndex holds (somehow) the indices of the words and phrases in AllWords and AllPhrases that passed LabelFilterProcessor and have enough documents to pass minClusterSize (threshold defined in DocumentAssigner)
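
To make the "(somehow)" part concrete, here is a minimal sketch in plain Java. It assumes the convention that, as far as I understand the PreprocessingContext JavaDoc, applies to featureIndex: a value smaller than the number of words points into the word arrays, and any larger value, after subtracting the number of words, points into the phrase arrays. The class, the field names and the sample data are hypothetical stand-ins, not the real Carrot2 types.

```java
public class FeatureIndexSketch {
    // Hypothetical stand-in for the word images held by AllWords.
    static final String[] WORD_IMAGES = {"data", "mining", "text"};

    // Hypothetical stand-in for AllPhrases: each phrase is a list of word indices.
    static final int[][] PHRASE_WORD_INDICES = {{0, 1}, {2, 1}};

    /** True if the feature index falls into the phrase range (assumed convention). */
    static boolean isPhrase(int featureIndex) {
        return featureIndex >= WORD_IMAGES.length;
    }

    /** Resolves a feature index to readable text by following the pointers. */
    static String labelText(int featureIndex) {
        int featureCount = WORD_IMAGES.length + PHRASE_WORD_INDICES.length;
        if (featureIndex < 0 || featureIndex >= featureCount) {
            throw new IllegalArgumentException("No such feature: " + featureIndex);
        }
        if (!isPhrase(featureIndex)) {
            return WORD_IMAGES[featureIndex];
        }
        // Shift the index into the phrase range, then follow each word pointer.
        int[] wordIndices = PHRASE_WORD_INDICES[featureIndex - WORD_IMAGES.length];
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < wordIndices.length; i++) {
            if (i > 0) {
                text.append(' ');
            }
            text.append(WORD_IMAGES[wordIndices[i]]);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical label candidates: one single word and two phrases.
        int[] featureIndex = {1, 3, 4};
        for (int f : featureIndex) {
            System.out.println(f + " -> " + labelText(f));
        }
    }
}
```

With three words, indices 0 to 2 are single-word labels and indices 3 and 4 are phrases, so one int array can refer to both kinds of features.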

Thanks in advance :)

Seyfullah

Re: Preprocessing Context Data Structures

Dawid Weiss-2
> I want to be sure if I understand the preprocessing context data structures
> clearly. I read all the javadoc comments about them, but still I don't feel
> I completely understand.

Did you check out the PNG diagram of relationships in those arrays?
This should make it easier.

All the constraints you mentioned depend on the implementation of the
preprocessing pipeline -- for the ones in the codebase I think your
conclusions are correct, but you *could* change the preprocessing
pipeline and fill those arrays with values that depend on your own
settings (whatever they might be).

In other words: the "pointers" between those arrays are important. How
the actual values are extracted and whether they are pruned of some
information or not is up to the implementation of the preprocessing
pipeline.
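
A rough illustration of that point, as a self-contained Java sketch and not the Carrot2 code: the pruning rule is just a parameter, while the pointers from tokens into the word array stay consistent whatever rule is used. The class name, the fields, the sample documents, the use of -1 for pruned tokens and the inclusive threshold (document frequency >= dfThreshold) are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DfPruningSketch {
    // Hypothetical sample input: each document is a list of lower-cased tokens.
    static final String[][] DOCS = {
        {"data", "mining", "tools"},
        {"text", "mining"},
        {"data", "text", "mining"}
    };

    final String[] wordImages;    // words that survived pruning
    final int[][] tokenWordIndex; // per token: index into wordImages, or -1 if pruned

    DfPruningSketch(String[][] documents, int dfThreshold) {
        // Count in how many documents each word occurs (document frequency).
        Map<String, Integer> df = new LinkedHashMap<>();
        for (String[] doc : documents) {
            Set<String> seen = new HashSet<>();
            for (String token : doc) {
                if (seen.add(token)) {
                    df.merge(token, 1, Integer::sum);
                }
            }
        }
        // Keep words with df >= dfThreshold (assumed rule) and number them.
        Map<String, Integer> index = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            if (e.getValue() >= dfThreshold) {
                index.put(e.getKey(), index.size());
            }
        }
        wordImages = index.keySet().toArray(new String[0]);
        // Fill the token -> word pointers; pruned words get -1.
        tokenWordIndex = new int[documents.length][];
        for (int d = 0; d < documents.length; d++) {
            tokenWordIndex[d] = new int[documents[d].length];
            for (int t = 0; t < documents[d].length; t++) {
                tokenWordIndex[d][t] = index.getOrDefault(documents[d][t], -1);
            }
        }
    }

    public static void main(String[] args) {
        DfPruningSketch ctx = new DfPruningSketch(DOCS, 2);
        System.out.println(String.join(", ", ctx.wordImages));
    }
}
```

Changing the threshold changes which words end up in the array, but every pointer that is not -1 still resolves to the right word, which is the part downstream code would rely on.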

Dawid

------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Carrot2-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/carrot2-developers

Re: Preprocessing Context Data Structures

seyfullahd
Dawid Weiss wrote:
> Did you check out the PNG diagram of relationships in those arrays?
> This should make it easier.

I didn't know of its existence :) To understand the structures, I had an Eclipse plugin generate a class diagram of those array structures and used that to dig in :) I have seen the PNG now, thank you. I will check it.

Dawid Weiss wrote:
> All the constraints you mentioned depend on the implementation of the
> preprocessing pipeline -- for the ones in the codebase I think your
> conclusions are correct, but you *could* change the preprocessing
> pipeline and fill those arrays with values that depend on your own
> settings (whatever they might be).
>
> In other words: the "pointers" between those arrays are important. How
> the actual values are extracted and whether they are pruned of some
> information or not is up to the implementation of the preprocessing
> pipeline.

Thanks :)

Seyfullah

Re: Preprocessing Context Data Structures

Dawid Weiss-2
> I didn't know of its existence :)

This PNG is embedded in the JavaDoc, actually:
http://download.carrot2.org/stable/3.7.1/javadoc/org/carrot2/text/preprocessing/PreprocessingContext.html

Dawid


Re: Preprocessing Context Data Structures

seyfullahd
Yes, it's my fault for not knowing. I didn't see it because I had only looked at the javadoc comments on the context structures, and only inside Eclipse. I found the PNG file after your hint yesterday :)