public final class CzechAnalyzer extends Analyzer
Analyzer for the Czech language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
Modifier and Type | Field and Description |
---|---|
static String[] | CZECH_STOP_WORDS: List of typical stopwords. |

Fields inherited from class Analyzer: overridesTokenStreamMethod
Constructor and Description |
---|
CzechAnalyzer(): Deprecated. Use CzechAnalyzer(Version) instead. |
CzechAnalyzer(File stopwords): Deprecated. Use CzechAnalyzer(Version, File) instead. |
CzechAnalyzer(HashSet stopwords): Deprecated. Use CzechAnalyzer(Version, HashSet) instead. |
CzechAnalyzer(String[] stopwords): Deprecated. Use CzechAnalyzer(Version, String[]) instead. |
CzechAnalyzer(Version matchVersion): Builds an analyzer with the default stop words (CZECH_STOP_WORDS). |
CzechAnalyzer(Version matchVersion, File stopwords): Builds an analyzer with the given stop words. |
CzechAnalyzer(Version matchVersion, HashSet stopwords) |
CzechAnalyzer(Version matchVersion, String[] stopwords): Builds an analyzer with the given stop words. |
Modifier and Type | Method and Description |
---|---|
void | loadStopWords(InputStream wordfile, String encoding): Loads stopwords hash from resource stream (file, database...). |
TokenStream | reusableTokenStream(String fieldName, Reader reader): Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader. |
TokenStream | tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Methods inherited from class Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
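The loadStopWords entry in the method summary takes a stream and an encoding name, with null meaning the default system encoding. The following is a minimal plain-Java sketch of that idea, not Lucene's implementation: the class name StopWordLoaderSketch and its load method are hypothetical, and the one-word-per-line file layout is an assumption, since this page does not state the word list format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class StopWordLoaderSketch {

    // Reads a word list from the stream, assuming one stopword per line.
    static Set<String> load(InputStream wordfile, String encoding) {
        // A null encoding falls back to the platform default, mirroring the
        // "null for default system encoding" wording of loadStopWords.
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        Set<String> stopWords = new HashSet<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(wordfile, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    stopWords.add(word); // blank lines are skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stopWords;
    }
}
```

Passing the encoding explicitly matters for Czech word lists, because the same bytes decode to different letters under win-1250, iso-8859-2, and UTF-8.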
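public static final String[] CZECH_STOP_WORDS
List of typical stopwords.
public CzechAnalyzer()
Deprecated. Use CzechAnalyzer(Version) instead.
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).
public CzechAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).
public CzechAnalyzer(String[] stopwords)
Deprecated. Use CzechAnalyzer(Version, String[]) instead.
public CzechAnalyzer(Version matchVersion, String[] stopwords)
Builds an analyzer with the given stop words.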
public CzechAnalyzer(HashSet stopwords)
Deprecated. Use CzechAnalyzer(Version, HashSet) instead.
public CzechAnalyzer(File stopwords) throws IOException
Deprecated. Use CzechAnalyzer(Version, File) instead.
Throws: IOException
public CzechAnalyzer(Version matchVersion, File stopwords) throws IOException
Builds an analyzer with the given stop words.
Throws: IOException
public void loadStopWords(InputStream wordfile, String encoding)
Loads stopwords hash from resource stream (file, database...).
Parameters:
wordfile - File containing the wordlist
encoding - Encoding used (win-1250, iso-8859-2, ...), null for default system encoding
public final TokenStream tokenStream(String fieldName, Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
Returns: A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
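The chain described for tokenStream (tokenize, lowercase, then drop stopwords) can be illustrated without Lucene. The sketch below is a simplified stand-in under stated assumptions: splitting on non-letter, non-digit characters is far cruder than StandardTokenizer, the StandardFilter step is omitted, and SAMPLE_STOP_WORDS is a small illustrative set, not the contents of CZECH_STOP_WORDS. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of a tokenize -> lowercase -> stop-filter chain.
final class CzechPipelineSketch {

    // Illustrative sample only; not the real CZECH_STOP_WORDS list.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "je", "na", "se", "v"));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Step 1: crude tokenization on runs of non-letter, non-digit characters.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element on leading delimiters
            }
            // Step 2: lowercase so that stopword matching ignores case.
            String token = raw.toLowerCase(Locale.ROOT);
            // Step 3: stop filter; matching tokens are never emitted, so never indexed.
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, `CzechPipelineSketch.analyze("Praha JE mesto", CzechPipelineSketch.SAMPLE_STOP_WORDS)` returns `[praha, mesto]`. Lowercasing before the stop filter is what lets "JE" match the lowercase entry "je".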
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
Overrides: reusableTokenStream in class Analyzer
Returns: A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
Throws: IOException
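The phrase "possibly reused" in reusableTokenStream suggests that a previously built stream can be pointed at a new Reader instead of being rebuilt. The sketch below shows one way such reuse can work, assuming a per-thread cache. It is not Lucene's code: WordStream is a hypothetical stand-in for a TokenStream, and the ThreadLocal field only loosely mirrors the inherited getPreviousTokenStream and setPreviousTokenStream methods listed on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical illustration of per-thread stream reuse.
final class ReusableStreamSketch {

    // Stand-in for a token stream that can be reset onto new input.
    static final class WordStream {
        private Reader input;

        WordStream(Reader input) {
            this.input = input;
        }

        void reset(Reader newInput) {
            this.input = newInput;
        }

        // Returns the next whitespace-delimited word, or null at end of input.
        String next() {
            try {
                StringBuilder word = new StringBuilder();
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (word.length() > 0) {
                            break; // end of the current word
                        }
                    } else {
                        word.append((char) c);
                    }
                }
                return word.length() > 0 ? word.toString() : null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // One cached stream per thread, so concurrent callers never share state.
    private final ThreadLocal<WordStream> previous = new ThreadLocal<>();

    WordStream reusableStream(Reader reader) {
        WordStream stream = previous.get();
        if (stream == null) {
            stream = new WordStream(reader); // first call on this thread: build
            previous.set(stream);
        } else {
            stream.reset(reader);            // later calls: reuse and re-point
        }
        return stream;
    }
}
```

The trade-off in this pattern is that a caller must finish consuming one stream before requesting the next on the same thread, because both calls return the same object.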
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.