The AsciiWhiteSpaceAnalyzer recognizes tokens as maximal strings of non-whitespace characters. If implemented in Ruby, the AsciiWhiteSpaceAnalyzer would look like:
    class AsciiWhiteSpaceAnalyzer
      def initialize(lower = true)
        @lower = lower
      end

      def token_stream(field, str)
        if @lower
          return AsciiLowerCaseFilter.new(AsciiWhiteSpaceTokenizer.new(str))
        else
          return AsciiWhiteSpaceTokenizer.new(str)
        end
      end
    end
As you can see, it makes use of the AsciiWhiteSpaceTokenizer. Use WhiteSpaceAnalyzer instead if you need to recognize multibyte encodings such as “UTF-8”.
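The token rule described above (maximal runs of non-whitespace characters, optionally downcased) can be approximated in plain Ruby without loading Ferret. This is only a rough sketch: `ascii_whitespace_tokens` is a hypothetical helper, not part of the library. It assumes that "whitespace" means what Ruby's `\s` matches and that ASCII-only lowercasing behaves like `tr('A-Z', 'a-z')`. It returns plain strings rather than a token stream.

```ruby
# Hypothetical helper (not part of Ferret): approximates the token texts
# an ASCII whitespace analyzer would produce for a string.
def ascii_whitespace_tokens(str, lower = true)
  # Each token is a maximal run of non-whitespace characters.
  tokens = str.scan(/\S+/)
  # Downcase only A-Z; non-ASCII characters are left untouched.
  lower ? tokens.map { |t| t.tr('A-Z', 'a-z') } : tokens
end

ascii_whitespace_tokens("Hello  World,\tFoo-Bar")
# => ["hello", "world,", "foo-bar"]
ascii_whitespace_tokens("Hello  World,\tFoo-Bar", false)
# => ["Hello", "World,", "Foo-Bar"]
```

Note that punctuation stays attached to its token ("world,"), since only whitespace separates tokens.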
Create a new AsciiWhiteSpaceAnalyzer which downcases tokens by default but can optionally leave case as is. Lowercasing will only be done to ASCII characters.
lower: set to false if you don't want the field's tokens to be downcased
    static VALUE
    frb_a_white_space_analyzer_init(int argc, VALUE *argv, VALUE self)
    {
        Analyzer *a;
        GET_LOWER(false);
        a = whitespace_analyzer_new(lower);
        Frt_Wrap_Struct(self, NULL, &frb_analyzer_free, a);
        object_add(a, self);
        return self;
    }