A WhiteSpaceTokenizer is a tokenizer that splits text at whitespace. Each maximal run of non-whitespace characters forms a token, so punctuation stays attached to the word it touches.
"Dave's résumé, at http://www.davebalmain.com/ 1234" => ["Dave's", "résumé,", "at", "http://www.davebalmain.com/", "1234"]
Create a new WhiteSpaceTokenizer which optionally downcases tokens. Downcasing is done according to the current locale.
lower: set to false if you don't wish to downcase tokens
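The behaviour described above (split at whitespace, optionally downcase each token) can be sketched in plain C. This is only an illustration under stated assumptions, not Ferret's implementation: `next_ws_token` and its parameters are hypothetical names, and the sketch works byte by byte with `isspace` and `tolower`, whereas the `mb_` prefix in the source suggests the real tokenizer handles multibyte characters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Ferret.
 * Copies the next whitespace-delimited token of `text`, starting at byte
 * offset *pos, into `out` (capacity `cap` bytes including the NUL).
 * If `lower` is non-zero each byte is passed through tolower(), which
 * follows the current LC_CTYPE locale. Tokens longer than cap - 1 bytes
 * are truncated. Returns 1 if a token was found, 0 at end of input. */
int next_ws_token(const char *text, size_t *pos, int lower,
                  char *out, size_t cap)
{
    size_t i = *pos;
    size_t n = 0;

    /* Skip the whitespace run that precedes the token. */
    while (text[i] != '\0' && isspace((unsigned char)text[i])) {
        i++;
    }
    if (text[i] == '\0') {
        *pos = i;
        return 0;
    }

    /* Consume the maximal run of non-whitespace bytes. */
    while (text[i] != '\0' && !isspace((unsigned char)text[i])) {
        if (n + 1 < cap) {
            unsigned char c = (unsigned char)text[i];
            out[n++] = (char)(lower ? tolower(c) : c);
        }
        i++;
    }
    if (cap > 0) {
        out[n] = '\0';
    }
    *pos = i;
    return 1;
}
```

A caller would initialise `pos` to 0 and call the function in a loop until it returns 0. To get locale-aware downcasing of single-byte characters, the caller would first call `setlocale(LC_CTYPE, "")`, as the init function in the source does; without that call the default "C" locale applies and only A-Z are lowered.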
static VALUE
frb_whitespace_tokenizer_init(int argc, VALUE *argv, VALUE self)
{
    TS_ARGS(false);
#ifndef POSH_OS_WIN32
    if (!frb_locale) frb_locale = setlocale(LC_CTYPE, "");
#endif
    return get_wrapped_ts(self, rstr, mb_whitespace_tokenizer_new(lower));
}