class Syntax::Tokenizer

The base class of all tokenizers. It sets up the scanner and drives the tokenization loop until all tokens have been extracted. It also provides convenience methods that ensure adjacent tokens of the same group are returned as a single token.

Constants

EOL

Attributes

chunk[R]

The current chunk of text being accumulated.

group[R]

The current group being processed by the tokenizer.

Public Instance Methods

finish()

Finish tokenizing. This flushes the buffer, yielding any remaining text to the client.

# File lib/syntax/common.rb, line 57
def finish
  start_group nil
  teardown
end

option(opt)

Get the value of the specified option.

# File lib/syntax/common.rb, line 89
def option(opt)
  @options ? @options[opt] : nil
end

set( opts={} )

Specify a set of tokenizer-specific options. A tokenizer may or may not publish options; if it does, they can be used to select optional behavior.

# File lib/syntax/common.rb, line 84
def set( opts={} )
  ( @options ||= Hash.new ).update opts
end
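
The interplay between set and option can be sketched as follows. This is a minimal stand-in, not the library class: OptionsSketch is a hypothetical name that copies only the two documented method bodies so the example runs without the syntax gem, and the option names :expressions and :tabs are made up for illustration.

```ruby
# Stand-in that copies the documented bodies of #set and #option.
class OptionsSketch
  def set( opts={} )
    ( @options ||= Hash.new ).update opts
  end

  def option(opt)
    @options ? @options[opt] : nil
  end
end

t = OptionsSketch.new
before = t.option(:expressions)    # nothing has been set yet, so nil

t.set :expressions => :highlight   # hypothetical option name
t.set :tabs => 4                   # hypothetical; merged into the same hash

after = [t.option(:expressions), t.option(:tabs)]
```

Because set merges into the existing hash with update, repeated calls accumulate options rather than replacing earlier ones.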

setup()

Subclasses may override this method to provide implementation-specific setup logic.

# File lib/syntax/common.rb, line 52
def setup
end

start( text, &block )

Start tokenizing. This sets up the state in preparation for tokenization, such as creating a new scanner for the text and saving the callback block. The block will be invoked for each token extracted.

# File lib/syntax/common.rb, line 42
def start( text, &block )
  @chunk = ""
  @group = :normal
  @callback = block
  @text = StringScanner.new( text )
  setup
end

step()

Subclasses must implement this method, which is called for each iteration of the tokenization process. This method may extract multiple tokens.

# File lib/syntax/common.rb, line 69
def step
  raise NotImplementedError, "subclasses must implement #step"
end
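
A subclass that implements step might look roughly like the sketch below. To keep it runnable without the syntax gem, it uses a stand-in base class (SketchTokenizer, a hypothetical name) whose start, finish, tokenize and step copy the documented bodies with the setup and teardown hooks omitted. The start_group helper is an assumption: finish calls it, but its body is not shown in this documentation, so the version here only illustrates the "merge adjacent text of the same group" behavior described above. Tokens are yielded as plain [group, text] pairs for simplicity.

```ruby
require 'strscan'

# Stand-in base class; not the library implementation.
class SketchTokenizer
  def start( text, &block )
    @chunk = ""
    @group = :normal
    @callback = block
    @text = StringScanner.new( text )
  end

  def finish
    start_group nil
  end

  def tokenize( text, &block )
    start text, &block
    step until @text.eos?
    finish
  end

  def step
    raise NotImplementedError, "subclasses must implement #step"
  end

  private

  # ASSUMED helper: flush the buffered chunk when the group changes,
  # so adjacent text of one group reaches the callback as one token.
  def start_group( group, data = nil )
    if group != @group && !@chunk.empty?
      @callback.call( [@group, @chunk] )
      @chunk = ""
    end
    @group = group
    @chunk += data if data
  end
end

# Hypothetical subclass: separates runs of digits from everything else.
class DigitTokenizer < SketchTokenizer
  def step
    if digits = @text.scan( /\d+/ )
      start_group :number, digits
    else
      start_group :normal, @text.getch
    end
  end
end

tokens = []
DigitTokenizer.new.tokenize( "ab12c" ) { |tok| tokens << tok }
# tokens is now [[:normal, "ab"], [:number, "12"], [:normal, "c"]]
```

Each call to step consumes at least one character, which is what lets the loop in tokenize terminate. The characters "a" and "b" are handled by separate step calls yet arrive as a single :normal token.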

teardown()

Subclasses may override this method to provide implementation-specific teardown logic.

# File lib/syntax/common.rb, line 64
def teardown
end

tokenize( text, &block )

Tokenizes the given text: calls start, then calls step until the text has been exhausted, then calls finish.

# File lib/syntax/common.rb, line 75
def tokenize( text, &block )
  start text, &block
  step until @text.eos?
  finish
end
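
The order in which tokenize invokes the hooks can be traced with a small stand-in. TraceTokenizer is a hypothetical class that copies the documented start, finish and tokenize bodies and records each call in an events list. Its start_group is reduced to a marker, since the real helper is not shown here, and the line-at-a-time step is an arbitrary choice for the sketch.

```ruby
require 'strscan'

# Stand-in used only to observe call order; not the library class.
class TraceTokenizer
  attr_reader :events

  def initialize
    @events = []
  end

  def start( text, &block )
    @chunk = ""
    @group = :normal
    @callback = block
    @text = StringScanner.new( text )
    setup
  end

  def finish
    start_group nil
    teardown
  end

  def tokenize( text, &block )
    start text, &block
    step until @text.eos?
    finish
  end

  # Hooks overridden only to record when they run.
  def setup
    @events << :setup
  end

  def teardown
    @events << :teardown
  end

  # Consumes one line per call and records the iteration.
  def step
    @text.scan_until( /\n|\z/ )
    @events << :step
  end

  private

  # ASSUMPTION: the real helper flushes buffered text; here it only
  # records that the flush point was reached.
  def start_group( group )
    @events << :flush
  end
end

t = TraceTokenizer.new
t.tokenize( "a\nb\n" ) { |tok| }
order = t.events
# order is [:setup, :step, :step, :flush, :teardown]
```

Since the loop tests @text.eos? before each iteration, an empty input skips step entirely while setup, the final flush and teardown still run.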