This class provides a facility to parse a string containing one or more RFC2822 addresses into an array of RMail::Address objects. You can use it directly, but it is more conveniently used with the RMail::Address.parse method.
Create an RMail::Address::Parser object that will parse string. See also the RMail::Address.parse method.
# File lib/rmail/address.rb, line 280
def initialize(string)
  @string = string
end
This function attempts to extract mailing addresses from the string passed to new. The function returns an RMail::Address::List of RMail::Address objects (RMail::Address::List is a subclass of Array). A malformed input string will not generate an exception. Instead, the returned array will simply not contain the malformed addresses.
The string is expected to be in a valid format as documented in RFC2822's mailbox-list grammar. This will work for lists of addresses in the To:, From:, etc. headers in email.
# File lib/rmail/address.rb, line 296
def parse
  @lexemes = []
  @tokens = []
  @addresses = RMail::Address::List.new
  @errors = 0
  new_address
  get
  address_list
  reset_errors
  @addresses.delete_if { |a| !a.local || !a.domain }
end
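The last step of parse, which discards any address lacking a local part or a domain, can be illustrated in isolation. The following is a minimal sketch, not the library's code: SketchAddress and drop_incomplete are hypothetical names introduced here, and the struct only stands in for the two RMail::Address fields that the filter inspects.

```ruby
# Hypothetical stand-in for RMail::Address; it carries only the two
# fields that the final filter in #parse looks at.
SketchAddress = Struct.new(:local, :domain)

# Mirrors the delete_if call at the end of #parse: an address survives
# only if both its local part and its domain were filled in.
def drop_incomplete(addresses)
  addresses.reject { |a| !a.local || !a.domain }
end

parsed = [
  SketchAddress.new("bob", "example.com"),
  SketchAddress.new("alice", nil),       # no domain was parsed
  SketchAddress.new(nil, "example.org")  # no local part was parsed
]

kept = drop_incomplete(parsed)
puts kept.map { |a| "#{a.local}@#{a.domain}" }.inspect
```

Only the first entry is kept, which matches the documented behavior that malformed addresses are silently omitted rather than raising an exception.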
Parse this:
addrSpec = localPart "@" domain
# File lib/rmail/address.rb, line 562
def addr_spec
  local_part
  expect(SYM_AT_SIGN)
  domain
end
Parse this: address = mailbox | group
# File lib/rmail/address.rb, line 414
def address
  # At this point we could be looking at a display-name, angle
  # addr, or local-part. If looking at a local-part, it could
  # actually be a display-name, according to the following:
  #
  # local-part '@' -> it is a local part of a local-part @ domain
  # local-part '<' -> it is a display-name of a mailbox
  # local-part ':' -> it is a display-name of a group
  # display-name '<' -> it is a mailbox display name
  # display-name ':' -> it is a group display name
  #
  # set lookahead to '@' '<' or ':' (or another value for
  # invalid input)
  lookahead = address_lookahead
  if lookahead == SYM_COLON
    group
  else
    mailbox(lookahead)
  end
end
Parse this: address_list = ([address] SYNC ",") {[address] SYNC ","} [address] .
# File lib/rmail/address.rb, line 362
def address_list
  if @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN
    address
  end
  sync(SYM_COMMA)
  return if @sym.nil?
  expect(SYM_COMMA)
  new_address
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_LESS_THAN ||
      @sym == SYM_COMMA
    if @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_LESS_THAN
      address
    end
    sync(SYM_COMMA)
    return if @sym.nil?
    expect(SYM_COMMA)
    new_address
  end
  if @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_LESS_THAN
    address
  end
end
Parses ahead through a local-part or display-name until no longer looking at a word or "." and returns the next symbol.
# File lib/rmail/address.rb, line 396
def address_lookahead
  lookahead = []
  while @sym == SYM_ATOM ||
      @sym == SYM_ATOM_NON_ASCII ||
      @sym == SYM_QTEXT ||
      @sym == SYM_PERIOD
    lookahead.push([@sym, @lexeme])
    get
  end
  retval = @sym
  putback(@sym, @lexeme)
  putback_array(lookahead)
  get
  retval
end
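The idea behind the lookahead can be sketched over a plain array of tokens. This is an illustration under stated assumptions rather than the parser's own method: peek_past_words, WORD_SYMS, and the lowercase symbol names are hypothetical, and tokens are modeled as [symbol, lexeme] pairs. Unlike address_lookahead, the sketch never consumes input, so no putback step is needed.

```ruby
# Hypothetical symbol names standing in for SYM_ATOM, SYM_ATOM_NON_ASCII,
# SYM_QTEXT and SYM_PERIOD.
WORD_SYMS = [:atom, :atom_non_ascii, :qtext, :period].freeze

# Returns the symbol of the first token that is neither a word nor ".",
# or nil if the input ends first. The token array is left untouched.
def peek_past_words(tokens)
  rest = tokens.drop_while { |sym, _lexeme| WORD_SYMS.include?(sym) }
  rest.empty? ? nil : rest.first.first
end

mailbox_tokens = [[:atom, "Bob"], [:atom, "Smith"], [:less_than, "<"]]
group_tokens   = [[:atom, "friends"], [:colon, ":"]]
plain_tokens   = [[:atom, "bob"], [:at_sign, "@"], [:atom, "example"]]

puts peek_past_words(mailbox_tokens).inspect
puts peek_past_words(group_tokens).inspect
puts peek_past_words(plain_tokens).inspect
```

The three inputs correspond to the cases the address method distinguishes: a display name followed by an angle address, a group name followed by a colon, and a bare local part followed by an at sign.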
Parse this:
angleAddr = SYNC "<" [obsRoute] addrSpec SYNC ">"
# File lib/rmail/address.rb, line 536
def angle_addr
  expect(SYM_LESS_THAN)
  if @sym == SYM_AT_SIGN
    obs_route
  end
  addr_spec
  expect(SYM_GREATER_THAN)
end
# File lib/rmail/address.rb, line 714
def comment
  depth = 0
  comment = ''
  catch(:done) {
    while @string =~ /\A(\(([^\(\)\\]|\\.)*)/
      @string = $'
      comment += $1
      depth += 1
      while @string =~ /\A(([^\(\)\\]|\\.)*\))/
        @string = $'
        comment += $1
        depth -= 1
        throw :done if depth == 0
        if @string =~ /\A(([^\(\)\\]|\\.)+)/
          @string = $'
          comment += $1
        end
      end
    end
  }
  comment = comment.gsub(/[\r\n\t ]+/, ' ').
    sub(/\A\((.*)\)$/, '\1').
    gsub(/\\(.)/, '\1')
  @addresses.last.comments = (@addresses.last.comments || []) + [comment]
end
Parse this:
word = atom | atom_non_ascii | quotedString
# File lib/rmail/address.rb, line 504
def display_name_word
  if @sym == SYM_ATOM || @sym == SYM_ATOM_NON_ASCII || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end
Parse this:
domain = domainLiteral | obsDomain
# File lib/rmail/address.rb, line 547
def domain
  if @sym == SYM_DOMAIN_LITERAL
    save_text
    @addresses.last.domain = get_text
    get
  elsif @sym == SYM_ATOM
    obs_domain
    @addresses.last.domain = get_text
  else
    error "expected start of domain, got #{@sym.inspect}"
  end
end
# File lib/rmail/address.rb, line 763
def error(s)
  @errors += 1
end
# File lib/rmail/address.rb, line 741
def expect(token)
  if @sym == token
    get
  else
    error("expected #{token.inspect} but got #{@sym.inspect}")
  end
end
# File lib/rmail/address.rb, line 749
def expect_save(token)
  if @sym == token
    save_text
  end
  expect(token)
end
Get a single token from the string or from the @tokens array if somebody used putback.
# File lib/rmail/address.rb, line 627
def get
  unless @tokens.empty?
    @sym, @lexeme = @tokens.pop
  else
    get_tokenize
  end
end
Get the text that has been saved up to this point.
# File lib/rmail/address.rb, line 337
def get_text
  text = ''
  sep = ''
  @lexemes.each { |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''
    else
      text << sep
      text << lexeme
      sep = ' '
    end
  }
  @lexemes = []
  text
end
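The joining rule in get_text (a space between consecutive words, but no space on either side of a period) is easier to see on concrete input. Below is a minimal standalone sketch of the same loop; join_lexemes is a hypothetical name, and it takes the lexemes as an argument instead of reading and clearing @lexemes.

```ruby
# Joins saved lexemes the way get_text does: words are separated by a
# single space, while "." is attached directly to its neighbours.
def join_lexemes(lexemes)
  text = +''
  sep = ''
  lexemes.each do |lexeme|
    if lexeme == '.'
      text << lexeme
      sep = ''        # suppress the space after a period
    else
      text << sep << lexeme
      sep = ' '       # the next word is preceded by a space
    end
  end
  text
end

puts join_lexemes(%w[Bob Smith])
puts join_lexemes(['example', '.', 'com'])
puts join_lexemes(['John', 'Q', '.', 'Public'])
```

The second call shows why the same routine serves both display names and dotted local parts or domains: the periods glue their neighbours together, so "example", ".", "com" comes back as a single dotted name.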
Get a single token from the string.
# File lib/rmail/address.rb, line 636
def get_tokenize
  @lexeme = nil
  loop {
    case @string
    when nil # the end
      @sym = nil
      break
    when "" # the end
      @sym = nil
      break
    when /\A[\r\n\t ]+/ # skip whitespace
      @string = $'
    when /\A\(/ # skip comment
      comment
    when /\A""/ # skip empty quoted text
      @string = $'
    when /\A[\w!$%&\*+\/=?^_\`{\}|~#-]+/
      @string = $'
      @sym = SYM_ATOM
      break
    when /\A"(.*?([^\\]|\\\\))"/
      @string = $'
      @sym = SYM_QTEXT
      @lexeme = $1.gsub(/\\(.)/, '\1')
      break
    when /\A</
      @string = $'
      @sym = SYM_LESS_THAN
      break
    when /\A>/
      @string = $'
      @sym = SYM_GREATER_THAN
      break
    when /\A@/
      @string = $'
      @sym = SYM_AT_SIGN
      break
    when /\A,/
      @string = $'
      @sym = SYM_COMMA
      break
    when /\A:/
      @string = $'
      @sym = SYM_COLON
      break
    when /\A;/
      @string = $'
      @sym = SYM_SEMI_COLON
      break
    when /\A\./
      @string = $'
      @sym = SYM_PERIOD
      break
    when /\A(\[.*?([^\\]|\\\\)\])/
      @string = $'
      @sym = SYM_DOMAIN_LITERAL
      @lexeme = $1.gsub(/(^|[^\\])[\r\n\t ]+/, '\1').gsub(/\\(.)/, '\1')
      break
    when /\A[\200-\377\w!$%&\*+\/=?^_\`{\}|~#-]+/
      # This is just like SYM_ATOM, but includes all characters
      # with high bits. This is so we can allow such tokens in
      # the display name portion of an address even though it
      # violates the RFCs.
      @string = $'
      @sym = SYM_ATOM_NON_ASCII
      break
    when /\A./
      @string = $' # garbage
      error('garbage character in string')
    else
      raise "internal error, @string is #{@string.inspect}"
    end
  }
  if @sym
    @lexeme ||= $&
  end
end
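A much reduced tokenizer in the same spirit may help when reading get_tokenize. This is a sketch under several assumptions, not the library's implementation: it uses StringScanner from Ruby's standard library instead of matching against @string and $', it returns all tokens at once, its symbol names and the names SKETCH_TOKENS and sketch_tokenize are hypothetical, and it deliberately omits comments, quoted strings, domain literals, and non-ASCII atoms.

```ruby
require 'strscan'

# Simplified token table. The atom pattern is adapted from the ASCII
# atom pattern in get_tokenize; everything else is a single character.
SKETCH_TOKENS = [
  [/[\w!$%&*+\/=?^`{}|~#-]+/, :atom],
  [/</,  :less_than],
  [/>/,  :greater_than],
  [/@/,  :at_sign],
  [/,/,  :comma],
  [/:/,  :colon],
  [/;/,  :semi_colon],
  [/\./, :period]
].freeze

# Splits a string into [symbol, lexeme] pairs. Whitespace is skipped,
# and any character that matches no pattern is dropped as garbage.
def sketch_tokenize(string)
  scanner = StringScanner.new(string)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/[\r\n\t ]+/)
    pattern, sym = SKETCH_TOKENS.find { |re, _| scanner.match?(re) }
    if pattern
      tokens << [sym, scanner.scan(pattern)]
    else
      scanner.getch # garbage character: consume and ignore it
    end
  end
  tokens
end

sketch_tokenize("Bob <bob@example.com>").each do |sym, lexeme|
  puts "#{sym.inspect} #{lexeme.inspect}"
end
```

Note that the period is its own token, so a domain such as example.com arrives as three tokens; reassembling them is the job of the save_text and get_text pair.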
Parse this:
group = word {word | "."} SYNC ":" [mailbox_list] SYNC ";"
# File lib/rmail/address.rb, line 480
def group
  word
  while @sym == SYM_ATOM || @sym == SYM_QTEXT || @sym == SYM_PERIOD
    if @sym == SYM_ATOM || @sym == SYM_QTEXT
      word
    else
      save_text
      get
    end
  end
  sync(SYM_COLON)
  expect(SYM_COLON)
  get_text # throw away group name
  @addresses.last.comments = nil
  if @sym == SYM_ATOM ||
      @sym == SYM_QTEXT ||
      @sym == SYM_COMMA ||
      @sym == SYM_LESS_THAN
    mailbox_list
  end
  sync(SYM_SEMI_COLON)
  expect(SYM_SEMI_COLON)
end
Parse this:
local_part = word *( "." word )
# File lib/rmail/address.rb, line 570
def local_part
  word
  while @sym == SYM_PERIOD
    save_text
    get
    word
  end
  @addresses.last.local = get_text
end
Parse this:
mailbox = angleAddr | word {word | "."} angleAddr | word {"." word} "@" domain .
lookahead will be set to the return value of #address_lookahead, which will be '@' or '<' (or another value for invalid input)
# File lib/rmail/address.rb, line 445
def mailbox(lookahead)
  if @sym == SYM_LESS_THAN
    angle_addr
  elsif lookahead == SYM_LESS_THAN
    display_name_word
    while @sym == SYM_ATOM ||
        @sym == SYM_ATOM_NON_ASCII ||
        @sym == SYM_QTEXT ||
        @sym == SYM_PERIOD
      if @sym == SYM_ATOM ||
          @sym == SYM_ATOM_NON_ASCII ||
          @sym == SYM_QTEXT
        display_name_word
      else
        save_text
        get
      end
    end
    @addresses.last.display_name = get_text
    angle_addr
  else
    word
    while @sym == SYM_PERIOD
      save_text
      get
      word
    end
    @addresses.last.local = get_text
    expect(SYM_AT_SIGN)
    domain
  end
end
Parse a mailbox list.
# File lib/rmail/address.rb, line 525
def mailbox_list
  mailbox(address_lookahead)
  while @sym == SYM_COMMA
    get
    new_address
    mailbox(address_lookahead)
  end
end
# File lib/rmail/address.rb, line 331
def new_address
  reset_errors
  @addresses.push(Address.new)
end
Parse this:
obs_domain = atom *( "." atom ) .
# File lib/rmail/address.rb, line 582
def obs_domain
  expect_save(SYM_ATOM)
  while @sym == SYM_PERIOD
    save_text
    get
    expect_save(SYM_ATOM)
  end
end
Parse this:
obs_domain_list = "@" domain *( *( "," ) "@" domain )
# File lib/rmail/address.rb, line 600
def obs_domain_list
  expect(SYM_AT_SIGN)
  domain
  while @sym == SYM_COMMA || @sym == SYM_AT_SIGN
    while @sym == SYM_COMMA
      get
    end
    expect(SYM_AT_SIGN)
    domain
  end
end
Parse this:
obs_route = obs_domain_list ":"
# File lib/rmail/address.rb, line 593
def obs_route
  obs_domain_list
  expect(SYM_COLON)
end
Put a token back into the input stream. This token will be retrieved by the next call to get.
# File lib/rmail/address.rb, line 614
def putback(sym, lexeme)
  @tokens.push([sym, lexeme])
end
Put back an array of tokens into the input stream.
# File lib/rmail/address.rb, line 619
def putback_array(a)
  a.reverse_each { |e| putback(*e) }
end
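Because get pops from the end of @tokens, the put-back store behaves as a stack, which is why putback_array pushes its elements in reverse. The following standalone sketch shows the effect; SketchTokenBuffer is a hypothetical class, and it reads fresh tokens from a prepared array rather than from a tokenizer.

```ruby
# Minimal model of the get / putback / putback_array trio.
class SketchTokenBuffer
  def initialize(source)
    @source = source.dup # tokens not yet read, in input order
    @tokens = []         # put-back tokens, used as a stack
  end

  # Put-back tokens take priority over unread input.
  def get
    @tokens.empty? ? @source.shift : @tokens.pop
  end

  def putback(sym, lexeme)
    @tokens.push([sym, lexeme])
  end

  # Pushing in reverse means the first element ends up on top of the
  # stack, so tokens come back out in their original order.
  def putback_array(a)
    a.reverse_each { |e| putback(*e) }
  end
end

buffer = SketchTokenBuffer.new([[:atom, "a"], [:atom, "b"], [:comma, ","]])
first  = buffer.get
second = buffer.get
buffer.putback_array([first, second])
replayed = [buffer.get, buffer.get, buffer.get]
puts replayed.inspect
```

After the put-back, the three reads return the tokens in exactly the order they first appeared, which is what lets address_lookahead scan ahead and then rewind.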
# File lib/rmail/address.rb, line 324
def reset_errors
  if @errors > 0
    @addresses.pop
    @errors = 0
  end
end
Save the current lexeme away for later retrieval with get_text.
# File lib/rmail/address.rb, line 356
def save_text
  @lexemes << @lexeme
end
# File lib/rmail/address.rb, line 756
def sync(token)
  while @sym && @sym != token
    error "expected #{token.inspect} but got #{@sym.inspect}"
    get
  end
end
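The sync method implements a simple form of error recovery: it discards tokens, counting each one as an error, until it reaches the expected token or the end of input. A minimal sketch of that loop over an array of symbols follows; skip_to is a hypothetical name, and it returns the error count instead of updating @errors.

```ruby
# Discards leading tokens until `target` is at the front or the input
# is exhausted. Mutates `tokens` and returns the number of tokens
# skipped, each of which sync would record as an error.
def skip_to(tokens, target)
  errors = 0
  while tokens.first && tokens.first != target
    errors += 1
    tokens.shift
  end
  errors
end

tokens = [:atom, :period, :comma, :atom]
errors = skip_to(tokens, :comma)
puts "skipped #{errors}, remaining #{tokens.inspect}"
```

In the parser, a nonzero error count causes reset_errors to pop the address under construction, so resynchronizing on the next comma loses at most one address from the list.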
Parse this:
word = atom | quotedString
# File lib/rmail/address.rb, line 515
def word
  if @sym == SYM_ATOM || @sym == SYM_QTEXT
    save_text
    get
  else
    error "expected word, got #{@sym.inspect}"
  end
end