Mechanize

Synopsis

The Mechanize library automates interaction with websites. It can follow links and submit forms, populating form fields before submission. A history of visited URLs is maintained and can be queried.

Example

require 'rubygems'
require 'mechanize'
require 'logger'

agent = Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Mac Safari'
page = agent.get("http://www.google.com/")
search_form = page.form_with(:name => "f")
search_form.field_with(:name => "q").value = "Hello"
search_results = agent.submit(search_form)
puts search_results.body

Constants

AGENT_ALIASES

User Agent aliases

VERSION

The version of Mechanize you are using.

Attributes

html_parser[RW]
log[RW]
ca_file[RW]
cert[RW]
conditional_requests[RW]
follow_meta_refresh[RW]
follow_redirect?[RW]

Controls how this agent deals with redirects. If it is set to true or :all, all 3xx redirects are automatically followed. This is the default behavior. If it is :permanent, only 301 (Moved Permanently) redirects are followed. If it is a false value, no redirects are followed.

gzip_enabled[RW]
history[R]
history_added[RW]
html_parser[RW]

The HTML parser to be used when parsing documents

keep_alive[RW]
keep_alive_time[RW]
key[RW]
open_timeout[RW]
pass[RW]
pluggable_parser[R]
proxy_addr[R]

Proxy settings

proxy_pass[R]
proxy_port[R]
proxy_user[R]
read_timeout[RW]
redirect_ok[RW]

Controls how this agent deals with redirects. If it is set to true or :all, all 3xx redirects are automatically followed. This is the default behavior. If it is :permanent, only 301 (Moved Permanently) redirects are followed. If it is a false value, no redirects are followed.

redirection_limit[RW]
request_headers[RW]

A hash of custom request headers

scheme_handlers[RW]
user_agent[RW]
verify_callback[RW]
watch_for_set[RW]
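
The redirect_ok values described above can be read as a small decision table. The sketch below is a hypothetical helper, not part of Mechanize. It assumes that any 3xx status counts as a redirect and that values other than true, :all, and :permanent mean "do not follow".

```ruby
# Hypothetical helper (not a Mechanize method): decides whether a response
# with the given status would be followed under a redirect_ok-style policy.
def follow_under_policy?(policy, status)
  # Assumption: every 3xx status is treated as a redirect in this sketch.
  return false unless (300..399).cover?(status)

  case policy
  when true, :all then true           # follow every redirect
  when :permanent then status == 301  # follow only 301 Moved Permanently
  else false                          # nil, false, or anything else: never follow
  end
end

follow_under_policy?(true, 302)        # followed
follow_under_policy?(:permanent, 302)  # not followed
```

The policy is checked per response, so a chain of redirects would consult it once per hop.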

Public Class Methods

inherited(child)
# File lib/mechanize.rb, line 120
def inherited(child)
  child.html_parser ||= html_parser
  child.log ||= log
  super
end
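
The inherited hook above copies class-level settings to a subclass at the moment the subclass is defined. A minimal standalone sketch of the same pattern, using a hypothetical BaseAgent class and placeholder values rather than Mechanize itself:

```ruby
# Hypothetical classes for illustration only.
class BaseAgent
  class << self
    attr_accessor :html_parser, :log

    # Called by Ruby whenever a subclass of BaseAgent is created.
    def inherited(child)
      child.html_parser ||= html_parser
      child.log ||= log
      super
    end
  end
end

BaseAgent.html_parser = :placeholder_parser

class ChildAgent < BaseAgent; end

ChildAgent.html_parser  # the value copied from BaseAgent at definition time
```

Because the copy happens once, changing the setting on the parent afterwards does not affect subclasses that already exist in this sketch.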
new()
# File lib/mechanize.rb, line 127
def initialize
  # attr_accessors
  @cookie_jar     = CookieJar.new
  @log            = nil
  @open_timeout   = nil
  @read_timeout   = nil
  @user_agent     = AGENT_ALIASES['Mechanize']
  @watch_for_set  = nil
  @history_added  = nil
  @ca_file        = nil # OpenSSL server certificate file

  # callback for OpenSSL errors while verifying the server certificate
  # chain, can be used for debugging or to ignore errors by always
  # returning _true_
  @verify_callback = nil
  @cert           = nil # OpenSSL Certificate
  @key            = nil # OpenSSL Private Key
  @pass           = nil # OpenSSL Password
  @redirect_ok    = true
  @gzip_enabled   = true

  # attr_readers
  @history        = Mechanize::History.new
  @pluggable_parser = PluggableParser.new

  # Auth variables
  @user           = nil # Auth User
  @password       = nil # Auth Password
  @digest         = nil # DigestAuth Digest
  @auth_hash      = {}  # Keep track of urls for sending auth
  @request_headers= {}  # A hash of request headers to be used

  @conditional_requests = true

  @follow_meta_refresh  = false
  @redirection_limit    = 20

  # Connection Cache & Keep alive
  @keep_alive_time  = 300
  @keep_alive       = true

  @scheme_handlers  = Hash.new { |h,k|
    h[k] = lambda { |link, page|
      raise UnsupportedSchemeError.new(k)
    }
  }
  @scheme_handlers['http']      = lambda { |link, page| link }
  @scheme_handlers['https']     = @scheme_handlers['http']
  @scheme_handlers['relative']  = @scheme_handlers['http']
  @scheme_handlers['file']      = @scheme_handlers['http']

  @pre_connect_hook = Chain::PreConnectHook.new
  @post_connect_hook = Chain::PostConnectHook.new

  set_http
  @html_parser = self.class.html_parser

  yield self if block_given?
end
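
The scheme_handlers table built in the constructor relies on a Hash default block: looking up an unregistered scheme stores and returns a handler that raises. The sketch below reproduces that idea in isolation. SketchUnsupportedSchemeError and build_scheme_handlers are hypothetical stand-ins, not Mechanize's own definitions.

```ruby
# Stand-in error class; the real UnsupportedSchemeError may differ.
class SketchUnsupportedSchemeError < RuntimeError
  attr_reader :scheme

  def initialize(scheme)
    @scheme = scheme
    super("unsupported scheme: #{scheme}")
  end
end

# Builds a handler table in the same style as the constructor above.
def build_scheme_handlers
  handlers = Hash.new do |h, k|
    # Unknown schemes get a handler that raises when it is called.
    h[k] = lambda { |link, page| raise SketchUnsupportedSchemeError.new(k) }
  end
  handlers['http']  = lambda { |link, page| link }
  handlers['https'] = handlers['http']
  handlers
end

handlers = build_scheme_handlers
handlers['http'].call('http://example.com/', nil)  # returns the link unchanged

# Registering a handler for another scheme replaces the raising default.
handlers['mailto'] = lambda { |link, page| nil }
```

Note that the lookup itself never raises. The error only appears when the default handler is invoked.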

Public Instance Methods

auth(user, password)

Sets the user and password to be used for authentication.

# File lib/mechanize.rb, line 225
def auth(user, password)
  @user       = user
  @password   = password
end
Also aliased as: basic_auth
back()

Equivalent to the browser back button. Removes the most recent page from the history and returns it.

# File lib/mechanize.rb, line 347
def back
  @history.pop
end
basic_auth(user, password)
Alias for: auth
click(link)

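If the parameter is a string or regular expression, finds the link whose text matches it or, failing that, the submit button whose value matches it, and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched, or nil if no matching link or button is found.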

# File lib/mechanize.rb, line 324
def click(link)
  case link
  when String, Regexp
    if real_link = page.link_with(:text => link)
      click real_link
    else
      button = nil
      form = page.forms.find do |f|
        button = f.button_with(:value => link)
        button.is_a? Form::Submit
      end
      submit form, button if form
    end
  else
    referer = link.page rescue referer = nil
    href = link.respond_to?(:href) ? link.href :
      (link['href'] || link['src'])
    get(:url => href, :referer => (referer || current_page()))
  end
end
cookies()

Returns a list of cookies stored in the cookie jar.

# File lib/mechanize.rb, line 220
def cookies
  @cookie_jar.to_a
end
current_page()

Returns the current page loaded by Mechanize

# File lib/mechanize.rb, line 435
def current_page
  @history.last
end
Also aliased as: page
delete(url, query_params = {}, options = {})

Sends a DELETE request to url with query_params, applying the given options:

delete('http://tenderlovemaking.com/', {'q' => 'foo'}, :headers => {})
# File lib/mechanize.rb, line 292
def delete(url, query_params = {}, options = {})
  page = head(url, query_params, options.merge({:verb => :delete}))
  add_to_history(page)
  page
end
get(options, parameters = [], referer = nil)

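Fetches the URL passed in and returns a page. The first argument is either a URL, optionally followed by request parameters and a referer, or an options hash with a required :url key and optional :params, :referer, :headers, and :verb keys. If a block is given, the page is yielded to it.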

# File lib/mechanize.rb, line 232
def get(options, parameters = [], referer = nil)
  verb = :get

  unless options.is_a? Hash
    url = options
    unless parameters.respond_to?(:each) # FIXME: Remove this in 0.8.0
      referer = parameters
      parameters = []
    end
  else
    raise ArgumentError.new("url must be specified") unless url = options[:url]
    parameters = options[:params] || []
    referer    = options[:referer]
    headers    = options[:headers]
    verb       = options[:verb] || verb
  end

  unless referer
    if url.to_s =~ %r{\Ahttps?://}
      referer = Page.new(nil, {'content-type'=>'text/html'})
    else
      referer = current_page || Page.new(nil, {'content-type'=>'text/html'})
    end
  end

  # FIXME: Huge hack so that using a URI as a referer works.  I need to
  # refactor everything to pass around URIs but still support
  # Mechanize::Page#base
  unless referer.is_a?(Mechanize::File)
    referer = referer.is_a?(String) ?
    Page.new(URI.parse(referer), {'content-type' => 'text/html'}) :
      Page.new(referer, {'content-type' => 'text/html'})
  end

  # fetch the page
  page = fetch_page(  :uri      => url,
                      :referer  => referer,
                      :headers  => headers || {},
                      :verb     => verb,
                      :params   => parameters
                      )
  add_to_history(page)
  yield page if block_given?
  page
end
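
The first part of get only normalizes its arguments, which can be hard to follow in the listing above. The sketch below isolates that step. normalize_get_args is a hypothetical helper written for illustration and returns a plain hash instead of fetching anything.

```ruby
# Hypothetical helper mirroring the argument handling at the top of get.
def normalize_get_args(options, parameters = [], referer = nil)
  if options.is_a?(Hash)
    # Hash form: :url is required, everything else is optional.
    url = options[:url]
    raise ArgumentError, "url must be specified" unless url
    {
      :url     => url,
      :params  => options[:params] || [],
      :referer => options[:referer],
      :headers => options[:headers] || {},
      :verb    => options[:verb] || :get
    }
  else
    # Positional form. A second argument that is not enumerable is
    # treated as the referer, matching the legacy get(url, referer) call.
    unless parameters.respond_to?(:each)
      referer    = parameters
      parameters = []
    end
    { :url => options, :params => parameters, :referer => referer,
      :headers => {}, :verb => :get }
  end
end

normalize_get_args('http://example.com/', [['q', 'ruby']])
normalize_get_args(:url => 'http://example.com/', :verb => :head)
```

Both calling styles end up as the same set of values, which is what the rest of get passes on to fetch_page.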
get_file(url)

Fetch a file and return the contents of the file.

# File lib/mechanize.rb, line 317
def get_file(url)
  get(url).body
end
head(url, query_params = {}, options = {})

Sends a HEAD request to url with query_params, applying the given options:

head('http://tenderlovemaking.com/', {'q' => 'foo'}, :headers => {})
# File lib/mechanize.rb, line 303
def head(url, query_params = {}, options = {})
  options = {
    :uri      => url,
    :headers  => {},
    :params   => query_params,
    :verb     => :head
  }.merge(options)
  # fetch the page
  page = fetch_page(options)
  yield page if block_given?
  page
end
log()
# File lib/mechanize.rb, line 190
def log; self.class.log end
log=(l)
# File lib/mechanize.rb, line 189
def log=(l); self.class.log = l end
max_history()
# File lib/mechanize.rb, line 188
def max_history; @history.max_size end
max_history=(length)
# File lib/mechanize.rb, line 187
def max_history=(length); @history.max_size = length end
page()
Alias for: current_page
post(url, query={}, headers={})

Posts to the given URL with the request entity. The request entity is specified by either a string, or a list of key-value pairs represented by a hash or an array of arrays.

Examples:

agent.post('http://example.com/', "foo" => "bar")

agent.post('http://example.com/', [ ["foo", "bar"] ])

agent.post('http://example.com/', "<message>hello</message>", 'Content-Type' => 'application/xml')
# File lib/mechanize.rb, line 361
def post(url, query={}, headers={})
  if query.is_a?(String)
    return request_with_entity(:post, url, query, :headers => headers)
  end
  node = {}
  # Create a fake form
  class << node
    def search(*args); []; end
  end
  node['method'] = 'POST'
  node['enctype'] = 'application/x-www-form-urlencoded'

  form = Form.new(node)
  query.each { |k,v|
    if v.is_a?(IO)
      form.enctype = 'multipart/form-data'
      ul = Form::FileUpload.new({'name' => k.to_s},::File.basename(v.path))
      ul.file_data = v.read
      form.file_uploads << ul
    else
      form.fields << Form::Field.new({'name' => k.to_s},v)
    end
  }
  post_form(url, form, headers)
end
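
In the listing above, post switches the form to multipart encoding as soon as one value is an IO object, and it iterates a hash and an array of pairs the same way. The sketch below shows just that decision. enctype_for is a hypothetical helper, not a Mechanize method.

```ruby
# Hypothetical helper: picks the encoding post would end up with for a query
# given as a hash or as an array of [name, value] pairs.
def enctype_for(query)
  # Both Hash and Array-of-pairs yield (name, value) to the block.
  has_file = query.any? { |_name, value| value.is_a?(IO) }
  has_file ? 'multipart/form-data' : 'application/x-www-form-urlencoded'
end

enctype_for('foo' => 'bar')
enctype_for([['foo', 'bar']])

# File objects are IO instances, so they trigger multipart encoding.
File.open(File::NULL) do |f|
  enctype_for('upload' => f)
end
```

Since the check is is_a?(IO), a StringIO value would not count as a file in this sketch, because StringIO does not inherit from IO.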
post_connect_hooks()
# File lib/mechanize.rb, line 196
def post_connect_hooks
  @post_connect_hook.hooks
end
pre_connect_hooks()
# File lib/mechanize.rb, line 192
def pre_connect_hooks
  @pre_connect_hook.hooks
end
put(url, entity, options = {})

Sends a PUT request to url with entity as the request body, applying the given options:

put('http://tenderlovemaking.com/', 'new content', :headers => {'Content-Type' => 'text/plain'})
# File lib/mechanize.rb, line 283
def put(url, entity, options = {})
  request_with_entity(:put, url, entity, options)
end
request_with_entity(verb, url, entity, options={})
# File lib/mechanize.rb, line 409
def request_with_entity(verb, url, entity, options={})
  cur_page = current_page || Page.new( nil, {'content-type'=>'text/html'})

  options = {
    :uri      => url,
    :referer  => cur_page,
    :headers  => {},
  }.update(options)

  headers = {
    'Content-Type' => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update(options[:headers])

  options.update({
                   :verb => verb,
                   :params => [entity],
                   :headers => headers,
                 })

  page = fetch_page(options)
  add_to_history(page)
  page
end
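
The header handling in request_with_entity is a defaults-then-override merge. The sketch below isolates it. entity_headers is a hypothetical helper that follows the listing above and performs no request.

```ruby
# Hypothetical helper: builds the headers for an entity request, starting
# from the defaults and letting caller-supplied headers win.
def entity_headers(entity, overrides = {})
  {
    'Content-Type'   => 'application/octet-stream',
    'Content-Length' => entity.size.to_s
  }.update(overrides)
end

entity_headers('new content')
entity_headers('<message>hello</message>', 'Content-Type' => 'application/xml')
```

The keys are ordinary strings, so in this sketch an override only replaces a default when it is spelled with the same capitalization.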
set_proxy(addr, port, user = nil, pass = nil)

Sets the proxy address, port, user, and password. addr should be a host name, with no "http://" prefix.

# File lib/mechanize.rb, line 202
def set_proxy(addr, port, user = nil, pass = nil)
  proxy = URI.parse "http://#{addr}"
  proxy.port = port
  proxy.user     = user if user
  proxy.password = pass if pass

  set_http proxy

  nil
end
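
set_proxy prepends "http://" itself, which is why addr must be a bare host. The sketch below repeats only the URI construction, using Ruby's standard uri library. proxy_uri is a hypothetical helper, and the host and credentials are placeholders.

```ruby
require 'uri'

# Hypothetical helper: builds the proxy URI the same way set_proxy does,
# but returns it instead of reconfiguring a connection.
def proxy_uri(addr, port, user = nil, pass = nil)
  proxy = URI.parse("http://#{addr}")
  proxy.port     = port
  proxy.user     = user if user
  proxy.password = pass if pass
  proxy
end

uri = proxy_uri('proxy.example.com', 8080, 'someuser', 'secret')
uri.host  # the bare host that was passed in
uri.port  # the port that was passed in
```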
submit(form, button=nil, headers={})

Submit a form with an optional button. Without a button:

page = agent.get('http://example.com')
agent.submit(page.forms.first)

With a button:

agent.submit(page.forms.first, page.forms.first.buttons.first)
# File lib/mechanize.rb, line 393
def submit(form, button=nil, headers={})
  form.add_button_to_query(button) if button
  case form.method.upcase
  when 'POST'
    post_form(form.action, form, headers)
  when 'GET'
    get(  :url      => form.action.gsub(/\?[^\?]*$/, ''),
          :params   => form.build_query,
          :headers  => headers,
          :referer  => form.page
          )
  else
    raise "unsupported method: #{form.method.upcase}"
  end
end
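
submit dispatches on the form's method, and for GET it strips any query string already present in the action before the form's own query is added. The sketch below shows that dispatch on plain strings. submission_plan is a hypothetical helper that returns a description instead of sending a request.

```ruby
# Hypothetical helper: reports which verb and target URL submit would use.
def submission_plan(method, action)
  case method.upcase
  when 'POST'
    [:post, action]
  when 'GET'
    # Drop an existing query string so the form's fields replace it.
    [:get, action.gsub(/\?[^\?]*$/, '')]
  else
    raise "unsupported method: #{method.upcase}"
  end
end

submission_plan('get', '/search?q=old')
submission_plan('post', '/login?next=/home')
```

Only the GET branch rewrites the action. A POST keeps whatever query string the action already carries.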
transact()

Runs given block, then resets the page history as it was before. self is given as a parameter to the block. Returns the value of the block.

# File lib/mechanize.rb, line 454
def transact
  history_backup = @history.dup
  begin
    yield self
  ensure
    @history = history_backup
  end
end
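
The effect of transact is easiest to see with a toy history. The sketch below uses a hypothetical TinyAgent class whose history is a plain array of strings. It copies the transact body from the listing above and nothing else from Mechanize.

```ruby
# Hypothetical stand-in for an agent; visit only records the URL.
class TinyAgent
  attr_reader :history

  def initialize
    @history = []
  end

  def visit(url)
    @history << url
    url
  end

  # Same shape as transact above: snapshot, run the block, restore.
  def transact
    history_backup = @history.dup
    begin
      yield self
    ensure
      @history = history_backup
    end
  end
end

agent = TinyAgent.new
agent.visit('/start')
pages_seen = agent.transact do |a|
  a.visit('/detour')
  a.history.size   # value of the block becomes the value of transact
end
agent.history      # back to the state before the block
```

Because the restore sits in an ensure clause, the history is also reset when the block raises.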
user_agent_alias=(al)

Sets the user agent for the Mechanize object from one of the names in AGENT_ALIASES. Raises a RuntimeError if the alias is unknown.

# File lib/mechanize.rb, line 215
def user_agent_alias=(al)
  self.user_agent = AGENT_ALIASES[al] || raise("unknown agent alias")
end
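
The lookup in user_agent_alias= uses the "value or raise" idiom. The sketch below shows it in isolation. SKETCH_ALIASES and its entries are made-up placeholders, not the contents of AGENT_ALIASES, and resolve_agent_alias is a hypothetical helper.

```ruby
# Placeholder table; the real AGENT_ALIASES holds full user agent strings.
SKETCH_ALIASES = {
  'Example Browser' => 'ExampleBrowser/1.0 (placeholder)',
  'Example Bot'     => 'ExampleBot/0.1 (placeholder)'
}

# Hypothetical helper: returns the string for a known alias, raises otherwise.
def resolve_agent_alias(name)
  SKETCH_ALIASES[name] || raise("unknown agent alias")
end

resolve_agent_alias('Example Browser')

begin
  resolve_agent_alias('No Such Browser')
rescue RuntimeError => e
  e.message  # the message passed to raise
end
```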
visited?(url)

Returns whether or not a url has been visited

# File lib/mechanize.rb, line 440
def visited?(url)
  ! visited_page(url).nil?
end
visited_page(url)

Returns a visited page for the url passed in, otherwise nil

# File lib/mechanize.rb, line 445
def visited_page(url)
  if url.respond_to? :href
    url = url.href
  end
  @history.visited_page(resolve(url))
end
