class WebRobots
Public Class Methods
new(user_agent, options = nil)
Creates a WebRobots object for a robot named user_agent, with optional options.
- :http_get => a custom method, proc, or anything that responds to .call(uri), to be used for fetching robots.txt. It must return the response body if successful, return an empty string if the resource is not found, and return nil or raise an error on failure. Redirects should be handled within this proc.
- :crawl_delay => determines how to react to Crawl-delay directives. If :sleep is given, WebRobots sleeps as demanded when allowed?(url)/disallowed?(url) is called; this is the default behavior. If :ignore is given, WebRobots does nothing. If a custom method, proc, or anything that responds to .call(delay, last_checked_at) is given, it is called. Both options are demonstrated in the sketch after this list.
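As a rough illustration (not part of the library), here is a sketch of both options. The names fetch and delay_handler are hypothetical, the redirect limit of 5 is an arbitrary choice, and the fetcher assumes absolute Location headers:

  require 'webrobots'
  require 'net/http'
  require 'uri'

  # Hypothetical fetcher honoring the :http_get contract above,
  # following up to 5 redirects itself.
  fetch = lambda do |uri|
    5.times do
      response = Net::HTTP.get_response(uri)
      case response
      when Net::HTTPSuccess     then return response.body
      when Net::HTTPNotFound    then return ''  # no robots.txt found
      when Net::HTTPRedirection then uri = URI(response['location'])
      else return nil                           # any other status: failure
      end
    end
    nil  # too many redirects: failure
  end

  # Hypothetical Crawl-delay handler that logs instead of sleeping.
  delay_handler = lambda do |delay, last_checked_at|
    warn "Crawl-delay of #{delay}s requested (last checked at #{last_checked_at})"
  end

  robots = WebRobots.new('MyBot/1.0',
                         :http_get    => fetch,
                         :crawl_delay => delay_handler)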
# File lib/webrobots.rb, line 28
def initialize(user_agent, options = nil)
  @user_agent = user_agent
  options ||= {}
  @http_get = options[:http_get] || method(:http_get)
  crawl_delay_handler =
    case value = options[:crawl_delay] || :sleep
    when :ignore
      nil
    when :sleep
      method(:crawl_delay_handler)
    else
      if value.respond_to?(:call)
        value
      else
        raise ArgumentError, "invalid Crawl-delay handler: #{value.inspect}"
      end
    end
  @parser = RobotsTxt::Parser.new(user_agent, crawl_delay_handler)
  @parser_mutex = Mutex.new
  @robotstxt = create_cache()
end
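For context, a minimal usage sketch, assuming the allowed?(url)/disallowed?(url) methods mentioned above accept URL strings; example.com is a placeholder. With the default :crawl_delay => :sleep, each query may pause as robots.txt demands:

  robots = WebRobots.new('MyBot/1.0')
  robots.allowed?('http://example.com/')          # true if fetching is permitted
  robots.disallowed?('http://example.com/admin')  # the inverse query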