Class: Ronin::Web::Spider::Agent

Inherits:
Spidr::Agent
  • Object
Defined in:
lib/ronin/web/spider/agent.rb

Overview

Extends Spidr::Agent.

Constructor Details

#initialize(proxy: Support::Network::HTTP.proxy, user_agent: Support::Network::HTTP.user_agent, **kwargs) {|agent| ... } ⇒ Agent

Creates a new web spider agent.

Parameters:

  • proxy (Spidr::Proxy, Addressable::URI, URI::HTTP, Hash, String, nil) (defaults to: Support::Network::HTTP.proxy)

    The proxy to use while spidering.

  • user_agent (String, Symbol, nil) (defaults to: Support::Network::HTTP.user_agent)

    The User-Agent string to send. A Symbol is looked up in Support::Network::HTTP::UserAgents.

  • kwargs (Hash{Symbol => Object})

    Additional keyword arguments for Spidr::Agent#initialize.

Options Hash (**kwargs):

  • :referer (String, nil)

    The referer URL to send.

  • :delay (Integer) — default: 0

    Duration in seconds to pause between spidering each link.

  • :schemes (Array) — default: ['http', 'https']

    The list of acceptable URI schemes to visit. The https scheme will be ignored if net/https cannot be loaded.

  • :host (String, nil)

    The host-name to visit.

  • :hosts (Array<String, Regexp, Proc>)

    The patterns which match the host-names to visit.

  • :ignore_hosts (Array<String, Regexp, Proc>)

    The patterns which match the host-names to not visit.

  • :ports (Array<Integer, Regexp, Proc>)

    The patterns which match the ports to visit.

  • :ignore_ports (Array<Integer, Regexp, Proc>)

    The patterns which match the ports to not visit.

  • :links (Array<String, Regexp, Proc>)

    The patterns which match the links to visit.

  • :ignore_links (Array<String, Regexp, Proc>)

    The patterns which match the links to not visit.

  • :exts (Array<String, Regexp, Proc>)

    The patterns which match the URI path extensions to visit.

  • :ignore_exts (Array<String, Regexp, Proc>)

    The patterns which match the URI path extensions to not visit.

Yields:

  • (agent)

    If a block is given, it will be passed the newly created web spider agent.

Yield Parameters:

  • agent (Agent)

    The newly created web spider agent.

# File 'lib/ronin/web/spider/agent.rb', line 96

def initialize(proxy:      Support::Network::HTTP.proxy,
               user_agent: Support::Network::HTTP.user_agent,
               **kwargs,
               &block)
  proxy = case proxy
          when Addressable::URI
            Spidr::Proxy.new(
              host:     proxy.host,
              port:     proxy.port,
              user:     proxy.user,
              password: proxy.password
            )
          else
            proxy
          end

  user_agent = case user_agent
               when Symbol
                 Support::Network::HTTP::UserAgents[user_agent]
               else
                 user_agent
               end

  super(proxy: proxy, user_agent: user_agent, **kwargs,&block)
end
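
The constructor normalizes proxy and user_agent before delegating to Spidr::Agent#initialize. The following is a minimal, self-contained sketch of that normalization pattern, not the library's code: FakeProxy, USER_AGENTS, normalize_proxy and normalize_user_agent are hypothetical names, and the stdlib URI class stands in for Addressable::URI so the sketch runs without extra gems.

```ruby
require 'uri'

# Hypothetical stand-in for Spidr::Proxy.
FakeProxy = Struct.new(:host, :port, :user, :password, keyword_init: true)

# Hypothetical stand-in for Support::Network::HTTP::UserAgents.
USER_AGENTS = {
  example: 'ExampleBrowser/1.0'
}.freeze

# Converts a parsed URI into a proxy object; passes anything else through.
def normalize_proxy(proxy)
  case proxy
  when URI::Generic
    FakeProxy.new(
      host:     proxy.host,
      port:     proxy.port,
      user:     proxy.user,
      password: proxy.password
    )
  else
    proxy
  end
end

# Resolves a Symbol alias to a User-Agent string; passes anything else through.
def normalize_user_agent(user_agent)
  case user_agent
  when Symbol then USER_AGENTS[user_agent]
  else             user_agent
  end
end

proxy = normalize_proxy(URI.parse('http://user:secret@proxy.example.com:8080'))
puts proxy.host
puts normalize_user_agent(:example)
```

Values that are neither a parsed URI nor a Symbol (a String, a Hash, nil) pass through unchanged, which mirrors the else branches in the source above.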

Instance Attribute Details

#collected_certs ⇒ Array<Ronin::Support::Crypto::Cert> (readonly)

All certificates encountered while spidering.

Returns:

  • (Array<Ronin::Support::Crypto::Cert>)


# File 'lib/ronin/web/spider/agent.rb', line 161

def collected_certs
  @collected_certs
end

#visited_hosts ⇒ Set<String>? (readonly)

The visited host names.

Returns:

  • (Set<String>, nil)


# File 'lib/ronin/web/spider/agent.rb', line 127

def visited_hosts
  @visited_hosts
end

Instance Method Details

#every_cert {|cert| ... } ⇒ Object

Passes every unique TLS certificate to the given block and populates #collected_certs.

Examples:

spider.every_cert do |cert|
  puts "Discovered new cert for #{cert.subject.common_name}, #{cert.subject_alt_name}"
end

Yields:

  • (cert)

Yield Parameters:

  • cert (Ronin::Support::Crypto::Cert)


# File 'lib/ronin/web/spider/agent.rb', line 178

def every_cert
  @collected_certs ||= []

  serials = Set.new

  every_page do |page|
    if page.url.scheme == 'https'
      cert = sessions[page.url].peer_cert

      if serials.add?(cert.serial)
        cert = Support::Crypto::Cert(cert)

        @collected_certs << cert
        yield cert
      end
    end
  end
end
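
The uniqueness check above relies on Set#add?, which returns nil when the element is already in the set. A minimal sketch of that deduplication step, assuming a hypothetical FakeCert struct and each_unique_cert helper in place of real TLS sessions:

```ruby
require 'set'

# Hypothetical stand-in for a certificate; only the serial matters here.
FakeCert = Struct.new(:serial, :subject)

# Yields each certificate whose serial has not been seen before and
# returns the collected unique certificates.
def each_unique_cert(certs)
  collected = []
  serials   = Set.new

  certs.each do |cert|
    # Set#add? returns nil when the serial is already present.
    if serials.add?(cert.serial)
      collected << cert
      yield cert if block_given?
    end
  end

  collected
end

certs = [
  FakeCert.new(1, 'example.com'),
  FakeCert.new(2, 'example.org'),
  FakeCert.new(1, 'example.com')
]

each_unique_cert(certs) { |cert| puts "#{cert.serial}: #{cert.subject}" }
```

The third certificate repeats serial 1, so the block runs only for the first two.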

#every_comment {|comment| ... } ⇒ Object

Passes every HTML and JavaScript comment to the given block.

Examples:

spider.every_comment do |comment|
  puts comment
end

Yields:

  • (comment)

    The given block will be passed each HTML or JavaScript comment.

Yield Parameters:

  • comment (String)

    The contents of an HTML or JavaScript comment.

# File 'lib/ronin/web/spider/agent.rb', line 354

def every_comment(&block)
  every_html_comment(&block)
  every_javascript_comment(&block)
end

#every_favicon {|favicon| ... } ⇒ Object

Passes every favicon from every page to the given block.

Examples:

spider.every_favicon do |page|
  # ...
end

Yields:

  • (favicon)

    The given block will be passed every encountered .ico file.

Yield Parameters:

  • favicon (Spidr::Page)

    An encountered .ico file.

# File 'lib/ronin/web/spider/agent.rb', line 215

def every_favicon
  every_page do |page|
    yield page if page.icon?
  end
end

#every_host {|host| ... } ⇒ Object

Passes every unique host name that the agent visits to the given block and populates #visited_hosts.

Examples:

spider.every_host do |host|
  puts "Spidering #{host} ..."
end

Yields:

  • (host)

Yield Parameters:

  • host (String)


# File 'lib/ronin/web/spider/agent.rb', line 144

def every_host
  @visited_hosts ||= Set.new

  every_page do |page|
    host = page.url.host

    if @visited_hosts.add?(host)
      yield host
    end
  end
end
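
Because @visited_hosts is created lazily inside #every_host, #visited_hosts returns nil until #every_host has been called, which is why its return type includes nil. A minimal sketch of that behavior, assuming a hypothetical HostTracker class that iterates a fixed URL list immediately (the real method instead hooks into every_page, as the source above shows):

```ruby
require 'set'
require 'uri'

# Hypothetical stand-in for the agent: it "visits" a fixed list of URLs.
class HostTracker
  # nil until #every_host has been called at least once.
  attr_reader :visited_hosts

  def initialize(urls)
    @urls = urls
  end

  # Yields each host name the first time it is seen.
  def every_host
    @visited_hosts ||= Set.new

    @urls.each do |url|
      host = URI.parse(url).host

      yield host if @visited_hosts.add?(host)
    end
  end
end

tracker = HostTracker.new(%w[
  https://example.com/
  https://example.com/about
  https://www.example.com/
])

p tracker.visited_hosts
tracker.every_host { |host| puts "Spidering #{host} ..." }
p tracker.visited_hosts.to_a
```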

#every_html_comment {|comment| ... } ⇒ Object

Passes every non-empty HTML comment to the given block.

Examples:

spider.every_html_comment do |comment|
  puts comment
end

Yields:

  • (comment)

    The given block will be passed every HTML comment.

Yield Parameters:

  • comment (String)

    The HTML comment inner text, with leading and trailing whitespace stripped.



# File 'lib/ronin/web/spider/agent.rb', line 238

def every_html_comment
  every_html_page do |page|
    page.doc.xpath('//comment()').each do |comment|
      comment_text = comment.inner_text.strip

      unless comment_text.empty?
        yield comment_text
      end
    end
  end
end
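
The strip-then-skip-empty filtering above can be tried without a spider. The sketch below swaps the XPath query for a simplified regexp so it runs with the standard library alone; HTML_COMMENT and each_html_comment are hypothetical names, and the regexp is an assumption that does not handle every case a real HTML parser does.

```ruby
# Simplified pattern for HTML comments; a real parser handles edge cases
# (for example comment-like text inside <script>) that this regexp does not.
HTML_COMMENT = /<!--(.*?)-->/m

# Yields the stripped text of every non-empty HTML comment in `html`.
def each_html_comment(html)
  html.scan(HTML_COMMENT) do |(text)|
    comment_text = text.strip

    yield comment_text unless comment_text.empty?
  end
end

html = <<~HTML
  <html>
    <!-- TODO: remove debug endpoint -->
    <!--   -->
    <body><!--build 42--></body>
  </html>
HTML

each_html_comment(html) { |comment| puts comment }
```

The whitespace-only comment is dropped, matching the "non-empty" wording in the method description.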

#every_javascript {|js| ... } ⇒ Object Also known as: every_js

Passes every piece of JavaScript to the given block.

Examples:

spider.every_javascript do |js|
  puts js
end

Yields:

  • (js)

    The given block will be passed every piece of JavaScript source.

Yield Parameters:

  • js (String)

    The JavaScript source code.



# File 'lib/ronin/web/spider/agent.rb', line 266

def every_javascript
  # yield inner text of every `<script type="text/javascript">` tag
  # and every `.js` URL.
  every_html_page do |page|
    page.doc.xpath('//script[@type="text/javascript"]').each do |script|
      unless script.inner_text.empty?
        yield script.inner_text
      end
    end
  end

  every_javascript_page do |page|
    yield page.body
  end
end

#every_javascript_comment {|comment| ... } ⇒ Object Also known as: every_js_comment

Passes every JavaScript comment to the given block.

Examples:

spider.every_javascript_comment do |comment|
  puts comment
end

Yields:

  • (comment)

    The given block will be passed each JavaScript comment.

Yield Parameters:

  • comment (String)

    The contents of a JavaScript comment.



# File 'lib/ronin/web/spider/agent.rb', line 327

def every_javascript_comment(&block)
  every_javascript do |js|
    js.scan(Support::Text::Patterns::JAVASCRIPT_COMMENT,&block)
  end
end
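
The method above hands its block straight to String#scan, which calls the block once per match. A minimal sketch of that pattern follows; JS_COMMENT is a simplified, hypothetical regexp and is not the Support::Text::Patterns::JAVASCRIPT_COMMENT pattern.

```ruby
# Simplified, hypothetical pattern: matches /* ... */ blocks and // line
# comments. It will also match comment-like text inside string literals.
JS_COMMENT = %r{/\*.*?\*/|//[^\n]*}m

# Passes every match of JS_COMMENT in `js` to the given block.
def each_js_comment(js, &block)
  js.scan(JS_COMMENT, &block)
end

js = <<~JS
  // init
  var x = 1; /* multi
  line */
  var y = 2;
JS

each_js_comment(js) { |comment| puts comment }
```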

#every_javascript_string {|string| ... } ⇒ Object Also known as: every_js_string

Passes every JavaScript string value to the given block.

Examples:

spider.every_javascript_string do |str|
  puts str
end

Yields:

  • (string)

    The given block will be passed each JavaScript string with the quote marks removed.

Yield Parameters:

  • string (String)

    The parsed contents of a JavaScript string.



# File 'lib/ronin/web/spider/agent.rb', line 301

def every_javascript_string
  every_javascript do |js|
    js.scan(Support::Text::Patterns::STRING) do |js_string|
      yield Support::Encoding::JS.unquote(js_string)
    end
  end
end
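
The scan-then-unquote flow above can be sketched with the standard library alone. JS_STRING, ESCAPES, unquote_js and each_js_string are hypothetical names; the regexp and the unquoting are deliberately simplified assumptions and do not reproduce Support::Text::Patterns::STRING or Support::Encoding::JS.unquote.

```ruby
# Simplified, hypothetical pattern for single- or double-quoted literals
# with backslash escapes.
JS_STRING = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/

# Only a few common escapes are translated; any other escaped character
# is kept as the character itself.
ESCAPES = { 'n' => "\n", 't' => "\t" }.freeze

# Strips the surrounding quotes and resolves backslash escapes.
def unquote_js(literal)
  literal[1..-2].gsub(/\\(.)/m) { ESCAPES.fetch($1, $1) }
end

# Yields the unquoted value of every string literal found in `js`.
def each_js_string(js)
  js.scan(JS_STRING) { |literal| yield unquote_js(literal) }
end

js = %q{var greeting = "hello"; var quote = 'it\'s';}

each_js_string(js) { |str| puts str }
```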