Class: Ronin::Web::CLI::Commands::Spider Private
- Inherits:
- Ronin::Web::CLI::Command
  - Object
  - Core::CLI::Command
  - Ronin::Web::CLI::Command
  - Ronin::Web::CLI::Commands::Spider
- Includes:
- CommandKit::Colors, CommandKit::Options::Verbose, CommandKit::Printing::Indent, SpiderOptions
- Defined in:
- lib/ronin/web/cli/commands/spider.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
Spiders a website.
Usage
ronin-web spider [options] {--host HOST | --domain DOMAIN | --site URL}
Options
--host HOST Spiders the specific HOST
--domain DOMAIN Spiders the whole domain
--site URL Spiders the website, starting at the URL
--open-timeout SECS Sets the connection open timeout
--read-timeout SECS Sets the read timeout
--ssl-timeout SECS Sets the SSL connection timeout
--continue-timeout SECS Sets the continue timeout
--keep-alive-timeout SECS Sets the connection keep alive timeout
-P, --proxy PROXY Sets the proxy to use
-H, --header NAME: VALUE Sets a default header
--host-header NAME=VALUE Sets a default header
-U, --user-agent-string STRING The User-Agent string to use
-u chrome-linux|chrome-macos|chrome-windows|chrome-iphone|chrome-ipad|chrome-android|firefox-linux|firefox-macos|firefox-windows|firefox-iphone|firefox-ipad|firefox-android|safari-macos|safari-iphone|safari-ipad|edge,
--user-agent The User-Agent to use
-R, --referer URL Sets the Referer URL
--delay SECS Sets the delay in seconds between each request
-l, --limit COUNT Only spiders up to COUNT pages
-d, --max-depth DEPTH Only spiders up to max depth
--enqueue URL Adds the URL to the queue
--visited URL Marks the URL as previously visited
--strip-fragments Enables/disables stripping the fragment component of every URL
--strip-query Enables/disables stripping the query component of every URL
--visit-scheme SCHEME Visit URLs with the URI scheme
--visit-schemes-like /REGEX/ Visit URLs with URI schemes that match the REGEX
--ignore-scheme SCHEME Ignore the URLs with the URI scheme
--ignore-schemes-like /REGEX/
Ignore the URLs with URI schemes matching the REGEX
--visit-host HOST Visit URLs with the matching host name
--visit-hosts-like /REGEX/ Visit URLs with hostnames that match the REGEX
--ignore-host HOST Ignore the host name
--ignore-hosts-like /REGEX/ Ignore the host names matching the REGEX
--visit-port PORT Visit URLs with the matching port number
--visit-ports-like /REGEX/ Visit URLs with port numbers that match the REGEX
--ignore-port PORT Ignore the port number
--ignore-ports-like /REGEX/ Ignore the port numbers matching the REGEX
--visit-link URL Visit the URL
--visit-links-like /REGEX/ Visit URLs that match the REGEX
--ignore-link URL Ignore the URL
--ignore-links-like /REGEX/ Ignore URLs matching the REGEX
--visit-ext FILE_EXT Visit URLs with the matching file ext
--visit-exts-like /REGEX/ Visit URLs with file exts that match the REGEX
--ignore-ext FILE_EXT Ignore the URLs with the file ext
--ignore-exts-like /REGEX/ Ignore URLs with file exts matching the REGEX
-r, --robots Specifies whether to honor robots.txt
-v, --verbose Enables verbose output
--print-status Print the status codes for each URL
--print-headers Print response headers for each URL
--print-header NAME Prints a specific header
--history FILE The history file
--archive DIR Archive every visited page to the DIR
--git-archive DIR Archive every visited page to the git repository
-X, --xpath XPATH Evaluates the XPath on each HTML page
-C, --css-path CSSPath Evaluates the CSS-path on each HTML page
--print-hosts Print all discovered hostnames
--print-certs Print all encountered SSL/TLS certificates
--save-certs Saves all encountered SSL/TLS certificates
--print-js-strings Print all JavaScript strings
--print-js-url-strings Print URL strings found in JavaScript
--print-js-path-strings Print path strings found in JavaScript
--print-js-absolute-path-strings
Only print absolute path strings found in JavaScript
--print-js-relative-path-strings
Only print relative path strings found in JavaScript
--print-html-comments Print HTML comments
--print-js-comments Print JavaScript comments
--print-comments Print all HTML and JavaScript comments
-h, --help Print help information
Examples
ronin-web spider --host scanme.nmap.org
ronin-web spider --domain nmap.org
ronin-web spider --site https://scanme.nmap.org/
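The --strip-fragments and --strip-query options listed above remove the fragment or query component of every URL. A minimal sketch of that kind of normalization, using only Ruby's standard URI library, might look like the following. The normalize_url helper and its keyword arguments are hypothetical names chosen for illustration; this is not how ronin-web or its spider library implements the options.

```ruby
require 'uri'

# Hypothetical helper (not part of ronin-web): returns a copy of the URL
# with the fragment and/or query component removed.
def normalize_url(url, strip_fragments: true, strip_query: false)
  uri = URI.parse(url)
  uri.fragment = nil if strip_fragments
  uri.query    = nil if strip_query
  uri.to_s
end

puts normalize_url('https://example.com/page?id=1#top')
puts normalize_url('https://example.com/page?id=1#top', strip_query: true)
```

Stripping fragments alone treats /page?id=1#top and /page?id=1#bottom as the same URL, while stripping the query as well treats every ?id= variant of /page as one page.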
Instance Attribute Summary
Attributes included from SpiderOptions
Instance Method Summary
-
#define_printing_callbacks(agent) ⇒ Object
private
Defines callbacks that print information.
-
#print_content(content) ⇒ Object
private
Print content from a page.
-
#print_headers(page) ⇒ Object
private
Prints the headers of a page.
-
#print_page(page) ⇒ Object
private
Prints a page.
-
#print_query(page) ⇒ Object
private
Prints the XPath or CSS-path query result for the page.
-
#print_status(page) ⇒ Object
private
Prints the status of a page.
-
#print_url(page) ⇒ Object
private
Prints the URL for a page.
-
#print_verbose(message) ⇒ Object
private
Prints an information message.
-
#run ⇒ Object
private
Runs the ronin-web spider command.
Methods included from SpiderOptions
#continue_timeout, #continue_timeout=, #default_headers, #delay, #delay=, #history, #host_headers, #ignore_exts, #ignore_hosts, #ignore_links, #ignore_ports, #ignore_schemes, included, #initialize, #keep_alive_timeout, #keep_alive_timeout=, #limit, #limit=, #max_depth, #max_depth=, #new_agent, #open_timeout, #open_timeout=, #proxy, #proxy=, #queue, #read_timeout, #read_timeout=, #referer, #referer=, #robots, #robots=, #ssl_timeout, #ssl_timeout=, #strip_fragments, #strip_fragments=, #strip_query, #strip_query=, #user_agent, #user_agent=, #visit_exts, #visit_hosts, #visit_links, #visit_ports, #visit_schemes
Instance Method Details
#define_printing_callbacks(agent) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Defines callbacks that print information.
# File 'lib/ronin/web/cli/commands/spider.rb', line 280

def define_printing_callbacks(agent)
  if options[:print_hosts]
    agent.every_host do |host|
      print_verbose "spidering new host #{host}"
    end
  end

  if options[:print_certs]
    agent.every_cert do |cert|
      print_verbose "encountered new certificate for #{cert.subject.common_name}"
    end
  end

  if options[:print_js_strings]
    agent.every_js_string do |string|
      print_content string
    end
  end

  if options[:print_js_url_strings]
    agent.every_js_url_string do |url|
      print_content url
    end
  end

  if options[:print_js_path_strings]
    agent.every_js_path_string do |path|
      print_content path
    end
  end

  if options[:print_js_absolute_path_strings]
    agent.every_js_absolute_path_string do |path|
      print_content path
    end
  end

  if options[:print_js_relative_path_strings]
    agent.every_js_relative_path_string do |path|
      print_content path
    end
  end

  if options[:print_html_comments]
    agent.every_html_comment do |comment|
      print_content comment
    end
  end

  if options[:print_js_comments]
    agent.every_js_comment do |comment|
      print_content comment
    end
  end

  if options[:print_comments]
    agent.every_comment do |comment|
      print_content comment
    end
  end
end
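The method above registers a callback only when the matching option is set, so disabled options add no work during the crawl. The pattern can be sketched without the spider library as follows. TinyAgent, its on and emit methods, and define_callbacks are hypothetical stand-ins invented for this example; the real agent exposes every_host, every_comment and similar methods instead.

```ruby
# Hypothetical stand-in for the spider agent: stores blocks per event name
# and calls them when that event is emitted.
class TinyAgent
  def initialize
    @callbacks = Hash.new { |hash, event| hash[event] = [] }
  end

  # Registers a block to run whenever the event is emitted.
  def on(event, &block)
    @callbacks[event] << block
  end

  # Calls every block registered for the event with the given value.
  def emit(event, value)
    @callbacks[event].each { |callback| callback.call(value) }
  end
end

# Registers only the callbacks that the options hash asks for.
def define_callbacks(agent, options, output)
  agent.on(:host)    { |host|    output << "host: #{host}" }       if options[:print_hosts]
  agent.on(:comment) { |comment| output << "comment: #{comment}" } if options[:print_comments]
end

output = []
agent  = TinyAgent.new
define_callbacks(agent, { print_hosts: true }, output)

agent.emit(:host, 'example.com')
agent.emit(:comment, '<!-- never recorded -->')
puts output
```

Because :print_comments is not set, no comment callback exists and the second emit does nothing.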
#print_content(content) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Print content from a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 444

def print_content(content)
  content.to_s.each_line do |line|
    puts " #{line}"
  end
end
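The to_s call matters because callers pass more than strings (print_query, for example, passes a query result). A small sketch of the same line-by-line indentation that returns a String instead of printing is shown below. The indent_content helper and its four-space default indent are assumptions made for illustration, not part of ronin-web.

```ruby
# Hypothetical helper: prefixes every line of the given content with a fixed
# indent and returns the result as a String. Any object is accepted because
# it is converted with to_s first.
def indent_content(content, indent = '    ')
  content.to_s.each_line.map { |line| "#{indent}#{line}" }.join
end

puts indent_content("<!-- first -->\n<!-- second -->")
puts indent_content(404)
```

Returning the text rather than calling puts directly makes the behavior easy to check in isolation.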
#print_headers(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the headers of a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 405

def print_headers(page)
  page.response.each_capitalized do |name,value|
    print_content "#{name}: #{value}"
  end
end
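The each_capitalized call suggests that page.response is a Net::HTTPResponse from Ruby's standard library; that is an assumption here, not something this page states. Under that assumption, the sketch below builds a response by hand, with no network access, to show how header names are normalized before printing.

```ruby
require 'net/http'

# Build a response object by hand to show how
# Net::HTTPHeader#each_capitalized normalizes header names.
response = Net::HTTPOK.new('1.1', '200', 'OK')
response['content-type']    = 'text/html'
response['x-frame-options'] = 'DENY'

header_lines = []
response.each_capitalized do |name, value|
  header_lines << "#{name}: #{value}"
end

puts header_lines
```

Header names are stored in lower case internally, so each_capitalized is what gives the familiar Content-Type style in the printed output.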
#print_page(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 384

def print_page(page)
  print_status(page) if options[:print_status]
  print_url(page)

  if options[:print_headers]
    print_headers(page)
  elsif options[:print_header]
    if (header = page.response[options[:print_header]])
      print_content header
    end
  end

  print_query(page) if (options[:xpath] || options[:css_path])
end
#print_query(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the XPath or CSS-path query result for the page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 417

def print_query(page)
  if page.html?
    if options[:xpath]
      print_content page.doc.xpath(options[:xpath])
    elsif options[:css_path]
      print_content page.doc.css(options[:css_path])
    end
  end
end
#print_status(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the status of a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 348

def print_status(page)
  if page.code < 300
    print "#{colors.bright_green(page.code)} "
  elsif page.code < 400
    print "#{colors.bright_yellow(page.code)} "
  elsif page.code < 500
    print "#{colors.bright_red(page.code)} "
  else
    print "#{colors.bold(colors.bright_red(page.code))} "
  end
end
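The thresholds above group status codes into four color bands. As a sketch, the same mapping can be written as a pure function that returns a color name rather than printing. The status_color helper and the :bold_bright_red symbol are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper: maps an HTTP status code to the name of the color
# band that print_status uses. Returns a Symbol instead of printing.
def status_color(code)
  if code < 300
    :bright_green      # 1xx and 2xx responses
  elsif code < 400
    :bright_yellow     # 3xx redirects
  elsif code < 500
    :bright_red        # 4xx client errors
  else
    :bold_bright_red   # 5xx server errors and anything higher
  end
end

[200, 301, 404, 503].each do |code|
  puts "#{code} => #{status_color(code)}"
end
```

Note that the first branch also covers 1xx codes, since the only test is whether the code is below 300.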
#print_url(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the URL for a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 366

def print_url(page)
  if page.code < 300
    puts "#{colors.green(page.url)} "
  elsif page.code < 400
    puts "#{colors.yellow(page.url)} "
  elsif page.code < 500
    puts "#{colors.red(page.url)} "
  else
    puts "#{colors.bold(colors.red(page.url))} "
  end
end
#print_verbose(message) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints an information message.
# File 'lib/ronin/web/cli/commands/spider.rb', line 432

def print_verbose(message)
  if verbose?
    puts colors.yellow("* #{message}")
  end
end
#run ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Runs the ronin-web spider command.
# File 'lib/ronin/web/cli/commands/spider.rb', line 204

def run
  archive = if options[:archive]
              Web::Spider::Archive.open(options[:archive])
            elsif options[:git_archive]
              Web::Spider::GitArchive.open(options[:git_archive])
            end

  history_file = if options[:history]
                   File.open(options[:history],'w')
                 end

  agent = new_agent do |agent|
    agent.every_page do |page|
      print_page(page)
    end

    agent.every_failed_url do |url|
      print_verbose "failed to request #{url}"
    end

    define_printing_callbacks(agent)

    if history_file
      agent.every_page do |page|
        history_file.puts(page.url)
        history_file.flush
      end
    end

    if archive
      agent.every_ok_page do |page|
        archive.write(page.url,page.body)
      end
    end
  end

  # post-spidering tasks

  if options[:git_archive]
    archive.commit "Updated #{Time.now}"
  end

  if options[:print_hosts]
    puts
    puts "Spidered the following hosts:"
    puts

    indent do
      agent.visited_hosts.each do |host|
        puts host
      end
    end
  end

  if options[:print_certs]
    puts
    puts "Discovered the following certs:"
    puts

    agent.collected_certs.each do |cert|
      puts cert
      puts
    end
  end
ensure
  if options[:history]
    history_file.close
  end
end
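The history handling in run writes each URL as soon as its page is visited, flushes after every line, and closes the file in an ensure clause so that an interrupted crawl still leaves a usable file. A standalone sketch of that pattern follows. The record_history helper is a hypothetical name, and the nil guard in its ensure clause is an addition of this sketch rather than something the listing above does.

```ruby
require 'tempfile'

# Hypothetical helper: writes each URL to the history file immediately,
# flushing after every line, and closes the file even if an error is raised.
def record_history(path, urls)
  history_file = File.open(path, 'w')

  urls.each do |url|
    history_file.puts(url)
    history_file.flush # hand the buffered line to the OS right away
  end
ensure
  # Guard against File.open itself having failed (an addition of this sketch).
  history_file.close if history_file
end

Tempfile.create('history') do |tmp|
  record_history(tmp.path, ['https://example.com/', 'https://example.com/about'])
  puts File.read(tmp.path)
end
```

Flushing per line trades a little throughput for the guarantee that another process tailing the file sees every visited URL promptly.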