Class: Ronin::Web::CLI::Commands::Spider Private
- Inherits: Ronin::Web::CLI::Command
  - Object
  - Core::CLI::Command
  - Ronin::Web::CLI::Command
  - Ronin::Web::CLI::Commands::Spider
- Includes: CommandKit::Colors, CommandKit::Options::Verbose, CommandKit::Printing::Indent
- Defined in: lib/ronin/web/cli/commands/spider.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
Spiders a website.
Usage
ronin-web spider [options] {--host HOST | --domain DOMAIN | --site URL}
Options
-v, --verbose Enables verbose output
--open-timeout SECS Sets the connection open timeout
--read-timeout SECS Sets the read timeout
--ssl-timeout SECS Sets the SSL connection timeout
--continue-timeout SECS Sets the continue timeout
--keep-alive-timeout SECS Sets the connection keep alive timeout
-P, --proxy PROXY Sets the proxy to use.
-H, --header NAME: VALUE Sets a default header
--host-header NAME=VALUE Sets a custom Host header (VALUE) for requests to the host NAME
-u, --user-agent chrome-linux|chrome-macos|chrome-windows|chrome-iphone|chrome-ipad|chrome-android|firefox-linux|firefox-macos|firefox-windows|firefox-iphone|firefox-ipad|firefox-android|safari-macos|safari-iphone|safari-ipad|edge The User-Agent to use
-U, --user-agent-string STRING The User-Agent string to use
-R, --referer URL Sets the Referer URL
--delay SECS Sets the delay in seconds between each request
-l, --limit COUNT Only spiders up to COUNT pages
-d, --max-depth DEPTH Only spiders up to max depth
--enqueue URL Adds the URL to the queue
--visited URL Marks the URL as previously visited
--strip-fragments Enables/disables stripping the fragment component of every URL
--strip-query Enables/disables stripping the query component of every URL
--visit-host HOST Visit URLs with the matching host name
--visit-hosts-like /REGEX/ Visit URLs with hostnames that match the REGEX
--ignore-host HOST Ignore the host name
--ignore-hosts-like /REGEX/ Ignore the host names matching the REGEX
--visit-port PORT Visit URLs with the matching port number
--visit-ports-like /REGEX/ Visit URLs with port numbers that match the REGEX
--ignore-port PORT Ignore the port number
--ignore-ports-like /REGEX/ Ignore the port numbers matching the REGEX
--visit-link URL Visit the URL
--visit-links-like /REGEX/ Visit URLs that match the REGEX
--ignore-link URL Ignore the URL
--ignore-links-like /REGEX/ Ignore URLs matching the REGEX
--visit-ext FILE_EXT Visit URLs with the matching file ext
--visit-exts-like /REGEX/ Visit URLs with file exts that match the REGEX
--ignore-ext FILE_EXT Ignore the URLs with the file ext
--ignore-exts-like /REGEX/ Ignore URLs with file exts matching the REGEX
-r, --robots Specifies whether to honor robots.txt
--host HOST Spiders the specific HOST
--domain DOMAIN Spiders the whole domain
--site URL Spiders the website, starting at the URL
--print-status Print the status codes for each URL
--print-headers Print response headers for each URL
--print-header NAME Prints a specific header
--history FILE Writes the URL of every visited page to FILE
--archive DIR Archive every successfully visited page to DIR
--git-archive DIR Archive every successfully visited page to the git repository in DIR
-X, --xpath XPATH Evaluates the XPath on each HTML page
-C, --css-path CSS_PATH Evaluates the CSS path on each HTML page
-h, --help Print help information
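Several options above take structured values (NAME=VALUE, NAME: VALUE, /REGEX/). The following is a minimal sketch, not ronin-web's actual option handling, of how such strings could be turned into the Hash and Array<String, Regexp> shapes held by the attributes documented below. The helpers parse_pattern, parse_name_equals_value and parse_header are hypothetical.

```ruby
# Hypothetical helpers (not part of ronin-web) that convert option values
# into the data structures used by the spider command's attributes.

# Parses a "/REGEX/" argument into a Regexp; any other string is returned as-is.
def parse_pattern(arg)
  if arg.length >= 2 && arg.start_with?('/') && arg.end_with?('/')
    Regexp.new(arg[1..-2])
  else
    arg
  end
end

# Splits "NAME=VALUE" on the first '=' only, so VALUE may itself contain '='.
def parse_name_equals_value(arg)
  name, value = arg.split('=', 2)
  [name, value]
end

# Splits "NAME: VALUE" on the first colon and drops the whitespace after it.
def parse_header(arg)
  name, value = arg.split(/:\s*/, 2)
  [name, value]
end

# --host-header style value stored in a Hash, like #host_headers.
host_headers = {}
name, value = parse_name_equals_value('app.example.com=internal.example.com')
host_headers[name] = value

# --header style value stored in a Hash, like #default_headers.
default_headers = {}
hname, hvalue = parse_header('X-Test: 1')
default_headers[hname] = hvalue

# --visit-host and --visit-hosts-like values mixed in one rule list.
visit_hosts = [parse_pattern('example.com'), parse_pattern('/\.example\.com\z/')]
```

Splitting only on the first separator keeps values such as base64 tokens (which may contain '=') intact.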
Examples
ronin-web spider --host scanme.nmap.org
ronin-web spider --domain nmap.org
ronin-web spider --site https://scanme.nmap.org/
Instance Attribute Summary
- #default_headers ⇒ Hash{String => String} (readonly, private): The default HTTP headers to send with every request.
- #history ⇒ Array<String> (readonly, private): The pre-existing list of previously visited URLs to start spidering with.
- #host_headers ⇒ Hash{String => String} (readonly, private): The mapping of custom Host headers.
- #ignore_exts ⇒ Array<String, Regexp> (readonly, private): The URL file extensions to ignore.
- #ignore_hosts ⇒ Array<String, Regexp> (readonly, private): The hosts to ignore.
- #ignore_links ⇒ Array<String, Regexp> (readonly, private): The links to ignore.
- #ignore_ports ⇒ Array<Integer, Regexp> (readonly, private): The port numbers to ignore.
- #queue ⇒ Array<String> (readonly, private): The pre-existing queue of URLs to start spidering with.
- #visit_exts ⇒ Array<String, Regexp> (readonly, private): The URL file extensions to visit.
- #visit_hosts ⇒ Array<String, Regexp> (readonly, private): The hosts to visit.
- #visit_links ⇒ Array<String, Regexp> (readonly, private): The links to visit.
- #visit_ports ⇒ Array<Integer, Regexp> (readonly, private): The port numbers to visit.
- #visit_schemes ⇒ Array<String> (readonly, private): The schemes to visit.
Instance Method Summary
- #agent_kwargs ⇒ Hash{Symbol => Object} (private): Builds keyword arguments for Ronin::Web::Spider::Agent#initialize.
- #define_printing_callbacks(agent) ⇒ Object (private): Defines callbacks that print information.
- #initialize(**kwargs) ⇒ Spider (constructor, private): Initializes the spider command.
- #new_agent {|agent| ... } ⇒ Ronin::Web::Spider::Agent (private): Creates a new web spider agent.
- #print_content(content) ⇒ Object (private): Prints content from a page.
- #print_headers(page) ⇒ Object (private): Prints the headers of a page.
- #print_page(page) ⇒ Object (private): Prints a page.
- #print_query(page) ⇒ Object (private): Prints the XPath or CSS-path query result for the page.
- #print_status(page) ⇒ Object (private): Prints the HTTP status code of a page.
- #print_url(page) ⇒ Object (private): Prints the URL for a page.
- #print_verbose(message) ⇒ Object (private): Prints an informational message when verbose output is enabled.
- #run ⇒ Object (private): Runs the ronin-web spider command.
Constructor Details
#initialize(**kwargs) ⇒ Spider
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Initializes the spider command.
# File 'lib/ronin/web/cli/commands/spider.rb', line 530

def initialize(**kwargs)
  super(**kwargs)

  @default_headers = {}
  @host_headers    = {}

  @queue   = []
  @history = []

  @visit_schemes = []
  @visit_hosts   = []
  @visit_ports   = []
  @visit_links   = []
  @visit_exts    = []

  @ignore_hosts = []
  @ignore_ports = []
  @ignore_links = []
  @ignore_exts  = []
end
Instance Attribute Details
#default_headers ⇒ Hash{String => String} (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The default HTTP headers to send with every request.
# File 'lib/ronin/web/cli/commands/spider.rb', line 462

def default_headers
  @default_headers
end
#history ⇒ Array<String> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The pre-existing list of previously visited URLs to start spidering with.
# File 'lib/ronin/web/cli/commands/spider.rb', line 477

def history
  @history
end
#host_headers ⇒ Hash{String => String} (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The mapping of custom Host headers.

# File 'lib/ronin/web/cli/commands/spider.rb', line 467

def host_headers
  @host_headers
end
#ignore_exts ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The URL file extensions to ignore.
# File 'lib/ronin/web/cli/commands/spider.rb', line 522

def ignore_exts
  @ignore_exts
end
#ignore_hosts ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The hosts to ignore.
# File 'lib/ronin/web/cli/commands/spider.rb', line 507

def ignore_hosts
  @ignore_hosts
end
#ignore_links ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The links to ignore.
# File 'lib/ronin/web/cli/commands/spider.rb', line 517

def ignore_links
  @ignore_links
end
#ignore_ports ⇒ Array<Integer, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The port numbers to ignore.
# File 'lib/ronin/web/cli/commands/spider.rb', line 512

def ignore_ports
  @ignore_ports
end
#queue ⇒ Array<String> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The pre-existing queue of URLs to start spidering with.
# File 'lib/ronin/web/cli/commands/spider.rb', line 472

def queue
  @queue
end
#visit_exts ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The URL file extensions to visit.
# File 'lib/ronin/web/cli/commands/spider.rb', line 502

def visit_exts
  @visit_exts
end
#visit_hosts ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The hosts to visit.
# File 'lib/ronin/web/cli/commands/spider.rb', line 487

def visit_hosts
  @visit_hosts
end
#visit_links ⇒ Array<String, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The links to visit.
# File 'lib/ronin/web/cli/commands/spider.rb', line 497

def visit_links
  @visit_links
end
#visit_ports ⇒ Array<Integer, Regexp> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The port numbers to visit.
# File 'lib/ronin/web/cli/commands/spider.rb', line 492

def visit_ports
  @visit_ports
end
#visit_schemes ⇒ Array<String> (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The schemes to visit.
# File 'lib/ronin/web/cli/commands/spider.rb', line 482

def visit_schemes
  @visit_schemes
end
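The visit_* and ignore_* attributes hold rule lists whose entries are either literal values or Regexps. This page does not spell out how the spider combines the two kinds of list, so the sketch below assumes a simple policy: a String or Integer rule must equal the value, a Regexp rule only needs to match the value's string form, an empty visit list allows everything, and any ignore match rejects the value. The names rule_match? and allowed? are hypothetical.

```ruby
# Hypothetical sketch: tests a value against an Array<String, Regexp>
# (or Array<Integer, Regexp>) rule list such as #visit_hosts or #ignore_ports.
def rule_match?(value, rules)
  rules.any? do |rule|
    case rule
    when Regexp then rule.match?(value.to_s) # pattern rule, e.g. /\.example\.com\z/
    else rule == value                       # literal rule, e.g. "example.com" or 8080
    end
  end
end

# Assumed combination policy: an empty visit list allows everything,
# and a match in the ignore list always rejects.
def allowed?(value, visit: [], ignore: [])
  (visit.empty? || rule_match?(value, visit)) && !rule_match?(value, ignore)
end
```

Matching Regexps against value.to_s lets one helper serve both the String-valued host/link/extension lists and the Integer-valued port lists.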
Instance Method Details
#agent_kwargs ⇒ Hash{Symbol => Object}
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Builds keyword arguments for Ronin::Web::Spider::Agent#initialize.
# File 'lib/ronin/web/cli/commands/spider.rb', line 701

def agent_kwargs
  kwargs = {}

  kwargs[:proxy] = options[:proxy] if options[:proxy]

  unless @default_headers.empty?
    kwargs[:default_headers] = @default_headers
  end

  unless @host_headers.empty?
    kwargs[:host_headers] = @host_headers
  end

  kwargs[:user_agent] = @user_agent       if @user_agent
  kwargs[:referer]    = options[:referer] if options[:referer]

  kwargs[:delay]     = options[:delay]     if options[:delay]
  kwargs[:limit]     = options[:limit]     if options[:limit]
  kwargs[:max_depth] = options[:max_depth] if options[:max_depth]

  kwargs[:queue]   = @queue   unless @queue.empty?
  kwargs[:history] = @history unless @history.empty?

  if options.has_key?(:strip_fragments)
    kwargs[:strip_fragments] = options[:strip_fragments]
  end

  if options.has_key?(:strip_query)
    kwargs[:strip_query] = options[:strip_query]
  end

  kwargs[:schemes] = @visit_schemes unless @visit_schemes.empty?
  kwargs[:hosts]   = @visit_hosts   unless @visit_hosts.empty?
  kwargs[:ports]   = @visit_ports   unless @visit_ports.empty?
  kwargs[:links]   = @visit_links   unless @visit_links.empty?
  kwargs[:exts]    = @visit_exts    unless @visit_exts.empty?

  kwargs[:ignore_hosts] = @ignore_hosts unless @ignore_hosts.empty?
  kwargs[:ignore_ports] = @ignore_ports unless @ignore_ports.empty?
  kwargs[:ignore_links] = @ignore_links unless @ignore_links.empty?
  kwargs[:ignore_exts]  = @ignore_exts  unless @ignore_exts.empty?

  kwargs[:robots] = options[:robots] if options.has_key?(:robots)

  return kwargs
end
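agent_kwargs only forwards what the user actually set. The cut-down sketch below (a hypothetical build_kwargs, with a plain Hash standing in for options) isolates the two guards it relies on: a truthiness check for valued options, and has_key? for boolean switches, so that an explicit false from a disabling flag is still passed through rather than silently dropped.

```ruby
# Hypothetical, cut-down version of the agent_kwargs pattern.
# `options` stands in for the parsed command-line options Hash.
def build_kwargs(options, visit_hosts: [], ignore_hosts: [])
  kwargs = {}

  # Valued options are copied only when present and truthy.
  kwargs[:delay] = options[:delay] if options[:delay]
  kwargs[:limit] = options[:limit] if options[:limit]

  # Boolean switches use has_key? so an explicit false is forwarded.
  if options.has_key?(:strip_query)
    kwargs[:strip_query] = options[:strip_query]
  end

  # Empty rule lists are omitted, leaving the agent's own defaults in effect (assumed).
  kwargs[:hosts]        = visit_hosts  unless visit_hosts.empty?
  kwargs[:ignore_hosts] = ignore_hosts unless ignore_hosts.empty?

  kwargs
end
```

With `if options[:strip_query]` instead of has_key?, a user-supplied false would be indistinguishable from "not given".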
#define_printing_callbacks(agent) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Defines callbacks that print information.
# File 'lib/ronin/web/cli/commands/spider.rb', line 630

def define_printing_callbacks(agent)
  if options[:print_hosts]
    agent.every_host do |host|
      print_verbose "spidering new host #{host}"
    end
  end

  if options[:print_certs]
    agent.every_cert do |cert|
      print_verbose "encountered new certificate for #{cert.subject.common_name}"
    end
  end

  if options[:print_js_strings]
    agent.every_js_string do |string|
      print_content string
    end
  end

  if options[:print_html_comments]
    agent.every_html_comment do |comment|
      print_content comment
    end
  end

  if options[:print_js_comments]
    agent.every_js_comment do |comment|
      print_content comment
    end
  end

  if options[:print_comments]
    agent.every_comment do |comment|
      print_content comment
    end
  end
end
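define_printing_callbacks registers each printer only when its option is set, so disabled output adds no work during the crawl. The sketch below uses a hypothetical FakeAgent that merely imitates the every_* registration style and a hypothetical define_callbacks that logs to an Array; neither is the Ronin::Web::Spider::Agent API.

```ruby
# Hypothetical stand-in that stores callbacks the way an every_* method might.
class FakeAgent
  def initialize
    @host_callbacks = []
  end

  # Registers a block to run for every newly seen host.
  def every_host(&block)
    @host_callbacks << block
  end

  # Simulates discovering a host by invoking every registered block.
  def discover_host(host)
    @host_callbacks.each { |callback| callback.call(host) }
  end
end

# Registers the printing callback only when the option is enabled,
# mirroring the `if options[:print_hosts]` guard.
def define_callbacks(agent, options, log)
  if options[:print_hosts]
    agent.every_host { |host| log << "spidering new host #{host}" }
  end
end

log = []
agent = FakeAgent.new
define_callbacks(agent, { print_hosts: true }, log)
agent.discover_host('scanme.nmap.org')
```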
#new_agent {|agent| ... } ⇒ Ronin::Web::Spider::Agent
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Creates a new web spider agent.
# File 'lib/ronin/web/cli/commands/spider.rb', line 682

def new_agent(&block)
  if options[:host]
    Web::Spider.host(options[:host],**agent_kwargs,&block)
  elsif options[:domain]
    Web::Spider.domain(options[:domain],**agent_kwargs,&block)
  elsif options[:site]
    Web::Spider.site(options[:site],**agent_kwargs,&block)
  else
    print_error "must specify --host, --domain, or --site"
    exit(-1)
  end
end
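new_agent checks --host, then --domain, then --site, so when several are given the first one in that order wins. The hypothetical spider_mode function below reproduces only that precedence, returning a label instead of building an agent, and raises ArgumentError where the real command prints an error and exits.

```ruby
# Hypothetical sketch of new_agent's precedence: --host, then --domain, then --site.
def spider_mode(options)
  if options[:host]
    [:host, options[:host]]
  elsif options[:domain]
    [:domain, options[:domain]]
  elsif options[:site]
    [:site, options[:site]]
  else
    # The real command calls print_error and exit(-1) here.
    raise ArgumentError, 'must specify --host, --domain, or --site'
  end
end
```

Raising instead of exiting keeps the sketch testable; for example, spider_mode({ domain: 'nmap.org', site: 'https://scanme.nmap.org/' }) selects the domain.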
#print_content(content) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints content from a page.

# File 'lib/ronin/web/cli/commands/spider.rb', line 850

def print_content(content)
  content.to_s.each_line do |line|
    puts " #{line}"
  end
end
#print_headers(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the headers of a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 811

def print_headers(page)
  page.response.each_capitalized do |name,value|
    print_content "#{name}: #{value}"
  end
end
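print_headers delegates to print_content, which indents every line of its input. Assuming page.response is a Net::HTTPResponse (as it is for Spidr pages), the pair can be sketched with the standard library alone. The four-space INDENT and the extra io parameter are assumptions made so the output can be captured; the real indentation width is not recoverable from this page.

```ruby
require 'net/http'
require 'stringio'

# Assumed indentation; not necessarily what ronin-web uses.
INDENT = '    '

# Mirrors print_content: writes each line of the content with a leading indent.
def print_content(content, io)
  content.to_s.each_line do |line|
    io.puts "#{INDENT}#{line}"
  end
end

# Mirrors print_headers via Net::HTTPHeader#each_capitalized,
# which yields header names in Capitalized-Dash-Form.
def print_headers(response, io)
  response.each_capitalized do |name, value|
    print_content("#{name}: #{value}", io)
  end
end

response = Net::HTTPOK.new('1.1', '200', 'OK')
response['content-type'] = 'text/html'

out = StringIO.new
print_headers(response, out)
```

Because each_line keeps the trailing newline and puts does not add a second one, multi-line content such as a NodeSet from print_query keeps its original line breaks.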
#print_page(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 790

def print_page(page)
  print_status(page) if options[:print_status]
  print_url(page)

  if options[:print_headers]
    print_headers(page)
  elsif options[:print_header]
    if (header = page.response[options[:print_header]])
      print_content header
    end
  end

  print_query(page) if (options[:xpath] || options[:css_path])
end
#print_query(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the XPath or CSS-path query result for the page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 823

def print_query(page)
  if page.html?
    if options[:xpath]
      print_content page.doc.xpath(options[:xpath])
    elsif options[:css_path]
      print_content page.doc.css(options[:css_path])
    end
  end
end
#print_status(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the status of a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 754

def print_status(page)
  if page.code < 300
    print "#{colors.bright_green(page.code)} "
  elsif page.code < 400
    print "#{colors.bright_yellow(page.code)} "
  elsif page.code < 500
    print "#{colors.bright_red(page.code)} "
  else
    print "#{colors.bold(colors.bright_red(page.code))} "
  end
end
#print_url(page) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints the URL for a page.
# File 'lib/ronin/web/cli/commands/spider.rb', line 772

def print_url(page)
  if page.code < 300
    puts "#{colors.green(page.url)} "
  elsif page.code < 400
    puts "#{colors.yellow(page.url)} "
  elsif page.code < 500
    puts "#{colors.red(page.url)} "
  else
    puts "#{colors.bold(colors.red(page.url))} "
  end
end
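print_status and print_url share the same thresholds: codes below 300, below 400, below 500, and everything else. A hypothetical status_style helper makes that bucketing explicit and testable; the returned symbols only name the colors used above and are not CommandKit calls. It assumes page.code is an Integer.

```ruby
# Hypothetical helper capturing the thresholds shared by print_status and print_url.
def status_style(code)
  if code < 300    then :green    # 1xx and 2xx
  elsif code < 400 then :yellow   # 3xx redirects
  elsif code < 500 then :red      # 4xx client errors
  else                  :bold_red # 5xx and anything higher
  end
end
```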
#print_verbose(message) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Prints an informational message, but only when verbose output is enabled.
# File 'lib/ronin/web/cli/commands/spider.rb', line 838

def print_verbose(message)
  if verbose?
    puts colors.yellow("* #{message}")
  end
end
#run ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Runs the ronin-web spider command.
# File 'lib/ronin/web/cli/commands/spider.rb', line 554

def run
  archive = if options[:archive]
              Web::Spider::Archive.open(options[:archive])
            elsif options[:git_archive]
              Web::Spider::GitArchive.open(options[:git_archive])
            end

  history_file = if options[:history]
                   File.open(options[:history],'w')
                 end

  agent = new_agent do |agent|
    agent.every_page do |page|
      print_page(page)
    end

    agent.every_failed_url do |url|
      print_verbose "failed to request #{url}"
    end

    define_printing_callbacks(agent)

    if history_file
      agent.every_page do |page|
        history_file.puts(page.url)
        history_file.flush
      end
    end

    if archive
      agent.every_ok_page do |page|
        archive.write(page.url,page.body)
      end
    end
  end

  # post-spidering tasks

  if options[:git_archive]
    archive.commit "Updated #{Time.now}"
  end

  if options[:print_hosts]
    puts
    puts "Spidered the following hosts:"
    puts

    indent do
      agent.visited_hosts.each do |host|
        puts host
      end
    end
  end

  if options[:print_certs]
    puts
    puts "Discovered the following certs:"
    puts

    agent.collected_certs.each do |cert|
      puts cert
      puts
    end
  end
ensure
  # history_file is nil if File.open never ran or raised
  history_file.close if history_file
end
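Apart from printing, run does two pieces of bookkeeping: it appends every page's URL to the history file and archives only successful pages, closing the history file in an ensure clause. The sketch below models that flow with a hypothetical Page struct and a Hash in place of Web::Spider::Archive, and assumes an "ok" page means HTTP 200; it is an illustration, not the command's implementation.

```ruby
require 'tempfile'

# Hypothetical stand-in for a spidered page.
Page = Struct.new(:url, :code, :body)

# Records every URL to the history file, archives only HTTP 200 pages,
# and always closes the history file, even if the loop raises.
def crawl(pages, history_path, archive)
  history_file = File.open(history_path, 'w')

  pages.each do |page|
    history_file.puts(page.url)
    history_file.flush

    # Assumption: "ok" means status 200, as for every_ok_page.
    archive[page.url] = page.body if page.code == 200
  end
ensure
  history_file.close if history_file
end

pages = [
  Page.new('https://example.com/', 200, '<html></html>'),
  Page.new('https://example.com/missing', 404, 'not found')
]

archive = {}
history = Tempfile.new('history')
history.close

crawl(pages, history.path, archive)
```

Flushing after each write mirrors the original's intent that the history file stays usable even if the crawl is interrupted.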