NAME
ronin-web-spider - Spiders a website
SYNOPSIS
ronin-web spider [options] {--host HOST | --domain DOMAIN | --site URL}
DESCRIPTION
Spiders a website.
OPTIONS
--host HOST - Spiders the specific HOST.
--domain DOMAIN - Spiders the whole DOMAIN.
--site URL - Spiders the website, starting at the URL.
--open-timeout SECS - Sets the connection open timeout.
--read-timeout SECS - Sets the read timeout.
--ssl-timeout SECS - Sets the SSL connection timeout.
--continue-timeout SECS - Sets the continue timeout.
--keep-alive-timeout SECS - Sets the connection keep alive timeout.
-P, --proxy PROXY - Sets the proxy to use.
-H, --header "NAME: VALUE" - Sets a default header.
--host-header NAME=VALUE - Sets a default header.
-u, --user-agent chrome-linux|chrome-macos|chrome-windows|chrome-iphone|chrome-ipad|chrome-android|firefox-linux|firefox-macos|firefox-windows|firefox-iphone|firefox-ipad|firefox-android|safari-macos|safari-iphone|safari-ipad|edge - The User-Agent to use.
-U, --user-agent-string STRING - The raw User-Agent string to use.
-R, --referer URL - Sets the Referer URL.
--delay SECS - Sets the delay in seconds between each request.
-l, --limit COUNT - Only spiders up to COUNT pages.
-d, --max-depth DEPTH - Only spiders up to max depth.
--enqueue URL - Adds the URL to the queue.
--visited URL - Marks the URL as previously visited.
--strip-fragments - Enables/disables stripping the fragment component of every URL.
--strip-query - Enables/disables stripping the query component of every URL.
--visit-scheme SCHEME - Visit URLs with the URI scheme.
--visit-schemes-like /REGEX/ - Visit URLs with URI schemes that match the REGEX.
--ignore-scheme SCHEME - Ignore URLs with the URI scheme.
--ignore-schemes-like /REGEX/ - Ignore URLs with URI schemes matching the REGEX.
--visit-host HOST - Visit URLs with the matching host name.
--visit-hosts-like /REGEX/ - Visit URLs with hostnames that match the REGEX.
--ignore-host HOST - Ignore the host name.
--ignore-hosts-like /REGEX/ - Ignore the host names matching the REGEX.
--visit-port PORT - Visit URLs with the matching port number.
--visit-ports-like /REGEX/ - Visit URLs with port numbers that match the REGEX.
--ignore-port PORT - Ignore the port number.
--ignore-ports-like /REGEX/ - Ignore the port numbers matching the REGEX.
--visit-link URL - Visit the URL.
--visit-links-like /REGEX/ - Visit URLs that match the REGEX.
--ignore-link URL - Ignore the URL.
--ignore-links-like /REGEX/ - Ignore URLs matching the REGEX.
--visit-ext FILE_EXT - Visit URLs with the matching file ext.
--visit-exts-like /REGEX/ - Visit URLs with file exts that match the REGEX.
--ignore-ext FILE_EXT - Ignore the URLs with the file ext.
--ignore-exts-like /REGEX/ - Ignore URLs with file exts matching the REGEX.
-r, --robots - Specifies whether to honor robots.txt.
--print-status - Print the status codes for each URL.
--print-headers - Print response headers for each URL.
--print-header NAME - Prints a specific header.
--history FILE - Sets the history file to write every visited URL to.
--archive DIR - Archive every visited page to the DIR.
--git-archive DIR - Archive every visited page to the git repository.
-X, --xpath XPATH - Evaluates the XPath on each HTML page.
-C, --css-path CSS_PATH - Evaluates the CSS-path on each HTML page.
--print-hosts - Print all discovered hostnames.
--print-certs - Print all encountered SSL/TLS certificates.
--save-certs - Saves all encountered SSL/TLS certificates.
--print-js-strings - Print all JavaScript strings.
--print-js-url-strings - Print URL strings found in JavaScript.
--print-js-path-strings - Print path strings found in JavaScript.
--print-js-absolute-path-strings - Only print absolute path strings found in JavaScript.
--print-js-relative-path-strings - Only print relative path strings found in JavaScript.
--print-html-comments - Print HTML comments.
--print-js-comments - Print JavaScript comments.
--print-comments - Print all HTML and JavaScript comments.
-v, --verbose - Enables verbose output.
-h, --help - Print help information.
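EXAMPLES
The options above can be combined on one command line. The following is a minimal sketch, not an authoritative recipe: example.com, visited.txt, and ./archive are placeholder values, and run is a hypothetical helper (not part of ronin-web) that prints each command instead of executing it unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper (not part of ronin-web): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# example.com, visited.txt, and ./archive are placeholders.

# Spider one host, honoring robots.txt, visiting at most 100 pages,
# following links up to depth 3, and waiting 1 second between requests.
run ronin-web spider --host example.com --robots --limit 100 --max-depth 3 --delay 1

# Spider a whole domain, skipping URLs that match a regex,
# and print the status code of each visited URL.
run ronin-web spider --domain example.com --ignore-links-like '/logout/' --print-status

# Start at a specific URL, log every visited URL to a history file,
# and archive each visited page into a directory.
run ronin-web spider --site https://example.com/docs/ --history visited.txt --archive ./archive
```

With DRY_RUN unset, the snippet only prints the three command lines, which makes it easy to review the flags before running them with DRY_RUN=0.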
ENVIRONMENT
HTTP_PROXY - Sets the global HTTP proxy.
RONIN_HTTP_PROXY - Sets the HTTP proxy for Ronin.
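As a sketch of how a proxy might be selected, either through the environment or with the -P/--proxy option: the proxy address 127.0.0.1:8080 and the target example.com are hypothetical, the PROXY value is assumed to be a URL, and run is an illustrative dry-run helper that is not part of ronin-web.

```shell
# Hypothetical dry-run wrapper (not part of ronin-web): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumed address of a local proxy; adjust to your own setup.
proxy_url="http://127.0.0.1:8080"

# Set the proxy for a single invocation through the environment.
run env RONIN_HTTP_PROXY="$proxy_url" ronin-web spider --host example.com --limit 10

# Or pass the proxy explicitly on the command line.
run ronin-web spider --proxy "$proxy_url" --host example.com --limit 10
```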
AUTHOR
Postmodern <postmodern.mod3@gmail.com>
SEE ALSO
ronin-web-server, ronin-web-proxy, ronin-web-diff, ronin-web-new-spider