Class: Ronin::Web::Spider::Archive

Inherits:
Object
Defined in:
lib/ronin/web/spider/archive.rb

Overview

Represents a web archive directory.

Example

Spider a host and archive every web page:

require 'ronin/web/spider'
require 'ronin/web/spider/archive'

Ronin::Web::Spider::Archive.open('path/to/root') do |archive|
  Ronin::Web::Spider.every_page(host: 'example.com') do |page|
    archive.write(page.url,page.body)
  end
end

Direct Known Subclasses

GitArchive

Constructor Details

#initialize(root) ⇒ Archive

Initializes the archive.

Parameters:

  • root (String)

    The path to the root directory.



# File 'lib/ronin/web/spider/archive.rb', line 55

def initialize(root)
  @root = File.expand_path(root)
end
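
Because the constructor passes root through File.expand_path, the stored root is always an absolute path, even when a relative one is given. A minimal sketch of that normalization using only the standard library (the example paths are placeholders, not taken from the library):

```ruby
# Mirrors the File.expand_path call made in #initialize.
# A relative path is resolved against the current working directory.
relative = File.expand_path('path/to/root')

# An already absolute path is returned unchanged.
absolute = File.expand_path('/var/archives/site')

puts relative
puts absolute
```

One consequence is that changing the working directory after the archive is created should not change where pages are written.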

Instance Attribute Details

#root ⇒ String (readonly)

The path to the archive root directory.

Returns:

  • (String)


# File 'lib/ronin/web/spider/archive.rb', line 47

def root
  @root
end

Class Method Details

.open(root) {|archive| ... } ⇒ Archive

Creates the archive and its root directory, if the directory does not already exist.

Parameters:

  • root (String)

    The path to the new archive.

Yields:

  • (archive)

    If a block is given, it will be passed the newly created archive.

Yield Parameters:

  • archive (Archive)

    The newly created archive.

Returns:

  • (Archive)

    The newly created archive.

# File 'lib/ronin/web/spider/archive.rb', line 75

def self.open(root)
  archive = new(root)

  FileUtils.mkdir_p(archive.root)

  yield archive if block_given?
  return archive
end
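
The listing above shows that .open creates the directory, yields the archive when a block is given, and returns the archive either way. The sketch below exercises both call styles. StandInArchive is a hypothetical class that copies the listed source so the example runs without the ronin-web-spider gem; it is not part of the library.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in that repeats the source listed above.
class StandInArchive
  attr_reader :root

  def initialize(root)
    @root = File.expand_path(root)
  end

  def self.open(root)
    archive = new(root)
    # mkdir_p is a no-op when the directory already exists
    FileUtils.mkdir_p(archive.root)
    yield archive if block_given?
    archive
  end
end

tmp = Dir.mktmpdir

# Block form: the block receives the same object that .open returns.
yielded  = nil
returned = StandInArchive.open(File.join(tmp, 'site')) { |archive| yielded = archive }

# Non-block form: opening an existing directory again does not raise.
reopened = StandInArchive.open(File.join(tmp, 'site'))
```

Note that, unlike File.open, nothing is closed or cleaned up when the block returns; the block form is only a convenience.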

Instance Method Details

#to_s ⇒ String

Converts the archive to a String.

Returns:

  • (String)

    The path of the archive directory.



# File 'lib/ronin/web/spider/archive.rb', line 113

def to_s
  @root
end

#write(url, body) ⇒ String

Archives a webpage.

Parameters:

  • url (URI::HTTP)

    The URL of the response.

  • body (String)

    The response body to save.

Returns:

  • (String)

    The full path to the archived page.



# File 'lib/ronin/web/spider/archive.rb', line 96

def write(url,body)
  absolute_path = File.join(@root,url.request_uri[1..])
  absolute_path << 'index.html' if absolute_path.end_with?('/')

  parent_dir = File.dirname(absolute_path)

  FileUtils.mkdir_p(parent_dir) unless File.directory?(parent_dir)
  File.write(absolute_path,body)
  return absolute_path
end
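
To make the URL-to-path mapping in #write easier to follow, the sketch below repeats the same path arithmetic without touching the filesystem. The archive_path helper, the /tmp/archive root, and the example URLs are hypothetical and exist only for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of ronin-web-spider) that repeats the
# path computation from #write.
def archive_path(root, url)
  # request_uri is the path plus any query string; [1..] drops the leading '/'
  path = File.join(root, url.request_uri[1..])
  # Directory-style URLs are stored as index.html inside that directory
  path << 'index.html' if path.end_with?('/')
  path
end

root = '/tmp/archive'

page_path  = archive_path(root, URI('https://example.com/docs/intro.html'))
dir_path   = archive_path(root, URI('https://example.com/docs/'))
home_path  = archive_path(root, URI('https://example.com'))
query_path = archive_path(root, URI('https://example.com/search?q=ruby'))

[page_path, dir_path, home_path, query_path].each { |path| puts path }
```

Two properties follow from the listed source: the query string becomes part of the file name, and the host is not part of the path, so pages from different hosts that share a request URI would be written to the same file under one root.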