Class: Ronin::Web::Spider::GitArchive

Inherits:
Archive
  • Object
Defined in:
lib/ronin/web/spider/git_archive.rb

Overview

Represents a web archive directory that is backed by Git.

Example

Spider a host and archive every web page to a Git repository:

require 'ronin/web/spider'
require 'ronin/web/spider/git_archive'
require 'date'

Ronin::Web::Spider::GitArchive.open('path/to/root') do |archive|
  archive.commit("Updated #{Date.today}") do
    Ronin::Web::Spider.every_page(host: 'example.com') do |page|
      archive.write(page.url,page.body)
    end
  end
end

Instance Attribute Summary

Attributes inherited from Archive

#root

Class Method Summary

Instance Method Summary

Methods inherited from Archive

#initialize, #to_s

Constructor Details

This class inherits a constructor from Ronin::Web::Spider::Archive

Class Method Details

.open(root) {|archive| ... } ⇒ GitArchive

Creates the Git archive, if it does not already exist.

Parameters:

  • root (String)

    The path to the new Git archive.

Yields:

  • (archive)

    If a block is given, it will be passed the newly created Git archive.

Yield Parameters:

  • archive (GitArchive)

    The newly created Git archive.

Returns:

  • (GitArchive)

# File 'lib/ronin/web/spider/git_archive.rb', line 63

def self.open(root)
  super(root) do |archive|
    archive.init unless archive.git?

    yield archive if block_given?
  end
end
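
The "initialize only if needed, then yield" shape of .open can be sketched without Git at all. The SketchArchive class below is a hypothetical stand-in, not the real GitArchive: it uses a .sketch marker directory in place of a Git repository, and it assumes that the root directory already exists, which the real parent class may handle differently.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for illustration only; NOT the real GitArchive.
class SketchArchive
  attr_reader :root, :init_calls

  def initialize(root)
    @root       = root
    @init_calls = 0
  end

  # Mirrors the shape of .open: initialize once, then yield the archive.
  def self.open(root)
    archive = new(root)
    archive.init unless archive.initialized?

    yield archive if block_given?
    archive
  end

  # Assumption: a marker directory plays the role of the .git directory.
  def initialized?
    File.directory?(File.join(@root,'.sketch'))
  end

  def init
    FileUtils.mkdir_p(File.join(@root,'.sketch'))
    @init_calls += 1
    true
  end
end

# Opens the same directory twice and reports how often init ran each time.
def sketch_open_twice
  Dir.mktmpdir do |dir|
    first  = SketchArchive.open(dir)
    second = SketchArchive.open(dir)

    [first.init_calls, second.init_calls]
  end
end
```

In this sketch, opening an already initialized directory skips the init step, so repeated calls on the same root are safe.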

Instance Method Details

#commit(message) {|self| ... } ⇒ true

Commits changes to the Git archive.

Examples:

archive.write(url,response.body)
archive.commit "Updated #{Date.today}"

With a block:

archive.commit("Updated #{Date.today}") do
  Ronin::Web::Spider.every_page(host: 'example.com') do |page|
    archive.write(page.url,page.body)
  end
end

Parameters:

  • message (String)

    The commit message.

Yields:

  • (self)

    If a block is given, it will be called before committing any changes.

Returns:

  • (true)

    Indicates that the changes were successfully committed.

Raises:

  • (GitError)

    Indicates that the git command exited with an error.

  • (GitNotInstalled)

    Indicates that git was not installed or could not be found in the $PATH environment variable.



# File 'lib/ronin/web/spider/git_archive.rb', line 153

def commit(message)
  yield self if block_given?

  git('commit','-m',message.to_s)
end
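
Because the block is yielded before the commit step, an exception raised inside the block means the commit is never attempted. The sketch below illustrates that ordering with a hypothetical SketchCommitter class that records calls in an array instead of running git; none of its names belong to the library.

```ruby
# Hypothetical stand-in that records calls instead of running git.
class SketchCommitter
  attr_reader :log

  def initialize
    @log = []
  end

  # Stands in for #write, which stages a file.
  def write(name)
    @log << "add #{name}"
    name
  end

  # Same shape as #commit: run the block first, then commit.
  def commit(message)
    yield self if block_given?

    @log << "commit #{message}"
    true
  end
end

# Writes inside the block are recorded before the commit.
def sketch_commit_order
  committer = SketchCommitter.new
  committer.commit('Updated') do |c|
    c.write('index.html')
    c.write('about.html')
  end

  committer.log
end

# A block that raises prevents the commit step from running.
def sketch_commit_with_failing_block
  committer = SketchCommitter.new

  begin
    committer.commit('Updated') { raise(IOError,'fetch failed') }
  rescue IOError
    # the commit step was never reached
  end

  committer.log
end
```

With the real class, files staged by #write before such a failure would presumably remain staged in the repository, so a caller may want to handle that case explicitly.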

#git? ⇒ Boolean

Determines whether the Git repository has been initialized.

Returns:

  • (Boolean)


# File 'lib/ronin/web/spider/git_archive.rb', line 76

def git?
  File.directory?(File.join(@root,'.git'))
end
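
The check above only looks for a .git directory. Git also allows .git to be a plain file (as in linked worktrees and submodules), and a directory test reports false in that case. The sketch below reproduces the same File.directory? test in a temporary directory; git_dir? and sketch_git_dir_cases are hypothetical helper names, and the gitdir path written to the file is a placeholder.

```ruby
require 'fileutils'
require 'tmpdir'

# Same test as #git?, written as a standalone helper (hypothetical name).
def git_dir?(root)
  File.directory?(File.join(root,'.git'))
end

# Evaluates the check for a missing .git, a .git file, and a .git directory.
def sketch_git_dir_cases
  Dir.mktmpdir do |dir|
    git_path = File.join(dir,'.git')
    missing  = git_dir?(dir)

    # A .git *file*; the path inside it is a placeholder.
    File.write(git_path,"gitdir: /hypothetical/path\n")
    as_file = git_dir?(dir)

    FileUtils.rm(git_path)
    FileUtils.mkdir(git_path)
    as_directory = git_dir?(dir)

    {missing: missing, file: as_file, directory: as_directory}
  end
end
```

Only the last case counts as initialized under this test, so an archive root that is itself a linked worktree or submodule would likely be treated as uninitialized.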

#init ⇒ true

Initializes the Git repository.

Returns:

  • (true)

    Indicates the Git repository was successfully initialized.

Raises:

  • (GitError)

    Indicates that the git command exited with an error.

  • (GitNotInstalled)

    Indicates that git was not installed or could not be found in the $PATH environment variable.



# File 'lib/ronin/web/spider/git_archive.rb', line 93

def init
  git('init')
end
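
The git helper called above is not shown on this page. One plausible way to produce the two documented failure modes is to inspect the return value of Kernel#system, which is true for a zero exit status, false for a non-zero exit status, and nil when the program could not be executed. The sketch below is an assumption about the general approach, not the library's implementation; run_command, SketchCommandError, and SketchCommandNotInstalled are hypothetical names, and the relationship between the real exception classes is not assumed.

```ruby
require 'rbconfig'

# Hypothetical stand-ins for GitError and GitNotInstalled.
class SketchCommandError < StandardError; end
class SketchCommandNotInstalled < StandardError; end

# Runs a program and maps the result of Kernel#system to true or an error.
def run_command(program,*arguments)
  case system(program,*arguments)
  when nil
    # system returns nil when the program could not be executed
    raise(SketchCommandNotInstalled,"#{program} could not be found in $PATH")
  when false
    # system returns false when the program exits with a non-zero status
    raise(SketchCommandError,"#{program} exited with an error")
  end

  true
end

# Usage: run the current Ruby interpreter so the sketch does not need git.
run_command(RbConfig.ruby,'-e','exit 0')
```

A caller that wants to tell the two failures apart can rescue each class separately, for example to print installation instructions only when the program is missing.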

#write(url, body) ⇒ String

Saves a webpage to the Git archive.

Parameters:

  • url (URI::HTTP)

    The URL of the response.

  • body (String)

    The response body to save.

Returns:

  • (String)

    The full path to the archived page.

Raises:

  • (GitError)

    Indicates that the git command exited with an error.

  • (GitNotInstalled)

    Indicates that git was not installed or could not be found in the $PATH environment variable.



# File 'lib/ronin/web/spider/git_archive.rb', line 116

def write(url,body)
  absolute_path = super(url,body)

  git('add',absolute_path)
  return absolute_path
end
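
How the inherited write maps a URL to a path under the archive root is not shown on this page. The sketch below shows one possible mapping, purely as an assumption: the URL path is appended to the root, and a path ending in a slash is given an index.html file name. sketch_archive_path and sketch_write are hypothetical helpers and may not match the parent class.

```ruby
require 'fileutils'
require 'tmpdir'
require 'uri'

# Assumed mapping from a URL to a file path under the archive root.
def sketch_archive_path(root,url)
  relative = url.path
  relative = '/' if relative.empty?
  # Assumption: directory-like URLs are stored as index.html
  relative += 'index.html' if relative.end_with?('/')

  File.join(root,relative)
end

# Writes the body to the mapped path and returns the full path,
# mirroring the documented return value of #write (without staging).
def sketch_write(root,url,body)
  absolute_path = sketch_archive_path(root,url)

  FileUtils.mkdir_p(File.dirname(absolute_path))
  File.write(absolute_path,body)
  absolute_path
end

# Usage: archive one page into a temporary directory.
Dir.mktmpdir do |dir|
  sketch_write(dir,URI('http://example.com/docs/page.html'),'<html></html>')
end
```

The returned path is what the method shown above passes to git add, so only the file that was just written is staged.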