Module: Ronin::Support::Encoding::XML

Defined in:
lib/ronin/support/encoding/xml.rb

Overview

Contains methods for encoding/decoding and escaping/unescaping XML data.

Features

  • Supports lowercase (ex: &amp;) and uppercase (ex: &AMP;) encoding.
  • Supports decimal (ex: &#65;) and hexadecimal (ex: &#x41;) character encoding.
  • Supports zero-padding (ex: &#0000065;).

Since:

  • 1.0.0

Constant Summary

ESCAPE_BYTES =

Special bytes and their escaped XML characters.

Since:

  • 1.0.0

{
  39 => '&#39;',
  38 => '&amp;',
  34 => '&quot;',
  60 => '&lt;',
  62 => '&gt;'
}
ESCAPE_BYTES_UPPERCASE =

Special bytes and their escaped XML characters, but in uppercase.

Since:

  • 1.0.0

{
  39 => '&#39;',
  38 => '&AMP;',
  34 => '&QUOT;',
  60 => '&LT;',
  62 => '&GT;'
}
ESCAPED_CHARS =

XML escaped characters and their unescaped forms.

Since:

  • 1.0.0

{
  '&apos;' => "'",
  '&amp;'  => '&',
  '&quot;' => '"',
  '&lt;'   => '<',
  '&gt;'   => '>'
}

Class Method Summary

Class Method Details

.decode(data) ⇒ String

Alias for unescape.

Parameters:

  • data (String)

    The data to XML unescape.

Returns:

  • (String)

    The unescaped String.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 243

def self.decode(data)
  unescape(data)
end

.encode(data, **kwargs) ⇒ String

Encodes each character in the given data as an XML character.

Examples:

Encoding::XML.encode("abc")
# => "&#97;&#98;&#99;"

Zero-padding:

Encoding::XML.encode("abc", zero_pad: true)
# => "&#0000097;&#0000098;&#0000099;"

Hexadecimal encoded characters:

Encoding::XML.encode("abc", format: :hex)
# => "&#x61;&#x62;&#x63;"

Uppercase hexadecimal encoded characters:

Encoding::XML.encode("abc\xff", format: :hex, case: :upper)
# => "&#X61;&#X62;&#X63;&#XFF;"

Parameters:

  • data (String)

    The data to XML encode.

  • kwargs (Hash{Symbol => Object})

    Additional keyword arguments.

Options Hash (**kwargs):

  • :format (:decimal, :hex) — default: :decimal

    The numeric format for the escaped characters.

  • :zero_pad (Boolean) — default: false

    Controls whether the escaped characters will be left-padded with up to seven 0 characters.

  • :case (:lower, :upper, nil)

    Controls whether hexadecimal characters are output in lowercase or uppercase. Defaults to lowercase.

Returns:

  • (String)

    The XML encoded String.

Raises:

  • (ArgumentError)

    The format: or case: keyword argument is invalid.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 216

def self.encode(data,**kwargs)
  encoded = String.new

  if data.valid_encoding?
    data.each_codepoint do |codepoint|
      encoded << encode_byte(codepoint,**kwargs)
    end
  else
    data.each_byte do |byte|
      encoded << encode_byte(byte,**kwargs)
    end
  end

  return encoded
end
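
The `valid_encoding?` branch above decides whether the input is walked by codepoint or by raw byte. The following is a minimal sketch of that difference using only core Ruby. It does not call the library, and the `units` helper is a hypothetical name introduced for illustration.

```ruby
# Hypothetical helper (not part of ronin-support): returns the integers that
# would be handed to encode_byte, mirroring the valid_encoding? branch above.
def units(data)
  data.valid_encoding? ? data.codepoints : data.bytes
end

p units("a\u00e9")  # valid UTF-8: one integer per character
p units("a\xff")    # invalid UTF-8: falls back to raw bytes
```

This is why the `"abc\xff"` example can produce `&#XFF;`: the string is not valid UTF-8, so each byte is encoded on its own.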

.encode_byte(byte, format: :decimal, zero_pad: false, **kwargs) ⇒ String

Encodes the byte as an XML decimal or hexadecimal character.

Examples:

Encoding::XML.encode_byte(0x41)
# => "&#65;"

Zero-padding:

Encoding::XML.encode_byte(0x41, zero_pad: true)
# => "&#0000065;"

Hexadecimal escaped characters:

Encoding::XML.encode_byte(0x41, format: :hex)
# => "&#x41;"

Uppercase hexadecimal escaped characters:

Encoding::XML.encode_byte(0xff, format: :hex, case: :upper)
# => "&#XFF;"

Parameters:

  • byte (Integer)

    The byte to XML encode.

  • format (:decimal, :hex) (defaults to: :decimal)

    The numeric format for the escaped characters.

  • zero_pad (Boolean) (defaults to: false)

    Controls whether the escaped characters will be left-padded with up to seven 0 characters.

  • kwargs (Hash{Symbol => Object})

    Additional keyword arguments.

Options Hash (**kwargs):

  • :case (:lower, :upper, nil)

    Controls whether hexadecimal characters are output in lowercase or uppercase. Defaults to lowercase.

Returns:

  • (String)

    The XML encoded character.

Raises:

  • (ArgumentError)

    The format: or case: keyword argument is invalid.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 150

def self.encode_byte(byte, format: :decimal, zero_pad: false, **kwargs)
  case format
  when :decimal
    if zero_pad then "&#%.7d;" % byte
    else             "&#%d;" % byte
    end
  when :hex
    case kwargs[:case]
    when :upper
      if zero_pad then "&#X%.7X;" % byte
      else             "&#X%.2X;" % byte
      end
    when :lower, nil
      if zero_pad then "&#x%.7x;" % byte
      else             "&#x%.2x;" % byte
      end
    else
      raise(ArgumentError,"case (#{kwargs[:case].inspect}) keyword argument must be either :lower, :upper, or nil")
    end
  else
    raise(ArgumentError,"format (#{format.inspect}) must be :decimal or :hex")
  end
end
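
The format strings above rely on the precision field of Ruby's integer directives, which sets a minimum digit count and zero-fills rather than truncating. A small sketch using only `Kernel#format`, with no library calls; the `FORMAT_EXAMPLES` constant is a hypothetical name used only to collect the results.

```ruby
# For integer directives, ".N" means "at least N digits", zero-filled.
FORMAT_EXAMPLES = [
  format("&#%.7d;", 65),        # zero-padded decimal
  format("&#x%.2x;", 0x0a),     # at least two hex digits
  format("&#x%.2x;", 0x20ac),   # wider values are kept whole
  format("&#X%.2X;", 0xff)      # uppercase prefix and digits
].freeze

puts FORMAT_EXAMPLES
```

Under these assumptions the four lines printed are `&#0000065;`, `&#x0a;`, `&#x20ac;`, and `&#XFF;`.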

.escape(data, **kwargs) ⇒ String

XML escapes the data.

Examples:

Encoding::XML.escape("one & two")
# => "one &amp; two"

Uppercase escaped characters:

Encoding::XML.escape("one & two", case: :upper)
# => "one &AMP; two"

Parameters:

  • data (String)

    The data to XML escape.

  • kwargs (Hash{Symbol => Object})

    Additional keyword arguments.

Options Hash (**kwargs):

  • :case (:lower, :upper, nil)

    Controls whether to output lowercase or uppercase XML special characters. Defaults to lowercase.

Returns:

  • (String)

    The XML escaped String.

Raises:

  • (ArgumentError)

    The case: keyword argument is invalid.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 274

def self.escape(data,**kwargs)
  escaped = String.new

  if data.valid_encoding?
    data.each_codepoint do |codepoint|
      escaped << escape_byte(codepoint,**kwargs)
    end
  else
    data.each_byte do |byte|
      escaped << escape_byte(byte,**kwargs)
    end
  end

  return escaped
end

.escape_byte(byte, **kwargs) ⇒ String

Escapes the byte as an XML character.

Examples:

Encoding::XML.escape_byte(0x41)
# => "A"
Encoding::XML.escape_byte(0x26)
# => "&amp;"

Uppercase encoding:

Encoding::XML.escape_byte(0x26, case: :upper)
# => "&AMP;"

Parameters:

  • byte (Integer)

    The byte to XML escape.

  • kwargs (Hash{Symbol => Object})

    Additional keyword arguments.

Options Hash (**kwargs):

  • :case (:lower, :upper, nil)

    Controls whether to output lowercase or uppercase XML special characters. Defaults to lowercase.

Returns:

  • (String)

    The XML escaped character.

Raises:

  • (ArgumentError)

    The case: keyword argument is invalid.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 91

def self.escape_byte(byte,**kwargs)
  table = case kwargs[:case]
          when :upper      then ESCAPE_BYTES_UPPERCASE
          when :lower, nil then ESCAPE_BYTES
          else
            raise(ArgumentError,"case (#{kwargs[:case].inspect}) keyword argument must be either :lower, :upper, or nil")
          end

  table.fetch(byte) do
    if (byte >= 0 && byte <= 0xff)
      byte.chr(Encoding::ASCII_8BIT)
    else
      byte.chr(Encoding::UTF_8)
    end
  end
end
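
The lookup above uses `Hash#fetch` with a block, so the block runs only for values that are missing from the table. A sketch of that pattern follows, using a cut-down hypothetical table (`SPECIAL`) and a hypothetical method name (`escape_codepoint`) rather than the library's own constants.

```ruby
# Hypothetical cut-down table; the library's ESCAPE_BYTES has more entries.
SPECIAL = { 38 => '&amp;', 60 => '&lt;', 62 => '&gt;' }.freeze

def escape_codepoint(cp)
  SPECIAL.fetch(cp) do
    # Fallback: single bytes stay binary, larger codepoints become UTF-8.
    cp <= 0xff ? cp.chr(Encoding::ASCII_8BIT) : cp.chr(Encoding::UTF_8)
  end
end

p escape_codepoint(0x26)    # present in the table
p escape_codepoint(0x41)    # absent, returned unchanged
p escape_codepoint(0x20ac)  # multi-byte codepoint, UTF-8 encoded
```

Keeping values up to 0xff in a binary encoding lets raw bytes from invalid input pass through without raising an encoding error.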

.unescape(data) ⇒ String

Unescapes the XML encoded data.

Examples:

Encoding::XML.unescape("&lt;p&gt;one &lt;span&gt;two&lt;/span&gt;&lt;/p&gt;")
# => "<p>one <span>two</span></p>"

Parameters:

  • data (String)

    The data to XML unescape.

Returns:

  • (String)

    The unescaped String.

Since:

  • 1.0.0



# File 'lib/ronin/support/encoding/xml.rb', line 314

def self.unescape(data)
  unescaped = String.new(encoding: Encoding::UTF_8)
  scanner   = StringScanner.new(data)

  until scanner.eos?
    unescaped << if (named_char = scanner.scan(/&(?:apos|amp|quot|lt|gt);/i))
                   ESCAPED_CHARS.fetch(named_char.downcase)
                 elsif (decimal_char = scanner.scan(/&#\d+;/))
                   decimal_char[2..-2].to_i.chr(Encoding::UTF_8)
                 elsif (hex_char     = scanner.scan(/&#x[a-f0-9]+;/i))
                   hex_char[3..-2].to_i(16).chr(Encoding::UTF_8)
                 else
                   scanner.getch
                 end
  end

  return unescaped
end
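
The scanner loop above tries named entities, then decimal references, then hexadecimal references, and otherwise copies one character. A self-contained sketch of the same approach with the standard `strscan` library is shown below. The `NAMED` table and `tiny_unescape` method are hypothetical and cover only two named entities.

```ruby
require 'strscan'

# Hypothetical two-entry table; the library's ESCAPED_CHARS covers five.
NAMED = { '&lt;' => '<', '&amp;' => '&' }.freeze

def tiny_unescape(data)
  out     = String.new(encoding: Encoding::UTF_8)
  scanner = StringScanner.new(data)

  until scanner.eos?
    out << if (named = scanner.scan(/&(?:lt|amp);/i))
             NAMED.fetch(named.downcase)               # case-insensitive name
           elsif (dec = scanner.scan(/&#\d+;/))
             dec[2..-2].to_i.chr(Encoding::UTF_8)      # strip "&#" and ";"
           elsif (hex = scanner.scan(/&#x[a-f0-9]+;/i))
             hex[3..-2].to_i(16).chr(Encoding::UTF_8)  # strip "&#x" and ";"
           else
             scanner.getch                             # ordinary character
           end
  end

  out
end

puts tiny_unescape("&LT;b&#62; &#x41;&amp;B")
```

Because `StringScanner#scan` only matches at the current position, unrecognized sequences such as `&unknown;` fall through to `getch` and are copied unchanged.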