Parsing URIs is easy
— postmodern
Despite what others may say, parsing URIs is not hard.
In fact, Ruby already makes parsing URIs fairly easy with the URI()
method.
require 'uri'
uri = URI('http://www.google.com/search?q=parsing+URIs+is+hard%2C+let%27s+go+shopping&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a')
# => #<URI::HTTP:0x00000000f94188 URL:http://www.google.com/search?q=parsing+URIs+is+hard%2C+let%27s+go+shopping&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a>
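Once parsed, the individual components of the URI are available as methods on the returned object. A minimal sketch using only Ruby's standard library, with a shortened version of the URL above (the shortened query string is just for illustration):

```ruby
require 'uri'

# A shortened version of the search URL, to keep the output readable.
uri = URI('http://www.google.com/search?q=parsing+URIs&ie=utf-8')

uri.scheme # => "http"
uri.host   # => "www.google.com"
uri.port   # => 80
uri.path   # => "/search"
uri.query  # => "q=parsing+URIs&ie=utf-8"
```

Note that uri.query is returned as one raw String; nothing has split it into parameters yet.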
Query Params
However, the URI library does not parse the parameters within the query string. Ronin, like other modern Ruby projects, depends on a number of smaller RubyGems for functionality, which also means you don't have to install and require them by hand. One such RubyGem is uri-query_params, which lets you access the parameters within the query string of any URI::HTTP (or URI::HTTPS) object:
uri.query_params['q']
# => "parsing+URIs+is+hard,+let's+go+shopping"
pp uri.query_params
# {"q"=>"parsing+URIs+is+hard,+let's+go+shopping",
# "ie"=>"utf-8",
# "oe"=>"utf-8",
# "aq"=>"t",
# "rls"=>"org.mozilla:en-US:unofficial",
# "client"=>"firefox-a"}
# => {"q"=>"parsing+URIs+is+hard,+let's+go+shopping", "ie"=>"utf-8", "oe"=>"utf-8", "aq"=>"t", "rls"=>"org.mozilla:en-US:unofficial", "client"=>"firefox-a"}
Additionally, you can parse and dump standalone query strings:
URI::QueryParams.parse("q=1&x=2")
# => {"q" => "1", "x" => "2"}
URI::QueryParams.dump(:q => 1, :x => 2)
# => "q=1&x=2"
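For comparison, a similar round trip can be sketched with Ruby's standard library alone. This is not how uri-query_params is implemented; it only uses URI.decode_www_form and URI.encode_www_form as stand-ins to show the same parse/dump idea:

```ruby
require 'uri'

# Stand-in for URI::QueryParams.parse: decode into [key, value] pairs,
# then turn the pairs into a Hash.
params = URI.decode_www_form("q=1&x=2").to_h
# => {"q"=>"1", "x"=>"2"}

# Stand-in for URI::QueryParams.dump: non-String keys and values are
# converted with #to_s before being encoded.
query = URI.encode_www_form(:q => 1, :x => 2)
# => "q=1&x=2"
```

One difference to keep in mind: URI.decode_www_form decodes a `+` as a space, so its results can differ from the query_params output shown earlier.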
The URI::QueryParams.dump method is also used by Ronin's HTTP helper methods for the :query_params option:
http_get(:host => 'example.com', :path => '/page.php', :query_params => {'id' => "1 OR 1=1"})
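To get a feel for what a request like this ends up asking for, here is a rough sketch that builds an equivalent URL with the standard library. It assumes form-style encoding via URI.encode_www_form; Ronin's own encoding of the query string may differ in detail (for example, in how spaces are escaped):

```ruby
require 'uri'

# Encode the parameters into a query string (form-style encoding).
query = URI.encode_www_form('id' => "1 OR 1=1")
# => "id=1+OR+1%3D1"

# Assemble the full URL from the same host and path as the http_get call.
url = URI::HTTP.build(:host => 'example.com', :path => '/page.php', :query => query)
url.to_s
# => "http://example.com/page.php?id=1+OR+1%3D1"
```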
Non-standard URIs
Some URIs give Ruby's URI library trouble, such as those with internationalized (Unicode) domain names, which must be converted to so-called punycode. Not to worry, Ronin also requires the addressable RubyGem, a URI parsing library on steroids:
uri = Addressable::URI.parse("http://www.詹姆斯.com/?q=1")
# => #<Addressable::URI:0xb525d4 URI:http://www.詹姆斯.com/?q=1>
uri.normalize
# => #<Addressable::URI:0xb57bec URI:http://www.xn--8ws00zhy3a.com/?q=1>
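To see why Addressable is needed here, this small sketch feeds the same URL to the standard library's URI() method. It uses only the stdlib; the variable name parsed is just for illustration:

```ruby
require 'uri'

# The stdlib parser rejects URIs that contain non-ASCII characters.
begin
  URI("http://www.詹姆斯.com/?q=1")
  parsed = true
rescue URI::InvalidURIError
  parsed = false
end

parsed # => false

# The punycode form produced by Addressable's normalize is plain ASCII,
# so the stdlib accepts it.
URI("http://www.xn--8ws00zhy3a.com/?q=1").host
# => "www.xn--8ws00zhy3a.com"
```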
With Ronin, parsing URIs is easy.