Class: Hermeneutics::URLText

Inherits:
Object
Defined in:
lib/hermeneutics/escape.rb

Overview

URL-able representation

What’s actually happening

URLs may not contain spaces or certain other characters such as slashes and ampersands. These characters are masked by a percent sign followed by two hex digits representing the ASCII code. Eight-bit characters should be masked the same way.

A URL does not carry encoding information by itself. A locator may read either of these:

http://www.example.com/subdir/index.html?umlfield=%C3%BCber+alles
http://www.example.com/subdir/index.html?umlfield=%FCber+alles

The CGI program reading the URL has to decide on its own how to interpret it.
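
One way a reader might make that decision is sketched below. This is a heuristic and not part of Hermeneutics: the helper names `raw_unescape` and `guess_decode` and the ISO-8859-1 fallback are assumptions chosen for illustration. The sketch decodes to raw bytes first, accepts them as UTF-8 if they form valid UTF-8, and otherwise reinterprets them in the fallback encoding. Some ISO-8859-1 byte sequences happen to be valid UTF-8, so the guess can be wrong.

```ruby
# Hypothetical helpers for illustration, not part of Hermeneutics.

# Percent-decode into raw bytes without assuming a character encoding.
def raw_unescape(str)
  str.b.tr("+", " ").gsub(/%(\h\h)/) { $1.hex.chr }
end

# Prefer UTF-8 when the bytes are valid UTF-8, else use the fallback.
def guess_decode(str, fallback: Encoding::ISO_8859_1)
  bytes = raw_unescape(str)
  utf8  = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts guess_decode("umlfield=%C3%BCber+alles")   # prints "umlfield=über alles"
puts guess_decode("umlfield=%FCber+alles")      # prints "umlfield=über alles"
```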

Examples

URLText.encode "'Stop!' said Fred."     #=> "%27Stop%21%27+said+Fred."
URLText.decode "%27Stop%21%27+said+Fred%2e"
                                        #=> "'Stop!' said Fred."

Defined Under Namespace

Classes: Dict

Constant Summary

PAIR_SET =
"="
PAIR_SEP =
"&"

Constructor Details

#initialize(keep_8bit: nil, keep_space: nil, mask_space: nil) ⇒ URLText

:call-seq:

new( hash)  -> urltext

Creates a URLText converter.

The parameters may be given as values or as a hash.

utx = URLText.new keep_8bit: true, keep_space: false

See the encode method for an explanation of these parameters.



# File 'lib/hermeneutics/escape.rb', line 267

def initialize keep_8bit: nil, keep_space: nil, mask_space: nil
  @keep_8bit  = keep_8bit
  @keep_space = keep_space
  @mask_space = mask_space
end

Instance Attribute Details

#keep_8bit ⇒ Object

Returns the value of attribute keep_8bit.



# File 'lib/hermeneutics/escape.rb', line 254

def keep_8bit
  @keep_8bit
end

#keep_space ⇒ Object

Returns the value of attribute keep_space.



# File 'lib/hermeneutics/escape.rb', line 254

def keep_space
  @keep_space
end

#mask_space ⇒ Object

Returns the value of attribute mask_space.



# File 'lib/hermeneutics/escape.rb', line 254

def mask_space
  @mask_space
end

Class Method Details

.decode(str) ⇒ Object

:call-seq:

decode( str)                 -> str

Decode a %XX-encoded string.

utx = URLText.new
utx.decode "%27Stop%21%27+said+Fred%2e"       #=> "'Stop!' said Fred."

The encoding of the argument is kept. As a consequence, the result may be an invalidly encoded string.

a = "bl%F6d"
a.encode! "utf-8"
d = utx.decode a
d =~ /./        #=> "invalid byte sequence in UTF-8 (ArgumentError)"
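
A caller who wants to guard against such a result can check it before use. The following is a minimal sketch using only core String methods. The helper name `usable_text` and the "?" replacement are assumptions for illustration, and the `decoded` string merely stands in for what decode would return for "bl%F6d" in UTF-8.

```ruby
# Stand-in for the result of decoding "bl%F6d" while keeping UTF-8.
decoded = "bl\xF6d".dup.force_encoding(Encoding::UTF_8)

# Hypothetical helper: return the string unchanged when it is validly
# encoded, otherwise replace the invalid bytes.
def usable_text(str, replacement = "?")
  str.valid_encoding? ? str : str.scrub(replacement)
end

puts decoded.valid_encoding?   # prints "false"
puts usable_text(decoded)      # prints "bl?d"
```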


# File 'lib/hermeneutics/escape.rb', line 457

def decode str
  r = str.new_string
  r.tr! "+", " "
  r.gsub! /(?:%([0-9A-F]{2}))/i do $1.hex.chr end
  r.force_encoding str.encoding
  r
end

.decode_hash(qstr) ⇒ Object

:call-seq:

decode_hash( str)                      -> hash
decode_hash( str) { |key,val| ... }    -> nil or int

Decode a URL-style encoded string to a Hash. If a block is given, each key-value pair is yielded to it and the number of pairs is returned, or nil if there were none.

str = "a=%3B%3B%3B&x=%26auml%3B%26ouml%3B%26uuml%3B"
URLText.decode_hash str do |k,v|
  puts "#{k} = #{v}"
end

Output:

a = ;;;
x = &auml;&ouml;&uuml;
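
For comparison, the same query string can be split with Ruby's standard library. This sketch uses `URI.decode_www_form`, which is not what Hermeneutics uses internally; it only illustrates what the percent escapes in the example stand for. Note that `%26` and `%3B` decode to "&" and ";", so the second value is the literal text of three HTML entities.

```ruby
require "uri"

str = "a=%3B%3B%3B&x=%26auml%3B%26ouml%3B%26uuml%3B"

# Returns an array of [key, value] pairs with percent escapes resolved.
pairs = URI.decode_www_form(str)

pairs.each { |k, v| puts "#{k} = #{v}" }
# prints:
#   a = ;;;
#   x = &auml;&ouml;&uuml;
```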


# File 'lib/hermeneutics/escape.rb', line 482

def decode_hash qstr
  if block_given? then
    i = 0
    each_pair qstr do |k,v|
      yield k, v
      i += 1
    end
    i.nonzero?
  else
    Dict.create do |h|
      each_pair qstr do |k,v| h.parse k, v end
    end
  end
end

.encode(str) ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 428

def encode str
  std.encode str
end

.encode_hash(hash) ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 432

def encode_hash hash
  std.encode_hash hash
end

.mkurl(path, hash, anchor = nil) ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 436

def mkurl path, hash, anchor = nil
  std.mkurl path, hash, anchor
end

.std ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 424

def std
  @std ||= new
end

Instance Method Details

#decode(str) ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 414

def decode str
  self.class.decode str
end

#decode_hash(qstr, &block) ⇒ Object



# File 'lib/hermeneutics/escape.rb', line 418

def decode_hash qstr, &block
  self.class.decode_hash qstr, &block
end

#encode(str) ⇒ Object

:call-seq:

encode( str)     -> str

Create a string that contains %XX-encoded bytes.

utx = URLText.new
utx.encode "'Stop!' said Fred."       #=> "%27Stop%21%27+said+Fred."

The result will not contain any 8-bit characters unless keep_8bit is set. It will have the same encoding as the argument, although for a pure ASCII result this normally makes no difference.

utx = URLText.new keep_8bit: true
s = "< ä >".encode "UTF-8"
utx.encode s                    #=> "%3C+\u{e4}+%3E"  in UTF-8

s = "< ä >".encode "ISO-8859-1"
utx.encode s                    #=> "%3C+\xe4+%3E"      in ISO-8859-1

A space “ ” will not be replaced by a plus “+” if keep_space is set.

utx = URLText.new keep_space: true
s = "< x >"
utx.encode s                    #=> "%3C x %3E"

When mask_space is set, a space will be represented as “%20”.
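
The interplay of the three options can be tried out with a standalone sketch. The function `sketch_encode` below is a hypothetical stand-in that follows the logic of the source listing for this method, with two deviations that are assumptions of this sketch: it uses `dup` instead of `new_string`, and `force_encoding` instead of `encode!` for the final step. It is not the library's implementation.

```ruby
# Hypothetical stand-in for URLText#encode, for experimenting with the flags.
def sketch_encode(str, keep_8bit: false, keep_space: false, mask_space: false)
  r = str.dup
  # Work on single bytes unless 8-bit characters are to be kept.
  r.force_encoding Encoding::ASCII_8BIT unless keep_8bit
  r.gsub!(/[^a-zA-Z0-9_.-]/) do |c|
    if c == " " && !mask_space
      keep_space ? c : "+"
    elsif !keep_8bit || c.ascii_only?
      "%%%02X" % c.ord
    else
      c
    end
  end
  r.force_encoding str.encoding
end

puts sketch_encode("< x >")                     # prints "%3C+x+%3E"
puts sketch_encode("< x >", keep_space: true)   # prints "%3C x %3E"
puts sketch_encode("< x >", mask_space: true)   # prints "%3C%20x%20%3E"
```

In this sketch, as in the listing, mask_space takes precedence over keep_space because the space branch is skipped entirely when mask_space is set.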



# File 'lib/hermeneutics/escape.rb', line 301

def encode str
  r = str.new_string
  r.force_encoding Encoding::ASCII_8BIT unless @keep_8bit
  r.gsub! %r/([^a-zA-Z0-9_.-])/ do |c|
    if c == " " and not @mask_space then
      @keep_space ? c : "+"
    elsif not @keep_8bit or c.ascii_only? then
      "%%%02X" % c.ord
    else
      c
    end
  end
  r.encode! str.encoding
end

#encode_hash(hash) ⇒ Object

:call-seq:

encode_hash( hash)     -> str

Encode a Hash to a URL-style string.

utx = URLText.new

h = { name: "John Doe", age: 42 }
utx.encode_hash h
    #=> "name=John+Doe&age=42"

h = { a: ";;;", x: "äöü" }
utx.encode_hash h
    #=> "a=%3B%3B%3B&x=%C3%A4%C3%B6%C3%BC"
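
The source listing for this method also treats nil, true and false specially: nil values are skipped, true repeats the key as the value, and false yields an empty value. The sketch below mirrors that behaviour in a standalone function. The name `sketch_encode_hash` is hypothetical, and the standard library's `URI.encode_www_form_component` is swapped in for the escaping step, so the set of escaped characters may differ slightly from URLText's.

```ruby
require "uri"

# Hypothetical stand-in for URLText#encode_hash.
def sketch_encode_hash(hash)
  hash.map { |k, v|
    next if v.nil?             # nil: leave the pair out
    v = k  if v == true        # true: use the key as the value
    v = "" if v == false       # false: empty value
    [k, v].map { |x| URI.encode_www_form_component x.to_s }.join "="
  }.compact.join "&"
end

h = { q: "John Doe", debug: true, empty: false, skip: nil }
puts sketch_encode_hash(h)     # prints "q=John+Doe&debug=debug&empty="
```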


# File 'lib/hermeneutics/escape.rb', line 381

def encode_hash hash
  hash.map { |(k,v)|
    case v
      when nil   then next
      when true  then v = k
      when false then v = ""
    end
    [k, v].map { |x| encode x.to_s }.join PAIR_SET
  }.compact.join PAIR_SEP
end

#mkurl(path, hash = nil, anchor = nil) ⇒ Object

:call-seq:

mkurl( path, hash, anchor = nil)     -> str

Make a URL.

utx = URLText.new
h = { name: "John Doe", age: "42" }
utx.mkurl "myscript.rb", h, "chapter"
    #=> "myscript.rb?name=John+Doe&age=42#chapter"
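
According to the source listing for this method, the hash may be omitted: when the second argument is not a Hash, it is taken as the anchor. The standalone sketch below illustrates that argument handling. The name `sketch_mkurl` is hypothetical, and `URI.encode_www_form` from the standard library stands in for encode_hash, so this is an approximation and not the library's code.

```ruby
require "uri"

# Hypothetical stand-in for URLText#mkurl.
def sketch_mkurl(path, hash = nil, anchor = nil)
  # Swap the arguments when the second one is not a Hash.
  hash, anchor = anchor, hash unless Hash === hash
  r = path.to_s.dup
  r << "?#{URI.encode_www_form hash}" if hash
  r << "##{anchor}" if anchor
  r
end

h = { name: "John Doe", age: "42" }
puts sketch_mkurl("myscript.rb", h, "chapter")  # prints "myscript.rb?name=John+Doe&age=42#chapter"
puts sketch_mkurl("myscript.rb", "chapter")     # prints "myscript.rb#chapter"
puts sketch_mkurl("myscript.rb")                # prints "myscript.rb"
```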


# File 'lib/hermeneutics/escape.rb', line 402

def mkurl path, hash = nil, anchor = nil
  unless Hash === hash then
    hash, anchor = anchor, hash
  end
  r = "#{path}"
  r << "?#{encode_hash hash}" if hash
  r << "##{anchor}" if anchor
  r
end