Module: ServeByteRange

Defined in: lib/serve_byte_range.rb, lib/serve_byte_range/version.rb

Defined Under Namespace

Classes: BlockWritableWithLimit, ByteRangeBody, EmptyBody, MultipartByteRangesBody, NotModifiedBody, Unsatisfiable, WholeBody

Constant Summary

VERSION = "1.0.0"

Class Method Summary

.coalesce_ranges(ranges) ⇒ Array

The RFC specifically gives an example of a non-canonical, but still valid, request for overlapping ranges: `bytes=500-600,601-999` and `bytes=500-700,601-999` are both legal specifications of the second 500 bytes (byte offsets 500-999, inclusive). In such cases, ranges need to be collapsed together.
.generate_boundary ⇒ Object

Strictly speaking, the boundary must not appear in any of the parts of the multipart response, so you would first need to scan the response, pick a byte sequence that does not occur in it, and then use that.
.serve_ranges(env, resource_size:, etag: nil, resource_content_type: "binary/octet-stream", multipart_boundary: generate_boundary) {|range[Range], io[IO]| ... } ⇒ Array

The Rack response triplet of `[status, header_hash, enumerable_body]`.

Class Method Details

.coalesce_ranges(ranges) ⇒ `Array`

The RFC specifically gives an example of a non-canonical, but still valid, request for overlapping ranges:

> Several legal but not canonical specifications of the second 500
> bytes (byte offsets 500-999, inclusive):
>  bytes=500-600,601-999
>  bytes=500-700,601-999

In such cases, ranges need to be collapsed together: first, to avoid serving a tiny byte range over and over, which causes excessive requests to upstream; second, to make fewer requests in total.

# File 'lib/serve_byte_range.rb', line 209

def self.coalesce_ranges(ranges)
  return [] if ranges.empty?
  # The RFC says https://www.rfc-editor.org/rfc/rfc7233#section-6.1
  #
  # > Servers ought to ignore, coalesce, or reject
  # > egregious range requests, such as requests for more than two
  # > overlapping ranges or for many small ranges in a single set,
  # > particularly when the ranges are requested out of order for no
  # > apparent reason.
  sorted_ranges = ranges.sort_by(&:begin)
  first = sorted_ranges.shift
  coalesced_sorted_ranges = sorted_ranges.each_with_object([first]) do |next_range, acc|
    prev_range = acc.pop
    if prev_range.end >= next_range.begin # Range#overlap? can be used on 3.3+
      new_begin = [prev_range.begin, next_range.begin].min
      new_end = [prev_range.end, next_range.end].max
      acc << Range.new(new_begin, new_end)
    else
      acc << prev_range << next_range
    end
  end
  # Sort the ranges according to the order the client requested.
  # The spec says that a client _may_ want to get a certain byte range first,
  # and it seems a legitimate use case, not ill intent.
  #
  # > A client that is requesting multiple ranges SHOULD list those ranges
  # > in ascending order (the order in which they would typically be
  # > received in a complete representation) unless there is a specific
  # > need to request a later part earlier.  For example, a user agent
  # > processing a large representation with an internal catalog of parts
  # > might need to request later parts first, particularly if the
  # > representation consists of pages stored in reverse order and the user
  # > agent wishes to transfer one page at a time.
  indices = ranges.map do |r|
    coalesced_sorted_ranges.find_index { |cr| cr.begin <= r.begin && cr.end >= r.end }
  end
  indices.uniq.map { |i| coalesced_sorted_ranges.fetch(i) }
end
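As a rough illustration of the two passes in the listing above (merge overlapping ranges in ascending order, then restore the order the client asked for), here is a minimal standalone sketch. The `coalesce` helper is a hypothetical name used only for this example; it is not part of the gem's API and merely mirrors the logic shown.

```ruby
# Hypothetical standalone helper mirroring the listing above; not the gem's API.
def coalesce(ranges)
  return [] if ranges.empty?

  # Merge pass: walk the ranges in ascending order of start offset and
  # fold each one into its predecessor when the two overlap.
  merged = ranges.sort_by(&:begin).each_with_object([]) do |range, acc|
    if acc.any? && acc.last.end >= range.begin
      acc[-1] = (acc.last.begin..[acc.last.end, range.end].max)
    else
      acc << range
    end
  end

  # Reorder pass: emit each merged range at the position where the client
  # first requested any part of it.
  ranges.map { |r| merged.find { |m| m.begin <= r.begin && m.end >= r.end } }.uniq
end
```

With this logic, `[500..700, 601..999]` collapses to `[500..999]`, and `[900..999, 0..99, 50..150]` becomes `[900..999, 0..150]`, keeping the later range first as the client requested. Because the comparison is `prev.end >= next.begin`, ranges that merely touch, such as `500..600` and `601..999`, stay separate in this sketch, as they do in the listing.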

.generate_boundary ⇒ `Object`

Strictly speaking, the boundary must not appear in any of the parts of the multipart response, so you would first need to scan the response, pick a byte sequence that does not occur in it, and then use that. In practice, nobody does that, and a well-behaved HTTP client should honor the Content-Range header when extracting the byte range from the response. See stackoverflow.com/questions/37413715




# File 'lib/serve_byte_range.rb', line 193

def self.generate_boundary
  Random.bytes(12).unpack1("H*")
end
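To make the shape of the generated value concrete, the sketch below produces a boundary the same way and shows where such a token typically appears in a `multipart/byteranges` response. The helper names `hex_boundary` and `multipart_markers` are hypothetical and exist only for this illustration; how the gem itself formats its headers and delimiters is not shown in this section.

```ruby
# Hypothetical helper: 12 random bytes, hex-encoded into 24 lowercase characters.
def hex_boundary(byte_count = 12)
  Random.bytes(byte_count).unpack1("H*")
end

# Hypothetical helper: the places a boundary is used in a multipart response,
# following the general multipart conventions (not necessarily the gem's exact output).
def multipart_markers(boundary)
  {
    content_type: "multipart/byteranges; boundary=#{boundary}",
    part_delimiter: "--#{boundary}",      # precedes each part
    closing_delimiter: "--#{boundary}--"  # terminates the whole body
  }
end
```

A random 96-bit token makes an accidental collision with the payload very unlikely, which is why skipping the scan is a common practical trade-off.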

.serve_ranges(env, resource_size:, etag: nil, resource_content_type: "binary/octet-stream", multipart_boundary: generate_boundary) {|range[Range], io[IO]| ... } ⇒ `Array`

Returns the Rack response triplet of `[status, header_hash, enumerable_body]`.

Examples:

status, headers, body = serve_ranges(env, resource_size: file.size) do |range, io|
  file.seek(range.begin)
  IO.copy_stream(file, io, range.size)
end
[status, headers, body]

Yields:

(range[Range], io[IO]) —

The HTTP range being requested and the IO-like object to `write()` the bytes into
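The block contract can be tried in isolation, without Rack or the gem loaded. The sketch below uses an in-memory string as a stand-in resource and a `StringIO` as the writable object; `RESOURCE`, `COPY_RANGE`, and `bytes_for` are hypothetical names introduced only for this example.

```ruby
require "stringio"

# Hypothetical in-memory resource standing in for a file or an upstream object.
RESOURCE = "0123456789abcdef"

# A block with the same shape as the one given to serve_ranges:
# it receives an inclusive byte range and something that responds to write().
COPY_RANGE = lambda do |range, io|
  source = StringIO.new(RESOURCE)
  source.seek(range.begin)
  IO.copy_stream(source, io, range.size)
end

# Hypothetical harness: call the block for one range and collect what it wrote.
def bytes_for(range, &block)
  sink = StringIO.new
  block.call(range, sink)
  sink.string
end
```

Calling `bytes_for(4..7, &COPY_RANGE)` collects the four bytes at offsets 4 through 7. Since HTTP ranges are inclusive, `range.size` is the number of bytes to copy.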

# File 'lib/serve_byte_range.rb', line 261

def self.serve_ranges(env, resource_size:, etag: nil, resource_content_type: "binary/octet-stream", multipart_boundary: generate_boundary, &range_serving_block)
  # As per RFC:
  # If the entity tag given in the If-Range header matches the current cache validator for the entity,
  # then the server SHOULD provide the specified sub-range of the entity using a 206 (Partial Content)
  # response. If the cache validator does not match, then the server SHOULD return the entire entity
  # using a 200 (OK) response.
  wants_ranges_and_etag_valid = env["HTTP_IF_RANGE"] && env["HTTP_IF_RANGE"] == etag && env["HTTP_RANGE"]
  wants_ranges_and_no_etag = !env["HTTP_IF_RANGE"] && env["HTTP_RANGE"]
  wants_no_ranges_and_supplies_etag = env["HTTP_IF_NONE_MATCH"] && !env["HTTP_RANGE"] && !env["HTTP_IF_RANGE"]

  # Very old Rack versions do not have get_byte_ranges and have just byte_ranges
  http_ranges_from_header = Rack::Utils.respond_to?(:get_byte_ranges) ? Rack::Utils.get_byte_ranges(env["HTTP_RANGE"], resource_size) : Rack::Utils.byte_ranges(env, resource_size)
  http_ranges_from_header = coalesce_ranges(http_ranges_from_header) if http_ranges_from_header

  body = if wants_no_ranges_and_supplies_etag && env["HTTP_IF_NONE_MATCH"] == etag
    NotModifiedBody.new
  elsif resource_size.zero?
    EmptyBody.new
  elsif http_ranges_from_header && (wants_ranges_and_no_etag || wants_ranges_and_etag_valid)
    if http_ranges_from_header.none?
      Unsatisfiable.new(resource_size: resource_size)
    elsif http_ranges_from_header.one?
      ByteRangeBody.new(http_range: http_ranges_from_header.first, resource_size: resource_size, resource_content_type: resource_content_type, &range_serving_block)
    else
      MultipartByteRangesBody.new(http_ranges: http_ranges_from_header, resource_size: resource_size, resource_content_type: resource_content_type, boundary: multipart_boundary, &range_serving_block)
    end
  else
    WholeBody.new(resource_size: resource_size, resource_content_type: resource_content_type, &range_serving_block)
  end
  headers = body.headers

  etag = etag.inspect if etag && !etag.match?(/^".+"$/)
  headers["ETag"] = etag if etag

  [body.status, headers, body]
end
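The branching in the listing above can be condensed into a small decision function. This is a sketch under stated assumptions: `response_kind` and `quoted_etag` are hypothetical helpers, `ranges` stands for the already parsed and coalesced list (`nil` when no usable Range header was parsed), and the returned symbols are labels for the body classes, not values the gem itself produces.

```ruby
# Hypothetical helper: names which kind of body the listing would pick.
def response_kind(env, etag:, resource_size:, ranges:)
  if_range = env["HTTP_IF_RANGE"]
  range_header = env["HTTP_RANGE"]
  if_none_match = env["HTTP_IF_NONE_MATCH"]

  # A plain conditional GET: validator supplied, no range machinery involved.
  conditional_get = if_none_match && !range_header && !if_range
  # Ranges are honored when there is no If-Range, or when If-Range matches the ETag.
  ranges_allowed = range_header && (!if_range || if_range == etag)

  if conditional_get && if_none_match == etag
    :not_modified
  elsif resource_size.zero?
    :empty
  elsif ranges && ranges_allowed
    case ranges.length
    when 0 then :unsatisfiable
    when 1 then :single_range
    else :multipart
    end
  else
    :whole
  end
end

# Hypothetical helper mirroring the ETag normalization at the end of the listing:
# wrap the value in double quotes unless it is already quoted.
def quoted_etag(etag)
  return nil unless etag
  etag.match?(/^".+"$/) ? etag : etag.inspect
end
```

One consequence visible in this sketch is that an If-Range value that does not match the ETag falls through to the whole-resource branch, as the RFC comment in the listing describes.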