Module: ServeByteRange

Defined in:
lib/serve_byte_range.rb,
lib/serve_byte_range/version.rb

Defined Under Namespace

Classes: BlockWritableWithLimit, ByteRangeBody, EmptyBody, MultipartByteRangesBody, NotModifiedBody, Unsatisfiable, WholeBody

Constant Summary

VERSION =
"1.0.0"

Class Method Summary

Class Method Details

.coalesce_ranges(ranges) ⇒ Array

The RFC specifically gives an example of a non-canonical but still valid request for overlapping ranges:

> Several legal but not canonical specifications of the second 500
> bytes (byte offsets 500-999, inclusive):
>  bytes=500-600,601-999
>  bytes=500-700,601-999

In such cases the ranges need to be collapsed together, for two reasons: first, to avoid serving the same small byte range over and over, which would cause excessive requests to upstream; second, to make fewer requests in total.



# File 'lib/serve_byte_range.rb', line 209

def self.coalesce_ranges(ranges)
  return [] if ranges.empty?
  # The RFC says https://www.rfc-editor.org/rfc/rfc7233#section-6.1
  #
  # > Servers ought to ignore, coalesce, or reject
  # > egregious range requests, such as requests for more than two
  # > overlapping ranges or for many small ranges in a single set,
  # > particularly when the ranges are requested out of order for no
  # > apparent reason.
  sorted_ranges = ranges.sort_by(&:begin)
  first = sorted_ranges.shift
  coalesced_sorted_ranges = sorted_ranges.each_with_object([first]) do |next_range, acc|
    prev_range = acc.pop
    if prev_range.end >= next_range.begin # Range#overlap? can be used on 3.3+
      new_begin = [prev_range.begin, next_range.begin].min
      new_end = [prev_range.end, next_range.end].max
      acc << Range.new(new_begin, new_end)
    else
      acc << prev_range << next_range
    end
  end
  # Sort the ranges according to the order the client requested.
  # The spec says that a client _may_ want to get a certain byte range first,
  # and it seems a legitimate use case, not ill intent.
  #
  # > A client that is requesting multiple ranges SHOULD list those ranges
  # > in ascending order (the order in which they would typically be
  # > received in a complete representation) unless there is a specific
  # > need to request a later part earlier.  For example, a user agent
  # > processing a large representation with an internal catalog of parts
  # > might need to request later parts first, particularly if the
  # > representation consists of pages stored in reverse order and the user
  # > agent wishes to transfer one page at a time.
  indices = ranges.map do |r|
    coalesced_sorted_ranges.find_index { |cr| cr.begin <= r.begin && cr.end >= r.end }
  end
  indices.uniq.map { |i| coalesced_sorted_ranges.fetch(i) }
end
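
To make the two stages above concrete, here is a minimal standalone sketch: merge overlapping ranges, then restore the order in which the client asked for them. `coalesce_in_request_order` is a hypothetical name used only for illustration and is not part of the gem; the sketch assumes inclusive integer ranges, as in the listing.

```ruby
# Hypothetical helper for illustration only; it mirrors the overlap rule
# (prev.end >= next.begin) used in the listing above.
def coalesce_in_request_order(ranges)
  # Stage 1: sort by start offset and merge every range that overlaps
  # the previously accumulated one.
  merged = ranges.sort_by(&:begin).each_with_object([]) do |range, acc|
    last = acc.last
    if last && last.end >= range.begin
      acc[-1] = (last.begin..[last.end, range.end].max)
    else
      acc << range
    end
  end

  # Stage 2: walk the ranges in the order the client listed them and pick
  # the merged range containing each one, dropping repeats.
  ranges.map { |r| merged.find { |m| m.begin <= r.begin && m.end >= r.end } }.uniq
end

overlapping   = coalesce_in_request_order([500..700, 601..999])
adjacent      = coalesce_in_request_order([500..600, 601..999])
later_first   = coalesce_in_request_order([900..999, 0..99, 50..149])
# overlapping => [500..999]
# adjacent    => [500..600, 601..999]
# later_first => [900..999, 0..149]
```

With the `>=` comparison as written, ranges that merely touch (such as 500..600 and 601..999) stay separate; only ranges that share at least one byte are merged.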

.generate_boundary ⇒ Object

Strictly speaking, the boundary must not appear in any part of the multipart response, so you would first need to scan the response, pick a byte sequence that does not occur in it, and use that. In practice nobody does this, and a well-behaved HTTP client should honor the Content-Range header when extracting the byte range from the response. See stackoverflow.com/questions/37413715



# File 'lib/serve_byte_range.rb', line 193

def self.generate_boundary
  Random.bytes(12).unpack1("H*")
end
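
As a rough illustration of where such a boundary ends up, the sketch below generates one the same way and places it in a `multipart/byteranges` header and part delimiters. The framing follows the general layout from RFC 7233; the exact bytes emitted by `MultipartByteRangesBody` are not shown on this page, so treat the part layout, the `0-99/1000` range, and the variable names as assumptions.

```ruby
# 12 random bytes, hex-encoded: a 24-character boundary string.
boundary = Random.bytes(12).unpack1("H*")

# The boundary is announced in the response Content-Type header...
content_type = "multipart/byteranges; boundary=#{boundary}"

# ...and then delimits each part. Every part carries its own Content-Range,
# which is what a client relies on to locate the bytes.
part_header = "--#{boundary}\r\n" \
  "Content-Type: binary/octet-stream\r\n" \
  "Content-Range: bytes 0-99/1000\r\n" \
  "\r\n"

# The final delimiter has two trailing dashes.
closing_delimiter = "--#{boundary}--\r\n"
```

Because the boundary is random rather than derived from the content, a collision with the payload is possible in principle, which is the trade-off described above.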

.serve_ranges(env, resource_size:, etag: nil, resource_content_type: "binary/octet-stream", multipart_boundary: generate_boundary) {|range[Range], io[IO]| ... } ⇒ Array

Returns the Rack response triplet of `[status, header_hash, enumerable_body]`.

Examples:

status, headers, body = serve_ranges(env, resource_size: file.size) do |range, io|
  file.seek(range.begin)
  IO.copy_stream(file, io, range.size)
end
[status, headers, body]

Yields:

  • (range[Range], io[IO])

    The HTTP range being requested and the IO(ish) object to `write()` the bytes into
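
The block contract can be tried in isolation. The sketch below uses `StringIO` objects as stand-ins for both the source file and the IO-ish object the block receives; `serve_range` is a hypothetical name for the block, and the real object passed by the gem may differ as long as it responds to `write`.

```ruby
require "stringio"

# Stand-in for the resource being served: the 26 lowercase letters.
file = StringIO.new(("a".."z").to_a.join)

# Same shape as the block passed to serve_ranges: seek to the start of the
# requested range, then copy exactly range.size bytes into the target.
serve_range = lambda do |range, io|
  file.seek(range.begin)
  IO.copy_stream(file, io, range.size)
end

out = StringIO.new
serve_range.call(2..5, out)
# out.string => "cdef"
```

The range is inclusive, so `range.size` is the number of bytes to write (4 for `2..5`).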



# File 'lib/serve_byte_range.rb', line 261

def self.serve_ranges(env, resource_size:, etag: nil, resource_content_type: "binary/octet-stream", multipart_boundary: generate_boundary, &range_serving_block)
  # As per RFC:
  # If the entity tag given in the If-Range header matches the current cache validator for the entity,
  # then the server SHOULD provide the specified sub-range of the entity using a 206 (Partial Content)
  # response. If the cache validator does not match, then the server SHOULD return the entire entity
  # using a 200 (OK) response.
  wants_ranges_and_etag_valid = env["HTTP_IF_RANGE"] && env["HTTP_IF_RANGE"] == etag && env["HTTP_RANGE"]
  wants_ranges_and_no_etag = !env["HTTP_IF_RANGE"] && env["HTTP_RANGE"]
  wants_no_ranges_and_supplies_etag = env["HTTP_IF_NONE_MATCH"] && !env["HTTP_RANGE"] && !env["HTTP_IF_RANGE"]

  # Very old Rack versions do not have get_byte_ranges and have just byte_ranges
  http_ranges_from_header = Rack::Utils.respond_to?(:get_byte_ranges) ? Rack::Utils.get_byte_ranges(env["HTTP_RANGE"], resource_size) : Rack::Utils.byte_ranges(env, resource_size)
  http_ranges_from_header = coalesce_ranges(http_ranges_from_header) if http_ranges_from_header

  body = if wants_no_ranges_and_supplies_etag && env["HTTP_IF_NONE_MATCH"] == etag
    NotModifiedBody.new
  elsif resource_size.zero?
    EmptyBody.new
  elsif http_ranges_from_header && (wants_ranges_and_no_etag || wants_ranges_and_etag_valid)
    if http_ranges_from_header.none?
      Unsatisfiable.new(resource_size: resource_size)
    elsif http_ranges_from_header.one?
      ByteRangeBody.new(http_range: http_ranges_from_header.first, resource_size: resource_size, resource_content_type: resource_content_type, &range_serving_block)
    else
      MultipartByteRangesBody.new(http_ranges: http_ranges_from_header, resource_size: resource_size, resource_content_type: resource_content_type, boundary: multipart_boundary, &range_serving_block)
    end
  else
    WholeBody.new(resource_size: resource_size, resource_content_type: resource_content_type, &range_serving_block)
  end
  headers = body.headers

  etag = etag.inspect if etag && !etag.match?(/^".+"$/)
  headers["ETag"] = etag if etag

  [body.status, headers, body]
end
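
The header checks at the top of the method can be summarized as a small decision function. This is a simplified sketch, not the gem's code: `response_kind` is a hypothetical helper, it skips parsing of the `Range` header entirely (so it cannot tell a satisfiable range from an unsatisfiable or malformed one), and it compares the ETag exactly as the caller supplied it.

```ruby
# Hypothetical helper for illustration: names the kind of response the
# conditional-request headers lead to, without parsing the Range header.
def response_kind(env, etag:, resource_size:)
  range         = env["HTTP_RANGE"]
  if_range      = env["HTTP_IF_RANGE"]
  if_none_match = env["HTTP_IF_NONE_MATCH"]

  if if_none_match && !range && !if_range && if_none_match == etag
    :not_modified # client already holds the current representation
  elsif resource_size.zero?
    :empty        # nothing to serve
  elsif range && (if_range.nil? || if_range == etag)
    :ranges       # no If-Range, or If-Range matches the current ETag
  else
    :whole        # no Range header, or a stale If-Range validator
  end
end

current = '"v1"'
plain_range = response_kind({"HTTP_RANGE" => "bytes=0-9"}, etag: current, resource_size: 100)
stale_range = response_kind({"HTTP_RANGE" => "bytes=0-9", "HTTP_IF_RANGE" => '"v0"'}, etag: current, resource_size: 100)
revalidate  = response_kind({"HTTP_IF_NONE_MATCH" => '"v1"'}, etag: current, resource_size: 100)
# plain_range => :ranges
# stale_range => :whole
# revalidate  => :not_modified
```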