Class: Care::Cache

Inherits:

Object

Object
Care::Cache

Defined in: lib/care.rb

Overview

Stores cached pages of data from the given IO as strings. Each page is `page_size` bytes long, except the last page, which may be shorter.

Instance Method Summary

#byteslice(io, at, n_bytes) ⇒ String?

Returns the maximum possible byte string that can be recovered from the given `io` at the given offset.
#clear ⇒ Object

Clears the page cache of all strings with data.
#hydrate_page(io, page_i) ⇒ Object

Hydrates the page at the given index, or returns the contents of that page if it is already in the cache.
#initialize(page_size = DEFAULT_PAGE_SIZE) ⇒ Cache constructor

Initializes a new container of cached pages with pages of the given size.
#inspect ⇒ Object

We provide an overridden implementation of #inspect to avoid printing the actual contents of the cached pages.
#read_page(io, page_i) ⇒ Object

Reads the requested page from the given IO.

Constructor Details

#initialize(page_size = DEFAULT_PAGE_SIZE) ⇒ `Cache`

Initializes a new container of cached pages with pages of the given size.

Raises:

(ArgumentError)

# File 'lib/care.rb', line 80

def initialize(page_size = DEFAULT_PAGE_SIZE)
  @page_size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless @page_size > 0
  @pages = {}
  @lowest_known_empty_page = nil
end
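The page size check can be tried in isolation. The sketch below repeats the constructor's `to_i` coercion and validation in a hypothetical helper named `validated_page_size` (not part of the library) to show which inputs are accepted and which raise `ArgumentError`.

```ruby
# Hypothetical helper repeating the constructor's page size check.
# It is an illustration only and does not exist in lib/care.rb.
def validated_page_size(page_size)
  size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless size > 0
  size
end

puts validated_page_size(4096)  # an Integer passes through unchanged
puts validated_page_size('512') # numeric strings are coerced by to_i
puts validated_page_size(10.9)  # floats are truncated towards zero

# Values that coerce to zero or less are rejected
[0, -1, nil, 'abc'].each do |bad|
  begin
    validated_page_size(bad)
  rescue ArgumentError => e
    puts "#{bad.inspect} rejected: #{e.message}"
  end
end
```

Because of the `to_i` coercion, `nil` and non-numeric strings become `0` and are rejected by the same check as an explicit zero.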

Instance Method Details

#byteslice(io, at, n_bytes) ⇒ `String`?

Returns the maximum possible byte string that can be recovered from the given `io` at the given offset. If the IO has been exhausted, `nil` is returned instead. Uses the cached pages where available and fetches pages where necessary.

Parameters:

io (#seek, #read) —

the IO to read data from
at (Integer) —

at which offset we have to read
n_bytes (Integer) —

how many bytes we want to read/cache

Returns:

(String, nil) —

the content read from the IO or `nil` if no data was available

Raises:

(ArgumentError)

# File 'lib/care.rb', line 98

def byteslice(io, at, n_bytes)
  raise ArgumentError, "The number of bytes to fetch must be a positive Integer, but was #{n_bytes}" if n_bytes < 1
  raise ArgumentError, "Negative offsets are not supported (got #{at})" if at < 0

  first_page = at / @page_size
  last_page = (at + n_bytes) / @page_size

  relevant_pages = (first_page..last_page).map { |i| hydrate_page(io, i) }

  # Create one string combining all the pages which are relevant for
  # us - it is much easier to address that string instead of piecing
  # the output together page by page, and joining arrays of strings
  # is supposed to be optimized.
  slab = if relevant_pages.length > 1
    # If our read overlaps multiple pages, we do have to join them, this is
    # the general case
    relevant_pages.join
  else # We only have one page
    # Optimize a little. If we only have one page that we need to read from
    # - which is likely going to be the case *often* we can avoid allocating
    # a new string for the joined pages and just use the only page
    # directly as the slab. Since it might contain a `nil` and we do
    # not join (which casts nils to strings) we take care of that too
    relevant_pages.first || ''
  end

  offset_in_slab = at % @page_size
  slice = slab.byteslice(offset_in_slab, n_bytes)

  # Returning an empty string from read() is very confusing for the caller,
  # and no builtins do this - if we are at EOF we should return nil
  slice if slice && !slice.empty?
end
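To make the page arithmetic concrete, the sketch below repeats the index calculations from `byteslice` for a single read. The `page_span` helper and the numbers are hypothetical and chosen for illustration; they are not part of `Care::Cache`.

```ruby
# Hypothetical helper mirroring the index arithmetic in #byteslice.
# It is an illustration only and does not exist in lib/care.rb.
def page_span(at, n_bytes, page_size)
  first_page = at / page_size
  last_page = (at + n_bytes) / page_size
  offset_in_slab = at % page_size
  { first_page: first_page, last_page: last_page, offset_in_slab: offset_in_slab }
end

# Read 6 bytes starting at offset 10, with 4-byte pages
span = page_span(10, 6, 4)
puts span[:first_page]     # page 2 covers bytes 8..11
puts span[:last_page]      # page 4 covers bytes 16..19
puts span[:offset_in_slab] # the slab starts at byte 8, so byte 10 sits at index 2
```

In this example the requested bytes 10..15 end exactly on a page boundary. Since the last page index is computed from the exclusive end offset (`at + n_bytes`), page 4 is hydrated as well, even though none of its bytes end up in the returned slice.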

#clear ⇒ `Object`

Clears the page cache of all strings with data.

Returns:

void

# File 'lib/care.rb', line 135

def clear
  # @pages is a Hash keyed by page index, so iterate over its values (the page strings)
  @pages.each_value { |maybe_page_str| maybe_page_str.clear if maybe_page_str.respond_to?(:clear) }
  @pages.clear
end

#hydrate_page(io, page_i) ⇒ `Object`

Hydrates the page at the given index, or returns the contents of that page if it is already in the cache.

Parameters:

io (IO) —

the IO to read from
page_i (Integer) —

which page (zero-based) to hydrate and return

# File 'lib/care.rb', line 145

def hydrate_page(io, page_i)
  # Avoid trying to read the page if we know there is no content to fill it
  # in the underlying IO
  return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

  @pages[page_i] ||= read_page(io, page_i)
end
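The effect of the memoization and of the empty-page shortcut can be observed with a small stand-in. The sketch below uses a hypothetical class, `TinyPageCache`, that copies the logic of `hydrate_page` and `read_page` without the `Measurometer` instrumentation and adds a `reads` counter. It is not the library class.

```ruby
require 'stringio'

# Hypothetical stand-in for Care::Cache, reduced to page hydration.
# The reads counter is an addition for this illustration.
class TinyPageCache
  attr_reader :reads

  def initialize(page_size)
    @page_size = page_size
    @pages = {}
    @lowest_known_empty_page = nil
    @reads = 0
  end

  def hydrate_page(io, page_i)
    # Skip pages that are known to lie past the end of the IO
    return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

    @pages[page_i] ||= read_page(io, page_i)
  end

  def read_page(io, page_i)
    @reads += 1
    io.seek(page_i * @page_size)
    result = io.read(@page_size)
    if result.nil?
      @lowest_known_empty_page = page_i if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
    elsif result.bytesize < @page_size
      @lowest_known_empty_page = page_i + 1
    end
    result
  end
end

cache = TinyPageCache.new(4)
io = StringIO.new('abcdefghij') # 10 bytes, so pages 0 and 1 are full and page 2 is short

cache.hydrate_page(io, 0) # read from the IO
cache.hydrate_page(io, 0) # served from the cache, no read
cache.hydrate_page(io, 2) # short page, so pages 3 and up are now known to be empty
cache.hydrate_page(io, 5) # returns nil without touching the IO
puts cache.reads          # prints 2
```

Only two of the four calls reach the IO: the repeated request for page 0 is answered from `@pages`, and the request for page 5 is short-circuited by `@lowest_known_empty_page`.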

#inspect ⇒ `Object`

We provide an overridden implementation of #inspect to avoid printing the actual contents of the cached pages.

# File 'lib/care.rb', line 155

def inspect
  # Simulate the builtin object ID output https://stackoverflow.com/a/11765495/153886
  oid_str = (object_id << 1).to_s(16).rjust(16, '0')

  ivars = instance_variables
  ivars.delete(:@pages)
  ivars_str = ivars.map do |ivar|
    "#{ivar}=#{instance_variable_get(ivar).inspect}"
  end.join(' ')
  synthetic_vars = 'num_hydrated_pages=%d' % @pages.length
  '#<%s:%s %s %s>' % [self.class, oid_str, synthetic_vars, ivars_str]
end

#read_page(io, page_i) ⇒ `Object`

Reads the requested page from the given IO.

Parameters:

io (IO) —

the IO to read from
page_i (Integer) —

which page (zero-based) to read

# File 'lib/care.rb', line 172

def read_page(io, page_i)
  Measurometer.increment_counter('format_parser.parser.care.page_reads_from_upsteam', 1)

  io.seek(page_i * @page_size)
  read_result = Measurometer.instrument('format_parser.care.read_page') { io.read(@page_size) }
  if read_result.nil?
    # If the read went past the end of the IO the read result will be nil,
    # so we know our IO is exhausted here
    @lowest_known_empty_page = page_i if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
  elsif read_result.bytesize < @page_size
    # If we read less than we initially wanted we know there are no pages
    # to read following this one, so we can also optimize
    @lowest_known_empty_page = page_i + 1
  end

  read_result
end
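The two end-of-data branches rely on how `IO#read` with a length behaves near the end of the input. The sketch below demonstrates this with a `StringIO` and a page size of 4; the sample data is an assumption made for illustration.

```ruby
require 'stringio'

io = StringIO.new('abcdefghij') # 10 bytes
page_size = 4

# Read pages 0..3 the same way read_page does: seek to the page start,
# then ask for page_size bytes.
pages = (0..3).map do |page_i|
  io.seek(page_i * page_size)
  io.read(page_size)
end

p pages # => ["abcd", "efgh", "ij", nil]
```

Page 2 comes back shorter than `page_size`, which is the case where `read_page` sets `@lowest_known_empty_page` to `page_i + 1`. Page 3 starts past the end of the data, so `read` returns `nil` and `read_page` records that page index itself as empty.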
