Class: FormatParser::ReadLimitsConfig
- Inherits: Object
  - Object
    - FormatParser::ReadLimitsConfig
- Defined in: lib/read_limits_config.rb
Overview
We need to apply various limits so that parsers do not over-read, do not cause too many HTTP requests to be dispatched, and so on. These limits should be balanced with one another. For example, we cannot tell a parser that it is limited to reading 1024 bytes while at the same time limiting the size of the cache pages it may slurp in to less than that amount, since that can quickly become frustrating. ReadLimitsConfig computes these limits for us, in a fairly balanced way, based on one setting.
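To make the "one setting" idea concrete, here is a minimal sketch that re-declares a stand-in class with the same arithmetic as the method bodies documented on this page. The class name `ReadLimitsSketch` is hypothetical and exists only so the example runs without the gem installed; the 1024-byte budget is an arbitrary illustrative value.

```ruby
# Stand-in for FormatParser::ReadLimitsConfig. The class name is hypothetical;
# the method bodies mirror the ones documented on this page.
class ReadLimitsSketch
  MAX_PAGE_FAULTS = 16

  def initialize(total_bytes_available_per_parser)
    @max_read_bytes_per_parser = total_bytes_available_per_parser.to_i
  end

  # The byte budget itself
  attr_reader :max_read_bytes_per_parser

  # A quarter of the byte budget per cache page
  def cache_page_size
    @max_read_bytes_per_parser / 4
  end

  # A fixed ceiling, independent of the byte budget
  def max_pagefaults_per_parser
    MAX_PAGE_FAULTS
  end

  # Half of the byte budget, counted in #read calls
  def max_reads_per_parser
    @max_read_bytes_per_parser / 2
  end

  # Half of the byte budget, counted in #seek calls
  def max_seeks_per_parser
    @max_read_bytes_per_parser / 2
  end
end

limits = ReadLimitsSketch.new(1024)
puts limits.max_read_bytes_per_parser # 1024
puts limits.cache_page_size           # 256
puts limits.max_reads_per_parser      # 512
puts limits.max_seeks_per_parser      # 512
puts limits.max_pagefaults_per_parser # 16
```

Only the page fault ceiling is constant; every other limit scales with the single byte budget passed to the constructor.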
Constant Summary
- MAX_PAGE_FAULTS = 16
Instance Method Summary
- #cache_page_size ⇒ Object
  How big should the cache page be.
- #initialize(total_bytes_available_per_parser) ⇒ ReadLimitsConfig (constructor)
  A new instance of ReadLimitsConfig.
- #max_pagefaults_per_parser ⇒ Object
  Each parser can incur HTTP requests when performing `parse_http`.
- #max_read_bytes_per_parser ⇒ Object
  Defines how many bytes each parser may request to read from the IO object given to it.
- #max_reads_per_parser ⇒ Object
  Defines how many `#read` calls each parser may perform on the IO object given to it.
- #max_seeks_per_parser ⇒ Object
  Defines how many `#seek` calls each parser may perform on the IO object given to it.
Constructor Details
#initialize(total_bytes_available_per_parser) ⇒ ReadLimitsConfig
Returns a new instance of ReadLimitsConfig.
# File 'lib/read_limits_config.rb', line 10
def initialize(total_bytes_available_per_parser)
  @max_read_bytes_per_parser = total_bytes_available_per_parser.to_i
end
Instance Method Details
#cache_page_size ⇒ Object
How big should the cache page be. Each cache page read will incur one `#read` on the underlying IO object, remote or local.
# File 'lib/read_limits_config.rb', line 28
def cache_page_size
  @max_read_bytes_per_parser / 4
end
#max_pagefaults_per_parser ⇒ Object
Each parser can incur HTTP requests when performing `parse_http`. This constant sets the maximum number of pages each parser is allowed to hit that have not been fetched previously and are not stored in the cache. For example, with most formats the first cache page and the last cache page (the head and tail of the file, respectively) will be available right after the first parser retrieves some data. The second parser accessing the same data will reuse the in-memory cache.
# File 'lib/read_limits_config.rb', line 38
def max_pagefaults_per_parser
  MAX_PAGE_FAULTS
end
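The page fault idea above can be sketched with a small in-memory page cache. This is an illustration only, not the library's cache: `PageCacheSketch` and its methods are hypothetical names, and a `StringIO` stands in for the remote file.

```ruby
require "stringio"

# Hypothetical page cache: counts a "page fault" each time a page has to be
# fetched from the underlying IO because it is not cached yet.
class PageCacheSketch
  attr_reader :page_faults

  def initialize(io, page_size)
    @io = io
    @page_size = page_size
    @pages = {}
    @page_faults = 0
  end

  # Returns the page at the given index, fetching it only on a cache miss
  def page(index)
    @pages.fetch(index) do
      @page_faults += 1
      @io.seek(index * @page_size)
      @pages[index] = @io.read(@page_size)
    end
  end
end

cache = PageCacheSketch.new(StringIO.new("a" * 1024), 256)

# First parser touches the head and the tail of the file: two faults
cache.page(0)
cache.page(3)

# Second parser touches the same pages: served from memory, no new faults
cache.page(0)
cache.page(3)

puts cache.page_faults # 2
```

In this sketch a limit such as `max_pagefaults_per_parser` would be compared against the fault counter, not against the total number of page accesses.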
#max_read_bytes_per_parser ⇒ Object
Defines how many bytes each parser may request to read from the IO object given to it. It is used to artificially limit unbounded reads in parsers that may wander off and try to gulp in the file given to them indefinitely, whether due to infinite loops or wrongly implemented skips, or when handling data that has been deliberately crafted to make a parser misbehave. This is less strict than one might think: for example, the MOOV parser used for QuickTime files skips over the actual contents of the atoms and only reads atom headers, which stays under this limit for quite some time.
# File 'lib/read_limits_config.rb', line 22
def max_read_bytes_per_parser
  @max_read_bytes_per_parser
end
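One way to picture why skipping keeps a parser under the byte budget is a wrapper that charges only for requested reads. The sketch below is an assumption about how such a limit could be enforced, not the library's actual mechanism; `BudgetedIO` and `ReadBudgetExceeded` are hypothetical names.

```ruby
require "stringio"

# Hypothetical error raised once the read budget is used up
class ReadBudgetExceeded < StandardError; end

# Hypothetical IO wrapper: every requested byte is charged against the
# budget, while seeking (skipping) costs nothing.
class BudgetedIO
  def initialize(io, max_read_bytes)
    @io = io
    @bytes_left = max_read_bytes
  end

  def read(n_bytes)
    if n_bytes > @bytes_left
      raise ReadBudgetExceeded, "requested #{n_bytes} bytes, #{@bytes_left} left"
    end
    @bytes_left -= n_bytes
    @io.read(n_bytes)
  end

  def seek(offset)
    @io.seek(offset)
  end
end

io = BudgetedIO.new(StringIO.new("x" * 4096), 1024)
io.read(8)    # read an 8-byte header
io.seek(2048) # skip over the payload without spending any budget
io.read(8)    # read the next header; only 16 bytes have been charged so far
```

A parser that reads headers and seeks past payloads, as described for the MOOV parser, spends its budget slowly even on a large file.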
#max_reads_per_parser ⇒ Object
Defines how many `#read` calls each parser may perform on the IO object given to it. It is used to artificially limit unbounded reads in parsers that may wander off and try to gulp in the file given to them indefinitely, whether due to infinite loops or wrongly implemented skips, or when handling data that has been deliberately crafted to make a parser misbehave.
# File 'lib/read_limits_config.rb', line 47
def max_reads_per_parser
  # Imagine we read per single byte
  @max_read_bytes_per_parser / 2
end
#max_seeks_per_parser ⇒ Object
Defines how many `#seek` calls each parser may perform on the IO object given to it. It is used to artificially limit unbounded seeking in parsers that may wander off and try to traverse the file given to them indefinitely, whether due to infinite loops or wrongly implemented skips, or when handling data that has been deliberately crafted to make a parser misbehave.
# File 'lib/read_limits_config.rb', line 57
def max_seeks_per_parser
  # Imagine we have to seek once per byte
  @max_read_bytes_per_parser / 2
end