Class: AcceptLanguage::Parser
- Inherits: Object
  - Object
    - AcceptLanguage::Parser
- Defined in: lib/accept_language/parser.rb
Overview
Accept-Language Header Parser
Parser parses Accept-Language HTTP header field values as defined in RFC 7231 Section 5.3.5. It extracts language ranges and their associated quality values (q-values), validates them according to the specification, and matches them against the languages an application supports.
Overview
The Accept-Language header field value consists of a comma-separated list of language ranges, each optionally accompanied by a quality value indicating relative preference. This parser:
- Tokenizes the header into individual language-range entries
- Extracts and validates language ranges per RFC 4647 Section 2.1
- Extracts and validates quality values per RFC 7231 Section 5.3.1
- Stores valid entries for subsequent matching operations
Quality Values (q-values)
Quality values indicate the user’s relative preference for a language. Per RFC 7231 Section 5.3.1, the syntax is:
qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
This means:
- Values range from 0.000 to 1.000
- Maximum of 3 decimal places
- 0 indicates “not acceptable”
- 1 indicates “most preferred” (default when omitted)
Examples of valid q-values: 0, 0.5, 0.75, 0.123, 1, 1.0, 1.000
Examples of invalid q-values (entries carrying them are silently skipped): 1.5, 0.1234, -0.5, .5
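As an illustration of these rules, the sketch below checks candidate q-values against the regular expression documented for QVALUE_PATTERN in the Constant Summary. The QvalueSketch module and its valid? method are hypothetical names used only for this example; they are not part of the library's API.

```ruby
# Illustrative sketch only; QvalueSketch is a hypothetical helper, not library code.
module QvalueSketch
  # Copy of the regular expression documented for QVALUE_PATTERN.
  PATTERN = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/

  # Returns true when the text is an acceptable q-value under that pattern.
  def self.valid?(text)
    PATTERN.match?(text)
  end
end

# Classify the valid and invalid examples listed above.
%w[0 0.5 0.75 0.123 1 1.0 1.000 1.5 0.1234 -0.5 .5].each do |candidate|
  verdict = QvalueSketch.valid?(candidate) ? "valid" : "invalid"
  puts "#{candidate.ljust(7)} #{verdict}"
end
```

The anchors \A and \z make the check apply to the whole string, so a value such as 0.1234 is rejected rather than truncated.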
Language Ranges
Language ranges follow the Basic Language Range syntax defined in RFC 4647 Section 2.1:
language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
alphanum = ALPHA / DIGIT
Valid ranges consist of:
- A primary subtag of 1-8 alphabetic characters (e.g., en, zh, ast)
- Zero or more subtags of 1-8 alphanumeric characters, separated by hyphens
- The special wildcard * (matches any language)
This syntax is compatible with BCP 47 language tags, supporting:
- Year-based variant subtags (e.g., 1996 in de-CH-1996)
- Numeric region codes (e.g., 419 for Latin America)
- Script subtags (e.g., Hant for Traditional Chinese script)
Examples of valid language ranges:
- en (English)
- en-US (English, United States)
- zh-Hant-TW (Chinese, Traditional script, Taiwan)
- de-CH-1996 (German, Switzerland, 1996 orthography)
- sr-Latn (Serbian, Latin script)
- * (wildcard)
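To make the syntax concrete, the following sketch applies the regular expression documented for LANGTAG_PATTERN to a few candidates. LanguageRangeSketch is a hypothetical module introduced for this example, and the rejected inputs are illustrative choices rather than cases taken from the library's test suite.

```ruby
# Illustrative sketch only; LanguageRangeSketch is a hypothetical helper, not library code.
module LanguageRangeSketch
  # Copy of the regular expression documented for LANGTAG_PATTERN.
  PATTERN = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

  # Returns true when the text is a Basic Language Range under that pattern.
  def self.valid?(range)
    PATTERN.match?(range)
  end
end

LanguageRangeSketch.valid?("zh-Hant-TW") # accepted: script and region subtags
LanguageRangeSketch.valid?("es-419")     # accepted: numeric region subtag
LanguageRangeSketch.valid?("*")          # accepted: wildcard
LanguageRangeSketch.valid?("en_US")      # rejected: underscore is not a separator
LanguageRangeSketch.valid?("123")        # rejected: primary subtag must be alphabetic
LanguageRangeSketch.valid?("abcdefghi")  # rejected: subtag longer than 8 characters
```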
Internal Representation
Internally, quality values are stored as integers in the range 0-1000 (multiplied by 1000) to avoid floating-point comparison issues. This is an implementation detail and does not affect the public API.
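One way to obtain such an integer without ever creating a Float is sketched below. This page does not show the library's own conversion, so QualityScaleSketch and its scale method are assumptions made for illustration, and the input is assumed to have already passed q-value validation.

```ruby
# Illustrative sketch only; QualityScaleSketch is a hypothetical helper, not library code.
module QualityScaleSketch
  # Converts an already validated q-value string to an integer between 0 and 1000
  # using string and integer operations only, so no Float is involved.
  def self.scale(qvalue)
    whole, fraction = qvalue.split(".", 2)
    # Pad the fractional digits to exactly three places: "8" becomes "800".
    whole.to_i * 1_000 + fraction.to_s.ljust(3, "0").to_i
  end
end

QualityScaleSketch.scale("1")     # => 1000
QualityScaleSketch.scale("0.8")   # => 800
QualityScaleSketch.scale("0.05")  # => 50
QualityScaleSketch.scale("0.123") # => 123
```

Integers compare exactly, whereas binary floating-point arithmetic does not: in Ruby, 0.1 + 0.2 == 0.3 evaluates to false.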
Thread Safety
Parser instances are immutable after initialization. The languages_range hash is frozen, making Parser instances safe to share between threads.
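The effect of freezing can be illustrated with a plain Ruby hash standing in for a parsed languages_range. FrozenRangesSketch and its contents are hypothetical; the example shows only standard Ruby behavior for frozen hashes, not the library's internals.

```ruby
# Illustrative sketch only; the hash below stands in for a parsed languages_range.
module FrozenRangesSketch
  RANGES = { "en" => 1_000, "fr" => 800 }.freeze

  # Several threads read the same frozen hash concurrently.
  def self.concurrent_reads(thread_count = 4)
    Array.new(thread_count) { Thread.new { RANGES.fetch("fr") } }.map(&:value)
  end

  # Any attempt to modify the hash raises FrozenError instead of changing shared state.
  def self.try_mutation
    RANGES["de"] = 500
    :mutated
  rescue FrozenError
    :rejected
  end
end

FrozenRangesSketch.concurrent_reads # => [800, 800, 800, 800]
FrozenRangesSketch.try_mutation     # => :rejected
```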
Error Handling
The parser is lenient by design to handle real-world headers that may not strictly conform to specifications:
- Invalid language ranges are silently skipped
- Invalid quality values cause the entry to be skipped
- Empty or nil input results in an empty languages_range
- Malformed entries (missing separators, etc.) are skipped
However, the parser is strict about input types: only String or nil are accepted for the field parameter.
Constant Summary
- DEFAULT_QUALITY =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Default quality value (1.0) scaled to internal integer representation.
When a language range appears without an explicit quality value, it is assigned this default value, indicating maximum preference.
1_000
- DIGIT_ZERO =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The ASCII digit zero character, used in quality value parsing.
"0"
- DOT =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The decimal point character, used in quality value parsing.
"."
- FIELD_TYPE_ERROR =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Error message raised when the field argument is not a String or nil.
This guards against accidental non-String values being passed to the parser, which would cause unexpected behavior during parsing.
"Field must be a String or nil"
- SEPARATOR =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The comma character that separates language-range entries in the Accept-Language header field value.
","
- SPACE =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The space character, stripped during parsing because whitespace around separators is optional per RFC 7231.
" "
- SUFFIX =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The suffix that precedes quality values in language-range entries. A language entry with a quality value has the form: language-range;q=qvalue
";q="
- QVALUE_PATTERN =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Regular expression pattern for validating quality values.
Implements RFC 7231 Section 5.3.1 qvalue syntax:
qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
This pattern accepts:
- 0 or 1 (integer form)
- 0. followed by 1-3 digits (e.g., 0.5, 0.75, 0.123)
- 1. followed by 1-3 zeros (e.g., 1.0, 1.00, 1.000)
/\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
- LANGTAG_PATTERN =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Regular expression pattern for validating language ranges.
Implements RFC 4647 Section 2.1 Basic Language Range syntax:
language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
alphanum = ALPHA / DIGIT
Pattern Structure
The pattern accepts either:
- The wildcard character *
- A primary subtag (1-8 ALPHA) followed by zero or more subtags (each 1-8 ALPHANUM, preceded by a hyphen)
This syntax is compatible with BCP 47 language tags, allowing alphanumeric subtags for variant subtags, numeric region codes, and other modern language tag features.
/\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/
Instance Attribute Summary
- #languages_range ⇒ Hash{String => Integer} (readonly, private)
The parsed language preferences extracted from the Accept-Language header.
Instance Method Summary
- #initialize(field) ⇒ Parser (constructor)
Creates a new Parser instance by parsing the given Accept-Language header field value.
- #match(*available_langtags) ⇒ Symbol?
Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.
Constructor Details
#initialize(field) ⇒ Parser
Creates a new Parser instance by parsing the given Accept-Language header field value.
The parser extracts all valid language-range entries from the header, validates their language ranges and quality values, and stores them for subsequent matching operations.
Parsing Process
- Validate that input is a String or nil
- Convert nil to empty string
- Normalize to lowercase for case-insensitive matching
- Remove all spaces (whitespace is insignificant per RFC 7231)
- Split on commas to get individual entries
- For each entry:
  - Split on ;q= to separate range from quality
  - Validate the language range
  - Validate and parse the quality value (default 1.0 if absent)
  - Store valid entries in the languages_range hash
# File 'lib/accept_language/parser.rb', line 293
def initialize(field)
  raise ::TypeError, FIELD_TYPE_ERROR unless field.nil? || field.is_a?(::String)
  @languages_range = import(field)
end
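The body of the private import method is not shown on this page. The sketch below is one possible reading of the parsing steps listed above, written as a standalone module so it can be run without the library. ParseSketch, parse, and scale are hypothetical names. Letting a later duplicate range overwrite an earlier one, and stripping only space characters, are assumptions rather than documented behavior.

```ruby
# Illustrative sketch only; ParseSketch is a hypothetical stand-in for the parsing steps.
module ParseSketch
  # Copies of the regular expressions documented in the Constant Summary.
  QVALUE_PATTERN  = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
  LANGTAG_PATTERN = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

  def self.parse(field)
    # Strict about the input type, lenient about the content.
    raise TypeError, "Field must be a String or nil" unless field.nil? || field.is_a?(String)

    # nil becomes "", then lowercase, drop spaces, and split into entries.
    entries = field.to_s.downcase.delete(" ").split(",")

    entries.each_with_object({}) do |entry, ranges|
      range, quality = entry.split(";q=", 2)
      next unless range && LANGTAG_PATTERN.match?(range)

      if quality.nil?
        ranges[range] = 1_000          # default quality of 1.0
      elsif QVALUE_PATTERN.match?(quality)
        ranges[range] = scale(quality) # valid explicit quality
      end                              # invalid quality: entry is skipped
    end.freeze
  end

  # Scales a validated q-value string to an integer between 0 and 1000.
  def self.scale(qvalue)
    whole, fraction = qvalue.split(".", 2)
    whole.to_i * 1_000 + fraction.to_s.ljust(3, "0").to_i
  end
end

ParseSketch.parse("da, en-GB;q=0.8, en;q=0.7, xx;q=1.5, *;q=0.1")
# => {"da"=>1000, "en-gb"=>800, "en"=>700, "*"=>100}
```

The xx entry is dropped because 1.5 is not a valid q-value, while the remaining entries keep their declaration order in the resulting hash.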
Instance Attribute Details
#languages_range ⇒ Hash{String => Integer} (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The parsed language preferences extracted from the Accept-Language header.
This hash maps downcased language ranges to their quality values (scaled to integers 0-1000). Ranges are stored in lowercase for case-insensitive matching.
# File 'lib/accept_language/parser.rb', line 246
def languages_range
  @languages_range
end
Instance Method Details
#match(*available_langtags) ⇒ Symbol?
Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.
This method delegates to Matcher to perform the actual matching using the Basic Filtering scheme defined in RFC 4647 Section 3.3.1, which considers:
- Quality values: Higher q-values indicate stronger preference
- Declaration order: When q-values are equal, earlier declaration wins
- Prefix matching: en matches en-US, en-GB, etc.
- Wildcards: * matches any language not explicitly listed
- Exclusions: q=0 explicitly excludes a language
Matching Algorithm
- Remove any available languages that are explicitly excluded (q=0)
- Iterate through preferred languages in descending quality order
- For each preferred language, find the first available language that:
  - Exactly matches the preferred range, OR
  - Has the preferred range as a prefix (followed by a hyphen)
- For wildcards, match any available language not already matched
- Return the first match found, or nil if no match exists
Return Value Preservation
The method returns the language tag exactly as provided in the available_langtags argument, preserving the original case. This is important for direct use with I18n.locale and similar APIs.
# File 'lib/accept_language/parser.rb', line 375
def match(*)
  Matcher.new(**languages_range).call(*)
end
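Matcher's source is not shown on this page, so the following is only a sketch of the matching steps and the case-preserving return described above, not the library's implementation. MatchSketch, covers?, and match are hypothetical names. Three points are assumptions: exclusions use the same exact-or-prefix rule as preferences, a wildcard matches only tags not covered by another listed range, and a wildcard with q=0 is ignored.

```ruby
# Illustrative sketch only; MatchSketch is a hypothetical stand-in for the matching steps.
module MatchSketch
  WILDCARD = "*"

  # True when the tag equals the (lowercase) range or extends it with "-subtag".
  def self.covers?(range, tag)
    tag = tag.to_s.downcase
    tag == range || tag.start_with?("#{range}-")
  end

  # ranges maps lowercase language ranges to integer qualities (0-1000).
  def self.match(ranges, *available)
    excluded  = ranges.select { |range, q| q.zero? && range != WILDCARD }.keys
    # Descending quality; the index keeps declaration order for equal qualities.
    preferred = ranges.reject { |_, q| q.zero? }
                      .sort_by.with_index { |(_, q), index| [-q, index] }
                      .map(&:first)

    candidates = available.reject { |tag| excluded.any? { |range| covers?(range, tag) } }
    listed     = preferred - [WILDCARD]

    preferred.each do |range|
      found =
        if range == WILDCARD
          candidates.find { |tag| listed.none? { |other| covers?(other, tag) } }
        else
          candidates.find { |tag| covers?(range, tag) }
        end
      return found if found # the original object, with its original case
    end
    nil
  end
end

MatchSketch.match({ "da" => 1000, "en-gb" => 800, "en" => 700 }, :en, :"en-GB", :da) # => :da
MatchSketch.match({ "en" => 1000, "fr" => 0 }, :fr, :"en-US")                        # => :"en-US"
MatchSketch.match({ "fr" => 1000, "*" => 500 }, :ja)                                 # => :ja
MatchSketch.match({ "*" => 1000, "en" => 0 }, :en)                                   # => nil
```

Comparison happens on a lowercased copy of each available tag, while the value returned is the caller's original object, which is why :"en-US" comes back unchanged.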