Class: AcceptLanguage::Parser
- Inherits: Object
- Defined in: lib/accept_language/parser.rb
Overview
Accept-Language Header Parser
Parser handles the parsing of Accept-Language HTTP header field values as defined in RFC 2616 Section 14.4. It extracts language tags and their associated quality values (q-values), validates them according to the specification, and provides matching capabilities against application-supported languages.
Overview
The Accept-Language header field value consists of a comma-separated list of language ranges, each optionally accompanied by a quality value indicating relative preference. This parser:
- Tokenizes the header into individual language-range entries
- Extracts and validates language tags per BCP 47
- Extracts and validates quality values per RFC 2616 Section 3.9
- Stores valid entries for subsequent matching operations
Quality Values (q-values)
Quality values express the user’s relative preference for a language. Per RFC 2616 Section 3.9, the syntax is:
qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
This means:
- Values range from 0.000 to 1.000
- Maximum of 3 decimal places
- 0 indicates “not acceptable”
- 1 indicates “most preferred” (default when omitted)
Examples of valid q-values: 0, 0.5, 0.75, 0.123, 1, 1.0, 1.000
Examples of invalid q-values (the whole entry is silently skipped): 1.5, 0.1234, -0.5, .5
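The valid and invalid examples above can be checked with a few lines of Ruby. This is a minimal sketch rather than the parser's own code: the regular expression is copied from the QVALUE_PATTERN constant documented further down, while the constant name `QVALUE_CHECK` and the helper `valid_qvalue?` are hypothetical names introduced only for illustration.

```ruby
# Pattern copied from the QVALUE_PATTERN constant documented for this class.
QVALUE_CHECK = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/

# Hypothetical helper (not part of the gem's API): true when +str+ is a
# q-value the documented pattern accepts.
def valid_qvalue?(str)
  QVALUE_CHECK.match?(str)
end

# The examples listed in the documentation above.
%w[0 0.5 0.75 0.123 1 1.0 1.000].each do |q|
  puts "#{q.ljust(6)} accepted: #{valid_qvalue?(q)}"
end

%w[1.5 0.1234 -0.5 .5].each do |q|
  puts "#{q.ljust(6)} accepted: #{valid_qvalue?(q)}"
end
```

Anchoring with `\A` and `\z` matters here: without them, a value such as 0.1234 would match on its first five characters.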
Language Tags
Language tags follow the BCP 47 specification (RFC 5646), which supersedes the RFC 1766 reference in RFC 2616 Section 3.10. Valid tags consist of:
- A primary subtag of 1-8 alphabetic characters (e.g., en, zh, ast)
- Zero or more subtags of 1-8 alphanumeric characters, separated by hyphens
- The special wildcard tag * (matches any language)
Examples of valid language tags:
- en (English)
- en-US (English, United States)
- zh-Hant-TW (Chinese, Traditional script, Taiwan)
- de-CH-1996 (German, Switzerland, 1996 orthography)
- sr-Latn (Serbian, Latin script)
- * (wildcard)
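As a rough illustration of these rules, the sketch below runs a handful of tags through the regular expression documented as LANGTAG_PATTERN. The names `LANGTAG_CHECK` and `valid_langtag?` are hypothetical, and the rejected examples are chosen here for illustration, not taken from the library's tests.

```ruby
# Pattern copied from the LANGTAG_PATTERN constant documented for this class.
LANGTAG_CHECK = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

# Hypothetical helper (not part of the gem's API).
def valid_langtag?(tag)
  LANGTAG_CHECK.match?(tag)
end

accepted = %w[en en-US zh-Hant-TW de-CH-1996 sr-Latn *]
# Underscore separator, digits in the primary subtag, a trailing hyphen,
# and a primary subtag longer than 8 characters.
rejected = %w[en_US 123 en- abcdefghi]

accepted.each { |tag| puts "#{tag.ljust(10)} valid: #{valid_langtag?(tag)}" }
rejected.each { |tag| puts "#{tag.ljust(10)} valid: #{valid_langtag?(tag)}" }
```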
Internal Representation
Internally, quality values are stored as integers in the range 0-1000 (multiplied by 1000) to avoid floating-point comparison issues. This is an implementation detail and does not affect the public API.
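One way to obtain such an integer without ever going through a Float is to pad the fractional digits to three places. The helper below is a sketch under that assumption; `scale_qvalue` is a hypothetical name, it expects an already validated q-value string, and the library's actual conversion may differ.

```ruby
# Hypothetical helper: converts a validated q-value string such as "0.75"
# into an integer in 0..1000 using string operations only.
def scale_qvalue(qvalue)
  whole, fraction = qvalue.split(".", 2)
  # Pad the fraction to exactly three digits: nil -> "000", "5" -> "500".
  padded = fraction.to_s.ljust(3, "0")
  # Base 10 is given explicitly so "075" is never read as octal.
  Integer(whole, 10) * 1_000 + Integer(padded, 10)
end

puts scale_qvalue("1")     # 1000
puts scale_qvalue("0.5")   # 500
puts scale_qvalue("0.075") # 75

# The motivation: float arithmetic does not always compare as expected.
puts 0.1 + 0.2 == 0.3      # false
```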
Thread Safety
Parser instances are immutable after initialization. The languages_range hash is frozen, making Parser instances safe to share between threads.
Error Handling
The parser is lenient by design to handle real-world headers that may not strictly conform to specifications:
- Invalid language tags are silently skipped
- Invalid quality values cause the entry to be skipped
- Empty or nil input results in an empty languages_range
- Malformed entries (missing separators, etc.) are skipped
However, the parser is strict about input types: only String or nil are accepted for the field parameter.
Constant Summary
- DEFAULT_QUALITY =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Default quality value (1.0) scaled to internal integer representation.
When a language tag appears without an explicit quality value, it is assigned this default value, indicating maximum preference.
1_000
- DIGIT_ZERO =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The ASCII digit zero character, used in quality value parsing.
"0"
- DOT =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The decimal point character, used in quality value parsing.
"."
- FIELD_TYPE_ERROR =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Message of the TypeError raised when the field argument is neither a String nor nil.
This guards against accidental non-String values being passed to the parser, which would cause unexpected behavior during parsing.
"Field must be a String or nil"
- SEPARATOR =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The comma character that separates language-range entries in the Accept-Language header field value.
","
- SPACE =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The space character, stripped during parsing because whitespace around separators is optional per RFC 2616.
" "
- SUFFIX =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
The suffix that precedes quality values in language-range entries. A language entry with a quality value has the form: langtag;q=qvalue
";q="
- QVALUE_PATTERN =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Regular expression pattern for validating quality values.
Implements RFC 2616 Section 3.9 qvalue syntax:
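qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
This pattern accepts:
- 0 or 1 (integer form)
- 0. followed by 1-3 digits (e.g., 0.5, 0.75, 0.123)
- 1. followed by 1-3 zeros (e.g., 1.0, 1.00, 1.000)
The pattern is slightly stricter than the grammar: a bare trailing dot (0. or 1.) satisfies 0*3DIGIT but is rejected here.
/\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
- LANGTAG_PATTERN =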
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Regular expression pattern for validating language tags.
Supports BCP 47 (RFC 5646) language tags, which supersede the RFC 1766 tags referenced in RFC 2616 Section 3.10.
Pattern Structure
The pattern accepts either:
- The wildcard character *
- A primary subtag (1-8 ALPHA) followed by zero or more subtags (each 1-8 ALPHANUM, preceded by a hyphen)
BCP 47 vs RFC 1766
RFC 2616 Section 3.10 references RFC 1766, which only allowed alphabetic characters in subtags. However, BCP 47 (the current standard) permits alphanumeric subtags to support:
- Year-based variant subtags (e.g., 1996 in de-CH-1996)
- Numeric region codes (e.g., 419 for Latin America)
- Extension and private-use subtags, which may contain digits
This implementation follows BCP 47 for maximum compatibility with modern language tags.
/\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/
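To make the difference concrete, the sketch below compares the documented pattern with an alphabetic-only pattern written in the spirit of the RFC 1766 grammar (1*8ALPHA for every subtag). The alphabetic-only pattern and both constant names are assumptions made for this illustration; neither is part of the library.

```ruby
# Hypothetical RFC 1766-style pattern: every subtag is 1-8 letters.
RFC1766_STYLE = /\A[a-zA-Z]{1,8}(?:-[a-zA-Z]{1,8})*\z/
# Pattern as documented for LANGTAG_PATTERN.
BCP47_STYLE = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

# en-US passes both patterns; the two tags with digits pass only the
# BCP 47-style pattern.
%w[en-US de-CH-1996 es-419].each do |tag|
  puts "#{tag.ljust(11)} rfc1766=#{RFC1766_STYLE.match?(tag)} bcp47=#{BCP47_STYLE.match?(tag)}"
end
```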
Instance Attribute Summary
- #languages_range ⇒ Hash{String => Integer} (readonly, private)
  The parsed language preferences extracted from the Accept-Language header.
Instance Method Summary
- #initialize(field) ⇒ Parser (constructor)
  Creates a new Parser instance by parsing the given Accept-Language header field value.
- #match(*available_langtags) ⇒ Symbol?
  Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.
Constructor Details
#initialize(field) ⇒ Parser
Creates a new Parser instance by parsing the given Accept-Language header field value.
The parser extracts all valid language-range entries from the header, validates their language tags and quality values, and stores them for subsequent matching operations.
Parsing Process
- Validate that input is a String or nil
- Convert nil to empty string
- Normalize to lowercase for case-insensitive matching
- Remove all spaces (whitespace is insignificant per RFC 2616)
- Split on commas to get individual entries
- For each entry:
  - Split on ;q= to separate tag from quality
  - Validate the language tag
  - Validate and parse the quality value (default 1.0 if absent)
  - Store valid entries in the languages_range hash
-
    # File 'lib/accept_language/parser.rb', line 290
    def initialize(field)
      raise ::TypeError, FIELD_TYPE_ERROR unless field.nil? || field.is_a?(::String)

      @languages_range = import(field)
    end
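The source of the private import step is not shown on this page, so the following is only a sketch of how the parsing process listed above could be written. The module `ParserSketch` and its `parse` method are hypothetical; the two patterns and the error message are copied from the documented constants. Letting a repeated tag overwrite the earlier entry is an assumption of this sketch.

```ruby
module ParserSketch
  QVALUE  = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
  LANGTAG = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

  module_function

  # Hypothetical stand-in for the parser's import step. Returns a frozen
  # hash of downcased tags mapped to integer qualities (0..1000).
  def parse(field)
    raise TypeError, "Field must be a String or nil" unless field.nil? || field.is_a?(String)

    # nil -> "", lowercase, drop spaces, split into entries.
    field.to_s.downcase.delete(" ").split(",").each_with_object({}) do |entry, ranges|
      tag, quality = entry.split(";q=", 2)
      next unless tag && LANGTAG.match?(tag) # skip empty or invalid tags

      if quality.nil?
        ranges[tag] = 1_000 # no q-value: default to 1.0
      elsif QVALUE.match?(quality)
        whole, fraction = quality.split(".", 2)
        ranges[tag] = whole.to_i * 1_000 + fraction.to_s.ljust(3, "0").to_i
      end
      # An invalid q-value falls through: the entry is skipped.
    end.freeze
  end
end

p ParserSketch.parse("da, en-GB;q=0.8, en;q=0.7, xx_YY, fr;q=1.5")
```

In this sketch, `xx_YY` is dropped because of the underscore and `fr;q=1.5` because the q-value is out of range, leaving only the `da`, `en-gb`, and `en` entries.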
Instance Attribute Details
#languages_range ⇒ Hash{String => Integer} (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
The parsed language preferences extracted from the Accept-Language header.
This hash maps downcased language tags to their quality values (scaled to integers 0-1000). Tags are stored in lowercase for case-insensitive matching.
    # File 'lib/accept_language/parser.rb', line 243
    def languages_range
      @languages_range
    end
Instance Method Details
#match(*available_langtags) ⇒ Symbol?
Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.
This method delegates to Matcher to perform the actual matching, which considers:
- Quality values: Higher q-values indicate stronger preference
- Declaration order: When q-values are equal, earlier declaration wins
- Prefix matching: en matches en-US, en-GB, etc.
- Wildcards: * matches any language not explicitly listed
- Exclusions: q=0 explicitly excludes a language
Matching Algorithm
- Remove any available languages that are explicitly excluded (q=0)
- Iterate through preferred languages in descending quality order
- For each preferred language, find the first available language that:
  - Exactly matches the preferred tag, OR
  - Has the preferred tag as a prefix (followed by a hyphen)
- For wildcards, match any available language not already matched
- Return the first match found, or nil if no match exists
Return Value Preservation
The method returns the language tag exactly as provided in the available_langtags argument, preserving the original case. This is important for direct use with I18n.locale and similar APIs.
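The Matcher source is not reproduced here, so the sketch below is only one plausible reading of the algorithm and of the case-preservation rule described above. `best_match` and `covers?` are hypothetical names. Two points are assumptions of this sketch rather than documented behavior: exclusions are applied by exact tag only, and a wildcard matches a candidate only when no explicitly listed tag covers it.

```ruby
# True when +name+ equals the preferred tag or extends it at a hyphen.
def covers?(preferred, name)
  name == preferred || name.start_with?("#{preferred}-")
end

# Hypothetical sketch. +preferences+ maps downcased tags to qualities
# (0..1000) in declaration order; +available+ holds Strings or Symbols.
def best_match(preferences, available)
  listed   = preferences.keys - ["*"]
  excluded = listed.select { |tag| preferences[tag].zero? }

  # Drop explicitly excluded candidates (exact match assumed).
  candidates = available.reject { |lang| excluded.include?(lang.to_s.downcase) }

  # Descending quality; the index preserves declaration order on ties.
  ranked = preferences.each_with_index
                      .reject { |(_, quality), _| quality.zero? }
                      .sort_by { |(_, quality), index| [-quality, index] }
                      .map { |(tag, _), _| tag }

  ranked.each do |tag|
    found = candidates.find do |lang|
      name = lang.to_s.downcase
      tag == "*" ? listed.none? { |t| covers?(t, name) } : covers?(tag, name)
    end
    return found if found # returned exactly as given by the caller
  end
  nil
end

prefs = { "da" => 1000, "en-gb" => 800, "en" => 700 }
p best_match(prefs, %w[en-US fr])   # "en-US" (prefix match, case preserved)
p best_match(prefs, %i[ja da])      # :da
p best_match({ "fr" => 1000, "*" => 500, "en" => 0 }, %w[en de]) # "de"
```

Comparison happens on downcased copies, while the value handed back is the caller's original object, which is what makes the result usable directly as a locale identifier.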
    # File 'lib/accept_language/parser.rb', line 371
    def match(*)
      Matcher.new(**languages_range).call(*)
    end