Class: AcceptLanguage::Parser

Inherits:
Object
  • Object
Defined in:
lib/accept_language/parser.rb

Overview

Accept-Language Header Parser

Parser handles the parsing of Accept-Language HTTP header field values as defined in RFC 7231 Section 5.3.5. It extracts language ranges and their associated quality values (q-values), validates them according to the specification, and provides matching capabilities against application-supported languages.

Overview

The Accept-Language header field value consists of a comma-separated list of language ranges, each optionally accompanied by a quality value indicating relative preference. This parser:

  1. Tokenizes the header into individual language-range entries

  2. Extracts and validates language ranges per RFC 4647 Section 2.1

  3. Extracts and validates quality values per RFC 7231 Section 5.3.1

  4. Stores valid entries for subsequent matching operations

Quality Values (q-values)

Quality values indicate the user’s relative preference for a language. Per RFC 7231 Section 5.3.1, the syntax is:

qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )

This means:

  • Values range from 0.000 to 1.000

  • Maximum of 3 decimal places

  • 0 indicates “not acceptable”

  • 1 indicates “most preferred” (default when omitted)

Examples of valid q-values: 0, 0.5, 0.75, 0.123, 1, 1.0, 1.000

Examples of invalid q-values (the whole entry is silently skipped): 1.5, 0.1234, -0.5, .5

Language Ranges

Language ranges follow the Basic Language Range syntax defined in RFC 4647 Section 2.1:

language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
alphanum       = ALPHA / DIGIT

Valid ranges consist of:

  • A primary subtag of 1-8 alphabetic characters (e.g., en, zh, ast)

  • Zero or more subtags of 1-8 alphanumeric characters, separated by hyphens

  • The special wildcard * (matches any language)

This syntax is compatible with BCP 47 language tags, supporting:

  • Year-based variant subtags (e.g., 1996 in de-CH-1996)

  • Numeric region codes (e.g., 419 for Latin America)

  • Script subtags (e.g., Hant for Traditional Chinese script)

Examples of valid language ranges:

  • en (English)

  • en-US (English, United States)

  • zh-Hant-TW (Chinese, Traditional script, Taiwan)

  • de-CH-1996 (German, Switzerland, 1996 orthography)

  • sr-Latn (Serbian, Latin script)

  • * (wildcard)

Internal Representation

Internally, quality values are stored as integers in the range 0-1000 (multiplied by 1000) to avoid floating-point comparison issues. This is an implementation detail and does not affect the public API.
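
As a rough illustration of the scaling described above, the following sketch converts an already validated q-value string into an integer using only string and integer operations. The helper name scale_qvalue is hypothetical, and this is not necessarily how the library performs the conversion.

```ruby
# Hypothetical helper, not the library's code: turns a validated q-value
# string into an integer between 0 and 1000 without going through Float.
def scale_qvalue(qvalue)
  whole, fraction = qvalue.split(".", 2)
  # Right-pad the fractional digits so "5" means 500 and "75" means 750.
  whole.to_i * 1000 + fraction.to_s.ljust(3, "0").to_i
end

scale_qvalue("0.5")   # => 500
scale_qvalue("0.123") # => 123
scale_qvalue("1")     # => 1000
```

Integer qualities compare and sort exactly, so two entries with the same q-value always tie and fall back to declaration order.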

Thread Safety

Parser instances are immutable after initialization. The languages_range hash is frozen, making Parser instances safe to share between threads.
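
The effect of freezing can be shown with a plain Ruby hash standing in for languages_range. This is a sketch of the general mechanism, assuming only standard Ruby behavior; it does not use the Parser class itself.

```ruby
# Stand-in for a parsed languages_range hash (the real one is built by Parser).
languages_range = { "en" => 1000, "fr" => 800 }.freeze

# Any attempt to mutate a frozen hash raises FrozenError.
error = begin
  languages_range["de"] = 500
  nil
rescue FrozenError => e
  e
end
error.class # => FrozenError

# Concurrent readers need no locking because the data cannot change.
readers = Array.new(4) { Thread.new { languages_range.fetch("fr") } }
results = readers.map(&:value)
results # => [800, 800, 800, 800]
```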

Error Handling

The parser is lenient by design to handle real-world headers that may not strictly conform to specifications:

  • Invalid language ranges are silently skipped

  • Invalid quality values cause the entry to be skipped

  • Empty or nil input results in an empty languages_range

  • Malformed entries (missing separators, etc.) are skipped

However, the parser is strict about input types: only a String or nil is accepted for the field parameter, and any other type raises a TypeError.

Examples:

Basic usage

parser = AcceptLanguage::Parser.new("da, en-GB;q=0.8, en;q=0.7")
parser.match(:en, :da)
# => :da

Inspecting parsed languages

parser = AcceptLanguage::Parser.new("fr-CH;q=0.9, fr;q=0.8, en;q=0.7")
parser.languages_range
# => {"fr-ch"=>900, "fr"=>800, "en"=>700}

Handling wildcards

parser = AcceptLanguage::Parser.new("de, *;q=0.5")
parser.match(:ja, :de)
# => :de

Handling exclusions

parser = AcceptLanguage::Parser.new("*, en;q=0")
parser.match(:en, :fr)
# => :fr

Since:

  • 1.0.0

Constant Summary

DEFAULT_QUALITY =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Default quality value (1.0) scaled to internal integer representation.

When a language range appears without an explicit quality value, it is assigned this default value, indicating maximum preference.

Since:

  • 1.0.0

1_000
DIGIT_ZERO =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The ASCII digit zero character, used in quality value parsing.

Since:

  • 1.0.0

"0"
DOT =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The decimal point character, used in quality value parsing.

Since:

  • 1.0.0

"."
FIELD_TYPE_ERROR =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Message of the TypeError raised when the field argument is neither a String nor nil.

This guards against accidental non-String values being passed to the parser, which would cause unexpected behavior during parsing.

Since:

  • 1.0.0

"Field must be a String or nil"
SEPARATOR =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The comma character that separates language-range entries in the Accept-Language header field value.

Since:

  • 1.0.0

","
SPACE =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The space character, stripped during parsing as whitespace around separators is optional per RFC 7231.

Since:

  • 1.0.0

" "
SUFFIX =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The marker that introduces the quality value in a language-range entry. An entry with a quality value has the form: language-range;q=qvalue

Since:

  • 1.0.0

";q="
QVALUE_PATTERN =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Regular expression pattern for validating quality values.

Implements RFC 7231 Section 5.3.1 qvalue syntax:

qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )

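This pattern is slightly stricter than the ABNF above, which also permits a trailing dot with no digits (0. or 1.). It accepts: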

  • 0 or 1 (integer form)

  • 0. followed by 1-3 digits (e.g., 0.5, 0.75, 0.123)

  • 1. followed by 1-3 zeros (e.g., 1.0, 1.00, 1.000)

Examples:

Valid matches

QVALUE_PATTERN.match?("0")     # => true
QVALUE_PATTERN.match?("0.5")   # => true
QVALUE_PATTERN.match?("0.123") # => true
QVALUE_PATTERN.match?("1")     # => true
QVALUE_PATTERN.match?("1.0")   # => true
QVALUE_PATTERN.match?("1.000") # => true

Invalid (no match)

QVALUE_PATTERN.match?("0.1234") # => false (too many decimals)
QVALUE_PATTERN.match?("1.5")    # => false (> 1)
QVALUE_PATTERN.match?("2")      # => false (> 1)
QVALUE_PATTERN.match?(".5")     # => false (missing leading digit)
QVALUE_PATTERN.match?("1.001")  # => false (1.x must be zeros only)

Since:

  • 1.0.0

/\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
LANGTAG_PATTERN =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Regular expression pattern for validating language ranges.

Implements RFC 4647 Section 2.1 Basic Language Range syntax:

language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
alphanum       = ALPHA / DIGIT

Pattern Structure

The pattern accepts either:

  • The wildcard character *

  • A primary subtag (1-8 ALPHA) followed by zero or more subtags (each 1-8 ALPHANUM, preceded by a hyphen)

This syntax is compatible with BCP 47 language tags, allowing alphanumeric subtags for variant subtags, numeric region codes, and other modern language tag features.

Examples:

Valid language ranges

LANGTAG_PATTERN.match?("en")         # => true
LANGTAG_PATTERN.match?("en-US")      # => true
LANGTAG_PATTERN.match?("zh-Hant-TW") # => true
LANGTAG_PATTERN.match?("de-CH-1996") # => true
LANGTAG_PATTERN.match?("*")          # => true

Invalid language ranges

LANGTAG_PATTERN.match?("")              # => false (empty)
LANGTAG_PATTERN.match?("toolongprimary") # => false (> 8 chars)
LANGTAG_PATTERN.match?("en_US")         # => false (underscore)
LANGTAG_PATTERN.match?("123")           # => false (numeric primary)

Since:

  • 1.0.0

/\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

Constructor Details

#initialize(field) ⇒ Parser

Creates a new Parser instance by parsing the given Accept-Language header field value.

The parser extracts all valid language-range entries from the header, validates their language ranges and quality values, and stores them for subsequent matching operations.

Parsing Process

  1. Validate that input is a String or nil

  2. Convert nil to empty string

  3. Normalize to lowercase for case-insensitive matching

  4. Remove all spaces (whitespace is insignificant per RFC 7231)

  5. Split on commas to get individual entries

  6. For each entry:

    1. Split on ;q= to separate range from quality

    2. Validate the language range

    3. Validate and parse the quality value (default 1.0 if absent)

    4. Store valid entries in the languages_range hash
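
The steps above can be sketched as a single standalone method. This is an illustrative re-implementation under stated assumptions, not the library's source: the method name parse_accept_language is hypothetical, and the two regular expressions are copied from the QVALUE_PATTERN and LANGTAG_PATTERN constants documented for this class.

```ruby
# Hypothetical stand-in for the parsing steps; not the library's code.
def parse_accept_language(field)
  # Step 1: only a String or nil is accepted.
  unless field.nil? || field.is_a?(String)
    raise TypeError, "Field must be a String or nil"
  end

  qvalue_pattern  = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/
  langtag_pattern = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

  # Steps 2-5: nil becomes "", then downcase, drop spaces, split on commas.
  entries = field.to_s.downcase.delete(" ").split(",")

  ranges = {}
  entries.each do |entry|
    # Step 6.1: separate the range from its optional quality value.
    range, qvalue = entry.split(";q=", 2)
    # Step 6.2: skip entries whose language range is invalid.
    next unless langtag_pattern.match?(range.to_s)

    # Step 6.3: an absent q-value defaults to 1; an invalid one skips the entry.
    qvalue ||= "1"
    next unless qvalue_pattern.match?(qvalue)

    # Step 6.4: store the quality as an integer from 0 to 1000.
    whole, fraction = qvalue.split(".", 2)
    ranges[range] = whole.to_i * 1000 + fraction.to_s.ljust(3, "0").to_i
  end
  ranges.freeze
end

parse_accept_language("en, invalid;;q=0.5, fr;q=0.8")
# => {"en"=>1000, "fr"=>800}
```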

Examples:

Standard header

Parser.new("en-US, en;q=0.9, fr;q=0.8")

With wildcard

Parser.new("fr-FR, fr;q=0.9, *;q=0.5")

With exclusion

Parser.new("*, en;q=0")

Empty or nil input

Parser.new("")   # languages_range => {}
Parser.new(nil)  # languages_range => {}

Malformed input (invalid entries skipped)

Parser.new("en, invalid;;q=0.5, fr;q=0.8")
# languages_range => {"en"=>1000, "fr"=>800}

Raises:

  • (TypeError)

    if field is neither a String nor nil

Since:

  • 1.0.0



# File 'lib/accept_language/parser.rb', line 293

def initialize(field)
  raise ::TypeError, FIELD_TYPE_ERROR unless field.nil? || field.is_a?(::String)

  @languages_range = import(field)
end

Instance Attribute Details

#languages_range ⇒ Hash{String => Integer} (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The parsed language preferences extracted from the Accept-Language header.

This hash maps downcased language ranges to their quality values (scaled to integers 0-1000). Ranges are stored in lowercase for case-insensitive matching.

Examples:

parser = Parser.new("en-GB;q=0.8, fr;q=0.9, de")
parser.languages_range
# => {"en-gb"=>800, "fr"=>900, "de"=>1000}

Since:

  • 1.0.0



# File 'lib/accept_language/parser.rb', line 246

def languages_range
  @languages_range
end

Instance Method Details

#match(*available_langtags) ⇒ Symbol?

Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.

This method delegates to Matcher to perform the actual matching using the Basic Filtering scheme defined in RFC 4647 Section 3.3.1, which considers:

  1. Quality values: Higher q-values indicate stronger preference

  2. Declaration order: When q-values are equal, earlier declaration wins

  3. Prefix matching: en matches en-US, en-GB, etc.

  4. Wildcards: * matches any language not explicitly listed

  5. Exclusions: q=0 explicitly excludes a language

Matching Algorithm

  1. Remove any available languages that are explicitly excluded (q=0)

  2. Iterate through preferred languages in descending quality order

  3. For each preferred language, find the first available language that:

    • Exactly matches the preferred range, OR

    • Has the preferred range as a prefix (followed by a hyphen)

  4. For wildcards, match any available language not already matched

  5. Return the first match found, or nil if no match exists
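
The algorithm above can be sketched as follows. This is a simplified stand-in for Matcher, not its actual implementation: the names best_match and covers? are hypothetical. Two points are assumptions made for the sketch: exclusions are applied with the same prefix rule as matches, and "not already matched" is read as "not covered by any explicitly listed range".

```ruby
# Hypothetical sketch of the matching steps; not the library's Matcher.
# `ranges` maps downcased language ranges to integer qualities (0..1000).
def covers?(range, tag)
  tag = tag.to_s.downcase
  tag == range || tag.start_with?("#{range}-")
end

def best_match(ranges, *available)
  excluded  = ranges.select { |_, q| q.zero? }.keys
  # Sort by descending quality, breaking ties by declaration order.
  preferred = ranges.reject { |_, q| q.zero? }
                    .each_with_index
                    .sort_by { |(_, q), index| [-q, index] }
                    .map { |(range, _), _| range }
  explicit  = preferred - ["*"]

  # Step 1: drop explicitly excluded languages (assumed to use prefix rule).
  candidates = available.reject do |tag|
    excluded.any? { |range| covers?(range, tag) }
  end

  # Steps 2-5: walk preferences from most to least preferred.
  preferred.each do |range|
    found = if range == "*"
              candidates.find { |tag| explicit.none? { |r| covers?(r, tag) } }
            else
              candidates.find { |tag| covers?(range, tag) }
            end
    return found if found
  end
  nil
end

best_match({ "da" => 1000, "en-gb" => 800, "en" => 700 }, :en, :da) # => :da
best_match({ "*" => 1000, "en" => 0 }, :en, :fr)                    # => :fr
```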

Return Value Preservation

The method returns the language tag exactly as provided in the available_langtags argument, preserving the original case. This is important for direct use with I18n.locale and similar APIs.
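
A minimal sketch of the idea, assuming comparison is done on a downcased copy while the caller's original object is what gets returned. The variable names are illustrative only.

```ruby
# Ranges from the header are stored downcased; available tags keep their case.
available = [:"en-GB", :"pt-BR"]
preferred = "pt-br"

# Compare case-insensitively, but return the element as the caller passed it.
match = available.find { |tag| tag.to_s.downcase == preferred }
match # => :"pt-BR"
```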

Examples:

Basic matching

parser = Parser.new("da, en-GB;q=0.8, en;q=0.7")
parser.match(:en, :da)
# => :da

Regional variant matching

parser = Parser.new("en-GB, en;q=0.9")
parser.match(:en, :"en-GB", :"en-US")
# => :"en-GB"

Prefix matching

parser = Parser.new("en")
parser.match(:"en-US", :"en-GB")
# => :"en-US"  (first match wins)

No match found

parser = Parser.new("ja, zh")
parser.match(:en, :fr, :de)
# => nil

Wildcard matching

parser = Parser.new("en, *;q=0.5")
parser.match(:fr)
# => :fr  (matched by wildcard)

Exclusion

parser = Parser.new("*, en;q=0")
parser.match(:en, :fr)
# => :fr  (en is excluded)

With I18n

parser = Parser.new(request.env["HTTP_ACCEPT_LANGUAGE"])
locale = parser.match(*I18n.available_locales) || I18n.default_locale
I18n.locale = locale

Raises:

  • (TypeError)

    if any element in available_langtags is not a Symbol

Since:

  • 1.0.0



# File 'lib/accept_language/parser.rb', line 375

def match(*available_langtags)
  Matcher.new(**languages_range).call(*available_langtags)
end