Class: AcceptLanguage::Parser

Inherits: Object

Defined in: lib/accept_language/parser.rb

Overview

Accept-Language Header Parser

Parser handles the parsing of Accept-Language HTTP header field values as defined in RFC 2616 Section 14.4. It extracts language tags and their associated quality values (q-values), validates them according to the specification, and provides matching capabilities against application-supported languages.

Overview

The Accept-Language header field value consists of a comma-separated list of language ranges, each optionally accompanied by a quality value indicating relative preference. This parser:

Tokenizes the header into individual language-range entries
Extracts and validates language tags per BCP 47
Extracts and validates quality values per RFC 2616 Section 3.9
Stores valid entries for subsequent matching operations

Quality Values (q-values)

Quality values express the user’s relative preference for a language. Per RFC 2616 Section 3.9, the syntax is:

qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )

This means:

Values range from 0.000 to 1.000
Maximum of 3 decimal places
0 indicates “not acceptable”
1 indicates “most preferred” (default when omitted)

Examples of valid q-values: 0, 0.5, 0.75, 0.123, 1, 1.0, 1.000

Examples of invalid q-values (the whole entry is silently skipped): 1.5, 0.1234, -0.5, .5

Language Tags

Language tags follow the BCP 47 specification (RFC 5646), which supersedes the RFC 1766 reference in RFC 2616 Section 3.10. Valid tags consist of:

A primary subtag of 1-8 alphabetic characters (e.g., en, zh, ast)
Zero or more subtags of 1-8 alphanumeric characters, separated by hyphens
The special wildcard tag * (matches any language)

Examples of valid language tags:

en (English)
en-US (English, United States)
zh-Hant-TW (Chinese, Traditional script, Taiwan)
de-CH-1996 (German, Switzerland, 1996 orthography)
sr-Latn (Serbian, Latin script)
* (wildcard)

Internal Representation

Internally, quality values are stored as integers in the range 0-1000 (multiplied by 1000) to avoid floating-point comparison issues. This is an implementation detail and does not affect the public API.
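The scaling described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name scale_qvalue and the constant QVALUE_SKETCH are hypothetical, and only the regular expression is copied from QVALUE_PATTERN documented on this page.

```ruby
# Hypothetical helper (not part of the library): turns a q-value string into
# an integer between 0 and 1000 without ever going through Float.
QVALUE_SKETCH = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/ # same expression as QVALUE_PATTERN

def scale_qvalue(text)
  return nil unless QVALUE_SKETCH.match?(text)

  integer_part, fraction = text.split(".")
  # Pad the fraction to three digits so "8" becomes "800" and "05" becomes "050".
  integer_part.to_i * 1000 + fraction.to_s.ljust(3, "0").to_i
end

scale_qvalue("0.8")   # => 800
scale_qvalue("0.123") # => 123
scale_qvalue("1")     # => 1000
scale_qvalue("1.5")   # => nil (invalid q-value)
```

Integers compare exactly, whereas Float arithmetic can surprise (0.1 + 0.2 == 0.3 is false in Ruby).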

Thread Safety

Parser instances are immutable after initialization. The languages_range hash is frozen, making Parser instances safe to share between threads.
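As a rough illustration of why a frozen hash is safe to share, the sketch below uses a plain frozen Hash as a stand-in for languages_range. It does not use the library; the variable names are assumptions made for the example.

```ruby
# Illustrative only: a frozen Hash standing in for languages_range.
ranges = { "da" => 1000, "en-gb" => 800 }.freeze

# Any attempt to mutate a frozen Hash raises FrozenError.
mutation_blocked =
  begin
    ranges["fr"] = 500
    false
  rescue FrozenError
    true
  end

# Concurrent reads need no locking because nothing can change the Hash.
results = Array.new(4) { Thread.new { ranges.fetch("da") } }.map(&:value)

mutation_blocked # => true
results          # => [1000, 1000, 1000, 1000]
```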

Error Handling

The parser is lenient by design to handle real-world headers that may not strictly conform to specifications:

Invalid language tags are silently skipped
Invalid quality values cause the entry to be skipped
Empty or nil input results in an empty languages_range
Malformed entries (missing separators, etc.) are skipped

However, the parser is strict about input types: only String or nil are accepted for the field parameter.
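The type check can be pictured with a small guard like the one below. This is a sketch under assumptions: check_field is a hypothetical name, and only the error class (TypeError) and the message text (from FIELD_TYPE_ERROR) come from this page.

```ruby
# Hypothetical guard, not the library's code. The message text is the one
# documented for FIELD_TYPE_ERROR.
def check_field(field)
  return "" if field.nil? # an absent header is treated as an empty string
  raise TypeError, "Field must be a String or nil" unless field.is_a?(String)

  field
end

check_field(nil)  # => ""
check_field("en") # => "en"

begin
  check_field(42)
rescue TypeError => e
  e.message # => "Field must be a String or nil"
end
```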

Examples:

Basic usage

parser = AcceptLanguage::Parser.new("da, en-GB;q=0.8, en;q=0.7")
parser.match(:en, :da)
# => :da

Inspecting parsed languages

parser = AcceptLanguage::Parser.new("fr-CH;q=0.9, fr;q=0.8, en;q=0.7")
parser.languages_range
# => {"fr-ch"=>900, "fr"=>800, "en"=>700}

Handling wildcards

parser = AcceptLanguage::Parser.new("de, *;q=0.5")
parser.match(:ja, :de)
# => :de

Handling exclusions

parser = AcceptLanguage::Parser.new("*, en;q=0")
parser.match(:en, :fr)
# => :fr

Constant Summary

DEFAULT_QUALITY =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Default quality value (1.0) scaled to internal integer representation.

When a language tag appears without an explicit quality value, it is assigned this default value, indicating maximum preference.

Returns:

(Integer) —

1000 (representing q=1.0)

Since:

1.0.0

1_000

DIGIT_ZERO =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The ASCII digit zero character, used in quality value parsing.

Returns:

(String) —

“0”

Since:

1.0.0

"0"

DOT =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The decimal point character, used in quality value parsing.

Returns:

(String) —

“.”

Since:

1.0.0

"."

FIELD_TYPE_ERROR =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Error message used when the field argument is neither a String nor nil.

This guards against accidental non-String values being passed to the parser, which would cause unexpected behavior during parsing.

Returns:

(String)

Since:

1.0.0

"Field must be a String or nil"

SEPARATOR =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The comma character that separates language-range entries in the Accept-Language header field value.

Returns:

(String) —

“,”

Since:

1.0.0

","

SPACE =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The space character, stripped during parsing because whitespace around separators is optional per RFC 2616.

Returns:

(String) —

“ ”

Since:

1.0.0

" "

SUFFIX =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

The suffix that precedes quality values in language-range entries. A language entry with a quality value has the form: langtag;q=qvalue

Returns:

(String) —

“;q=”

Since:

1.0.0

";q="

QVALUE_PATTERN =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Regular expression pattern for validating quality values.

Implements RFC 2616 Section 3.9 qvalue syntax:

qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )

This pattern accepts:

0 or 1 (integer form)

0. followed by 1-3 digits (e.g., 0.5, 0.75, 0.123)

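1. followed by 1-3 zeros (e.g., 1.0, 1.00, 1.000)

Unlike the grammar above, the pattern requires at least one digit after the decimal point, so a bare 0. or 1. is rejected.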

Examples:

Valid matches

QVALUE_PATTERN.match?("0")      # => true
QVALUE_PATTERN.match?("0.5")    # => true
QVALUE_PATTERN.match?("0.123")  # => true
QVALUE_PATTERN.match?("1")      # => true
QVALUE_PATTERN.match?("1.0")    # => true
QVALUE_PATTERN.match?("1.000")  # => true

Invalid (no match)

QVALUE_PATTERN.match?("0.1234") # => false (too many decimals)
QVALUE_PATTERN.match?("1.5")    # => false (> 1)
QVALUE_PATTERN.match?("2")      # => false (> 1)
QVALUE_PATTERN.match?(".5")     # => false (missing leading digit)
QVALUE_PATTERN.match?("1.001")  # => false (1.x must be zeros only)

Returns:

(Regexp)

Since:

1.0.0

/\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/

LANGTAG_PATTERN =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Regular expression pattern for validating language tags.

Supports BCP 47 (RFC 5646) language tags, which supersede the RFC 1766 tags referenced in RFC 2616 Section 3.10.

Pattern Structure

The pattern accepts either:

The wildcard character *

A primary subtag (1-8 ALPHA) followed by zero or more subtags (each 1-8 ALPHANUM, preceded by a hyphen)

BCP 47 vs RFC 1766

RFC 2616 Section 3.10 references RFC 1766, which only allowed alphabetic characters in subtags. However, BCP 47 (the current standard) permits alphanumeric subtags to support:

Year-based variant subtags (e.g., 1996 in de-CH-1996)

Numeric region codes (e.g., 419 for Latin America)

Other subtags that may contain digits, such as extension and private-use subtags

This implementation follows BCP 47 for maximum compatibility with modern language tags.
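The difference can be seen by comparing a letters-only pattern with the alphanumeric one. In the sketch below, LETTERS_ONLY is an illustrative pattern written for this example (it is not part of the library), while ALPHANUMERIC repeats the expression shown for LANGTAG_PATTERN.

```ruby
# RFC 1766 style: every subtag is 1-8 letters (illustrative, not part of the library).
LETTERS_ONLY = /\A[a-zA-Z]{1,8}(?:-[a-zA-Z]{1,8})*\z/
# Same expression as LANGTAG_PATTERN on this page.
ALPHANUMERIC = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

%w[en-US de-CH-1996 es-419].each do |tag|
  puts "#{tag}: letters-only=#{LETTERS_ONLY.match?(tag)} alphanumeric=#{ALPHANUMERIC.match?(tag)}"
end
# en-US: letters-only=true alphanumeric=true
# de-CH-1996: letters-only=false alphanumeric=true
# es-419: letters-only=false alphanumeric=true
```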

Examples:

Valid language tags

LANGTAG_PATTERN.match?("en")             # => true
LANGTAG_PATTERN.match?("en-US")          # => true
LANGTAG_PATTERN.match?("zh-Hant-TW")     # => true
LANGTAG_PATTERN.match?("de-CH-1996")     # => true
LANGTAG_PATTERN.match?("*")              # => true

Invalid language tags

LANGTAG_PATTERN.match?("")               # => false (empty)
LANGTAG_PATTERN.match?("toolongprimary") # => false (> 8 chars)
LANGTAG_PATTERN.match?("en_US")          # => false (underscore)
LANGTAG_PATTERN.match?("123")            # => false (numeric primary)

Returns:

(Regexp)

Since:

1.0.0

/\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/

Instance Attribute Summary

#languages_range ⇒ Hash{String => Integer} readonly private

The parsed language preferences extracted from the Accept-Language header.

Instance Method Summary

#initialize(field) ⇒ Parser constructor

Creates a new Parser instance by parsing the given Accept-Language header field value.

#match(*available_langtags) ⇒ Symbol?

Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.

Constructor Details

#initialize(field) ⇒ Parser

Creates a new Parser instance by parsing the given Accept-Language header field value.

The parser extracts all valid language-range entries from the header, validates their language tags and quality values, and stores them for subsequent matching operations.

Parsing Process

1. Validate that input is a String or nil
2. Convert nil to empty string
3. Normalize to lowercase for case-insensitive matching
4. Remove all spaces (whitespace is insignificant per RFC 2616)
5. Split on commas to get individual entries
6. For each entry:
   1. Split on ;q= to separate tag from quality
   2. Validate the language tag
   3. Validate and parse the quality value (default 1.0 if absent)
   4. Store valid entries in the languages_range hash
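The steps above can be sketched in a few lines of plain Ruby. This is a hypothetical reconstruction from the description, not the gem's source: the names parse_field, LANGTAG_RE and QVALUE_RE are assumptions, and the two expressions are copied from LANGTAG_PATTERN and QVALUE_PATTERN on this page.

```ruby
# Hypothetical sketch of the documented steps; the gem's real code may differ.
LANGTAG_RE = /\A(?:\*|[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*)\z/
QVALUE_RE  = /\A(?:0(?:\.[0-9]{1,3})?|1(?:\.0{1,3})?)\z/

def parse_field(field)
  # Only String or nil is accepted.
  raise TypeError, "Field must be a String or nil" unless field.nil? || field.is_a?(String)

  # nil becomes "", then downcase, strip spaces, and split on commas.
  entries = field.to_s.downcase.delete(" ").split(",")

  entries.each_with_object({}) do |entry, ranges|
    tag, quality = entry.split(";q=", 2)
    next unless tag && LANGTAG_RE.match?(tag) # skip invalid or empty tags

    if quality.nil?
      ranges[tag] = 1000 # default quality (q=1.0)
    elsif QVALUE_RE.match?(quality)
      int, frac = quality.split(".")
      ranges[tag] = int.to_i * 1000 + frac.to_s.ljust(3, "0").to_i
    end
    # Entries with an invalid quality value fall through and are skipped.
  end.freeze
end

parse_field("fr-CH;q=0.9, fr;q=0.8, en;q=0.7")
# => {"fr-ch"=>900, "fr"=>800, "en"=>700}
```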

Examples:

Standard header

Parser.new("en-US, en;q=0.9, fr;q=0.8")

With wildcard

Parser.new("fr-FR, fr;q=0.9, *;q=0.5")

With exclusion

Parser.new("*, en;q=0")

Empty or nil input

Parser.new("")   # languages_range => {}
Parser.new(nil)  # languages_range => {}

Malformed input (invalid entries skipped)

Parser.new("en, invalid;;q=0.5, fr;q=0.8")
# languages_range => {"en"=>1000, "fr"=>800}

Parameters:

field (String, nil) —

the Accept-Language header field value. Common sources include request.env["HTTP_ACCEPT_LANGUAGE"] in Rack applications or request.headers["Accept-Language"] in Rails. When nil is passed (header absent), it is treated as an empty string.

Raises:

(TypeError) —

if field is neither a String nor nil

Instance Attribute Details

#languages_range ⇒ Hash{String => Integer} (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

The parsed language preferences extracted from the Accept-Language header.

This hash maps downcased language tags to their quality values (scaled to integers 0-1000). Tags are stored in lowercase for case-insensitive matching.

Examples:

parser = Parser.new("en-GB;q=0.8, fr;q=0.9, de")
parser.languages_range
# => {"en-gb"=>800, "fr"=>900, "de"=>1000}

Returns:

(Hash{String => Integer}) —

language tags mapped to quality values

Since:

1.0.0



# File 'lib/accept_language/parser.rb', line 243

def languages_range
  @languages_range
end

Instance Method Details

#match(*available_langtags) ⇒ Symbol?

Finds the best matching language from the available options based on the user’s preferences expressed in the Accept-Language header.

This method delegates to Matcher to perform the actual matching, which considers:

Quality values: Higher q-values indicate stronger preference
Declaration order: When q-values are equal, earlier declaration wins
Prefix matching: en matches en-US, en-GB, etc.
Wildcards: * matches any language not explicitly listed
Exclusions: q=0 explicitly excludes a language

Matching Algorithm

1. Remove any available languages that are explicitly excluded (q=0)
2. Iterate through preferred languages in descending quality order
3. For each preferred language, find the first available language that:
   - Exactly matches the preferred tag, OR
   - Has the preferred tag as a prefix (followed by a hyphen)
4. For wildcards, match any available language not already matched
5. Return the first match found, or nil if no match exists
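A rough sketch of this algorithm is given below. It is not the gem's Matcher: best_match is a hypothetical function that takes an already parsed hash (downcased tags mapped to integer qualities), and two details are assumptions made for the example, namely that a q=0 range also excludes tags it prefixes, and that the wildcard skips any tag covered by another listed range.

```ruby
# Hypothetical sketch of the documented algorithm; it is not the gem's Matcher.
# `ranges` maps downcased tags to integer qualities, e.g. {"en" => 1000, "*" => 500}.
def best_match(ranges, available)
  # A range covers a tag when it is equal to it or is a prefix followed by a hyphen.
  covers = ->(range, tag) { tag == range || tag.start_with?("#{range}-") }
  excluded = ranges.select { |range, q| q.zero? && range != "*" }.keys
  listed = ranges.keys - ["*"]

  # Drop available tags covered by a q=0 range (assumption: prefixes count too).
  candidates = available.reject do |lang|
    excluded.any? { |range| covers.call(range, lang.to_s.downcase) }
  end

  # Highest quality first; declaration order breaks ties.
  preferred = ranges.reject { |_, q| q.zero? }.each_with_index
                    .sort_by { |(_, q), index| [-q, index] }
                    .map { |(range, _), _| range }

  # Return the first available tag accepted by a preferred range, unchanged.
  preferred.each do |range|
    found = candidates.find do |lang|
      tag = lang.to_s.downcase
      if range == "*"
        listed.none? { |other| covers.call(other, tag) }
      else
        covers.call(range, tag)
      end
    end
    return found if found
  end
  nil
end

best_match({ "da" => 1000, "en-gb" => 800, "en" => 700 }, [:en, :da]) # => :da
best_match({ "*" => 1000, "en" => 0 }, [:en, :fr])                    # => :fr
```

Because the function returns the element from the available list itself, the caller gets the tag back in its original case, as described under Return Value Preservation.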

Return Value Preservation

The method returns the language tag exactly as provided in the available_langtags argument, preserving the original case. This is important for direct use with I18n.locale and similar APIs.

Examples:

Basic matching

parser = Parser.new("da, en-GB;q=0.8, en;q=0.7")
parser.match(:en, :da)
# => :da

Regional variant matching

parser = Parser.new("en-GB, en;q=0.9")
parser.match(:en, :"en-GB", :"en-US")
# => :"en-GB"

Prefix matching

parser = Parser.new("en")
parser.match(:"en-US", :"en-GB")
# => :"en-US"  (first match wins)

No match found

parser = Parser.new("ja, zh")
parser.match(:en, :fr, :de)
# => nil

Wildcard matching

parser = Parser.new("en, *;q=0.5")
parser.match(:fr)
# => :fr  (matched by wildcard)

Exclusion

parser = Parser.new("*, en;q=0")
parser.match(:en, :fr)
# => :fr  (en is excluded)

With I18n

parser = Parser.new(request.env["HTTP_ACCEPT_LANGUAGE"])
locale = parser.match(*I18n.available_locales) || I18n.default_locale
I18n.locale = locale

Parameters:

available_langtags (Array<Symbol>) —

the languages your application supports. These are typically your I18n.available_locales or a similar list.

Returns:

(Symbol, nil) —

the best matching language tag from the available options, in its original form as passed to this method. Returns nil if no acceptable match is found.

Raises:

(TypeError) —

if any element in available_langtags is not a Symbol
