Unicode::DisplayWidth
Determines the monospace display width of a string in Ruby. The implementation is based on EastAsianWidth.txt and other data and is written entirely in Ruby. Unlike wcwidth(), it does not rely on the OS vendor to provide an up-to-date method for measuring string width.
Unicode version: 13.0.0 (March 2020)
Supported Rubies: 3.0, 2.7, 2.6, 2.5
Old Rubies which might still work: 2.4, 2.3, 2.2, 2.1, 2.0, 1.9
Version 2.0 — Breaking Changes
Some features of this library had been marked deprecated for a long time and were removed in version 2.0:
- Aliases of display_width (…_size, …_length) have been removed
- Auto-loading of string core extension has been removed:
If you are relying on the String#display_width string extension to be automatically loaded (old behavior), please load it explicitly now:
require "unicode/display_width/string_ext"
You could also change your Gemfile line to achieve this:
gem "unicode-display_width", require: "unicode/display_width/string_ext"
Introduction to Character Widths
Guessing the correct space a character will consume on terminals is not easy. There is no single standard. Most implementations combine data from East Asian Width, some General Categories, and hand-picked adjustments.
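To see why this is not simply a matter of counting characters, the following minimal sketch compares codepoint and byte counts for three sample strings. It uses only core Ruby; the sample strings and the notes about their typical terminal width are illustrative assumptions, not output of this library.

```ruby
# Neither String#length nor String#bytesize matches the number of terminal
# cells a string usually occupies. The notes are typical values, not measured.
samples = {
  "a"       => "ASCII letter, usually one cell",
  "一"      => "CJK ideograph, usually two cells",
  "e\u0301" => "letter plus combining acute accent, usually one cell",
}

samples.each do |text, note|
  puts "#{text.inspect}: #{text.length} codepoint(s), #{text.bytesize} byte(s) (#{note})"
end

# The combining accent belongs to General Category Mn (nonspacing mark).
puts "e\u0301".match?(/\p{Mn}/)
```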
How this Library Handles Widths
Rows nearer the top of the table take precedence over rows below them. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!
Width | Characters | Comment |
---|---|---|
X | (user defined) | Overwrites any other values |
-1 | "\b" | Backspace (total width never below 0) |
0 | "\0", "\x05", "\a", "\n", "\v", "\f", "\r", "\x0E", "\x0F" | C0 control codes that do not change horizontal width |
1 | "\u{00AD}" | SOFT HYPHEN |
2 | "\u{2E3A}" | TWO-EM DASH |
3 | "\u{2E3B}" | THREE-EM DASH |
0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters |
0 | "\u{1160}".."\u{11FF}" | HANGUL JUNGSEONG |
0 | "\u{2060}".."\u{206F}", "\u{FFF0}".."\u{FFF8}", "\u{E0000}".."\u{E0FFF}" | Ignorable ranges |
2 | East Asian Width: F, W | Full-width characters |
2 | "\u{3400}".."\u{4DBF}", "\u{4E00}".."\u{9FFF}", "\u{F900}".."\u{FAFF}", "\u{20000}".."\u{2FFFD}", "\u{30000}".."\u{3FFFD}" | Full-width ranges |
1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1 |
1 | All other codepoints | - |
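The top-to-bottom precedence can be pictured as a first-match rule list. The sketch below covers only a handful of rows from the table and does not use the gem at all; `SKETCH_RULES`, `SKETCH_AMBIGUOUS`, and `sketch_width` are hypothetical names invented for this illustration, and the gem's actual implementation may work quite differently.

```ruby
# Illustrative sketch only: a few table rows, checked from top to bottom.
# All names here are hypothetical and not part of the gem.
SKETCH_AMBIGUOUS = [0x00B7].freeze # MIDDLE DOT, a sample East Asian Width "A" codepoint

SKETCH_RULES = [
  [->(cp) { cp == 0x08 }, -1],                                  # "\b"
  [->(cp) { [0x00, 0x05, 0x07, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F].include?(cp) }, 0],
  [->(cp) { cp == 0x00AD }, 1],                                 # SOFT HYPHEN
  [->(cp) { cp == 0x2E3A }, 2],                                 # TWO-EM DASH
  [->(cp) { cp == 0x2E3B }, 3],                                 # THREE-EM DASH
  [->(cp) { (0x1160..0x11FF).cover?(cp) }, 0],                  # HANGUL JUNGSEONG
  [->(cp) { (0x4E00..0x9FFF).cover?(cp) }, 2],                  # one of the full-width ranges
].freeze

def sketch_width(string, ambiguous = 1, overwrite = {})
  total = string.each_codepoint.sum do |cp|
    next overwrite[cp] if overwrite.key?(cp)       # user-defined values win
    rule = SKETCH_RULES.find { |matches, _width| matches.call(cp) }
    next rule.last if rule                         # first matching row decides
    SKETCH_AMBIGUOUS.include?(cp) ? ambiguous : 1  # ambiguous, else the default of 1
  end
  [total, 0].max                                   # total width never drops below 0
end

puts sketch_width("a\tb", 1, { "\t".ord => 10 })
puts sketch_width("\u{00B7}", 2)
```

Because the first match wins, an entry in the overwrite hash takes effect even for a codepoint that a later row would otherwise classify.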
Install
Install the gem with:
$ gem install unicode-display_width
Or add to your Gemfile:
gem 'unicode-display_width'
Usage
Classic API
require 'unicode/display_width'
Unicode::DisplayWidth.of("⚀") # => 1
Unicode::DisplayWidth.of("一") # => 2
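One typical use of a display width is padding text so that terminal columns line up. The sketch below is a hypothetical helper, not part of the gem: `pad_to_columns` accepts any callable as `measure:`, and the `stand_in` lambda is a crude substitute so the example runs without the gem installed. With the gem loaded, `Unicode::DisplayWidth.method(:of)` could be passed instead.

```ruby
# Pads text with spaces until it occupies the requested number of cells.
# measure: is any callable that returns a display width for a string.
def pad_to_columns(text, columns, measure:)
  missing = columns - measure.call(text)
  missing.positive? ? text + " " * missing : text
end

# Stand-in for this sketch only: counts Han characters as 2 cells, all else as 1.
stand_in = ->(s) { s.each_char.sum { |c| c.match?(/\p{Han}/) ? 2 : 1 } }

puts pad_to_columns("一", 4, measure: stand_in) + "|"
puts pad_to_columns("ab", 4, measure: stand_in) + "|"
```

Padding with `String#ljust` would count characters rather than cells, so the CJK line would end up one cell too wide.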
Ambiguous Characters
The second parameter sets the width used for characters classified as ambiguous:
Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2
Custom Overwrites
You can overwrite how to handle specific code points by passing a hash (or even a proc) as the third parameter:
Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10) # => tab counted as 10, so result is 12
Emoji Support
Emoji width support is included, but it must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the unicode-emoji gem to your Gemfile:
gem 'unicode-display_width'
gem 'unicode-emoji'
Enable the emoji string width adjustments by passing emoji: true as the fourth parameter:
Unicode::DisplayWidth.of "🤾🏽♀️" # => 5
Unicode::DisplayWidth.of "🤾🏽♀️", 1, {}, emoji: true # => 2
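The unadjusted result of 5 can be traced back to the individual codepoints. The sketch below assumes the sample string is the "woman playing handball, medium skin tone" zero-width joiner sequence, written with explicit codepoints so the invisible ones are not lost. The per-codepoint widths are assumptions read off the width table in this README, not values returned by the gem.

```ruby
# Each entry: codepoint, Unicode name, assumed width according to the table.
parts = [
  [0x1F93E, "HANDBALL",                          2], # East Asian Width W
  [0x1F3FD, "EMOJI MODIFIER FITZPATRICK TYPE-4", 2], # East Asian Width W
  [0x200D,  "ZERO WIDTH JOINER",                 0], # General Category Cf
  [0x2640,  "FEMALE SIGN",                       1], # ambiguous, default 1
  [0xFE0F,  "VARIATION SELECTOR-16",             0], # General Category Mn
]

sequence = parts.map(&:first).pack("U*") # build the UTF-8 string

parts.each do |codepoint, name, width|
  puts format("U+%04X %-34s width %d", codepoint, name, width)
end
puts "codepoints: #{sequence.codepoints.length}"
puts "summed width: #{parts.sum { |_cp, _name, width| width }}"
```

A terminal with emoji support usually renders the whole sequence as a single two-cell glyph, which is what the emoji: true adjustment accounts for.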
Usage with String Extension
require 'unicode/display_width/string_ext'
"⚀".display_width # => 1
'一'.display_width # => 2
Modern API: Keyword-arguments Based Config Object
Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later reuse. This requires an extra line of code, but has the advantage that you need to define your string-width options only once:
require 'unicode/display_width'
display_width = Unicode::DisplayWidth.new(
  # ambiguous: 1,
  overwrite: { "A".ord => 100 },
  emoji: true,
)
display_width.of "⚀" # => 1
display_width.of "🤾🏽♀️" # => 2
display_width.of "A" # => 100
Usage From the CLI
Use this one-liner to print out display widths for strings from the command-line:
$ gem install unicode-display_width
$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- "一"
Replace "一" with the actual string to measure.
Other Implementations & Discussion
- Python: https://github.com/jquast/wcwidth
- JavaScript: https://github.com/mycoboco/wcwidth.js
- C: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- C for Julia: https://github.com/JuliaLang/utf8proc/issues/2
See unicode-x for more Unicode-related micro libraries.
Copyright & Info
- Copyright (c) 2011, 2015-2020 Jan Lelis, https://janlelis.com, released under the MIT license
- Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1