Class: String

Inherits: Object

Defined in: lib/searchlink/semver.rb,
lib/searchlink/string.rb,
lib/searchlink/curl/html.rb,
lib/searchlink/searches/hook.rb

Overview

Hookmark String helpers

Instance Method Summary

#clean ⇒ String

Remove newlines, escape quotes, and remove Google Analytics strings.
#close_punctuation ⇒ String

Complete incomplete punctuation pairs.
#close_punctuation! ⇒ Object

Destructive punctuation close.
#code_indent ⇒ String

Indent each line of string with 4 spaces.
#distance(t) ⇒ Object
#fix_gist_file ⇒ String

Convert file-myfile-rb to myfile.rb.
#matches_all(terms) ⇒ Object

Test that self matches every word in terms.
#matches_any(terms) ⇒ Object

Test if self contains any of terms.
#matches_exact(string) ⇒ Object

Test if self contains an exact match for string (case insensitive).
#matches_fuzzy(terms, separator: " ", start_word: true, threshhold: 5) ⇒ Object
#matches_none(terms) ⇒ Object

Test that self does not contain any of terms.
#matches_score(terms, separator: " ", start_word: true) ⇒ Object

Score string based on number of matches, 0 - 10.
#nil_if_missing ⇒ Nil, String

Test an AppleScript response, substituting nil for ‘Missing Value’.
#normalize_trigger ⇒ String

Adds ?: to any parentheticals in a regular expression to avoid match groups.
#parse_flags ⇒ Object

Parse command-line flags into long options.
#parse_flags! ⇒ Object
#path_elements ⇒ Array

Extract the most relevant portions from a URL path.
#remove_entities ⇒ Object
#remove_protocol ⇒ String

Remove the protocol from a URL.
#remove_seo(url) ⇒ String

Remove SEO elements from a title.
#remove_seo!(url) ⇒ Object

Destructively remove SEO elements from a title.
#scrubup ⇒ Object

Scrub invalid characters from string.
#scrubup! ⇒ Object
#shorten_path ⇒ Object

Shorten path by adding ~ for home directory.
#slugify ⇒ String

Turn a string into a slug, removing spaces and non-alphanumeric characters.
#slugify! ⇒ Object

Destructive slugify.
#spacer ⇒ String

Generate a spacer based on character widths for help dialog display.
#split_hook ⇒ Object
#split_hooks ⇒ Object
#to_am ⇒ String

Convert an iTunes link to an Apple Music link.
#to_rx_array(separator: " ", start_word: true) ⇒ Array

Break a string into an array of Regexps.
#truncate(max) ⇒ Object

Truncate string to given length, preserving words.
#truncate!(max) ⇒ Object

Truncate in place.
#url_decode ⇒ Object
#url_encode ⇒ String

URL Encode string.
#url_path ⇒ String

Return just the path of a URL.
#valid_version? ⇒ Boolean

Test if given string is a valid semantic version number with major, minor and patch (and optionally pre).
#word_wrap(col_width = 60, prefix = "") ⇒ Object

Word wrap a string not exceeding max width.
#word_wrap!(col_width = 60, prefix = "") ⇒ Object

As with #word_wrap, but modifies the string in place.
#yaml_val ⇒ Object

Quote a YAML value if needed.

Instance Method Details

#clean ⇒ `String`

Remove newlines, escape quotes, and remove Google Analytics strings

Returns:

(String) —

cleaned URL/String

# File 'lib/searchlink/string.rb', line 142

def clean
  gsub(/\n+/, " ")
    .gsub(/"/, "&quot;")
    .gsub(/\|/, "-")
    .gsub(/([&?]utm_[scm].+=[^&\s!,.)\]]++?)+(&.*)/, '\2')
    .sub(/\?&/, "").strip
end

#close_punctuation ⇒ `String`

Complete incomplete punctuation pairs

Returns:

(String) —

string with all punctuation properly paired

# File 'lib/searchlink/string.rb', line 210

def close_punctuation
  return self unless self =~ /[“‘\[(<]/

  words = split(/\s+/)

  punct_chars = {
    "“" => "”",
    "‘" => "’",
    "[" => "]",
    "(" => ")",
    "<" => ">"
  }

  left_punct = []

  words.each do |w|
    punct_chars.each do |k, v|
      left_punct.push(k) if w =~ /#{Regexp.escape(k)}/
      left_punct.delete_at(left_punct.rindex(k)) if w =~ /#{Regexp.escape(v)}/
    end
  end

  tail = ""
  left_punct.reverse.each { |c| tail += punct_chars[c] }

  gsub(/[^a-z)\]’”.…]+$/i, "...").strip + tail
end
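The pairing logic above can be tried out with a short standalone sketch. The method body below is a condensed copy of the listing, re-declared only so the snippet runs without SearchLink loaded; the sample strings are made up for illustration.

```ruby
# Condensed standalone copy of the method above (assumption: SearchLink is not loaded).
class String
  def close_punctuation
    return self unless self =~ /[“‘\[(<]/

    punct_chars = { "“" => "”", "‘" => "’", "[" => "]", "(" => ")", "<" => ">" }
    left_punct = []

    # Track openers that have not yet been closed, word by word
    split(/\s+/).each do |w|
      punct_chars.each do |k, v|
        left_punct.push(k) if w =~ /#{Regexp.escape(k)}/
        left_punct.delete_at(left_punct.rindex(k)) if w =~ /#{Regexp.escape(v)}/
      end
    end

    # Append the closers in reverse order of opening
    tail = left_punct.reverse.map { |c| punct_chars[c] }.join
    gsub(/[^a-z)\]’”.…]+$/i, "...").strip + tail
  end
end

puts "This is a (test".close_punctuation      # This is a (test)
puts "Start of [a title -".close_punctuation  # Start of [a title...]
```

Note that trailing characters outside the allowed set (here the dangling " -") are replaced with an ellipsis before the closing punctuation is appended.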

#close_punctuation! ⇒ `Object`

Destructive punctuation close

See Also:

#close_punctuation



# File 'lib/searchlink/string.rb', line 200

def close_punctuation!
  replace close_punctuation
end

#code_indent ⇒ `String`

Indent each line of string with 4 spaces

Returns:

(String) —

indented string



# File 'lib/searchlink/string.rb', line 510

def code_indent
  split(/\n/).map { |l| "    #{l}" }.join("\n")
end

#distance(t) ⇒ `Object`

# File 'lib/searchlink/string.rb', line 422

def distance(t)
  s = dup
  m = s.length
  n = t.length
  return m if n.zero?
  return n if m.zero?

  d = Array.new(m + 1) { Array.new(n + 1) }

  (0..m).each { |i| d[i][0] = i }
  (0..n).each { |j| d[0][j] = j }
  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i - 1] == t[j - 1] # adjust index into string
                  d[i - 1][j - 1] # no operation required
                else
                  [d[i - 1][j] + 1, # deletion
                   d[i][j - 1] + 1, # insertion
                   d[i - 1][j - 1] + 1 # substitution
                  ].min
                end
    end
  end
  d[m][n]
end
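The listing above appears to compute the Levenshtein edit distance with a full dynamic-programming matrix. The sketch below re-declares an equivalent method so it runs on its own; the sample words are illustrative, not taken from SearchLink.

```ruby
class String
  # Levenshtein edit distance between self and t (full-matrix dynamic programming).
  def distance(t)
    m = length
    n = t.length
    return m if n.zero?
    return n if m.zero?

    # d[i][j] = edits needed to turn the first i chars of self into the first j chars of t
    d = Array.new(m + 1) { Array.new(n + 1, 0) }
    (0..m).each { |i| d[i][0] = i }
    (0..n).each { |j| d[0][j] = j }

    (1..n).each do |j|
      (1..m).each do |i|
        d[i][j] = if self[i - 1] == t[j - 1]
                    d[i - 1][j - 1] # characters match, no edit needed
                  else
                    # cheapest of deletion, insertion, substitution
                    [d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]].min + 1
                  end
      end
    end
    d[m][n]
  end
end

puts "kitten".distance("sitting") # 3
puts "flaw".distance("lawn")      # 2
```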

#fix_gist_file ⇒ `String`

Convert file-myfile-rb to myfile.rb

Returns:

(String) —

filename with its extension restored



# File 'lib/searchlink/string.rb', line 117

def fix_gist_file
  sub(/^file-/, "").sub(/-([^-]+)$/, '.\1')
end
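A quick usage sketch, with the method re-declared so the snippet is self-contained. The input strings are hypothetical gist anchors; only the last hyphen is turned into a dot.

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def fix_gist_file
    sub(/^file-/, "").sub(/-([^-]+)$/, '.\1')
  end
end

puts "file-myfile-rb".fix_gist_file     # myfile.rb
puts "file-my-script-py".fix_gist_file  # my-script.py
```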

#matches_all(terms) ⇒ `Object`

Test that self matches every word in terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 485

def matches_all(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return false unless gsub(/[^a-z0-9 ]/i, "") =~ rx }
  true
end

#matches_any(terms) ⇒ `Object`

Test if self contains any of terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 474

def matches_any(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return true if gsub(/[^a-z0-9 ]/i, "") =~ rx }
  false
end

#matches_exact(string) ⇒ `Object`

Test if self contains an exact match for string (case insensitive)

Parameters:

string (String) —

The string to match

# File 'lib/searchlink/string.rb', line 453

def matches_exact(string)
  comp = gsub(/[^a-z0-9 ]/i, "")
  comp =~ /\b#{string.gsub(/[^a-z0-9 ]/i, '').split(/ +/).map { |s| Regexp.escape(s) }.join(' +')}/i
end
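Because the method returns the result of `=~`, the value is the match index (an Integer) or nil rather than a strict boolean. A small standalone sketch, with the method copied from the listing and made-up sample text:

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def matches_exact(string)
    comp = gsub(/[^a-z0-9 ]/i, "")
    comp =~ /\b#{string.gsub(/[^a-z0-9 ]/i, '').split(/ +/).map { |s| Regexp.escape(s) }.join(' +')}/i
  end
end

p "Hello, World!".matches_exact("hello world")        # 0 (match at index 0)
p "Well, hello   world".matches_exact("Hello World")  # 5 (extra spaces are tolerated)
p "Hello, World!".matches_exact("world hello")        # nil (word order matters)
```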

#matches_fuzzy(terms, separator: " ", start_word: true, threshhold: 5) ⇒ `Object`

# File 'lib/searchlink/string.rb', line 408

def matches_fuzzy(terms, separator: " ", start_word: true, threshhold: 5)
  sources = split(/(#{separator})+/)
  words = terms.split(/(#{separator})+/)
  matches = 0
  sources.each do |src|
    words.each do |term|
      d = src.distance(term)
      matches += 1 if d <= threshhold
    end
  end

  ((matches / words.count.to_f) * 10).round(3)
end

#matches_none(terms) ⇒ `Object`

Test that self does not contain any of terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 463

def matches_none(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return false if gsub(/[^a-z0-9 ]/i, "") =~ rx }
  true
end
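The three term predicates (#matches_all, #matches_any, #matches_none) share the same shape: convert the terms to regular expressions with #to_rx_array, then test each against self with punctuation stripped. The sketch below copies the four methods from this page so it runs standalone; the title string is invented for illustration.

```ruby
class String
  def to_rx_array(separator: " ", start_word: true)
    bound = start_word ? '\b' : ""
    str = gsub(/(#{separator})+/, separator)
    str.split(/#{separator}/).map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
  end

  def matches_all(terms)
    rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
    rx_terms.each { |rx| return false unless gsub(/[^a-z0-9 ]/i, "") =~ rx }
    true
  end

  def matches_any(terms)
    rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
    rx_terms.each { |rx| return true if gsub(/[^a-z0-9 ]/i, "") =~ rx }
    false
  end

  def matches_none(terms)
    rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
    rx_terms.each { |rx| return false if gsub(/[^a-z0-9 ]/i, "") =~ rx }
    true
  end
end

title = "Ruby on Rails: A Guide"
p title.matches_all("ruby guide")    # true
p title.matches_all("ruby python")   # false
p title.matches_any("python rails")  # true
p title.matches_none("python perl")  # true
```

Each predicate also accepts a prebuilt array of Regexps, which avoids rebuilding the patterns when the same terms are tested against many strings.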

#matches_score(terms, separator: " ", start_word: true) ⇒ `Object`

Score string based on number of matches, 0 - 10

Parameters:

terms (String) —

The terms to match
separator (String) (defaults to: " ") —

The word separator
start_word (Boolean) (defaults to: true) —

Require match to be at beginning of word

# File 'lib/searchlink/string.rb', line 395

def matches_score(terms, separator: " ", start_word: true)
  matched = 0
  regexes = terms.to_rx_array(separator: separator, start_word: start_word)

  regexes.each do |rx|
    matched += 1 if self =~ rx
  end

  return 0 if matched.zero?

  ((matched / regexes.count.to_f) * 10).round(3)
end
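The score is the fraction of terms that match, scaled to 10. A standalone sketch follows, with #to_rx_array and #matches_score copied from this page and illustrative inputs:

```ruby
class String
  def to_rx_array(separator: " ", start_word: true)
    bound = start_word ? '\b' : ""
    str = gsub(/(#{separator})+/, separator)
    str.split(/#{separator}/).map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
  end

  def matches_score(terms, separator: " ", start_word: true)
    matched = 0
    regexes = terms.to_rx_array(separator: separator, start_word: start_word)
    regexes.each { |rx| matched += 1 if self =~ rx }
    return 0 if matched.zero?

    ((matched / regexes.count.to_f) * 10).round(3)
  end
end

p "Ruby on Rails".matches_score("ruby rails")   # 10.0 (2 of 2 terms)
p "Ruby on Rails".matches_score("ruby python")  # 5.0  (1 of 2 terms)
p "Ruby on Rails".matches_score("perl")         # 0    (Integer, not Float)
```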

#nil_if_missing ⇒ `Nil`, `String`

Test an AppleScript response, substituting nil for ‘Missing Value’

Returns:

(Nil, String) —

nil if string is “missing value”

# File 'lib/searchlink/string.rb', line 380

def nil_if_missing
  return nil if self =~ /missing value/

  self
end

#normalize_trigger ⇒ `String`

Adds ?: to any parentheticals in a regular expression to avoid match groups

Returns:

(String) —

modified regular expression



# File 'lib/searchlink/string.rb', line 58

def normalize_trigger
  gsub(/\((?!\?:)/, "(?:").gsub(/(^(\^|\\A)|(\$|\\Z)$)/, "").downcase
end
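As the source shows, the method also strips a leading `^` or `\A` and a trailing `$` or `\Z`, and lowercases the result. A brief sketch with the method copied from the listing; the trigger strings are hypothetical.

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def normalize_trigger
    gsub(/\((?!\?:)/, "(?:").gsub(/(^(\^|\\A)|(\$|\\Z)$)/, "").downcase
  end
end

puts "^(g|google)$".normalize_trigger  # (?:g|google)
puts '\A(Wiki)\Z'.normalize_trigger    # (?:wiki)
puts "(?:a|b)".normalize_trigger       # (?:a|b)  (already non-capturing)
```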

#parse_flags ⇒ `Object`

Parse command-line flags into long options

# File 'lib/searchlink/string.rb', line 80

def parse_flags
  gsub(/(\+\+|--)([dirtvs]+)\b/) do
    m = Regexp.last_match
    bool = m[1] == "++" ? "" : "no-"
    output = " "
    m[2].split("").each do |arg|
      output += case arg
                when "d"
                  "--#{bool}debug "
                when "i"
                  "--#{bool}inline "
                when "r"
                  "--#{bool}prefix_random "
                when "t"
                  "--#{bool}include_titles "
                when "v"
                  "--#{bool}validate_links "
                when "s"
                  "--#{bool}remove_seo "
                else
                  ""
                end
    end

    output
  end.gsub(/ +/, " ")
end
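The expansion can be illustrated with a condensed equivalent that uses a lookup hash instead of the `case` statement. This is a sketch for experimentation, not the library's code; the `names` hash is introduced here, although the option names are the ones in the listing above.

```ruby
class String
  # Condensed sketch of parse_flags: "++" enables a flag, "--" disables it.
  def parse_flags
    names = {
      "d" => "debug", "i" => "inline", "r" => "prefix_random",
      "t" => "include_titles", "v" => "validate_links", "s" => "remove_seo"
    }
    gsub(/(\+\+|--)([dirtvs]+)\b/) do
      m = Regexp.last_match
      bool = m[1] == "++" ? "" : "no-"
      # One long option per flag letter, each followed by a space
      " " + m[2].chars.map { |arg| "--#{bool}#{names[arg]} " }.join
    end.gsub(/ +/, " ")
  end
end

p "++dv term".parse_flags  # " --debug --validate_links term"
p "--t".parse_flags        # " --no-include_titles "
```

The leading space in the output comes from the `" "` prefix on each replacement; callers that need a clean string would have to strip it themselves.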

#parse_flags! ⇒ `Object`



# File 'lib/searchlink/string.rb', line 108

def parse_flags!
  replace parse_flags
end

#path_elements ⇒ `Array`

Extract the most relevant portions from a URL path

Returns:

(Array) —

array of relevant path elements

# File 'lib/searchlink/string.rb', line 182

def path_elements
  path = url_path
  # force trailing slash
  path.sub!(%r{/?$}, "/")
  # remove last path element
  path.sub!(%r{/[^/]+[.-][^/]+/$}, "")
  # remove starting/ending slashes
  path.gsub!(%r{(^/|/$)}, "")
  # split at slashes, delete sections that are shorter
  # than 5 characters or only consist of numbers
  path.split(%r{/}).delete_if { |section| section =~ /^\d+$/ || section.length < 5 }
end
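To see the filtering rules in action, the sketch below re-declares #url_path and #path_elements exactly as listed on this page and runs them on two made-up URLs. Note that the final path element is only dropped when it contains a dot or hyphen.

```ruby
require "uri"

class String
  def url_path
    URI.parse(self).path
  end

  def path_elements
    path = url_path
    path.sub!(%r{/?$}, "/")               # force trailing slash
    path.sub!(%r{/[^/]+[.-][^/]+/$}, "")  # drop a file-like last element
    path.gsub!(%r{(^/|/$)}, "")           # trim outer slashes
    path.split(%r{/}).delete_if { |section| section =~ /^\d+$/ || section.length < 5 }
  end
end

p "https://example.com/blog/2023/ruby-string-helpers/index.html".path_elements
# ["ruby-string-helpers"]
p "https://example.com/articles/2024/05/".path_elements
# ["articles"]
```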

#remove_entities ⇒ `Object`



# File 'lib/searchlink/curl/html.rb', line 6

def remove_entities
  gsub(/&nbsp;/, " ")
end

#remove_protocol ⇒ `String`

Remove the protocol from a URL

Returns:

(String) —

just hostname and path of URL



# File 'lib/searchlink/string.rb', line 165

def remove_protocol
  sub(%r{^(https?|s?ftp|file)://}, "")
end

#remove_seo(url) ⇒ `String`

Remove SEO elements from a title

Parameters:

url —

The url of the page from which the title came

Returns:

(String) —

cleaned title

# File 'lib/searchlink/string.rb', line 257

def remove_seo(url)
  title = dup
  url = URI.parse(url)
  host = url.hostname
  unless host
    return self unless SL.config["debug"]

    SL.add_error("Invalid URL", "Could not remove SEO for #{url}")
    return self
  end

  path = url.path
  root_page = path =~ %r{^/?$} ? true : false

  title.gsub!(/\s*(&ndash;|&mdash;)\s*/, " - ")
  title.gsub!(/&[lr]dquo;/, '"')
  title.gsub!(/&[lr]squo;/, "'")
  title.gsub!(/&#8211;/, " — ")
  title = CGI.unescapeHTML(title)
  title.gsub!(/ +/, " ")

  seo_title_separators = %w[| » « — – - · :]

  begin
    re_parts = []

    host_parts = host.sub(/(?:www\.)?(.*?)\.[^.]+$/, '\1').split(/\./).delete_if { |p| p.length < 3 }
    h_re = !host_parts.empty? ? host_parts.map { |seg| seg.downcase.split(//).join(".?") }.join("|") : ""
    re_parts.push(h_re) unless h_re.empty?

    # p_re = path.path_elements.map{|seg| seg.downcase.split(//).join('.?') }.join('|')
    # re_parts.push(p_re) if p_re.length > 0

    site_re = "(#{re_parts.join('|')})"

    dead_switch = 0

    while title.downcase.gsub(/[^a-z]/i, "") =~ /#{site_re}/i
      break if dead_switch > 5

      seo_title_separators.each_with_index do |sep, i|
        parts = title.split(/ *#{Regexp.escape(sep)} +/)

        next if parts.length == 1

        remaining_separators = seo_title_separators[i..].map { |s| Regexp.escape(s) }.join("")
        seps = Regexp.new("^[^#{remaining_separators}]+$")

        longest = parts.longest_element.strip

        unless parts.empty?
          parts.delete_if do |pt|
            compressed = pt.strip.downcase.gsub(/[^a-z]/i, "")
            compressed =~ /#{site_re}/ && pt =~ seps ? !root_page : false
          end
        end

        title = if parts.empty?
                  longest
                elsif parts.length < 2
                  parts.join(sep)
                elsif parts.length > 2
                  parts.longest_element.strip
                else
                  parts.join(sep)
                end
      end
      dead_switch += 1
    end
  rescue StandardError => e
    return self unless SL.config["debug"]

    SL.add_error("Error SEO processing title for #{url}", e)
    return self
  end

  seps = Regexp.new(" *[#{seo_title_separators.map { |s| Regexp.escape(s) }.join('')}] +")
  if title =~ seps
    seo_parts = title.split(seps)
    title = seo_parts.longest_element.strip if seo_parts.length.positive?
  end

  title && title.length > 5 ? title.gsub(/\s+/, " ") : CGI.unescapeHTML(self)
end

#remove_seo!(url) ⇒ `Object`

Destructively remove SEO elements from a title

Parameters:

url —

The url of the page from which the title came

See Also:

#remove_seo



# File 'lib/searchlink/string.rb', line 246

def remove_seo!(url)
  replace remove_seo(url)
end

#scrubup ⇒ `Object`

Scrub invalid characters from string



# File 'lib/searchlink/string.rb', line 31

def scrubup
  encode("utf-16", invalid: :replace).encode("utf-8").gsub(/\u00A0/, " ")
end

#scrubup! ⇒ `Object`

See Also:

#scrub



# File 'lib/searchlink/string.rb', line 36

def scrubup!
  replace scrub
end

#shorten_path ⇒ `Object`

Shorten path by adding ~ for home directory

# File 'lib/searchlink/string.rb', line 517

def shorten_path
  home_directory = ENV["HOME"]
  sub(home_directory, "~")
end

#slugify ⇒ `String`

Turn a string into a slug, removing spaces and non-alphanumeric characters

Returns:

(String) —

slugified string



# File 'lib/searchlink/string.rb', line 126

def slugify
  downcase.gsub(/[^a-z0-9_]/i, "-").gsub(/-+/, "-").sub(/-?$/, "")
end
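A short usage sketch, with the method copied from the listing so it runs standalone. The inputs are illustrative; note that only a trailing hyphen is trimmed, so a leading non-alphanumeric character still produces a leading hyphen.

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def slugify
    downcase.gsub(/[^a-z0-9_]/i, "-").gsub(/-+/, "-").sub(/-?$/, "")
  end
end

puts "Hello, World!".slugify      # hello-world
puts "Ruby 3.3 Released".slugify  # ruby-3-3-released
puts " Leading space".slugify     # -leading-space
```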

#slugify! ⇒ `Object`

Destructive slugify

See Also:

#slugify



# File 'lib/searchlink/string.rb', line 132

def slugify!
  replace slugify
end

#spacer ⇒ `String`

Generate a spacer based on character widths for help dialog display

Returns:

(String) —

string containing tabs

# File 'lib/searchlink/string.rb', line 67

def spacer
  len = length
  scan(/[mwv]/).each { len += 1 }
  scan(/t/).each { len -= 1 }
  case len
  when 0..3
    "\t\t"
  when 4..12
    " \t"
  end
end

#split_hook ⇒ `Object`

# File 'lib/searchlink/searches/hook.rb', line 8

def split_hook
  elements = split(/\|\|/)
  {
    name: elements[0].nil_if_missing,
    url: elements[1].nil_if_missing,
    path: elements[2].nil_if_missing
  }
end

#split_hooks ⇒ `Object`



# File 'lib/searchlink/searches/hook.rb', line 17

def split_hooks
  split(/\^\^/).map(&:split_hook)
end
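Judging from the source, records are separated by `^^` and fields by `||`, with the AppleScript placeholder "missing value" mapped to nil. The sketch below copies #nil_if_missing, #split_hook and #split_hooks from this page; the sample response string is invented.

```ruby
class String
  def nil_if_missing
    return nil if self =~ /missing value/

    self
  end

  def split_hook
    elements = split(/\|\|/)
    {
      name: elements[0].nil_if_missing,
      url: elements[1].nil_if_missing,
      path: elements[2].nil_if_missing
    }
  end

  def split_hooks
    split(/\^\^/).map(&:split_hook)
  end
end

response = "Note||hook://note/1||/Users/me/note.md^^Bookmark||https://example.com||missing value"
p response.split_hooks
```

The second record's `:path` comes back as nil. A record with fewer than three `||`-separated fields would raise NoMethodError, since `nil_if_missing` would be called on nil.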

#to_am ⇒ `String`

Convert an iTunes link to an Apple Music link

Returns:

(String) —

apple music link

# File 'lib/searchlink/string.rb', line 153

def to_am
  input = dup
  # keep the slash that the pattern consumes so "https://" stays intact
  input.sub!(%r{/itunes\.apple\.com}, "/geo.itunes.apple.com")
  append = input =~ %r{\?[^/]+=} ? "&app=music" : "?app=music"
  input + append
end

#to_rx_array(separator: " ", start_word: true) ⇒ `Array`

Break a string into an array of Regexps

Parameters:

separator (String) (defaults to: " ") —

The word separator
start_word (Boolean) (defaults to: true) —

Require matches at start of word

Returns:

(Array) —

array of regular expressions

# File 'lib/searchlink/string.rb', line 500

def to_rx_array(separator: " ", start_word: true)
  bound = start_word ? '\b' : ""
  str = gsub(/(#{separator})+/, separator)
  str.split(/#{separator}/).map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
end
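A standalone sketch of what the generated patterns look like, with the method copied from the listing. The search terms are examples only; non-alphanumeric characters inside a term become an optional wildcard (`.?`).

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def to_rx_array(separator: " ", start_word: true)
    bound = start_word ? '\b' : ""
    str = gsub(/(#{separator})+/, separator)
    str.split(/#{separator}/).map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
  end
end

p "foo  bar".to_rx_array                               # [/\bfoo/i, /\bbar/i]
p "c++ lang".to_rx_array                               # [/\bc.?.?/i, /\blang/i]
p "a,b".to_rx_array(separator: ",", start_word: false) # [/a/i, /b/i]
```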

#truncate(max) ⇒ `Object`

Truncate string to given length, preserving words

Parameters:

max (Number) —

The maximum length

# File 'lib/searchlink/string.rb', line 358

def truncate(max)
  return self if length < max

  trunc_title = []

  words = split(/\s+/)
  words.each do |word|
    # compare integer lengths directly (Integer has no #close_punctuation)
    break unless trunc_title.join(" ").length + word.length <= max

    trunc_title << word
  end

  trunc_title.empty? ? words[0] : trunc_title.join(" ")
end
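A minimal sketch of word-preserving truncation, assuming the length check is plain integer arithmetic (joined length plus next word length compared against `max`). The method is re-declared here so the snippet runs standalone; the sample sentences are illustrative.

```ruby
class String
  # Assumption: lengths are compared as plain integers.
  def truncate(max)
    return self if length < max

    trunc_title = []
    words = split(/\s+/)
    words.each do |word|
      break unless trunc_title.join(" ").length + word.length <= max

      trunc_title << word
    end

    # Fall back to the first word when even that exceeds max
    trunc_title.empty? ? words[0] : trunc_title.join(" ")
  end
end

puts "The quick brown fox jumps".truncate(15)  # The quick brown
puts "Short".truncate(15)                      # Short
puts "Supercalifragilistic word".truncate(5)   # Supercalifragilistic
```

Because the check does not count the joining space, the result can exceed `max` by one character in some cases, and a single overlong first word is returned whole.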

#truncate!(max) ⇒ `Object`

Truncate in place

Parameters:

max (Number) —

The maximum length

See Also:

#truncate



# File 'lib/searchlink/string.rb', line 349

def truncate!(max)
  replace truncate(max)
end

#url_decode ⇒ `Object`



# File 'lib/searchlink/string.rb', line 48

def url_decode
  CGI.unescape(self)
end

#url_encode ⇒ `String`

URL Encode string

Returns:

(String) —

url encoded string



# File 'lib/searchlink/string.rb', line 44

def url_encode
  ERB::Util.url_encode(gsub(/%22/, '"'))
end
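The `gsub(/%22/, '"')` step appears to exist so that an already-encoded double quote is not encoded a second time. A standalone sketch, with the method copied from the listing and invented inputs:

```ruby
require "erb"

class String
  # Copy of the method above so the example runs without SearchLink.
  def url_encode
    ERB::Util.url_encode(gsub(/%22/, '"'))
  end
end

puts "hello world & more".url_encode  # hello%20world%20%26%20more
puts "%22quoted%22".url_encode        # %22quoted%22 (not %2522quoted%2522)
```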

#url_path ⇒ `String`

Return just the path of a URL

Returns:

(String) —

The path.



# File 'lib/searchlink/string.rb', line 174

def url_path
  URI.parse(self).path
end

#valid_version? ⇒ `Boolean`

Test if given string is a valid semantic version number with major, minor and patch (and optionally pre)

Returns:

(Boolean) —

string is semantic version number

# File 'lib/searchlink/semver.rb', line 39

def valid_version?
  pattern = /^\d+\.\d+\.\d+(-?([^0-9]+\d*))?$/
  self =~ pattern ? true : false
end
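A few sample inputs make the accepted shapes concrete. The method is copied from the listing so the snippet runs standalone; the version strings are illustrative.

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def valid_version?
    pattern = /^\d+\.\d+\.\d+(-?([^0-9]+\d*))?$/
    self =~ pattern ? true : false
  end
end

p "1.2.3".valid_version?        # true
p "2.0.0-beta1".valid_version?  # true  (pre-release suffix)
p "1.2".valid_version?          # false (patch component missing)
p "v1.2.3".valid_version?       # false (must start with a digit)
```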

#word_wrap(col_width = 60, prefix = "") ⇒ `Object`

Word wrap a string not exceeding max width.

CREDIT: Gavin Kistner, Dayne Broderson

# File 'lib/searchlink/string.rb', line 23

def word_wrap(col_width = 60, prefix = "")
  str = dup
  str.gsub!(/(\S{#{col_width}})(?=\S)/, "#{prefix}\\1")
  str.gsub!(/(.{1,#{col_width}})(?:\s+|$)/, "#{prefix}\\1\n")
  str
end
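The wrapping behaviour can be checked with a standalone copy of the method. The sample sentence and the "> " prefix are illustrative; every output line, including the last, ends with a newline.

```ruby
class String
  # Copy of the method above so the example runs without SearchLink.
  def word_wrap(col_width = 60, prefix = "")
    str = dup
    str.gsub!(/(\S{#{col_width}})(?=\S)/, "#{prefix}\\1")
    str.gsub!(/(.{1,#{col_width}})(?:\s+|$)/, "#{prefix}\\1\n")
    str
  end
end

text = "The quick brown fox jumps over the lazy dog"
puts text.word_wrap(20)
# The quick brown fox
# jumps over the lazy
# dog
puts text.word_wrap(20, "> ")
# > The quick brown fox
# > jumps over the lazy
# > dog
```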

#word_wrap!(col_width = 60, prefix = "") ⇒ `Object`

As with #word_wrap, but modifies the string in place. CREDIT: Gavin Kistner, Dayne Broderson



# File 'lib/searchlink/string.rb', line 15

def word_wrap!(col_width = 60, prefix = "")
  replace dup.word_wrap(col_width, prefix)
end

#yaml_val ⇒ `Object`

Quote a YAML value if needed

# File 'lib/searchlink/string.rb', line 7

def yaml_val
  yaml = YAML.safe_load("key: '#{self}'")
  YAML.dump(yaml).match(/key: (.*?)$/)[1]
end
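The round trip through YAML lets the YAML emitter decide whether quoting is required. A standalone sketch, with the method copied from the listing and example values chosen for illustration; values containing a single quote would break the `key: '...'` wrapper and are not handled here.

```ruby
require "yaml"

class String
  # Copy of the method above so the example runs without SearchLink.
  def yaml_val
    yaml = YAML.safe_load("key: '#{self}'")
    YAML.dump(yaml).match(/key: (.*?)$/)[1]
  end
end

puts "plain".yaml_val  # plain
puts "true".yaml_val   # 'true' (quoted so it is not read back as a boolean)
```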
