Class: String

Inherits:

Object


Defined in: lib/searchlink/semver.rb,
lib/searchlink/string.rb,
lib/searchlink/curl/html.rb,
lib/searchlink/searches/hook.rb

Overview

Hookmark String helpers

Instance Method Summary

#clean ⇒ String

Remove newlines, escape quotes, and remove Google Analytics strings.
#close_punctuation ⇒ String

Complete incomplete punctuation pairs.
#close_punctuation! ⇒ Object

Destructive punctuation close.
#code_indent ⇒ String

Indent each line of string with 4 spaces.
#distance(t) ⇒ Object
#fix_gist_file ⇒ String

Convert file-myfile-rb to myfile.rb.
#matches_all(terms) ⇒ Object

Test that self matches every word in terms.
#matches_any(terms) ⇒ Object

Test if self contains any of terms.
#matches_exact(string) ⇒ Object

Test if self contains an exact match for string (case insensitive).
#matches_fuzzy(terms, separator: ' ', start_word: true, threshhold: 5) ⇒ Object
#matches_none(terms) ⇒ Object

Test that self does not contain any of terms.
#matches_score(terms, separator: ' ', start_word: true) ⇒ Object

Score string based on number of matches, 0 - 10.
#nil_if_missing ⇒ Nil, String

Test an AppleScript response, substituting nil for ‘Missing Value’.
#normalize_trigger ⇒ String

Adds ?: to any parentheticals in a regular expression to avoid match groups.
#parse_flags ⇒ Object

Parse command line flags into long options.
#parse_flags! ⇒ Object
#path_elements ⇒ Array

Extract the most relevant portions from a URL path.
#remove_entities ⇒ Object
#remove_protocol ⇒ String

Remove the protocol from a URL.
#remove_seo(url) ⇒ String

Remove SEO elements from a title.
#remove_seo!(url) ⇒ Object

Destructively remove SEO elements from a title.
#scrub ⇒ Object

Scrub invalid characters from string.
#scrub! ⇒ Object
#slugify ⇒ String

Turn a string into a slug, removing spaces and non-alphanumeric characters.
#slugify! ⇒ Object

Destructive slugify.
#spacer ⇒ String

Generate a spacer based on character widths for help dialog display.
#split_hook ⇒ Object
#split_hooks ⇒ Object
#to_am ⇒ String

Convert an iTunes link to an Apple Music link.
#to_rx_array(separator: ' ', start_word: true) ⇒ Array

Break a string into an array of Regexps.
#truncate(max) ⇒ Object

Truncate string to given length, preserving words.
#truncate!(max) ⇒ Object

Truncate in place.
#url_decode ⇒ Object
#url_encode ⇒ String

URL Encode string.
#url_path ⇒ String

Return just the path of a URL.
#valid_version? ⇒ Boolean

Test if given string is a valid semantic version number with major, minor and patch (and optionally pre).

Instance Method Details

#clean ⇒ `String`

Remove newlines, escape quotes, and remove Google Analytics strings

Returns:

(String) —

cleaned URL/String

# File 'lib/searchlink/string.rb', line 115

def clean
  gsub(/\n+/, ' ')
    .gsub(/"/, '&quot')
    .gsub(/\|/, '-')
    .gsub(/([&?]utm_[scm].+=[^&\s!,.)\]]++?)+(&.*)/, '\2')
    .sub(/\?&/, '').strip
end

#close_punctuation ⇒ `String`

Complete incomplete punctuation pairs

Returns:

(String) —

string with all punctuation properly paired

# File 'lib/searchlink/string.rb', line 183

def close_punctuation
  return self unless self =~ /[“‘\[(<]/

  words = split(/\s+/)

  punct_chars = {
    '“' => '”',
    '‘' => '’',
    '[' => ']',
    '(' => ')',
    '<' => '>'
  }

  left_punct = []

  words.each do |w|
    punct_chars.each do |k, v|
      left_punct.push(k) if w =~ /#{Regexp.escape(k)}/
      left_punct.delete_at(left_punct.rindex(k)) if w =~ /#{Regexp.escape(v)}/
    end
  end

  tail = ''
  left_punct.reverse.each { |c| tail += punct_chars[c] }

  gsub(/[^a-z)\]’”.…]+$/i, '...').strip + tail
end
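
The pairing logic above can be sketched as a standalone function. This is only an illustration: the name `close_pairs`, the `PAIRS` constant and the sample strings are assumptions made for this example, and the `nil` guard around `rindex` is an addition that the documented method does not have.

```ruby
# Hypothetical standalone variant of the pairing logic; not part of SearchLink.
PAIRS = { '“' => '”', '‘' => '’', '[' => ']', '(' => ')', '<' => '>' }.freeze

def close_pairs(text)
  return text unless text =~ /[“‘\[(<]/

  open_stack = []
  text.split(/\s+/).each do |word|
    PAIRS.each do |open, close|
      open_stack.push(open) if word.include?(open)
      next unless word.include?(close)

      # Guard added for this sketch: ignore a closer that has no opener on the stack
      idx = open_stack.rindex(open)
      open_stack.delete_at(idx) if idx
    end
  end

  # Closers are appended in reverse order of the unmatched openers
  tail = open_stack.reverse.map { |open| PAIRS[open] }.join
  text.gsub(/[^a-z)\]’”.…]+$/i, '...').strip + tail
end

puts close_pairs('He said “hello (world') # He said “hello (world)”
puts close_pairs('He said “hello,')       # He said “hello...”
```

Trailing characters other than letters, closers and periods are replaced with an ellipsis before the missing closers are appended, which is why the second call ends in `...”`.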

#close_punctuation! ⇒ `Object`

Destructive punctuation close

See Also:

#close_punctuation



# File 'lib/searchlink/string.rb', line 173

def close_punctuation!
  replace close_punctuation
end

#code_indent ⇒ `String`

Indent each line of string with 4 spaces

Returns:

(String) —

indented string



# File 'lib/searchlink/string.rb', line 484

def code_indent
  split(/\n/).map { |l| "    #{l}" }.join("\n")
end

#distance(t) ⇒ `Object`

Compute the Levenshtein (edit) distance between self and t

# File 'lib/searchlink/string.rb', line 397

def distance(t)
  s = self.dup
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0
  d = Array.new(m+1) {Array.new(n+1)}

  (0..m).each {|i| d[i][0] = i}
  (0..n).each {|j| d[0][j] = j}
  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i-1] == t[j-1]  # adjust index into string
                  d[i-1][j-1]       # no operation required
                else
                  [ d[i-1][j]+1,    # deletion
                    d[i][j-1]+1,    # insertion
                    d[i-1][j-1]+1,  # substitution
                  ].min
                end
    end
  end
  d[m][n]
end
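
The table-filling approach used above is the usual dynamic-programming form of Levenshtein distance. A minimal sketch as a standalone function follows; the name `levenshtein` and the sample words are assumptions for illustration, and the recurrence is written in the textbook form with an explicit substitution cost, which gives the same results as the listing.

```ruby
# Hypothetical standalone function; not part of SearchLink.
def levenshtein(source, target)
  m = source.length
  n = target.length
  return m if n.zero?
  return n if m.zero?

  # d[i][j] is the distance between the first i chars of source
  # and the first j chars of target
  d = Array.new(m + 1) { Array.new(n + 1, 0) }
  (0..m).each { |i| d[i][0] = i }
  (0..n).each { |j| d[0][j] = j }

  (1..n).each do |j|
    (1..m).each do |i|
      cost = source[i - 1] == target[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,       # deletion
        d[i][j - 1] + 1,       # insertion
        d[i - 1][j - 1] + cost # substitution, or no operation when cost is 0
      ].min
    end
  end

  d[m][n]
end

puts levenshtein('kitten', 'sitting') # 3
puts levenshtein('flaw', 'lawn')      # 2
```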

#fix_gist_file ⇒ `String`

Convert file-myfile-rb to myfile.rb

Returns:

(String)



# File 'lib/searchlink/string.rb', line 90

def fix_gist_file
  sub(/^file-/, '').sub(/-([^\-]+)$/, '.\1')
end

#matches_all(terms) ⇒ `Object`

Test that self matches every word in terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 459

def matches_all(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return false unless gsub(/[^a-z0-9 ]/i, '') =~ rx }
  true
end

#matches_any(terms) ⇒ `Object`

Test if self contains any of terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 448

def matches_any(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return true if gsub(/[^a-z0-9 ]/i, '') =~ rx }
  false
end

#matches_exact(string) ⇒ `Object`

Test if self contains an exact match for string (case insensitive)

Parameters:

string (String) —

The string to match

# File 'lib/searchlink/string.rb', line 427

def matches_exact(string)
  comp = gsub(/[^a-z0-9 ]/i, '')
  comp =~ /\b#{string.gsub(/[^a-z0-9 ]/i, '').split(/ +/).map { |s| Regexp.escape(s) }.join(' +')}/i
end

#matches_fuzzy(terms, separator: ' ', start_word: true, threshhold: 5) ⇒ `Object`

# File 'lib/searchlink/string.rb', line 383

def matches_fuzzy(terms, separator: ' ', start_word: true, threshhold: 5)
  # Non-capturing groups keep the separators out of the split results
  sources = split(/(?:#{separator})+/)
  words = terms.split(/(?:#{separator})+/)
  matches = 0
  sources.each do |src|
    words.each do |term|
      d = src.distance(term)
      matches += 1 if d <= threshhold
    end
  end

  ((matches / words.count.to_f) * 10).round(3)
end

#matches_none(terms) ⇒ `Object`

Test that self does not contain any of terms

Parameters:

terms (String) —

The terms to test

# File 'lib/searchlink/string.rb', line 437

def matches_none(terms)
  rx_terms = terms.is_a?(String) ? terms.to_rx_array : terms
  rx_terms.each { |rx| return false if gsub(/[^a-z0-9 ]/i, '') =~ rx }
  true
end

#matches_score(terms, separator: ' ', start_word: true) ⇒ `Object`

Score string based on number of matches, 0 - 10

Parameters:

terms (String) —

The terms to match
separator (String) (defaults to: ' ') —

The word separator
start_word (Boolean) (defaults to: true) —

Require match to be at beginning of word

# File 'lib/searchlink/string.rb', line 370

def matches_score(terms, separator: ' ', start_word: true)
  matched = 0
  regexes = terms.to_rx_array(separator: separator, start_word: start_word)

  regexes.each do |rx|
    matched += 1 if self =~ rx
  end

  return 0 if matched.zero?

  ((matched / regexes.count.to_f) * 10).round(3)
end
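
The score is the fraction of terms that match, scaled to 10 and rounded to three decimals. A rough standalone sketch of that arithmetic, with the hypothetical name `match_score`, a fixed space separator and made-up sample strings:

```ruby
# Hypothetical standalone helper mirroring the scoring arithmetic; not part of SearchLink.
def match_score(text, terms)
  # One case-insensitive regex per term, anchored at a word start;
  # non-alphanumeric characters in a term become optional wildcards
  regexes = terms.split(' ').map { |term| /\b#{term.gsub(/[^a-z0-9]/i, '.?')}/i }
  matched = regexes.count { |rx| text =~ rx }
  return 0 if matched.zero?

  ((matched / regexes.count.to_f) * 10).round(3)
end

puts match_score('Ruby String helpers', 'ruby help xyz') # 6.667
puts match_score('Ruby String helpers', 'ruby string')   # 10.0
```

Two of the three terms match in the first call, so the score is 2 / 3 * 10, rounded to 6.667.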

#nil_if_missing ⇒ `Nil`, `String`

Test an AppleScript response, substituting nil for ‘Missing Value’

Returns:

(Nil, String) —

nil if string is “missing value”

# File 'lib/searchlink/string.rb', line 355

def nil_if_missing
  return nil if self =~ /missing value/

  self
end

#normalize_trigger ⇒ `String`

Adds ?: to any parentheticals in a regular expression to avoid match groups

Returns:

(String) —

modified regular expression



# File 'lib/searchlink/string.rb', line 31

def normalize_trigger
  gsub(/\((?!\?:)/, '(?:').gsub(/(^(\^|\\A)|(\$|\\Z)$)/, '').downcase
end
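
To show what the two substitutions do, here is a small sketch that applies the same expressions in a standalone function. The name `normalize_trigger_sketch` and the sample patterns are assumptions for illustration.

```ruby
# Hypothetical standalone function applying the same substitutions; not part of SearchLink.
def normalize_trigger_sketch(pattern)
  pattern.gsub(/\((?!\?:)/, '(?:')          # make every plain group non-capturing
         .gsub(/(^(\^|\\A)|(\$|\\Z)$)/, '') # drop a leading ^ or \A and a trailing $ or \Z
         .downcase
end

puts normalize_trigger_sketch('^(Foo|Bar)$')  # (?:foo|bar)
puts normalize_trigger_sketch('\A(?:wiki)\Z') # (?:wiki)
```

Because the whole result is downcased, uppercase escapes in the middle of a pattern change meaning: `(\S+)` comes out as `(?:\s+)`.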

#parse_flags ⇒ `Object`

Parse command line flags into long options

# File 'lib/searchlink/string.rb', line 53

def parse_flags
  gsub(/(\+\+|--)([dirtvs]+)\b/) do
    m = Regexp.last_match
    bool = m[1] == '++' ? '' : 'no-'
    output = ' '
    m[2].split('').each do |arg|
      output += case arg
                when 'd'
                  "--#{bool}debug "
                when 'i'
                  "--#{bool}inline "
                when 'r'
                  "--#{bool}prefix_random "
                when 't'
                  "--#{bool}include_titles "
                when 'v'
                  "--#{bool}validate_links "
                when 's'
                  "--#{bool}remove_seo "
                else
                  ''
                end
    end

    output
  end.gsub(/ +/, ' ')
end
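
The flag expansion can be sketched with a lookup table instead of a case statement. The names `LONG_FLAGS` and `expand_flags` and the sample input are assumptions for this example; the letters and option names are the ones in the listing above.

```ruby
# Hypothetical lookup table; letters and option names come from the case statement above.
LONG_FLAGS = {
  'd' => 'debug', 'i' => 'inline', 'r' => 'prefix_random',
  't' => 'include_titles', 'v' => 'validate_links', 's' => 'remove_seo'
}.freeze

def expand_flags(input)
  input.gsub(/(\+\+|--)([dirtvs]+)\b/) do
    # "++" switches an option on, "--" switches it off
    prefix = Regexp.last_match(1) == '++' ? '--' : '--no-'
    letters = Regexp.last_match(2).chars
    " #{letters.map { |letter| "#{prefix}#{LONG_FLAGS[letter]}" }.join(' ')} "
  end.gsub(/ +/, ' ')
end

puts expand_flags('search ++dv --t term')
# search --debug --validate_links --no-include_titles term
```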

#parse_flags! ⇒ `Object`



# File 'lib/searchlink/string.rb', line 81

def parse_flags!
  replace parse_flags
end

#path_elements ⇒ `Array`

Extract the most relevant portions from a URL path

Returns:

(Array) —

array of relevant path elements

# File 'lib/searchlink/string.rb', line 155

def path_elements
  path = url_path
  # force trailing slash
  path.sub!(%r{/?$}, '/')
  # remove last path element
  path.sub!(%r{/[^/]+[.\-][^/]+/$}, '')
  # remove starting/ending slashes
  path.gsub!(%r{(^/|/$)}, '')
  # split at slashes, delete sections that are shorter
  # than 5 characters or only consist of numbers
  path.split(%r{/}).delete_if { |section| section =~ /^\d+$/ || section.length < 5 }
end
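
A short sketch of the filtering steps above, written as a standalone function. The name `relevant_path_parts` and the example.com URLs are assumptions for illustration.

```ruby
require 'uri'

# Hypothetical standalone function following the same steps; not part of SearchLink.
def relevant_path_parts(url)
  path = URI.parse(url).path.dup
  path.sub!(%r{/?$}, '/')               # force a trailing slash
  path.sub!(%r{/[^/]+[.\-][^/]+/$}, '') # drop the last element if it contains a dot or hyphen
  path.gsub!(%r{(^/|/$)}, '')           # strip leading and trailing slashes
  # drop numeric sections and sections shorter than 5 characters
  path.split('/').reject { |section| section =~ /^\d+$/ || section.length < 5 }
end

p relevant_path_parts('https://example.com/articles/2023/ruby-string-helpers/index.html')
# ["articles", "ruby-string-helpers"]
```

Here `index.html` is dropped as the last element, and `2023` is dropped because it is numeric.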

#remove_entities ⇒ `Object`



# File 'lib/searchlink/curl/html.rb', line 6

def remove_entities
  gsub(/&nbsp;/, ' ')
end

#remove_protocol ⇒ `String`

Remove the protocol from a URL

Returns:

(String) —

just hostname and path of URL



# File 'lib/searchlink/string.rb', line 138

def remove_protocol
  sub(%r{^(https?|s?ftp|file)://}, '')
end

#remove_seo(url) ⇒ `String`

Remove SEO elements from a title

Parameters:

url —

The url of the page from which the title came

Returns:

(String) —

cleaned title

# File 'lib/searchlink/string.rb', line 230

def remove_seo(url)
  title = dup
  url = URI.parse(url)
  host = url.hostname
  unless host
    return self unless SL.config['debug']

    SL.add_error('Invalid URL', "Could not remove SEO for #{url}")
    return self

  end

  path = url.path
  root_page = path =~ %r{^/?$} ? true : false

  title.gsub!(/\s*(&ndash;|&mdash;)\s*/, ' - ')
  title.gsub!(/&[lr]dquo;/, '"')
  title.gsub!(/&[lr]squo;/, "'")
  title.gsub!(/&#8211;/, ' — ')
  title = CGI.unescapeHTML(title)
  title.gsub!(/ +/, ' ')

  seo_title_separators = %w[| » « — – - · :]

  begin
    re_parts = []

    host_parts = host.sub(/(?:www\.)?(.*?)\.[^.]+$/, '\1').split(/\./).delete_if { |p| p.length < 3 }
    h_re = !host_parts.empty? ? host_parts.map { |seg| seg.downcase.split(//).join('.?') }.join('|') : ''
    re_parts.push(h_re) unless h_re.empty?

    # p_re = path.path_elements.map{|seg| seg.downcase.split(//).join('.?') }.join('|')
    # re_parts.push(p_re) if p_re.length > 0

    site_re = "(#{re_parts.join('|')})"

    dead_switch = 0

    while title.downcase.gsub(/[^a-z]/i, '') =~ /#{site_re}/i

      break if dead_switch > 5

      seo_title_separators.each_with_index do |sep, i|
        parts = title.split(/ *#{Regexp.escape(sep)} +/)

        next if parts.length == 1

        remaining_separators = seo_title_separators[i..].map { |s| Regexp.escape(s) }.join('')
        seps = Regexp.new("^[^#{remaining_separators}]+$")

        longest = parts.longest_element.strip

        unless parts.empty?
          parts.delete_if do |pt|
            compressed = pt.strip.downcase.gsub(/[^a-z]/i, '')
            compressed =~ /#{site_re}/ && pt =~ seps ? !root_page : false
          end
        end

        title = if parts.empty?
                  longest
                elsif parts.length < 2
                  parts.join(sep)
                elsif parts.length > 2
                  parts.longest_element.strip
                else
                  parts.join(sep)
                end
      end
      dead_switch += 1
    end
  rescue StandardError => e
    return self unless SL.config['debug']

    SL.add_error("Error SEO processing title for #{url}", e)
    return self
  end

  seps = Regexp.new(" *[#{seo_title_separators.map { |s| Regexp.escape(s) }.join('')}] +")
  if title =~ seps
    seo_parts = title.split(seps)
    title = seo_parts.longest_element.strip if seo_parts.length.positive?
  end

  title && title.length > 5 ? title.gsub(/\s+/, ' ') : CGI.unescapeHTML(self)
end
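
One step in the listing that may be easier to follow in isolation is how the site name is turned into a pattern and tested against a title with everything but letters removed. The sketch below reproduces only that step; the name `site_pattern`, the example.com host and the sample title are assumptions for illustration.

```ruby
# Hypothetical helper reproducing only the host-to-pattern step; not part of SearchLink.
def site_pattern(host)
  # Strip an optional "www." and the final domain label, then drop short labels
  parts = host.sub(/(?:www\.)?(.*?)\.[^.]+$/, '\1').split('.').reject { |part| part.length < 3 }
  # Allow one optional character between letters so spaced or punctuated names still match
  parts.map { |seg| seg.downcase.chars.join('.?') }.join('|')
end

pattern = site_pattern('www.example.com')
puts pattern # e.?x.?a.?m.?p.?l.?e

compressed = 'Great Post | Example'.downcase.gsub(/[^a-z]/, '')
puts compressed.match?(/(#{pattern})/i) # true
```

A title part whose compressed form matches this pattern is treated as the site name and becomes a candidate for removal.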

#remove_seo!(url) ⇒ `Object`

Destructively remove SEO elements from a title

Parameters:

url —

The url of the page from which the title came

See Also:

#remove_seo



# File 'lib/searchlink/string.rb', line 219

def remove_seo!(url)
  replace remove_seo(url)
end

#scrub ⇒ `Object`

Scrub invalid characters from string



# File 'lib/searchlink/string.rb', line 4

def scrub
  encode('utf-16', invalid: :replace).encode('utf-8').gsub(/\u00A0/, ' ')
end

#scrub! ⇒ `Object`

See Also:

#scrub



# File 'lib/searchlink/string.rb', line 9

def scrub!
  replace scrub
end

#slugify ⇒ `String`

Turn a string into a slug, removing spaces and non-alphanumeric characters

Returns:

(String) —

slugified string



# File 'lib/searchlink/string.rb', line 99

def slugify
  downcase.gsub(/[^a-z0-9_]/i, '-').gsub(/-+/, '-').sub(/-?$/, '')
end
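
A quick sketch of the same expression as a standalone function, with the hypothetical name `slugify_sketch` and made-up sample titles:

```ruby
# Hypothetical standalone function using the same expression; not part of SearchLink.
def slugify_sketch(text)
  text.downcase
      .gsub(/[^a-z0-9_]/i, '-') # every other character becomes a hyphen
      .gsub(/-+/, '-')          # collapse runs of hyphens
      .sub(/-?$/, '')           # drop one trailing hyphen
end

puts slugify_sketch('Hello, World!')   # hello-world
puts slugify_sketch('  Leading space') # -leading-space
```

Only a trailing hyphen is removed, so input that starts with a space or punctuation keeps a leading hyphen.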

#slugify! ⇒ `Object`

Destructive slugify

See Also:

#slugify



# File 'lib/searchlink/string.rb', line 105

def slugify!
  replace slugify
end

#spacer ⇒ `String`

Generate a spacer based on character widths for help dialog display

Returns:

(String) —

string containing tabs

# File 'lib/searchlink/string.rb', line 40

def spacer
  len = length
  scan(/[mwv]/).each { len += 1 }
  scan(/t/).each { len -= 1 }
  case len
  when 0..3
    "\t\t"
  when 4..12
    " \t"
  end
end

#split_hook ⇒ `Object`

# File 'lib/searchlink/searches/hook.rb', line 6

def split_hook
  elements = split(/\|\|/)
  {
    name: elements[0].nil_if_missing,
    url: elements[1].nil_if_missing,
    path: elements[2].nil_if_missing
  }
end

#split_hooks ⇒ `Object`



# File 'lib/searchlink/searches/hook.rb', line 15

def split_hooks
  split(/\^\^/).map(&:split_hook)
end
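
Taken together, `split_hooks` and `split_hook` read records delimited by `^^` whose fields are delimited by `||`. A minimal sketch as standalone functions follows. The function names and the sample response (including the `hook://abc` URL) are made up for illustration, and the `nil` check in `nil_if_missing` is an addition: the documented `split_hook` assumes all three fields are present.

```ruby
# Hypothetical standalone helpers; not part of SearchLink.
def nil_if_missing(value)
  return nil if value.nil? || value.include?('missing value')

  value
end

def split_hook_record(record)
  name, url, path = record.split('||')
  { name: nil_if_missing(name), url: nil_if_missing(url), path: nil_if_missing(path) }
end

def split_hook_records(response)
  response.split('^^').map { |record| split_hook_record(record) }
end

sample = 'My Note||hook://abc||/Users/me/note.md^^Other||missing value||missing value'
p split_hook_records(sample)
```

The second record in the sample comes back with `url` and `path` set to `nil`.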

#to_am ⇒ `String`

Convert an iTunes link to an Apple Music link

Returns:

(String) —

apple music link

# File 'lib/searchlink/string.rb', line 126

def to_am
  input = dup
  input.sub!(%r{/itunes\.apple\.com}, '/geo.itunes.apple.com')
  append = input =~ %r{\?[^/]+=} ? '&app=music' : '?app=music'
  input + append
end

#to_rx_array(separator: ' ', start_word: true) ⇒ `Array`

Break a string into an array of Regexps

Parameters:

separator (String) (defaults to: ' ') —

The word separator
start_word (Boolean) (defaults to: true) —

Require matches at start of word

Returns:

(Array) —

array of regular expressions

# File 'lib/searchlink/string.rb', line 474

def to_rx_array(separator: ' ', start_word: true)
  bound = start_word ? '\b' : ''
  str = gsub(/(#{separator})+/, separator)
  str.split(/#{separator}/).map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
end
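
To make the generated patterns concrete, the sketch below runs the same expressions in a standalone function and prints the regex sources. The name `rx_array` and the sample strings are assumptions for illustration.

```ruby
# Hypothetical standalone function using the same expressions; not part of SearchLink.
def rx_array(text, separator: ' ', start_word: true)
  bound = start_word ? '\b' : ''
  text.gsub(/(#{separator})+/, separator) # collapse repeated separators
      .split(/#{separator}/)
      .map { |arg| /#{bound}#{arg.gsub(/[^a-z0-9]/i, '.?')}/i }
end

p rx_array('hello  wor-ld').map(&:source)
# ["\\bhello", "\\bwor.?ld"]

p 'Flashlight' =~ rx_array('light').first                    # nil
p 'Flashlight' =~ rx_array('light', start_word: false).first # 5
```

The separator is interpolated into the regex unescaped, so a separator such as `.` would behave as a wildcard rather than a literal character.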

#truncate(max) ⇒ `Object`

Truncate string to given length, preserving words

Parameters:

max (Number) —

The maximum length

# File 'lib/searchlink/string.rb', line 333

def truncate(max)
  return self if length < max

  trunc_title = []

  words = split(/\s+/)
  words.each do |word|
    break unless trunc_title.join(' ').length + word.length <= max

    trunc_title << word
  end

  trunc_title.empty? ? words[0] : trunc_title.join(' ')
end
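
A sketch of word-preserving truncation as a standalone function, assuming the length check compares the words kept so far plus the next word against `max`. The name `truncate_words` and the sample strings are made up for this example.

```ruby
# Hypothetical standalone function; not part of SearchLink.
def truncate_words(text, max)
  return text if text.length < max

  kept = []
  words = text.split(/\s+/)
  words.each do |word|
    # Stop once adding the next word would exceed the limit
    break unless kept.join(' ').length + word.length <= max

    kept << word
  end

  # Fall back to the first word when even that word is longer than max
  kept.empty? ? words[0] : kept.join(' ')
end

puts truncate_words('The quick brown fox jumps', 15) # The quick brown
puts truncate_words('Supercalifragilistic word', 5)  # Supercalifragilistic
```

The check does not count the joining space, so the result can be one character longer than `max`.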

#truncate!(max) ⇒ `Object`

Truncate in place

Parameters:

max (Number) —

The maximum length

See Also:

#truncate



# File 'lib/searchlink/string.rb', line 324

def truncate!(max)
  replace truncate(max)
end

#url_decode ⇒ `Object`



# File 'lib/searchlink/string.rb', line 21

def url_decode
  CGI.unescape(self)
end

#url_encode ⇒ `String`

URL Encode string

Returns:

(String) —

url encoded string



# File 'lib/searchlink/string.rb', line 17

def url_encode
  ERB::Util.url_encode(gsub(/%22/, '"'))
end
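
The sketch below shows the encode and decode calls side by side using the standard library directly. The function name `encode_component` and the sample strings are assumptions for illustration.

```ruby
require 'erb'
require 'cgi'

# Hypothetical standalone function using the same calls; not part of SearchLink.
def encode_component(text)
  # Turning %22 back into a quote first avoids encoding it a second time as %2522
  ERB::Util.url_encode(text.gsub(/%22/, '"'))
end

puts encode_component('a b&c')        # a%20b%26c
puts encode_component('say %22hi%22') # say%20%22hi%22
puts CGI.unescape('a%20b%26c')        # a b&c
```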

#url_path ⇒ `String`

Return just the path of a URL

Returns:

(String) —

The path.



# File 'lib/searchlink/string.rb', line 147

def url_path
  URI.parse(self).path
end

#valid_version? ⇒ `Boolean`

Test if given string is a valid semantic version number with major, minor and patch (and optionally pre)

Returns:

(Boolean) —

string is semantic version number

# File 'lib/searchlink/semver.rb', line 37

def valid_version?
  pattern = /^\d+\.\d+\.\d+(-?([^0-9]+\d*))?$/
  self =~ pattern ? true : false
end
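
A few sample inputs run against the same pattern, as a sketch. The constant `SEMVER_SKETCH`, the function `version_like?` and the version strings are assumptions for illustration.

```ruby
# Hypothetical standalone check using the same pattern; not part of SearchLink.
SEMVER_SKETCH = /^\d+\.\d+\.\d+(-?([^0-9]+\d*))?$/.freeze

def version_like?(text)
  text.match?(SEMVER_SKETCH)
end

puts version_like?('1.2.3')       # true
puts version_like?('1.2.3-beta1') # true
puts version_like?('1.2')         # false
puts version_like?('v1.2.3')      # false
```

The hyphen before the pre-release part is optional in the pattern, so `2.0.0pre` is accepted as well.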
