Module: Wool::LexicalAnalysis

Included in:
Warning
Defined in:
lib/wool/analysis/lexical_analysis.rb

Overview

This module provides a set of methods that are mixed into Warnings so they can perform lexical analysis of their bodies. It handles tokenizing only, not parse trees.
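
For illustration, here is a minimal, hypothetical sketch of mixing the module in outside of Warning. The class name ExampleHost and the require path are assumptions; the only requirement implied by the method signatures below is that the including class respond to #body with the text to analyze. Later examples on this page reuse this sketch.

require 'wool'  # assumed gem entry point

class ExampleHost
  include Wool::LexicalAnalysis

  attr_reader :body

  def initialize(body)
    @body = body
  end
end

host = ExampleHost.new('x = 5 unless y == 2')
host.lex.first.type  # => :on_ident (a Ripper token type symbol)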

Defined Under Namespace

Classes: Token

Instance Method Summary

Instance Method Details

#find_keyword(*args) ⇒ Array

Finds the first instance of a set of keywords in the body. If no text is given to scan, then the full content is scanned.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • keyword (Symbol)

    The rest of the arguments are keywords to search for. Any number of keywords may be specified.

Returns:

  • (Array)

    the first matching keyword token, in the form returned by #lex, or nil if none of the keywords is found.



# File 'lib/wool/analysis/lexical_analysis.rb', line 89

def find_keyword(*args)
  body, list = _extract_token_search_args(args)
  list.map! {|x| x.to_s}
  lexed = lex(body)
  lexed.find.with_index do |tok, idx|
    is_keyword = tok.type == :on_kw && list.include?(tok.body)
    is_not_symbol = idx == 0 || lexed[idx-1].type != :on_symbeg
    is_keyword && is_not_symbol
  end
end
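
A hedged usage sketch, reusing the hypothetical ExampleHost mixin from the overview; the #type and #body accessors are the ones exercised in the source above.

host = ExampleHost.new('x = 5 unless y == 2')
token = host.find_keyword(:unless)
token.type  # => :on_kw
token.body  # => "unless"

# An explicit body may be passed as the optional first argument:
host.find_keyword('return 3 if z', :if, :unless).body  # => "if"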

#find_token(*args) ⇒ Array

Finds the first instance of a set of tokens in the body. If no text is given to scan, then the full content is scanned.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array)

    the first matching token, in the form returned by #lex, or nil if none of the token types is found.



# File 'lib/wool/analysis/lexical_analysis.rb', line 108

def find_token(*args)
  body, list = _extract_token_search_args(args)
  lexed = lex(body)
  lexed.find.with_index do |tok, idx|
    is_token = list.include?(tok.type)
    is_not_symbol = idx == 0 || lexed[idx-1].type != :on_symbeg
    is_token && is_not_symbol
  end
end
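
A hedged sketch along the same lines, again using the hypothetical ExampleHost mixin; note that, unlike #find_keyword, the arguments here are Ripper token type symbols.

host = ExampleHost.new('total = price * quantity')
token = host.find_token(:on_op)
token.body  # => "="

# Several token types can be searched at once; the first match wins:
host.find_token(:on_int, :on_ident).body  # => "total"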

#lex(body = self.body) ⇒ Array<Array<Integer, Integer>, Symbol, String>

Lexes the given text.

Parameters:

  • body (String) (defaults to: self.body)

    (self.body) The text to lex

Returns:

  • (Array<Array<Integer, Integer>, Symbol, String>)

    A set of tokens in Ripper’s result format, with each raw token wrapped in a Token object. Each raw token is an array of the form [[line_number, column], token_type, token_text]; the leading 1 commonly seen in these results is simply the line number, which is 1 for single-line input. The result is an array of those tokens.



# File 'lib/wool/analysis/lexical_analysis.rb', line 28

def lex(body = self.body)
  Ripper.lex(body).map {|token| Token.new(token) }
end
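
A hedged sketch of the output shape, assuming the hypothetical ExampleHost mixin from the overview and the Token accessors used elsewhere in this file.

host = ExampleHost.new('x = 5')
host.lex.map { |tok| [tok.type, tok.body] }
# => [[:on_ident, "x"], [:on_sp, " "], [:on_op, "="], [:on_sp, " "], [:on_int, "5"]]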

#select_token(*args) ⇒ Array<Array>

Searches for the given tokens using the standard optional-body, target-symbols argument convention. Yields each token found that matches the query and returns all tokens for which the block returns a truthy value.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array<Array>)

    All the matching tokens for the query



# File 'lib/wool/analysis/lexical_analysis.rb', line 70

def select_token(*args)
  body, list = _extract_token_search_args(args)
  result = []
  while (token = find_token(body, *list)) && token != nil
    result << token if yield(*token)
    _, body = split_on_token(body, *list)
    body = body[token.body.size..-1]
  end
  return result
end
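
A hedged sketch using the hypothetical ExampleHost mixin. The arguments yielded to the block come from splatting each Token, so their exact shape depends on the Token class; the block here simply accepts everything and keeps every match.

host = ExampleHost.new('a = 1 unless b; c = 2 unless d')
matches = host.select_token(:on_kw) { |*_parts| true }
matches.size  # => 2 (one matching token per "unless")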

#split_on_keyword(*args) ⇒ Array<String, String>

Splits the body into two halves based on the first appearance of a keyword.

Examples:

split_on_keyword('x = 5 unless y == 2', :unless)
# => ['x = 5 ', 'unless y == 2']

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are keywords to search for. Any number of keywords may be specified.

Returns:

  • (Array<String, String>)

    The body split by the keyword.



# File 'lib/wool/analysis/lexical_analysis.rb', line 128

def split_on_keyword(*args)
  body, keywords = _extract_token_search_args(args)
  token = find_keyword(body, *keywords)
  return _split_body_with_raw_token(body, token)
end
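
A hedged sketch of the implicit-body form, using the hypothetical ExampleHost mixin from the overview.

host = ExampleHost.new('x = 5 unless y == 2')
host.split_on_keyword(:unless)
# => ['x = 5 ', 'unless y == 2']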

#split_on_token(*args) ⇒ Array<String, String>

Splits the body into two halves based on the first appearance of a token.

Examples:

split_on_token('x = 5 unless y == 2', :on_kw)
# => ['x = 5 ', 'unless y == 2']

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array<String, String>)

    The body split by the token.



# File 'lib/wool/analysis/lexical_analysis.rb', line 144

def split_on_token(*args)
  body, tokens = _extract_token_search_args(args)
  token = find_token(body, *tokens)
  return _split_body_with_raw_token(body, token)
end
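
A hedged sketch using the hypothetical ExampleHost mixin; several token types may be given, and the split happens at whichever occurs first.

host = ExampleHost.new('x = 5 unless y == 2')
host.split_on_token(:on_kw, :on_comment)
# => ['x = 5 ', 'unless y == 2']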

#text_between_token_positions(text, left, right, inclusive = :none) ⇒ Object

Returns the text between two token positions. The token positions are in [line, column] format. The body, left, and right tokens must be provided, and optionally, you can override the inclusiveness of the text-between operation. It defaults to :none, for including neither the left nor right tokens in the result. You can pass :none, :left, :right, or :both.

Parameters:

  • text (String)

    The text to search. Unlike the other methods in this module, the text must be passed explicitly, and both token positions must refer to it.

  • left (Token)

    the left token to get the text between

  • right (Token)

    the right token to get the text between

  • inclusive (Symbol) (defaults to: :none)

    should the :left, :right, :both, or :none tokens be included in the resulting text?

Returns:

  • the text between the two tokens within the text. This is necessary because the lexer provides [line, column] coordinates rather than absolute string offsets.



# File 'lib/wool/analysis/lexical_analysis.rb', line 47

def text_between_token_positions(text, left, right, inclusive = :none)
  result = ""
  lines = text.lines.to_a
  left.line.upto(right.line) do |cur_line|
    line = lines[cur_line - 1]
    result << left.body if cur_line == left.line && (inclusive == :both || inclusive == :left)
    left_bound = cur_line == left.line ? left.col + left.body.size : 0
    right_bound = cur_line == right.line ? right.col - 1 : -1
    result << line[left_bound..right_bound]
    result << right.body if cur_line == right.line && (inclusive == :both || inclusive == :right)
  end
  result
end
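
A hedged sketch combining this method with #find_token, using the hypothetical ExampleHost mixin; the Token accessors #line, #col, and #body are the ones exercised in the source above.

host  = ExampleHost.new('foo(1, 2, 3)')
left  = host.find_token(:on_lparen)
right = host.find_token(:on_rparen)

host.text_between_token_positions(host.body, left, right)         # => "1, 2, 3"
host.text_between_token_positions(host.body, left, right, :both)  # => "(1, 2, 3)"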