Module: Wool::LexicalAnalysis

Included in:
Warning
Defined in:
lib/wool/analysis/lexical_analysis.rb

Overview

This module provides a set of methods that are mixed into Warnings so they can perform lexical analysis of their bodies. It handles tokenizing only, not parse trees.
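
For illustration, here is a minimal, hypothetical sketch of mixing the module in outside of Warning. The class name ExampleHost and the require path are assumptions; the only requirement implied by the method signatures below is that the including class respond to #body with the text to analyze. Later examples on this page reuse this sketch.

require 'wool'  # assumed gem entry point

class ExampleHost
  include Wool::LexicalAnalysis

  attr_reader :body

  def initialize(body)
    @body = body
  end
end

host = ExampleHost.new('x = 5 unless y == 2')
host.lex.first.type  # => :on_ident (a Ripper token type symbol)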

Defined Under Namespace

Classes: Token

Instance Method Summary

Instance Method Details

#find_keyword(*args) ⇒ Array

Finds the first instance of a set of keywords in the body. If no text is given to scan, then the full content is scanned.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • keyword (Symbol)

    The rest of the arguments are keywords to search for. Any number of keywords may be specified.

Returns:

  • (Array)

    the first matching keyword token, in the form returned by #lex, or nil if none of the keywords is found.



# File 'lib/wool/analysis/lexical_analysis.rb', line 89

def find_keyword(*args)
  body, list = _extract_token_search_args(args)
  list.map! {|x| x.to_s}
  lexed = lex(body)
  lexed.find.with_index do |tok, idx|
    is_keyword = tok.type == :on_kw && list.include?(tok.body)
    is_not_symbol = idx == 0 || lexed[idx-1].type != :on_symbeg
    is_keyword && is_not_symbol
  end
end
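
A hedged usage sketch, reusing the hypothetical ExampleHost mixin from the overview; the #type and #body accessors are the ones exercised in the source above.

host = ExampleHost.new('x = 5 unless y == 2')
token = host.find_keyword(:unless)
token.type  # => :on_kw
token.body  # => "unless"

# An explicit body may be passed as the optional first argument:
host.find_keyword('return 3 if z', :if, :unless).body  # => "if"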

#find_token(*args) ⇒ Array

Finds the first instance of a set of tokens in the body. If no text is given to scan, then the full content is scanned.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array)

    the first matching token, in the form returned by #lex, or nil if none of the token types is found.



# File 'lib/wool/analysis/lexical_analysis.rb', line 108

def find_token(*args)
  body, list = _extract_token_search_args(args)
  lexed = lex(body)
  lexed.find.with_index do |tok, idx|
    is_token = list.include?(tok.type)
    is_not_symbol = idx == 0 || lexed[idx-1].type != :on_symbeg
    is_token && is_not_symbol
  end
end
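
A hedged sketch along the same lines, again using the hypothetical ExampleHost mixin; note that, unlike #find_keyword, the arguments here are Ripper token type symbols.

host = ExampleHost.new('total = price * quantity')
token = host.find_token(:on_op)
token.body  # => "="

# Several token types can be searched at once; the first match wins:
host.find_token(:on_int, :on_ident).body  # => "total"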

#lex(body = self.body) ⇒ Array<Array<Integer, Integer>, Symbol, String>

Lexes the given text.

Parameters:

  • body (String) (defaults to: self.body)

    (self.body) The text to lex

Returns:

  • (Array<Array<Integer, Integer>, Symbol, String>)

    A set of tokens in Ripper’s result format, with each raw token wrapped in a Token object. Each raw token is an array of the form [[line_number, column], token_type, token_text]; the leading 1 commonly seen in these results is simply the line number, which is 1 for single-line input. The result is an array of those tokens.



# File 'lib/wool/analysis/lexical_analysis.rb', line 28

def lex(body = self.body)
  Ripper.lex(body).map {|token| Token.new(token) }
end
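
A hedged sketch of the output shape, assuming the hypothetical ExampleHost mixin from the overview and the Token accessors used elsewhere in this file.

host = ExampleHost.new('x = 5')
host.lex.map { |tok| [tok.type, tok.body] }
# => [[:on_ident, "x"], [:on_sp, " "], [:on_op, "="], [:on_sp, " "], [:on_int, "5"]]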

#select_token(*args) ⇒ Array<Array>

Searches for the given tokens using the standard optional-body, target-symbols argument convention. Yields each token found that matches the query and returns all tokens for which the block returns a truthy value.

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array<Array>)

    All the matching tokens for the query



# File 'lib/wool/analysis/lexical_analysis.rb', line 70

def select_token(*args)
  body, list = _extract_token_search_args(args)
  result = []
  while (token = find_token(body, *list)) && token != nil
    result << token if yield(*token)
    _, body = split_on_token(body, *list)
    body = body[token.body.size..-1]
  end
  return result
end
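
A hedged sketch using the hypothetical ExampleHost mixin. The arguments yielded to the block come from splatting each Token, so their exact shape depends on the Token class; the block here simply accepts everything and keeps every match.

host = ExampleHost.new('a = 1 unless b; c = 2 unless d')
matches = host.select_token(:on_kw) { |*_parts| true }
matches.size  # => 2 (one matching token per "unless")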

#split_on_keyword(*args) ⇒ Array<String, String>

Splits the body into two halves based on the first appearance of a keyword.

Examples:

split_on_keyword('x = 5 unless y == 2', :unless)
# => ['x = 5 ', 'unless y == 2']

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are keywords to search for. Any number of keywords may be specified.

Returns:

  • (Array<String, String>)

    The body split by the keyword.



# File 'lib/wool/analysis/lexical_analysis.rb', line 128

def split_on_keyword(*args)
  body, keywords = _extract_token_search_args(args)
  token = find_keyword(body, *keywords)
  return _split_body_with_raw_token(body, token)
end
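
A hedged sketch of the implicit-body form, using the hypothetical ExampleHost mixin from the overview.

host = ExampleHost.new('x = 5 unless y == 2')
host.split_on_keyword(:unless)
# => ['x = 5 ', 'unless y == 2']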

#split_on_token(*args) ⇒ Array<String, String>

Splits the body into two halves based on the first appearance of a token.

Examples:

split_on_token('x = 5 unless y == 2', :on_kw)
# => ['x = 5 ', 'unless y == 2']

Parameters:

  • body (String)

    (self.body) The first parameter is optional: the text to search. This defaults to the full text.

  • token (Symbol)

    The rest of the arguments are tokens to search for. Any number of tokens may be specified.

Returns:

  • (Array<String, String>)

    The body split by the token.



# File 'lib/wool/analysis/lexical_analysis.rb', line 144

def split_on_token(*args)
  body, tokens = _extract_token_search_args(args)
  token = find_token(body, *tokens)
  return _split_body_with_raw_token(body, token)
end
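
A hedged sketch using the hypothetical ExampleHost mixin; several token types may be given, and the split happens at whichever occurs first.

host = ExampleHost.new('x = 5 unless y == 2')
host.split_on_token(:on_kw, :on_comment)
# => ['x = 5 ', 'unless y == 2']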

#text_between_token_positions(text, left, right, inclusive = :none) ⇒ Object

Returns the text between two token positions. The token positions are in [line, column] format. The body, left, and right tokens must be provided, and optionally, you can override the inclusiveness of the text-between operation. It defaults to :none, for including neither the left nor right tokens in the result. You can pass :none, :left, :right, or :both.

Parameters:

  • text (String)

    The text to search. Unlike the other methods in this module, the text must be passed explicitly, and both token positions must refer to it.

  • left (Token)

    the left token to get the text between

  • right (Token)

    the right token to get the text between

  • inclusive (Symbol) (defaults to: :none)

    should the :left, :right, :both, or :none tokens be included in the resulting text?

Returns:

  • the text between the two tokens within the text. This is necessary because the lexer provides [line, column] coordinates rather than absolute string offsets.



# File 'lib/wool/analysis/lexical_analysis.rb', line 47

def text_between_token_positions(text, left, right, inclusive = :none)
  result = ""
  lines = text.lines.to_a
  left.line.upto(right.line) do |cur_line|
    line = lines[cur_line - 1]
    result << left.body if cur_line == left.line && (inclusive == :both || inclusive == :left)
    left_bound = cur_line == left.line ? left.col + left.body.size : 0
    right_bound = cur_line == right.line ? right.col - 1 : -1
    result << line[left_bound..right_bound]
    result << right.body if cur_line == right.line && (inclusive == :both || inclusive == :right)
  end
  result
end
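
A hedged sketch combining this method with #find_token, using the hypothetical ExampleHost mixin; the Token accessors #line, #col, and #body are the ones exercised in the source above.

host  = ExampleHost.new('foo(1, 2, 3)')
left  = host.find_token(:on_lparen)
right = host.find_token(:on_rparen)

host.text_between_token_positions(host.body, left, right)         # => "1, 2, 3"
host.text_between_token_positions(host.body, left, right, :both)  # => "(1, 2, 3)"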