Class: SyntaxSuggest::CleanDocument
Inherits: Object
Defined in: lib/syntax_suggest/clean_document.rb
Overview
Parses and sanitizes source into a lexically aware document.
Internally the document is represented by an array with each index containing a CodeLine correlating to a line from the source code.
There are three main phases in the algorithm:
- Sanitize/format input source
- Search for invalid blocks
- Format invalid blocks into something meaningful
This class handles the first part.
The reason this class exists is to format input source for better/easier/cleaner exploration.
The CodeSearch class operates at the line level so we must be careful to not introduce lines that look valid by themselves, but when removed will trigger syntax errors or strange behavior.
## Join Trailing slashes
Code with a trailing slash (a backslash at the end of the line) is logically treated as a single line:
1 it "code can be split" \
2 "across multiple lines" do
In this case, removing line 2 would add a syntax error. We get around this by internally joining the two lines into a single “line” object.
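The joining idea can be sketched in plain Ruby. This is a minimal illustration that works on raw strings rather than CodeLine objects; the helper name `join_trailing_backslashes` is hypothetical and not part of SyntaxSuggest.

```ruby
# Hypothetical helper (not part of SyntaxSuggest): joins each run of lines
# ending in a backslash into the first line of the run.
def join_trailing_backslashes(source)
  lines = source.lines
  result = []
  index = 0

  while index < lines.length
    group = [lines[index]]
    # Keep absorbing the next line while the last absorbed line ends in "\".
    while group.last.rstrip.end_with?("\\") && index + group.length < lines.length
      group << lines[index + group.length]
    end

    result << group.join
    # Pad with empty strings so the array keeps its original length.
    (group.length - 1).times { result << "" }
    index += group.length
  end

  result
end

source = <<~'EOM'
  it "code can be split" \
     "across multiple lines" do
  end
EOM

joined = join_trailing_backslashes(source)
```

Padding with empty strings keeps every later line at its original index, so removing the joined entry removes both physical lines at once.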
## Logically Consecutive lines
Code that is broken over multiple lines, such as a chain of method calls, is also treated as logically consecutive:
1 User.
2 where(name: "schneems").
3 first
Removing line 2 can introduce a syntax error. To fix this, all lines of the chain are joined into one.
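One way to see why such lines belong together is to inspect the lexer output directly. The sketch below uses `Ripper.lex` from the standard library, not SyntaxSuggest's own lexing wrapper, and only reports which lines end in a newline that the lexer ignores.

```ruby
require "ripper"

source = <<~'EOM'
  User.
    where(name: "schneems").
    first
EOM

# Ripper.lex returns entries shaped like [[line, column], type, token, state].
# A newline after a trailing dot is reported as :on_ignored_nl, meaning the
# expression continues on the next line.
ignored_newline_lines = Ripper.lex(source)
  .select { |_position, type, _token, _state| type == :on_ignored_nl }
  .map { |position, _type, _token, _state| position.first }

puts ignored_newline_lines.inspect
```

Lines that end in an ignored newline are candidates for joining with the line that follows them.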
## Heredocs
A heredoc is a way of defining a multi-line string. Heredocs can cause many problems. If each line of a heredoc were examined on its own, the parser would try to parse its contents as Ruby code rather than as a string. Even without this problem, we still hit an issue with indentation:
1 foo = <<~HEREDOC
2  "Be yourself; everyone else is already taken."
3    ― Oscar Wilde
4      puts "I look like ruby code" # but i'm still a heredoc
5 HEREDOC
If we didn’t join these lines, the algorithm would treat line 4 as separate from the rest. Because it has the highest indentation, the search would look at it first and remove it.
If the search evaluated line 5 by itself, it would think line 5 is a constant, remove it, and introduce a syntax error.
All of these problems are fixed by joining the whole heredoc into a single line.
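Finding which lines a heredoc spans can be sketched with `Ripper.lex` from the standard library. This is an illustration of the idea only: it pairs each `:on_heredoc_beg` token with the next `:on_heredoc_end` token using a stack, and the variable names are chosen for this example.

```ruby
require "ripper"

source = <<~'EOM'
  foo = <<~HEREDOC
    lol
    hehehe
  HEREDOC
  puts foo
EOM

start_stack = []
ranges = []

# Each entry is [[line, column], type, token, state]; only the line number
# and the token type matter here.
Ripper.lex(source).each do |(line_number, _column), type, _token, _state|
  case type
  when :on_heredoc_beg
    start_stack << line_number
  when :on_heredoc_end
    ranges << [start_stack.pop, line_number]
  end
end

puts ranges.inspect
```

Every line inside a reported range can then be folded into the first line of that range, so the search never sees heredoc contents in isolation.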
## Comments and whitespace
Comments can throw off the way the lexer tells us that the line logically belongs with the next line. This is valid ruby but results in a different lex output than before:
1 User.
2 where(name: "schneems").
3 # Comment here
4 first
To handle this, we replace comment lines with empty lines and then re-lex the source. The removal and re-lexing preserve line indexes and document size, but produce a document that is easier to work with.
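A minimal sketch of the comment replacement, using the same regular expression that appears in `clean_sweep` further down this page. The constant and method names are hypothetical, and a literal `"\n"` stands in for `$/`.

```ruby
# Matches a line holding only a comment, but not a line that starts with
# "#{" (interpolation inside a heredoc body).
COMMENT_LINE = /^\s*#([^{].*|)$/

# Hypothetical helper: returns the source as an array of lines with every
# comment-only line replaced by a bare newline.
def blank_out_comments(source)
  source.lines.map { |line| line.match?(COMMENT_LINE) ? "\n" : line }
end

source = <<~'EOM'
  User.
    where(name: "schneems").
    # Comment here
    first
EOM

cleaned = blank_out_comments(source)
puts cleaned.length == source.lines.length
```

Because each comment becomes a newline instead of being deleted, line 4 of the input is still line 4 of the output.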
Instance Method Summary
- #call ⇒ Object
  Call all of the document “cleaners” and return self.
- #clean_sweep(source:) ⇒ Object
  Remove comments.
- #initialize(source:) ⇒ CleanDocument (constructor)
  A new instance of CleanDocument.
- #join_consecutive! ⇒ Object
  Smushes logically “consecutive” lines.
- #join_groups(groups) ⇒ Object
  Helper method for joining “groups” of lines.
- #join_heredoc! ⇒ Object
  Smushes all heredoc lines into one line.
- #join_trailing_slash! ⇒ Object
  Join lines with a trailing slash.
- #lines ⇒ Object
  Return an array of CodeLines in the document.
- #take_while_including(range = 0..) ⇒ Object
  Helper method for grabbing elements from document.
- #to_s ⇒ Object
  Renders the document back to a string.
Constructor Details
#initialize(source:) ⇒ CleanDocument
Returns a new instance of CleanDocument.
# File 'lib/syntax_suggest/clean_document.rb', line 87
def initialize(source:)
  lines = clean_sweep(source: source)
  @document = CodeLine.from_source(lines.join, lines: lines)
end
Instance Method Details
#call ⇒ Object
Call all of the document “cleaners” and return self
# File 'lib/syntax_suggest/clean_document.rb', line 94
def call
  join_trailing_slash!
  join_consecutive!
  join_heredoc!

  self
end
#clean_sweep(source:) ⇒ Object
Remove comments and replace them with empty newlines.
source = <<~'EOM'
# Comment 1
puts "hello"
# Comment 2
puts "world"
EOM
lines = CleanDocument.new(source: source).lines
expect(lines[0].to_s).to eq("\n")
expect(lines[1].to_s).to eq("puts \"hello\"\n")
expect(lines[2].to_s).to eq("\n")
expect(lines[3].to_s).to eq("puts \"world\"\n")
Important: This must be done before lexing.
After this change is made, we lex the document because removing comments can change how the doc is parsed.
For example:
values = LexAll.new(source: <<~EOM)
User.
# comment
where(name: 'schneems')
EOM
expect(
values.count {|v| v.type == :on_ignored_nl}
).to eq(1)
After the comment is removed:
values = LexAll.new(source: <<~EOM)
User.

where(name: 'schneems')
EOM
expect(
values.count {|v| v.type == :on_ignored_nl}
).to eq(2)
# File 'lib/syntax_suggest/clean_document.rb', line 157
def clean_sweep(source:)
  # Match comments, but not HEREDOC strings with #{variable} interpolation
  # https://rubular.com/r/HPwtW9OYxKUHXQ
  source.lines.map do |line|
    if line.match?(/^\s*#([^{].*|)$/)
      $/
    else
      line
    end
  end
end
#join_consecutive! ⇒ Object
Smushes logically “consecutive” lines
source = <<~'EOM'
User.
where(name: 'schneems').
first
EOM
lines = CleanDocument.new(source: source).join_consecutive!.lines
expect(lines[0].to_s).to eq(source)
expect(lines[1].to_s).to eq("")
The one known case this doesn’t handle is:
Ripper.lex <<~EOM
a &&
b ||
c
EOM
For some reason this produces an ignored-newline token (`on_ignored_nl`) whose lexer state is BEG, so these lines are not joined.
# File 'lib/syntax_suggest/clean_document.rb', line 225
def join_consecutive!
  consecutive_groups = @document.select(&:ignore_newline_not_beg?).map do |code_line|
    take_while_including(code_line.index..) do |line|
      line.ignore_newline_not_beg?
    end
  end

  join_groups(consecutive_groups)
  self
end
#join_groups(groups) ⇒ Object
Helper method for joining “groups” of lines
Input is expected to be of type Array<Array<CodeLine>>.
The outer array holds the various “groups” while the inner array holds code lines.
All code lines are “joined” into the first line in their group.
To preserve document size, empty lines are put in place of the lines that were “joined”.
# File 'lib/syntax_suggest/clean_document.rb', line 266
def join_groups(groups)
  groups.each do |lines|
    line = lines.first

    # Handle the case of multiple groups in a row
    # if one is already replaced, move on
    next if @document[line.index].empty?

    # Join group into the first line
    @document[line.index] = CodeLine.new(
      lex: lines.map(&:lex).flatten,
      line: lines.join,
      index: line.index
    )

    # Hide the rest of the lines
    lines[1..].each do |line|
      # The above lines already have newlines in them, if add more
      # then there will be double newline, use an empty line instead
      @document[line.index] = CodeLine.new(line: "", index: line.index, lex: [])
    end
  end
  self
end
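The group-joining behavior, including the skip for groups whose first line was already absorbed, can be tried in isolation. The sketch below is a simplified stand-in: `Line` is a hypothetical Struct used in place of CodeLine, it carries no lex data, and the document is passed in explicitly instead of being read from `@document`.

```ruby
# Hypothetical stand-in for CodeLine: just the text and its position.
Line = Struct.new(:text, :index) do
  def empty?
    text.empty?
  end
end

def join_groups(document, groups)
  groups.each do |group|
    first = group.first
    # A slot that is already blank was absorbed by an earlier group; skip it.
    next if document[first.index].empty?

    # Fold the whole group into the slot of its first line.
    document[first.index] = Line.new(group.map(&:text).join, first.index)

    # Blank out the remaining slots so the document keeps its size.
    group.drop(1).each do |line|
      document[line.index] = Line.new("", line.index)
    end
  end
  document
end

texts = ["User.\n", "  where(name: 'schneems').\n", "  first\n", "puts 1\n"]
document = texts.each_with_index.map { |text, index| Line.new(text, index) }

# Overlapping groups: one starting at line 0 and one starting at line 1.
groups = [document[0..2], document[1..2]]
joined = join_groups(document, groups)
puts joined.map(&:text).inspect
```

The second group is skipped because its first slot is already blank by the time it is visited, which is the "multiple groups in a row" case the source comment mentions.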
#join_heredoc! ⇒ Object
Smushes all heredoc lines into one line
source = <<~'EOM'
foo = <<~HEREDOC
lol
hehehe
HEREDOC
EOM
lines = CleanDocument.new(source: source).join_heredoc!.lines
expect(lines[0].to_s).to eq(source)
expect(lines[1].to_s).to eq("")
# File 'lib/syntax_suggest/clean_document.rb', line 181
def join_heredoc!
  start_index_stack = []
  heredoc_beg_end_index = []
  lines.each do |line|
    line.lex.each do |lex_value|
      case lex_value.type
      when :on_heredoc_beg
        start_index_stack << line.index
      when :on_heredoc_end
        start_index = start_index_stack.pop
        end_index = line.index
        heredoc_beg_end_index << [start_index, end_index]
      end
    end
  end

  heredoc_groups = heredoc_beg_end_index.map { |start_index, end_index| @document[start_index..end_index] }

  join_groups(heredoc_groups)
  self
end
#join_trailing_slash! ⇒ Object
Join lines with a trailing slash
source = <<~'EOM'
it "code can be split" \
"across multiple lines" do
EOM
lines = CleanDocument.new(source: source).join_trailing_slash!.lines
expect(lines[0].to_s).to eq(source)
expect(lines[1].to_s).to eq("")
# File 'lib/syntax_suggest/clean_document.rb', line 246
def join_trailing_slash!
  trailing_groups = @document.select(&:trailing_slash?).map do |code_line|
    take_while_including(code_line.index..) { |x| x.trailing_slash? }
  end
  join_groups(trailing_groups)
  self
end
#lines ⇒ Object
Return an array of CodeLines in the document
# File 'lib/syntax_suggest/clean_document.rb', line 104
def lines
  @document
end
#take_while_including(range = 0..) ⇒ Object
Helper method for grabbing elements from document
Like `take_while`, except that when it stops iterating it also returns the line that caused it to stop.
# File 'lib/syntax_suggest/clean_document.rb', line 296
def take_while_including(range = 0..)
  take_next_and_stop = false
  @document[range].take_while do |line|
    next if take_next_and_stop

    take_next_and_stop = !(yield line)
    true
  end
end
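The difference from plain `take_while` is easiest to see on an ordinary array. This standalone sketch takes the collection as an argument instead of slicing `@document`, which is an adaptation made for the example.

```ruby
# Standalone variant: like take_while, but also keeps the first element
# for which the block returns a falsy value.
def take_while_including(items)
  stop_after_next = false
  items.take_while do |item|
    # Returning nil from the block ends take_while.
    next if stop_after_next

    stop_after_next = !(yield item)
    true
  end
end

lines = ["a &&\n", "b &&\n", "c\n", "d\n"]

including = take_while_including(lines) { |line| line.include?("&&") }
plain = lines.take_while { |line| line.include?("&&") }

puts including.inspect
puts plain.inspect
```

The extra element matters for joining: the line that ends a continuation (here `c`) has no continuation marker itself, yet it still belongs to the group.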
#to_s ⇒ Object
Renders the document back to a string
# File 'lib/syntax_suggest/clean_document.rb', line 109
def to_s
  @document.join
end