Class: CaptureState
- Inherits:
-
Object
- Object
- CaptureState
- Defined in:
- lib/rpeg/captures.rb
Overview
VM and post-match capture support
q.v. LPEG’s CaptureState, lpcap.h
As a Ruby class it contains as much of the capture code (lpcap.c) as makes sense. Because of this, it needs to know the
Instance Attribute Summary collapse
-
#captures ⇒ Object
readonly
Returns the value of attribute captures.
Instance Method Summary collapse
- #advance ⇒ Object
- #capture_all ⇒ Object
- #current_breadcrumb ⇒ Object
- #done? ⇒ Boolean
-
#extract_one_string(what) ⇒ Object
This is LPEG’s addonestring (lpcap.c).
-
#extract_string_capture ⇒ Object
This is LPEG’s stringcap (lpcap.c).
-
#extract_subst_capture ⇒ Object
This is LPEG’s substcap (lpcap.c).
- #index ⇒ Object
- #index=(val) ⇒ Object
-
#initialize(breadcrumbs, subject, subject_index, extra_args, starting_index:) ⇒ CaptureState
constructor
A new instance of CaptureState.
-
#munge_last!(num) ⇒ Object
partially rotate the captures to make what is currently the final value the n-th from last value.
-
#pop(num = nil) ⇒ Object
Pop the top capture off and return it, erroring if there isn’t one.
-
#prev_end_index ⇒ Object
The ending subject index of the previous breadcrumb.
-
#push(cap) ⇒ Object
push a captured value.
-
#push_fold_capture ⇒ Object
This is LPEG’s foldcap (lpcap.c).
-
#push_function_capture ⇒ Object
This is LPEG’s functioncap (lpcap.c).
-
#push_nested_captures(add_extra: false) ⇒ Object
See pushnestedcaptures in lpcap.c.
-
#push_num_capture ⇒ Object
This is LPEG’s numcap (lpcap.c).
-
#push_one_nested_value ⇒ Object
Push nested values and then pop off all but one.
-
#push_query_capture ⇒ Object
This is LPEG’s querycap (lpcap.c).
-
#push_table_capture ⇒ Object
This is LPEG’s tablecap (lpcap.h).
-
#run_time_capture ⇒ Object
In LPEG this logic is split between the main VM loop (lpvm.c) and the runtimecap function (lpcap.c).
-
#seek_back_ref!(group_name) ⇒ Object
Search backwards from the current breadcrumb for the start of the group capture with the given name.
-
#seek_matching_open! ⇒ Object
This is LPEG’s findopen (lpcap.c).
-
#seek_next! ⇒ Object
This is LPEG’s nextcap (lpcap.c).
-
#str_caps ⇒ Object
This is LPEG’s getstrcaps (lpcap.c) /* ** Collect values from current capture into array ‘cps’.
Constructor Details
#initialize(breadcrumbs, subject, subject_index, extra_args, starting_index:) ⇒ CaptureState
Returns a new instance of CaptureState.
148 149 150 151 152 153 154 155 |
# File 'lib/rpeg/captures.rb', line 148 def initialize(, subject, subject_index, extra_args, starting_index:) @breadcrumbs = @breadcrumb_idx = starting_index || 0 @subject = subject.freeze @subject_index = subject_index @extra_args = extra_args.freeze @captures = [] end |
Instance Attribute Details
#captures ⇒ Object (readonly)
Returns the value of attribute captures.
146 147 148 |
# File 'lib/rpeg/captures.rb', line 146 def captures @captures end |
Instance Method Details
#advance ⇒ Object
194 195 196 |
# File 'lib/rpeg/captures.rb', line 194 def advance @breadcrumb_idx += 1 end |
#capture_all ⇒ Object
157 158 159 |
# File 'lib/rpeg/captures.rb', line 157 def capture_all push_capture until done? end |
#current_breadcrumb ⇒ Object
188 189 190 191 192 |
# File 'lib/rpeg/captures.rb', line 188 def raise "No available breadcrumb" if done? @breadcrumbs[@breadcrumb_idx] end |
#done? ⇒ Boolean
184 185 186 |
# File 'lib/rpeg/captures.rb', line 184 def done? @breadcrumb_idx == @breadcrumbs.size end |
#extract_one_string(what) ⇒ Object
This is LPEG’s addonestring (lpcap.c)
/* ** Evaluates a capture and adds its first value to buffer ‘b’; returns ** whether there was a value */
We just return the value, or nil if there isn’t one
533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 |
# File 'lib/rpeg/captures.rb', line 533 def extract_one_string(what) case .kind when Capture::STRING extract_string_capture when Capture::SUBST extract_subst_capture else n = push_capture return nil if n.zero? pop(n - 1) # just leave one res = pop # LPEG tests the type of this value with lua_isstring, which returns 1 if the value is a string or a number. raise "invalid #{what} value (a #{res.class})" unless res.is_a?(String) || res.is_a?(Numeric) res.to_s end end |
#extract_string_capture ⇒ Object
This is LPEG’s stringcap (lpcap.c)
We return the result
485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 |
# File 'lib/rpeg/captures.rb', line 485 def extract_string_capture fmt = .data.must_be_a(String) the_str_caps = str_caps result = +"" idx = -1 loop do idx += 1 break if idx >= fmt.length if fmt[idx] != "%" result << fmt[idx] next end idx += 1 unless ('0'..'9').cover?(fmt[idx]) result << fmt[idx] next end capture_index = fmt[idx].to_i raise "invalid capture index (#{capture_index})" if capture_index > the_str_caps.size - 1 str_cap = the_str_caps[capture_index] if str_cap.isstring result << @subject[(str_cap.subject_start)...(str_cap.subject_end)].join next end cs_index = @breadcrumb_idx @breadcrumb_idx = the_str_caps[capture_index]. val = extract_one_string("capture") # lpeg's addonestring, but return instead of appending to b raise "no values in capture index #{capture_index}" unless val result << val @breadcrumb_idx = cs_index end result end |
#extract_subst_capture ⇒ Object
This is LPEG’s substcap (lpcap.c)
457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 |
# File 'lib/rpeg/captures.rb', line 457 def extract_subst_capture = curr = .subject_index result = +"" if .full? result = @subject[curr, .size - 1].join else advance # skip open until .close? nxt = .subject_index result << @subject[curr, nxt - curr].join if (match = extract_one_string("replacement")) result << match curr = prev_end_index else # no capture index curr = nxt end end result << @subject[curr, .subject_index - curr].join end advance result end |
#index ⇒ Object
198 199 200 |
# File 'lib/rpeg/captures.rb', line 198 def index @breadcrumb_idx end |
#index=(val) ⇒ Object
202 203 204 |
# File 'lib/rpeg/captures.rb', line 202 def index=(val) @breadcrumb_idx = val end |
#munge_last!(num) ⇒ Object
partially rotate the captures to make what is currently the final value the n-th from last value. For example, if @captures is currently [0, 1, 2, 3, 4], then calling munge_last(3) makes it [0, 1, 4, 2, 3]. Now 4 (previously the last value) is third from last. When n == 1 this is a no-op
692 693 694 695 696 697 698 699 700 701 |
# File 'lib/rpeg/captures.rb', line 692 def munge_last!(num) return if num == 1 raise "Bad munge argument" unless num.positive? raise "Not enough values in array to munge it" if num > @captures.size tail = @captures.pop(num) last = tail.pop @captures << last @captures += tail end |
#pop(num = nil) ⇒ Object
Pop the top capture off and return it, erroring if there isn’t one.
If the argument is given we pop off the last n captures as a chunk (not one by one) and return them in an array.
nil might be in the stack, so we need to count rather than simply check pop for truthiness
171 172 173 174 175 176 177 178 179 180 181 182 |
# File 'lib/rpeg/captures.rb', line 171 def pop(num = nil) if num raise "Cannot pop off a negative number of elements" if num.negative? raise "There are not #{num} captures to pop" if num > @captures.size @captures.pop(num) else raise "There is not a capture to pop" unless @captures.size.positive? @captures.pop end end |
#prev_end_index ⇒ Object
The ending subject index of the previous breadcrumb
207 208 209 210 211 |
# File 'lib/rpeg/captures.rb', line 207 def prev_end_index raise "No previous breadcrumb" unless @breadcrumb_idx.positive? @breadcrumbs[@breadcrumb_idx - 1].end_index end |
#push(cap) ⇒ Object
push a captured value
162 163 164 |
# File 'lib/rpeg/captures.rb', line 162 def push(cap) @captures << cap end |
#push_fold_capture ⇒ Object
This is LPEG’s foldcap (lpcap.c)
366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 |
# File 'lib/rpeg/captures.rb', line 366 def push_fold_capture raise "no initial value for fold capture" if .full? fn = .data.must_be advance if .close? || (n = push_capture).zero? raise "no initial value for fold capture" end # discard all but one capture. This is the first value for the fold accumulator pop(n - 1) acc = pop until .close? n = push_capture acc = fn.call(acc, *pop(n)) end advance # skip close push acc 1 end |
#push_function_capture ⇒ Object
This is LPEG’s functioncap (lpcap.c)
412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 |
# File 'lib/rpeg/captures.rb', line 412 def push_function_capture proc = .data.must_be_a(Proc) # get the proc to call n = push_nested_captures # get the nested captures... args = pop(n) # ...pop them result = proc.call(*args) # ... and pass them to the proc # the results, if any, are the capture values # # The natural thing to do here would be result = Array(result) and just enumerate them to push onto the capture stack. BUT, # sometimes proc will return a Hash (when building a grammar in RE, for example) and Array({x: 1}) = [[:x, 1]], which is not # what we want. At root, the issue is that Lua is better than Ruby at distinguishing between a function that returns multiple # value and one that returns a single value that is an array. The following appears to be what we want, and remember that we # need to write capture functions that are careful to distinguish between returning [1,2,3] (multiple captures) and [[1,2,3]] # (single capture that is an array). # # Another gotcha: a function that returns nil does not give a capture, while one that returns [nil] has captured the single # value nil. # # TODO: consider whether the Maybe monad would help here. It would help distinguish between [1,2,3] and [[1,2,3]] more cleanly, # but not with "no capture" vs "nil was captured", since typically Maybe(nil) = None(). if result.is_a?(Array) result.each { |cap| push cap } result.size elsif result push result 1 else 0 end end |
#push_nested_captures(add_extra: false) ⇒ Object
See pushnestedcaptures in lpcap.c
/* ** Push on the Lua stack all values generated by nested captures inside ** the current capture. Returns number of values pushed. ‘addextra’ ** makes it push the entire match after all captured values. The ** entire match is pushed also if there are no other nested values, ** so the function never returns zero. */
We append what we find to the capture state and return their number.
Code is closely based on the LPEG code.
292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 |
# File 'lib/rpeg/captures.rb', line 292 def push_nested_captures(add_extra: false) open_capture = advance if open_capture.full? cpos = open_capture.subject_index match_len = open_capture.size - 1 match_range = cpos...(cpos + match_len) push @subject[match_range].join return 1 end count = 0 count += push_capture until .close? # Nested captures # We have reached our matching close close_capture = advance if add_extra || count.zero? match_range = (open_capture.subject_index)...(close_capture.subject_index) push @subject[match_range].join count += 1 end count.must_be.positive? count end |
#push_num_capture ⇒ Object
This is LPEG’s numcap (lpcap.c)
395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 |
# File 'lib/rpeg/captures.rb', line 395 def push_num_capture idx = .data if idx.zero? # skip them all seek_next! return 0 end n = push_nested_captures raise "no capture '#{idx}" if n < idx vals = pop(n) # pop them off push vals[idx - 1] # push back the one we want 1 end |
#push_one_nested_value ⇒ Object
Push nested values and then pop off all but one
389 390 391 392 |
# File 'lib/rpeg/captures.rb', line 389 def push_one_nested_value n = push_nested_captures pop(n - 1) end |
#push_query_capture ⇒ Object
This is LPEG’s querycap (lpcap.c)
443 444 445 446 447 448 449 450 451 452 453 454 |
# File 'lib/rpeg/captures.rb', line 443 def push_query_capture hash = .data.must_be_a(Hash) push_one_nested_value query_key = pop # pop it result = hash[query_key] if result push(result) 1 else 0 # no result end end |
#push_table_capture ⇒ Object
This is LPEG’s tablecap (lpcap.h)
Instead of always producing a Hash, potentially with integer keys 0, 1, 2, …, we produce an array when there are no named groups to worry about.
-
TODO: reconsider this. Arrays are nicer than Hashes in this case but client code might not know which class to expect, especially when getting captures from a complicated pattern.
-
At first I always returned a Hash, with numeric keys 0, 1, 2, … for the anonymous captures and name keys for the others. But this felt clunky, especially when we want to, say, join the anonymous arguments into a string.
-
Maybe we should return a hash with an :anonymous key giving the array of anonymous captures, or something like that.
-
Experimental: return a TableCapture instance
330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 |
# File 'lib/rpeg/captures.rb', line 330 def push_table_capture if .full? # Empty table push [] advance return 1 end advance # move past the open capture named_results = {} indexed_results = [] next_index = 0 until .close? = if .kind == Capture::GROUP && .data # named group. We only keep track of the *first* value in the group push_one_nested_value value = pop named_results[.data] = value else # not a named group # k is the number we just got. We pop them back and put them in our result object k = push_capture (0..(k - 1)).to_a.reverse_each do |i| indexed_results[next_index + i] = pop end next_index += k end end advance # skip the close entry push Capture::TableCapture.new(named_results, indexed_results) 1 end |
#run_time_capture ⇒ Object
In LPEG this logic is split between the main VM loop (lpvm.c) and the runtimecap function (lpcap.c). As noted above, the LPEG code is complicated by the need to manage references to objects living on the Lua stack to avoid C-side memory leaks. We don’t have to worry about such things
We start at the CLOSE_RUN_TIME breadcrumb and
- change the CLOSE_RUN_TIME to a regular CLOSE
- find the matching OPEN and grab the Proc that we need to call
- push the nested captures and immediately pop them
- pass the necessary arguments to proc: the subject, the current posistion, and the just-popped captures
- clear out all the breadcrumbs after the OPEN
We return a [bc, result] pair.
- bc is the new "breadcrumb count" for the VM, as we discard the existing captures for the RunTime grouping.
- result is the result of the Proc call
566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 |
# File 'lib/rpeg/captures.rb', line 566 def run_time_capture seek_matching_open! .kind.must_be(Capture::GROUP) open_cap_idx = index proc = .data.must_be_a(Proc) # get the proc to call @subject_as_str ||= @subject.join args = [@subject_as_str, @subject_index] n = push_nested_captures args += pop(n) # prepare arguments for the function result = Array(proc.call(*args)) # ... and pass them to the proc [open_cap_idx + 1, result] end |
#seek_back_ref!(group_name) ⇒ Object
Search backwards from the current breadcrumb for the start of the group capture with the given name.
If we find it the state index is updated appropriately. If we don’t find it we raise an exception.
This is LPEG’s findback() (lpcap.c)
627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 |
# File 'lib/rpeg/captures.rb', line 627 def seek_back_ref!(group_name) group_name.must_be while @breadcrumb_idx.positive? @breadcrumb_idx -= 1 # Skip nested captures if .close? seek_matching_open! else # The opening of a capture that encloses the BACKREF. Skip it and keep going backwards next unless .full? end # We are at an open capture that was closed before our BACKREF next unless .kind == Capture::GROUP # is it a group? next unless .data == group_name # does it have the right name? # We found it! return end raise "back reference '#{group_name}' not found" end |
#seek_matching_open! ⇒ Object
This is LPEG’s findopen (lpcap.c)
Assume we are starting from a close capture. We go back to the matching open capture.
651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 |
# File 'lib/rpeg/captures.rb', line 651 def seek_matching_open! n = 0 # number of nested closes waiting for an open loop do @breadcrumb_idx -= 1 raise "subject index underflow in seek_open!" if @breadcrumb_idx.negative? if .close? n += 1 elsif !.full? # It's an open of some sort return if n.zero? n -= 1 end end end |
#seek_next! ⇒ Object
This is LPEG’s nextcap (lpcap.c)
Move to the next capture
671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 |
# File 'lib/rpeg/captures.rb', line 671 def seek_next! unless .full? n = 0 # number of nested opens waiting for a close loop do @breadcrumb_idx += 1 if .close? break if n.zero? n -= 1 elsif !.full? n += 1 end end end @breadcrumb_idx += 1 end |
#str_caps ⇒ Object
This is LPEG’s getstrcaps (lpcap.c) /* ** Collect values from current capture into array ‘cps’. Current ** capture must be Cstring (first call) or Csimple (recursive calls). ** (In first call, fills %0 with whole match for Cstring.) ** Returns number of elements in the array that were filled. */
We simply return the array of StrAux elements
591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 |
# File 'lib/rpeg/captures.rb', line 591 def str_caps result = [] first_aux = StrAux.new first_aux.isstring = true first_aux.subject_start = .subject_index result << first_aux first_is_full = .full? unless first_is_full advance # move past the Open until .close? if result.size > MAX_STR_CAPS seek_next! # just skip it elsif .kind == Capture::SIMPLE result += str_caps # get the matches recursively else # Not a string aux = StrAux.new aux.isstring = false aux. = index seek_next! result << aux end end end result[0].subject_end = .end_index advance # skip capture close/full capture result end |