Class: RPEG::Pattern
- Inherits:
-
Object
- Object
- RPEG::Pattern
- Defined in:
- lib/rpeg/rpeg.rb
Overview
The class representing “patterns” and containing the logic to turn them into programs for the virtual machine.
Very roughly, this is where the LPEG code in lptree.c and lpcode.c lives
Constant Summary collapse
- NODE_TYPES =
%i[ charset char any seq ordered_choice repeated not and ntrue nfalse grammar open_call rule call capture runtime behind ].each do |op| const_set op.upcase, op end
- FULL_CHAR_SET =
We assume we have UTF-8 input with no multibyte characters.
Set.new((0..255).map{ _1.chr(Encoding::UTF_8) })
Instance Attribute Summary collapse
-
#capture ⇒ Object
readonly
Returns the value of attribute capture.
-
#data ⇒ Object
sometimes we need to tweak this.
-
#left ⇒ Object
readonly
Returns the value of attribute left.
-
#right ⇒ Object
readonly
Returns the value of attribute right.
-
#type ⇒ Object
readonly
Returns the value of attribute type.
Instance Method Summary collapse
-
#*(other) ⇒ Object
p1 * p2 means p1 followed by p2.
-
#**(other) ⇒ Object
pat ** n means “n or more occurrences of pat” when n is non-negative, and “up to -n occurrences of pat” if n is negative.
-
#+(other) ⇒ Object
p1 + p2 is ordered choice: if p1 matches we match and never consider p2, otherwise try matching on p2.
-
#+@ ⇒ Object
Unary “and”: pattern matches here (without consuming any input).
-
#-(other) ⇒ Object
Difference is “this but not that”.
-
#-@ ⇒ Object
Unary negation represents “does not match”.
-
#/(other) ⇒ Object
Replacement captures of various kinds.
-
#call_recursive(func, default) ⇒ Object
From callrecursive in LPEG’s lpcode.c.
- #charset ⇒ Object
-
#charsetlike? ⇒ Boolean
Pattern properties.
-
#check_pred(pred) ⇒ Object
The is lpeg’s checkaux from lpcode.c.
-
#child ⇒ Object
If left is defined and right is nil - so we have a unary op - we can get child here.
-
#code(follow_set: FULL_CHAR_SET, dominating_test: nil, active_choice: false) ⇒ Object
follow_set - the set of first characters accepted by whatever comes after us, or the full set of characters if nothing follows us.
-
#coerce(other) ⇒ Object
This only happens if other is a Numeric type, which is annoying.
-
#convert_open_call_to_call!(rule, ref) ⇒ Object
Special operation when closing open calls.
-
#first_set(follow_set = FULL_CHAR_SET) ⇒ Object
LPEG’s getfirst.
- #fix_type(other) ⇒ Object
-
#fixed_len ⇒ Object
fixedlen from LPEG’s lpcode.h.
-
#has_captures? ⇒ Boolean
From hascaptures in LPEG’s lpcode.c /* ** Check whether a pattern tree has captures */.
-
#head_fail? ⇒ Boolean
LPEG’s headfail.
-
#i ⇒ Object
shorthand.
-
#initialize(type, left = nil, right = nil, data: nil, capture: nil) ⇒ Pattern
constructor
Left and right are subpatterns.
-
#loops? ⇒ Boolean
From checkloops in lptree.c.
-
#match(str, init = 0, *extra_args) ⇒ Object
Return the index just after the matching prefix of str or nil if there is no match.
-
#need_follow? ⇒ Boolean
LPEG’s needfollow (lpcode.c).
- #nofail? ⇒ Boolean
- #nullable? ⇒ Boolean
-
#num_children ⇒ Object
TODO: consider recurising via a #children method.
- #to_s ⇒ Object
-
#verify_rule ⇒ Object
We check if a rule can be left-recursive, i.e., whether we can return to the rule without consuming any input.
Constructor Details
#initialize(type, left = nil, right = nil, data: nil, capture: nil) ⇒ Pattern
Left and right are subpatterns. data is other relevant data capture is used for Capture patterns
1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 |
# File 'lib/rpeg/rpeg.rb', line 1413 def initialize(type, left = nil, right = nil, data: nil, capture: nil) raise "Bad node type #{type}" unless NODE_TYPES.include?(type) @type = type @left = left @right = right @data = data @capture = capture sanity_check return unless type == GRAMMAR fix_up_grammar verify_grammar end |
Instance Attribute Details
#capture ⇒ Object (readonly)
Returns the value of attribute capture.
354 355 356 |
# File 'lib/rpeg/rpeg.rb', line 354 def capture @capture end |
#data ⇒ Object
sometimes we need to tweak this
355 356 357 |
# File 'lib/rpeg/rpeg.rb', line 355 def data @data end |
#left ⇒ Object (readonly)
Returns the value of attribute left.
354 355 356 |
# File 'lib/rpeg/rpeg.rb', line 354 def left @left end |
#right ⇒ Object (readonly)
Returns the value of attribute right.
354 355 356 |
# File 'lib/rpeg/rpeg.rb', line 354 def right @right end |
#type ⇒ Object (readonly)
Returns the value of attribute type.
354 355 356 |
# File 'lib/rpeg/rpeg.rb', line 354 def type @type end |
Instance Method Details
#*(other) ⇒ Object
p1 * p2 means p1 followed by p2
390 391 392 393 394 395 396 397 398 399 400 401 |
# File 'lib/rpeg/rpeg.rb', line 390 def *(other) other = fix_type(other) # true is the identity for * return self if other.type == NTRUE return other if type == NTRUE # rejigger to make SEQ right-associative. I don't know that it makes a difference, but LPEG does it. return left * (right * other) if type == SEQ Pattern.new(SEQ, self, other) end |
#**(other) ⇒ Object
pat ** n means “n or more occurrences of pat” when n is non-negative, and “up to -n occurrences of pat” if n is negative.
423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 |
# File 'lib/rpeg/rpeg.rb', line 423 def **(other) n = other.must_be_a(Integer) if n >= 0 raise "Pattern may match 0-length string so repetition may lead to an infinite loop" if nullable? patt = Pattern.new(REPEATED, self) # this repeats 0 or more times while n.positive? patt = self * patt n -= 1 end else n = -n patt = RPEG.P(true) while n.positive? patt = self * patt + true n -= 1 end end patt end |
#+(other) ⇒ Object
p1 + p2 is ordered choice: if p1 matches we match and never consider p2, otherwise try matching on p2
404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 |
# File 'lib/rpeg/rpeg.rb', line 404 def +(other) other = fix_type(other) if charsetlike? && other.charsetlike? # Take the union of the charsets RPEG.S(charset + other.charset) elsif type == NFALSE other elsif other.type == NFALSE self elsif type == ORDERED_CHOICE # rejigger to make this operation right-associative, giving more efficient compiled code. See Ierusalimschy 4.2 left + (right + other) else Pattern.new(ORDERED_CHOICE, self, other) end end |
#+@ ⇒ Object
Unary “and”: pattern matches here (without consuming any input)
Ierusalimschy points out that &patt can be implemented as –patt, but there is an optimization for the VM, so we preserve it
454 455 456 |
# File 'lib/rpeg/rpeg.rb', line 454 def +@ Pattern.new(AND, self) end |
#-(other) ⇒ Object
Difference is “this but not that”. So p1 - p2 matches if p1 does and p2 doesn’t. Here, p2 doesn’t consume any input but p1 does. The pattern is equivalent to -p2 * p1.
Special case: if both patterns are charsets we replace with a single charset
462 463 464 465 466 467 468 469 |
# File 'lib/rpeg/rpeg.rb', line 462 def -(other) other = fix_type(other) return RPEG.S(charset - other.charset) if charsetlike? && other.charsetlike? # Otherwise we use -p2 * p1, i.e., "p2 doesn't match here" followed by "try to match and consume p1" -other * self end |
#-@ ⇒ Object
Unary negation represents “does not match”. So -patt says that there is no match at the current position and we don’t consume any of the string
447 448 449 |
# File 'lib/rpeg/rpeg.rb', line 447 def -@ Pattern.new(NOT, self) end |
#/(other) ⇒ Object
Replacement captures of various kinds
From the LPEG docs:
patt / string
Creates a string capture. It creates a capture string based on string. The captured value is a copy of string, except that
the character % works as an escape character: any sequence in string of the form %n, with n between 1 and 9, stands for
the match of the n-th capture in patt. The sequence %0 stands for the whole match. The sequence %% stands for a single %.
patt / number
Creates a numbered capture. For a non-zero number, the captured value is the n-th value captured by patt. When number is
zero, there are no captured values.
patt / table [for this we accept a Hash]
Creates a query capture. It indexes the given table using as key the first value captured by patt, or the whole match if
patt produced no value. The value at that index is the final value of the capture. If the table does not have that key,
there is no captured value.
patt / function
Creates a function capture. It calls the given function passing all captures made by patt as arguments, or the whole match
if patt made no capture. The values returned by the function are the final values of the capture. In particular, if
function returns no value, there is no captured value.
497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 |
# File 'lib/rpeg/rpeg.rb', line 497 def /(other) case other when String Pattern.new(CAPTURE, self, data: other, capture: Capture::STRING) when Integer raise "Cannot use negative number for numbered capture" if other.negative? Pattern.new(CAPTURE, self, data: other, capture: Capture::NUM) when Hash Pattern.new(CAPTURE, self, data: other, capture: Capture::QUERY) when Proc Pattern.new(CAPTURE, self, data: other, capture: Capture::FUNCTION) else raise "Replacement capture is not supported for #{other}" end end |
#call_recursive(func, default) ⇒ Object
From callrecursive in LPEG’s lpcode.c
/* ** Visit a TCall node taking care to stop recursion. If node not yet ** visited, return ‘f(sib2(tree))’, otherwise return ‘def’ (default ** value) */
This method acts as a circuit breaker for structural recursion that might otherwise get in a loop among mutually recursive grammar rules.
It’s janky, but we follow LPEG’s approach of hijacking the key field (which we call data) to keep track of the recursion
738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 |
# File 'lib/rpeg/rpeg.rb', line 738 def call_recursive(func, default) type.must_be CALL child.type.must_be RULE already_visited = :already_visited saved_data = @data if saved_data == already_visited default else # first time we've been here @data = already_visited result = send(func) @data = saved_data result end end |
#charset ⇒ Object
587 588 589 590 591 592 593 594 595 596 597 598 |
# File 'lib/rpeg/rpeg.rb', line 587 def charset raise "Pattern #{type} isn't charset-like" unless charsetlike? case type when CHARSET data when CHAR Set.new([data]) when ANY FULL_CHAR_SET end end |
#charsetlike? ⇒ Boolean
Pattern properties
583 584 585 |
# File 'lib/rpeg/rpeg.rb', line 583 def charsetlike? type == CHARSET || type == CHAR || type == ANY end |
#check_pred(pred) ⇒ Object
The is lpeg’s checkaux from lpcode.c. Comment from that function (reformatted):
/* ** Checks how a pattern behaves regarding the empty string, in one of two different ways:
** - A pattern is nullable if it can match without consuming any character; ** - A pattern is nofail if it never fails for any string (including the empty string).
** The difference is only for predicates and run-time captures; for other patterns, the two properties are equivalent. (With ** predicates, &‘a’ is nullable but not nofail. Of course, nofail => nullable.)
** These functions are all convervative in the following way: ** p is nullable => nullable(p) ** nofail(p) => p cannot fail
** The function assumes that TOpenCall is not nullable; this will be checked again when the grammar is fixed.
** Run-time captures can do whatever they want, so the result is conservative. */
631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 |
# File 'lib/rpeg/rpeg.rb', line 631 def check_pred(pred) raise "Bad check predicate #{pred}" unless %i[nullable nofail].include?(pred) case type when CHAR, CHARSET, ANY, OPEN_CALL, NFALSE # Not nullable; for open_call this is a blind assumption false when NTRUE, REPEATED true when NOT, BEHIND # can match empty, but can fail pred != :nofail when AND # can match empty; can fail exactly when body can return true if pred == :nullable child.check_pred(pred) when RUNTIME # can fail; match empty iff body does return false if pred == :nofail child.check_pred(pred) when SEQ left.check_pred(pred) && right.check_pred(pred) when ORDERED_CHOICE left.check_pred(pred) || right.check_pred(pred) when GRAMMAR # Strings are matched by the initial nonterminal child.first.check_pred(pred) when CALL, RULE, CAPTURE child.check_pred(pred) else raise "Unhandled pattern type #{type}" end end |
#child ⇒ Object
If left is defined and right is nil - so we have a unary op - we can get child here
373 374 375 376 377 |
# File 'lib/rpeg/rpeg.rb', line 373 def child raise 'Pattern is not unary' if right left end |
#code(follow_set: FULL_CHAR_SET, dominating_test: nil, active_choice: false) ⇒ Object
follow_set
-
the set of first characters accepted by whatever comes after us, or the full set of characters if nothing follows us
dominating_test
-
a TEST_CHAR, TEST_CHARSET, or TEST_ANY instruction that we can assume has succeeded and which might save us a little time
-
see the tt argument sprinkled through the functions in LPEG’s lpcode.c
active_option
-
there is a CHOICE instruction still “active” in the code already generated
-
certain pattern types can take advantange of this to avoid another CHOICE instruction
-
this is called ‘opt’ in lpcode.c
NOTE: don’t cache the results as we did before, because the code depends on the arguments
932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 |
# File 'lib/rpeg/rpeg.rb', line 932 def code(follow_set: FULL_CHAR_SET, dominating_test: nil, active_choice: false) code = [] case type when CHARSET code << charset_code(data, dominating_test) when CHAR code << char_code(data, dominating_test) when ANY code << Instruction.new(i::ANY) when SEQ code.concat seq_code(follow_set, dominating_test) when NTRUE # we always succeed, which means we don't have to do anything at all when NFALSE code << Instruction.new(i::FAIL) when OPEN_CALL # we resolved these to CALL when the grammar node was created. So if we see one now it is because it was not contained in a # grammar. raise 'OPEN_CALL node appears outside of a grammar' when CALL # This is symbolic target for now. It will be converted to a numeric offset during GRAMMAR analysis code << Instruction.new(i::CALL, offset: data) when ORDERED_CHOICE code.concat choice_code(follow_set, active_choice) when REPEATED code.concat repeated_code(follow_set, active_choice) when NOT code.concat not_code when AND code.concat and_code(dominating_test) when BEHIND code << Instruction.new(i::BEHIND, aux: data) if data.positive? code.concat child.code when CAPTURE c = child.code(follow_set:, dominating_test:) len = fixed_len if len && !child.has_captures? code.concat c code << Instruction.new(i::FULL_CAPTURE, data:, aux: { capture_length: len, kind: capture }) else code << Instruction.new(i::OPEN_CAPTURE, data:, aux: { kind: capture }) code.concat c code << Instruction.new(i::CLOSE_CAPTURE, aux: { kind: Capture::CLOSE }) end when RUNTIME code << Instruction.new(i::OPEN_CAPTURE, data:, aux: { kind: Capture::GROUP }) code.concat child.code(follow_set: FULL_CHAR_SET, dominating_test:) code << Instruction.new(i::CLOSE_RUN_TIME, aux: { kind: Capture::CLOSE }) when RULE code = child.code(follow_set:) code[0] = code.first.clone code.first.dec = data # decorate with the nonterminal, but clone first to avoid unexpected mutations when GRAMMAR code.concat grammar_code else raise "Unhandled pattern type #{type}" end code end |
#coerce(other) ⇒ Object
This only happens if other is a Numeric type, which is annoying. See the monkeypatching below for other cases.
385 386 387 |
# File 'lib/rpeg/rpeg.rb', line 385 def coerce(other) [RPEG.P(other), self] end |
#convert_open_call_to_call!(rule, ref) ⇒ Object
Special operation when closing open calls
1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 |
# File 'lib/rpeg/rpeg.rb', line 1429 def convert_open_call_to_call!(rule, ref) raise "Cannot convert pattern to CALL" unless type == OPEN_CALL raise "Must give rule and nonterminal symbol to CALL pattern" unless rule && ref raise "Rule for CALL pattern must be a rule, got #{rule.type}" unless rule.type == RULE @type = CALL @left = rule @data = ref # We must check these again when needed rather than use the memoized values %i[@nullable @nofail].each do |ivar| remove_instance_variable(ivar) if instance_variable_defined?(ivar) end end |
#first_set(follow_set = FULL_CHAR_SET) ⇒ Object
LPEG’s getfirst
static int getfirst (TTree *tree, const Charset *follow, Charset *firstset) {
/* ** Computes the ‘first set’ of a pattern. ** The result is a conservative aproximation: ** match p ax -> x (for some x) ==> a belongs to first(p) ** or ** a not in first(p) ==> match p ax -> fail (for all x)
- So we want to know the set of characters that can make the pattern succeed, at least on the first characters
-
** ** The set ‘follow’ is the first set of what follows the ** pattern (full set if nothing follows it). ** ** The function returns 0 when this resulting set can be used for ** test instructions that avoid the pattern altogether. ** A non-zero return can happen for two reasons: ** 1) match p ” -> ” ==> return has bit 1 set ** (tests cannot be used because they would always fail for an empty input); ** 2) there is a match-time capture ==> return has bit 2 set ** (optimizations should not bypass match-time captures). */
I don’t really understand what is going on here. I’m hoping it will make more sense as I port it. I think we pass in follow and return the int and firstset.
807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 |
# File 'lib/rpeg/rpeg.rb', line 807 def first_set(follow_set = FULL_CHAR_SET) case type when CHAR, CHARSET, ANY [0, charset] when NTRUE [1, follow_set.clone] # /* accepts the empty string */ when NFALSE [0, Set.new] when ORDERED_CHOICE e1, first1 = left.first_set(follow_set) e2, first2 = right.first_set(follow_set) [e1 | e2, first1 | first2] when SEQ if !left.nullable? # /* when p1 is not nullable, p2 has nothing to contribute; # return getfirst(sib1(tree), fullset, firstset); */ left.first_set(FULL_CHAR_SET) else e2, first2 = right.first_set(follow_set) e1, first1 = left.first_set(first2) return [0, first1] if e1.zero? # /* 'e1' ensures that first can be used */ return [2, first1] if (e1 | e2) & 2 == 2 # /* one of the children has a matchtime? */ [e2, first1] # /* else depends on 'e2' */ end when REPEATED _, first_cs = child.first_set(follow_set) [1, first_cs | follow_set] # /* accept the empty string */ when CAPTURE, RULE child.first_set(follow_set) when GRAMMAR child.first.first_set(follow_set) when RUNTIME # NOTE: I don't understand this # # /* function invalidates any follow info. */ e, first_set = child.first_set(FULL_CHAR_SET) if e.positive? [2, first_set] # /* function is not "protected"? */ else [0, first_set] # /* pattern inside capture ensures first can be used */ end when CALL child.first_set(follow_set) when AND e, first_set = child.first_set(follow_set) [e, first_set & follow_set] when NOT, BEHIND if type == NOT && child.charsetlike? [1, FULL_CHAR_SET - child.charset] else # /* instruction gives no new information */ # /* call 'getfirst' only to check for math-time captures */ e, = child.first_set(follow_set) [e | 1, follow_set] # /* always can accept the empty string */ end else raise "Unhandled node type #{type}" end end |
#fix_type(other) ⇒ Object
514 515 516 517 518 |
# File 'lib/rpeg/rpeg.rb', line 514 def fix_type(other) return other if other.is_a?(Pattern) RPEG.P(other) # see what we can do end |
#fixed_len ⇒ Object
fixedlen from LPEG’s lpcode.h
/* ** number of characters to match a pattern (or -1 if variable) */
We return nil if the node’s matches are not all of the same length
674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 |
# File 'lib/rpeg/rpeg.rb', line 674 def fixed_len case type when CHARSET, CHAR, ANY 1 when NOT, AND, NTRUE, NFALSE, BEHIND 0 when REPEATED, OPEN_CALL, RUNTIME nil when CAPTURE, RULE child.fixed_len when GRAMMAR child.first.fixed_len # the first rule is the initial nonterminal when CALL call_recursive(:fixed_len, nil) when SEQ left_len = left.fixed_len return nil unless left_len right_len = right.fixed_len return nil unless right_len left_len + right_len when ORDERED_CHOICE left_len = left.fixed_len return nil unless left_len right_len = right.fixed_len right_len == left_len ? right_len : nil else raise "Unhandled node type #{type}" end end |
#has_captures? ⇒ Boolean
From hascaptures in LPEG’s lpcode.c /* ** Check whether a pattern tree has captures */
761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 |
# File 'lib/rpeg/rpeg.rb', line 761 def has_captures? case type when CAPTURE, RUNTIME true when CALL call_recursive(:has_captures?, false) when GRAMMAR child.any?(&:has_captures?) else case num_children when 0 false when 1 child.has_captures? when 2 left.has_captures? || right.has_captures? end end end |
#head_fail? ⇒ Boolean
LPEG’s headfail
/* ** If ‘headfail(tree)’ true, then ‘tree’ can fail only depending on the ** next character of the subject. */
874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 |
# File 'lib/rpeg/rpeg.rb', line 874 def head_fail? case type when CHAR, CHARSET, ANY, NFALSE true when NTRUE, REPEATED, RUNTIME, NOT false when CAPTURE, RULE, AND, CALL child.head_fail? when GRAMMAR child.first.head_fail? when SEQ return false unless right.nofail? left.head_fail? when ORDERED_CHOICE left.head_fail? && right.head_fail? else raise "Unhandled node type #{type}" end end |
#loops? ⇒ Boolean
From checkloops in lptree.c
/* ** Check whether a tree has potential infinite loops */
712 713 714 715 716 717 718 719 720 721 722 723 724 |
# File 'lib/rpeg/rpeg.rb', line 712 def loops? return true if type == REPEATED && child.nullable? # /* sub-grammars already checked */, i.e., by verify_grammar return false if type == GRAMMAR || type == CALL case num_children when 1 child.loops? when 2 left.loops? || right.loops? end end |
#match(str, init = 0, *extra_args) ⇒ Object
Return the index just after the matching prefix of str or nil if there is no match
str: the string the match against init: the string index to start at, defaulting to 0 extra_args: used by Argument Captures
362 363 364 365 366 367 368 369 370 |
# File 'lib/rpeg/rpeg.rb', line 362 def match(str, init = 0, *extra_args) # Note that the program doesn't depend on the arguments so we can cache it @program ||= optimize_jumps(code(follow_set: FULL_CHAR_SET) + [Instruction.new(i::OP_END)]) machine = ParsingMachine.new(@program, str, init, extra_args) machine.run return machine.captures if machine.success? end |
#need_follow? ⇒ Boolean
LPEG’s needfollow (lpcode.c)
/* ** Check whether the code generation for the given tree can benefit ** from a follow set (to avoid computing the follow set when it is ** not needed) */
1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 |
# File 'lib/rpeg/rpeg.rb', line 1305 def need_follow? case type when CHAR, CHARSET, ANY, NFALSE, NTRUE, AND, NOT, RUNTIME, GRAMMAR, CALL, BEHIND false when ORDERED_CHOICE, REPEATED true when CAPTURE child.need_follow? when SEQ right.need_follow? else raise "Unhandled case #{type} in need_follow?" end end |
#nofail? ⇒ Boolean
606 607 608 609 610 |
# File 'lib/rpeg/rpeg.rb', line 606 def nofail? return @nofail if defined? @nofail @nofail = check_pred(:nofail) end |
#nullable? ⇒ Boolean
600 601 602 603 604 |
# File 'lib/rpeg/rpeg.rb', line 600 def nullable? return @nullable if defined? @nullable @nullable = check_pred(:nullable) end |
#num_children ⇒ Object
TODO: consider recurising via a #children method. Then we can handle the necessary rules in a GRAMMAR just once.
896 897 898 899 900 901 902 903 904 905 906 907 908 909 |
# File 'lib/rpeg/rpeg.rb', line 896 def num_children case type when CHARSET, CHAR, ANY, NTRUE, NFALSE, OPEN_CALL 0 when REPEATED, AND, NOT, CALL, RULE, CAPTURE, RUNTIME, BEHIND 1 when SEQ, ORDERED_CHOICE 2 when GRAMMAR raise "#num_children isn't meaningful for GRAMMAR nodes" else raise "Unhandled pattern type #{type}" end end |
#to_s ⇒ Object
520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 |
# File 'lib/rpeg/rpeg.rb', line 520 def to_s return @to_s if @to_s result = [] do_sub_pattern = lambda do |sub_patt| sub_patt.to_s.split("\n").each do |line| result << "| #{line}" end end type_s = type.to_s.capitalize case type when CHARSET result << "Charset: #{data.join.dump}" when ANY result << "#{type_s}: #{data}" when CHAR result << "#{type_s}: #{data.dump}" when NTRUE result << "TRUE" when NFALSE result << "FALSE" when OPEN_CALL result << "OpenCall: #{data}" when CALL result << "Call: #{data}" when SEQ, ORDERED_CHOICE result << (type == SEQ ? "Seq:" : "Ordered Choice:") do_sub_pattern.call(left) do_sub_pattern.call(right) when RULE result << "nonterminal: #{data}" when REPEATED, NOT, AND, BEHIND result << "#{type_s}: " do_sub_pattern.call(child) when CAPTURE result << "Capture: #{capture} #{data.inspect}" do_sub_pattern.call(child) when RUNTIME result << "Runtime: #{capture} #{data.inspect}" do_sub_pattern.call(child) when GRAMMAR result << "Grammar:" first = true child.each do |nonterminal, rule_pattern| prefix = " #{nonterminal}: " first = true rule_pattern.to_s.split("\n").each do |line| line_prefix = first ? prefix : (" " * prefix.len) result << "#{line_prefix}#{line}" end end else raise "Unhandled type for to_s: #{type}" end @to_s = result.join("\n") end |
#verify_rule ⇒ Object
We check if a rule can be left-recursive, i.e., whether we can return to the rule without consuming any input. The plan is to walk the tree into subtrees whenever we see we can do so without consuming any input.
LPEG comment follows. Note that we check for nullability directly for sanity’s sake.
/* ** Check whether a rule can be left recursive; raise an error in that ** case; otherwise return 1 iff pattern is nullable. ** The return value is used to check sequences, where the second pattern ** is only relevant if the first is nullable. ** Parameter ‘nb’ works as an accumulator, to allow tail calls in ** choices. (‘nb’ true makes function returns true.) ** Parameter ‘passed’ is a list of already visited rules, ‘npassed’ ** counts the elements in ‘passed’. ** Assume ktable at the top of the stack. */
1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 |
# File 'lib/rpeg/rpeg.rb', line 1346 def verify_rule raise "verify_rule called on something that isn't a rule" unless type == RULE rules_seen = [] local_rec = lambda do |pattern, num_rules_seen| case pattern.type when CHAR, CHARSET, ANY, NTRUE, NFALSE, BEHIND # no op when NOT, AND, REPEATED, CAPTURE, RUNTIME # nullable, so keep going local_rec.call(pattern.child, num_rules_seen) when CALL local_rec.call(pattern.child, num_rules_seen) when SEQ local_rec.call(pattern.left, num_rules_seen) # only check 2nd child if first is nullable local_rec.call(pattern.right, num_rules_seen) if pattern.left.nullable? when ORDERED_CHOICE # must check both children local_rec.call(pattern.left, num_rules_seen) local_rec.call(pattern.right, num_rules_seen) when RULE raise "rule '#{pattern.data}' may be left-recursive" if rules_seen[0...num_rules_seen].include?(pattern) num_rules_seen += 1 rules_seen[num_rules_seen] = pattern local_rec.call(pattern.child, num_rules_seen) when GRAMMAR # LPEG says: /* sub-grammar cannot be left recursive */ # But why? I guess because we would have rejected it at creation. else raise "Unhandled case #{pattern.type} in verify_rule" end end local_rec.call(self, 0) end |