Sourcify

ParseTree is great: it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to Ruby code & S-expression, BUT ParseTree doesn’t work on 1.9.* & JRuby.

RubyParser is great, and it works on any Ruby (of course, it is not yet 100% compatible with 1.8.7 & 1.9.* syntax), BUT it works only with static code.

I truly enjoy using the above tools, but in my other projects, the absence of ParseTree on the different rubies forces me to hand-bake my own solution each time to extract the proc code I need at runtime. This is frustrating: the solution for each of them is never perfect, and I’m reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).

Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper around it; otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. The result is further processed with RubyParser & Ruby2Ruby to ensure 100% compatibility with ParseTree (yup, there is no denying that I really like ParseTree).

Installing It

The religiously standard way:

$ gem install ParseTree sourcify

Or on 1.9.* or JRuby:

$ gem install ruby_parser file-tail sourcify

Using It

Sourcify adds 4 methods to Proc:

1. Proc#to_source

Returns the code representation of the proc:

require 'sourcify'

lambda { x + y }.to_source
# >> "proc { (x + y) }"

proc { x + y }.to_source
# >> "proc { (x + y) }"

Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to extract only the body of the proc by passing in :strip_enclosure => true:

lambda { x + y }.to_source(:strip_enclosure => true)
# >> "(x + y)"

lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
# >> "(i + 2)"
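Since the extracted body is an ordinary string of Ruby code, it can be wrapped in a new enclosure and evaluated again. The sketch below is only an illustration: it hardcodes the body string shown above instead of calling Sourcify, and the helper name rebuild_proc is hypothetical.

```ruby
# Hypothetical helper (not part of Sourcify): wraps a body string, such as
# the one returned with :strip_enclosure => true, in a fresh proc.
def rebuild_proc(body, *params)
  eval("proc { |#{params.join(', ')}| #{body} }")
end

# Body string copied from the example above rather than produced by Sourcify.
add_two = rebuild_proc("(i + 2)", :i)
puts add_two.call(40)
# prints 42
```

Keep in mind that eval runs whatever string it is given, so this only makes sense for source you trust.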

2. Proc#to_sexp

Returns the S-expression of the proc:

require 'sourcify'

x = 1
lambda { x + y }.to_sexp
# >> s(:iter,
# >>  s(:call, nil, :proc, s(:arglist)),
# >>   nil,
# >>    s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

To extract only the body of the proc:

lambda { x + y }.to_sexp(:strip_enclosure => true)
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist))))
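Comparing the two outputs above, the stripped form is simply the last element of the :iter node. The sketch below shows that relationship with plain nested arrays standing in for real S-expressions. The helpers s, sexp_body and example_iter_sexp are hypothetical, and this is not a claim about how Sourcify implements :strip_enclosure internally.

```ruby
# Stand-in for the s(...) constructor. Real S-expressions behave like
# nested arrays, so plain arrays are enough for this sketch.
def s(*elements)
  elements
end

# Hypothetical helper: given s(:iter, call, args, body), returns the body.
def sexp_body(iter_sexp)
  type, _call, _args, body = iter_sexp
  raise ArgumentError, "expected an :iter node" unless type == :iter
  body
end

# The full S-expression copied from the example above.
def example_iter_sexp
  s(:iter,
    s(:call, nil, :proc, s(:arglist)),
    nil,
    s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))
end

p sexp_body(example_iter_sexp)
```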

3. Proc#to_raw_source

Unlike Proc#to_source, which returns code that retains only functional aspects, fetching of raw source returns the raw code enclosed within the proc, including fluff like comments:

lambda do |i|
  i+1 # (blah)
end.to_raw_source
# >> "proc do |i|
# >>   i+1 # (blah)
# >> end"

NOTE: Raw code extraction relies on static code scanning (even when running in ParseTree mode), so the gotchas for static code scanning always apply.
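To see why comments survive, and why the file on disk matters, here is a naive sketch of the starting point of any static scan: locate the file and line through Proc#source_location and read the physical lines. This is not Sourcify's ragel-generated scanner. The helpers raw_lines_for and load_sample_proc are hypothetical, and the number of lines to read is passed in by hand instead of being worked out by a scanner.

```ruby
require 'tempfile'

# Hypothetical helper (not Sourcify's scanner): reads `count` physical lines
# from the file in which the proc was defined, starting at its first line.
def raw_lines_for(block, count)
  file, line = block.source_location
  return nil unless file && File.file?(file)
  File.readlines(file)[line - 1, count].join
end

# Writes a small script to disk, loads it, and yields the proc it defines.
# A global is used only so that the loaded file can hand the proc back.
def load_sample_proc
  Tempfile.create(['sample', '.rb']) do |f|
    f.write("$sample = lambda do |i|\n  i+1 # (blah)\nend\n")
    f.flush
    load f.path
    yield $sample
  end
end

load_sample_proc { |blk| puts raw_lines_for(blk, 3) }
# prints:
# $sample = lambda do |i|
#   i+1 # (blah)
# end
```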

4. Proc#source_location

By default, this is only available on 1.9.*; it is added (as a bonus) to provide consistency under 1.8.*:

# /tmp/test.rb
require 'sourcify'

lambda { x + y }.source_location
# >> ["/tmp/test.rb", 5]

Performance

Performance is embarrassing for now. Benchmarking results for processing 500 procs (in the ObjectSpace of an average rails project) yield the following:

ruby                               user       system    total      real
ruby-1.8.7-p299  (w ParseTree)     10.270000  0.010000  10.280000  ( 10.311430)
ruby-1.8.7-p299  (static scanner)  14.120000  0.080000  14.200000  ( 14.283817)
ruby-1.9.1-p376  (static scanner)  17.380000  0.050000  17.430000  ( 17.405966)
jruby-1.5.2      (static scanner)  21.318000  0.000000  21.318000  ( 21.318000)

Since I’m still pretty new to ragel, the code scanner will probably become better & faster as my knowledge & skills with ragel improve. Also, instead of generating a pure Ruby scanner, we can generate native code (e.g. C or Java, or whatever). As I’m a C & Java noob, this will probably take some time to realize.
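For anyone who wants to take a similar measurement, a rough sketch of such a benchmark follows. It is not the script behind the numbers above. The helpers sample_procs and time_procs are hypothetical, and the work done per proc is passed in as a block so that the sketch runs without Sourcify loaded (with Sourcify loaded, the block would call o.to_source instead).

```ruby
require 'benchmark'

# Hypothetical helper: collects up to `limit` live procs from ObjectSpace.
def sample_procs(limit)
  procs = []
  ObjectSpace.each_object(Proc) do |o|
    procs << o
    break if procs.size >= limit
  end
  procs
end

# Hypothetical helper: times the given block over every proc. Procs for
# which the block raises are skipped, so one failure does not abort the run.
def time_procs(procs)
  Benchmark.measure do
    procs.each do |o|
      begin
        yield o
      rescue StandardError
        next
      end
    end
  end
end

tms = time_procs(sample_procs(500)) { |o| o.source_location }
puts Benchmark::CAPTION
puts tms
```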

Gotchas

Nothing beats ParseTree’s ability to access the runtime AST; it is a very powerful feature. The scanner-based (static) implementation suffers from the following gotchas:

1. The source code is everything

Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected *[file, lineno]*. The following will not work:

def test
  eval('lambda { x + y }')
end

test.source_location
# >> ["(eval)", 1]

test.to_source
# >> Sourcify::CannotParseEvalCodeError

The same applies to *Blah#to_proc* & *&:blah*:

klass = Class.new do
  def aa(&block); block ; end
  def bb; 1+2; end
end

klass.new.method(:bb).to_proc.to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError

klass.new.aa(&:bb).to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError
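A quick way to see what the scanner is up against is to check whether a proc's location points at a real file at all. The sketch below uses only plain Ruby. The helper scannable? is hypothetical, and it returns a boolean, whereas Sourcify raises the errors shown above.

```ruby
# Hypothetical helper (not Sourcify's API): reports whether a proc's
# source_location points at a real file that a static scanner could read.
def scannable?(block)
  file, _line = block.source_location
  !file.nil? && File.file?(file)
end

from_eval   = eval('lambda { 1 + 2 }')  # location is an "(eval...)" label, not a file
from_symbol = :upcase.to_proc           # native proc, no location at all

puts scannable?(from_eval)    # false
puts scannable?(from_symbol)  # false
```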

2. Multiple matching procs per line error

Sometimes, we may have multiple procs on a line. Sourcify can handle this as long as the subject proc has an arity that is different from the others:

# Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
b2.to_source
# >> proc { (1 + 2) }

# Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 }
b2.to_source
# >> raises Sourcify::MultipleMatchingProcsPerLineError

As observed, the above does not work when there are multiple procs having the same arity, on the same line. Furthermore, this bug under 1.8.* affects the accuracy of this approach.
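The two cases above can be reproduced with plain Ruby. Both procs on a line report the same file and line, so position alone cannot tell them apart, and arity is the only extra hint available. The helpers same_line? and distinguishable_by_arity? are hypothetical names used for this sketch and are not part of Sourcify.

```ruby
# Hypothetical helper: true when two procs start on the same file and line,
# which is the situation a line-based scanner cannot resolve by position.
def same_line?(a, b)
  a.source_location.first(2) == b.source_location.first(2)
end

# Hypothetical helper: true when arity alone tells the two procs apart.
def distinguishable_by_arity?(a, b)
  a.arity != b.arity
end

b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
c1 = lambda { 1+2 }; c2 = lambda { 2+3 }

puts same_line?(b1, b2)                 # true
puts distinguishable_by_arity?(b1, b2)  # true  (arity 1 vs 0)
puts distinguishable_by_arity?(c1, c2)  # false (both arity 0)
```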

To better narrow down the scanning, try:

  • passing in the :attached_to => … option

    x = lambda { proc { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:attached_to => :lambda)
    # >> "proc { proc { :blah } }"
    
  • passing in the :ignore_nested => … option

    x = lambda { lambda { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:ignore_nested => true)
    # >> "proc { lambda { :blah } }"
    
  • attaching a body matcher proc

    x, y = lambda { def secret; 1; end }, lambda { :blah }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source{|body| body =~ /^(.*\W|)def\W/ }
    # >> 'proc { def secret; 1; end }'
    

Please refer to the rdoc for more details.
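The body matcher in the last example is just a predicate over candidate body strings, so its pattern can be tried out on its own. In the sketch below, the constant DEF_MATCHER and the helper select_bodies are hypothetical, and the exact string Sourcify hands to the matcher is an assumption. Only the regexp itself comes from the example.

```ruby
# Pattern copied from the body matcher example: accepts a body in which
# `def` appears as a standalone word.
DEF_MATCHER = /^(.*\W|)def\W/

# Hypothetical helper: returns the candidate bodies accepted by the matcher.
def select_bodies(bodies, matcher = DEF_MATCHER)
  bodies.select { |body| body =~ matcher }
end

candidates = [" def secret; 1; end ", " :blah ", " redef x "]
p select_bodies(candidates)
# prints [" def secret; 1; end "]
```

If the matcher accepts more than one candidate, the ambiguity is still there, so the pattern should be as specific as the code allows.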

3. Occasional Racc::ParseError

Under the hood, sourcify relies on RubyParser to yield s-expression, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError when you have any code that is not compatible with 1.8.6.

Is it really working ??

Sourcify’s spec suite currently passes on the following rubies:

  • MRI-1.8.*, REE-1.8.7 (both ParseTree & static scanner modes)

  • JRuby-1.6.*, MRI-1.9.* (static scanner ONLY)

Besides its own spec suite, sourcify has also been tested to handle:

ObjectSpace.each_object(Proc) {|o| puts o.to_source }

For projects:

(TODO: the more the merrier)

Projects using it

Projects using sourcify include:

Additional Resources

Sourcify is heavily inspired by many ideas gathered from the ruby community:

The sad fact that Proc#to_source wouldn’t be available in the near future:

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.

Copyright © 2010 NgTzeYang. See LICENSE for details.