Xre

Xre is a Rust extension for finding all regex matches in a text. It is a simple extension used in one part of our codebase, where it makes matching substantially faster.

Why?

The main reason for this extension is that we have a unique use case: we need to find all captures, with their respective offsets, in a large text, and we need to do this for a huge number of regexes and texts. This is slow in Ruby because the regular scan returns the matched strings but not their offsets. This extension returns the captures and offsets in a single call for a list of regexes, avoiding the overhead of calling the regex engine multiple times.
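For comparison, here is a minimal pure-Ruby sketch of collecting captures together with their character offsets for a single regex. scan_with_offsets is a hypothetical helper, not part of xre; it relies on Regexp.last_match being set inside the scan block.

```ruby
# Hypothetical helper (not part of xre): returns every match of `regex` in
# `text` with its capture groups and character offsets.
def scan_with_offsets(text, regex)
  results = []
  text.scan(regex) do
    # Inside the block, Regexp.last_match holds the MatchData of the current match.
    match = Regexp.last_match
    results << {
      match: match[0],
      captures: match.captures,
      start: match.begin(0), # character offset of the first matched character
      stop: match.end(0)     # character offset just past the match
    }
  end
  results
end
```

With a list of regexes, a helper like this has to be called once per regex, which is the repeated work the paragraph above describes.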

Moreover, we also need the text surrounding each capture within some radius, which the regular scan method cannot provide without iterating over the whole text multiple times.
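As an illustration of what "surrounding text with some radius" means, here is a small pure-Ruby sketch. context_around is a hypothetical helper, not part of xre, and it assumes the radius is counted in characters on each side of the match.

```ruby
# Hypothetical helper (not part of xre): returns the match plus up to
# `radius` characters on each side, clamped to the bounds of the text.
# `start` and `stop` are character offsets; `stop` is exclusive.
def context_around(text, start, stop, radius)
  from = [start - radius, 0].max
  to = [stop + radius, text.length].min
  text[from...to]
end
```

For example, context_around("the quick brown fox", 4, 9, 3) returns "he quick br". On a multibyte string, each such character-indexed slice may need to walk the text from the beginning.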

🤓 A more technical reason why the Rust extension is faster: Ruby strings are indexed by character (O(n)), while Rust strings are indexed by byte (O(1)). By iterating over the characters carefully on the Rust side as we go through the text (see regex_list.rs#captures_with_context and utils.rs#find_char_index), we avoid the n separate O(n) operations that Ruby would have to perform. This reduces the algorithmic complexity from O(n^2) to O(n) for one regex, and from O(m * n^2) to O(m * n) for multiple regexes, where m is the number of regexes and n is the length of the text.
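The single-pass idea can be sketched as follows. This is written in Ruby for readability and is only an illustration: the real implementation lives in the Rust files named above and may differ. byte_to_char_offsets is a hypothetical helper that assumes the byte offsets are sorted in ascending order and fall on character boundaries.

```ruby
# Hypothetical helper (not part of xre): converts ascending byte offsets
# into character offsets with one left-to-right walk over the text,
# instead of recounting characters from the start for every offset.
def byte_to_char_offsets(text, sorted_byte_offsets)
  char_offsets = []
  next_offset = 0 # index of the next byte offset still waiting for an answer
  byte_pos = 0    # byte position of the current character
  char_pos = 0    # character position of the current character

  text.each_char do |char|
    while next_offset < sorted_byte_offsets.length &&
          sorted_byte_offsets[next_offset] <= byte_pos
      char_offsets << char_pos
      next_offset += 1
    end
    break if next_offset == sorted_byte_offsets.length

    byte_pos += char.bytesize
    char_pos += 1
  end

  # Resolve offsets that point just past the last character.
  # Offsets beyond the end of the text are dropped.
  while next_offset < sorted_byte_offsets.length &&
        sorted_byte_offsets[next_offset] <= byte_pos
    char_offsets << char_pos
    next_offset += 1
  end
  char_offsets
end
```

The naive alternative, text.byteslice(0, offset).length for every offset, rescans the prefix each time, which is where the quadratic cost comes from.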

Developing:

The simplest way to develop the gem locally without reinstalling the gem is to just:

  1. change the gem "xre" in the Gemfile to gem "xre", path: "xre"
  2. run bundle install

and you're good to go. Just change the code (then compile the Rust code if you changed it, more on that below), go into rails c and hack away.

Another simple way is to:

  1. change the gem code (then, once again, compile the Rust code if you changed it)
  2. go into rails c or the console of your choosing
  3. run require_relative "xre/lib/xre"

all done.

💡 Note: The rest of this README assumes that you are in the xre directory inside the clearscope project.

How to build the Rust code locally:

rake compile # or just rake

sometimes you might need to clean the build:

rake clean && rake compile

you might need to install the Rust toolchain for that:

curl https://sh.rustup.rs -sSf | sh

How to run tests:

Ruby tests:

bundle exec rake spec

Rust tests:

cargo test # you will need the Rust toolchain for that

Rust linting:

# you will need the Rust toolchain for that
cargo fmt # will format the code
cargo clippy # will suggest improvements, not only style ones

Publishing:

Note: For this to work, make sure you are signed in to rubygems.org as a clearscope organization member.

The codebase includes xre/Rakefile, which defines a gem:native task. This task compiles the extension for the x86_64-linux, x86_64-darwin, and arm64-darwin platforms and puts the compiled gems into the xre/pkg/ directory. From there, each gem is published with gem push, e.g. gem push xre-<version>-x86_64-linux.gem.

Note: You will need Docker installed on your machine for that.

so to publish the gem you should:

cd xre
rake gem:native
gem push pkg/xre-<version>-<platform>.gem