Xre
This is a Rust extension for finding all the matches in a text. It's a simple extension that is used in one part of our codebase, and it makes that part substantially faster.
Why?
The main reason for this extension is that we have a unique use case where we need to find all captures
with their respective offsets in a large text, and we need to do this for a huge number of regexes and texts.
This is a slow operation in Ruby since the regular scan does not provide the offsets, only the captures.
This extension provides the captures and offsets in a single call for a
list of regexes to avoid the overhead of calling the regex engine multiple times.
Moreover, we also require the text surrounding each capture within some radius,
which is also not possible with the regular scan method without iterating over
the whole text multiple times.
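For illustration, here is a rough sketch of what the plain-Ruby baseline for this use case might look like. The helper name scan_with_context, its radius: parameter, and the shape of the returned hashes are assumptions made up for this example; they are not xre's API. Note that every match.offset and text[...] call works in character positions, which is where the repeated passes over the text come from.

```ruby
# Hypothetical helper: NOT part of xre, just a plain-Ruby baseline.
# Returns one hash per match with the captures, the character offsets of the
# whole match, and the surrounding text within `radius` characters.
def scan_with_context(text, regexes, radius: 10)
  regexes.flat_map do |regex|
    results = []
    text.scan(regex) do
      match = Regexp.last_match         # MatchData for the current match
      start, finish = match.offset(0)   # character offsets of the whole match
      context_start = [start - radius, 0].max
      context_end = [finish + radius, text.length].min
      results << {
        captures: match.captures,
        offset: [start, finish],
        context: text[context_start...context_end]
      }
    end
    results
  end
end

matches = scan_with_context("héllo wörld, hello world", [/w(\S)rld/], radius: 3)
matches.first
# => {captures: ["ö"], offset: [6, 11], context: "lo wörld, h"}
```

This runs the regex engine once per regex and re-derives character positions for every match, which is the overhead the extension is meant to avoid.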
🤓 Another, slightly more technical reason why the Rust extension is faster is that Ruby strings are indexed in characters (O(n)), but Rust strings are indexed in bytes (O(1)). With a careful iteration over the characters on the Rust side (see regex_list.rs#captures_with_context and utils.rs#find_char_index) as we go through the text, we can avoid the multiple (n) O(n) operations that Ruby would have to do, reducing the algorithmic complexity from O(n^2) to O(n) for one regex, and from O(m * n^2) to O(m * n) for multiple regexes (where m is the number of regexes and n is the length of the text).
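The single-pass idea can be sketched as follows. This is written in Ruby purely for illustration and is not the code from utils.rs#find_char_index; the helper name char_indexes_for is hypothetical, and it assumes the byte offsets arrive in ascending order and fall on character boundaries.

```ruby
# Hypothetical illustration, NOT the code from utils.rs#find_char_index.
# Converts ascending byte offsets into character indexes in a single pass:
# each step only counts the characters between the previous offset and the
# current one, so the text is walked at most once overall.
def char_indexes_for(text, byte_offsets)
  char_index = 0
  previous_byte = 0
  byte_offsets.map do |byte_offset|
    # Assumes byte_offset >= previous_byte and sits on a character boundary.
    char_index += text.byteslice(previous_byte, byte_offset - previous_byte).length
    previous_byte = byte_offset
    char_index
  end
end

char_indexes_for("héllo wörld", [0, 3, 7, 13])
# => [0, 2, 6, 11]
```

Resolving each offset independently from the start of the text would instead cost one O(n) walk per offset, which is where the quadratic behaviour comes from.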
Developing:
The simplest way to develop the gem locally without reinstalling it is to:
- change gem "xre" in the Gemfile to gem "xre", path: "xre"
- run bundle install
and you're good to go. Just change the code (then compile the Rust code if you changed it, more on that below), go into rails c and hack away.
Another simple way is to:
- change the gem code (then, once again, compile the Rust code if you changed it)
- go into rails c or the console of your choosing
- run require_relative "xre/lib/xre"
All done.
💡 Note: The following part of the README assumes that you're in the xre directory inside the clearscope project.
How to build the Rust code locally:
rake compile # or just rake
Sometimes you might need to clean the build first:
rake clean && rake compile
You might need to install the Rust toolchain for that:
curl https://sh.rustup.rs -sSf | sh
How to run tests:
Ruby tests:
bundle exec rake spec
Rust tests:
cargo test # you will need rust toolchain for that
Rust linting:
# you will need rust toolchain for that
cargo fmt # will format the code
cargo clippy # will suggest improvements, not only style ones
Publishing:
Note: For this to work, make sure you are signed in to rubygems.org as a clearscope organization member.
The codebase includes xre/Rakefile, which in turn defines a gem:native task that compiles the extension for the x86_64-linux, x86_64-darwin, and arm64-darwin platforms and puts the compiled gems into the xre/pkg/ directory, from where one should
gem push xre-<version>-x86_64-linux.gem
to publish the gem.
Note: You will need Docker installed on your machine for that.
So, to publish the gem you should:
cd xre
rake gem:native
gem push pkg/xre-<version>-<platform>.gem