by
by is a library preloader for Ruby designed to speed up process startup. It uses a client/server approach, where the server loads the libraries and listens on a UNIX socket, and the client connects to that socket to run a process. For each client connection, the server forks a worker process, which uses the current directory, stdin, stdout, stderr, and environment of the client process. The worker process then processes the arguments provided by the client. The client process waits until the worker process returns an exit code and closes the socket, and uses exit code 0 (normal exit) if the worker process indicates success, or exit code 1 (error) if the worker process indicates an error.
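The fork-per-connection model described above can be illustrated with a small toy, assuming a much simpler protocol than by actually uses: the client sends one line of Ruby code, the worker evaluates it, and writes back a single status byte. The toy_by_roundtrip helper, the toy.sock file name, and the "0"/"1" status bytes are all hypothetical, and the sketch does not pass the client's directory, stdio, or environment to the worker the way by does.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper, not part of by: runs one request against a toy
# preloading server and returns the exit code a client would use
# (0 on success, 1 on error).
def toy_by_roundtrip(code)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'toy.sock')
    # A real preloader would require its libraries before listening.
    server = UNIXServer.new(path)

    server_pid = fork do
      conn = server.accept
      # Fork one worker per connection, as the overview describes.
      worker_pid = fork do
        src = conn.gets.to_s
        ok = begin
          eval(src)
          true
        rescue StandardError
          false
        end
        conn.write(ok ? '0' : '1')
        conn.close
        exit!(0) # skip at_exit handlers inherited from the parent
      end
      conn.close
      Process.wait(worker_pid)
      exit!(0)
    end
    server.close # the forked server process keeps its own copy

    client = UNIXSocket.new(path)
    client.puts(code)
    status = client.read # returns once the worker closes the socket
    client.close
    Process.wait(server_pid)
    status == '0' ? 0 : 1
  end
end
```

With this sketch, toy_by_roundtrip('1 + 1') should return 0, and toy_by_roundtrip("raise 'boom'") should return 1. It requires a platform that supports fork and UNIX sockets.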
Installation
gem install by
Source Code
Source code is available on GitHub at github.com/jeremyevans/by
Usage
To use by, you first start by-server, passing in libraries you would like to preload.
$ by-server sequel roda capybara
Then you can run ruby with the libraries preloaded using by:
$ by -e 'p [Sequel, Roda, Capybara]'
[Sequel, Roda, Capybara]
The advantage of using by is that the libraries are already loaded, so Ruby doesn’t have to find the libraries and parse the files in each library on process startup. Here’s a performance comparison:
$ /usr/bin/time ruby -e 'require "sequel"; require "roda"; require "capybara"'
1.67 real 0.93 user 0.66 sys
$ /usr/bin/time by -e 'require "sequel"; require "roda"; require "capybara"'
0.37 real 0.20 user 0.15 sys
The more libraries your program uses that you can preload in the server program, the greater the speedup this offers.
Speeding Things Up Even More By Avoiding Rubygems
Loading Rubygems is by far the slowest thing that Ruby does during process initialization:
$ /usr/bin/time ruby -e ''
0.25 real 0.11 user 0.14 sys
$ /usr/bin/time ruby --disable-gems -e ''
0.03 real 0.02 user 0.01 sys
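If /usr/bin/time is not available, a rough version of the same measurement can be taken from Ruby itself. This is only a sketch: the startup_seconds helper is hypothetical, it reports wall-clock time only, and the numbers will vary by machine and Ruby version.

```ruby
require 'rbconfig'

# Hypothetical helper: wall-clock seconds needed to run a command,
# taking the best of a few runs to reduce noise.
def startup_seconds(*cmd, runs: 3)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    system(*cmd, out: File::NULL, err: File::NULL) or raise "command failed: #{cmd.join(' ')}"
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end.min
end

# Compare an empty program with and without Rubygems loaded.
with_gems    = startup_seconds(RbConfig.ruby, '-e', '')
without_gems = startup_seconds(RbConfig.ruby, '--disable-gems', '-e', '')

printf("with rubygems:    %.3fs\n", with_gems)
printf("without rubygems: %.3fs\n", without_gems)
```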
You can speed up the by program by making it not require rubygems, since it only needs the socket standard library. The only issue with that is that by is distributed as a gem. There are a few workarounds.
- Create a shell alias. How you create the alias will depend on the shell you are using, but here’s some Ruby code that will output an alias command that will work for most shells:
    require 'rbconfig'
    # Path to the by executable inside the installed gem
    by = Gem.activate_bin_path("by", "by")
    puts "alias by='#{RbConfig.ruby} --disable-gems #{by}'"
  Note that one issue with using a shell alias is that it only works when loaded and used by the shell; it won’t work if executed by another program.
- Copy the by program and modify the shebang line to use the path to your ruby binary and --disable-gems. You can get the path to the by program with the following Ruby code:
    puts Gem.activate_bin_path("by", "by")
  You would copy that file to somewhere in your $PATH before where the rubygems wrapper is installed, and then modify the shebang.
- Add your own shell wrapper program that calls by. Here’s some example Ruby code that may work, though whether it does depends on your shell:
    require 'rbconfig'
    by = Gem.activate_bin_path("by", "by")
    # Write a wrapper named "by" in the current directory and make it executable
    File.binwrite("by", "#!/bin/sh\nexec #{RbConfig.ruby} --disable-gems #{by} \"$@\"\n")
    File.chmod(0755, "by")
With each of these approaches, you can get much faster program execution:
$ /usr/bin/time ./by -e 'require "sequel"; require "roda"; require "capybara"'
0.08 real 0.05 user 0.03 sys
As you can see, by avoiding Rubygems, using by to require the three libraries runs about three times faster (0.08 seconds) than Ruby itself takes to start with Rubygems enabled (0.25 seconds).
With each of these approaches, you need to update the alias/wrapper any time you update the by gem when the by program itself has changed. However, the by program itself is quite small and simple and unlikely to change.
Argument Handling
by-server treats all arguments provided on the command line as arguments to Kernel#require.
by passes all arguments to the worker process over the UNIX socket.
The worker process handles arguments passed by the client in the following way:
- If first argument is m or matches /\.rb:\d+\z/, uses the m gem to run a single minitest test by line number, waiting until after the test is run so that it can return the correct exit code.
- If first argument is irb, starts an IRB shell with remaining arguments in ARGV.
- If first argument is -e, evaluates second argument as Ruby code, with remaining arguments in ARGV.
- If no arguments are given, evaluates Ruby code provided on stdin.
- Otherwise, treats first argument as a file name, expands the file path, and then requires that. If Minitest is loaded and set to autorun, waits until after Minitest runs tests, so it can return the correct exit code. If Minitest is not loaded or not set to autorun, exits after the file is required.
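The rules above amount to a dispatch on the first argument. The following sketch only classifies an argument list into one of the five cases; the classify_args method and the symbols it returns are hypothetical names chosen for illustration, and this is not the code in lib/by/worker.rb.

```ruby
# Hypothetical classifier for the documented argument rules.
# Returns a symbol naming which rule would apply.
def classify_args(args)
  first = args.first
  if first.nil?
    :stdin          # no arguments: read Ruby code from stdin
  elsif first == 'm' || first.match?(/\.rb:\d+\z/)
    :m              # run a single minitest test by line number
  elsif first == 'irb'
    :irb            # start an IRB shell
  elsif first == '-e'
    :eval           # evaluate the second argument as Ruby code
  else
    :require_file   # treat the first argument as a file to require
  end
end

p classify_args(['test/model_test.rb:42'])
p classify_args(['-e', 'p 1'])
p classify_args(['script.rb', 'arg'])
```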
Restarting the Server
If by-server is already running, running by-server will shut down the existing server and start a new server with the arguments it is given.
Stopping the Server
Running by-server stop will stop an existing server without starting a new server. If no server is running, by-server stop will exit without doing anything.
You can also send a TERM signal to the by-server process to shut the server down gracefully. Be aware that by default, by-server daemonizes, so the pid of the started by-server will not be the pid by-server uses to run. For that reason, it is recommended to use by-server stop to stop the server.
Running Multiple Servers
Manually
You can run multiple by-server processes concurrently by making sure they each use a separate UNIX socket, which you can configure with the BY_SOCKET environment variable:
$ BY_SOCKET=~/.by_sequel_socket by-server sequel
$ BY_SOCKET=~/.by_roda_socket by-server roda
$ BY_SOCKET=~/.by_sequel_socket by -e 'p [defined?(Sequel), defined?(Roda)]'
["constant", nil]
$ BY_SOCKET=~/.by_roda_socket by -e 'p [defined?(Sequel), defined?(Roda)]'
[nil, "constant"]
Using by-session
In many cases, it can be helpful to have a separate server process for each application directory. by-session exists to make this easier. by-session will call by-server with the arguments it is given, using a socket in the current directory by default, and then open a new shell. When the shell exits, by-session will stop the by-server it spawned.
If the directory in which you are running by-session has a Gemfile, you could add a file named .by-session-setup.rb in your home directory, which contains:
require 'bundler/setup'
Bundler.require(:default)
When you start a by-session shell for the directory using the Gemfile, you can use:
$ by-session ~/.by-session-setup
This will load all gems in the Gemfile into the by-server process. If you are doing this, you must be careful to only run this in a directory that you trust.
If you don’t want to specify the ~/.by-session-setup argument every time you start by-session, you can use the BY_SERVER_AUTO_REQUIRE environment variable.
Environment Variables
BY_SOCKET: The path to the UNIX socket to listen on (by-server) or connect to (by).
DEBUG: If set to log, logs $LOADED_FEATURES to stdout after requiring libraries (by-server) or before worker process shutdown (by).
by-server-Specific Environment Variables
BY_SERVER_AUTO_REQUIRE: Whitespace separated list of libraries for by-server to require, before it requires command line arguments.
BY_SERVER_NO_DAEMON: Do not daemonize if set.
BY_SERVER_DAEMON_NO_CHDIR: Do not change directory to / when daemonizing if set.
BY_SERVER_DAEMON_NO_REDIR_STDIO: Do not redirect stdio to /dev/null when daemonizing if set.
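One way to read these variables together is as a startup plan for the server. The sketch below is illustrative only: server_startup_plan is a hypothetical helper, and mapping the two daemon variables onto the nochdir and noclose arguments of Ruby's Process.daemon is an assumption about how such options are commonly implemented, not a statement about by-server's code. It treats a variable as "set" whenever it is present in the environment.

```ruby
# Hypothetical helper: turns an environment and command line into a plan.
# `env` can be ENV or a plain Hash, since both respond to fetch and key?.
def server_startup_plan(env, argv)
  {
    # Auto-required libraries come first, then command line arguments.
    requires:  env.fetch('BY_SERVER_AUTO_REQUIRE', '').split + argv,
    daemonize: !env.key?('BY_SERVER_NO_DAEMON'),
    # Assumed mapping onto Process.daemon(nochdir, noclose).
    nochdir:   env.key?('BY_SERVER_DAEMON_NO_CHDIR'),
    noclose:   env.key?('BY_SERVER_DAEMON_NO_REDIR_STDIO')
  }
end

plan = server_startup_plan(
  { 'BY_SERVER_AUTO_REQUIRE' => "bundler/setup  json" },
  ['sequel']
)
p plan[:requires]
```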
by-server Signals
QUIT: Close the socket (this is what by-server stop uses).
TERM: Delete the socket path and then close the socket.
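Read literally, the two signals differ only in whether the socket path is deleted before the socket is closed. The sketch below demonstrates that difference on a throwaway socket. Both helpers are hypothetical and are not by-server's signal handlers; whether by-server removes the path at some other point during a QUIT shutdown is not covered here.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper: applies the documented effect of each signal
# to a listening UNIX socket.
def shut_down_listener(server, path, signal)
  case signal
  when 'TERM'
    File.delete(path) if File.socket?(path) # delete the socket path first
    server.close
  when 'QUIT'
    server.close                            # closing does not unlink the path
  else
    raise ArgumentError, "unsupported signal: #{signal}"
  end
end

# Returns whether the socket path still exists after handling the signal.
def socket_path_left_after(signal)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.sock')
    server = UNIXServer.new(path)
    shut_down_listener(server, path, signal)
    File.exist?(path)
  end
end
```

In this sketch, socket_path_left_after('QUIT') should return true and socket_path_left_after('TERM') should return false, because closing a UNIXServer in Ruby does not remove its path from the file system.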
Internals
There are two classes, By::Server and By::Worker. By::Server listens on the UNIX socket, forking a worker process for each connection. By::Worker runs in each worker process and handles receiving data from the by command line program.
The by command line program is self-contained. There is no Ruby class for the behavior, to make sure startup is as fast as possible. by-session is also self-contained.
Customization
For custom handling of arguments, you can require by/server and use the By::Server.with_argument_handler method. For example, if you wanted to add support for an initial -I option to modify the load path, and then use the standard argument handling:
require 'by/server'
By::Server.with_argument_handler do |args|
  if args[0] == '-I'
    args.shift
    $LOAD_PATH.unshift(args.shift)
  end
  super(args)
end.new.run
Note that if you do this, you are responsible for making sure to correctly communicate with the client socket. Otherwise, it’s possible the client socket may hang waiting on a response. Please review the default argument handling in lib/by/worker.rb before writing your own argument handler.
Security
As with any program that forks without executing, the memory layout is shared by the client and the server program, which can lead to Blind Return Oriented Programming (BROP) attacks. You should avoid using by to run a program that deals with any untrusted input. by makes a deliberate choice to trade security to make process startup as fast as possible.
The server socket is set to mode 0600, so it is only readable and writable by the same user.
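If you want to confirm the permissions on a socket yourself, a check along these lines may help. The owner_only_socket? and demo_owner_only_socket helpers are hypothetical, and the demo sets the mode with File.chmod on a throwaway socket purely for illustration; how by-server applies the mode is not shown here.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical check: true if the path is a UNIX socket whose permission
# bits are exactly 0600 (read/write for the owner only).
def owner_only_socket?(path)
  stat = File.stat(path)
  stat.socket? && (stat.mode & 0o777) == 0o600
end

# Creates a throwaway socket, restricts it to the owner, and checks it.
def demo_owner_only_socket
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.sock')
    server = UNIXServer.new(path)
    File.chmod(0o600, path)
    owner_only_socket?(path)
  ensure
    server&.close
  end
end
```

To check a running server, you could call owner_only_socket? with the path you passed in BY_SOCKET.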
Name
The name by was chosen because it is ruby with the ru preloaded.
Similar Projects
- Spring: github.com/rails/spring
- Spinoff: github.com/bernd/spinoff
License
MIT
Author
Jeremy Evans