by

by is a library preloader for Ruby designed to speed up process startup. It uses a client/server approach: the server loads the libraries and listens on a UNIX socket, and the client connects to that socket to run a process. For each client connection, the server forks a worker process, which uses the current directory, stdin, stdout, stderr, and environment of the client process. The worker process then handles the arguments provided by the client. The client process waits until the worker process returns an exit code and closes the socket. The client then exits with status 0 (normal exit) if the worker process indicated success, or with status 1 (error) if it indicated an error.
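The architecture described above can be illustrated with a heavily simplified sketch that uses only Ruby's standard socket library. It assumes a Unix platform with fork, and it is not by's implementation or wire protocol. Where by uses the client's stdin, stdout, stderr, directory, and environment, this sketch passes a single output IO to the worker with UNIXSocket#send_io. It also sends the code as raw bytes and replies with a one-byte status. The method names handle_connection, serve_once, run_client, and demo are hypothetical.

```ruby
require 'socket'
require 'tmpdir'
require 'set' # stand-in for the expensive libraries a real server would preload

# Worker: runs in a process forked from the server, so everything the server
# already required (here, Set) is available without being loaded again.
def handle_connection(conn)
  out = conn.recv_io(IO, 'w')   # the client's output IO, received as a file descriptor
  code = conn.read              # the rest of the stream is Ruby code to evaluate
  $stdout = out                 # output goes straight to the client's IO
  eval(code, TOPLEVEL_BINDING)
  status = '0'
rescue StandardError, ScriptError
  status = '1'
ensure
  $stdout.flush
  conn.write(status || '1')     # one status byte: '0' success, '1' error
  conn.close
end

# Server: accept a single connection and fork a worker for it (a real server loops).
def serve_once(server)
  conn = server.accept
  pid = fork do
    server.close
    handle_connection(conn)
    exit!(0)                    # skip at_exit hooks and stdio buffers inherited from the parent
  end
  conn.close                    # the worker owns the connection now
  Process.wait(pid)
end

# Client: pass an output IO and some code, then map the status byte to an exit code.
def run_client(path, code, out)
  UNIXSocket.open(path) do |sock|
    sock.send_io(out)
    sock.write(code)
    sock.close_write            # EOF tells the worker the code is complete
    sock.read == '0' ? 0 : 1
  end
end

# Runs one round trip and returns [exit_code, output].
def demo(code)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.sock')
    server = UNIXServer.new(path)
    server_pid = fork do
      serve_once(server)
      exit!(0)
    end
    server.close
    r, w = IO.pipe
    status = run_client(path, code, w)
    w.close
    Process.wait(server_pid)
    [status, r.read]
  end
end

p demo('puts Set.new([1, 2, 2]).size')  # => [0, "2\n"]
```

The worker is forked after Set has been loaded, so the evaluated code can use Set without requiring it. This is the effect by relies on for the libraries given to by-server.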

Installation

gem install by

Source Code

Source code is available on GitHub at github.com/jeremyevans/by

Usage

To use by, first start by-server, passing in the libraries you would like to preload.

$ by-server sequel roda capybara

Then you can run ruby with the libraries preloaded using by:

$ by -e 'p [Sequel, Roda, Capybara]'
[Sequel, Roda, Capybara]

The advantage of using by is that the libraries are already loaded, so Ruby doesn’t have to find the libraries and parse the files in each library on process startup. Here’s a performance comparison:

$ /usr/bin/time ruby -e 'require "sequel"; require "roda"; require "capybara"'
        1.67 real         0.93 user         0.66 sys

$ /usr/bin/time   by -e 'require "sequel"; require "roda"; require "capybara"'
        0.37 real         0.20 user         0.15 sys

The more libraries your program uses that you can preload in the server program, the greater the speedup this offers.

Speeding Things Up Even More By Avoiding Rubygems

Loading Rubygems is by far the slowest thing that Ruby does during process initialization:

$ /usr/bin/time ruby -e ''
        0.25 real         0.11 user         0.14 sys

$ /usr/bin/time ruby --disable-gems -e ''
        0.03 real         0.02 user         0.01 sys

You can speed up by by making it not load Rubygems, since it only needs the socket standard library. The only issue is that by is distributed as a gem. There are a few workarounds.

  1. Create a shell alias. How you create the alias will depend on the shell you are using, but here’s some Ruby code that will output an alias command that will work for most shells:

    require 'rbconfig'
    by = Gem.activate_bin_path("by", "by")
    puts "alias by='#{RbConfig.ruby} --disable-gems #{by}'"
    

    Note that a shell alias only works when by is run from a shell that has loaded the alias; it won’t work if by is executed by another program.

  2. Copy the by program and modify the shebang line to use the path to your ruby binary and --disable-gems. You can get the path to the by program with the following Ruby code.

    puts Gem.activate_bin_path("by", "by")
    

    You would copy that file to a directory in your $PATH that comes before the directory where the Rubygems wrapper is installed, and then modify the shebang.

  3. Add your own shell wrapper program that calls by. Here’s some example Ruby code that may work, though whether it does depends on your shell.

    require 'rbconfig'
    by = Gem.activate_bin_path("by", "by")
    File.binwrite("by", "#!/bin/sh\nexec #{RbConfig.ruby} --disable-gems #{by} \"$@\"\n")
    File.chmod(0755, "by")
    

With each of these approaches, you can get much faster program execution:

$ /usr/bin/time ./by -e 'require "sequel"; require "roda"; require "capybara"'
        0.08 real         0.05 user         0.03 sys

As you can see, when Rubygems is avoided, using by to require the three libraries runs about three times faster than plain Ruby starts up with Rubygems loaded (0.08 vs. 0.25 seconds real time).

With each of these approaches, you need to update the alias or wrapper whenever an update to the by gem changes the by program itself. However, the by program is small and simple, and unlikely to change.
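Workaround 2 above can be scripted. That workaround copies the by program and changes its shebang. The sketch below rests on a few assumptions. The helper name copy_with_disable_gems and the ~/bin/by destination in the usage comment are only examples. The destination directory must come before the Rubygems wrapper's directory in $PATH. Some systems limit shebang line length, so a very long ruby path may not work.

```ruby
require 'rbconfig'

# Copy a Ruby program and replace its shebang so it runs the current ruby
# with --disable-gems. Returns the destination path.
def copy_with_disable_gems(src, dest)
  lines = File.readlines(src)
  lines.shift if lines.first&.start_with?('#!')   # drop the existing shebang, if any
  File.write(dest, "#!#{RbConfig.ruby} --disable-gems\n" + lines.join)
  File.chmod(0755, dest)
  dest
end

# Possible use (requires the by gem; ~/bin/by is only an example location):
#   copy_with_disable_gems(Gem.activate_bin_path("by", "by"), File.expand_path("~/bin/by"))
```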

Argument Handling

by-server treats all arguments provided on the command line as arguments to Kernel#require.

by passes all arguments to the worker process over the UNIX socket.
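This README does not describe by's wire format. The sketch below is a hypothetical illustration of one way to pass an argument array over a UNIX socket without ambiguity, since arguments may contain spaces or newlines, or be empty. It uses length-prefixed framing. write_args and read_args are made-up names and are not part of by.

```ruby
require 'socket'

# Hypothetical framing, not by's format: a 32-bit count, then each argument
# as a 32-bit byte length followed by its bytes.
def write_args(io, args)
  io.write([args.size].pack('N'))
  args.each do |arg|
    bytes = arg.b
    io.write([bytes.bytesize].pack('N'), bytes)
  end
end

def read_args(io)
  count = io.read(4).unpack1('N')
  Array.new(count) do
    len = io.read(4).unpack1('N')
    io.read(len).force_encoding(Encoding::UTF_8)
  end
end

client, worker = UNIXSocket.pair
write_args(client, ['-e', 'p ARGV', 'a b', ''])
p read_args(worker)  # => ["-e", "p ARGV", "a b", ""]
```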

The worker process handles arguments passed by the client in the following way:

  • If the first argument is m or matches /\.rb:\d+\z/, uses the m gem to run a single minitest test by line number, waiting until after the test is run so that it can return the correct exit code.

  • If the first argument is irb, starts an IRB shell with the remaining arguments in ARGV.

  • If the first argument is -e, evaluates the second argument as Ruby code, with the remaining arguments in ARGV.

  • If no arguments are given, evaluates Ruby code provided on stdin.

  • Otherwise, treats the first argument as a file name, expands the file path, and then requires that file. If Minitest is loaded and set to autorun, waits until after Minitest runs the tests, so it can return the correct exit code. If Minitest is not loaded or not set to autorun, exits after the file is required.
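The dispatch rules above can be summarized as a small classifier. This is a hypothetical sketch, not By::Worker's code. It only decides which case applies. It checks for an empty argument list first, because every other rule inspects the first argument. The method name and the returned symbols are invented for illustration.

```ruby
# Classify client arguments according to the rules listed above.
def classify_args(args)
  first = args.first
  if first.nil?
    [:eval_stdin]                                   # no arguments: read code from stdin
  elsif first == 'm' || first.match?(/\.rb:\d+\z/)
    [:run_minitest_by_line, args]                   # handed to the m gem
  elsif first == 'irb'
    [:irb, args.drop(1)]                            # remaining args become ARGV
  elsif first == '-e'
    [:eval, args[1], args.drop(2)]                  # code, then ARGV
  else
    [:require_file, File.expand_path(first)]        # expanded path, then required
  end
end

p classify_args(%w[-e p(ARGV) x y])       # => [:eval, "p(ARGV)", ["x", "y"]]
p classify_args(['test/foo_test.rb:12'])  # => [:run_minitest_by_line, ["test/foo_test.rb:12"]]
p classify_args([])                       # => [:eval_stdin]
```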

Restarting the Server

If by-server is already running, running by-server will shut down the existing server and start a new server with the arguments it is given.

Stopping the Server

Running by-server stop will stop an existing server without starting a new server. If no server is running, by-server stop will exit without doing anything.

You can also send a TERM signal to the by-server process to shut the server down gracefully. Be aware that by-server daemonizes by default, so the pid of the by-server process you start will not be the pid of the process that keeps running as the server. For that reason, it is recommended to use by-server stop to stop the server.
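The pid mismatch comes from how daemonizing works. The following plain Ruby illustration uses Process.daemon, although this README does not say which mechanism by-server uses. It shows that the process you start exits while a different process keeps running. It assumes a Unix platform with fork, and daemonized_pids is a hypothetical name.

```ruby
# Returns [pid of the process we started, pid of the process left running].
def daemonized_pids
  r, w = IO.pipe
  started = fork do
    r.close
    Process.daemon(true, true)  # (nochdir, noclose): keep cwd and stdio; the pipe stays open either way
    w.puts Process.pid          # report the pid that actually keeps running
    w.close
    exit!(0)
  end
  w.close
  running = r.gets.to_i
  r.close
  Process.wait(started)         # the started process exits while daemonizing
  [started, running]
end

started, running = daemonized_pids
puts "started pid #{started}, running pid #{running}"
```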

Running Multiple Servers

Manually

You can run multiple by-server processes concurrently by making sure they each use a separate UNIX socket, which you can configure with the BY_SOCKET environment variable:

$ BY_SOCKET=~/.by_sequel_socket by-server sequel
$ BY_SOCKET=~/.by_roda_socket by-server roda
$ BY_SOCKET=~/.by_sequel_socket by -e 'p [defined?(Sequel), defined?(Roda)]'
["constant", nil]
$ BY_SOCKET=~/.by_roda_socket by -e 'p [defined?(Sequel), defined?(Roda)]'
[nil, "constant"]

Using by-session

In many cases, it can be helpful to have a separate server process for each application directory. by-session exists to make this easier. by-session will call by-server with the arguments it is given, using a socket in the current directory by default, and then open a new shell. When the shell exits, by-session will stop the by-server it spawned.
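The start, shell, stop lifecycle described above can be sketched with Kernel#system. This is a hypothetical helper and not by-session's implementation. The method name with_server_session and the socket file name in the usage comment are made up. Because the stop command is in an ensure clause, it runs however the shell exits, including when an exception such as Interrupt is raised.

```ruby
# Start a server, run a shell, and always stop the server afterwards.
# Returns the shell's system result (true, false, or nil).
def with_server_session(start_cmd, stop_cmd, shell_cmd, env: {})
  system(env, *start_cmd, exception: true)   # raise if the server fails to start
  begin
    system(env, *shell_cmd)
  ensure
    system(env, *stop_cmd)                   # always attempt to stop the server
  end
end

# Possible use with by (assumes by is installed; the socket file name is made up):
#   with_server_session(%w[by-server roda], %w[by-server stop], [ENV.fetch('SHELL', '/bin/sh')],
#                       env: { 'BY_SOCKET' => File.join(Dir.pwd, '.by_socket') })
```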

If the directory in which you are running by-session has a Gemfile, you could add a file named .by-session-setup.rb in your home directory, which contains:

require 'bundler/setup'
Bundler.require(:default)

When you want to start a by-session shell for a directory that uses a Gemfile, you can use:

$ by-session ~/.by-session-setup

This will load all gems in the Gemfile into the by-server process. If you are doing this, you must be careful to only run this in a directory that you trust.

If you don’t want to specify the ~/.by-session-setup argument every time you start by-session, you can use the BY_SERVER_AUTO_REQUIRE environment variable.

Environment Variables

BY_SOCKET

The path to the UNIX socket to listen on (by-server) or connect to (by).

DEBUG

If set to log, logs $LOADED_FEATURES to stdout after requiring libraries (by-server) or before worker process shutdown (by).

by-server-Specific Environment Variables

BY_SERVER_AUTO_REQUIRE

Whitespace-separated list of libraries for by-server to require before it requires the command line arguments.
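The ordering described here can be expressed as a one-line helper. libraries_to_require is a hypothetical name and is not part of by-server. It splits the variable on whitespace and appends the command line arguments.

```ruby
# Libraries from BY_SERVER_AUTO_REQUIRE come first, then command line arguments.
def libraries_to_require(env, argv)
  env.fetch('BY_SERVER_AUTO_REQUIRE', '').split + argv
end

p libraries_to_require({ 'BY_SERVER_AUTO_REQUIRE' => "bundler/setup  set\n" }, ['roda'])
# => ["bundler/setup", "set", "roda"]

# A server following this ordering could then run:
#   libraries_to_require(ENV, ARGV).each { |lib| require lib }
```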

BY_SERVER_NO_DAEMON

Do not daemonize if set.

BY_SERVER_DAEMON_NO_CHDIR

Do not change directory to / when daemonizing if set.

BY_SERVER_DAEMON_NO_REDIR_STDIO

Do not redirect stdio to /dev/null when daemonizing if set.

by-server Signals

QUIT

Close the socket (this is what by-server stop uses).

TERM

Delete the socket path and then close the socket.

Internals

There are two classes, By::Server and By::Worker. By::Server listens on the UNIX socket and forks a worker process for each connection. By::Worker runs in each worker process and handles the data received from the by command line program.

The by command line program is self-contained, with no Ruby class implementing its behavior, to keep startup as fast as possible. by-session is also self-contained.

Customization

For custom handling of arguments, you can require by/server and use the By::Server.with_argument_handler method. For example, if you wanted to add support for an initial -I option to modify the load path, and then use the standard argument handling:

require 'by/server'

By::Server.with_argument_handler do |args|
  if args[0] == '-I'
    args.shift
    $LOAD_PATH.unshift(args.shift)
  end
  super(args)
end.new.run

Note that if you do this, you are responsible for making sure to correctly communicate with the client socket. Otherwise, it’s possible the client socket may hang waiting on a response. Please review the default argument handling in lib/by/worker.rb before writing your own argument handler.

Security

As with any program that forks without calling exec, the program run for the client shares its memory layout with the server program, which can lead to Blind Return Oriented Programming (BROP) attacks. You should avoid using by to run a program that deals with any untrusted input. by deliberately trades security for the fastest possible process startup.

The server socket is set to mode 0600, so it is only readable and writable by the same user.
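To audit an existing socket, or to build a similar service, one way to create a socket with mode 0600 and check it in Ruby is sketched below. This is not by-server's code, and owner_only_socket and socket_mode are hypothetical helpers. Without the umask change, the socket would briefly exist with the default permissions. Temporarily tightening the umask closes that window.

```ruby
require 'socket'
require 'tmpdir'

# Create a UNIX socket that only its owner can read and write.
def owner_only_socket(path)
  old_umask = File.umask(0o177)   # socket file is created without group/other bits
  begin
    server = UNIXServer.new(path)
  ensure
    File.umask(old_umask)
  end
  File.chmod(0o600, path)         # make the final mode explicit
  server
end

# Permission bits of a socket file, e.g. the path in BY_SOCKET.
def socket_mode(path)
  File.stat(path).mode & 0o777
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'by.sock')
  server = owner_only_socket(path)
  puts format('%o', socket_mode(path))   # prints 600
  server.close
end
```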

Name

The name by was chosen because it is ruby with the ru preloaded.

Similar Projects

License

MIT

Author

Jeremy Evans