SimpleSearch - Simple vector space search library

What is SimpleSearch?


SimpleSearch is a simple vector space text search engine.
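To make the term "vector space search" concrete, here is a minimal sketch of the general technique: each document and the query become term-frequency vectors, and documents are ranked by the cosine of the angle between their vector and the query's. This is not SimpleSearch's actual code. The names tokenize, term_vector, cosine and rank are hypothetical, and the sketch assumes a modern Ruby (2.4 or later) rather than Ruby 1.8.

```ruby
# Minimal vector space search sketch; NOT SimpleSearch's actual code.
# All method names below are hypothetical and exist only for illustration.

# Split text into lowercase alphanumeric word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build a sparse term-frequency vector: word => number of occurrences.
def term_vector(text)
  tokenize(text).each_with_object(Hash.new(0)) { |word, vec| vec[word] += 1 }
end

# Cosine similarity between two sparse vectors (0.0 if either is empty).
def cosine(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm_a = Math.sqrt(a.values.sum { |c| c * c })
  norm_b = Math.sqrt(b.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Rank documents (name => text) against a query, best match first,
# dropping documents that share no terms with the query.
def rank(documents, query)
  query_vec = term_vector(query)
  documents
    .map { |name, text| [name, cosine(term_vector(text), query_vec)] }
    .select { |_, score| score > 0 }
    .sort_by { |_, score| -score }
end

docs = {
  "a.html" => "markup markup language",
  "b.html" => "ruby language",
  "c.html" => "unrelated text"
}
rank(docs, "markup language").each do |name, score|
  puts format("%.3f %s", score, name)
end
```

Documents that repeat the query terms more often, relative to their length, score closer to 1.0. Real engines usually add term weighting and a persistent index on top of this idea.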

Installation


Prerequisites

* Ruby 1.8 (http://www.ruby-lang.org/)

Optional

* RubyGems (http://rubygems.rubyforge.org)

Installing SimpleSearch


RubyGems (rubygems.rubyforge.org):

gem install SimpleSearch

…or…

.tar.gz installation:

ruby setup.rb #not yet available

Using SimpleSearch


SimpleSearch comes with a command line program that was primarily written as an example of how to use the API but might actually be useful.

To run the command line program, simply type:

$ search-simple --help

An example:

$ search-simple --cache=/tmp/mycache --dir=/usr/local/lib/ruby/gems/1.8/doc --extensions=html markup

This will cause search-simple to (re)index all of the files with a .html extension in your RubyGems rdoc directory and then search them for the words “markup” and “html”. The search indices will be stored in /tmp/mycache.
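The file selection described above (every file under a directory with a given extension) can be sketched with Ruby's standard library alone. The helper files_with_extensions is hypothetical and is not part of SimpleSearch or of search-simple. It only illustrates one way such a list of files could be gathered before indexing.

```ruby
# Hypothetical helper, not part of SimpleSearch: list the regular files under
# +dir+ (recursively) whose extension is in +extensions+. Extensions are given
# without the leading dot and are compared case-insensitively.
def files_with_extensions(dir, extensions)
  wanted = extensions.map { |ext| ".#{ext.downcase}" }
  Dir.glob(File.join(dir, "**", "*"))
     .select { |path| File.file?(path) && wanted.include?(File.extname(path).downcase) }
     .sort
end

# Example: print every .html file below the current directory.
puts files_with_extensions(".", ["html"])
```

The resulting paths could then be read and added to a Search::Simple::Contents collection, as in the API example in this README.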

At the heart of SimpleSearch is, of course, an API that can be embedded in other programs. The code of SimpleSearch was originally created by Dave Thomas as a search mechanism for his RubLog (rubyforge.org/projects/rublog) weblogging package. The API can be used as follows:

  require 'search/simple'

  contents = Search::Simple::Contents.new

  # silly example: index every file below the current directory
  Dir['**/*'].each do |file_name|
    next unless File.file?(file_name)   # skip directories
    File.open(file_name) do |file|
      contents << Search::Simple::Content.new(file.read, File.expand_path(file_name), file.mtime)
    end
  end

  # load (or rebuild) the index, using /tmp/search_cache as the cache
  s = Search::Simple::Searcher.load(contents, "/tmp/search_cache")

  sr = s.find_words(['some', 'keywords', 'to', 'search', 'for'])
  if sr.contains_matches
    sr.results.sort.each do |res|
      puts "#{res.score}:#{res.name}"
    end
  else
    puts "No matches"
  end

Credits


Almost all of this code was written by Dave Thomas (pragprog.com/pragdave). The original code was a complete rewrite of an attempt that Chad Fowler (www.chadfowler.com) made at a vector space search for RubLog. Chad Fowler then adapted Dave’s working RubLog code to be independent of RubLog and created what is now SimpleSearch out of it.