MakeTextSearch

MakeTextSearch is a tool that let you make full-text search using the engine of your RDBMS easily.

There are a tools like Sphinx or Lucene very powerful and fast, but they require an extra effort to configure and maintain them, because they are tools outside of the RDBMS. Some RDBMS, like PostgreSQL or MySQL, have their own full-text search engine, so the time in configuration and maintenance is lesser.

In this first version we have implemented support for PostgreSQL. In the near future we will implement support for MySQL and more. If the database has no full-text search engine it will use an equivalent using plain SQL.

MakeTextSearch works with Rails 3

Installation

In the Gemfile file add

gem "make-text-search"

After the bundle install you have to generate the migration to create the documents table

rails generate text_search:migration

Usage

In the models where you want to run full-text searchs, you have to declare the indexed fields using has_text_search

class Post < ActiveRecord::Base
  has_text_search :title, :content
end

The fields added to the index can be virtual.

class Post < ActiveRecord::Base
  belongs_to :user

  # Add the user name to the index
  def user_name
    user.try :name
  end

  has_text_search :title, :content, :user_name
end

Filters

The content added to the index can be filtered. Right now there are two filters: :substrings y :strip_html

class Post < ActiveRecord::Base
  has_text_search :title, :filter => :substrings
  has_text_search :content, :intro, :filter => [:strip_html, :substrings]
end

You can use several filters using an array. The order is important. If you use both :substrings and :strip_html, :strip_html should be the first.

:substrings let you search inside the words. For example, the word knowledge can be found with owled if you filter the content with :substrings.

:strip_html removes the HTML tags and it translates HTML entities to its equivalent in UTF-8:

Ir a <a href="http://www.google.es">Google Espa&ntilde;a</a>

will be

Ir a Google España

Language

The documents can be parsed using a language. You can set the default language with config.make_text_search.default_language. The initial value is nil, which means that the documents are parsed in a agnostic way.

If you want to set the default language add this line to the config/application.rb file.

config.make_text_search.default_language = "spanish"

If you want to have a different language for every record you have to implement the text_search_language instance method. For example

class Post < ActiveRecord::Base
  has_text_search :title, :filter => :substrings
  has_text_search :content, :intro, :filter => [:strip_html, :substrings]

  def text_search_language
    case locale
    when "es" "spanish"
    when "en" "english"
    when "de" "german"
    when "it" "italian"
    else
      Rails.application.config.make_text_search.default_language
    end
  end
end

You can get the available languages of your PostgreSQL server with

select * from pg_ts_dict;

Para realizar las búsquedas hay que usar el scope #search_text To perform search you have to use the scope #search_text

Post.search_text("foo")

Post.published.search_text("foo & bar").paginate(:page => params[:page])

The query language is the same used by PostgreSQL. See www.postgresql.org/docs/8.4/static/datatype-textsearch.html#DATATYPE-TSQUERY

Resources

TODO

  • Query builder. Add & and | operators

  • Option :language in #search_text

  • In PostgreSQL, use ts_headline and ts_rank

  • RDoc-ize methods