LucidWorks-Ruby
Ruby bindings for the REST API of the LucidWorks family of search products.
The LucidWorks family of products are search engines that combine the open source search technologies Lucene and Solr with open source crawlers, a management UI and a REST API. The LucidWorks REST API provides a programmatic way to manage collections, data-sources, scheduling and many of the other objects and tasks involved in running a search engine.
Information
You can view the LucidWorks-Ruby documentation in RDoc format here:
rubydoc.info/github/lucidimagination/lucidworks-ruby/master/frames
The LucidWorks REST API is documented here:
lucidworks.lucidimagination.com/display/LWEUG/Rest+API
Bug reports
Where should people file bugs? GitHub? That implies we have open sourced this already. An email address at Lucid?
Installation
Install the gem:
gem install lucid_works
Or add it to your Gemfile, then run bundle install:
gem "lucid_works"
Show Me the Money
This single statement will connect to a LucidWorks server running on the local machine, create a collection called “News” and a data-source called “cnn” for the cnn.com website, then start a crawl. Note the trailing periods, which continue the statement across lines. Copy and paste it into irb:
require 'lucid_works'
LucidWorks::Server.new("http://localhost:8888").
  create_collection(:name => 'News').
  create_datasource(:name => 'cnn',
                    :crawler => 'lucid.aperture', :type => 'web',
                    :url => 'http://cnn.com', :crawl_depth => '1').
  start_crawl!
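The chain above reads naturally if each call returns the object it just created, so that the next call is sent to that object. The following is a minimal sketch of that pattern only. FakeServer, FakeCollection and FakeDatasource are hypothetical stand-ins that make no HTTP requests; they are not the library's classes, and the return value of the real start_crawl! is an assumption here.

```ruby
# Stand-in classes that only mimic the shape of the chained calls.
# They are hypothetical and do not talk to a LucidWorks server.
class FakeDatasource
  attr_reader :name, :crawling

  def initialize(attrs)
    @name = attrs[:name]
    @crawling = false
  end

  # In this sketch the method returns self, so the chain ends on the datasource.
  def start_crawl!
    @crawling = true
    self
  end
end

class FakeCollection
  attr_reader :name, :datasources

  def initialize(attrs)
    @name = attrs[:name]
    @datasources = []
  end

  # Returns the new datasource, so the next call in the chain targets it.
  def create_datasource(attrs)
    datasource = FakeDatasource.new(attrs)
    @datasources << datasource
    datasource
  end
end

class FakeServer
  attr_reader :collections

  def initialize(url)
    @url = url
    @collections = []
  end

  # Returns the new collection, so the next call in the chain targets it.
  def create_collection(attrs)
    collection = FakeCollection.new(attrs)
    @collections << collection
    collection
  end
end

server = FakeServer.new("http://localhost:8888")

# A line ending in "." is incomplete, so Ruby continues it on the next line.
result = server.
  create_collection(:name => 'News').
  create_datasource(:name => 'cnn', :url => 'http://cnn.com').
  start_crawl!

puts result.class    # FakeDatasource
puts result.crawling # true
```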
Now, how does it work:
Object Model
The LucidWorks object model looks something like this:
Server -+- Collection -+- Datasource -+- Status
        |              |              +- History
        |              |              +- Schedule
        |              |              +- Index
        |              |              +- Crawldata
        |              |              +- Job
        |              +- Field
        |              +- Index
        |              +- Info
        |              +- Settings
        |              +- Activity -+- Status
        |                           +- History
        |
        +- Logs -+- Index -+- Summary
        |        +- Query -+- Summary
        |
        +- Crawlers
        +- Version
This is what has been modeled so far. The actual REST API is more extensive.
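For readers who prefer data to diagrams, the same hierarchy can be written as a nested Hash and walked recursively. This is only an illustration of the structure described above. The constant OBJECT_MODEL and the helper model_paths are hypothetical names and are not part of the library.

```ruby
# The modeled hierarchy from the diagram, written as a nested Hash.
# Leaf resources map to an empty Hash.
OBJECT_MODEL = {
  'Server' => {
    'Collection' => {
      'Datasource' => {
        'Status' => {}, 'History' => {}, 'Schedule' => {},
        'Index' => {}, 'Crawldata' => {}, 'Job' => {}
      },
      'Field' => {}, 'Index' => {}, 'Info' => {}, 'Settings' => {},
      'Activity' => { 'Status' => {}, 'History' => {} }
    },
    'Logs' => {
      'Index' => { 'Summary' => {} },
      'Query' => { 'Summary' => {} }
    },
    'Crawlers' => {},
    'Version' => {}
  }
}

# Returns one string per node, naming every resource from the root down to it.
def model_paths(tree, prefix = [])
  tree.flat_map do |name, children|
    path = prefix + [name]
    [path.join(' > ')] + model_paths(children, path)
  end
end

puts model_paths(OBJECT_MODEL)
```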
Usage
Server
The starting point for our communication with a LucidWorks server is a LucidWorks::Server object, e.g. for a LucidWorks server running on the local machine, on the standard port:
server = LucidWorks::Server.new("http://localhost:8888")
Collections
Collections are modeled using the LucidWorks::Collection class. LucidWorks::Server has_many :collections, therefore:
To retrieve collections:
@server.collections -> an array of LucidWorks::Collection
puts @server.collections.map(&:name)
@server.collection("name") -> a single LucidWorks::Collection
Create a collection:
collection = @server.build_collection(:name => "MY_STUFF")
collection.save
or
collection = @server.create_collection(:name => "MY_STUFF")
Delete a collection:
collection.destroy
Wipe all indexed data from a collection:
collection.empty!
Collection Info
The Collection::Info class exposes data about the state of a collection, such as its document count and index size.
info = @server.collection('coll1').info -> a LucidWorks::Collection::Info
info.index_num_docs -> 12345
info.index_size -> "44.3 MB"
Collection Settings
The Collection::Settings class contains indexing and querying settings for the collection.
settings = @server.collection('collection1').settings -> a LucidWorks::Collection::Settings
settings.query_parser -> "lucid"
settings.synonym_list -> ["Lawyer", "Attorney", "one", "1", ...]
Field
Collection has_many :fields. The Field class models data about a collection’s field.
field = @server.collection('collection1').field('body') -> a LucidWorks::Field
field.field_type -> "text_en"
field.facet -> false
Datasources
Collection has_many :datasources. Datasources are modeled using the LucidWorks::Datasource class. They support all the standard ORM methods, e.g.
collection.datasources -> an array of LucidWorks::Datasource
collection.datasource(123) -> a single LucidWorks::Datasource
datasource = collection.create_datasource(
  :crawler => 'lucid.aperture',
  :type => 'web',
  :name => "example.com",
  :url => "http://example.com/",
  :crawl_depth => 1
)
Note that create_datasource does not start a crawl of the datasource.
To start crawling a datasource:
datasource.start_crawl!
To stop a datasource crawl:
datasource.stop_crawl!
To delete all the data crawled from a data-source:
datasource.empty!
The ORM
This library implements a simple ORM (object-relational mapper) on top of the LucidWorks REST API. It behaves somewhat like ActiveResource/ActiveRecord; if you want to know why we didn’t just use ActiveResource, see the Rationale section.
Base
LucidWorks::Base is the ORM foundation of this library. It supports many of the ActiveRecord-style methods. For example, given a Thing model:
class Thing < LucidWorks::Base
end
Then Thing will have the following class methods:
thing = Thing.new(:attrib => value, :parent => parent) -> unsaved Thing
Thing.create(:attr => value, ..., :parent => parent) -> saved Thing
Thing.find(:all, :parent => parent) -> Array of Thing
Thing.find(id, :parent => parent) -> a Thing
The ‘parent’ must be another LucidWorks::Base model or a LucidWorks::Server; this is only required when the class is used stand-alone. If the model is created/retrieved from an association, this value is set for you automatically.
thing.save -> true/false
thing.destroy
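To make the role of :parent concrete, here is a minimal in-memory sketch with the same method shapes. SketchBase is a hypothetical stand-in that keeps records in a Hash keyed by parent instead of calling a REST API. It is not the implementation of LucidWorks::Base, and the rule that save returns false without a parent or id is an assumption made for the example.

```ruby
# A hypothetical in-memory stand-in for an ORM base class. It mirrors the
# method shapes listed above but stores records in a Hash instead of
# calling a REST API. It is not the implementation of LucidWorks::Base.
class SketchBase
  attr_reader :attributes, :parent

  # Records are stored per class and, within a class, per parent.
  def self.store
    @store ||= Hash.new { |hash, parent| hash[parent] = {} }
  end

  def self.create(options = {})
    new(options).tap(&:save)
  end

  def self.find(id_or_all, options = {})
    records = store[options.fetch(:parent)]
    id_or_all == :all ? records.values : records[id_or_all]
  end

  def initialize(options = {})
    options = options.dup
    @parent = options.delete(:parent) # everything else is an attribute
    @attributes = options
  end

  def id
    attributes[:id]
  end

  # Returns true when the record was stored, false otherwise.
  def save
    return false if parent.nil? || id.nil?
    self.class.store[parent][id] = self
    true
  end

  def destroy
    self.class.store[parent].delete(id)
  end
end

class Thing < SketchBase
end

parent = Object.new # stands in for a server or another model
thing = Thing.new(:id => 1, :color => 'red', :parent => parent)
puts thing.save                                     # true
puts Thing.find(1, :parent => parent).equal?(thing) # true
puts Thing.find(:all, :parent => parent).length     # 1
thing.destroy
puts Thing.find(:all, :parent => parent).length     # 0
```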
Has_many associations
The has_many association is used to associate a resource with another collection resource. Given:
class Thing < LucidWorks::Base
  has_many :others
end
Then
thing.others -> array of Other
thing.other(id) -> an Other
thing.new_other(:attr => val, ...) -> an unsaved Other
thing.create_other(:attr => val, ...) -> saved Other
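One way such a macro could generate these four methods is with define_method. The sketch below is an assumption about the general technique, not the library's code: HasManySketch is a hypothetical module, children are kept in an in-memory array rather than fetched over REST, and the singular name is derived by simply stripping a trailing "s".

```ruby
# A hypothetical sketch of a has_many macro. It shows how the four
# association methods could be generated with define_method; children are
# kept in memory, and this is not the library's implementation.
module HasManySketch
  def has_many(plural)
    singular = plural.to_s.chomp('s') # :others -> "other"
    child_class = -> { Object.const_get(singular.capitalize) }

    # thing.others
    define_method(plural) do
      (@children ||= {})[plural] ||= []
    end

    # thing.other(id)
    define_method(singular) do |id|
      send(plural).find { |child| child.id == id }
    end

    # thing.new_other(attrs): built with its parent set, but not stored.
    define_method("new_#{singular}") do |attrs = {}|
      child_class.call.new(attrs.merge(:parent => self))
    end

    # thing.create_other(attrs): built and stored.
    define_method("create_#{singular}") do |attrs = {}|
      child = send("new_#{singular}", attrs)
      send(plural) << child
      child
    end
  end
end

class Other
  attr_reader :id, :parent

  def initialize(attrs = {})
    @id = attrs[:id]
    @parent = attrs[:parent]
  end
end

class Thing
  extend HasManySketch
  has_many :others
end

thing = Thing.new
thing.create_other(:id => 7)
unsaved = thing.new_other(:id => 8)

puts thing.others.length               # 1
puts thing.other(7).parent.equal?(thing) # true
puts unsaved.parent.equal?(thing)      # true
puts thing.other(8).inspect            # nil, because new_other does not store
```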
Has_one associations
The has_one association is used to associate a resource with another singleton resource that is transient, i.e. can be created and destroyed.
class Thing < LucidWorks::Base
  has_one :whatnot
end
class Whatnot < LucidWorks::Base
  self.singleton = true
  belongs_to :thing
end
Then
thing.whatnot -> a retrieved Whatnot
thing.build_whatnot -> an unsaved Whatnot
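Because the child is a singleton, the accessor needs no id. A minimal sketch of how a has_one macro could produce the two methods above follows. HasOneSketch is a hypothetical module, and saving or destroying the child here only attaches it to or detaches it from its owner in memory; the real library talks to the server instead.

```ruby
# Hypothetical sketch of a has_one macro for a singleton child. The child
# lives in memory here; this is not the library's implementation.
module HasOneSketch
  def has_one(name)
    child_class = -> { Object.const_get(name.to_s.capitalize) }

    # thing.whatnot: no id is needed because there is at most one child.
    define_method(name) do
      instance_variable_get("@#{name}")
    end

    # thing.build_whatnot: a new, unsaved child that knows its owner.
    define_method("build_#{name}") do |attrs = {}|
      child_class.call.new(attrs.merge(:parent => self))
    end
  end
end

class Whatnot
  attr_reader :parent

  def initialize(attrs = {})
    @parent = attrs[:parent]
  end

  # In this sketch, saving attaches the child to its owner.
  def save
    parent.instance_variable_set(:@whatnot, self)
    true
  end

  # Destroying detaches it again, which models a transient resource.
  def destroy
    parent.instance_variable_set(:@whatnot, nil)
  end
end

class Thing
  extend HasOneSketch
  has_one :whatnot
end

thing = Thing.new
puts thing.whatnot.inspect         # nil, nothing has been created yet
whatnot = thing.build_whatnot
whatnot.save
puts thing.whatnot.equal?(whatnot) # true
whatnot.destroy
puts thing.whatnot.inspect         # nil
```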
Belongs_to associations
The belongs_to association augments the model with methods to access its parent. Given:
class Whatnot < LucidWorks::Base
  self.singleton = true
  belongs_to :thing
end
Then:
whatnot.thing -> A Thing
For more information on associations, see LucidWorks::Associations::ClassMethods.
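The parent accessor can be pictured as a named reader over whatever object was passed as :parent when the model was built. A minimal sketch, assuming exactly that; BelongsToSketch is a hypothetical module and not the library's implementation:

```ruby
# Hypothetical sketch of a belongs_to macro; not the library's code.
module BelongsToSketch
  def belongs_to(name)
    # Expose the object passed as :parent under the association's name.
    define_method(name) { @parent }
  end
end

class Thing
end

class Whatnot
  extend BelongsToSketch
  belongs_to :thing

  def initialize(attrs = {})
    @parent = attrs[:parent]
  end
end

thing = Thing.new
whatnot = Whatnot.new(:parent => thing)
puts whatnot.thing.equal?(thing) # true
```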
Schema
A class may have a schema defined as follows:
class ThingWithSchema < LucidWorks::Base
  schema do
    attribute :string1, :string
    attribute :bool1, :boolean
    attribute :integer1, :integer
    attributes :string2, :string3, :string4
    attributes :bool2, :bool3, :type => :boolean
    attributes :int2, :int3, :type => :integer
    attribute :string_with_values, :values => ['one', 'two']
    attribute :dontsendme, :omit_during_update => true
    attribute :sendnull, :string, :nil_when_blank => true
  end
end
A class with a schema may have validations applied to its attributes. The default attribute type is :string. See LucidWorks::Schema for more details.
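A block-based DSL like this is commonly built by evaluating the block against a collector object with instance_eval. The sketch below shows that technique under stated assumptions: SchemaSketch, SchemaMacro, definitions and allowed? are hypothetical names, the :values check is only one guess at what a validation could look like, and none of this is the code in LucidWorks::Schema.

```ruby
# Hypothetical sketch of a schema DSL; this is not LucidWorks::Schema.
class SchemaSketch
  attr_reader :definitions

  def initialize
    @definitions = {}
  end

  # attribute :name, :type, :option => value
  # The type may be omitted, in which case it defaults to :string.
  def attribute(name, type = :string, options = {})
    if type.is_a?(Hash) # called as: attribute :name, :option => value
      options = type
      type = :string
    end
    @definitions[name] = { :type => type }.merge(options)
  end

  # attributes :a, :b, :type => :boolean
  def attributes(*names)
    options = names.last.is_a?(Hash) ? names.pop : {}
    type = options.fetch(:type, :string)
    extras = options.reject { |key, _value| key == :type }
    names.each { |name| attribute(name, type, extras) }
  end

  # True when the attribute has no :values list or the value is in it.
  def allowed?(name, value)
    values = @definitions.fetch(name)[:values]
    values.nil? || values.include?(value)
  end
end

module SchemaMacro
  # Evaluates the block against a SchemaSketch so that the bare
  # attribute/attributes calls inside it resolve to the methods above.
  def schema(&block)
    @schema ||= SchemaSketch.new
    @schema.instance_eval(&block) if block
    @schema
  end
end

class ThingWithSchema
  extend SchemaMacro

  schema do
    attribute :string1, :string
    attribute :bool1, :boolean
    attributes :string2, :string3
    attributes :int2, :int3, :type => :integer
    attribute :string_with_values, :values => ['one', 'two']
  end
end

schema = ThingWithSchema.schema
puts schema.definitions[:string2][:type]           # string
puts schema.definitions[:int3][:type]              # integer
puts schema.allowed?(:string_with_values, 'one')   # true
puts schema.allowed?(:string_with_values, 'three') # false
```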
Rationale
Originally this library started out as a set of ActiveResource classes. This required a lot of hacking of ActiveResource as ActiveResource makes a lot of assumptions about the way a REST API should work - it’s basically just designed to talk to Rails applications - and many REST APIs, including this one, don’t conform to those rules. Among the changes required to ActiveResource were:
- Don’t require attributes always be nested inside :resource => on create and update.
- Allow client-side generation of a resource ID during create.
- Support has_one and has_many associations.
Eventually, however, this strategy hit a brick wall that would have been extremely expensive to hurdle. We needed the following features:
- The ability to talk to the same API on more than one server simultaneously.
- Support file uploads using multi-part post.
Given the design of ActiveResource, these would have been expensive to implement, and it became simpler to write a small ORM by marrying ActiveModel and RestClient.
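The multi-server requirement is easiest to see by comparing where the base URL lives. ActiveResource keeps its site setting on the class, whereas this library starts every call from a server object. The sketch below illustrates that difference only: ClassConfiguredResource and ServerSketch are hypothetical classes, the path '/api/collections' and the host search.example.com are assumptions made for the example, and no requests are sent.

```ruby
require 'uri'

# Class-level configuration: every instance shares one base URL, so a
# process can only address one server at a time through this class.
class ClassConfiguredResource
  class << self
    attr_accessor :site
  end

  def url_for(path)
    URI.join(self.class.site, path).to_s
  end
end

# Instance-level configuration: each server object carries its own base
# URL, so two servers can be used side by side.
class ServerSketch
  attr_reader :base_url

  def initialize(base_url)
    @base_url = base_url
  end

  def url_for(path)
    URI.join(base_url, path).to_s
  end
end

ClassConfiguredResource.site = 'http://localhost:8888'
resource = ClassConfiguredResource.new
ClassConfiguredResource.site = 'http://search.example.com:8888'
# The existing object now points at the second server as well.
puts resource.url_for('/api/collections')

local = ServerSketch.new('http://localhost:8888')
remote = ServerSketch.new('http://search.example.com:8888')
# Each server object keeps its own target.
puts local.url_for('/api/collections')
puts remote.url_for('/api/collections')
```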
License
Copyright 2012 Lucid Imagination lucidimagination.com
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this software except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.