Elasticsearch::Model
The elasticsearch-model library builds on top of the elasticsearch library.
It aims to simplify integration of Ruby classes ("models"), commonly found e.g. in Ruby on Rails applications, with the Elasticsearch search and analytics engine.
Compatibility
This library is compatible with Ruby 1.9.3 and higher.
The library version numbers follow the Elasticsearch major versions, and the master branch
is compatible with the Elasticsearch master branch and, therefore, with the next major version.
Rubygem | | Elasticsearch |
---|---|---|
0.1 | → | 1.x |
2.x | → | 2.x |
5.x | → | 5.x |
6.x | → | 6.x |
master | → | master |
Installation
Install the package from Rubygems:
gem install elasticsearch-model
To use an unreleased version, either add it to your Gemfile for Bundler:
gem 'elasticsearch-model', git: 'https://github.com/elastic/elasticsearch-rails.git', branch: '5.x'
or install it from a source code checkout:
git clone https://github.com/elastic/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-model
bundle install
rake install
Usage
Let's suppose you have an Article model:
require 'active_record'
ActiveRecord::Base.establish_connection( adapter: 'sqlite3', database: ":memory:" )
ActiveRecord::Schema.define(version: 1) { create_table(:articles) { |t| t.string :title } }
class Article < ActiveRecord::Base; end
Article.create title: 'Quick brown fox'
Article.create title: 'Fast black dogs'
Article.create title: 'Swift green frogs'
Setup
To add the Elasticsearch integration for this model, require elasticsearch/model
and include the main module in your class:
require 'elasticsearch/model'
class Article < ActiveRecord::Base
include Elasticsearch::Model
end
This will extend the model with functionality related to Elasticsearch.
Feature Extraction Pattern
Instead of including the Elasticsearch::Model module directly in your model,
you can include it in a "concern" or "trait" module, which is quite a common pattern in Rails applications,
using e.g. ActiveSupport::Concern as the instrumentation:
# In: app/models/concerns/searchable.rb
#
module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    mapping do
      # ...
    end

    def self.search(query)
      # ...
    end
  end
end

# In: app/models/article.rb
#
class Article
  include Searchable
end
The __elasticsearch__ Proxy
The Elasticsearch::Model module contains a large number of class and instance methods to provide
all its functionality. To prevent polluting your model namespace, this functionality is primarily
available via the __elasticsearch__ class and instance level proxy methods;
see the Elasticsearch::Model::Proxy class documentation for technical information.
The module will include important methods, such as search, into the class or module only
when they haven't been defined already. The following two calls are thus functionally equivalent:
Article.__elasticsearch__.search 'fox'
Article.search 'fox'
See the Elasticsearch::Model
module documentation for technical information.
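The proxy idea described above can be pictured with a small plain-Ruby sketch. It is a simplified illustration under stated assumptions, not the library's actual code: SearchProxy, SketchModel, and Legacy are hypothetical names, and the search method only returns a descriptive string instead of talking to Elasticsearch.

```ruby
# Simplified illustration of the proxy idea; NOT the library's actual code.
# SearchProxy, SketchModel and Legacy are hypothetical names.
class SearchProxy
  def initialize(target)
    @target = target
  end

  # Stand-in for a real search call: only describes what would be searched.
  def search(query)
    "searching #{@target.name} for #{query.inspect}"
  end
end

module SketchModel
  def self.included(base)
    base.extend(ClassMethods)

    # Add the `search` shortcut only when the class has not defined one itself.
    unless base.respond_to?(:search)
      base.define_singleton_method(:search) { |query| __elasticsearch__.search(query) }
    end
  end

  module ClassMethods
    # All functionality lives behind this single, unlikely-to-clash method.
    def __elasticsearch__
      @__elasticsearch__ ||= SearchProxy.new(self)
    end
  end
end

class Article
  include SketchModel
end

class Legacy
  # This class already has its own `search`, so the shortcut is not added.
  def self.search(query)
    "custom search: #{query}"
  end

  include SketchModel
end

puts Article.search('fox')
puts Article.__elasticsearch__.search('fox')
puts Legacy.search('fox')
puts Legacy.__elasticsearch__.search('fox')
```

In this sketch, a class that already defines search keeps its own method, while the proxy remains reachable through __elasticsearch__ in both cases.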
The Elasticsearch client
The module will set up a client, connected to localhost:9200, by default. You can access and use it as any other Elasticsearch::Client:
Article.__elasticsearch__.client.cluster.health
# => { "cluster_name"=>"elasticsearch", "status"=>"yellow", ... }
To use a client with different configuration, just set up a client for the model:
Article.__elasticsearch__.client = Elasticsearch::Client.new host: 'api.server.org'
Or configure the client for all models:
Elasticsearch::Model.client = Elasticsearch::Client.new log: true
You might want to do this during your application bootstrap process, e.g. in a Rails initializer.
Please refer to the elasticsearch-transport library documentation for all the configuration options,
and to the elasticsearch-api library documentation for information about the Ruby client API.
Importing the data
The first thing you'll want to do is import your data into the index:
Article.import
# => 0
It's possible to import only records from a specific scope or query, transform the batch with the transform
and preprocess options, or re-create the index by deleting it and creating it with the correct mapping with the force
option -- look for examples in the method documentation.
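To picture what batching and a transform amount to, here is a minimal plain-Ruby sketch. It does not call the library: the bulk_bodies helper and the sample records are hypothetical, and the shape of the bulk action (an index entry with _id and data) is an assumption modelled on the format the Ruby client's bulk API accepts.

```ruby
# Hypothetical helper, not part of elasticsearch-model: splits records into
# batches and applies a transform to build one bulk request body per batch.
def bulk_bodies(records, batch_size, transform)
  records.each_slice(batch_size).map do |batch|
    batch.map { |record| transform.call(record) }
  end
end

# Hypothetical records; a real import reads them from the database in batches.
records = [
  { id: 1, title: 'Quick brown fox' },
  { id: 2, title: 'Fast black dogs' },
  { id: 3, title: 'Swift green frogs' }
]

# A transform turns one record into one bulk action (assumed format).
transform = lambda do |record|
  { index: { _id: record[:id], data: { title: record[:title] } } }
end

bulk_bodies(records, 2, transform).each_with_index do |body, index|
  puts "batch #{index + 1}: #{body.size} action(s)"
end
```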
The Article.import call above returned 0, meaning no errors were reported, so... let's search the index!
Searching
For starters, we can try the "simple" type of search:
response = Article.search 'fox dogs'
response.took
# => 3
response.results.total
# => 2
response.results.first._score
# => 0.02250402
response.results.first._source.title
# => "Quick brown fox"
Search results
The returned response
object is a rich wrapper around the JSON returned from Elasticsearch,
providing access to response metadata and the actual results ("hits").
Each "hit" is wrapped in the Result class, and provides method access
to its properties via Hashie::Mash.
The results object supports the Enumerable interface:
response.results.map { |r| r._source.title }
# => ["Quick brown fox", "Fast black dogs"]
response.results.select { |r| r.title =~ /^Q/ }
# => [#<Elasticsearch::Model::Response::Result:0x007 ... "_source"=>{"title"=>"Quick brown fox"}}>]
In fact, the response object will delegate Enumerable methods to results:
response.any? { |r| r.title =~ /fox|dog/ }
# => true
To use Array's methods (including any ActiveSupport extensions), just call to_a on the object:
response.to_a.last.title
# => "Fast black dogs"
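The delegation of Enumerable methods can be sketched in plain Ruby with the standard Forwardable module. This is only an illustration of the technique, not the library's Response class; SketchResponse and SketchResult are hypothetical names.

```ruby
require 'forwardable'

# Hypothetical stand-ins; not the library's Response or Result classes.
SketchResult = Struct.new(:title)

class SketchResponse
  extend Forwardable
  include Enumerable

  attr_reader :results

  # Forward `each` to the results; Enumerable builds any?, map, select
  # and to_a on top of it.
  def_delegators :results, :each

  def initialize(results)
    @results = results
  end
end

response = SketchResponse.new(
  [SketchResult.new('Quick brown fox'), SketchResult.new('Fast black dogs')]
)

p response.any? { |r| r.title =~ /fox|dog/ }
p response.to_a.last.title
```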
Search results as database records
Instead of returning documents from Elasticsearch, the records
method will return a collection
of model instances, fetched from the primary database, ordered by score:
response.records.to_a
# Article Load (0.3ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, 2)
# => [#<Article id: 1, title: "Quick brown fox">, #<Article id: 2, title: "Fast black dogs">]
The returned object is the genuine collection of model instances returned by your database,
i.e. ActiveRecord::Relation
for ActiveRecord, or Mongoid::Criteria
in case of MongoDB.
This allows you to chain other methods on top of search results, as you would normally do:
response.records.where(title: 'Quick brown fox').to_a
# Article Load (0.2ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, 2) AND "articles"."title" = 'Quick brown fox'
# => [#<Article id: 1, title: "Quick brown fox">]
response.records.records.class
# => ActiveRecord::Relation::ActiveRecord_Relation_Article
The ordering of the records by score will be preserved, unless you explicitly specify a different order in your model query language:
response.records.order(:title).to_a
# Article Load (0.2ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1, 2) ORDER BY "articles".title ASC
# => [#<Article id: 2, title: "Fast black dogs">, #<Article id: 1, title: "Quick brown fox">]
The records
method returns the real instances of your model, which is useful when you want to access your
model methods -- at the expense of slowing down your application, of course.
In most cases, working with results
coming from Elasticsearch is sufficient, and much faster. See the
elasticsearch-rails
library for more information about compatibility with the Ruby on Rails framework.
When you want to access both the database records and search results, use the each_with_hit (or map_with_hit) iterator:
response.records.each_with_hit { |record, hit| puts "* #{record.title}: #{hit._score}" }
# * Quick brown fox: 0.02250402
# * Fast black dogs: 0.02250402
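How records keep the score ordering and get paired with their hits can be pictured with a small plain-Ruby sketch. It is not the library's implementation: Hit, Record, records_with_hits, and the sample scores are hypothetical, and the sketch assumes every hit still has a matching row in the database.

```ruby
# Hypothetical stand-ins for a search hit and a database row.
Hit    = Struct.new(:id, :score)
Record = Struct.new(:id, :title)

# Returns [record, hit] pairs in the order of the hits (i.e. by score).
# A query such as `WHERE id IN (...)` returns rows in the database's own
# order, so the rows are looked up by ID for each hit in turn.
# Assumes every hit has a row; `fetch` raises KeyError otherwise.
def records_with_hits(rows, hits)
  rows_by_id = rows.map { |row| [row.id.to_s, row] }.to_h
  hits.map { |hit| [rows_by_id.fetch(hit.id), hit] }
end

hits = [Hit.new('2', 1.4), Hit.new('1', 0.9)]
rows = [Record.new(1, 'Quick brown fox'), Record.new(2, 'Fast black dogs')]

records_with_hits(rows, hits).each do |record, hit|
  puts "* #{record.title}: #{hit.score}"
end
```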
Searching multiple models
It is possible to search across multiple models with the module method:
Elasticsearch::Model.search('fox', [Article, Comment]).results.to_a.map(&:to_hash)
# => [
# {"_index"=>"articles", "_type"=>"article", "_id"=>"1", "_score"=>0.35136628, "_source"=>...},
# {"_index"=>"comments", "_type"=>"comment", "_id"=>"1", "_score"=>0.35136628, "_source"=>...}
# ]
Elasticsearch::Model.search('fox', [Article, Comment]).records.to_a
# Article Load (0.3ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" IN (1)
# Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."id" IN (1,5)
# => [#<Article id: 1, title: "Quick brown fox">, #<Comment id: 1, body: "Fox News">, ...]
By default, all models which include the Elasticsearch::Model
module are searched.
NOTE: It is not possible to chain other methods on top of the records object, since it
is a heterogeneous collection, with models potentially backed by different databases.
Pagination
You can implement pagination with the from and size search parameters. However, search results
can be automatically paginated with the kaminari or will_paginate gems.
(The pagination gems must be added before the Elasticsearch gems in your Gemfile,
or loaded first in your application.)
If Kaminari or WillPaginate is loaded, use the familiar paging methods:
response.page(2).results
response.page(2).records
In a Rails controller, use the params[:page]
parameter to paginate through results:
@articles = Article.search(params[:q]).page(params[:page]).records
@articles.current_page
# => 2
@articles.next_page
# => 3
To initialize and include the Kaminari pagination support manually:
Kaminari::Hooks.init if defined?(Kaminari::Hooks)
Elasticsearch::Model::Response::Response.__send__ :include, Elasticsearch::Model::Response::Pagination::Kaminari
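For manual pagination with the from and size parameters mentioned at the start of this section, the arithmetic is simple enough to sketch. The pagination_params helper below is hypothetical and not part of the library; it assumes 1-based page numbers and a default page size of 25.

```ruby
# Hypothetical helper, not part of elasticsearch-model: translates a 1-based
# page number into the `from` and `size` search parameters.
def pagination_params(page, per_page = 25)
  page = [page.to_i, 1].max # treat nil, 0 or negative pages as page 1
  { from: (page - 1) * per_page, size: per_page }
end

# Merge the paging parameters into a search definition.
definition = { query: { match: { title: 'fox' } } }.merge(pagination_params(3, 10))
p definition
```

A hash like this could then be passed to the search method in place of a plain query string.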
The Elasticsearch DSL
In most situations, you'll want to pass the search definition in the Elasticsearch domain-specific language to the client:
response = Article.search query: { match: { title: "Fox Dogs" } },
highlight: { fields: { title: {} } }
response.results.first.highlight.title
# => ["Quick brown <em>fox</em>"]
You can pass any object which implements a to_hash
method, which is called automatically,
so you can use a custom class or your favourite JSON builder to build the search definition:
require 'jbuilder'
query = Jbuilder.encode do |json|
json.query do
json.match do
json.title do
json.query "fox dogs"
end
end
end
end
response = Article.search query
response.results.first.title
# => "Quick brown fox"
Also, you can use the elasticsearch-dsl
library, which provides a specialized Ruby API for
the Elasticsearch Query DSL:
require 'elasticsearch/dsl'
query = Elasticsearch::DSL::Search.search do
query do
match :title do
query 'fox dogs'
end
end
end
response = Article.search query
response.results.first.title
# => "Quick brown fox"
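As a further illustration of the to_hash contract described earlier in this section, a custom query class only needs to expose that one method. The TitleMatchQuery class below is a hypothetical example, not something shipped with the library.

```ruby
# Hypothetical query object; only the `to_hash` method matters here.
class TitleMatchQuery
  def initialize(text)
    @text = text
  end

  # Builds the search definition as a plain Ruby hash.
  def to_hash
    { query: { match: { title: { query: @text } } } }
  end
end

definition = TitleMatchQuery.new('fox dogs')
p definition.to_hash
# With the library loaded, the object could be passed as: Article.search(definition)
```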
Index Configuration
For the search engine to work well, it's often necessary to configure the index explicitly.
The Elasticsearch::Model
integration provides class methods to set up index settings and mappings.
NOTE: Elasticsearch will automatically create an index when a document is indexed,
with default settings and mappings. Create the index in advance with the create_index!
method, so your index configuration is respected.
class Article
settings index: { number_of_shards: 1 } do
mappings dynamic: 'false' do
indexes :title, analyzer: 'english', index_options: 'offsets'
end
end
end
Article.mappings.to_hash
# => {
# :article => {
# :dynamic => "false",
# :properties => {
# :title => {
# :type => "string",
# :analyzer => "english",
# :index_options => "offsets"
# }
# }
# }
# }
Article.settings.to_hash
# => { :index => { :number_of_shards => 1 } }
You can use the defined settings and mappings to create an index with desired configuration:
Article.__elasticsearch__.client.indices.delete index: Article.index_name rescue nil
Article.__elasticsearch__.client.indices.create \
index: Article.index_name,
body: { settings: Article.settings.to_hash, mappings: Article.mappings.to_hash }
There's a shortcut available for this common operation (convenient e.g. in tests):
Article.__elasticsearch__.create_index! force: true
Article.__elasticsearch__.refresh_index!
By default, the index name and document type will be inferred from your class name. You can, however, set them explicitly:
class Article
index_name "articles-#{Rails.env}"
document_type "post"
end
Updating the Documents in the Index
Usually, we need to update the Elasticsearch index when records in the database are created, updated or deleted;
use the index_document, update_document and delete_document methods, respectively:
Article.first.__elasticsearch__.index_document
# => {"ok"=>true, ... "_version"=>2}
Automatic Callbacks
You can automatically update the index whenever the record changes, by including
the Elasticsearch::Model::Callbacks
module in your model:
class Article
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
end
Article.first.update_attribute :title, 'Updated!'
Article.search('*').map { |r| r.title }
# => ["Updated!", "Swift green frogs", "Fast black dogs"]
The automatic callback on record update keeps track of changes in your model
(via an ActiveModel::Dirty-compliant implementation),
and performs a partial update when this support is available.
The automatic callbacks are implemented in database adapters coming with Elasticsearch::Model. You can easily
implement your own adapter: please see the relevant chapter below.
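The idea behind a partial update driven by change tracking can be sketched without any framework. The TrackedArticle class below is a hypothetical stand-in for a model with ActiveModel::Dirty-style tracking, not the library's implementation; the doc key in the resulting hash mirrors the body of an Elasticsearch partial update request.

```ruby
# Hypothetical stand-in for a model with ActiveModel::Dirty-style change tracking.
class TrackedArticle
  attr_reader :attributes, :changes

  def initialize(attributes)
    @attributes = attributes
    @changes = {}
  end

  # Record the old and the new value whenever an attribute really changes.
  def write(name, value)
    return if @attributes[name] == value

    @changes[name] = [@attributes[name], value]
    @attributes[name] = value
  end

  # Only the changed attributes go into the partial update body.
  def partial_update_body
    { doc: @changes.map { |name, (_previous, current)| [name, current] }.to_h }
  end
end

article = TrackedArticle.new({ id: 1, title: 'Quick brown fox', body: 'Long text' })
article.write(:title, 'Updated!')
p article.partial_update_body
```

Sending only the changed fields keeps the request small, which matters most for documents with large text fields.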
Custom Callbacks
If you need more control over the indexing process, you can implement these callbacks yourself,
by hooking into the after_create, after_save, after_update or after_destroy operations:
class Article
include Elasticsearch::Model
after_save { logger.debug ["Updating document... ", index_document ].join }
after_destroy { logger.debug ["Deleting document... ", delete_document].join }
end
For ActiveRecord-based models, use the after_commit
callback to protect
your data against inconsistencies caused by transaction rollbacks:
class Article < ActiveRecord::Base
include Elasticsearch::Model
after_commit on: [:create] do
__elasticsearch__.index_document if self.published?
end
after_commit on: [:update] do
__elasticsearch__.update_document if self.published?
end
after_commit on: [:destroy] do
__elasticsearch__.delete_document if self.published?
end
end
Asynchronous Callbacks
Of course, you're still performing an HTTP request during your database transaction, which is not optimal for large-scale applications. A better option would be to process the index operations in the background, with a tool like Resque or Sidekiq:
class Article
include Elasticsearch::Model
after_save { Indexer.perform_async(:index, self.id) }
after_destroy { Indexer.perform_async(:delete, self.id) }
end
An example implementation of the Indexer
worker class could look like this:
class Indexer
  include Sidekiq::Worker
  sidekiq_options queue: 'elasticsearch', retry: false

  Logger = Sidekiq.logger.level == Logger::DEBUG ? Sidekiq.logger : nil
  Client = Elasticsearch::Client.new host: 'localhost:9200', logger: Logger

  def perform(operation, record_id)
    logger.debug [operation, "ID: #{record_id}"]

    case operation.to_s
      when /index/
        record = Article.find(record_id)
        Client.index  index: 'articles', type: 'article', id: record.id, body: record.__elasticsearch__.as_indexed_json
      when /delete/
        Client.delete index: 'articles', type: 'article', id: record_id
      else raise ArgumentError, "Unknown operation '#{operation}'"
    end
  end
end
Start the Sidekiq workers with bundle exec sidekiq --queue elasticsearch --verbose and update a model:
Article.first.update_attribute :title, 'Updated'
You'll see the job being processed in the console where you started the Sidekiq worker:
Indexer JID-eb7e2daf389a1e5e83697128 DEBUG: ["index", "ID: 1"]
Indexer JID-eb7e2daf389a1e5e83697128 INFO: PUT http://localhost:9200/articles/article/1 [status:200, request:0.004s, query:n/a]
Indexer JID-eb7e2daf389a1e5e83697128 DEBUG: > {"id":1,"title":"Updated", ...}
Indexer JID-eb7e2daf389a1e5e83697128 DEBUG: < {"ok":true,"_index":"articles","_type":"article","_id":"1","_version":6}
Indexer JID-eb7e2daf389a1e5e83697128 INFO: done: 0.006 sec
Model Serialization
By default, the model instance will be serialized to JSON using the as_indexed_json
method,
which is defined automatically by the Elasticsearch::Model::Serializing
module:
Article.first.__elasticsearch__.as_indexed_json
# => {"id"=>1, "title"=>"Quick brown fox"}
If you want to customize the serialization, just implement the as_indexed_json
method yourself,
for instance with the as_json
method:
class Article
include Elasticsearch::Model
def as_indexed_json(options={})
as_json(only: 'title')
end
end
Article.first.as_indexed_json
# => {"title"=>"Quick brown fox"}
The re-defined method will be used in the indexing methods, such as index_document.
Please note that in Rails 3, you need to either set include_root_in_json: false, or prevent adding
the "root" in the JSON representation by other means.
Relationships and Associations
When you have a more complicated structure/schema, you need to customize the as_indexed_json method --
or perform the indexing separately, on your own.
For example, let's have an Article model, which has_many Comments, Authors and Categories.
We might want to define the serialization like this:
def as_indexed_json(options={})
  self.as_json(
    include: { categories: { only: :title },
               authors:    { methods: [:full_name], only: [:full_name] },
               comments:   { only: :text }
             })
end
Article.first.as_indexed_json
# => { "id" => 1,
# "title" => "First Article",
# "created_at" => 2013-12-03 13:39:02 UTC,
# "updated_at" => 2013-12-03 13:39:02 UTC,
# "categories" => [ { "title" => "One" } ],
# "authors" => [ { "full_name" => "John Smith" } ],
# "comments" => [ { "text" => "First comment" } ] }
Of course, when you want to use the automatic indexing callbacks, you need to hook into the appropriate
ActiveRecord callbacks -- please see the full example in examples/activerecord_associations.rb.
Other ActiveModel Frameworks
The Elasticsearch::Model
module is fully compatible with any ActiveModel-compatible model, such as Mongoid:
require 'mongoid'
Mongoid.connect_to 'articles'
class Article
include Mongoid::Document
field :id, type: String
field :title, type: String
attr_accessible :id, :title, :published_at
include Elasticsearch::Model
def as_indexed_json(options={})
as_json(except: [:id, :_id])
end
end
Article.create id: '1', title: 'Quick brown fox'
Article.import
response = Article.search 'fox'
response.records.to_a
# MOPED: 127.0.0.1:27017 QUERY database=articles collection=articles selector={"_id"=>{"$in"=>["1"]}} ...
# => [#<Article _id: 1, id: nil, title: "Quick brown fox", published_at: nil>]
Full examples for CouchBase, DataMapper, Mongoid, Ohm and Riak models can be found in the examples
folder.
Adapters
To support various "OxM" (object-relational- or object-document-mapper) implementations and frameworks,
the Elasticsearch::Model
integration supports an "adapter" concept.
An adapter provides implementations for common behaviour, such as fetching records from the database, hooking into model callbacks for automatic index updates, or efficient bulk loading from the database. The integration comes with adapters for ActiveRecord and Mongoid out of the box.
Writing an adapter for your favourite framework is straightforward -- let's see a simplified adapter for DataMapper:
module DataMapperAdapter
# Implement the interface for fetching records
#
module Records
def records
klass.all(id: ids)
end
# ...
end
end
# Register the adapter
#
Elasticsearch::Model::Adapter.register(
DataMapperAdapter,
lambda { |klass| defined?(::DataMapper::Resource) and klass.ancestors.include?(::DataMapper::Resource) }
)
Require the adapter and include Elasticsearch::Model
in the class:
require 'datamapper_adapter'
class Article
include DataMapper::Resource
include Elasticsearch::Model
property :id, Serial
property :title, String
end
When accessing the records
method of the response, for example,
the implementation from our adapter will be used now:
response = Article.search 'foo'
response.records.to_a
# ~ (0.000057) SELECT "id", "title", "published_at" FROM "articles" WHERE "id" IN (3, 1) ORDER BY "id"
# => [#<Article @id=1 @title="Foo" @published_at=nil>, #<Article @id=3 @title="Foo Foo" @published_at=nil>]
response.records.records.class
# => DataMapper::Collection
More examples can be found in the examples folder. Please see the Elasticsearch::Model::Adapter
module and its submodules for technical information.
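The registration mechanism described in this section, an adapter paired with a condition that decides when it applies, can be pictured with a small registry in plain Ruby. This is a simplified illustration, not the library's actual code; AdapterRegistry, FakeResource, and FakeResourceAdapter are hypothetical names.

```ruby
# Simplified illustration of an adapter registry; NOT the library's actual code.
module AdapterRegistry
  @adapters = {}

  class << self
    # Remember the adapter together with the condition that selects it.
    def register(adapter, condition)
      @adapters[adapter] = condition
    end

    # Return the first adapter whose condition accepts the class, or nil.
    def adapter_for(klass)
      found = @adapters.find { |_adapter, condition| condition.call(klass) }
      found && found.first
    end
  end
end

module FakeResource; end        # hypothetical stand-in for DataMapper::Resource
module FakeResourceAdapter; end # hypothetical adapter module

AdapterRegistry.register(
  FakeResourceAdapter,
  lambda { |klass| klass.ancestors.include?(FakeResource) }
)

class Article
  include FakeResource
end

p AdapterRegistry.adapter_for(Article)
p AdapterRegistry.adapter_for(String)
```

Keeping the condition as a lambda means the check runs only when a model is inspected, so an adapter can be registered even if its framework is never loaded.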
Settings
The module provides a common settings method to customize various features.
Before version 7.0.0 of the gem, the only supported setting was :inheritance_enabled. This setting has been deprecated
and removed.
Development and Community
For local development, clone the repository and run bundle install. See rake -T for a list of
available Rake tasks for running tests, generating documentation, starting a testing cluster, etc.
Bug fixes and features must be covered by unit tests.
GitHub pull requests and issues are used to communicate and to send bug reports and code contributions.
To run all tests against a test Elasticsearch cluster, use a command like this:
curl -# https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.RC1.tar.gz | tar xz -C tmp/
SERVER=start TEST_CLUSTER_COMMAND=$PWD/tmp/elasticsearch-1.0.0.RC1/bin/elasticsearch bundle exec rake test:all
Single Table Inheritance deprecation
Single Table Inheritance has been supported through the 6.x series of this gem. With this feature,
Elasticsearch settings (index mappings, etc.) on a parent model could be inherited by a child model, leading to different
model documents being indexed into the same Elasticsearch index. This feature depended on the ability to set a type
for a document in Elasticsearch. The Elasticsearch team has deprecated support for types, as is described
here.
This gem will also remove support for types and Single Table Inheritance in version 7.0 as it enables an anti-pattern.
Please save different model documents in separate indices. If you want to use STI, you can include an artificial
type field manually in each document and use it in other operations.
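One way to add such an artificial type field is through the serialization method. The sketch below uses plain Ruby classes instead of real models, so Animal, Dog, and Cat are hypothetical; in an application they would include Elasticsearch::Model and the method would be picked up by the indexing calls.

```ruby
# Hypothetical plain-Ruby models; in a real application these would include
# Elasticsearch::Model and inherit from a persistence base class.
class Animal
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Adds an artificial `type` field so documents can later be told apart by class.
  def as_indexed_json(_options = {})
    { 'id' => id, 'name' => name, 'type' => self.class.name }
  end
end

class Dog < Animal; end
class Cat < Animal; end

p Dog.new(1, 'Rex').as_indexed_json
p Cat.new(2, 'Tom').as_indexed_json

# A search definition could then filter on the field, for example with a term
# query (assuming the field is mapped so that exact matching works):
type_filter = { query: { term: { type: 'Dog' } } }
p type_filter
```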
License
This software is licensed under the Apache 2 license, quoted below.
Copyright (c) 2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.