Elasticsearch Percolator Query Implementation in Ruby

Introduction

In this article, we will look at when to use the Elasticsearch percolator query and how to implement it in Ruby. The setup steps were written for Ubuntu, but they should work on other Linux distributions too.

 

How does it work?

 

Most Elasticsearch developers think conventionally: they design documents according to the structure of their data and store them in an index, then define queries through the search API to retrieve those documents. The percolator works in the opposite direction. You first store queries in an index, and then, through the Percolate API, you supply documents in order to retrieve the queries that match them.

 

  • All queries are loaded in memory
  • Each document is indexed in memory
  • All queries get executed against it
  • Execution time is linear in the number of queries
  • The in-memory index is cleaned up afterwards
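
To make the reverse-search idea concrete, here is a toy sketch in plain Ruby. It is not how Elasticsearch works internally; the ToyPercolator class and its methods are hypothetical names, and a "query" is reduced to a single term that must appear in the document. It only illustrates that queries are stored first and that each incoming document is checked against every one of them.

```ruby
# Toy illustration only: this is NOT how Elasticsearch is implemented.
# A "query" here is simply a term that must appear in the document's message.
class ToyPercolator
  def initialize
    @queries = {} # query id => search term
  end

  # Store a query under an id, the way a percolator index stores query documents.
  def register(id, term)
    @queries[id] = term.downcase
  end

  # Run every stored query against a single document and return matching ids.
  # Each query is visited once, so the cost grows linearly with their number.
  def percolate(document)
    words = document.fetch(:message).downcase.split(/\W+/)
    @queries.select { |_id, term| words.include?(term) }.keys
  end
end

percolator = ToyPercolator.new
percolator.register('foo', 'foo')
percolator.register('bar', 'bar')
percolator.register('baz', 'baz')

p percolator.percolate(message: 'message foo bar') # => ["foo", "bar"]
```

The document itself is never stored: it exists only for the duration of the percolate call, which mirrors the temporary in-memory index described in the list above.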

 


When do we need to use percolator?

 

The Percolate API is most commonly used for document monitoring and alerting.

 

For example, a platform can store its users’ interests so that it sends the right content (a notification alert) to the right users every time new content comes in.

 

For instance, a user subscribes to a specific topic, and as soon as a new article for that topic comes in, a notification will be sent to the interested users.

 

How is this done?

 

You express each user’s interests as an Elasticsearch query, using the query DSL, and register it in Elasticsearch as though it were a document. Every time a new article is published, you can percolate it, without needing to index it, to find out which users are interested in it. At that point you know who needs to receive a notification containing the article link (sending the notification itself is not done by Elasticsearch). You can additionally index the content itself, but that is not required.
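
As a rough sketch of what "expressing interests as a query" could look like, the snippet below builds a query DSL body from a list of topics. The interest_query helper, the user record and the topic names are all hypothetical; the message field name matches the mapping used later in this article.

```ruby
require 'json'

# Hypothetical helper: turns a list of topics into one query DSL body.
# A document matches if its "message" field matches at least one topic.
def interest_query(topics)
  {
    query: {
      bool: {
        should: topics.map { |topic| { match: { message: topic } } },
        minimum_should_match: 1
      }
    }
  }
end

# Hypothetical user record. Using the user id as the id of the stored query
# means a percolate hit maps straight back to the user to notify.
user = { id: 'user-42', topics: ['weather', 'stocks'] }

puts JSON.pretty_generate(interest_query(user[:topics]))
```

One stored query per user keeps the lookup simple; one query per user and topic would be an equally valid design if you need to know which topic triggered the alert.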

 

This concept has many uses, such as weather forecast alerts, price monitoring, news alerts, stock alerts, log monitoring and many more.

 

Pre-requisites & Setup:

 

Java:

 

Elasticsearch is developed in Java, so first make sure Java is installed by running the command below:

 


java -version

 

Installing Elasticsearch:

 

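Next, install Elasticsearch with the command below. To get a 5.x release such as the one shown later in this article, the Elastic APT repository typically needs to be configured first: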

 


sudo apt-get install elasticsearch

 

In order to make sure that Elasticsearch is installed correctly, use the following command:

 


curl -XGET 'localhost:9200'

 

The result should be something like the following:

 


{
  "name" : "lNOxiFt",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "r8yOSyCjRtmHFYmdbijjpg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

 

Using Percolator:

 

The following steps explain how your queries get stored in an index and how you supply documents through the Percolate API in order to retrieve the queries that match them.

 

  1. Requirement and service set up
  2. Making a connection
  3. Create an index
  4. Index a query
  5. Percolate a document

 

Requirement & Service setup

 

To implement the Elasticsearch percolator, we need the elasticsearch gem. Add it to your Gemfile:

 


gem 'elasticsearch'
gem 'typhoeus' # needed for the :typhoeus Faraday adapter used below

 

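I created a service object to index the queries. Its constructor takes a configuration hash, cfg, that holds the Elasticsearch host and port.

index_service = Services::Percolation.new(cfg)
index_service.re_index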

 

Making a connection

 

To make a connection, we use elasticsearch-transport, which provides a low-level Ruby client for connecting to an Elasticsearch cluster.

 


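# Both methods belong to the Services::Percolation class.
require 'elasticsearch'
require 'typhoeus'
require 'typhoeus/adapters/faraday' # registers the :typhoeus Faraday adapter

def initialize(cfg)
  @cfg = cfg

  # Log every response and send requests through Typhoeus.
  transport_configuration = lambda do |f|
    f.response :logger
    f.adapter  :typhoeus
  end

  transport = Elasticsearch::Transport::Transport::HTTP::Faraday.new(
    hosts: [{ host: @cfg['elastic']['url'], port: @cfg['elastic']['port'] }],
    &transport_configuration
  )

  @server = Elasticsearch::Client.new log: true, transport: transport
end

# Drop and recreate the index, then register one query per term.
def re_index
  index_name = "percolator-index"
  delete_index(index_name)
  create_index(index_name)

  ds = ['foo', 'bar']
  ds.each do |i|
    index(i, index_name)
  end
end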
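
The constructor above reads the host from cfg['elastic']['url'] and the port from cfg['elastic']['port']. A minimal sketch of producing such a hash from YAML follows; only the key names come from the code above, while the YAML layout and the localhost:9200 values are assumptions based on the curl check earlier in the article.

```ruby
require 'yaml'

# Hypothetical config contents; only the key names come from the article's code.
CONFIG_YAML = <<~YAML
  elastic:
    url: localhost
    port: 9200
YAML

cfg = YAML.safe_load(CONFIG_YAML)

puts cfg['elastic']['url']  # localhost
puts cfg['elastic']['port'] # 9200
```

In a real application the YAML would live in a file and the resulting hash would be passed to Services::Percolation.new.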

 

Create an index

 

Create an index with two mappings:

 


def create_index(index_name)
  @server.indices.create index: index_name, body: {
    mappings: {
      # Mapping for the documents that will be percolated.
      doctype: {
        properties: {
          message: {
            type: "text"
          }
        }
      },
      # Mapping for the stored queries.
      queries: {
        properties: {
          query: {
            type: "percolator"
          }
        }
      }
    }
  }
end

 

The doctype mapping is used to pre-process the document supplied in the percolate query before it is indexed into a temporary in-memory index.

 

The queries mapping is used for indexing the query documents. The query field holds a JSON object that represents an actual Elasticsearch query. The field is configured with the percolator field type, which understands the query DSL and stores the query in such a way that it can later be matched against the documents supplied in a percolate query.

 

Index a query

 

Register a query in the percolator:

 


def index(ds, index_name)
  # The stored "document" is itself a query: match messages containing ds.
  query = { query: { match: { message: ds.to_s } } }

  begin
    r = @server.index index: index_name, type: 'queries', id: ds, body: query
    puts 'Indexing result:'
    puts r.inspect
  rescue Faraday::Error::ResourceNotFound,
         Faraday::Error::ClientError,
         Faraday::Error::ConnectionFailed => e
    puts "Connection failed: #{e}"
    false
  end
end

 

Percolate a document

 

Match a document to the registered percolator queries:

 


def list_document(index_name = 'percolator-index')
  # Crude wait so that freshly indexed queries become searchable after a refresh.
  sleep 2

  doc = {
    query: {
      percolate: {
        field: "query",
        document_type: "doctype",
        document: { message: 'message foo bar' }
      }
    }
  }

  data = @server.search index: index_name, type: 'queries', body: doc
  puts "final result"
  puts data
end
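
The method above hard-codes the document being percolated. As a small sketch, the request body could instead be built from any message; percolate_body is a hypothetical helper name, and the field and type names simply mirror the mapping created earlier.

```ruby
require 'json'

# Hypothetical helper: builds the percolate request body for a given message.
# 'query' and 'doctype' mirror the mapping created in create_index.
def percolate_body(message)
  {
    query: {
      percolate: {
        field: 'query',
        document_type: 'doctype',
        document: { message: message }
      }
    }
  }
end

puts JSON.generate(percolate_body('message foo bar'))
```

The resulting hash could be passed as the body: argument of the search call in place of the inline doc hash.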

 

The above request yields the following response:

 


{"took"=>8, "timed_out"=>false, "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0}, "hits"=>{"total"=>2, "max_score"=>0.25316024, "hits"=>[{"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"foo", "_score"=>0.25316024, "_source"=>{"query"=>{"match"=>{"message"=>"foo"}}}}, {"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"bar", "_score"=>0.25316024, "_source"=>{"query"=>{"match"=>{"message"=>"bar"}}}}]}}

 

The matching queries in this response can then be used in whichever way you need, for example to decide which users to notify.
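
A minimal sketch of that last step, assuming a response hash shaped like the one above: matched_query_ids is a hypothetical helper that collects the _id of every hit, which in this example is the term each query was registered under.

```ruby
# Hypothetical helper: pull the ids of the matching stored queries out of a
# search response hash shaped like the one shown above.
def matched_query_ids(response)
  response.fetch('hits', {}).fetch('hits', []).map { |hit| hit['_id'] }
end

# Trimmed copy of the response above, keeping only the fields used here.
response = {
  'hits' => {
    'total' => 2,
    'hits' => [
      { '_index' => 'percolator-index', '_id' => 'foo' },
      { '_index' => 'percolator-index', '_id' => 'bar' }
    ]
  }
}

p matched_query_ids(response) # => ["foo", "bar"]
```

If the query ids were user ids rather than terms, the returned list would be exactly the set of users to notify.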

 

This is a sample implementation of the Elasticsearch percolator query in Ruby; as mentioned above, the percolator has many other uses. To learn more, see the Elasticsearch Percolator documentation.

 

To check out this particular example, please see the agiratech GitHub repo.