
Add search features to your application, try Elasticsearch
part 1 : the concepts

by Louis Gueye
Hi all,
The ultimate goal of a search engine is to provide fast, reliable, easy-to-use, scalable search features to an application.
But before diving into complex technical considerations, we should ask ourselves why we should bother with a search
engine at all.
1 Why should my team bother with a new complex component ?
Adding a new component requires new dev/integration/ops skills. The learning curve might be steep. The
configuration in testing mode can be a real nightmare to set up. So why introduce such risk and complexity
into a project ?
One day or another, anyone familiar with databases has added some contains semantics to search features. You
end up writing queries like this:
select *
from table0 t0
left join table1 t1 on t0.fk = t1.id
left join table2 t2 on t1.fk = t2.id
left join table3 t3 on t2.fk = t3.id
where (t0.title like '%term%'
or t1.description like '%term%')
and t3.created > '2011-01-12'
and t3.status not in ('archived', 'suspended', 'canceled')

The above query will perform slower and slower as your amount of data grows, because the time-consuming parts
of the query don't use the optimized path: indices. Moreover, as the requirements evolve, building your query
will become more of a nightmare than a pleasure.
As a rule of thumb, whenever the time spent waiting for the results of a complex search is no longer acceptable, you're left
with 2 choices:
optimize your request: make sure you use the most optimized path,
use a search engine: highly optimized for reading and searching because it indexes almost everything (not true
for a database, which emphasizes relations and structure).
2 How does it work ?
The principle of a full-text search engine is based on indexing documents: first index documents, then search in
those documents.
A document is a succession of words stored in sections/paragraphs. An analogy with a database could be: a table
for a document, a field for a section. Words are tokens.
Indexing is the process of analysing a document and storing the result of that analysis for later retrieval.
Analysing is the process of extracting tokens from a field, counting occurrences (which are valuable for
relevance) and associating them with a unique path in the document.
Not all tokens are relevant to search; some are so common that they are ignored. Indexers use analyzers that can
ignore such tokens.
Not all fields are analyzed. For instance a unique reference like an ISBN should not be analyzed.
All these settings can be configured in a mapping.
You can search in the same type of document or in all types of documents. The latter use case, though less
intuitive, can be a great time saver when it comes to building cross-cutting information like statistics.
Keep in mind: first write a document definition, set up your engine with that definition, index documents
(tokenize, store), then search.
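The tokenize/store/search loop described above can be sketched as a toy inverted index in plain Java (no Lucene; the stop-word list and the tokenizer here are simplistic placeholders of mine, not what a real analyzer does):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: token -> set of document ids.
public class ToyIndex {

    private static final Set<String> STOP_WORDS = Set.of("a", "the", "in", "of");

    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Analyze: lower-case, split on non-alphanumerics, drop stop words.
    static List<String> analyze(final String text) {
        final List<String> tokens = new ArrayList<>();
        for (final String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    // Index: associate every token of the document with the document id.
    void index(final int docId, final String text) {
        for (final String token : analyze(text)) {
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: return the ids of the documents containing the term.
    Set<Integer> search(final String term) {
        return index.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(final String[] args) {
        final ToyIndex idx = new ToyIndex();
        idx.index(1, "Excellent condition, barely used");
        idx.index(2, "Sold in the original condition");
        System.out.println(idx.search("condition")); // prints [1, 2]
    }
}
```

A real engine adds on top of this the occurrence counts, field paths and scoring mentioned above, but the two phases (analyze-and-store, then look up tokens) are the same.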
3 Which tool does the job ?
So far so good: I understand the concepts but don't know which tool does the job.
Before choosing a tool, to avoid getting lost in a world we're not familiar with, let's write down our requirements.
the tool should integrate seamlessly with either Java or HTTP (because HTTP is a great interface).
the tool should be easy to install: a Debian package would be awesome.
the tool should be easy to configure: declarative settings would be much appreciated.
the tool should provide comprehensive documentation that allows one to get familiar with the concepts first,
then the practice.
the tool should provide a comprehensive integration/acceptance test suite that will serve as a learning tool.
the obvious ones: fast at runtime and the lowest possible memory footprint.
While you have Whoosh in Python and Zend Lucene in PHP, you have Solr, Elasticsearch and Hibernate Search in
Java.
They all rely on Lucene, are written in Java, and 2 of them (Elasticsearch and Solr) offer an HTTP interface to index
and search. Lucene is a very advanced and mature project and the amount of work around it is huge. But Lucene
mainly focuses on the very technical details of parsing and analysing text. It focuses on providing a fast
searcher, a reliable indexer and low-level features like custom analysers and synonyms, all the plumbing
that keeps one from focusing on the business search requirements. The other projects take advantage of that core and
offer higher-level features around it, like remoting (REST/HTTP), declarative configuration, scaling (clusters,
etc.).
I went for Elasticsearch because it offers in-memory nodes, which are valuable when testing in embedded mode.
In addition, REST is the preferred way to instrument Elasticsearch. I really like that idea because I believe that
HTTP is a hell of an interface. Moreover, the REST API is really simple.
I can't do a comparative study; I can just explain why I was attracted by Elasticsearch.
I think we're good on the concepts. This article is the first in a series of 4: Add search features to your
application, try Elasticsearch.
This is the full program:
part 1 : the concepts
part 2 : start small
part 3 : attaching indexation to events
part 4 : search (define a query grammar, parse the query, build the Elasticsearch query, search, build the response)

Add search features to your application, try
Elasticsearch part 2 : start small

Jan 21 by Louis Gueye
Hi,
This article is part of a series which aims to describe how one could integrate Elasticsearch. The previous post
discussed the concepts: why use a search engine?
Not everyone learns the same way. I usually need to understand the theory, and I need to start small.
So I usually start with books. If there is no book I look for a blog that explains the philosophy. Finally I look for
pieces of code (official documentation, blogs, tutorials, GitHub, etc.). Starting small makes me confident and
allows me to increase complexity gradually.
Depending on what you want to know about Elasticsearch you should read different sections of the guide :
SETUP : describes how to install and run Elasticsearch (run as a service).
API : the REST API, which seems more complete than the others. Describes how to inter-operate with
nodes : search, index, check cluster status.
Query DSL : the query API is quite rich. You get explanations of the syntax and semantics of queries and
filters.
Mapping : the mapping configures Elasticsearch when indexing/searching a particular type of document.
Mapping is an important part which deserves special care.
Modules : presents the technical architecture with low-level services like discovery or http.
Index modules : low-level configuration of indices, like sharding and logging.
River : a river is the ability to feed your index from another datasource (pull data every X ms).
Java and Groovy API : if your software already runs in a JVM you can benefit from that and control
Elasticsearch via this API.
To avoid getting lost in the documentation, let's focus on simple goals. We'll implement them progressively:
create a node/client in a test environment
create a node/client in a non-test environment
integrate with Spring
create/delete/check the existence of an index on a node
wait until the node status is ok before operating on it
create/update/delete/find data on an index
create a mapping
1 Admin operations
Operations on indices are admin operations. You can find them in the API section under Indices.
* Create a node/client in a test environment
A node is a process (a member) belonging to a cluster (a group). A builder is responsible for joining, detaching and
configuring the node. When creating a node, sensible defaults are already configured. I didn't dive into the
discovery process and I won't. Creating a node will automatically create its encapsulating cluster. Creating a node
is as simple as :
// settings
private Settings defaultSettings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "test-cluster-" + NetworkUtils.getLocalAddress().getHostName()).build();
// create node
final Node node = NodeBuilder.nodeBuilder().local(true).data(true).settings(defaultSettings).build();
// start node
node.start();
The above code will create a node instance. The node doesn't use the transport layer (tcp), so no rmi, no http, no
network services: everything happens in the JVM.
To operate on a node you must acquire a client from it. Every single operation depends on it:
Client client = node.client();
An invaluable resource on how to set up nodes and clients in a test env is the AbstractNodesTests class.
* Create a node in a non-test environment
In a non-test env, just install Elasticsearch as described in the SETUP section of the documentation. This installation
uses the transport layer (tcp).
There isn't an official Debian package yet, but Nicolas Huray and Damien Hardy contributed one to the project,
which will be integrated into the 0.19 branch. This branch moves from a Gradle build system to
Maven. It will use the jdeb maven plugin to build the Debian package, which will then be available for download on
the Elasticsearch site.
Once installed you should have an Elasticsearch instance up and running with a discovery service listening on
port 54328, an HTTP service listening on 9200 and an inter-node communication port on 9300.
The default cluster name is elasticsearch, but we do not use it, to make sure tests run in isolation.
For more on node and cluster configuration feel free to read this page in the official documentation.
* Integrate with Spring
You can integrate with Spring by creating a FactoryBean which is responsible for the Node/Client construction.
Don't forget to destroy them, as they really are memory consuming (beware of PermGenSpace).
This post, though a bit complex for my needs, was helpful.
If you are interested in that specific part you can take a look at LocalNodeClientFactoryBean.
* Create an index on a node
Once your node is up you can create indices. The main property of an index is its name, which acts like an id. A
node can't have 2 indices with the same name. The index name can't contain special chars like ., /, etc. Keep it
simple.
Client client = node.client();
Then create an index with the adverts id, intended to store adverts :
client.admin().indices().prepareCreate("adverts")
    .execute().actionGet();
Depending on your organization you can choose to create one index per software, one index per stored type or
whatever setting suits you. You just have to maintain the index names.
* Remove an index from a node
As soon as you have the name it is straightforward. You can test for existence before removing:
if (client.admin().indices().prepareExists("adverts")
        .execute().actionGet().exists()) {
    client.admin().indices().prepareDelete("adverts")
        .execute().actionGet();
}
* Wait for cluster health
client.admin().cluster()
    .prepareHealth("adverts").setWaitForYellowStatus()
    .execute().actionGet();
2 Data operations
* Create / Update
client.prepareIndex("adverts", "advert", "1286743")
    .setRefresh(true)
    .setSource(advertToJsonByteArrayConverter.convert(advert))
    .execute().actionGet();
The above code will index data (the source) whose type is advert under the adverts index. It will also
commit (refresh) the index modifications.
The source can be of many types, ranging from a fieldName/value map to a byte array. The byte array is the
preferred way, so I created converters from/to byte[]/Advert.
AdvertToJsonByteArrayConverter (relies on Spring Converter interface)
...
/**
 * @see org.springframework.core.convert.converter.Converter#convert(java.lang.Object)
 */
@Override
public byte[] convert(final Advert source) {

    if (source == null) return null;

    this.jsonMapper.getSerializationConfig()
        .without(SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS);

    try {
        final String string = this.jsonMapper.writeValueAsString(source);
        return string.getBytes("utf-8");
    } catch (final Throwable th) {
        throw new IllegalArgumentException(th);
    }
}
...
Updating means re-indexing, so it's the exact same operation.
* Delete data
When we're done with an object and don't want it to appear in search results any more, we delete it from the
index:
client.prepareDelete("adverts", "advert", "4586321")
    .setRefresh(true).execute().actionGet();
That will delete, then refresh immediately after.
* Find data
The search API is very rich, so you have to understand the search semantics. If you're familiar with Lucene then
everything will seem obvious to you. If you're not, you'll have to get familiar with the basics.
There are 2 main types of search : exact match and full text.
An exact match operates on the field as a whole. The field is considered a term (even if it contains spaces). It
is not analyzed, so querying field=condition will return nothing if the field equals excellent condition. Exact
match suits certain fields very well (id, reference, date, status, etc.) but not all. Exact-match fields can be
sorted.
Full text operates on tokens. The analyzer removes stop words, splits the field into tokens and groups them. The
most relevant result is (roughly) the one that contains the most occurrences of the term.
You obviously can't apply a lexical sort to those fields; they are sorted by score.
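The difference between the two modes can be illustrated with a tiny sketch (plain Java; the helper names and the whitespace tokenizer are mine, not Elasticsearch's — they just mimic a not_analyzed field versus an analyzed one):

```java
import java.util.Arrays;
import java.util.List;

// Illustrates why an exact (term) query misses analyzed values:
// a not_analyzed field holds one whole term, an analyzed field holds tokens.
public class MatchModes {

    // not_analyzed: the whole stored value is a single term.
    static boolean exactMatch(final String fieldValue, final String query) {
        return fieldValue.equals(query);
    }

    // analyzed: the stored value is split into lower-cased tokens.
    static boolean fullTextMatch(final String fieldValue, final String query) {
        final List<String> tokens =
                Arrays.asList(fieldValue.toLowerCase().split("\\W+"));
        return tokens.contains(query.toLowerCase());
    }

    public static void main(final String[] args) {
        final String field = "excellent condition";
        System.out.println(exactMatch(field, "condition"));    // false
        System.out.println(fullTextMatch(field, "condition")); // true
    }
}
```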
Below, an exact match example (it will match adverts with the provided id):
private SearchResponse findById(final Long id) {
    return client.prepareSearch("adverts").setTypes("advert")
        .setQuery(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("_id", id))).execute().actionGet();
}
Below, a full-text example on a single field (it will match adverts whose description field contains the
term condition at least once):
client.prepareSearch("adverts").setTypes("advert")
    .setQuery(QueryBuilders.boolQuery()
        .must(QueryBuilders.queryString("condition")
            .defaultField("description"))).execute().actionGet();
* Create a mapping
The searchable nature of a field is an important design decision and can be configured in the mapping. It
defines, for an indexed type, the indexed fields and, for each field, some interesting properties like its analyzed
nature (analyzed|not_analyzed), its type (long, string, date), etc. Elasticsearch provides a default mapping: string
fields are analyzed, other ones are not.
I really recommend you spend some time on that section. One doesn't necessarily have to design the perfect
mapping the first time (it requires some experience), but the decisions taken in that part will impact the search
results.
Below, an example of mapping:
{
  "advert" : {
    "properties" : {
      "id" : {
        "type" : "long",
        "index" : "not_analyzed"
      },
      "name" : {
        "type" : "string"
      },
      "description" : {
        "type" : "string"
      },
      "email" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "phoneNumber" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "reference" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "address" : {
        "dynamic" : "true",
        "properties" : {
          "streetAddress" : {
            "type" : "string"
          },
          "postalCode" : {
            "type" : "string"
          },
          "city" : {
            "type" : "string"
          },
          "countryCode" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
I gathered those CRUD operations in 2 integration
tests : ElasticSearchDataOperationsTestIT and ElasticSearchAdminOperationsTestIT.
Now that we're familiar with Elasticsearch's basic operations we can move on. We can consider improving the
code.
You'll agree that handling the indexing task manually is an option, but not the most elegant and reliable one.
In the next post we'll discuss the different solutions to automatically index our data.

Add search features to your application, try
Elasticsearch part 3 : attaching indexation
to events

Jan 22 by Louis Gueye
Now that we are able to index, we should think about when we should trigger indexing tasks.
A simple answer would be: whenever some indexed data has changed. Changed means a change of cardinality
(add/remove) or a change to existing data.
Either we invoke indexing tasks whenever we code an action that changes data, or we use an event model which
listens to specific events.
1 JPA event model
If you use JPA as a persistence mechanism you can take advantage of its elegant event mechanism. You can register an
entity's listeners either at class level or at method level via annotations.
One can annotate an entity method as a listener for an event. The method will be executed when the event
defined by the annotation occurs.
If this solution seems too intrusive or too specific, one can externalize this behaviour in a class and annotate the
entity.

Below, an example:
import javax.persistence.Entity;
import javax.persistence.EntityListeners;
import javax.persistence.Id;
import javax.persistence.PostLoad;
import javax.persistence.PostPersist;
import javax.persistence.PostUpdate;
import javax.persistence.PrePersist;
import javax.persistence.PreRemove;
import javax.persistence.PreUpdate;
import javax.persistence.Transient;

@Entity
@EntityListeners({EmployeeDebugListener.class, NameValidator.class})
public class Employee {
@Id private int id;
private String name;
@Transient private long syncTime;

@PostPersist
@PostUpdate
@PostLoad
private void resetSyncTime() {
syncTime = System.currentTimeMillis();
System.out.println("Employee.resetSyncTime called on employee id: " + getId());
}

public long getCachedAge() {
return System.currentTimeMillis() - syncTime;
}

public int getId() {
return id;
}

public void setId(int id) {
this.id = id;
}

public String toString() {
return "Employee id: " + getId() ;
}
}
That model, although very elegant, doesn't suit you if you use Spring, because the persistence provider can't use a bean
instance. It creates its own instances, which totally goes against dependency injection. This post is a rather
complete resource about the JPA-based solution.
2 Hibernate event model
When using Hibernate without JPA, with Spring, you can register instances, not only classes. Your
application can listen to post-insert/post-update/post-delete events. This solution is my favorite one if your
application writes little and reads much.
You specify your listeners by setting the eventListeners property of the LocalSessionFactoryBean. It's a
map which associates an event key with an array of listener instances.
<bean name="sessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
  <property name="dataSource" ref="dataSource"/>
  <property name="mappingLocations" value="classpath:hibernate/mapping/*.xml"/>
  <property name="hibernateProperties">
    <props>
      <prop key="hibernate.dialect">${hibernate.dialect}</prop>
      <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
      <prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
      <prop key="hibernate.connection.useUnicode">true</prop>
      <prop key="hibernate.connection.characterEncoding">UTF-8</prop>
    </props>
  </property>
  <property name="eventListeners">
    <map>
      <entry key="post-insert">
        <ref bean="PostCommitInsertEventListener"/>
      </entry>
      <entry key="post-update">
        <ref bean="PostCommitUpdateEventListener"/>
      </entry>
      <entry key="post-delete">
        <ref bean="PostCommitDeleteEventListener"/>
      </entry>
    </map>
  </property>
</bean>
PostCommitDeleteEventListener source code:
...
import org.hibernate.event.PostDeleteEvent;
import org.hibernate.event.PostDeleteEventListener;
import org.springframework.beans.factory.annotation.Autowired;

public class PostCommitDeleteEventListener implements PostDeleteEventListener {

    public static final String BEAN_ID = "PostCommitDeleteEventListener";

    @Autowired
    private SearchEngine searchEngine;

    @Override
    public void onPostDelete(PostDeleteEvent event) {
        if (event == null) return;
        Object eventEntity = event.getEntity();
        if (!(eventEntity instanceof Advert)) return;
        Advert advert = (Advert) eventEntity;
        Long id = advert.getId();
        searchEngine.removeFromIndex(id);
    }
}
...
This is one of the least intrusive solutions. It also ensures that even if you add a new business method that
updates the database state, the changes will automatically be reflected in the index. No need to manually call indexing tasks.
3 Spring event model
When you're stuck with JPA you can use the Spring event model. You use ApplicationEventPublisher to publish
a CRUD event, then implement ApplicationListener to react to the event.
Parameterized types (generics) ensure your code will react to one type only, which can be quite convenient: not
reacting to Job events but reacting to Advert events.
That solution is not very resistant to change, because if you forget to trigger an event nothing will happen. It is
equivalent to manually calling indexing tasks, but it is the only one available when using JPA.
Example of ApplicationEventPublisher call:
...
@Autowired
private ApplicationEventPublisher eventPublisher;
/**
* @see org.diveintojee.poc.jbehave.domain.business.Facade#deleteAdvert(java.lang.Long)
*/
@Override
@Transactional(propagation = Propagation.REQUIRED)
public void deleteAdvert(final Long advertId) {
Preconditions.checkArgument(advertId != null,
"Illegal call to deleteAdvert, advert identifier is required");
this.baseDao.delete(Advert.class, advertId);
this.eventPublisher.publishEvent(new PostDeleteAdvertEvent(new Advert(advertId)));
}
...
Example of event listener:
...
/**
* @author louis.gueye@gmail.com
*/
@Component
public class PostDeleteAdvertEventListener implements ApplicationListener<PostDeleteAdvertEvent> {
@Autowired
private SearchEngine searchEngine;
/**
* @see
org.springframework.context.ApplicationListener#onApplicationEvent(org.springframework.context.ApplicationEvent)
*/
@Override
public void onApplicationEvent(PostDeleteAdvertEvent event) {
if (event == null) return;
final Advert entity = event.getSource();
if (entity == null || entity.getId() == null) return;
this.searchEngine.removeFromIndex(Advert.class, entity.getId());
}
}
...
4 Elasticsearch river
An Elasticsearch river is a mechanism which pulls data from a datasource (CouchDB, Twitter, Wikipedia,
RabbitMQ, RSS) on a regular basis (every 500 ms for example) and updates the index based on what has changed since the
last refresh. The idea is really nice but there are too few plugins yet. Elasticsearch provides only 4 river plugins, but
contributions are more than welcome :).
So far we've got familiar with search engine concepts, then we had a first contact
with Elasticsearch, writing CRUD tests.
We just discussed several solutions to trigger indexing. Now that we know how to index data we can finally focus
on the search business, which is what the next post will present.
The source code hasn't moved, it's still on github. Feel free to explore it.

Add search features to your application, try
Elasticsearch part 4 : search

Jan 23 by Louis Gueye
Elasticsearch relies on the Lucene engine.
The good news is that Lucene is really fast and powerful. Yet it's not a good idea to expose such power to the
user. Elasticsearch acts as a first filter but remains quite complete.
When you don't master an API, a good practice is to keep control over what you expose to the user. But this
comes at a cost; you'll have to:
implement a query language
implement a language parser
implement a query translator (translate into something understandable by Elasticsearch)
run the search
translate the Elasticsearch results into a custom structure
The task seems daunting, but no worries: we're going to take a look at each step.
1 The query language
Once you've delimited the perimeter, it's simpler. I imagined something like:

http://domain/search/adverts?query=reference:REF-TTTT111gg4!description~condition legal!created lt 2009&from=2&itemsPerPage=10&sort=created+desc

query := (Clause!)* ;
Clause := (Field Operator Value)* | (Value)+ ;
Field := ? fieldname without space ? ;
Operator := (:|~|lt|gt|lte|gte) ;
Value := ? anything form-url-encoded ? ;

The query param is optional; if not specified, the default search returns all elements.
The from param is optional; if not specified, the 1st page is assumed.
The itemsPerPage param is optional; if not specified, a page will contain 10 results.
The sort param is optional; if not specified, the result will be sorted by id desc.
Well, even so, it is not trivial. For the purpose of the POC I simplified my requirements:
I did not use an EBNF parser like ANTLR : parsers deserve their own post.
I did not implement all the operators.
Below, the piece of code used to split clauses:
List<String> extractSearchClauses(final String queryString) {
    if (StringUtils.isEmpty(queryString)) return null;
    final List<String> clauses = Arrays.asList(queryString.split(CLAUSES_SEPARATOR));
    final Collection<String> cleanClauses = Collections2.filter(clauses, new Predicate<String>() {

        /**
         * @see com.google.common.base.Predicate#apply(java.lang.Object)
         */
        @Override
        public boolean apply(final String input) {
            return StringUtils.isNotEmpty(input)
                && !input.trim().equals(SearchOperator.EXACT_MATCH_OPERATOR.toString())
                && !input.trim().equals(SearchOperator.FULL_TEXT_OPERATOR.toString())
                && !input.trim().endsWith(SearchOperator.EXACT_MATCH_OPERATOR.toString())
                && !input.trim().endsWith(SearchOperator.FULL_TEXT_OPERATOR.toString());
        }

    });

    return new ArrayList<String>(cleanClauses);
}
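Splitting yields whole clauses; the next step of the grammar, extracting Field/Operator/Value from a single clause, could be sketched with a small regex (illustrative only — the String[] result shape and the fallback to a bare full-text value are my assumptions, not the POC's actual classes):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract (field, operator, value) from one clause of the grammar above.
public class ClauseParser {

    // Longest operators first so "lte" is not consumed as "lt".
    // Naive: a field name that itself contains an operator would confuse it.
    private static final Pattern CLAUSE =
            Pattern.compile("(\\S+?)\\s*(:|~|lte|gte|lt|gt)\\s*(.+)");

    static String[] parse(final String clause) {
        final Matcher m = CLAUSE.matcher(clause);
        if (!m.matches()) {
            // No field/operator: treat the whole clause as a full-text value.
            return new String[] { null, "~", clause };
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(final String[] args) {
        // prints [reference, :, REF-TTTT111gg4]
        System.out.println(Arrays.toString(parse("reference:REF-TTTT111gg4")));
        System.out.println(Arrays.toString(parse("created lt 2009")));
        System.out.println(Arrays.toString(parse("condition legal")));
    }
}
```

A real implementation (ANTLR or another EBNF parser, as mentioned above) would handle operator/field ambiguities properly; this is only meant to make the grammar concrete.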
2 Translate to the Elasticsearch language
First let's establish a few rules:
an empty clause list means returning all the elements (/adverts?)
a single clause that contains no field and no operator means a full-text search on all searchable fields
(/adverts?q=condition+legal)
multiple clauses mean a boolean AND query between the clauses (/adverts?query=reference:REF-TTTT111gg4!description~condition legal)
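The three rules above can be sketched as a small decision function (the returned strings are illustrative stand-ins for the QueryBuilders calls, not real Elasticsearch syntax, and operator detection is simplified to : and ~):

```java
import java.util.List;

// Sketch of the three translation rules above.
public class QueryPlanner {

    static String plan(final List<String> clauses) {
        // Rule 1: no clause at all -> return everything.
        if (clauses == null || clauses.isEmpty()) return "matchAll()";
        // Rule 2: a single bare value -> full text on all searchable fields.
        if (clauses.size() == 1 && !clauses.get(0).matches(".*[:~].*"))
            return "queryString(\"" + clauses.get(0) + "\").defaultField(\"_all\")";
        // Rule 3: otherwise -> boolean AND between the clauses.
        return "bool().must(" + String.join(").must(", clauses) + ")";
    }

    public static void main(final String[] args) {
        System.out.println(plan(List.of()));                  // matchAll()
        System.out.println(plan(List.of("condition legal"))); // full text on all fields
        System.out.println(plan(List.of("reference:REF-TTTT111gg4",
                                        "description~condition legal")));
    }
}
```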
Elasticsearch comes with a rich search API which encapsulates query building in a collection
of QueryBuilders.
Below, an example of search instructions:
...
((BoolQueryBuilder) queryBuilder).must(queryString(clause.getValue())
    .defaultField(clause.getField()));
...
3 Running the search
The SearchRequestBuilder is an abstraction that encapsulates the well-known search domain, which specifies:
pagination properties (items per page, page number),
the sort specification (fields, sort direction),
the query (built by chaining QueryBuilders).
Once you've configured your SearchRequestBuilder you can run the actual search:
...
final SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
...
4 Transfer results to a custom structure
Ideally, we should return a search result that contains the total hits and pagination results (previous, next, first,
last). That is the only information needed by the user.
But remember : the index stores a json byte array (not mandatory, but I chose it because I build RESTful
services), not an object. We have to re-build our object from its JSON representation.
Again, writing a Converter really helps.
I did not implement pagination, as it's another entire concern : building a RESTful search response that
respects HATEOAS principles. I'll blog about that later.
Example of Converter invocation:
...
final SearchResult result = this.searchResponseToSearchResultConverter.convert(searchResponse);
return result;
...
And below, the Converter source (I could have used a transform function):
...
public SearchResult convert(final SearchResponse source) {
    final SearchResult result = new SearchResult();
    final SearchHits hits = source.getHits();
    result.setTotalHits(hits.getTotalHits());
    for (final SearchHit searchHit : hits.getHits()) {
        final Advert advert = jsonByteArrayToAdvertConverter.convert(searchHit.source());
        result.addItem(advert);
    }
    return result;
}
...
This post closes a series of 4 on a first contact with Elasticsearch.
We discussed the concepts, because before designing anything we wanted to get familiar with our new tool.
Once more comfortable with Elasticsearch, we started serious work: attaching indexing tasks to application
events first, then building a simple search endpoint that uses Elasticsearch under the hood.
I can't say that I have totally adopted the tool, because there is still a lot to validate:
searches : I don't know all the specifics/semantics/differences between all the pre-defined QueryBuilders.
facets : how do they work in Elasticsearch ?
I always heard it is insanely fast with high volumes. I want to see it with my own eyes.
JPA was disappointing (not an Elasticsearch problem) : maybe I could use CDI.
I still have to figure out how to cleanly set up the different client instantiation modes : memory and transport
clients. Using Spring profiles is a solution but I'm not a big fan of profiles.
I wish I could test a MySQL river. I'd like to compare the river to the events mechanism.
I try not to be too exalted, but I have to say it's a real pleasure once you've got past the first pitfalls, mostly related to:
node/client management: like jdbc connections, you're responsible for opening/closing your
resources, otherwise you may have unexpected side effects,
mapping design: analyzed and not_analyzed properties have a huge impact on your search,
and blindness: in-memory testing is for experienced users who already know the API. I would suggest a real
time-saving tool : Elasticsearch Head. This tool helped us understand how data was organized/stored, what data
was currently in the index, whether it was correctly deleted, etc. The price to pay: it only works with the transport
configuration, not in-memory.
Anyway, I hope you enjoyed the reading. If so, feel free to share. If not, let me know why (I might have some
inaccurate information): that's how we learn.
The full source is on github.
Run the following command to launch the JBehave search stories:
mvn clean verify -Psearch


Reference:
http://deepintojee.wordpress.com/2012/01/20/add-search-features-to-your-application-try-elasticsearch-part-1-the-concepts/
http://deepintojee.wordpress.com/2012/01/21/add-search-features-to-your-application-try-elasticsearch-part-2-start-small/
http://deepintojee.wordpress.com/2012/01/22/add-search-features-to-your-application-try-elasticsearch-part-3-attaching-indexation-to-events/
http://deepintojee.wordpress.com/2012/01/23/add-search-features-to-your-application-try-elasticsearch-part-4-search/