Sunday, July 21, 2013

ElasticSearch quick start

At the beginning of July, I attended an ElasticSearch workshop organized by Xebia. It was a great event that gave me the opportunity to learn the basics of this tool. If you have never heard of ElasticSearch, go to http://www.elasticsearch.org. In short, it is a NoSQL search engine based on Apache Lucene.

Today I would like to show you how easy it is to get started with this tool.

First launch 

  1. Download an ElasticSearch archive from http://www.elasticsearch.org/download/
  2. Extract the archive
  3. Launch ElasticSearch in the foreground:
$ ./bin/elasticsearch -f
[2013-07-12 17:09:08,776][INFO ][node     ] [Centurius] {0.90.2}[2741]: initializing ...
[2013-07-12 17:09:08,785][INFO ][plugins  ] [Centurius] loaded [], sites []
[2013-07-12 17:09:11,253][INFO ][node     ] [Centurius] {0.90.2}[2741]: initialized
[2013-07-12 17:09:11,253][INFO ][node     ] [Centurius] {0.90.2}[2741]: starting ...
[2013-07-12 17:09:11,372][INFO ][transport] [Centurius] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[myComputer/127.0.0.1:9300]}
[2013-07-12 17:09:14,470][INFO ][cluster.service] [Centurius] new_master [Centurius][cuT7Cb_eQbaHyQ9fW9aCng][inet[myComputer.local/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2013-07-12 17:09:14,514][INFO ][discovery] [Centurius] elasticsearch/cuT7Cb_eQbaHyQ9fW9aCng
[2013-07-12 17:09:14,542][INFO ][http     ] [Centurius] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[myComputer.local/127.0.0.1:9200]}
[2013-07-12 17:09:14,543][INFO ][node     ] [Centurius] {0.90.2}[2741]: started
[2013-07-12 17:09:14,595][INFO ][gateway  ] [Centurius] recovered [0] indices into cluster_state

ElasticSearch is now running! You can check that by going to http://localhost:9200 in your web browser. Centurius is the name of the launched instance. You can override it by changing the property node.name in the elasticsearch.yml file.

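You should only need to change the configuration if you encounter a problem. The default configuration is good enough to get a server up and running; before taking it to production, review at least cluster.name, since nodes with the same cluster name on the same network join the same cluster (see the clustering section).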

Basic queries

ElasticSearch provides a RESTful API to create, search, or delete documents: you create documents with an HTTP POST or PUT request, search them with a GET request, and delete them with a DELETE request.

Create a document


I want to create a simple database containing e-commerce products. Each document is created in a specific index under a specific type. For those coming from the SQL world like me, the index is similar to a SQL database whereas the type is similar to a SQL table. To store my product documents, I call the index “catalog” and the type “product”. The document content must be JSON.

To create a document, use the following syntax:

$ curl -XPOST 'http://<host name>/<index>/<type>/' -d '{<your json document>}'

For example:

$ curl -XPOST 'http://localhost:9200/catalog/product/' -d '{ 
"title" : "Ipad 2", "manufacturer" : "Apple", "submissionDate" : "2013-07-12", 
"text" : "tablette tactile" }'
{"ok":true,"_index":"catalog","_type":"product",
"_id":"iZ_SSvyBRbOBYeRHWIjSaA","_version":1} 

Congratulations, you have created your first document. Each document has a unique identifier. Here the identifier, "iZ_SSvyBRbOBYeRHWIjSaA", was generated automatically because I sent a POST request without specifying one. To choose the identifier yourself, send a PUT request to /catalog/product/<id>.
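
The difference between POST and PUT can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: build_index_request is a hypothetical helper, and it assumes a node listening on localhost:9200 as in the commands above. The request is only built, never sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port


def build_index_request(index, doc_type, document, doc_id=None):
    """Build (but do not send) the HTTP request that indexes a document.

    Without doc_id the request is a POST and ElasticSearch generates the id;
    with doc_id it is a PUT to /<index>/<type>/<id>.
    """
    url = f"{BASE_URL}/{index}/{doc_type}/"
    method = "POST"
    if doc_id is not None:
        url += str(doc_id)
        method = "PUT"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method=method)


product = {"title": "Ipad 2", "manufacturer": "Apple",
           "submissionDate": "2013-07-12", "text": "tablette tactile"}

auto_id = build_index_request("catalog", "product", product)
fixed_id = build_index_request("catalog", "product", product, doc_id=1)

print(auto_id.get_method(), auto_id.full_url)
print(fixed_id.get_method(), fixed_id.full_url)
# To actually send one of them: urllib.request.urlopen(auto_id)
```

Passing either object to urllib.request.urlopen would perform the same call as the curl command, provided the server is running.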

List all documents


To check that the document has been correctly created, you can list all the indexed documents with the following command:

$ curl -XGET 'http://localhost:9200/catalog/_search'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"catalog","_type":"product",
"_id":"iZ_SSvyBRbOBYeRHWIjSaA","_score":1.0, "_source" : { "title" : "Ipad 2", 
"manufacturer" : "Apple", "submissionDate" : "2013-07-12", 
"text" : "tablette tactile" }}]}} 

Search documents by term


To search for every document whose title matches "ipad":

$ curl -XGET 'http://localhost:9200/catalog/product/_search' -d '{
   "query" : {
     "match" : {
       "title" : "ipad"
     }
   }
 }'

Note that you can search an entire index (/catalog) or only one type within it (/catalog/product).

Search documents with wildcard query


To search for every document whose title starts with "ip":

$ curl -XGET 'http://localhost:9200/catalog/product/_search' -d '{
   "query" : {
     "wildcard" : {
       "title" : "ip*"
     }
   }
 }'

Search documents with fuzzy query


If the search term may be misspelled, you can use a fuzzy query. For example, the following query finds the products whose title is close to "ipud", such as ipod or ipad:

$ curl -XGET 'http://localhost:9200/catalog/product/_search' -d '{
   "query" : {
     "fuzzy" : {
       "title" : "ipud"
     }
   }
 }'
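
The three searches above differ only in the query type and the value. As a small sketch (search_body is a hypothetical helper, not part of any ElasticSearch client), the request bodies can be generated like this:

```python
import json


def search_body(query_type, field, value):
    """Return the JSON body for a single-field query of the given type."""
    if query_type not in ("match", "wildcard", "fuzzy"):
        raise ValueError(f"unsupported query type: {query_type}")
    return json.dumps({"query": {query_type: {field: value}}})


# The three queries used in this post.
for query_type, value in [("match", "ipad"), ("wildcard", "ip*"), ("fuzzy", "ipud")]:
    print(search_body(query_type, "title", value))
```

Each printed line can be passed as the -d argument of the curl commands shown above.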

Delete a document


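To delete documents, send an HTTP DELETE request. The following command deletes every product whose title matches "Iphone 5" (to delete a single document by its identifier, send the DELETE request to /catalog/product/<id> instead):
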
$ curl -XDELETE 'http://localhost:9200/catalog/product/_query' -d '{
    "match" : {
      "title" : "Iphone 5"
    }
}'

Understand ElasticSearch clustering

The ElasticSearch-Head plugin lets you view the cluster state. To install it:

$ ./bin/plugin -install mobz/elasticsearch-head
-> Installing mobz/elasticsearch-head...
Trying https://github.com/mobz/elasticsearch-head/zipball/master... (assuming 
site plugin)
Downloading ...........DONE
Identified as a _site plugin, moving to _site structure ...
Installed head

Now, without restarting ElasticSearch, you can check the cluster state at http://localhost:9200/_plugin/head





The plugin shows our ElasticSearch instance, Centurius. By default, each index is split into five shards (property index.number_of_shards). Each shard is a Lucene instance automatically managed by ElasticSearch. When you create a document, its id is used to determine which shard stores it.
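
The idea of routing a document to a shard from its id can be sketched as "hash the id, then take the remainder modulo the number of shards". The snippet below is only an illustration under loud assumptions: pick_shard is a hypothetical name, and CRC32 is a stand-in hash chosen because it is deterministic and in the standard library. ElasticSearch uses its own hash function, so the shard numbers printed here will not match a real cluster.

```python
import zlib

NUMBER_OF_SHARDS = 5  # default value of index.number_of_shards


def pick_shard(doc_id, number_of_shards=NUMBER_OF_SHARDS):
    """Map a document id to a shard number in [0, number_of_shards).

    Stand-in hash: CRC32 is NOT what ElasticSearch uses; it only
    illustrates the "hash modulo shard count" principle.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


for doc_id in ["iZ_SSvyBRbOBYeRHWIjSaA", "1", "2", "3"]:
    print(doc_id, "-> shard", pick_shard(doc_id))
```

Because the result depends on the number of shards, the same id would land on a different shard if that number changed, which is why the shard count of an index is chosen when the index is created.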

Now we can launch a second instance of ElasticSearch (run ./bin/elasticsearch -f in a new terminal tab). If you refresh the ElasticSearch-Head page, you will see that a new instance has appeared in the cluster.


The logs of the first instance confirm it:

[2013-07-12 17:35:42,209][INFO ][cluster.service] [Centurius] added 
{[Reaper][Zqma1_5qQLCxbA0JQAO1BA][inet[/192.168.0.37:9301]],}, 
reason: zen-disco-receive(join from node[[Reaper][Zqma1_5qQLCxbA0JQAO1BA]
[inet[/192.168.0.37:9301]]])


The new ElasticSearch instance has been discovered automatically, using multicast (the default discovery mode in this version). Instances on the same network that share the same cluster name join the same cluster. You can change the cluster name of an instance in the file config/elasticsearch.yml, property cluster.name.

Now let's add a third instance of ElasticSearch.






By default, each shard exists in two copies (property index.number_of_replicas, which defaults to 1): one primary shard used for read and write operations, and one replica shard used for read operations.

Here the node Centurius contains the primary shards 1, 3 and 4. The node Noh-Varr contains the primary shards 0 and 2. Finally, the node Reaper contains only replica shards.
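
The arithmetic behind this layout can be checked with a few lines of Python. This is a small sketch using the default values and the primary shard placement described above; the placement of the replicas is not stated in this post, so it is not modeled.

```python
NUMBER_OF_SHARDS = 5      # default index.number_of_shards
NUMBER_OF_REPLICAS = 1    # default index.number_of_replicas

# Primary shards per node, as described in the post.
primaries = {
    "Centurius": {1, 3, 4},
    "Noh-Varr": {0, 2},
    "Reaper": set(),
}

total_copies = NUMBER_OF_SHARDS * (1 + NUMBER_OF_REPLICAS)
primary_count = sum(len(shards) for shards in primaries.values())
replica_count = total_copies - primary_count

# Every shard number must have exactly one primary somewhere in the cluster.
all_primaries = sorted(s for shards in primaries.values() for s in shards)
assert all_primaries == list(range(NUMBER_OF_SHARDS))

print(f"{total_copies} shard copies: {primary_count} primaries, {replica_count} replicas")
# prints "10 shard copies: 5 primaries, 5 replicas"
```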

Conclusion


I hope this post gave you the basics to work with ElasticSearch and to understand its clustering mechanism. Don't hesitate to play with ElasticSearch, change its properties, and try more complicated queries! Besides this post, a good way to get started with this tool and to understand every concept is the ElasticSearch glossary page. Have fun!

Monday, July 1, 2013

Why is it faster to write unit tests?

"We don't have the time to write unit tests", "we write the unit tests because it is mentioned in the definition of done", "we write the unit tests at the end of the sprint"... How many times have you heard, or maybe said, that?

The purpose of this post is to show that writing unit tests, particularly before testing at runtime, can help you be more productive and efficient. To begin, let's compare the development process with and without unit tests.

Without unit tests

How does a developer work without unit tests?
  1. Development phase
    • Write the entire code of the feature
  2. Runtime test phase
    1. Notice a bug
    2. Analyse where the bug comes from
    3. Fix the bug
    4. Check that the fix works
    5. Continue the test phase until all tests pass

With unit tests

Now, how does a developer work with unit tests?
  1. Development phase
    1. Write a unit test
    2. Write the corresponding code
    3. Write another unit test, and repeat until the feature is completely developed
  2. Runtime test phase
    1. Notice a bug
    2. Analyse where the bug comes from
    3. Write a unit test reproducing the bug
    4. Fix the bug
    5. Check that the fix works
    6. Continue the test phase until all tests pass
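
The workflow above can be sketched with Python's unittest module. Everything here is hypothetical and only serves as an illustration: apply_discount is an invented function, and the "bug" is an invented example of something noticed during the runtime test phase.

```python
import unittest


def apply_discount(price, percent):
    """Return price reduced by percent (a value between 0 and 100)."""
    # This check is the fix for the bug reproduced by the second test below.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class ApplyDiscountTest(unittest.TestCase):
    # Written before the code, during the development phase.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    # Added during the runtime test phase to reproduce a bug
    # (a 120% discount produced a negative price) before fixing it.
    def test_rejects_discount_above_100(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)


# Run the suite explicitly so the outcome is available as a value.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

The second test stays in the suite after the fix, so the same bug cannot silently come back.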

Runtime test phase is expensive

The runtime test phase (with or without unit tests) takes a lot of time for several reasons:
  • For each test, you have to prepare a set of data, even for the edge cases
  • You have to launch your application
  • You have to run your test and maybe find a bug
  • For each bug, you have to find where to fix it, possibly using a debugger
  • Then, you have to fix your code
  • Finally, you test again to be sure that your fix is correct. For that, of course, you have to prepare a new set of data

Runtime test phase is longer without unit tests

With unit tests, you find most of the bugs directly during the development phase. Of course some bugs remain, but fewer than if you write no unit tests at all.

So if you have more bugs without unit tests, the runtime test phase will be longer.

Now, without unit tests, how can you be sure that a bug fix hasn't broken a functional case you have already tested? Are you sure all your code is still working? Maybe you should restart the runtime test phase from the beginning? And what about refactoring your code?

With unit tests, you can be very confident that you haven't broken anything, because all the tests you have already written prove it. Plus, you won't be afraid to refactor in order to get cleaner and more understandable code, which will help you work more efficiently on the next developments.
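
As a minimal sketch of this safety net (the cart_total functions are hypothetical examples, not taken from a real project), the same test cases can be run against the code before and after a refactoring:

```python
import unittest


def cart_total_original(items):
    """First version: explicit loop over (unit_price, quantity) pairs."""
    total = 0
    for unit_price, quantity in items:
        total = total + unit_price * quantity
    return total


def cart_total_refactored(items):
    """Refactored version: same behavior, expressed with sum()."""
    return sum(unit_price * quantity for unit_price, quantity in items)


class CartTotalTest(unittest.TestCase):
    # The same cases, including edge cases, run against both versions.
    CASES = [
        ([], 0),
        ([(10, 1)], 10),
        ([(10, 2), (5, 3)], 35),
        ([(10, 0)], 0),
    ]

    def test_both_versions_match_expected_totals(self):
        for implementation in (cart_total_original, cart_total_refactored):
            for items, expected in self.CASES:
                with self.subTest(implementation=implementation.__name__, items=items):
                    self.assertEqual(implementation(items), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CartTotalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("refactoring is safe:", result.wasSuccessful())
```

In a real project only one version of the function exists at a time; the tests written for the first version are simply run again after the refactoring.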

Conclusion

Of course it is faster to write code without writing unit tests. But how much extra time do you spend testing at runtime or fixing bugs in production?

Fewer bugs also mean less stress and more time to work on interesting things.

As a bonus, you write more code and do fewer manual functional tests. That's cool: coding is what you are good at!

What about integration tests?

Integration tests are one answer to the cost of manual tests. Do you still need unit tests if you have integration tests? Is it enough to write only unit tests? These are very interesting questions that I will try to answer in another post. But yes, unit tests and integration tests are complementary.