Installing and Using Elasticsearch Plugins


What are Elasticsearch Plugins?
Elasticsearch is an open source, scalable search engine. Although Elasticsearch supports a large number of features out-of-the-box, it can also be extended with a variety of plugins to provide advanced analytics and process different data types.
This guide shows how to install the following Elasticsearch plugins and interact with them using the Elasticsearch API:
- ingest-attachment: allows Elasticsearch to index and search base64-encoded documents in formats such as RTF, PDF, and PPT.
- analysis-phonetic: identifies search results that sound similar to the search term.
- ingest-geoip: adds location information to indexed documents based on any IP addresses within the document.
- ingest-user-agent: parses the User-Agent header of HTTP requests to provide identifying information about the client that sent each request.
Note: Commands in this guide that require elevated privileges are prefixed with sudo. If you're not familiar with the sudo command, you can check our Users and Groups guide.
Before You Begin
If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with Linode and Creating a Compute Instance guides.
Follow our Setting Up and Securing a Compute Instance guide to update your system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
Installation
Java
As of this writing, Elasticsearch requires Java 8.
OpenJDK 8 is available from the official repositories. Install the headless OpenJDK 8 package:
sudo apt install openjdk-8-jre-headless
Confirm that Java is installed:
java -version
The output should be similar to:
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Elasticsearch
Install the official Elastic APT package signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Install the apt-transport-https package, which is required to retrieve deb packages served over HTTPS:
sudo apt-get install apt-transport-https
Add the APT repository information to your server's list of sources:
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic.list
Update the list of available packages:
sudo apt-get update
Install the elasticsearch package:
sudo apt-get install -y elasticsearch
Set the JVM heap size to approximately half of your server's available memory. For example, if your server has 1GB of RAM, change the Xms and Xmx values in the /etc/elasticsearch/jvm.options file to 512m. Leave the other values in this file unchanged:
File: /etc/elasticsearch/jvm.options
-Xms512m
-Xmx512m
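The "half of available memory" rule can also be worked out from the shell. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the half_heap_mb function name is ours and is not part of Elasticsearch:

```shell
# Hypothetical helper: given total memory in kB, print half of it in MB,
# formatted as a JVM heap size value such as 512m.
half_heap_mb() {
  echo "$(( $1 / 1024 / 2 ))m"
}

# Read the total memory (in kB) from /proc/meminfo, if it is available.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  if [ -n "$total_kb" ]; then
    heap=$(half_heap_mb "$total_kb")
    echo "Suggested jvm.options values: -Xms$heap -Xmx$heap"
  fi
fi
```

The kernel reports slightly less memory than the nominal size of the server, so treat the printed value as a starting point and round it to a convenient figure.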
Enable and start the elasticsearch service:
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Wait a few moments for the service to start, then confirm that the Elasticsearch API is available:
curl localhost:9200
The Elasticsearch REST API should return a JSON response similar to the following:
{
  "name" : "Sch1T0D",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "MH6WKAm0Qz2r8jFK-TcbNg",
  "version" : {
    "number" : "6.1.1",
    "build_hash" : "bd92e7f",
    "build_date" : "2017-12-17T20:23:25.338Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
To determine whether or not the service has started successfully, view the most recent logs:
systemctl status elasticsearch
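Because the service can take a little while to begin answering requests, a script that continues with later steps may want to poll the API rather than sleep for a fixed time. This is a rough sketch; the wait_for_es name, its arguments, and its default values are our own assumptions:

```shell
# Hypothetical helper: poll URL until it answers, up to ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 otherwise.
wait_for_es() {
  url=${1:-http://localhost:9200}
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output, -f makes curl fail on HTTP error codes.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
# wait_for_es http://localhost:9200 30 2 && echo "Elasticsearch is up"
```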
You are now ready to install and use Elasticsearch plugins.
Elasticsearch Plugins
The remainder of this guide will walk through several plugins and common use cases. Many of the following steps will involve communicating with the Elasticsearch API. For example, in order to index a sample document into Elasticsearch, a POST request with a JSON payload must be sent to /{index name}/{type}/{document id}:
POST /exampleindex/doc/1
{
"message": "this the value for the message field"
}
There are a number of tools that can be used to issue this request. The simplest approach would be to use curl from the command line:
curl -H'Content-Type: application/json' -XPOST localhost:9200/exampleindex/doc/1 -d '{ "message": "this the value for the message field" }'
Other alternatives include the vim-rest-console, the Emacs plugin es-mode, or the Console plugin for Kibana. Use whichever tool is most convenient for you.
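If you stay with curl, a small wrapper keeps the commands close to the METHOD /path notation used in the rest of this guide. This is only a sketch: the es function and the ES_URL variable are names we made up, and the node is assumed to listen on localhost:9200 unless ES_URL says otherwise:

```shell
# Hypothetical helper: send a request written as METHOD /path [JSON body].
# ES_URL is an assumed environment variable; it defaults to a local node.
es() {
  if [ "$#" -ge 3 ]; then
    curl -sS -H 'Content-Type: application/json' -X "$1" "${ES_URL:-http://localhost:9200}$2" -d "$3"
  else
    curl -sS -X "$1" "${ES_URL:-http://localhost:9200}$2"
  fi
}

# Usage, equivalent to the curl command shown above:
# es POST /exampleindex/doc/1 '{ "message": "this the value for the message field" }'
# es GET /_cat/plugins
```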
Prepare an Index
Before installing any plugins, create a test index.
Create an index named test with one shard and no replicas. Index creation uses the PUT method:
PUT /test
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    }
  }
}
Note: These settings are suitable for testing, but additional shards and replicas should be used in a production environment.
Add an example document to the index:
POST /test/doc/1
{
  "message": "this is an example document"
}
Searches can be performed by using the _search URL endpoint. Search for "example" in the message field across all documents:
POST /_search
{
  "query": {
    "terms": {
      "message": ["example"]
    }
  }
}
The Elasticsearch API should return the matching document.
Elasticsearch Attachment Plugin
The attachment plugin lets Elasticsearch accept a base64-encoded document and index its contents for easy searching. This is useful for searching PDF or rich text documents with minimal overhead.
Install the ingest-attachment plugin using the elasticsearch-plugin tool:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm that the plugin is installed as expected by using the _cat API:
GET /_cat/plugins
The ingest-attachment plugin should be under the list of installed plugins.
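When scripting an installation, the same check can be automated. The sketch below assumes the default _cat/plugins layout of one line per node and plugin, with the plugin name in the second column; the has_plugin function is a hypothetical helper of ours:

```shell
# Hypothetical helper: read _cat/plugins output on stdin and succeed only if
# the named plugin appears in the second (component) column.
has_plugin() {
  awk -v plugin="$1" '$2 == plugin { found = 1 } END { exit !found }'
}

# Usage:
# curl -s localhost:9200/_cat/plugins | has_plugin ingest-attachment && echo "installed"
```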
In order to use the attachment plugin, a pipeline must be used to process base64-encoded data in the field of a document. An ingest pipeline is a way of performing additional steps when indexing a document in Elasticsearch. While Elasticsearch comes pre-installed with some pipeline processors (which can perform actions such as removing or adding fields), the attachment plugin installs an additional processor that can be used when defining a pipeline.
Create a pipeline called doc-parser which takes data from a field called encoded_doc and executes the attachment processor on the field:
PUT /_ingest/pipeline/doc-parser
{
  "description" : "Extract text from base-64 encoded documents",
  "processors" : [
    {
      "attachment" : {
        "field" : "encoded_doc"
      }
    }
  ]
}
The doc-parser pipeline can now be specified when indexing documents to extract data from the encoded_doc field.
Note: By default, the attachment processor will create a new field called attachment with the parsed content of the target field. See the attachment processor documentation for additional information.
Index an example RTF (rich-text formatted) document. The following string is a base64-encoded RTF document containing the text "Hello from inside of a rich text RTF document", which we would like to search:
e1xydGYxXGFuc2kKSGVsbG8gZnJvbSBpbnNpZGUgb2YgYSByaWNoIHRleHQgUlRGIGRvY3VtZW50LgpccGFyIH0K
Add this document to the test index, using the ?pipeline=doc-parser parameter to specify the new pipeline:
PUT /test/doc/rtf?pipeline=doc-parser
{
  "encoded_doc": "e1xydGYxXGFuc2kKSGVsbG8gZnJvbSBpbnNpZGUgb2YgYSByaWNoIHRleHQgUlRGIGRvY3VtZW50LgpccGFyIH0K"
}
Search for the term "rich", which should return the indexed document:
POST /_search
{
  "query": {
    "terms": {
      "attachment.content": ["rich"]
    }
  }
}
This technique may be used to index and search other document types including PDF, PPT, and XLS. See the Apache Tika Project (which provides the underlying text extraction implementation) for additional supported file formats.
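To index a file of your own, its contents first have to be base64-encoded and placed in the encoded_doc field. One possible way to do that from the shell is sketched below; the attachment_body function and the example.rtf file name are placeholders of ours, and the base64 and tr utilities are assumed to be installed:

```shell
# Hypothetical helper: print a JSON body that carries the base64-encoded
# contents of FILE in the encoded_doc field read by the doc-parser pipeline.
attachment_body() {
  # tr removes the line wrapping that some base64 implementations add.
  printf '{ "encoded_doc": "%s" }\n' "$(base64 < "$1" | tr -d '\n')"
}

# Usage (example.rtf is a placeholder file name):
# curl -H 'Content-Type: application/json' -XPUT \
#   'localhost:9200/test/doc/rtf?pipeline=doc-parser' \
#   -d "$(attachment_body example.rtf)"
```

Base64 output contains no characters that need escaping in JSON, which is why the encoded text can be placed between the quotes directly.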
Phonetic Analysis Plugin
Elasticsearch excels when analyzing textual data. Several analyzers come bundled with Elasticsearch, and plugins can add further analysis capabilities.
One such addition is the Phonetic Analysis plugin, which makes it possible to search for terms that sound similar to other words.
Install the analysis-phonetic plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-phonetic
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm that the plugin has been successfully installed:
GET /_cat/plugins
In order to use this plugin, the following changes must be made to the test index:
- A filter must be created. This filter will be used to process the tokens that are created for fields of an indexed document.
- This filter will be used by an analyzer. An analyzer determines how a field is tokenized and how those tokenized items are processed by filters.
- Finally, we will configure the test index to use this analyzer for a field in the index with a mapping.
An index must be closed before analyzers and filters can be added.
Close the test index:
POST /test/_close
Define the analyzer and filter for the test index under the _settings API:
PUT /test/_settings
{
  "analysis": {
    "analyzer": {
      "my_phonetic_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "standard",
          "lowercase",
          "my_phonetic_filter"
        ]
      }
    },
    "filter": {
      "my_phonetic_filter": {
        "type": "phonetic",
        "encoder": "metaphone",
        "replace": false
      }
    }
  }
}
Re-open the index to enable searching and indexing:
POST /test/_open
Define a mapping for a field named phonetic which will use the my_phonetic_analyzer analyzer:
POST /test/_mapping/doc
{
  "properties": {
    "phonetic": {
      "type": "text",
      "analyzer": "my_phonetic_analyzer"
    }
  }
}
Index a document with a JSON field called phonetic with content that should be passed through the phonetic analyzer:
POST /test/doc
{
  "phonetic": "black leather ottoman"
}
Perform a match search for the term "ottoman". However, instead of spelling the term correctly, misspell the word such that the misspelled word is phonetically similar:
POST /_search
{
  "query": {
    "match": {
      "phonetic": "otomen"
    }
  }
}
The phonetic analysis plugin should be able to recognize that "otomen" and "ottoman" are phonetically similar, and return the correct result.
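To see why the misspelled term matches, it can help to look at the tokens the analyzer produces, using the _analyze API on the test index. The following sketch assumes a node on localhost:9200; phonetic_tokens is a hypothetical helper name of ours:

```shell
# Hypothetical helper: show the tokens my_phonetic_analyzer emits for TEXT.
# TEXT must not contain double quotes or backslashes, because this sketch
# does no JSON escaping.
phonetic_tokens() {
  curl -sS -H 'Content-Type: application/json' \
    -X POST 'localhost:9200/test/_analyze?pretty' \
    -d "{ \"analyzer\": \"my_phonetic_analyzer\", \"text\": \"$1\" }"
}

# Usage: compare the tokens for the correct and the misspelled term.
# phonetic_tokens ottoman
# phonetic_tokens otomen
```

If the two responses share an encoded token, a match query on the phonetic field should treat the two spellings as equivalent.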
Geoip Processor Plugin
When indexing documents such as log files, some fields may contain IP addresses. The Geoip plugin can process IP addresses in order to enrich documents with location data.
Install the plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm the plugin is installed by checking the API:
GET /_cat/plugins
As with the ingest-attachment pipeline plugin, the ingest-geoip plugin is used as a processor within an ingest pipeline. The Geoip plugin documentation outlines the available settings when creating processors within a pipeline.
Create a pipeline called parse-ip which consumes an IP address from a field called ip and creates regional information underneath the default field (geoip):
PUT /_ingest/pipeline/parse-ip
{
  "description" : "Geolocate an IP address",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
Add a mapping to the index to indicate that the ip field should be stored as an IP address in the underlying storage engine:
POST /test/_mapping/doc
{
  "properties": {
    "ip": {
      "type": "ip"
    }
  }
}
Index a document with the ip field set to an example address and pass the pipeline=parse-ip parameter in the request to use the parse-ip pipeline to process the document:
PUT /test/doc/ipexample?pipeline=parse-ip
{
  "ip": "8.8.8.8"
}
Retrieve the document to view the fields created by the pipeline:
GET /test/doc/ipexample
The response should include a geoip JSON key with fields such as city_name derived from the source IP address. The plugin should correctly determine that the IP address is located in California.
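When only the location data is of interest, the response can be trimmed with the filter_path request parameter. The sketch below assumes a node on localhost:9200; show_geoip is a hypothetical helper name of ours:

```shell
# Hypothetical helper: fetch document ID from the test index and keep only
# the geoip object from its source in the response.
show_geoip() {
  curl -sS "localhost:9200/test/doc/$1?pretty&filter_path=_source.geoip"
}

# Usage:
# show_geoip ipexample
```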
User Agent Processor Plugin
A common use case for Elasticsearch is to index log files. By parsing certain fields from web server access logs, requests can be more effectively searched by response code, URL, and more. The ingest-user-agent plugin adds the ability to parse the contents of the User-Agent header of web requests, creating additional fields that identify the client platform that performed the request.
Install the plugin:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-user-agent
Restart Elasticsearch:
sudo systemctl restart elasticsearch
Confirm the plugin is installed:
GET /_cat/plugins
Create an ingest pipeline which instructs Elasticsearch which field to reference when parsing a user agent string:
PUT /_ingest/pipeline/useragent
{
  "description" : "Parse User-Agent content",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}
Index a document with the agent field set to an example User-Agent string:
PUT /test/doc/agentexample?pipeline=useragent
{
  "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
}
Retrieve the document to view the fields created by the pipeline:
GET /test/doc/agentexample
The indexed document will include user data underneath the user_agent JSON key. The User Agent plugin understands a variety of User-Agent strings and can reliably parse User-Agent fields from access logs generated by web servers such as Apache and NGINX.
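To try other User-Agent strings without indexing a document each time, the pipeline can be exercised through the ingest _simulate endpoint. This is a sketch under the assumption that the node listens on localhost:9200; simulate_agent is a hypothetical helper name of ours:

```shell
# Hypothetical helper: run a User-Agent string through the useragent pipeline
# without indexing anything. The string must not contain double quotes or
# backslashes, because this sketch does no JSON escaping.
simulate_agent() {
  curl -sS -H 'Content-Type: application/json' \
    -X POST 'localhost:9200/_ingest/pipeline/useragent/_simulate?pretty' \
    -d "{ \"docs\": [ { \"_source\": { \"agent\": \"$1\" } } ] }"
}

# Usage:
# simulate_agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
```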
Conclusion
The plugins covered in this tutorial are a small subset of those available from Elastic or written by third parties. For additional resources regarding Elasticsearch and plugin use, see the links in the More Information section below.
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.
