Elasticsearch Cheat Sheet and Short Examples

A quick, short cheat sheet of Elasticsearch API endpoint calls that take a while to remember.  If I have missed your favourite, or you want to recommend an addition, please do leave a comment.

States


# Show all indices
GET /_cat/indices?v
  
# cluster health state
GET /_cluster/health
  
# Show all nodes
GET /_cat/nodes?v

# Show largest Index. Leverages the _CAT api
curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk8 | grep -Ev 'marvel|kibana'
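
The same ranking can be done without shell pipes. The following is a minimal Python sketch, assuming the output was requested with headers (`_cat/indices?v&bytes=b`) so that columns can be looked up by name rather than by position. The function name `largest_indices` and the sample rows are invented for illustration; real output will have different indices and, on newer releases, extra columns.

```python
def largest_indices(cat_output, exclude=("marvel", "kibana")):
    """Rank indices by store.size from `_cat/indices?v&bytes=b` text output."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("index")
    size_col = header.index("store.size")
    rows = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) != len(header):
            continue  # skip rows that do not report every column
        name = cols[name_col]
        if any(word in name for word in exclude):
            continue  # drop monitoring/dashboard indices
        rows.append((name, int(cols[size_col])))
    # Largest store size first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample output, not taken from a real cluster
SAMPLE = """\
health status index        pri rep docs.count docs.deleted store.size pri.store.size
green  open   factory        5   1         10            0      52000          26000
green  open   .marvel-2015   1   1       9000            0    9000000        4500000
yellow open   my_index       5   1        300            0     780000         780000
"""

print(largest_indices(SAMPLE))
```

Looking columns up by header name avoids depending on the column order, which has changed between Elasticsearch versions.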

Index

Indexing


# Bulk Indexing Example
POST /factory/_bulk
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"swift","make":"suzuki","mark":1,"release_year":"1998-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"swift","make":"suzuki","mark":2,"release_year":"2003-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"baleno","make":"suzuki","mark":1,"release_year":"2000-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"focus","make":"ford","mark":1,"release_year":"2001-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"focus","make":"ford","mark":2,"release_year":"2007-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"rs","make":"ford","mark":2,"release_year":"2011-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"rav4","make":"toyota","mark":3,"release_year":"2009-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"mondeo","make":"ford","mark":2,"release_year":"2007-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"st","make":"ford","mark":1,"release_year":"2007-01-01"}
{"index":{"_index":"factory", "_type":"cars"}}
{"model":"5 series","make":"bmw","mark":3,"release_year":"2009-01-01"}
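
Writing the action/document pairs by hand gets tedious quickly. As a rough sketch, the bulk body can be generated from a list of documents; the helper name `bulk_body` is an assumption of this example, and the `_type` field is kept only because the example above uses it (newer Elasticsearch versions no longer accept types).

```python
import json


def bulk_body(index, doc_type, docs):
    """Build the newline-delimited body for POST /<index>/_bulk."""
    lines = []
    for doc in docs:
        # Action line, followed by the document source on its own line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


cars = [
    {"model": "swift", "make": "suzuki", "mark": 1, "release_year": "1998-01-01"},
    {"model": "focus", "make": "ford", "mark": 2, "release_year": "2007-01-01"},
]
body = bulk_body("factory", "cars", cars)
print(body)
```

Each JSON object must stay on a single line, which is why `json.dumps` is used without an `indent` argument.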

 

Index Management


# Change the number of replicas of an index
PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 4
  }
}
 
# Add a single alias
PUT /lmg_sem_v4/_alias/lmg
 
 
# Move Shard to another node
POST /_cluster/reroute
{
    "commands" : [ {
        "move" :
            {
              "index" : "amg_sem_v12", "shard" : 0,
              "from_node" : "UK-SEARCH-STG-02", "to_node" : "UK-SEARCH-STG-01"
            }
        }
    ]
}


Index Cloning

From Elasticsearch 2.3 you can use the built-in _reindex API to re-index data.


POST /_reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-new-index"
  }
}

 

Cloning with a filter/query


POST /_reindex
{
  "source": {
    "index": "my-index",
    "query": {
      "term": {
        "has-index-cloning-with-filter-on": true
      }
    }
  },
  "dest": {
    "index": "my-new-index"
  }
}
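
Both reindex variants share the same shape, so the request body can be built by one small helper. This is only a sketch; `reindex_body` is a name made up for this example, and the body still has to be sent to `POST /_reindex` with whatever HTTP client you prefer.

```python
import json


def reindex_body(source_index, dest_index, query=None):
    """Build a _reindex request body, optionally restricted by a query."""
    source = {"index": source_index}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


plain = reindex_body("my-index", "my-new-index")
filtered = reindex_body(
    "my-index",
    "my-new-index",
    query={"term": {"has-index-cloning-with-filter-on": True}},
)
print(json.dumps(filtered, indent=2))
```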

 


# Show cluster-wide Recovery state
GET /_recovery?pretty&human
GET /_recovery?pretty&human&active_only=true
 
# show tabular cluster-wide status summary
GET /_cat/recovery?v
 
# Show me all snapshots
GET /_snapshot/_all
 
# Show settings details of snapshot repo "my_backup"
GET /_snapshot/my_backup
 
# Show all snapshot details of repo "my_backup"
GET /_snapshot/my_backup/_all
 
# Delete snapshot "snapshot_2015_09_07-13_50_48" from repo "prod-0009"
DELETE /_snapshot/prod-0009/snapshot_2015_09_07-13_50_48/
 
# Register a repo, skipping verification of permissions on the path location
PUT /_snapshot/prod-0009?verify=false
{
   "type": "fs",
   "settings": {
      "location": "/vagrant/prod-0009",
      "compress": true,
      "max_snapshot_bytes_per_sec": "200000000",
      "max_restore_bytes_per_sec": "500mb"
      }
}
 
# Take Snapshot of just "cmg_sem_v6" index (replace <snapshot_name> with the name of the snapshot to create)
PUT /_snapshot/one-off-repo/<snapshot_name>?wait_for_completion=true
{
  "indices": "cmg_sem_v6",
  "ignore_unavailable": true,
  "include_global_state": false
}
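
The create-snapshot endpoint takes a snapshot name after the repository name. The snapshot names elsewhere in this cheat sheet (for example snapshot_2015_09_07-13_50_48) look timestamp based, so here is a small sketch that generates names in that style. Treating that pattern as a convention is an assumption, and `snapshot_name` and `snapshot_path` are hypothetical helper names.

```python
from datetime import datetime


def snapshot_name(moment):
    """Format a name like snapshot_2015_09_07-13_50_48 from a datetime."""
    return moment.strftime("snapshot_%Y_%m_%d-%H_%M_%S")


def snapshot_path(repo, moment):
    """Build the request path for creating a snapshot in the given repo."""
    return "/_snapshot/{}/{}?wait_for_completion=true".format(repo, snapshot_name(moment))


# A fixed datetime keeps the example reproducible; use datetime.now() in practice
print(snapshot_path("one-off-repo", datetime(2015, 9, 7, 13, 50, 48)))
```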
 
 
# Restore Snapshots of all index + global state
POST /_snapshot/prod-0009/snapshot_2015_09_11-10_23_29/_restore
 
# Restore Snapshots of only "log_river" index
POST /_snapshot/prod-0009/snapshot_2015_09_11-10_23_29/_restore
{
  "indices": "log_river",
  "rename_pattern": "index_pattern",
  "rename_replacement": "restored_pattern",
  "ignore_unavailable": true,
  "include_global_state": false
}
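
The rename_pattern is a regular expression applied to each restored index name, and rename_replacement may refer to capture groups. The sketch below approximates that behaviour in Python so you can try a pattern before running a restore. It is an approximation only: Elasticsearch uses Java regular expressions with `$1` style group references, which this hypothetical `restored_name` helper translates to Python's `\1` style, and edge cases may differ.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Approximate the index name a restore would produce."""
    # Translate Java-style $1 group references into Python-style \1
    python_replacement = re.sub(r"\$(\d+)", r"\\\1", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index)


# The literal pattern from the example does not match "log_river",
# so that index would keep its original name
print(restored_name("log_river", "index_pattern", "restored_pattern"))

# A capturing pattern prefixes every restored index instead
print(restored_name("log_river", "(.+)", "restored_$1"))
```

If the goal is to restore next to an existing index without overwriting it, a capturing pattern such as the second call is usually what you want.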
 
# Speed up Recovery Speed
PUT /_cluster/settings
{
   "persistent": {
      "cluster.routing.allocation.node_concurrent_recoveries": "5",
      "indices.recovery.max_bytes_per_sec": "200mb",
      "indices.recovery.concurrent_streams": 5
   }
}
 

Having trouble with .marvel* index creation?


# You can view the current settings template with :
curl -XGET localhost:9200/_template/marvel
 
# Modify settings with:
PUT /_template/marvel_custom
{
    "order" : 1,
    "template" : ".marvel*",
    "settings" : {
        "number_of_replicas" : 0,
        "number_of_shards" : 5
    }
}
 

 

More here


Top Elasticsearch plugins

These are my must-have Elasticsearch plugins, covering everything from monitoring clusters to moving indices and managing Elasticsearch snapshots.  If you are here you may already know Elasticsearch’s Marvel plugin; in combination with Sense it gives you most of what you need to manage Elasticsearch.  If, like me, you find Marvel and Sense are not enough for your workflow, or you are just curious about what plugins from the Elasticsearch community can offer, please read on.

NOTE: Some of these plugins are no longer maintained, since site plugins were removed from newer versions of Elasticsearch.  I will soon update this list with workarounds and alternatives.  Meanwhile, follow the links to the plugins' landing pages for the latest updates, news and alternatives.

Head

 

Elasticsearch head cluster overview

By default this is the first thing I always install, most times even before the official Elasticsearch Marvel plugin suite.

  • Not polished aesthetically, but it does what it needs to do very well: give a quick snapshot view of the status of your cluster, nodes, shards etc.
  • Handy drop-down buttons to drop an index or alias, or to set an alias.
  • You can also easily write up your own queries and view the results as tables or JSON.
  • Follow/contribute to the project on GitHub: elasticsearch-head

kopf

 

elasticsearch kopf cluster view

Another lovely tool, aimed more at administering your Elasticsearch cluster.  It is very lightweight and covers commonly performed tasks; it is by no means comprehensive, but it is getting better and better.  The best thing it has over HQ is the REST client, which you can use to explore most of what the Elasticsearch API exposes.

Note that it can also be run locally without being installed as a plugin, albeit with some browser limitations:

git clone git://github.com/lmenezes/elasticsearch-kopf.git
cd elasticsearch-kopf
git checkout <a given branch final version>
open _site/index.html


 

WhatsOn

 

elasticsearch whatson

Aimed at Elasticsearch cluster monitoring, visualisation and inspection, with lots of stats.  It can be used without installing, from the Whatson GitHub page.  Most useful in large clusters.

 

ElasticSearch-Hq

 

HQ Elastic clusterhealth

For

  • Performance metrics reports
  • Monitoring
  • Charts
  • Open-source project.
  • Elastic search management tool.
  • Available as a plugin, hosted, or as a self-hosted WAR
  • Query functionality (limited, currently not customisable)
  • Follow/contribute to the project on GitHub – elasticsearch-HQ

For ElasticHQ users usage Stats  go here

 

Big Desk

 

bigdesk cpu

In short, Big Desk pulls data from Elasticsearch via REST API calls and converts it into numerous charts of statistics about your Elasticsearch cluster.

  • can be installed as a plugin
  • download and run locally or
  • run from the web
  • follow/contribute to the project on GitHub-bigdesk

tokenizers

 

 

 

 

 

 

  • Priceless tool for tuning result relevancy.  Exposes an easy means of testing your analysers, mappings and the anatomy of your queries.
  • Fully dedicated tabs for testing tokenisers, analysers and, most importantly, your queries.
  • This is a tool I highly recommend for learning, as it quickly shows you how Elasticsearch interprets, parses and matches (or does not match) queries.
  • It is a shame this plugin has not been updated recently, but feel free to contribute on GitHub: elasticsearch-inquisitor

Mac Development Environment Setup

Over the years the list below has become my go-to reference for setting up a new Mac.  These are the bare-bones tools I recommend for serious development on a Mac, especially around Java and the open source world.  Read through, make it yours and add to it.

Firstly, as always with a Mac, you’ll need to agree to Apple’s Xcode terms to make full use of the terminal.
#go to your terminal and run :
xcode-select --install

 

Install iTerm.

An extremely great alternative to the in-built mac terminal

 

.profile

Add .profile settings for syntax terminal highlighting.  Why not? After all we spent quite a lot of time on bash terminals why not make them more interesting.  It might just help prevent you from typing the wrong command in the incorrect terminal.

Example Bash_Profile:


# Bash Colors
export TERM="xterm-color"
export CLICOLOR="true"
export LSCOLORS="exfxcxdxbxegedabagacad"

# Bash Prompt customisation with current location
export PS1='\[\033[01;35m\]\u\[\033[34m\]@\[\033[36m\]\h\[\033[00m\]:\[\033[01;33m\]\w\[\033[31m\]$(git branch -l 2>/dev/null | sed -n s/^*//p) $(hg branch 2>/dev/null) \$\[\033[00m\]'

#  Java Home Over-ride
#export JAVA_HOME=`/usr/libexec/java_home -v 1.8.0_45`
#export JAVA_HOME=`/usr/libexec/java_home -v 1.7.0_79`

 

Sublime Text

Sublime has stood the test of time, going from the new-kid-on-the-block text editor to a must-have for all developers and dev-ops.  Install Sublime from here, along with its package manager.

 

Font Made for Code

Install a monospaced (equal-width) font such as Adobe Source Code Pro.

 

Latest Python tools

Install Python from here, along with the pip command.

 

Maven Installation

Download maven and unzip to preferred location

 

Set Path in profile:

  • ~/.bash_profile
export M2_HOME=/Users/developer/apache-maven-3.1.1
export PATH=$PATH:$M2_HOME/bin

 

  • Post configuration: set up your Maven .m2 settings.xml file
  • See more Installation Instructions here

 

Package manager

There are many options out there, but I strongly recommend Brew:
it installs the stuff that Apple did not. Get Brew from here.

 

# Example installation commands, as simple as:
brew install python
brew install git
brew install ruby
brew install node
brew install mongo

Automate Enabling/Disabling Hidden Files on a Mac

# show/hide hidden files
alias showFiles='defaults write com.apple.finder AppleShowAllFiles YES; killall Finder /System/Library/CoreServices/Finder.app'
alias hideFiles='defaults write com.apple.finder AppleShowAllFiles NO; killall Finder /System/Library/CoreServices/Finder.app'

 

Mac Host File Edit

Go here to edit your hosts file and block out distracting sites such as Facebook, Twitter, Hacker News etc.  Simply re-route them back to localhost or to an unroutable address:
/private/etc/hosts
# Example entry to route facebook to an invalid ip
0.0.0.0 facebook.com

# Example entry to route facebook to your internal ticketing system.
# The first field must be an IP address, so substitute the address of jira.intranet.
<ip-of-jira.intranet> facebook.com
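
The hosts file does not support wildcards, so each hostname you want blocked (including the www. variant) needs its own line. A small sketch that generates the entries is shown below; `hosts_entries` is a hypothetical helper, and the output still has to be appended to /private/etc/hosts by hand with administrator rights.

```python
def hosts_entries(domains, target_ip="0.0.0.0"):
    """Build hosts-file lines that point each domain at target_ip."""
    lines = []
    for domain in domains:
        lines.append("{} {}".format(target_ip, domain))
        # Add the www. variant too, since hosts entries match exact names only
        if not domain.startswith("www."):
            lines.append("{} www.{}".format(target_ip, domain))
    return "\n".join(lines)


print(hosts_entries(["facebook.com", "twitter.com"]))
```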

Notes from Elastic {on} Tour London 3rd November 2015

 Whilst the Elastic team are preparing material and editing the tour’s videos, here are my brief MVP notes from the elastic{on} tour conference.

New features of ES 2.0: lots...

  1. Elasticsearch migration plugin
    1. Helps detect any issues that may occur when upgrading to Elasticsearch 2.0. It can be installed and run before upgrading.
  2. Compatible with indices created in version 0.90 and above
  3. Faster fs to disk
  4. More use of kernel for caching
  5. Better indexing
  6. Problem-free upgrading from version to version
    1. Invert of index plus the actual data for faster analysis
  7. New plug-ins and connectors are all targeting this version as a minimum
  8. Removed:
    1. Rivers
    2. Facets, replaced by Aggregations
    3. Delete by query: moved out of core into a plugin
    4. Shutdown API: removed
For more on these see Docs

 

Admin Features

  1. Reindexing of an index to:
    1. Same cluster
    2. Different cluster
    3. A destination with modified index settings (shards, replicas etc.)

Yes, Elastic are now competing with Becchi Niccolo’s Index cloner and my Spring Boot front-end app for it.  So if you are not moving to Elasticsearch 2.0 anytime soon, you can make use of these to move indices around your cluster(s) and, yes, modify destination index settings too.

ELK

Kibana has its own server and can be scaled as needed, with configuration in a centralised place too.

Monitoring

  1. Official tool:
    1. Packetbeat (acquired by Elastic)
  2. Alternatives that I have played with: HQ, HEAD
  3. There are multiple Beats
    1. Filebeat | Topbeat | Packetbeat
      1. Check out the demo by the creators
    2. Analyses top and pushes to Elastic (“top” as in the Unix command)
      1. can then be visualised by Kibana
      2. can be run on multiple OSs

Extensions:

Shield security features: access can be restricted by users/roles/groups to..

  1. Individual fields
  2. Individual docs
  3. Specific index/indices
  4. Type of queries? (I might have mis-heard this!)

Marvel 2

  •    Built on top of Kibana 4
  •    Easier to use
  •    Streamlined metrics

Use cases

Excelian:

Consulting company that used Elasticsearch as part of their solution to build a grid for a finance firm:
  1.  40,000 cores | scalable to 100,000 cores
  2.  2 regions,one load balancer, one cluster in each region, one master node in each region (this is the holistic view)
  3.  Secure | monitor-able | ldap integration for login with SHIELD.
  4.  They used Ansible (alternatives shared puppet/chef) – open source project on packaging dependencies in one app
    1.  Use case = when running in a banking environment with no internet..
    2.  Everything installed from a single server (vs puppet/chef master and clients)

Pipeline Aggregation Talk:

Moving averages | data histogram | historical aggregations

  1. Supports multiple scripting languages e.g lucene expressions e.t.c
  2. Example Data to play with:
    1.  Nasa data sets of launches
    2.  London property sales data (London property prices)
      1. March 2012. has an anomaly that the presenter was not sure of.
      2. Underlying data for this point seemed okay.
      3. Quick googling : there was a new tax levied on properties around this date!
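
For reference, a moving average over a date histogram can be expressed as a pipeline aggregation nested inside the histogram. The sketch below builds such a search body as it would have looked around Elasticsearch 2.x; the field names `sale_date` and `price` and the helper `moving_average_body` are assumptions for illustration, and later Elasticsearch versions replaced `moving_avg` and renamed the histogram `interval` parameter, so check the docs for your version.

```python
import json


def moving_average_body(date_field, value_field, interval="month", window=5):
    """Build a search body: monthly average plus a moving average over it."""
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": date_field, "interval": interval},
                "aggs": {
                    # Metric computed inside each histogram bucket
                    "avg_value": {"avg": {"field": value_field}},
                    # Pipeline aggregation that reads the metric above
                    "smoothed": {
                        "moving_avg": {"buckets_path": "avg_value", "window": window}
                    },
                },
            }
        },
    }


body = moving_average_body("sale_date", "price")
print(json.dumps(body, indent=2))
```

A spike such as the March 2012 anomaly mentioned in the talk would stand out as a bucket whose average sits well above the smoothed value.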

Goldman Sachs Search:

(a consistent Search Experience In No Time) – Reuben Tonna -Vice President
Requirements
  1.  Consistent user experience – diff data same tool
  2.  Zero ui dev effort
  3.  Self service on-boarding
  4.  Operate on large data set quickly – search and filtering
  5.  Enable dev to focus on modelling data
  6.  Facilitate adoption of new ui technologies
  7.  Support for various data source technologies
Elastic benefits on top
  1.  Great performance- improves the u.x.
  2.  Scales to very large data sets
  3.  Aggregation provide a way to slice and dice the data
  4.  Quality documentation – lowered the entry barrier…
  5.  Less development time
Configurations
  1.  Each data is configured for “entitlement” as part of the on-boarding process
  2.  All users have access to the same UI but only data they are allowed to see
  3.  Different datasets from multiple sources: Elasticsearch, OData and SQL via their respective adapters to the UI (GS Search UI and Services)
  4.  Lots of commonality with kibana, this could have been done with kibana???

Goldman Sachs Search:

(Building a firm wide single task list) – Stephen Coster -Vice President
Requirement
  1.     Develop a web-based single task list manager for the whole firm (.NET GUI, but the back-end is all Linux at Goldman’s)
  2.     Five million tickets
  3.     Distributed to 38,000 users around the globe
  4.     Latency in the region of 4 to 10 secs
  5.     Data to be sourced from multiple production instances
  6.     Live updating site as users update the data sources
Architecture
  1.    In-memory Elasticsearch index
  2.    Index sequence stream of data
  3.    No need for the server to maintain any user-specific session state
  4.    Use an offset into the returned data set to enable infinite scroll functionality
  5.    Server side facet calculation
  6.    6 production cluster instances (“Sequence sources”)
    1.    these are then aggregated to a single endpoint via a software load balancer….
    2. Camunda BPM – BPMN 2.0 Engine
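
The offset-based infinite scroll from the architecture notes maps naturally onto the `from` and `size` parameters of a search request, which also keeps the server free of per-user session state. The following is a minimal sketch of that idea, not a description of the presenters' implementation; `page_body` is a hypothetical helper.

```python
def page_body(query, page, page_size=50):
    """Build a search body for the given zero-based page of results."""
    return {
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits per page
        "query": query,
    }


first = page_body({"match_all": {}}, page=0)
third = page_body({"match_all": {}}, page=2)
print(first)
print(third)
```

The client only has to remember which page it last requested, so any server behind the load balancer can answer the next call.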

Between Two Ferns – fireside chat

Richard Owens – Senior Systems Engineer at Huddle interview by: Marty Messer – VP customer care | elastic

  1. Example use case was to Query logs
  2. Monitor different events from tons of applications
  3. Indexing diff docs – pdf, logs etc.
    1. Used own service for extracting text from various documents before moving to elastic search
    2. Meta data index in elastic search –
    3. Web ui hits –> files api which then hits –> Elasticsearch and returns
  4. Historically coming from a monolith to micro-services architecture.
  5. Great use of the ELK stack – Marvel seem to help them in production
  6. Initially with self implemented security, then moved to shield –
  7. SHIELD was the main driver for their enterprise subscription as they wanted ssl, https protection etc

Questions for Shay Banon – Founder & CTO of Elastic

  ES3 new features? Nothing official here yet?

  1.  Trunk currently has new changes happening
  2.  Consistency of data improvements
  3.  Multiple cluster replication
  4.  Ability to deploy a plugin that can coherently plug itself across the stack… not just for kibana or elastic
 

Others:

  1.  For integrations just stream your immutable data into elastic from the source /  but no plan for a direct integration with any sql db such as SQL Server 20XX

Feature of elastic as a company?

  1.   Not static… looking forward for new problems… 300 clever diverse people.
  2.   Lots of innovations based on people e.g mark with graph on elastic
  3.   Learn to expect the unexpected and embrace it especially from clever people.
  4.   Built on top of open-source – making huge investments 300-400 commits to open-source projects (a week, month?… i don’t remember)
    1.   approx top 8 developers for lucene are employed by elastic
    2.   commercial aspects is adding on top of open-source… but open-source never stagnant.
  5.   Lots of excitement around found and kibana and the notion of double clicking and having all these good stuff available with great smart defaults

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine after Apache Solr. Wikipedia | Elastic Blog

Springpad is shutting down on June 25th

SpringPad

Good Bye Spring Pad.  Springpad is shutting down on June 25th

My First thought

“Well really! that’s not a surprise”

My First feeling

“deeply saddened”

Over the years Springpad has been a very reliable companion to myself and many others in my circles. Now it is time to make use of the export features and move on. But first, a big THANK YOU to all who made Springpad the wonderful app it was.  Then review and think twice about where you take your data next.

Two options:

  1. Migrate Your Data to other services
  2. Export Your Data
    1. HTML file
    2. JSON file for Importing to Other Services

Best of luck. I surely will need it..   I will update if i ever find a good enough alternative…

For More

Where are the engineers headed to?  … you guessed it.. Google!

IntelliJ – Configuring JVM options and platform properties

IntelliJ JVM Options

Location:

IDE_HOME\bin\<product>[bits][.exe].vmoptions file. 

-Xms128m
-Xmx512m
-XX:MaxPermSize=250m
-XX:ReservedCodeCacheSize=64m
-XX:+UseCodeCacheFlushing
-ea
-Dsun.io.useCanonCaches=false

IntelliJ IDEA Platform Properties

File location :

IDE_HOME\bin\idea.properties

idea.max.intellisense.filesize=2500
idea.cycle.buffer.size=1024

source: https://intellij-support.jetbrains.com/entries/23395793

IMAP and SMTP settings for Outlook/Live Mail

 IMAP and SMTP Settings

For Thunderbird mail and other supported app clients

  • Incoming (IMAP) Server
    • Server address: imap-mail.outlook.com
    • Port: 993
    • Encrypted Connection: SSL
  • Outgoing (SMTP) Server
    • Server address: smtp-mail.outlook.com
    • Port: 25 (or 587 if 25 is blocked)
    • Authentication: Yes
    • Encrypted Connection: TLS
  • User name: Your email address
  • Password: Your password
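
As a rough sketch, the same settings can be used from Python's standard library mail clients. The host names and ports are the ones listed above (they may have changed since this was written), port 587 with STARTTLS is used for sending, and `open_imap` and `open_smtp` are hypothetical helper names; nothing connects until you call them with real credentials.

```python
import imaplib
import smtplib

# Values copied from the settings list above
IMAP_HOST, IMAP_PORT = "imap-mail.outlook.com", 993
SMTP_HOST, SMTP_PORT = "smtp-mail.outlook.com", 587


def open_imap(user, password):
    """Open an SSL-encrypted IMAP connection and log in."""
    conn = imaplib.IMAP4_SSL(IMAP_HOST, IMAP_PORT)
    conn.login(user, password)
    return conn


def open_smtp(user, password):
    """Open an SMTP connection, upgrade it to TLS, and log in."""
    conn = smtplib.SMTP(SMTP_HOST, SMTP_PORT)
    conn.starttls()
    conn.login(user, password)
    return conn
```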

 

More here: http://windows.microsoft.com/en-GB/windows/outlook/send-receive-from-app

Manually Adding Oracle JDBC to Maven Repo

Thanks to Oracle, the Oracle JDBC jars can't be downloaded from Maven repositories; therefore, your pom will throw this error:

Missing artifact com.oracle:ojdbc14:jar:10.2.0.1.0

Solution:

Find the specific version on the Oracle site or download it from other sites like this.

Then place in your .m2 folder and refresh your pom.

.m2\repository\com\oracle\ojdbc14\10.2.0.4.0

Maven will magically resolve the other related files such as the repository file, pom file etc.

Or you can manually cd to the download location and launch the Maven command for installing 3rd party libraries:

mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=<packaging> -DgeneratePom=true
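
To make the placeholders concrete, here is a sketch that fills in the command for the Oracle driver using the coordinates from the folder path above. The jar file name `ojdbc14.jar` is an assumption about what you downloaded, and `install_file_command` is a hypothetical helper that only builds the argument list; pass it to `subprocess.run` if you want to execute it.

```python
def install_file_command(jar_path, group_id, artifact_id, version, packaging="jar"):
    """Build the mvn install:install-file command as an argument list."""
    return [
        "mvn",
        "install:install-file",
        "-Dfile={}".format(jar_path),
        "-DgroupId={}".format(group_id),
        "-DartifactId={}".format(artifact_id),
        "-Dversion={}".format(version),
        "-Dpackaging={}".format(packaging),
        "-DgeneratePom=true",
    ]


cmd = install_file_command("ojdbc14.jar", "com.oracle", "ojdbc14", "10.2.0.4.0")
print(" ".join(cmd))
```

The version passed here must match the version declared in your pom, otherwise Maven will keep reporting the artifact as missing.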

Java EE Environment Setup Links

 

Application: Description
  • Eclipse: IDE
  • Java JDK: development tools
  • Tomcat: application server
  • Subclipse: source control Eclipse plugin for Subversion
  • Maven: tools to create projects and manage dependencies, handle class paths for dependencies, build, compile, package, release and document
  • Maven-Eclipse-Plugin (m2eclipse): Note: currently installed with the latest Eclipse version (3.6 Helios)
  • Spring Tool Suite
  • Subversion
  • Subclipse

 

Environment Variables