GraphDB Connectors – Elasticsearch example

Semantic Search gets the power of Full Text Search

 

Pre-requisites:

  1. An installed instance of GraphDB (currently only the Ontotext Enterprise edition has connectors)
  2. An installed instance of Elasticsearch
    1. With port 9300 open and running (this can be configured in */config/elasticsearch.yml or through your Puppet/Chef; a sketch follows below)
    2. If you are running this on Vagrant, ensure all ports are forwarded to your host [9200, 9300, 12055 etc.]
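
For instance, a minimal sketch of the relevant elasticsearch.yml settings (setting names per Elasticsearch 1.x; adjust to your setup), with the assumed Vagrantfile forwarding lines shown as comments:

# config/elasticsearch.yml
transport.tcp.port: 9300      # transport port the connector talks to
http.port: 9200
network.host: 0.0.0.0

# Vagrantfile (Ruby) equivalents for forwarding to the host:
#   config.vm.network "forwarded_port", guest: 9200, host: 9200
#   config.vm.network "forwarded_port", guest: 9300, host: 9300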

 

Prepare GraphDB

  1. Set up the GraphDB location
  2. Set up a repository and switch it on as the default

GraphDB Locations And Repo

Create Elasticsearch Connector

  1. Go to the SPARQL tab
  2. Insert a query like the one below and hit Run

 


PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>

INSERT DATA {inst:my_index :createConnector '''
{
  "elasticsearchCluster": "vagrant",
  "elasticsearchNode": "localhost:9300",
  "types": ["http://www.ontotext.com/example/wine#Wine"],
  "fields": [
    {"fieldName": "grape",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#madeFromGrape",
        "http://www.w3.org/2000/01/rdf-schema#label"
      ]},
    {"fieldName": "sugar",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasSugar"
      ],"orderBy": true},
    {"fieldName": "year",
      "propertyChain": [
        "http://www.ontotext.com/example/wine#hasYear"
      ]}]}
''' .
}

3.  Go over to Elasticsearch and confirm that you have a newly created index [my_index]; it will be empty for now
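
For example, with curl (the index name comes from the connector definition above; _cat/indices and _count are standard endpoints):

# the new index should be listed, with a doc count of 0 for now
curl 'localhost:9200/_cat/indices?v'
curl 'localhost:9200/my_index/_count'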

4.  For debugging, you can check the list of connectors and their status:


PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>

SELECT ?cntUri ?cntStr {
  ?cntUri :listConnectors ?cntStr .
}

PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>

SELECT ?cntUri ?cntStatus {
  ?cntUri :connectorStatus ?cntStatus .
}

 

Insert Data in GraphDB

 

  1. The connector listens for any data changes and inserts/updates/syncs the accompanying Elasticsearch copy; a query example follows the data below.

 


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://www.ontotext.com/example/wine#> .

:RedWine rdfs:subClassOf :Wine .
:WhiteWine rdfs:subClassOf :Wine .
:RoseWine rdfs:subClassOf :Wine .

:Merlo
    rdf:type :Grape ;
    rdfs:label "Merlo" .

:CabernetSauvignon
    rdf:type :Grape ;
    rdfs:label "Cabernet Sauvignon" .

:CabernetFranc
    rdf:type :Grape ;
    rdfs:label "Cabernet Franc" .

:PinotNoir
    rdf:type :Grape ;
    rdfs:label "Pinot Noir" .

:Chardonnay
    rdf:type :Grape ;
    rdfs:label "Chardonnay" .

:Yoyowine
    rdf:type :RedWine ;
    :madeFromGrape :CabernetSauvignon ;
    :hasSugar "dry" ;
    :hasYear "2013"^^xsd:integer .

:Franvino
    rdf:type :RedWine ;
    :madeFromGrape :Merlo ;
    :madeFromGrape :CabernetFranc ;
    :hasSugar "dry" ;
    :hasYear "2012"^^xsd:integer .

:Noirette
    rdf:type :RedWine ;
    :madeFromGrape :PinotNoir ;
    :hasSugar "medium" ;
    :hasYear "2012"^^xsd:integer .

:Blanquito
    rdf:type :WhiteWine ;
    :madeFromGrape :Chardonnay ;
    :hasSugar "dry" ;
    :hasYear "2012"^^xsd:integer .

:Rozova
    rdf:type :RoseWine ;
    :madeFromGrape :PinotNoir ;
    :hasSugar "medium" ;
    :hasYear "2013"^^xsd:integer .
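
Once the above data is committed, the connector syncs it to the my_index Elasticsearch index and you can mix full-text search into SPARQL. A minimal sketch following the Ontotext connector query pattern (the grape field and inst:my_index come from the connector definition above):

PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>

SELECT ?entity ?score {
  ?search a inst:my_index ;
      :query "grape:cabernet" ;
      :entities ?entity .
  ?entity :score ?score .
}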

Cleaning Elasticsearch Data being indexed

Sometimes we just don't have control over the source of the data coming into our Elasticsearch indices. In such cases it pays to clean the data, removing unwanted content such as HTML tags, before it is put into your Elasticsearch index. This prevents unwanted and unpredictable behaviour.

For instance, given the text below:

<a href=\"http://somedomain.com>\">website</a>

 

If the above is indexed without cleaning the HTML, a search for "somedomain" will match documents containing the above link. That might be what you want, but in most cases it is not. To prevent this you can use a custom analyser to clean your data.
Below is an example solution, with some handy techniques for debugging and analysing your analyser, such as querying the actual data that is in your index. Note this is not the Elasticsearch document _source field, which always holds the true, 100% raw data that hit Elasticsearch, unmodified.

Cleaning Elasticsearch Data

 

Create a new index with the required html_strip character filter configured

PUT /html_poc_v3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_html_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ]
        }
      }
    }
  },
  "mappings": {
    "html_poc_type": {
      "properties": {
        "body": {
          "type": "string",
          "analyzer": "my_html_analyzer"
        },
        "description": {
          "type": "string",
          "analyzer": "standard"
        },
        "title": {
          "type": "string",
          "index_analyzer": "my_html_analyzer"
        },
        "urlTitle": {
          "type": "string"
        }
      }
    }
  }
}
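
Before indexing anything, you can sanity-check the custom analyzer with the _analyze API. A quick sketch (ES 1.x query-string form; the sample text is made up):

curl 'localhost:9200/html_poc_v3/_analyze?analyzer=my_html_analyzer' -d 'Some <p>html</p> text'
# expect the tokens: Some, html, text – the tags are stripped by html_strip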

 

 

Post Some Data

POST /html_poc_v3/html_poc_type/02
{
  "description": "Description &lt;p&gt;Some d&amp;eacute;j&amp;agrave; vu &lt;a href=\"http://somedomain.com&gt;\"&gt;website&lt;/a&gt;",
  "title": "Title &lt;p&gt;Some d&amp;eacute;j&amp;agrave; vu &lt;a href=\"http://somedomain.com&gt;\"&gt;website&lt;/a&gt;",
  "body": "Body &lt;p&gt;Some d&amp;eacute;j&amp;agrave; vu &lt;a href=\"http://somedomain.com&gt;\"&gt;website&lt;/a&gt;"
}

Now retrieve indexed data

This will bypass the _source field and fetch the actual indexed data/tokens

GET /html_poc_v3/html_poc_type/_search?pretty=true
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "title": {
      "script": "doc[field].values",
      "params": {
        "field": "title"
      }
    },
    "description": {
      "script": "doc[field].values",
      "params": {
        "field": "description"
      }
    },
    "body": {
      "script": "doc[field].values",
      "params": {
        "field": "body"
      }
    }
  }
}

Example Response

Note the difference between title, description and body

{
  "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "html_poc_v3",
            "_type": "html_poc_type",
            "_id": "02",
            "_score": 1,
            "fields": {
               "title": [
                  [
                     "Some",
                     "Title",
                     "déjà",
                     "vu",
                     "website"
                  ]
               ],
               "body": [
                  [
                     "Body",
                     "Some",
                     "déjà",
                     "vu",
                     "website"
                  ]
               ],
               "description": [
                  [
                     "a",
                     "agrave",
                     "d",
                     "description",
                     "eacute",
                     "href",
                     "http",
                     "j",
                     "p",
                     "some",
                     "somedomain.com",
                     "vu",
                     "website"
                  ]
               ]
            }
         }
      ]
   }
}

Further Cleaning Elasticsearch Data References:

Use this tool to test your analyser: elasticsearch-inquisitor

 

Missing logs in Elasticsearch logs at midnight

Case of the missing logs

I was debugging a curious case: my Elasticsearch instance on my Vagrant dev box was going into RED state every night at 00:00:00, consistently, as far back as I can remember.

The obvious thing to do is look at the logs, right? Except for this set of rotated logs there are no lines between 23:40 and 00:00:05; not in the current un-rotated log, nor in the previous set.

At First Pass:

  1. Elasticsearch rotates its own log.  Could it be this process causing the missing Elasticsearch log lines?
  2. Marvel Creates new daily indices at 00:00:00.  Could it be this causing the missing Elasticsearch log lines?

What was really causing the missing logs

By default Elasticsearch uses log4j. However, instead of the standard log4j.properties file you get with log4j, Elasticsearch uses a configuration translated into YAML format, excluding all of the tell-tale log4j prefixes. A closer look at the configuration led to a curious detail: the type of rolling appender, DailyRollingFile. That led to this revelation:

DailyRollingFileAppender extends FileAppender so that the underlying file is rolled over at a user chosen frequency. DailyRollingFileAppender has been observed to exhibit synchronization issues and data loss. The log4j extras companion includes alternatives which should be considered for new deployments and which are discussed in the documentation for org.apache.log4j.rolling.RollingFileAppender.

Source: Apache's DailyRollingFileAppender documentation
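
For reference, the stock logging.yml appender that exhibits this looks roughly like the following (a sketch of the ES 1.x default; check your own */config/logging.yml):

file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"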

Missing Elastic logs Root Cause:

The sync issue with the DailyRollingFileAppender must be the cause of the missing Elasticsearch log lines around midnight.

Missing Elastic logs fix:

Use one of the log4j alternatives to DailyRollingFileAppender. In this case a RollingFileAppender, changing the rolling strategy to roll the logs when they reach a certain file size: replace DailyRollingFileAppender with RollingFileAppender and remove the datePattern, which only applied to the DailyRollingFileAppender.

Example:

file:
    type: rollingFile
    file: ${path.logs}/${cluster.name}.log
    maxFileSize: 10000000
    maxBackupIndex: 10
    layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

Note: YAML is particular about tabs!

Happy Ending

Marvel turned out to be the cause of my Elasticsearch cluster going into RED state at midnight, on new .marvel* index creation. Which makes sense, as there is a brief window (milliseconds to seconds) when the new index has been created but its shards, replicas etc. are not yet allocated.

ElasticSearch Curator Short Guide

Elasticsearch Curator Features

Curate, or manage, your Elasticsearch indices:
  • Alias management – add, remove
  • Shard routing allocation
  • Indices management – close, delete, open closed indices, optimize indices and modify the number of replicas
  • Snapshot (backup) management – show, back up, restore
  • Change the number of replicas per shard for indices
  • Pattern matching for statements (e.g. delete all indices matching .marvel*)

 

Example Elasticsearch Curator Use Cases

Automatically or manually:

  • Snapshot creation
    • All indices except Marvel's (regex support)
  • Restore from snapshots (using the Elasticsearch endpoint via curl ….. not with Curator currently)
    • All indices in the backup
    • A specific index from the backup
    • Keep cluster state as is
  • Delete indices
    • Older than a date range
    • By a regex matching pattern

Background

  • Started off as clearESindices.py, with the simple goal of deleting indices.
  • It then became logstash_index_cleaner.py
  • Then moved under the logstash repository as "expire_logs".
  • After Jordan Sissel was hired by Elastic it became Elasticsearch Curator, and is now hosted at https://github.com/elastic/curator
  • Today: Curator performs many operations on your Elasticsearch indices, from delete to snapshot to shard allocation routing.

 

Curator all-in Command Line

  • Curator 2.0's underlying feature was the first attempt to detach the popular Elasticsearch API calls from the CLI.
  • This allows scripting to carry out any of the tasks featured above, meaning you can, for instance:
    • Automate your backup and restore procedures (see the cron sketch after this list).
  • Also with version 2.0, Curator ships with an API alongside the wrapper scripts/entry points. This API allows you to roll your own scripts to perform similar, or totally different, tasks to Curator with the same underlying code that Curator uses…
  • Documentation can be found here
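
For example, a nightly cron schedule wrapping the CLI might look like this (host, repository name, prefix and retention are all assumptions; flag syntax per the Curator 3.x CLI):

# crontab entries – 01:00 snapshot everything into the "nightly" repo,
# 01:30 delete logstash indices older than 30 days
0 1 * * * curator --host localhost snapshot --repository nightly indices --all-indices
30 1 * * * curator --host localhost delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d' --prefix logstash-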

 

Curator installation for Mac


# you may skip the pip installation if you already have it
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py

wget https://pypi.python.org/packages/source/u/urllib3/urllib3-1.8.3.tar.gz
pip install urllib3-1.8.3.tar.gz

wget https://pypi.python.org/packages/source/c/click/click-3.3.tar.gz -O click-3.3.tar.gz
sudo pip install click-3.3.tar.gz

# install elasticsearch-py
wget https://github.com/elastic/elasticsearch-py/archive/1.6.0.tar.gz -O elasticsearch-py.tar.gz
sudo pip install elasticsearch-py.tar.gz

# install elasticsearch-curator
wget https://github.com/elastic/curator/archive/v3.3.0.tar.gz -O elasticsearch-curator.tar.gz
sudo pip install elasticsearch-curator.tar.gz

# To test : Verify version
curator --version
# should echo: "curator, version 3.3.0"

# To see example commands, try the help menu
curator --help
# should echo : help menu options

For more installation guides please see the official Elasticsearch Guides

 

Curator Command Line Flags

 

Index and Snapshot Selection

--newer-than

--older-than

--prefix

--suffix

--time-unit

--timestring

--regex

--exclude

Index selection only

--index

--all-indices

Snapshot selection only

--snapshot

--all-snapshots

--repository
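
Putting a few of these together (index names, prefixes and times are illustrative; syntax per the Curator 3.x CLI):

# dry run: show logstash indices older than 7 days
curator --host localhost show indices --older-than 7 --time-unit days --timestring '%Y.%m.%d' --prefix logstash-

# close indices older than 14 days, excluding anything matching "keep"
curator --host localhost close indices --older-than 14 --time-unit days --timestring '%Y.%m.%d' --exclude keep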

 

Elasticsearch Cheat Sheet and Short Examples

Quick, short Elasticsearch API end-point calls that take a while to remember. If I have missed your favourite, or you want to recommend an addition, please do leave a comment.

States


# Show all indices
GET /_cat/indices?v
  
# cluster health state
GET /_cluster/health
  
# Show all nodes
GET /_cat/nodes?v

# Show largest Index. Leverages the _CAT api
curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk8 | grep -v -e marvel -e kibana

Index

Indexing


# Bulk Indexing Example
POST /factory/_bulk
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"swift","make":"suzuki", "mark":1, "release_year":"1998-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"swift","make":"suzuki", "mark":2, "release_year":"2003-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"baleno","make":"suzuki", "mark":1, "release_year":"2000-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"focus","make":"ford","mark":1, "release_year":"2001-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"focus","make":"ford","mark":2, "release_year":"2007-01-01"}
	  {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"rs","make":"ford","mark":2, "release_year":"2011-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"rav4","make":"toyota","mark":3, "release_year":"2009-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"mondeo","make":"ford","mark":2, "release_year":"2007-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
	 { "model":"st","make":"ford","mark":1, "release_year":"2007-01-01"}
	 {"index":{"_index":"factory", "_type":"cars"}}
 { "model":"5 series","make":"bmw","mark":3, "release_year":"2009-01-01"}

 

Index Management


# Change the number of replicas for an index
PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 4
  }
}
 
# Add a single alias
PUT /lmg_sem_v4/_alias/lmg
 
 
# Move Shard to another node
POST /_cluster/reroute
{
    "commands" : [ {
        "move" :
            {
              "index" : "amg_sem_v12", "shard" : 0,
              "from_node" : "UK-SEARCH-STG-02", "to_node" : "UK-SEARCH-STG-01"
            }
        }
    ]
}


Index Cloning

From Elasticsearch 2.3 you may use the built-in _reindex API to re-index data


POST /_reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-new-index"
  }
}

 

Cloning with a filter/query


POST /_reindex
{
  "source": {
    "index": "my-index",
    "query": {
      "term": {
        "has-index-cloning-with-filter-on": true
      }
    }
  },
  "dest": {
    "index": "my-new-index"
  }
}

 


# Show cluster-wide Recovery state
GET /_recovery?pretty&human
GET /_recovery?pretty&human&active_only=true
 
# show tabular cluster-wide status summary
GET /_cat/recovery?v
 
# Show me all snapshots
GET /_snapshot/_all
 
# Show settings details of snapshot repo "my_backup"
GET /_snapshot/my_backup
 
# Show all snapshot details of repo "my_backup"
GET /_snapshot/my_backup/_all
 
# Delete snapshot "snapshot_2015_09_07-13_50_48" from repo "prod-0009"
DELETE /_snapshot/prod-0009/snapshot_2015_09_07-13_50_48/
 
# Register Repo + no need to verify permission on path location
PUT /_snapshot/prod-0009?verify=false
{
   "type": "fs",
   "settings": {
      "location": "/vagrant/prod-0009",
      "compress": true,
      "max_snapshot_bytes_per_sec": "200000000",
      "max_restore_bytes_per_sec": " 500mb"
      }
}
 
# Take Snapshot of just "cmg_sem_v6" index
# (the API requires a snapshot name in the path; "snapshot_1" here is an example)
PUT /_snapshot/one-off-repo/snapshot_1?wait_for_completion=true
{
  "indices": "cmg_sem_v6",
  "ignore_unavailable": "true",
  "include_global_state": false
}
 
 
# Restore Snapshots of all index + global state
POST /_snapshot/prod-0009/snapshot_2015_09_11-10_23_29/_restore
 
# Restore Snapshots of only "log_river" index
POST /_snapshot/prod-0009/snapshot_2015_09_11-10_23_29/_restore
{
  "indices": "log_river",
  "rename_pattern": "index_pattern",  
  "rename_replacement": "restored_pattern" 
  "ignore_unavailable": "true",
  "include_global_state": false
}
 
# Speed up Recovery Speed
PUT /_cluster/settings
{
   "persistent": {
      "cluster.routing.allocation.node_concurrent_recoveries": "5",
      "indices.recovery.max_bytes_per_sec": "200mb",
      "indices.recovery.concurrent_streams": 5
   }
}
 

Having trouble with .marvel* index creation?


# You can view the current settings template with :
curl -XGET localhost:9200/_template/marvel
 
# Modify settings with:
PUT /_template/marvel_custom
{
    "order" : 1,
    "template" : ".marvel*",
    "settings" : {
        "number_of_replicas" : 0,
        "number_of_shards" : 5
    }
}
 

 

More here


Top Elasticsearch plugins

These are my top must-have Elasticsearch plugins, from monitoring clusters to moving indices and managing Elasticsearch snapshots. If you are here you may already know of Elasticsearch's Marvel plugin; in combination with Sense you mostly have all you will need to manage Elasticsearch. If, like me, Marvel and Sense are not enough for your workflow, or you are just curious about what plugins from the Elasticsearch community can offer you, please read on.

NOTE: Some of these plugins are no longer maintained, since site plugins were removed from newer versions of Elasticsearch. I will soon update this list with workarounds/alternatives. Meanwhile you may follow the links to the plugins' landing pages for the latest updates/news/alternatives.

Head

 

Elasticsearch head cluster overview

By default the first thing I will always install, most times even before I install the official Elasticsearch Marvel plugin suite.

  • Not quite polished for aesthetics, but does what it needs to do very well: a quick snapshot view of the status of your cluster, nodes, shards etc.
  • Quick, handy drop-down buttons to drop an index or alias, or set an alias.
  • You can also easily write up your own queries and view the results as tables or JSON.
  • Follow/contribute to the project on GitHub – elasticsearch-head

kopf

 

elasticsearch kopf cluster view

Another lovely tool, aimed more at administering your Elasticsearch cluster. Very lightweight, and covers commonly performed tasks; by no means comprehensive, but it is getting better and better. The best thing it has over HQ is the REST client, which you can leverage to explore most of what the Elasticsearch API exposes.

Note this can also be run locally without being installed as a plugin albeit some browser limitations:

git clone git://github.com/lmenezes/elasticsearch-kopf.git
cd elasticsearch-kopf
git checkout <a given branch final version>
open _site/index.html


 

WhatsOn

 

elasticsearch whatson

Aimed at Elasticsearch cluster monitoring, visualisation and inspection, with lots of stats. Can be used without installing, via the Whatson GitHub page. Most useful in large clusters.

 

ElasticSearch-HQ

 

HQ Elastic cluster health

Features:

  • Performance metrics reports
  • Monitoring
  • Charts
  • Open-source project
  • Elasticsearch management tool
  • Available as a plugin / hosted / self-hosted WAR
  • Query functionality (limited, currently not customisable)
  • Follow/contribute to the project on GitHub – elasticsearch-HQ

For ElasticHQ usage stats go here

 

Big Desk

 

bigdesk cpu

In short, Big Desk pulls data from Elasticsearch via REST API calls and converts it into numerous charts of statistics about your Elasticsearch cluster.

  • Can be installed as a plugin
  • Download and run locally, or
  • Run from the web
  • Follow/contribute to the project on GitHub – bigdesk

Inquisitor


  • A priceless tool for tuning result relevancy. Exposes an easy means of testing your analysers, mappings and query anatomy.
  • Fully dedicated tabs for testing tokenisers, analysers and, most importantly, your queries
  • This is a tool I highly recommend for learning, as it quickly helps you see how Elasticsearch interprets, parses and matches (or doesn't match) queries
  • A shame this plugin has not been updated recently, but feel free to contribute at GitHub – elasticsearch-inquisitor

Notes from Elastic {on} Tour London 3rd November 2015

Whilst the Elastic team are preparing material and editing the tour's videos, here are my brief MVP notes from the elastic{on} tour conference

New features of ES 2.0 – lots…

  1. Elasticsearch migration plugin
    1. A migration plugin to help detect any issues that may occur during an upgrade to Elasticsearch 2.0. This can be installed and run before upgrading.
  2. Compatible with indices created in version 0.90 and above
  3. Faster fsync to disk
  4. More use of the kernel for caching
  5. Better indexing
  6. Problem-free upgrading from version to version
    1. Inverted index plus the actual data, for faster analysis
  7. New plugins and connectors are all targeting this version as a minimum
  8. Removed:
    1. Rivers
    2. Facets – use Aggregations instead
    3. Delete by query – is now a plugin
    4. Shutdown API – removed

For more on these see Docs

 

Admin Features

  1. Reindexing of the same index to:
    1. The same cluster
    2. A different cluster
    3. With modified destination index settings (shards, replicas, etc.)

Yes, Elastic are now competing with Becchi Niccolo's Index cloner and my Spring Boot front-end app for it. So if you are not moving onto Elastic v2.0 anytime soon, you can make use of these to move indices around your cluster(s), and yes, modify destination index settings too.

ELK

Kibana now has its own server and can be scaled as needed, with configuration in a centralised place too.

Monitoring

  1.   Official tool:
    1. Packetbeat <— acquired by Elastic
  2.    Alternatives that I have played with – HQ, HEAD
  3.    There are multiple Beats
    1. Filebeat | Topbeat | Packetbeat
      1. Check out the demo by the creators
    2. Topbeat analyses "top" stats and pushes them to Elasticsearch – "top" as in the Unix command
      1.   These can then be visualised in Kibana
      2.   Can be run on multiple OSs

Extensions:

Shield security features: access can be restricted by users/roles/groups to:

  1. Individual fields
  2. Individual docs
  3. Specific index/indices
  4. Type of queries? (I might have mis-heard this!)

Marvel 2

  •    Built on top of Kibana 4
  •    Easier to use
  •    Streamlined metrics

Use cases

Excelian:

A consulting company that used Elasticsearch as part of their solution to build a grid for a finance firm:
  1.  40,000 cores | scalable to 100,000 cores
  2.  2 regions, one load balancer, one cluster in each region, one master node in each region (this is the holistic view)
  3.  Secure | monitorable | LDAP integration for login with Shield.
  4.  They used Ansible (alternatives shared: Puppet/Chef), an open-source project for packaging dependencies in one app
    1.  Use case: when running in a banking environment with no internet.
    2.  Everything installed from a single server (vs Puppet/Chef master and clients)

Pipeline Aggregation Talk:

Moving averages | date histogram | historical aggregations (a minimal sketch follows the list below)

  1. Supports multiple scripting languages, e.g. Lucene expressions etc.
  2. Example data to play with:
    1.  NASA data sets of launches
    2.  London property sales data (London property prices)
      1. March 2012 has an anomaly that the presenter was not sure of.
      2. The underlying data for this point seemed okay.
      3. Quick googling: there was a new tax levied on properties around this date!
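
A minimal moving-average pipeline aggregation sketch (index and field names are made up; moving_avg and buckets_path are the ES 2.0 pipeline aggregation syntax):

POST /property_sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "sale_date", "interval": "month" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "moving_avg_price": { "moving_avg": { "buckets_path": "avg_price" } }
      }
    }
  }
}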

Goldman Sachs Search:

(A consistent search experience in no time) – Reuben Tonna, Vice President
Requirements
  1.  Consistent user experience – different data, same tool
  2.  Zero UI dev effort
  3.  Self-service on-boarding
  4.  Operate on large data sets quickly – search and filtering
  5.  Enable devs to focus on modelling data
  6.  Facilitate adoption of new UI technologies
  7.  Support for various data source technologies
Elastic benefits on top
  1.  Great performance – improves the UX
  2.  Scales to very large data sets
  3.  Aggregations provide a way to slice and dice the data
  4.  Quality documentation – lowered the entry barrier
  5.  Less development time
Configuration
  1.  Each dataset is configured for "entitlement" as part of the on-boarding process
  2.  All users have access to the same UI, but only the data they are allowed to see
  3.  Different datasets from multiple sources – Elasticsearch, OData and SQL via their respective adapters to the UI (GS Search UI and Services)
  4.  Lots of commonality with Kibana; could this have been done with Kibana???

Goldman Sachs Search:

(Building a firm-wide single task list) – Stephen Coster, Vice President
Requirements
  1.     Develop a web-based single task list manager for the whole firm (.NET GUI, but the back end is all Linux at Goldman's)
  2.     Five million tickets
  3.     Distributed to 38,000 users around the globe
  4.     Latency in the region of 4-10 secs
  5.     Data to be sourced from multiple production instances
  6.     Live-updating site as users update the data sources
Architecture
  1.    In-memory elastic index
  2.    Index a sequence stream of data
  3.    No need for the server to maintain any user-specific session state
  4.    Use an offset into the returned data set to enable infinite-scroll functionality
  5.    Server-side facet calculation
  6.    6 production cluster instances ("sequence sources")
    1.    These are then aggregated to a single endpoint via a software load balancer….
    2.    Camunda BPM – BPMN 2.0 engine

Between Two Ferns – fireside chat

Richard Owens, Senior Systems Engineer at Huddle, interviewed by Marty Messer, VP Customer Care | Elastic

  1. Example use case was to query logs
  2. Monitor different events from tons of applications
  3. Indexing different docs – PDFs, logs etc.
    1. Used their own service for extracting text from various documents before moving to Elasticsearch
    2. Metadata is indexed in Elasticsearch
    3. Web UI hits –> files API, which then hits –> Elasticsearch and returns
  4. Historically, moved from a monolith to a micro-services architecture.
  5. Great use of the ELK stack – Marvel seems to help them in production
  6. Initially with self-implemented security, then moved to Shield
  7. Shield was the main driver for their enterprise subscription, as they wanted SSL, HTTPS protection etc.

Questions for Shay Banon – Founder & CTO of Elastic

  ES3 new features? Nothing official here yet?

  1.  Trunk currently has new changes happening
  2.  Consistency-of-data improvements
  3.  Multiple-cluster replication
  4.  Ability to deploy a plugin that can coherently plug itself across the stack… not just for Kibana or Elasticsearch
 

Others:

  1.  For integrations, just stream your immutable data into Elasticsearch from the source / but there is no plan for a direct integration with any SQL DB such as SQL Server 20XX

Future of Elastic as a company?

  1.   Not static… looking forward to new problems… 300 clever, diverse people.
  2.   Lots of innovation based on people, e.g. Mark with Graph on Elastic
  3.   Learn to expect the unexpected and embrace it, especially from clever people.
  4.   Built on top of open source – making huge investments, 300-400 commits to open-source projects (a week, a month?… I don't remember)
    1.   Approx. the top 8 developers for Lucene are employed by Elastic
    2.   The commercial side adds on top of open source… but the open source is never stagnant.
  5.   Lots of excitement around Found and Kibana, and the notion of double clicking and having all this good stuff available with great smart defaults

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine after Apache Solr. Wikipedia | Elastic Blog