An Elastic technical review.

PART 1 <<<
1 - Overview
2 - Elastic Overview
3 - Compare to MDB

PART 2
4 - Strengths, in Haiku

PART 3
5 - Use Cases
6 - Final Takeaways

This post is so long that it had to be broken up into 3 parts. Use “View Whole Thread” to view it all.

OVERVIEW

As I said before, EVERY COMPANY MUST be a tech-driven company. This new tech landscape is driving the hyper-growth stories we are seeking here, as companies sprout up to be the “picks and shovels” plays that are supplying this gold rush of technological innovation. They are creating new tools that let companies solve their own problems themselves, tools that apply across every business in every industry. And luckily for us, it still seems like the early innings!

So I’ve had a series of posts doing technical deep dives, trying to isolate what companies are doing to make their services so sticky, as well as other posts about the tech behind companies’ products.

I have spoken on MDB twice before, and haven’t felt the need to dive very deep into the technical details of their product line, as it seems pretty easily understood – once you know what a NoSQL document store is, you know what MongoDB excels at. But let’s walk through their history a bit and where it has put them strategically. [Reminder, I call the company MDB to differentiate product from company. Elastic thankfully makes it easier so I refer to them by name. Downside is this means I use the word “elastic” over 200 times here.]

MDB started by making an open-source NoSQL database, then sold support and tooling for that database to enterprises that were using it either as their internal database or as an embedded database within their products. Once cloud computing took hold, MDB started providing a managed, vendor-neutral cloud hosting service for its core database… one that its customers flocked to for its scale, high availability (HA), and ease of use, and for the money it saves them by eliminating infrastructure and ongoing maintenance costs. MDB’s approach has been to build a core platform of tools around MongoDB that reduce customer friction, whether self-hosted or on managed Atlas. They have apps for data exploration (Compass) and a management interface (Ops Manager or Cloud Manager). They have SaaS tooling around the Atlas service, like a serverless platform (Stitch), a cloud migration tool, and a visualization dashboard tool (Charts, in beta). And as I discussed before, it’s now increasing customer flexibility and the applicability of its platform with its moves into being a synchronized mobile database (also in beta, but, finally, with a major acquisition to help them move faster).

Elastic’s storyline is incredibly similar to MongoDB’s (the database and the company), but its technology stack, the solutions it provides, and its TAM are a bit tougher to understand. So today, we dive deep into Elastic, the suite of technologies that underpins its appeal to customers, and the new product lines spinning out from that core.

How do I know the company’s products so well? Besides being a software developer who works with a lot of databases and data feeds, I have worked with Elasticsearch (not the full ELK stack) for the past 4 years, using it as a vital piece of my architecture. More recently, I have run some parts of my stack within the AWS environment over the past year (more for the data storage resources than compute). I’m about to try using managed Elasticsearch in AWS, and, besides using it as a data store within my stack, I’m also about to start using ELK for APM and monitoring of my stack. [A no-brainer that I should have implemented long ago; I just haven’t had the time, with too many other interesting projects (around data streaming) to do!]

Warning: There is a lot to like in Elastic, in ways that excite me beyond what MDB is doing. But for that, you’ll need to do a lot of reading below to get to the Final Takeaways. But don’t just jump there… I recommend the middle bits too! Dammit, don’t miss the haiku! My last deep dive into the tech behind a company was Okta, and this deep dive is even longer. For one, I know the company way better. For two, it was worth diving deeper into their strengths and strategies.

ELASTIC OVERVIEW

Elastic is known for its suite of products it calls the Elastic Stack. (It was first called the “ELK” stack after its first 3 products, and it’s still mostly called that.) After starting as a company focused on its Elasticsearch search database, Elastic shifted gears early on towards being a solution for specific use cases, acquiring companies with complementary tools and then integrating them into a platform around its core engine.

Their Getting Started docs describe the core well enough: Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements. https://www.elastic.co/guide/en/elasticsearch/reference/curr…

Elastic Stack is:

  • Elasticsearch, the search and analytics database at the core.
  • Logstash, the data processing and transformation pipeline, for data ingestion into Elasticsearch.
  • Kibana, the visual interface over Elasticsearch, with data visualization dashboards and a cluster & data management interface.
  • Beats, lightweight data shippers that transmit monitoring data from networks and systems and ingest it into Elasticsearch.
  • “Features” (formerly X-Pack), modules that enhance the capabilities of the Elastic Stack, such as adding cluster monitoring, alerting, data security, reporting, machine learning (ML), and a visual presentation app called Canvas. [Not sure why Elastic decided to now blandly call it “Features”. I guess they let the new marketing intern have a go at it.]

The Product Line:

Elasticsearch (ES) is the open-source NoSQL database at the heart of the stack, which provides search and analytic capabilities over your data. It is built over the open-source Apache Lucene indexing engine (created by Doug Cutting, eventual creator of Hadoop), but with a focus on cluster capabilities to manage and search over ever-growing datasets. It is distributed software, making the engine as powerful as the cluster hardware it is installed on, and the cluster can easily expand over time. A developer uses a REST API or native Java libraries to store JSON data, and can then search or analyze that data via the query interface. Bottom line: if you are slicing and dicing over large data (hundreds of GB or more) or big data (hundreds of TB or more) for search or analytical purposes, Elasticsearch is ideal. It is popular, having 40k stars on GitHub and a high DB-Engines ranking (#7 overall and #1 for search). There are a few competitors in the open-source space… but as I discuss later in detail, the real competition to Elastic is elsewhere.
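To make the REST side concrete, here is a minimal Python sketch that builds (but does not send) the two basic requests: storing a JSON document, then searching for it. It assumes a hypothetical cluster at localhost:9200 and a hypothetical `products` index, and uses the `_doc` and `_search` endpoints as they look in Elasticsearch 7.x; check the docs for your version.

```python
import json
import urllib.request

# Hypothetical cluster address and index name, for illustration only.
ES_URL = "http://localhost:9200"
INDEX = "products"

def index_request(doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a request that stores a JSON document under a given id."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def search_request(query: dict) -> urllib.request.Request:
    """Build (but do not send) a search request using the JSON query DSL."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

store = index_request("1", {"name": "trail running shoe", "price": 89.99})
search = search_request({"match": {"name": "running shoes"}})
# Against a live cluster you would send each one, e.g.:
# with urllib.request.urlopen(store) as resp: print(json.load(resp))
```

Note the search is a full-text `match`, so "running shoes" can still find "trail running shoe" depending on how the field is analyzed.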

Alternate open-source search-based databases on the market are:

  • Apache Solr, which is also based on Lucene. But it pales in comparison, having only 2.5k GitHub stars vs 40k for ES as a sign of its popularity, and ranking #16 on DB-Engines. It has very little of the surrounding ecosystem of tools that ES has. It emerged out of CNET, and was later merged into the Apache Lucene project itself. A company, Lucidworks, was created to support it, and they created their own enterprise edition of Solr called Fusion. [Contrary to anecdotal commentary on a recent post, this is not a company I would look to as supplanting Elastic in any way. But yes, it is a direct competitor.]
  • Apache Druid, an OLAP/BI analytics engine, which has 8k stars on GitHub. It’s also a clustered data engine but is a lot more convoluted & complex to run. It came out of work done at Metamarkets, a marketing analytics SaaS company, and now has an enterprise company, Imply, supporting it. Druid is a columnar store, which means it stores each field’s (column’s) values together, rather than storing each row’s fields together. That layout suits advanced analytics, which typically scan a few fields across many rows. [I’d keep my eye on it, but ultimately it isn’t gaining much momentum, likely due to the complexity of the cluster setup required.]

Other competition comes from scalable NoSQL databases like MongoDB or Cassandra. Developers may prefer and pick those, but their search and analytical capabilities pale in comparison to what Elasticsearch is capable of. Plus there is no tooling around those for ingest, like Logstash and Beats.

Kibana is the visual interface over Elasticsearch. At its core, it’s a visualization dashboard web app that allows you to rapidly graph ad-hoc queries against your ES data, and create persistent visualization dashboards. It is pretty similar to another open-source visualization dashboard, Grafana, but is more closely tied to having ES as the underlying database, and, unlike Grafana, provides a management interface over your ES cluster and its data. Kibana also includes “out-of-the-box” dashboards for specific server apps you are monitoring with Beats. As you hook up monitoring over your server-side apps (say a PostgreSQL database or an Nginx web server) with Filebeat and Metricbeat, you can use Kibana’s dashboard templates honed for that specific application as a starting point, then customize them from there as desired. It has plug-in modules for time-series data visualization (Timelion) and geospatial visualization over maps (Elastic Maps), and exposes dashboards over many of the X-Pack features like ML anomaly detection and APM monitoring.

Logstash is the ingestion piece: it continuously reads in logs from various servers, transforms log entries into JSON objects, and ingests them into ES. It has a rich system of data pipeline steps, where you can convert, enrich, filter and transform log data prior to ingestion.
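Roughly, a Logstash pipeline chains parse, convert, enrich and filter steps before handing each event to the Elasticsearch output. Here is a plain-Python sketch of that flow (not Logstash’s own config language); the log format, field names and the health-check filter are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical access-log format, loosely like a web server log line.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse(line):
    """Grok-like step: turn a raw log line into a dict, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

def convert(event):
    """Convert string fields into proper types (ints, ISO-8601 timestamp)."""
    event["status"] = int(event["status"])
    event["bytes"] = int(event["bytes"])
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S"
    ).isoformat()
    return event

def enrich(event):
    """Add a derived field (a stand-in for lookups like GeoIP)."""
    event["is_error"] = event["status"] >= 500
    return event

def keep(event):
    """Filter step: drop health-check noise."""
    return event["path"] != "/health"

def pipeline(lines):
    for line in lines:
        event = parse(line)
        if event is None:
            # Skipped here; Logstash's grok filter would instead tag it _grokparsefailure.
            continue
        event = enrich(convert(event))
        if keep(event):
            yield event  # a real pipeline would send this on to Elasticsearch
```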

Beats is a collection of lightweight “data shippers”, each specific to collecting a type of data feed from remote servers or devices. This includes a log file shipper, a metric shipper, a network traffic monitor and more. They are installed as system agents on your servers, where they continuously collect data and ingest it into ES. Each Beat supports a wide variety of server apps out of the box. As mentioned before, on the opposite side in Kibana, Elastic includes sample dashboards for each server app that are honed to that app’s metrics or logs.
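Shippers generally batch events up and send them to Elasticsearch through its `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document itself. A minimal sketch of that batching and serialization, with a hypothetical index name and batch size:

```python
import json

def batches(events, size):
    """Group a stream of events into fixed-size batches, as a shipper would before each send."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partial batch

def bulk_body(index, events):
    """Serialize events into the NDJSON format the _bulk API expects.

    Each document takes two lines: an action line, then the document source.
    The whole body must end with a newline.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"
```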

The Beat flavors are:

  • Filebeat for log file monitoring. It can tie into logs from server apps (like databases or web servers) for log monitoring across your infrastructure, and can do container log monitoring of Docker and Kubernetes.
  • Metricbeat for real-time server metric monitoring, including system metrics (like CPU/disk/memory usage, or temperature readings) and server app metrics (like databases or web servers). It can perform real-time metric monitoring of your stack, and can do container metric monitoring of Docker and Kubernetes.
  • Packetbeat for real-time network traffic & latency monitoring.
  • Auditbeat/Winlogbeat for system-level auditing of Linux/Windows systems.
  • Heartbeat for system uptime monitoring. Simpler & lighter uptime check than Metricbeat.
  • Functionbeat for real-time serverless function monitoring (e.g. monitoring your AWS Lambda functions).

Elastic has built up a collection of CODE-FREE out-of-the-box solutions here within Beats and Kibana for monitoring use cases. Elastic has curated a list of log & metric Beats packs and sample Kibana dashboards for a wide variety of popular server-side applications (like PostgreSQL, MySQL, MongoDB, Cassandra, Nginx, Redis, Kafka, Kubernetes, and even Elasticsearch itself). For each system within your architecture, like your PostgreSQL database, you can use Filebeat to pull in logs and Metricbeat to pull in real-time metrics; then within Kibana, you can pull in the PostgreSQL-specific dashboard templates so that you can immediately start visualizing those metrics and logs. From there, you can customize the dashboards and the queries used in their visualizations and reports, as desired.

Beats, with their focused modules and “out-of-the-box” dashboards within Kibana, form a monitoring solution that has competition:

  • InfluxDB, a time-series database, competes on certain use cases like monitoring. It has a TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) that is similar to the ELK stack, with an agent for collecting and shipping metrics (Telegraf) and its own visualization UI (Chronograf), though many users pair it with outside tools like Grafana instead. The enterprise company that built it is InfluxData, and they, of course, offer managed cloud hosting. I spoke with an Elastic VP at their ElasticON conference last year, and he didn’t see customers picking InfluxDB over Elastic Stack. Perhaps a slanted view, as I do see InfluxDB in use at my work and in the field.
  • The open-source Prometheus monitoring platform is popular. It too is typically paired with Grafana for visual dashboards.

Features/X-Pack are modules within the Elastic Stack that enhance the platform, help focus the Elastic Stack on specific use cases, and help manage your cluster. [I added marketing links to the higher-impact ones.]

Modules include:

  • Monitoring module for cluster monitoring.
  • Security module for role-based access security over your data, down to document/field level.
  • Alerting module for cluster alerting & data monitoring via queries (get notifications on spotted errors).
  • Reporting module to generate, schedule, and email reports. Generate PDFs of your Kibana dashboards.
  • SQL module to allow ES querying via the well-known SQL query language. (It limits the search capabilities from what you have in the API, but is helpful for developers.)
  • Hadoop Connector to connect Elasticsearch directly to Hadoop for querying.
  • Graph tool to explore relationships in your data (use cases: fraud detection, security analysis, user recommendations).
    https://www.elastic.co/products/stack/graph
  • ML for performing machine learning over your data and visualizing results (use cases: detect anomalies, isolate patterns, pinpoint causes, demand forecasting).
    https://www.elastic.co/products/stack/machine-learning
  • Canvas app for creating presentation visualizations from real-time queries (aka over live data). Great for real-time demo presentations and interactive info-graphics.
    https://www.elastic.co/products/stack/canvas

X-Pack modules used to all be Premium, available only with the enterprise support tiers. However, back in April 2018 Elastic changed their licensing. (MDB soon followed suit with their own licensing change in October 2018.) They made a new pricing tier, Basic, that includes several modules that are now free. [More later on the licensing changes.]

There are a few other ancillary services and products that Elastic has:

  • APM Server (server-side app) for collecting APM metrics from your application code and saving it to ES. Not officially a “Feature” module, as it is a stand-alone server-side app. Definitely not a Beat, either, as it requires code changes - you embed Elastic’s SDK into your code so it can start streaming live metrics from your app to the APM Server. APM may allow you to bypass needing to pull in app logs via Filebeat - it’s akin to hooking up Metricbeat directly into your app.

  • Elastic Map Service to provide high-quality global and regional maps, with which you can overlay geospatial data in Kibana. https://www.elastic.co/elastic-maps-service

  • Elastic Common Schema is an effort from Elastic to try to unify data schemas for common activities (logs, metrics, APM, networking data). It is an attempt to unify how to store like-data from different sources (say, Cisco’s firewall logs vs Fortinet’s). Elastic is hoping to convert various data sources into a single common schema, which allows you to simplify the search, analysis and visualization queries you do on that data.
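A minimal sketch of the idea behind ECS, assuming two hypothetical firewall log formats. The vendor field names below are invented for illustration; the target names (`@timestamp`, `source.ip`, `destination.ip`, `destination.port`, `event.action`, `observer.vendor`) are real ECS fields. Once both sources are renamed onto the common schema, one query covers both:

```python
# Hypothetical raw records: these vendor field names are invented for illustration.
cisco_record = {"time": "2019-05-01T12:00:00Z", "src": "10.1.1.5", "dst": "8.8.8.8",
                "dport": 53, "act": "deny"}
fortinet_record = {"date": "2019-05-01T12:00:05Z", "srcip": "10.1.1.6", "dstip": "1.1.1.1",
                   "dstport": 443, "action": "accept"}

# Per-source mapping from each vendor's field names onto ECS field names.
FIELD_MAPS = {
    "Cisco": {"time": "@timestamp", "src": "source.ip", "dst": "destination.ip",
              "dport": "destination.port", "act": "event.action"},
    "Fortinet": {"date": "@timestamp", "srcip": "source.ip", "dstip": "destination.ip",
                 "dstport": "destination.port", "action": "event.action"},
}

def to_ecs(vendor, record):
    """Rename vendor-specific fields to ECS names (kept as flat dotted keys for simplicity)."""
    mapping = FIELD_MAPS[vendor]
    event = {mapping[k]: v for k, v in record.items() if k in mapping}
    event["observer.vendor"] = vendor
    return event

events = [to_ecs("Cisco", cisco_record), to_ecs("Fortinet", fortinet_record)]

# One query now works across both sources, because the field names agree.
https = [e for e in events if e["destination.port"] == 443]
```

Note that ECS standardizes field names; values like "deny" vs "accept" in `event.action` can still be source-specific.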

Getting Support:

There are 4 tiers of support subscriptions. https://www.elastic.co/subscriptions

  • Open-Source (free) - includes ELK, APM Server, most of Beats, limited Elastic Map Service
  • Basic (free) - includes free X-Pack modules (monitoring, SQL), APM Server UI, and all of Beats and Elastic Map Service
  • Gold - provides business-hours support, plus includes the rest of X-Pack (with only basic Security, and no ML), and the Elastic Monitoring service
  • Platinum - includes 24/7/365 support, plus all the above, adding the full ML & full Security modules (SSO, ACLs down to the document/field level, encryption at rest), and cross-cluster replication.

I griped above about how X-Pack is now just referred to as “Elastic Stack Features”, a bland label instead of a product name. But after reviewing the pricing chart showing which features turn on and off per tier, it’s clear they are blending in these modular features; they are now directly embedded within their product lines, with ML and Security both heavily integrated across all of the Elastic Stack. I think they are dropping “X-Pack” because they don’t want customers to think of these as separate plug-ins: they are integrated modules, and that integration has only expanded across other parts of the Stack. Kibana, Beats and Logstash have internal modules that turn off and on. This is why the open-source purists are upset: there are proprietary modules embedded inside open-source packages. But Elastic seems pretty clear about what is and is not included in each tier. It’s perhaps a bit too detailed, though; they need a higher-level overview of which modules turn on and off. Support levels are spelled out clearly. You have to go to the Platinum tier for ML capabilities and advanced Security features. For Gold & Platinum support levels, they offer custom training and consulting services for additional fees.

Hosting:

As for hosting their stack, customers have the typical options, plus an extra one for running their own on-premise cloud:

  • Host it themselves (self-managed, self-hosted), either on-premise or in the cloud (on EC2 or Docker instances). It’s complicated software to configure, but obviously doable. I’ve used it entirely for free thus far, but elsewhere in my company they just bought enterprise support to use Elastic Stack for monitoring infrastructure (system logs via Filebeat and metric collection via Metricbeat). It’s up to the customer whether they need support and the added features enabled by the Gold/Platinum tiers.

  • Elastic Cloud is their hosting service, where Elastic can host and manage Elastic Stack clusters for you in your cloud-provider of choice. Elastic Cloud service came from their acquisition of Found in early 2015. MDB followed suit, and released Atlas service in mid-2016. Managed vendor-neutral cloud hosting is the big reason these companies are growing revenue so strongly.

  • Elastic has a 3rd option, Elastic Cloud Enterprise, which allows a company to deploy Elastic Cloud onto its own infrastructure via Docker, and use it as an internal, on-premise cloud where they can create and manage multiple Elastic Stack clusters.

COMPARE TO MDB

There are many similarities between them …

  • Both are focused around a core open-source NoSQL document store, accessed via JSON-based REST APIs or native libraries. Both engines are cluster-able and can be horizontally scaled easily (by adding more nodes to the cluster). Both data engines are built on replicated shards, which enable high availability (HA), resiliency and scale.

  • Both founders created companies around that open-source database engine that provided enterprise support and continued adding features, tools and eventually platforms around their core database. Both then expanded to create managed vendor-neutral cloud hosting of their data engine (MongoDB Atlas vs Elastic Cloud). One strength for both over the cloud vendors: the fact that the authors and maintainers of these complicated clustered database engines are the ones best suited to running a managed instance. Put the experts in charge!

  • Both provide platforms containing tools around that core database. Both have apps for managing the cluster, data exploration and visualization. When you buy MDB Enterprise Advanced subscription, beyond enterprise support you get to use their mgmt interface app (Ops Manager or Cloud Manager) for monitoring and backup, advanced modules for security & analytics, data visualization tool (Compass), and also get a commercial license to embed Mongo in your released product. When you buy an Elastic Stack Enterprise subscription, you get enterprise support as well as expanded capabilities of X-Pack plugins for security and ML.

Yet, some major differences …

  • MongoDB is a general-purpose document store with a wide set of use cases. Elasticsearch, having much better search & analytical capabilities and more flexible scaling, is a specific-to-purpose document store with a narrower [but expanding] set of use cases. If you manage a collection of data objects that has infrequent search or analytics needs, you pick MDB. For data that you need to slice and dice continually with queries to search and analyze it, you pick Elastic. And for data that is “ever growing”, you pick Elastic.

  • Given this more limited set of use cases, Elastic has had to fight harder. They have rapidly expanded their product line by acquisition, adding tools and services that helped build their core database into a platform. MDB is building its platform itself, and IMHO is consequently moving way slower. Their new product Charts seems too little, too late – there are many other viz dashboarding tools that do this already (from open-source Grafana to proprietary Tableau).

  • Both have an “Open Source” focus, and both face cloud providers becoming competitors by turning each company’s own database against it. However, they take different approaches to their open-source licensing to combat that competition. MDB is trying to prevent cloud providers from running a hosted cloud service using MongoDB (making them direct competition against their own Atlas service) by changing the licensing on their core database (making the OSS purists angry). Elastic is keeping the core database fully open-source (Apache 2.0 license), but is changing the licensing and open-source strategy of their bundled ‘mostly free’ X-Pack modules (also making the OSS purists angry, but this time it’s about the fact that they are bundled in open-source ELK). Elastic seems content to let cloud vendors be competitors to Elastic Cloud, and to let their feature-rich modules be the differentiator.

  • Speaking of licensing, Elastic released a set of plug-ins for the ELK stack in 2016 that they called X-Pack, for non-core features like monitoring, security, alerting, and reporting. They started with all plug-ins being Premium (enterprise license required); now the source code is publicly available (but not open-source), and many of these modules are free to use in their Basic tier. A subset of them are still premium and require an enterprise subscription. The Elastic Stack releases bundle the open-source and Elastic-licensed modules together. (Yes, this may cause some licensing confusion.)

  • While both MDB and Elastic have a managed cloud-hosting service plus provide users support for their self-hosted & self-managed databases, Elastic has a 3rd option - Elastic Cloud Enterprise (also from the Found acquisition). It allows their Elastic Cloud product to be installed on your on-premise infrastructure, so you can easily manage multiple ELK clusters on an internal cloud.

  • Elastic isn’t building a cloud side and an on-prem side to their platform like MDB is. It’s all Elastic Stack in the Elastic Cloud, just hosted at whatever cloud provider the customer desires, and managed by the finest experts one could find – thems that wrote it! There isn’t tooling appearing in Elastic Cloud that isn’t in the core platform, unlike MDB with their Stitch serverless platform. However, the downside is that their Elastic Stack releases must bundle the proprietary modules side-by-side with the open-source products.

  • One striking difference, as I walked through the product line, is the number of use cases it solves that DO NOT INVOLVE CODE. MDB is for developers only, to embed into their application stack. Elastic is for that, but also for non-developers to use without needing any custom development. IT can hook up Beats for monitoring infrastructure or network traffic. Enterprise users can feed in datasets with Logstash, for staff to query, visualize, or apply ML in Kibana. I expect this trend to continue, as it really opens up who can use the product line.

  • Best of all, Elastic is making exciting moves that are moving their company beyond being a do-it-yourself tool provider. There is something afoot! [More on that soon under TAM section. Keep reading! But I’ll give you a hint, it rhymes with “class”.]

To be continued, in Part 2 (due to TMF post size limit)…

-muji
long ESTC (7%)

STRENGTHS, IN HAIKU

“With Better Search
An ever expanding TAM
To Infinity”

  • muji, TMF Poet Society, May 2019

… MORE INSIGHTS WITH A BETTER SEARCH AND ANALYTICS ENGINE

Elasticsearch provides much, much richer SEARCH and ANALYTICS capabilities compared to document stores like MongoDB. Elasticsearch has more of a learning curve (a more convoluted API format, and more complexity in setup and usage), but that is for a reason – it does a LOT more.

A relational database has an engine that can host separate databases, each having a set of tables containing rows, columns (fields), and indexes. Think of it as an Excel workbook (database) having multiple worksheets (tables) full of rows and columns. Each row has a statically defined set of fields in columns, and typically has a unique identifier (primary key) used for retrieving that row. Each table has indexes that allow for faster filtering & sorting across its rows in predefined ways. “Relational” is about how data is able to join together; in Excel, it would mean the worksheets would be able to reference the rows in other worksheets. You query, insert, update and delete data via a “structured query language” (SQL), which is standardized across the industry. You can calculate statistics via aggregations in an ad-hoc way via SQL (GROUP BY, SUM, AVG, MIN, MAX, nested sub-selects, joins, etc). But it’s not capable of advanced analytics; the data would need to be exported into another package for that.
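To make the GROUP BY point concrete, here is a small sketch using SQLite (via Python’s built-in `sqlite3` module) as a stand-in relational database, with a hypothetical table of sensor readings:

```python
import sqlite3

# In-memory database with one table of hypothetical sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, city TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings (city, temperature) VALUES (?, ?)",
    [("Denver", 10.0), ("Denver", 14.0), ("Austin", 25.0), ("Austin", 27.0), ("Austin", 29.0)],
)
# A predefined index on one column speeds up filtering/sorting on that column.
conn.execute("CREATE INDEX idx_readings_city ON readings (city)")

# Ad-hoc aggregation: group rows by city and compute statistics per group.
rows = conn.execute(
    "SELECT city, COUNT(*), AVG(temperature), MIN(temperature), MAX(temperature) "
    "FROM readings GROUP BY city ORDER BY city"
).fetchall()
```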

MongoDB is an engine that can host multiple databases, each containing one or more collections (tables) of like documents (rows), each with a set of fields (columns) that can vary. Each individual document (row) can be looked up via its unique _id (primary key). It can also have pre-determined indexes on specific fields, for faster searching. Being NoSQL, there aren’t really “relations” (joins) unless they are manually looked up in a follow-up query, or embedded as a sub-object in the document itself. (v3.2 did finally add a simple join mechanism, $lookup, that allows retrieving child documents from a separate collection.) You can group and calculate statistics via an aggregation pipeline, which is a set of instructions to filter and group documents, then calculate statistics over each group.

Elasticsearch is an engine that does ALL those same things MongoDB does. It is a cluster (database) that hosts multiple indexes (tables). Each index has multiple documents (rows) comprised of fields (columns) that can vary. Each document has a unique _id (primary key) to retrieve that document. From there, however, the search, aggregation, and analytic capabilities of Elasticsearch’s indexes are greatly improved over MongoDB’s collection indexes. It too has a nested aggregation capability, where you can nest logic to filter, sort, group and analyze results, and its analytical capabilities are much greater than those of standard NoSQL data stores.

Why is the search so much better? It’s built on top of Apache Lucene, an indexing library whose core structure for text is an inverted index: a lookup table from each term to the documents that contain it, which allows for quick lookups. Lucene started as a full-text search engine, but over the years it has greatly improved at indexing numeric values as well, including timestamp and geo-location fields, particularly over the past few years, helped by Elastic developers, who are major committers to the Apache Lucene project.
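A toy inverted index in Python, just to show why term lookups are fast: searching becomes a dictionary lookup plus a set intersection, instead of a scan over every document. Real Lucene indexes also store term positions, frequencies and scoring data, so treat this as an illustration only.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Very simple analysis: lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Map each term to the set of document ids that contain it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every query term (a boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```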

Where Elasticsearch really starts to shine is in AD-HOC filtering and aggregations. When you set up traditional indexes in a relational database, everything must be rigidly defined. MongoDB loosens up that rigidity a bit, but still requires pre-defining fields in indexes. Elasticsearch can act as ONE GIANT INDEX over your data that doesn’t require pre-defining how you are going to look at and search on that data. Data can be any combination of structured (rigid collection of objects with schema defining each property) or unstructured (loose collection of objects with varied properties).
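When you index a document without defining a schema first, Elasticsearch’s dynamic mapping guesses a type for each new field from its JSON value. The sketch below approximates the default rules as I understand them (whole numbers become `long`, decimals `float`, booleans `boolean`, date-like strings `date`, other strings `text` with a `keyword` sub-field, nulls add no field); the date detection here is a simplified regex, not Elasticsearch’s actual date parsing.

```python
import re

# Simplified stand-in for date detection: ISO-like dates with optional time and zone.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?")

def infer_mapping(doc):
    """Guess a field mapping from a JSON document's values, roughly like dynamic mapping."""
    props = {}
    for field, value in doc.items():
        if value is None:
            continue  # a null value doesn't create a field
        if isinstance(value, bool):  # check bool before int: bool is a subclass of int
            props[field] = {"type": "boolean"}
        elif isinstance(value, int):
            props[field] = {"type": "long"}
        elif isinstance(value, float):
            props[field] = {"type": "float"}
        elif isinstance(value, dict):
            props[field] = {"properties": infer_mapping(value)["properties"]}
        elif isinstance(value, str) and DATE_RE.fullmatch(value):
            props[field] = {"type": "date"}
        elif isinstance(value, str):
            props[field] = {"type": "text",
                            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
        # arrays and other value types are left out of this sketch
    return {"properties": props}
```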

For analytics, Elasticsearch’s aggregation capabilities are incredibly flexible, and allow for very custom nested aggregations. Aggregations can be run over search queries to bucket the results into groups based on criteria (repeatedly, if desired, in a nested hierarchy). You can either retrieve the raw data, or extract statistics about each resulting aggregated group. For example, you can group results by year & month, then by city, then aggregate statistics for each resulting nested sub-group (like 2019-01, Denver). Time-series data can be grouped into smaller time slices, like calculating a rolling 5-minute avg of a metric over a month span. Geo-locations (lon/lat) can be grouped into shapes and regions.
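Here is what the year-month-then-city example might look like as a search body, built in Python as a plain dict. The field names (`timestamp`, `city`, `temperature`) are hypothetical, `city` is assumed to be mapped as a keyword field, and `calendar_interval` is the parameter name in Elasticsearch 7.2+ (older versions used `interval`):

```python
def monthly_city_stats(start, end):
    """Build a search body: filter to a date range, bucket by month, then by city,
    then compute temperature statistics for each (month, city) bucket."""
    return {
        "size": 0,  # return only aggregation results, no raw hits
        "query": {"range": {"timestamp": {"gte": start, "lt": end}}},
        "aggs": {
            "by_month": {
                "date_histogram": {"field": "timestamp", "calendar_interval": "month"},
                "aggs": {
                    "by_city": {
                        "terms": {"field": "city"},
                        "aggs": {
                            # stats returns count, min, max, avg and sum per bucket
                            "temp_stats": {"stats": {"field": "temperature"}}
                        },
                    }
                },
            }
        },
    }

body = monthly_city_stats("2019-01-01", "2019-04-01")
```

In the response, each `by_month` bucket would carry its own `by_city` buckets, and each of those a `temp_stats` object.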

Elasticsearch does full-text searching through a scoring & ranking system. It exposes a wide variety of search methods over textual data (word proximity, text variation matching, top X ranking, pattern matching, fuzzy text matching that accounts for misspellings, etc). The other search type is structured searching: exact matching or range matching on text or numerics. This includes many numeric use cases, like timestamps, IPs or geo-locations. Numerics really open up the use cases: searching and analyzing time-series data (like sensor data) and geospatial datasets (like sensor locations) really shine within Elasticsearch, where grouping and aggregating by time period or by geo-location is handled incredibly fast. It is hands down better than relational database indexes over columns, and the improvement widens further as the size of the data grows.
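Fuzzy matching is built on edit distance: how many single-character edits turn one word into another. Elasticsearch’s `fuzziness: AUTO` setting allows 0 edits for terms of 1-2 characters, 1 edit for 3-5, and 2 for longer terms, and by default counts a swap of two adjacent characters as one edit. A small sketch of that rule, using the optimal string alignment variant of Damerau-Levenshtein distance (a simplification of what Lucene does internally):

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term):
    """Max edits allowed under the AUTO rule: 0 for 1-2 chars, 1 for 3-5, 2 for longer."""
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2

def fuzzy_matches(query_term, index_terms):
    """Return indexed terms within the allowed edit distance of the query term."""
    max_edits = auto_fuzziness(query_term)
    return [t for t in index_terms if osa_distance(query_term, t) <= max_edits]
```

Note that "elastik" matches both "elastic" (1 edit) and "plastic" (2 edits), which is why fuzzy results are usually combined with relevance scoring rather than used alone.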

As data is imported into Elasticsearch and indexed, it can be set up to go through different text analyzers & tokenizers that split an incoming document into relevant search terms. This allows you to customize how the search internals work based on how you plan to use your data. It supports structured or unstructured data, and you can provide the structured layout (schema) as desired. You can also embed child data within a parent document and search through that.
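An Elasticsearch analyzer is a chain of character filters, then a tokenizer, then token filters. A plain-Python sketch of that chain, with hypothetical helper names and a deliberately crude stemmer standing in for the real ones:

```python
import re

STOPWORDS = {"a", "an", "the", "and", "of"}

def strip_html(text):
    """Character filter: remove HTML tags before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: split on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(tokens):
    """Token filter: crude plural stripping (real stemmers are far more careful)."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def analyze(text, char_filters=(strip_html,),
            token_filters=(lowercase, remove_stopwords, naive_stem)):
    """Run char filters, then the tokenizer, then token filters, in order."""
    for f in char_filters:
        text = f(text)
    tokens = tokenize(text)
    for f in token_filters:
        tokens = f(tokens)
    return tokens
```

Swapping the filter tuples is the Python analog of defining a custom analyzer per field: the same text can be indexed as exact tokens for one use and as stemmed, stopword-free terms for another.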

Multiple indexes and aliases:

One big strength of Elasticsearch is the ability to search over multiple indexes in a single query, including via wildcards. This allows you to sub-divide your indexes along the lines of how you plan to query or manage them. For example, you can create a new index per month under the same core index name. Break up your data by month, and you can then query over any number of monthly indexes as desired. For ease of crafting queries, you can set up aliases that cover multiple indexes, so you only reference one alias instead of multiple indexes each time. Aliases can keep your query static over time: for example, an alias for the trailing 12 months that you shift at each turn of the month controls which monthly indexes are covered each time you query.
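A sketch of the monthly-index pattern, assuming a hypothetical `sensors-YYYY.MM` naming scheme and a hypothetical `sensors-last-12m` alias. It computes the 12 indexes in the trailing window and builds the action list for the `_aliases` API, which applies the add and the remove together in one call. (A wildcard search such as `GET /sensors-2019.*/_search` would cover every 2019 month.)

```python
def month_index(prefix, year, month):
    """Name a monthly index, e.g. 'sensors-2019.05'."""
    return f"{prefix}-{year:04d}.{month:02d}"

def trailing_months(year, month, count):
    """Return (year, month) pairs for the `count` months ending with the given month, oldest first."""
    months = []
    for _ in range(count):
        months.append((year, month))
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return list(reversed(months))

def roll_alias_actions(prefix, alias, year, month, window=12):
    """Body for the _aliases API: add the newest month, remove the month that fell out of the window."""
    new_index = month_index(prefix, year, month)
    old_year, old_month = trailing_months(year, month, window + 1)[0]
    old_index = month_index(prefix, old_year, old_month)
    return {"actions": [
        {"add": {"index": new_index, "alias": alias}},
        {"remove": {"index": old_index, "alias": alias}},
    ]}
```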

Filters:

The first thing you set up in your query is the overall data you want to view (filter) and the order you want results in (sort). You can filter and sort by any field within the index, like a timespan or US State, however you want to slice and dice the data. Once you have isolated the desired filtered results, you can expand that filtered query with additional nested filters, sort it, or add aggregations.
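A filter-plus-sort query might look roughly like the body below, written as a Python dict. The field names (`state`, `@timestamp`, `order_total`) are hypothetical. Clauses under `bool.filter` run in filter context, meaning each document simply matches or doesn't, with no relevance score computed.

```python
import json

# Hypothetical field names; body for GET /<index>/_search.
query_body = {
    "query": {
        "bool": {
            "filter": [  # filter context: yes/no matching, no scoring
                {"term": {"state": "CA"}},
                {"range": {"@timestamp": {"gte": "2019-04-01", "lt": "2019-05-01"}}},
            ]
        }
    },
    "sort": [
        {"@timestamp": {"order": "desc"}},   # newest first
        {"order_total": {"order": "desc"}},  # tie-breaker
    ],
    "size": 20,  # number of hits to return
}

print(json.dumps(query_body, indent=2))
```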

There are 2 overall modes of searching:

  • Full-text searches attempt to rank documents by how relevant they are to your query. They use relevance and token analysis, combined with boolean logic operators (like must, must_not, should), to generate a ranking score per document. You can create +/- score adjustments based on criteria, called “boosts”, to better hone your results (e.g. searching on “tests”, but not wanting “unit tests” to rank as highly, so you demote it).

  • Structured searches make a boolean determination (a yea or a nay) per document. They allow matching against exact text (categories, tags, names, ids) or numerics (numbers, dates, times, geospatial locations). You can search by exact match or ranges, and combine conditions with logic operations.
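The sketch below shows how those full-text pieces map onto the query DSL. Field names are hypothetical. The first body combines `must`, `must_not` and `should` clauses. The second uses the `boosting` query to demote, rather than exclude, documents that also mention “unit tests”: their score is multiplied by `negative_boost`.

```python
# Hypothetical fields ("body", "title", "status").
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"body": "tests"}}],            # required, contributes to score
            "must_not": [{"term": {"status": "archived"}}],    # excluded outright
            "should": [{"match_phrase": {"title": "integration tests"}}],  # optional bonus
        }
    }
}

# Demote (not exclude) docs that mention "unit tests".
boosting_query = {
    "query": {
        "boosting": {
            "positive": {"match": {"body": "tests"}},
            "negative": {"match_phrase": {"body": "unit tests"}},
            "negative_boost": 0.2,  # matching docs keep 20% of their score
        }
    }
}
```

The design choice here is between `must_not` (the document disappears from results) and `boosting` (it stays, just lower down), which is usually what you want when the unwanted phrase is only a weak signal.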

There are multiple types of queries over those modes.

  • Full-text queries = ranked full-text scoring (match, match phrase, multi-match, synonyms, stemming, bigram matching, phrase matching, fuzzy matching, misspellings) and span matches (word proximity, order of words)
  • Term queries = yea/nay text matches (exact terms, ranges, wildcards, and regex) like searching over tags, usernames, locations, etc
  • Numeric queries = range matches against numeric operations (greater than, less than, between) including over date, timestamp, geospatial and IP fields
  • Geo queries = geospatial location matches (within distance of a point, or within one or multiple bounding shapes)
  • Join queries = search nested child objects within a document, or related parent/child documents
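Several of those structured query types can be combined in one `bool` filter. In this sketch all field names and values are hypothetical. It shows an exact `terms` match on a keyword field, a numeric `range`, a `term` on an IP field using CIDR notation (which Elasticsearch supports for `ip` fields), and a `nested` join into embedded child objects. The small `ip_in_cidr` helper uses Python's standard `ipaddress` module to show what the CIDR clause matches; it is only an illustration, not part of Elasticsearch.

```python
import ipaddress

# Hypothetical fields: "tags" (keyword), "response_ms" (numeric),
# "client_ip" (ip), "comments" (nested objects with an "author" keyword field).
structured_query = {
    "query": {
        "bool": {
            "filter": [
                {"terms": {"tags": ["billing", "outage"]}},   # exact match on any listed tag
                {"range": {"response_ms": {"gte": 500}}},     # numeric range
                {"term": {"client_ip": "10.0.0.0/8"}},        # ip field: CIDR match
                {"nested": {                                  # join: search inside child objects
                    "path": "comments",
                    "query": {"term": {"comments.author": "alice"}},
                }},
            ]
        }
    }
}

def ip_in_cidr(ip, cidr):
    # What the CIDR clause means: is this address inside that network?
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
```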

Beyond the large number of filters available in Elasticsearch, if you have highly specialized needs, you can use a custom embedded script to create your own.
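A custom scripted filter could look something like this sketch, which assumes hypothetical numeric fields `price` and `quantity` and uses the `script` query with a Painless expression and a parameter. The plain-Python `script_predicate` mirrors the same condition so you can see what the script decides per document.

```python
# Hypothetical fields "price" and "quantity"; Painless is Elasticsearch's default script language.
script_filter = {
    "query": {
        "bool": {
            "filter": {
                "script": {
                    "script": {
                        "lang": "painless",
                        "source": "doc['price'].value * doc['quantity'].value > params.min_total",
                        "params": {"min_total": 1000},  # passing params avoids recompiling per value
                    }
                }
            }
        }
    }
}

def script_predicate(doc, min_total=1000):
    # Plain-Python equivalent of the Painless condition above.
    return doc["price"] * doc["quantity"] > min_total
```

Scripts run per document, so they are typically much slower than built-in queries; they are best kept for the specialized cases the built-ins cannot express.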

Aggregations:

After performing a search (filtered query) you can also specify one or more aggregations, to group the search results into buckets and/or generate statistics. The power of aggregations is that they can be nested. For example, bucketing search results on state, then bucketing those results (per state) on city, then aggregating count on each group (per state & city).

  • Bucketing = Group data on criteria based on one or more fields, like term matching, histograms, date histograms, geospatial bounds, etc.
  • Metrics = Compute analytical stats over group
  • Matrix = Group multiple fields into matrix
  • Pipeline = Aggregate results from other aggregations
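The state → city example above could be expressed roughly like the aggregation body below (hypothetical keyword fields `state` and `city`, numeric `order_total`). It nests two `terms` bucket aggregations with an `avg` metric inside, and adds a `max_bucket` pipeline aggregation that reads the per-state doc counts. Note that `terms` returns only the top 10 buckets unless you raise its `size`. The `nested_buckets` function is a plain-Python analogue of the bucketing part, to show what the nesting computes.

```python
from collections import defaultdict

agg_body = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": {
        "by_state": {
            "terms": {"field": "state"},                 # bucket by state
            "aggs": {
                "by_city": {
                    "terms": {"field": "city"},          # then by city within each state
                    "aggs": {"avg_total": {"avg": {"field": "order_total"}}},  # metric per bucket
                }
            },
        },
        # Pipeline (sibling) aggregation over the output of by_state.
        "busiest_state": {"max_bucket": {"buckets_path": "by_state>_count"}},
    },
}

def nested_buckets(docs):
    # Plain-Python analogue of the terms -> terms -> avg nesting above.
    groups = defaultdict(lambda: defaultdict(list))
    for d in docs:
        groups[d["state"]][d["city"]].append(d["order_total"])
    return {
        state: {
            city: {"doc_count": len(totals), "avg_total": sum(totals) / len(totals)}
            for city, totals in cities.items()
        }
        for state, cities in groups.items()
    }
```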

All these filter and aggregation features, and the ability to nest them together in complex queries, open the door to all kinds of search & analytical capabilities.

… THAT CAN SCALE UP TO INFINITY, AND BEYOND

We live in a technological world where datasets are ever-growing, as you pull in time-series data feeds from monitoring IoT sensors or infrastructure. In some cases, data can get stale and need to be discarded: either archived, rolled up (summarized), or dropped. In other cases, especially around analytics, you want the entire dataset kept as a “data lake” (a pool of all your raw data) containing all your internal knowledge and data. The cloud allows this, as you can scale up your compute & disk capacity as needed. [There is still a place for “data warehouses”, which are more a pool of processed, filtered, or rolled-up data, as opposed to raw data. DreamerDad has mentioned Snowflake before, a cloud-hosted SaaS data warehouse service. I would definitely look closer at them if and when they go public.] Regardless, most companies’ data needs are large now, and only going to continuously grow from here.

There are 2 ways to increase the capacity of a data engine. Vertical scaling is increasing a server’s capabilities (giving it more disk, more RAM, more network bandwidth). This is what you were limited to in the “olden days” of relational databases. However, if the data engine is capable of running as a cluster, made up of one or more nodes (individual systems), you can scale horizontally, which means increasing the number of nodes in that cluster. Each additional node added to the cluster increases capacity and capabilities. Both MongoDB and Elasticsearch are clustered data engines that utilize replicated shards. (Though, to be clear, MongoDB can run standalone for smaller datasets. A highly available Elasticsearch production setup calls for 3 nodes minimum. It was built for clustering from the start.)

Sharding (also called partitioning) is a way to split up a dataset across a cluster. It lets you 1) horizontally split your data, to scale the performance and size of your cluster, and 2) distribute and parallelize operations across shards located on different cluster nodes, to improve performance & throughput. So, as an example, say your dataset has 5 shards, each containing 1/5 of your data. Those shards can be located across multiple nodes within your cluster, splitting up the workload and the compute, memory & disk use.
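How does a document land in one of those 5 shards? Elasticsearch picks the shard roughly as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's ID. The sketch below follows that formula but, as a stated simplification, uses MD5 from Python's standard library instead of the Murmur3 hash Elasticsearch actually uses. It shows that the same key always routes to the same shard and that keys spread evenly. This modulo scheme is also a large part of why the number of primary shards is set when an index is created.

```python
import hashlib
from collections import Counter

def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    # Simplified: hash(_routing) % number_of_primary_shards.
    # MD5 is a stand-in here; Elasticsearch uses Murmur3.
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards

def shard_distribution(doc_ids, num_primary_shards):
    # How many docs each shard would receive.
    return Counter(route_to_shard(doc_id, num_primary_shards) for doc_id in doc_ids)
```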

An important aspect that shards enable is shard replication. Each shard is a fraction of your dataset, and by replicating that portion of your dataset across nodes, you are able to 1) scale your querying capabilities (each node does searching and aggregating on the shards it holds) and 2) prevent data loss by always having a replica to fall back to (“high availability”). As search queries come in, they can go to any node in the cluster that holds a copy of the relevant shard, primary or replica (writes always go to the primary first, which forwards them to its replicas). A replication factor of 2 (number_of_replicas set to 2) means each shard is replicated 2 additional times, to 2 nodes other than the one the primary shard resides on, so that a total of 3 copies of that data always exist. That replication factor can roughly triple the number of search requests the cluster can serve concurrently (splitting up the load between the shard copies, and so between nodes of the cluster). It would also allow up to 2 nodes to fail simultaneously without data loss: if a node goes down, a replica shard on another node can be promoted to primary (the copy that new data is written to and that is cloned to the other replicas), and a new replica is created to take its place.
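The failure-tolerance arithmetic can be checked with a small simulation. This is a simplified sketch, not Elasticsearch's allocator (which also weighs disk usage, shard counts and awareness attributes): it places each shard's primary plus replicas round-robin on distinct nodes, then tries every combination of failed nodes. With 2 replicas (3 copies), any 2 nodes can fail without losing data; a third failure can.

```python
from itertools import combinations

def allocate(num_primaries, num_replicas, nodes):
    # Each shard gets 1 primary + num_replicas copies, each on a distinct node
    # (Elasticsearch never puts two copies of the same shard on one node).
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("not enough nodes to place every copy on a distinct node")
    # Simplified round-robin placement; the first holder plays the primary.
    return {
        shard: [nodes[(shard + k) % len(nodes)] for k in range(copies)]
        for shard in range(num_primaries)
    }

def survives(placement, failed_nodes):
    # Data is intact if every shard still has at least one copy on a live node.
    return all(any(n not in failed_nodes for n in holders) for holders in placement.values())

def max_tolerated_failures(placement, nodes):
    # Largest k such that ANY k simultaneous node failures lose no data.
    for k in range(1, len(nodes) + 1):
        if not all(survives(placement, set(failed)) for failed in combinations(nodes, k)):
            return k - 1
    return len(nodes)
```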

Scale (enabled by shards and replication) is everything when dealing with clustered data engines. The primary reason folks are moving off of SQL databases is the lack of horizontal scaling in today’s data-driven world. Huge volumes of incoming data mean your cluster needs a large ingest capability. Large datasets require a lot of disk and memory to handle them. A constant stream of queries means your cluster needs a large search capability and network bandwidth. Relational databases, unfortunately, have really been left in the dust here: there are tricks you can play with read-only replication, but there are very few clustered SQL engines that can horizontally scale. MemSQL is the only one that comes to mind, and it leans heavily on in-memory storage and is not open-source. You have to go to NoSQL for this kind of capability, such as Cassandra, MongoDB or Elasticsearch.

Taking clustering capabilities a step further, an “elastic” cluster is a “smart” cluster, one that knows when nodes are added to and removed from the cluster, and is able to balance the data across all the available nodes. It provides “high availability” by keeping search and ingest continuously available during node outages. It balances shard data across new nodes as they are added, or promotes replica shards to primary when a node disappears, and, in either case, reshuffles the replicas across the cluster. I think you can tell where Elasticsearch gets its name.

Unlike MongoDB, Elasticsearch also provides the ability to scale indexes (tables) internally. An individual “dataset” in Elasticsearch can and should be MORE THAN ONE INDEX. That gives developers a huge improvement in how they manage scale of individual datasets within the cluster, and to control how those massive datasets are segmented for queries.

Within a given dataset, you can split indexes on a time frame (say, spawn a new index daily). You can specify multiple indexes per query, plus use wildcards to search over split indexes in a controlled way. As an example, you can roll a base “system-metrics” index into a new index per day, tracked as a suffix on the base index name (e.g. “system-metrics.2018-01-01”). All 2018 data can easily be queried via a wildcard search on “system-metrics.2018-*”, or just the first quarter via “system-metrics.2018-01*,system-metrics.2018-02*,system-metrics.2018-03*” (which would search over 90 different daily indexes of that dataset). You can also use aliases in Elasticsearch, each mapped to a specific group of indexes, so that last example could be more easily referenced in the future as “system-metrics-2018q1”.
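To sanity-check those index counts, this sketch generates the daily index names for 2018 and expands the comma-separated wildcard expressions against them. It approximates Elasticsearch's index-name resolution with Python's `fnmatchcase`; as a caveat, `fnmatch` also treats `?` and `[...]` as special, while Elasticsearch index expressions support `*` wildcards (plus `-` prefixes for exclusion, not modeled here).

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase

def daily_indexes(base, start, end):
    # One index per day, e.g. "system-metrics.2018-01-01".
    day, names = start, []
    while day <= end:
        names.append(f"{base}.{day.isoformat()}")
        day += timedelta(days=1)
    return names

def resolve(expression, indexes):
    # Approximate expansion of a comma-separated, *-wildcard index expression.
    matched = set()
    for pattern in expression.split(","):
        matched.update(name for name in indexes if fnmatchcase(name, pattern))
    return sorted(matched)
```

Usage: `resolve("system-metrics.2018-01*,system-metrics.2018-02*,system-metrics.2018-03*", daily_indexes("system-metrics", date(2018, 1, 1), date(2018, 12, 31)))` yields 31 + 28 + 31 = 90 index names.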

A dataset could be set up as one giant index that contains all the data, and you would use query filters to narrow the timespan of your search. However, splitting indexes by time period (or other factors) greatly enhances SCALE and MAINTENANCE of that dataset. This gives Elastic Stack users many ways to gain huge improvements in speed. By using honed queries that are optimized for your split-index strategy, you can greatly reduce how much data within that dataset you have to search through. So instead of searching over the one big “uber” index containing all the data over time (say, 20B documents), you can isolate your searches to only the time period you need (say, the month of April, which is only 10M documents). As new data arrives, it can be split into individual indexes per time period (known as a Rolling Index strategy), so datasets can grow continuously by creating a new index each time a new time period is reached. This greatly helps users manage the lifecycle of their indexes, by allowing them to curate their data as it continues to stream in. As data ages, you can trim stale or archived data very easily, by simply removing the specific split indexes covering the time periods you wish to trim (instead of the traditional SQL method of running a filtered delete query). Or you can keep the latest data on fast hardware and older, less referenced data on slower systems (known as a Hot/Warm strategy).
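A rough sketch of the curation logic: given daily index names like the ones above, decide which stay on hot nodes, which move to warm nodes, and which are dropped whole. The 7-day and 365-day thresholds are hypothetical. In a real cluster, the "move to warm" step is usually done with shard allocation filtering on a custom node attribute (e.g. `index.routing.allocation.require.box_type: warm`, where `box_type` is just a common naming convention), and newer Elasticsearch versions can automate the whole policy with index lifecycle management.

```python
from datetime import date

def lifecycle_plan(index_names, today, warm_after_days=7, delete_after_days=365):
    # Hypothetical thresholds. Index names are assumed to end in ".YYYY-MM-DD".
    plan = {"hot": [], "warm": [], "delete": []}
    for name in index_names:
        day = date.fromisoformat(name.rsplit(".", 1)[1])
        age = (today - day).days
        if age >= delete_after_days:
            plan["delete"].append(name)  # drop the whole index, no delete-by-query needed
        elif age >= warm_after_days:
            plan["warm"].append(name)    # relocate to slower, cheaper nodes
        else:
            plan["hot"].append(name)     # keep on fast hardware
    return {phase: sorted(names) for phase, names in plan.items()}
```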

Add up all the scale capabilities, and Elasticsearch is a data engine capable of storing and querying a massive amount of data in “near real-time”. It can handle a fire-hose of incoming data (indexing it in a shard quickly, then replicating it to other nodes), making new data available to ad-hoc searches and aggregations within about a second (the default refresh interval). This makes Elastic Stack very relevant for “Do-It-Yourself” style use cases like monitoring. You can make your own Splunk or New Relic clone to monitor your apps and infrastructure, and greatly reduce ongoing costs.

… IN A COMPLETE ECOSYSTEM THAT IS EXPANDING TAM

So they have search and analytical capabilities that can elastically expand as a company’s needs grow. Now that we have all that technical talk out of the way, let’s talk strategy.

One huge difference in their business strategies is that Elastic hasn’t been afraid to acquire its tooling. Kibana and Logstash were both acquired in 2012 to form the base ELK Stack. Packetbeat, acquired in 2015, provided the foundation of Beats. Found, also acquired in 2015, was integrated as Elastic Cloud and Elastic Cloud Enterprise.

MongoDB is catching up here in terms of ecosystem. In 2015-2016 they added Compass (data exploration and mgmt console) and backup & mgmt tool Ops Manager, but it took until v4 release in mid-2018 for them to add Charts (viz dashboards), and to begin expanding their tooling into a cloud platform with Stitch. MDB made Atlas themselves in 2016, then later acquired mLab in 2018 to acquire their customers and migrate them to Atlas.

MDB has created a great company, and has created many new tools, but wasn’t always focused on creating a unified platform. Elastic has a large number of bolt-on acquisitions, where it acquired all the pieces and parts of its ecosystem. This has allowed it to move much faster than MDB; but then again, it needs to, given its more limited use cases.

Licensing changes:

Let’s take a minute to talk about all the open-source licensing changes that have occurred in these enterprise companies built around a specific open-source database. Jay Kreps, CEO of Confluent (company around Apache Kafka), explained the thought process in his blog post about Confluent’s changes. https://www.confluent.io/blog/license-changes-confluent-plat…

The major cloud providers (Amazon, Microsoft, Alibaba, and Google) all differ in how they approach open source. Some of these companies partner with the open source companies that offer hosted versions of their system as a service. Others take the open source code, bake it into the cloud offering, and put all their own investments into differentiated proprietary offerings. The point is not to moralize about this behavior, these companies are simply following their commercial interests and acting within the bounds of what the license of the software allows.

As a company, one solution we could pursue would be for us to build more proprietary software and pull back from our open source investments. But we think the right way to build fundamental infrastructure layers is with open code. As workloads move to the cloud we need a mechanism for preserving that freedom while also enabling a cycle of investment, and this is our motivation for the licensing change.

I don’t think he meant that to be so either-or; companies can expand their open-source core as well as their proprietary offerings. There are two different strategies these “open-source” enterprise companies are employing:

  • Attack on the edges, by differentiating on the surrounding modules/tools in the platform. (Elastic, Confluent)
  • Attack at the core, by preventing others from direct cloud-hosting. (Redis, MongoDB)

MDB is not leading the charge on the cloud front, neither with cloud hosting services nor with fighting the cloud providers. Elastic bought Found in 2015 to make Elastic Cloud; over a year later, MDB released Atlas in mid-2016. Elastic made licensing changes in March 2018. Redis followed suit next, in August 2018, but with a different strategy: they instead changed the license on the open-source core, to try to prevent cloud providers from hosting it. MDB followed Redis’s path in October 2018. Confluent followed Elastic’s path in December 2018. Redis later changed their licensing AGAIN, in March 2019, to better clarify things. Which is the better strategy here? Only time will tell, but thus far, we can see in their numbers that MDB and Elastic are having huge success with their managed cloud services, regardless of approach. Will MDB have to follow Redis again and change the license a second time, given the OSS community backlash? Elastic doesn’t seem to mind their backlash, but they have a different strategy, and their core remains open-source (albeit still tightly controlled by them as sole contributors). [As we’ll get into soon enough, I pontificate as to how their strategy completely enables a whole new front in Elastic’s new product lines.]

Elastic’s acquisitions that added tools or SaaS services in their ecosystem:

  • Kibana (2012) - Visualization dashboard app, that worked over Elasticsearch. Became part of core ELK Stack. Still here as Kibana, but it’s been greatly expanded with modules (Timelion, APM, security, ML, etc) and cluster mgmt interfaces.

  • Logstash (2012) - Log collection app that converts log files into data objects, and imports them into Elasticsearch. Became part of core ELK Stack.

  • Found (2015) - A managed vendor-neutral Elastic Stack cloud hosting service. Was integrated and renamed as Elastic Cloud service and Elastic Cloud Enterprise for the “on-premise cloud” version.

  • Packetbeat (2015) - A real-time network packet analytics system, built on ELK Stack, to monitor distributed systems. Integrated and expanded as the Beats product.

  • Prelert (2016) - Predictive behavioral analytics firm, focused on cybersecurity, fraud detection, and IT operational analytics. Now likely integrated as the ML modules in Kibana.

  • Opbeat (2017) - APM system for JavaScript apps. Integrated now as APM Server app and into Kibana for the APM UI.

  • Swiftype (2017) - Startup providing hosted SaaS search service for enterprises to easily add search capabilities to their website or app. Elastic re-branded these offerings as Elastic App Search and Elastic Site Search SaaS services. These services directly compete against Google Search Appliance [RIP] and its replacement, the Google Enterprise Search SaaS service, as well as Google Site Search [RIP] and its replacement, Google Custom Search Engine. Elastic recently followed those up with another: the just-announced Elastic Enterprise Search SaaS service, which competes against the Google Cloud Search service (which recently expanded from searching G-Suite to include other 3rd party SaaS tools). This acquisition marked a turning point in their acquisition strategy; instead of tooling, they are moving towards SaaS services built on the Elastic Stack!

  • Insight.io (2018) - Startup with a developer-focused SaaS tool for creating a search interface over your source code. Supports a wide variety of modern programming languages. It’s not just full-text search; it has semantic understanding of the code, so it provides intelligence over your code base. Supports code cross-referencing, class hierarchy, functional understanding of logic & structure. [Wow! I could use this!] This seems an essential service for software development shops, allowing them to tie “code intelligence” into the APM, logging and infrastructure monitoring that Elastic Stack already excels at. I haven’t seen any particular Elastic-branded module or service appear yet from this acquisition [but I am watching now]. Sounds like it is something that would be kept as a SaaS service, and compete in the same developer-focused SaaS space that Atlassian is in.

Let’s review those acquisitions above: Bolt on, bolt on, bolt on, bolt on… check check check check. I am way more impressed with Elastic than MDB here. Very natural expansion of product line and TAM as they consume apps and services around or, even more excitingly, built UPON Elasticsearch. Kind of like the Borg. I’m not sure I can come up with an example of a company doing this before: making a database platform for developers and IT staff to use, then starting to run SaaS services built upon it that are enabled by that database, that are solving the same use cases but without the need to maintain or interface with that database or require any custom development.

MDB is trying to move into the same ecosystem - enterprise plug-ins with a premium cost, mgmt interfaces, and (finally) a viz dashboard and improving analytical capabilities. But they are moving way slower than Elastic, and are never going to catch up to start stealing business from Elastic for search & analytical use cases. I think they are playing it safe & conservative, as they already have a wide set of use cases. Any collection of data in a modern web or mobile app can use MongoDB. You don’t build a financial transaction service upon it, perhaps, but most everything else, surely!

As I said in my “MDB Goes Mobile” post: MDB just had an acquisition, their third. Their prior one, mLab in Oct 2018, cost $68M, and allowed them to convert customers from an Atlas competitor (managed cloud-hosted MongoDB instances) into Atlas customers. While that last acquisition was a customer and team acquisition, they just bought Realm for $39M to acquire product lines that help them jump-start their new mobile initiatives.

MDB had an acquisition to buy customers (mLab). They already had their own mobile database product and had a mobile sync service in beta, yet they NOW decide, after that development effort, to purchase Realm as a bolt-on (and, I’m guessing, scrap their efforts thus far towards mobile and sync). I can’t tell why MDB is intent on moving so slowly by building everything themselves. Charts, their new SaaS visualization dashboard service, was new in v4 released a year ago… yet is still in beta nearly a year later. Not impressive! They could have easily bolted on a viz dashboard long ago, but instead they built it from scratch. (Taking, what, maybe a year, maybe more, to develop? Make that two… it’s been 11 months since release and it is still marked BETA.) And, a step further, I don’t see MDB making enterprise SaaS tooling of products based on MongoDB, either.

Elastic, on the other hand, is expanding their platform to help them find more and more appropriate use cases for Elastic Stack … AND to find new successful SaaS services built on top of Elastic Stack. Either way, they expand use cases and expand the potential TAM. There is NOTHING stopping them from making competing services to Splunk, New Relic, or Datadog, but for now they are focused more on enabling others to do that.

Or are they? Look where Elastic is starting to acquire or create new business-focused SaaS services. Swiftype, a hosted search service acquired in late 2017, is the incubator of these impressive new business-facing SaaS services. Elastic is now “eating its own dog food”, as we say in the software biz (aka consuming its own services), by creating SaaS tools built on top of itself!! Elastic now has new SaaS services based on Elastic Stack under the hood. I linked to both branding pages, as Elastic seems to be leaving the Swiftype segment on its own, perhaps for existing customers and word-of-mouth. Pricing tiers have document and usage limits, with upcharges for more. Each comes with one search engine per customer, but can be upsized for higher content indexing or searching needs.

New SaaS or Hybrid Services:

  • Elastic Site Search Service
    https://swiftype.com/site-search
    https://www.elastic.co/solutions/site-search
    SaaS search service to add site indexing & search capabilities to your web site or app. Provides embeddable search bar with autocomplete. Can leverage the power of Elasticsearch for text searches (spell correction, similar word matching, synonyms, phrase matching). Great for e-commerce, knowledge bases, media content. Okta and Shopify both listed as customers. They highlight Twilio using it in their API docs.

Pricing:

  • $79/mo Standard - crawl up to once/day, multi-lingual (5k doc limit; 50k request limit; 1 search engine)

  • $199/mo Pro - crawl up to once/hr, PDF/DOCX indexing, cross-domain, analytics (10k doc limit; 100k request limit; 1 search engine; 1 domain)

  • $??/mo Premium - dedicated hardware & support, SLA

  • Elastic App Search Service
    https://swiftype.com/app-search
    https://www.elastic.co/solutions/app-search
    SaaS search service built over Elastic Stack. Allows companies to feed in data from a variety of sources (like their own databases), either on-premise or in the cloud, in order to allow search capabilities within their app over that data. [Someone tell The Motley Fool, that right there is their new content & board search tool!]

Pricing:

  • $49/mo Standard - basic searches & analytics, 7d history, unlimited users (50k doc limit; 500k request limit; 1 search engine)

  • $199/mo Pro - advanced searches & analytics, multi-lingual, 6mo history, cross-domain (100k doc limit; 1M request limit; 1 search engine)

  • $??/mo Premium - dedicated hardware & support, SLA

  • Elastic Enterprise Search
    https://swiftype.com/enterprise-search
    https://www.elastic.co/solutions/enterprise-search
    New beta service JUST ANNOUNCED LAST WEEK (no pricing shown yet), that again emanated from the Swiftype team. A SaaS search service that provides enterprise search capabilities over all of a team’s SaaS tools (Salesforce, Github, Dropbox and Google Drive content, Slack, Zendesk, etc). Maintains data security, as a user can only search the content they can view natively in those SaaS services. A wide variety of SaaS services are supported, and you can +/- boost the search priority of each. If your tool isn’t supported, you can add it via a custom connector. Fantastic expansion of the Swiftype platform. I expect the number of services it integrates with to increase from here, Okta style.

As Darthtaco discovered in the blog post announcing it, you can run this on Elastic Stack yourself with the Platinum tier enterprise license. So it’s a Hybrid product that is both a SaaS offering as well as a module in the Elastic Stack. Their marketing maintains a great balance - basically saying “we can do this as a SaaS service for you, or build it yourself with Elastic Stack”. Great way to upsell the enterprise support licensing (like they did with ML features, which require Platinum tier). Given it ties into Github, I wonder if they will tie in the “code intelligence” platform from Insight.io, or if that becomes its own SaaS service.

All of these new services have enormous potential as enterprise SaaS apps in their own right. Perhaps not as sticky as Okta, considering they compete with Google on all of them. But it seems like just the tip of the iceberg as far as what Elastic could do here, in terms of enterprise-focused SaaS tools for search & analysis. It is interesting to consider that the decision not to change the license of the core ELK Stack is enabling this angle: they can let SaaS companies build search and analytical services upon their stack. Once those companies are successful, and if they apply to a use case Elastic is targeting, Elastic could acquire them to bring them into the ever-growing ecosystem of search-related SaaS services it can provide to customers that don’t want the hassle of using the Elastic Stack platform itself.

To be continued, in Part 3 (due to TMF post size limit)…

-muji
long ESTC (7%)


USE CASES

Companies need a scalable database to handle search and analytics over a LOT of data, including ever-growing datasets like metrics. There are lots of reasons to integrate Elastic Stack into your infrastructure. Elastic Stack excels at search & analytics over:

  • Full text data (e.g. articles, blog posts, tweets, comments)
  • Terms text data (e.g. tags, usernames, locations)
  • System logs & real-time metrics (e.g. systems, network devices)
  • Application logs & real-time metrics (e.g. server-side apps, databases, APIs, microservices)
  • Security/Audit logs (e.g. firewall logs, system audit logs)
  • Numerical data (e.g. financial analytics, fraud detection)
  • Time-series data (e.g. metrics, events, devices, IoT sensors)
  • Geospatial data (lat/long points, geo-regions, location beacons)
  • IP data (network traffic, routing logs)

These data types combine into multiple use cases they market against - but these are just the tip of the iceberg. Expect the use cases to continue to expand from here as they expand the product line as well as address more verticals specifically.

Use Case: Needing Better Search & Analytics Capabilities

This is where they started - creating a database stack that helps software development companies provide search capabilities within their architecture. It began with a focus on full-text search, but Lucene could also be utilized for indexing any type of field, and over unstructured data. Lucene caught up over the years to all those use cases, as the performance of numeric searching over time-series and geospatial data greatly improved. This made Elasticsearch and Solr more and more relevant to more and more use cases in custom development efforts. As seen in the popularity of their open-source repositories, Elasticsearch won the battle. Multiple SaaS services depend on Elasticsearch under the hood.

Some examples:

  • The Library of Congress is digitizing their archives in Elastic Stack.
    https://www.searchtechnologies.com/enterprise-search-case-st…
  • Uber built a demand prediction system over Elastic Stack for UberEats service.
    https://eng.uber.com/elk/
  • Goldman Sachs built multiple internal tools, including a contract tracking system and a trade life-cycle tracking system, over Elastic Stack.
    https://www.informationweek.com/software/enterprise-applicat…
    “Elastic has been one of the most interesting open source products that we’ve seen in the last couple years,” said Don Duet, global co-head of the Goldman Sachs technology division, in an interview with InformationWeek. “What’s impressive about it is how much value it can create in organizations.”

Use Case: Infrastructure Monitoring

Elastic is really pushing a wide variety of time-series and geospatial use cases around monitoring; IoT, sensor, app, network and infrastructure monitoring are all major use cases of Elastic Stack.

Elastic is really going after do-it-yourself infrastructure monitoring. There are 3 overlapping angles to using Elastic Stack for monitoring your infrastructure:

  • It can ingest and search over log files output from your systems and server apps, like syslog and database logs.
  • It can ingest and search over real-time metrics from your systems (like cpu/memory/disk/network usage) as well as your server applications.
  • Then it can utilize APM, where you ingest your metrics straight from your apps themselves. It ties into your code directly via an APM library, available across a wide variety of programming languages (Java, JavaScript, Go, Python, Ruby). This becomes especially important if you have a distributed code base or use a microservices strategy, where you really need to monitor the flow of communication and data between all your modules.
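For the real-time metrics angle, a typical dashboard panel boils down to a `date_histogram` over recent data. The sketch assumes Elasticsearch 7.2 or later (which takes `fixed_interval`; earlier versions used `interval`) and a hypothetical CPU field name `system.cpu.pct`. It averages CPU per 5-minute bucket over the last hour, then uses a `max_bucket` pipeline aggregation to find the peak bucket. The `bucket_avg` function is a plain-Python analogue of the bucketing, flooring each timestamp to the start of its 5-minute window.

```python
from collections import defaultdict

# Assumes Elasticsearch 7.2+; the field "system.cpu.pct" is hypothetical.
cpu_agg = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},  # last hour only
    "aggs": {
        "per_5m": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "5m"},
            "aggs": {"avg_cpu": {"avg": {"field": "system.cpu.pct"}}},
        },
        "peak_5m": {"max_bucket": {"buckets_path": "per_5m>avg_cpu"}},  # busiest window
    },
}

def bucket_avg(samples, interval_s=300):
    # samples: iterable of (epoch_seconds, value); returns {bucket_start: average}
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        bucket = ts - ts % interval_s  # floor to the start of the window
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in sorted(sums.items())}
```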

Same for networking and security monitoring. You can pull in logs from routers, firewalls and other networking equipment. Then use the ML module to isolate anomalies, or view hot-spots on regional maps. So Elastic Stack allows an organization to watch its own infrastructure, networks and app stacks. This enables companies to do it themselves, for a fraction of the long-term cost of Splunk, New Relic and Datadog. I see those services as major competitors, where Elastic has to convince companies to do it themselves with Elastic Stack.

Use Case: Search services

It’s a search engine at the core, so if you need search within your enterprise, on your web site, or within your mobile app, you are in the right place to Do-It-Yourself and embed Elasticsearch into your stack. And as discussed in depth before, Elastic is making moves here with SaaS services that provide these search capabilities directly to enterprises, without the need to host, manage or interact with the Elastic Stack themselves. But customers could always do these items themselves in the Elastic Stack with custom development.

Use Case: Analytics

Once you have your data flowing into Elastic, you can leverage the analytics capabilities for security and audit purposes. Utilize the ML module, or pipe it into your own analytical package (Spark, Hadoop, AWS EMR). You can use any kind of geospatial data in Elasticsearch, to view traffic flows or group data into hot-spots within maps. You can use geo-fenced search filters, to search only in specific regions or overlapped geo-shapes.
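As a sketch of the geo-fence plus hot-spot idea: the body below filters to a bounding box (roughly New York City, coordinates approximate) with a `geo_bounding_box` query, then groups matches into map cells with a `geohash_grid` aggregation (precision 5 gives cells of roughly 5 km). The `location` field, assumed to be a `geo_point`, is hypothetical. The `in_bounding_box` helper shows the containment test in plain Python and, as a simplification, ignores boxes that cross the antimeridian.

```python
# Hypothetical geo_point field "location"; box coordinates are approximate.
NYC_BOX = {
    "top_left": {"lat": 40.92, "lon": -74.26},
    "bottom_right": {"lat": 40.49, "lon": -73.70},
}

geo_body = {
    "size": 0,
    "query": {"bool": {"filter": {"geo_bounding_box": {"location": NYC_BOX}}}},  # geo-fence
    "aggs": {"hot_spots": {"geohash_grid": {"field": "location", "precision": 5}}},  # map cells
}

def in_bounding_box(lat, lon, top_left, bottom_right):
    # Plain-Python containment check (does not handle antimeridian-crossing boxes).
    return (bottom_right["lat"] <= lat <= top_left["lat"]
            and top_left["lon"] <= lon <= bottom_right["lon"])
```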

Competition:

Amazon competes with Elastic in hosting Elasticsearch. Unlike MDB, Elastic isn’t combating it via licensing, but instead is combating it with a richer feature set from the X-Pack modules and other services. MDB and Elastic are fighting different licensing battles for the same purpose: to combat the cloud-vendor alternatives. MDB is trying to prevent them from using MongoDB altogether, while Elastic is trying to have differentiated features.

What Elastic says they have over AWS managed hosting:
https://www.elastic.co/aws-elasticsearch-service

  • premium modules for ML, Security
  • free modules for alerting, monitoring, SQL, Canvas
  • Monitoring dashboards and APM UI
  • Index curation & roll-up features (Hot/Warm/Frozen indexes)
  • Elastic Map Service
  • Logstash/Beats mgmt UI
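The hot/warm index curation in the list above is driven by lifecycle policies. Below is a minimal sketch of what an Index Lifecycle Management (ILM) policy body might look like, built as a Python dict that would be sent with `PUT _ilm/policy/<policy_name>`. The helper name and every threshold (50gb rollover, 7 days to warm, 90 days to delete) are hypothetical placeholders; frozen indexes and rollups are configured separately and are not shown.

```python
def build_ilm_policy(rollover_size="50gb", warm_after="7d", delete_after="90d"):
    """Return a hot -> warm -> delete lifecycle policy body (hypothetical thresholds)."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index once the current one is big or old enough
                        "rollover": {"max_size": rollover_size, "max_age": "1d"}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        # Rarely-written data: fewer shards and segments to search
                        "shrink": {"number_of_shards": 1},
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                # Drop old data entirely once it ages out
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


policy = build_ilm_policy()
```

The point of the design is that data moves through cheaper tiers automatically as it ages, which is a big part of keeping a do-it-yourself cluster affordable.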

Beyond hosting, I think the major competition isn’t alternative open-source engines, it is their competitors in their use cases. SaaS infrastructure monitoring companies like Splunk, New Relic, Datadog, and the like are losing customers tired of the high monthly charges, who can build it themselves on Elastic Stack for a fraction of the cost. Elastic Stack is for DO-IT-YOURSELFERS and those on a budget, compared to tying into those SaaS tools where ever-growing datasets mean ever-growing monthly expenses.

FINAL TAKEAWAYS

Elastic knew early on that they needed a complete ecosystem. Kibana is a data visualization dashboard, but it also provides the interface to manage the cluster and the data within it. Logstash and Beats both enable the monitoring use case and, with Kibana, allow using Elastic Stack without coding. Elastic has a major focus on ML over the data, for things like anomaly and threat detection. In comparison, MDB has been catching up on ecosystem tools like Charts, but has nothing around analytics or ML tied in.

Yes, MDB has a much wider use case. But for search and analytics, there is really no alternative to Elasticsearch outside the way-less-used Solr. The real choice for a company is: does a search engine apply to our use case? If so, you go Elastic Stack. So the question of competition is really whether you use Elastic Cloud or have AWS host your managed cluster. MongoDB is used solely by software development companies, but Elastic Stack can be used without code! That means that, unlike MongoDB, it’s not just for software developer companies – any company can benefit. IT departments are using it as a standalone Elastic Stack, directly integrating monitoring capabilities without any custom development effort. Kibana is a very easy-to-use visualization dashboard tool. IT can install Beats onto infrastructure, and suddenly it is all feeding into your cluster for DO-IT-YOURSELF monitoring.

I have spoken about Elastic before, as I attended and wrote up their ElasticON developer conference last October: https://discussion.fool.com/insights-from-elastic-conference-340…. Go back and re-read that now that you know what the heck they do! Their main focus at the conference was on 2 main customer use cases: use it to monitor everything (logging + metrics + APM), and use it to help secure your network & infrastructure by building a Security Information and Event Management (SIEM) system around it. I dove into more details about those use cases in that post. One highlight I continue to focus on was how the Oak Ridge National Labs IT team moved their SIEM system from Splunk to in-house, and costs went from “$$$$$” to “$$” - showing they were cut by more than half. Simply put, companies with large infrastructure can save big bucks by taking a DO-IT-YOURSELF attitude with monitoring and security. Elastic directly competes with Splunk and New Relic here.

Elastic & MDB are similar companies with similar products. They are NoSQL databases that compete, and have closely matching business and product strategies (both heavily focused on cloud-neutral managed hosting). Both have tried to differentiate their managed cloud-hosting service from AWS’s. At a minimum, both are the authors of the database, so they are absolutely the best resource to host that database for you and help you with it. But beyond that, MDB offers Stitch and Charts, and Elastic Cloud offers many add-on modules. AWS is starting to fill in the gaps with their “Open Distro for Elasticsearch”, but they only cover a few of the basic X-Pack plugins so far (security and alerting). They aren’t going to catch up to Elastic Stack’s feature set like this. Elastic is more than happy to highlight what AWS Elasticsearch cannot offer in their marketing.

So the licensing battles are just a difference in strategy for fending off competitors who use their open-source core in competing hosting services. MDB is fighting via their core licensing. Elastic is using their ecosystem of modules to differentiate their platform. Google and Microsoft are choosing to partner with Elastic for managed Elastic Stack hosting on their platforms, instead of building competing services. AWS is fighting it to the point of forking their own “Open Distro for Elasticsearch” that doesn’t include the alternatively-licensed modules, instead having to write their own open-source security, alerting and SQL modules. https://opendistro.github.io/for-elasticsearch/ AWS doesn’t typically contribute to open-source. They aren’t doing this out of the kindness of their heart - they cannot sell managed Elasticsearch clusters without these features being present. Expect Elastic to continue expanding features to differentiate themselves. I can’t believe AWS gets away with this competitive behavior, but I guess it is par for the course for Amazon the retailer: if they see a way to capture a few more points of margin, they take it and cut out the middleman. When MDB changed their license, the press sold it at the time as combating Asian cloud providers, but in reality the first front was AWS.

I am going to take it a step further – “Open Distro for Elasticsearch” shows me that AWS cannot compete against Elastic Cloud with just core Elasticsearch, as they had to find a way to replicate the proprietary features they couldn’t include because those features are under the “Elastic License”. It’s a different license game than MDB’s, but I think it’s working just as well. AWS has to find another way to differentiate their service from Elastic Cloud (besides price – yes, AWS is cheaper). I think they are already starting to market it differently, as I recently saw a blurb touting AWS Kinesis as a data streaming platform that integrates directly into AWS Elasticsearch.

[Side bar to the whole “open-source database company doing cloud hosting” part: Confluent, the company behind Apache Kafka, is one to watch going forward. Kafka is a data streaming platform on a high-availability cluster. Not a database, per se, but damn close (more a persisted, high-availability message queue). Disclaimer: I am a database developer who uses Kafka a lot. Confluent runs the Confluent Cloud managed hosting service, so if & when it becomes public, I would consider its numbers and put it up with MDB and Elastic as an extremely sticky platform for software development companies. AWS runs a competing AWS Kinesis service, but now also runs its own AWS Managed Kafka service, as that platform has a lot of momentum. Confluent has taken the same route Elastic has in changing the licensing of other components in their ecosystem, not the core database.]

Creating a managed cloud-neutral hosting service over the core platform is clearly a big money maker for these open-source companies. That’s where the current big revenue growth is coming from. But Elastic is adding the next wave of growth – creating their own enterprise-focused SaaS services around search and analytics. This is the two-fold nature of Elastic’s acquisitional prowess. It first bolted tool sets onto its core, to build an ecosystem around itself. But the recent acquisitions are altogether different. In Swiftype and Insight.io, it found companies that built their SaaS search services for enterprises on Elasticsearch (as they are allowed to, by the permissive Apache 2.0 license!). Such a superb direction for Elastic; they can leverage their expertise plus provide an alternate path for their customers! There may be risks in this direction, but I think Elastic has already addressed them – the marketing is taking a great tack in saying you can use the SaaS service or do it on Elastic Stack yourself. Elastic is also keeping Swiftype an independent division. It’s such a good idea – find companies building on the Elastic Stack, and acquire the ones that align with Elastic’s use cases. They can leverage all their knowledge about the core Elastic Stack platform these services are built on, but focus the SaaS services on highly-honed enterprise solutions around search and analytics.

Very exciting, and this just seems like the start. The recent acquisition of Insight.io really has me intrigued. They have a developer-focused SaaS service that integrates into code tracking services (Github, Bitbucket) and provides intelligence and search capabilities over your code base. This puts them into the same developer-focused SaaS market as Atlassian. Which, as it so happens, is a market that has extreme cross-selling potential to their existing Elastic Stack customers who are using it for application development. They could combine Insight’s service with app monitoring (especially APM modules) and make it a very focused SaaS service. Or they could combine it with the new Elastic Enterprise Search as a differentiator over Google’s offering. Whatever is coming down the pipeline from this, Elastic is going to be competing in an all-new market (software development SaaS tooling). They are already in the enterprise SaaS tooling market now, and against some big names – Swiftype directly competes with Google! Each of these new SaaS directions potentially opens up all-new markets! They may cut into Elastic Stack’s potential market for do-it-yourself solutions, but Elastic captures that customer regardless.

Elastic has a land-and-expand philosophy with Elastic Stack customers; if they can get a new customer to use it for one use case, then that customer will find all their other use cases for it and start expanding their use from there. If on Elastic Cloud, managed hosting fees will likely increase over time. If self-hosted, customers may rise up the support tiers as their dependency increases. This is all easily seen in their dollar-based net expansion rate (NER) of over 130%. The new SaaS services may cut into this a bit, as they provide an alternate path for new customers to take, where they won’t get into the core Elastic Stack and find other use cases. However, those SaaS services will have their own growing customer base and expanding use (as customers accumulate more traffic and more documents, they move up the pricing tiers), so the net effect is difficult to know. Regardless, as you can probably tell, I am really enamored with the dueling business strategies they appear to be navigating perfectly.

It all combines into some fascinating moves by Elastic. I walked into this research project thinking they were an MDB clone, but I now feel Elastic has a much richer story than MDB. This SaaS tooling direction took me completely by surprise. [I had heretofore underestimated the Swiftype acquisition, and I had never even heard of Insight.io before this research.] I was mightily impressed by Okta after my tech review of them; I am even more so by Elastic. Perhaps due to my closeness to their product, I had framed my thoughts around this company incorrectly. Today, I cannot deny I have a new excitement around the potential here. And, if my financial bet is correct, it’s just the beginning of their SaaS moves. TAM potential is completely unknown. Tomorrow they could create or acquire a New Relic or PagerDuty or Everbridge clone by combining Elastic Stack with Twilio notifications. Any SaaS monitoring service is competition to a do-it-yourself solution on Elastic Stack, but they clearly have their sights on SaaS services built on it for companies that just don’t want the additional hassle of maintaining or interfacing with an Elastic Stack cluster (who’d rather avoid the do-it-yourself route).

I think MDB and Elastic have such similar business models that I’d really like to see a numbers-to-numbers comparison of MDB and Elastic. Nearly the same revenue, nearly the same growth [well, maybe… MDB just accelerated this recent Q!], nearly the same market cap [then again, MDB just jumped 25%!]. I moved to have nearly equal allocations in them, and will start exiting one when it starts faltering in a head-to-head comparison of their stats - I want to find the one executing better after a few Qs, then move mostly to that one. Anyone in the collective want to start tracking and posting a head-to-head numbers comparison? (Please! I’m too busy pontificating here!)

In closing, I hope I convinced you, Saul and others here, to revisit your concerns about their open-source strategy. It’s two sides of the same battle of keeping their cloud hosting services differentiated against the cloud vendors’ offerings. Cloud vendors can’t offer MongoDB releases made under the new SSPL license as a hosted service without meeting its terms, so all new features are protected from here. ELK is entirely open and free, but it’s the integrated modules that require licensing, and all new features can be protected from here. It feels like Elastic letting companies utilize and embed Elasticsearch in their own products, via that permissive Apache 2.0 license, is what fueled this next phase of Elastic’s revenue growth, in the form of these “side-car” SaaS services they’ve acquired.

Needless to say, I have increased my allocation prior to the publication of this massive missive. I hope you learned something. I sure did, which is why I love this kind of homework.

-muji
long ESTC (7%)

Muji,
Not sure what it says about me that I opened Saul’s board, saw the title of your post, and got really excited!

Pretty sure what it says about you is that these detailed writeups by an actual tech user are gold and greatly appreciated. Thanks!

Dreamer

  • I want to find the one executing better after a few Qs then move mostly to that one. Anyone in the collective want to start tracking and posting a head-to-head numbers comparison? (Please! I’m too busy pontificating here!)

Why not own both MDB and ESTC long-term, as they don’t quite overlap?

What if they are literally the #1 and #2 best stocks you could buy for the next few years?

Just trying to understand why you are leaning towards either/or vs. holding both?

Thanks again…tremendously valuable posts!

Dreamer

Wow. Thanks muji. That is an amazing post. You know more about Elastic than I guess I know about anything.

In your first post you said: …if you are slicing and dicing over large data (hundreds of Gb or more) or big data (hundreds of Tb or more) for search or analytical purposes, Elasticsearch is ideal.

So what is your best guess of TAM? I am not a computer guy, so I have no idea. But it seems to me the number of people who would need to slice and dice hundreds of terabytes of data is not that large.

Am I wrong?

Thanks for taking the time to make this post.

Jeb

And this is an investment board at its very best!

Thank you muji!! What an incredible write-up. It simply amazes me that this sort of information is available to an ordinary guy like me.

Thank you TMF and Saul for hosting this awesome board.

Awesome post muji!!!

I’m a non-techie and have a couple of questions. When comparing MDB to Elastic, you stated that multiple nodes are needed for Elastic (min 3) and 0 for MDB?

Does that make things more expensive and more complicated for Elastic vs. MDB?

Assuming no other database company existed, what portion of the NoSQL market would you give MDB vs. Elastic today, and what trend do you expect to see a year or two out?

With the amount of data getting larger every year, would you not expect Elastic to grow faster than MDB in the near future?

What do you think of the graph at the bottom of the page at the link below, comparing MDB and Elastic? Will continually falling RAM prices compensate for MDB’s speed in the long run vs. Elastic?

http://bitcom.systems/blog/moving-mongo-to-elasticsearch/

Thanks.

Lafleur

PS I have been following/lurking all you on Saul and NPI for years. I am indebted to you all. Saul, what you have created here is truly amazing! Thank you all.

Muji,

Absolute dynamite!

I think we’ll be seeing more from Elastic. The shares have the disadvantage of debuting so much higher than the IPO. But if the business maintains the momentum it has, the stock will have its day in the sun. Slowly accumulating.

Darth

Muji:

Wow…that was a very long and technical series of posts…thanks for the effort!

There are a few negative issues with ESTC, including:

  1. margin declining from 77% to 71% over past year and half.
  2. Revenue growth rate declining
  3. Huge stock lockup right after this next earnings call…date of earnings not yet announced?
  4. AWS competitive open source environment (Open Distro)

These items probably explain the stagnation of the stock the past couple months and the impact of AWS remains yet uncertain IMO…will they recruit some of the more prolific open sourcers to their camp??

I agree with you that MDB and ESTC have somewhat different approaches to trying to protect their IP…but I am not convinced that ESTC’s model of trying to innovate their way to competitive advantage without licensing protection is more sustainably competitive than what MDB and other open source have been doing by trying to license their IP AND ALSO innovate.

In fact, it will greatly surprise me if ESTC doesn’t follow the same playbook as MDB and Redis labs. IMO, we are more likely to see these open source model companies walk in step than to go separate business paths/models (witness the recent dual announcements of GOOG partnership for BOTH MDB and ESTC).

But recognizing that ESTC has had the overhang of the above 4 items…the top three likely go away assuming they continue to grow revenue at a breakneck speed.

The larger more financial discussion is whether there is really an open source business model that can wildly succeed for a profit-based company…that has really been called into question of late with the growing near supreme power of the large cloud providers and AWS in particular.

This article looks at the open source business model as it pertains to MDB, but it really applies to all open source models. It questions the viability of for-profit open source by drawing parallels with the record industry:

https://stratechery.com/2019/aws-mongodb-and-the-economic-re…

Economic Realities and the Future
Little of what I wrote is new to folks in the open source community: the debate over the impact of cloud services on open source has been a strident one for a while now. I think, though, that the debate gets sidetracked by (understandable) discussions about “fairness” and what AWS supposedly owes open source. Yes, companies like MongoDB Inc. and Redis Labs worked hard, and yes, AWS is largely built on open source, but the world is governed by economic realities, not subjective judgments of fairness.

And that is why I started with music: it wasn’t necessarily “fair” that music industry sales plummeted, and yes, companies like Apple with its iPod business made billions off of piracy. The only reality that mattered, though, was that music itself, thanks to its infinite reproducibility, was as pure a commodity as there could be.

The argument that he makes is that it really doesn’t matter how much better ESTC or MDB get (the innovation part)…the question remains as to what minimum performance is demanded/acceptable to the market to accomplish what just needs to be accomplished.

Anyway, just food for thought for long-term holders of any of the open source companies and their business models. The Gorilla Game of yesteryear started with a requirement for open “proprietary” software as having leverage for the greatest moat…that is obviously not open source. MDB and Redis have tried to move closer to that model, at least slightly, using a new licensing model…I expect ESTC still needs to do the same and stick together with MDB, work together and act together.

These open source companies are threatened in very similar ways by AWS. Etc…they can learn from each other for the benefit of them all.

One important note is that Elastic and MDB aren’t necessarily competitors. MDB is more of a database disruptor along the lines of the traditional SQL databases (SQL Server, Oracle, MySQL, PostgreSQL). In my opinion, its biggest threats in the NoSQL space are Microsoft Azure’s Cosmos DB and Amazon’s AWS DynamoDB. With more and more companies going to the cloud, they will be tempted to play in one of these areas instead. It is very interesting that MDB’s Atlas lets you choose where the DBs are housed (in AWS or Azure, among others). NoSQL is great for loading lots of data, especially data whose structure can change. That said, you hit upon the biggest issue with NoSQL databases: search.

Elasticsearch is a searching technology. Think along the lines of when you type something into Google and it auto-completes and figures out what you want.
Many companies that go NoSQL move the data to a big-data repository for analysis and reporting purposes. In fact, even with SQL databases, this is the ideal step for analytics and AI. For companies with big search needs, loading the data into Elasticsearch or something similar (Solr, Azure Search, AWS CloudSearch) is ideal. Elastic might have designs on being both the search engine and the main data store, but they need to resolve a lot of issues first (security, reliability).
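The core structure behind this kind of search is an inverted index: instead of scanning every record, the engine keeps a map from each term to the documents that contain it. The toy version below is only an illustration under simplifying assumptions (lowercase tokenizing, AND-only matching, naive prefix suggestions); it is not how Elasticsearch or its underlying Lucene library actually implements analysis, scoring or autocomplete, and `TinyIndex` is a hypothetical name.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each term to the set of doc ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of docs containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def suggest(self, prefix, limit=5):
        """Naive autocomplete: indexed terms that start with the prefix."""
        prefix = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(prefix))[:limit]


idx = TinyIndex()
idx.add(1, "Elastic Stack monitoring")
idx.add(2, "MongoDB document store")
idx.add(3, "Elastic search engine")
print(idx.search("elastic"))             # {1, 3}
print(idx.search("elastic monitoring"))  # {1}
print(idx.suggest("mo"))                 # ['mongodb', 'monitoring']
```

A lookup only touches the postings for the query terms, which is why this structure stays fast as the document count grows, whereas a general-purpose document store scanning text fields does not.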

Lafleur, you asked about the nodes. That gets into one of the big benefits of NoSQL: a much better data-scaling mechanism called sharding. The data is scaled horizontally, rather than vertically as traditional SQL databases do. This makes it much cheaper, because multiple smaller computers cost less than one big beefy one.
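A sketch of how sharding decides where a document lives: in its simplest documented form, Elasticsearch picks a primary shard with `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document’s `_id`. Elasticsearch uses a murmur3 hash for this; the sketch below substitutes `zlib.crc32` purely for illustration, and the document ids are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % number_of_primary_shards.
    crc32 is a stand-in here; Elasticsearch itself uses murmur3."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def spread(doc_ids, num_primary_shards):
    """Count how many documents land on each shard."""
    return Counter(shard_for(d, num_primary_shards) for d in doc_ids)


ids = [f"doc-{i}" for i in range(10_000)]
counts = spread(ids, 3)
print(sorted(counts.items()))  # roughly a third of the docs on each shard
```

Because the shard number depends on the shard count, changing that count would send existing documents to different shards, which is why the number of primary shards is chosen when an index is created (aside from dedicated split/shrink operations). Each shard can then sit on a different, cheaper machine.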

The world is evolving. The days of a company only having one database technology are gone or coming to a close. The cloud is making this easier for companies big and small.

Open source does not necessarily mean what you do will be done or can be done by everybody else. Does it? Ultimately one could say that whatever functionality implemented by a code could be done by anybody else. Once that beat or that melody is out there one could use it to create something else.
From the past, Red Hat has also touted open source. You don’t think they have had a business?
If you argue that the issue is them being ‘open source’ and you look at the stock movement then why is Mongo going up and Elastic not in recent months?
What do you think will happen after the lock-up?

tj

Thanks all for the responses and questions. Feel free to hit the smiley face by my name to “Favorite Fool” me, I kinda like this trophy-chalice icon that has appeared next to my name recently.

Here are some answers to the questions up-thread from Dreamer, Jebbo, Lafleur, Duma and apwickman26.


Dreamer:
Why not own both mdb and estc longterm, as they dont quite overlap? What if they are literally the #1 and #2 best stocks you could buy for next few years. Just trying to understand why you are leaning towards either/or vs holding both?

I probably didn’t put it very elegantly - holding both is exactly my plan, ultimately in roughly equal allocations. MDB has been on a huge tear, so it clearly wins on relative strength. ESTC is likely still held back by the share lock-up, so I (and many others posting portfolios here, including Saul, Gaucho, Fleiberman, and Bear) have been slow to build a position beyond a starter 2-3%. After this deep dive, I was impressed enough to start increasing my allocation now, rather than wait for the lockup to pass. What I meant in my closing statements was that we should be able to get a good view into their execution & strategies by comparing their trajectories from here, so if one falters we can act appropriately. I really like basket situations like this, like buying both SHOP and SQ and watching them both succeed.

I argued against selling MDB in late Feb, due to AWS and Lyft news (it then popped at earnings), and bought more Okta in early April after my review – both of those decisions have turned out VERY nicely. I hope the same for ESTC. I nibbled in the mid-60s back in Nov, knowing I needed to research them more. It has been very range-bound the last 2 months since peaking in late Feb. Once it is past the lockups and has another Q or two of stellar numbers, I think we’ll be sitting pretty.


Jebbo:
So what is your best guess of TAM? I am not a computer guy, so I have no idea. But it seems to me the number of people who would need to slice and dice hundreds of terabytes of data is not that large.

More companies than you’d think. Having to churn through terabytes of data is why Hadoop got created. I’d say all of the Fortune 2000, plus many IoT or analytics-oriented companies, have datasets that size. Elasticsearch is built for large datasets, but any data that starts to outgrow the native search capabilities of wherever it is stored can leverage it (say, anything more than 20 million rows or over 20GB, and it may start exceeding the limits of the database it is in). Since I just had to look it up for work: I’ve got 33TB in 300B documents in my 6-node Elasticsearch cluster, using it as a time-series database, with metrics going back to 2015.
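Those cluster figures are easier to picture as per-document and per-node numbers. A back-of-the-envelope sketch, assuming decimal terabytes, an even spread across all 6 nodes, and that the 33TB figure already includes any replica copies (the post doesn’t say either way):

```python
TOTAL_BYTES = 33e12   # 33 TB, assuming decimal terabytes
DOCUMENTS = 300e9     # 300 billion documents
NODES = 6

bytes_per_doc = TOTAL_BYTES / DOCUMENTS   # average on-disk size per document
tb_per_node = TOTAL_BYTES / NODES / 1e12  # assumes an even spread across nodes

print(f"{bytes_per_doc:.0f} bytes per document")  # 110 bytes per document
print(f"{tb_per_node:.1f} TB per node")           # 5.5 TB per node
```

Roughly 110 bytes per document is consistent with the time-series use described: huge numbers of small metric records rather than large text documents.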

Guess of TAM? No idea, I just pontificated on the business lines and strategy. I feel that search is relevant to nearly every business w/ an app or website w/ any kind of data (blogs, articles, comments), not to mention the trove of specialized datasets out there.

While MDB has a lot of applicability to modern apps, here is where I like ESTC’s moves:

  • Elastic Stack is relevant to non-developers, for things like monitoring infrastructure or network traffic.
  • Elastic’s moves into SaaS services to provide solutions for those non-technically inclined companies that don’t want to run their own database.

Lafleur:
When comparing MDB to Elastic, you stated that multiple nodes are needed for Elastic (min 3) and 0 for MDB? Does that make things more expensive and more complicated for Elastic vs. MDB?

Yes. Which is more a reason to use managed hosting. Plus you can size it to your current needs and easily scale as needed from there.
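The usual reason for a 3-node minimum in a highly available cluster is majority voting: the cluster only elects a master when a strict majority of master-eligible nodes agree, which prevents two halves of a split network from both acting as master. The sketch below is just that general majority arithmetic; in Elasticsearch 6.x it matched the recommended `discovery.zen.minimum_master_nodes` setting (master-eligible nodes / 2 + 1), while 7.x manages the voting configuration automatically. The function names are hypothetical.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a majority survives."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(n, "nodes -> quorum", quorum(n), "| can lose", tolerated_failures(n))
```

Two nodes can’t survive losing either one, so 3 is the smallest cluster that keeps running through a single node failure, which is part of why the entry cost is higher than a single database server.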

With the amount of data getting larger every year would you not expect Elastic to grow faster in the near future than MDB?

Well, that is an interesting thought. I’d say any company using either platform can be expected to have ever-growing data. But given that Elastic Stack has tools to auto-ingest logs/metrics via Beats, I would side with Elastic seeing greater data velocity.

What do you think of the graph at the bottom of the page at the link below, comparing MDB and Elastic? Will continually falling RAM prices compensate for MDB’s speed in the long run vs. Elastic?

It’s an interesting chart. But their solution is vertically scaling by increasing RAM - there is only so far you can go. It’s pretty easy to see how MDB performs on full-text search if it can keep the dataset in memory (and when it can’t, it hits a wall where it starts getting super slow). They make no mention of setups (cluster size, etc) they are comparing so it’s all pretty ignorable.


Duma:
but I am not convinced that ESTC’s model of trying to innovate their way to competitive advantage without licensing protection is more sustainably competitive than what MDB and other open source have been doing by trying to license their IP AND ALSO innovate.

THEY DO HAVE LICENSING PROTECTION. The choice they are making is in protecting modules within their surrounding eco-system by changing the licensing on the periphery, instead of changing the licensing of the core. The end goal is the same - new features that differentiate them from AWS’s offerings are protected.

In fact, it will greatly surprise me if ESTC doesn’t follow the same playbook as MDB and Redis labs. IMO, we are more likely to see these open source model companies walk in step than to go separate business paths/models (witness the recent dual announcements of GOOG partnership for BOTH MDB and ESTC).

It’s very clear that is not happening, so expect to be surprised (though how one is surprised by an eventual lack of an action, I’m not sure… I guess I’ll just come back in a year and yell SURPRISE at you). Those who decided to take the periphery tack (Elastic and Confluent are the 2 I know of) have been very vocal from the start about their decision & the reasoning behind it, in blog posts from the CEO. They are trying to be very open and deferential to the open-source community about this necessary shift, which the enterprise companies behind open-source software need to make against the cloud providers if they hope to stay relevant in hosting their own software.

There is a lot of debate about this in the OSS world. Changing the core license is rightfully seen as extremely drastic, as it is much more susceptible to unforeseen complications. An easy one to foresee is that many companies are extremely careful about reviewing the licensing of their selected tools, and may have picked MongoDB because of its original open-source license (AGPL v3 for the server, with Apache 2.0 drivers). If they have doubts, or even legal reservations, about the new license, they are COMPLETELY FROZEN at the last version released under the old license. Way to screw them over! At a minimum, it requires a legal review at a lot of companies using MongoDB internally or embedded within their software product or SaaS service.

Redis and MongoDB’s choice of changing the core license is getting wayyyy more push-back than the periphery choice of having proprietary modules & tools around the open-source core. Redis had to go back and clarify their license a 2nd time, it was getting so much heat. MDB got some serious side-eye from Red Hat, which removed MongoDB from its repos, and MDB eventually had to pull their new SSPL license from consideration at OSI. See this mailing list post from the MDB CTO whining about it (http://lists.opensource.org/pipermail/license-review_lists.o…). Looking at the note it was replying to, it’s clear what the push-back is about – the wording about preventing SaaS services “substantially similar to that of the Program”. So it appears that SaaS companies that use MongoDB under the hood as a major component in their service have MAJOR CONCERNS about this new SSPL license. Changing the license on the core open-source database is pulling the rug out from under them. It requires a legal review at a minimum, but these companies could be extremely upset about what is essentially a broken promise by MDB – expectations were set in stone, and MDB decided to change them for THEIR BENEFIT ONLY.

Maybe MongoDB feels that this shouldn’t scare SaaS companies, as it is a general-purpose document store, and SaaS companies are likely using it for a purpose that isn’t directly competing against MDB. If Elastic adopted this license on core Elasticsearch, they would be directly stifling companies like Swiftype, which built SaaS services over Elasticsearch that are “substantially similar to that of the Program” – and which Elastic acquired! I’m guessing they may want more of those types of acquisitions. And, funny thing, the SSPL is a full-on attack against services like mLab… which MDB happened to acquire right before that licensing change.

You mentioned the dual announcement from GOOG, but that has everything to do with GOOG wanting media attention around its choice to partner with these companies, and not end-run around them like AWS; it is NOT about MDB and ESTC being partners or being in lock-step.

These open source companies are threatened in very similar ways from AWS. Etc…they can learn from each other for the benefit of them all.

I agree they can learn from each other. One side is going to learn that the other made better strategic choices. I tried to be more neutral about it in my write up (like “who knows which is better”), but I think the signs are shouting pretty loudly about which choice is better.

MDB and Redis have tried to move closer to that modeling at least slightly using a new licensing model…I expect ESTC still needs to do the same and stick together with MDB, work together and act together.

Redis is really in the weakest position here: its tool is the easiest to replicate, and there aren’t major enhancements being made to it from here. MDB had little choice in the matter, as it didn’t have a platform of proprietary tools to make the periphery play (until it had Stitch, anyway, and that’s a pretty big differentiator). Elastic and Confluent have ecosystems of tools that already sat outside the open-source core, just under more permissive licenses. They are buttoning that up real quick, as those tools are their differentiators against the core offering.


apwickman26:
One important note is that Elastic and MDB aren’t necessarily competitors.

I never stated they were. I thought I made the difference clear: Elastic is a search-oriented database, while MDB is a general-purpose document store for data collections. I am comparing them not as direct competitors, but as very similarly structured businesses with similar histories: creating and maintaining an open-source NoSQL database and now succeeding at providing managed cloud hosting of that database. MDB and ESTC are both in the hypergrowth Saul universe, so I think it is important to view them in tandem, either to compare their execution and strategies to evaluate them, or to invest in both as a “sticky dev tooling w/ cloud hosting” basket.

MDB is more of a database disruptor along the lines of the traditional SQL databases (SQL Server, Oracle, MySQL, PostgreSQL). In my opinion, its biggest threats in the NoSQL space are via Microsoft Azure’s CosmosDB and Amazon’s AWS DynamoDB.

Yes. I didn’t rehash that as I already covered all that earlier in my “Cutting Through the FUD on MDB” post. https://boards.fool.com/cutting-through-the-fud-on-mdb-34145…

For companies with big search needs, loading the data into something like Elasticsearch or something similar (Solr, Azure Search, AWS CloudSearch) is ideal.

Thanks for bringing that up. I should have mentioned Azure Search and AWS CloudSearch. They’ve been around a long while as cloud-based competitors to Elasticsearch, but they don’t compare at all to Elastic Stack’s popularity. We don’t have usage numbers from the cloud providers, but those two vendor-specific databases are way down the list in the DB-Engines rankings (at 54 for Azure and a pitiful 94 for AWS).

So to copy a bit from my MDB post linked above:

REGARDLESS of these competing products, Elasticsearch is the clear king of search databases.

REGARDLESS of these competing products that are more “native” to their respective clouds, Elastic has thrived, in part by building an entire ecosystem of tools (Kibana, Beats) & modules (APM, ML) that those products are missing.

REGARDLESS of many cloud providers providing managed Elasticsearch themselves, Elastic has thrived as a cloud-neutral provider of managed hosting.

Elastic might have designs on being both search and main data store but they need to resolve a lot of issues first (security, reliability).

Elasticsearch is NOT angling to be your main data store. It’s not striving to be the “single source of truth”, as we call it in the biz (aka the sole database for all of a company’s data). The common pattern is to pull pieces from various “sources of truth” into Elasticsearch (say, your HR records but not payroll tx), with Elasticsearch acting as a search engine over that data.
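To make that “copy some fields out of the source of truth” pattern concrete, here is a minimal, hypothetical sketch in plain Python (not Elasticsearch code). The HR record layout, the field names, and the `SEARCHABLE_FIELDS` list are illustrative assumptions: records are projected down to the searchable fields (payroll data stays behind), and only those fields go into a toy inverted index. A real setup would send the projected documents to Elasticsearch through its client or ingest tooling rather than build an index by hand.

```python
from collections import defaultdict

# Hypothetical source-of-truth records (e.g. rows from an HR system).
# Salary lives in the source record but is deliberately NOT indexed.
HR_RECORDS = [
    {"id": 1, "name": "Ada Lovelace", "title": "Data Engineer",
     "dept": "Analytics", "salary": 150000},
    {"id": 2, "name": "Grace Hopper", "title": "Search Engineer",
     "dept": "Platform", "salary": 160000},
]

# Assumption: only these fields are copied into the search layer.
SEARCHABLE_FIELDS = ("name", "title", "dept")


def project(record):
    """Keep the document id plus the fields we want searchable."""
    return {"id": record["id"], **{f: record[f] for f in SEARCHABLE_FIELDS}}


def build_index(records):
    """Build a toy inverted index: lowercase term -> set of document ids."""
    docs, index = {}, defaultdict(set)
    for rec in records:
        doc = project(rec)
        docs[doc["id"]] = doc
        for field in SEARCHABLE_FIELDS:
            for term in str(doc[field]).lower().split():
                index[term].add(doc["id"])
    return docs, index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is untouched
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs, index = build_index(HR_RECORDS)
print(sorted(search(index, "engineer")))         # [1, 2]
print(sorted(search(index, "search engineer")))  # [2]
print("salary" in docs[1])                       # False
```

The point of the design is that the search layer only ever holds a projection: if the index is lost, it can be rebuilt from the sources of truth, which is exactly why it does not need to be the main data store.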

I have to dismiss the issues you list. Elastic HAS solved security, it’s been a proprietary module in X-Pack for years, where you can control security down to the field level. It’s the first thing AWS had to address with Open Distro (they added a security module). As for reliability, stability has been a major focus since v1.5 (now on v7.0), and Elastic has been very diligent over the past several years about shoring up bugs and weak spots of their platform – writing distributed software over a cluster is extremely difficult. And with shard replication, it has been geared for high-availability from the start.
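To illustrate why shard replication gives high availability, here is a simplified Python sketch (not Elasticsearch code) of the key placement rule: a replica is never placed on the same node as its primary, so losing any single node still leaves a copy of every shard. The round-robin placement and the function names are illustrative assumptions; Elasticsearch’s real allocator balances shards differently, and when it cannot put a replica on a separate node it leaves that replica unassigned (yellow cluster health) rather than raising an error.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each shard's primary and replicas on distinct nodes.

    Simplified round-robin placement (an illustrative assumption, not
    Elasticsearch's real allocator). It keeps the one rule that matters
    for HA: no two copies of the same shard ever share a node.
    """
    copies_per_shard = num_replicas + 1
    if copies_per_shard > len(nodes):
        # Real Elasticsearch would leave the extra replicas unassigned
        # (yellow cluster health); this sketch just refuses.
        raise ValueError("not enough nodes to keep every copy on its own node")
    placement = {}
    for shard in range(num_shards):
        offset = shard % len(nodes)
        # Consecutive nodes starting at `offset` are all distinct.
        placement[shard] = [nodes[(offset + i) % len(nodes)]
                            for i in range(copies_per_shard)]
    return placement  # placement[shard][0] is the primary


def survives_node_loss(placement, lost_node):
    """True if every shard keeps at least one copy after `lost_node` dies."""
    return all(any(node != lost_node for node in copies)
               for copies in placement.values())


nodes = ["node-a", "node-b", "node-c"]
layout = allocate(num_shards=3, num_replicas=1, nodes=nodes)
for shard, copies in layout.items():
    print(shard, "primary:", copies[0], "replicas:", copies[1:])
print(all(survives_node_loss(layout, n) for n in nodes))  # True
```

With `num_replicas=0` the same check fails for any node holding a shard, since each shard then lives on exactly one node. That is the trade-off replicas buy you out of, at the cost of extra storage.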

  • muji
    long ESTC

Muji,
Thanks for your detailed posts. I am invested in Elastic. I see that you use Elastic quite a bit. I am curious what would prompt you to become a paying customer. I realize that if you want features like security, alerting, and monitoring, or their SaaS products or services, you have to pay. But I’m curious whether most users like you can get by fine with the free version, thus limiting their TAM.


Found a blog post that references Muji’s fantastic overview:

SNIP…

ESTC is moving in the SaaS direction, though. With a series of acquisitions, ESTC is well positioned to shift away from selling infrastructure software/source code and toward being an enterprise application vendor. App Search Service (powered by Swiftype, which they acquired), Site Search Service, and the new Enterprise Search Service are all closer to what I would consider SaaS products. ESTC also just acquired Endgame, an endpoint security company, and will use it to enter the security application space.

CMF_muji correctly points out that this “true” SaaS approach is an upside option for ESTC. I think it’s more than that. It’s a way out of this whole “existential threat to business model” quandary.

This is a step function change in business model: selling applications instead of selling enabling infrastructure. Your value proposition is different. The customer’s mindset is different. The SaaS user will focus on fulfilling the business case and ignore the infrastructure. They will not play around with GitHub and source code.

At that point the whole debate of open source versus proprietary becomes moot.

That is a promising thought, but one that is a long way off. The value proposition, customer set, as well as ESTC’s sales and marketing approach all have to change. This is by no means an easy move.

https://greytabinvest.blogspot.com/2019/06/elastic-estc-saas…


Thanks for the blog.

To cut to the chase, the key issue is the blog’s “true” SaaS passage quoted above, and its own conclusion that this is by no means an easy move.

Much more complexity than MDB… or so it seems.

I have yet to see anyone, including myself, establish the TAM for its various niche business plays… “optionality” maybe… but for what calculated TAM???
