Cloudera and some thoughts on cloud data

@jpcIV Jack - you guessed it - my name comes from the great Hermann Hesse. I recently reread The Glass Bead Game - he is an amazing author. And thank you for your detailed comments on Nektar - they have been very helpful for me in thinking about the company.

You asked about my thoughts on Cloudera, since I said in another post that I’m long on it, and you also pointed me to this great article by Bert:…

I also saw the posts by Tinker.

Conclusion first: I mostly agree with Bert, and mostly disagree with Tinker. I am long both Cloudera and Hortonworks, and I look at these as value investments in the cloud.

So if we look at data, this is an enormous area. And I do want to differentiate between cloud and data - I often see them discussed as if they are the same thing, and they aren’t at all, although they are affected by some of the same forces.

Ignoring the hardware vendors like Pure Storage, EMC, Nvidia, and AMD, you basically have the databases, the storage fabric, and the database tools.

I think you know my view on databases - Mongo has won for NoSql databases. There are 3 basic types of data stores in the Master Data model - transactional, enterprise, and analytics. Mongo is ideal for enterprise data. Sql is ideal currently for transactional, though I expect Mongo will take over there as well over time.

Analytics is the big problem. Analytics data volumes have exploded with the Internet, and even more with social media and web 2.0. In the meantime, analytics requirements have gotten immensely complex with AI, Deep Learning, and whatever the latest buzzword is that gets universities grants from companies.

This problem hasn’t been solved and we are in the early days - there is no clear winner yet, though there are some indications.

The first part of the solution is the Data Fabric, which means a way for us to provision a huge number of servers to hold the data we are going to analyze, then to configure it with a distributed file system, then push in the data, analyze it with the engine of choice, get the results, then clean up the scraps.
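
To make that lifecycle concrete, here is a toy sketch. Everything in it is hypothetical: `Cluster`, `provision`, `push_data`, `analyze`, and `tear_down` are made-up stand-ins that only simulate the steps in memory, not any real Cloudera, Hortonworks, or cloud API. The one design point it tries to show is that the clean-up step runs even if the analysis fails.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Cluster:
    """Hypothetical stand-in for a provisioned cluster; no real cloud API is called."""
    nodes: int
    # Toy "distributed file system": path -> list of records.
    files: Dict[str, List[Any]] = field(default_factory=dict)
    alive: bool = True

def provision(nodes: int) -> Cluster:
    # Step 1: "provision a huge number of servers" (here: just record the node count).
    return Cluster(nodes=nodes)

def push_data(cluster: Cluster, path: str, records: List[Any]) -> None:
    # Steps 2-3: configure the file system and push in the data.
    cluster.files[path] = list(records)

def analyze(cluster: Cluster, path: str, engine: Callable[[List[Any]], Any]) -> Any:
    # Step 4: run the analysis engine of choice over the stored data.
    return engine(cluster.files[path])

def tear_down(cluster: Cluster) -> None:
    # Step 6: "clean up the scraps" so the servers stop costing money.
    cluster.files.clear()
    cluster.alive = False

def run_job(records: List[Any], engine: Callable[[List[Any]], Any], nodes: int = 4) -> Any:
    cluster = provision(nodes)
    try:
        push_data(cluster, "/input/data", records)
        return analyze(cluster, "/input/data", engine)  # Step 5: get the results
    finally:
        tear_down(cluster)  # runs even if the analysis raises
```

For example, `run_job([3, 1, 2], sum)` provisions the toy cluster, loads the three records, uses `sum` as the "engine", tears the cluster down, and returns 6.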

This is what both Cloudera and Hortonworks do. Hortonworks is a partner of Pivotal (whose developers developed many aspects of Hadoop, by the way) and it is the top data fabric vendor for startups. Cloudera is a partner of MongoDB, and is the top data fabric vendor in the enterprise. This is just relative - both are after both markets.

Mongo is one of the contenders for the middle tier, meaning where you source the data and return the answers, and I expect with their money for R&D, they will keep winning in this area.

Hadoop Distributed File System is the file system winner, and I think this is pretty much decided, especially since it is open source. If the requirements change, the developers will just update HDFS.

Finally you have the analysis on top. If you look at companies like Talend and Informatica, they focus on ETL (extract, transform, load), which is a really old process based on analyzing data in Sql databases. You do this very slowly in offline processes. ELT (extract, load, transform) is an updated version of this that tries to do the transformation when you need it. Neither of these is keeping up with modern data analysis needs.
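
A minimal sketch of the ETL vs ELT difference, assuming a toy in-memory "warehouse" made of plain Python lists and dicts (not Talend, Informatica, or any real tool): ETL cleans the rows before loading them, while ELT loads the raw rows first and runs the same transformation later, on demand.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Toy source rows: amounts arrive as strings and one row is dirty.
SOURCE: List[Row] = [
    {"id": 1, "amount": "19.99"},
    {"id": 2, "amount": "5.00"},
    {"id": 3, "amount": "n/a"},
]

def transform(rows: List[Row]) -> List[Row]:
    """Parse amounts to floats and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError:
            continue  # skip the dirty row
    return cleaned

def etl(source: List[Row]) -> List[Row]:
    # ETL: transform first (typically an offline batch job), then load only clean rows.
    return transform(source)

def elt(source: List[Row]) -> Tuple[List[Row], Callable[[], List[Row]]]:
    # ELT: load the raw rows unchanged, and transform inside the target when needed.
    raw_table = [dict(row) for row in source]
    return raw_table, lambda: transform(raw_table)
```

Here `etl(SOURCE)` loads two cleaned rows, while `elt(SOURCE)` loads all three raw rows and hands back a function that produces the same two cleaned rows when it is finally called.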

This is also why Cloudera and Talend aren’t competitors. In fact Talend runs on the Cloudera fabric, as well as the AWS fabric, and others. But ETL is old and not where the future is headed.

Hadoop Map Reduce and Apache Spark are the contenders for modern analytic engines on the data fabric. These are the real competitors to Talend and Informatica. Apache Spark has been winning lately, but this is very far from decided: there are different use cases for each, and anyway every Hadoop vendor supports both. Now there is nothing saying that Talend couldn’t win with a new Apache Spark based engine - but there is nothing saying it will succeed either.

So maybe this explains why I’m invested in Cloudera and Hortonworks, and not in Talend or Alteryx. I’m not saying Talend or Alteryx are bad companies to invest in today - they are growing fast. But they don’t look to be the future. I’m perfectly willing to write puts when they drop in price, and own them for a while or get the income when they rise. But these are very temporary moves in companies I don’t believe in for the long term. These companies are just filling in while the real technologies are being developed.

In the long run, for companies that have the EV/S multiples of cloud companies, I’m looking for a big moat. I know MDB and PVTL have this. I have positions in CLDR and HDP because I want to be in cloud data fabric, and they are on the cutting edge and relatively cheap for cloud companies. However the data analysis wars turn out, the winners will still need to run on the data fabrics provided by CLDR and HDP - they are the “picks and shovels” providers.

I don’t know who will win the analytics wars - it could be CLDR or HDP expanding their capabilities and winning with some open source tool like Hadoop or Spark. Or it could be Talend or Informatica reinventing themselves away from ETL to a modern analytics library. Or it could be Mongo expanding their analytics capability - this is a really strong possibility. Or it could be a new startup with a whole new approach. We don’t have the answer yet, and we won’t even be ready for an answer until there is enough NoSql in the middle tier to force enterprises to decide what they are going to spend their analytics money on when Sql is less important.

So today, I’m keeping my ears up for who is winning the future, and buying the picks and shovels that everyone will need no matter who wins.


Good post Steppen… thank you.


Curious about your take on this then:…

announced new combined capabilities that help data engineers build and run data pipelines, applications and services in the cloud at a fraction of the cost and without the burden of managing servers.

Dreamer. (Long ayx and mdb)

I am just citing Forrester in regard to their Big Data Fabric Wave. They have Talend out as the clear leader, also with the largest market presence. They have Cloudera at the back of their middle grouping (Tier 2 out of 3), with a small dot.

I actually know little about Cloudera, other than that when its share price crashed a few quarters ago, it looked like a good candidate for the pattern that stocks like PSTG and NTNX and many others went through: their share prices collapsed, then they got things back together and clearly whupped the market.

Perhaps now is a timely moment to consider it in detail. I shall take that opportunity, as Bert has provided an article and Steppenwulf some nice analysis.

Talend has been growing its Big Data and Cloud business at more than 100% YoY for a few quarters in a row now. Talend is also promoting its native ability to run Spark, making claims such as:

Talend Big Data jobs running Spark are 5x faster than MapReduce* providing real-time results

It is a short blog post, but it is full of reasons why an outside observer like myself, looking at it together with the Forrester report and their sales results, would consider that Talend may be doing something right here.

I shall dig deeper and see if I can better educate myself. Always fascinating to dig into such things.



Oops, here is the link to the entire blog/marketing piece if anyone wants to comment on it.



…

Cloudera has now had two bad quarters in a row, and the share price is back down to the crash levels.

Part of the issue with Cloudera, which we discussed last quarter, was that Cloudera’s universe of customers was smaller than they thought, because only the larger enterprises had a use for Cloudera (or were able to understand how to make use of it).

Cloudera therefore made a major sales adjustment, and that apparently is still ongoing. This is just coming back to me from memory, as we discuss a lot of such things on NPI.

As such, I am going to dig a bit deeper, but in this market period, where we are free to invest in the winners and discard the losers, there is no need to invest in a turnaround story before its hoped-for turnaround. Clearly Cloudera’s turnaround is still in process from a business perspective, and it is one of those stocks one would clear out of a portfolio.

On the other hand, a company like Talend continues to grow consistently at 40% per year, with ARR growth above 120% going back to 2014 and, as stated earlier, its Big Data Fabric and Cloud business growing faster than 100%. Certainly stuff to dig into, but there does not seem to be any hurry in regard to Cloudera.



I don’t know who will win the analytics wars - it could be CLDR or HDP expanding their capabilities and winning with some open source tool like Hadoop or Spark.

True, and then there’s Amazon Redshift, which is increasingly becoming the data warehouse of choice for companies that are already using AWS. I know that they’re not quite the same thing - Hadoop lends itself to parallel batch-type processing, whereas Redshift is for real-time analytics. But Redshift could be stealing market share from the other platforms you mentioned. It’s cheap, it’s fast, and it integrates with Amazon’s S3 (mass cloud storage) and QuickSight for data visualization.


So a quick update, just to make sure some people don’t mistake what I’m saying.

  1. I don’t think we know who the winners in cloud data are today, and I don’t think we will know for a few years. We need NoSql to start to take a big stake in key enterprise data stores in order to understand how the change in the underlying systems will affect the data analysis vendors.

  2. I personally have taken a small stake in Hortonworks and Cloudera because I want to be “in the game” in cloud data, and because I consider these companies to be undervalued and the key “picks and shovels” in cloud data. They roll out the Data Fabric that everyone else uses. They could also win the game - the best analysis systems out there are open source, and integrating open source is what these companies do.

  3. Cloudera and Talend are not competitors (for now). Talend doesn’t own its own Data Fabric - it uses the data fabric of its partners. Its top 2 implementations are on Cloudera and Amazon.

I should also mention that I have very little respect for the analysis from companies like Gartner. It might be shocking, considering how influential they are, but in my experience all they bring to the table is just info on company sales. They are fine as a starting point for the companies you should look at, but their actual analysis in their 2 by 2 charts is mostly cr**p in my opinion. Take that as a personal quirk, but I don’t take Gartner quadrants as a factor in my investment decisions.

Tinker’s relentless advocacy for Talend has gotten me to take a bigger look at them, however.

This is another case where companies use common words to mean different things. Here Cloudera uses “data fabric” to mean they provide it, and Talend uses “data fabric” to mean they use it, similar to Nutanix/Pivotal in the cloud. Cloudera is like Nutanix, providing the underlying data fabric on which the analysis happens. Talend is like Pivotal, providing tools that make it easier to do the analysis.

I will take a deeper dive into Talend when I have some time and see if they are a personal candidate for investment - considering where we are in the cloud data cycle, I am very price sensitive even if I like them and would only invest on a swoon.

If people are interested in my advice in this area, I would just say that cloud data is a risky area right now with everything in flux, no clear winners, and no clear moats, and I expect this to last a couple of years. I would be price sensitive and invest slowly and carefully.


cloud data is a risky area right now with everything in flux, no clear winners, and no clear moats, and expect this to last a couple of years.

No doubt. That is why I am buying a basket of these stocks, but a select basket, embedding my guesses about likely winners. Certainly it would be safer to wait until later in the TALC. But one early lucky guess, like MSFT or INTC in the 1980s, or Amazon and Netflix later, can pay off for decades.
I think a true collapse of most of these is unlikely as long as the Bull lasts. But watch out when it ends. Which is why I am keeping the exit sign in close view, and making sure the door is not locked.

My deepest thanks for your very informative posts. They have been very helpful.


This is another case where companies use common words to mean different things. Here Cloudera uses “data fabric” to mean they provide it, and Talend uses “data fabric” to mean they use it, similar to Nutanix/Pivotal in the cloud. Cloudera is like Nutanix, providing the underlying data fabric on which the analysis happens. Talend is like Pivotal, providing tools that make it easier to do the analysis.

I sometimes get convoluted so this is part 1 ***************** (believe it or not I took out the fun parts)

When I turned my port over, I chose companies that did not have any “real” competition, as Nvidia does not (and yeah, I was tempted to just go Nvidia, but used the same rationale that Saul did for moving on from Nvidia - though I of course reserve the right to do what I wish in the future).

There are larger incumbents in the market, and much smaller start ups (that do not have sufficient scale to really compete), and in-between is Talend, not encumbered by legacy cost structures nor technologies, who is dominating in the new world of Big Data integration and cloud, while continuing to do well on-premise and hybrid.

Along with the ability to run “natively” on any cloud (because unlike Informatica, Talend produces Java code when it processes things - and has serious claims about superior speed, such as on Apache Spark and not needing to map reduce data (and 5x faster than MapReduce anyway)), Talend brings with it a disruptive pricing scheme that, at its optimum, according to the company (and as explained in more detail at their product convention last month), literally creates 1/87th of the data cost that Informatica will cost you. This data cost includes both what Informatica charges in proportion to the data load analyzed and what you pay cloud services for your data.

Talend does not bill you for data loads, just per seat, and Talend has developed methods using containerization and spot pricing; they demonstrated a $28,000 data charge for an integration job that came down to $300 when Talend was used to do the job instead. I WILL DO A SEPARATE POST TO DISCUSS THIS IN DETAIL - JUST LAYING OUT THE CASE THE COMPANY MADE - AND THIS CASE IS RIGHT IN TALEND’S SEC FILINGS. So I will leave for another post the details of whether or not this is B.S.
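
For what it is worth, the arithmetic on the demo figures is easy to check. This small sketch (`cost_ratio` is just an illustrative helper name) shows that $28,000 down to $300 works out to roughly a 93x reduction. That is in the same ballpark as, but not identical to, the 1/87th optimum the company cites, so the two numbers may come from different scenarios.

```python
def cost_ratio(before: float, after: float) -> float:
    """How many times smaller the 'after' cost is than the 'before' cost."""
    return before / after

ratio = cost_ratio(28_000, 300)  # the demo figures quoted in the post
print(f"{ratio:.1f}x cheaper, about 1/{round(ratio)} of the original cost")
# prints: 93.3x cheaper, about 1/93 of the original cost
```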

As such, Talend management has mentioned on at least 2 different recent earnings calls that they have little competition. Their competition is as follows: (1) hand coding, (2) their own open source freemium product, and (3) Informatica (they do not see IBM much).

This is consistent with their business results: 40% YoY revenue growth (with positive but small FCF) for years in a row, with revenue growth slightly accelerating, and ARR growth of 120%+ going back years.

Thus the business results mirror Talend’s narrative: a lack of real competition; best in the world in Big Data and cloud integration, as well as ELT, ETL, real-time streaming, and everything in between; and truly disruptive data economics.

Part 2 ************************************

Part 2 goes with your comment that Talend is like Pivotal. I have noticed that a lot of companies are Pivotal-like, from MDB (with its Stitch component) to Talend (and a few more that I cannot recall at the moment).

But yeah, it is possible for Cloudera to add functionality like Talend offers, and it is possible for MDB to do the same, etc. That said, neither of them would do it nearly as well, nor do they have the marketing muscle to sell it like Talend does. But I digress on that part.

The real takeaway is that I looked at Cloudera after you brought it up, and besides its business and stock performance not reflecting market leadership (not to mention its poor showing in the just-released Forrester report, which very surprisingly had Talend as the clear #1 in market presence (equal to what SAP and IBM have) as well as in vision and implementation, with clear space between it and everyone else), Cloudera was just TOO COMPLEX!

I do not know how Cloudera goes into a company and tries to sell a specific product. Cloudera has too much, with no focus, and it is all complex.

Talend goes in there and says (1) we are the best at integrating and cleaning data, (2) we do it more cost effectively, creating clear disruptive data economics, and (3) we run natively on every cloud, 5x faster than MapReduce, x times faster than Z, and we run synergistically with all the Hadoop offerings and more recent offerings like Snowflake, and thus run faster than anything else out there.

So we are cheaper, and better than anyone else to move and clean your data. Any questions?

Talend has a simple message to sell. Talend’s product is simpler to use (and will continue to become more user friendly over time), Talend has a huge user base and community due to its open source aspects, Talend has 3 million downloads of the freemium product, and yes, Talend does MAKE IT EASIER TO DO ANYTHING WITH YOUR DATA THAN CLOUDERA DOES.

If Cloudera wants to compete with Talend, Cloudera needs to do some serious marketing restructuring and some serious modification of how it delivers its product.

Part 3 *********************

This all said, I am still evaluating Talend, but that is the 10,000-foot rationale for my investing in Talend: no real competition, disruptive data economics, and a simple sales message that makes very complex data integration a much simpler process to both understand and implement (and Talend is working to make it even simpler so as to further expand its reach into its customers, à la what Alteryx does).

Not a recommendation or derecommendation, just the investment narrative.

Talend’s story jibes with its business success; Cloudera’s story does not, as it has not had great business success, relatively speaking.



@Tinker - Try not to drink the Kool-Aid provided by the marketing groups of these companies. Every company looks like God’s gift if you read their PR and case studies.

Along with the ability to run “natively” on any cloud (because unlike Informatica, Talend produces Java code when it processes things

Java is one of the default languages on the internet, along with Html, xml, Javascript and a few newer ones. Most companies produce code that runs “natively” on the cloud. Informatica is just an old dog. I would consider this a yellow flag against investing in Informatica, rather than a reason to invest in Talend.

and has serious claims about superior speed such as on Apache Spark and not needing to map reduce data

Apache Spark is an open source library that competes against Hadoop MapReduce, which is another open source library. Apache Spark runs in memory while Hadoop MapReduce runs on disk. That means that of course Apache Spark runs much faster - but it also costs a lot more since you need massive amounts of memory to run. There are various other differences between them, and very passionate technologists on both sides. Both technologies are just getting started and are going to get a lot better - and of course something new could come up also.
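
To illustrate what MapReduce is as a programming model, here is the classic word-count example as a single-process sketch: a map step emits (word, 1) pairs, a shuffle groups them by word, and a reduce step sums each group. Hadoop MapReduce and Spark both distribute this same pattern across a cluster; this toy version runs in one process and says nothing about the disk-versus-memory difference or the cost trade-off described above.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(line: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["Spark and Hadoop", "Spark runs in memory"]))
# {'spark': 2, 'and': 1, 'hadoop': 1, 'runs': 1, 'in': 1, 'memory': 1}
```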

But the most important thing here is that neither technology has a thing to do with Talend. Every cloud data analysis tool set provider works with both technologies. Cloudera, Hortonworks, Talend, and your aunt’s data analysis framework all have MapReduce and Spark.

What Cloudera and Hortonworks do is provide a data fabric you can use to analyze your data, and configure it with MapReduce or Apache Spark or whatever you like. They are DIY vendors. Here is your data fabric - go do what you want. But they are also adding things that their customers want, to make it easier to get the data analyzed - so they are creeping into Talend’s domain.

What Talend does is provide a set of tools that makes it easier for you to put your data analysis workloads on a data fabric. They don’t have a data fabric of their own - they work with partners, such as Amazon or Cloudera. They don’t own Spark or MapReduce - these are open source and not created by their developers. What they do is have tools that make it easier to get your data analysis done. Instead of DIY, they have standard scaffolding, have processes, and basically, their tools try and hand-hold you to get your data analyzed.

Now Talend is getting a lot of revenue, so there is little doubt they are currently providing value to their clients. My worry is how they will work out long term.

Part of the reason they provide so much value is that MapReduce and Spark are new and so still rough around the edges and hard to use. Over time, the technologies will get better. As the technologies get better, there may be less need for client hand-holding.

Also, none of these technologies are great yet. If a different technology comes up, Cloudera and Hortonworks will be in a better position to use it, since they are technology neutral. They just provide their platform and you can use whatever you want, while Talend has already made its bet.

None of this is to say Talend is a bad investment or is going to lose the game. But don’t take their word for it - they don’t know yet, none of us do.


Talend and MapR

Talend and Spark and Hadoop

Talend and Cloudera

Talend and Hortonworks…

Talend and Snowflake…

Talend and AWS Lambda going “serverless”…

Talend and Qubole - disruption of data economics with “serverless” analytics.

Talend CEO describes Talend’s new products at Connect 2018. If I recall correctly, ~ 30 minutes in he demonstrates the why and how of utterly disrupting data economics. It is the Day 1 video on the left.

Just a few things Talend is doing. They are like Pivotal in that they are the software tool that enables you to distribute data anywhere you want, with no lock-in, and that enables best practices in any manner of data integration you choose, from real-time analytics and ELT to older ETL, with special focus on (and 100% of R&D money put into) Big Data and cloud.

Yes, Cloudera may someday encroach, but if they encroach, they encroach only for Cloudera, and only as it wants to, instead of how the customer wants to.…

How Domino’s turned their business around by using big data, with Talend as the core of their architecture.

85,000 different sources of data, both structured and unstructured updated daily.

This is an excellent example of what Talend enables, and it is not likely that Cloudera or any other such company is going to be able to duplicate, much less keep up with, the Talend product offering or its cost.…

Talend blog on the Domino’s example giving more details of how many disparate data sources are collected, cleaned, and merged daily.

With Talend, Domino’s has built an infrastructure that collects information from all the company’s point of sales systems and 26 supply chain centers, and through all its channels, including text messages, Twitter, Pebble, Android, and Amazon Echo. Data is fed into Domino’s Enterprise Management Framework, where it’s combined with enrichment data from a large number of third party sources, such as the United States Postal Service, as well as geocode information, demographic and competitive information.

To begin with, Domino’s has more than 7,000 stores, and there is a lot of subjectivity in how the data is entered in each store, customer by customer.

These are just some of the things Talend does, and it appears to be the leading vendor in the world in regard to Big Data and the use of the cloud.

Talend does make all these cloud sources and databases, whatever their origin and however they evolve (Talend has evolved with them), easier to use and more useful. Talend is also vendor neutral, as it is integrated with just about everything worth integrating with.

And yes, Java is nothing special, except that there is not another major vendor in this space that generates Java code. Sounds weird, but it appears to be true.

Yes, there are smaller vendors that I am sure are doing good things. But that is true in every market, and very few small vendors, if any, manage to break through once a clear leader is established.

Thus, although Cloudera and the like may “encroach” more into Talend’s space as they expand their offerings, that is like saying that AWS and Azure will encroach into Pivotal’s domain, Cloudera’s domain, or Mongo’s domain (both cloud services already have products that directly encroach on each of these companies’ domains).

“Encroaching” is a far cry from actually displacing, particularly when what Talend does is a moving and ever-expanding target: Hadoop may fade from importance, or MapR may become antiquated, but Talend adapts to every new technology and protocol that is worth adapting to, enabling more user-friendly access and greater utility for each new development.

Anyway, that is just a little spiel on Talend. It is not drawn only from their marketing pieces; it is supported by their business success in the real world.

Whether or not this continues to be a sustainable business model with a sufficient moat is up to each of us to decide. Presently Talend is running around like the Huns between the Western and Eastern Roman Empires, letting the Informaticas of the world keep their cities (legacy customers with lock-in) and instead roaming mostly unopposed in between. At least that is the analogy I have used before. For what it is worth, Bert likes them as well; he has written two articles about them.

Again, I am still trying it on as an investment. It is my smallest holding, and of late it has performed the least well. However, with the dearth of real competition, the new Forrester ranking it can market to the moon, an extremely large market to exploit, disruptive data economics (as claimed, anyway), and 40% growth year after year (roughly doubling every 2 years), it is at least hypothetically a long-term investment worth considering.
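
A quick check of the "doubling every 2 years" arithmetic, as a small sketch (the helper names are illustrative): at a constant 40% a year, the doubling time is ln 2 / ln 1.4, about 2.06 years, and two years of 40% growth multiplies revenue by 1.96, so "roughly doubling every 2 years" holds up.

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years for a quantity to double at a constant annual growth rate (0.40 means 40%)."""
    return math.log(2) / math.log(1 + annual_growth)

def compound(start: float, annual_growth: float, years: int) -> float:
    """Value after compounding `start` at `annual_growth` for `years` years."""
    return start * (1 + annual_growth) ** years

print(round(doubling_time(0.40), 2))     # 2.06
print(round(compound(100, 0.40, 2), 1))  # 196.0
```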

I consider an investment long term if I can hold it for at least a year, to minimize tax issues whether or not I decide to move on thereafter, and I hope that I will not want to move on at that point.



Here is Talend integrating with SAP data systems.

I am sure there are many more. Talend is the data version of Mule (the application integrator). Mule was purchased for 16x forward revenues, if I recall correctly. I do not know if Talend is as valuable; it seems to me that MULE may have been more unique. Dumaflotchie made out big on MULE as well, and had a fun time doing it with a name like Mule. Ride the Mule!

Given Talend’s position of integrating all this data from any source worth integrating, to and from whatever format it needs to go, one would think Talend has to be on an acquisition list as well. Maybe not. I don’t know.

But you get the gist of what makes Talend more than just an ETL company, or more than just an application: it is closer to a user-friendly operating system on top of someone else’s cloud. Although it is clearly not the same, Microsoft ran on top of someone else’s computers. Apple at least has the decency to run on top of its own dang computers! :wink:

What I look for in an investment is insulation from real competition, great management, long runway of growth, and real world business and stock performance that is consistent with this narrative. Not easy to find such companies, but we seem to do a pretty dang good job of it here on Saul’s board and on NPI.



but their actual analysis in their 2 by 2 charts is mostly cr**p in my opinion

I suppose you get to read their detailed report behind the Magic Quadrant? The Magic Quadrant is not based on sales; rather, features, functionality, and ecosystem all have higher weightings.