@jpcIV Jack - you guessed it - my name comes from the great Hermann Hesse. I recently reread The Glass Bead Game - he is an amazing author. And thank you for your detailed comments on Nektar - they have been very helpful to me in thinking about the company.
You asked for my thoughts on Cloudera, since I said in another post that I’m long on it, and you also pointed me to this great article by Bert: https://seekingalpha.com/article/4181161-cloudera-big-data-c…
I also saw the posts by Tinker.
Conclusion first: I mostly agree with Bert, and mostly disagree with Tinker. I am long both Cloudera and Hortonworks, and I look at these as value investments in the cloud.
So if we look at data, this is an enormous area. And I do want to differentiate between cloud and data - I often see them discussed as if they are the same thing, and they aren’t at all, although they are affected by some of the same forces.
Ignoring the hardware vendors like Pure Storage, EMC, Nvidia, and AMD, you basically have the databases, the storage fabric, and the database tools.
I think you know my view on databases - Mongo has won for NoSQL databases. There are 3 basic types of data stores in the Master Data model - transactional, enterprise, and analytics. Mongo is ideal for enterprise data. SQL is currently ideal for transactional data, though I expect Mongo will take over there as well over time.
Analytics is the big problem. Analytics data volumes have exploded with the Internet, and even more with social media and web 2.0. In the meantime, analytics requirements have gotten immensely complex with AI, Deep Learning, and whatever the latest buzzword is that gets universities grants from companies.
This problem hasn’t been solved and we are in the early days - there is no clear winner yet, though there are some indications.
The first part of the solution is the Data Fabric. This is a way for us to provision a huge number of servers to hold the data we are going to analyze, configure them with a distributed file system, push in the data, analyze it with the engine of choice, get the results, and then clean up the scraps.
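To make that lifecycle concrete, here is a toy sketch of the steps in order. Everything in it (`ToyFabric`, `run_job`, the round-robin placement) is made up for illustration and runs in memory on one machine - it is not the Cloudera or Hortonworks API, just the shape of the workflow.

```python
from typing import Callable, Dict, List


class ToyFabric:
    """Hypothetical in-memory stand-in for a data fabric (not a real vendor API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, List[int]] = {}

    def provision(self, count: int) -> None:
        # Step 1: allocate "servers" (here, just empty lists keyed by name).
        self.nodes = {f"node-{i}": [] for i in range(count)}

    def load(self, records: List[int]) -> None:
        # Steps 2-3: spread records across nodes round-robin, a crude
        # stand-in for a distributed file system placing blocks.
        names = sorted(self.nodes)
        for i, record in enumerate(records):
            self.nodes[names[i % len(names)]].append(record)

    def analyze(self, per_node: Callable[[List[int]], int],
                combine: Callable[[List[int]], int]) -> int:
        # Step 4: run the engine on each node's local data, then combine.
        partials = [per_node(data) for data in self.nodes.values()]
        return combine(partials)

    def teardown(self) -> None:
        # Step 5: release the servers and discard the scratch data.
        self.nodes = {}


def run_job(records: List[int], node_count: int) -> int:
    """Provision, load, analyze (a simple sum), and always clean up."""
    fabric = ToyFabric()
    fabric.provision(node_count)
    try:
        fabric.load(records)
        return fabric.analyze(per_node=sum, combine=sum)
    finally:
        fabric.teardown()
```

The point of the sketch is that the analysis engine is just a parameter. Whatever wins on top, the provision, load, and teardown steps underneath stay the same.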
This is what both Cloudera and Hortonworks do. Hortonworks is a partner of Pivotal (whose developers developed many aspects of Hadoop, by the way) and it is the top data fabric vendor for startups. Cloudera is a partner of MongoDB, and is the top data fabric vendor in the enterprise. This is just relative - both are after both markets.
Mongo is one of the contenders for the middle tier, meaning where you source the data and return the answers, and I expect with their money for R&D, they will keep winning in this area.
The Hadoop Distributed File System (HDFS) is the file system winner, and I think this is pretty much decided, especially since it is open source. If the requirements change, the developers will just update HDFS.
Finally you have the analysis on top. If you look at companies like Talend and Informatica, they focus on ETL (extract, transform, load), which is a really old process based on analyzing data in SQL databases. You do this very slowly in offline processes. ELT (extract, load, transform) is an updated version that loads the raw data first and transforms it only when you need it. Neither of these is keeping up with modern data analysis needs.
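For anyone who hasn’t worked with these, here is a minimal sketch of the difference, using Python’s built-in `sqlite3` module as a stand-in for the data store. The sample rows, table names, and function names are all invented for illustration; real ETL tools are far more elaborate.

```python
import sqlite3

# Hypothetical raw feed: amounts arrive as text and need cleaning.
RAW_ORDERS = [("alice", "12.50"), ("bob", "7.25"), ("alice", "3.00")]


def etl_totals(rows):
    # ETL: transform in application code first, then load only the result.
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE totals (customer TEXT, total REAL)")
    db.executemany("INSERT INTO totals VALUES (?, ?)", sorted(totals.items()))
    result = db.execute(
        "SELECT customer, total FROM totals ORDER BY customer"
    ).fetchall()
    db.close()
    return result


def elt_totals(rows):
    # ELT: load the raw rows untouched, then transform inside the store
    # at query time.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    result = db.execute(
        "SELECT customer, SUM(CAST(amount AS REAL)) FROM raw_orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    db.close()
    return result
```

Both functions give the same per-customer totals. The difference is where and when the transform runs: before the load in ETL, on demand inside the store in ELT.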
This is also why Cloudera and Talend aren’t competitors. In fact Talend runs on the Cloudera fabric, as well as the AWS fabric, and others. But ETL is old and not where the future is headed.
Hadoop MapReduce and Apache Spark are the contenders for modern analytic engines on the data fabric. These are the real competitors to Talend and Informatica. Apache Spark has been winning lately, but this is very far from decided. There are different use cases for each, and anyway every Hadoop vendor supports both. Now there is nothing saying that Talend couldn’t win with a new Apache Spark based engine - but there is nothing saying it will succeed either.
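If the MapReduce model is unfamiliar, here is a single-machine sketch of its three phases (map, shuffle, reduce) as a word count. This is plain Python with hypothetical function names, not the Hadoop or Spark API. On a real cluster each phase runs in parallel across many servers, and Spark’s main twist is keeping intermediate results in memory where it can.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key so each key can be
    # reduced independently.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse the values for one key into a single count.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases over the whole input.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big fabric"])` counts "big" twice and "data" and "fabric" once each.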
So maybe this explains why I’m invested in Cloudera and Hortonworks, and not in Talend or Alteryx. I’m not saying Talend or Alteryx are bad companies to invest in today - they are growing fast. But they don’t look to be the future. I’m perfectly willing to write puts when they drop in price, and own them for a while or get the income when they rise. But these are very temporary moves in companies I don’t believe in for the long term. These companies are just filling in while the real technologies are being developed.
In the long run, for companies that have the EV/S multiples of cloud companies, I’m looking for a big moat. I know MDB and PVTL have this. I have positions in CLDR and HDP because I want to be in cloud data fabric, and they are on the cutting edge and relatively cheap for cloud companies. However the data analysis wars turn out, the winners will still need to run on the data fabrics provided by CLDR and HDP - they are the “picks and shovels” providers.
I don’t know who will win the analytics wars - it could be CLDR or HDP expanding their capabilities and winning with some open source tool like Hadoop or Spark. Or it could be Talend or Informatica reinventing themselves away from ETL to a modern analytics library. Or it could be Mongo expanding their analytics capability - this is a really strong possibility. Or it could be a new startup with a whole new approach. We don’t have the answer yet, and we won’t even be ready for an answer until there is enough NoSQL in the middle tier to force enterprises to decide what they are going to spend their analytics money on when SQL is less important.
So today, I’m keeping my ears open for who is winning the future, and buying the picks and shovels that everyone will need no matter who wins.