It’s fascinating to me that many here are so interested in Big Data plays, for instance:
Hortonworks
Talend
MongoDB
Alteryx is a related play in this space
We’ve had threads on these companies’ financials and we’ve had threads on their underlying technologies. But as we all know, the best technology doesn’t always (if ever) point to the most successful company (e.g., VHS vs. Beta, Windows vs. MacOS, McDonald’s vs. In-N-Out). But OTOH, while financials are extremely important, they don’t necessarily predict the future and they certainly can’t predict an eventual disruption.
Technology-wise, this is a confusing space. For instance, this article (It’s Time to Stop Using Hadoop for Analytics) will probably leave non-DB programmers very confused: https://www.interana.com/blog/stop-using-hadoop-analytics/ Google has long since publicly abandoned MapReduce, yet it’s still a mainstay of many a Hadoop company. Companies not in the space, like MongoDB, try to stay relevant by talking about getting data in and out of HDFS (the Hadoop Distributed File System); see for instance: https://www.mongodb.com/hadoop-and-mongodb
Remember, Hadoop is an open-source software framework that was built to enable massively scalable storage and batch data processing. There’s no argument that Hadoop is complex and that technologies built on or around it crop up seemingly every month with very weird names (e.g., Pig, Scuba, Hive, Spark, Dremel, Kafka, etc.) that give you no clue as to what they actually do. The batch processing aspect of traditional Hadoop is a problem for many current use cases, such as IoT, that need closer to real-time processing. Facebook struggled with this for a while and developed its own solutions for both sub-minute processing and easier access to data by non-programmers. They wrote a paper on it back in the day.
Both Hadoop and MongoDB have security issues that the more mature Relational DB systems don’t have. From https://www.upguard.com/articles/apache-hadoop-vs.-mongodb-w… Most of Hadoop’s security shortcomings revolve around the central drawback of the platform: complexity. Because the platform—an intricate array of interworking components—is difficult to configure and manage, attack vectors are often left exposed by less-experienced Hadoop architects. And because Hadoop was not initially designed for security (initial uses of it were restricted to private clusters in trusted environments), incorporating security into the framework can be a challenge. Initial versions didn’t even authenticate users or services, incorporate data privacy controls, or encryption at the storage/network levels. Hadoop now comes with basic security mechanisms for things like authentication and authorization, but they are nonetheless turned off by default. Additionally, as a Java-based technology, Hadoop is subject to many of the exploits inherent to the language.
MongoDB has its own security issues: Though perhaps not as widely publicized as Hadoop’s shortcomings, MongoDB also harbors many critical vulnerabilities. Like Hadoop, MongoDB (and indeed most Big Data technologies) carries some baggage due to its origins in the private data center. These powerful data crunching platforms have been accelerated by the advent of the cloud, but have also gained a plethora of attack vectors as a result.
Talend (TLND) has its own confusing array of integrations with Hadoop. Take this Talend article, https://www.talend.com/resource/hadoop-setup/, for instance: For organizations wanting to leverage cutting-edge Hadoop technologies for performing big data analytics, there are two broad dimensions to Hadoop setup. The first is the installation and configuration of the Hadoop core packages and Hadoop applications like HDFS, Hbase, or Hive. The second stage of Hadoop setup is the development of automated processes to move your data into Hadoop and to perform operations on it once it’s there.
For the installation and configuration stage of Hadoop setup, helpful guidance is available from the Apache project website, as well as from the websites of Hadoop distributions like Hortonworks, Cloudera or MapR. For loading big data into your Hadoop cluster and processing it there, the simplest and fastest solution is Talend Open Studio for Big Data, the free application from open source data integration leader Talend.
The very next statement is an oxymoron: Talend Open Studio for Big Data features an Eclipse graphical development environment that makes it easy to design and execute your Hadoop setup without having to do any coding. If you’re not a programmer, you probably don’t know what the “Eclipse graphical development environment” is. It’s a programmer’s tool for which you can write plug-ins. Jeez.
But, my point is NOT to have a technical argument over which technology is superior, but to point out that the complexity in these technologies is great. Experts disagree on superiority and appropriate choice and application. So, how can any of us know which companies will find the right markets for their technologies, construct the right solutions, and put in place the necessary aggressive marketing to make a lot of money? I say we can’t.
Creating mind share around information technologies is of the utmost importance for success. A while back Microsoft made a big push around Bing competing with Google. Many a comparison was performed and Bing performed at least as well, in many cases providing more relevant results higher up on the list than Google. Microsoft even paid people, indirectly, to use Bing by giving them reward points. But Google held on to its lead because it already had the public mind share. It’s easy to switch search engines, but people didn’t.
The database provider world is not like eCommerce shopping carts or website creation tools or shoes or home builders or electric cars. While I understand it better than most (and less well than experts), I don’t pretend to be able to pick the technological winners and losers.
To me, that means we’re picking these companies by their current financials, and trying to use the past as an indicator of future performance. Even if you get that financial analysis right, that’s only for a point in time. As new technologies come into the market (see my “seemingly every month” comment above), the DB world will continue to change and disruption can and will happen.
I’ve been bearish on Oracle for years. TMF recommended Oracle while I was bearish. I don’t know how well that worked out for them. But I would not take the slow pace of disruption of Oracle as an indication that any of these high-flying new companies won’t be disrupted more rapidly. Oracle won’t suddenly die, as many customers will need support for their legacy databases for years if not decades. But these new companies haven’t built up that kind of entrenchment, the data is fresher, and in many cases the data isn’t production data but trial data to see how the new technologies work. So I think customers are likely to migrate off of them much more quickly when the next big thing comes along.
Unless one of these companies is already the next big thing. As a software professional, I can say that Oracle is VHS and that it’s going to take a DVD to displace them, but also that that’s already happening. But I don’t know which of these companies, if any, is the DVD to disrupt VHS, or which is the HD-DVD, or even which is the Blu-ray.
More importantly, it may not matter anyway because some video streaming company is going to disrupt all of them in the market. Can you pick that winner here?