Big Data Plays

Smorgasbord1 · March 24, 2018, 7:27pm

It’s fascinating to me that many here are so interested in Big Data plays, for instance:

Hortonworks
Talend
MongoDB
Alteryx is a related play in this space

We’ve had threads on these companies’ financials and we’ve had threads on their underlying technologies. But as we all know, the best technology doesn’t always (if ever) point to the most successful company (e.g., VHS vs. Beta, Windows vs. MacOS, McDonald’s vs. In-N-Out). But OTOH, while financials are extremely important, they don’t necessarily predict the future and they certainly can’t predict an eventual disruption.

Technology-wise, this is a confusing space. For instance, this article (It’s Time to Stop Using Hadoop for Analytics) will probably leave non-DB programmers very confused: https://www.interana.com/blog/stop-using-hadoop-analytics/ Google has long since publicly abandoned Map Reduce, yet it’s still a mainstay of many a Hadoop company. Companies not in the space, like MongoDB, try to stay relevant by talking about getting data in and out of the HDFS (Hadoop Distributed File System), see for instance: https://www.mongodb.com/hadoop-and-mongodb

Remember, Hadoop is an open-source software framework that was built to enable massively scalable storage and batch data processing. There’s no argument that Hadoop is complex and that technologies built on or around it crop up seemingly every month with very weird names (e.g., Pig, Scuba, Hive, Spark, Dremel, Kafka, etc.) that give you no clue as to what they actually do. The batch processing aspect of traditional Hadoop is a problem for many current use cases, such as IoT, that need closer to real-time processing. Facebook struggled with this for a while and developed its own solutions for both sub-minute processing and easier access to data by non-programmers. They wrote a paper on it back in the day.

Both Hadoop and MongoDB have security issues that the more mature Relational DB systems don’t have. From https://www.upguard.com/articles/apache-hadoop-vs.-mongodb-w… Most of Hadoop’s security shortcomings revolve around the central drawback of the platform: complexity. Because the platform—an intricate array of interworking components—is difficult to configure and manage, attack vectors are often left exposed by less-experienced Hadoop architects. And because Hadoop was not initially designed for security (initial uses of it were restricted to private clusters in trusted environments), incorporating security into the framework can be a challenge. Initial versions didn’t even authenticate users or services, incorporate data privacy controls, or encryption at the storage/network levels. Hadoop now comes with basic security mechanisms for things like authentication and authorization, but they are nonetheless turned off by default. Additionally, as a Java-based technology, Hadoop is subject to many of the exploits inherent to the language.

MongoDB has its own security issues: Though perhaps not as widely publicized as Hadoop’s shortcomings, MongoDB also harbors many critical vulnerabilities. Like Hadoop, MongoDB (and indeed most Big Data technologies) carries some baggage due to its origins in the private data center. These powerful data crunching platforms have been accelerated by the advent of the cloud, but have also gained a plethora of attack vectors as a result.

Talend (TLND) has its own confusing array of integrations with Hadoop. Take this Talend article, https://www.talend.com/resource/hadoop-setup/ , for instance: For organizations wanting to leverage cutting-edge Hadoop technologies for performing big data analytics, there are two broad dimensions to Hadoop setup. The first is the installation and configuration of the Hadoop core packages and Hadoop applications like HDFS, Hbase, or Hive. The second stage of Hadoop setup is the development of automated processes to move your data into Hadoop and to perform operations on it once it’s there.

For the installation and configuration stage of Hadoop setup, helpful guidance is available from the Apache project website, as well as from the websites of Hadoop distributions like Hortonworks, Cloudera or MapR. For loading big data into your Hadoop cluster and processing it there, the simplest and fastest solution is Talend Open Studio for Big Data, the free application from open source data integration leader Talend.

The very next statement is an oxymoron: Talend Open Studio for Big Data features an Eclipse graphical development environment that makes it easy to design and execute your Hadoop setup without having to do any coding. If you’re not a programmer, you probably don’t know what the “Eclipse graphical development environment” is. It’s a programmer’s tool for which you can write plug-ins. Jeez.

But, my point is NOT to have a technical argument over which technology is superior, but to point out that the complexity in these technologies is great. Experts disagree on superiority and appropriate choice and application. So, how can any of us know which companies will find the right markets for their technologies, construct the right solutions, and put in place the necessary aggressive marketing to make a lot of money? I say we can’t.

Creating mind-share around information technologies is of the utmost importance for success. A while back Microsoft made a big push around Bing competing with Google. Many a comparison was performed and Bing performed at least as well - in many cases providing more relevant results higher up on the list than Google. Microsoft even paid people to use Bing indirectly with reward points. But, Google held on to its lead because it already had the public mind share. It’s easy to switch search engines, but people didn’t.

The Database provider world is not like eCommerce shopping carts or website creation tools or shoes or home builders or electric cars. While I understand it more than most (and less than experts), I don’t pretend to be able to pick the technological winners and losers.

To me, that means we’re picking these companies by their current financials, and trying to use the past as an indicator of future performance. Even if you get that financial analysis right, that’s only for a point in time. As new technologies come into the market (see my “weekly” comment above), the DB world will continue to change and disruption can and will happen.

I’ve been bearish on Oracle for years. TMF recommended Oracle while I was bearish. I don’t know how well that worked out for them. But, I would not take the slow pace of disruption of Oracle as an indication that any of these high flying new companies won’t be disrupted more rapidly. Oracle won’t suddenly die as many customers will need support for their legacy databases for years if not decades. But, these new companies haven’t built up that kind of entrenchment, the data is more fresh, in many cases the data isn’t production data but trial data to see how the new technologies work, and so I think companies are more likely to migrate off of them much more quickly when the next big thing comes along.

Unless one of these companies is already the next big thing. As a software professional, I can say that Oracle is VHS and that it’s going to take a DVD to displace them, but also that that’s already happening. But I don’t know which of these companies, if any, are the DVD to disrupt VHS, or which is the HD-DVD, or even which is the BluRay.

More importantly, it may not matter anyway because some video streaming company is going to disrupt all of them in the market. Can you pick that winner here?

dumaflotchie · March 24, 2018, 9:24pm

More importantly, it may not matter anyway because some video streaming company is going to disrupt all of them in the market. Can you pick that winner here?

Yes.

Hey Smorg:

You ask good questions and your examples are well chosen.

However, the final question seems to me to be an oversimplification and an answer to your question requires one to assume this is a zero sum game. I will answer that question in just one second regarding MongoDB but before I do, could you clarify about the security issues with Mongo?

Several years ago, as reported on the NPI, there were security issues reported but these turned out to be user errors when we investigated and not endemic to the database software itself…do you know of some other vulnerability?

Here was an article that addresses this issue of proper configuration:

https://www.infoworld.com/article/3164504/security/the-essen…

Is MongoDB database software secure? Does it meet these standards? The short answer: Yes it is, and yes it does! It’s simply a matter of knowing how to set up, configure, and work with your particular installation.

Let me know if you are aware of other issues.

Now regarding your query about picking the winner…it is not necessary to do so for this to be a great investment.

http://discussion.fool.com/mongodb-vs-oracle-thoughts-33009405.a…

As a result, we expect enterprises to take a hybrid approach to their database deployments, and leverage a combination of relational and nonrelational databases in order to take advantage of each database’s respective strengths/weaknesses.

We believe the opportunity presented by both of these segments of the database market is significant and continues to grow, with recent reports published by IDC estimating that relational and nonrelational database system spend will grow at a 5-year CAGR of 6.3% and 18.4%, respectively to $42.3B and $12.4B by 2021.

Looking at the market segmentation further we see that:

Generic database TAM estimated at over $65 Billion annually with lower to higher hanging fruit being NoSQL ($4 billion), new SQL ($16 Billion) and legacy displacements ($45 Billion). The latter two categories are really mostly accessible after release 4.0 this summer.

So there needn’t be a winner-take-all requirement for MongoDB…even on purely non-SQL, it has a TAM of $12 Billion…if it just sticks to a “hybrid” model of deployment and accesses some percentage of the “new SQL” and “non-SQL” above…estimated TAM at $20 Billion annually over the next 5 years…MongoDB could do $1 Billion in annual revenue completely ignoring legacy databases of ORCL…less than 10% of the estimated TAM of this hybrid construct.
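
The arithmetic above can be checked with a quick sketch. The dollar figures and growth rates are the ones quoted in this post; the variable names, the $1 Billion target, and the assumption that the CAGR compounds annually over five years are illustrative only.

```python
# Figures quoted above, in billions of USD. Names are illustrative only.
HYBRID_TAM = 20.0      # "new SQL" plus NoSQL estimate from the post
TARGET_REVENUE = 1.0   # hypothetical MongoDB annual revenue

share = TARGET_REVENUE / HYBRID_TAM
print(f"Share of hybrid TAM needed: {share:.0%}")  # 5%, i.e. under 10%


def implied_base(end_value: float, cagr: float, years: int = 5) -> float:
    """Back out the starting spend implied by an end value and a CAGR.

    Assumes simple annual compounding: end = base * (1 + cagr) ** years.
    """
    return end_value / (1.0 + cagr) ** years


# IDC figures quoted above: 6.3% and 18.4% CAGR to $42.3B and $12.4B by 2021.
relational_base = implied_base(42.3, 0.063)
nonrelational_base = implied_base(12.4, 0.184)
print(f"Implied relational base:    ${relational_base:.1f}B")
print(f"Implied nonrelational base: ${nonrelational_base:.1f}B")
```

The second half only backs out what starting spend the quoted growth rates would imply; it says nothing about whether the forecasts themselves hold.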

Tell me what its market cap would likely be with a $1 Billion revenue run rate…around 2% of ORCL’s revenue run rate??

So…as I suggested…why must you insist on a winner take all???

Furthermore, as an additional investment thesis on steroids, if MongoDB’s new multidocument 4.0 release this summer does allow them to effectively substitute for ORCL, then of course the investment would be of gargantuan proportion…but it just isn’t necessary to dream this big…just based on the market segmentation I parsed out above.

Hope that clarifies why the requirement for MongoDB to completely displace ORCL just isn’t necessary.

Best:
Duma

RedandBlack · March 24, 2018, 10:43pm

Can you pick that winner here?

I know less than most on these boards about the big data companies. For that reason I’ve decided to invest in a basket of these companies. I have a helping of TWLO, TLND, AYX, and SPLK currently and may add MDB soon. I have no idea who will win in this space so I have less than 5% of my holdings in aggregate for this basket. I’m hopeful that one of the above can achieve an ORCL-like market cap. If that happens I win.

Hodges

captainccs · March 25, 2018, 12:50am

There is a lot of merit to both Smorgasbord1’s and Duma’s arguments. I take a third path. Indeed most of these technologies are hard to understand and certainly picking “the” winner is more luck than skill. This is the reason I’m using ETFs for most of my technology investments. They will underperform the best technology stocks but they will outperform the general market. I’m OK with that because of the lesser risk.

That said, I have invested in two individual technology stocks, NVDA and MDB, because I believe their investing scenarios are easy enough to understand. I liken NVDA very much to ARM Holdings, they have Gorilla status because they have high switching costs. MongoDB is a different scenario, they are servicing a new customer that relational databases don’t service well, the new “big data” market that is essential for AI. This market does not need the precision that relational databases offer and therefore, as Duma says, they can live side by side with Oracle. The additional attraction of MongoDB is that they could disrupt the relational database market with their new transaction safe version 4.

I’m investigating MongoDB in more depth. I’ve created a sandbox account with their Atlas cloud service and installed the community version on my Mac laptop. Now I have to figure out how to use them. Reading one of their white papers I discovered that one does not need a database to have the equivalent of a relational database directly on the file system. I started writing a “noDb” class to test out the idea. It’s a natural outgrowth of OOP programming. Each object instead of being a row in a table is a file in storage. It’s an exciting idea.
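
For illustration, a file-per-object store could be sketched roughly as follows. This is not the actual class mentioned above: the name `NoDb`, its methods, and the choice of one JSON file per object are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class NoDb:
    """Hypothetical file-per-object store: one JSON file per object."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # The key plays the role of a primary key; it becomes the file name.
        return self.root / f"{key}.json"

    def save(self, key: str, obj: dict) -> None:
        self._path(key).write_text(json.dumps(obj), encoding="utf-8")

    def load(self, key: str) -> dict:
        return json.loads(self._path(key).read_text(encoding="utf-8"))

    def find(self, field: str, value) -> list:
        # No index exists, so every file is read and checked (a full scan).
        matches = []
        for path in sorted(self.root.glob("*.json")):
            obj = json.loads(path.read_text(encoding="utf-8"))
            if obj.get(field) == value:
                matches.append(obj)
        return matches


with tempfile.TemporaryDirectory() as tmp:
    db = NoDb(tmp)
    db.save("mdb", {"ticker": "MDB", "sector": "database"})
    # Objects need not share the same fields.
    db.save("nvda", {"ticker": "NVDA", "sector": "semiconductors", "note": "extra"})
    loaded = db.load("mdb")
    found = db.find("sector", "database")
    print(loaded, found)
```

Note what the sketch leaves out: there are no indexes, transactions, or concurrent-write protections, which are among the things a database engine supplies.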

Denny Schlesinger

RaptorD2 · March 25, 2018, 2:25am

To me, that means we’re picking these companies by their current financials, and trying to use the past as an indicator of future performance. Even if you get that financial analysis right, that’s only for a point in time. As new technologies come into the market (see my “weekly” comment above), the DB world will continue to change and disruption can and will happen.

Smorg, this makes no sense. What company can you NOT say the same things about? We don’t have the future financials. If you do, please share. To analyze anyone’s financials requires CURRENT financials unless someone got their time machine to finally work.

Sometimes we forget that when Mr. Softy was burning up the stock market, there were no guarantees that Windows would become the de facto OS for the world. In fact, that little competitor that starts with an “A” invented the system in the first place that eventually became Windows. If Apple had been a little more aggressive, and Mr. Gates a little less so, we’d all be posting from MacBooks and Microsoft would only be found in schools and art studios.

Things NEVER are simple in real time. In hindsight, almost EVERYTHING is obvious. To invest is always looking to the future and the future can not be known until it is the past.

Big data is complicated for non-programmers. I’ll testify to that. But if growing revenue by 50-60-70% isn’t some kind of indication of whose product is in demand, then what is? Since we don’t know who will win, should we not invest in data? I don’t know who’s going to win the pharma wars either, so biotechs are out. I don’t know whose cyber payment system will take over (if any one does) so that’s out … pretty soon there’s nothing left to invest in.

In the meantime, investing in big data, biotechs, software and cyber payment systems has done pretty well for me; maybe I’m just lucky. But sir, I suspect you have been lucky in the very same sectors as well. If so, are you really saying that’s pure coincidence?

I can’t help repeating myself; it makes no sense. Even the straw man is a hologram.

Investing is always forecasting the future and the future is unknown. We need to accept these facts or we should just have a party and play poker.

Please pass the chips, dip and the Bacardi. It’s your deal.

Dan

ps: Who forecast that Levi’s, Guess and LL Bean jeans would all be out of fashion for college students and that Carhartt work pants would be all the rage for college students this year–again? I want to invest in his time machine!

tamhas · March 25, 2018, 10:53am

do you know of some other vulnerability?

Security is a very complex topic and one cannot simply tick off a box and conclude that a database is secure. One example of this is encryption. If one is going to store credit card numbers in a database which is accessed by customers, one could well decide that merely providing authentication was not sufficient to protect this sensitive data. Encryption capabilities vary greatly in RDBMS, both in terms of how fine grained the encryption can be and in the performance hit for encrypting some part of the database. Does Mongo support any form of built-in encryption, i.e., not provided by the associated code?

Furthermore, as an additional investment thesis on steroids, if MongoDB’s new multidocument 4.0 release this summer does allow them to effectively substitute for ORCL,

The operative word there is “if”. Multi-document/table ACID transactions are a minimal requirement for supporting some kinds of applications, but they are far from the only component. The importance of other factors depends on the application, of course, but it is likely that there will be many applications which are utilizing other advanced features of mature RDBMS which would not be good candidates for Mongo, even with the 4.0 features. While the rigid schemas of RDBMS are a negative feature when the incoming data consists of highly variable documents, those schemas also provide validation, structure, easy interface to reporting tools, consistency checks, and very rapid access by indexed fields, including advanced indexing such as word indexing.

But, as you and Smorg are both saying, there is no need to imagine that Mongo is suddenly going to be a preferred database for every application since there is gobs of growth available from new applications for which it may be an enabling capability supplemented by some conversion of existing applications which were not well suited to RDBMS in the first place, but were implemented there because alternatives did not seem to exist.

Smorgasbord1 · March 25, 2018, 2:45pm

OK, lots of responses here:

First, security isn’t my point, it’s just an example of the complexity involved in these technologies. That over 40,000 deployments got it wrong is proof of that. If professional DB software engineers aren’t ensuring that the right thing is being done, how can we non-professionals stand a chance of picking technology winners and losers? We can’t.

Duma writes: an answer to your question requires one to assume this is a zero sum game.

No, I don’t believe it does. I’m not talking about a “winner take all requirement” - that’s your wording. What I actually said was:

So, how can any of us know which companies will find the right markets for their technologies, construct the right solutions, and put in place the necessary aggressive marketing to make a lot of money?

Note that I use plural words (companies, solutions) throughout.

RaptorD2 writes: We don’t have the future financials.

Right! And so to compensate we often turn to evaluating business fundamentals. What is the company’s market share? What’s the TAM? How good are the company’s products or services? How is the company positioned relative to competitors? But, as I said:

The Database provider world is not like eCommerce shopping carts or website creation tools or shoes or home builders or electric cars.

captainccs writes: MongoDB is a different scenario, they are servicing a new customer that relational databases don’t service well, the new “big data” market that is essential for AI. This market does not need the precision that relational databases offer and therefore, as Duma says, they can live side by side with Oracle.

First, there is not a “precision” difference between relational and NoSQL databases. That is NOT a factor in choosing a database.

What relational doesn’t do well is scale to petabyte size AND handle data that doesn’t fit into a predetermined schema (fancy way of saying fields in a table). One way to look at the differences is that traditional relational databases require that the data be sent to it in a specific format. That helps optimize data access and processing as well as simplify the programming interface to the data (which is what SQL, or Structured Query Language, is all about!) OTOH, Hadoop and other noSQL databases can handle data of varying structure since the access portion is custom written to handle it anyway. That means you’re writing custom code to access data. Executing that code slows things down, which again is why Google has moved away from traditional Hadoop tools like Map Reduce.
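
A small sketch of that contrast, under stated assumptions: Python’s standard `sqlite3` module stands in for a relational database, a plain list of dictionaries stands in for a schemaless store, and the sensor-style records and the `extract_temp_c` helper are invented for the example.

```python
import sqlite3

# Relational side: the schema is declared up front and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT NOT NULL, temp_c REAL NOT NULL)")
conn.executemany(
    "INSERT INTO readings (device, temp_c) VALUES (?, ?)",
    [("a1", 21.5), ("b2", 19.0)],
)
sql_avg = conn.execute("SELECT AVG(temp_c) FROM readings").fetchone()[0]

# Schemaless side: records vary in shape, so the access logic is custom code.
documents = [
    {"device": "a1", "temp_c": 21.5},
    {"device": "b2", "temperature": {"value": 66.2, "unit": "F"}},
    {"device": "c3", "status": "offline"},  # carries no temperature at all
]


def extract_temp_c(doc: dict):
    """Return a Celsius temperature if this record carries one, else None."""
    if "temp_c" in doc:
        return float(doc["temp_c"])
    nested = doc.get("temperature")
    if isinstance(nested, dict) and nested.get("unit") == "F":
        return (float(nested["value"]) - 32.0) * 5.0 / 9.0
    return None


temps = [t for t in (extract_temp_c(d) for d in documents) if t is not None]
doc_avg = sum(temps) / len(temps)
print(sql_avg, doc_avg)
conn.close()
```

Both sides arrive at about the same average, but the second one needed hand-written logic for every shape the data might take.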

As I said, while I’m a software professional, I’m not a Database expert. But I probably have a greater understanding than 99% of the stock analysts out there following these companies. If I have trouble figuring out the technology and products - heck if tens of thousands of professionals deploying these systems have trouble doing so - how can I trust what the technologically clueless stock analysts say about TAM, for instance? I can’t.

How many companies are using Hadoop or Mongo for the wrong reasons? It may be the cool thing to do, but just because you’re doing some AI doesn’t mean you need either Hadoop or Mongo. Unless the quantity of data you’re processing is too large for traditional Relational databases in terms of cost or performance, or unless you need to handle new data in various structures in the same database, then your application may not be appropriate for these technologies. And since both are complex to setup, run, and achieve necessary performance, you may be better off with other solutions. Here’s an article with some more depth: https://medium.com/xplenty-blog/the-sql-vs-nosql-difference-…

captainccs writes: Reading one of their white papers I discovered that one does not need a database to have the equivalent of a relational database directly on the file system.

I hate to break it to you, but this is the whole premise behind HDFS (Hadoop Distributed File System). It’s a file system, not a structured database. Same for most other NoSQL implementations. With a relational DB, you have a table with specific columns of data - type and size for each column. There’s also a mandatory index column to find the row in the table. So, you have to massage your data to store it in the DB table(s). With NoSQL systems like Hadoop and MongoDB, you just store files.

It’s a natural outgrowth of OOP programming. Each object instead of being a row in a table is a file in storage. It’s an exciting idea.

Object Oriented Programming for database access is nothing new. JDBC is a Java object-oriented interface that, for instance, lets you define DataSource objects as almost anything you want in relational databases or even files. It’s been around for well over a decade.
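
The same point can be shown outside Java. The sketch below deliberately uses Python’s standard `sqlite3` module rather than JDBC, just to keep the example self-contained; `Customer` and `CustomerRepository` are hypothetical names, not part of any library.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository:
    """Hypothetical object-oriented wrapper around a relational table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> Customer:
        cur = self.conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        return Customer(id=cur.lastrowid, name=name)

    def get(self, customer_id: int):
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        # Each row comes back as an object; None means no such row.
        return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
repo = CustomerRepository(conn)
created = repo.add("Acme")
fetched = repo.get(created.id)
missing = repo.get(9999)
print(created, fetched, missing)
conn.close()
```

The calling code only ever sees `Customer` objects; the rows and SQL stay behind the repository class.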

I don’t mean to start a technology fight here. But, if people are saying that Hortonworks or MongoDB are AI plays, that’s not good reasoning. If people are saying you can’t use OOP with relational databases, that’s clearly wrong. If people are saying that real-time access performance is a reason to use a NoSQL database, they’ve got it backwards.

OTOH, if people are saying that Big Data is coming to many more companies, and the size of the data they’re dealing with is getting too large for relational databases, OK, that’s truly indicative. But, how are people sizing the TAM for the growth in database sets too large for RDBMSs? Remember, this cuts across all industries, both online stores and traditional hard good manufacturing, both people service industries like retail and business to business service industries like SaaS. Who knows enough about all of these potential uses to say which of them are too big for RDBMS or need the changing data schema flexibility of a file-based storage system? I’d like to meet these jack-of-all-trades, master-of-them-all people.

brittlerock · March 25, 2018, 7:12pm

Raptor,
To be clear, windows, the use of a pointing device (mouse), OOP (i.e., Smalltalk) and a host of other technologies came out of PARC (Palo Alto Research Center). This research center was part of the Xerox corporation. I’ve read (I won’t assert that it’s true) that there were large portions of PARC code (public domain) in the early iterations of the Apple OS.

Unfortunately for Xerox, the management didn’t perceive that these technologies would revolutionize computing and they failed to exploit them for the benefit of their owners.

Smorgasbord1 · March 26, 2018, 7:15pm

Since we don’t know who will win, should we not invest in data? I don’t know who’s going to win the pharma wars either, so biotechs are out. I don’t know whose cyber payment system will take over (if any one does) so that’s out … pretty soon there’s nothing left to invest in.

I know you’re being facetious, but you’re actually basically right here - although it’s not finding the single one winner, but finding some of the winners. So, it’s a bit easier than you’re saying. But, it’s still a hard thing to do, depending on the technology/business. For instance, it’s easy to predict that solar is going to be big in the future. But, it’s been really hard to find which solar companies are worth our investment. SolarCity? First Solar? Solyndra? SunPower? Not a winner there.

But when I look at Arista, for instance, I understand how the technology of SDN is superior to the traditional (Cisco led) model. I see how Arista’s management runs the company, and I see good financial management and growth prospects, compounded by their disruptive position. That’s a Trifecta for me. I also see some good things for technology, management, and financials for Nvidia, too.

But, like I said, I don’t have equivalent insights with the technology advantages behind these database companies. Do you?

RaptorD2 · March 27, 2018, 1:03am

But, like I said, I don’t have equivalent insights with the technology advantages behind these database companies. Do you?

You must be kidding. http://discussion.fool.com/ot-i-forget–32968084.aspx?sort=whole…

Dan