MDB -- A (Former) DBA's Thoughts

I posted this elsewhere in Fooldom. I should post it here too (lightly edited), although this board has seen some of the points I raise already. I especially want to call out posts by Greg (sarksnz) and Ethan.

As you might guess from my handle, I’m a database guy. At least I was one… I retired a few years ago, before document databases were really a “thing”. Still, I think I get the concepts.

An important concept here is called “open source”. This is a paradigm where a programmer makes code freely available to the community. Members of the community are permitted to use and modify the code, but only on the condition that their revisions are also freely available. I’m oversimplifying here, but I think that’s the gist of it. Why would anyone do that? With many hands working on the code, it gets built faster and bugs are spotted and corrected more quickly. The Linux operating system is perhaps the most visible open source product out there. MongoDB’s code base started as open source.

At one point not too long ago, MongoDB (the company) made some changes to the code that I – as a database professional – view as being critical to delivering a reliable database management system (DBMS). The feature is called ACID, which is an acronym (Atomicity, Consistency, Isolation, Durability). I don’t want to get into too much detail, but it is the feature that guarantees that either all of a transaction happens, or none of it does. If you’re transferring money from one account to another, either the transaction completes successfully or the money stays in the original account. There’s no risk to you that the money appears in neither place, or risk to the financial institution that the money appears in both!

About the same time that this feature was introduced, MongoDB (the company) changed the licensing agreement to make it much more proprietary. Personally, I think a DBMS that doesn’t support ACID is little more than a useful toy. Once you support ACID, you’ve graduated to the big leagues.
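For readers who want to see the all-or-nothing idea in action, here is a minimal sketch using Python's built-in sqlite3 module (a small SQL database, not MongoDB). The accounts table, the account names, and the amounts are all made up for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Failure halfway through: the debit above must be undone.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return all balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical accounts, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("checking", 100), ("savings", 0)]
)
conn.commit()

print(transfer(conn, "checking", "savings", 60))   # succeeds
print(transfer(conn, "checking", "savings", 500))  # fails and is rolled back
print(balances(conn))
```

The second transfer debits the account, notices the problem, and is rolled back, so the money ends up in neither limbo nor both places.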

What Amazon has done – if I’m interpreting what I’ve read correctly, and the postings have been accurate – is grabbed a pre-ACID version of the code, from when MongoDB (the software) was still open source. That’s legal, and it’s a risk of open source code. I am not aware that Amazon’s version has been altered to support the ACID feature. I think if it supported ACID, Amazon would go out of their way to say so. Amazon is also marketing their version as being MongoDB-compliant. Since I don’t know enough about the changes that were made associated with the post-ACID versions of MongoDB (the software), I can’t say whether those claims are true or not. I suspect they’re mostly (but not completely) true. In the SQL world, code like “BEGIN TRANSACTION”, “COMMIT TRANSACTION”, and “ROLLBACK TRANSACTION” supported ACID features. I would guess that similar code is absent in pre-ACID versions of the DBMS, and I don’t know how Amazon’s version of the code would treat those (or similar) instructions. If the answer is “It ignores them.”, then I view its “compliance” as suspect. But, again, I’ve been away from the database field for long enough that I could have some things wrong. Also again, my time in the database world predates document databases.

I think the true threat is less than it appears. If Amazon is targeting companies that know enough to want MongoDB-compliance but don’t know enough to ask about ACID (or require it), they may get some conversions, but I doubt they’d get any “mission-critical” database conversions. If they modify their version to include the ACID feature, they’d need to do so in a way that they’re not violating any intellectual property (IP) that MongoDB (the company) has regarding their proprietary product.

Do I rule out the possibility that Amazon will want to get into the DBMS business and create their own proprietary version that is ACID-compliant? No. Do I think they’ll do that? No. I think what Amazon has done so far is probably pretty easy – creating their own fork of open source code. Maybe adding a few nice features so they can claim differentiation. Modifying and maintaining a DBMS is harder. I’m not sure they’ll want to take that on, although they’ve succeeded in tackling some hard stuff, no doubt. Many SQL databases from different vendors supported ACID without violating each other’s IP. It’s certainly possible. But the fact that they’ve taken an easy step doesn’t convince me that they’ve committed to taking the hard steps. More likely, I think they’re trying to grab low-hanging fruit, and will leave the serious implementations for MongoDB (the company and the software).

We should definitely watch to see if and how Amazon’s version evolves. Until Amazon’s version supports ACID, I’m not too worried. If they show signs that they’re moving in that direction, though, concern is warranted.

I could be wrong, but that’s how I view it. I don’t intend to alleviate concerns so much as to give them context and give you some sense of the probability of a game-changing outcome for our little DBMS vendor. If you have questions about what I’ve written or if what I’ve written generates more questions, please feel free to follow up with another post.

Fool on!
Thanks and best wishes,
TMFDatabaseBob (long: AMZN, MDB)
Maintenance Coverage Fool
See what a “Coverage Fool” does here: http://www.fool.com/community/community-team.aspx
See my holdings here: http://my.fool.com/profile/TMFDatabasebob/info.aspx
Peace on Earth

Please note: I am not a member of any newsletter team. My opinions are my own and do not necessarily reflect those of the TMF advisers. I am not an investment professional, merely an investor.

98 Likes

Amazon has an ACID product. What are the odds that MongoDB is just a front end and Amazon has connected the ACID product in the back end to avoid the license?

2 Likes

Amazon is also marketing their version as being MongoDB-compliant. Since I don’t know enough about the changes that were made associated with the post-ACID versions of MongoDB (the software), I can’t say whether those claims are true or not. I suspect they’re mostly (but not completely) true.

As far as I can tell Amazon is marketing their version as being MongoDB 3.6-compliant.

I don’t think the issue is ACID, which is just a hurdle that Amazon can overcome. Instead of looking at the issue from the point of view of the software, look at it from the point of view of the buyer. The question to answer is whether buyers who pay money, as opposed to users of the “free as in beer” version, are willing to be locked into AWS. Also, Atlas is Mongo’s money maker. Is the AWS version better than Atlas?

In high tech it’s not the product but the market that decides who is the winner.

Denny Schlesinger

7 Likes

I don’t think the issue is ACID, which is just a hurdle that Amazon can overcome. Instead of looking at the issue from the point of view of the software, look at it from the point of view of the buyer. The question to answer is whether buyers who pay money, as opposed to users of the “free as in beer” version, are willing to be locked into AWS. Also, Atlas is Mongo’s money maker. Is the AWS version better than Atlas?

In high tech it’s not the product but the market that decides who is the winner.

Denny,

Made me think.

Is getting a free database like getting a free wooden boat?

(Have you priced varnish lately?!)

Cheers
Qazulight

1 Like

The ACID in DynamoDB is embarrassingly limited. No one who really needed ACID guarantees could use it with any confidence.

So with all this good discussion on MDB and the possibility of this threat from Amazon, where are Saul, Tinker and Bear?
Do they normally not discuss news that might affect their investments?

Do they have a 3 day rule where when they sell a position they can’t talk about it?

Are they waiting for the stock to settle before commenting?

I’m surprised that I don’t see any posts from them and that they are not involved in this discussion.

What am I missing?

Chris

6 Likes

In high tech it’s not the product but the market that decides who is the winner.

Indeed. Beta was better than VHS in every way but one, and VHS won because it was cheaper. This is common knowledge, but let me reframe it slightly: VHS won with a product which was worse, but one which was perceived to be a better value.

It seems Amazon may have launched VHS. And not only that, but also with the same crucial feature Excel re-launched with against Lotus 123 back when the market belonged to Lotus: Compatibility. No hurdle to entry, try for free, leave with your data intact if you want. If that’s the offer, or rather, if that is the perception of the offer, along with the perception of a better value, that’s a hell of a one-two punch.

Just one way of looking at it.

Wot

6 Likes

It seems Amazon may have launched VHS. And not only that, but also with the same crucial feature Excel re-launched with against Lotus 123 back when the market belonged to Lotus: Compatibility. No hurdle to entry, try for free, leave with your data intact if you want. If that’s the offer, or rather, if that is the perception of the offer, along with the perception of a better value, that’s a hell of a one-two punch.

Except that the MDB database software is free. Doesn’t that make it a little more curious?

Bear

TMFDatabasebob,

I’m also a database guy - more architect and ETL than “administer” - so I looked into MDB - creating a database, collections, and taking some of the MongoDB university courses. Here is my big “technical” issue with MongoDB, via an example:

In MongoDB, there are collections - a collection is a group of documents of the same kind.

So if you had a collection called “tv shows”, it would have a collection of documents, each one representing a tv show. Imagine a folder called “tv shows” with a bunch of Word documents inside, one called “Cheers”, another called “Grey’s Anatomy”, and so on.

Inside a tv show document, there would be a list of “seasons”, and each season would have episodes, and each episode would have a run-time, a plot, and a list of characters. Each character would have a list of actors (multiple actors can be used to play one character - think of babies, or young kids - they often use twins).

All great, assuming you always query your data with a tv show in mind. Find the actor that played so-and-so in season 3, episode 4 of the Mary Tyler Moore show.

But what if you need to find all tv shows an actor has been in? Your collection of tv shows is insular - you can’t build a relationship between items deep in each document. “What other tv shows has Betty White been in” is a hard problem to solve.

You have to parse each tv-show document, each season, each episode, all the way down to the list of actors, and do this for every document in the tv-shows collection. You’ll probably be matching by name (which is dangerous - sometimes the first name has a middle initial in it, or people change their names a bit).
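A rough sketch of that walk in plain Python, with ordinary dictionaries standing in for documents. The field names (title, seasons, episodes, characters, actors) are my guess at a layout that matches the description above, and the season and episode numbers are placeholders rather than a real episode guide.

```python
# Hypothetical documents shaped like the "tv shows" collection described above.
tv_shows = [
    {
        "title": "The Mary Tyler Moore Show",
        "seasons": [
            {"number": 1, "episodes": [
                {"number": 1, "characters": [
                    {"name": "Sue Ann Nivens", "actors": ["Betty White"]},
                    {"name": "Mary Richards", "actors": ["Mary Tyler Moore"]},
                ]},
            ]},
        ],
    },
    {
        "title": "The Golden Girls",
        "seasons": [
            {"number": 1, "episodes": [
                {"number": 1, "characters": [
                    {"name": "Rose Nylund", "actors": ["Betty White"]},
                ]},
            ]},
        ],
    },
    {
        "title": "Cheers",
        "seasons": [
            {"number": 1, "episodes": [
                {"number": 1, "characters": [
                    {"name": "Sam Malone", "actors": ["Ted Danson"]},
                ]},
            ]},
        ],
    },
]

def shows_with_actor(shows, actor):
    """Scan every show, season, episode and character for an exact name match."""
    found = []
    for show in shows:
        if any(
            actor in character["actors"]
            for season in show["seasons"]
            for episode in season["episodes"]
            for character in episode["characters"]
        ):
            found.append(show["title"])
    return found

print(shows_with_actor(tv_shows, "Betty White"))
```

Note that the match is on the exact string, so "Betty M. White" would be missed, which is the name-matching danger mentioned above.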

In a relational database, this is trivial if you’re even marginally competent.
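For comparison, a minimal relational sketch using Python's built-in sqlite3 module. The three tables and their columns are an assumed layout chosen for illustration, not anyone's production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (show, actor, role): the relationship lives in its own table.
    CREATE TABLE appearances (
        show_id  INTEGER REFERENCES shows(id),
        actor_id INTEGER REFERENCES actors(id),
        role     TEXT
    );
    INSERT INTO shows VALUES
        (1, 'The Mary Tyler Moore Show'), (2, 'The Golden Girls'), (3, 'Cheers');
    INSERT INTO actors VALUES (1, 'Betty White'), (2, 'Ted Danson');
    INSERT INTO appearances VALUES
        (1, 1, 'Sue Ann Nivens'), (2, 1, 'Rose Nylund'), (3, 2, 'Sam Malone');
""")

def shows_for_actor(conn, actor_id):
    """Return the titles of every show the given actor appears in."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.title
        FROM shows AS s
        JOIN appearances AS a ON a.show_id = s.id
        WHERE a.actor_id = ?
        ORDER BY s.title
        """,
        (actor_id,),
    )
    return [title for (title,) in rows]

print(shows_for_actor(conn, 1))  # actor id 1 is Betty White in this sample data
```

Because the lookup goes through the actor's id rather than the spelling of the name, the middle-initial problem does not come up.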

To make a long story short, a lot of the value of the data is in the relationships between data items. MongoDB has added some features in 4.0 to help model relationships, so the above example, if done right, can be made simpler.

Anyway, I still have some MDB - as a stock, they’ve been very good for my portfolio (thanks, to whoever brought it to the board). I guess I am concerned that a lot of their market share is either people not understanding the problem they are solving, or management wanting to sound “hip and current” and picking something that has all the latest buzz words attached.

There could be a whole class of problems that MongoDB is very good at solving, and will generate a lot of revenue for the company. I suspect that maybe I’m not being imaginative enough and I should just focus on the business metrics of the company. After all, I don’t know what I don’t know.

18 Likes

HiTechGuy,

I watched a video by MDB’s CTO, brought to the board by Tinker (or possibly it was on NPI):
https://vimeo.com/301656749?cjevent=93a8d8b1155411e9820d0173…

It was extremely helpful for me as a LowTechGuy!

In response to your “technical” issue critique, it seemed to me like this was actually super easy to do using their “Aggregation Pipeline Builder” by adding stages and you get sample output live as you keep updating and adding stages. I think they start talking about this at around 13:45.

It was quite cool. You might check it out and see if what I am saying is accurate. I very well could be absolutely wrong!

Daniel

2 Likes

Yes, the Aggregation Pipeline Builder is the CTO’s favorite feature. It makes sophisticated queries simple. It is yet another feature that version 3.6 does not have.

Tinker

Hi, Daniel.

The aggregation pipeline is for loading data from an external source into MongoDB - ETL (Extract, Transform, Load). It’s a powerful way of getting data from external sources and normalizing it into MongoDB (there are lots of tools like this - Alteryx does something similar, but a lot more powerful).

But it doesn’t help with relationships between data. I plan to dig a bit more and play around with that over the next few weeks.

I edited my post a few times to simplify it (and still missed “administer” versus “administrator”). One of the things I cut was that I highly doubt companies like Mastercard, Square, and Shopify keep their core data in MongoDB - they would use a relational database for that. Instead, they would extract it from a relational database, using something like the aggregation pipeline, to put it into MongoDB and run statistical analysis on it to look for trends in the data.

When I said, “There could be a whole class of problems that MongoDB is very good at solving, and will generate a lot of revenue for the company.”, this is the sort of thing I was talking about. And that’s why I haven’t completely divested my shares.

David

4 Likes

But it doesn’t help with relationships between data. I plan to dig a bit more and play around with that over the next few weeks.

Here is a simple example of the difference between SQL and a document store. Take your actor/actress example: Betty White and the movies she was in. What if you also want to add afterwards whether the movie was made into a comic book and, if so, whether there was a sequel? If there was a sequel, did it make more than the original? Was there a sequel to the sequel? Was there a remake of the entire series 10 years later or more? Did the actor or actress ever appear in Game of Thrones? If so, which episode(s), on which dates, with what ratings (to try to evaluate the true monetary value that the actor/actress incrementally brings to the show)…let me know how you would do all this in an SQL database that already has its schema set up, and do it on the fly, at scale, with terabyte horizontal scaling on commodity servers.

Or something simple like adding her phone number. Oh wait, now phone numbers also need area codes (when I was younger phone numbers were 7 digits unless you were going to make a long-distance call). Even if you had area codes to start with, what if we then have to add in international codes (or perhaps an 11th or 12th digit for region if 10-digit numbers become exhausted) but the schema only holds 7 digits or only holds 10 digits?

What if area code 678 (as used in Avengers) is changed to 412 at some point in time?

Each time, in an SQL database, you have to change the schema and you may have to add new tables. It becomes more and more complicated and convoluted and less and less manageable, requiring more and more expensive DBAs to manage it all, with less and less ability to be flexible with your schema.

In a document database you can handle all those changes without a single change to schema.
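A small sketch of the contrast, again in Python. The relational half uses the built-in sqlite3 module; the document half uses plain dictionaries as stand-ins for documents. The names, phone numbers, and field names are fictional placeholders.

```python
import sqlite3

# Relational side: a new kind of value needs a schema change before it fits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (name TEXT, phone TEXT)")
conn.execute("INSERT INTO actors VALUES ('Betty White', '555-0100')")
conn.execute("ALTER TABLE actors ADD COLUMN country_code TEXT")  # schema change
conn.execute("UPDATE actors SET country_code = '1' WHERE name = 'Betty White'")
row = conn.execute("SELECT name, phone, country_code FROM actors").fetchone()

# Document side: a later document simply carries a richer shape.
actors = [{"name": "Betty White", "phone": "555-0100"}]
actors.append({
    "name": "Hypothetical Actor",
    "phone": {"country": "1", "area": "678", "number": "555-0199"},
    "comic_book_adaptations": ["Hypothetical Title"],
})

def phone_as_text(doc):
    """Read a phone number whether it was stored as a string or as parts."""
    phone = doc.get("phone")
    if isinstance(phone, dict):
        return "+{country} {area} {number}".format(**phone)
    return phone

print(row)
print([phone_as_text(doc) for doc in actors])
```

One thing the sketch makes visible: nothing had to be declared up front on the document side, but the code that reads the data now has to cope with both shapes.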

Further, what if your data is unstructured to begin with, such as data collected from social media…good luck with the SQL.

That is why NoSQL is starting to go mainstream. That is why Cisco uses MongoDB for its front-end commerce website. That is why Fortnite runs on top of Mongo and not on top of Oracle.

Just a few examples. Also, traditional DBAs are going to be cynical about it just like laparoscopic surgeons were about DaVinci.

Tinker

22 Likes

The aggregation pipeline is for loading data from an external source into MongoDB - ETL (Extract, Transform, Load). It’s a powerful way of getting data from external sources and normalizing it into MongoDB (there are lots of tools like this - Alteryx does something similar, but a lot more powerful).

But it doesn’t help with relationships between data

David, the Aggregation Pipeline is used for in-database analytics.
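To give a flavor of what such a pipeline looks like, here is a hedged sketch written as a Python list of stages, the form a driver such as pymongo accepts via collection.aggregate(pipeline). The stage operators ($unwind, $match, $group) are standard MongoDB ones; the field names are assumptions based on the tv-show documents described earlier in the thread, and nothing here connects to a real server.

```python
# Each dictionary is one stage; documents flow through the stages in order.
pipeline = [
    # Flatten the nested arrays so each character appearance is its own record.
    {"$unwind": "$seasons"},
    {"$unwind": "$seasons.episodes"},
    {"$unwind": "$seasons.episodes.characters"},
    # Keep only the appearances whose actor list contains the name we want.
    {"$match": {"seasons.episodes.characters.actors": "Betty White"}},
    # Collapse back down to one result per show title.
    {"$group": {"_id": "$title"}},
]

# Listing the stage operators is all this sketch does locally. With a running
# MongoDB and a collection handle, the call would be collection.aggregate(pipeline).
stages = [next(iter(stage)) for stage in pipeline]
print(stages)
```

The point is that the query runs inside the database against data already stored there, rather than loading anything in from outside.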

Also, you may want to review banks’ increasing use of Mongo, running not just their front end but starting to implement it in transactions.

Further, you may also want to examine how rapidly NoSQL databases (in particular Mongo) are improving not just their front-end but also their back-end capabilities.

That is how disruption works. It starts out attacking a pain point that the existing technology does not serve very well, and then over time it gets better and better and grows its niche larger and larger.

SQL databases will be the predominant database structure, perhaps forever. However, SQL does not do a lot of things well, it does things with less efficiency in many instances, and there are things that it simply cannot do.

For example, go look at Met Life. They tried to use their customer data to create a complete database for the company’s agents and marketers, so they could keep a 360-degree profile of their customers.

Met Life tried for years to get it to work and then, within a matter of a few months or less, they made it work with Mongo:

https://www.mongodb.com/press/metlife-leapfrogs-insurance-in…

But whatever.

Tinker

6 Likes

That is why NoSQL is starting to go mainstream. That is why Cisco uses MongoDB for its front-end commerce website. That is why Fortnite runs on top of Mongo and not on top of Oracle.

Just a few examples. Also, traditional DBAs are going to be cynical about it just like laparoscopic surgeons were about DaVinci.

Tinker

This isn’t the first time I’ve seen people talking about Fortnite. Do you mean Fortinet? Or is the videogame using MDB?

1 Like

I mean Fortnite, the world’s largest multi-player video game, from Epic Games, which, running on top of MongoDB, made $3 billion in PROFITS, not just revenue.

But nothing to see here. Should have used an SQL. Except, an SQL would not be capable of practically doing what Fortnite does. This is wealth created because Mongo enabled the game to exist in the first place.

Tinker

4 Likes

Indeed. Beta was better than VHS in every way but one, and VHS won because it was cheaper. This is common knowledge, but let me reframe it slightly: VHS won with a product which was worse, but one which was perceived to be a better value.

It seems Amazon may have launched VHS.

But your analogy, I think, breaks down with databases.
With video tapes there was a network effect…the rental stores needed to carry both formats.
Not good. So there was an advantage with winner take all (most) of the market, and VHS won.
I’m not sure how that applies to a database. There may be some sharing between companies that gives them an incentive to standardize…but most companies are silos with respect to other companies’ data, I would think.

This is different than in the MS Office case, where people shared documents, spreadsheets, and presentations between companies all the time.

Mike

3 Likes

I think this comment within Tinker’s thread is key: “In a document database you can handle all those changes without a single change to schema.”

One might ask why. Look at it like a SaaS company. Why have traditional SaaS companies destroyed the legacy folks that are going to a SaaS model? One of the main reasons is that the SaaS companies built their software on a single baseline code, so that enhancements, etc. just attach, which makes it really easy, faster, with fewer mistakes, etc.

So you can make changes to the software without changing the underlying code. I think the same applies to the MDB thesis as Tinker points out.

Tinker - correct me if I am mistaken.

Sox

2 Likes

But your analogy, I think, breaks down with databases.

It breaks down partially, not totally. When the network effect is working (Office, VHS) adoption of the top contender (the Gorilla) is very strong. When the network effect is absent the top contender still gets the largest market share but nowhere as much as a Gorilla. Two reasons are that it attracts more developers and more add-ons than the lesser contenders. It creates a stronger value chain (supply chain) than lesser contenders.

Denny Schlesinger

“What Amazon has done – if I’m interpreting what I’ve read correctly, and the postings have been accurate – is grabbed a pre-ACID version of the code, from when MongoDB (the software) was still open source”

That’s not accurate. They are API compatible with the older version, but the underlying database behind DocumentDB is rumored to be Aurora. Aurora is Amazon’s proprietary fork of MySQL (https://www.percona.com/blog/2015/11/16/amazon-aurora-lookin…), which as you know is an ACID compliant relational database.

The trade-off is presumably in CAP (https://dzone.com/articles/better-explaining-cap-theorem), though I have no background in Aurora’s distributed behavior. Aurora does have its own clustering and replication strategies (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide…), which will differ from Mongo’s, though.

I think it’s a fair concern for MDB investors that many MongoDB consumers 1) only use a small subset of the API and 2) do not operate at a scale where the CAP tradeoffs will be material. I think there will be some reluctance to put critical data in a proprietary cloud product, and that would limit DocumentDB adoption. I think it will be used against MDB in pricing negotiations.

No MDB position long or short.

3 Likes