MDB -- A (Former) DBA's Thoughts

The idea of imitating a document database with a MySQL clone seems quite dubious.

Hi ajm101.

Thanks for the correction. As I’ve continued to read more about the Amazon v. MongoDB situation, I realize I misspoke and your characterization is accurate. Good thing I’m retired, eh?

That said, I’m not sure that your correction changes my underlying thesis much. If the API isn’t ACID-compliant, then it would be difficult to communicate to an underlying ACID-compliant DBMS what constitutes the scope of a transaction. Again, I have no experience with any document databases, and I could be wrong on that point. It just seems like a task that any ACID-compliant system would need to address with commands, which wouldn’t be present in the older API.

And, sorry Saul et al., if my talk of ACID is hallucination-inducing, or if my talking about APIs is taking things too far off topic.

Fool on!
Thanks and best wishes,
TMFDatabaseBob (long: AMZN, MDB)
See my holdings here: http://my.fool.com/profile/TMFDatabasebob/info.aspx
Peace on Earth

Please note: I am not a member of any newsletter team. My opinions are my own and do not necessarily reflect those of the TMF advisers. I am not an investment professional, merely an investor.

2 Likes

it would be difficult to communicate to an underlying ACID-compliant DBMS what constitutes the scope of a transaction.

In my experience, relational databases accessed with SQL require explicit transaction scoping, which would need to be a part of the API. There is one relational database, Progress, which preferentially uses a 4GL called ABL which has implied transaction scoping as well as explicit, but I don’t see how one would communicate that through the API.
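For readers who haven’t worked with explicit transaction scoping, here is a minimal sketch using Python’s built-in sqlite3 module (the table and data are invented for illustration). The point is that the client must say where the transaction begins and ends, and that scoping is exactly what would have to travel through the API:

```python
import sqlite3

# In-memory database; isolation_level=None disables the driver's implicit
# transactions so we can scope them explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# The client explicitly marks the transaction boundaries.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")  # both updates take effect together, or neither does

total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
print(total)  # the transfer preserves the total: 150
```

If the API has no equivalent of BEGIN/COMMIT, there is simply no way to tell the backend that those two updates belong to one transaction.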

1 Like

This is getting off topic (so this will be my last response), but much of what you describe - updating an area code, adding a sequel - wouldn’t require a schema change. Any designer worth their salt would define the phone number field as a 50-character string to handle formatting, extensions, international numbers, etc. As for the “show it in SQL” challenge - the SQL run against a relational database would be trivial. It would be equally trivial to get it from a MongoDB database using JavaScript.

Changing the schema is free with MongoDB, but the code that accesses the schema still has to change, and if it powers a UI or web page, that has to change. And all that flexibility can be dangerous - the old “enough rope to hang themselves”.

I am not down on MongoDB - I just wonder if the market is as big as is believed. I’ve said several times that I could very easily be wrong.

I Googled “MongoDB horror stories” and found many (data loss, horrible performance, etc.), but it seems the product has firmed up as of 3.6. Other “horror stories” came from people using it for tasks where a relational database would have been the better fit.

Read the first answer here: https://www.quora.com/Why-do-people-hate-MongoDB

Sums up what I’ve been saying.

David

1 Like

David, I think that the issue is that most problem spaces are a mix of data which can be well structured and data which is not well structured. The solution depends on the nature of the mix. With data which is mostly well structured, you are right that good design can obviate schema changes entirely, and the best relational databases allow on-line schema changes anyway, so with good craftsmanship many of these possible issues never become actual issues.

On the other hand, some data is just not well structured. If this is a minor adjunct to the rest of the data, one can use a BLOB or CLOB field to just store that data, even though it may not be easily indexed (except for possible word index on a CLOB field). But, if the documents are highly variable in structure … like a bunch of JSON objects … then a document DB can provide tools for accessing that data efficiently that would be hard to accomplish in a relational model. And, including some fixed fields is not a problem.
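To make the contrast concrete, here is a small Python sketch (the sensor data is invented for illustration) of querying a batch of irregular JSON objects stored as raw text, the way they would sit in a CLOB. A document DB can index fields inside these documents natively; a CLOB forces you to re-parse every document on every query:

```python
import json

# Documents with highly variable structure, stored as raw text (a CLOB).
raw = [
    '{"name": "sensor-1", "readings": [3, 5], "unit": "C"}',
    '{"name": "sensor-2", "status": "offline"}',
    '{"name": "sensor-3", "readings": [9], "meta": {"site": "roof"}}',
]

# With only a CLOB, every query must parse every document in full.
docs = [json.loads(r) for r in raw]
with_readings = [d["name"] for d in docs if "readings" in d]
print(with_readings)  # ['sensor-1', 'sensor-3']
```

A document database avoids the full scan by maintaining secondary indexes on fields like `readings`, which is the tooling that is hard to replicate on top of a plain CLOB.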

1 Like

@TMFDatabasebob

I’m really glad you started the thread and brought up ACID compliance. I straddle the two worlds. I did some work that involved writing and profiling sprocs, some schema design, and other typical relational database work a while ago. Then more recently I worked in a distributed database company in an engineering capacity.

There is not a lot that is fundamentally new about distributed databases. To be really effective you still need to know relational algebra, BCNF, and ACID; you just also need to overlay (admittedly not easy) concepts from distributed systems and consensus (Paxos, CAP/PACELC).

I think Mongo was ACID-compliant quite a bit earlier than the 3.6 API (I believe “API” here is meant closer to the ODBC sense of the word). If you want to be horrified, read https://dzone.com/articles/how-acid-mongodb (note: it’s from 2013, and the information in it is obsolete). What is newer, and what Amazon will have to fork should they want to support it, is ACID multi-document transactions. As a brief aside, I can’t tell you how it grates on me to read “ACID transactions” - if it ain’t ACID, it ain’t a transaction. I believe basic single-document operations were already ACID.

I don’t know enough about the implementation of distributed multi-document transactions or the guts of the DocumentDB implementation to know if this would be more or less challenging here. Wish I could go on further, but it’s a bit late. Thank you for the reply.

I think Mongo was ACID compliant quite a bit earlier than the 3.6 API

No, multi-document ACID was only introduced recently with 4.0. The article you referenced was from 2013. Single document ACID is a nearly meaningless claim … all it says is that the indexes and record will match, although it sounds as if DynamoDB doesn’t even claim that.

Except that the MDB database software is free. Doesn’t that make it a little more curious?

I haven’t used MongoDB recently, but I did extensively a few years ago at the company I worked for at the time. The DB may be free, but having a scalable setup in a high-traffic environment was a royal PITA. So apart from the MongoDB administrators working for the company, the company paid for an extremely expensive service contract, so that once in a while we could call upon a MongoDB consultant to come and help us figure out why the damn thing wasn’t performing. And the DB administrators were constantly leaving, because the workload on them was too high.

By the time I left, the company was well under way rewriting the code to use DynamoDB, just to avoid having to use Mongo DB.

If the business model is predicated on selling services to companies just so that the product remains barely usable, you may have some good sales for a while. Then you get a DBaaS offering that takes the administrative part out of your hands and you wonder “why would Amazon do that?”…

AWS usage fees are generally not cheap. But the costs are insignificant compared to consultant fees and DB administration staffing costs, especially in the Bay Area.

Mark

13 Likes

Amazon has an ACID product. What are the odds that MongoDB is just a front end and Amazon has connected its ACID product on the backend to avoid the license?

Extremely likely, in my opinion. Nothing in the product announcement leads me to believe they used any actual MDB code.

Bruce
Long AMZN
Short MDB sold Put

1 Like

The ACID in DynamoDB is embarrassingly limited. No one who really needed ACID guarantees could use it with any confidence.

They could, however, be using either Postgres or MySQL for the backend.

Bruce
Long AMZN
Short MDB sold Puts

But what if you need to find all TV shows an actor has been in? Your collection of TV shows is insular - you can’t build a relationship between items deep in each document. “What other TV shows has Betty White been in?” is a hard problem to solve.

You have to parse each TV-show document, each season, each episode, all the way down to the list of actors, and do this for every document in the TV-shows collection. You’ll probably be matching by name (which is dangerous - sometimes the first name has a middle initial in it, or people change their names a bit).

In a relational database, this is trivial if you’re even marginally competent.

While this is all technically true, it is also an overly simplistic representation. Yes, in a relational DB it’s trivial to write such a query. But underneath, the DB would do exactly what you describe: parse each TV show, etc. Unless you define indexes on widely used columns in your schema, such a query would totally break down on large data sets. And before you know it, your DB schema has become hugely complicated.

The popularity of NoSQL databases is based on the realization that if you need cross references in your data, you might as well make them explicit in cross-reference tables. And once you have cross-reference tables, you don’t need all the complicated schema and query magic of SQL and you can use a simpler and more efficient DB implementation. This doesn’t always work easily for all relationships, but it works in enough cases that for the majority of DB needs, SQL is total overkill.
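The cross-reference-table idea can be shown with a small sketch using Python’s built-in sqlite3 module (all table and actor names invented for illustration): once an explicit actors-to-shows junction table exists, “all shows for an actor” is one indexed join, with no document parsing and no name matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shows  (id INTEGER PRIMARY KEY, title TEXT);
    -- Explicit cross-reference (junction) table linking actors to shows.
    CREATE TABLE appearances (actor_id INTEGER, show_id INTEGER);
    CREATE INDEX idx_app_actor ON appearances(actor_id);

    INSERT INTO actors VALUES (1, 'Betty White'), (2, 'Bea Arthur');
    INSERT INTO shows  VALUES (10, 'The Golden Girls'), (11, 'Hot in Cleveland');
    INSERT INTO appearances VALUES (1, 10), (1, 11), (2, 10);
""")

# All shows a given actor has been in: one indexed join by surrogate key,
# so middle initials and name changes never enter into the matching.
rows = conn.execute("""
    SELECT s.title
    FROM appearances a JOIN shows s ON s.id = a.show_id
    WHERE a.actor_id = 1
    ORDER BY s.title
""").fetchall()
titles = [t for (t,) in rows]
print(titles)  # ['Hot in Cleveland', 'The Golden Girls']
```

The same junction-table shape works in a key-value or document store, which is the point: the relationship is explicit data rather than query-time magic.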

Mark

2 Likes

“No, multi-document ACID was only introduced recently with 4.0. The article you referenced was from 2013. Single document ACID is a nearly meaningless claim … all it says is that the indexes and record will match, although it sounds as if DynamoDB doesn’t even claim that.”

@tamhas

I understand. ACID is not only a property of “transactions” in a T-SQL or PL/SQL context. It is a property of any mutation, even of a single entity (whether a row, document, or column family), and such mutations are particularly difficult in distributed systems. Sadly, one has to clarify, because earlier (much earlier, pre-WiredTiger) versions of Mongo had issues with single-document ACID compliance, which is why I included the 2013 article (along with the horrible stuff like server- and database-level write locks in earlier Mongo versions, which speaks to the practical difficulty of even the single-document case)… so it’s not a meaningless claim, particularly in databases with tunable consistency like Mongo.

In the context of CAP, and particularly “P” (network partitioning), it’s non-trivial to do this even for a single entity. With systems that are CP (and I’m intentionally using the simpler Brewer-theorem classifications vs. the more modern PACELC), you can still use techniques like row locks because you are talking about sharding, where data is partitioned but not replicated. For example, you issue a write to a coordinator, the coordinator determines the node that will contain the data, and you can lock the row/table on that node.
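A toy Python sketch of that CP-style routing (node names invented; real systems use consistent hashing with virtual nodes, rebalancing, etc.): because the data is partitioned rather than replicated, every key has exactly one owning node, and a lock on that one node suffices to serialize writes to the key.

```python
import hashlib

# Hypothetical cluster membership for illustration.
NODES = ["node-a", "node-b", "node-c"]

def owner(key):
    # Deterministic hash routing: every coordinator computes the same
    # owner for a given key, so there is exactly one place to lock.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# The same key always routes to the same node, so a single-node
# row lock there is enough to serialize writes to that key.
print(owner("user:1") == owner("user:1"))  # True
```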

It gets tougher on AP systems, i.e., ones where the data is partitioned across many independent nodes and also replicated. If you have N nodes that will each hold a copy of the write, you have to worry about conflicting writes arriving at multiple nodes out of order, about nodes being unavailable (crashed) and how to make them consistent over time, and about consensus on writes (which is where you get into Paxos territory).
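A toy illustration of the out-of-order problem: one common resolution strategy is last-write-wins by timestamp, under which replicas converge no matter what order the writes arrive in. This is a sketch of the general technique, not how any particular product implements it, and it sidesteps the genuinely hard parts (clock skew, lost writes, consensus):

```python
# Each write carries a timestamp; a replica keeps the value with the
# highest timestamp it has seen for a key (last-write-wins).
def apply_write(store, key, value, ts):
    current = store.get(key)
    if current is None or ts > current[1]:
        store[key] = (value, ts)

replica_a, replica_b = {}, {}
writes = [("user:1", "alice", 1), ("user:1", "alicia", 2)]

for key, value, ts in writes:            # replica A sees writes in order
    apply_write(replica_a, key, value, ts)
for key, value, ts in reversed(writes):  # replica B sees them out of order
    apply_write(replica_b, key, value, ts)

print(replica_a == replica_b)  # True: both converge to ("alicia", 2)
```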

Mongo and several other “NoSQL” vendors have configuration that lets the administrator (and, to some degree, the consumer: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlsh…) tune these consistency trade-offs, so they (and anyone claiming API compatibility) will have to deal with all of those problems. Good times.

To summarize: in the NoSQL world, multi-document ACID is closer to an analog of MS DTC or other distributed transaction coordinators. Single-document ACID alone is nontrivial (but I am sure that DocumentDB satisfies it).

Extremely likely, in my opinion.

I would say quite the opposite.

On the one hand, if they simply take a copy of the Mongo 3.6 code, it starts off having the Mongo 3.6 API with no issues. This means that the only changes they need to make are the performance improvements they want. Depending on how long ago they started this, it could be quite a modest development investment.

Whereas starting from a MySQL clone means not only developing the whole API, but also figuring out how to make a relational database behave like a document database. Most relational databases these days provide CLOB and BLOB datatypes, which give you a place to store a document, but there are no structures for accessing the internals of those fields beyond a word index on a CLOB, and that is a LONG WAY from providing the kind of access needed in a true document database. A few relational databases have JSON datatypes, which gets one closer, but still not to performance equivalent to a document database. Moreover, you have to map the Mongo 3.6 API into SQL! This would be a huge development effort, almost certainly greater than starting with any other open-source document database and building it up to be as good as Mongo. AND, you would still be lacking the API for ACID in the Mongo 3.6 API.

This seems like the least likely alternative. The only thing that really makes sense to keep it from turning into a monster development task is to start with the Mongo 3.6 code.
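To give a flavor of what “map the Mongo API into SQL” entails, here is a toy, heavily simplified Python sketch (function and field names are hypothetical) that translates only a flat, equality-only Mongo-style filter into a parameterized SQL query. Real filters allow nested paths and operators like $gt, $in, and $regex, which is where the mapping effort explodes:

```python
def filter_to_sql(collection, query):
    """Translate a flat, equality-only Mongo-style filter dict into SQL.

    Hypothetical sketch only: it handles none of the nesting or
    operators ($gt, $in, $regex, ...) a real translation layer needs.
    """
    if not query:
        return f"SELECT * FROM {collection}", []
    clauses = " AND ".join(f"{field} = ?" for field in query)
    return f"SELECT * FROM {collection} WHERE {clauses}", list(query.values())

sql, params = filter_to_sql("shows", {"title": "The Golden Girls", "year": 1985})
print(sql)     # SELECT * FROM shows WHERE title = ? AND year = ?
print(params)  # ['The Golden Girls', 1985]
```

Even this trivial case assumes the document fields line up with relational columns, which the schemaless documents discussed above generally don’t.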

They could, however, be using either Postgres or MySQL for the backend.

No, see prior post.

but I am sure that DocumentDB satisfies it

The earlier discussion made it sound like indexes were not included in the ACID transaction for DynamoDB, so I would think this depends on whether 3.6 Mongo has it.

I have done some more poking around and while I haven’t found really definitive descriptions, it seems clear that Amazon is claiming to have written a new document database from scratch, but provided it with an API compatible with the 3.6 API for Mongo. This apparently took them 2 years … during which, of course, Mongo has added significantly to their product. This seems very peculiar to me because it represents quite a lot of development with an out of date target. Yes, the target was current technology at the time, but it doesn’t seem very agile not to keep up with additions to the target as development goes along.

So, not MySQL clone or some other RDBMS. Not actual 3.6 MongoDB. But something written from scratch to look like 3.6 MongoDB, a 2 year old version.

The supposed advantage of allowing near instantaneous conversion from an existing MongoDB installation seems a bit dubious because it would only apply to someone running on a two year old version of MongoDB and who had no reason to consider upgrading to the current version. Someone who is already running on 4.0 is hardly going to be interested.

As such, this hardly seems well suited for large customers, but rather is an easy and initially cheap option for low-level users who don’t need advanced features and who are willing to commit to AWS.

25 Likes

The supposed advantage of allowing near instantaneous conversion from an existing MongoDB installation seems a bit dubious because it would only apply to someone running on a two year old version of MongoDB and who had no reason to consider upgrading to the current version. Someone who is already running on 4.0 is hardly going to be interested. As such, this hardly seems well suited for large customers, but rather is an easy and initially cheap option for low-level users who don’t need advanced features and who are willing to commit to AWS.

Hi tamhas, that certainly makes sense to me.
Saul

7 Likes

I will admit that a lot depends on the current MongoDB community and what they are actually doing with their DBs. If a company is just starting on MongoDB, they are likely to want the latest version, whether or not they have an explicit, identified need for one or more features in that release. This is true whether or not they are also purchasing an application, since an application currently for sale probably runs on the latest version or it isn’t very saleable.

In some DB-using communities where companies have the DB primarily to run a purchased application, it is not uncommon for them not to keep up with the latest version. But this is most common in environments where the company is heavily modifying the application, thus creating a de facto fork from the vendor’s own development of the application. These companies can get stuck on old versions of the DB because they are on old versions of the application.

I am skeptical that this occurs much in the MongoDB world, though, both because of the atmosphere of wanting to keep up with the latest technology and because I suspect that only a small number of companies are using purchased applications vs. in-house developed ones.

3 Likes

Bruce
Long AMZN
Short MDB sold Puts

Bruce - unless I’m misunderstanding something, if you sold puts, you are long MDB, not short. (Price goes down, you get shares put to you at higher than market price.)

2 Likes

The misunderstood history of the Betamax vs. VHS format war is completely analog (pun intended) to the technical debate happening in this thread, and it is the reason I bought more MDB after the drop on this news.

Betamax was technically superior according to the people designing and promoting the format, but that view doesn’t take into account actual usage and consumption. Price has been mentioned as VHS’s only advantage, but that is the misunderstanding. VHS won because it better fulfilled the wants of consumers: enough recording time to hold a movie. Beta was designed to record a standard one-hour TV program; VHS could hold a complete movie at about 90 minutes. Of course, price was a major contributor during mass adoption, but the tipping of the scales began because VHS better fulfilled consumer wants.

In this thread we have a lot of technical jargon I don’t understand. I don’t even need to know the meaning of the words in the technical debate to know that price is not the differentiation between MDB and AMZN’s version of it. If price was so important, MDB Atlas would never have been viable. They were making money when the code was completely open source…free!

These AMZN announcements mention MDB compatibility, not vice versa. MDB is in the driver seat here.

13 Likes