MDB and PVTL

Hey Rizz, I thought I would create a new thread since the first one has gotten beyond my ability to keep track of it. You were riffing off the discussion we had on PVTL to look at MDB.

Wulf, we’ve had some lively debate on MongoDB in the recent past. What is your take on MDB? How pervasive is the database? Is the premium version worth what MDB charges? What line does one cross to go from the free database to the subscription? Do you use other DBs or mostly Mongo?

As you may know, I’m very positive on MDB - it is the largest position in my portfolio. I think it has totally won the NoSQL db wars. That doesn’t mean it will be alone - the point of NoSQL is that there are a lot of different db models now, not just SQL, and you can use whichever is appropriate for your use case. But Mongo actually directly supports multiple different models, and it is also the default NoSQL db to use unless you have a niche use case.

First, the bull case: MDB has won the standard/generic NoSQL database category. From an adoption point of view, across the user sets that matter, it is way above everything else and probably can’t be caught in this generation.

It is the top NoSQL database in every tech stack where it is available - Java, NodeJS, Ruby, MEAN, Spring, Meteor, C++, and all the JavaScript-focused stacks.

Mongo is the top database used in startups, and it is also getting enormous growth in the enterprise space - you can go up and down from the smallest company to the largest, and it has interest and growth at all levels.

As you may know, my bank has set a target that 40% of all new data stores should be built in Mongo. The amount of buy-in required to set a target like this at a huge global bank is amazing. Now there is lots of pushback from the data team, which is deeply bought into Oracle. Getting approval from them for a Mongo data design sometimes takes months, and it is always suggested that if we did it in Oracle, they could do it in days. We’re going to miss the target this year. But this has been noticed at high levels, and heads are rolling in the data team (or bonuses are disappearing, which amounts to the same thing). They’ll have to change their policies. We are just going to see increasing adoption as the blockers get removed.

If you want to think about how Mongo is moving into enterprise companies, you can use a pattern called Master Data, which divides data stores into transactional, enterprise, and data warehouse or data lake.

Enterprise data is all the information about your company, its users, products, processes, etc. This area has exploded with the internet. It is all the information you want to put on your website and applications. Mongo is perfect for this, and this is the first area that Mongo penetrates in enterprise companies.
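
To make "perfect for this" concrete, here is a minimal sketch of the document model using Python's pymongo driver. The database, collection, and field names are hypothetical, purely for illustration:

```python
# A minimal sketch of the document model, using the pymongo driver.
# Database, collection, and field names here are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["enterprise"]["products"]

# A product and everything the website needs to show about it live in
# one nested document - no joins across normalized tables.
products.insert_one({
    "name": "Travel Rewards Card",
    "category": "credit-cards",
    "features": ["no foreign fees", "2x points on travel"],
    "rates": {"apr": 19.99, "annual_fee": 95},
})

# Any field is queryable, and adding a new field later needs no
# schema migration.
card = products.find_one({"category": "credit-cards"})
print(card["name"])
```

The appeal for this kind of data is exactly that: when the product team wants to show something new on the website, you add a field, not a schema redesign.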

Data Warehouse is where you put all your data so you can do analysis, machine learning, and AI on it. There is a lot of work going on in new technologies here, but Mongo is extremely common as the middle-tier solution. Actual analysis usually occurs on different systems - the top name right now is Hadoop (which, coincidentally, Pivotal, which is also a Mongo partner, has enormous expertise in).
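
As a rough illustration of what Mongo does in that middle tier, here is a hedged pymongo sketch of pre-summarizing raw events with the aggregation pipeline before they are handed off to the analysis systems. All collection and field names are made up:

```python
# A hedged sketch of Mongo as the middle tier: pre-summarizing raw
# events with the aggregation pipeline before export to the warehouse.
# Collection and field names are made up for illustration.
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["analytics"]["events"]

daily = events.aggregate([
    {"$match": {"type": "purchase"}},              # keep only purchases
    {"$group": {                                   # roll up per calendar day
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$ts"}},
        "revenue": {"$sum": "$amount"},
        "orders": {"$sum": 1},
    }},
    {"$sort": {"_id": 1}},                         # oldest day first
])
for row in daily:
    print(row["_id"], row["revenue"], row["orders"])
```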

Transactional data is the last stronghold of SQL, and it will take a long time before this moves to NoSQL, because SQL has built-in defences against broken transactions - like if you transfer money from checking to savings, and the computer dies after you debit checking but before you credit savings. SQL can guarantee that the transaction either all goes through or all fails - no partial failures. Even in this area, Mongo 4.0 will have ACID transactions, which means it will be just as good as SQL. But knowing the industry, it may take a while before people believe that and trust Mongo here.
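
For the checking-to-savings example above, a multi-document transaction would look roughly like this - a sketch assuming Mongo 4.0 with pymongo 3.7+ against a replica set, with hypothetical account ids:

```python
# A sketch of the checking-to-savings transfer using the multi-document
# ACID transactions arriving with Mongo 4.0 (pymongo 3.7+; the server
# must be a replica set). Account ids are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client["bank"]["accounts"]

with client.start_session() as session:
    with session.start_transaction():
        # Both updates commit together or not at all. If the process
        # dies between the two, the server rolls the whole thing back -
        # no debited-but-never-credited state.
        accounts.update_one({"_id": "checking"},
                            {"$inc": {"balance": -100}}, session=session)
        accounts.update_one({"_id": "savings"},
                            {"$inc": {"balance": 100}}, session=session)
```

Either both balance updates commit or neither does - which is exactly the guarantee SQL databases have always sold.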

So Mongo is the NoSQL database of choice, and we are at the beginning of the 20-year transition from SQL everywhere to NoSQL everywhere. And while Mongo takes market share from SQL, the overall data market is growing like crazy as well. I expect Mongo to get a lot bigger than Oracle in 20 years, unless it shoots itself in the foot.

Now the cautions:

  1. MongoDB is open source and anyone can download the latest community version for free. Mongo makes money either from support or from hosting via Mongo Atlas. Now the thing is, no developer wants to host databases themselves these days - it is too hard, and too scary - it’s a good way to lose your job if you screw it up (and not just the developer’s job - all the way up the chain). So every enterprise is going to either go Atlas if they trust the public cloud with their data, or get Mongo support if they are going to spin up their own Mongo hosting group in their own data centre. So Mongo the company should not have a problem monetizing the use of Mongo the database - unless they screw up Atlas and someone else wins in hosting Mongo. They have so many built-in advantages that they would have to screw up very badly to lose here.

  2. Mongo needs to focus on selling to enterprises. They started by selling to developers and startups, and their marketing, demos, expertise, etc. are all still much better in this area. They are still transitioning away from treating enterprise platforms such as Java as a side issue - they provide much more support for the niche dev stacks popular early in startups (which often get swapped for more enterprise stacks as the startups grow). That said, I have seen a lot of movement in this direction since they went public (and presumably spent some of their money to bulk up their enterprise sales force).

Overall, Mongo has hypergrowth as far as the eye can see, with no real visible dangers. They can only screw it up themselves.

62 Likes

SteppenWulf, fuma posted a link to the Fortnite blog seemingly discussing a failure of their MongoDB setup. It was over at the NPI and I’m not sure if it was posted here. Forgive me if I missed the discussion. The blog gets too technical for me very quickly. Is this something that indicates any issues with MongoDB, or just a service that got too popular too quickly?

https://www.epicgames.com/fortnite/en-US/news/postmortem-of-…

https://www.epicgames.com/fortnite/en-US/news/postmortem-of-…

2 Likes

@IRdoc - really interesting blogs from Fortnite. Usually companies don’t release the details of their issues.

A couple of things I noticed right off. First of all, they are using version 3.2 of Mongo. The current version is 3.6, and there are enormous performance improvements in 3.4 and 3.6.

They had big issues in scaling across all sorts of different areas, driven by the hypergrowth of their user numbers. Mongo cache failures were a key issue, but so were connection pooling and Memcached, which is a distributed cache solution. They are also finding bottlenecks in authentication. What is surprising here is not the Mongo failures - writing, sharding, authentication, and connection establishment are all areas that will become bottlenecks when the system is stressed. What is surprising is the cache failures and especially the Memcached failures - this is an indication that their entire system design was not built to support the scale they are seeing.
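
For a sense of what tuning connection establishment looks like from the application side, here is a hedged sketch of pymongo's connection pool options. The numbers are purely illustrative, not anything from the postmortem:

```python
# A hedged sketch of driver-side pool tuning with pymongo. The numbers
# are illustrative only - the right values depend on your workload.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://localhost:27017",
    maxPoolSize=200,         # cap concurrent sockets per server
    minPoolSize=20,          # keep warm connections ready for spikes
    waitQueueTimeoutMS=500,  # fail fast instead of queueing forever
    socketTimeoutMS=5000,    # don't let one slow op pin a connection
)
db = client["game"]
```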

Certainly part of their problem was in their own code - they were using Mongo for ephemeral data, which should be handled by an in-memory store. It looks like they also screwed up their sharding. That is very important to get right, or you can significantly hurt your db performance.
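
Both of those mistakes have standard mitigations. A hedged sketch, with hypothetical collection and key names: a TTL index so ephemeral session data expires server-side instead of accumulating, and a hashed shard key so writes spread evenly across shards instead of piling onto one:

```python
# A hedged sketch of the standard mitigations (names hypothetical):
# a TTL index so ephemeral session data expires server-side, and a
# hashed shard key so writes spread evenly across shards.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # a mongos, for sharding
sessions = client["game"]["sessions"]

# Documents are removed automatically roughly an hour after `last_seen`.
sessions.create_index([("last_seen", ASCENDING)], expireAfterSeconds=3600)

# Enable sharding on the database, then shard the collection on a
# hashed player id so one hot player can't overload a single shard.
client.admin.command("enableSharding", "game")
client.admin.command("shardCollection", "game.sessions",
                     key={"player_id": "hashed"})
```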

One good thing is that Mongo responded immediately to their requirements (and hopefully charged them for it also):

We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends

Overall, I don’t see any red flags around Mongo scaling yet. It looks like a group of developers who developed a good solution for reasonable scale, and then got overwhelmed.

It’s been an amazing and exhilarating experience to grow Fortnite from our previous peak of 60K concurrent players to 3.4M in just a few months, making it perhaps the biggest PC/console game in the world!

At the scale that they are at, any data solution would need customized tuning, and they would have to architect their data design, caching, and sharding for the usage they are seeing. Mongo can do all that, and also has an in-memory database that they don’t seem to be using to handle their user accounts.

This would only be a red flag if they find they can’t succeed, and decide to change databases. Thanks for bringing this to my attention. I might get a Mongo expert on my team to look at this, and maybe chat with the Fortnite guys to see how things got settled out.

25 Likes

Wulf,

Thanks for the “man at the front” report on MDB. Very helpful, and your undercurrent of excitement about their product is palpable. I like that.

Unfortunately, I have little understanding of the evolution of databases, NoSQL or SQL, and the business time course of new database buildout, as well as upgrade cycles and replacement of older databases. As a man in the field, could I beseech you for some perspective on how businesses might come to use MDB, or potentially another “Database as a Service” entity, in the current transition to the cloud?

Jack

Unfortunately, I have little understanding of the evolution of databases, NoSQL or SQL, and the business time course of new database buildout, as well as upgrade cycles and replacement of older databases. As a man in the field, could I beseech you for some perspective on how businesses might come to use MDB, or potentially another “Database as a Service” entity, in the current transition to the cloud?

This is a very convoluted topic 🙂 In general, people struggle mightily to get software to do what they want. Once it actually works, they try to update it, but the teams that built it inevitably leave or die, and you end up with software that more or less works, but that you are scared to change or update because your team simply doesn’t have the knowledge or skills. Then you build something in front of it or around it that you can change, that will give you the new functionality your product team wants. Inevitably that gets calcified too, and you end up throwing the whole rotting mess out and starting fresh.

In the late 80s, SQL was established as the best data store for transactional systems. So where do most banks store their transactional data today - in 30+ year old SQL technology?

You guessed it - in mainframe data stores mostly programmed in Cobol, which was the data language of the 60s. Pretty much the whole world’s financial data, in the end, is stored in technology created 60 years ago, using code that absolutely no one understands any more (there are still Cobol programmers, slowly dying out, but no one understands the code written 60 years ago - it’s hard enough understanding last year’s code).

Now, every once in a while a bank gives up on its old system because it has just gotten too flaky, and reimplements the whole thing in new technology. I was part of that a few times, and one of the most enjoyable parts of it was reimplementing bugs from the old systems. The new system has to do exactly what the old system does, and if the old one had bugs (as they all do), you have to reimplement those bugs!

So, coming back to your question - what is the implementation/replacement path of MDB or other NoSQL?

  1. New systems will be implemented in the latest trusted technology. NoSQL is just getting that label. Over the next 5-10 years it will become clear even to the die-hards that SQL, which was developed 30 years ago, no longer keeps up with modern needs. There will still be lots of SQL advocates - there are lots of Cobol advocates today - but they will be like flat-Earthers.

  2. New functionality on existing systems will often be implemented as workarounds in new technology. Not because the old technology couldn’t do it, but because people are afraid to touch the code.

  3. Old SQL databases will die very rarely - they will be here long after I’m gone. Heck, old Cobol data stores will still be around after I’m gone.

  4. Data as a service will eat self-hosted data the same way all infrastructure is being eaten by the cloud.

22 Likes

While mostly agreeing with the picture SteppenWulf paints, I would shade a few things slightly differently.

The COBOL issue is indeed an incredible one. I don’t know that this is still true, since the info was from a few years ago … but not that many … but apparently not only were there more lines of COBOL in use than any other language, there were still more new lines of COBOL being written every year than anything else as people attempted to maintain these ancient systems. Of course, COBOL has some “advantage” here, since it is a very wordy language. There have been a number of attempts to create tools to automatically convert COBOL to something more modern, but they have all failed, producing horrid code which needed a lot of manual effort just to make work.

And, yes, while there are cases where the new system is required to replicate what was essentially a bug in the old system, this is certainly not always the case. There is a good story, of which I unfortunately don’t remember the details, of implementing Corticon, a rules engine, I think for some very large British entity like British Rail. When implementing a rules engine, one defines the rules, tests them for consistency, and then compares the output to the output from the old system. They were surprised to find a substantial discrepancy, amounting to millions of pounds, but it turned out to be an error in the old system which had gone undetected for 20 years. In this case, they followed the “rules” and corrected the problem.

New systems are not always implemented in the newest technology. Often a shop will have substantial expertise in the old technology and limited expertise in the new, and that will cause them to implement a new system in old technology. I think this is beginning to experience a significant shake-up, however, because the move to web-based and/or device-based deployment means one has to use new technology … no COBOL compiler for your iPhone!

I also might note that while SQL is by far the dominant paradigm for relational databases, it is not the only one. Progress offers a 4GL alternative which is highly productive and has been used for thousands of applications with millions of users.

4 Likes

Now there is lots of pushback from the data team, which is deeply bought into Oracle. Getting approval from them for a Mongo data design sometimes takes months, and it is always suggested that if we did it in Oracle, they could do it in days. We’re going to miss the target this year. But this has been noticed at high levels, and heads are rolling in the data team (or bonuses are disappearing, which amounts to the same thing). They’ll have to change their policies. We are just going to see increasing adoption as the blockers get removed.

Great post, SteppenWulf.

I think that is likely to be the pattern in most enterprises when it comes to anything even slightly disruptive. Most people dislike change, especially if it devalues the skills that got them the job in the first place. So the above applies to several companies we follow.

Which is why some companies like Arista have seen way faster growth in the cloud, where new ideas are brought in at the beginning. This is TALC stuff: mass conversion only happens when people see a cascade of peers switching, see that they are now on the way to being in the minority, and begin to fear for their jobs. Then they pile on in a rush to change. The mass market is suddenly there, the upslope of the lazy S curve.
As investors, we hope to be invested before that happens.

Overall, Mongo has hypergrowth as far as the eye can see, with no real visible dangers. They can only screw it up themselves.

Sounds good to me…

4 Likes

@tamhas - I agree with pretty much all you said - I was attempting to describe the big picture, but certainly there are corner cases. My brother-in-law is a nephrologist, and he had a custom physician’s app written for him in assembly language! I can barely comprehend the amount of work required to do this, or why anyone would want to. But it is what the programmer knew, and my brother-in-law didn’t know any better. Now he can’t find anyone, for love or money, to change a single line of it.

Concerning 4GLs, I’m not hot on them. They make simple things easy, but hard things impossible. They had a bit of success 20 years ago, then flamed out - people go to those things at first because they solve some simple problem, then find out they can’t maintain them or do anything even slightly hard. I see things like Appian today, and it looks similar - I wouldn’t touch Appian, even though they have been getting some traction in the financial industry. I haven’t looked into Appian deeply - I may be doing it (and my portfolio) a disservice.

@mauser - thanks, and I agree it is typical for disruptive technologies to be resisted as long as possible by people whose skills are in the old technology

6 Likes

Concerning 4GLs, I’m not hot on them.

You just didn’t meet the right 4GL! 🙂

Back in the late 80s, when there was a surge in their popularity, over and over again I would see descriptions of 4GL implementations in which the 4GL was used for 80-90% of the application and the balance had to be written in C. The 4GL part might be written 5X faster, but it was also the part of the application which was easy to write; had one written the whole thing in C, that part might only have taken 30-40% of the writing time, while the part that took 60-70% of the time was still written in C. It doesn’t take very fancy math to figure out that the overall productivity gain from the 4GL was minor.
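
To put illustrative numbers on it (made up, but consistent with the percentages above): say the all-C version takes 100 days, 35 of them on the easy part and 65 on the hard part. Writing the easy part in the 4GL at 5X speed cuts those 35 days to 7, so the whole project takes 72 days instead of 100 - an overall gain of about 1.4X, nowhere near the headline 5X.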

With one exception … the Progress 4GL, now known as ABL (Advanced Business Language). With it one could, with very rare exceptions, write 100% of the application in the 4GL at 5-10X productivity gains. These days, of course, web and device deployment tends to mean that the UI is written in something else (although Progress’ product line includes Telerik and some other tools for that part).

1 Like