Back to AI For a Second With MDB

We can’t forget about the massive opportunity in AI that Mongo is especially well built for. Even though we don’t talk about NVDA much these days, there’s still a massive market out there.

Many of us had a lot of success with NVDA, riding the S-curve train (maybe a stop or two too long, at least for me). All that infrastructure is being put to work. The world needs Mongo for the next phase. The software that powers this next phase will be the next big AI boom, IMO.

Perhaps they started working with the open-source version, but not for long.

This type of approach was limiting the velocity with which Continental could innovate and so they moved from SQL-based tools to MongoDB to build their deep learning framework. Originally Continental only planned to use MongoDB to store and label image data, such as scenes from the road. But the team quickly found that they can use the same database for the analytical image data, the derived metadata and the results of their experiments, significantly increasing productivity.

With MongoDB’s flexibility and parallelism, developers can build new models more rapidly, work together without sacrificing speed and accuracy, and quickly build and test new prototypes for the autonomous SensePlanAct framework. “In the end we were able to tame this deep learning beast with this flexible database”, says Martin Berchtold-Buschle, who is the subject matter expert for Big Data Infrastructure & Deep Learning at Continental.

Looking into the future, Continental is moving toward the cloud and plans to adopt MongoDB Atlas, the fully automated database as-a-service. The team believes that MongoDB will be a crucial component in helping them achieve Vision Zero which is being adopted as a new standard of safety across many cities and governments around the world.

https://www.mongodb.com/blog/post/mongodb-helps-bring-new-er…
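To make that concrete, here is a minimal sketch of what “the same database for the image data, the derived metadata and the experiment results” could look like, using pymongo. The collection and field names are hypothetical, not taken from Continental’s actual system:

```python
# Minimal sketch (pymongo; collections and fields are invented for illustration).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["deep_learning"]

# Raw scene image reference plus human labels in one document.
db.scenes.insert_one({
    "scene_id": "scene-000123",
    "image_uri": "s3://bucket/road/scene-000123.png",  # hypothetical storage location
    "labels": [
        {"object": "pedestrian", "bbox": [412, 180, 60, 140]},
        {"object": "stop_sign", "bbox": [90, 45, 32, 32]},
    ],
})

# Derived metadata and experiment results can live in the same database,
# with whatever fields each model iteration happens to produce.
db.experiments.insert_one({
    "scene_id": "scene-000123",
    "model": "lane-detector-v7",
    "metrics": {"iou": 0.83, "latency_ms": 41},
})
```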

Darth

34 Likes

Can’t forget about the massive opportunity in AI that Mongo is especially well built for. Since we don’t talk about NVDA too much these days, there’s still a massive market out there.

If you think of Mongo and Nvidia as the AI twins, like the “Wintel” twins of the PC era, Mongo will be the bigger winner.

https://softwaretimes.com/pics/intc-msft-08-28-2019.gif

Going back to TAM and the “S” curve, there is practically no limit to data, unlike hardware where, despite the billions of transistors on a chip, there are still limitations, especially the cost of manufacture. In bits vs. atoms, bits win hands down.

Another thing, unlike security, databases have strong moats.

Denny Schlesinger

22 Likes

great points…

though I don’t quite understand how AI benefits MDB.
For NVDA, it’s simple: the whole world’s AI training is done using NVDA chips… yeah, some XLNX and INTC share, but NVDA is the king.

For MDB, yes, they are the king of unstructured data, and more data = more MDB revenue… I get that point…

However, AI actually has an inverse effect on data growth… at least to my understanding, for data generated by the physical world… like sensors, etc…

AI essentially will throw out most of the unstructured, time-series data and really get you the “information,” which would be a single-digit % of the total unstructured data… so it’s good for the AI user, but I’m not sure how it benefits MDB as much.

Maybe I am missing something here… maybe it’s not the sensor data you are talking about…

BTW, I am a big MDB bull and it’s my 2nd-largest position at ~20% (after AYX at ~21%, which has grown to that size, whereas MDB is a size I have intentionally built)… so this is not to pan MDB by any means… just looking to understand it better.

7 Likes

though I don’t quite understand how AI benefits MDB

Here is a pretty good article summing up just a few key areas of AI, with this line in particular really driving the point home:

Put simply, artificial intelligence is the replication of human-like behaviors in software. But AI cannot become the revolutionary transformative force many predict it will be without machine learning algorithms, which work on finding patterns in datasets and act based on those patterns. It is machine learning algorithms that put the “intelligence” in artificial intelligence.

http://techgenix.com/ai-machine-learning-algorithms-ai/

And then this, which explains how NoSQL databases are used:

The leading NoSQL database, MongoDB, has come out ahead in the field for a few reasons. It’s the database component used in the MEAN software stack, it’s open-source, and it’s cross-platform compatible. It also has some impressive built-in features that make it an excellent choice for businesses that need fast, flexible access to their data, whether it’s to make real-time, on-the-fly decisions, or to create tailored, data-driven experiences for users.

https://www.upwork.com/hiring/data/should-you-use-mongodb-a-…

At least that is how I understand AI benefiting MDB.

Brandon

17 Likes

Just to keep it easy.

The explosion in data is already upon us. Everything around you, and everything you interact with, is a data point shipping little bits of data off to somewhere.

In the example I gave, it was specifically AI for autonomous cars. That car generates petabytes of data, and it won’t stop when it drives itself off the lot. That data will continue to be shipped, stored, and processed. The algorithms will never be “good enough.” It’s a seemingly endless cycle.

In 2012 people generated 2.8 trillion GB of data worldwide, or enough to write 10 million Blu-Ray discs. By 2030, that figure is expected to multiply nearly forty times. The rapid expansion of the so-called Internet of Things, or IoT, is the spark behind this explosion of user data.

In 2010 there were 12.5 billion internet-connected devices in the world. By 2020 there will be 50 billion, incorporating any and all devices that can connect to the internet – such as smart home appliances, smart phones and in the not too distant future, smart cars. But of all the data generated by the 12.5 billion internet-connected devices in 2010, only 0.5% of it was processed.

We’re talking trillions of gigabytes. That’s a lot of zeroes.

Amir Khosrowshahi, who joined Intel to head up its AI division after the chipmaker acquired his deep learning start up Nervana in 2016, recognizes that this onslaught of data requires “a staggering amount of compute” to extract value from it.

To that end, Khosrowshahi contends the surge in data will kickstart a “virtuous cycle” whereby companies like Intel are forced to develop better algorithms to process the data, which in turn will allow for even more data to be generated.

Typically, when you find something generating value (data, in this discussion), you don’t close up shop after mining it. And that’s what AI is doing with data: extracting value. Teaching a computer to drive, detecting anomalies in financial transactions, extracting genetic codes, finding better polymer bonds in materials, recommending what you want to watch next, making better decisions about what to present on your e-commerce page, applying agricultural products to minimize chemical use and maximize yield, and so forth.

Success only leads to more breakthroughs and the hunger for more and more data from more and more sources.

Processing the data requires the right computing infrastructure with the right software. The first post in the thread showed why Mongo is especially well built for this role on the database side. But there are innumerable ways of extracting value from the data boom with software, and they make up the majority of most of the portfolios on this board.

Lest we forget.

How to secure that data is another concern.

https://fortune.com/2018/11/28/the-explosion-of-consumer-dat…

Darth

22 Likes

though I don’t quite understand how AI benefits MDB.

Maybe you missed the upstream link: Mongo taking market share from SQL.

https://www.mongodb.com/blog/post/mongodb-helps-bring-new-er…

Denny Schlesinger

5 Likes

Denny,

I think this is the key point from the blog post:

Restricting the data to a rigid tabular schema, like those used in traditional relational databases, isn’t practical because data scientists do not know up front how each data element will be used in the next model iteration, and the one after that, and so on.

Essentially, in AI/ML workflows, data collection needs an unstructured way to store and retrieve data, so SQL databases will not suffice without putting a layer on top of them. MongoDB fits in nicely. In my view, this is a huge advantage. With the AI/ML wave in full swing in the coming years, this will greatly help MongoDB deployments.
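As a rough illustration of that schema flexibility (a sketch only, using pymongo; the collection and field names are invented):

```python
# Sketch of the "no up-front schema" point (pymongo; names are hypothetical).
from pymongo import MongoClient

col = MongoClient("mongodb://localhost:27017")["ml"]["training_samples"]

# First model iteration only needs the raw reading.
col.insert_one({"sensor_id": "imu-7", "reading": [0.02, -0.91, 9.78]})

# A later iteration decides it also wants weather and a derived feature.
# No ALTER TABLE, no migration -- the new fields simply appear on new documents.
col.insert_one({
    "sensor_id": "imu-7",
    "reading": [0.03, -0.88, 9.80],
    "weather": "rain",
    "jerk_estimate": 0.14,
})

# Old and new documents can still be queried together.
for doc in col.find({"sensor_id": "imu-7"}):
    print(doc.get("weather", "n/a"))
```

The point is simply that new fields can appear on new documents without a migration, which is what “not knowing up front how each data element will be used” requires.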

Thanks for posting.
Chandra

6 Likes

However, AI actually has an inverse effect on data growth… at least to my understanding, for data generated by the physical world… like sensors, etc…

AI essentially will throw out most of the unstructured, time-series data and really get you the “information,” which would be a single-digit % of the total unstructured data… so it’s good for the AI user, but I’m not sure how it benefits MDB as much.


How might one determine which data is worth keeping and which should be discarded? First, the entire dataset must be logged, cleaned, and stored. Then the standard statistical tools (or ‘machine learning’ if you’re reading a company’s marketing materials) must be employed to determine which data is useful. Then what? Throw out the data that isn’t used in that particular model? Probably not. What if a new problem comes along and the ‘useless’ data is suddenly useful, or at least worthy of consideration? Standard practice does not include deleting raw data. Instead, a multi-tiered data structure is typically used (level 0, level 1, level 2, …) where level 0 is the raw (e.g., sensor) data and the model input is something like level 2 or 3 or 4. And that doesn’t even touch the multi-format and unstructured data that Mongo excels at. I’d be shocked if an AI world is one with less data storage than a non-AI world, but anything is possible.
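A rough sketch of that tiering, assuming pymongo and invented collection and field names:

```python
# Multi-tier storage sketch (pymongo; everything here is illustrative only).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["telemetry"]

# Level 0: raw sensor readings are kept, not deleted.
db.level0_raw.insert_one({
    "vehicle_id": "veh-42",
    "ts": "2019-08-30T12:00:00Z",
    "lidar_frame_uri": "s3://bucket/frames/000981.bin",  # hypothetical
    "speed_mps": 13.4,
})

# Level 2: the distilled "information" a model actually consumes,
# derived from (but not replacing) the raw tier.
db.level2_features.insert_one({
    "vehicle_id": "veh-42",
    "window": "2019-08-30T12:00:00Z/2019-08-30T12:01:00Z",
    "hard_braking_events": 1,
    "mean_speed_mps": 12.9,
})
```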

1 Like

Thanks for all the responses… I see some of these points; I will have to spend some time to learn more.

this onslaught of data requires “a staggering amount of compute” to extract value from it

A new start-up company, Cerebras, has built the world’s largest computer chip, the size of an iPad. According to ARK’s James Wang, “the single chip contains 1.2 trillion transistors and is 57x more complex than Nvidia’s flagship V100 GPU.” CEO Andrew Feldman believes that with this humongous chip, AI will be reinvented, but nobody knows how well it will work in practice. Watch for performance benchmarks of this giant chip in November. If it works, it could impact Nvidia’s data center business and possibly other chipmakers.

As I understand it, today’s microprocessors are constrained by the manufacturing process, but larger computer chips have higher defect rates. Cerebras states that they tackle this problem with redundancy. For a more detailed discussion, pictures, and graphs, see James Wang’s tweetstorm: https://twitter.com/jwangARK/status/1163928272134168581?utm.

Other good sources are:

  1. Cerebras white paper, Wafer Scale Engine, an introduction (PDF), https://www.cerebras.net/wp-content/uploads/2019/08/Cerebras…

  2. TechCrunch, https://techcrunch.com/2019/08/19/the-five-technical-challen…

  3. Fortune, https://fortune.com/2019/08/19/ai-artificial-intelligence-ce….

im

4 Likes

I think this is the key point from the blog post:

Restricting the data to a rigid tabular schema, like those used in traditional relational databases, isn’t practical because data scientists do not know up front how each data element will be used in the next model iteration, and the one after that, and so on.

Yes, it is. In the spirit of Saul’s investing style, I didn’t want to get into technical details, just point out how similar situations in the past have worked out. But since you bring it up, one has to remember the reason for SQL: back then, storage was expensive and limited, and flat files had a lot of duplication, which was not just wasteful but hard to update (every instance of duplicate data had to be updated). That was the problem SQL was designed to solve, and it did it very well, which is why it was so successful for so long.

SQL history

The SQL programming language was first developed in the 1970s by IBM researchers Raymond Boyce and Donald Chamberlin. The programming language, known then as SEQUEL, was created following the publishing of Edgar Frank Codd’s paper, “A Relational Model of Data for Large Shared Data Banks,” in 1970.

https://www.businessnewsdaily.com/5804-what-is-sql.html

Most of today’s IT people hadn’t been born yet! Today’s data is a totally different animal with totally different challenges. Programming languages have changed from procedural to object-oriented. Document storage is both a 40-year leap forward and a 40-year leap backward, but its core virtue is that it solves today’s computing needs.
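Just to illustrate that trade-off (the names below are invented, not anyone’s actual schema): in a normalized relational design, the customer’s address is stored exactly once and every order row carries only a foreign key, while the document style happily embeds it in each order because storage is cheap and flexibility matters more.

```python
# Illustrative contrast only (pymongo; invented names).
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# Relational style (conceptually): customers, orders, and line items live in
# separate normalized tables, so the address exists once and orders reference
# it via customer_id -- the duplication problem SQL was designed to solve.

# Document style: the order is one self-contained document. The address is
# duplicated across orders (the old flat-file "sin"), but reads are simple and
# the document's shape can change freely from one insert to the next.
orders.insert_one({
    "order_id": "ord-1001",
    "customer": {"name": "Ada", "address": "12 Main St"},
    "items": [
        {"sku": "widget-a", "qty": 2, "price": 19.99},
        {"sku": "widget-b", "qty": 1, "price": 9.99},
    ],
})
```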

That does not mean SQL is dead. My portfolio app is entirely SQL-based, because that’s the best technology for transaction-type data.

Denny Schlesinger

4 Likes