Since I’m invested in some of the same stocks that many of you are on this board, I thought of sharing my Current Portfolio as of 02/28/2022 which looks like this.
Datadog (DDOG): 24%
Snowflake (SNOW): 18%
Zscaler (ZS): 15%
SentinelOne (S): 10%
MongoDB (MDB): 8%
Upstart (UPST): 4%
Affirm (AFRM): 3.5%
and some call options in DDOG, CRWD, S
In February, I sold out of Monday.com (MNDY) and ZoomInfo (ZI) (reasons covered in older posts).
I won’t rehash the financials as they have been mentioned multiple times, but here are the reasons why I hold these stocks at these allocations.
#1: Datadog (DDOG): (oldest stock; plus recent Application Security and how threats are moving to the application level, e.g. the recent Log4j vulnerability)
#2: SNOW ( SNOW + MDB link)
#3: S & CRWD (both are fighting each other hard and both are winning because the market opportunity is huge! Tailwinds + recent reason)
#4: UPST & AFRM (recent reason; disruptors in consumer lending & consumer financing using AI. And AI is for real. Just before buying my first shares in Upstart I wrote how their technology may be at a point of inflection, and I strongly feel I was right; the next couple of years will see that change take over. Read this in case you’ve missed it: ). Upstart has done very well for me so far and I see it doing incredibly well in 2022 and beyond!
Now looking to the future….
For the past couple of years my investments have been guided by data and its relentless growth. I tweeted some time back: follow the data and you’ll follow the money.
So, here’s a little bit about data and why I’m so excited to be invested in some incredible companies, or opportunities as I would call them.
Most of my day is spent living inside code and working on cutting-edge tech (on the software side), but here’s a way to define some complex terms so that it becomes hard for anyone to forget what they mean.
Data Lake: Just think of this as a “lake” and nothing more. What do you have in a lake? Probably anything: water, fish, plants, boats, people, swimmers, sunken stuff… anything. That’s what a data lake is; it stores structured and unstructured data.
Data Warehouse: Just think of this like a real warehouse. What do you have in a warehouse? Items stored in some order so that you have quick access to them. So a warehouse holds structured, filtered data that has already been processed for a specific purpose.
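To make the lake/warehouse contrast concrete, here’s a toy pure-Python sketch (the class names, schema, and records are all made up by me for illustration): a lake accepts any record as-is, while a warehouse validates against a fixed schema before storing.

```python
# Hypothetical sketch: a data lake accepts anything ("schema on read"),
# while a data warehouse enforces structure up front ("schema on write").

class DataLake:
    def __init__(self):
        self.blobs = []          # raw, unvalidated objects of any shape

    def put(self, obj):
        self.blobs.append(obj)   # store as-is: video bytes, JSON, anything


class DataWarehouse:
    def __init__(self, schema):
        self.schema = schema     # e.g. {"order_id": int, "amount": float}
        self.rows = []

    def insert(self, row):
        # Validate and structure the data *before* it is stored,
        # so later queries can assume clean, typed columns.
        for col, typ in self.schema.items():
            if not isinstance(row.get(col), typ):
                raise ValueError(f"bad or missing column: {col}")
        self.rows.append(row)


lake = DataLake()
lake.put(b"\x00raw video bytes")                 # fine: anything goes
lake.put({"tweet": "follow the data"})           # fine too

wh = DataWarehouse({"order_id": int, "amount": float})
wh.insert({"order_id": 1, "amount": 9.99})       # passes validation
try:
    wh.insert({"tweet": "follow the data"})      # rejected: wrong shape
except ValueError as e:
    print("warehouse rejected:", e)
```

That rejection at the end is the whole difference in one line: the lake happily holds both records, while the warehouse only ever contains data it has already cleaned.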
Everything is about the data: where it lives, how it is stored, how it is moved, and how it is secured.
What is Apache Spark?
It is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley’s AMPLab, the Spark codebase was later donated to the Apache Software Foundation.
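As a loose illustration of what “implicit data parallelism” means, here is plain Python standing in for Spark (this is my own toy analogy, not Spark itself): you describe a map step and a reduce step over partitioned data, and the engine is free to run the partitions in parallel without you managing threads or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Toy stand-in for Spark's model: partition the data, map over each
# partition (potentially in parallel), then reduce the partial results.
data = list(range(1, 101))                       # pretend this is huge
partitions = [data[i:i + 25] for i in range(0, len(data), 25)]

def map_partition(part):
    return sum(x * x for x in part)              # per-partition work

with ThreadPoolExecutor() as pool:               # "engine" handles parallelism
    partials = list(pool.map(map_partition, partitions))

total = reduce(lambda a, b: a + b, partials)     # combine partial results
print(total)                                     # 338350 (sum of squares 1..100)
```

In real Spark the partitions live across a cluster and a failed partition is recomputed automatically (the fault-tolerance half of the sentence above), but the programming model you write against looks just this simple.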
Going back to the release of Spark 3.0…
It added a lot of support for SQL and Python integration, and it was a brilliant move to unify SQL analytics for data warehousing and data science, as the future is headed in that direction!
Data warehouses were primarily built for BI (business intelligence) and reporting more than 40 years back. However, with the rise of big data, which contains a lot of video, audio, and text, ML and data science become really hard on these data sets. Streaming support is also limited, and that again is where the future is heading.
To overcome that, nowadays most of the data is moved to a data lake (essentially a blob store), and as needed some of that data is moved to data warehouses.
What is Delta Lake? “Delta Lake is an open-source project that enables building a Lakehouse Architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.”
So data lakes, with the ability to store all different kinds of data (video, audio, text, structured and unstructured), are great for ML and data science, but they are hard to set up, not BI-friendly, and performance can be a bottleneck.
And very quickly these data lakes can become data swamps.
What is a Data Lakehouse? It tries to bring the best of both worlds: a structured transactional layer that brings structure, quality, performance, and governance to data lakes and enables BI, data science, machine learning, and streaming analytics.
Data lakes are highly durable, very cheap, and infinitely scalable. With the ability to store every kind of data, many organizations started building them in the hope of doing BI, data science, machine learning, and streaming/real-time analytics on top. In reality most undertakings in this area have proved, or are proving, to be failures, and since customers don’t know how to fix them, there’s a need for a lot of expert help and services.
Most of the challenges for customers adopting data lakes are:
- Hard to append data: new data can lead to incorrect reads.
- Modifying existing data is difficult: for example, to meet GDPR and CCPA requirements.
- Half-baked data due to failing jobs: a Spark job can fail midway and leave partial writes behind.
- Real-time operations: inconsistency when mixing streaming and batch, for example appending data in a batch operation and trying to read it in real time.
- Historical data storage is costly.
- Managing large metadata: as the data grows into the petabytes, the metadata grows into the terabytes.
- Too many files in the data lake, making consumption hard.
- Performance: data is stored without any assumption about how it is going to be consumed, and this creates performance issues.
- The data keeps changing subtly, and that creates a lot of problems.
So Delta Lake was built to address all of the above issues. It brings the best of data warehouses and data lakes to a single place, solving those problems with ACID transactions, Spark, indexing, and schema validation.
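Here’s a highly simplified sketch of the transaction-log idea behind those ACID guarantees (this is my own toy model, not Delta Lake’s actual implementation or file format): writers stage data files first and only then append a commit entry to an ordered log, so readers either see a whole write or none of it.

```python
# Toy transaction log in the spirit of Delta Lake's _delta_log (an
# assumption-laden sketch, not the real format). Data files are staged
# first; a write becomes visible only when its commit entry is appended.

class ToyDeltaTable:
    def __init__(self):
        self.files = {}      # filename -> rows (the "blob store")
        self.log = []        # ordered commit entries: lists of filenames

    def write(self, filename, rows, fail_midway=False):
        self.files[filename] = rows          # stage the data file
        if fail_midway:
            return                           # job died: no commit appended
        self.log.append([filename])          # atomic commit: now visible

    def read(self):
        # Readers trust only the log, so half-written data is invisible.
        visible = [f for commit in self.log for f in commit]
        return [row for f in visible for row in self.files[f]]


t = ToyDeltaTable()
t.write("part-000.parquet", [1, 2, 3])
t.write("part-001.parquet", [4, 5], fail_midway=True)   # simulated crash
print(t.read())                                         # [1, 2, 3] only
```

Notice how the crashed job’s data file sits in storage but never shows up in a read; in the real system the commit is an atomic file creation in object storage, which is what lets batch and streaming writers append concurrently while readers always see a consistent snapshot.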
Delta Lake also enables both read and write operations from engines like Snowflake, Amazon Redshift, and Amazon Athena.
Databricks puts it this way: “Data Lakehouse: The best of both worlds in one platform. A data lakehouse unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics, and AI use cases. It’s built on an open and reliable data foundation that efficiently handles all data types and applies one common security and governance approach across all of your data and cloud platforms.”
So what is Databricks?
Databricks is an American enterprise software company founded by the creators of Apache Spark. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.
Gartner named Databricks a Leader in its 2021 Magic Quadrant for Data Science and Machine Learning Platforms.
In February 2021, together with Google Cloud, Databricks released integration with Google Kubernetes Engine and Google’s BigQuery platform. Fortune ranked Databricks one of the best large “Workplaces for Millennials” in 2021. At the time, the company said more than 5,000 organizations used its products.
ronjonb ( https://twitter.com/ronjonbSaaS)
P.S. There’s a lot to write about Databricks and I’ll continue to provide my inputs. Databricks is poised to IPO anytime in 2022. I plan to participate in the IPO, but if that doesn’t work out I’ll surely build a position post-IPO and will decide my allocation relative to Snowflake. MongoDB, Databricks, and Snowflake look like the most probable entries in my big-data basket. And it’s great to see how both MongoDB and Databricks are using open source to drive their business dominance.