SNOW research report

I can’t find a link, but I’ve read that “Weakness Said Tied to Cautious Analyst Note - Cleveland Research Report said partners are seeing sales cycles elongate on increased competition from hyperscalers, particularly GOOG BigQuery”. Thought this might help explain the price drop this morning. Can anyone find more about the contents of the report?

14 Likes

I can’t find the source either, but it made me think of the same thing that happened with ZS. So I’ll be cautious about making any decisions before their ER on 25 AUG.
PS: Last time, things did not go well for ZS.

Rick

2 Likes

I was doing the same and agree with waiting for facts. Being in tech myself, we hear things in the market. One interesting bit was from a Googler friend of mine: Google’s big data/BigQuery go-to-market includes a “Snowflake Compete Team”. Kind of self-explanatory, but compete teams in the industry are staffed with individuals who know the players in the market well, and they are created to fight competitors that are real long-term threats.
It’s anecdotal, but it validates that Snowflake seems to be on the right path. By the same token, Google is a tough opponent to take on.

8 Likes

Is this what’s causing the 10% drop today? I haven’t found any other news.
Thanks

How can this company be seeing slowing sales cycles in an environment like this? Business seems to be accelerating across the cloud software sector based on the 2Q reports we have seen so far.

I put up a note to watch SNOW closely when they reported adding just a single Fortune 500 customer in 1Q this year. It’d be quite concerning if this report is accurate and they are actually seeing longer sales cycles already.

Bnh

5 Likes

There was also this not-too-generous Seeking Alpha article this morning.

https://seekingalpha.com/article/4450851-snowflake-is-not-a-…

1 Like

Hi everyone. It’s been some time since I posted. Quick recap on what I’ve been up to - about 3 years ago I took a career pivot into Data Science and completed a master’s program at UC Berkeley. I graduated last fall, and I recently took a position as Director of Data Science and Machine Learning Engineering at a consulting company, where I’m building a practice.

I’m also helping a Data Engineering team at my client with some decision-making on tool selection for their next-generation data platform (I’m consulting in a group that represents the MLE/DS side, but as a consumer of the data pipeline, our group is keenly involved).

Herein lies the conundrum: the introduction of ML workloads as equal citizens in the data pipeline. If you had to build your data platform to serve both analytics and data science workloads, including MLOps, which tool stack would you pick? My client is considering ADLS/Synapse, Snowflake, and Databricks/Delta Lake for the EDW part of the pipeline.

I’m really impressed by Snowflake as an analytics platform, but less impressed with it as a data science platform. For that, Databricks is better. You can stage your own transformations with Delta Lake (managed by Databricks) as you work on your ML features, and it has built-in support for native distributed storage based on Spark, which is open source. It is designed for massively scaled processing and machine learning.
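To make that concrete, here’s a minimal sketch of staging feature transformations on Delta Lake with PySpark. The table paths and column names are hypothetical; on Databricks the Delta format is available out of the box (elsewhere you’d need the Delta Lake libraries configured):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-staging").getOrCreate()

# Read raw events from an existing Delta table (path is made up)
events = spark.read.format("delta").load("/mnt/lake/raw/events")

# Stage an intermediate feature set: per-customer aggregates for ML
features = (events
            .groupBy("customer_id")
            .agg(F.count("*").alias("event_count"),
                 F.avg("order_value").alias("avg_order_value")))

# Write the engineered features back as another versioned Delta table
# that downstream training jobs can read directly
(features.write
 .format("delta")
 .mode("overwrite")
 .save("/mnt/lake/features/customer_features"))

The point is that each intermediate step lands in the lake as its own queryable, versioned table rather than a one-off extract.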

So I think there is some truth to the sales cycle taking longer. My client is still deliberating.

For more on what Databricks is and how they are encroaching from the ML side of the house, this article is a great read.

https://www.protocol.com/enterprise/databricks-snowflake-ana….

Best,
—-Kevin

28 Likes

I’m not going to claim that I know whether Snowflake is facing slow sales cycles in Q2; we’ll find out when they release earnings next week. But what I will say is that relying on anecdotal evidence from questionable sources is not the essence of this board.

First, the firm that issued the negative note on Snowflake (Cleveland Research) indicated in a disclaimer that “Cleveland does not formally cover snow and does not have an investment opinion.” Let’s think about that for a minute – they published a negative report on a company that they don’t even cover. Do they have some information we don’t? Maybe. But it’s equally likely that this is speculative. The point is that we don’t know. An analyst at Piper Sandler responded to the report by stating that the claimed slowdown is “solely a function of tough comparison given it closed the first eight-figure contract one year ago…it is not reflective on elongating sales cycles, and is inevitable as RPO approaches multi-billion levels.” I will stop posting any additional thoughts from analysts as that is off-topic to the board, and I’d ask others to refrain from commenting on this.

Second, on the “not too generous Seeking Alpha article this morning,” I did not find any negative comments about Snowflake (the company). In fact, the author summarizes the article by stating that “Snowflake is building a phenomenal business that is likely to be a long-term market leader in the cloud data storage and analytics business.” His argument is that Snowflake’s share price is difficult to value, and he provides a whole lot of quantitative data that is, again – off-topic to the board.

The point of this is not to totally dismiss the fact that Snowflake may be facing slower sales cycles. Those who follow the company closely might have noted that this is one point to pay attention to on their next earnings report. Rather, the point of this is to encourage everyone to focus on real news about companies as opposed to noise. Saul publishes this board’s rules on a weekly basis, and this thread already broke at least four: (1) no blabbing recent investment information from paid services, (2) no cluttering the board with one-line posts, (3) no technical analysis, (4) no market predictions. So I kindly suggest we save much of this discussion for when we hear from Snowflake on their earnings report next week.

-RMTZP
Please read this before posting https://discussion.fool.com/we39re-drifting-34904974.aspx and visit
https://discussion.fool.com/for-board-newcomers-and-oldtimers-34… to maximize your learning of the board

102 Likes

Hi RMTZP,

I am not sure if your post is directed at me but I will assume it is.

Couple of points to consider. I am an old-time board member, yes, but I have turned from a poster into a reader of the boards, and I don’t post anything when I don’t have value to contribute. Given where I stand in the industry (and also as an owner of SNOW myself), I felt this would be valuable information to share. I am an industry insider in the subject matter. That is also what this board relies on.

The information I am relaying is not just a single-client observation, either. I am in a Slack network with over 3K data scientists from the Berkeley program, and my summary reflects a general consensus of practitioners. I also work at a company that does a lot of consulting with Snowflake for data analytics needs. Snowflake is amazing for data analytics. Less so for data science.

Please do wait for the info coming in the release. I am as well, and will continue to hold until the thesis changes.

All the best,
—-Kevin

38 Likes

For starters, here’s a Fool article on the Cleveland report on Snowflake: https://www.fool.com/investing/2021/08/20/why-snowflake-stoc…

That article’s conclusion is: Responding to today’s sell-off, which in turn was a response to Cleveland Research’s note, Snowflake fan Piper Sandler said that even if growth does moderate in Q2, we’re probably only talking about a slowdown from average growth of 218% to just 144% – still pretty speedy growth, in other words.

For more on what Databricks is and how they are encroaching from the ML side of the house, this article is a great read.

Well, it’s mostly a Databricks puff piece. When that article says:
Databricks, which is plotting its own public offering, is wading deeper into Snowflake’s territory with a new product that lets customers query data for more basic statistical analyses using SQL, a programming language that Snowflake also relies on.

What a smart, independent writer should have said is that Databricks is just now, as it’s going public, realizing that its “best Apache Spark” approach doesn’t have legs in the market.

Databricks’ main claim to fame is that it reimplemented Apache Spark in C++ so it runs faster than the open-source version. While that’s great for those companies that use the Hadoop data model and are willing to code up specialized programs for that particular environment, it’s hardly enough for a public company that wants to be a highly valued growth company.

Databricks is a one-trick pony, and it’s a hard pony to ride. The article has the ease-of-use comparison completely wrong when it says: Currently, however, Snowflake’s tools are more suited for data analysts — and will likely remain that way for a while. In reality, Databricks requires programming; with Snowflake you can get by with simple (and prevalent) SQL (although it can do much more if you want).

Databricks requires computer scientists to deal with all the Hadoop issues and processing. Snowflake, OTOH, leverages SQL and can handle both structured and unstructured data within the same access model. It supports a variety of languages. It’s super-easy to use since you don’t have to worry about how your data is stored (Snowflake takes care of compression and optimization for you), and you can even run your processing jobs within Snowflake itself, so it’s cheaper (and easier) than running a separate processing job that has to pull the data out of the database before it can work on it.
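To illustrate the ease-of-use point, here’s a minimal sketch using Snowflake’s Python connector to run plain SQL over a semi-structured VARIANT column. The account, table, and column names are all made up:

import snowflake.connector

# Connection parameters are placeholders
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="ANALYTICS_WH", database="PROD", schema="PUBLIC",
)

cur = conn.cursor()
# Plain SQL over a semi-structured VARIANT column; no storage tuning,
# partitioning, or compression decisions on our side
cur.execute("""
    SELECT payload:customer.region::string AS region,
           COUNT(*) AS purchases
    FROM events
    WHERE payload:event_type::string = 'purchase'
    GROUP BY 1
    ORDER BY 2 DESC
""")
for region, purchases in cur.fetchall():
    print(region, purchases)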

And it’s more than disingenuous to say that Databricks will have an easier time coming up with an SQL-based solution than Snowflake will have moving to AI/ML applications. First, SQL solutions have been around for many decades - and yet somehow Snowflake’s magic sauce is resonating with customers. Today it’s not enough to support SQL (it actually hasn’t been good enough for 20 years now), so what’s going to be Databricks’ take on it? The article doesn’t say, nor does Databricks CEO Ghodsi.

The article actually does “cover” Snowflake’s approach as: relying on integrations with AI platforms from Google, Microsoft and AWS, according to SVP Christian Kleinerman. “Do you have algorithms that are natively hosted by us? No. But it’s not because we can’t or because we don’t know how to. It’s because we know the space is very fluid,” he told Protocol. “Our entire initiative in AI and ML has been to build extensibility into Snowflake so you can interface with your tool of choice.”

Which is, to me, the winning strategy. Spark is a very good tool in the hands of very good computer scientists. But is it the be-all and end-all for high-end machine learning or neural net applications? Almost certainly not, especially since you have to massage your data into Hadoop format, which then means you can’t do simple analytics on it.

That’s kind of like saying a 5-axis CNC milling machine is super-great. And it is, but you have to design your parts in a CAD system like AutoDesk, build a series of toolpaths to mill the piece, and then clamp your stock in and run the job. Great, but sometimes all you need to do is cut that 2x4 in half, and a circular saw will be much faster. And if the cut is more involved, then use a track on that circular saw. And, for other jobs, use a handheld router or a table saw, etc. There is no one tool that is best at everything. Great as Spark is, it’s not the best for everything, and so Databricks is far more limited than Snowflake in its TAM and therefore its growth potential.

52 Likes

I will just say – mentioning Databricks…their technology is mature, powerful and pretty game-changing for data lakes, along with something called Trino/Starburst. If those go public, I will post them here. The big thing to realize is that even though Databricks isn’t public, they have been selling a mature suite of products for quite a while – similar to the way ESTC and MDB were fairly mature before their stocks debuted. I think hybrid cloud companies are going to be the next wave of big winners.

2 Likes

I will just say – mentioning Databricks…their technology is mature, powerful and pretty game-changing for data lakes…

Actually, Databricks doesn’t push data lakes, either. They claim to be a “Data Lakehouse,” which is some kind of cross between a data lake and a data warehouse, or a bad pun. See Databricks’ blog: https://databricks.com/discover/data-lakes/introduction for their take on it.

Note that it would be incorrect to assume that Snowflake lines up squarely in the “Data Warehouse” column of that Databricks blog post. For starters, Databricks claims a data warehouse can only store structured data, and we already know Snowflake stores both structured and unstructured data. Both Databricks and Snowflake are cloud-based, which is where Elastic and Mongo had to migrate since they weren’t cloud-native originally.

Snowflake also integrates well with Data lakes, as they outline here: https://www.snowflake.com/trending/data-lake-vs-data-warehou…

15 Likes

Got this from Twitter -- good analysis on SNOW…

https://twitter.com/Jeremy_Scott_/status/1428741398086537226…

$SNOW is down today on Street research that signings have slowed. We’ve also tracked deceleration in active domain growth since early 2020.

However, that’s just the law of large numbers. $SNOW continues to add gross new active client domains at a high clip.

Importantly, many of these new active $SNOW clients are large enterprises, which sets the stage for substantial growth in computing credits, particularly if more teams integrate.

Activated client domains are a good leading indicator for $SNOW product revenue 6 months out (after the client has been fully onboarded and begins generating compute revenue). The Street has largely baked in this decel. The power of strong IR.

Anecdotally, $SNOW is a gravitational force. More clients are demanding delivery on the platform, which forces more data vendors to accelerate their integrations

9 Likes

Smorgasbord, hardly a disagreement on any points made. However, you’re not addressing the use case needs of the data scientist (not the computer scientist, mind you, but people trained in statistics with enough programming skills to be dangerous). There, the platform is Jupyter and Python. Then you go with Spark (aka Databricks) if you are doing massively parallel machine learning in the classical sense, or you use PyTorch or TensorFlow if you are doing deep learning (you don’t do deep learning on Spark - for that you need GPU clusters).

The point here is that Snowflake is just awesome at data analytics, and that in itself is enough for traditional needs from a business perspective.

But when you approach it from the ML side, it is Spark and PyTorch/TensorFlow.

Puff piece aside, no data science curriculum teaches machine learning engineering with Snowflake. None. They do, however, use what I mentioned above, and in this case Spark == Databricks and Spark != Snowflake. And trying to run Spark on top of Snowflake is a road no data scientist takes. You get absolutely no advantage.

Data analytics vs. data science. The enterprise world has suddenly discovered it needs more data science in its tool set. Those who are retooling their enterprise are pausing to take a hard look: how do they keep their models trained and operational? Snowflake is faltering down that path.

Again, there is a reason why data science curriculums don’t use Snowflake. Until they do, Snowflake has something to address and worry about. As investors, we do too.

Best,
—Kevin

15 Likes

Just a couple of clarifiers on the tech side. Spark is not Hadoop, and no computer science or data science person is coding on top of Hadoop for transformations.

Also, although SQL is a common way to interact with Spark (as is Python), SQL itself has nothing to do with it. It is the massively parallel processing engine that sits behind Spark that differentiates it from Snowflake, and, in order to take advantage of the framework, your data must be partitioned and distributed across clusters in a format called RDS. That’s the breaking point right there. Snowflake is great for its use case, but its data is not structured as an RDS. That’s Databricks’ (and Spark’s) advantage, and why you don’t connect to a Snowflake instance through a Spark data science notebook. It’s like swapping your custom Ferrari engine for a Yugo.

Best,
—Kevin

6 Likes

I’m going to try to continue to gear the discussion away from the technicals and towards business/investing concerns, but we can’t avoid technicals completely:

Spark is not Hadoop

That’s true, but Spark was created to work within the Hadoop infrastructure (and here’s a Databricks article from 2014 about that: https://databricks.com/blog/2014/01/21/spark-and-hadoop.html… ). The Hadoop-oriented companies like Cloudera (which acquired Hortonworks) have struggled with both ease of use and TAM, and have since tried to move on. In practice, Spark has replaced MapReduce in most cases, but people are still using the Hadoop infrastructure (like the Hadoop file system) with Spark in many cases.

It is the massively parallel processing engine that sits behind Spark that differentiates it from Snowflake, and, in order to take advantage of the framework, your data must be partitioned and distributed across clusters in a format called RDS. That’s the breaking point right there. Snowflake is great for its use case, but its data is not structured as an RDS.

There’s a lot to unpack here.

First, Spark and Snowflake are not the same type of thing. They’re not even both fruit. Snowflake is a Cloud Data Warehouse with cloud compute. Spark is a distributed compute framework. Spark itself does not have persistent data storage built-in, while Snowflake is all about persistent data storage.

Second, today’s differentiators for Spark (and therefore Databricks as an implementation of Spark) are essentially two-fold: Big Data and Machine Learning analytics. But a lot of the Big Data applications do not require the Hadoop filesystem (HDFS), and companies like MongoDB have made a very good business out of making pretty darn big and unstructured databases that are much friendlier. So where Spark comes into play is in its performance (which in open source is good, but Databricks’ implementation in C++ is even faster). The other aspect is that Apache’s popular MLlib (Machine Learning Library) runs very well inside of Spark. So, applications that want to use advanced machine learning algorithms for analytics may gravitate towards that.
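For a flavor of what that looks like, here’s a minimal PySpark sketch using MLlib’s DataFrame API; the input path and column names are made up:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical feature table with numeric columns f1..f3 and a 0/1 label
df = spark.read.parquet("s3://my-bucket/features/")

# MLlib models expect a single vector column of features
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df).select("features", "label")

# The fit runs distributed across the cluster's executors
model = LogisticRegression(maxIter=20).fit(train)
print(model.coefficients)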

Third, maybe Kevin is using RDS in some other context of which I’m unaware, but “RDS” in the database world typically refers to Amazon’s Relational Database Service (see https://aws.amazon.com/rds/ ). You may recall a recent post of mine on Mongo vs Snowflake (https://discussion.fool.com/and-it39s-still-in-the-early-days-of… ) in which I discussed the differences between OLTP and OLAP (Transactions vs Analytics). Amazon RDS is for transactions, while Amazon Redshift is for analytics (Redshift competes with Snowflake).

Fourth, I’d hardly consider having to partition and distribute your data across clusters in an “RDS” format (maybe HDFS is what he meant?) as an advantage. OTOH, that Snowflake takes care of how your data is stored for you IS a huge advantage.

you don’t connect to snowflake instance through a Spark data science notebook.

Right, you use the Snowflake Connector for Spark (see https://www.snowflake.com/blog/snowflake-and-spark-part-1-wh… ).

From that Snowflake page:
The connector provides the Spark ecosystem with access to Snowflake as a fully-managed and governed repository for all data types, including JSON, Avro, CSV, XML, machine-born data, etc. The connector also enables powerful integration use cases, including: Complex ETL and Machine Learning. … Using the Snowflake Connector, the data produced by these complex ETL pipelines can now easily be stored in Snowflake for broad, self-service access across the organization using standard SQL and SQL tools…Snowflake can easily expand its compute capacity to allow your machine learning in Spark to process vast amounts of data.

That same Connector can be used to connect your Spark clusters in Databricks to Snowflake. And this harkens back to what was being discussed upthread - that previously the two companies’ products were complementary: store your data in Snowflake and, when you need to, process it in Databricks’ better-than-Apache Spark. But now (in my view) Databricks has apparently realized that the TAM for Spark applications is a small niche in the world of corporate analytics and may not be enough for a company that has large ambitions to be a highly-valued growth public company.
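For reference, here’s roughly what that connector usage looks like from a Databricks notebook. This is a sketch; the account, credentials, and table names are placeholders, and the connector and JDBC driver must be available on the cluster (Databricks ships them pre-installed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

sf_options = {
    "sfURL": "my_account.snowflakecomputing.com",  # placeholder account
    "sfUser": "my_user",
    "sfPassword": "...",
    "sfDatabase": "PROD",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ANALYTICS_WH",
}

# Read a Snowflake table (or a pushed-down query) into a Spark DataFrame
df = (spark.read
      .format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "CUSTOMER_FEATURES")
      .load())

# ...do the heavy lifting in Spark, then write results back to Snowflake
(df.write
 .format("net.snowflake.spark.snowflake")
 .options(**sf_options)
 .option("dbtable", "SCORED_CUSTOMERS")
 .mode("overwrite")
 .save())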

It’s like swapping your custom Ferrari engine for a Yugo.

Well, Snowflake is no Yugo, and yet the market for Ferrari engines is quite small. I think my prior tools analogy is more apropos. A 5-axis CNC milling machine can do almost anything, but for many tasks simpler and cheaper tools are faster and easier.

But, back to Databricks vs Snowflake. Note that Databricks has open-sourced its “Delta Lake” architecture, hoping that it’ll get some traction and become a standard. I don’t think anyone else has taken that up yet, but if Databricks does eventually show traction and profits from it, then a company like Amazon might take it up (as they did with MongoDB and other popular open-sourced projects). In the near term, Snowflake is far easier to use, while Databricks needs to reinvent itself as something other than a better Spark. Will they be successful moving into the BI (Business Intelligence) world?

It’ll be interesting to read what Databricks puts out about its finances and development projects as part of the IPO process.

29 Likes

Hi Smorgasbord!

We are getting tons and tons closer.

First, I misspoke on my acronyms. It is the RDD that is behind Spark, not RDS. I must have had AWS RDS on the brain as well.

And that is the part that needs to be unpacked further to understand the differences. First, there is what Snowflake says they can do with a Spark connector, so that high-level folks can say, “see, it supports Spark, you finicky data scientists.” Then there are the low-level details, where you expose what’s really going on and say - hey - this does not take advantage of anything in Spark. I may as well be connecting through Jupyter.

you don’t connect to snowflake instance through a Spark data science notebook.

Right, you use the Snowflake Connector for Spark

You really don’t, because as I mentioned, you may as well not use Spark anymore. You’re bringing all your data to the “Driver” for processing, which completely defeats the purpose of Spark: performing your compute where the data is located, across distributed clusters.

The key idea of Spark is the Resilient Distributed Dataset (RDD); it supports in-memory computation at the nodes where the data is located. This means it stores the state of memory as an object across jobs (coordinated by the “Driver”), and the object is shareable between jobs. Data sharing in memory is 10 to 100 times faster than going over the network or to disk.
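A tiny PySpark sketch of that idea (the numbers are arbitrary): transformations run on the executors that hold each partition, persist() keeps the intermediate result in executor memory for reuse, and collect() is the anti-pattern that drags everything back to the driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

# A toy RDD split into 8 partitions across the cluster
rdd = sc.parallelize(range(1_000_000), numSlices=8)

# The map runs on the executors where each partition lives; persist()
# caches the result in memory so the next job reuses it
squares = rdd.map(lambda x: x * x).persist()

print(squares.sum())                                 # first job: computes and caches
print(squares.filter(lambda x: x % 3 == 0).count())  # second job: reuses the cache

# squares.collect() would ship every element to the driver instead,
# which is exactly what defeats the purpose of Spark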

Snowflake does something like this too. But for a different use case, and you can’t cross the streams (to quote a favorite Bill Murray movie).

Now, here is something to study on the Snowflake front. It’s called Snowpark. And Snowpark is Snowflake’s direct answer to the Spark model. It lets you distribute your code to the remote instance where Snowflake maintains its compute and data - and you can send that code across their clusters. The concept is similar to Spark, and in theory it could address the ML use case. But it is not there yet. It’s still in preview, and has a long way to go. The other hurdle: no penetration of this model and architecture in data science curricula. A whole slew of data scientists have been trained up on the Spark use case (and yes, before that, Hadoop), and the Snowflake approach hasn’t even been winked at.
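For the curious, here’s roughly what the Snowpark programming model looks like. This sketch uses the Python client with placeholder connection details and table names; the preview has been Scala-first, so treat the exact API as illustrative rather than definitive:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

# Connection parameters are placeholders
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "...",
    "warehouse": "ANALYTICS_WH",
    "database": "PROD",
    "schema": "PUBLIC",
}).create()

# This DataFrame is lazy: the filter/group_by below get compiled to SQL
# and executed inside Snowflake's compute, next to the data, instead of
# being pulled out to a local driver
orders = session.table("ORDERS")
result = (orders
          .filter(col("STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(avg("ORDER_VALUE").alias("AVG_ORDER_VALUE")))

result.show()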

Here’s one of those courses.

https://ischoolonline.berkeley.edu/data-science/curriculum/m…

Closing this out (or taking it offline, because this is a fun topic to talk about), the next question to ask is: “Is any of this important?”

That depends on where you see the space evolving.

Will data science needs overtake analytics needs? If so, when? Who is in the best position? Who is penetrating and getting to the talent?

My course curriculum at Berkeley actually switched during my term last summer from running our own Apache Spark instance in the cloud to having a direct partnership with Databricks. So, a whole breed of Data Scientists are already familiar and comfortable with not just Spark, but the actual Databricks product (there are actually some nice differences and features beyond just the C++ implementation).

All my best Smorgasbord! This has been a delightful conversation to say the least. If you’d like to chat further on the technical let’s go to email!

–Kevin

11 Likes

Will data science needs overtake analytics needs?

I’d like to steer this thread back to the investment thesis and talk about the evolution of products.

My hypothesis is that a large proportion of digital products for the remainder of our lifetime will involve not just an analytics component (your traditional business intelligence use cases) but also actual predictive models.

Building predictive models requires a different skill set (data scientists) and different data (specially engineered and transformed feature data that predictive models can be trained on). You can’t train models on data as it comes out of an EDW; that data must be processed, transformed, and encoded as it moves down its own pipeline. The end result of that transformation isn’t anything a human would use to perform analytics themselves. Instead, you get a feature matrix ready to be fed to machine learning algorithms.
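A toy sketch of that last step, just to show the shape of the output (the columns and values are invented):

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Pretend this came out of the warehouse
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "avg_order_value": [120.0, 45.0, 300.0],
    "churned": [0, 1, 0],
})

# One-hot encode categoricals and scale numerics; the result is a
# numeric feature matrix for a model, not a report a human would read
X = pd.get_dummies(df[["region", "avg_order_value"]], columns=["region"])
X[["avg_order_value"]] = StandardScaler().fit_transform(X[["avg_order_value"]])
y = df["churned"]

print(X.values)  # feature matrix ready for a training algorithm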

So at an investment level, where do you personally see the future of digital products? What is your hypothesis? What do you believe Tesla uses behind the scenes for its image detection so you can avoid a collision? Is the use of predictive models in an “app” going to be commonplace in our future? Who will be the consumer of these new types of applications? How big is that market? What companies will use predictive models as a core differentiator? How will they build their internal data platforms and pipelines?

I hope this can serve as a starter for where and how to look at our companies and where there may be opportunities. Prediction in particular, and not just for Snowflake.

Also, it’s a means to evaluate where we are invested, and for how long we continue dating. I learned a very hard lesson about 5 years ago: never fall in love with a stock. Just learn to date your company and cut ties when the relationship is no longer working out (thank you, Saul).

What do you think? Will building machine learning models become more important than performing analytics for companies? Just some food for thought.

Best,
–Kevin

11 Likes

I am new to this board, as in I have never commented here before. However, this discussion is prompting me to chime in, as there seems to be some misinformation here about Snowflake and Databricks, and I hate seeing misinformation spread, especially to folks on this board. Having started with Hadoop MapReduce and replaced it with Apache Spark (open source) in 2014, I can say that I firmly agree with what Smorgasbord has posted.

Databricks is not going to replace data warehouses, period.

Databricks is a data processing engine, best for processing certain types of datasets: think large, unstructured, and streaming. Having worked with ML engineers and data scientists, I can tell you that even they prefer plain Python and pandas with a Jupyter notebook over Spark; they feel forced to use Spark with larger datasets.
Databricks has made it easier for data scientists to use Spark with its notebook interface, keeping it closer to Jupyter, but I see that as a niche as well. Many of the times I used Spark, with or without Databricks, the data ended up in a data warehouse like Snowflake for analytical and reporting use cases, which are broader and have a larger market.
Snowflake also has further advantages and features over traditional databases, even cloud-native ones like Redshift, its cloud-agnostic and data sharing features being popular examples.

Maybe one day the ML and data science use cases will grow (which they will), and even in that scenario I doubt that Databricks will corner that market. Oftentimes I have seen data scientists who don’t want to deal with all the self-management baggage associated with it and prefer TensorFlow on Google Cloud instead.

In short,
Snowflake and Databricks serve different use cases.
I believe in Snowflake, and any slowdown might be temporary.
Snowflake has a larger addressable market than Databricks (think both traditional and other cloud data warehouses), and newer implementations are going straight to Snowflake (this from personal experience).

That said, I can’t speak to stock valuations, expectations, etc., and there could be a slowdown. I am also a big fan of Databricks and have done my share of evangelizing it to data science users, beyond traditional data pipeline processing for data engineers.

41 Likes

Hi rxkfoo, I’m glad this thread inspired your first post! I also feel like a new poster after being away for so long.

I sincerely hope this back-and-forth didn’t give rise to misunderstanding (or misinformation, as you said). I think the dive into the technical details may have done that, and if so, you and everyone have my apologies.

Databricks is not going to replace data warehouses, period.

Databricks is most certainly not a data warehouse tool. But using Databricks for what it is (data science) against Snowflake for what it is (data analytics) illuminates that a fissure exists.

I do not know if Snowflake will slow down as a result of the fissure. But Snowpark is there to try and address it.

I think that’s the point of the back and forth. Hope that clarifies at least a bit.

The two use cases are different. But as a data science practitioner myself, I need access to a large-scale data pipeline for model training, and Snowflake, being what it is, becomes a data source that I can’t do additional manipulations on directly through Spark. My workaround is to output my data from Snowflake into S3, and then proceed to build my own data lake for transformational processing. But this is not cost-efficient.
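For the curious, the workaround looks roughly like this (a sketch; the external stage, bucket, and table names are placeholders):

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="ANALYTICS_WH", database="PROD", schema="PUBLIC",
)

# Unload a query result from Snowflake to an external S3 stage as Parquet
# (@my_s3_stage is a hypothetical stage pointing at the ML bucket)
conn.cursor().execute("""
    COPY INTO @my_s3_stage/customer_features/
    FROM (SELECT * FROM CUSTOMER_FEATURES)
    FILE_FORMAT = (TYPE = PARQUET)
    HEADER = TRUE
""")

# ...then the Spark side picks the files up for feature engineering:
# features = spark.read.parquet("s3://my-ml-lake/customer_features/")

Every extra hop like this means duplicated storage and duplicated compute, which is exactly the cost problem.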

My client is building a brand-new data platform to support both use cases, and one that is as cost-efficient as possible. They are not the only ones out there, so you can see the conundrum.

Best,
–Kevin

4 Likes