Snowflake features to keep an eye on

Snowflake recently made available two features that I found important as a practitioner:

#1 Snowpark is now Generally Available…

This is their competitor to Databricks’s Spark.

Spark is the go-to tool for distributed compute right now. But many companies, like ours, face this problem:

We use Snowflake as our data warehouse.
We need to run distributed Python scripts, but don’t want to subscribe to Databricks on top of Snowflake unless there’s a compelling reason.

I wrote about this feature back in November:…

This is one feature that, in my opinion, can expand their TAM by tens of billions beyond their data warehouse business. It is far from done, but they are in a great position to expand it, and they have the technical ability to compete with Databricks. Both Snowflake and Databricks were founded by PhDs in distributed computing. The only other competitor I can think of is Google, but GCP’s market share is so low that I wouldn’t worry about it at all.

In short, this is the feature that has the best potential to become their second meaningful growth flywheel besides their primary business of data warehousing.

#2 Object dependency in Snowflake

This one just came out last week and is not in their release notes yet…

AFAIK, neither Amazon Redshift nor Google BigQuery has anything like this. I will explain in the next section why this makes Snowflake a lot more attractive than its competitors.

My current take on the data industry - why Snowflake has the strongest ecosystem

Snowflake + dbt has become the de facto stack for data engineering over the past year. If you are starting a new data warehouse, this is the 1-2 punch people recommend today.

On top of that, two hot topics in data engineering are Data Observability (Monte Carlo, etc) and Reverse ETL ( etc).

Data Observability means monitoring the quality of the data you load into your system and tracking anomalies in it, like an unusually long load time or an unusually small amount of data loaded. You also build the “data lineage” graph for governance, for maintaining a single source of truth, and for preventing dashboards from breaking when you make changes.
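To make the "unusually small amount of data loaded" case concrete, here is a minimal sketch of the kind of check an observability tool might run. It is not how any particular vendor does it; the function name, the 50% threshold, and the sample row counts are all made up for illustration.

```python
from statistics import median


def flag_small_loads(row_counts, threshold=0.5):
    """Return the indices of loads that look unusually small.

    A load is flagged when its row count falls below `threshold` times
    the median of all loads that came before it. Both the rule and the
    default threshold are illustrative assumptions.
    """
    flagged = []
    for i in range(1, len(row_counts)):
        baseline = median(row_counts[:i])
        if row_counts[i] < threshold * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily row counts for one table; day 4 is a partial load.
daily_rows = [10_200, 9_950, 10_480, 10_100, 3_100, 10_300]
print(flag_small_loads(daily_rows))  # [4]
```

A median baseline is used here so that one bad load does not drag the baseline down for the days that follow; a real tool would likely also account for seasonality and table growth.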

To me, Snowflake’s Object Dependency feature is a clear sign that they are working on Data Observability. It is unclear whether they will tap into this market themselves (my guess is no), but this makes the ecosystem a lot more attractive and sticky.
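Here is a rough sketch of why dependency data matters for lineage: once you have "object A feeds object B" pairs, finding everything that could break when you change a table is a simple graph walk. The pair format and all object names below are assumptions for illustration, not Snowflake’s actual schema.

```python
from collections import defaultdict, deque


def downstream_of(edges, start):
    """Return every object that depends on `start`, directly or indirectly.

    `edges` is an iterable of (upstream, downstream) pairs. This pair
    format is an assumption for the example.
    """
    children = defaultdict(set)
    for upstream, downstream in edges:
        children[upstream].add(downstream)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # only matters if the input contains a cycle
    return seen


# Hypothetical dependency pairs.
edges = [
    ("raw.orders", "analytics.orders_clean"),
    ("analytics.orders_clean", "analytics.revenue_daily"),
    ("analytics.revenue_daily", "dashboards.revenue"),
    ("raw.customers", "analytics.customers"),
]
print(sorted(downstream_of(edges, "raw.orders")))
```

With a list like this in hand, you can check which views and dashboards sit downstream of a table before you alter it.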

Reverse ETL means loading data from your data warehouse back into a SaaS system. For example, you load your machine learning predictions of customer lifetime value back into a marketing platform like HubSpot, then use them to decide who gets what kind of marketing email.
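As a minimal sketch of that example, the transform step of a reverse ETL job could look like the code below. The segment names, the lifetime value cutoffs, and the payload field names are all hypothetical, and the actual call to the marketing platform’s API is deliberately left out.

```python
def segment_for(predicted_ltv):
    """Map a predicted lifetime value to an email segment (made-up cutoffs)."""
    if predicted_ltv >= 1000:
        return "vip"
    if predicted_ltv >= 200:
        return "growth"
    return "nurture"


def build_sync_payloads(predictions):
    """Turn warehouse rows into one update payload per contact.

    `predictions` is a list of dicts with "email" and "predicted_ltv"
    keys, standing in for the result of a warehouse query.
    """
    return [
        {
            "email": row["email"],
            "properties": {
                "predicted_ltv": row["predicted_ltv"],
                "email_segment": segment_for(row["predicted_ltv"]),
            },
        }
        for row in predictions
    ]


# Hypothetical rows, as if read from a predictions table.
rows = [
    {"email": "a@example.com", "predicted_ltv": 1500.0},
    {"email": "b@example.com", "predicted_ltv": 250.0},
    {"email": "c@example.com", "predicted_ltv": 40.0},
]
payloads = build_sync_payloads(rows)
print([p["properties"]["email_segment"] for p in payloads])
```

The hard parts that reverse ETL vendors actually sell (incremental syncs, rate limits, retries, field mapping per destination) are exactly what this sketch skips.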

Snowflake doesn’t have a feature for Reverse ETL and I don’t believe they should build this product themselves, but almost all reverse ETL tools have built-in Snowflake support, as opposed to little or no support for traditional data warehouses - Hightouch doesn’t support Teradata, for example. So this is another strength of their ecosystem - if you use Snowflake, you are much more likely to find a tool that can solve this problem for you.

My positions

I currently have only an experimental 0.2% position in Snowflake, but I am a power user, love their product, and will likely use both features in the next year. Their valuation is a lot more attractive to me (still nosebleed…) than it was last year when they IPO’d. With these new features expanding their TAM, I plan to start building up my position with new paychecks.


Snowflake recently made available two features that I found important as a practitioner

This post, #83174, is well worth a read if you missed it!