Snowflake now using Python

In a press release today, Snowflake announced that data scientists, data engineers, and application developers can now use Python - the fastest growing programming language - natively within Snowflake as part of Snowpark, Snowflake’s developer framework. With Snowpark for Python, developers will be able to easily collaborate on data in their preferred language. At the same time, they can leverage the security, governance, and elastic performance of Snowflake’s platform to build scalable, optimized pipelines, applications, and machine learning workflows. Snowpark for Python is currently in private preview.

Saul

53 Likes

I think Snowpark is a pretty big deal in the long run. They are challenging Databricks.

If you want to run distributed compute with custom-developed code, there is no good solution on the market besides Spark. Spark is an open-source framework created by the founders of Databricks. Databricks runs a managed service that lets you execute Spark code on the 3 big cloud providers.

Both Snowflake and Databricks charge by the amount of compute used on their platforms. Where does the biggest compute demand come from, i.e. which market has the largest TAM? It’s the machine learning training jobs. Think about churning through terabytes of data thousands of times to tune a model. Spark is pretty much the only solution when you go beyond the terabyte/petabyte level.

All three cloud providers have ways to let you execute Spark code on their platforms, but none works as well as Databricks. In fact, IIRC, Azure basically tells people to use the Databricks platform if they can.

Back to Snowflake.

You can write Spark code in Java/Scala/Python/R today. So Snowflake does not have feature parity yet. They are trying to catch up.

By adding Python, the dominant language in machine learning, Snowflake allows data scientists to write heavy compute code on Snowflake of the kind that used to be possible only on Spark.

Currently, the features on Snowpark are still lacking - there is no native way to train ML models on Snowpark yet. I would pin them at where Spark was 5 years ago. So I doubt it will have a material financial impact in a year or two. But it’s pretty clear that’s where they are headed - they want to take Databricks’ lunch.

47 Likes

Hello, I have a slightly different opinion from Chang88 on the AI/ML part. This is certainly good news, but I am hoping to put in my 2 cents here. I wouldn’t be overly excited about something like this, for the following couple of reasons:

  1. Python is popular - among a certain group of people. If you need a one-liner on some BBS to get all coders fired up, all you gotta ask is “what is the best programming language?”.
  2. Python is an interpreted language. So what? Ultimately, for computers to understand a command, the code has to be somehow translated to “machine code”; the catch is in the translation process. Unlike compiled code (C/C++ and some others), which gets converted directly into machine code, an interpreted language needs an interpreter to run through each line and execute each command. This makes Python slower. It used to be considerably slower, but I am not sure whether the speed has improved. This is also the reason C/C++ remain popular in many commercial trading algorithms and other time-sensitive software, despite being “on their way out” as argued by many.
  3. So Python is slower, but why does this matter? In commercial AI/machine learning, there is a process of refining or tuning the model. It is a complex process of finding values for a set, or multiple sets, of variables that make a set or sets of equations reproduce the result based on the input. (I have probably simplified this by a factor of 1 million.) A lot of serious companies rent computing services to run this refining process to get the model.
  4. Back to the topic: when you are renting things, time is $$. My impression is that many companies, before they pull the trigger, will rewrite in C/C++ at least the part of the code that is executed repetitively the most.
    So, adopting Python is always a good thing, I am just not sure how good it is in the AI/ML part.
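
To make the "refining" process in point 3 concrete, here is a minimal sketch, assuming the simplest possible model: a straight line fitted by gradient descent. The function name `fit_line`, the toy data, the learning rate, and the step count are all made up for illustration; real tuning jobs involve far larger models and datasets, which is where the rented compute time adds up.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly nudging w and b to reduce squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient; this is the part that repeats many times.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The inner loop is the "executed repetitively the most" part: every step touches every data point, so its cost scales with both the data size and the number of tuning steps.
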
4 Likes

So, adopting Python is always a good thing, I am just not sure how good it is in the AI/ML part.

Python has downloadable libraries like NumPy and Pandas. Key parts of those are coded in C (a compiled programming language that runs fast). So the code that structures the neural nets or other ML models may be in Python, but the number crunchers that do all those matrix multiplies run in optimized C.

Python enables those kinds of connections with other programming languages so the “interpreted” part is easy for users to make changes quickly, but the internals that do the heavy number-crunching still run fast.
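
As a rough illustration of that split, here is a minimal sketch, assuming NumPy is installed. The helper `matmul_pure_python` and the matrix sizes are made up for this example. It does the same matrix multiply twice: once with a loop that the Python interpreter executes step by step, and once with NumPy's `@` operator, which hands the work to compiled code.

```python
import numpy as np


def matmul_pure_python(a, b):
    """Multiply two matrices given as nested lists, one Python-level step at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Small random matrices with a fixed seed so the run is repeatable.
rng = np.random.default_rng(seed=0)
a = rng.random((30, 20))
b = rng.random((20, 10))

slow = matmul_pure_python(a.tolist(), b.tolist())  # every multiply runs in the interpreter
fast = a @ b                                       # one Python line, work done in compiled code

# Both approaches should agree up to floating-point rounding.
results_match = np.allclose(slow, fast)
print("results match:", results_match)
```

On larger matrices the second form is typically much faster, though the exact gap depends on the machine and on how NumPy was built.
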

There are more details, but my experience with ML is dated, so others will know more.

I am kind of surprised that Snowflake is just announcing Python support now. The Python language has been around, and has been a key tool of data science, for a couple of decades.

15 Likes

I know Python support is getting all the attention, because, well, developers are all over the internet, but I think two other announcements are much more interesting.

The press release is here: https://www.businesswire.com/news/home/20211116005473/en/Sno…

The first item that caught my eye was: Improved replication performance:
Increased efficiency of data replication capabilities has resulted in up to a 55% performance improvement as experienced by one of Snowflake’s largest customers, which in turn translates in up to a 55% reduction in customer replication costs since Snowflake customers only pay for what they use.

Note that since Snowflake is essentially a DBaaS (Database as a Service), customers do not have to change a thing to take advantage of this improvement. Because everything is done under the covers and managed by Snowflake, these kinds of improvements come “for free,” without requiring any setup or coding changes. This is a huge long-term advantage for Snowflake’s model.

The second item was a bullet item, saying:
ZoomInfo, which drove $1 million in ACV growth for their business after just six weeks of listing on the Snowflake Data Marketplace.

I’m always interested when two companies we discuss here are partnering, or just using each other. In my experience, best-of-breed companies seek out other best-of-breed companies to use or partner with. Having ZoomInfo experience tangible benefits from using Snowflake’s Data Marketplace in only 6 weeks is pretty extraordinary. I admit I haven’t yet invested in ZoomInfo, nor have I been that interested given my personal bent towards privacy, but this little bit of info not only confirms my belief in Snowflake and its Data Marketplace offering, it is causing me to look at ZI more closely.

76 Likes

This is great news.

By the way, Spark is open source, maintained by Apache, and can be used with Snowflake. https://www.snowflake.com/guides/what-spark

I don’t want to go into the weeds of languages, but I do want to correct some points mentioned about Python above.

  1. Python is dominant in ML and Deep Learning as well as a lot of computer graphics data processing work. It is just great for rapid prototyping, scripting and automating. There are so many libraries and frameworks that people can grab and start using right away, e.g. NumPy, SciPy, PyTorch, TensorFlow, Spark, APIs for processing images and video, and on and on.

——- tl;dr; If you don’t care about python, stop reading. ——-

  2. Python is a high-level language, not an interpreted language. That means…
  • Python is plenty fast, and most of the high-frequency computations happen at lower layers written in C (you can expose C++ using Boost and such too). It is certainly possible to write Python code that is slow, but that doesn’t mean Python is slow for all things as a rule. Also, ‘slow’ is relative. Running 1 million lines of Python code that construct class instances will be slow, but running a single line of Python that does 1 million things in C may be fast; using it to launch 1000 parallel processes that each do 1000 things can be fast too. It just depends. Note I said processes, not threads. When you want to process huge numbers of files, for example, this is what you want.

  • Python is also not an interpreted language. It’s more of a hybrid. It is compiled at runtime into bytecode, if not already done (optionally kept as .pyc files; the ‘c’ is for ‘compiled’).
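
The bytecode step can be seen from within Python itself. This is a minimal sketch using only the standard library, assuming the standard CPython implementation; the function `add_one`, the file name `hello.py`, and the snippet being compiled are made up for illustration.

```python
import dis
import os
import py_compile
import tempfile


def add_one(x):
    return x + 1


# A function object already carries its compiled bytecode.
bytecode = add_one.__code__.co_code
instructions = [ins.opname for ins in dis.get_instructions(add_one)]
print(instructions)  # exact opcode names vary between Python versions

# Source text can also be compiled to a code object at runtime, then executed.
code_obj = compile("result = 6 * 7", "<example>", "exec")
namespace = {}
exec(code_obj, namespace)

# py_compile writes the bytecode to a .pyc file, as the import system does for modules.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hi')\n")
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    pyc_exists = os.path.exists(pyc_path)
```

The bytecode is then run by the Python virtual machine, which is why "hybrid" is a fair description: there is a compile step, but the result is not native machine code.
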

9 Likes