Snowflake vs "Open" Source/Data

Do Customers Want Open Data Platforms?…

Interesting article comparing Open Data/Open Source to Snowflake, coming down in favor of Open.

The article points out that Snowflake was an early detractor of Hadoop, but it glosses over the very real problems that companies trying to implement Hadoop (an open framework with free utilities) have encountered. Those problems are well described in an article which quotes former Snowflake CEO Bob Muglia (… ).

The article does admit that Snowflake makes a compelling case that shielding users from technical complexity is worth giving up that control, but then it comes around and claims customers won’t be happy with anything that isn’t open source or open data: “What about an enhancement in the file format that enables better compression or better processing?” they ask, according to the article.

Well, we just saw an example of that! Snowflake implemented additional data compression. This was done completely behind the scenes with zero impact on users or workflows, and the only result was that costs for Snowflake customers declined. An ultimate “tastes great, less filling” technical result if you ask me, compared to trying to roll your own data compression on top of some open framework that isn’t tuned for performance.

And of course, competitors like Dremio claim that Snowflake’s architecture is just another veiled attempt to lock customers in to Snowflake. Dremio is quoted as saying: “The only functionality you’re going to get is what Snowflake is going to provide you.” That is inaccurate, since Snowflake is working with multiple vendors on connectors so that you can do processing in or with their tools instead. Databricks is cited as an example of “Open,” without mentioning that it’s all Spark-based and that you have to literally code programs to use Spark, or that Spark is just another part of the Hadoop ecosystem.

At any rate, I found it interesting to read an anti-Snowflake article, even if it contains much with which I disagree.


Interesting article.

Being a lifelong supporter of Open Source, and someone who has contributed to it, used it, implemented and supported it for large and small companies, I completely agree that open and standard formats are a good thing, especially for the consumer. They enable transparency, flexibility, and portability.

I despise proprietary data formats, which is why I refuse to use pretty much anything from Microsoft if I can in any way avoid it.

All that being said, there is, very obviously, a place for closed source and proprietary data formats. Companies like Microsoft, Apple, and Snowflake offer something the open source world has basically failed at for the vast majority of applications: usability, support, and documentation.

Open source is, in many cases, very high quality software. The vast majority of the internet would not exist without open standards, open formats, and open source software. AWS is built entirely on open source software, and so is GCP. The internet only works because of standards, not only in software and data formats, but all the way down to the hardware and networking layers.

But one must often be a technical and technological guru to figure out how to leverage these technologies. I know. I am one of these people. I could never see my own kids using something like Linux on a daily basis as I do. Or the average person configuring open source mail servers and mail clients. And that stuff is child’s play compared to what Snowflake is doing.

Hadoop was garbage. It failed because of that. It was difficult to understand, difficult to implement, difficult to support. Snowflake solves that. Much like AYX makes data analysis tools for the average person, Snowflake brings data warehousing and analytics to the same people.

As an IT person I would much rather deal with a subscription service for my data storage and manipulation than deal with having to build or support my own data lake. Building such infrastructure is painful, tedious, and not at all fun. Supporting it, securing it, and maintaining it is a thankless job. And there is so much that can go wrong. And, as soon as requirements change, you’ve got people yelling at you to morph an inflexible infrastructure into something it was never meant to be.

As an executive of a company where my core competency is NOT building data centers (physical or virtual), I want my IT people focusing on building, improving, scaling, and delivering our core competency to our customers. Not wasting time and money fighting with some Rube Goldberg infrastructure of our own making that is not generating revenue.

I want to pay an electric bill. Not build and maintain the power plant.

Snowflake will never lose out to open source solutions from any company that understands they don’t want to own the power plant. They may eventually be forced into open standards, and that’s a good thing for the customer, because it enables portability and prevents lock-in. But, if SNOW can maintain the technological edge while embracing open data format standards, they’ll be just fine.

Microsoft has yet to lose out to any competitor just because they support open standards with Office, despite the plethora of open source and free alternatives which can import their file formats.