SNOW vs Databricks orchestration

From Business Insider…

Snowflake’s reliance on other companies may end up as a liability as its top rival Databricks moves into another billion-dollar big-data business
Matthew Lynley 1 hour ago

Databricks and Snowflake rely on orchestration, a process that schedules data-crunching tasks.

Databricks launched a product that includes orchestration, opening another front in the rivalry.
This leaves Databricks with an opportunity to capture another big market and edge out Snowflake.

Like the rest of the market, the data-warehousing giant Snowflake has seen its value crash, plummeting to a $44 billion market cap from a high of more than $110 billion a few months ago. And as the company looks to grow by expanding into new areas, such as machine learning, its rival Databricks is quickly building its own efforts across the entire data-analysis process.

Now a new battleground between the two companies is emerging at a critical time when Snowflake is under pressure to regain that lost growth. Orchestration, an early step in the data-crunching process, is also one of the most critical parts of the workflow for both companies. Databricks recently built its own orchestration tool, opening an opportunity for the company to capture another big market and edge out its primary rival.

Orchestration tools drive large workloads to companies such as Snowflake and Databricks. Databricks’ new orchestration tool, Delta Live Tables, gives the company a way to continue growing into an all-in-one tool, going into direct competition with Snowflake’s Tasks.

Primarily seen as a growth stock, Snowflake has aggressively expanded into machine learning. But orchestration is one of many areas of the data pipeline. Snowflake could indeed find opportunities for growth if it were to expand its presence there, said Daniel Newman, the principal analyst at Futurum Research.

“Growth at any cost is on hold,” he said. “Efficient growth from net revenue expansion and measures to decrease costs and expand margins will be in focus. Investors want to see earnings or at least a strategy to getting positive earnings and cash flow.”

Snowflake did not respond to a request for comment.

Startups such as Astronomer and Prefect already proved orchestration is a billion-dollar business
The first step in using data is scheduling the order in which the data is processed. Data engineers use orchestration tools to define the precise order in which these steps happen, detect whether any point in that process fails, and tell the system what to do when it does.

Various tools exist to help companies with this process. The most popular is Airflow, an open-source tool primarily maintained by an Ohio startup called Astronomer.

Astronomer, the largest startup in the space, was recently valued at more than $1 billion in a secondary round that was part of its most recent series C round, people familiar with the deal told Insider. But its history with Airflow is somewhat controversial — depending on whom in the industry you talk to, Astronomer either pushed its way into managing Airflow or revived an open-source tool that was stagnating.

Still, the round places a billion-dollar peg on the part of the data-processing stack that Databricks has its own product in, while Snowflake relies on partners such as Astronomer and Prefect. It also puts Astronomer in rare company with other unicorn-level big-data startups, including Hugging Face, dbt Labs, and Dataiku.

How orchestration is becoming a battleground between Databricks and Snowflake
Like many parts of the modern data stack, Databricks has its own product and works with external partners. Astronomer and Prefect are both options for Databricks customers, but they can also use the internally built Delta Live Tables.

Snowflake’s scheduling tool, Tasks, doesn’t necessarily satisfy all the needs of a strong orchestration tool either, said Jeremiah Lowin, Prefect’s CEO. Instead, a lot of the value relies on seeing where tasks are failing. Prefect focuses on alerting companies as quickly as possible and providing backup operations when something fails.

“We view scheduling as a commodity and not orchestration per se,” Lowin said. “The ability to run things at a certain time has been pretty maxed out on the innovation front.”

While many companies may opt to use those external tools, either through legacy adoption or specific needs, Databricks having its own product gives it another edge on Snowflake. Delta Live Tables, in particular, focuses on automating as much of the process as possible, giving customers yet another “set it and forget it” option if they want to keep their process as simple as possible.

“If you’re just running jobs, that’s great, but customers want to see what those jobs are doing,” said Ali Ghodsi, Databricks’ CEO. “If they’re failing, can you automatically fix it and automatically rerun it and optimize it. Our customers pushed us to have a better understanding inside the jobs — that’s why we built Delta Live Tables.”

Still, companies such as Databricks extending into new areas doesn’t necessarily mean they will automatically win in that business, said Ethan Kurzweil, a partner at Bessemer Venture Partners and an investor in Prefect. That might still enable Snowflake to fend off Databricks as it continues to work with multiple companies.

“I honestly don’t worry about the basic extension tools of the core platform technologies, as that’s always the starter edition of the more fully featured offering from a dedicated vendor like Prefect,” Kurzweil said. “The true high-value use cases for orchestration will continue to be more complex code-based orchestration and a UI that reflects the workflow of all of the various flavors of data team.”


The article’s point that Databricks will edge out anyone with an orchestration product is… overblown.

Orchestration at its core means “I want to run three jobs X, Y, Z in this particular order at this time.” There are many tools around it, but most people care about two questions: “how do I know when something failed?” and “how do I monitor its progress if it takes a long time?” It is essential to data engineering, but I can’t see how it’s relevant to this narrative of Databricks vs Snowflake.
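That core loop — run jobs in a declared order, notice failures, retry or give up — fits in a few lines. Here is a minimal Python sketch; the step names, data, and retry policy are hypothetical, not any vendor’s API:

```python
# Minimal sketch of what an orchestrator does: run steps in a fixed
# order, detect failures, and decide what to do when a step fails.
# All step names and the retry policy are hypothetical.

def extract():
    return [1, 2, 3]              # e.g. pull raw rows from a source

def transform(rows):
    return [r * 2 for r in rows]  # e.g. clean / reshape the data

def load(rows):
    return len(rows)              # e.g. write to a warehouse, report a count

def run_with_retry(step, retries=1):
    """Re-run a failing step a bounded number of times, then surface the error."""
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception:
            if attempt == retries:  # out of retries: let the failure propagate
                raise

def run_pipeline():
    rows = run_with_retry(extract)
    rows = run_with_retry(lambda: transform(rows))
    return run_with_retry(lambda: load(rows))
```

Real tools like Airflow, Prefect, and Dagster layer scheduling, alerting, and a UI on top of exactly this loop — which is why the scheduling part alone is, as Lowin says, a commodity.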

The biggest competitors in the space, ordered by popularity, are

  1. Self-hosted solutions, mostly with Airflow; some still use Oozie.
  2. Managed Airflow native to cloud providers. Both AWS and GCP have managed Airflow services: AWS MWAA (…) and GCP Composer. Azure does not have one.
  3. Managed companies like Astronomer (based on Airflow), Prefect, and Dagster (the latter two are their own technology, and both are very good).
  4. If you are a one- or two-person shop, you may as well use a simple tool like cron.
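For that one-person-shop case, a single crontab line is often the whole orchestrator. A sketch (the script path and log file are hypothetical):

```shell
# crontab entry: run the nightly pipeline at 02:00 every day,
# appending stdout and stderr to a log so failures can be inspected
0 2 * * * /opt/etl/run_pipeline.sh >> /var/log/etl.log 2>&1
```

The trade-off is exactly the one the article skips over: cron gives you scheduling but no failure alerting, retries, or progress monitoring.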

The problem is that the space is much smaller than the article implies. After Airflow 2.0 came out, many of the Airflow 1.0 problems that basically gave rise to Prefect and Dagster are now solved, for free. So more and more people are going to 1 and 2 instead of 3. And Databricks, with its new product, will need to compete against 1, 2, and 3. Why would I switch?

Both Databricks and Snowflake are solving important problems with long runways in their primary products. I think Snowpark (Snowflake’s offering, which just became generally available in February) vs Spark (Databricks’s primary product) is more likely where the two would collide, and Databricks has a huge head start there.

In my opinion, this is like Snowflake buying Streamlit: it tries to expand the ecosystem, but as a product it won’t have a material impact on revenue.

The only company I know of that is trying to do everything data, end to end, as a package is Palantir, but that is a separate topic.