SNOW: Francis Crick Institute

There’s a nice article linked above, published in Computer Weekly last week, that highlights the Francis Crick Institute and SNOW.
I found it an interesting case study of how Snowflake can quickly embed itself into the critical workings of a data-driven organization.

The Francis Crick Institute is a biomedical discovery institute that was established with funding from a UK government agency, UK universities, and charities. It has partnerships with organizations around the world; the article linked above specifically cites 1,400 institutions across 90 countries.
They deal with patient data, so data security and privacy are paramount (including compliance with HIPAA and GDPR).

To increase the scalability, efficiency, and security of its data infrastructure, the Francis Crick Institute chose to ditch its on-premises data centers in favor of the cloud, specifically Snowflake’s Data Cloud.
From the institute’s CIO:

“In scientific research, you used to have to log into a file store. Snowflake offers collaboration as a shared service, with real-time access control and workflow for auditing.”

They use all three public clouds that underlie Snowflake (AWS, Azure, GCP).
Doing so has given their researchers real-time access to data in a standardized and secure manner.
Gone are the days of highly local and unnecessarily complex project frameworks and infrastructure, where solutions were pieced together haphazardly. Now infrastructure can be set up in around 30 minutes instead of one to two YEARS, thanks to Snowflake.

"…by using Snowflake, the IT function at the Francis Crick Institute can offer a more permeable environment to enable researchers to work within an ecosystem. “Developing the infrastructure for a complex global consortium used to take as a one to two-year undertaking,” he says. “Now a secure, auditable environment can be built in around 30 minutes.”

What’s great is that the “Data and Analytics Research Environments UK (DARE UK) consortium led by the Francis Crick Institute” is working on a standard platform for research consortiums - where Snowflake is the centerpiece for data:

"The proposed platform is designed to support the creation of secure trusted research environments on-demand, built around the needs and restrictions of individual research projects. The architecture is based on a set of core components that include Snowflake for the data sources, Apache Airflow for workflow, DBeaver as the data extraction, translation and load component, Okta provides user authentication and ServiceNow offers additional controls.

Out of curiosity, I tried to estimate how much the Francis Crick Institute is spending each year on the cloud.
The above article indicates that they have only 88 people in their IT department, and yet, according to their 2021 financial statements (Annual reviews and reports | Crick), they spent $16 million USD on IT.

I think with these figures, it’s very possible that this organization could be (or become) one of Snowflake’s $1 million+ spenders. There are projections that 51% of IT budgets will be spent on the cloud by 2025 (IT spending will be mostly cloud soon. Are you ready? | InfoWorld). I wouldn’t be surprised if at least $1 million of the $16 million budget is already going to SNOW for the Francis Crick Institute!
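
As a rough sanity check on those numbers, here is a minimal Python sketch of the back-of-envelope math. It takes the $16 million IT figure and the 51% cloud projection exactly as quoted above, and it assumes (loosely, since neither source says this) that an industry-wide projection can be applied to a single institute's budget. The function and variable names are purely illustrative, and none of this is actual Crick or Snowflake spending data.

```python
# Back-of-envelope check of the figures quoted in the post.
# These are NOT reported Snowflake spending numbers for the Crick.

IT_BUDGET_USD = 16_000_000       # 2021 IT spend as quoted above
CLOUD_SHARE_2025 = 0.51          # industry-wide projection as quoted above (assumed to apply here)
SNOW_THRESHOLD_USD = 1_000_000   # the "$1 million+ spender" bar


def projected_cloud_spend(budget: float, cloud_share: float) -> float:
    """Dollar amount of a budget that would go to cloud at a given share."""
    return budget * cloud_share


def share_needed(threshold: float, total: float) -> float:
    """Fraction of `total` that `threshold` represents."""
    return threshold / total


cloud_spend = projected_cloud_spend(IT_BUDGET_USD, CLOUD_SHARE_2025)
share_of_it = share_needed(SNOW_THRESHOLD_USD, IT_BUDGET_USD)
share_of_cloud = share_needed(SNOW_THRESHOLD_USD, cloud_spend)

print(f"Projected cloud spend at 51%: ${cloud_spend:,.0f}")
print(f"$1M as a share of the IT budget: {share_of_it:.2%}")
print(f"$1M as a share of projected cloud spend: {share_of_cloud:.1%}")
```

Under those assumptions, $1 million is 6.25% of the $16 million budget and roughly 12% of the projected $8.16 million cloud spend, so the guess is at least plausible on paper.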


SNOW makes data sharing faster, cheaper, and more efficient, and makes it possible where it wasn’t feasible before. Developing a drug takes years and billions of dollars. Speeding that up (which data sharing helps enable) will not only reduce costs but also extend the time a drug is on the market under patent protection, which increases revenue and profits. Using SNOW data sharing in an ecosystem is an absolute no-brainer for the life sciences vertical.



New article from two days ago


This will spread to other disease projects, and it will greatly interest biotechnology and pharmaceutical companies. I think the outcome will be that SNOW will completely take over the life science vertical: the more data gets into these SNOW-managed systems and the more organizations take part, the harder it becomes for a SNOW competitor or substitute to gain any sort of foothold.



I like SNOW as much as the next person, but what is your basis for the two claims in bold? Are there some market share numbers for life science data storage to support this?

There are many competing cloud databases, storage and analytical (including data science) solutions. I like SNOW’s chances, but saying “completely take over” seems a stretch. (but I think I’d be ok with it!)

As an example, many, many companies heavily use Microsoft products and a natural path of least resistance (that Microsoft is happy to assist with) is to migrate from Microsoft on-prem to Azure, including data storage. I’m not saying SNOW can’t win here, but there is real competition. Same goes for MDB.


The reason I predicted that SNOW will take over is that

  1. there are cost, time, and revenue benefits from sharing data as it speeds discoveries

  2. there are privacy, confidentiality, regulatory, and commercial barriers to freely sharing data (for-profit companies want to benefit from sharing, but they won’t share certain aspects because they want to protect trade secrets and patentability). Therefore, there needs to be a gatekeeper to ensure that all these conditions are met.

  3. Given #1 and #2 above, building such a system takes considerable time and money. So once one is established, I think it will become the standard. This is especially true if the data is useful for more than one disease category: for example, data in an arthritis database may have some value for other autoimmune conditions.


Data gravity (some might call it FOMO): they’re already seeing this in the financial services vertical. Here is a quote from CEO Frank Slootman on the latest conference call.

As Mike said, financial is very much driven by the fact that historically in financial services institutions are pumping massive amounts of data around every single night to all these different destinations, especially in asset management and subsectors, you know, like that. So, we really view – I mean, data networking plays out differently in every industry sector and subsector, but they become – there’s a lot of data gravity is what we call it that starts to happen that benefits us enormously. It really lowers the friction of getting access to new accounts, and you see that very pronounced in verticals like financial services.