Part of my thesis for owning Cloudflare (NET) is believing they can accomplish their vision of becoming not just a platform (which they already are) but a full public cloud. This, along with the similar news about Snowflake (SNOW) last month, says to me that some very important players share that confidence.
The other reason this interests me is that I have been patiently waiting for a Databricks IPO for a couple of years now. IPO-land has been on lockdown for at least a year, but with some signs that the market might be picking up, Databricks may yet come into the mix. At least I’m pleased to see them included among NET’s partners.
This part is big, and I believe it will be a driver for R2 growth:
Databricks will now support Delta Sharing from Cloudflare R2, Cloudflare’s zero egress, distributed object storage offering. This seamless integration enables data teams to share live data sets in R2 easily and efficiently, eliminating the need for complex data transfers or duplications of data sets, and with zero egress fees.
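Delta Sharing is an open protocol with an open-source Python client, so a consumer can pull a shared table straight into pandas. A minimal sketch of how a consumer addresses a shared table (the profile path and the share/schema/table names here are made up for illustration):

```python
# A Delta Sharing consumer addresses a table as "<profile-file>#<share>.<schema>.<table>".
# The names below are hypothetical placeholders.
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    return f"{profile_path}#{share}.{schema}.{table}"

url = sharing_table_url("r2-demo.share", "crm_share", "sales", "contacts")

# With the open-source client installed (pip install delta-sharing) and a real
# profile file issued by the provider, loading the live table is one call:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
print(url)  # r2-demo.share#crm_share.sales.contacts
```

The point is that the consumer never interacts with R2 directly; the profile file and URL are all they need, regardless of where the provider hosts the underlying Parquet files.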
Imagine you are, say, HubSpot or another CRM company with 100TB of data hosted on Snowflake Data Marketplace or Databricks Delta Share to let others integrate with your data platform.
AWS’s egress fee is $0.09 per GB, which comes to $9,000 for 100TB. Integrations tend to run daily, so the egress fee alone runs $9,000 a day, or roughly $3M a year – on top of the storage fees you’re already paying AWS. Even with AWS’s tiered pricing (the more you use, the less you pay per GB), it still comes out to around $2M/year just for network bandwidth. By hosting data on R2, you eliminate this cost. This is the value proposition for Cloudflare clients.
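The back-of-envelope math above can be sketched as follows, using the flat $0.09/GB rate and the post’s round numbers (AWS’s tiered rates would bring the annual figure down toward the ~$2M mentioned):

```python
# Back-of-envelope AWS egress cost for a 100 TB data share pulled daily,
# at the flat $0.09/GB internet egress rate (tiered pricing is lower).
GB_PER_TB = 1_000            # decimal, matching the post's round numbers

data_gb = 100 * GB_PER_TB    # 100 TB transferred per integration run
rate_per_gb = 0.09           # USD per GB egressed

daily_cost = data_gb * rate_per_gb   # one run per day
annual_cost = daily_cost * 365

print(f"daily:  ${daily_cost:,.0f}")    # daily:  $9,000
print(f"annual: ${annual_cost:,.0f}")   # annual: $3,285,000
```

R2’s zero-egress pricing takes that entire line item to zero; the provider still pays for storage, but not for every consumer pulling the data.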
Databricks and Snowflake are the two companies sitting on top of the AI craze with actually useful products. Especially Databricks – they are basically the only option for petabyte-level (1000s of TBs) data work. So the fact that they are starting to integrate with R2 means we may start seeing petabyte-level data operations on Cloudflare’s platform. It was only a matter of time before we saw this.
Great question Bear. The reason is that very few people have products that use only the “storage” part of AWS. Most people use AWS to run website backends, and Cloudflare’s offering there is extremely lacking compared to AWS or any big cloud as of now. For a large exodus to happen, they need quite a few product rollouts to catch up.
Snowflake and Databricks Data Shares are two of the same special kind of product that basically only uses storage – the only goal is to host data for others to grab – which is quite different from most companies’ day-to-day use of the cloud. BUT I think it makes sense for data science/machine learning teams to use R2 as their data source instead of S3, and I’ve started to hear of people doing it this way.
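That S3-to-R2 swap is low-friction because R2 exposes an S3-compatible API, so existing S3 tooling mostly just needs a different endpoint. A sketch of the idea, assuming boto3 and placeholder credentials (the bucket and key names are hypothetical):

```python
# Cloudflare R2 speaks the S3 API at https://<account_id>.r2.cloudflarestorage.com,
# so S3 clients can be repointed by swapping the endpoint URL.
def r2_endpoint(account_id: str) -> str:
    return f"https://{account_id}.r2.cloudflarestorage.com"

ACCOUNT_ID = "your-account-id"  # placeholder

# With boto3 installed, an ML pipeline would repoint from S3 to R2 like so:
#   import boto3
#   s3 = boto3.client(
#       "s3",
#       endpoint_url=r2_endpoint(ACCOUNT_ID),
#       aws_access_key_id="...",       # R2 API token key
#       aws_secret_access_key="...",   # R2 API token secret
#   )
#   s3.download_file("training-data", "features.parquet", "/tmp/features.parquet")
print(r2_endpoint(ACCOUNT_ID))
```

For a team repeatedly pulling training data, reads from R2 carry no egress charge, which is exactly the cost the post above describes.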
What’s interesting here is that Databricks is now going directly after Snowflake’s Marketplace (aka Snowflake Data Cloud). Look at the Databricks press release instead of the Cloudflare one, and you see that Cloudflare is just one of four vendors supporting Databricks’ Data Sharing (the other three are Dell, Oracle, and Twilio).
Part of Databricks’ plan is to make their standard an “open standard,” which implies that they would not control it, at least not exclusively. I view this as a shot across the bow of Snowflake, which has a proprietary sharing protocol. One question I have is about the revenue accounting. I believe Snowflake makes it pretty easy to profit off your data sharing, and provides good granular controls over who gets what and how often. I don’t know whether the Databricks standard provides any sort of back-end for revenue collection (and if so, who controls it?), what level of control companies have over sharing/not sharing, or how it accounts for who owes what to whom.
Presumably, Databricks has thought this through, but I’ll wait for someone more knowledgeable to report back.
Some great points. I’m not as deeply familiar with Databricks as I am with Snowflake so some of what I’m about to say here could be inaccurate. Anyone who knows better, please correct where I’m wrong.
The fact that DB is opening up to several vendors tells me it’s not as easy to use as Snowflake’s. You likely need to do some configuring (such as creating and managing your Cloudflare account, perhaps creating Cloudflare buckets, etc…). I’m not sure, but that’s my interpretation. In addition, if you haven’t read Peter’s writeup on Snowflake yet, do that. He provides information on competitors I was unaware of. One point was that Databricks creates copies of the data for sharing, where Snowflake does not. Important distinction. Meaning, as Smorg said, if the share-er stops sharing with the share-ee, the sharee likely keeps the data they had, because it’s in their account and they control it.
Summary: It’s a good step for DB, but IMO the Snowflake one (turnkey, more control for the sharer, and not duplicating data as much – details here except for copies across regions) is still likely better.
I would be surprised if the “sharee” can’t make a copy (export to Excel or whatever) even if Snowflake doesn’t make one. Also, with either DB’s or SNOW’s method, if the “sharee” stops paying for the data, they may have their copy, but it is static, and they won’t know when the data is updated by the “sharer.”
You are correct. There is nothing preventing the sharee from making a copy. Overall, sharing is intended to make it easier to combine disparate data sets, replacing APIs and such (Salesforce with ServiceNow with other data).
So yes, whatever data sharee copies over, will still be theirs.
According to the Databricks press release I linked to above:
Partners can share live access to data, AI models and notebooks directly with consumers without costly or complicated replication.
So, this sounds like a new development from Databricks, unless DB is playing games and saying customers don’t need to replicate because the sharing mechanism does the replication for them.
I do wonder how Snowflake enforces what Offringa describes: “The recipient can’t “keep” a copy of the data after the partnership ends. …Access to data for any partner can be immediately revoked without having to request that the partner “delete their copies”.”
That said, I would expect that data sharing agreements are structured so that consumers cannot keep nor use copies of the data after the license period ends.
First, Smorg, thanks for that info about not copying data in DB environment. Very interesting… I wonder how they’re implementing & doing it under the covers…
In Snowflake, a share is represented as a database in the sharer’s Snowflake account. The sharee does NOT pay anything to have that data there: zero in data storage, zero CPU, zero in egress. This is because the data isn’t moved – just pointers and metadata are utilized. Snowflake’s database storage architecture versions everything, so with data sharing it’s just making that same version available to whomever the data is being shared with. No copying of data or anything like that.

For techies, conceptually it’s similar to source control: the main ‘branch’ will continue to have additions (and the sharees will be able to view them). This is also what Snowflake leverages for a lot of their other tech within the stack, like zero-copy cloning, which is often used to refresh nonprod data in the snap of a finger instead of restoring or some other method of moving data… oh, I’ve gotten WAY off topic. Sorry y’all. I can talk about this stuff all day!!
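A toy model of that pointer-based sharing idea, purely conceptual (this is my own illustration, not Snowflake internals): the provider keeps immutable versions, and a “share” is just metadata granting read access to them, so granting or revoking never copies or deletes data.

```python
# Conceptual sketch of share-by-pointer: immutable versions plus a grant list.
# Not Snowflake's actual implementation - an illustration of the idea only.
class VersionedTable:
    def __init__(self):
        self.versions = []       # immutable snapshots, like commits
        self.readers = set()     # accounts granted the share (metadata only)

    def append(self, rows):
        latest = self.versions[-1] if self.versions else []
        self.versions.append(latest + rows)   # new version; old ones untouched

    def grant(self, account):
        self.readers.add(account)             # no data moves

    def revoke(self, account):
        self.readers.discard(account)         # instant; nothing to delete

    def read(self, account):
        if account not in self.readers:
            raise PermissionError("share revoked")
        return self.versions[-1]              # always the live, latest version

t = VersionedTable()
t.append(["row1", "row2"])
t.grant("sharee")
live = t.read("sharee")   # sharee sees current data; zero copies made
t.append(["row3"])        # sharee's next read sees the addition
t.revoke("sharee")        # access gone immediately
```

Anything the sharee exported before the revoke is theirs (as discussed above), but the live, versioned view disappears the moment the grant is removed.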
Anyway, if a share is revoked in Snowflake, that database no longer has the tables that are shared from the sharer. Any data the sharee had copied to other places (either in that same database as a separate table or another one), is still owned & controlled by the sharee, not the sharer.