There has occasionally been some discussion about the low-level technical details of Fastly (FSLY) vs Cloudflare (NET), especially around the different runtimes they use (the runtime being the component that executes software on their edge servers) and the potential significance of those differences even for their businesses.
Recently, both FSLY and NET blogged about porting the iconic Doom first-person shooter game onto their platforms. While getting Doom running on a platform doesn’t yet tell much (it has been ported to run on everything from refrigerators to supercomputers) their blog posts offer us a chance to evaluate the different approaches they took and analyze the outcomes a bit.
I’ll try to explain below, hopefully in language that non-technical people will also understand, why I consider Cloudflare’s effort more impressive than Fastly’s. I will also try to explain the architectural differences between their approaches and platforms, and why this shows that the often-discussed startup time of the runtime (FSLY’s Lucet vs NET’s V8) is only one small aspect of much larger architectural discussions.
Before the actual comments about the porting efforts, I want to point out that while the V8 runtime (developed originally by Google) is best known for being used as part of the Chrome browser, Cloudflare is definitely NOT running Chrome or any other browser on their edge network. Based on comments on Twitter, this is not clear to everyone.
Let’s start with Fastly’s effort which is described at https://www.fastly.com/blog/compute-edge-porting-the-iconic-….
In the case of Fastly, what they basically did is run the game on their Compute@Edge one frame at a time. First, an instance of the game on Compute@Edge sends a framebuffer (basically an image to be displayed) to the end-user’s browser. Next, the browser on the end-user device merely displays the image and then makes a new request with a state update (e.g., “user pressed key X”) back to the edge, where a new instance of the game is again run for one frame. So on their edge the game is started, the new state is calculated, a new frame is created and sent to the user, and the game is stopped. They specifically explain in the blog how they piggyback state into the framebuffer, store it locally in the browser, and then pass it back and forth, because nothing is persistent on their platform side and so the state is not saved anywhere there. The user of course sees the game progressing as usual.
My take-aways from Fastly’s solution: 1) Nobody builds games like this. No matter how fast the runtime starts, starting and stopping it constantly and passing frame images over the network as updates is simply something you would normally never do. 2) In their solution the platform does not know anything about the state (they piggybacked state into the frame as a workaround). For this sort of demo it may not be a big thing, but for real-world applications the concept of state can be crucial. 3) They have no multiplayer support, and without the platform handling updates across players it wouldn’t be possible to implement it. 4) This does illustrate how Fastly’s platform can receive an individual request, do some computation on the edge very, very quickly, and then provide a response to the client.
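For the more technically inclined, the request cycle described above can be sketched roughly as follows. This is only my own minimal illustration in Python, not Fastly's actual code: the names GameState, render and handle_request are hypothetical, the "frame" is a row of text instead of an image, and the state travels next to the frame as a JSON blob rather than being embedded inside the framebuffer as in their demo.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

WIDTH = 8  # width of the toy "screen" (assumption for illustration)


@dataclass
class GameState:
    x: int = 0     # player position
    tick: int = 0  # number of frames computed so far


def render(state: GameState) -> str:
    """Draw one toy frame: a row of dots with the player shown as '@'."""
    pos = max(0, min(WIDTH - 1, state.x))
    return "." * pos + "@" + "." * (WIDTH - 1 - pos)


def handle_request(state_blob: Optional[bytes], key: str) -> Tuple[str, bytes]:
    """One edge invocation: start, advance one frame, respond, stop.

    Nothing survives between calls on the edge side. The state arrives
    from the browser and is handed straight back together with the frame.
    """
    state = GameState(**json.loads(state_blob)) if state_blob else GameState()
    if key == "right":
        state.x += 1
    elif key == "left":
        state.x -= 1
    state.tick += 1
    return render(state), json.dumps(asdict(state)).encode()
```

The browser side would then just loop: show the frame, keep the returned blob, and send it back with the next key press. Every pass through that loop is a fresh start of the edge program, which is why startup time matters so much in this design.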
So Fastly indeed benefits from the super-fast startup time of Lucet as the game is started for each and every frame update for each and every user playing the game separately. But that is needed only because the game runs one frame at a time for each user on the edge.
Cloudflare’s approach is documented here: https://blog.cloudflare.com/doom-multiplayer-workers/.
Cloudflare is running the game in the end-user browser. So the approach is completely different. They are running a message router on their edge which is responsible for communications between multiple players. The message router is started only once, not constantly as in the case of Fastly, so the startup time of the edge application or runtime becomes irrelevant with their approach. And since they are passing event messages between players (instead of frame images), the approach would be much better suited for real-time games.
My take-aways from Cloudflare’s solution: 1) Their blog post reads like an example of how to build applications for their platform; Doom just happens to be the application they used as an example. 2) Their approach utilizes a platform feature to handle networking across multiple players. 3) Application (game) state is not piggybacked in frame images; instead, the Durable Objects feature of their platform is utilized. 4) It is unclear to me why Cloudflare is using the first person’s game instance as the Doom server instead of running it alongside the message router on the edge, when they specifically write that end-user devices should not run security-sensitive components. (Think of the horror of someone altering the high score table!)
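To make the contrast concrete, here is a rough sketch of what a message router does. Again this is my own simplified Python illustration and not Cloudflare's code: the MessageRouter class and its join, send and drain methods are hypothetical, and plain in-memory lists stand in for the network connections a real router would hold.

```python
from typing import Any, Dict, List, Tuple


class MessageRouter:
    """A long-lived object: created once, then relays events between players."""

    def __init__(self) -> None:
        # One inbox per connected player; this state lives as long as the router.
        self.inboxes: Dict[str, List[Tuple[str, Any]]] = {}

    def join(self, player_id: str) -> None:
        """Register a player so that they start receiving events."""
        self.inboxes[player_id] = []

    def send(self, sender: str, event: Any) -> None:
        """Forward a small event message to every player except the sender."""
        for player_id, inbox in self.inboxes.items():
            if player_id != sender:
                inbox.append((sender, event))

    def drain(self, player_id: str) -> List[Tuple[str, Any]]:
        """Hand a player all events queued for them and empty their inbox."""
        messages = self.inboxes[player_id]
        self.inboxes[player_id] = []
        return messages
```

The router is constructed once and then only shuffles small event messages around, so how long it took to start is paid a single time and never shows up again during the game.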
Perhaps a bit ironically, Fastly should be about doing things fastly, but their blog ends by noting that “this would not be an ideal solution for running a real-time game requiring game updates at a timely frequency”, while Cloudflare just published a new blog post on how to build real-time games on their platform with Unity: https://blog.cloudflare.com/building-real-time-games-using-w….
So we see how the startup time of the runtime is only a small piece of the puzzle, and the answer to whether it matters is “it depends”. Fastly obviously benefits from the super-fast startup time in their approach, but for Cloudflare the startup time is not relevant since the components are started only once in their solution.
I think the very different approaches illustrate somewhat the different mindsets they have. Fastly approaches this from the perspective of a device sending requests to the network for updates and their edge providing those. Lucet’s quick startup time helps here since it’s being started and stopped countless times, separately for each and every player who would be playing the game at the same time. But with Cloudflare’s approach the startup time is a moot point since everything is started only once, and they showcase the features of their platform as part of this effort. Cloudflare’s approach is more like a demonstration of their platform capabilities. But on the other hand, Fastly’s approach also demonstrates how they can handle lots of (individual and isolated) requests requiring computation on the edge very quickly - probably faster than anyone else.
It should also be mentioned that Fastly is developing additional capabilities for their platform, and that something like Durable Objects on Cloudflare is still in the beta testing phase, so it is not recommended for business-critical use yet. But again, additional aspects like these just underline how the runtime startup time is not one magical metric for evaluating their platforms, contrary to what some would like to think.
With all that said, it’s good to keep in mind that there is so much more that contributes to the success of a company and an investment than some low-level technicalities. It’s probably best to let the numbers speak for themselves while still making sure there are no dark clouds gathering on the horizon. And based on the above I don’t see anything that would give me concerns around Cloudflare’s architecture, platform, or future - quite the contrary.
Long NET, keeping FSLY on watchlist.