Apex Legends is known for having one of the most “boring” game launches in history in the best of ways. Jokingly called Uptime Legends, Apex pursued a multi-tiered cloud infrastructure that helped EA deliver one of the smoothest PvP launches in history. Sid Dhulipalla, CEO and Co-Founder of Hathora, and Steven Hartland, VP of Engineering at Rocket Science, join our host, Alex Takei, Director at Ruckus Games, to discuss the role of cloud and server orchestration in the distribution of modern video games and the business surrounding it. 

We chat all things bare metal, peer-to-peer, dedi servers, hybrid setups, and their implications on the player experience and the developer’s cost profile. We learn about the landscape of players from public cloud to bare metal to server orchestration to boutique backend teams that are available to studios across the AAA and indie spaces. We also have the age-old chicken and egg debate of “predictability, cost, and flexibility” as relative unknowns for every game launch despite being major components of any game's financial health. This episode is about the backbone of all video games - tune in!

Grid

Big thanks to GRID for making this episode possible. GRID is a game data platform providing esports data infrastructure, analytics, and distribution solutions to leading game publishers including Riot Games, Ubisoft, and KRAFTON. If you're a fan, developer, or entrepreneur with an idea for a live data-powered project, make sure to apply for GRID Open Access, get free access to official data, and start creating today! To learn more, visit: https://grid.gg/?utm_source=naavik.co&utm_medium=media&utm_campaign=Naavik


This transcript is machine-generated, and we apologize for any errors.

Alexandra: What's up, everyone? Hello. Welcome to the Naavik Gaming Podcast. I'm your host, Alex Takei. And this, of course, is Interview and Insights. Today, our focus is on the role of cloud and server orchestration in the distribution of the modern video game.

The menu of options studios have and the financial implications of those choices, i.e., the business model around cloud. The topic is absolutely massive for any server-hosted game (think Minecraft) or online multiplayer game (think League of Legends or Sea of Thieves). Cloud services and server orchestration make or break the experience.

It does not matter how good the game is unless a playable experience is delivered into the hands of gamers. We're going to open with clarification around what this episode is not about, then cover why we should care at an operating level about the cloud stack, precisely what a cloud stack is, an overview of the options that studios have, and an overview of the providers, and discuss the different business models present in the cloud business and how your cloud strategy should change over the course of a game's life cycle.

We've got a ton to get into today. So we're just going to jump right in and introduce our guests, who are experts in this field. First up, my homie, Sid Dhulipalla, founder and CEO of Hathora, a company that's powering server orchestration and the cloud backend of many up-and-coming VC-backed studios and supporting launches like Frost Giant's recent Stormgate beta. Welcome to the pod, Sid.

Sid: Alex, thanks for having me on.

Alexandra: Awesome. And next I have Steven Hartland who has a long technical career in server management, event orchestration, and all things gaming technical infrastructure. Steven, now at Rocket Science, previously spent 20 plus years at Multiplay and has a background in enterprise hosting at Electronic Arts. Welcome, welcome.

Steven: Thank you, Alex. Yes, we've worked with Electronic Arts, not actually worked at Electronic Arts, just a little clarification.

Alexandra: Excellent correction. Thank you. Awesome. I'm super glad to have you guys both on. And before we start talking about anything real today, the first thing to do is to get some of our terminology straight.

Cloud gaming, though not as ubiquitous as buzzwords like metaverse and AI, means a lot of different things depending on who you talk to and the context. And today, as our title indicates, we're here to talk about the business of cloud in gaming, not cloud gaming. Cloud gaming being things like NVIDIA GeForce Now or Google Stadia, rest in peace.

And so Steven, for our audience, I would love for you to kick us off. Do you mind describing what cloud gaming is, so as to make it distinct from what we're actually going to talk about today?

Steven: Yeah, so cloud gaming, as you articulated there, there is a lot of confusion around the term. What we're talking about here today is how do you get your infrastructure up and running?

How do you scale those virtual AstroTurf environments up to ensure that your gamers, your customers at the end of the day, have that space to go and play? And that takes a number of different forms. It can be the core services infrastructure side of things. So think matchmaker, think identity, the ability to just log on, the ability to search for matches.

Or it could be that virtual AstroTurf, that big scale piece where you're playing in an Apex Legends match, or you're playing in League of Legends, that individual instance where you are logged on and you are collaborating with others in that space and attached to a specific piece of compute.

Alexandra: Okay. That's really helpful. Gives us some grounding that we're not talking about cloud gaming. We're talking about cloud in gaming. And with that, and now that we have that kind of cleared away I would love to actually pass it to you guys for some intros. Sid, how about you kick us off? Tell us a little bit about yourself and tell us about Hathora.

Sid: Sounds good. So I guess I come from a very technical background. I studied computer science in college. I was always really interested in the systems side of things, even in college. And then I actually went into Palantir out of school, and Palantir was making a transition away from selling on-prem licenses to cloud-hosted SaaS, and I was asked to get my hands dirty in that transition.

It was a super exciting period for me personally, because I learned a lot about how infrastructure at scale should be run. So everything from what's the best way for the other product teams that were using our platform to have seamless CI/CD, to how do we bring observability back to those teams?

So when something's going wrong in production, they can see all of that. But also, when you're spending upwards of 100 million a year on cloud, you've got to really optimize that, because it really affects the bottom line of the business itself. And there are more things around security and compliance, and it was great to get my hands dirty across all of those dimensions.

I was at Databricks for a couple of years after that, where we were working on multi-cloud. So Databricks as a platform was available on AWS and Azure before I joined, but they had just signed a deal with Google as well. And my team was basically tasked with the responsibility of, hey, can we build a cloud abstraction layer so our internal product teams don't have to rebuild the same thing three times over, but rather they're building against this platform that you guys are building?

And then you guys are responsible for making that platform work across all three clouds, right? And then from there I actually got into gaming because of my co-founder. We've been friends since college, but he got into programming in the first place because he wanted to build games, and then got really deep into building multiplayer games.

And he identified this gap in infrastructure when it comes to building games at scale and being able to provide them to a global audience. What he observed was that a lot of game studios have excellent talent when it comes to finding the fun, which makes sense because that's what studios are meant to do.

But when it comes to all of the supporting set of services behind that, with game launches not going as well, the gap is apparent sometimes. So we wanted to bring a platform to the gaming industry that made launches seamless, let developers focus more on finding the fun, and then ultimately led to a better player experience.

So today Hathora works with a lot of really exciting up-and-coming studios. Can't name all of them yet, but we're excited for some launches coming up, and once the games are out, we'd be happy to talk about that. But as a product, we have servers in 12 different regions, and we take the studio-built game servers and have them deployed everywhere.

And then when you have players in a certain region of the world, you call our API, and we start that instance for you. That's us in a nutshell.

Alexandra: Awesome. Nice. Very nice. Steven, what about you?

Steven: Yeah, I've actually got a little bit of a similar background to Sid. Started at university with a computer science degree with management.

Originally came out of university and did a stint in the content management space. So I worked with a bunch of people on scaling infrastructure, scaling databases, backend services. Back in that day, it was all Oracle databases and things like that. I was always a massive gamer.

Loved playing games. Unreal Tournament 99 was my original game that really got me into the competitive scene and attending LANs. And off the back of attending LANs, I bumped into a company called Multiplay in the UK, and they were one of the biggest LAN providers, still a fairly small company at that particular point in time. To cut a little bit of a long story short, I built some technology that allowed that team to run their LAN infrastructure, both from the competitive side and also game server scaling.

So it added the ability to scale up servers on demand for Counter-Strike, for Doom, for Quake, for Unreal Tournament. And that infrastructure solution that we ran the events on became really popular at the events, and the customers were asking, why don't you do that online? So we spun that off.

We created a solution that was very much a B2C solution back in the day. And that gained, again, more traction in the industry. We got noticed by people like Electronic Arts and supported them with commercial instances, as well as the B2C side of things. So we built that business up, and continued to build that business up.

It was acquired a couple of times over, once by GAME Digital and then a little bit after that by Unity. I worked with Unity for six or so years, continuing to build that out, and that Multiplay product was one of the first game server scalers, game server orchestrators, on the market.

And we've had some great success with that technology. So essentially, I was the CTO of that business for quite a few years, and our big kickoff success was Titanfall 2, which we built out across multiple different cloud providers, and then smash-hit success with Apex Legends.

After I left Unity, I joined the team over at Rocket Science, where we focus on all of the backend service solutions. So from cloud server gaining, but more on a contractual basis, coming into big publishers and big developers, as well as some smaller developers, people like the Gang Beasts team. We work with them, ensuring that they've got that backend knowledge, that backend experience, to create those seamless experiences online.

Alexandra: Alright, from those intros, I am super lucky and privileged to be surrounded by people who have such deep technical expertise in this field, and so I'm really excited, because we're going to get really comfortable with a lot of things that are pretty complicated. And so we did a little bit of definition bingo with what cloud gaming is. Steven, thank you for that. We're going to keep playing this game, and Steven, I'm going to lean on you here again.

There are three terms that I want to get some clarification on, because I think it will help contextualize for our audience some of the things that we talk about later on. And these are the three: utility computing, grid computing, and edge computing. Steven, do you mind describing what each of those three things is for our audience, so that we can get that terminology straight and have context for how we use the terms later on throughout the episode?

Steven: Sure. So grid computing really is about bringing disparate infrastructure together, a bunch of computing resources that are spread over different geographical locations, typically to achieve a single task. So you're combining multiple compute units all to achieve that single specific thing. It's not multiple disparate tasks.

It's a single specific task. In comparison, your utility computing is more akin to what cloud is typically understood as today. So that's provisioning specific blocks of computing resources to specific clients on an on-demand basis. And finally, that edge computing component is about bringing that compute geographically close to the user.

So something that is highly latency sensitive is something that really benefits from edge computing. There are definitely challenges when it comes to edge computing and how well you can use it, because you don't typically have as much of it about, because it is at the edge. It's close to the users.

You don't have massive data centers distributed everywhere.

Alexandra: Interesting. Okay. So, repeating what I heard back: grid computing is like socialist servers, combining lots of different resources to accomplish a single goal. Utility computing is more what we're talking about when it comes to cloud, the business of cloud in games.

It's like public utilities, like water: you get charged for your specific usage. And then edge computing is physically locating those servers near the users, which is, I think, very counterintuitive, because I would think the edge is far away, but the edge is actually close to the user, and that is what matters for games. And so, alright, that's really helpful.

Thank you for giving us that rundown on those three terms. I would love to hear from both of you: it seems obvious that utility computing is the most applicable for games, but what do we think about edge computing as it relates to games, and whether or not that's actually being orchestrated today? Sid, any thoughts here?

Sid: Sure. Yeah, I think edge computing is the one that people think of as typically very helpful for games. And in practice, it's helpful up to a certain level, but taken to its extreme version, it's actually not the most useful of things.

So when you think about latency in games, staying under a 50 millisecond profile, latency to the server and back, is very important. But the difference between 15 and 3 milliseconds is actually not as large, because there are fundamental bottlenecks in other parts of the stack, including just rendering a frame on the user's machine itself.

So when you think about edge computing, there are a few downsides when it's taken to its extreme. Imagine every neighborhood had an edge node that you could schedule matches onto. The problem then becomes that you're going to have a lot of idle capacity all over the world that you'll never truly be able to fill.

So there's a cost problem that you'll run into. And then there's a secondary problem, which is player liquidity, right? So if you're trying to do fancy matchmaking, where you're trying to pull people together based on the language they speak, or rank, or match modes, the pool of players that you have available starts to shrink. And by the time you end up picking five, 10, 64 players, you need to optimize for all of those players and not just the one player who happens to have a node in their neighborhood, essentially.

So that's why a lot of games have gone to the approach of, hey, we're going to pick 10 to 15 regions globally and have large singular pools of compute in each region, and everyone within that region connects to that same pool. And then the difference is a matter of milliseconds for those players, but it's in the realm of very reasonable while balancing costs and other things like matchmaking.
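As an illustration of the regional-pool approach Sid describes (this sketch is not from the conversation), a matchmaker might pick the region that gives the best worst-case latency across everyone in a match, rather than optimizing for a single player. The function name `pick_region`, the region names, and the ping values are all hypothetical; only the 50 millisecond budget comes from the discussion above.

```python
from typing import Dict, Optional

# Hypothetical measurements: player -> region -> round-trip latency in ms.
Pings = Dict[str, Dict[str, float]]


def pick_region(pings: Pings, budget_ms: float = 50.0) -> Optional[str]:
    """Return the region with the lowest worst-case latency across all
    players, or None if no region keeps every player within the budget."""
    if not pings:
        return None
    # Only consider regions that every player has a measurement for.
    regions = set.intersection(*(set(p) for p in pings.values()))
    best_region, best_worst = None, float("inf")
    for region in sorted(regions):  # sorted for deterministic tie-breaking
        worst = max(player[region] for player in pings.values())
        if worst < best_worst:
            best_region, best_worst = region, worst
    return best_region if best_worst <= budget_ms else None


# Made-up example match: three players, three regional pools.
match = {
    "alice": {"us-east": 18.0, "us-west": 72.0, "eu-west": 95.0},
    "bob": {"us-east": 41.0, "us-west": 35.0, "eu-west": 120.0},
    "carol": {"us-east": 27.0, "us-west": 66.0, "eu-west": 88.0},
}
print(pick_region(match))
```

Note that "bob" would be slightly better off in "us-west", but the match as a whole is better served by "us-east", which is the trade-off described above.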

Alexandra: That's pretty interesting. And actually an excellent segue, because this has a lot to do with the player experience and why we care about the cloud stack in the first place. And I think this would be a great time to jump into, why do we care? We're talking about, oh, the difference being 10 milliseconds between each player, but why, as a developer, beyond that, should I care about my cloud stack?

And this is what's going to inform the menu of options and the implications of each option, whether you want to choose true edge compute or the geographically co-located server setup that you just described. What does a cloud provider actually do for me, and why do I care as a game studio about how I select one?

How should I think about that, Sid?

Sid: Yeah, absolutely. So I'd say the number one thing at the end of the day is player experience. If players are not having a good time, then the game is going to suffer. So even when you think about your cloud stack, the first thing that you should be thinking about is, okay, how does this affect player performance?

And there are a few different dimensions within player performance, right? So the first, which we've talked a bunch about, is latency. But beyond that, you have concerns like availability. If suddenly you have 10x the players, do you have enough compute available in that region to handle that, and how quickly can you add new capacity?

And then the last piece is reliability as well on that front. If your players are hopping in and their servers keep disconnecting every 10 minutes or, they have a high like dropped packet rate, it's also a poor experience, right? So performance player experience as a whole is like one bucket.

The second, I would say almost equally as important, but just slightly less important, is cost, because you can have the most cost-efficient thing, but if it leads to a poor player experience, that's still not good. But cost does matter at the end of the day.

And so if you deploy purely on the cloud, there's a set of cost considerations. If you deploy purely on bare metal, there's another. If you don't even use dedicated servers, there's another cost consideration. That's something we'll probably get into a lot later on as well, so I'll leave the costs there.

And then lastly, you know, reasons three to ten, there's a whole bunch of them. The top ones that come to mind are things like developer velocity, right? How quickly can you ship updates to your game? If you're doing seasons, do you feel comfortable launching on a fast cadence, and feel comfortable that when you're doing this you have enough capacity and can handle all the other problems that come along with it?

Then you have concerns like security. Are you patching the underlying machines fast enough? Are your firewall rules set up correctly? Do you have your DDoS protection set up correctly? And there's a whole bunch of other things around security as well. But I'll pause there. Concerns five through ten are probably way lower tier than the first four.

Alexandra: Yeah. But again, a lot of it boils down to, one, the player experience. For security, it's also a player experience question, right? And then the developer experience, which is how much money am I going to be making for the services that I'm providing? Hopefully I'm providing a great game and a great experience, but is my margin going to be so diminished by providing this great experience that I, as a developer, won't be able to continue my vocation?

And that's actually some really helpful grounding around security, latency, reliability, and availability, as you said, Sid. And Steven, I know that you have experience working on some pretty big titles at the hyperscaler level, like you said, in Titanfall and Apex Legends. And I actually recently read an article, in preparing for this episode, about Apex's multi-tiered cloud infrastructure.

And for anyone wondering, the article is aptly named Uptime Legends and is linked in the show notes, or rather, I'll link it in the show notes. And Steven, could you share a little bit about their strategy and how that contributed to the roaring success of the Apex Legends launch?

Steven: Yeah, so as Sid articulated really well there, consistency and player experience are utterly key, right?

But not only that, the ability to scale to the demand again as mentioned and one of the problems there is people will hear the word cloud and, they'll assume cloud is infinite, right? You can just keep requesting more and more cloud. That's actually not the case, right?

When you're at those higher tiers, at those larger scales, you actually need to be collaborating with your cloud providers to ensure that there is going to be enough capacity. So one of the ways that you can reduce the risk, essentially, to you and to your players is to ensure that you've got this kind of tiered solution.

And that was what we built over at Multiplay, which was what we called a hybrid solution. So we had the ability to select regions, as Sid articulated earlier, but where did that capacity for those regions come from? And our initial piece, talking back to the cost element, was coming from bare metal capacity.

So that's like layer one, tier one's worth of capacity. But that takes a long time to provision. It's on a monthly cadence in terms of renewal. So you don't want to buy too much of it. You don't know when you launch a game whether you are going to be wildly successful, moderately successful, or underperform.

So there's a risk in terms of the cost to your business there. And the next tier is your primary cloud provider. Now, from a cost perspective, you might have a deal with one of the major cloud providers. You might have commitments with one of the major cloud providers, where you have to say that you're going to use a specific amount of compute.

It's going to cost X, but you have to guarantee that you're going to use it. And if you don't use it, then you lose it, and you still have to pay for it. And the third tier really is, what is your fallback? In the situation where your primary cloud provider doesn't have enough capacity, what are you going to do at that particular point in time?

So that's your third tier. But there's also an additional reason for having that third tier, having multiple cloud providers and being able to be cloud agnostic, and that's in the case of failures. We had a situation at one of the games comps where there was a major outage on one of the cloud providers over in the U.S., and the Apex Legends solution that we put in place for that team stayed online because we were able to transition our players away from the affected compute and onto a totally different cloud provider's compute, bypassing the outage in its entirety, pretty much seamlessly across the board.

Now there is a cost implication for that, right? Because you're spinning up an entire new set of cloud compute in a different location. But in terms of the impact to your players, you're minimizing that impact. So again, as Sid mentioned there, there's a balancing act that the developer is doing between experience and cost.
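To make the three-tier idea concrete, here is a minimal sketch (not Multiplay's actual implementation, which the conversation does not detail) of filling demand in priority order: bare metal first, then the primary cloud, then a fallback cloud, skipping any tier that is marked unhealthy during an outage. The `Tier` class, the `allocate` function, and all capacity figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tier:
    name: str
    capacity: int          # game server slots this tier can supply right now
    healthy: bool = True   # set to False during a provider outage


def allocate(demand: int, tiers: List[Tier]) -> Dict[str, int]:
    """Fill demand tier by tier in priority order, skipping unhealthy tiers.
    Any demand that cannot be placed is reported under "unplaced"."""
    plan: Dict[str, int] = {}
    remaining = demand
    for tier in tiers:
        if remaining <= 0:
            break
        if not tier.healthy:
            continue
        take = min(remaining, tier.capacity)
        if take > 0:
            plan[tier.name] = take
            remaining -= take
    if remaining > 0:
        plan["unplaced"] = remaining
    return plan


# Made-up capacities, listed cheapest and least flexible first.
tiers = [
    Tier("bare-metal", 700),
    Tier("primary-cloud", 1000),
    Tier("fallback-cloud", 1000),
]

normal = allocate(1200, tiers)   # overflow beyond bare metal goes to the primary cloud
tiers[1].healthy = False         # simulate an outage at the primary cloud provider
outage = allocate(1200, tiers)   # the same overflow now lands on the fallback cloud
print(normal)
print(outage)
```

The ordering of the list encodes the cost preference, and the health flag encodes the failover behaviour described above; a real orchestrator would also have to migrate or drain sessions already running on the affected tier.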

Alexandra: Yeah, that's really interesting. And the way that I'm thinking about that, again from the business side, is that there are debt payment waterfalls where you have your senior secured debt and then you have your mezzanine and everything else. It sounds like cloud is set up similarly, where you might have a stack of hybrid solutions: you have your dedicated capacity, and if you achieve escape velocity beyond that, you bounce into the cloud zone and then go up and up in tiers.

And then in summary, what I'm reacting to is that for Apex Legends, their strategy was to be partnered with, I think, three cloud providers. They were partnered with GCP, Azure, and I think also Amazon. And so the idea here is that just one provider is not enough.

You need to have backups to the backup in the case of failures, to keep the game insulated. So thank you for sharing a little bit about that experience. And again, Apex is obviously one of the biggest multiplayer games, and a lot of why it's been so successful is because the launch was successful: players were able to access the experience and continue to play every day since it launched.

Steven: One of the big challenges there, just to jump in for a second, was how do we test that? How do we get it to the state where we knew it was going to be successful? So that Multiplay architecture that we built out, which supported that game launch, was tested to millions of concurrents in very short amounts of time.

I'm sure there's a video knocking around on the internet somewhere of us showing the figures. It was up into the multiple millions within 15 minutes' worth of capacity. And that really allowed us to give that seamless experience to the users. One of the things that a developer always wants is a boring launch from the infrastructure.

Alexandra: Yeah, definitely a boring launch. Sounds good. Sounds really good. Yeah, I guess let's talk about how we can best find our way to a boring launch. In talking about the cloud stack and the overview of options, we all know the phrase "build or buy," and I would assume there's a lot of gray in between, though. We'll get to this later on.

But one of my favorite things about cloud is that there are not only options to build or buy, but the answer to whether to build or buy also depends on the stage of development. And Steven, since you've been in the engineering driver's seat at Rocket Science and in many of your roles in the past, do you think you could give me an overview of the options in modern-day technical architecture, and maybe give me one to two pros and cons for each of the four setups, which I loosely understand are bare metal, dedicated, peer to peer, and hybrid?

And we talked a little bit about some of them in the Apex Legends example, but.

Steven: Yeah, bare metal is what it says on the tin. It's typically compute which is rented out on a month-by-month basis. The benefit of that, the real big benefit of that, is it typically comes with a large chunk of free, essentially, bandwidth, right? Games are bandwidth hungry. They are not like a website where you can ship tens of thousands of page impressions in a really small amount of bandwidth. You are typically shipping around hundreds of kilobytes per second per player. When you've got large instance servers, you are shipping around a lot of capacity for a large amount of time.

So when you've got those bare metal instances and they come with multiple terabytes' worth of bandwidth to be able to use for free, that is where that big benefit is. The con of bare metal is, how are you going to automate it? How are you going to orchestrate it? How are you going to set it up?

And that is where you need a solution like Multiplay or like Hathora to ensure that you've got that automation that can take care of the heavy lifting of that.

With cloud, you can make the wrong decision in terms of a specific instance size and adjust it really quickly. You can typically spin it down and spin it up within minutes, if not seconds, depending on what you're doing there. So flexibility is the ultimate thing there, but the challenge comes from the opposite side of what bare metal is.

And that's cost of bandwidth, right? So you're paying per byte, per kilobyte, per megabyte, per gigabyte's worth of bandwidth. And in comparison to what you can buy that capacity for wholesale, you're actually paying quite a steep premium for the flexibility. But that flexibility is not just flexibility. It also comes with a great level of service that sits behind it, right?

That sits behind it. So you've got to get really good reliability. You're going to have to a certain extent DDoS solutions built in. So you are weighing the pros and cons there. And the final one that you mentioned there.

Sid: To add a little bit of color to the cost to the bandwidth cost angle, cloud providers have almost a 97 to 99 percent margin on the egress bandwidth charges that they have.

So it costs them almost nothing, but it's more of a strategic play for them to lock the workloads into their cloud platform, essentially. So that's why bare metal providers are able to like, offer that at a significant discount because the underlying cost is basically nothing.

Steven: Yeah. So if you can balance it correctly, you can get most of the best of both worlds.

And that's where the hybrid solution comes in. For the Multiplay solution, which is now run by Unity, we were looking at a 70/30 split to get the optimal cost-to-bandwidth balance. So 70 percent of the capacity would be in bare metal, and 30 percent, that burst capacity, would be hosted in cloud.
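The trade-off behind a split like this can be sketched as simple arithmetic: baseline servers run all month on bare metal at a flat price with bandwidth bundled, while burst servers run only some hours in cloud and pay per hour plus per gigabyte of egress. The sketch below is illustrative only; every price, the egress volume, and the 120 burst hours are made-up placeholders, not figures from the conversation or from any provider.

```python
HOURS_PER_MONTH = 720  # simplifying assumption: a 30-day month


def cloud_server_hour(hourly_price: float, egress_gb_per_hour: float,
                      egress_price_per_gb: float) -> float:
    """Cost of one cloud server for one hour, including egress bandwidth."""
    return hourly_price + egress_gb_per_hour * egress_price_per_gb


def hybrid_cost(baseline: int, burst: int, burst_hours: float,
                bare_metal_monthly: float, cloud_hour_cost: float) -> float:
    """Baseline servers on monthly bare metal (bandwidth assumed bundled),
    burst servers in cloud only for the hours they are needed."""
    return baseline * bare_metal_monthly + burst * burst_hours * cloud_hour_cost


def all_cloud_cost(baseline: int, burst: int, burst_hours: float,
                   cloud_hour_cost: float) -> float:
    """Same demand served entirely from cloud."""
    return (baseline * HOURS_PER_MONTH + burst * burst_hours) * cloud_hour_cost


# Placeholder inputs: 100 servers at peak, split 70/30 as in the discussion.
hour_cost = cloud_server_hour(hourly_price=0.50, egress_gb_per_hour=10.0,
                              egress_price_per_gb=0.08)
hybrid = hybrid_cost(baseline=70, burst=30, burst_hours=120,
                     bare_metal_monthly=300.0, cloud_hour_cost=hour_cost)
cloud_only = all_cloud_cost(baseline=70, burst=30, burst_hours=120,
                            cloud_hour_cost=hour_cost)
print(f"hybrid: {hybrid:.2f}  all-cloud: {cloud_only:.2f}")
```

With different placeholder prices the comparison can come out differently; the point of the sketch is only that the always-on baseline is where bundled bandwidth pays off, and the short-lived burst is where cloud flexibility is worth its premium.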

And the final one that you mentioned there is peer to peer. Now that is a bit of an edge case, because when we were talking earlier about what the player experience is, one of those players is going to get a really great experience because they are the host. But as we all know, the internet connections that we all sit on at home can be particularly unreliable. So you reduce the cost of your overall hosting, because it's free, but you pay a penalty for it.

You are not paying for a server instance, but your player's experience might not be ideal. And that comes in a number of different forms, the first one being latency. You will get what's known as a host advantage for the player that is hosting that server instance, while everybody else is connecting in.

But also, if the host disconnects, everybody's going to fall down unless you've got host migration built in, and then you're adding complexity. So when we ask what the developer experience is, how much effort it is to build these solutions, it starts off simple but adds in complexity when you have to think about things like host migration.

And the one final piece that sits between all of those is when you've got something where you need the security, or the confidence, that somebody can't change the game experience. So take a competition server as an example, where you want to ensure that there's not an aimbot running. And I say ensure, but that's a really hard thing to do. With a dedicated server, you have additional control over what you will let players do. So the server can say no to a player traveling at a million miles an hour and suddenly appearing over there, because it can do validation against it. Sid, anything else to add on those kinds of points?

Sid: Yeah, I think you hit the nail on the head. I would just frame it as maybe two different axes though, right? One is the net code axis, which is peer to peer, or relay, or deterministic lockstep, or server authoritative. And then the other dimension is, if you need servers, the spectrum of bare metal all the way through pure cloud. That said, there's some overlap between the two, but really they're two spectrums, and the decision that you take on the net code impacts some of the options that you have available on the hosting side.
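The kind of server-side check Steven mentions, rejecting a player who claims to have moved impossibly fast, can be sketched as below. This is a minimal illustration of server-authoritative validation under assumed rules, not any particular game's netcode: `MAX_SPEED`, the tolerance factor, and the 2D `Position` type are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


MAX_SPEED = 10.0  # hypothetical limit, in game units per second


def validate_move(current: Position, requested: Position, dt: float,
                  max_speed: float = MAX_SPEED,
                  tolerance: float = 1.05) -> Position:
    """Return the position the server accepts for this tick.

    The requested position is accepted only if it is reachable from the
    current one within dt seconds at max_speed (with a small tolerance for
    timing jitter). Otherwise the server keeps the last known good position.
    """
    if dt <= 0:
        return current
    distance = math.hypot(requested.x - current.x, requested.y - current.y)
    if distance <= max_speed * dt * tolerance:
        return requested
    return current


start = Position(0.0, 0.0)
ok = validate_move(start, Position(0.3, 0.4), dt=0.1)         # 0.5 units in 0.1 s
teleport = validate_move(start, Position(500.0, 0.0), dt=0.1)  # far beyond the limit
print(ok, teleport)
```

A peer-to-peer host could run the same check, but then the player being validated against is trusting another player's machine, which is the distinction drawn above between the net code axis and the hosting axis.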

Alexandra: Absolutely, yeah, and that's actually fairly helpful framing there. Coming from the gaming perspective, what types of games use peer to peer or relay servers versus what you just talked about? Obviously, I would believe that in a competitive game you need the competition server, because you don't want someone to be able to play Jhin and have infinite ultimates in League or something like that.

And Sid, I'll lean to you on this, but can you help distinguish what types of games and genres use each of those options most frequently? And maybe I'll also throw in a little bit of a curveball on games like Minecraft, where players host a server.

Where does that come from? Where do they get it? And are they player servers in the sense that you were talking about, Steven, where the host machine is the host? Or are they contracting with another cloud or server provider?

Sid: Yeah, for sure.

Peer to peer was extremely popular in the 90s, just because hosting servers was ridiculously expensive back then. So if you look at the original Halo game, Halo CE, you would spin it up, you would host a match, and players would connect to your server.

The definition of peer to peer evolves over time, but ultimately what it means is there's no dedicated server somewhere hosted by the game developer, right? And so within peer to peer, you can have relay servers that pass the packets forward. Things like Steam Relay or Photon Fusion are great examples of that.

Or you can have player hosted servers. Minecraft is a great example of this, where I can go run Minecraft anywhere in the world. It could be on my laptop, or I could pay a service like Nitrado to run my player hosted server, but at the end of the day, everyone's connected to a server somewhere.

And then you have the more dedicated model: hey, I'm the studio, and I'm going to be responsible for the network experience for the player and the compute experience as well. And so even within that, there's a spectrum, right? You take a game like StarCraft 2, for example.

StarCraft 2 uses what's called deterministic lockstep, which basically means the set of actions that the player is taking gets relayed to a central server, and that server will filter out actions that it thinks are cheating and then re-emit the actions that it thinks are sane back to all the players, right?

So it's closer to the peer to peer, but like a little further along in the anti cheat and some more credibility of the actions being taken, essentially authenticity of the actions. Okay. It's validation, isn't it, Sid? Exactly, it's validation. And then, you have your pure dedicated server options, and most modern, super successful multiplayer titles all do that.

That includes Halo today, League of Legends, and Fortnite. And in that model, what's actually happening behind the scenes is every player is sending their actions to the server 60 times a second, and then the server basically compiles all those actions, does the physics simulation, and then it re-emits whatever state is relevant to a specific player back to that player.

So just because I tell the server I've moved from one end of the map to the other end, the server says, I don't believe you, there's no action you can take that would make that possible. Most games that are either competitive, or even a lot of live ops titles, where you're selling skins and you want to ensure that players who are appearing a certain way in the game have actually purchased that skin, have all moved towards a dedicated server model.
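As a rough illustration of the validation Sid describes, here is a minimal Python sketch of a server rejecting an impossible move. The names `Position`, `is_move_plausible`, and `apply_move`, and the `MAX_SPEED` value, are hypothetical and chosen only for this example; the 60 updates per second figure comes from the conversation. A real game server would validate far more than raw distance.

```python
import math
from dataclasses import dataclass

# Assumed values for illustration only; a real game would tune these.
MAX_SPEED = 10.0   # fastest legal movement, in map units per second
TICK_RATE = 60     # server updates per second, as mentioned in the conversation


@dataclass
class Position:
    x: float
    y: float


def is_move_plausible(old: Position, new: Position, ticks_elapsed: int = 1) -> bool:
    """Return True if the player could legally have covered this distance."""
    distance = math.hypot(new.x - old.x, new.y - old.y)
    max_distance = MAX_SPEED * ticks_elapsed / TICK_RATE
    return distance <= max_distance


def apply_move(current: Position, requested: Position) -> Position:
    """Server-authoritative update: accept plausible moves, otherwise keep the old state."""
    return requested if is_move_plausible(current, requested) else current
```

With these assumed numbers a player can move about 0.17 units per tick, so a request to jump 500 units in one tick is ignored and the server keeps the previous position.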

Now the downside of dedicated servers is it ends up requiring a lot more compute and bandwidth on the server side. And that's something the studio has to decide if it makes sense for the business model that they're going for. And so in a lot of cases for like indies and even some super successful titles, the unit economics of dedicated servers don't make sense.

Like, it would cost you more to provide dedicated servers for all your players than you would ever make from each player in that game.

Steven: And there is one other component that jumps into my mind about that, and that's about the scale, right? So you think of Fortnite, you think of Apex Legends, where the pure number of players can actually overload the capacity that would be available if that was a peer to peer installation. Yes, we've made great jumps in bandwidth available, but there are still locations which are on typical DSL, so you're talking five meg up, and you simply couldn't provide the capacity for a hundred players if it was being hosted in a peer to peer situation. And that was one of the drivers behind Titanfall, one of our original titles that we supported: they wanted bots. They also wanted not only the ability to run large player numbers, like they did with Apex Legends, but when it comes to bots, they wanted the compute to be able to augment the experience in a way that would otherwise not be possible within a peer to peer or direct connectivity experience.
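Steven's DSL point can be put as back-of-the-envelope arithmetic. The Python sketch below uses the five meg uplink and hundred players from the conversation; the 200 bytes per state update is purely an assumed figure for illustration, and `per_player_kbps` is a hypothetical helper.

```python
def per_player_kbps(uplink_mbps: float, players: int) -> float:
    """Upstream bandwidth available per remote player when one player hosts."""
    remote_players = players - 1  # the host does not send to itself
    return uplink_mbps * 1000 / remote_players


# Figures from the conversation: 5 Mbps up, 100 players in the match.
budget_kbps = per_player_kbps(5, 100)

# Assumption for illustration: 60 updates per second at 200 bytes each.
needed_kbps = 60 * 200 * 8 / 1000

host_can_cope = budget_kbps >= needed_kbps
```

Under these assumptions the host has roughly 50 kbps to give each remote player while each one would need about 96 kbps, so a single home connection falls short before any bot or simulation cost is considered.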

Alexandra: Interesting. I see. Yeah, because in the peer to peer dynamic, you're basically relying on the player's infrastructure and their home country, their neighborhood, their house specifically, which is, I think, really interesting because you're outsourcing that to the player. Very interesting.

It's kinda like VAT tax.

Okay, so we got some grounding on all of these different types of dynamics and options: bare metal, dedicated, where you as a developer are responsible for providing servers and for that experience, peer to peer, hybrid solutions. But I would love to get a little bit of an overview of who the actual providers of these services are, and there are all the big names that we know.

In the cloud space, there's AWS, Azure, and GCP as public cloud. Tencent, as some may know, is investing heavily in its cloud division to provide platform tools to its developers. And these are general cloud providers. Steven, can you tell me a little bit about games specifically? Who are the companies that work with AAA games, who are the companies that work with AA, and who are the companies that work with indie when it comes to cloud services?

Steven: I think, from the cloud perspective, it's all of the usual suspects that you mentioned there: AWS, GCP, Microsoft Azure, and Tencent if you're looking further afield, particularly if you want any kind of support in China. The China firewall makes it really difficult to provide good services from AWS or GCP within the China region itself.

And it doesn't really make too much of a difference whether you're an indie, a double-A, or a triple-A when it comes to those pieces. They all provide a great experience. One of the more standout items that comes up is, how do you orchestrate those solutions, right?

So does the cloud provider have something where you can just go and turn it on, as Sid's company does? If you're the smaller indie and you don't want to have to go and create that orchestration solution, you are going to be looking for a cloud provider that has a native solution for it.

So for example, with Amazon GameLift, you can go and integrate their SDK and put it all together. Whereas the GCP solution is more of an open source solution, which is going to take you a little bit more effort, and you don't have somebody who's going to be able to answer your questions at the drop of a hat.

So you won't be able to, as a triple-A, have that enterprise support agreement, so that when something goes wrong you can reach out and get that level of support. And if you've built your own solutions, that's on you to support. It's only going to be the infrastructure side of things that's covered.

To answer your question, I think you're going to see the same providers used across the whole range. It depends on exactly what that particular customer is used to and what solutions they already have.

Alexandra: And then, I guess, what about the bare metal scene? I don't know anything about those players. Sid, I know that you're working with a lot of them.

Sid: Yeah, so Alex, when it comes to bare metal providers, there's a lot of different varieties. But Equinix is one of the biggest names in the world, and they handle the full spectrum of services. You could rent colo space from them, where you purchase the hardware yourself and they'll just lease you a rack. As you get further along the spectrum, you can start leasing just a server in a given rack on a month by month basis. It's closer towards the cloud model, with slightly longer terms but at a better price. And then, funnily enough, the cloud vendors also offer bare metal instances, which are basically the same thing as their regular instances. They just don't have the hypervisor running, right?

Some of the in between options, I would say, are companies like i3D, Servers.com, Vultr, Zenlayer. All these companies have their own colo spaces, they've purchased the hardware and racked it, and they're ready to lease you a server at a time on a month by month basis.

And those are the most interesting ones for a lot of game studios, because it's that sweet spot. Operationally, they're not as hard as going and purchasing land and building everything. But on the other hand, they still give you significant savings in terms of bandwidth and compute.

Alexandra: Got it. Okay, that's some really good context. And then briefly, I would love to touch upon this: you spoke a little bit about what Hathora is doing, and Steven, you've pointed to, oh, you need a partner like Hathora to turn on the bare metal. Where does Hathora specifically sit in this market?

And then Steven, we're going to go to you and talk about where Rocket Science specifically sits in this market. So Sid, how about you kick us off?

Sid: Cool. Yeah. We're a very targeted part of the stack, right? We don't have a lot of broad services, for example matchmaking, identity, and authentication.

That's all stuff that we integrate with other providers. But the part that we focus on is, alright, once you have a pool of players in a given region that are ready to enter into a match, and you want to tell them, hey, go connect here, and you can start playing, that's where we come in. So your matchmaker is going to go through the whole flow of saying okay, these five players are, ranked approximately the same.

And given their geographic spread, Tokyo is the best place where I want a server spun up for them. So then the matchmaker reaches out to Hathora. And then Hathora basically says, all right, you want something in Tokyo, and I know what you've uploaded and what we need to start. Here is that server, started up in under five seconds.

And here's the connection details that you can pass back to your players to connect to, right? So that's the rough flow of activity, but when it comes to behind the scenes, what's the magic that's actually happening, there are a lot of concerns around where we schedule this game.

Is it on bare metal? Is it on cloud? Do we have enough cloud instances? Do we need to scale up more cloud instances? And when it comes to bare metal, actually like provisioning and like procuring the metal and enrolling them into our fleets and patching them and keeping them updated.

That's all of the responsibility that Hathora takes. So to our customers, the game studios or the developers at those studios, it's this super seamless experience where all they're doing is uploading their builds over to us, telling us, okay, this is the rough percentage split of bare metal versus cloud that I want, and then integrating their matchmaker into Hathora. And suddenly, they get all of the player experience benefits that we've been talking about and the cost benefits of running this hybrid environment.
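The flow Sid describes (the matchmaker picks a region, asks the orchestrator for a server, and gets connection details back) can be sketched as a toy in-memory model in Python. This is not Hathora's API: `ToyOrchestrator`, `Region`, `create_room`, `pick_region`, the placeholder host name, and the port are all hypothetical, and the placement rule (use free bare metal first, fall back to cloud) is an assumption made for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    bare_metal_free: int   # free game-server slots on monthly bare metal
    cloud_rooms: int = 0   # rooms placed on on-demand cloud so far


@dataclass
class ToyOrchestrator:
    """Hypothetical stand-in for an orchestration service; not a real API."""
    regions: dict
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_room(self, region_name: str) -> dict:
        region = self.regions[region_name]
        if region.bare_metal_free > 0:
            # Prefer capacity that is already paid for.
            region.bare_metal_free -= 1
            tier = "bare_metal"
        else:
            # Overflow goes to cloud.
            region.cloud_rooms += 1
            tier = "cloud"
        room_id = next(self._ids)
        # Connection details the matchmaker passes back to the players.
        return {
            "room_id": room_id,
            "region": region_name,
            "tier": tier,
            "host": f"{region_name}-{room_id}.example.invalid",
            "port": 7777,
        }


def pick_region(latencies_ms: dict) -> str:
    """Matchmaker side: choose the region with the lowest worst-case player ping."""
    return min(latencies_ms, key=lambda region: max(latencies_ms[region]))
```

A matchmaker in this sketch would call `pick_region` with each player's measured pings, then `create_room` for the winning region, and forward the returned host and port to the players.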

Alexandra: Interesting. Okay, got it. So you're a little bit of a middleman between some of the cloud service providers, where a studio would work with Hathora and Hathora would help them basically get to players in the best way possible, whether that's through a combination of bare metal and cloud or just cloud only because it's in Tokyo or something like that. Is that summarily accurate?

Sid: Yeah, it's pretty close.

Steven: The word I'd use for it, Alex, is orchestration solution, right? Taking all of the complexity at the lower level, bringing it up, and providing that nice level of SDK abstraction to be able to just request capacity where it is needed.

What do we do over at Rocket Science? We're more around custom solutions than what Sid does over at Hathora. If you think about publishing platforms, which have all of the components from identity to economy to matchmaking, that's where Rocket Science really specializes. We'll come in and help build some of these custom infrastructure components in a way that we know will scale to the ultimate demands of a game, because you only ever get that one attempt at a game launch, right? You don't want a solution which is going to have any problems at launch. We mentioned boring launches, and that's where Rocket Science comes in. But we're actually a company that has two components to it.

So Rocket Science has two sister companies, so to speak, underneath it. Terminal Velocity is the one I've just articulated, around publishing platform and backend services. Whereas Atomic Theory sits on the other side. That's more about UI, UX, and in-game components, being able to work with the Unitys and the Unreal Engines to create these game experiences.

So that's where we sit.

Alexandra: Very cool. Okay. I got it. So Hathora is for people who want to do general stuff, and Rocket Science is maybe for people who want to get a little bit more particular or a little bit more specific, and you guys would work alongside and be that kind of backend team partner for another game studio.

Steven: Yeah, absolutely.

Alexandra: Awesome. All right. So we've got an overview of cloud. We've got an overview of the providers. And now we can finally shift to our favorite topic, which is business models. And so I'm going to take a little bit of this here, because I worked on an enterprise deal with Google Cloud at Activision Blizzard King, a three year, multi business unit, multi vertical deal, and so I unfortunately know a good bit about cloud credit programs, burst models, P&L, and cost classification of different types of cloud.

We talked about this triangle: predictability, cost, and flexibility. And I guess I'm going to give an overview of the standard models that exist, though these are not necessarily tailored to games. And I think one of the really interesting things here is that because so many of these companies, like Google, like Azure, like Amazon, are public cloud providers, a lot of them are actually not optimized to be a winning solution for businesses in games.

And it's actually rather inconvenient for games at times. But the standard cloud computing models as I know them start with infrastructure as a service. This is something like AWS EC2. The provider gives you a basic computing infrastructure, your IT professionals set up your instances and network structure, and you can remove or add them as you need.

The benefits are obviously better security, network, and electrical system reliability. The cons, though, are that you need a skilled IT staff, and they aren't your servers, but you have to act as if they are. Then there's platform as a service, which is something like Amazon Elastic Beanstalk, where the computing platform is provided for me.

My company is responsible only for the application software running on that platform. And the benefits are that you don't have to set up your own network, or config, or your own computers, and you have much less need for an actual on hand IT staff. But you don't have access to many of the operating systems involved.

I guess that's it. To me, it's maybe more like black box ish. Then there's serverless which is I don't have any virtual computer instances. I just have the functions that I want to execute. I pay for the compute time that I consume. And the cons for this is that it's not really very good for anything with strict real time requirements.

So games probably don't use this, obviously, because we just talked a bunch about having servers. And then there is the software as a service model, where the company is providing software to people as a service. This is something like Google Docs, but it's probably pretty self explanatory and we should probably skip it.

So putting it all together here Steven, you pointed out at the very beginning of this episode there is this chicken and the egg problem. If I'm a game developer, I don't know how successful my game is gonna be. I don't know if it's gonna be a hit. I don't know if it's gonna underperform. Can you tell me a little bit about which of these business models I should prefer and when should I prefer them?

Steven: It very much does depend. But if we were to focus in on the scaling of that virtual AstroTurf, which we've talked about quite a bit in this episode, the key one is being able to control your costs. It doesn't matter whether you're an indie developer or an EA running Apex Legends, you need to know at the end of the day that you're going to be profitable in the solution.

So understanding your costs is the key piece. Now, if you are on a very small scale and maybe you're just currently developing your game, being able to spin up a cloud instance for five or ten minutes to run a test and then turning it off is likely to be one of the most cost effective solutions, but it only stays cost effective when you're only using it for a small portion of a running day, for example.

Now, we'd mentioned earlier the 70/30 split, and that's when you are using compute consistently over time. You want to be committing to that compute, because by doing that you can reduce your overall costs.

Now, there's a couple of different ways that you can do that. One with bare metal, where you're committing to a month by month instance, or even in compute, there are committed usage models within that. At the end of the day, it is that kind of balancing thing that you need to do. You need to understand what you're using for it, how much of it you're going to be using, and what time period of it that you're going to be using.

If your game spikes up to 100,000 players at night, but it only stays there for two hours, and then it's back down to 50,000, it doesn't make sense to put in a large amount of bare metal to cover that, because that bare metal you're going to be paying for 24/7, 365, or 24/7 for the month, right?

Yeah, you just use that to burst. Yeah, exactly. So you do want to look at that overall profile of when you need those resources. But similarly, there are other complicating factors, like shipping your updates. Now, one of the components of an orchestration system similar to Hathora is going to be the distribution of those images.

So you're not just talking about the compute running it. You're talking about the network bandwidth. Talking about the disk storage, you're talking about shipping your game title to each of those instances and being able to maintain the updates as well.
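Steven's example of a 50,000 player baseline that spikes to 100,000 for two hours a night can be turned into a rough cost comparison. In the Python sketch below the player counts and the two hour peak come from the conversation, while the per-player-hour prices and the `monthly_cost` helper are invented purely for illustration; real pricing depends on the provider and on how many players fit on a server.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def monthly_cost(bare_metal_players, base_players, peak_players,
                 peak_hours_per_day, bm_rate, cloud_rate):
    """Monthly cost of a fleet sized as bare metal plus on-demand cloud overflow.

    Bare metal is paid for around the clock; cloud is paid only for the
    hours in which demand exceeds the bare metal capacity.
    """
    bare_metal = bare_metal_players * bm_rate * HOURS_PER_DAY * DAYS_PER_MONTH
    off_peak_overflow = max(0, base_players - bare_metal_players)
    peak_overflow = max(0, peak_players - bare_metal_players)
    off_peak_hours = HOURS_PER_DAY - peak_hours_per_day
    cloud = cloud_rate * DAYS_PER_MONTH * (
        off_peak_overflow * off_peak_hours + peak_overflow * peak_hours_per_day
    )
    return bare_metal + cloud


# Assumed prices in dollars per player-hour, for illustration only.
BM_RATE, CLOUD_RATE = 0.001, 0.003

all_bare_metal = monthly_cost(100_000, 50_000, 100_000, 2, BM_RATE, CLOUD_RATE)
hybrid = monthly_cost(50_000, 50_000, 100_000, 2, BM_RATE, CLOUD_RATE)
all_cloud = monthly_cost(0, 50_000, 100_000, 2, BM_RATE, CLOUD_RATE)
```

With these assumed prices the hybrid fleet (bare metal for the baseline, cloud for the nightly burst) comes out cheapest, sizing bare metal for the peak is next, and running everything on demand is the most expensive. The model ignores the image distribution, bandwidth, and storage costs Steven mentions.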

Alexandra: Yeah, I think one of the things that I think is pretty interesting is, you're talking about, bare metal kind of being that first line of defense.

And I think one of the other interesting things, and this folds more into how profitable a game's P&L appears, is that CapEx for dedicated servers is typically considered below the gross profit line, whereas cloud fees are in COGS, so they directly impact gross margin. And so as a game evolves to becoming a little bit more stable and consistent, you talked about that 50K of concurrents, spiking sometimes to 100K of concurrents.

Once you have that reliable data, a lot of times the strategy would be to shift off cloud to a dedicated server model so that you can put that stuff in CapEx, increase your margins to float a better stock price, et cetera, because that stuff matters to the street. And all cloud costs are considered variable.

And so I think this is actually something that's pretty interesting, because at the beginning of a game's journey, variable costs are probably the best, because you do not know what you are going to need. However, if you're Overwatch, League of Legends, or Apex, three years out from launch, now you have a very consistent, reliable MAU or DAU that's predictable, and you'd shift off cloud and move into CapEx to improve, basically, P&L health.

So I think that's actually really interesting. And that's why I love cloud. Cause it's a very cool touch and go between finance, business and technical infrastructure.

Sid: No, it's really funny, what you're saying and how much it resonates with our journey at Hathora. So when we actually first started the company, we wanted to be the serverless platform for these multiplayer games, right?

And existing options like AWS Lambda or Google Cloud Functions, like you were mentioning, have limitations that don't make them great for gaming. And while from a technology perspective we were able to solve all of the problems, and studios loved it while they were in development, there was a major set of concerns that started coming up right around launch time.

Right, and a lot of that comes down to the fact that a lot of these serverless options have extremely high unit costs. If you look at how AWS Lambda is priced compared to the underlying EC2 equivalent, you're paying almost eight times more per vCPU hour if you're actually exhausting that node consistently.

The advantage of serverless is, if it's bursty, great, someone else is taking on the risk of the unutilized compute, essentially, but someone has to pay for that somewhere. And so when we started talking to studios about launch-day volume, and then starting to put together estimates on our serverless offering, they're like, oh my God, that's so expensive.

And we're like, okay what if we shift to this more like standard like compute reserved compute model. And everyone felt happier in that model. We felt happier as a business because we knew we had a more predictable amount of compute that we have to spin up. And we weren't taking on any of the risk of underutilized instances.

And because we weren't taking that risk on, we were able to fairly price the product to our own customers. But, imagine a world where we didn't know and we spun up like, 100, 000 instances anticipating a big launch, it's a, it underperforms and we only get paid a fraction of that, right?

That's the scary situation for us that we need to protect it. We needed to protect against. And so to do that, you basically overcharge or what you're offering essentially.
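Sid's "almost eight times more per vCPU hour" figure implies a simple break-even point, sketched below in Python. The 8x ratio is taken from the conversation; the model itself (serverless billed only for hours used, reserved compute billed for every hour) and the function names are simplifying assumptions for illustration.

```python
def break_even_utilization(price_ratio: float) -> float:
    """Utilization below which pay-per-use beats always-on compute.

    Serverless cost is proportional to utilization * price_ratio, while
    reserved cost is proportional to 1, so the two are equal when
    utilization equals 1 / price_ratio.
    """
    return 1.0 / price_ratio


def cheaper_option(utilization: float, price_ratio: float = 8.0) -> str:
    """Pick the cheaper model for a given average utilization (0.0 to 1.0)."""
    if utilization < break_even_utilization(price_ratio):
        return "serverless"
    return "reserved"
```

At an 8x unit price, pay-per-use only wins while the compute is busy less than one eighth of the time, which fits a studio running occasional tests in development and does not fit a launch with sustained load.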

Steven: Yeah. When you're looking at the scaling equation, you're always looking for this safety buffer at the top. If I've got 10,000 players online, I want capacity for 11,000 players, right?

Or 20,000 players, depending on the velocity of growth, right? So one of the challenges when it comes to building an effective orchestration solution is giving users the right level of flexibility to say, actually, for my title, I know these facts about it. I know that I'm going to have short match times.

So when I've got short match times, that's going to impact how many instances I'm going to cycle through on an hour by hour basis. What does that mean? And so providing that level of flexibility, both from the technical solve but also from the business model solve, is really crucial. And it does lead on to some interesting additional business models that you'll see from people like Google Cloud, where they do offer the decaying discount, so to speak.

So where if you have used a larger amount over a period of time, you start to see a discount kicking in. And that's a really interesting component that does play into the complexity of the equation.
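The safety buffer and the effect of match length that Steven describes can be sketched with two small helpers in Python. The 10,000 and 11,000 player figures come from the conversation; the function names, the players-per-match value, and the match lengths in the usage note are assumptions for illustration.

```python
import math


def target_capacity(current_players: int, headroom_percent: int) -> int:
    """Player capacity to provision, given a safety buffer on top of current load."""
    return current_players + math.ceil(current_players * headroom_percent / 100)


def server_starts_per_hour(players: int, players_per_match: int,
                           match_minutes: float) -> int:
    """Rough number of game-server instances cycled through per hour.

    Assumes every match gets a fresh instance and matches start back to back.
    """
    concurrent_matches = math.ceil(players / players_per_match)
    return math.ceil(concurrent_matches * 60 / match_minutes)
```

For example, `target_capacity(10_000, 10)` gives the 11,000 figure from the conversation. With an assumed 100 players per match, 10,000 players means 100 concurrent matches: at 20 minutes per match that is 300 instance starts an hour, and at 5 minutes it is 1,200, which is why short matches put more pressure on startup time and image distribution.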

Alexandra: Yeah. And it's also hard to track. I think the more complicated the business models are, something you have to consider is the actual governance of that business model.

Ratchet scaling fees, different types of tiering, it's stuff that gets very complicated. And then you need a full time deal team on one side and then on the other side to manage it and be like, okay, you paid me X, but I actually should have paid you Y, and this is the data that I have collected on my side that shows this, and then the other side says something else.

So it's really quite interesting that, although some of the business models need to be tailored to games to make them the most financially fair to both the server provider and server orchestration side and the studio, that does come at another cost, which is that it's complicated.

Yeah, management. So our final topic, though, is talking a little bit about the bare metal business models. And Sid, I'm going to lean on you here to close us out. You've been working with a ton of them to lay out stuff for Hathora. Is there anything here that's very interesting to you, as a server business, that you think about pretty actively in terms of which ones you like to partner with and why?

Sid: Yeah, for sure. So first and foremost, as a company, we put player experience as the most important thing that we should care about, right? And when you think about bare metal providers, there's a huge spectrum of them, and there's many dimensions of these spectrums as well. But one of the spectrums is how much does their egress bandwidth or how much do they charge for bandwidth, right?

Some providers are like, hey, you buy the compute, you get free unlimited bandwidth. The subtext there is, the bandwidth is capped to a one gigabit per second uplink, or actually, they're just dumping traffic onto the cheapest provider that they could find, so it hops all over the world before actually arriving at your players, or the routers are so poor that they drop half the packets that are coming their way.

So we've done a crazy amount of benchmarks on network quality, and we basically have a ranked list of providers that we feel excited working with, and we've only really worked with the top three in that list when it comes to what's actually enrolled today. So network performance is one aspect, and then there's the other piece, which is the term length that you're acquiring these leases for, right?

So some bare metal providers have actually straight up gone into competing with cloud vendors, and they're like, you can rent by the second from us, right? Equinix acquired a company called Packet and launched this offering called Equinix Metal. And you can hit an API and be like, hey, launch me an instance in this region and start billing me by the second.

And if I don't need it two hours later, stop billing me for it. The problem there is the compute costs are almost as high as just launching on AWS or GCP, with not much better savings on bandwidth either. And then, on the other end of the spectrum, you can literally go acquire the land.

You could go, build a data center, cooling power, internet cables. There's 2 other intermediaries between either extreme, right? So one closer towards the I'm going to do everything myself and I want more CapEx instead of OpEx, is the, I'm going to rent Colo space. What that means is some other company has done the legwork of like building the warehouse and the server racks themselves.

And it's set up such that they have people on hand, so that if you ship them hardware, the people on hand will go rack it for you. So you will pay Intel, AMD, and all your RAM and storage vendors directly. And you have to pay that up front, unfortunately, whether you get use out of that over a three or five year depreciation period or not.

But what that means is, if you do have good sustained use for it, you'll end up saving a crazy amount of money. The challenge with that still is, even if you have these remote hands, as it's called, actually racking them for you, and if there's any hardware failures they'll go fix it.

You're still on the hook for managing the software that runs on it, everything from the operating system to all the packages, et cetera, et cetera. And it's a lot more operational work. So then the next sweet spot is the providers that we mostly partner with. And they're the ones Steve has been alluding to as well.

The i3Ds, the Gcores, the Servers.coms of the world. And what they do is they go through all the racking. They purchase all the hardware as well. They say, hey, here's this server. This one costs you 300 a month to rent. And at the end of the month, you can say you don't want it anymore and we'll take it back.

But for the period of the whole month, you can't give it back to us. And that usually is the most popular. It's the thing everyone thinks of when they hear bare metal nowadays, because most companies have lost the muscle for like racking hardware on their own, essentially.

Alexandra: I see. Got it. Okay. So it's very uncommon now for companies to be doing. I'm just thinking of the Silicon Valley episode where they build their own servers and they like bring it in a truck somewhere. No one's doing that anymore. Basically is what you're saying.

Sid: Very few companies are. Obviously the Amazons and the Googles and the Facebooks of the world are, but it's become very hard to compete against those companies for the talent that you need to build it in house. And from how your accounting works, you have all these assets on your books that are not directly tied to the alpha of your business.

And there's all these open concerns people have that made cloud a lot more appealing, but there is some pushback, or some rethinking of, is cloud actually giving me the savings that were promised, that's starting to come up again.

Alexandra: Yeah it's definitely a question for sure, because it is expensive, right?

And actually, when you were talking about going all the way to the right, I'm getting a land lease, I'm building the warehouse, I'm racking the racks, I'm putting the servers in there, I heard about a pretty interesting company, which is actually a real estate company, that's renting out the empty space inside office buildings, I think elevator shafts, ceiling ledges, right?

To hold servers that have nothing to do with that company's building. Let's just say I'm the Uber office, and the Uber office has a bunch of empty space for HVAC, et cetera. And you can rent space inside of a building to store server racks. So you mentioned that and I started thinking about that.

Steven, I don't know if you have any final thoughts on bare metal business models.

Steven: Yeah. At the end of the day we did that at Multiplay, right? We did do the racking and stacking of hardware. We were purchasing Supermicro 1U servers and Dell blade units and installing them.

We had great success running Minecraft servers and all of those kinds of B2C offerings with that model, having our own dedicated 10 gig lines here, there, and everywhere, Frankfurt, Amsterdam, London, with all of the peering arrangements that go with it. As Sid mentioned, you still need a really great networking solution.

At the end of the day, you can't be handing off to the cheapest provider because they will route you badly. They will put you on an oversubscribed network link between two points, and that will result in a poor player experience. So yes, there's definitely that real range of equipment, and where the industry sits today, the i3Ds, the Servers.coms of this world, if you want that level, that's really where you need to be playing. It's a lot more overhead than anybody would imagine to do the thing yourself from scratch, like we did many years ago.

Alexandra: Got it. That's great context, guys. And team, we are out of time and honestly I wish we could keep going.

This has been super interesting to go over the overview of all the different options across basically getting games in the hands of players in the modern age. Really appreciate you both coming on and sharing this and your expertise with the audience. If anybody wants to get in touch with Hathora or Rocket Science for services how can they do that?

Sid: Yeah they can message me. My email is [email protected].

Steven: Yeah. And from the Rocket Science side, you can contact us on our website or my email address is [email protected]. Awesome.

Alexandra: And guys, that is our episode. As always, friends, if you've got feedback or ideas, you can hit me up at [email protected].

I'm always open, and with that, we're out. See you guys next time. Sid and Steven, thanks for coming on. Bye.

If you enjoyed today's episode, whether on YouTube or your favorite podcast app, make sure to like, subscribe, comment, or give a five-star review. And if you wanna reach out or provide feedback, shoot us a note at [email protected] or find us on Twitter and LinkedIn. Plus, if you wanna learn more about what Naavik has to offer, make sure to check out our website www.naavik.co. There, you can sign up for the number one games industry newsletter, Naavik Digest, or contact us to learn about our wide-ranging consulting and advisory services.

Again, that is www.naavik.co. Thanks for listening and we'll catch you in the next episode.