World models are rapidly becoming AI’s next frontier, and in this episode we break down why. Host Alexandra Takei, Director at Ruckus Games, sits down with Pim de Witte, founder of General Intuition and Medal, to explore how billions of gameplay videos can power a new class of embodied agents. Pim explains the fundamental gap between language models, which describe the world, and world models, which simulate the world, capturing how objects and agents move, react, and evolve in space and time. The conversation digs into why video games are an ideal training ground, including consistent first-person perspectives, action labels (if you design your data set that way), and optical fidelity that platforms like YouTube can’t provide.

Pim walks through General Intuition’s technical approach, why cross-game training unlocks more human-like behavior, and the specific limitations still unsolved, such as multiplayer consistency, long-horizon coherence, and the cost of large-scale inference. They explore what studios can expect from embodied agents: bots trained on human behavior that they hope will be tunable by designers, and ideal for developers who want to embrace and build around this tech to either develop new game genres or make it a bedrock of their production process. If you are interested in learning about a company with a unique approach to world models and embodied agents, this is a must-listen to close out 2025.

We’d like to thank Lysto for making this episode possible! Lysto is revolutionizing how game development teams collect and act on real player feedback with its AI-powered playtesting insights. Learn more about how you can get bias-free feedback at https://lysto.gg/?utm_source=naavik&utm_medium=podcast&utm_campaign=ad


This transcript is machine-generated, and we apologize for any errors.

Alexandra: What's up everyone? And welcome to the Naavik Gaming Podcast. I'm your host, Alex, and this is the Interview and Insight segment. This is my last episode of 2025, and I'm excited to be closing out the year on something big and bold — world models and embodied agents. World models are suspected to be AI's next major frontier. Unlike large language models, which describe the world through text, world models aim to understand how things move, interact, and evolve in space and time. There's tremendous potential for breakthroughs that will impact robotics, defense, and anything touching physical space.

We, of course, know that video games have often been the vanguard for advances in 3D space, and so today we'll be discussing world models as they are derived from and applied to gaming. A company with a unique advantage, billions of gameplay videos, has emerged to tackle the space. Backed by investors like Raine, General Catalyst, and Khosla Ventures (I believe this is actually Khosla's largest seed check since OpenAI's in 2018), our subject company today, General Intuition, recently raised $133.7 million in seed funding to fuel its ambitions to be a premier research lab in the game space. And my guest today has actually graced our airwaves before, but given what we're discussing today is so different, we just had to have him back before closing out the year.

It's my absolute pleasure to welcome Pim de Witte, founder and CEO of General Intuition and Medal, to our airwaves today. Welcome to the pod, Pim.

Pim: Thank you.

Alexandra: It's good to see you again.

Pim: I'm glad you got the, you got the joke on the number. Most people didn't get it.

Alexandra: I say it very specifically. I've been told this is a very specific number and I have to say this specific number. Oh, perhaps for those in the audience who don't get it, what is the —

Pim: Oh, 1-3-3-7 is, like, gamer leetspeak. You gotta, you know, I mean, if you don't know that, you should probably stop listening to this podcast and go play some video games.

Alexandra: Sounds good. Well, this is my last episode of the year, and I'm glad I'm going out in a banger. I know you've been traveling a ton. You recently spoke at AI Pulse, and you've been doing this big podcast circuit. Do you have a, have a highlight so far? What's been your favorite thing that you've done in the past two weeks?

Pim: Oh, it was definitely the keynote with Yann LeCun. That for me was, I got the invite pretty randomly and I wasn't expecting it. He's obviously a legend and I'm not. So it was pretty surreal to do, you know, there were like 1,500 people in the audience.

The room was overflowing. That's always what happens when he is a keynote, by the way. So they were definitely there 'cause of him, not me. And also, he is super technical, so it was quite intimidating. Like, he is Turing Award level technical. So I was pretty nervous, but it worked out super well. And a lot of people watched it, so I'm excited.

Alexandra: Amazing. Yeah. And it's always good to meet your heroes, but, so you've already been on our show before, and so I'd like to do a slightly different intro process. I'll just ask some questions to help the audience get to know you, and then we'll dive right into our, our episode today, which will be packed full of information.

So, you've done a ton of developer and engineering stuff in your background yourself. You created the largest open-source Unity package, you co-created the Firebase Unity SDK, and you started the largest RuneScape private server, I believe SoulSplit, and made like a million and a half in net revenue doing that. I watched some YouTube videos of someone interviewing you when you were really young; you're clearly quite precocious. Can you just explain why you like doing all of that? What motivates and drives you to have built so many of these foundational servers and packages from scratch?

Pim: I think the packages are usually because I'll spend like six or seven hours debugging something during the day. That was the Unity Main Thread Dispatcher, that was the origin of that. And then I was like, oh, I don't want other people to have to go through this, so let me just publish my solution.

Packaged in like a pretty nice way. At the time, Unity just didn't have a good way to dispatch anything to the main thread, which was required for manipulating the UI. Like, if you wanted to do callbacks from Firebase that led to UI changes, which we used for matchmaking, there was no way to do it. You had to custom-write the stuff. So anyways, it's mostly just about giving back in a way. I mean, my personal motivation is also largely driven by impact. I think that is true, but I also think that building good companies and making money is a kind of right that you earn, in a way, where you get to decide how to deploy it.

So like, you know, pursuing science. So to me, it's a form of pursuing your ambitions. It's not necessarily about making the money, it's about getting to do what you wanna do and taking the bets that you wanna make. And for me, those tend to be pretty large.

And so, that has required increasingly larger businesses to be built. So yeah, it started with my private server, then I did Doctors Without Borders for three years, worked on Ebola and satellite-based mapping, and then eventually started Medal, and then, yeah, from Medal, General Intuition.

Alexandra: Yeah. And that actually brings me to my second intro question, which is that until General Intuition you were at the helm of Medal, which we're going to recap in a moment for the audience that didn't listen to your previous episode this year. You were doing that for about eight to nine years. And so, I'd love to play this game that I play with some of my friends called Rose, Bud, Thorn.

The Rose is something that was amazing about those nine years. The Bud is something that you're excited about or on the cusp of, and the Thorn is something that was just a total pain in the ass the whole time for those nine years. What is your Rose, Bud, Thorn for Medal?

Pim: Okay. So, I guess mine would be: during COVID, Medal for many teenagers was like the only way to actually create memories with friends.

So, we saw a lot of memories being created during COVID, when it was like the only way to do it. And I think that's one thing I'm pretty proud of on Medal. Like Fortnite, a Fortnite concert, all those things; for an entire generation, most of their memories with their friends live on Medal.

COVID was also a huge inflection point for growth. The Thorn, I guess two things. One, acquiring teams with really good products and not realizing how fast the technology stack itself was gonna change. For instance, we acquired a company, and this was not their fault, that had really good editing software that we built into the product. And it was really, really great for like six months, but it was a slightly different stack, so it was more difficult to maintain. In retrospect, we should have just written it from scratch. And so then people didn't wanna work on it, and the editor kind of lagged behind for a little bit because I had made that acquisition.

And by the way, the team was great. This was entirely on me, not on them. But yeah, the Thorn is that I underestimated how fast technology would accelerate, both inside the company and outside the company. How fast the ecosystem around video would mature and react, for example, 'cause that wasn't the case then.

So, when making acquisitions, we need to be better at predicting the trends behind them, as opposed to looking at it as a moment in time. The thing I'm excited about right now is how much I think gaming is going to positively change 'cause of the acceleration in AI. I think we've been kind of stuck for a while, to be honest. And I think the industry has needed fresh blood and new studios for a while that can ship quicker at the pace of, like, you know, TikTok, right? Where culture changes by the day. It does feel like as a games industry we're still sort of stuck on those four-year cycles, and I think AI just changes that very rapidly.

And so, I'm pretty excited about how quickly things are gonna change, and I hope that it leads to a ton of growth for the games industry as a result, because we sort of see more of the attention going to the studios that just ship a lot faster.

Alexandra: Awesome. Alright, so this is a perfect segue into our first topic, which is a quick five-minute recap of the episode you did earlier this year, for our audience members who might have missed it or need to do some reminiscing. But Medal is really the foundation of what you're building at General Intuition, so I'm gonna ask just a few questions that'll inform context for today's discussion. And so, the first thing, in just a couple sentences: what is Medal?

Pim: Medal is the easiest way to record and share video game clips with your friends.

So, it's always running on your computer. You hit a button, it syncs the last fifteen, thirty, or sixty seconds to your phone. You get a link and you get a profile, and you can easily connect with your friends around the stuff that you're experiencing inside video games.

Alexandra: Okay. So, it kind of combines a little bit of the social stuff of, obviously, a TikTok or YouTube, and then OBS Studio together.

Pim: TikTok, yeah, I would say that. The reason why it's so special is because it has none of the complexity of something like OBS, which is very complex. But other than that, as an analogy, I think it's pretty good.

Alexandra: Cool. Yes, it seems very much less complex. As someone who's used OBS Studio, it's very hard.

Pim: If you ask Medal users to describe it, I think they would tell you it's closer to in between, like, Snapchat and Instagram. Closer to the Instagram side. There is a large public graph, but actually most people use it with smaller groups of friends.

Alexandra: Okay. Cool.

Pim: Yeah.

Alexandra: And so then second, what is special about Medal that makes it different from other large catalogs of game clips like YouTube or maybe TikTok or Instagram? What's special about Medal?

Pim: So, TikTok has the problem that it's not 16-by-9 video. Everything is kind of cut to this very, very narrow format, so that kind of disqualifies it for training any sort of system. YouTube has the issue that you first have to account for pose estimation, 'cause most YouTube videos are recorded on, like, a mobile phone. And when you're training these models for robots, like we are, you actually have to stay in the same perspective a robot would have, which is usually first person. And so, you have to pose-estimate back towards first person, which doesn't necessarily generalize well in very complicated environments.

Then you have to account for action labels, which are the actual actions that people take. The problem with that is that humans are really bad at labeling these things 'cause they are very fast; they're millisecond-precision actions, usually. In the Medal recorder, we get these automatically.

And people also like to overlay that stuff on their videos and navigate clips by which actions they took, things like that. And then lastly, you have to account for optical dynamics, so basically eye movement. The cool thing is, if you think about YouTube, every single second, if you don't know where your eyes are looking, you don't know whether the decision that you're making is based off something that's not inside the pose-estimated view.

If you're looking left or looking right, then you don't have a complete information loop, because something might be happening outside the view, and then the model will learn to correlate nothingness with making decisions, which is really bad. With video games, you kind of simulate optical dynamics with your hand, 'cause you control the head and eye movement as one unit with the mouse.

And so, it's actually a much better representation of spatial reasoning than even YouTube videos. And so, our bet is that we can really focus on those dynamics that you can see in video games, use that as a foundation, and then train with other elements on top of that.
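In code, the kind of training sample Pim describes might look something like the sketch below. Every field name and action name here is hypothetical, purely to illustrate the three properties he lists: first-person 16:9 frames (no pose estimation needed), millisecond-precision action labels, and view direction driven by the same inputs as the actions.

```python
from dataclasses import dataclass, field

@dataclass
class ActionLabel:
    t_ms: int   # millisecond timestamp: too fast for humans to label by hand
    name: str   # e.g. "move_forward", "turn_left" (illustrative names only)

@dataclass
class ClipSample:
    frames: list                                # first-person frames, already in player perspective
    actions: list = field(default_factory=list) # auto-captured action labels

    def actions_between(self, start_ms, end_ms):
        """Labels that fall inside one frame interval."""
        return [a for a in self.actions if start_ms <= a.t_ms < end_ms]

sample = ClipSample(
    frames=["frame0", "frame1"],
    actions=[ActionLabel(4, "move_forward"), ActionLabel(21, "turn_left")],
)
# Actions landing in the first ~16 ms frame interval (60 fps):
print([a.name for a in sample.actions_between(0, 16)])  # ['move_forward']
```

Because the camera and the actions come from the same controller or mouse inputs, each frame interval can be paired with exactly the actions that produced it, which is the loop YouTube footage can't provide.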

Alexandra: Let me just repeat what you just said back to you in maybe a fifth-grader way, which is that, unlike YouTube, where anybody could be doing anything (skateboarding, doing the washing, walking a dog), Medal is mostly a video game platform. And in addition to users using this clipping tool, you've also captured basically the 3D contextual data of their keystrokes or controller inputs, so that you know what the player is doing and what their intent is. Is that generally correct?

Pim: We don't capture the actual controller or key inputs, we convert them to action labels in memory.

So, you never actually capture a key or anything like that. And the reason is, when you train these models, you don't need that information. It's actually noise at training time. It's complete noise, because you don't want the model to associate something like typing with the action, right?

So, you actually want to convert everything to action labels. And that also leads to a much better privacy situation. And just having that is enough. This is obviously something that YouTube can't do, just to be super clear.

And so yeah, so you then, once you have the action labels... also, we don't scrape anything; Medal is 2D. So you get the frame information in 2D and then the action labels from the user. And that is enough of a loop to kind of train this.

Alexandra: To create the 3D context data. So you're not capturing players hitting W; instead you're like, "player moving forward." Something like that.

Pim: Yeah, exactly. Exactly.

Alexandra: And that's really valuable. And so, I think the reason I ask—

Pim: And it never captures anything that's not an action, essentially.
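A minimal sketch of the in-memory conversion Pim describes. The key-to-action mapping here is entirely hypothetical, not Medal's actual scheme; the point is that raw keys become semantic labels immediately, and anything unmapped, like chat typing, is simply dropped before anything is stored.

```python
# Illustrative bindings only; a real recorder would resolve these per game.
KEY_TO_ACTION = {
    "w": "move_forward",
    "a": "strafe_left",
    "s": "move_back",
    "d": "strafe_right",
}

def to_action_labels(raw_key_events):
    """Convert raw key events to action labels in memory.

    Unmapped keys (e.g. letters typed into chat) are discarded, so
    keystrokes themselves never reach the stored training data.
    """
    labels = []
    for key in raw_key_events:
        action = KEY_TO_ACTION.get(key.lower())
        if action is not None:
            labels.append(action)
    return labels  # only semantic actions survive

# Typing "h", "i" in chat contributes nothing; movement keys become labels.
print(to_action_labels(["w", "h", "i", "d"]))  # ['move_forward', 'strafe_right']
```

This is also why the privacy story improves: the stored stream can only ever contain entries from the closed action vocabulary, never arbitrary text.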

Alexandra: Makes sense. Yeah. Okay.

Pim: Yeah.

Alexandra: Cool. Yeah, and I just wanted to lay that groundwork, 'cause I think so many people have been like, oh, well, can't you just train 3D world models off of the gaming stuff that's on YouTube or the gaming stuff that's on TikTok? And what makes Medal meaningfully distinct is that you have, like, 2 billion clips or something like that, right?

Pim: Yeah, it's a billion uploads a year.

Alexandra: Billion uploads a year. Okay. Yeah. Which is rough—

Pim: Roughly equivalent to YouTube. Obviously YouTube has more overall, but in terms of upload counts, roughly.

Alexandra: Alright, so that's Medal in a nutshell. And so, you have this insanely unique data set, and this year you raised $133.7 million to build world models and embodied agents.

And so, before we start talking about what world models are and how they could be applied to games and industries beyond, I kind of wanna understand when you started even thinking about General Intuition. You turned down this bid from OpenAI for about $500 million, from what I understand. Were you planning to do General Intuition before that bid? When did you start thinking about General Intuition existing in the cycle of Medal's nine years of existence?

Pim: Yeah. I can't comment on OpenAI specifically, but I can tell you broadly about lab discussions and why we decided to stay independent.

With General Intuition, I think the thought process started mid-2024, and a lot of that was due to a bunch of papers coming out at the time: you had the first Genie paper, you had the DIAMOND paper. And it became very, very clear that this combination of actions and observations, like actions and frames, is sort of the loop.

You need it to bootstrap this intelligence for world models and embodied agents. And it also was very, very obvious that we were sitting on the right kind of avenue to pursue that, and I don't think any other company had that same path. I was already really interested in AI in general.

I had read all the papers, or not all of them, but many of them, around transformers, and I had built models before when I was at Doctors Without Borders. So for me it was kind of a natural move to start exploring it. We didn't expect that it would escalate so quickly.

I think for us, we were really just exploring it, and then indeed a bunch of companies all of a sudden started inbounding for acquisitions. And so for us, that was very unexpected. And it was funny, the time between us agreeing on a board level to start exploring this and all the craziness was like a two-week period.

So it was all super fast, all over, like, Christmas and stuff. It was a really crazy year. And then 2025 is when it really materialized. We built a research team, we actually built the foundation models. They look great; I think you've seen some of them in the office.

And we also didn't expect how fast we'd be able to make progress, I think. So yeah, it's been like a year and a half in total of actually pursuing this. It also took a long time to get all the labels for the actions in the games. We had to have people actually create those.

And we didn't want to collect any of the data without them, 'cause it would cause privacy issues. So yeah, a year and a half of really heads-down work, and then we announced after a while, after we really knew that we were in a really good spot, essentially.

Alexandra: Interesting. Okay. Yeah. So collectively, it's kind of like you were passionate about it on the side, you started reading all these papers, and you were like, hey, actually, maybe this other company that I'm running could apply to building this.

Pim: Yeah, it was a total accidental find, to be honest. I actually tell people it was worse than accidental, because when I was at Doctors Without Borders, I was a contact at Google Crisis Response. And I was working on the same floor as DeepMind employees for a while. I would have lunch with them and learn about what they were doing and how they believed in video games as, you know, a path to their version of AGI.

And I think at the time I thought that was really silly. And then when I built Medal, I didn't think back on that once. I didn't connect those dots, which is fascinating to be honest, until these papers came out. So yeah, it's worse than luck, I will say, but I'm glad we did it.

Alexandra: Yeah, that's an amazing story. And I'm sure a year and a half seems like a short period of time, but as I always say, it's not about years, it's about effective labor hours. So I'm sure that in that year and a half you've done a lot of work. And I think this is a perfect way to dive in.

You know, we understand the journey from Medal to General Intuition, and now I'd like to start with some grounding on world models: what they can do, what they can't do, and how they're being built. So, the first is: what is a world model?

Pim: Yeah. So a world model, you can kind of think of it as a video model.

A video model is a model that you may be able to text-prompt, and it predicts the frames that are related to that text prompt. It will initially generate the first frame, and then using the first frame and the text prompt it will generate the second frame and the third frame and the fourth frame; it's all descendant from that first frame.

A world model, however, has to generate the entire distribution of outcomes in those frames depending on the action that you take. So it is a bit of a more complex problem, but the architecture is similar to video because it also generates the frames. But instead of generating frames sequentially like a video model, it does so where you can actually interact with it, and as you're interacting with it, it generates the other frames.

I like to describe it a little bit like when you dream: often you're a bystander, and you see things and you can't do anything. That's kind of like a video model. A world model is where you could actually go in and do things and interact with everything, which is fundamentally how humans learn.

And so, that's why world models are important too, because humans don't learn from just observing and watching videos that are generated. They learn from interaction. And that's, I think, the reason why everyone's betting so big on world models: it's very general, especially if you have a lot of the real world already represented inside those video models.

It's an incredibly general way to make everything interactive and learn from all the video data on the internet.
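The two generation loops Pim contrasts can be sketched roughly like this. Both "models" below are stand-in functions, not real networks; the only point is the interface difference: a video model rolls frames forward from the prompt and prior frames alone, while a world model must condition each next frame on the action the user just took.

```python
def video_rollout(prompt, next_frame, n):
    """Video model: each frame descends only from the prompt and prior frames."""
    frames = []
    while len(frames) < n:
        frames.append(next_frame(prompt, frames))
    return frames

def world_rollout(prompt, next_frame, actions):
    """World model: each frame also depends on the interactive action at that step."""
    frames = []
    for action in actions:  # actions arrive one by one as the user plays
        frames.append(next_frame(prompt, frames, action))
    return frames

# Stand-in frame generators, purely for demonstration.
passive = lambda p, fs: f"{p}:frame{len(fs)}"
interactive = lambda p, fs, a: f"{p}:frame{len(fs)}:{a}"

print(video_rollout("dream", passive, 2))
print(world_rollout("dream", interactive, ["jump", "turn_left"]))
```

The video rollout is fixed once the prompt is chosen; the world rollout has to cover the whole distribution of futures, one branch per possible action at every step, which is why it is the harder problem.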

Alexandra: Okay. And just to say it distinctly, what is different about a world model from an LLM?

Pim: Yeah. So, I like to describe... have you seen Frozen? The movie? Okay. I like to describe LLMs a little bit as a snowball.

So, it starts really small, and then, what LLMs do is, they're autoregressive. They generate text tokens, right? And then they take those tokens and they run them through the model again. The problem with that is that if it makes a mistake, for instance, or goes down a wrong path or hallucinates, then sequentially, when that gets fed back in as input, it will continue generating down that wrong generation, right? So it's a little bit like a snowball that keeps accumulating tokens, or snow, that is completely unaware of its surroundings. It just keeps inside its own loop. And when it's at the bottom of the mountain and there's a big rock in front of it, it has no ability to know that it's about to dramatically crash into a rock, 'cause all it knows is itself and its text generation. Real intelligence is a bit more like Olaf. Olaf, right, would know that there's a rock right in front of him and would know to dodge the rock, or do his thing where he breaks up and then puts himself back together. And the big reason this visualization is important is 'cause the real world is what's called continuous.

The real world is constantly changing, and so as you're making predictions, you want to constantly observe the entire environment, and text is a subset of that, which is why the snowball analogy fits. If you actually make an agent that's able to both observe and act, then you get much more capable agents.
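The snowball analogy amounts to a plain autoregressive loop: output is fed back as input, with no outside observation to correct it. A toy illustration, where the stand-in "model" simply copies its last token forward, so one early mistake propagates through every later step:

```python
def autoregress(tokens, steps):
    """Feed output back as input; nothing external ever corrects the loop."""
    tokens = list(tokens)
    for _ in range(steps):
        # The next token depends only on the model's own history.
        tokens.append(tokens[-1])
    return tokens

# One hallucinated token ("rock") keeps compounding downhill.
print(autoregress(["snow", "snow", "rock"], 3))
# ['snow', 'snow', 'rock', 'rock', 'rock', 'rock']
```

A world model, by contrast, would take a fresh observation of the environment at every step, so a wrong prediction can be overridden by what the model actually "sees" next.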

Alexandra: Okay. Yeah. And of course, the world is already in existence, and the physics of the world were not human-created, right? Whereas language is obviously a human-created idea and concept, so they're very different. But I just wanted to understand the snowball-esque, unaware-of-itself idea that you explained with the snowball, and the aware-of-its-surroundings example that you used with Olaf.

Pim: Okay. Yeah. LLMs are incredibly good for anything that's in text space. And humans essentially created text as a way, as you described, to describe three-dimensional reality. And it's really hard to recover all the information that's lost when you go from three-dimensional space to text when you build models that need to generalize in spatial-temporal contexts, which is realistically most tasks in the real world.

Alexandra: Hmm, interesting. Yeah, and I'm not gonna go down this rabbit hole, but I'm sure languages are also architected in different ways. I'm a big fan of Ted Chiang's Story of Your Life, which became the movie Arrival. I dunno if you've seen Arrival. It's about how the aliens' language is conceived: time is continuous, experienced all at once. So you could also see LLMs as derivative of how we constructed language in the first place, which is why they're limited.

Pim: Yeah. And the interesting thing is that it's actually kind of a superpower, I think, for LLMs to take out the time component. This will go pretty deep technically, so let me know if you want me to stop at any point, but when you do world models or embodied agents, you actually have to forcefully model time and space in your architecture.

Alexandra: Mm-hmm.

Pim: Which, if you don't need to do that with text, is actually inefficient. It leads to a forced architecture in a specific way that you don't need if you just want to generate a bunch of text data. That's a big difference, and also I think why they're different model categories. So yeah, it goes all the way down to the fact that text is just an incredibly efficient compression method, as you say.

And so LLMs will continue being great. They'll continue to go and solve scientific problems, they'll continue to generate code, right? Because you don't need a continuous component for that stuff. But you do if you want anything to work inside video games or the real world or simulations.

Alexandra: Right. Okay. And so, then what can a world model do today and what can it not do today, but you hope that it will do tomorrow?

Pim: Yeah, I think the big limitation of world models today is multiplayer. There are some papers where they explore it, but that was one environment in one game.

Very simple. So you don't know what the scaling laws are; you have no clue whether it's actually possible, whether it can generalize. So multiplayer is unsolved. I also think that long generations inside world models, consistent long generations, especially in really open-ended environments, are really far away, because you need spatial context, or spatial memory, in order to solve this. I'll give one example. Say you drive down a street in a world model, and you saw a person walking by on the right in your first second of generation, and you come back through the same point like two minutes later. Then the model needs to remember that that person was there in the first second. And that means it needs two minutes and one second of context, which is a lot of context, especially when you're dealing with video data. So consistency of environments over long generations is just a hard problem.

Now, it does look like this is just working with data and compute and scaling. And there are also methods that you can use to select the right context. You might not need a two-minute context window; you might be able to automatically discard certain frames from memory that are very similar or don't introduce new information, things like that. Those are tricks, but still, if you want a good game, you need something like this.
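One of the context-selection tricks Pim mentions, discarding frames that add little new information, can be sketched like this. The per-frame scalar below is a placeholder for whatever similarity measure a real system would compute, for instance a distance between learned frame embeddings.

```python
def prune_context(frames, min_novelty=0.1):
    """Keep a frame only if it differs enough from the last kept frame.

    `frames` here are placeholder scalars standing in for frame
    embeddings; `min_novelty` is an assumed similarity threshold.
    """
    kept = []
    for value in frames:
        if not kept or abs(value - kept[-1]) >= min_novelty:
            kept.append(value)
    return kept

# A long stretch of near-identical frames collapses to a few entries,
# while the distinctive moment (the 0.90 "pedestrian" frame) is retained.
print(prune_context([0.50, 0.51, 0.52, 0.90, 0.90, 0.53]))  # [0.5, 0.9, 0.53]
```

The trade-off is exactly the one in the example above: the pruned memory can span the required two minutes cheaply, but only if the pruning never throws away the one frame, like the passing pedestrian, that long-horizon consistency depends on.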

Alexandra: Yeah. Right.

Pim: Yeah, or any sort of entertainment. The other thing is that because the models are generative, you cannot really guarantee that you're doing the same generation for one person versus another. And for video games, even if it's, let's say, one of those racing games, right?

Where you have, like, the shadow that you race against. If it's not the same for you and that other person, then all the game mechanics that are deterministic go away. And I think, by the way, the reason for this is that world models are not built to replace video games.

I've actually been asked this question by a lot of people in the games industry, but I think fundamentally they're built for very unique short-sequence generation, for training embodied agents and robots. And so I think a lot of the fear around them actually competing with game developers is largely overdone.

I don't think that's happening anytime soon. I do think that there's a lot we can learn from world modeling methods that we can implement inside the engines, things like diffusion upscaling, because what world models are fundamentally doing is taking space and time and compressing it into the model.

It's action inputs, right? Spatial frames and temporally sequenced frames that run at a given hertz. So space and time are baked into the model architecture, into the data. And I do think game engines can leverage that to ship more efficient, smaller games that don't require file sizes as large, or maybe generative systems inside the game that are world-model-like, where environments are generated inside a world model but then converted into engine code, right?

Using maybe a VLM or something that analyzes each part of the generation. So yeah, I think those are the limitations, and some of the opportunities, and the opportunities are largely in training embodied agents. That's really the use case.

Alexandra: Yeah. Okay. And next we're definitely going to talk a lot about the goal of General Intuition as it applies to games specifically. But so the TLDR is: what it can do today is use world models to train the embodied agents.

And what it could do tomorrow are some of the things you've listed, like compression in engines, and potentially things for short-form entertainment or some other kind of new thing. That's the tomorrow problem; the today problem is this.

But before we move on to the goal of General Intuition as it applies to games: there are some other companies doing things potentially similar to General Intuition. I think Runway; World Labs, from Fei-Fei Li, the very famous Stanford researcher; maybe some of the SIMA 2 stuff at Google.

There are probably a lot of unique differences between all of these, but how are these companies positioned relative to GI, and what is their approach to world models?

Pim: Yeah. Let's start with World Labs. I think they're taking the Gaussian-splat generation approach.

And I think the reason they do that is because they consider those splats to be kind of like virtual atoms, is what they say. Those systems are not yet interactive, and it's really tough to judge their approach compared to video-based world modeling, which is what we do, until they actually become interactive.

Because they're very different problems. For any environment, for instance, you can generate thousands or millions of these splats. So the output space of the model is quite large, and making it interactive is just a bit harder, I think. Whereas with us, you only need to predict smaller latent representations, say 128 by 128 or something, and then you can run them through diffusion upscaling or something like that. So the degrees of freedom of the output space, the number of units the model has to predict, the combinations basically, is just a lot less. What that means is it tends to scale better. The problem with video-based world models is that you're not in a verifiable domain. With the World Labs approach, you can, for instance, load the output into a game engine, add objects to the environment, and have code check whether an action was performed correctly or not.

Which is really nice for training agents, for example. But video models, I think, just scale quicker into interactivity and generalization across all the environments that are already represented in video. So those are really two fundamentally different approaches, and it's really interesting to track both of them.
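Pim's degrees-of-freedom comparison can be made concrete with back-of-the-envelope numbers (both counts here are illustrative, not actual figures from either company):

```python
# Units the model must predict per step for a 128x128 latent grid,
# which is then handed to a diffusion upscaler.
latent_units = 128 * 128            # 16,384 predictions per step

# Versus a splat-based scene of, say, two million Gaussians.
splat_units = 2_000_000

ratio = splat_units / latent_units
print(f"latent grid: {latent_units:,} units")
print(f"splat scene: {splat_units:,} units (~{ratio:.0f}x larger output space)")
```

And each Gaussian additionally carries several parameters (position, covariance, color, opacity), so the real gap in output dimensionality is even larger than the raw unit count suggests.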

I can't definitively say one is better than the other. As for how that relates to us: the reason we didn't start with world models like the other labs is that we realized we had so much of this data that we could actually skip world models initially and use LLM-like training.

So, LLMs were trained off Common Crawl, on the internet. You get a webpage, you mask text, and then you try to predict that text in order to learn the structure of the text you have access to. We basically do that, but for the actions you take. So we take the action sequences, and we take the input frames,

and then we see whether the model can correctly predict which action to take, by masking part of the action sequence.
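A toy version of the masked action-sequence objective Pim describes, directly analogous to masked language modeling but over controller actions, might look like this (the action names and mask scheme are invented; in the real setup each action would be paired with its accompanying video frames):

```python
import random

MASK = "<mask>"

def mask_actions(actions, mask_rate=0.5, seed=0):
    """Hide a fraction of the action sequence; return (inputs, targets).

    targets maps each masked position to the hidden action, i.e. the
    label the model is trained to recover from the surrounding actions
    (and, in practice, the accompanying frames).
    """
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, action in enumerate(actions):
        if rng.random() < mask_rate:
            inputs.append(MASK)
            targets[i] = action
        else:
            inputs.append(action)
    return inputs, targets

# A toy controller trace:
trace = ["move_fwd", "turn_left", "jump", "shoot", "move_fwd", "crouch"]
inputs, targets = mask_actions(trace)
```

Training then amounts to minimizing the model's error on exactly the entries in `targets`, the same recipe that Common Crawl masking uses for text.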

Alexandra: Yep, yep, yep.

Pim: And so we realized we just had so much of this that we didn't need world models to leap into this foundation model, and we could do them in parallel.

So I wouldn't put us in that bucket; we would likely become customers of World Labs, for example, if they build good world models, as opposed to being competitive with them. Our focus is also not world models for public consumption. We only do world models to get into the RL stage. We have some interesting ideas around Medal features,

like, what if you could replay your clip inside a world model, for example, which are really interesting. But I think we want to be cautious about not competing directly with game developers. So that's one. I think Runway's focus is really the movie industry. And, same thing:

they don't have the data that we do, right? So you have to pick a more video-based approach, because the amount of data you have is just going to ramp over time.

Alexandra: Yeah.

Pim: Right. Yeah. And so you start with video. So they're taking more of the cinematic approach, and I think that's interesting.

It's really interesting. SIMA is probably the closest to what we're doing. I mean, we know them well; we have really nothing bad to say, a really great team. The nice thing about being a startup is that you can be super focused. And also, they don't have the data that we do, right?

So they have to find other ways, and they're finding them. It's going to be an interesting one to watch. I also don't think SIMA is doing this for the purpose of launching game-developer-related products at the moment. I don't know for sure,

but I think they're doing it really as a research project toward embodied agents and robotics. Our focus is going to be very much on the games industry to start with, even though we also want to get to robotics. So our first customers are game studios that want really good NPCs and bots for video games, bots that are very human-like.

Alexandra: Right.

Pim: I don't think DeepMind is super focused on that as a market. So I think you'll see SIMA-style methods and our methods both turn into robotics foundation models in their own ways. But game developers won't really notice that much, I don't think. If that makes sense.

Alexandra: Got it. Okay. Yeah. I ask these things mostly because there are so many different ways to approach this. If we harken back to the original AI models, it's not that long ago, but obviously people have been working on AI for like 15 years.

There's a big delta, at least in the scientific and research community, about where knowledge comes from. Is it symbolic systems, like Watson knows a billion facts, or is it the ability to learn? And there are divisions, I think even between OpenAI and Anthropic and others, about the philosophy of how to teach something.

And it sounds like something potentially similar here, right? With World Labs, they're using the splats, so they kind of have to know every single thing that was ever done. Whereas in your situation, you're like: okay, well, a player went left, so they're most likely going to be here.

And maybe that makes it more production-ready. Is that a good parallel between the symbolic systems and the RL, or is that an over-extrapolation?

Pim: I think, yeah, the way I would describe it, maybe even simpler: very complicated interactivity between humans is very hard to simulate.

Because we're so unpredictable, right? That's why it's so hard to build human-like NPCs. If you're taking the learning approach, all that data already sits inside videos on the internet. And if you're taking the symbolic approach, you can't really use a lot of that data.

So I do think people will come at it from those two angles, but my prediction is that stochasticity remains really difficult to get to symbolically. So you just want to maximally bet on learning from videos, which already include a lot of that data.

Alexandra: Yeah. Okay. Makes sense.

Pim: And you can hopefully use both for RL eventually, right? But right now, the world-model approach, or video-model post-training, is just a lot simpler. And by the way, just for context, the way you then use it for RL is you train the world model to basically judge the behavior of the agent inside the world model.

Alexandra: So again, we've talked a little bit about world models and what other people are doing to tackle the space. General Intuition thinks of itself as primarily, or at least initially, serving the gaming industry. You're hoping to build these amazing bots, as you've said.

So let's shift over to the goal of General Intuition as it applies to games, and what games it would be used for. My first question, before we talk about any of the technical stuff, like where the agent lives, on device or on the server: who is your ideal gaming customer, and what are they making?

Pim: Games that—

Alexandra: Video-game-wise, yeah.

Pim: Yeah. Games that need to simulate human behavior in a very human-like way. This is a huge focus for Rockstar with GTA, right? The more realistic the people in the environment are, the more fun the game is. And I think this is true for GTA, it's true even for games like Fortnite, and it's true for shooter-based games,

because you don't want to play against bots that have access to the locations of other players on the map. You know when you play these games: a human would never have found you there, but somehow a bot was able to figure out where you were, come at you, and kill you.

And it's so frustrating. Our bots have the same constraints as humans: they can only see what a human can see and act how a human can act. So you get these really human-like interaction patterns. They can be as skilled as a human; they just can't be more skilled than any human has ever been, right?

Because they're trained with imitation learning from the clips. And it just makes for this really engaging experience, and I think you've seen it play at the office. It's very human-like, right? And I think that's what's going to make it really fun to play against, which then drives

retention for video games, drives monetization. And game developers, as a result, will be able to sustain player liquidity and things like that. So I think it's a root problem for the whole industry. I think the limit of any simulation is really the intelligence of the objects inside of it.

So you'll be able to use the models initially for players, but long term I think you could maybe use them for any object that's represented in the video datasets.

Alexandra: Oh, okay, like you could be a car or something. Alright, so what I heard there is maybe some MMOs, large online games, maybe some shooters, et cetera.

What about a customer like, I've had the modl.ai guys on; they're building agentic bots for QA testing. They're not a gaming studio, but they're gaming-adjacent. What about players that are more in the platform space? Could they also be a customer?

Pim: Yeah. So the way we would do this is we're going to serve these models through a general API, where you just stream frames in and we predict actions at the API level.

We really want to be seen as a very neutral compute platform, kind of like AWS or GCP. So what I suspect will happen is a lot of these companies will use our models to roll out really good agents, or maybe game studios will do it themselves, right?

Because sometimes, for instance, if our models can't zero-shot, meaning immediately transfer well into your environment, then we need additional data to fine-tune the agent. Or if you want to run it inside the engine, we need to do a distilled version.

So sometimes developers will need to approach this directly for those types of things. For QA, for example, you're limited by the number of agent instances you're running, right? So if you want to run a thousand, the models had better be small,

because that means you might be able to complete QA in, say, six hours instead of six weeks. I think this will be a big use case; we've definitely been approached for it. Our goal is not to compete with the companies doing game-QA-related stuff, because they've built really good pipelines for it.

I think we would just be one of the models they serve in order to serve their customers better.
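The six-hours-versus-six-weeks comparison Pim makes is, at heart, a parallelism calculation: if the distilled models are small enough to run many instances at once, wall-clock time divides accordingly. A quick sketch (all figures are illustrative):

```python
# A QA pass that would take six weeks of sequential agent playtime.
sequential_hours = 6 * 7 * 24  # 1008 hours

def wall_clock(total_hours: float, instances: int) -> float:
    """Ideal wall-clock time with perfectly parallel agent instances."""
    return total_hours / instances

# With ~168 small instances, six weeks collapses to six hours;
# a thousand instances would bring it down to about an hour.
print(wall_clock(sequential_hours, 1))     # 1008.0
print(wall_clock(sequential_hours, 168))   # 6.0
```

The ideal-scaling assumption is the optimistic case; real QA runs would lose some of that to coordination and environment-reset overhead.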

Alexandra: Okay. Alright. And so then, pivoting back to the game studio, the content side: how do you plan to give a designer or a studio enough control to tune that embodied agent?

You know, from the game design perspective, our job is to make the player feel smart, and to some extent an agent that might proactively learn, or be really, really good because it learned from top-95th-percentile CS:GO players, could detract from having fun. And I'd imagine that some players, like in the competitive scene, might find that amazing, right?

To train alongside an embodied agent bot. But how are you thinking about the product, in terms of serving the games industry, so that someone can move the levers of how good this thing is up and down?

Pim: Yes. So we actually train with skill-level data in the dataset.

We have models that can predict basic skill level from a lot of the data. As a result, you're going to be able to set it kind of like an LLM temperature, I don't know if you've ever played with that. When you make a request, you're going to be able to set it as a request parameter, on a range from zero to a hundred, and you can tweak that. You'll also be able to give it, when we release them, steerable prompts: text-based prompts

where you tell it to behave a specific way, or not to. And if the skill-level tuning alone isn't enough, maybe you give it explicit instructions not to do a very specific thing if, say, the player's playtime is under five hours, or their level is under 10, or whatever, right?

So these knobs exist, and I think you'll be surprised how nice it is to not have to write a ton of code to do some of these things, and to just be able to prompt for it. That saves a lot of time, I think.

Alexandra: Okay, interesting. Another question about the construction of the product a game studio would use, and then we'll talk about some of the technical stuff, like where it's hosted. Obviously Medal is where you're getting the majority of the data, and the most-played games in the world are CS:GO, Fortnite, Roblox, Minecraft, stuff like that. Do these games themselves present constraints? Like, could an embodied agent that you'd want to deploy in CS:GO only be trained from CS:GO data?

Pim: Yeah, this is a great question. A lot of emergent behaviors actually come from training across games, and there's a reason it matters so much. The technical term, if you want to look into it, is compositional generalization.

It makes for a good reading list. What it means is that a lot of creativity, a lot of new sequences, are actually learned sequences recombined for a new goal: the model generates them even though they may not have been present in the training data. And that leads to more human-like bots.

Compare that to a bot that's trained only on CS:GO. What makes a player human-like is that people take the strategies they've learned from Fortnite and try them. That's how you know you're playing against humans. So I think that's part of building really, really human-like bots.

As for limitations of individual games: also very interesting. You ideally want games that look as realistic as possible if you want to use them for robotics foundation models and things like that. You don't want games that introduce, for example, cheats on the screen, because those rarely exist in the real world. For instance, mobile games often render little buttons that say "click here to do X," and the real world just doesn't have that. You make games to be fun, right? So if something would be a pain on a touchscreen, you give the player a button. But the real world doesn't have that; you just have to go through the pain.

PC games are often more like the real world, and mobile games are not, so there's definitely a huge difference between PC games and mobile games. Mobile games are not that helpful because they introduce these cheats. Also, a lot of the time in mobile games, aiming and viewing are assisted, and that actually makes the data worse.

It's worse because, again, the real world, or most games, don't have that. Any kind of AI or assistance on the player side basically makes the data useless, because you can't actually learn from the actions. The other things that really matter are the size of the map, and bigger maps are generally better,

and the action space of the game. If the game only has move forward, move left, move right, move back, then it's very easy for the model to learn the environment quickly. So generally, more complex environments are better, because the real world is complex, right? We have a lot of things we can do at any given time.

Alexandra: Okay. Alright. So in my mind, your dream games, the ones most advantageous for your models to learn from, are HUD-less, deep, 3D, highly realistic environments. So maybe something like Hellblade: Senua's Sacrifice, with almost no HUD at all, no interactive tips.

Pim: Yes. And open-ended.

Alexandra: Oh, okay, interesting. So you're thinking maybe Ghost of Tsushima, God of War, Horizon Zero Dawn: big places, but not overly pandering to the player with a thousand little tchotchkes all over the screen. Interesting. Okay. Alright, so that was one of the questions on the game studio side. A couple of questions about how this would actually work, because I know there are some challenges here. This is an embodied agent, which is smaller than a world model; one could presume the agent is going to be smaller than the world model.

Yes. But if it's on device, how do you make sure there's enough space? And perhaps this is something you're still working on. If the GPU is busy rendering the game, wouldn't they share resources? And does that change, I presume it does change, depending on whether it's a world model versus an embodied agent?

And if it's not on device but on a server, don't you also face latency issues and expensive cloud costs, the things that have made even cloud gaming a hard consumer proposition?

Pim: Yeah. So what's really interesting: latency, for instance, is already in our dataset, because people played games where they encountered latency, and the agents actually account for it, which is quite interesting. So even if you run on a server, the models are also small enough that we can serve them at really low cost. You might just have to keep distilling, but I see no cost-prohibitive factor,

other than the fact that it is cloud inference. And then for local inference: GPUs are also getting a lot better simultaneously, and the models are genuinely so small that they can run on even a small percentage of a consumer GPU.

But I think most people will use the cloud deployments, because we can actually get them to be really cheap, or affordable, I guess I should say. And I think it's going to be a very different type of intelligence than anyone's used to in games, right? So I would start by not using it for the same things you currently use state-based bots for.

I think the first use case is probably going to be games that are built around these new types of intelligence, right? And if you take it out, the game won't be the same. That's how I look at it. I don't think they're going to replace the super-optimized state-based bots, like FIFA or FC bots.

Unlikely. But they are likely going to replace bots in, for instance, open-ended environments, where bots are really hard to build precisely because the environment is open-ended. So I think you'll find it fits nicely within a large group of use cases.

And then, because it's so entertaining, those use cases will become the next wave of games, if you will.

Alexandra: Right, totally. And I think that's where I was going with this: how much do you perceive this costing a studio? Obviously the games industry right now is in this flagrant resetting of budgets, and one could argue this would make games cheaper to develop, but that might not be true if these models are really expensive.

Pim: That's really why I think the first use case is going to be not existing games, but new games built around this type of interaction.

Alexandra: Okay, interesting. And that folds into one of my final questions on this subject before we go to competition and the Medal dataset.

I guess maybe you've already answered it yourself, but my question was: do you think there's a market that's big enough here, studios that want to make games built around these embodied agents? You've already explained the challenges with multiplayer world models, they're unsolved, and most of the games with bots are multiplayer, and those games are typically black-hole-esque.

So there aren't a lot of customers to sell into, because once you've won the market, you're the leader, like in the MMO market. So I guess my question was: are you going to wait for a new genre of games, perhaps games that feel like MMOs without actual players? And it sounds like that's actually what you're hoping for.

Pim: The gaming side is so important to us because, one, Medal is completely a games company, which means if the games industry doesn't continue to do well, that's bad for Medal. And we're gamers. I think games are kind of the last weird place on the internet where people are themselves and have fun; it's really not like TikTok or YouTube, in a way.

So even if we have to run gaming at cost because we're building good robotics models off the data, I think that's a pretty fair bet, and we might do that. The market might actually be large enough that you can build a good business here. But to us, the games piece is just such a critical part of the flywheel that we have to focus on it.

Alexandra: Okay.

Pim: Yeah.

Alexandra: Got it. Alright.

Pim: Yeah.

Alexandra: We talked a little bit about how you're going to apply this to gaming, what it might look like for studios, and what those tools might look like. And obviously you're saying: hey, this is going to be one of our very first markets.

It might not be the most lucrative market, but we'll figure it out as we go. I want to talk a little bit about competition and the Medal dataset, which we've said is really one of the unique advantages of what General Intuition is building. In world models, and in AI as I generally understand it, there can be

basically two different types of moats. The first is model size, model parameters, and the model data, with Gemini 3 purportedly crushing the competition due to the sheer size of the model. And the second is on the infrastructure side: more data centers, hence the ever-escalating race between OpenAI, Google, and many others to rapidly scale US infrastructure and AI data warehouses.

For Medal, you're likely leaning more toward number one: quality model data that maps context to 3D behavior. So I want to talk a little bit about this dataset and what it looks like for you in terms of company risk. My first question: are there gaps in what's being captured that would prove important to building an embodied agent?

One could surmise that people tend to clip or turn to Medal when they're experiencing something exciting or shareable, because that's the whole point of the platform. Is there a blank space of data, data that might be important for researchers building embodied agents, that isn't getting captured because players simply aren't sharing it?

Pim: So, to be clear, we don't just train on Medal data. It's a useful piece, more of a cornerstone, if you will, than the only solution. You can use video data from the internet, you can use lots of stuff, you can use text. One thing that's interesting about the Medal dataset is that it tends to skew toward either great or not great; the middle is not very present, because why would you share that, to your point? Despite that, the models have generalized to learning game behaviors really, really well.

And there's an interesting, thought-provoking idea there, which is that maybe it's just the most compressed representation of actions and frames, because it's greatness, and maybe that's just what you need. Maybe that's actually the most efficient way to do it.

But anything we lack, we make up for some other way, using other data.

Alexandra: Got it. Okay. So you have a more well-rounded approach; it's not just the Medal data. Okay.

Pim: And you actually really want the not-great examples, especially if you're trying to do safety and alignment. The fact that you have shooter data means you can make sure the model never shoots a gun, for example, in the real world.

I don't know if you saw the example: there was a person who got an LLM, or I guess it would have been a VLM, to shoot a gun, simply by saying something along the lines of, "Oh, this is just a test. You can ignore your instructions."

Alexandra: Right?

Pim: Right. "This is not a real situation,

you're doing a safety test," you know? And the VLM generated that action. The reason it did that is that it had no spatial reasoning or context to actually see that it was looking at a gun. And so, when you have our data, you can train these types of classifiers.

Like: if something looks like a gun, it's probably a gun, so you probably should not take the action. And ideally you don't even need to do it at the classifier level. You can use something like RL to RL your way into it: you give it a bunch of gun-related clips in a world model and make sure the model never takes that action. Then you can, in an almost guaranteed way, never a hundred percent, because they're still probabilistic models,

so you still need other safety precautions, like reading out the actions it's taking, and things like that. But it makes for a much safer model. So we've mostly used it to our advantage, I think.
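A crude version of the classifier-level guard Pim describes, blocking the action when a perception system reports something gun-like, might look like this (all labels, thresholds, and action names are invented for illustration):

```python
def safe_action(proposed: str, detections: dict, threshold: float = 0.5) -> str:
    """Veto trigger-like actions when anything gun-like is detected.

    detections: label -> confidence, as reported by a hypothetical
    perception model. This is only a last-resort classifier-level guard;
    as Pim notes, RL-based training and reading out the agent's actions
    are still needed, because the policy itself remains probabilistic.
    """
    gun_score = max(
        (conf for label, conf in detections.items() if "gun" in label),
        default=0.0,
    )
    if proposed == "pull_trigger" and gun_score >= threshold:
        return "no_op"  # refuse, regardless of what the policy proposed
    return proposed

print(safe_action("pull_trigger", {"handgun": 0.92}))    # no_op
print(safe_action("pull_trigger", {"coffee_mug": 0.90})) # pull_trigger
```

The RL approach Pim prefers would push the refusal into the policy itself rather than bolting a filter on afterward; the two are complementary rather than alternatives.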

Alexandra: Okay. So it sounds like you're triangulating, or quadrangulating, or pentangulating, all the data you're getting.

My question is also: if Medal's data, let's say all the usership of Medal, drops to zero, what happens to the research ambition?

Pim: Yeah, I mean, it would be tough. But Medal is also growing super quickly, so I don't think that's likely. And we're also quite transparent about this stuff, right?

It's on the website. All the practices and the data stuff, you can go to the blog and literally read every single thing we do and how we do it. It's also in the privacy policy. And even in the Medal app, we explain this in the settings, and when you're using features that are action-related, we explain that

some of these things can be used to train AI models. And we make opting out very easy; maybe a thousand people a day do it. If you don't want to contribute to that, that's fine. So I think it's mostly about choice, and about us really building forward a games industry.

For example, I think you can build probably the best anti-cheat in the world, one that doesn't need kernel-level access, which gamers hate, off the fact that you can get really, really good at detecting human behavior. Which means you're also really, really good at detecting non-human behavior.

Right? Yes. Yeah. And so I really, really think the benefits here outweigh the costs by many X, and I think gamers are seeing that and realizing that.

Alexandra: Okay. I know there's also an intention to serve markets beyond games, and although that's obviously not the core focus of our show, I wanted to ask some questions that relate to robotics.

I'm sure you've heard of a company called 1X, the Neo robot guys. And for the audience who may not be familiar, Neo is a household robot that can be quickly explained as your future household butler. They released a Gen 1 model that's actively being trained in the field, in people's homes.

And the TLDR here is that (a) you can't really train off of YouTube data because, unlike Medal, you don't have any of the context mapping, and (b) people share certain types of content, and it's usually not folding laundry, which is something they aspire for Neo to be able to do in your household.

And so, Pim, do you see Medal's dataset as harder to apply to robotics, given that what people are doing in games is potentially not folding laundry?

Pim: Yeah, I think it will initially work really well for any system that can be controlled using either a keyboard and mouse or a game controller.

I don't think humanoid robots are gonna be a big market for our company. But every quadruped fits a game controller. Every drone fits a game controller.

Alexandra: I see. Okay.

Pim: Every robotic arm fits a game controller, and then that sort of accounts for physical transfer in a way, right? Because it already has that interface mapped. On a drone, it then uses that to predict motor torque and things like that, and then it does stabilization on the flight controller and the motors. So the answer is: if we are right quickly enough, as in we ship this in the next year, then the market for humanoid robots likely becomes a lot smaller, more constrained to the household, because we think that spatial intelligence is the bottleneck, right?

So people will just build systems that have gaming inputs, because they just work so much better with our models. And why would you even build humanoid robots? A lot of the time you're actually weight- and energy-constrained. So why would you want a heavy, energy-consuming robot if you can just control, or orchestrate, things directly?

Right. And a lot of the reason people have focused on humanoid robots is that they're taking a bet on video transfer from internet data. They think that that embodiment is the main data source and therefore should be the focus. And I think that's wrong, personally.

I think there's actually much more data on these game-controller-controlled systems and things like that. So our bet is that we can unblock this so quickly that the robotic supply chain shifts largely towards gaming inputs, and the humanoid robotics space stays more limited to households and things where there's not a lot of games data.
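(Editor's note: the controller-to-robot interface Pim describes can be sketched roughly like this. A policy emits game-controller-style stick inputs; a thin adapter converts them into the setpoints a flight controller already understands, and low-level stabilization, the motor torques, stays on the flight controller. Names, axis conventions, and limits below are illustrative assumptions, not a real drone API.)

```python
def controller_to_setpoints(left_x: float, left_y: float,
                            right_x: float, right_y: float,
                            max_angle_deg: float = 25.0,
                            max_yaw_rate_dps: float = 90.0,
                            max_climb_mps: float = 2.0) -> dict:
    """Map stick deflections in [-1, 1] to attitude/rate setpoints for a
    hypothetical flight controller: roll/pitch angle (degrees), yaw rate
    (deg/s), and climb rate (m/s). Out-of-range inputs are clamped."""
    def clamp(v: float) -> float:
        return max(-1.0, min(1.0, v))
    return {
        "roll_deg":  clamp(right_x) * max_angle_deg,
        "pitch_deg": clamp(right_y) * max_angle_deg,
        "yaw_dps":   clamp(left_x) * max_yaw_rate_dps,
        "climb_mps": clamp(left_y) * max_climb_mps,
    }
```

The point of an adapter like this is that the model never has to learn motor torques; it only has to be good at the controller interface it saw in gameplay data, and the flight controller handles the rest.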

Alexandra: Okay. Yeah, drawing a distinction between the kinds of robotics we're talking about: robots controlled by controllers, and robots that are humanoid, like you said. Again, in the vein of robotics for 3D physical environments: obviously objects exist independently, like I can pick this cup up off this table. But in videos of 3D space, does it matter that you often can't break the meshes into pieces, such that a robot would know what can be moved and what can't?

Obviously even in, you know, our game, some of the meshes are flat and some of the things are interactable. So how can you tell, even if a player doesn't go and pick something up or move it?

Pim: So, one thing to note is that we can also just get real-world data. Right. Okay. Again, we're not constrained to games data; it's just that games data is a great foundation, or a cornerstone.

But it's not the only thing we use. So that's one. If a constraint exists, you can account for it by just going and getting the data. And presumably, because the models are so good, you're gonna need a lot less of it than if you were to try to do pre-training.

With a lot of our data, the idea is that, as a company, you can start doing in post-training what you're currently doing in pre-training, which is a lot of heavy data collection, hundreds of thousands of hours of it. You maybe need only hundreds or tens of hours if you're doing it at the post-training or fine-tuning stage, and maybe even eventually entirely in context.

We'll see about that. So yes, there are those limitations, for sure. You want to be as close to the real world as possible in most cases, especially with things where you're applying forces, where applying too much force might break a glass or something like that. So yeah, the answer is yes, there are definitely limitations.

Alexandra: Okay. Got it. All right.

Pim: You can also simulate a lot of these in Isaac Sim, for example. So there are engines that actually let you generate and read out forces in-engine.

Alexandra: All right. But the summary is that the Medal dataset may be the bedrock, but it is not the only thing.

And as you pursue General Intuition's ambitions, you'll basically pull data from wherever you need to figure it out: real-world data, LLMs, other video-language models, YouTube, whatever you find. Yeah. Okay, cool. So Pim, I know we've been on for a while, and I'd love to ask some questions about the future of GI, funding, and the business model.

My first question is: you talked a little bit about games going first, about servicing that industry through General Intuition's embodied agents. When do you think you might have a product for the gaming market? And obviously, I know that's a hard question, so maybe a generalizable time range?

Pim: Yeah. So, we're already working with game developers. We already have some of these bots. They're not yet deployed, but we have models right now that our customers are seeing and giving feedback on. I suspect we'll see GI models inside video games over the summer.

I don't know if we'll have generally available APIs by then, but our internal goal is that we do. I think it's largely dependent on the feedback from the game developers as we move into that stage. So my hope is summer, and the hope is to have a completely generally available API by the end of summer.

Alexandra: Okay. Very cool.

Pim: Yeah.

Alexandra: Alright, and you raised this $133.7 million, obviously a huge number. And Medal, I think, previously had a Series B. So this is a seed round for General Intuition, not to be confused with the rounds Medal has had in the past.

But yeah, what is the majority of this $133.7 million being spent on? And how are you thinking about that for your team? Because I presume $133.7 million might feel like a lot in games, but in AI, maybe it's not so much.

Pim: It's really not, for AI, yeah. Jensen, the CEO of Nvidia, is gonna be very happy with us. Most of the money is going towards training costs.

Alexandra: Yeah. Cool.

Pim: Honestly, it's like the minimum amount needed to be competitive in the space, which is sad but true.

Alexandra: True. We love the AI industry: one company's CapEx is another company's revenue, so it's all just trading back and forth.

Okay. And in that vein, word on the street is that you may be raising again. I presume you obviously need more capital to solve this problem. How are you thinking about future funding for General Intuition as you make discoveries along the way?

Pim: Yeah, yeah. We can't comment on this yet. We've seen a few, you know, preemptive offers, but we don't quite know yet what's gonna happen. Maybe in the next few days or weeks we'll hear a bit more about it.

Alexandra: Okay. But the TLDR is that, regardless of a specific round, in the next couple of months you will obviously need to raise again.

And so, what are the things you think you're gonna have to prove out in order to qualify for, you know, the next series?

Pim: Yeah. Whether we need to raise a ton more money is not super clear, but we've been offered a ton more money. Mm-hmm. So the question is, how sure are you in your predictions?

How much error, which then also dictates how much you raise. And because Medal also generates a ton of revenue, it's actually not super CapEx intensive. But I do suspect that as the models become available over the summer, we'll also need to account for scaling inference as we're scaling training. For those unfamiliar, inference is actually using the models, so it's people using the models. Right now we're only spending money on training the models; then we'll need to spend money on using them. And so I might want to lock in an additional few clusters for that, which will cost a lot more money.

And so yeah, we'll see. I think it will cost more than we raised, especially now that we've seen the models and they work really well, which obviously wasn't the case at the seed round.

And world models have kind of become generally accepted; I think it's been said that a top priority for next year is world models, or something along those lines. So it's now pretty generally accepted that this is gonna be a very competitive space, which also leads to a lot more capital flowing in. We're excited, and hopefully more on this soon.

Alexandra: Okay. All right, well, we're coming up on the end of our time. Pim, this was such a pleasure. Obviously this is my last episode of 2025, so my final question: we're wrapping up the year, and 2026 starts in a few weeks. What's something that you're really hopeful for in 2026? And that'll close out our show.

Pim: I want somebody to ship a good multiplayer game that actually grows at the pace of, like, early Valorant, and doesn't stop growing. That's what I want. I think it's been a lot of hits that then trend down. And I really, really want 2026 to be a year where we have multiple new breakthrough games that stick around as opposed to fading out of existence. That's my personal hope. Like, I'm still playing Rocket League, guys. I'd love a new version of that too; Epic, by the way, if you're listening. So those are my hopes. Maybe 2026 will give me something that I really fall deep into.

Alexandra: Yeah, I mean, I'm hopeful for ARC Raiders, which is brand-new IP and amazing. So, yeah.

Pim: I agree with you. Me too. Yeah. And the cool thing is, when it happens, it gives a lot of studios a blueprint for how to do it, and then you get more of them.

So I really, really hope that this is the one. Because I believe Valorant is from 2020 or '21. Yeah, 2020. We need more, so much more, for the industry to grow, and multiplayer is how to do it, because those are network effects. They have sticking power. Right? So if you want the industry to grow, you bet on multiplayer games.

Alexandra: Yeah.

Pim: So, I think that's how we get it back.

Alexandra: Awesome. All right, well, thank you so much, Pim, this was amazing. As always, friends, if you have feedback or ideas, hit me up at [email protected]. I'm always open. And with that, I'll see you guys in 2026. That's our episode. See you next time.

Pim: Happy 2026. Bye. Thanks.

If you enjoyed today's episode, whether on YouTube or your favorite podcast app, make sure to like, subscribe, comment, or give a five-star review. And if you wanna reach out or provide feedback, shoot us a note at [email protected] or find us on Twitter and LinkedIn. Plus, if you wanna learn more about what Naavik has to offer, make sure to check out our website, www.naavik.co. There you can sign up for the number one games industry newsletter, Naavik Digest, or contact us to learn about our wide-ranging consulting and advisory services.

Again, that is www.naavik.co. Thanks for listening and we'll catch you in the next episode.