Guiding NPC Agent Behavior
Nyla Worker is the product lead at Convai, an award-winning AI-NPC start-up. There she focuses on developing conversational non-player characters. Before Convai, she product-managed synthetic data generation at NVIDIA's Omniverse Replicator and honed her skills in deep learning for embedded devices. Nyla has experience accelerating eBay's image recommendation system and has conducted research at the University of Minnesota. Her expertise spans AI, product management, and deployment of AI in production systems.
Nyla Worker, a leader in AI character design at Convai, sheds light on innovative strategies in NPC behavior simulation and management. Building on her experience at NVIDIA, Nyla outlines the comprehensive process of creating lifelike, interactive AI characters for applications ranging from gaming to brand representation. She elaborates on the nuanced construction of these digital personas, which involves crafting a character mind with personality traits, a backstory, and narrative design, all integrated into large-scale interactive environments such as the Unreal and Unity engines.
Nyla Worker [00:00:00]: Hello everyone. My name is Nyla Worker, and I am here with Convai, where I lead product. Prior to Convai, I worked at NVIDIA, where I did synthetic data generation. So Convai does AI characters in 3D worlds. Where have you seen this? In games. But today I think we're going to see them present all over the spectrum of applications, for example as brand ambassadors and in other use cases. But let's see this live in action. So here is a collaboration that we did with NVIDIA.
Nova AI Character [00:00:35]: Kai, long time no see. You've been hiding in the shadows, or have you just been trying to avoid me?
Kai (AI Simulator) [00:00:42]: Hey, Nova, it's good to see you. I haven't been trying to avoid you, just been super busy. How are things?
Nova AI Character [00:00:48]: Things are fantastic. Just secured a juicy contract with Zenith and science.
Nyla Worker [00:00:53]: So you see this character: she is in this virtual world that is cyberpunk-ish, and she is talking with Seth, who is our collaborator from NVIDIA. She had to stay very heavily in character because she was showcased to a ton of press, so we had to keep her in character and speaking about the subject she was supposed to. And of course she had to have some GPU knowledge. But these agents also had to be able to move around the environment. So, for example, later on in the video you can see Jin grabbing and picking up the ramen. So the character is performing actions while still remaining in the world and conversing with you, or doing what it is supposed to do. And the NPCs talk to each other if you want them to. So these are the things you can do with the technology, and how it looks in real life.
Kai (AI Simulator) [00:01:56]: Let's break out the goods.
Steve AI Character [00:01:59]: You got it, Kai. Nova's success calls for the top shelf celebration. Just don't expect this to become a habit.
Nyla Worker [00:02:07]: So you can see that the characters perform behaviors while...
Kai (AI Simulator) [00:02:12]: Thanks, Jin.
Nyla Worker [00:02:13]: So what does it take to actually build this, before we jump into the guardrailing, which we'll get into in a second? What it takes to build this is that you have to craft a character mind, embody the avatar, and put perception and actions into the system. To craft a character mind, we have to give it some kind of personality and style, a backstory, and narrative design, meaning we design the agenda of this character. And then you have a knowledge base, which is the retrieval-augmented system in which you have specific information about the game, the brand, or whatever you're trying to build. We have multilingual support and specific guardrails, which we'll dive more into in a bit. And then we have long-term memory and a state of mind, which enables you to surface how this character is feeling while talking to you, which is important for experience design, for example. Okay, then we embody it. So in the embodiment process, you put it in Unreal Engine or Unity, you put a face to it, and you make it so that it can have a body with gestures and movement.
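As a rough illustration of the character-mind ingredients listed above, here is a hypothetical configuration sketch in Python. The field names are illustrative, not Convai's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CharacterMind:
    """Illustrative bundle of the components described in the talk."""
    name: str
    backstory: str                     # who the character is and what it wants
    personality: dict[str, int]        # e.g. Big Five traits on a small numeric scale
    speaking_style: str
    knowledge_base: list[str]          # documents indexed for retrieval augmentation
    guardrails: dict[str, list[str]]   # e.g. {"deny": [...], "allow": [...], "blocked": [...]}
    languages: list[str] = field(default_factory=lambda: ["en"])
    long_term_memory: list[str] = field(default_factory=list)
    state_of_mind: str = "neutral"     # surfaced mood, useful for experience design

pirate = CharacterMind(
    name="Captain Flint",
    backstory="An old pirate who guards the secret of the ship's treasure.",
    personality={"openness": 3, "agreeableness": 1, "extraversion": 4},
    speaking_style="Gruff, archaic seafaring slang.",
    knowledge_base=["ship_lore.txt", "treasure_map_notes.txt"],
    guardrails={"deny": ["modern technology"], "allow": ["treasure"], "blocked": []},
)
```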
Nyla Worker [00:03:17]: And then lastly you put it into the 3D world, where it has to have perception, actions, and so on. But let's jump into the use cases in which we see this. So we see this in gaming. And in gaming the character has to be immersive, so you have to be able to walk around this space, and it should not know, for example, what the latest AI development is if it's a medieval game. For example, you might want to ask an assistant what the code here is, but you don't want your Hogwarts NPC mod to know what the code there is. Or maybe you do, but that's your choice. And actually we'll talk about that, because we need to give freedom to the person to decide what it is and is not that you want this character to know about. Then, education and training.
Nyla Worker [00:04:11]: This is very specific: you want a kid to learn something. So you want this embodied NPC to walk you around this environment, but also not go off the rails and hallucinate that that is some kind of made-up skeleton. Then, brand agents: for example, discussing what the objects in the scene are, and being able to have complete trust that it is not making up the answer to "what is this item that you're showing me?" And lastly, real-world use cases. Apparently, from what I heard at Google I/O, there were cool demos of holograms. So we've seen all of these hologram displays. Imagine what you could do if you could work with a 3D avatar that you can now see on those holograms.
Nyla Worker [00:04:54]: So that is where it goes. But why do strong guardrails matter in this particular case, for these embodied avatars? One reason is maintaining immersion: you need to ensure that the player remains in the lore of the story and the game. You need to avoid it going over controversial topics that you don't want the NPC to go over. Sometimes you do want a violent character, but that should be your choice, and we let you control that. Then we want accuracy of information.
Nyla Worker [00:05:23]: In particular, I just spoke about the brand agents and education; those matter, and false information could be, for example, pretty damaging. And then brand integrity, in that case the integrity of how you look. For example, if you're representing a single brand, you shouldn't speak about your competitor and be like, yes, you should definitely buy that product, it's much better than this one. So what do we do for this? What we do to enable this kind of guardrailing is that we have character-crafting features, which come with models that are fine-tuned and optimized for it. So that goes into core model training and fine-tuning. But we also work with you, the game designer, the brand manager, the educator, on ensuring that it follows your test framework, and on letting you design tests that ensure what you want.
Nyla Worker [00:06:29]: So one thing that I've seen in the real world is that people are like, I want it to just stick to my knowledge base, I want it to just stick to my backstory. And then they put it into the real world and they are like, oh, this is just a dialogue tree. And I'm like, well, you don't want it to just stick; you want it to have some kind of flexibility. And how do you do this? Well, you do this by us letting you test it and develop a thorough test set that sets the limitations more clearly for you. And then we also go to external AI audits. There are many categories that are going to keep growing where you need to just keep testing and ensuring your AI is performing on all of those. Okay, so what goes into an interaction? As you saw today, what we do is speech-to-text. Then we have the large language model.
Nyla Worker [00:07:24]: The large language model is appropriately fine-tuned; it's either open source or we fine-tune OpenAI models. Today we do text-to-text, text-to-action, and text-to-animation. And we are integrated with a vector database, which backs the retrieval-augmented system: we use retrieval-augmented generation in order to augment the response of the large language model, and we do certain things to ensure that it is replying from your knowledge base if need be. And then we do text-to-speech. We use a proprietary diffusion-based model for the highest speed of response. As you might be able to tell, if I were to stop speaking and had very high latency, this wouldn't be a fun interaction.
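Putting the stages in order, the life of one interaction might look roughly like the following sketch. Every function here is a stand-in with canned output so the control flow runs end to end; none of this is Convai's actual code:

```python
# Hypothetical stand-ins for the real components (STT, fine-tuned LLM,
# vector retrieval, diffusion-based TTS).
def speech_to_text(audio: bytes) -> str:
    return "Tell me about the treasure."

def retrieve(query: str, top_k: int = 3) -> list[str]:
    return ["The treasure is buried beneath the old lighthouse."]

def llm_generate(prompt: str, context: list[str]) -> tuple[str, str]:
    reply = f"Arr, they say {context[0].lower()}"
    return reply, "point_at_map"       # text-to-text and text-to-action

def text_to_speech(text: str) -> bytes:
    return text.encode()               # placeholder for diffusion-based TTS audio

def handle_player_utterance(audio: bytes) -> bytes:
    """One turn of the interaction loop: STT -> RAG-grounded LLM -> TTS."""
    text = speech_to_text(audio)
    passages = retrieve(text)          # ground the reply in the knowledge base
    reply, action = llm_generate(text, passages)
    print(f"action: {action}")         # e.g. trigger a gesture or pick up an item
    return text_to_speech(reply)

print(handle_player_utterance(b""))
```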
Nyla Worker [00:08:12]: Yes, and all of this has to happen in under a second. So now that you've seen what goes on behind the life of an interaction, let's go into how we actually guardrail this interaction. We are going to jump into the details in a little bit, but that user input goes through some kind of moderation check to begin with. What is being inputted into this character really matters; some topics you just will not touch. And for that we use moderation filters that are either proprietary ones that we work with you to develop, or some kind of open-source moderation tool, or, for example, the moderation API from OpenAI. Then it goes into the core model.
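Since OpenAI's moderation API is named as one option for this input check, here is a minimal sketch of that call using the official openai Python SDK; the pass/fail handling around it is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def input_allowed(player_text: str) -> bool:
    """Gate player input before it ever reaches the character's LLM."""
    result = client.moderations.create(input=player_text).results[0]
    # Block anything the API flags (violence, hate, harassment, self-harm, ...).
    return not result.flagged

if input_allowed("Where is the treasure hidden?"):
    print("forward to character")
else:
    print("redirect or close the conversation")
```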
Nyla Worker [00:08:55]: So we do fine-tuning such that the model sticks to what it is designed to do. And then we do two levels of character response moderation checks, which I'll go into in a little bit. On top of that, the vector database ensures that the character is retrieving information, and if it is not retrieving that information, we try to block it from saying those things. So let's go into that. What are the character identity and topic grounding features that we have? We have the backstory, which I'll show you in a second, and the personality: we have a personality tab in which you're able to tweak the personality of the character and the speaking style.
Nyla Worker [00:09:40]: Then we have a knowledge bank, which is the retrieval-augmented generation. And then we have narrative design. I'll show you all of those in a second. So, for the backstory: here you're able to very quickly create a backstory. For example, you put in a name, you give the character a voice, and then here you write that backstory. We've set up our system so that we have a fine-tuned model that is specifically trained to follow that backstory for this kind of experience. And the same goes for personality and style.
Nyla Worker [00:10:12]: I don't have this tab here, but we've done research on personality traits, and how you select each of those has an impact on the character. And we make sure that it sticks to those traits as you select them. Then, for retrieval-augmented generation, here you just upload all of your files. So in this case, this is a pirate character, so he will have information about his particular experience. And we worked on a retrieval-augmented system that has been tested, and we enabled a fine-tune for this model. All of that is good, but we need to keep re-steering the model. We need to keep bringing it back to the story, the narrative, and so on.
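On the retrieval side, the blocking behavior mentioned earlier, where the character is stopped from asserting things it did not retrieve, could look something like this sketch; the scores, threshold, and helper function are all hypothetical:

```python
def generate_from_context(query: str, passages: list[str]) -> str:
    # Stand-in for a grounded LLM call that must answer from the retrieved passages.
    return f"From what I know: {passages[0]}"

def grounded_reply(query: str, retrieved: list[tuple[str, float]],
                   threshold: float = 0.75) -> str:
    """Only answer from the knowledge base; deflect when retrieval support is weak."""
    supported = [passage for passage, score in retrieved if score >= threshold]
    if not supported:
        # Nothing in the knowledge base backs this up: stay in character, don't invent.
        return "I couldn't tell ye about that. Ask me about the ship instead."
    return generate_from_context(query, supported)

print(grounded_reply("latest GPU news?", [("The ship sails at dawn.", 0.41)]))
```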
Nyla Worker [00:11:00]: And we need this character to have an agenda, right? And that agenda is proprietary to whoever is building these characters. So, for example, in the case of a brand, when you're speaking to a character, they either want to sell you something, or they want to show you a product, or they want to take you through a customer support journey. For that, you can design such an experience with these narrative nodes. In the narrative nodes, for example in the case of the pirate, the pirate is supposed to tell you about the treasure of the ship, and you give it a brief instruction. So this is not saying "you have to say this"; in reality you just guide it through what it could say or what it should discuss. And then it goes through these nodes. So, for example, in the case of a brand, what we have seen is that they'll put in, okay, what is the product? Then in the product section, it will dive into, okay, that character should speak positively about these brands and should speak negatively about these issues, and things like that.
Nyla Worker [00:12:08]: And specifically, when it gets to some health issue, it will redirect that conversation to, for example, a health support line or something like that, so that the agent doesn't go off the rails and start discussing topics that are not within its narrative tree. And here you can see that narrative design is not only grounded in the conversation; it's also grounded spatially. Based on where you are in space, it will trigger a different narrative node. And this helps it be grounded not only in the conversation you're having but also in the spatial environment you are in. And here you can see, for example, the player approaches and the mission brief begins. Then you go from one node into another within this game.
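A narrative node, as described, pairs a brief instruction with triggers that can be conversational or spatial. Here is a hypothetical minimal representation, not Convai's schema:

```python
from dataclasses import dataclass

@dataclass
class NarrativeNode:
    """One step of the character's agenda, with an optional spatial trigger."""
    name: str
    instruction: str                  # guidance for the LLM, not a verbatim script
    next_nodes: list[str]
    trigger_zone: tuple[float, float, float] | None = None  # world position, if any
    trigger_radius: float = 5.0

nodes = {
    "greet": NarrativeNode("greet", "Welcome the player aboard.", ["treasure"]),
    "treasure": NarrativeNode(
        "treasure", "Tell the player about the treasure of the ship.", ["mission_brief"]
    ),
    "mission_brief": NarrativeNode(
        "mission_brief", "Begin the mission brief when the player approaches.", [],
        trigger_zone=(12.0, 0.0, -4.5),   # fires when the player enters this area
    ),
}

def maybe_trigger(node: NarrativeNode, player_pos: tuple[float, float, float]) -> bool:
    """Spatial grounding: activate a node when the player is close enough."""
    if node.trigger_zone is None:
        return False
    dist = sum((a - b) ** 2 for a, b in zip(player_pos, node.trigger_zone)) ** 0.5
    return dist <= node.trigger_radius

print(maybe_trigger(nodes["mission_brief"], (10.0, 0.0, -3.0)))  # True: within 5 units
```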
Nyla Worker [00:12:57]: And the agent, this character, keeps getting reminded of what it has to do, so it never goes completely off the rails and it continues your story. All of that is what we do when you are creating the character, but you also need to have some kind of filters. For that, as I mentioned, on the input we check what comes in and run it through the moderation filter API. We leverage moderation filters, for example for violence, hate, harassment, self-harm, and so on. We leverage OpenAI's moderation API. But on top of that, we have enhanced capabilities that we use for refining these filters specifically for your particular use case. What we've worked on, particularly with brands, is two types of specific filtering.
Nyla Worker [00:13:53]: One is more traditional, which comes down to regex, and I'll show that in a bit. But the one that is more interesting, I think, is the deny-list and allow-list guardrail mechanism that we worked on, for which we had to develop instructions with which we fine-tune the model. The idea is that the character should continue speaking regardless of what the player says, in a way that answers the query without discussing, for example, another brand. So if you are brand A and someone is speaking about brand B, you want to continue that conversation naturally. However, you don't want to mention the other brand, because at no point do you want your brand agent to talk about that other brand; it would reflect very poorly on you. So that topic goes on the deny list: the character will naturally continue speaking, but it steers around the topic. It is not completely blocked; it's not something we prevent outright.
Nyla Worker [00:14:59]: And then we have the allow list, which is certain topics that we want to enhance the character to speak about. For example, we have a certain instruction mechanism with which we've pushed the character to speak about certain topics. And blocked words are something that's just completely non-negotiable: you don't want your character to engage with them in any way, shape, or form. So we can either close the conversation, redirect the conversation, or just shut down your experience, and we can do this through a traditional scanning mechanism.
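Pulling together the three filtering tiers just described (deny list, allow list, hard-blocked words), here is an illustrative sketch. The regex scan stands in for the "traditional scanning mechanism"; in the real system, the deny/allow behavior comes from fine-tuned instructions rather than string matching:

```python
import re

DENY = {"rival brand"}        # steer around these, but keep talking naturally
ALLOW = {"our new gpu"}       # topics the character is pushed to engage with
BLOCKED = {"slur_example"}    # hard stop: close or redirect the conversation

def scan(player_text: str) -> str:
    """Classify player input into a handling policy for the character."""
    text = player_text.lower()
    if any(re.search(rf"\b{re.escape(w)}\b", text) for w in BLOCKED):
        return "shutdown"      # close, redirect, or end the experience
    if any(w in text for w in DENY):
        # In the real system this comes from fine-tuned instructions: answer
        # the query naturally without ever mentioning the denied topic.
        return "steer_away"
    if any(w in text for w in ALLOW):
        return "lean_in"
    return "normal"

print(scan("Is the rival brand better than yours?"))   # -> steer_away
```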
Nyla Worker [00:15:56]: Okay, I mentioned fine-tuning throughout the presentation, but how have we done that kind of fine-tuning? Obviously, the pre-trained models come with a whole layer of censorship in the original datasets to begin with. But throughout this process, we've created a bunch of datasets for specific behaviors. So, for example, for following the backstory, we created an instruction set and a bunch of backstories and personalities that abide by what people want to design within their characters, and we built up those datasets. For the knowledge, we needed to do the same thing for the retrieval-augmented generation, such that it actually takes the information from the retrieval rather than just hallucinating something. And the moment it got actually interesting, some story time: it would go off the rails on topics. There is a lot on the Internet, but, for example, if we are talking about an educator, you don't want it to speak with knowledge that was popular in the two thousands, right? We need it to know there are eight planets, but it would go off the rails in those kinds of scenarios where there was a ton of data that was actually wrong in its baseline. So for that we had to build up datasets on that side. And yes, of course, fine-tuning for specific topics. And lastly, the test framework.
Nyla Worker [00:17:04]: So here we can show a quick demo of it. Basically, as you're designing your character in our platform, you can like or dislike each conversation. A like means you want to keep that in your testing dataset as a positive response: no matter what change you make to your character, you want to ensure that it answers similarly. A dislike means you want to check that the character, given the modifications you make to it, to the backstory or the knowledge system, doesn't go off the rails. So from that you build a set of test cases, you add all of those, and then you can rerun the NPC after any of the changes you have made and score it with some kind of judgment, which we work with you on creating. It could be that you have a set of models judging that it's meeting your feedback criteria, or it could be that you've created some kind of metric specific to this character. That way you, as the game designer or brand manager, develop trust that this character is actually following your instructions, how you want it to be, and your dataset rather than anyone else's. Yeah.
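The like/dislike workflow amounts to a regression suite for the character. Here is a hedged sketch of rerunning such a suite after a character change; the word-overlap judge is a trivial stand-in for the model-based or custom-metric judging described:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    prompt: str
    reference: str    # a previously liked (or disliked) response
    positive: bool    # liked: stay close to it; disliked: must not recur

def judge(reply: str, case: TestCase) -> bool:
    # Stand-in: real judging could be a panel of models or a character-specific metric.
    overlap = set(reply.lower().split()) & set(case.reference.lower().split())
    similar = len(overlap) >= 3
    return similar if case.positive else not similar

def rerun(character_reply_fn, suite: list[TestCase]) -> float:
    """Rerun the NPC over the whole suite and report the pass rate."""
    passed = sum(judge(character_reply_fn(c.prompt), c) for c in suite)
    return passed / len(suite)

suite = [TestCase("Where is the treasure?",
                  "The treasure lies beneath the old lighthouse, matey.", True)]
print(rerun(lambda p: "The treasure lies beneath the old lighthouse!", suite))
```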
Nyla Worker [00:18:27]: And here, if you have these test evaluations, you can go through each of them, click on them, and see: oh, was character A with these tweaks performing better than character B? What is it that was off? And then you can start comparing the answers. Okay, so, to just wrap up: in order for these characters to actually come to life, we have to do a lot of checks throughout the process to ensure that they fit, and that the character can just converse with you freely with no fear of it going off the rails. Awesome.
Q1 [00:19:02]: How do you design your frameworks for behavior? You showed an interface where you're able to adjust a number of the parameters within the specific behavior of each character or NPC. But how have you thought about, or how have you conceptualized, a progression of that? How do you get very detailed personalities? How do you attribute, codify, or index personalities?
Nyla Worker [00:19:26]: Could you elaborate more on that?
Q1 [00:19:28]: Sure. So your characters have a defined personality, behavior, knowledge base, and knowledge expiry.
Nyla Worker [00:19:34]: Yeah.
Q1 [00:19:34]: So that sounds like it can be codified. There can be a number for that; it can be indexed. So how have you thought about creating a much larger scale for that, where hundreds of thousands of characters are indexed?
Nyla Worker [00:19:47]: Yeah, so we actually have... okay, there is the complexity of personality, which is one question, and I think there is another question, which is: how do you deal with many characters? For many characters, when you're creating a game: with Convai, we worked on a game where we had over 60 characters, in a real game. For that, we encompass them in the project structure, and in the project structure you can have certain characters that just belong to a certain group. And with that, we are seeing the requirement of automatically generating certain types of characters with certain personality traits. But that is just one side of the coin. On the more detailed personality side: while we give you the big traits, the Big Five personality traits, you have something that's very quantifiable, like a scale of one through four.
Nyla Worker [00:20:40]: We also let you write a description of the personality and style, and that description can be very unique. Writers like it because they can actually put their own flair into it. And then we are building datasets such that the model meets that style and personality description, more than just the toggles. The toggles are nice, but they're not as free as how we ourselves describe our own personalities. And coming from different cultures, we all know that "angry" or "bitter" means different things.
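Combining the two mechanisms described, quantified Big Five toggles on a one-through-four scale plus a free-form style description, a character's personality spec might look like this (illustrative only):

```python
big_five = {                  # quantifiable toggles, scale 1-4 as described
    "openness": 3,
    "conscientiousness": 2,
    "extraversion": 4,
    "agreeableness": 1,
    "neuroticism": 2,
}
style_description = (
    "Quick-tempered but warm underneath; speaks in short, salty sentences "
    "and never admits to being wrong."
)
```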