Generative Interfaces Beyond Chat
Linus is a Research Engineer at Notion prototyping new interfaces for augmenting our collaborative work and creativity with AI. He has spent the last few years experimenting with AI-augmented tools for thinking, like a canvas for exploring the latent space of neural networks and writing tools where ideas connect themselves. Before Notion, Linus spent a year as an independent researcher, during which he was Betaworks's first Researcher in Residence.
At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
Linus has spent the last few years building and experimenting with new kinds of tools for thought and software interfaces for creation, like a canvas for exploring the latent space of generative models and writing tools where your ideas connect themselves. You can find his collection of well over 100 programming projects at thesephist.com, where he's also been blogging for almost a decade. Linus is currently prototyping interfaces for collaborating and creating with AI at Notion in New York City.
There we go. Cool. Well, between Harrison and that song, I don't know how to top that, but I'll make my best attempt. Happy to be here to talk about generative interfaces beyond chat. I'm Linus, and I'll do my intro in a bit.
Where I wanna start today is: I think we all generally have a sense that ChatGPT-style chat is super useful, super valuable. You can do a lot with it. But I think at this point we all kind of accept that it's not the end of the road for these interfaces. I had a tweet a while ago where I was like, you guys are telling me that we're gonna invent literal superintelligence, and we're gonna interact with this thing by sending text back and forth? Obviously it's not the end of the road. But chat is here today. Most usages of language models in interfaces that I've seen in production are built on chat-style interactions, or dialogue-style, turn-by-turn interactions, and there are interesting experiments on the fringes, some of which I'll hopefully mention later. But chat is here today. So the leading question I wanna spend our time on is: given that chat is where we are, and given the possibilities for other things to come, how do we incrementally evolve what we have, ChatGPT-style chat, into more interesting interfaces that balance the flexibility and power of language models with the ease of use and intuitiveness of some of the other kinds of interfaces we can build?
So, excited to talk about that. A little bit about me: I think a lot about UI design, interfaces, interaction design, and AI, and I've spent a bunch of time thinking about those in the context of building creative tools and productivity tools. So it makes sense that I'm currently at Notion, where I'm a research engineer. Before that I spent a couple of years working independently, also pursuing these ideas and building a lot of prototypes, some of which it sounds like are gonna be linked somewhere in the chat. And I've worked at other productivity tool companies and apps before.
If I had to roadmap what we're gonna talk about, I think there are three big buckets. First, I wanna lay the groundwork for how we should think about conversations and dialogue: what are the parts of a conversation that we should think about when we build language models for conversations? Second, I'll talk specifically about the idea of context in a conversation, and selection in a conversation, which will come up. And third, I want to land on this idea of constraints: the benefits that adding constraints can have, and how we can balance adding constraints to make interfaces more intuitive and easier to use without sacrificing power. So let's talk about conversations.
Let's say you and your buddy are about to talk about something, and you wanna say something. Even before you say anything at all, the communication channel has already opened and kind of started, because in any conversation, any kind of dialogue, you start with a shared context. That context might be that your friend just pulled up a chair next to you at the office and you're about to pair program. It might be that you're in a supermarket, checking something out. It might be that you're collaborating with a coworker, or that a friend or a stranger walked up to you on the street. That context determines how everything that you say, and everything that your interlocutor says back to you, is interpreted. So I think context is important. And notably, in applications like ChatGPT, context gets very, very low billing. You basically start with zero context, and you have to embed all the context that you want the model to use to interpret your words in the prompt itself, which is where I think a lot of the difficulties of prompting come from.

So you start with some context, and then the speaker will imply some intent. It could be explicit and direct, like "hey, can you pass me a glass of water?" It could be a little more implicit: if I'm a construction worker building a house, or I'm assembling a Lego kit, I might say "oh, the blue brick." That's not a complete thought, but in context it can be interpreted to figure out exactly what you're looking for. Or it could even be just me pointing at something, and then my partner in the conversation can interpret the intent out of what I'm doing. That's the speaker's role.
And once you have the intent, there's a step that's especially important for language models: giving the model time to think. I'm abusing some reinforcement learning terminology here and calling this a rollout, but people also call it chain of thought, or a scratchpad: some internal monologue for the model, or for the recipient of the message, to figure out exactly what you mean, interpreting your intent, what you said, within the context that you have. And then once the model, or the recipient of the message, is done thinking, there's some action. The action might be answering a question, so it might be just textual. But more exciting, and more often, I think we're seeing lots of apps where the action is some combination of a response back and an action that the model takes, whether in an application or by integrating with an API.

So I think that's the anatomy of a conversation if you really break it down, in a typical language model usage. Let's take Copilot as an example; this is a screenshot from Copilot Next, or Copilot X chat (all these names are insane): Copilot chat inside VS Code. This is one of those cases where it's very clear you're already starting with some context. If you were building something like this, you wouldn't just build the chat; you would want the language model to be aware of as much context as you can get out of the application.
So the context includes things like: what files do you have open? Do you have a terminal open? What were the last few commands the user ran, and the outputs they got? Because maybe the error message in the terminal can inform what the model can do for the user. It even includes things like what line the cursor is on, or what lines the user has selected, because selection is actually a really strong signal for what the user is thinking about and looking at. It's kind of like pointing, but on a screen or in a text editor.

So you start with some context, and then there's the intent. The intent in this case is: write a set of unit test functions for the selected code. And you can see that in interfaces like this, you really need the context to interpret the intent correctly, and the more context you have, usually the better that interpretation is going to be. Then presumably there's some internal monologue, some thinking, for the model, and after that we get the model's action back out.

So when we think of prompt-based conversational interfaces, I think we usually focus on the intent, the prompt, and then on the completion, the output, the action. But there are these other elements that I think we should also think about: the context in which the user's intent is being interpreted, and the internal monologue.
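To make that concrete, here's a minimal sketch of that anatomy as data, in the style of the Copilot example above. All of the type and field names are hypothetical illustrations, not any real Copilot or VS Code API:

```ts
// A sketch of the "anatomy of a conversation" applied to a
// Copilot-style editor assistant. Names are invented for illustration.

// Shared context: everything the environment already knows before
// the user types a single word of the prompt.
interface EditorContext {
  openFiles: string[];      // paths of files the user has open
  activeFile: string;       // the file currently focused
  selection: string | null; // selected text: a strong "pointing" signal
  cursorLine: number;       // where the user's attention is
  terminalTail: string[];   // last few terminal commands and their outputs
}

// One turn: intent interpreted against context, a rollout
// (chain of thought / scratchpad), then an action.
interface ConversationTurn {
  context: EditorContext;
  intent: string;   // e.g. "write unit tests for the selected code"
  rollout?: string; // the model's internal monologue, usually hidden
  action:
    | { kind: "reply"; text: string }               // purely textual answer
    | { kind: "edit"; file: string; patch: string } // act on the workspace
    | { kind: "apiCall"; endpoint: string; body: unknown };
}
```

The point of writing it out this way is just that `context` and `rollout` are first-class parts of the turn, not things crammed into the `intent` string.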
This is a screenshot of Microsoft's Copilot for Excel. I think this is an interesting example of a really rich, valuable application where there is a chat interface, but where there's clearly a lot more we could do. In this case, there is so much context that the model could use to figure out exactly what the user is doing and maybe what they're trying to accomplish. But if you have a chat sidebar like this, the sidebar sort of exists in a totally different universe than the spreadsheet. I mean, the model can theoretically look at everything in the spreadsheet that the user has open, but in this screenshot, the user is having to refer to specific columns and specific parts of the spreadsheet verbally, by naming things: in this case, the column about last month's sales data. Why can't I just point and select and then say, "what about this column?" In the real world, that's kind of how we work: if I want to refer to something that's in front of me, I'll just point at it, I'll just look at it.

So I think having the chat agent be co-located, co-inhabiting the same workspace that the user is in, is a key part of how to make these interfaces gel a little better. Without that, these conversational interfaces start to feel more like the command-line interface of generative AI, where you're having to specify and cram every possible piece of information about your intent and your action explicitly into the prompt, rather than being able to work more fluidly in a kind of shared working environment.
So where do we go from here? Well, I keep talking about this idea of pointing, using the context, and selecting things. One really powerful technique we can look to is using our hands: using selections and pointing. And when you point at things in the context, or when you select things, there are a few different ways for the language model to observe what you're pointing at or what you're doing, sort of in order from most grounded in reality to most out there.

One way to think of point-and-select interfaces is as breaking your action down into nouns and then verbs. What I mean by that is: if you're in a spreadsheet, the noun might be the column I want to manipulate, and you select the noun; then the verb could be, I want to filter it, or aggregate it, or hide it, or delete it, or duplicate it. If you're in a writing app, the noun might be a single block in the document, like the title block, or it could be the entire page, or a new file. In the real world, this point-and-select mechanic is built into every object and every material: if I wanna take action on some object, I have to first grab the thing and then do something with it. In chat-style interfaces, I think it's less obvious.

But this point-and-select mechanic is also what makes the web great for a lot of applications, because there's an existing materiality built into everything on the web. Every bit of text on the web is selectable by default; you can select anything, you can copy-paste anything, you can often drag and drop files into web pages. So there's all this noun-and-verb mechanic built into the materials you use to build apps on the web. In chat, all of those affordances around selecting objects and then applying actions to them are kind of gone, and I think we could think about how to bring them back to chat interfaces.

Point and select, I think, are most useful for helping clarify context and focus the model on something you're looking at, or, alternatively, for directing the action stream or the output stream of the model. If you're in a writing app, you could select something and say, summarize this bit and then put it here at the top of the page, or make a new page over here. So point and select are useful for directing the output as well.
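As a toy illustration of that noun-then-verb pattern, here's a sketch in TypeScript. The types and the verb lists are all invented for illustration, not taken from any particular product:

```ts
// Noun-then-verb: the user first selects an object (the noun), which
// constrains which verbs we even offer or send to the model.

type Noun =
  | { kind: "column"; table: string; name: string }
  | { kind: "block"; id: string; text: string }
  | { kind: "page"; id: string };

// Which verbs make sense depends on what is "in hand".
function verbsFor(noun: Noun): string[] {
  switch (noun.kind) {
    case "column":
      return ["filter", "aggregate", "hide", "delete", "duplicate"];
    case "block":
      return ["summarize", "rewrite", "translate", "move to top"];
    case "page":
      return ["summarize", "outline", "duplicate"];
  }
}

// The prompt carries the selection explicitly, instead of making the
// user describe "the column about last month's sales data" in words.
function buildPrompt(noun: Noun, verb: string): string {
  return (
    `The user has selected: ${JSON.stringify(noun)}.\n` +
    `Apply this action to the selection: ${verb}.`
  );
}
```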
So there are a few ways we can get the model to observe what you're doing or what you're pointing at. The most common one currently, and I think the most obvious one, is what I'm calling the omniscient model: the model can look at everything, everywhere, all the time. It just knows the entire state of the application, but it's up to the model to figure out what to query and what to look at. So the context is technically fully accessible, but the model doesn't know exactly what you want it to look at.

The next level up from that is what I'm calling call by name, which I think is interesting for certain types of applications, especially pro applications where there's a lot of flexibility and customization. If you have an application like a design app, like Figma or Sketch, you could imagine naming different artboards or different panels, and then being able to @-mention them and say, hey, can you clean up panel two, or can you clean up the timeline panel? This only makes sense in environments where it makes sense to name objects and refer to them by handles or names, but if that's the case, then I think this is an interesting way to incorporate context and be able to directly point at things, but with your words, using names.

There's also this really interesting kind of interface that I don't think anybody's really seen in production, which is what I'm calling literally mentioning something. This in particular is a screenshot from a paper from a project called S, from, I believe, an MIT lab, where they had a programming language that interleaved icons and images with a way to program around the UI. You could imagine an interface where, if I wanted to refer to a paragraph, I could start writing "summarize" and then literally drag and drop the paragraph into the prompt box. Or if I wanted to transform an image, I could drag and drop the image. Or if I want to talk to a person on my contact list, I could grab that person's icon and say, hey, can you call this person, and just drag and drop the image of that object.
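Here's a sketch of how call by name might work, with a hook for literal mentions at the end. The workspace registry, the @-mention syntax, and every name in it are invented for illustration:

```ts
// Resolve "@timeline-panel" style mentions into the actual objects
// the model should treat as its focused context, so the user points
// with words instead of describing things.

type WorkspaceObject =
  | { kind: "panel"; id: string }
  | { kind: "column"; table: string; name: string };

const workspace = new Map<string, WorkspaceObject>([
  ["timeline-panel", { kind: "panel", id: "p2" }],
  ["sales-column", { kind: "column", table: "Q3", name: "sales" }],
]);

function resolveMentions(prompt: string) {
  const mentioned: WorkspaceObject[] = [];
  const text = prompt.replace(/@([\w-]+)/g, (match, name) => {
    const obj = workspace.get(name);
    if (!obj) return match; // unknown name: leave the text untouched
    mentioned.push(obj);
    return `"${name}"`; // keep a readable handle in the prompt text
  });
  return { text, mentioned };
}

// resolveMentions("clean up @timeline-panel")
// -> { text: 'clean up "timeline-panel"', mentioned: [{ kind: "panel", id: "p2" }] }
//
// A literal mention (drag and drop) skips the name entirely: the UI
// can push the dropped object straight into the same `mentioned` list.
```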
So having support for these rich objects inside the prompt box, I think, is a really interesting possibility.

And then the last one is what I'm calling contextual actions. A great example of this is the right click: these context menus, where on the left is Notion and on the right is Figma. You grab an object, sort of metaphorically, and then you can see all the options that are available to you, all the things you might want to do with that object. In a lot of current applications, these are hard-coded in. But you could imagine using a language model to say: OK, here's the object the user has in their hands; given the full context of the application, and maybe even their history of actions and the title of the file, what are the most likely actions they might want to take? You could have the model select from a list, or you could have the model generate possible trajectories the user might wanna take. So the context menu, I think, is an interesting way to surface actions without forcing the user to type the instruction out fully.

Another kind of context menu pattern is dot-driven or autocomplete-driven programming, which I think is the analog of the right click, but with text. If I'm typing in a text editor or code editor and I hit dot, like "document.body.", it'll show me all the autocomplete options. That's kind of like saying: I'm holding this object in my hand; what are all the things accessible to me from it, what are the actions I can take? In the other panel I have tab completion: I'm working inside the terminal, I have the CLI in my hand, what are the things I can do with it? Tell me the possibilities. That's another way of grabbing an object and then showing me what's possible, and you can imagine powering something like this with a language model as well.
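A model-generated context menu might look something like this sketch. The `llm` parameter is a stand-in for whatever completion client you use, not a real library call, and the prompt shape is just one plausible way to do it:

```ts
// Given the object the user has "in hand" plus surrounding context,
// ask the model for the few most likely next actions.

async function suggestActions(
  llm: (prompt: string) => Promise<string>,
  selectedObject: string,
  documentTitle: string,
  recentActions: string[],
): Promise<string[]> {
  const prompt =
    `Document: "${documentTitle}"\n` +
    `Recent user actions: ${recentActions.join(", ")}\n` +
    `The user has right-clicked this object: ${selectedObject}\n` +
    `List the 5 most likely actions they want to take, one per line.`;
  const completion = await llm(prompt);
  // One action per line; real code would validate these against the
  // set of actions the app can actually execute.
  return completion
    .split("\n")
    .map((s) => s.trim())
    .filter(Boolean);
}
```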
And then lastly, this is a slightly more complex pattern: if the user selects an object, you could materialize an entire piece of UI, like a side panel or a kind of overlay. On the left again is Notion AI; on the right is Keynote, which I'm using to make this deck. In either case, you select an object and then you see a whole host of options for how you want to control it. This gives the user a lot of extra power, at the cost of it maybe not being obvious exactly what the user wants to do or what they should take action on.

So in all these cases, we have this noun-then-verb pattern: choose the object, and then choose what action you wanna take. That lets the system constrain the action space, and maybe even come up with follow-ups or suggestions for the best actions to take. Given all of this, and given what we talked about around the anatomy of a conversation, when you look back at something like ChatGPT: ChatGPT is really just, OK, you have this little tiny prompt box, and you have to cram all of the context, all of the intent in there, and also everything you want the model to know about where you want it to take its action. That, I think, is a good place to start, but it is limiting, and there are ways we can expand out of it.

So one way to summarize the ground we've covered might be that the holy grail, or one powerful goal, of user interface design is to balance intuitiveness, building an intuitive UI that's easy to learn and sort of progressively understand, with flexibility.
And flexibility, I think, is the strength of language models. In chat-style interfaces, you have access to the full capabilities of a model: you can ask it to do anything the model could possibly do, including things like using APIs, using tools, even fully specifying a programming language you want the model to use. That's the strength of falling back to chat. But by adding these constraints, where you start with something in your hand and the system tries to recommend, suggest, or follow up, saying: given this is what you're looking at, given this is the locus of your attention right now, here are the things you can do, and maybe predicting some actions and adding some guardrails, some structure to the instruction, I think that's where we can bring back the intuitiveness of graphical user interfaces without sacrificing the power of language models.

Open-ended natural language interfaces, I think, trade off too much of that intuitiveness for flexibility. In an app like ChatGPT, you have this blank-page syndrome, where the user doesn't know exactly what they're supposed to type in. Maybe they have a sense of, maybe I want a summary, or maybe I want a conversation of a certain style, but there are no affordances in the UI to give them hints: these are the things the model is good at, these are the ways you might phrase the request. None of that exists. So I think it adds a huge learning curve and is detrimental to the ease of discovery, and by bringing back some of these graphical interfaces, I think we can improve that situation a bit.

And then lastly, since I'm closing in on time, I wanted to add one more note about another frequent goal of interface design, which is closing feedback loops. Particularly in creative applications, and sometimes also in productivity applications, you wanna try to tighten the feedback loop between the user attempting something, maybe having something in their mind they want to see, then looking at the result, evaluating it, and figuring out: OK, this is how I need to iterate, this is the fix I need to apply to get the model to generate what I want.
And there are a few ways to do this. One is: instead of the model generating one output, if you're, say, generating images, it could generate a range of outputs. That allows the user to pick: here are maybe four different ways of looking at this answer, or four different images you could generate; this is the one I like, and then iterate on that. Again, this only really works if the output is easy to evaluate. If I asked the model to write an essay and it gave me four different essays, it would be pretty difficult to use.

Sort of going along with that idea: whenever possible, you want to prefer what I've heard referred to as "people like to shop more than they like to create." People like having a range of options they can choose from, maybe even a swipe-left-and-right style of "do I like this, do I not?" That kind of interface is easier to use, more intuitive, and more engaging than "here's a blank page, tell me exactly what you want." And again, powering that kind of thing comes back to coming up with options and predictions of actions and suggestions that you sort of plan out for the user in case they want them, and then the user can make that selection.
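A sketch of that "shopping over creating" idea, with `llm` again standing in for a hypothetical completion function rather than any real API:

```ts
// Sample several candidates and let the user pick, rather than
// returning one answer. A higher temperature spreads the options
// out so there is actually something to choose between.

async function generateOptions(
  llm: (prompt: string, temperature: number) => Promise<string>,
  prompt: string,
  n = 4,
): Promise<string[]> {
  return Promise.all(
    Array.from({ length: n }, () => llm(prompt, 0.9)),
  );
}
```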
And then lastly, I've seen some prototypes of this thing that I'm calling interactive components. What I'm referring to by interactive components is: if you're in a chat kind of interface and you ask a question, like, what's the weather in New York? Instead of responding with a paragraph of an answer, maybe the model says the temperature tomorrow is 85, and then there's a little weather widget with a slider for time, or with buttons for looking at precipitation and these other things. The model can synthesize, or maybe will be able to synthesize, little interactive components, little widgets, on the fly. And that again helps me close my feedback loop, because there are these other options of information I can look at, and I can explore them directly without having to re-prompt and retype my queries.
So, bringing it all back, I wanted to close out with one of my favorite quotes, from one of my favorite papers, or essays, for when I'm thinking about creative tools: by Kate Compton, in her essay called Casual Creators. I think this quote is great, so I'm just gonna quote it at length. It's about the possibility space of creative tools and what you can do: the action space should be narrow enough to exclude broken artifacts, like models that fall over or break when you're in a 3D printing app, but it should be broad enough to contain surprising artifacts as well. The surprising quality of the artifacts motivates the user to explore the possibility space in search of new discoveries, new use cases, a motivation which disappears if the space is too uniform.

So again, she's talking about this balance: you want to constrain just a bit, just enough that the user never gets stuck in that blank-page state, so that there's always some option they can take or some suggested action that seems interesting. But you want to preserve the power and the flexibility and the sometimes surprising quality of these language models. And I think striking that balance is sort of the primary challenge of building interfaces for these models.

Oh, something's happening. There we go. OK.
So, last slide, just to sum up: five big ideas it'd be great if you could take away from this conversation. I think good dialogue interfaces built on LLMs can have agents that co-inhabit your workspace, that are there and can see what you're doing in its full detail, including where your attention is. They should take full advantage of the rich shared context that you have with the model to interpret your actions, so that you don't have to cram everything into a prompt. These interfaces can lead initially with constrained, happy-path actions that you can use language models and other predictive models to try to predict. And if the user wants to do something more advanced or different, we can always fall back to chat as an escape hatch, because there is that power and flexibility in language models. And then lastly, whether you're building a chat interface or something a little more direct-manipulation, graphical, I think it's always good to think about how we can speed up that iteration loop, especially by not forcing the user to type text, but by letting them respond more directly with the mouse or a touchscreen, closing that feedback loop.
So with that, I hope that was interesting and useful, and I hope you can build some great conversational applications.

Wow, wow. I mean, so many questions, there's so much going through my mind. And I love the idea of how you're helping guide people. That is so nice to think about, instead of just leaving this open space and making people figure it out. It's like, hey, can we suggest things so that people can figure it out with us, as opposed to just letting their imagination go wild and then it may or may not turn out OK?

Yeah, exactly. There's some history of predictive interfaces like this, and I think in the design world, collectively, our taste has sort of been soured a bit on predictive interfaces, because the models that we've used in the past have not been that good. We couldn't really predict that far, and we could really only predict simple actions. But I've seen prototypes of programming interfaces where, given the full file context, you can predict not only code but also, hey, do you want to refactor this function, or do you want to rewrite this type into this other type? Or if you're in a creative app, you could predict fairly complex trajectories for the user: hey, do you want to take this drawing and recolor it in this way, or apply this filter and then this other filter? And given the power of these models, I think it's worth taking another look at these predictive interfaces as well, obviously leaving the escape hatch that is just normal chat.

Yes. So I'm excited for the day that Notion automatically knows I want to create a table and populates it with exactly what I want. And I'm guessing that you're going to be one of the people making that the reality of the future one day.

Sweet, man. Well, this was awesome. There are so many incredible questions for you happening in the chat, so if you all want to continue the conversation, I'm pretty sure Linus is on Slack, and it's not at Linus, it's at...

@thesephist. It's my internet name, I guess.

At thesephist. And I dropped his blog in the chat too, in case you wanna get schooled a little bit more and go deeper down the rabbit hole. Thank you, Linus, for coming on.