MLOps Community

Ghostwriter - AI Writing That Learns From You

Posted Mar 06, 2024 | Views 213
# Ghostwriter
# AI Writing
# Shortwave
SPEAKERS
Jonny Dimond
CTO @ Shortwave

Jonny is the co-founder and CTO of Shortwave, where he has been integrating AI into every aspect of email. Before Shortwave, he worked on the real-time query engine in Firestore. Jonny has a degree in Computer Science from the Karlsruhe Institute of Technology.

Adam Becker
IRL @ MLOps Community

I'm a tech entrepreneur and I spent the last decade founding companies that drive societal change.

I am now building Deep Matter, a startup still in stealth mode...

I was most recently building Telepath, the world's most developer-friendly machine learning platform. Throughout my previous projects, I learned that building machine learning-powered applications is hard - especially hard when you don't have a background in data science. I believe that this is choking innovation, especially in industries that can't support large data teams.

For example, I previously co-founded Call Time AI, where we used artificial intelligence to assemble and study the largest database of political contributions. The company powered progressive campaigns from school board to the Presidency. As of October 2020, we had helped Democrats raise tens of millions of dollars. In April of 2021, we sold Call Time to Political Data Inc. Our success, in large part, is due to our ability to productionize machine learning.

I believe that knowledge is unbounded, and that everything that is not forbidden by laws of nature is achievable, given the right knowledge. This holds immense promise for the future of intelligence and therefore for the future of well-being. I believe that the process of mining knowledge should be done honestly and responsibly, and that wielding it should be done with care. I co-founded Telepath to give more tools to more people to access more knowledge.

I'm fascinated by the relationship between technology, science and history. I graduated from UC Berkeley with degrees in Astrophysics and Classics and have published several papers on those topics. I was previously a researcher at the Getty Villa where I wrote about Ancient Greek math and at the Weizmann Institute, where I researched supernovae.

I currently live in New York City. I enjoy advising startups, thinking about how they can make for an excellent vehicle for addressing the Israeli-Palestinian conflict, and hearing from random folks who stumble on my LinkedIn profile. Reach out, friend!

SUMMARY

Knowing what to say is often the easy part - actually writing it is the hard part. Ghostwriter will learn from your past, mimic your style and tone, include relevant content, and write the emails that you would have written. This talk discusses the core ideas behind Ghostwriter and how we are able to make an AI not just sound like you, but write exactly what you would have written.

TRANSCRIPT

Ghostwriter - AI Writing That Learns From You

AI in Production

Slides: https://docs.google.com/presentation/d/1a6kdDCXQNb7AB5MPEV2Z6mDR5401pJ0DliY9fnYYocc/edit?usp=drive_link

Adam Becker [00:00:05]: You guys did an excellent job, because next up is Jonny. Hello, Jonny.

Jonny Dimond [00:00:09]: Hi. It's great to be here.

Adam Becker [00:00:11]: And do you feel, like, the coherence and the thematic consistency with what we just heard from Alex?

Jonny Dimond [00:00:21]: Yes, absolutely.

Adam Becker [00:00:23]: Great. So what are we talking about today, then?

Jonny Dimond [00:00:26]: Cool. We're going to talk about Ghostwriter, basically the tech behind the AI writing feature that we built into our Shortwave app.

Adam Becker [00:00:37]: I would love to just wear the prompt engineer glasses right now that Alex gave us, just to try to read what it is that you're doing from even, like, a prompt engineering perspective, if anything. But for you, it isn't just the text being the input, it is the text being the output. And that is the thing that you're sort of selling. And so very curious to see what it is that you're building. I'll leave you to it, and I'll come back in 10-15 minutes with some questions.

Jonny Dimond [00:01:09]: Cool. Sounds good. All right, let's get started then. I'm here to talk about Ghostwriter: AI writing that learns from you. I am one of the co-founders and CTO at Shortwave. And at Shortwave, we're building the smartest email app on planet Earth. We give you an AI executive assistant right in your inbox. It can be used to summarize your emails, to ask questions, to search your email, and maybe most importantly, to write emails that sound like you.

Jonny Dimond [00:01:44]: We launched this earlier last year, and writing has been one of our most popular features. We built this tech to basically make the email sound like you. And we then used that tech to ship a new feature earlier this year, which is a feature where it will autocomplete the drafts that you're writing right in your editor. Today I'll be diving into how we built it and what the tech is behind that. I always like to spice talks up with a demo - always add a little danger that the demo won't quite work as it should, especially when AI is involved and you never know what the AI will say. So here I have an email from Bobby, and Bobby wrote in and asked, hey, I want to try Shortwave. Do you have a promo code? And you can see the AI already suggested an opening. I'll just tab-complete that.

Jonny Dimond [00:02:42]: And this is actually a phrase that I would typically use for someone who expresses interest in Shortwave. But in this case, we want to actually answer with the promo code, and you can see it understands the context. I can just tab-complete, and it will actually insert the right thing: it'll go ahead and suggest the promo code and say 10% off. And now here's the key. This is not a hallucination. This is an actual promo code. And the reason it knows this is a promo code is that I sent a heads-up to the team earlier saying, hey, I want to create a promo code for the AI in Production attendees. I want to name it AI Project 2024, and it should give 10% off.

Jonny Dimond [00:03:29]: And the AI was basically able to learn from what I've written in the past and to autocomplete that information directly in the editor. So if you've used GPT-4 to write emails, and I'm sure many of you have, and you just ask it, hey, write an email, it will produce the most boilerplate, boring email ever. And at this point, I'm pretty sure they've hard-coded that if you ask it to write an email, it will say, I hope this message finds you well. And then it'll just spit out full paragraphs of long-winded text. And I don't know about you, but that's definitely not how I write emails. And so, talking to users, there are basically two problems that we identified. The first one is the AI needs to sound like you.

Jonny Dimond [00:04:14]: Whatever it writes, it needs to sound like you, because you won't send an email that just sounds AI-generated. And the second part, which is maybe even more important, is it actually needs to fill in the right information. Just writing a wall of text is not useful. You want it to write what you actually would have written. And so if we look at the two problems, there's a bunch of things we tried. The first thing we tried is, well, what if we just give it a description of what you sound like? And this is actually the first version that we launched, and it worked fairly well. We gave it a description - basically a psychological analysis of you: what kind of sentence structure do you use? Do you like to use emojis? What kind of sign-offs and salutations do you use? Are you heavy on technical terms, et cetera? And that worked fairly well. Another thing we tried is to fine-tune a model on emails you wrote, and this actually works really, really well.

Jonny Dimond [00:05:07]: It's just very challenging to do at scale across all the users of Shortwave. And then we actually discovered something that works really well too, which is: if you give the AI examples of what you wrote, it's able to mimic the style based just on those examples. If we look at the second problem, we also want it to add the right information. And there's the obvious choice, which is, well, you can just tell the LLM what it should write, but at that point you're sort of spending a bunch of time giving instructions and you may as well just write the email yourself. There's another thing that we actually built into the assistant, which is a feature where you can perform an AI search and ask about information like, what is the Wi-Fi password of our office? And you can actually tell the assistant, hey, reply to this email with the Wi-Fi password of our office. It'll do the search and it'll add it to the email. But we also found that if the example emails you give to the AI have the right information, then the AI will happily use that information. And so we sort of figured both of these problems have the same kind of solution: if you're able to find emails that both show how the user writes and contain relevant information, then it's able to produce emails that sound very much like you and include the right content.

Jonny Dimond [00:06:39]: And so this is what we ended up doing. In our prompt to the LLM, we find a bunch of relevant emails and we tell the LLM: copy the style, copy the content, and generate a reply based on that. So it turns into a new problem of how we find the relevant examples. How do you find emails that are well suited to the current email you're writing? And the idea is pretty basic. The first step is to index embeddings for all the user's emails. So we take your emails, we run them through an embedding model, and now we have a semantic understanding of every email that you have. And then we want to find emails that the user sent, by similarity. Basically, for any given thread: have you responded to a similar thread in the past? If yes, then that email is highly likely to be relevant in this context.
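
A minimal sketch of that indexing idea, in Python: every sent email gets embedded once and stored so it can later be searched by semantic similarity. The embed_text helper and the in-memory index below are stand-ins assumed for illustration; the talk doesn't name Shortwave's actual embedding model or vector database.

```python
import numpy as np

def embed_text(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical embedding model: deterministic placeholder vectors.
    In a real system this would call an actual embedding model or API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class SentEmailIndex:
    """Toy in-memory vector index over the user's sent emails."""

    def __init__(self) -> None:
        self.emails: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, email_text: str) -> None:
        # Offline indexing step: embed each sent email as it arrives.
        self.emails.append(email_text)
        self.vectors.append(embed_text(email_text))

    def search(self, query_text: str, k: int = 10) -> list[str]:
        # Cosine similarity; vectors are unit length, so a dot product suffices.
        q = embed_text(query_text)
        sims = np.array([float(v @ q) for v in self.vectors])
        top = np.argsort(sims)[::-1][:k]
        return [self.emails[i] for i in top]
```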

Jonny Dimond [00:07:36]: If you started to write a draft, then that is also highly relevant in the current context, right? If you're writing a draft about promo codes and you've written an email about promo codes in the past, we should include that email in the examples. And so we gather those emails and we include them in the prompt. So this is basically how our system works, in a very simplified version. On the right, we have our indexing pipeline, and this is an offline stage where, as the emails come in, we'll run them through an embedding model and store them in a vector database, and that makes sure we have basically all of your emails readily available for search when we need them. When you then go ahead and ask the system to draft an email for you, or you're writing an email in the editor, we take the current thread or the draft you're writing and we run that through the same embedding model. And then we can go and do an AI search, or a semantic search, in the vector database and find emails that you've sent that are similar to the ones we're looking for. And so we get a bunch of results and we sort of take the top five to ten.
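
The query path he describes could then look roughly like the sketch below, reusing the toy SentEmailIndex from the previous sketch: embed the current thread plus draft, pull the most similar sent emails, and fold them into one prompt that tells the LLM to copy their style and content. The prompt wording and the commented-out chat-completion call are illustrative assumptions, not Shortwave's actual prompt.

```python
def build_draft_prompt(index: "SentEmailIndex", current_thread: str,
                       draft_so_far: str, custom_instructions: str,
                       k: int = 8) -> list[dict]:
    """Assemble a single large prompt from retrieved example emails."""
    # Semantic search: find sent emails similar to what is being written now.
    examples = index.search(current_thread + "\n" + draft_so_far, k=k)
    example_block = "\n\n---\n\n".join(examples)

    system = (
        "You are drafting an email on behalf of the user.\n"
        "Copy the style and tone of the example emails below, and reuse any "
        "relevant facts they contain. Do not invent facts.\n\n"
        f"User preferences: {custom_instructions}\n\n"
        f"Example emails the user has sent:\n{example_block}"
    )
    user = (
        f"Thread being replied to:\n{current_thread}\n\n"
        f"Draft so far (continue from here):\n{draft_so_far}"
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

# messages = build_draft_prompt(index, thread_text, draft_text,
#                               "Prefer short sentences; sign off with 'Cheers'")
# draft = llm_client.chat.completions.create(model="gpt-4", messages=messages)  # hypothetical client
```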

Jonny Dimond [00:08:43]: This depends on the exact length of the emails and other conditions. We sort of take those five to ten emails you sent, we combine them with a system prompt that we've written and some custom instructions that the user can give, like preferred language or preferred sign-offs, et cetera. We take all of that, we combine it into a single large prompt, and we run that through an LLM. And the LLM will then output the draft. For the assistant we just use off-the-shelf GPT-4. But for the autocomplete model, we actually noticed that we can improve its performance significantly if we do some fine-tuning. So one of the things with autocomplete is that you kind of need very specific output. To give an example, I included a screenshot here: if you write "Happy to", the LLM should not repeat the "Happy to".

Jonny Dimond [00:09:36]: It should start by adding a space and then the word "help". And so we ran into a bunch of problems while building this feature. One of them was, well, the LLM really wants to give you a suggestion. So even if the draft is finished, it would just continue giving suggestions, basically repeating the same sentence over and over. It also turns out that LLMs are not great at emitting correct whitespace and formatting. They have a real hard time adding the right space, adding the right punctuation, continuing where you left off. We tried to do this with instructions in the system prompt, and whatever we tried, we could not get it to reliably output the right thing. And so we essentially figured, okay, we need to try something else.

Jonny Dimond [00:10:22]: And we ended up doing some fine-tuning. And so the goal here was: let's get some training data that we can use to fine-tune a model. The thing that surprised me is you only need a few hundred examples for a big improvement. We trained it with 400 or 500 examples and the improvements were massive. And it turns out you can actually synthesize training data from real emails. So you can take an email that you actually sent and now pretend like you're actually writing this email as a draft. You can do the semantic similarity search and find the emails that it would in real life. And then you can sort of randomize the cursor position in the current email.

Jonny Dimond [00:11:08]: For example, at the start of a draft, at the end of a sentence, sometimes mid-word, or at the end of the draft, to train it to not output anything when the draft seems done. And then you take the rest of the email and cut out either the complete rest of the draft or just the rest of the paragraph, because we do want to support completion sort of in the middle of the draft. And we did one other thing, because an important use case, as I just showed you with the promo code, is that it should do fact lookups, right? It should include the right information. So we handcrafted a dozen examples where we added things like the Wi-Fi password or office address or your phone number and included those in the training examples, so it would learn that if it saw this fact in the examples, it should include it. We also included examples of no fact lookups, where the information was missing, so it wouldn't hallucinate the answer. So that is what we've shipped. But there's much more we want to try in the future.
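
A sketch of how that training-data synthesis could look, assuming a sent email is a plain string and the training record is a simple dict (the talk doesn't specify the actual schema): pick a cursor position (start of draft, end of a sentence, occasionally mid-word, or the very end), use everything before it as the draft prefix, and use the rest of the paragraph or the rest of the draft as the target completion - empty at the end, so the model learns to stop.

```python
import random

def make_training_example(sent_email: str, retrieved_examples: list[str]) -> dict:
    """Synthesize one autocomplete training example from a real sent email."""
    # Candidate cursor positions: start of draft, end of draft, sentence ends,
    # and occasionally a random mid-word position.
    positions = [0, len(sent_email)]
    positions += [i + 1 for i, ch in enumerate(sent_email) if ch in ".!?"]
    positions.append(random.randrange(len(sent_email) + 1))
    cursor = random.choice(positions)

    prefix = sent_email[:cursor]           # what the "user" has typed so far
    rest = sent_email[cursor:]             # ground-truth continuation
    paragraph_rest = rest.split("\n\n", 1)[0]

    return {
        "context_examples": retrieved_examples,               # similar past emails from the index
        "draft_prefix": prefix,
        "completion": random.choice([rest, paragraph_rest]),  # "" at the end -> learn to stop
    }
```

Per the talk, a handful of hand-crafted fact-lookup examples (and no-fact negatives) would be appended to a set like this before fine-tuning.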

Jonny Dimond [00:12:09]: It turns out fine-tuning is very effective and there's a lot more we could do. We're going to do it with more diverse examples, more styles, more languages, a larger number of examples. And we actually know when a suggestion is good, because we know when a user accepts a suggestion. So we want to look at ways of incorporating that into our training step. Fine-tunes are also at the stage where a per-user fine-tune may be possible. It used to be cost-prohibitive, but we're getting to the point where it's not out of the question to have a per-user fine-tune for writing emails. We want to explore different models. We're not tied to the models we have in production today.

Jonny Dimond [00:12:53]: And it turns out this space is changing so rapidly that the best models are changing sort of on a weekly or monthly basis. And then the other thing we want to look at is how to include the most relevant content, so we can look at improving the ranking for similar emails. Right now we just use similarity, which works fine, but there are probably lots of benefits to get from using something else - looking at cross-encoding, et cetera, looking at more fine-grained snippets. Right now we can only include a full email. We'd love to include just the relevant text for this fact-lookup case. And then trying different or even multiple embeddings - having multiple embeddings per email to make sure we capture the right semantics in different cases. And then it turns out email is really messy.

Jonny Dimond [00:13:43]: And so it's really important that we actually do the data cleanup, and we still have a lot of work to do there. So that was Ghostwriter. Thank you.

Adam Becker [00:13:53]: Very cool. I see a couple of questions that the audience has, and I have about 500. So I'm going to start with the audience questions. Apurva is asking: does it also learn from the changes that the user makes to the generated draft?

Jonny Dimond [00:14:13]: Not yet. So this is something we'd love to look at as well. And that sort of ties back to either the offline learning stage or some way to figure out, okay, the user clearly is not accepting these suggestions. Basically find a way to have it suggest that type of suggestion less in the future.

Adam Becker [00:14:33]: Cool. Yeah. Can you go back to that initial system diagram?

Jonny Dimond [00:14:40]: Yes.

Adam Becker [00:14:41]: Okay. Just trying to download it all into my brain real quick. Okay. So I want to zoom out from this a little bit and just kind of, like, see - I mean, this is the product stage. I want to understand this first from a product perspective: are you replacing traditional email clients like Gmail, or is this some intelligence layer that will ultimately be helpful for users? Because you also need to get read access to all of the emails, right? So how do you see this from, like, a product positioning perspective?

Jonny Dimond [00:15:18]: Yeah. So we're available for Gmail today, so you can sign in with your Gmail account. And we're basically a full-featured email client - a full-featured email client that puts AI first. AI is going to totally change the way we work, and it can be really helpful, especially in something that is as busy as email. And so we offer you all the great things that AI can enable, directly in sort of a full end-to-end experience.

Adam Becker [00:15:49]: Got it. So when you say that you integrate with Gmail, it just means that you have a way of reading the emails that I had previously written, but now I get to live my new existence in your platform, right?

Jonny Dimond [00:16:07]: Correct. Yes. So instead of going to Gmail, you go to shortwave.com. You see the same emails you see in Gmail, but you now have a powerful AI that helps you write the emails. You have an AI that can search the emails for you, that can summarize the emails.

Adam Becker [00:16:22]: Oh, wow. That's very interesting. Did I remember correctly that you said you've been working on this for a couple of years now? Is this even before the insanity and frenzy around ChatGPT? You guys tackled this even prior to it.

Jonny Dimond [00:16:38]: So yeah, we've been around for quite a while now, close to four years or just over four years. It turns out building an email client is a lot of work and building an email client that integrates with AI is even more work. We started looking at this probably two and a half years ago at first with various features that we could integrate and it turns out there's a lot of things that get better if you use AI in your email client.

Adam Becker [00:17:08]: Yeah. What about things like vulnerabilities - do you see any? If, back in the day, somebody asked, you know, what's the password for the AWS credentials, and I sent that to them, is there a risk that right now you will also suggest this for a future draft?

Jonny Dimond [00:17:29]: So we will sort of suggest what you've written in the past. The thing we don't do at all - one of the guiding principles is no actions without user confirmation. It's very clear that AI or LLMs tend to either hallucinate or be very vulnerable to sort of injections and stuff like that. So everything we do with AI in our app basically requires confirmation. It will write the email, but it won't send it for you. It will suggest an event it can add to your calendar, but it won't actually add it. And so that's sort of a layer where the user has to confirm that this is actually what should happen before we take any action.

Adam Becker [00:18:13]: Can you take a couple of slides back?

Jonny Dimond [00:18:15]: Yes.

Adam Becker [00:18:21]: The things we tried. Okay, I have a question for this, but I see somebody from the audience has one. Okay, Vanai asks: is this an add-on to Outlook 365 as well? This is another question.

Jonny Dimond [00:18:33]: We get a lot of requests for Outlook or Microsoft 365. Unfortunately, it does not work with that yet, although hopefully sometime in the future we can.

Adam Becker [00:18:45]: Yeah, okay, let's see. I think there's no other question yet from the audience. Okay, things we tried, give it a description of what you sound like. Does this description mean, is this a different prompt? Are you just sort of like wrapping up your own prompt with some of the previous description? What does that description look like?

Jonny Dimond [00:19:08]: So the way we actually built this is we took ten emails that we thought were representative of how you write, and we asked the LLM: please describe this user's writing style. What does the sentence structure look like? What kind of tone is it - very professional, or sort of casual? Does the user like to use emojis? Et cetera, across many different dimensions. What you got was this tiny psychological profile of how you like to write, and it was surprisingly accurate. And we just used that, added it to the system prompt as basically "this is how the user writes", and then the LLM would use that to make sure the draft kind of sounded like you.
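
As a rough illustration of that first approach, the profiling step could be prompted something like this; the wording, the ten-email cap, and the 150-word limit are assumptions for the sketch, not Shortwave's actual prompt.

```python
STYLE_PROFILE_PROMPT = """\
Below are {n} emails written by one user. Describe this user's writing style:
typical sentence structure and length, tone (professional vs. casual), use of
emojis, usual greetings and sign-offs, and any recurring phrases. Keep the
description under 150 words so it can be embedded in another prompt.

{emails}
"""

def style_profile_request(sent_emails: list[str]) -> str:
    """Build the prompt that asks an LLM for a short profile of the user's style."""
    sample = sent_emails[:10]  # ~10 representative sent emails, as described in the talk
    return STYLE_PROFILE_PROMPT.format(n=len(sample), emails="\n\n---\n\n".join(sample))

# profile = call_llm(style_profile_request(representative_emails))  # hypothetical LLM call
# drafting_system_prompt = f"This is how the user writes:\n{profile}\n..."
```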

Adam Becker [00:19:56]: Got it. Okay. What about the "fine-tune a model on emails you wrote" approach? So here you've fed in a bunch of emails. Basically, what does the fine-tuning look like once you've collected them? The other thing is, it feels to me like there's a necessary data pipeline here in terms of cleaning. Right. Because this sounds like an incredibly complex sort of thing. I mean, each one of my emails is just a thread embedded with more nested threads. It's a giant mess.

Jonny Dimond [00:20:26]: Right.

Adam Becker [00:20:26]: You don't want to learn most of these things. So is most of the work actually going into building a data cleaning pipeline? What does it actually entail to fine-tune a model?

Jonny Dimond [00:20:36]: Yeah, so that's a good question. As part of building Shortwave, we actually also built this email data-cleaning pipeline that will look at your emails, look at the replies, and basically extract the actual text content of the email. And so we have this pipeline at Shortwave, and it turns out that if you have that, it actually becomes much easier. So for a lot of emails, we sort of have the - I wouldn't say ground truth, there is still some messiness in there - but we have a very good representation of what you actually wrote as text. And so we take all of those emails, and now we have the emails you've written, we have the emails you responded to, and you can fine-tune a model: given the email you're responding to, what would you respond with? And so we ran some experiments, on our own accounts only, to fine-tune a model. And it's very effective at actually capturing your unique style, once you fine-tune on a large corpus of emails you've written.
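
A sketch of how those fine-tuning pairs might be laid out once the cleaning pipeline has produced plain text for each message and its reply. The chat-style JSONL shape mirrors common LLM fine-tuning formats; the talk doesn't say which format Shortwave actually used, so treat the field names and system message as assumptions.

```python
import json

def to_finetune_record(incoming_email: str, your_reply: str) -> str:
    """One training pair: the (cleaned) email received -> the reply the user actually sent."""
    record = {
        "messages": [
            {"role": "system", "content": "Reply to the email in the user's own voice."},
            {"role": "user", "content": incoming_email},
            {"role": "assistant", "content": your_reply},
        ]
    }
    return json.dumps(record)

# with open("style_finetune.jsonl", "w") as f:
#     for incoming, reply in cleaned_thread_pairs:  # output of the data-cleaning pipeline
#         f.write(to_finetune_record(incoming, reply) + "\n")
```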

Adam Becker [00:21:44]: There's one more question now from the audience - actually a couple. So Apurva is asking: are you ever worried that it would be added as a feature in Gmail or Outlook?

Jonny Dimond [00:21:56]: Of course, yes. That could always happen. One of the benefits we have as a startup is we can move very quickly. We can sort of make use of the latest technologies. Large companies, they have their own advantages, but they're generally moving fairly slowly. And so our benefit is if there's a new model, a more powerful model out tomorrow, we can ship the next day with that model.

Adam Becker [00:22:26]: Yeah. Okay, next one from Kanak. Did you consider applying this architecture for summarizing chats across social media or instant messaging platforms?

Jonny Dimond [00:22:40]: Not yet. So we are very much focused on email right now. The infrastructure that powers some of the writing actually also powers other features in our assistant - for example, the ability to perform what we call AI searches. Right. Ask it, what is my Wi-Fi password in the office? Or what's the mailing address of someone you want to mail? And so we use the same infrastructure for the AI search that we also use for finding the relevant emails for the writing. We also have a summarize feature in our app. But yeah, right now it's just emails.

Adam Becker [00:23:22]: Yeah. So this is interesting because right now, obviously, we're talking about the architecture for making the email recommendation sound like you. But the moment that your entire email experience is powered by AI, it feels like one of the things that you're saying is that now you have unlocked a bunch of other AI capabilities that you might not have been able to do. And that's just got me thinking about a bunch of different features that I would wish Gmail were to have. Can you give us a few other examples of things that you guys had spoken about that are not what we're seeing?

Jonny Dimond [00:23:57]: One example that we'd love to build, and we're sort of exploring, is - well, I don't know, you get an email saying, hey, do you have time to meet next week? And there is no reason we shouldn't be able to just pull up your calendar for the exact time slot that this person is asking about and show you that proactively. Right. I think right now our AI is very much driven by user intent, but what happens if you turn it around and make it proactive? Here's your schedule for the exact time that the user is asking about. And there are many different examples like that as well.

Adam Becker [00:24:38]: Yeah. So I'll tell you sort of, like, what I'm compelled to do. I'm compelled to ask you about all of the challenges associated with just building an email client, which sounds like an enormous undertaking, but I'm not entirely sure that's relevant to this conference. If people in the chat tell me that they would entertain me in this, I would love to grill you on that. But short of that, I want to ask you about what it actually takes to even do what you just did, which also sounds, even from that system diagram that you've shown, like there are many different steps here. Can you tell us a little bit about - can we just zoom in? How are you actually doing this?

Jonny Dimond [00:25:19]: Yeah, so lots of trial and error, I think, is the reality. And making sure we're building stuff that we can reuse across our stack. Right. So for example, yeah, the vector database with embeddings - it's not just driving this feature. We noticed we could use it in other ways too: AI search, et cetera. But I think when it comes down to it, it is making sure you sort of follow what's happening in the space, because stuff that was true six months ago is now outdated already. Internally on our team, there have been many, many times where someone claimed that something is impossible and someone else said, let's just try it.

Jonny Dimond [00:26:11]: And it turned out it was not impossible. I've been wrong. I hate to admit it, but I've been wrong on a couple of occasions where I said the current technology can't do this. And someone else is like, but have we tried? And we tried, and it worked. So that's another big part of it. And then, yeah, just making use of what's out there. We make heavy use of some open-source models and of OpenAI - basically whatever fits best.

Adam Becker [00:26:40]: Yeah, but it feels like even having started this a couple of years ago - two and a half years ago - the landscape has shifted dramatically since then. Right. Almost every framework that we would use today didn't exist a year ago, let alone two and a half years ago. So on one hand, you probably got to see the actual origin of many of the challenges, and you've actually gotten to suffer the kinds of pain that ended up necessitating the construction of many of these frameworks. And so you probably have a deeper appreciation for them. On the other hand, you want to stay agile and actually use the latest frameworks at the same time too. So I'm sure that there's some tension there. You don't always want to rebuild everything, but at the same time you want to incorporate what's new and useful.

Jonny Dimond [00:27:22]: Yeah, I think one key insight here is we try to build the system in a way that we can take any component and replace it. For example, we don't rely on a specific embedding model; right now we're just using one. We'll have to update our infrastructure, and we'll have to spend a bunch of compute if we change our embedding model, but we have the infrastructure to basically switch it out. And, for example, we could start with new users and say, okay, let's just use a new embedding model. Across a lot of the system, one of the focuses we have is that we should be able to sort of replace any particular Lego block with something new. One example of where I think the work on our pipeline is paying off: the fact that we have a clean text representation of your emails has been hugely beneficial for us in building this.
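
A sketch of that "replaceable Lego block" idea for the embedding model, using hypothetical names: callers depend on a small interface, so swapping in a new model becomes a re-indexing job rather than a code rewrite. This is one possible shape, not Shortwave's actual architecture.

```python
from typing import Protocol, Sequence
import numpy as np

class EmbeddingModel(Protocol):
    name: str  # recorded with each vector so mixed-model indexes are detectable
    def embed(self, texts: Sequence[str]) -> np.ndarray: ...

class IndexingPipeline:
    """Offline indexing step that only knows about the EmbeddingModel interface."""

    def __init__(self, model: EmbeddingModel, store) -> None:
        self.model = model
        self.store = store  # assumed to expose upsert(ids, vectors, meta)

    def index_emails(self, ids: Sequence[str], texts: Sequence[str]) -> None:
        vectors = self.model.embed(texts)
        self.store.upsert(ids, vectors, meta={"embedding_model": self.model.name})

# Switching embedding models = constructing IndexingPipeline(new_model, store) and
# re-embedding affected accounts (e.g. new users first); callers of the search API
# never see the change.
```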

Adam Becker [00:28:30]: I imagine that you're using this for your own personal needs, right?

Jonny Dimond [00:28:34]: Yes.

Adam Becker [00:28:36]: Have you learned certain things about yourself? You know what I mean? Because I have a feeling this is almost like putting a mirror up to you and be like, okay, by the way, this is how you normally respond. This is the kind of not funny joke that you normally make. Have you had any insights about your own writing style?

Jonny Dimond [00:28:55]: Yes, there have been a couple of uncanny moments where the AI spits out a sentence where I'm like, this doesn't sound that great. And then I stop and I'm like, wait, this is what I write all the time. So, for example, I don't know, one of the phrases is "holler if you have concerns" - I love to insert that into my emails. And I notice, well, I probably use that too often. Or sort of writing in a passive voice, like "let's do something", instead of being specific about who should do something. So there is definitely sort of a weird mirror of your own writing that happens here.

Adam Becker [00:29:39]: Jonny, thank you very much for coming and sharing Shortwave with us.

Jonny Dimond [00:29:44]: Yes, it was a pleasure to be here.

