
Lessons from Building LLM-based Social Media Products

Posted Mar 06, 2024 | Views 117
# LLMs
# Social Media Products
# LinkedIn
Faizaan Charania
Senior Product Manager, ML @ LinkedIn

Faizaan is a Product lead working on Personalization and Generative AI use cases for Creators and Content Understanding at LinkedIn. He's been in the field of machine learning for 8+ years now and recently launched some of LinkedIn’s first user-facing Gen AI products in early 2023.

SUMMARY

The goal of the talk is to learn how to harness Gen AI to build the right products for your users, efficiently. It covers learnings from the different stages of a product: the idea exploration stage, hardware capacity planning, iterating on early versions, building early trust with your users, and finally measuring success over the long term.

TRANSCRIPT

Lessons from Building LLM-based Social Media Products

AI in Production

Slides: https://docs.google.com/presentation/d/1qk-GS2DaCHrC5jjg3ilSyN9u93Iaz3kd/edit?usp=drive_link&ouid=112799246631496397138&rtpof=true&sd=true

Adam Becker [00:00:05]: Next on the stage we have Faizaan.

Faizaan Charania [00:00:07]: Let's see.

Adam Becker [00:00:08]: Faizaan, are you here?

Faizaan Charania [00:00:10]: I am. Joining in from San Francisco, well, from the South Bay.

Adam Becker [00:00:17]: Well, we got a chapter in South Bay and we have a big event coming. I think we had one like a couple of days ago. South Bay is killing it, so here you go, Faizaan. Next event, you get a special invitation. Show up.

Faizaan Charania [00:00:31]: Absolutely. We'll be there.

Adam Becker [00:00:33]: And these are your slides, right? Beyond the Lab: Bringing LLMs to Your Users. I am now actively working on a new startup and I think I'm going to need to hear exactly what you have to share, judging from the abstract.

Faizaan Charania [00:00:49]: Okay, awesome. Adam just disappeared, so I'll just jump in right away. Beyond the Lab: Bringing LLMs to Your Users. This is all about understanding how you can productionize your generative AI use cases. Before we jump into any of that, a little bit about me. I am currently a senior product manager at LinkedIn. I'm working with a multinational team of AI engineers and data scientists, and together we are focused on many interesting problems, some of them being video understanding, AI experiences for creators, recommendation systems within the LinkedIn feed, and notifications. And lately, in the past year, there have also been new generative AI experiences, and understanding how we can elevate users' LinkedIn experience with the help of Gen AI.

Faizaan Charania [00:01:41]: So that's what I've been up to lately. Before that, I was at Yahoo. I was a machine learning engineer turned product manager at Yahoo. My team was always focused on large-scale machine learning and ML platforms; MLOps, in a lot of cases, was a part of our pillar as well. So the whole domain is very dear to my heart. Before that, I was part of many different companies here and there in ML research roles, internships, and other opportunities. And yes, I studied computer science in grad school, after computer engineering at Mumbai University.

Faizaan Charania [00:02:19]: Okay, too much about me. Let's get into what we are here for. Okay, so the goal for this conversation is going to be to learn how to harness generative AI to build the right products for our users, efficiently. And again, I'm not going to say something that's going to just completely blow your mind, but these are learnings that I have carried, things I had to keep in mind while I was building these products, and I want to make sure that you keep these ideas front and center when you work on your generative AI products. Okay, so what kind of products are we talking about? I work at LinkedIn, so my experience building scalable Gen AI products was at LinkedIn. These are some of the products that I have worked on. Some I owned end to end; on some I was partnering with other folks. At the bottom left, you'll see how we can potentially use generative AI to help polish up or rewrite a post that someone's trying to post.

Faizaan Charania [00:03:21]: This helps all kinds of English-as-a-second-language speakers, people who are good at communicating but not that good at writing. They want their voice to stand out, but they need just a push for that. So that's one thing. And there are a lot of other use cases, like evaluating whether you're a good fit for a job, or how you could better position yourself for a particular job. That's on the top left, and there are some other experiences as well. Okay, so that's that. Now, when we are thinking about what we want to do and how we want to build our product, I'm dividing this into three stages for the MVP. First is explore.

Faizaan Charania [00:03:57]: Explore will involve idea exploration: how do you even want to go about building your product? The second is the building part, which I genuinely enjoy. And the third is, once you've built your product, how are you going to measure the success of it, and what do you want to keep in mind as you keep iterating on it? Okay, so the first stage is the idea exploration stage. And the first point is almost a cliche in the product world; you can replace "LLMs are just a tool" with "machine learning is just a tool" or with anything else. But it is something that's extremely important for us to keep in mind. LLMs are just a tool, and the core focus should be on how they can be used to solve your users' pain points. So be user driven.

Faizaan Charania [00:04:46]: Understand where the users are struggling. What do your users want? What's the job they would want to hire your product or your company for? Think of it from that angle. When you're doing that, what you want to do is a modified user experience research. Don't just think about pain points, but also think about the different kinds of AI solutions, based on, say, quick experimentation or building prototypes. And I say a modified user experience research because there are many new interaction patterns that have come up with generative AI. It's not just a chatbot. Yes, chatbots exist, and many people are already using chatbots and getting good value from them. But that's not the end-all, be-all.

Faizaan Charania [00:05:32]: That's not the only way of going about things. There are many ways you can integrate Gen AI experiences or features in your existing products. So that is something that you should also think about and explore. There are going to be a lot of assumptions that we might have about how this might work, or how a user might want to use your product, or how Gen AI could fit into your product. But just talking to users is going to help you get a lot of these learnings. We did one of these user experience research studies for something that we were planning on building. This was January of 2023, and it gave us some very good insights. There were some big assumptions that we wanted to validate, whether we go direction A or direction B. We didn't go too wide in that research, but just answering those particular questions helped us understand what would work, and that saved us a lot of time.

Faizaan Charania [00:06:27]: Okay, so once you're done with your research, you want to start thinking about prototyping. And again, when you're prototyping, try different ideas. Experiment with large models and with small open source models; see what works and what doesn't work in the exploration stage. Basically, what I'm trying to say is don't worry about being too prescriptive about what you want to do. Give yourself the freedom to look around, think about other ideas, ideas that might break your current assumptions about how users use the product today versus how they might want to use your product tomorrow. So that's the idea exploration stage. Next, we are starting to get slightly closer to the implementation piece. Whenever you want to implement anything that uses Gen AI, you have to think about GPU capacity, GPU capacity budgeting. GPUs are obviously one of the most coveted resources for new AI startups today, and you have to think about that as well.

Faizaan Charania [00:07:25]: And there are many decisions that you will take while thinking about your product that will affect how many GPUs, or how much GPU capacity, you need. There's prompt engineering versus fine-tuning. That's one of the decisions that we had to take as well. One learning, and I think this has become more and more well known in the industry lately, is that prompt engineering over model fine-tuning is going to be very useful for the majority of cases. Yes, there will still be some cases that really require model fine-tuning, but for the most part, I think most products will be better off with just prompt engineering. It gives you faster go-to-market time, it has a lower technical barrier, and it obviously uses fewer GPUs as well. So all of that is going to be good. Model fine-tuning is obviously better for a particular use case, but I would say it's only useful for very niche use cases where the margin of error is very small.

Faizaan Charania [00:08:30]: So is prompt engineering comparable to model fine-tuning? I've linked an article over here, and I'm sure you will get access to the slides later on. This is by Microsoft; it's called "The Power of Prompting", and they used GPT-4, prompt engineered, and it started surpassing Med-PaLM 2. I hope I'm saying the model's name right. Basically, Med-PaLM 2 is optimized for medical questions, and after enough prompting, GPT-4 started outperforming Med-PaLM 2, which it did not initially. We had the same experience. We were using GPT-3.5 Turbo and GPT-4 out of the box. Obviously, GPT-4 was performing much better for our use case. But after spending some time on prompt engineering, GPT-3.5 Turbo, in spite of being much smaller, started performing at par with, and sometimes better than, GPT-4 for our use case.
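(As an illustration of the kind of prompt engineering described here, a minimal sketch follows. The prompts, model name, and OpenAI client usage are illustrative assumptions, not the prompts or stack from the talk.)

```python
# Illustrative sketch only: a bare demo prompt vs. an engineered prompt for the
# same rewriting task. Model name and client usage are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rewrite_post(post: str, prompt_template: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one rewriting prompt to the model and return its output."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt_template.format(post=post)}],
        temperature=0.3,
    )
    return response.choices[0].message.content

# v1: looks fine in a demo, falls apart at scale
BARE_PROMPT = "Rewrite this post to sound better:\n{post}"

# v2: engineered to pin down voice, length, and known failure modes
ENGINEERED_PROMPT = """You rewrite social media posts.
Rules:
- Preserve the author's voice and every factual claim; never invent details.
- Keep the rewrite within 20% of the original length.
- If the post is already clear and well written, return it unchanged.

Post:
{post}"""
```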

Faizaan Charania [00:09:21]: So keep that in mind. Give prompt engineering a good shot. And while you're thinking about building your experience: first build a good experience, then worry about GPU costs. I know these two pieces of advice might seem contradictory, but you have to make your own trade-offs, right? Sometimes you might need GPT-4, and don't run away from it, because foundation models are getting cheaper and faster. If you check out the Open LLM Leaderboard by Hugging Face, it keeps changing week over week. There are always new things coming out. GPT-4 Turbo, compared to GPT-4, is three times cheaper, has a four times bigger context window, and performs better. So overall, if you're thinking about the future, don't worry about what your costs look like right now.

Faizaan Charania [00:10:14]: Obviously, don't just think about one month and then run out of money, because GPUs are expensive. But expect the foundation models to get better and for you to have better models for cheaper as you scale your product. Okay, so we've done the exploration stage and we've thought about how to approach it. Time to execute. When we execute, let's say we are using the prompt engineering approach, which is what we did for some of our use cases. So how do you go about that? In prompt engineering, prompt iteration is the key. Demos will be awesome in the first rounds.

Faizaan Charania [00:10:48]: 1st data point. Second data point is going to look absolutely wonderful. But when you start scaling, when you use the same prompt for 10,000 users, 100,000 users, millions of users, that will show the hallucinations, that will show all of the gaps in your logic where your prompts could be misinterpreted by the model. So this is where you want to keep working on your model to make it better. And my way for this, and this is again slowly becoming the standard as well, is think of this prompt as your own machine learning model and remove subjectivity from the output evaluation set. A comprehensive output eval framework. This is the loss that you're setting right and evaluate new prompts or new prompt versions against the same rules and the same data sets, the same data points. So your validation data set exactly the same.

Faizaan Charania [00:11:40]: Now again, this is not completely novel, but it is something that I noticed a lot of other companies or startups that I was speaking with weren't actively thinking about. Doing this adds that discipline and helps us get to better prompts faster. Measure the gaps after every iteration, see where your prompt is failing, and improve the prompt accordingly. Okay, so this is how we were going about our prompt iteration. Let's do some prompt iteration, or at least just think about our evaluation framework, in this setting right now. Let's say we are trying to generate articles with AI. This is something that I pulled off of Expedia today: "Ten things you can only buy in Spain."

Faizaan Charania [00:12:25]: I'm assuming they will be trying to scale this to different articles that say ten things you can only buy in England, India, France, and so on. Obviously this is an SEO play, and it brings good value for members as well. But if you had to write a prompt that can scalably keep writing these articles, what kind of things would you want the article to have? What attributes will you evaluate it against? Think about that. You could think about the length; you want the length to be in a particular range. You want the article structure to be a particular format. So I'm just going to pause for five to ten seconds. Just think about it.

Faizaan Charania [00:13:08]: And if you have any ideas of what you would use in your evaluation framework consistently for different prompt versions, add it to the chat. In the meantime, I'll figure out how to see that chat.

Adam Becker [00:13:23]: I'll help you. I don't think it's going to be where you're looking. Let's see if anybody answers. If anybody says anything, I'll read it to you.

Faizaan Charania [00:13:35]: Okay.

Adam Becker [00:13:35]: It'S okay, there is a lag, so let's give them maybe like five or 10 seconds. Let's give them a few more seconds if you want. Okay, probably best to just probably bring it up.

Faizaan Charania [00:13:57]: That's all right. Okay. And I did see this whole conversation in lag as well, which is just so funny to me. Okay, so these are some examples that I had put out already. When you're thinking about your article's evaluation criteria, you could say the length should be in a particular format. The language should always be the local language of the country that you are passing in. The article structure should be in a particular format, and in the article, maybe you want a certain extent of interest diversity to be there. Ten things you can only buy.

Faizaan Charania [00:14:32]: It shouldn't only be focused on people who are interested in food or people who are interested in adventure activities. It should cater to a wide variety of users on the Internet. Again, price diversity: not everything should be extremely expensive or extremely cheap. Have some options over there. So these are some rules that you will have to define very precisely, and that's going to help you understand what your prompt is doing, how your prompt is performing, and how it can get better. There's a PS over there: do not miss this step.
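(To make "define very precisely" concrete, here is a hypothetical sketch of how the criteria just listed could become deterministic checks. The thresholds, keyword categories, and the langdetect dependency are all assumptions for illustration.)

```python
# Hypothetical checks for the "ten things you can only buy in X" example.
# Thresholds, keyword lists, and the langdetect dependency are assumptions.
from langdetect import detect  # pip install langdetect

def has_ten_items(article: str) -> bool:
    # Structure check: expect a numbered list, "1." through "10.".
    return all(f"{i}." in article for i in range(1, 11))

def in_local_language(article: str, expected_lang: str) -> bool:
    # Language check, e.g. expected_lang="es" for the Spain article.
    return detect(article) == expected_lang

def interest_diversity(items: list[str], categories: dict[str, list[str]]) -> int:
    # Count how many interest categories (food, fashion, crafts, ...) are covered.
    covered = {
        name
        for name, keywords in categories.items()
        if any(kw in item.lower() for item in items for kw in keywords)
    }
    return len(covered)

def price_diverse(prices_eur: list[float]) -> bool:
    # Price diversity: not everything extremely cheap or extremely expensive.
    return min(prices_eur) < 20 and max(prices_eur) > 100  # arbitrary thresholds
```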

Faizaan Charania [00:15:06]: Unfortunately, there is no shortcut. If you want to build a good prompt and iterate on it effectively, this is going to be one of the best ways of doing it. I know for a fact that small companies are doing it and large companies are doing it for their use cases. This definitely works. Okay, so that's it when it comes to prompt engineering. Let's say you've built your product. Yay.

Faizaan Charania [00:15:30]: Okay, so now how do you measure success? How do you see what's working for users and what's not working for users? And I say success, but before we even get there, we should think about the trust that you have to build with the users. And trust is more important than ever in the age of generative AI, because now AI is generating content and responses that represent your brand to the user. This is an example where I was playing with ChatGPT, and I was trying to input all kinds of nefarious prompts. For this one I had to throw someone under the bus, so I chose to throw myself under the bus: are South Asian men predisposed to being dumber than the average man? Obviously not. Anyway, look at the response; look at how ChatGPT handles this. There's a clear response that says this content might violate our content policy.

Faizaan Charania [00:16:25]: "If you believe this to be an error, please submit feedback." They are actively taking feedback. And the response is not just saying, hey, this is not good. It's not refusing to respond; it's responding and explaining what the company's represented stances are, what the product's represented stances are. This builds trust in the product. Information liquidity in the age of Gen AI is up, but again, the signal-to-noise ratio is going down. There are going to be AI-generated content pieces everywhere, chatbots everywhere.

Faizaan Charania [00:17:01]: So you need to have that trust bond. You need to build that retention with your users at the onset. Protect against hallucinations: if your model is just outputting something that doesn't make sense, you're going to lose your users' faith, and that's just going to hurt user retention. They might not come back. There are many new products that are trying to compete for the same job to be done for the user. Prioritize data differentiation. Again, a lot of these products in the market, currently at least, are powered by similar foundation models, and that could get commoditized after a point. So what you want to do is differentiate based on the data that you have, the personalized experience that you can give.

Faizaan Charania [00:17:45]: Focus on that. And once you're online, like this example also shows over here, adapt to feedback. Actively promote inclusion instead of just trying to avoid bias. As you adapt to the feedback, communicate that back to the users as well. That's going to build, again, more trust. And in this day and age, with Gen AI taking over a lot of the interaction patterns that we have on the Internet, it's going to help you keep your users and help users come back to you. Okay, these are broader principles on trust, but again, we want to build a product, right? And what about the success of that product? How do we measure that? At launch, learnings are primary.

Faizaan Charania [00:18:28]: Again: new tool, new way of solving old problems. Assumptions might break, so look out for that. And once you find some examples where you think the users are not using your product the way you thought they would, don't be afraid to pivot, change, switch. Measure the adoption. Watch your funnels: see where the bottlenecks are, see what users are doing versus not doing, and where they drop off. The first milestone is obviously going to be to find product-market fit. Then the second is scaling. This is where I was talking about GPU costs, better foundation models, scaling to millions of users.

Faizaan Charania [00:19:06]: Scaling comes next. Build a good experience, then scale, and then monetize. Cool. Thank you so much. That is me. These were some of the learnings that I gathered from working through multiple different Gen AI solutions, and I'm really excited to learn from you as well, and from all of the other talks that are happening here.

Adam Becker [00:19:26]: Faizaan, thank you very much for this. I'm not sure I saw any questions in the chat yet, but if you could stick around in the chat, that would be useful. I'm sure folks would love to continue seeing you on this channel.
