MLOps Community

Dynamic Embedded Sales Agent // Valdimar Eggertsson // Agent Hour

Posted Jan 09, 2025 | Views 129
# sales agent
# voice agent
Valdimar Eggertsson
AI Development Team Lead @ Snjallgögn (Smart Data inc.)

Raised in Reykjavík, living in Berlin. Studied computational and data science, did R&D in NLP and started making LLM apps as soon as GPT4 changed the game.

SUMMARY

//Abstract Demonstrating agents embedded within websites that utilize real-time audio and structured outputs to dynamically update web pages through conversational interactions.

This is a bi-weekly "Agent Hour" event to continue the conversation about AI agents. Thanks to arcade-ai.com for the support! Join the next live event at home.mlops.community

TRANSCRIPT

Demetrios [00:00:06]: What's up, man? So you've got a talk for us. Yeah.

Valdimar Eggertsson [00:00:11]: Yes, yes. I'm going to show you some concrete examples of agents, and I'll just dive right into it. Cool. I like that. Sharing my screen.

Demetrios [00:00:29]: Boom, we're in business.

Valdimar Eggertsson [00:00:30]: Can you see it? Yeah. Okay, I'm going to show some examples of what I call embedded chat agents. They use structured outputs, which I love, and I guess plenty of other people love them too. You'll see what it's all about. First, a bit about me.

Valdimar Eggertsson [00:00:54]: I studied computational and data science. I got into NLP at the right time, around GPT and GPT-2, did some research in language understanding, and got into building applications after GPT-4 came out. The startup I'm at is based in Iceland, and we work for both companies and the government. A little bit about the startup: its name, Snjallgögn, is very hard to pronounce. It means "smart data." We do data science work focused on customer service, because that was the lowest-hanging fruit, and also demand prediction, forecasting, and inventory management for things like storage.

Valdimar Eggertsson [00:01:41]: But what I mainly do is chat agents, lots of them by now. I'm going to show you a few: the embedded sales agent, the dashboard agent, the form filler, and the agent wizard. So what's the basic concept? Chatbots usually sit down in the corner of the page, and you chat with them. But we thought, why not put the bot inside the page, so you can talk to the web page itself? Websites have a data schema, a JSON schema, specified for the data that's visible on them, and that fits really well with the structured outputs that modern LLMs can produce.
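As a minimal sketch of that idea, assuming a hypothetical tour-filter schema (the field names `max_days`, `month`, `location`, and `max_price_usd` are illustrative, not taken from the production site), the page could publish its filter schema and the backend could wrap it in the `response_format` shape that OpenAI's Chat Completions structured outputs accept. In strict mode every property must appear in `required` and `additionalProperties` must be false, so optional filters are made nullable instead; check the exact rules against the current OpenAI documentation.

```python
import copy
import json

# Hypothetical schema a tour-listing page might publish for its visible filters.
PAGE_FILTER_SCHEMA = {
    "type": "object",
    "properties": {
        "max_days": {"type": "integer", "description": "Maximum tour length in days"},
        "month": {"type": "integer", "description": "Travel month, 1-12"},
        "location": {"type": "string", "description": "Region or starting point"},
        "max_price_usd": {"type": "number", "description": "Budget ceiling in USD"},
    },
}


def to_strict_response_format(schema: dict, name: str) -> dict:
    """Wrap a page schema in a strict structured-output response_format.

    Strict mode expects every property in `required` and no extra keys, so each
    property becomes nullable: the model returns null for anything the user
    has not mentioned yet. The input schema is left unmodified.
    """
    strict = copy.deepcopy(schema)
    for prop in strict["properties"].values():
        prop["type"] = [prop["type"], "null"]
    strict["required"] = list(strict["properties"])
    strict["additionalProperties"] = False
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": strict, "strict": True},
    }


if __name__ == "__main__":
    fmt = to_strict_response_format(PAGE_FILTER_SCHEMA, "tour_filters")
    print(json.dumps(fmt, indent=2))
```

Making fields nullable, rather than optional, keeps the schema strict while still letting the model say "not mentioned yet."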

Valdimar Eggertsson [00:02:33]: The LLM's output is mapped to the website's schema to update the page the bot lives in. I drew a little picture here: the speaker is in the top-left corner. The bot I'm going to show you uses OpenAI's Realtime API, which works with audio directly, and that's awesome. It also produces transcripts of the conversation, which can be fed to a model that extracts parameters from the conversation to update the JSON that came from the website in the form of a schema. So at the start, the web page tells the AI what data it uses. Then, during the conversation, the data is extracted to fit what the website accepts, and the website is updated. I have a demo. It's based on a tourism customer support bot that we made.
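The update loop described here can be sketched as follows, with the model call stubbed out. This is an assumption about one reasonable design, not the speaker's code: after each transcript turn, an extraction step returns values for the page's fields (null for anything not mentioned), and only non-null values for fields the page declared are merged into the page state, so later turns refine earlier ones. The field names and per-turn dictionaries are hypothetical stand-ins for model output, loosely following the demo conversation.

```python
from typing import Any

# Fields the page declared in its schema (hypothetical names).
PAGE_FIELDS = {"max_days", "month", "location", "max_price_usd"}


def merge_extraction(state: dict[str, Any], extracted: dict[str, Any]) -> dict[str, Any]:
    """Return a new page state with non-null, schema-known values applied.

    Unknown keys are dropped so the model cannot write fields the page does
    not expose; nulls mean "not mentioned" and leave the current value alone.
    """
    updated = dict(state)
    for key, value in extracted.items():
        if key in PAGE_FIELDS and value is not None:
            updated[key] = value
    return updated


# Hypothetical extraction results, one per user turn.
turns = [
    {"month": 6, "max_days": None, "location": None, "max_price_usd": None},
    {"max_days": 1, "month": None},
    {"max_price_usd": 150, "mood": "excited"},  # "mood" is not in the schema
]

state: dict[str, Any] = {}
for extracted in turns:
    state = merge_extraction(state, extracted)
    # In the browser, this is where the page would re-render its filters.
print(state)
```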

Valdimar Eggertsson [00:03:41]: It's an agent that can call functions: it can look up information, update bookings, and so on. It searches using RAG, a fancy hybrid RAG setup, it has nice audio and vision modalities, and it was launched into production at the end of 2023. So I guess it was an agent before the year of agents. It's for guided tours in Iceland. So here's a video. Maybe turn up the volume.

Valdimar Eggertsson [00:04:14]: So the website has these filters. In the normal UI, you narrow things down by clicking buttons: the duration of the tour, which month it is, where it is. But we can extract all of these parameters from the conversation instead. And here's the bot, the professional assistant. I'll stop talking now.

Valdimar Eggertsson [00:04:41]: Hello. Who are you?

Virtual assistant [00:04:44]: Hello. I'm a virtual assistant for Arctic Adventures, here to help you explore and book exciting tours in Iceland. How can I assist you today?

Valdimar Eggertsson [00:04:54]: Are you like a sales guy in a store?

Virtual assistant [00:04:58]: I'm here to help you find the best adventure tours in Iceland. Think of me as your personal guide, ready to provide information, recommend tours and assist with bookings. How can I help you with your Iceland adventure?

Valdimar Eggertsson [00:05:12]: Well, I'm actually visiting in June and I'm wondering what to do. Can you find the ideal tour for me?

Virtual assistant [00:05:21]: Sure. I'd be happy to help you find the ideal tour for your visit in June. Could you please tell me a bit more about what kind of experience you're looking for? For example, are you interested in nature excursions, adventure activities, cultural?

Valdimar Eggertsson [00:05:37]: For sure. I want to see some unique Icelandic nature.

Virtual assistant [00:05:41]: Great choice. Iceland's nature is truly unique. To find the best tour for you, could you also let me know the maximum number of days you want the tour to last? This will help narrow down the options.

Valdimar Eggertsson [00:05:53]: I don't have a lot of time. Just one day, please.

Virtual assistant [00:06:06]: I found some fantastic day tours that showcase Iceland's unique natural beauty, perfect for your June visit. 1. Golden Circle and Blue Lagoon Small Group Tour: explore Thingvellir National Park, Gullfoss Waterfall, and the Geysir geothermal area, and end with a soak in the Blue Lagoon. 2. Classic Golden Circle Full Day Tour: visit iconic landmarks like Thingvellir, Geysir, Gullfoss Falls, and the Kerið volcanic crater.

Virtual assistant [00:06:38]: 3. Golden Circle and Caving with a lava get expensive.

Valdimar Eggertsson [00:06:42]: Do you have tours under like 150 bucks?

Virtual assistant [00:06:48]: Let's find something within your budget. Could you please specify the amount you're comfortable spending so I can narrow down the options? Here are some great day tours under $150 that showcase Iceland's unique landscapes.

Virtual assistant [00:07:13]: Classic Golden Circle Full Day Tour, $92: visit Thingvellir National Park, Geysir, Gullfoss Falls, and Kerið Crater. Experience iconic landmarks with convenient pickup and drop-off in...

Valdimar Eggertsson [00:07:29]: Thank you. Okay, I hope you heard everything. So as the conversation went on, the parameters used to filter the search were extracted, and then a vector search was made to find what matches your hypothetical or vague description. We're using GPT-4o, which supports structured outputs. The LLM reads the JSON schema, so it knows what data there is; it extracts parameters and then translates the search results into the JSON schema used by the website. This is a proof of concept, but it's more useful when you have a huge product catalog. We're now building it for an electronics chain that has thousands of products. The idea is that it's like going into the store and just talking to the salesperson to find the specific thing you want.
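A minimal sketch of that search step, assuming a hypothetical in-memory catalog (only the $92 Classic Golden Circle price comes from the demo; the other two rows are invented placeholders): the extracted parameters act as hard filters, and matches are translated into a hypothetical result shape the page renders. The vector-similarity ranking mentioned in the talk is deliberately omitted; in practice it would order the filtered candidates against the user's free-text description.

```python
from dataclasses import dataclass


@dataclass
class Tour:
    title: str
    days: int
    months: set[int]  # months in which the tour runs
    price_usd: float


# Hypothetical sample catalog; rows other than the $92 tour are invented.
CATALOG = [
    Tour("Classic Golden Circle Full Day Tour", 1, set(range(1, 13)), 92.0),
    Tour("Example premium day tour", 1, set(range(1, 13)), 210.0),
    Tour("Example multi-day tour", 3, {5, 6, 7, 8}, 450.0),
]


def search(params: dict) -> list[dict]:
    """Filter the catalog by extracted params; a missing or None value means no constraint."""
    results = []
    for tour in CATALOG:
        if params.get("max_days") is not None and tour.days > params["max_days"]:
            continue
        if params.get("month") is not None and params["month"] not in tour.months:
            continue
        if params.get("max_price_usd") is not None and tour.price_usd > params["max_price_usd"]:
            continue
        # Translate into the (hypothetical) result schema the page renders.
        results.append({"title": tour.title, "durationDays": tour.days, "priceUsd": tour.price_usd})
    return sorted(results, key=lambda r: r["priceUsd"])


print(search({"max_days": 1, "month": 6, "max_price_usd": 150}))
```

Keeping hard constraints (days, month, budget) as exact filters, and leaving "unique Icelandic nature" to semantic ranking, avoids asking a vector search to enforce a price ceiling.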

Valdimar Eggertsson [00:08:31]: The same technology can be used for other things. We have the embedded dashboard, where you can talk to your dashboard to visualize your data. The bot knows what data is in the dashboard, that is, what you can possibly see, and it maps your input to the filters and updates what you see. I'll just show you. I hadn't deployed the audio version of this, so I'll dub it, but it should work fine with audio; it's just kind of new. Here's an electricity-use dashboard with lots of different parameters.

Valdimar Eggertsson [00:09:07]: I don't really know what they all mean, but I want to see just kVA (kilovolt-amperes) for the last 10 days, and it maps this to the structure. Okay: "November 8th, hourly, please, split by usage category," so we can extract some insights. On November 8th, the biggest user of electricity at 7 was gardening, indoor gardening, and something else I don't know. The point is that you can talk to your dashboard, which would be useful for data analysts. It's the same concept, except the audio was dubbed by me, but it should work with audio too: you talk to the Realtime audio model, and structured output is extracted from the transcripts.
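For the dashboard, the extracted output is a query rather than a search. A minimal sketch, assuming hypothetical field names (`metric`, `day`, `last_n_days`, `granularity`, `split_by`) and allowed values: the model returns either a relative window such as "last 10 days" or an explicit day, and the code resolves it into the concrete date range the dashboard's filters take, rejecting values outside the enums the page declared.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Allowed values the dashboard would declare in its schema (hypothetical).
METRICS = {"kva", "kwh"}
GRANULARITIES = {"hourly", "daily"}
SPLITS = {None, "usage_category"}


@dataclass
class DashboardQuery:
    metric: str
    start: date  # inclusive
    end: date  # exclusive
    granularity: str = "daily"
    split_by: Optional[str] = None


def resolve(extracted: dict, today: date) -> DashboardQuery:
    """Turn extracted parameters into concrete dashboard filters."""
    if extracted["metric"] not in METRICS:
        raise ValueError(f"unknown metric: {extracted['metric']}")
    granularity = extracted.get("granularity") or "daily"
    split_by = extracted.get("split_by")
    if granularity not in GRANULARITIES or split_by not in SPLITS:
        raise ValueError("granularity or split_by not allowed by the dashboard schema")
    if extracted.get("day") is not None:
        # A single explicit day, e.g. "November 8th".
        start = date.fromisoformat(extracted["day"])
        end = start + timedelta(days=1)
    else:
        # "Last N days": the N full days before today.
        end = today
        start = today - timedelta(days=extracted["last_n_days"])
    return DashboardQuery(extracted["metric"], start, end, granularity, split_by)


today = date(2024, 11, 20)  # fixed date so the example is reproducible
print(resolve({"metric": "kva", "last_n_days": 10}, today))
print(resolve({"metric": "kva", "day": "2024-11-08", "granularity": "hourly",
               "split_by": "usage_category"}, today))
```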

Valdimar Eggertsson [00:10:16]: It knows the JSON schema from the web interface and sends the parameters into it. Next, there's a form-filling agent. Filling out forms is boring; it would be nice to just talk to the form. I'll just show you the example. I guess everybody here has applied for jobs and had to fill out all of these forms with the same data again and again. This is a dummy mock-up job application, and it's in Icelandic.

Valdimar Eggertsson [00:10:47]: That's also to showcase that it's language-agnostic, because I'm going to drag and drop a PDF. "So what is this form about?" "The form is designed for job applications submitted in Icelandic," yada, yada, yada. Here's my colleague's resume, which is in English, by the way. It has extracted the relevant info. And then we can just chat with it. I think I'll ask it to add some data there.

Valdimar Eggertsson [00:11:25]: The person's age and gender were missing, because they weren't in the PDF. Then I can refine it via chat: "Add some more detail to the experience part," because the PDF has more detail than the concise version it put there. "Format it a bit more nicely." And here it's sending JSON that follows this structure, based on the chat. Here's another, more multimodal application: product registration.
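Spotting what is still missing, like the age and gender here, can be a plain comparison between the form's declared required fields and the extracted values. A minimal sketch with a hypothetical job-application schema and made-up extracted values (all field names are illustrative); the resulting list could be turned into a follow-up question in the chat.

```python
# Hypothetical job-application form schema as the page might declare it.
FORM_SCHEMA = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
        "gender": {"type": "string"},
        "experience": {"type": "string"},
    },
    "required": ["full_name", "email", "age", "gender", "experience"],
}


def missing_fields(schema: dict, data: dict) -> list[str]:
    """Required fields that are absent, null, or empty strings, in schema order."""
    return [
        field
        for field in schema.get("required", [])
        if data.get(field) in (None, "")
    ]


def follow_up_question(missing: list[str]) -> str:
    """Draft a simple follow-up prompt for the chat (wording is illustrative)."""
    if not missing:
        return "All required fields are filled in."
    return "Could you also provide: " + ", ".join(missing) + "?"


# Hypothetical values extracted from a resume PDF.
extracted = {"full_name": "Jane Doe", "email": "jane@example.com",
             "age": None, "experience": "Data engineer, 5 years"}
todo = missing_fields(FORM_SCHEMA, extracted)
print(follow_up_question(todo))
```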

Valdimar Eggertsson [00:12:11]: This is for a supermarket chain, where some people's job is to fill in forms with product information. How about if they could just drag and drop an image instead? I threw in an image of a loaf of bread, whose details the person would usually need to type in. Not anymore: it fills in the energy content, ingredients, and so forth. There's another photo I dragged and dropped in too, because I guess the barcode and the weight were missing.
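When product details come from several photos, each extraction fills part of the form. A minimal sketch, assuming each image yields a dict with null for fields it could not read (field names and values are hypothetical): the first non-null value per field wins, and any field that is still null is reported back, much like the barcode and weight that only the second photo supplied.

```python
from typing import Any, Optional

PRODUCT_FIELDS = ["name", "energy_kj_per_100g", "ingredients", "barcode", "weight_g"]


def merge_partials(*partials: dict[str, Any]) -> tuple[dict[str, Optional[Any]], list[str]]:
    """Combine per-image extractions; earlier non-null values take precedence."""
    merged: dict[str, Optional[Any]] = {field: None for field in PRODUCT_FIELDS}
    for partial in partials:
        for field in PRODUCT_FIELDS:
            if merged[field] is None and partial.get(field) is not None:
                merged[field] = partial[field]
    still_missing = [field for field, value in merged.items() if value is None]
    return merged, still_missing


# Hypothetical outputs: a front-of-package photo, then a photo of the label.
front = {"name": "Rye bread", "energy_kj_per_100g": 900, "ingredients": "rye flour, water, salt"}
label = {"barcode": "1234567890128", "weight_g": 500, "name": None}

product, missing = merge_partials(front)
print(missing)
product, missing = merge_partials(front, label)
print(product, missing)
```

"First non-null wins" is one choice among several; a real form might instead ask the user when two photos disagree.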

Valdimar Eggertsson [00:13:09]: All right, that's it, I think. Thank you, buddy. Finally, I have the agent wizard. This is a system I use to manage the bots: it holds your prompts and functions online. So I want to fill in a function, but I can do it with a bot.

Valdimar Eggertsson [00:13:37]: Let's make a carbon footprint calculator. I'll jump ahead a bit because I'm running out of time. It fills in the name of the function and the function declaration. Boom: code. It wrote the code based on my chat here, plus a docstring, and all of this is used to deploy a bot that can use this function.
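The wizard's output is itself structured: a name, a declaration, code, and a docstring. As an illustrative sketch only (not the wizard's actual implementation), the declaration a bot needs can be derived from a Python function's signature and docstring with `inspect`, producing the `{"type": "function", "function": {...}}` tool shape used by OpenAI's Chat Completions API. The `carbon_footprint_kg` function is hypothetical; the caller supplies the emission factor, so no emission data is baked in.

```python
import inspect

# Minimal Python-to-JSON-Schema type mapping (assumption: simple scalar parameters only).
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def carbon_footprint_kg(distance_km: float, kg_co2_per_km: float, passengers: int = 1) -> float:
    """Estimate CO2 emissions per passenger for a trip, in kilograms."""
    return distance_km * kg_co2_per_km / passengers


def tool_declaration(func) -> dict:
    """Build a function-calling declaration from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES[param.annotation]}
        # Parameters without defaults are required in the declaration.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


print(tool_declaration(carbon_footprint_kg))
```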

Valdimar Eggertsson [00:14:06]: I don't think there's much more to say about this. "Load all variables from settings," and boom, they're moved into settings. It's always the same technology: JSON structure. It would be cool with voice, too. It was super cool when I made it a couple of months ago, but now I'm using Cursor with Claude, and it feels similar. Anyway, that was it. Reading the JSON schema can be very powerful when combined with structured outputs.

Valdimar Eggertsson [00:14:41]: It was a bit tricky to program the Realtime audio integration so that it works properly, but it's just so powerful. The only downside, I guess, is that it's a bit expensive. That's me, thanks.

Demetrios [00:14:56]: Hell yeah. That's wild. There are a few good questions coming through from Robert in the chat. Robert, I don't know if you want to just come off mute or if you want to...

Robert B. [00:15:07]: Sure. It seems like an embedded agent within a website would be really great if you have specialized knowledge you want to tap into and make available. But there's also a ChatGPT-style approach, where the AI sees your screen and can interact with it, doing things for you. What do you see happening more in the future? Maybe they both have a place, but what's your take?

Valdimar Eggertsson [00:15:38]: Sorry, I didn't really catch it properly.

Robert B. [00:15:42]: There are two modalities, right? You could embed an agent in a website, where it knows something specific about that website and how to interact with it. Or you could have a personalized AI on my system that sees my screen and interacts with a website for me. These are two different approaches, and maybe each is good at solving different problems. Do you have a take on what we might see in the future between these types? Or maybe there are more. I don't know.

Valdimar Eggertsson [00:16:15]: Yeah, I don't know. I think having an AI interact with the screen and use the computer for you is a bit more complicated, and I think that's more exciting, the second one you mentioned. The first one is just what I was showing. I'm not sure. We'll see what the future brings.

Valdimar Eggertsson [00:16:38]: Yeah, I don't know. Thanks for that.

Virtual assistant [00:16:43]: Cool.

Demetrios [00:16:44]: So. Oh, I see somebody else jumping on. Yeah, hit us.

Anas U. [00:16:49]: Yeah, I had some questions. On the first experience, the Iceland website for finding tours: we've been experimenting on the support side with an agent you can converse with. This is generally not common user behavior today, and I'm thinking about adoption from a modality perspective, because that experience, although it's really cool, is slow. I could probably have done it in a fraction of the time just by typing a few things.

Anas U. [00:17:21]: So that's where I'd like some feedback: what adoption have you seen? And have you thought about how to speed up that process, to get it close to the speed of typing in a few words or hitting a couple of filter buttons? I know the experience is slightly different, because I'm having a bit of a conversation and it's guiding me through it. But I wonder if you have any thoughts on that.

Valdimar Eggertsson [00:17:45]: Yeah, for now this hasn't been optimized for speed. There are definitely too many questions back and forth; you could condense it into fewer interactions. Speed is a downside, but the feeling of having a personalized assistant might have some value, especially when the search space is really big. One company we're working for sells supplies for artisans and carpenters. I don't know the words, but they have something like 25 different types of glue, depending on which surface you need the glue for.

Valdimar Eggertsson [00:18:43]: So it may help when you need specialized knowledge in a conversational form, like when you go to a really knowledgeable salesperson and can get more info than what's in the UI, and you don't want to read through the manuals for the different types of glue. That's just what I'm thinking as I answer this question. Going deeper into the knowledge might be something agents can help with. But adoption is always a thing. We're not used to using audio, but now that these audio models are so good, maybe people will start using it more. I don't know.

Anas U. [00:19:27]: Yeah, that makes sense. Have you done any experimentation with Gemini and its latency? I don't think anybody competes with their latency in real-time audio today.

Valdimar Eggertsson [00:19:39]: No, not yet. We're based outside the US, and I think that was an issue with Gemini the last time I checked.

Anas U. [00:19:53]: Yeah, thanks. Appreciate that.

Valdimar Eggertsson [00:19:57]: Okay, thanks.

Demetrios [00:19:58]: Yeah, the question of which use case, or which type of store, would actually see more lift or better outcomes from this is a fascinating problem too. If it's just for the shirts I sell for the MLOps Community, that's probably not the best use case. But like you said, if there are 25 different types of glue and, unless you're a real expert, you won't know their names, then it's valuable. Figuring out how to make the UI more multimodal, so the sales assistant walks you through it and gets to the point faster, is another tricky question, or a tricky prompting problem. Dude, I'm a huge fan of the forms one. I think it's very cool that it fills out the forms for you. What I couldn't figure out was whether it's like a form buddy, something that goes with you around the Internet, in my browser. How does that work?

Valdimar Eggertsson [00:21:21]: Well, I guess it could be like that, but this one was just part of the web page. It needs to be customized for the page, or rather, it takes some work to embed it into the page. But if you had a Chrome extension or similar that could read the website's JSON schema, that would be a really cool thing.

Demetrios [00:21:50]: Well, there are always those password managers that try to fill out forms with your information, and I find they rarely get it right. Especially once you get to more complex questions that you could answer in different ways, it would be great to just talk back and forth with something instead of having to type it out.
