MLOps Community

Machine Learning, AI Agents, and Autonomy

Posted Jan 08, 2025 | Views 178
# Machine Learning
# AI Agents
# Autonomy
# Wise
SPEAKERS
Egor Kraev
Head of AI @ Wise Plc

Egor first learned mathematics in the Russian tradition, then continued his studies at ETH Zurich and the University of Maryland. Egor has been doing data science since last century, including economic and human development data analysis for nonprofits in the US, the UK, and Ghana, and 10 years as a quant, solutions architect, and occasional trader at UBS and then Deutsche Bank. Following the last decade's explosion in AI techniques, Egor became Head of AI at Mosaic Smart Data Ltd, and for the last four years has been bringing the power of AI to bear at Wise, in a variety of domains, from fraud detection to trading algorithms and causal inference for A/B testing and marketing. Egor has multiple side projects, such as RL for molecular optimization, GenAI for generating and solving high school math problems, and others.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Demetrios chats with Egor Kraev, principal AI scientist at Wise, about integrating LLMs to enhance ML pipelines and humanize data interactions. Egor discusses his open-source MotleyCrew framework, career journey, and insights into AI's role in fintech, highlighting its potential to streamline operations and transform organizations.

TRANSCRIPT

Egor Kraev [00:00:00]: So, hi, I'm Egor Kraev. I am now Principal AI Scientist at Wise; until recently I was heading up and building up its AI team. In parallel to that, I'm fortunate to also be able to work on my startup on causal inference in marketing, codenamed CausalTune. Instead of coffee, I prefer green tea. Things like LangChain, for example... there are certain things where once you know how the good stuff tastes, you cannot really drink the bad stuff anymore. Like wine and whiskey, and green tea is certainly one of them.

Demetrios [00:00:40]: Welcome back to the MLOps Community podcast. I'm your host, Demetrios. And today we got into the traditional ML, the AI world of things, and also the AI agents world of things. And Egor has built packages and open-source products for each one of those. It was really cool to dig into what exactly he's done with the different ones, be it fraud detection, or segmentation and A/B testing in emails, or LLMs as being just one part of your DAG, and how it's really useful to think of an LLM as taking unstructured, messy data and turning it into structured data. And lastly, the framework MotleyCrew that he created around AI agents, so that you can use various AI agent tools or frameworks if you want to.

Demetrios [00:01:37]: You don't have to be locked into just LangChain or LangGraph or LlamaIndex or CrewAI. MotleyCrew allows you to have a motley crew of agents and leverage what's best about each one of those. I think we spoke two or three times in the past year and we didn't record any of those conversations, and now I'm glad that we are finally recording this conversation. So before we get into any of the technical stuff, which I want to talk about, because as I mentioned many times when we spoke, I'm a happy user of Wise. I love the product. I know you're leading all kinds of AI initiatives at Wise. What's up with the pirate stuff? That's what I really want to know.

Egor Kraev [00:02:31]: I am one of the founders of the Swiss Pirate Party.

Demetrios [00:02:36]: What does that even mean?

Egor Kraev [00:02:38]: Well, it means whatever people make it mean. Of course, for me, what it means is that I think the balance between copyright and public access is way out of line in pretty much all Western countries, or all countries I'm aware of, really. Because being able to control any information that you created is not a God-given right. It's a monopoly created by government, and therefore that monopoly must serve the public good. And because a monopoly always destroys value, it's only worthwhile if it also adds value. And now, for example, the thriving open-source products, which we all know, from Linux to countless things, show that it's not necessary in all cases to have copyright protection for good things to come into being. Sometimes it is, sometimes it isn't.

Egor Kraev [00:03:31]: And so, for me, what the Pirate Party is striving to do is to shift the balance a little bit.

Demetrios [00:03:37]: Wow. Okay. So I was way off base when I thought it was like you all were dressing up as pirates, maybe outside of Halloween and having parties. When I saw that written, I thought, oh, man, he likes to wear patches over his eyes.

Egor Kraev [00:03:54]: Oh, no, it's not that kind of pirates. It's a classic example of a concept hijack.

Demetrios [00:04:00]: Yeah, great. You did well. You had me very misdirected. So the other piece that is cool to talk about, before we jump into Wise and everything you're doing there, is the work that you've done over the years before you hit Wise. And I know you were in Africa. You were in Ghana for a little bit, right?

Demetrios [00:04:24]: Can you explain what you were doing there?

Egor Kraev [00:04:26]: Oh, that was a wonderful story. I was a student in the US, and being a young, passionate student, I got involved in the anti-globalization protests against the IMF and the World Bank doing bad things, or what the organizers said were bad things, in all sorts of developing countries. And those protests were all very nice, very civilized. In fact, I was reminded of Umberto Eco's descriptions in The Name of the Rose about the heretics: it's like a carnival. It was this wonderful, wonderful thing. And as part of this, some of us were invited to the IMF, where the people from the IMF were explaining to us what it is they actually do, and that they're not actually evil and all that. And after that, just that very night, there was a party at one of the nonprofits that I was hanging out with.

Egor Kraev [00:05:22]: And there I was telling the lady who was organizing the party and who was heading the nonprofit, I was telling her, this IMF guy did such a horrible job of explaining what they did. I could have done a better job of explaining what they do, clearer and shorter and everything. And then she said to me, well, there's somebody I'd like you to meet. And then she introduced me to Charles Abugre, who was a big guy in the nonprofit scene. He had a startup in Ghana doing all sorts of public-good stuff. And he invited me over for a week, first of all, and then we got along, and then I ended up spending at least half of my PhD there doing all kinds of economic research for that nonprofit.

Demetrios [00:06:06]: What kind of economic research was it?

Egor Kraev [00:06:09]: It was quite mundane, actually. So relationships between inflation and income distribution, inequality, that sort of thing. But what made me then move on and change careers is the realization that it didn't really matter what research I was doing, because the name of the game was that just by having somebody who does any kind of credible research, the nonprofit moved to a different league, so they were invited to different tables and could take part in different conversations on the back of that. But the content itself was largely irrelevant, as long as it was pointing in the direction that the nonprofit was aligning with. And once I realized that, and also got tired of being poor, which is part of working at nonprofits in Africa, I changed careers and went to an investment bank.

Demetrios [00:07:05]: So you went to the exact opposite side. You were like, you know that guy from the IMF? Actually, that stuff they're doing wasn't so bad. Maybe I should try my hand at finance.

Egor Kraev [00:07:17]: Yeah. You know, if you can't beat them, join them.

Demetrios [00:07:20]: Yeah. Incredible. Well, I want to say a few things that I've jotted down from our conversations and I would love for you to elaborate on them because every time that I talk to you, I feel like you have a ton of hot takes and you are knee deep in so many different areas of AI and ML, from traditional ML all the way to LLMs and AI, quote unquote, all the way to AI agents. And so one thing that you said to me that stuck with me was AI is a bridge from unstructured to structured. Can you explain what you mean by that?

Egor Kraev [00:08:05]: Well, I would rather say that one of the biggest strengths of large language models is being a bridge between unstructured and structured data. Because if you think about the way we did data science only two years ago, you first had to make everything into a vector or a matrix, and then you could begin working with it. So when you did topic decomposition, the topics were just areas in vector space, really. And then you had a hell of a job explaining what they even meant, and you had no hope in hell for a normal human to modify them. Whereas now, if you do topic classification, the topic descriptions are the topics, because the LLM can work with the raw text directly. And at least three quarters of the production applications for LLMs that I've seen are just the LLM converting incoming fluffy language data into something structured. Is it a customer complaint?

Egor Kraev [00:09:05]: What is the rate on this contract, the late-charges percentage rate? In goes a contract, out goes a number. And the vast majority of the applications that I've actually seen are just that.

Demetrios [00:09:22]: So taking a lot of messy, unstructured data and then trying to figure out some kind of a system so that that's the input and the output is something structured.

Egor Kraev [00:09:34]: Exactly. And in fact, I think the vast majority of use cases for LLMs that actually work in production are not "in comes the magic LLM, in comes the magic agent and does everything", but rather you take the LLM as an additional LEGO brick, in addition to all the other bricks you have, and you combine them, and that's how you get out the value.
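
A minimal sketch of that "in goes a contract, out goes a number" brick, using the OpenAI Python client. The model name, prompt, and JSON field are illustrative assumptions, not anything specified in the conversation:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_late_charge_rate(contract_text: str) -> float | None:
    """In goes a contract, out goes a number: the late-charges percentage rate."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model would do here
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Extract the late-charges percentage rate from the contract. "
                        'Reply with JSON only, e.g. {"late_charge_rate_pct": 1.5}, '
                        'or {"late_charge_rate_pct": null} if it is absent.'},
            {"role": "user", "content": contract_text},
        ],
    )
    return json.loads(resp.choices[0].message.content)["late_charge_rate_pct"]
```

The LLM is just one node: its structured output can feed whatever old-school pipeline sits downstream.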

Demetrios [00:09:58]: Okay, now do you think that it's easier to explain or champion AI now that it feels more simple to grasp like that, than when you were doing ML two years ago and you had to tell someone, oh, well, we're going to vectorize this and then turn it into matrices and all of that jargon? Hopefully you weren't doing that when you were explaining it to leadership or presenting different use cases. But now you don't really have to do that, right? You can speak at a different level.

Egor Kraev [00:10:38]: Maybe. But really, the problem is never the technology, or even being able to explain the technology, because if you know what you're doing, then you're able to explain it. It seems the biggest blockers are always of two kinds. One is organizational structure, but this can be broken down with sufficient will from above, at least. And the other one is the invisible walls in people's heads. Because there's a fun thing I've observed with a variety of technologies, from bringing FX swaps to Wise Treasury, to now LLMs in customer support, to a bunch of others: even if you have a team which has an existing workflow that kind of works for them, and you bring a new technology to them which clearly adds value that they have never seen before,

Egor Kraev [00:11:32]: then from the moment you start trying to educate them about it to the moment it becomes just another commonplace thing: minimum of two years. It doesn't matter how hard you try, it just takes time. Time for people's heads to adjust around this thing, along with all the other problems they have. It's two years.

Demetrios [00:11:54]: Yeah. And to build those habits, to build those muscles of "I'm going to use this instead of my traditional workflow". That makes sense. So if you budget for two years, then you set your expectations at a realistic level, and if it happens before then, maybe you got lucky.

Egor Kraev [00:12:10]: Yes. And generally it won't. Like, you could have the prototype out in two weeks, but for people to actually adjust to the fact that it exists, this is how it works, it is good for you, it's not scary: two years.

Demetrios [00:12:23]: Yeah. And if it's more than that, it's time to start looking for a new job.

Egor Kraev [00:12:29]: Perhaps. Perhaps. Thankfully, Wise is actually one of the more agile places, so I have never had good ideas blocked here.

Demetrios [00:12:40]: Well, there are plenty of ways that you're using ML and AI at Wise, from the traditional fraud detection, because it is a financial services company, or a fintech company, I think, is what you would categorize it as, right? And for those who don't know, it makes things really easy. The reason that I love using it is because I can have money in the US while I live in Europe, and it's really easy for me to move that money around and not have to pay exorbitant fees to traditional banks. I can just juxtapose that with when I would come to Europe when I was like 19 and 20; Wise kind of cut all that out and made it really easy. So coming back to the AI and ML that you're using at Wise, since it is finance, I imagine there's a lot of, A, fraud detection, and then B, maybe some... Are you doing loans? I don't think you give loans, do you?

Egor Kraev [00:13:47]: No, that we don't. But there are a lot of things. So absolutely, of course, fraud detection, anti-money-laundering, the whole wide area of fin crime, is probably the oldest, because it's so clearly beneficial. And that's classic ML: a wide table of data, XGBoost, hyperparameter tuning, the whole thing. That's also the area that our PR department is not very fond of us telling many details about, understandably, for good reasons. And then the other one is Treasury. Treasury is what you'd call a trading desk in banks, because the flows that people want us to transfer don't always balance.

Egor Kraev [00:14:32]: And so you have to go to the interbank market in one way or another and source the necessary currency, and then manage the risk of holding currencies. Because one way of looking at Wise is actually as a market maker to the masses: we always give a bid and an ask price for any currency pair. So effectively we are a kind of market maker, but a very unusual one, because we market-make to the masses rather than to other big financial institutions, and we try to keep our spreads as tight as possible, as opposed to the banks, who try to see how much they can get away with. So Treasury is a lot of machine learning: trading, estimation of flows to make sure we have cash in place. For example, did you know that if you try to withdraw money with Wise in Sri Lanka, or similar places with currency controls, Wise actually has to send the dollars to our partner bank the day before, so they can arrive overnight and wait there safely at the partner bank? Then, when somebody wants to withdraw money, Wise can ask the partner bank to please charge us the dollars and give you the local currency. So there is a lot going on under the hood for this quasi-instant experience.

Egor Kraev [00:15:48]: And then my final big area, or rather my final favorite area, because it's by far not the only one, is marketing and causal inference and all those fun and games.

Demetrios [00:15:59]: Yeah. Well, and you did kind of mention support before, too, right?

Egor Kraev [00:16:06]: Absolutely. I think support is comparatively young, because before LLMs happened, it was quite hard to do, because so much of the data is text-based. But now we have a great data science team in place there, and they already have some things in production, and more is on the way.

Demetrios [00:16:26]: So another thing that you said that stuck with me, and I want you to elaborate on, is that LLMs or AI shouldn't necessarily be looked at as the solution. It should be looked at more as one step to get to the solution, almost. And the way that I understood it is that we should just look at it as another step in a DAG.

Egor Kraev [00:16:51]: Yeah, 100%. I find it very silly when people ask ChatGPT to add 2 and 2, and ChatGPT tells them it's 5, and they're like, oh, AI has failed. You should think about this as one big LEGO set, and now you have a couple of extra blocks that you couldn't make before. Maybe the blocks can blink or emit sounds or whatever it is they do. It's just one more addition to your LEGO set, and it gets its power from being combined with all the others. And I'm sure it won't be the last cool thing either, because I still remember when RNNs arrived 10 years ago. It was exactly the same kind of hype wave: RNNs and neural networks will solve machine learning for us. And then that kind of settled down, for eight years or so there was nothing, and then another big thing came along. There will be another one.

Demetrios [00:17:50]: Now, in which ways are you seeing both LLMs or foundation models, plus other traditional ML or just regular heuristics, being used together?

Egor Kraev [00:18:08]: Well, the most obvious one is just LLMs being used in old-school pipelines. As I mentioned, LLMs just transform data. For example, they give you a score: based on the text, what is the likelihood that this customer email is a complaint? And then you add that score to a bunch of other data points you might have and do old-school machine learning to classify the email. At this point, this is the easiest thing to do, the most controllable, the safest, and so also the most production-ready. As for the other ones, I guess the fun part about LLMs is that they humanize machine learning in so many ways. They make the interface between humans and machine learning fluid. For example, in customer support bots: before, you had this block of text and then you had to classify what the customer is asking about, with an old-school model, vector spaces, yada yada. Now you can just ask an LLM: does this message contain enough information to understand what the customer wants? If not, what else should I ask the customer? That's a little prompt, and then you can ask the customer for more information, and they'll give it to you. So you unlock this whole interactive potential that just wasn't there.
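
A hedged illustration of "add that score to a bunch of other data points and do old-school machine learning". The keyword scorer below is a stand-in for the real LLM call (which could look like the extraction sketch earlier), and all column names are made up:

```python
import pandas as pd
from xgboost import XGBClassifier

def llm_complaint_score(text: str) -> float:
    # Stand-in for an LLM call ("how likely is this email a complaint?").
    return float("refund" in text.lower() or "unacceptable" in text.lower())

emails = pd.DataFrame({
    "text": ["Where is my refund? This is unacceptable.",
             "Thanks, the card arrived today!",
             "Please update my address."],
    "account_age_days": [40, 900, 365],
    "n_prior_tickets": [3, 0, 1],
    "is_complaint": [1, 0, 0],
})

# The LLM output becomes just one more column next to the tabular features.
emails["llm_score"] = emails["text"].map(llm_complaint_score)
X = emails[["llm_score", "account_age_days", "n_prior_tickets"]]
model = XGBClassifier(n_estimators=10).fit(X, emails["is_complaint"])
```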

Demetrios [00:19:38]: And I want to talk about the causal inference technology and what you're doing there, because you said marketing and all of that fun stuff is really one of your passions. Give me the lay of the land on what you're doing, how you're doing it, and what it looks like with ML and AI. I know it's A/B testing, right? But what else is going on there?

Egor Kraev [00:20:01]: Well, it's not just A/B testing. The trick is estimating causal impacts, and that is hard. It's not like regular machine learning, because in regular machine learning, for example, if you want to predict how much a customer will buy in the next month, then after a month you can see how much they actually bought. So you have an observed true value, and across multiple prediction variants you can rate which one was closest. When you choose to send a customer email A versus email B, you really have no way of measuring that impact directly, because you cannot send only email A and only email B to the same customer and compare. So that's quite hard. And then, unsurprisingly, people have built models specifically for this case: causal inference models. For example, Microsoft's EconML library is wonderful, but then again, they have half a dozen different models, each with its own hyperparameter universe, and absolutely no guidance about which one to use. So what we've done at Wise, under my guidance, is to find a way to score these.

Egor Kraev [00:21:17]: So even though you can't observe individual impacts, it turns out that if you have a whole population, like an A/B test, you can score these models out of sample. And once you can score out of sample, you can do model selection, hyperparameter tuning, all those wonderful AutoML things. So now, even though you can't observe impacts directly, you have a verified and selected estimate of impact for every single customer. And once you have that estimate, you can do fun things, right? You can do targeting, first of all: you can send to the customer the email that is most likely to have them do the action you want them to do, like click on that link. You can also do segmentation. Once you have that impact at a customer level, you can segment much more cleanly.

Egor Kraev [00:22:07]: Because how do people do A/B test segmentation now? They slice the whole A/B test sample into little chunks and try to see if there's significance. But that's very, very noisy; it doesn't work. This way, you can.

Demetrios [00:22:20]: How many customers or people do you need to have on a list in order for this to actually have statistical relevance?

Egor Kraev [00:22:31]: Certainly as many as you have for a regular A/B test is enough. Usually I expect it to be even smaller, because in an A/B test you treat the whole customer variability like noise, right? You average over it; all you want is an average. Whereas with all these models, you first model the customers' natural variability based on the customer features, what you know about them, and then you only model the impact on top of this. So now you actually treat customer variability in the customer behavior as signal. We haven't tested that extensively, but I expect you actually need smaller sample sizes than for regular A/B tests.
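
A small sketch of the kind of estimator being discussed, using Microsoft's EconML on synthetic randomized data. CausalTune's actual contribution, scoring and selecting among such estimators out of sample, is not shown here:

```python
import numpy as np
from econml.dml import LinearDML

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))                  # customer features
T = rng.integers(0, 2, size=n)               # randomized: email A (0) vs email B (1)
tau = 0.5 + X[:, 0]                          # true effect varies by customer
Y = X[:, 1] + tau * T + rng.normal(size=n)   # outcome = natural variability + impact

# First model the customers' natural variability, then the impact on top of it.
est = LinearDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)

per_customer_uplift = est.effect(X)          # an impact estimate for every customer
```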

Demetrios [00:23:19]: And is this what you're doing with Wise Pizza or is Wise Pizza a little different?

Egor Kraev [00:23:24]: Wise Pizza is related. Wise Pizza is there for finding fun segments. It started with growth analysis. Suppose you have this dataset with customer dimensions: the region they're in, the device they use, the currency-to-currency product they used, any number of dimensions you might want to have for your customers. So you have a million micro-segments, and now you want to find out: my overall, say, revenues per customer went down by 2%, or went up by 10%, from one quarter to the next, and you want to find out which simple, explainable segments were driving this. That's what Wise Pizza does. Or: my growth rate went down by 1% from one quarter to the next.

Egor Kraev [00:24:16]: What were the main customer segments, explained in simple terms, in terms of those dimensions I gave, that drove it? And you can also apply it on CausalTune results, on causal inference results, but you don't have to.

Demetrios [00:24:30]: Do you always find segments that match up and that you understand, oh, it's these people with these different features that drove whatever the question was, whether it's an increase or decrease in revenue? Or is it sometimes just a little bit scattered, or very weighted towards one or two people that don't really have anything in common, I guess?

Egor Kraev [00:25:01]: Well, in this sense we're fortunate to be a B2C company, so we have large, large customer bases, and sample size is not a problem. And this thing will always find something, right? Because that's machine learning; that's what machine learning models are built to do. You tell them to find something, they will find it. And then you can always just split your sample in half, fit on one half, and then look at statistical significance on the other half, and if it's there, then it's really something.
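
That split-half check is easy to sketch: mine segments on one half, then confirm the flagged segment on the held-out half. The segment and the data below are synthetic stand-ins:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "region": rng.choice(["Asia", "EU", "US"], size=2000),
    "device": rng.choice(["iPhone", "Android"], size=2000),
    "revenue": rng.gamma(2.0, 10.0, size=2000),
})

half_a = df.sample(frac=0.5, random_state=0)  # mine segments here
half_b = df.drop(half_a.index)                # confirm them here

# Suppose segment mining on half_a flagged "iPhone users in Asia" as out of line.
mask = (half_b["region"] == "Asia") & (half_b["device"] == "iPhone")
t, p = stats.ttest_ind(half_b.loc[mask, "revenue"], half_b.loc[~mask, "revenue"])
print(f"held-out p-value for the flagged segment: {p:.3f}")
```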

Demetrios [00:25:35]: Wow, that is so cool. So let's say that you are answering some of these questions and you're finding different segments that are driving the answers, and you've split it in half, and then you've recognized, okay, this seems to be true. Then what do you do with that information?

Egor Kraev [00:26:04]: Well, that's really where humans step in. In our case, what the rest of the world calls data scientists, Wise calls analysts, and what Wise calls data scientists is more like an ML research engineer in the rest of the world. So we have those wonderful analysts who then go in and dig deep into those segments, and now they don't have to wander around in pivot tables looking for something that's out of line. They can clearly see: these are the drivers. Okay, now let's look at, say, iPhone users in Asia, what happened there? Or: it's really the big tickets that drive the change, so let's look at the big tickets, what's going on there? But there's also a bigger philosophical point.

Egor Kraev [00:26:51]: I have exactly zero fear of analysts being replaced by algorithms. In fact, the more machine learning can do things that are currently manual labor for data scientists, the more data scientists can use those tools and therefore add even more value. So the more labor-replacing stuff like ChatGPT happens, the more jobs there will be for data scientists, not the other way around.

Demetrios [00:27:16]: Fascinating. And so you're digging around in that data, you're trying to find insights, and then they're presented to leadership, and hopefully you're creating a new campaign to try and target that specific segment, or you're running a discount, or some kind of action is taken to help with that.

Egor Kraev [00:27:40]: Exactly. And also, of course, it wouldn't be 2024 if I wasn't currently working on an extended agentic version of this. When you run this analysis, you come up with those segments, and then you see whether you can source some more data from wikis or chats or Slack channels, or any kind of internal information you have, that would help explain it or maybe be relevant. You still could never fully automate it, but you could go a long way. You would have an interactive tool where, in dialogue with the machine, humans could do a much better job of telling the whole story, not just the numbers.

Demetrios [00:28:20]: So if I'm understanding you correctly, your analysts are digging through very specific segments of users who are driving some question that you have. Let's use this example: revenue went down 1% in the last quarter, so let's figure out the major reasons for that, and you find out that it's iPhone users in Asia or something along those lines. Analysts are digging through that data, and then they can augment that structured data with: oh, you know what happened, we raised the prices in Asia for XYZ transfers, or we did something. And those analysts would not know that unless they had the agent bringing them extra context.

Egor Kraev [00:29:15]: Absolutely. And, well, I don't want to overpromise: this is not something we have just yet. It's something I'm actively working on right now, so hopefully we have something rough out by end of year. But that's a niche that I don't really see occupied right now, because "let's ingest all your corporate data and put a chatbot on top" kind of startups are a dime a dozen; that's such a natural idea. But combining this with qualitative analysis to do storytelling that's linked to what the data tells you, that is not really something I've seen out there.

Demetrios [00:29:56]: And so, in a way, it's plugging into all of your internal docs, your internal messaging systems, and it's trying to find information relevant to the data segment. How are you interacting with the agent? Is it through a chatbot?

Egor Kraev [00:30:15]: Yes, I think so. It's just a chat dialogue. Again, this is very much in the design stage, but it's very much just a dialogue. So you have the kind of report, with graphs and text, that the thing is generating, and then there is a chat panel to one side where you can say: oh no, let's zoom into that segment, or, what's going on here, or, make changes to the report this way. Because I think it has to be a dialogue, because ultimately humans know what story they want to tell, and this has to be a story that's supported by the data. But potentially there are many stories that could be supported by the data, and ultimately it's a human call: of those stories, which is the one you actually want to focus on this time?

Demetrios [00:31:01]: Yeah. I also like the UX, sometimes, when I'm playing around with different agent chatbots, where it will guess what question I want, so I don't have to think as hard, and for me it's a lot easier to just say, oh yeah, let's see what's going on there. Almost like when you click on a YouTube video because the title entices you. So the agent comes up with, like, four questions that maybe you might want answered. And I've seen this done where you ask an initial question and then there are the follow-up questions. That's a pretty common theme in agents these days, or just in chatbots in general.

Demetrios [00:31:42]: And it feels like it would be very cool if you're moving around in the data and you're asking questions, and then the agent can suggest things like: oh, maybe you want to know about X, Y, Z. So I like the way that you're going about that. And it is also very nice to augment the capabilities and the tools that an analyst or a data scientist has at their fingertips, to be able to tell the story that they want to tell, as you were saying. When you're building out the agents, what have been the things that were difficult?

Egor Kraev [00:32:28]: I'm afraid it's the usual, right? Prerequisites, data quality, and then working with engineering lead times for getting things into production. Because a chatbot on its own is only good as a toy. For example, if you want a chatbot to answer customer queries, you need to have a hard, old-school underlying taxonomy of what the possible customer questions you can handle are, and you have to distill that taxonomy and build it and expose it to the agent; then the actual agent part is easy. In fact, I think that's the general theme with the whole hype around agents: it's kind of fun, but using an agent in itself will very shortly be kind of like using a database. Like, yeah, they're useful, they add value, but it's not a big deal. It's like, oh, we have a database and we have a multi-database application.

Egor Kraev [00:33:34]: How wonderful. You know, it's just a pattern, and it's not a very hard pattern. All the stuff around it to make it work is hard; the agents themselves are easy.

Demetrios [00:33:46]: For the majority of people, the database is not the most sexy of technologies. It's kind of old. Maybe it was back in the day, but now, like you said, it's just the database.

Egor Kraev [00:34:01]: That's the ultimate success, actually.

Demetrios [00:34:04]: Yeah, yeah. So is that the maturity of where it's at? The thing that I'm wondering about with agents is: how are you making sure that the right data is not going to the wrong person, or the wrong data is not going to the wrong person? With Google Docs, for example, you have a very clear sharing protocol, and I'm not worried that somebody is going to be able to see all my Google Docs. But with agents, I think it's really easy to not have that role-based access, and so the agent can now get into any data that it wants to, even in private Slack channels or whatever, and then you're surfacing information that that analyst maybe shouldn't have access to.

Egor Kraev [00:35:08]: Well, that's an excellent point, actually. On one level, we don't really have that problem yet, because we certainly don't have customer-facing agents whose LLM output goes out directly. All of this is human-in-the-loop, and also internal tools: most of these things provide drafts which humans can then edit. But in the long run, the problem you raise is actually not that difficult, because of the way you store things. Think about how these things work: you ingest snippets from everywhere, and then, via some kind of RAG, and I could go on about RAG for a day, I'll probably be giving some talks on it and so on, you retrieve some of the snippets that look like they might be relevant, and then you give those snippets in the prompt

Egor Kraev [00:36:07]: to the LLM along with your question. Now, there is nothing easier than to attach to those snippets metadata saying which channel they came from, and then filter by that. So that's actually not a very hard problem, but it's an important one.

Demetrios [00:36:24]: Mm. So you're doing it within the metadata of the different chunks, as opposed to in the database itself.

Egor Kraev [00:36:35]: Well, I'm not sure how else you would do it, because there are so many different systems, and the permissions are granular and entangled. So the best you can hope for is just to flag: okay, this chunk came from here. Is this person entitled to this source? Yes or no?

Demetrios [00:36:56]: Yeah, yeah. Is this person in that channel? If it's a private channel, and if not, then make sure not to include it.

Egor Kraev [00:37:04]: Exactly. What can they read? Are they allowed to read this wiki page? And so on and so forth.
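
A toy version of the metadata filter Egor describes: access control applied to retrieved snippets before anything reaches the prompt. The ACL structure and names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    source: str  # e.g. the Slack channel or wiki page the chunk came from

def can_read(user: str, source: str, acl: dict[str, set[str]]) -> bool:
    # Hypothetical permission check: is this person entitled to this source?
    return user in acl.get(source, set())

def filter_retrieved(user: str, snippets: list[Snippet], acl) -> list[Snippet]:
    # Retrieval stays simple; access control is a metadata filter on the results.
    return [s for s in snippets if can_read(user, s.source, acl)]

acl = {"#private-treasury": {"alice"}, "wiki/pricing": {"alice", "bob"}}
hits = [Snippet("Q3 spreads widened...", "#private-treasury"),
        Snippet("Pricing tiers are...", "wiki/pricing")]
print([s.source for s in filter_retrieved("bob", hits, acl)])  # only wiki/pricing
```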

Demetrios [00:37:10]: Yeah. Okay, so now tell me about MotleyCrew.

Egor Kraev [00:37:16]: Ah, MotleyCrew is fun. Basically, it's the usual nerd thing: we had a vision, my partner and I, about what agent frameworks should be like. I first started looking around, because I assumed there would be something out there, and I looked at a bunch of them: LangChain, LlamaIndex, which didn't have much back then, CrewAI, a couple of the more prototypey ones that were around at the time. I didn't find one that worked quite the way I wanted it to. CrewAI came closest.

Egor Kraev [00:37:59]: But then they really wanted to have their own walled garden. So when I submitted a PR to them that would have allowed free interoperation with any kind of LangChain or LlamaIndex agents, or any other agents, they ignored the PR. At that point I said: no, it's a LEGO set, right? That's my favorite metaphor for this whole game. I want to mix and match. And so that was the starting point. And so now MotleyCrew's core premise is that you want to be able to mix and match any frameworks at all, from AutoGen to LlamaIndex, LangChain, CrewAI. They all have their strengths and their weaknesses, so you should be able to use the best tool for the job without trying to pull people into a walled garden.

Egor Kraev [00:38:45]: And then, as we tried to use it for things, which is the only way to make it good, we also started adding other patterns that I haven't seen anywhere. My favorite one, for example, is forced validation. What normally happens when you use an agent with tools? For example, you have an agent that generates Python code, and you want to make sure that the Python code is valid. The agent calls a tool which tries, for example, to run the code, and if there are any errors, it feeds them back to the agent. And the hope is that, because you tell the agent so in the prompt, the agent will keep trying until the code is valid, until the tool says you're good. However, there is no guarantee of this, and LLMs are famous for doing strange things sometimes. So basically you put the intent into the prompt and hope.

Egor Kraev [00:39:43]: Whereas with forced validation, what you say is that the agent is only allowed to return a result via the tool. So the agent tries to call the tool with that, say, Python code. If the code is fine, the tool returns the code, and if the code is not fine, it returns the reasons why not to the agent, and the agent tries again. And if the agent tries to return directly to the user, the agent gets told: no, you have to return by calling the tool; try again. And this way you have guarantees, because if you get anything at all back from the agent, you know it's been validated. And you see people reinventing this pattern with LlamaIndex workflows and LangGraph and whatnot, but I haven't really heard it described as a pattern elsewhere, strangely enough.
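
A framework-agnostic sketch of the forced-validation loop as described; `llm_step` is a hypothetical stand-in for whatever agent call your framework provides, not MotleyCrew's actual API:

```python
def validate_python(code: str) -> tuple[bool, str]:
    """The output tool: only valid Python is allowed through."""
    try:
        compile(code, "<agent output>", "exec")
        return True, ""
    except SyntaxError as e:
        return False, f"Invalid Python: {e}"

def run_with_forced_validation(llm_step, task: str, max_tries: int = 5) -> str:
    feedback = ""
    for _ in range(max_tries):
        action, payload = llm_step(task, feedback)  # one agent step
        if action != "output_tool":
            # Returning directly to the user is not allowed.
            feedback = "You must return your result by calling the output tool."
            continue
        ok, reason = validate_python(payload)
        if ok:
            return payload  # anything that comes back is guaranteed validated
        feedback = reason   # errors go back to the agent; it tries again
    raise RuntimeError("Agent failed to produce valid output.")
```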

Demetrios [00:40:33]: Huh.

Egor Kraev [00:40:35]: So you're.

Demetrios [00:40:36]: I like the forced validation description, because it is very clear that it's not the agent that is giving you the result, it's the tool. And by way of the tool, if you're getting it back, then you know that it's been through the tool and passed what it needs to pass.

Egor Kraev [00:40:59]: Exactly.

Demetrios [00:41:01]: Huh. Okay. And the other idea in MotleyCrew, which is really cool, and it also makes sense given the name: you're using any framework that you want. So how does that even look? Is it an abstraction above the LangGraphs and the LlamaIndexes and the AutoGens?

Egor Kraev [00:41:20]: So right now, first of all, we have wrappers for all the common agent types, because every framework has its own agent parent class. And we can wrap them all, which is necessary, for example, to make them all support the forced validation pattern; certainly LlamaIndex and LangChain agents we can make support it. They all support the Runnable interface, so you can plug them into LangGraph as well, because LangGraph is actually cool.

Egor Kraev [00:41:55]: And so those are the main two bits: you have wrappers, and those wrappers also inherit from LangChain's Runnable. Because you can say many things about LangChain that contain the words "flag planting" and similar not entirely nice words, but some of the things it has are really cool, and LangGraph is certainly one of them.
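
The wrapper idea is easy to caricature: one thin adapter gives agents from different frameworks a common calling surface, so they can be mixed in one crew. The method names probed below are illustrative stand-ins, not any framework's guaranteed API:

```python
class AgentAdapter:
    """Wraps heterogeneous agents behind a single invoke() surface."""

    def __init__(self, agent):
        self.agent = agent

    def invoke(self, prompt: str) -> str:
        if hasattr(self.agent, "invoke"):  # LangChain Runnable-style
            return str(self.agent.invoke(prompt))
        if hasattr(self.agent, "chat"):    # LlamaIndex-style (stand-in name)
            return str(self.agent.chat(prompt))
        if hasattr(self.agent, "run"):     # CrewAI/AutoGen-style (stand-in name)
            return str(self.agent.run(prompt))
        raise TypeError(f"Don't know how to drive {type(self.agent).__name__}")
```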

Demetrios [00:42:17]: Yeah, yeah, excellent. Well, that's fun. And that's fully open source, so anybody can go and play with it right now.

Egor Kraev [00:42:27]: Exactly. And the commitment is to make it truly open source, so it will never be used to try and upsell people into stuff, and we will never deliberately cripple it to make people pay for a paid version. It's meant to be maximally open, and that's also why it is maximally compatible with everything. Like, the next fun thing that we'll have to look at now is Anthropic's Model Context Protocol. It looks really cool, so we'll really have to support that.

Demetrios [00:42:54]: Very cool.

Egor Kraev [00:42:55]: Not there yet, but coming soon.

Demetrios [00:42:58]: And now, what's CausalTune? First of all, can we just take a moment to note that you've been pretty busy creating a lot of stuff over there? We've gone through Wise Pizza, we went through MotleyCrew, now we're going to talk about CausalTune, and also just everything in general that you're up to. Hats off to you all in the ML and AI section of Wise.

Egor Kraev [00:43:27]: Well, first of all, I've been very fortunate to work at Wise, which is very cool with people doing other things in parallel outside of working hours. And so now I've been able to officially lay down my people-leadership responsibilities and work part-time at Wise, and devote the rest of my time to bringing up a startup. And CausalTune is exactly the stuff I've been telling you about: causal inference, segmentation. It's the idea that in marketing you can extract more value from an A/B test than just averages. In fact, you can estimate the impact on every customer and use this for segmentation by impact, targeting by impact, and all those wonderful things which, until you've seen them done, you wouldn't even believe are possible.

Demetrios [00:44:18]: So CausalTune is the causal inference with marketing, but it's also the segmenting piece, because I know there were a few different parts that we talked about within that.

Egor Kraev [00:44:29]: Well, CausalTune is an open-source library, open-sourced by Wise. We are actually using it at Wise, by the way, successfully, with the numbers to show for it: we do see distinct upticks in click-through rates and suchlike. So CausalTune is just a library for causal inference, and it does two cool things. Cool thing number one: it allows you to estimate customer-level impacts. Cool thing number two: it allows you to estimate outcomes of hypothetical treatments. So suppose you did some kind of assignment, using your CausalTune targeting, or even just a random test.

Egor Kraev [00:45:06]: And then all of a sudden your head of marketing comes in and says: oh no, we should have used these rules instead. Then, instead of waiting another month to run another test to test those rules, it's actually possible, having, for example, a past randomized trial result, to compute with a very high degree of precision what the outcome of any rule would have been on that sample. So if you want to test ideas for simple targeting rules, you don't have to run a new test for each one. You just run a random test first, and then you use that dataset to get the outcomes, with confidence intervals, of any other assignment you could try.

Demetrios [00:45:50]: And you don't actually need to even be sending emails. You just run the tests on old data.

Egor Kraev [00:45:58]: Well, you need to have one test; a fully randomized one will do perfectly. And then, using this, you can estimate the average outcome of any other assignment, based on the same customer features that you had in the original test, without running more tests. So why do people, for example, set aside randomized samples when they're targeting? It's just a waste of sample size when you can do it ex post, by math.
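
One standard piece of math behind "run one randomized test, then score any rule on it" is inverse propensity weighting; whether CausalTune uses exactly this estimator isn't stated here, so treat this as a generic sketch on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 10_000
df = pd.DataFrame({
    "big_ticket": rng.integers(0, 2, size=n),
    "email": rng.integers(0, 2, size=n),  # fully randomized A/B: p = 0.5 each
})
# Synthetic outcome: email B only helps big-ticket customers.
df["clicked"] = (rng.random(n) < 0.05 + 0.04 * df["email"] * df["big_ticket"]).astype(float)

# Hypothetical rule the head of marketing wishes we had used:
# send email B to big-ticket customers, email A to everyone else.
rule = df["big_ticket"]

# Inverse-propensity estimate of the rule's click rate, from the old test alone.
p = 0.5
match = (df["email"] == rule).astype(float)
value = (df["clicked"] * match / p).mean()
print(f"estimated click rate under the new rule: {value:.3f}")
```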

Demetrios [00:46:29]: Yeah. Okay. And now, you kind of hinted at it: you're building a startup with this. What does that look like?

Egor Kraev [00:46:38]: Very early stages. We hope to have something up in a couple of weeks, because it's not just the product itself. The technology itself has been tested at Wise; it works. It's been open-sourced by Wise as well, so there is no obstacle to anybody using it in a startup, including myself. But now there's the whole machinery of a SaaS startup: hosting, user authentication, and all the bits that go into a functioning system.

Egor Kraev [00:47:11]: Once we get those up, we really would love to have more people outside of Wise try this out and see the benefits. Because for any kind of targeting where you know something about the users you target, this is the technology with which you can squeeze out as much as possible: it can do the targeting about as well as it can be done, given the features that you have.

Demetrios [00:47:34]: Hmm.

Egor Kraev [00:47:35]: I actually think it's pretty close to optimal.

Demetrios [00:47:39]: And how have you been using it? Is it just for when you're running a sale, or offering better rates, or what? Can you give me an example of what you actually do? Because I'm not sure I fully get across the finish line on understanding how that looks with an email that I get from Wise.

Egor Kraev [00:48:01]: So in this particular case, the one where the test is just over and we've seen nice numbers come out: Wise doesn't just offer transfers, it offers many, many nice things. You have assets, you have balances, you have the card, and you can use the card in different ways. And so we have between half a dozen and a dozen emails encouraging users to use a particular aspect of our offering, what we internally call a product. And the measure of success is people who actually go ahead and not just click on the email, but register and start using that particular aspect of our offering within a certain time window of receiving the email. And then the question is: now that you have eight or ten different encouragements for different facets to choose from, which one do you send to this user? That's something where you can try naive rules, but you'll get much better impact if you do it with this kind of technology.

Demetrios [00:49:13]: Okay, okay. Yeah, it makes sense. So it's like: hey, this person... We'll use my use case, because I feel like I've probably seen these emails before, and as I mentioned, I use Wise. There, you're looking at me and you're saying: he generally transfers from American dollars to euros, and he's using the savings function or the checking function right now. And you know what he might like? The card. Because I have thought about that.

Demetrios [00:49:52]: And so I definitely have. And if an email hit me in the right moment, I probably would end up getting it.

Egor Kraev [00:50:02]: Exactly. And that's exactly the kind of thing. And the nice thing is that now you can take into account many of the other things, like different regions might grow differently, or people with different average transaction sizes might behave differently. You can take any kind of features into account; just train your model, and it tells you which is the thing that's most likely to produce a positive result.

Demetrios [00:50:28]: Very cool. Well, can we talk a little bit about organizational structures and what thoughts you have on those?

Egor Kraev [00:50:39]: Oh, absolutely, I would love to. In fact, the big revelation when I came to Wise was the degree of autonomy that not just a person, but a team could have. When I first joined Wise Treasury, it was really a revelation to be in an organization where all I saw around me was that nobody told anybody else what to do, not even your lead. Which is strange when you hear about it, but it actually works. And so this idea of autonomous organizations has been very close to my heart since. Because it's not just about how people behave, it's not just about intent, it's also about how organizations are structured. For example, if you have a vertical IT organization which everybody has to compete for, you can forget about autonomy. Or if you have a team which is only regarded as a cost center, then this team will penny-pinch to the detriment of the rest of the organization, because those are their incentives. So there is a personal and a structural component to it.

Egor Kraev [00:51:48]: And I've actually been working on a book; let's see how I get along with it. It's reasonably far advanced, and I'm hoping to bring it out early next year, but we'll see. And the fun part about the timeliness of this is that I think, and I'm of course not the only one thinking this, that the new wave of AI will transform the way organizations are structured. Right? If you think about it, a typical big firm is structured around information flows, and those information flows are hierarchical, because until recently that's the only way humans knew how to deal with text data. You have middle managers making reports for their managers, who are making reports for their managers.

Egor Kraev [00:52:39]: And then many Chinese whispers later, the CEO thinks they know what's going on.

Demetrios [00:52:44]: Yeah.

Egor Kraev [00:52:45]: But now, if you can shortcut this pyramid of Chinese whispers and just have AI look directly at all the raw data in the organization, and if necessary go out and ask questions over chat when it doesn't know enough, and then give you the answer at the right granularity, maybe you don't need the hierarchy. And so that's a very, very exciting thought, and it's certainly a space that I will be exploring over the next years.

Demetrios [00:53:19]: You did say something that touches a question that came up time and time again when we did the AI Agents in Production conference a few weeks back, which is: how do you create an agent that understands when it does not have enough information? That's a really hard problem, I think, because, as you mentioned before, if you give AI and ML a task to find something, it's always going to find something, whether or not it actually is the thing that you're looking for. So it's really hard to have an agent understand that it does not have sufficient information to answer whatever the question is, or whatever the report is that it's drawing up. A lot of times, that's when it defaults to hallucinating.

Egor Kraev [00:54:14]: That is true, but at the same time, I think there are solutions for this. You have to build for this from the start. I mean, the simplest one is consensus: if you run it several times, does it come up with the same answer or different answers? And if it does come up with different answers, then maybe it doesn't know. But frankly, as far as this kind of technology goes, I'm happy to be a fast follower. I'm 100% sure that smart and well-paid teams at Google, OpenAI, et cetera, are even now working on smarter decoding methods to deal with this, because it's such an obvious blind spot in LLMs, and I'd rather wait another quarter or two and use that to build products.
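
The consensus check Egor mentions is only a few lines to sketch; `ask_llm` is a hypothetical callable standing in for any LLM call:

```python
from collections import Counter

def consensus_answer(ask_llm, question: str, n: int = 5, min_agree: float = 0.6):
    """Ask the same question several times; only trust an answer the samples agree on."""
    answers = [ask_llm(question).strip().lower() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    if count / n >= min_agree:
        return best
    return None  # disagreement: treat it as "the model doesn't know"
```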

Demetrios [00:55:07]: Yeah, and it was a bit of a tangent to the actual organizational structures conversation that I wanted to have, but you said it and my mind went that way, like a little dog chasing a ball. The organizational structures piece, though, is interesting to me, because I can imagine a lot of folks that are listening, including myself, think: well, how does anything get done, and how do real initiatives happen? Who's crafting the high-level vision or the goals if nobody is telling anybody else what to do?

Egor Kraev [00:55:47]: Well, I think there is still hierarchy. It's just that the hierarchy is not coercive. The point of a lead is not telling people what to do. Instead, it's the following. Firstly, it's telling people the story: why is this team here? What is it there for, in big terms? How does it fit in with the rest of the organization? That's task number one, and it is really enough for smart people to figure out what is most important for them to look at. And the other half is then clearing out of the way the obstacles that the team can't deal with itself. So if there is an organizational problem somewhere else in the organization, some stupid process is stuck or things happen,

Egor Kraev [00:56:37]: then it's just going in there and cleaning this up. I think those are the two biggest things. And if a leader at each level does this, especially the storytelling part, then the amount of freedom that is unleashed thereby, the amount of creativity, is astounding. And it even has an economic effect: when I was hiring for the Wise data science team, it was not uncommon for people to turn down higher offers elsewhere, because they came here for the autonomy.

Demetrios [00:57:10]: And the autonomy in practice means that, if I was on your team, I can say: you know what, considering what our story is and why we're here on this team, I feel like the best way to help us move the needle is to do this project. And do I propose something and then it gets sign-off? Or is it just: I run with it, and I say, hey, I've already hacked together a little bit, and now I need more resources from XYZ teams?

Egor Kraev [00:57:48]: And that's, I guess, how seniority works in autonomous teams. Because seniority is measured by the number of people that you're able to bring along with your idea. So if you're really a junior and you don't really know what's important, then you're generally happy to take guidance. And then, as you progress, you have to convince the people around you that what you're proposing makes sense, or at the very least makes enough sense for you to work on it. And the way you grow to be more senior is when you start convincing people that it's an important enough thing for the whole team to work on. The more people you can engage and bring along with your storytelling, the more senior you are. And then, eventually, titles adjust.

Demetrios [00:58:44]: And what about this scenario: I am very excited about working on a project, let's just say Wise Pizza, before Wise Pizza was a thing. I go around trying to rally the troops, and it falls flat. People don't understand it. People aren't really interested in joining the cause. How long do I have to try and champion that before it falls flat completely and I give up and move on to something new? Is it a matter of days, weeks, months? Or is it just something that I put on the back burner and never truly let the dream die?

Egor Kraev [00:59:31]: Well, first of all, needless to say, everything I say here is my own opinion about how autonomous organizations work, or where autonomy would really work; in any particular organization, including Wise, many people would disagree. That's how organizations and humans work. So I would say that, since there is no objective standard of truth or usefulness for most people, maybe marketing people can point to increased revenue, but most people in an organization have no direct, measurable impact on their own, you have to deliver enough value that makes sense to the people around you, and then you'll have some slack to pursue things which might make no sense to them. The more of a record you have of delivering things that visibly add value in a way that makes sense to others, the more slack you will gradually get for doing things that might not make sense at once.

Egor Kraev [01:00:39]: But it's very much a relationship. It's a human thing. I guess that's the thing about autonomy. My favorite metaphor is that of a machine versus a forest. So much advice on scaling an organization centers around making it like a machine: every role is precisely described, people are replaceable, and then it can scale.

Egor Kraev [01:01:04]: Whereas autonomy-centric organizations are more like gardens, where things grow the way they are and adjust around each other. No two plants are alike, no two humans are alike. And then you have to tell stories to the people around you, and you, and what you're doing, must make sense to them. That's the main criterion.

Demetrios [01:01:34]: Are you familiar with permaculture and that whole movement for gardening?

Egor Kraev [01:01:40]: Which one?

Demetrios [01:01:41]: Permaculture?

Egor Kraev [01:01:42]: I am not. Oh, what is it about?

Demetrios [01:01:44]: So it talks about how, and I'm by no means an expert, but from what I understand: if you look at traditional farming and traditional gardening, you'll place all the tomatoes in a line, or maybe you'll have a whole field of corn. But this is more like: you know what goes really well with tomatoes? Basil, because it keeps the fruit flies away. So you put one tomato plant, and then one basil plant, and then one tomato plant. You're varying things and putting different plants together because they have a nice ecosystem or homeostasis together. And it feels like that is what you're talking about.

Demetrios [01:02:34]: It's not only gardening and having each individual plant be an individual that grows in the way that it does. If you can put two plants together that work very nicely together, you're going to get that combined effect and the outcome of both plants doing better.

Egor Kraev [01:02:54]: Ah, thank you for a lovely metaphor. In fact, this is perfect, because also in terms of autonomy, functional verticals are the kiss of death. A centralized IT reporting line which overrules product-centric reporting lines is the kiss of death for any kind of autonomy. So it's exactly this: you have to mix different specialties in proportions that allow them to do the thing they have to do, without any kind of vertical priorities getting in the way of that.

Demetrios [01:03:28]: Brilliant.
