Fast & Asynchronous: Drift Your AI, Not Your GPU Bill // Artem Yushkovskiy
Speaker

Sr. ML Engineer at Delivery Hero; for the last 7+ years he has been building ML platforms and ML use cases. Now scaling a global image auto-enhancement service that rides on a massive self-hosted Kubernetes infrastructure. Passionate about MLOps, distributed systems, and anything that bends infrastructure to the will of AI.
SUMMARY
Stop thinking of POST /predict when someone says "serving AI". At Delivery Hero, we've rethought Gen AI infrastructure from the ground up, with async message queues, actor-model microservices, and zero-to-infinity autoscaling - no orchestrators, no waste, no surprising GPU bills. Here's the paradigm shift: treat every AI step as an independent async actor (we call them "asyas"). Data ingestion? One asya. Prompt construction? Another. Smart model routing? Another. Pre-processing, analysis, backend logic, even agents — dozens of specialized actors coexist on the same GPU cluster and talk to each other, each scaling from zero to whatever capacity you need. The result? Dramatically lower GPU costs, true composability, and a maintainable system that actually matches how AI workloads behave. We'll show the evolution of our project - from DAGs to distributed stateless async actors - and demonstrate how naturally this architecture serves real-world production needs. The framework is open-sourced as Asya. If time permits, we'll also discuss bridging these async pipelines with synchronous MCP servers when real-time responses are required. Come see why async isn't an optimization — it's a paradigm shift for AI infrastructure.
TRANSCRIPT
Artem Yushkovskiy [00:00:04]: All right, thank you. My talk is going to be less about agents and more about actors. And I couldn't share my famous picture with Vin Diesel. So, who am I: I have a very broad background, from web security to MLOps and now AIOps. I see myself as the bridge between everyone on the team who is at least a little bit involved in the project. For the last four years I've been working at Delivery Hero. It's a global company; we operate in multiple countries and provide food delivery services.
Artem Yushkovskiy [00:00:52]: What my current team is doing:
Artem Yushkovskiy [00:00:57]: we're automatically improving the quality of the images on the menu using AI. For example, a restaurant uploads an image that is too blurry, too cropped, too close or too far away, and we use different pipelines
Artem Yushkovskiy [00:01:15]: to automatically enhance these images. As an example, you can see this image is blurred and too close; we produce multiple variations, some with this background, some with that background, and some of them unrealistic. So we face multiple problems: how to not only
Artem Yushkovskiy [00:01:38]: analyze and detect
Artem Yushkovskiy [00:01:41]: which images will look better when deployed to the product, but also lots of cost constraints, because the scale is very large. Today in my talk I will try to shift the paradigm of how you think about
Artem Yushkovskiy [00:02:04]: batch pipelines for Gen AI use cases, and I'll present the tool my team has developed over the last two years. So let's get started. Two years ago, around the DALL-E 2 era, we were tasked with this project and built a POC. It's a simple Kubeflow pipeline that calls the APIs, everything works, and we're very excited. We start scaling it to an MVP. We are calling a few
Artem Yushkovskiy [00:02:40]: AI models through API calls, we're connected to some backend logic, and everything looks fine until we start scaling it to
Artem Yushkovskiy [00:02:50]: hundreds and thousands of pipelines, because we start to get rate limited. We start getting random errors in random places, and it just doesn't scale. At that point we're spending 60 to 80% of our engineering effort just keeping the lights on, and that was not the way to go. One of the most pressing issues was, of course, the cost. And for the AI, it looks like a phone that suddenly starts ringing from all over the place, which is not the way to go, at least for batch use cases. So how it looks
Artem Yushkovskiy [00:03:32]: on the diagram: we have multiple pipelines and one or more AI servers, which are the APIs. Some requests are served properly, some start getting rate limited. The pipelines get into
Artem Yushkovskiy [00:03:48]: exponential backoff waiting, and then we just don't know what's happening. We observe that there are 50 pipelines waiting for something and we can't do anything about it.
Artem Yushkovskiy [00:03:59]: So, to solve the cost constraints and the rate limiting, what we did was take this AI and self-host it with roughly the same request-response interface, but asynchronously. We have a separate cluster with several models running there, talking to the pipelines through SQS message queues. So we solved that: for the AI it now looks like not a phone ringing, but mail it has to reply to, which is fine
Artem Yushkovskiy [00:04:37]: when there are no strict latency concerns. It will work,
Artem Yushkovskiy [00:04:42]: but not always, because the pipelines still end up waiting for something, getting errors in this part or that part, and it is still extremely hard to scale. You see, we've solved many problems, but not all of them, so we still spent at least half of our time just running the pipelines at scale. One small aside: the data on this slide is completely imaginary. We're not talking about millions of tokens, we're measuring in volumes of War and Peace, which is known to be a very thick book. If you talk about cost and you're working with any kind of cloud provider, your cost will grow very fast,
Artem Yushkovskiy [00:05:34]: in the best case at a linear pace. So imagine one cloud provider, a second cloud provider; but once you start self-hosting these models, your cost becomes very, very low if you manage to optimize it properly. So we didn't stop at that architecture, and eventually we
Artem Yushkovskiy [00:06:00]: decided to
Artem Yushkovskiy [00:06:02]: completely decouple the pipelines and apply engineering best practices to make it distributed. What we came to was the actor model: old ideas, but working surprisingly well for complex AI-powered pipelines where some steps are very fast, milliseconds, and some are very slow, seconds or minutes, and all of this has to work in a single pipeline. So imagine there is a flow of input requests: an actor takes a request, scales up when needed, processes the request, and sends it on to the next actor; the next actor sees the request, scales up, and so on. We have this cascade of requests with proper error handling: if something happens, the message goes to the dead-letter queue or to an error processor that can return the message to the queue so that someone else can retry it, if the error is retryable.
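To make the cascade concrete, here is a rough sketch of the consume-process-forward loop each actor runs. This is not Asya's actual code; the queue objects and their receive()/send() methods are placeholders for whatever broker (SQS in the talk) sits underneath.

```python
import time

def run_actor(process, in_queue, next_queue, dlq, max_retries=3):
    """Illustrative actor loop: pull a message, process it, forward it, and route
    failures to a retry or to the dead-letter queue. The queue objects are
    placeholders assumed to expose receive() and send()."""
    while True:
        msg = in_queue.receive()              # returns None when the queue is empty
        if msg is None:
            time.sleep(1)                     # idle; the autoscaler can scale this actor to zero
            continue
        try:
            msg["payload"] = process(msg["payload"])   # enrich the payload
            msg["pointer"] += 1                        # advance to the next step of the route
            next_queue.send(msg)                       # hand over to the next actor's queue
        except Exception as err:
            msg["retries"] = msg.get("retries", 0) + 1
            if msg["retries"] <= max_retries:
                in_queue.send(msg)                     # retryable: put it back for another attempt
            else:
                dlq.send({"error": str(err), **msg})   # give up: dead-letter queue
```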
Artem Yushkovskiy [00:07:13]: You can see that we have optimized our GPU usage. All these actors are not talking to external APIs; they are self-hosted on local GPUs in the cluster. Only when we have actual work, actual traffic, do we scale these actors up. This allowed us to push throughput to the limits, avoid rate limits, and have a fully distributed system without any orchestrator. Minimal cost, no vendor lock-in, and we're fully in control of each AI model being served in our cluster. As a bonus, this system works in near-real-time mode once you add a small add-on for
Artem Yushkovskiy [00:08:06]: tracking latency and timeouts for the end-to-end request. We decided not to stop there, and we extracted all these ideas into a framework that we open-sourced last week. We call it Asya, short for asynchronous actors. The key idea of the architecture is that there is no single pipeline. There is no
Artem Yushkovskiy [00:08:35]: notion of a pipeline; there is just a message. The message contains the route, which is just a list of steps, a list of actors this message has to visit; a pointer, which is the current step it is at; and a payload, which contains whatever information each actor needs. And the actors do not think in request-response, they think more in an
Artem Yushkovskiy [00:09:00]: enrichment pattern: you receive a big chunk of payload data, you don't delete anything, you just add some more information to the payload and send it further, so that some other actor downstream might use it or might not. At the end we have a very, very big payload.
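To make that concrete, a message with a route, a pointer, and an ever-growing payload could look roughly like the sketch below. The field names are hypothetical (Asya's exact schema lives in its docs); the point is that the pipeline definition travels with the data.

```python
# Illustrative only: field names are hypothetical, not Asya's exact schema.
message = {
    "route": ["ingest", "build_prompt", "router", "enhance_image", "persist"],  # actors to visit
    "pointer": 2,        # index of the step currently being processed
    "payload": {         # enrichment pattern: actors only add keys, never remove them
        "menu_item_id": "12345",
        "image_uri": "s3://bucket/raw/12345.jpg",    # heavy artifacts passed by reference
        "quality_score": 0.31,                       # added by an earlier analysis actor
        "prompt": "a sharp, well-lit photo of ...",  # added by the prompt-construction actor
    },
}
```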
Artem Yushkovskiy [00:09:19]: The message gets into the first queue and reaches the first actor. That actor is relatively fast, so we scale it at a medium pace. Then,
Artem Yushkovskiy [00:09:30]: once it's done, it sends the message to the next actor. That actor is extremely fast, so it doesn't need to scale as aggressively. Then we send it to the next one. This one is a more powerful actor, a router: it is able to rewrite the route. We allow it to add more steps from the current position onward, into the future, so this router can basically be an LLM judge or some smart routing logic that decides which path this package needs to take.
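A rough illustration of that route-rewriting idea (purely illustrative, not Asya's API): the routing actor inspects the payload and appends extra steps after the current pointer.

```python
def route(message: dict) -> dict:
    """Hypothetical routing actor: decides which path the message takes next
    by appending steps to the future part of the route. It never touches
    steps that were already visited."""
    payload = message["payload"]
    # e.g. a quality score produced earlier in the pipeline (or by an LLM judge)
    if payload.get("quality_score", 1.0) < 0.5:
        extra_steps = ["deblur", "upscale", "quality_check"]   # needs heavy enhancement
    else:
        extra_steps = ["quality_check"]                        # a light touch is enough
    cursor = message["pointer"] + 1
    message["route"] = message["route"][:cursor] + extra_steps + message["route"][cursor:]
    return message
```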
Artem Yushkovskiy [00:10:07]: This path or that one; it can even create loops if needed. And the framework protects you from any kind of error, because in case of an error the message is automatically sent to the error-end handler, which persists the message or runs the retry logic; if everything was fine, we send it to the happy end, which also persists it or runs some logic. On top of that we have implemented an easy synchronous gateway, so that you as the developer can build your complicated pipeline and just call an HTTP
Artem Yushkovskiy [00:10:53]: server to produce a result through these routes and return it back to you over HTTP. So basically it's just a synchronous gateway that allows easy integration with any client, and we made it MCP-compliant.
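On the client side that can be as simple as one blocking HTTP call: submit a message, wait for the happy end or a timeout, get the enriched result back. The endpoint and field names below are made up for illustration; the real gateway API is described in the Asya docs.

```python
import requests

# Hypothetical call to the synchronous gateway: it injects the message into the
# async pipeline, waits for the happy end (or a timeout), and returns the result.
resp = requests.post(
    "http://asya-gateway.internal/run",          # placeholder URL
    json={
        "route": ["ingest", "enhance_image", "persist"],
        "payload": {"image_uri": "s3://bucket/raw/12345.jpg"},
    },
    timeout=120,                                  # end-to-end budget in seconds
)
resp.raise_for_status()
print(resp.json())                                # final, fully enriched payload
```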
Artem Yushkovskiy [00:11:19]: So you as a data scientist, developer, or any other integration point can just say: hey, improve this set of images for me, or otherwise use the system on my behalf.
Artem Yushkovskiy [00:11:35]: So this is, in a nutshell, the architecture of the Asya framework. It is open source, and the documentation
Artem Yushkovskiy [00:11:45]: is available here, so take a look at how it is implemented. What the data scientists see is only the Python code, which does not need any pip installs, doesn't have any
Artem Yushkovskiy [00:12:01]: framework library. It just takes a dict and returns a dict. It can return an error, it can return multiple dicts to do fan-out, but the functions are completely decoupled from any kind of infrastructure or pipeline structure.
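In other words, what a data scientist writes is just plain functions like the sketch below. How Asya discovers and wires them (decorators, entry points, and so on) is defined by the framework itself, so treat everything beyond "dict in, dict(s) out" as hypothetical.

```python
def build_prompt(payload: dict) -> dict:
    """Pure Python, no framework imports: dict in, dict out (enrichment only)."""
    prompt = f"A sharp, appetizing photo of {payload['dish_name']}, studio lighting"
    return {**payload, "prompt": prompt}

def propose_variants(payload: dict) -> list[dict]:
    """Returning multiple dicts fans the message out to downstream actors."""
    return [{**payload, "background": b} for b in ("white", "wooden_table", "dark")]
```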
Artem Yushkovskiy [00:12:15]: The platform team sees it as a custom resource, AsyncActor, which is very simple: the KEDA autoscaling configuration and then the workload as is.
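The shape of that custom resource, sketched here as a Python dict to stay in one language (in practice it would be a YAML manifest, and the field names below are guesses rather than Asya's actual schema), is essentially an autoscaling block handed to KEDA plus the workload spec as is.

```python
# Hypothetical AsyncActor custom resource, expressed as a Python dict for illustration.
async_actor = {
    "apiVersion": "asya.example/v1alpha1",        # placeholder group/version
    "kind": "AsyncActor",
    "metadata": {"name": "enhance-image"},
    "spec": {
        "scaling": {                               # handed to KEDA: queue-length driven
            "minReplicaCount": 0,                  # scale to zero when the queue is empty
            "maxReplicaCount": 50,
            "queueLength": 5,                      # target messages per replica
        },
        "workload": {                              # the pod template "as is";
            "image": "registry.example/enhance:latest",  # the sidecar is injected by the framework
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        },
    },
}
```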
Artem Yushkovskiy [00:12:40]: Then the framework injects the sidecar and whatever else is needed for this thing to work. You can call it, you have a status, and so you're able to
Artem Yushkovskiy [00:12:54]: operate on the system
Artem Yushkovskiy [00:12:57]: from two different perspectives: as the user and as the platform. That's why it scales so well. As for open-source alternatives, I don't think there are many. It's definitely not any kind of pipeline tool, Metaflow or whatever-flow, because the main point of my talk today is that pipelines don't scale.
Artem Yushkovskiy [00:13:25]: It could be Ray Serve with some kind of work queue, but it seems overkill to have a central
Artem Yushkovskiy [00:13:38]: orchestrator on top, like Celery on top of Ray Serve, which already scales itself. There might be some other alternatives with a more mature stack, but we've noticed that
Artem Yushkovskiy [00:13:56]: the world is moving more toward async actors. One project is Dapr, which is also an orchestration framework, more general-purpose rather than AI-specific, and it also supports actors. One of the fresh frameworks is Google's Python library built on similar ideas, but running on the same machine, based on asyncio. The release of that repository was one of the signals for us that we have developed something that would be valuable for other teams and that we'd like to share. And there are some other frameworks which we see as complementary to the Asya framework, for example Kaito, the
Artem Yushkovskiy [00:14:54]: framework for
Artem Yushkovskiy [00:14:56]: easily running LLMs in your cluster. So that's it, thank you very much. This QR code leads to the GitHub repository, where you can check out the documentation, leave your stars, and become early adopters, early users, or early contributors. This one goes to my LinkedIn. Please reach out to me if you have any more questions. Thank you.
Adam Becker [00:15:23]: Awesome. Thank you very much, Artem. This is cool. I have a couple of questions for you, and we have some more questions from the chat, so we'll dive right into it. First of all, Rajesh is asking: does the Asya framework only improve performance for batch loads? How do we handle real time?
Artem Yushkovskiy [00:15:47]: Yeah, so the framework shines in near real time, more on the seconds scale rather than
Artem Yushkovskiy [00:16:01]: batch, which is like days, where you don't really care about latency. You do care about latency, and the framework does have timeouts, but we're not talking about the milliseconds of latency that other REST API frameworks talk about, simply because you need to scale workloads up and down to zero, and we're talking about tens of gigabytes of images and models. It takes time.
Adam Becker [00:16:33]: Yeah, but you're not picturing somebody necessarily just doing it in one go at the end of the day. Obviously I imagine it could be scaled up in that manner, but the idea here is near real
Artem Yushkovskiy [00:16:48]: Time section is near real time. I was talking about batch at the beginning only as the. To show the logic of how data scientists science projects think about. They always start in batch and they slowly migrate to more near real time. Now we are. My project is operating in real time and near real time. Yeah, I say we're talking about minutes, seconds to minutes timeouts.
Adam Becker [00:17:14]: Artem, we got another one here from Dmitry, who's asking: is this an assembly line, in a way? When you showed the different routes, it seemed like he might be referring to that. In the beginning they seem to be pre-configured, these are going to be the routes, and then the router can continue to extend more paths. But do they ever go through multiple queues at the same time, in parallel?
Artem Yushkovskiy [00:17:46]: Regarding the sequential steps, this is up to you, the developer, basically the data scientist using this framework. We took the pipeline definition out of the code and put it into the data, into the messages. So the message that is passing through the actors can be rerouted and
Artem Yushkovskiy [00:18:16]: change its route, so yes, in a way this is an assembly line. About parallel execution: it is very easy to do fan-out, and very hard to do fan-in. You need a stateful step that waits, up to some timeout, for at least, say, three out of five messages to arrive, and so on. This is a feature where we see very big value, at least for our team, and it's one of the first priorities for implementation, but it's not there yet.
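For the curious, the "wait for k of n, up to a timeout" behavior he describes could be sketched roughly as below. This is not in the framework yet; it is only an illustration of what such a stateful fan-in step would have to do, with a placeholder queue object.

```python
import time

def fan_in(queue, expected: int, quorum: int, timeout_s: float) -> list[dict]:
    """Wait up to timeout_s for at least `quorum` of the `expected` fanned-out
    messages to arrive on `queue` (a placeholder with a short-polling receive()
    that returns None when empty), then return whatever has been collected."""
    collected, deadline = [], time.monotonic() + timeout_s
    while len(collected) < expected and time.monotonic() < deadline:
        msg = queue.receive()
        if msg is not None:
            collected.append(msg)
    if len(collected) < quorum:
        raise TimeoutError(f"only {len(collected)}/{quorum} messages arrived in time")
    return collected
```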
Adam Becker [00:18:49]: Yeah. How do you handle, let's say, the start times? So Kubernetes, right: you're scaling up these AI actors, and these GPUs take a minute to warm up. How do you handle that?
Artem Yushkovskiy [00:19:04]: Yeah, so there are
Artem Yushkovskiy [00:19:07]: two ways. There are some optimizations on the cloud side, like pre-built
Artem Yushkovskiy [00:19:14]: virtual machine images, or putting the model files into a PVC (persistent volume) and mounting them into the machines. There are different optimizations, and I'm sure the community is solving and will solve this for Kubernetes; that's why the Asya framework does not tackle it directly for now. But if you need guaranteed latency for some stable traffic
Artem Yushkovskiy [00:19:42]: without big spikes, you need some over-provisioning; this is inevitable. But you can still limit this over-provisioning by scaling not to zero replicas but to one, and then you still benefit from autoscaling.
Adam Becker [00:19:57]: We got another one here from Miguel: each actor seems to add context to the payload, right? Does that ever get too big?
Artem Yushkovskiy [00:20:07]: Good question. Yeah, we have faced payloads that become quite large after 25-plus steps. We've never hit the limitations of the message queue, but a huge payload is just not easy to work with, so you're advised to minimize it; it's not a hard requirement, though. As for images or videos or heavy artifacts, we recommend in the documentation to pass those through PVCs or S3 or any kind of blob store, because you don't want to pass megabytes in the message.
Adam Becker [00:20:47]: Right. You want them to mostly just refer to data that lives elsewhere.
Artem Yushkovskiy [00:20:51]: Right.
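This is essentially the claim-check pattern: upload the heavy bytes once, then pass only a reference in the message. A minimal sketch with boto3, where the bucket and key are made up:

```python
import boto3

s3 = boto3.client("s3")

def store_image(payload: dict, image_bytes: bytes) -> dict:
    """Upload the heavy artifact to blob storage and put only its URI in the payload."""
    key = f"enhanced/{payload['menu_item_id']}.jpg"
    s3.put_object(Bucket="menu-images", Key=key, Body=image_bytes)   # hypothetical bucket
    return {**payload, "enhanced_image_uri": f"s3://menu-images/{key}"}
```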
Adam Becker [00:20:53]: What's the scale you've achieved so far, that you guys have tried to run this with? This is a question from Srinivas.
Artem Yushkovskiy [00:21:01]: So far my current project scales from zero to hundreds of GPUs, and we're not talking about the biggest GPUs yet; we're talking about medium-sized GPUs, A10Gs, to host Stable Diffusion models or SLMs, as they're called. But we are eager to test it on heavier machines and at a bigger scale, so we need more use cases.
Adam Becker [00:21:28]: Yeah. Have you seen any use cases that demand a relatively complex request and a specific SLA? Like it says: okay, do this, but I need it within five minutes; if it takes more than five minutes, just forget about it.
Artem Yushkovskiy [00:21:48]: Five minutes is what we're doing now. We're trying to squeeze it into one minute, and for that we need fan-out and fan-in, so I think it will come soon. But yeah, minutes are not really a problem; my point is that it's not milliseconds, it's seconds. One note: in a pipeline of, let's say, 50 different steps, there will be five or ten steps that take seconds, but everything else is just pure backend logic, a database call, a transformation, whatever. That is all milliseconds, and the only reliable and easy way to join it all in a single pipeline,
Artem Yushkovskiy [00:22:28]: we say, is actors.
Adam Becker [00:22:31]: Each actor lives in what? What is the compute environment for each actor? Is it a pod? What is it?
Artem Yushkovskiy [00:22:38]: Yeah, for now the framework uses a Deployment by default, which is then scaled by the KEDA autoscaler, and also StatefulSets, but we haven't tested those on a real use case yet. We are looking forward to integrating with other projects like Kaito, which run their own deployments: we want to inject our sidecar into already existing deployments, to bring those deployments' REST APIs into the Asya framework so that it's easier to use.
Adam Becker [00:23:13]: I see, yeah. So as a sidecar deployed next to it. How do you think about applications integrating with it? A lot of applications might not be thinking about an async kind of process; they're still living in a very real-time world and imagining that at some point they're going to scale and then deal with the challenges when they arise. Now it arises. How does it work with you, given the async nature?
Artem Yushkovskiy [00:23:39]: Yeah, it's a very good question. We face this problem that there are two worlds, sync and async. Async is easier to manage, sync is easier to integrate with. That's why Asya comes with the stateful HTTP gateway, which holds the state of each message passing through the framework and can reliably be the integration point for users.
Artem Yushkovskiy [00:24:06]: Cool.
Adam Becker [00:24:07]: We have another one from Miguel: how do you prevent the router from entering a closed loop by mistake? Is there info in the payload that says it already passed through the router many times?
Artem Yushkovskiy [00:24:18]: Yeah. So at runtime,
Artem Yushkovskiy [00:24:22]: we explicitly forbid rewriting the past history, only the future. You can abort the route, you can add more steps in the future, but you cannot move the pointer back or rewrite the past, so in theory you're only limited by the message size. But it is a good point to add some loop prevention, or at least a loop warning, in the framework itself; for now it has not been a priority.
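A sketch of that rule, purely illustrative rather than Asya's code: a route update is accepted only if everything up to and including the current pointer is untouched.

```python
def apply_route_update(message: dict, new_route: list[str]) -> dict:
    """Allow an actor to rewrite only the future part of the route: everything up to
    and including the current pointer must stay exactly as it was."""
    cut = message["pointer"] + 1
    if new_route[:cut] != message["route"][:cut]:
        raise ValueError("rewriting visited steps (the past) is forbidden")
    message["route"] = new_route
    return message
```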
Adam Becker [00:24:52]: Yeah, well, I clicked the link and it's pretty cool. You know what I think would be fun? I don't know if you guys are open to it, but walking through the project, the actual code, I think would be very cool. I think that could be a fun session to have.
Artem Yushkovskiy [00:25:15]: Sure.
Adam Becker [00:25:17]: I mean, I look at it now, and you're saying it's going live soon.
Adam Becker [00:25:25]: This is very fresh.
Adam Becker [00:25:28]: All the files are there, the latest commits are from like an hour ago. So very active.
Artem Yushkovskiy [00:25:32]: It's very fresh and active. We are still migrating. We had implemented this kind of system inside our project, and now we are extracting it into a separate project and migrating onto it ourselves.
Adam Becker [00:25:47]: Yeah. What's been the most difficult part so far about building it?
Artem Yushkovskiy [00:25:52]: Oh, good question.
Artem Yushkovskiy [00:25:56]: I don't know, maybe talking to people and convincing them that this is the way to go. Whenever you talk to backend engineers, they say: yeah, that's obvious, you don't serve pipelines, you serve stateless microservices. Whenever it comes to data scientists, they say: no, no, please, no Kubernetes, nothing; we want something simple. So convincing people that this is the way to go, at least for some use cases: big companies with separate teams, not a POC, with large clusters and large use cases.
Artem Yushkovskiy [00:26:32]: I guess this is it.
Adam Becker [00:26:33]: So you're saying it's the adoption and the integration into existing workflows, more so than...
Artem Yushkovskiy [00:26:40]: It's like a mindset shift for the data science use cases, I would say, yeah.
Adam Becker [00:26:47]: And would you say the ideal person to be integrating it would therefore not be a data scientist, but a backend engineer?
Artem Yushkovskiy [00:26:56]: Two people: one who manages Kubernetes and sets it up, CI/CD and everything, and another who is fine not using any Python CLI, just writing a pure Python function.
Adam Becker [00:27:09]: Yeah, but would you say that, in that sense,
Adam Becker [00:27:14]: it should be much easier for them at some point?
Artem Yushkovskiy [00:27:18]: That's my bet. It will be much easier, it will scale much better, it's much more testable, it's much easier to understand. You just need to make this mental shift.
Adam Becker [00:27:31]: Yeah. What about the engineering of it, all these systems with the queues and the scaling? What aspect of that would you say has been maybe surprisingly difficult?
Artem Yushkovskiy [00:27:43]: Yeah, I would say maybe Kubernetes custom resource ownership. Whenever I create the custom resource,
Artem Yushkovskiy [00:27:52]: it automatically creates the deployment, the queue, whatever resources. But then this deployment suddenly starts to be managed by KEDA, which wants to scale it up or down, or you want to integrate with some other resource, as I mentioned before, for example the Kaito framework, which runs and owns its own deployments, and you want to integrate with them. So I would say ownership in Kubernetes is the hardest problem. You need to state clearly what you own, what you don't own, and what you're changing. And one of the latest PRs is actually fixing a bug like this.
Adam Becker [00:28:32]: Miguel, thanks for dropping the link to the GitHub. Another question here from Pavel: why isn't Asya ideal for training ML jobs? Can you explain quickly why it is
Artem Yushkovskiy [00:28:47]: Not ideal for training?
Adam Becker [00:28:49]: That's how I'm reading it. I think so.
Artem Yushkovskiy [00:28:53]: I'm not sure of the answer. I haven't tried it, and I haven't thought it through much.
Artem Yushkovskiy [00:29:01]: Maybe because there are better tools. If you're talking about training on demand, you have Metaflow or whatever pipelines, which are great for that. If you're talking about distributed training across GPUs, I don't think Asya wants to go there, because that is a different kind of challenge. So probably
Artem Yushkovskiy [00:29:24]: we need to try. Yeah.
Adam Becker [00:29:27]: How could the community help you grow this? What do you think would be most
Artem Yushkovskiy [00:29:31]: useful, ideas or projects? For now,
Artem Yushkovskiy [00:29:36]: we need adopters, we need feedback, like
Adam Becker [00:29:39]: Actual use cases and you want people.
Artem Yushkovskiy [00:29:41]: to take it for a spin. We do have some use cases in the queue at my company that we are going to do, but it would be great to have them from the community too, to have a fairer comparison.
Adam Becker [00:29:57]: Yeah. Just to have visibility into how it actually integrates and what problems it's solving; I could see that always being very difficult, so I laud your efforts. One more question: to what extent do people need to know Kubernetes? What if they're not on Kubernetes? Is there still a way to interface, such that they just give you access to, whatever, AWS, and you do the full deployment on your own?
Artem Yushkovskiy [00:30:25]: Right, so the answer is diametrically different for the two different users. For data scientists: don't worry, you don't have to know it, if the platform team has set it up properly. For the platform team: don't worry, you will use Kubernetes, because from my
Artem Yushkovskiy [00:30:44]: feeling, and I love Kubernetes as a platform, there is a very rich ecosystem of tools and it is very clear, transparent and stable. So those who want to use it will use it. We just separate these two groups.
Adam Becker [00:31:02]: Got it. So if that's the case, for startups that might not yet have the full platform team to actually operationalize these things, should they nevertheless talk to you, because you think the actual deployment of Kubernetes is not as heavy a lift as they imagine?
Artem Yushkovskiy [00:31:18]: At least at this stage, it is probably not a great starting point for startups. As you remember from the graphs I showed before, in the initial stage the cost will be rather similar either way: self-hosted AI with all the operational overhead, or using the APIs. It really matters when you scale, in terms of the amount of work you do and the number of different processors or actors you run. So yeah, we're talking more about bigger teams and bigger use cases.
Adam Becker [00:31:56]: On that note, do you mind sharing the slides again?
Adam Becker [00:32:01]: I want to go to one of the first ones that was there. Give me one second.
Adam Becker [00:32:07]: Yeah.
Adam Becker [00:32:10]: Yeah, you had the cost curves where you had the "30 times cheaper". Yes, this one.
Adam Becker [00:32:21]: So is this something you've actually seen, or is this how you're estimating things? Where is this from?
Artem Yushkovskiy [00:32:29]: I would say it's a hallucination based on real experience. The scale is roughly right. At some point we just realized that if we kept running on APIs we would go broke, we'd use up our whole budget, so we had to migrate to self-hosting. The numbers here are completely imaginary, generated by ChatGPT I think, but the scale is more or less right.
Adam Becker [00:32:56]: We should be thinking about that scale, right?
Adam Becker [00:33:02]: Very interesting. Awesome. Artem, thank you very much for coming and joining. Stick around in the chat in case folks have more questions, and I think we put the link to the GitHub in case people would like to explore this in more detail. Artem, thank you very much, it was a pleasure having you, and best of luck with the next steps.
Artem Yushkovskiy [00:33:22]: Thank you. Thank you.

