Building AI that Doesn’t Break
SPEAKERS

Elliot is a passionate maintainer of Hera, the Python SDK for Argo Workflows. At Pipekit, he is helping to bring scalable data pipelines to the Python world, unlocking the full potential of Argo Workflows for data scientists. Previously, at Bloomberg, Elliot supported Machine Learning engineers to accelerate their model retraining with Argo Workflows through Hera, simplifying the authoring of complex workflows.

Qian Li is a co-founder of DBOS, Inc. She completed her Ph.D. in Computer Science at Stanford University in 2023, where her research focused on abstractions for efficient and reliable cloud computing. Qian also holds an M.S. in Computer Science from Stanford (2019) and a B.Sc. in Computer Science and Technology from Peking University (2017).

Co-founder & CTO of Rasa, doing chatbots since before they were good.

SUMMARY
No YAML? No Problem: Orchestrate Kubernetes Workflows the Easy Way with Python
Sick of writing orchestration logic in YAML? You’re not alone. Discover how Hera, the Python SDK for Argo Workflows, lets you express complex Kubernetes workflows using clean, testable Python code. Keep your business logic and orchestration logic in one place — no indentation nightmares required.
Building Reliable AI Applications with Durable Workflows
Chaining functions together is easy. Keeping AI workflows running when things go sideways? That’s the hard part. This talk introduces durable workflows — systems that checkpoint state, recover automatically, and gracefully handle everything from human delays to API flakiness. You’ll see real examples of AI pipelines that stay resilient in production.
Process Calling: Agentic Tools Need State
Function calling gave LLMs a way to "do" things — but it’s not enough. When you’re building agents for customer-facing use cases, stateless abstractions fall short fast. Learn why the future of agentic tooling is process-based, not function-based, and what it means to build agents that remember, recover, and reliably finish what they start.
TRANSCRIPT
Ben Epstein [00:00:27]: Okay. All right, we are live. Thank you everyone for taking the time and joining today. And if anybody was in New York and was at the RAMP Modal Prefect event last night, that was also a really fun event, but this will be a little bit different. Same kind of theme. Robust, durable systems, running pipelines at scale, all of those great things. We have three speakers today. I'm really actually excited about all of them.
Ben Epstein [00:00:54]: Should be a really good talk. We have Elliot from Pipekit, we have Qian from DBOS, we have Alan from Rasa, all really on a pretty consistent theme about running things durably, reliably, orchestrating these things in the cloud, getting them off of your laptops. And less about kind of the fanciness that agents can showcase, like fancy demos, more about how to actually get these agents running in ways that when they fail, they start up again, when they crash, they can come back. You can actually get these pipelines running in reliable ways and start building building blocks in your code base. Really appreciate all three of you coming on. We have three speakers today and usually when we have three speakers we are a little bit crunched on time. So I'm going to just let us jump into the first one. We'll start with Elliot from Pipekit.
Ben Epstein [00:01:43]: He's going to show us a better way to think about YAML and Kubernetes. And after that we'll jump over to Qian and then Alan. But whenever. One second, Elliot, whenever you're ready, I'll let you jump right into it. Your screen should be available.
Elliot Gunton [00:01:59]: Yep. Okay. Hello everyone. Welcome to my talk. No YAML, no problem. Orchestrate Kubernetes workflows the easy way with Python. So this is going to be an intro to Argo Workflows and Hera in 15 minutes or less. First, who am I? I'm Elliot, a senior software engineer at Pipekit and maintainer of the Hera open source project that I'll be talking about today.
Elliot Gunton [00:02:20]: And who are Pipekit? So, we provide enterprise support for companies adopting and scaling Argo Workflows. We also have a control plane product to expand Argo to multi-cluster scenarios. You'll get direct support from active Argo Workflows maintainers, saving you engineering time and compute costs. And here's the outline of today's talk. So first we'll give a brief overview of Kubernetes and Argo Workflows before diving into Hera, its features and examples. And finally, the key takeaways of the talk. So I'm sure everyone has at least heard of Kubernetes. It's the cloud native standard and allows users to scale up easily.
Elliot Gunton [00:02:53]: It's container native with an active community. You've got long term support from big names, but overall the project is vendor neutral. But what you might not have heard of is Argo Workflows, and spot the difference with this slide. Argo Workflows is built on Kubernetes, so you'll see all the same benefits. It's become one of, if not the most popular workflow orchestrator, and so it's effectively the de facto standard. And the rest of these bullet points are the same from the previous slide. And if you want a mental image of Argo, it looks like this in our web UI. So let's take a look at the anatomy of a workflow.
Elliot Gunton [00:03:25]: Like all things Kubernetes, workflows are custom resource definitions. So in this case they have a spec. This contains a list of templates and you can think of these templates as functions in a library. You can arrange the templates in a DAG. So that's what we have here, calling this Echo template. And there's also a sequential steps template. But the entry point here acts as the main function for your workflow. So what problems do developers face when using Argo? If you recall the name of the talk, we don't like YAML.
Elliot Gunton [00:03:59]: It can be a barrier to entry, it's hard to test workflows programmatically, it's hard to reuse raw YAML. And for Argo itself, YAML just makes it hard to maintain long workflow files. You can imagine this fine-tune-an-LLM workflow containing hundreds, if not thousands, of lines of YAML. And what if there were an easier way? Here we introduce Hera. It's the Python SDK for Argo Workflows which lets you write templates as Python functions, as well as the workflows themselves. And you can interact with the Argo Workflows API entirely through Python. So we get the best of both worlds. Python comes with a better developer experience, code completion, testing frameworks and more.
Elliot Gunton [00:04:37]: And we're using Argo, the best in class Kubernetes workflow orchestrator. We get a seamless developer experience from this, where Hera gives you the all-in-one solution for writing business logic, workflow orchestration logic and your submission code, all from Python. Plus it comes with extensive documentation that was recently refreshed following a developer survey we ran earlier this year. But that's enough talking. Let's see some code. Let's see how we can recreate this basic DAG example written in YAML and rewrite it in Hera, going line by line. We start with the custom classes that Hera provides, giving you better code completion and type hints. And next you'll see the context manager pattern for workflows and DAGs.
Elliot Gunton [00:05:20]: This mirrors the YAML syntax, if we remember what we were just looking at. So it helps developers actually transition to Hera. They're looking at the same patterns on both sides. Then in the context manager we see this container, it gets automatically added to the workflow. And then down here we see the tasks get automatically added to the DAG. The DAG itself is also a template, so that gets added to the workflow. And to reduce boilerplate we have a sprinkling of syntactic sugar. So this is a pseudo call of the function where you pass the arguments of the task.
Elliot Gunton [00:05:53]: So see the difference with the previous slide. Here we have the task object where we're passing the template, well, the echo template we just created, the container, to the template variable. And here we just reduce the boilerplate. And finally you see we can set up the dependencies of the tasks using this bit shift operator, which you might recognize from Airflow. Going back to the original idea, recreating the DAG diamond in Hera, which do you prefer? What you might not realize is the Python code on the left actually gets translated into the YAML on the right. This is because Argo is on Kubernetes, so it only understands the CRDs written in YAML, unlike something like Airflow, which is running the Python code that is there on the Airflow server. So container native is cool and all, but what about Python native? In Hera, functions are templates. All you need is this script decorator to containerize your function.
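For reference, the container-based DAG diamond he's describing looks roughly like this in Hera. This is a sketch adapted from Hera's public examples; exact class and argument names may differ slightly between versions.

```python
from hera.workflows import DAG, Container, Parameter, Workflow

with Workflow(generate_name="dag-diamond-", entrypoint="diamond") as w:
    # A container template, the "echo" function in our library of templates.
    echo = Container(
        name="echo",
        image="alpine:3.7",
        command=["echo", "{{inputs.parameters.message}}"],
        inputs=[Parameter(name="message")],
    )
    with DAG(name="diamond"):
        # "Pseudo-calling" the template creates a Task and adds it to the DAG.
        A = echo(name="A", arguments={"message": "A"})
        B = echo(name="B", arguments={"message": "B"})
        C = echo(name="C", arguments={"message": "C"})
        D = echo(name="D", arguments={"message": "D"})
        # The bit shift operators set up the dependencies: A -> (B, C) -> D.
        A >> [B, C] >> D

print(w.to_yaml())  # the Python model compiles down to the Argo YAML CRD
```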
Elliot Gunton [00:06:49]: Then you create a DAG in the same way as for containers that we just saw. And once you've got your workflow together, you can run it on the Kubernetes cluster through a simple workflow service and a function call, and we can actually use the object returned from the create function. Here we're just printing a link to the console so we can click through. So this is an example that can run on local Docker Desktop. So that's why it's a localhost. And on the actual UI you can see the live progress of the DAG. So these three, A, B, C, are finished and D is pending to run. The whole DAG is in progress.
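Continuing the sketch above, the submission code he mentions might look something like this; the host is for a local Docker Desktop setup, and the attributes used to build the link are assumptions that may differ by Hera version.

```python
from hera.workflows import WorkflowsService

w.workflows_service = WorkflowsService(host="https://localhost:2746")
created = w.create()  # submit the workflow and get the created object back
print(f"https://localhost:2746/workflows/{created.namespace}/{created.name}")
```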
Elliot Gunton [00:07:27]: And then we see when it's finished, everything has run. On the right we see a sidebar of details for the selected node of the graph, and you can also see logs and more details from the other two tabs. So we've seen the basic DAG diamond and how to recreate it in Hera. Now let me show you one of the key features of Hera, the Hera script runner. But first let's take another look at the function we just used in the DAG diamond. There are no types. Hera adds some json.loads boilerplate, but makes no guarantees on the type of the parameter. So this could be a string, it could be a list, could be a dictionary.
Elliot Gunton [00:08:04]: As long as it's valid JSON in the input string, it will get loaded and printed. So this is an inline script template where the function body is simply dumped into the YAML. So don't forget Hera is a translation layer to the Kubernetes CRDs that Argo understands. You'll see any types that you add here, they have nowhere to go in the dumped YAML. This source field acts as a single Python file, so itself is not a function. So you can't have a return statement, you can't have input types. So how does the script runner solve this? It's a module contained in the Hera library itself which runs your function. It handles all the input parameter parsing and validation.
Elliot Gunton [00:08:43]: And all you have to do is build a Docker image which is accessible from Argo. And how do we set it up? We just need to add two values to the script decorator, the constructor and the image. You'll then see the function compiles to a script template, like we see on the right. The command is python. It's being passed the Hera workflows runner module, and that takes an argument for the entry point of your package's function name. So what do we get from this Hera Runner? The main selling point of this feature is that your inputs are validated against the types that you provide. So the Hera Runner is actually deserializing and checking that this is a float and this is a float, and you'll also be able to return values from the function. So that means you can test script templates like any old Python function.
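A minimal sketch of that setup, assuming an image that has your package (and Hera) installed is accessible to the cluster; the image name is hypothetical and the decorator arguments may vary across Hera versions.

```python
from hera.workflows import script

@script(constructor="runner", image="my-registry/my-package:v1")
def calculate_total(price: float, quantity: float) -> float:
    # The Hera Runner validates and deserializes the inputs as floats,
    # and the returned value becomes the step's output.
    return price * quantity

# Because it's a plain Python function, it can be unit tested like any other:
assert calculate_total(2.5, 4) == 10.0
```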
Elliot Gunton [00:09:29]: This addresses one of the biggest problems of using plain YAML workflows, as you would otherwise have to test the template directly on the cluster and use your eyes to check what is going on. So let's take this up a notch. If you've been using Python for some time, you'll have heard of Pydantic. Pydantic is a data validation library, which is really good for APIs where you want to translate from JSON to Python objects and back again and make sure they have the correct structure and correct types. So Hera already actually integrates with Pydantic for general type validation, like the floats that we just saw. But you can also use Pydantic base model classes for your template inputs and outputs. This gives you better Python type hints and auto completion during template development and at runtime. The Hera Runner uses Pydantic to automatically deserialize JSON inputs into type-safe Python objects and serialize Python objects into the JSON outputs.
Elliot Gunton [00:10:18]: So let's take a look at how we can use a Pydantic base model class in this basic example. So we're going to create a Rectangle class to replace these two input values. So first we take the base model and create a subclass from it, and we're going to give it its own function. So we're actually doing something inside this class, and then we use that class as an input to the function, and that means you can test it like normal. In the world of Argo, this is almost groundbreaking. In the world of Python it might not look like much. And so what if you need binary data inputs or outputs? Well, you can do that too by passing user-defined functions in the type annotations. So in this example we're annotating the return type as a pandas DataFrame.
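A rough sketch of that Rectangle idea, using a plain Pydantic model as the template input; the decorator arguments and image are the same assumptions as in the earlier sketch.

```python
from hera.workflows import script
from pydantic import BaseModel

class Rectangle(BaseModel):
    length: float
    width: float

    def area(self) -> float:
        return self.length * self.width

@script(constructor="runner", image="my-registry/my-package:v1")
def calculate_area(rectangle: Rectangle) -> float:
    # At runtime the Hera Runner deserializes the JSON input into a Rectangle;
    # locally you can just call the function and test it like normal.
    return rectangle.area()

assert calculate_area(Rectangle(length=2.0, width=3.0)) == 6.0
```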
Elliot Gunton [00:11:07]: And by using the annotated type we provide metadata in this section of the artifact; it sees this dump function and turns the output of the function, a plain DataFrame, into a binary Parquet file. So a quick fire round of more things Hera can do for you. We saw earlier how you can interact with the live workflow object on the cluster, which opens the door for checking the execution path through the DAG and checking outputs of individual steps. Hera has a built-in mechanism for common setups and config such as default resource requests, image pull policies and more. So this can help a platform team in your organization create a wrapper package around Hera which suits your organization's needs. And finally, if you want to version workflows, you can piggyback off Python versioning, which makes it easier to know what's running on your cluster. For the key takeaways of today's talk: Hera helps you avoid the headache of YAML and makes workflow orchestration on Kubernetes accessible. If you're using Python, then Hera can supercharge the developer experience of Argo Workflows. And finally, linking back to the title of the summit.
Elliot Gunton [00:12:14]: If you want to train AI that doesn't break, you'll need data pipelines that don't break. The TL;DR of all this: Argo Workflows and Hera can give you the best cloud native workflow orchestration experience. If you'd like to connect with us on the Hera project, you'll find us on GitHub under argoproj-labs/hera. So please come and suggest features, report bugs and contribute code. We'd love to see it. And if you want to learn more about Hera, the docs are at hera.readthedocs.io. And finally we have a semi-official Hera Slack channel in the CNCF community Slack, but I'm also on the MLOps community Slack if you want to DM or tag me. And finally some free stuff from Pipekit. We have all our previous talks from other conferences available on the Pipekit talk demos repo, and if you're using Argo Workflows already, we have a free workflow metrics product which will help you get started with better observability.
Elliot Gunton [00:13:01]: And that's all from me. Thanks very much.
Ben Epstein [00:13:06]: That was awesome. This is actually really, really cool. I didn't, I guess I didn't know this existed and maybe it didn't exist back when I was using Kubernetes, which was at a company a couple of years back. This was actually the exact thing that we were looking for. We ended up not being able to go with Argo because our system and our customers were so Python native. We ended up using Airflow, which we weren't particularly happy about. This is literally the thing we were looking for. I don't use Kubernetes anymore, at least for my current company.
Ben Epstein [00:13:39]: But that is, that's a really cool thing. That's amazing. How long have you personally been working on this project?
Elliot Gunton [00:13:46]: I think about two and a half years now. So I was at Bloomberg before Pipekit and we had the same problem of Airflow versus Argo, and we found the Hera project was like in its infancy where Flav was maintaining it, and he let us come onto the project and kind of start helping, and we helped rewrite the library for the version 5 major release. So then it was fully one-to-one with the YAML spec, whereas before it was a much more simplified data engineer approach.
Ben Epstein [00:14:15]: How have you seen the adoption of it? Like what kinds of teams have been adopting this? Are they smaller teams or larger enterprises or kind of a mix of the two?
Elliot Gunton [00:14:25]: Definitely a mix of the two. We're seeing loads of big companies using it, ones that are surprising me in terms of hedge funds and other financial institutions, and obviously Bloomberg are also still active on the project. But we've also seen startups that we work with adopting it and just finding it really much less painful than YAML.
Ben Epstein [00:14:43]: That's cool. Yeah. My guess would have been mostly larger enterprises, people who took that shift to move onto Kubernetes and were trying to be ahead of the game, ahead of the trends, and really get onto a scalable system, and then got there and then got stuck in the world of YAML. Like I could imagine a lot of people wanting to quickly build this up. That's awesome. Okay, thank you so much, Elliot. Qian, we're going to jump over to you and we'll save some other questions for the end. Your screen should be ready.
Ben Epstein [00:15:13]: Go ahead whenever you're ready for it.
Qian Li [00:15:17]: Hi everyone, my name is Qian. I'm the co-founder of DBOS. Today I'm going to talk about how to build reliable AI applications with durable workflows. I believe, and this is actually our company's belief, that your database is all you need. Cool. Let's go, let's go. Next slides. Here is an example AI application.
Qian Li [00:15:44]: It's actually a very typical use case we've seen. For example, if you want to build a document extraction agent, you may have a user interface which takes input from the user saying, I want to summarize this paper and I want to answer some questions. Then the AI model will dynamically construct a workflow saying I'll first download some PDFs from the website, download some papers, and then I'll index each page of the paper and then generate vector embeddings and store them in the vector database, and then I'll use them to answer user questions. That sounds really simple. And you can easily build an AI demo or some kind of very simple AI prototype with any of the AI frameworks. But the tricky part is that any step can break: your process may crash when you download, it may run out of memory. When you index a page, you may have a bug where you generate a wrong index. Your database may become temporarily unavailable.
Qian Li [00:16:48]: Your AI model may return a 500 internal server error. Also, your users may not be as reliable as you think: if you need a human verification step, they may take a bit too long to respond. So you have to consider all these challenging scenarios. So you may ask, okay, those are already challenges for software engineering. Why do I have to care? So the problem is that AI-driven workflows make things even more challenging. So the thing we've observed is that AI-driven workflows are extremely dynamic and long running. So you can't typically write a static DAG or statically define a workflow graph for it, because the workflows are often driven by the LLM's decisions. Say the LLM will tell you, I want to call this tool and that tool, dynamically based on your user input.
Qian Li [00:17:42]: And secondly, LLMs or external API calls can be really unreliable. And because those AI agents are increasingly interacting with the external world, like helping you to book tickets or interact with your database or interact with other services, they can have unreliable performance. Sometimes it will take five minutes or even longer to respond. And then many APIs have rate limits, say you can't call the LLM more than 50 times per minute. Then they often have transient failures, and they will sometimes give you a wrong result. Moreover, as I mentioned before, even with AI, those autonomous workflows usually require a human in the loop. So sometimes you need to send an email or send a message to some admin or human or user to verify the workflow, and then the workflow will have to wait for user input before going on. Finally, those workflows are extremely large scale.
Qian Li [00:18:45]: Sometimes if you want to crawl the web, if you want to do some deep research agent, it will have to crawl hundreds of thousands of documents and then process hundreds of steps. So to solve all these issues, we can apply our traditional wisdom to try, try, try again. If something fails, no problem, we can retry it. But it doesn't usually work for AI-driven workflows. First, if you retry, some completed step may run again. So you may get data corruption, or you may have duplication. For example, if your agent needs to send a user email, then if you retry, it will send the email again and users will become very confused. And moreover, you will waste your compute resources.
Qian Li [00:19:32]: And those AI models are not cheap. So if you keep retrying, you will burn a lot of money. And finally retrying could be really slow. So what should we do then? I'm sure all of you play games. So when you play games you always want a save point. So if I take a break, if I go for dinner, when I come back, I don't have to restart my game from the beginning, I can just resume from where it left off. That is exactly the idea of durable workflows. Basically, durable workflows checkpoint your program's execution state so that it can resume from where it left off.
Qian Li [00:20:16]: Here's a concrete example. In DBOS, we built a durable workflow engine as a library. Basically you just write normal functions. Currently we support Python and TypeScript, and we're developing Java and Go. Then you just say, for example, I want to index some document. And then you can just decorate this indexing workflow as a DBOS workflow. And then for each step in the workflow, for example, first download the document and then index the document using LLMs, you can decorate each individual step as a DBOS step.
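A hedged sketch of the pattern she's describing, using the open source DBOS Python library; the retry arguments and the placeholder step bodies are illustrative, and the snippet assumes DBOS has been configured and launched elsewhere in the application.

```python
from dbos import DBOS

@DBOS.step(retries_allowed=True, max_attempts=3)
def download_document(url: str) -> str:
    # Placeholder: in practice, fetch the PDF; transient failures get retried.
    return f"page one from {url}\fpage two from {url}"

@DBOS.step(retries_allowed=True, max_attempts=3)
def index_page(page_text: str) -> list[float]:
    # Placeholder: in practice, call an embedding or LLM API for one page.
    return [float(len(page_text))]

@DBOS.workflow()
def indexing_workflow(url: str) -> list[list[float]]:
    # The workflow's control flow is deterministic; each step's output is
    # checkpointed in Postgres, so a crash resumes from the last finished step.
    document = download_document(url)
    return [index_page(page) for page in document.split("\f")]
```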
Qian Li [00:20:54]: So what's the difference between a workflow and steps? So workflows need to be deterministic. For example, if I crash in the middle, when I restart from the beginning, the workflow will follow the same path. But it doesn't mean that workflows have to be static. Basically you can be dynamic as long as your control flow is deterministic. Then for any non-deterministic logic, or any external calls, LLM calls, you can wrap them in a step. For a step, you can also configure retries with exponential backoff. So the step and workflow are the two basic building blocks for DBOS and for most of the durable execution engines. So how does it work? Basically, when we execute a workflow, we first checkpoint the workflow input in a database, and then after executing each step, we checkpoint the step's output in the database. And say if I'm executing step three and the machine crashed, I have to recover it.
Qian Li [00:22:05]: It's fairly simple if we have a durable workflow. Basically, when we recover the pending workflow, we take a look at what had finished before. Then, because we recorded the workflow input and step one, step two in the database, we can safely skip those completed steps and use the recorded results from the database. And then we resume execution from step three and run it to completion. That's how, with durable workflows, we can resume from where it left off. You may have heard of other workflow engines or durable workflow engines. What is different? Why DBOS is different is that we believe your database is all you need. So DBOS runs in-process, in your application code, in your application process. There are no separate orchestration servers or workers.
Qian Li [00:23:04]: As long as you install the DBOS library, use the library and connect it to a database, for now it's Postgres, then you're good to go. DBOS will automatically checkpoint the annotated functions' inputs and outputs to the database and then automatically recover when your workflows break. And it's really simple to integrate with existing programs. And it's easy to integrate with AI frameworks, so I'll have some concrete examples later. Moreover, besides durable workflows and steps, we have other database-backed features. So it turns out that it's really simple to implement workflow graph tracing, because we store the inputs and outputs of those workflows and steps, so you can extract those data from your database and then construct a visualized graph. Then it also makes workflow control super easy.
Qian Li [00:24:02]: If I want to fork a workflow, say my step three is buggy, I want to deploy new code, but I don't want to repeat step one and step two. Then you can fork the workflow from step three by leveraging the previously recorded results but running the new code to resume the workflow. You can also do workflow cancel and resume easily to recover from failure. We also implemented durable queues to deal with rate limiting and concurrency control when you talk to external APIs. It's also easy to implement durable sleep for long waits. And we provide durable timeouts so that we can actually bound the workflow, for example, to not run beyond one hour. And we also have durable events to durably wait for human input and other external triggers. And we also provide durable cron jobs. When I say it's really easy to integrate with AI frameworks, actually we have a blog post about it.
Qian Li [00:25:07]: We showed how to use DBOS, for example with LangChain, to build a reliable refund agent. So all you need to do here is take the process-refund function; you can construct a DBOS workflow by decorating it as a DBOS workflow, and also you can tell the AI agent to use it as a tool by decorating it as a LangChain tool. All right. And then you may ask, where can we use these durable workflows in applications? So actually it's really flexible. Also, because DBOS is so lightweight, you can use it essentially anywhere.
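A rough sketch of that pattern; the decorator stacking order and the placeholder refund logic are assumptions, not the code from the blog post.

```python
from dbos import DBOS
from langchain_core.tools import tool

@DBOS.step()
def issue_refund(order_id: str) -> str:
    # Placeholder for the real payment/refund API call.
    return f"refund issued for order {order_id}"

@tool
@DBOS.workflow()
def process_refund(order_id: str) -> str:
    """Durably process a refund for the given order."""
    return issue_refund(order_id)
```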
Qian Li [00:25:46]: Here are the three top common use cases we've seen. The first is data pipelines. For example, you want to index PDFs or index a website. You may want to make your data pipeline durable so that when your process crashes or runs out of memory, you don't have to re-index the previously completed steps; you can resume from where it left off. And then secondly, you can also use durable workflows to drive your AI main loops. For example, in the loop you can orchestrate LLM calls and ask the LLM what the next steps should be, and then based on the output you can call those tools durably as durable steps. Finally, you can also implement your tools with durable workflows for MCP servers and agents. So each of the tool calls can also be a sub-workflow, to, say, implement a checkout flow or implement an email notification flow.
Qian Li [00:26:51]: So finally, with the last two minutes, I'm going to walk you through a concrete large scale production AI agent use case. So this is actually a production AI agent running hundreds of thousands of workflows right now on DBOS. So this is an autonomous AI agent that monitors the web. The users can say, I want to monitor the price of a flight ticket, and then it will launch a fleet of AI-driven workflows to periodically crawl a website and finally email users about updates. So it looks like the workflow I've shown in the first slide, but it's actually in production so it's quite complicated. The user says, I want to track flights to Tokyo under $900 in August, and then the LLM will parse that question and then construct a workflow specifically for that user. For example, it will first crawl relevant websites and gather all the relevant information and then store the information in the database, and then trigger the second step, which is to extract that information and understand what's going on, and then store some of the insights in the database and then pass it to the next step.
Qian Li [00:28:10]: For example, to format your information, to figure out the diff between previous crawls, and then finally send the user an email with the update. When the users came to DBOS, they already had some of the workflows built up, but their system was not super reliable. Here are the challenges and how DBOS helped. First, all of their workflows are dynamically generated. They cannot use a static DAG. Also, those workflows are really long running. For example, they may wait for a few days for user input or for some response from certain external APIs. So they use DBOS workflows for overall orchestration, and then they use steps for individual LLM calls or API calls or any non-deterministic or external interactions.
Qian Li [00:29:16]: And then they also use durable sleeps for long waits. For example, if they need to wait for additional user feedback, they will, for example, sleep for one day and then check if the user has responded yet. Then to handle unreliable LLM calls, they use DBOS steps with automatic retries and exponential backoff. And then they also want to make sure that they email users exactly once. Because if you don't email users, people will be very confused. They may think your agent doesn't work. And if you email users more than once, then they may think it's kind of unprofessional. So to solve this, they use DBOS steps with a built-in idempotency key so they can check whether the email has been sent before or not.
Qian Li [00:30:08]: And then they also have many parallel subtasks. For example, when you crawl the websites, you want to crawl them all in parallel, but you also have to be careful about rate limits on some of the websites. So that's why they use DBOS durable queues. And they also use child workflows to parallelize those tasks, and then they control rate limits and concurrency using the queue interface. And finally they also have periodic tasks. For example, if I want to monitor the price changes for tickets, I want to run this workflow every hour. So they use the DBOS durable cron jobs. And then because everything is based on the database, everything is persistent.
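A hedged sketch of those primitives in the DBOS Python library; the queue name, concurrency limit, cron schedule, and crawler bodies are illustrative, not taken from the production system described in the talk.

```python
from datetime import datetime
from dbos import DBOS, Queue

crawl_queue = Queue("crawl_queue", concurrency=10)  # rate/concurrency control

@DBOS.workflow()
def crawl_site(url: str) -> str:
    return f"contents of {url}"  # placeholder for a real child crawl workflow

@DBOS.workflow()
def crawl_all(urls: list[str]) -> list[str]:
    # Enqueue child workflows in parallel, bounded by the queue's concurrency.
    handles = [crawl_queue.enqueue(crawl_site, url) for url in urls]
    return [handle.get_result() for handle in handles]

@DBOS.workflow()
def wait_for_feedback(user_id: str) -> None:
    DBOS.sleep(24 * 60 * 60)  # durable sleep: survives restarts, never oversleeps
    # ...then check whether the user has responded and follow up.

@DBOS.scheduled("0 * * * *")  # durable cron job: run every hour
@DBOS.workflow()
def hourly_price_check(scheduled_time: datetime, actual_time: datetime) -> None:
    crawl_all(["https://example.com/flights-to-tokyo"])
```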
Qian Li [00:30:55]: So even if one of the processes crashes, it will always resume from where it left off. If you have a long sleep, we'll make sure you never oversleep, and if you have a cron job, we'll make sure that everything runs on schedule. In summary, your database is all you need. Your database can be your durable engine. Your database can be your durable queues. If you want to learn more, please visit our website and check our open source libraries and meet us on Discord. Thanks.
Ben Epstein [00:31:30]: That's awesome. Thank you, Qian. I am super stoked about DBOS. I think it's a really, really cool project and it fits really well in the world that's kind of emerging of, one, separating compute from storage, two, owning your compute, owning your execution engine, and three, just moving back to, or the re-emergence of, just Postgres for everything. I really love the energy of Postgres for everything. It's such a cool concept and it's so unbelievably dynamic in what it supports. We got two questions from folks about your talk.
Ben Epstein [00:32:09]: One, the second question we got was, over what infrastructure was the AI agent running? So I think they're asking about when you ran that fleet of models that was checking the flights, both where was that running, and then also what was the brain that was actually scheduling these things to run and pick up and continue, right?
Qian Li [00:32:29]: So as I said in the talk, everything is open source, so the brain is in the open source library. Basically, when you start a process, we'll start some background threads that check the database to figure out when to schedule those things. You don't need an additional scheduler or orchestrator, you just need to embed the DBOS library into your process and we'll use the database to figure out the schedule. Then for the production AI agents, DBOS is open source, so you can run it anywhere, but in this specific case they're actually using DBOS Cloud. It's a serverless cloud we provide, so they're hosting on our platform, but we also have great self-hosting support. So you can essentially run DBOS anywhere.
Ben Epstein [00:33:18]: And when you say, just because he asked about infrastructure, just to tie up on that comment, when you say you just embed DBOS anywhere as the brain, for example, that could be like an EC2 instance that you spin up. Right. That has a connection to your Supabase or your Neon database.
Qian Li [00:33:35]: Yes.
Elliot Gunton [00:33:36]: Cool.
Ben Epstein [00:33:38]: The second question you got was, how does DBOS's checkpointing system in Postgres differ from LangGraph's checkpointing system? I actually don't really know about LangGraph's checkpointing system, but I imagine you do.
Qian Li [00:33:55]: I briefly read about it, but I could be wrong. Don't quote me on this. My understanding is that LangGraph requires you to statically define a graph and then it will checkpoint the overall graph. The idea is similar, but DBOS's design is more general. You can just decorate any Python function and you don't have to specifically spell out a graph. Yeah, and also we provide all the durable queues and other features that LangGraph doesn't provide. So as I showed earlier. Let me see quickly.
Qian Li [00:34:32]: Oh, so we actually built an agent together with LangGraph. So for the outer loop you can use LangGraph, but for individual tools you can use DBOS. So it's very flexible in terms of how you use it, right?
Ben Epstein [00:34:48]: Yeah. And I think, I mean, everyone's going to have their own opinion, at least from my experience using these tools. The more general the tool, the less opinionated the tool, the better it tends to be to work with. In many ways, LangChain, or LangGraph, might be like an implementation of checkpointing specifically for running LLMs over a set of tasks, but there's a whole bunch of other things that have to happen that aren't just the LLM calls here. We got another comment on this. LangGraph checkpoints execution as well, and allows node retries. It doesn't allow the whole graph to reboot.
Ben Epstein [00:35:23]: Okay, that's interesting. I think that you can imagine DBOS, obviously, Qian, correct me, but it seems like DBOS is much more where it can be your overall system, not just the parts of AI, but everything that needs to run in code can be run and managed and checkpointed and retried and scheduled and everything through DBOS. Whether that's like an endpoint request or whether it's an LLM request, or whether it's, you know, some pandas or Polars data manipulation. Is that fair?
Qian Li [00:35:53]: Yes, that's correct.
Ben Epstein [00:35:55]: Awesome. Go for it. You have one more comment there.
Qian Li [00:36:00]: Oh, I think you summarized it really well. Thank you.
Ben Epstein [00:36:03]: Okay, you got one more question, then we'll jump to the next talk. How would this couple with gen kits on major cloud providers, for example, GCP agent LangGraph deployments? Similar question, but maybe a different angle.
Qian Li [00:36:16]: Yeah, so I think, because basically DBOS can be used in any workflow or any Python function, you can imagine some part of your tools will require durability, and you likely will, for example, when you talk to external systems, when you want to, for example, send an email or scrape a website, you can just use it there. You may not be able to write everything in a single framework. That's how you can say, decorate this Python function with DBOS and then use this function as part of the tool call. When you provide a list of tools for your agents, then you can say, this is a DBOS-decorated function. I'll use that as a tool.
Ben Epstein [00:36:59]: Can you just use a DBOS step without a DBOS workflow?
Qian Li [00:37:06]: Sure you can. DBOS steps will be retried from the beginning. A DBOS workflow will give you finer-grained checkpointing capability. There's a balance there, definitely. If you say this is a single API call, then sure, use a step. If it fails, we'll just retry the entire step.
Ben Epstein [00:37:28]: Yeah, got it. Okay, awesome. Very cool. Thank you, Qian. All righty, Alan, we're going to jump to your talk and then we'll bring everybody on for community questions. Whenever you're ready. Yeah, kick it off.
Alan Nichol [00:37:45]: Absolutely. Yeah. I get to bring us home. Well, thank you and well done to the previous two speakers. I'll try and keep up the pace, keep up the energy a little bit and talk about an idea that I think is sort of bubbling up in the community, and I see it in other places, and I'm just trying to give it a name: this idea of process calling. You know, very briefly about myself, co-founder, CTO of a company called Rasa. You know, we've been called the OG chatbot framework, doing chatbots for quite a while, and you know, I care a great deal about building, you know, robust and reliable conversational AI, which I think fits really well with the theme of all the things that we're talking about today.
Alan Nichol [00:38:31]: So I also really appreciated the perspectives of the previous two speakers. So this is the take that sort of sets us up for this talk, which is really in the context of, like, customer-facing agents, right, which is one of the main use cases for Rasa. It's not the only use case for Rasa, but things that are sort of agents that are representing your company or your organization to some external people, right? To some customers. And my take is that, you know, function calling or tool use or whatever you want to call it is actually the wrong abstraction for that kind of use case. And you need something else, which I'm seeing emerge in a few different places. And I'm starting to call it process calling. And I will use the words like function calling and tool use and function use. I will use those interchangeably.
Alan Nichol [00:39:19]: Some people might, you know, claim that there's a difference between them, but from my perspective they're the same thing. And so I think this is in the context of, you know, this idea of the augmented LLM. So this is a diagram from a wonderful blog post by Anthropic, you know, that really clearly explains, you know, some of the key techniques that people are using to build agents, you know, things like RAG, things like tool use. And so they explain it as, you know, you have this LLM and you get to augment it with additional things, right? So you get to augment it, you know, with retrieval, to pull fresh information, with tools to, you know, read and write data to some kind of system of record via an API or something like that. And you need some kind of working memory. And I think that this diagram needs another box. It's missing something really essential beyond just the retrieval, the tools and the memory. So just very, very quickly, just to ground us on what we're comparing against or what our starting off point is, one of these things you use to augment what an LLM can do is this idea of tool use or function calling.
Alan Nichol [00:40:29]: And the idea is that you equip the LLM with the ability to read or write some data to the outside world. So that if you go to your agent or your chatbot or whatever you want to call it and you say, show me my GitHub issues. Of course, just a fine-tuned LLM that somebody trained couldn't possibly know what those are. So they're going to go and call some API, so it has the ability to say, hey, look, I've understood the user's request, I've understood that I have a source of data where I can retrieve that information for them, so I'm going to use that. And we're wrapping that actual raw API call of, like, listing GitHub issues, we're wrapping that in some glue, right? Because you've got to extract some parameters from the user's question and you've got to take something from the output and formulate it into a nice answer and all that kind of stuff. And so MCP is an emerging standard and I think a very popular topic right now. And it's a way of standardizing this kind of interaction pattern, right? So historically if you're a developer and you had to integrate some APIs, you would be writing that glue code, you would be formulating the right parameters for your request, and then you would get some JSON back as a response and you would take out the parts that you need and do something with it. For the purposes of this talk, I think we just need to know that MCP is a standard way of implementing this pattern.
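As a rough illustration (not from the talk) of the kind of glue he's describing, here is what a plain function-calling tool definition and its handler might look like; the tool name, schema, and handler are hypothetical.

```python
# A JSON-schema style tool definition the LLM can choose to call.
list_issues_tool = {
    "name": "list_github_issues",
    "description": "List GitHub issues for a repository.",
    "parameters": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "owner/name, e.g. argoproj-labs/hera"},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["repo"],
    },
}

def execute_tool_call(name: str, arguments: dict) -> dict:
    # The "glue": map the LLM's chosen tool and arguments onto a real API call,
    # then hand the JSON result back to the model to phrase a nice answer.
    if name == "list_github_issues":
        return {"issues": []}  # placeholder for the real GitHub API request
    raise ValueError(f"unknown tool: {name}")
```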
Alan Nichol [00:41:54]: And then specifically what I would say about it is that it means that you don't have to write that glue code because instead you're going to tell the LLM, here's what the request should look like, here's what the response is going to look like, and kind of let the LLM do that data manipulation for you, right? But you know, it's a very nice idea, because then, you know, other people can publish kind of ready-to-use APIs so that if you're building an LLM-based app or agent, you know, you don't have to write a bunch of this glue code. But here's kind of the key piece. So there's this key idea in the MCP specification which is that tool operations should be focused and atomic, right? So something like list my issues or, you know, post a comment on a particular pull request, or, you know, just some one small thing, do, read or write one small thing. And I think that's the wrong type of abstraction, the wrong thing to be using when you're building a customer-facing agent. So when people come and they talk to an organization that they're not part of, they don't speak the right lingo, they don't think in terms of, like, your, you know, your back ends and your operations, right? So this is a, you know, classic use case. This is, you know, customer support for an e-commerce website. And people don't come in with a perfectly specified request, like, you know, list my GitHub issues. People come in and they describe their own reality and they say something like, you know, hey, you messed up my order.
Alan Nichol [00:43:23]: And you know, that could be many things, right? There could be many core issues that this person's unhappy about. Maybe it just hasn't been delivered yet. Maybe it has, but it was late, there was something missing, there was something wrong in it, something was damaged. There are so many things that could be in there. And any business, any organization is going to have pretty strong opinions about how they want to handle each of those different cases, right? This isn't like unknown stuff. They have a pretty well defined process for, you know, for particular types of customers, how they want to handle all these cases. So in this case, for example, if we kind of play this through, you know, in this case, the user had a missing item, right? The phone case that they ordered wasn't part of it. So we're going to have to go and call some backend systems to figure out what happened.
Alan Nichol [00:44:07]: And we look at our fulfillment API and we look at our delivery API and we check all this out and we realize that actually, you know, this shipment was dispatched in two separate boxes and the first one has arrived, the second one's on the way, it's coming, but it's not there yet. And then we're going to go fetch some information about this particular user and we realize that they're not a scammer. They're like a good faith user. They have good loyalty and all of that stuff, and we want to be proactive and offer them something nice. So we're talking to a bunch of different systems and have some conditional logic and we are stitching together this experience of what we should be doing in this particular case. And so if you try and do this in a, you know, an atomic-tool type of scenario, you just have these tiny units of functionality. Like I have an API that can tell me, you know, the delivery status for this particular delivery. I have another one for my fulfillment API, which lists the deliveries given a particular order number, like all this kind of stuff.
Alan Nichol [00:45:12]: And so the way that you would implement this in a tool calling paradigm is what I would call prompt and pray. So you get this pattern where you get an LLM to produce sort of intermediate text, right? And people call this like thinking, let's leave that aside for a second. The point is it produces some intermediate text where it's kind of reasoning through, okay, what's the thing I should be doing next, right? And it's then deciding kind of autonomously, you know, I should use this tool. Now I'm seeing this output, okay, here's the thing that I should be doing next. And it's, you know, I mean, this is very neat and it's very cool. And, you know, if you have a use case where this is genuinely dynamic all the time and you need to be improvising all the time, then, you know, this is the way to do it. But it's also very flaky and it's very difficult to use, right? I mean, anyone who's played with this realizes that, you know, extra steps kind of appear or disappear randomly. Questions get asked in random order every time.
Alan Nichol [00:46:07]: And the worst part is you can only debug this thing via trial and error, right? So, okay, you look at some of this intermediate text, you're looking at traces, you have some observability, you're looking at the thought traces, and you go, okay, well, this time I see that it did this. Why did it this time decide that it should also verify the order contents first? Maybe if I change this in the prompt or add some additional information, it will do that or it won't do that next time, and it's just a game of trial and error or a game of prompt and pray. The challenge with this is, if you're a sizable team and you're not building one or two use cases, but you're trying to build coverage for, you know, the hundreds of different things that people come and ask your customer service system, your development velocity very rapidly drops to zero because you're constantly introducing regressions, you're constantly trying to chase, you know, why was it that the system did this on this particular instance and it didn't do it the other time? And how do we get this sort of soup of information that we put into the prompt to behave reliably? So the process calling thing is nothing really that special. It's just saying, let's think about a different abstraction and a different way of interacting with APIs when we're building an LLM-based app, right? So with process calling, the LLM invokes not a single atomic tool call, but it invokes a stateful process, right? So aside from that, it looks a lot like a function call, right? It just says, okay, I have this tool at my disposal. It happens to be a stateful tool. It's going to guide me through some steps, and then for multiple steps, and by steps, I mean back and forth with the end user for multiple steps in the conversation, it's driving the control flow through that process.
Alan Nichol [00:47:52]: So it has some shared state with the LLM. It's reading, writing some data, and the process is the thing that's actually interacting with the APIs. So the process is the thing that's defined, you know, by the business. And they say, you know, for a missing item in the order, you know, first go check if it was dispatched in separate orders, and then go check the delivery API and then check what kind of user it is. And then use that information, you know, to follow these next steps to recommend something. Right? And then when either, you know, this process is completed or the user kind of interjects and says, hey, no, actually I wanted to do something else, or, you know, switches to another context, in that case, the, you know, the process goes away from the context again and you once again have the ability to either invoke a tool or another process or something like that. But it hangs out for multiple turns, right? That's kind of the key thing. And if you go back to that Anthropic post that I lifted, Building Effective Agents is the name of the blog post.
Alan Nichol [00:48:48]: That's where I lifted that initial picture from. They have this idea of workflows, which sounds a lot like kind of what I'm talking about with process calling. The key difference is that what they describe as workflows is you have a single user message that goes into the LLM and then you have like this Rube Goldberg machine of all these things that are going to happen afterwards, right? You have all the steps that happen in response to one user message. But they don't talk about the really interesting case, which is, well, actually you now have a stateful process and it's going to persist and you're going to have to guide the user, ask them some questions and guide the user through some kind of process here. So, you know, the key thing, and this is, you know, I think, important, especially if you have worked in conversational AI for longer than the last two or three years, the important thing to understand is that when we define a process, we're not defining, like, here are the possible conversation paths, or one of these, like, horrible flowcharts that people used to build of, you know, what if the user says this or what if the user says that. It's not describing the conversation, it's just describing the business process. Only kind of like one encapsulated piece of logic, right? And so for a task, a process just describes what information do I need from the end user, what information do I need from APIs, and, like, any branching logic that I have based on the information that I'm gathering through the process, right? And so the way to frame this back to the original idea of, like, function calling, process calling, is that function calling is just a special case of process calling where the process has just one step, right? So a process is just a fancy function that's stateful and has multiple steps and hangs out for a while and drives the conversation forward.
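A deliberately simplified, hypothetical sketch of that idea, not Rasa's actual API: the LLM invokes the process once, the process then holds state and drives the next few turns, and the backend calls it makes here are placeholders.

```python
def fulfillment_api(order_id: str) -> list[dict]:
    return [{"status": "in_transit"}]   # placeholder backend call

def customer_api(customer_id: str) -> dict:
    return {"loyalty_tier": "gold"}     # placeholder backend call

def missing_item_process(order_id: str, customer_id: str):
    """A stateful process: each yield is one assistant turn in the conversation."""
    shipments = fulfillment_api(order_id)
    if any(s["status"] == "in_transit" for s in shipments):
        yield "Good news: the rest of your order was shipped separately and is on the way."
    else:
        customer = customer_api(customer_id)
        offer = "a refund or a free replacement" if customer["loyalty_tier"] == "gold" else "a refund"
        choice = yield f"Sorry about that! Would you like {offer}?"
        yield f"Done, I've arranged a {choice} for you."
    # When the process is exhausted, control returns to the LLM / tool selection.

# The conversational runtime would drive it turn by turn:
process = missing_item_process("A123", "C9")
print(next(process))              # first assistant turn
# process.send("a refund")        # pass the user's next reply into the process
```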
Alan Nichol [00:50:30]: And it's a far more scalable approach than the sort of prompt and pray approach to building more advanced, more sort of scaled-up AI agents. I mean, and then it kind of feels ridiculous that this has to be a selling point, but it's actually a pretty important one: you get, you know, reliable deterministic execution of business logic, right? You don't get, like, 80% of the time it's right. You get that it's always right. It means you can build true modularity, so you can truly break up processes into sub-processes. So you have, like, reusable pieces of logic, right? In a system where you have, like, LLM control flow, you know, you don't have true decomposability, right? This goes back to, like, Dijkstra, you know, if you can't determine the entry and exit conditions for a piece of code, you know, you can't reason about it and you can't use it as a true module. Yeah, it means that debugging is straightforward because, you know, the process is either being followed correctly or it's not. You're not sort of squinting, looking at traces of LLM outputs, trying to guess and divine what it is that was going on. And you get some nice non-functional gains as well.
Alan Nichol [00:51:35]: You get a lot less token use and a lot less latency because ultimately you know, your business process is something that you already know up front. So you can just write it down rather than laundering it through an LLM and hoping that it, you know, survives the trip through the laundry and comes out intact. So it's also just a, you know, a much more efficient way to run things at scale. So kind of going back to the original point, I think at Rasa we're certainly not the only ones thinking about this. And I'm seeing this sort of pop up in other places. And so I think it deserves kind of its own name of process calling and it deserves a box in those posts of like, you know, here are like common patterns for building agents because it is becoming a common pattern. Right. I'm seeing it in other places.
Alan Nichol [00:52:22]: Specifically, you know, in Rasa we have a framework called CALM for building, you know, AI agents. And process calling is sort of one of the key tenets of it. It's not the only piece of it. You have lots of other things around, you know, having a fluent conversation and how do you deal with sort of changing context between these flows and orchestrating across, like, RAG and flows and those kinds of things. So there's lots of stuff that happens to build, like, a full conversational AI framework. But this kind of key idea of process calling I'm seeing kind of, you know, crop up more and more commonly. Keeping an eye on the time.
Alan Nichol [00:52:54]: I'll have to skip the demo for today, but hopefully we have time for a couple of questions.
Ben Epstein [00:52:58]: Wait, I think I actually, if you.
Alan Nichol [00:53:00]: Want to run the bed down, finish.
Ben Epstein [00:53:04]: Alan, are you able to hear me?
Alan Nichol [00:53:05]: Yeah, I'll just finish. I'll just finish and say if you want a fun prompt and pray sticker, just shoot me an email. I'll happily send one. I've shipped them to, you know, all sorts of different countries and yeah, thanks for listening and look forward to any questions.
Ben Epstein [00:53:22]: I think that you can run through that demo if you'd like. We have the time.
Alan Nichol [00:53:25]: Okay, let me see if it's. If it's up and running because I'm now in a different browser, so this is going to have to be.
Ben Epstein [00:54:08]: While you're pulling up your screen, I'll drop in a question that we got. Yeah, about Rasa. So it says, we've been thinking about similar concepts at Sequent, more from a DAG perspective, where tools are combined into a graph of processes instead of a linear chain of processes. Makes sense. For example, after running a tool you can use tools B and C but not tools D and E. Also makes sense.
Ben Epstein [00:54:32]: Does Rasa allow defining such kinds of graphs and validating them? Do you have integrations with LangGraph?
Alan Nichol [00:54:39]: Yeah, all good questions. And I don't know why my UI is frozen. So I will post the link to a demo on YouTube so people can check it out afterwards. Remind me of the question, I was busy trying to get the demo running.
Ben Epstein [00:55:01]: Does Rasa allow for defining those kinds of graphs where they're more of a DAG and less of a linear graph, where, for example, you can run a tool A and, based on the output, you can call some other tools, but not other tools.
Alan Nichol [00:55:13]: Yeah, no, that's valid. So it's not even a DAG, because you can have loops, and loops actually can be very useful and very important. You can do lots of very simple things in a nice way with just a single step that loops, asking the user if they're happy or something like that. Yes. The nice thing is that you can build out these little processes in as small of units as you want. One of the common things is at the start you need to authenticate users. At the end you want to send a quick feedback survey to see if the person was happy or if they're willing to follow up and give some more feedback. So being able to reuse those bits of functionality and compose them together is a really nice way to also scale out your implementation.
Ben Epstein [00:56:02]: I think the secondary question to follow up was, if you have an integration with LangGraph, what is that integration?
Alan Nichol [00:56:10]: I can drop a link to a demo where we share, I think we compare side by side, one of the LangGraph demo apps and then we re-implement it in the process calling paradigm, which might answer some of the questions. We do a bunch of stuff on Rasa orchestrating, you know, across different types of applications. Right? Because I think one of the interesting things is, okay, if you have sub-agents that are potentially built with different technologies, how do you smoothly have a conversation that integrates all of those skills? Right? So if each of them is truly independent and you just need to route the user to one of them, then it's an easy problem. Right? But I think one of the things that we try and do well with CALM at Rasa is, okay, how do we share context across those different use cases? And how do you have a fluid conversation that sort of allows users to not worry about which sub-agent they're talking to at any given time and just sort of talk as if they were speaking to another human. And so we have some things coming out, we have a tutorial coming out in the next couple of weeks on doing that kind of smooth integration with some process calling, with MCP and with A2A, so kind of throwing it all in the kitchen sink, like showing how you can, you know, orchestrate all these different pieces. And I kind of like the idea of maybe also, you know, maybe one of the things that we orchestrate in there could be a LangGraph bot, and I think that could be a nice addition to that tutorial. So it's a good question.
Ben Epstein [00:57:44]: Very cool. And Alan, if people want to reach out, what would be the best way to get in touch with you?
Alan Nichol [00:57:49]: Yeah, just reach out and ask for a sticker. It's just alan@rasa.com and I will send you a sticker.
Ben Epstein [00:57:58]: Stephen asked if you've tried using DBOS for your state checkpointing within Rasa.
Alan Nichol [00:58:03]: I am learning about DBOS for the first time today, but it really appeals to me, so there's a lot to be said for it. So, you know, I'd love to check it out.
Ben Epstein [00:58:13]: That's awesome. I think the community, or certainly I, would love to see that collaboration. That would be very fun. Yeah.
Alan Nichol [00:58:21]: That's definitely a strength of DBOS, something that we don't do particularly well, sort of this truly, like, async, hey, I'm going to go do something and I'm going to pick it up again, like, in a day or a week or something like that. Right. And yeah, I think that's very, very compelling and interesting. So I'd love to, you know, work together, Qian, and maybe we can build a tutorial or something.
Qian Li [00:58:43]: Awesome. Yeah, I see a lot of synergies between Rasa and DBOS.
Ben Epstein [00:58:49]: That's awesome. All right, we're coming up on time. We have one minute. Any final comments or words? Anybody, you guys can all share. Maybe just go around where people can get in touch with you, reach out, learn more about the respective things that you're building, the community. Alan, you can kick it off and then we'll go backwards from the order of the speakers.
Alan Nichol [00:59:10]: Good. Yeah, just, you know, my call to action would be don't just prompt and pray. There are better ways to live. Don't pull your hair out trying to make these things viable. If you're architecting a system that involves an LLM, think about what part of this system is genuinely dynamic and going to change every time a user interacts with it and encapsulate that complexity with an LLM. And for everything else, just build traditional software. Then my call to action would be, if you disagree with me or just want to chat, just shoot me a message.
Qian Li [00:59:48]: So on my side, as I presented, our belief is that your database is all you need. Durable workflows should be really accessible for everyone. It should just be lightweight, as a library. So my call to action is please try DBOS, give us feedback, hang out on Discord with other users. If you want to reach out, let's connect on LinkedIn. Happy to chat, happy to help.
Elliot Gunton [01:00:18]: Cool, and me going last. I was getting worried as I was watching Qian's talk that we were on competing technologies, but it seems like they each have a place to live in the world: durable execution for that long-term aspect versus Argo Workflows for something that you want to run on Kubernetes in a short time period. You can find me on the CNCF Slack, at elliot@pipekit.io, and also on LinkedIn, Elliot Gunton. There's not many Elliot Guntons in the world, so. Awesome.
Ben Epstein [01:00:49]: All right, thank you all three for presenting. These were really really awesome talks. Really well synergized I think. And thanks for everyone for coming on and listening with us. We'll see you in the next one.
Qian Li [01:00:58]: Thank you.
Alan Nichol [01:00:58]: Thanks Ben.
Elliot Gunton [01:00:59]: Thanks very much.