From Console Scripts to Agentic Services: Building Observability into Everyday LLM Workflows // Colin McNamara // Agents in Production 2025
SPEAKER

Lifelong problem-solver combining deep technical expertise, business leadership, and hands-on innovation to tackle complex challenges—using cutting-edge AI to create meaningful solutions for businesses and communities
SUMMARY
This talk shares the ongoing, real-world journey of building agentic infrastructure at AlwaysCool.ai—from simple GPT-based tools to our first production-ready AI microservices. We started with small wins like automating nutritional analysis and FDA label validation, but quickly ran into issues with sync limits, cost control, and debugging blind spots. That led us to build a shared agentic service layer, using LangGraph to orchestrate multi-step flows and FastAPI to serve those agents cleanly. With OpenTelemetry at the core, we now send metrics and traces to Prometheus, Grafana, and LangSmith for real-time visibility, which is critical for compliance workflows such as HACCP, CAPA, and FDA traceability. We’re not claiming to have it all figured out—this is a story of learning in the open, much like we do at the Austin AI Middleware Users Group (AIMUG). If you're navigating the same terrain—tooling decisions, observability gaps, or production pressure—this talk offers patterns, tools, and cautionary lessons worth carrying into your own journey.
TRANSCRIPT
Demetrios [00:00:00]: [Music]
Colin McNamara [00:00:08]: So hey everyone, my name is Colin McNamara. I run engineering and finance for Always Cool Brands and Always Cool AI. I want to give you a little story about the journey we've been on over the past two and a half years, transforming our internal processes with AI in the world of highly regulated industries, specifically the ones overseen by the Food and Drug Administration. So, our beginnings at Always Cool. Just a little bit of background: at Always Cool Brands, we make products for other brands, specifically for retail grocery stores and the like. We have 19 products coming out over Q3 and Q4 across 4,500 stores in America right now under those stores' brands. And we specialize in taking the dyes, the additives, and the fillers out, and reformulating high-performing branded products under those store brands for about 10% cheaper. Good for you, good for the environment. So the journey to the amazing sauce of having all these SKUs released into production was challenging, right? About two and a half years ago we started the formulation, bid management, and quality management processes of developing formulations for these retailers.
Colin McNamara [00:01:32]: And at the start, and I'll give some use cases based around two key applications we made to automate our day to day. But to start, there were a lot of spreadsheets. A lot, a lot of spreadsheets. And early on, when we were doing bench testing and reformulating items and whatnot, the formulas would change all the time and the nutrition facts would change all the time. The challenge with that is when you're creating a sample to send off to your customer overnight, some of your customers might have nut allergies, or they may have religious preferences. And you have to make sure that all the ingredients are properly stated, and all the allergens are properly stated, against some fairly complicated Food and Drug Administration guidelines with heavy, heavy penalties, up to and including people dying. So it takes a couple hours to get it right. And it is pretty chaotic when you think about the heavily compliant world of food development and food science.
Colin McNamara [00:02:31]: And so we started doing things by hand and matured in a couple of ways, like a bunch of people did two or three years ago. We opened up that ChatGPT console, starting in 3.5 and moving to 4.0, and started copy-pasting nutrition data, nutrition facts, and these things. We found the errors in math, but we also found some things that were really good. We ended up building custom GPTs that we used in our teams, and basically built a little Python adapter into the Food and Drug Administration's interface as well as the USDA's, and used that to generate an accurate Nutrition Facts label every time. And of course, I'm a large-scale engineer and Abby's a nutritional scientist, so we double-check everything by hand. But what we found was that we were able to take these custom GPTs and hack them into Open WebUI to basically run a private console internally with pipelines. It was okay, but it got us there.
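As a concrete illustration of why label math is easy to get wrong in a spreadsheet or a chat console: the FDA's rounding rules are threshold-based rather than simple rounding. A minimal sketch, using the commonly cited calorie rounding thresholds from 21 CFR 101.9 (the function names are mine, and the thresholds should be verified against the regulation itself before any real use):

```python
def round_calories(cal: float) -> int:
    """Commonly cited 21 CFR 101.9 calorie rounding:
    under 5 kcal declares as 0; up to 50 kcal rounds to the
    nearest 5; above 50 kcal rounds to the nearest 10."""
    if cal < 5:
        return 0
    if cal <= 50:
        return int(5 * round(cal / 5))
    return int(10 * round(cal / 10))

def calorie_line(grams_per_serving: float, kcal_per_gram: float) -> dict:
    """Compute the raw and declared calorie values for one serving."""
    raw = grams_per_serving * kcal_per_gram
    return {"raw_kcal": raw, "declared_kcal": round_calories(raw)}
```

Note that the raw computed value and the declared value can legitimately differ, which is exactly the kind of discrepancy a copy-paste workflow tends to mangle.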
Colin McNamara [00:03:28]: What are some of the challenges with this? One of the big challenges is that there's just so much you can do. It's hard to guide someone down a path, especially if you're delegating some work. It allows you the space to experiment, whether you're doing this in your playgrounds, in a ChatGPT or Claude console, or maybe your own privately hosted one. The challenge, though, is scaling console workflows. It can be done more easily now with Claude Code and whatnot, but it's really hard to have the same thing done everywhere unless you're pipelining to a specific agent or tool. That brought us to the next generation. We had these scripts that would do the Nutrition Facts generation. We had an AI agent script that would pull labels and go through the FDA CFRs and all the different rules.
Colin McNamara [00:04:27]: All those different things helped. But what really helped is when we went ahead and put them up in a Next.js interface, put it in a web layer just backing into OpenAI, and it worked really well with GPT-4 Omni for, like, deconstructing images. The challenge is that when everything moved to o3 and we got access to these super smart models on the back end, we went from synchronous information coming out to the end users, or us as end users, to where they have to wait five or fifteen minutes. And that completely broke our applications, or made it so we could not upgrade out of our labs. At that time we implemented LangGraph TypeScript for state management of our AI web tier, for two of the eight applications that needed to go async. And it really was a game changer at that time. Also, a call out to the LangChain AI team. They've been incredibly supportive of us on this.
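The sync-to-async break described here has a common fix: accept the request, return a job id immediately, and let the client poll while the slow model runs in the background. A dependency-free sketch of that pattern (the names are illustrative; in their stack this role is played by LangGraph and FastAPI, not this hand-rolled job store):

```python
import threading
import uuid

jobs: dict = {}              # job_id -> {"status": ..., "result": ...}
_lock = threading.Lock()

def submit(task, *args) -> str:
    """Accept work immediately and hand back a job id to poll."""
    job_id = uuid.uuid4().hex
    with _lock:
        jobs[job_id] = {"status": "running", "result": None}

    def _run():
        result = task(*args)  # the 5-15 minute model call goes here
        with _lock:
            jobs[job_id] = {"status": "done", "result": result}

    threading.Thread(target=_run, daemon=True).start()
    return job_id

def poll(job_id: str) -> dict:
    """Return a snapshot of the job's current state."""
    with _lock:
        return dict(jobs[job_id])
```

The client's UX then becomes "submit, show a spinner, poll" instead of a request that times out when the back-end model gets slower.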
Colin McNamara [00:05:30]: We got the LangGraph SDK, excuse me, the LangSmith SDK, also at that time. So we got a lot of visibility. It was really, really cool. Now, for us, LangGraph has exploded like tribbles. We use it kind of everywhere right now. And we're in the process of porting from LangGraph TypeScript in the web services layer to Python graphs behind FastAPI, and getting centralized governance, control, and separation at that layer. The thing about LangGraph that's been really great for us is that it's just magic state management. I can share state across all the nodes and agents, and it has this magic way of checkpointing.
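The checkpointing behavior described here, persisting state after every node so a failed run resumes at the exact step it died on, can be shown with a stripped-down sketch. To be clear, this is not the LangGraph API (LangGraph provides this through checkpointers passed to a compiled graph); it just illustrates the mechanism:

```python
from typing import Callable

def run_graph(nodes: list, state: dict, checkpoints: dict) -> dict:
    """Run (name, fn) nodes in order, saving state after each one.
    On a re-run with the same checkpoints dict, completed nodes are
    skipped and the run resumes at the first unfinished step --
    a stripped-down version of what a LangGraph checkpointer does."""
    for name, fn in nodes:
        if name in checkpoints:            # already completed: resume past it
            state = checkpoints[name]
            continue
        state = fn(state)                  # may raise halfway through the run
        checkpoints[name] = dict(state)    # durable save point per node
    return state
```

In real deployments the `checkpoints` store would be a database rather than an in-memory dict, which is also what makes the saved states auditable after the fact.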
Colin McNamara [00:06:17]: If a script goes halfway through, I don't know, doing a file operation or pulling something off an EDI system or whatnot, it knows exactly what step it was on. You can go back in time in your logs, and most importantly, you can use it to instrument your agentic flows with OpenTelemetry, which has been this amazing, eye-opening world. Now we have these agents doing something really important for compliance, and we should log how we're doing these things. We can have that business process automated in a simple set of scripts deployed via Git, running in a microservice, and we can audit them and have that governance come full circle using OpenTelemetry, especially when you're building the agentic workflow and you can capture the KPIs the agent is seeing and insert them into the log stream. It is a brilliant tool chain, and LangGraph is really at the core of that for us. We are in this great migration right now. It started with the LangSmith SDK and really expanded into us exporting to OpenTelemetry collectors. We're turning up new ones right now on Cloudflare Containers, which is really, really cool. But the big win is the ability to have one fully instrumented stack that is common across our web application as well as our agentic processing.
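The pattern of capturing the KPIs an agent sees and inserting them into the log stream amounts to attaching business attributes to a traced step. In production you would do this with the OpenTelemetry SDK's spans and `span.set_attribute`; the following is a dependency-free sketch of the shape of it, with all names being illustrative:

```python
import json
import time
import uuid

def traced(name: str, fn, sink: list, **attrs):
    """Run fn inside a pseudo-span, attaching business KPIs as
    attributes and exporting one record to the log stream (sink).
    Sketch of OpenTelemetry's start_as_current_span/set_attribute."""
    span = {"name": name, "trace_id": uuid.uuid4().hex,
            "start": time.time(), "attributes": dict(attrs)}
    try:
        result = fn(span["attributes"])   # fn can add more KPI attributes
        span["attributes"]["ok"] = True
        return result
    except Exception:
        span["attributes"]["ok"] = False
        raise
    finally:
        span["end"] = time.time()
        sink.append(json.dumps(span, default=str))  # export downstream
```

The compliance value is in the attributes: once the agent step tags what it validated and for which SKU, the log stream itself becomes the audit trail.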
Colin McNamara [00:07:49]: And we can do things like transformations, we can scrub PII, we can validate data governance, all sorts of really cool things. And what that is supporting for us, which I'm really excited about, is it's bringing us to an architecture where everything can come out in a dashboard, right? Where everything has been moving for us is highly specialized agents that have an entry point. The specializations themselves we can validate through LLM-as-judge, through direct observation, and, you know, we always validate the labels with a CPG lawyer, that type of stuff. We have a great little feedback cycle to make sure those are good, but they're small. The FastAPI service layer is amazing. It's super robust, self-documenting, and it boots up these LangGraph graphs really well. And the telemetry we're getting right now, especially with OpenTelemetry, is really, really mind-blowing. That's directly resulting in dashboards that allow us to manage our business, both on the compliance side and in the functional management of our agents.
Colin McNamara [00:09:01]: Some things that matter: LangGraph is more than state management for AIs; you can use it for everything else too. For us, it took two-to-three-hour jobs down to five to ten minutes, and it made them accurate every time. It allowed us to scale. So we learned that this process of going from console to microservices is important, that state management is where it's at, and that observability is important, if only to keep a human in the loop. We have a lot of cool things in the future. We are maturing in pretty much every single aspect, including tying in our ERP systems and doing fun stuff.
Colin McNamara [00:09:39]: So if you'd like to connect and talk more, please reach out to me. Scan the QR code. That's how to get in touch with me. I love connecting with people who are doing something a little bit different. Thank you.
Demetrios [00:09:54]: Oh, you did not disappoint. Yes, my incredibly high standards. You hit them and actually you surpassed them. I will say I have one question for you before we rock on to our final talk.
Colin McNamara [00:10:09]: Yeah, what's that?
Demetrios [00:10:10]: That is: I have talked to friends who are also using LangGraph, and the one question. Well, so there's OpenTelemetry, there's LangGraph, and then there are the traditional observability tools that I imagine you're also using. I guess the first question is: are you using the open source LangGraph, or, like, the hosted cloud version?
Colin McNamara [00:10:40]: I am an open source guy, through and through. And I come from web scale. So at some point you really have to partner with the community on that and find what lifts your boats. So no, I'm not on LangGraph Cloud, which you can get, you can order it on the Amazon Marketplace now, right? But you can build all this yourself out of commonly available components.
Demetrios [00:11:03]: For some reason my head was in the wrong space and I was like, oh yeah, they'll just deliver that to your door these days on Amazon.
Colin McNamara [00:11:10]: Right.
Demetrios [00:11:11]: That's how they do it with that Amazon cloud. So yeah, I've just had friends I've talked to who are wondering where they plug traditional observability into these systems and how that whole loop works.
Colin McNamara [00:11:28]: So traditional observability of like managing state, managing matrix, managing your metrics, managing your logs.
Demetrios [00:11:35]: Yeah, for the agents part specifically, right. And then, I think, that data.
Colin McNamara [00:11:41]: I think Prometheus, Loki, and Tempo tied together, using OpenTelemetry and collectors to distribute to the appropriate things. Combined with, and I'll say, on the AI observability stack, you have Langfuse, open source, right? Which has a great API. There are some security architecture things I'm not a huge fan of when it comes to the prompt CMS piece, but honestly, LangChain does the same thing with LangSmith. But anyway: fanning out to your full CNCF observability stack, and then you can embed your evals, your prompt versioning, basically your AI flow inspection, with Langfuse if you want to do it completely open source, or you can do Arize Phoenix, or whatever. You can still route it into LangSmith if you want to do it that way.
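The fan-out described here, one OTLP ingest point, attribute scrubbing, then distribution to Prometheus, Loki, and Tempo, corresponds to an OpenTelemetry Collector pipeline. A sketch of what that config can look like; endpoints and the scrubbed attribute are placeholders, and some exporters (such as the Loki one) live in the Collector's contrib distribution rather than core:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch: {}
  attributes/scrub-pii:
    actions:
      - key: user.email        # placeholder; delete whatever PII you carry
        action: delete

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"           # scraped by Prometheus
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
  otlp/tempo:
    endpoint: "tempo:4317"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [attributes/scrub-pii, batch]
      exporters: [loki]
    traces:
      receivers: [otlp]
      processors: [attributes/scrub-pii, batch]
      exporters: [otlp/tempo]
```

The same OTLP stream can additionally be routed to an AI-specific backend (Langfuse, LangSmith, Arize Phoenix) by adding another exporter to the traces pipeline.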
Colin McNamara [00:12:41]: Yeah, it's good to support the projects you consume. I totally do that.
Demetrios [00:12:46]: Yeah. Yeah. Okay. So that makes a lot of sense, and that's awesome. Last question. Prasad's asking: are you on Kubernetes?
Colin McNamara [00:12:57]: Right now? That is so. In many different ways. Yes.
Demetrios [00:13:02]: In many different ways. Is that what you just said?
Colin McNamara [00:13:04]: Yeah.
Demetrios [00:13:06]: Behind you. So I can only imagine what that means. Hold on, let's get a better look at that.
Colin McNamara [00:13:11]: I've got a little lab right here. These run on, like, Cloud Run and stuff; you're able to do that quite easily. But that mess right there is my Kubernetes lab. I'm basically building a rebuildable reference I can throw down in the factories, and then we can start to use the same architecture for the quality engineering of the actual lines that are running.
Demetrios [00:13:33]: Oh, incredible. Wow, that's super cool. How do you ensure memory consistency in long-running LangGraph workflows? Do you snapshot state, or rely on runtime context alongside OpenTelemetry traces?
Colin McNamara [00:13:47]: From what I understand, for me it's just been completely leaning on the state management. And this is the thing: this is all pre-0.1, so expect breaking changes. For me, I'm investing as much in teams and in the community as in platforms. Okay?
Colin McNamara [00:14:08]: It's been. You know I got these two applications that are deployed properly inside of it. Right. And so I'm still learning too. But the big thing is like you know, the full, the full Grafana's Prometheus stack. You know I'm in love with Clickhouse now as a metrics database. So clear and just this notion of being able to instrument with attributes at that agent layer when you know so much about the application and be able to drive tag those attributes with OTEL and your logs like you're. You have all the basics that you need to stay compliant right there.
Colin McNamara [00:14:42]: What you did, how you did, when you did it, who committed what, like, boom.
Demetrios [00:14:45]: That's huge. Yeah, that is huge. Well, dude, thank you. I really appreciate this. I'll be in touch.
