Operationalizing AI Agents in Data Analytics Workflows // Ines Chami // Agents in Production
Integrating Large Language Models (LLMs) into production-level data workflows presents both significant challenges and opportunities. In this talk, we'll introduce Numbers Station, a platform that automates data analytics workflows using LLMs, Retrieval-Augmented Generation (RAG) over a Knowledge Layer, and a customizable multi-agent architecture. We'll start by discussing practical use cases for analytics, such as dashboard search, query generation, and automatically adding analysis summaries to slide deck presentations. We'll then delve into the methodologies for deploying LLMs within data analytics workflows, focusing on a detailed case study: building a SQL agent from the ground up. We will cover the architectural considerations necessary to support agent-based analytics, including the role of dynamic control flows and the importance of incorporating business context through a unified Knowledge Layer. This session aims to provide deep technical insight into transforming theoretical AI frameworks into practical, scalable solutions that advance organizational data capabilities.
Andrew Tanabe [00:00:05]: Hi everybody, welcome back. We have Ines here from Numbers Station and we're looking forward to hearing her presentation on data analytics workflows. Ines, I'm going to step off stage now and I'll put your presentation up. You'll have about 20 minutes for your presentation, with five minutes at the end for Q&A from the audience. As we're getting close, I'll give you a small warning, just a time check, and when it's Q&A time I've got a separate stream for the Q&A, so I'll step back on to help relay questions from the audience. With that, Ines, thank you so much for being here and we're looking forward to your talk.
Ines Chami [00:00:47]: Thank you for having me. Hi everyone, my name is Ines. I'm the co-founder and chief scientist at Numbers Station. Just for my background, prior to Numbers Station I was doing my PhD at Stanford with Chris Ré, working on representation learning for structured data with a focus on representing knowledge graphs with vector embeddings. After graduating, we started Numbers Station with my advisor and some of my former lab mates, with essentially the mission to bring AI to structured data, and I will tell you more about it today. Okay, so in this talk I'm going to start by giving a brief introduction to Numbers Station: the platform, the problem we're solving, and a brief review of the solution we're offering. I'll then do a technical deep dive covering things like LLM agents and RAG, showing how we can build a multi-agent system from scratch.
Ines Chami [00:01:41]: And I'll leave some time for Q&A at the end, five minutes. Okay, so the problem we're solving at Numbers Station is essentially self-service analytics. In the past decade we've seen many enterprises investing in setting up what we call a modern data stack. This is great in many respects, like ETL pipelines and improved storage and compute, but it has also caused many issues on the analytics side. Because of the complexity of the tools and resources that people use, we've seen data teams overwhelmed with requests from data consumers: things like "help me find a dashboard," understanding how a field was created in the database, assistance writing SQL queries, and many more types of requests like this. This ultimately leads these data teams to spend most of their time in support functions, like triaging tickets, instead of investing in longer-term data projects that can actually impact the business.
Ines Chami [00:02:41]: With the recent development of LLMs, this problem is actually getting worse in some sense. Because everyone has seen demos of large language models writing SQL queries, generating charts, and doing amazing things, the expectations from end users are rising to the next level. But in reality, very few of these magical demos have made it into production, which is adding even more pressure on the teams that are supposed to build this. And so at Numbers Station, we're really focused on the last mile to bring this magical AI into production. We've spent decades working on this problem at the intersection of structured data and LLMs, both at Stanford and at enterprises like Alation and Hotspot, and we're essentially developing this platform to help relieve the pressure on the data teams and to really bring the AI experience that people expect. In terms of customers, just to give an example: Vouch, which is an insurtech company, grew their business very quickly and was getting overwhelmed with data requests from end users. So we helped them self-serve and bring an AI experience to end users through a chat interface.
Ines Chami [00:03:54]: Now, to wrap up this overview of the Numbers Station platform: the way it works is it starts with connectors to all the tools used in the modern data stack, things like the data warehouse, dashboarding tools, data catalogs, and even other things that may contain knowledge, like Slack channels or message history. We then automatically create a unified knowledge layer that captures all the knowledge from the tools that the organization uses. That knowledge layer is probably the most important piece in our platform, because it allows the agents to provide accurate answers tailored to the business by storing things like tables, columns, metrics, and dimensions. There are many things that go into that layer, but it's really the key to providing accurate answers to the users. And we can interact with that knowledge through RAG, retrieval-augmented generation, to give context to the models for every question that a user asks. So that leads me to the second component, which is essentially this RAG retrieval model that is able to interact with the knowledge layer to get context.
Ines Chami [00:04:56]: And the last piece is the AI agent framework, which I'll talk about today; it is highly customizable and can interact with many of the tools that the organization uses. Just as an example of the type of workflow this agentic platform supports: a user can come in and ask for a specific dashboard, and we'll use search and RAG on top of the knowledge layer to find the best dashboard. If that dashboard doesn't actually satisfy their need, they can use a SQL agent to write and execute a SQL query in the original data warehouse. And if they're happy with the result, they can use, for instance, a messaging agent like a Slack agent to share the results with their team. So that's just one example workflow; the range of possibilities is very wide, but I wanted to show where we're trying to go. Okay, so I'm going to go to the fun part now, which is the actual technical deep dive.
Ines Chami [00:05:49]: And how do we build this agentic system from scratch? Just checking on time. Okay, we're good. So I'm going to use SQL as an example and I'm going to show how we can create a SQL agent that writes and executes SQL queries to answer business questions. If I wanted to build such a system from scratch, the obvious first thing to try is prompting an LLM with few-shot or zero-shot learning. So I'm going to create a prompt which has an instruction like "please help me write the SQL for this request," I'm going to provide the natural language question from the user and the schema of the data, and then I'm going to send that to an LLM using an API, for instance, and get a result. This is a good start. But once I have the model response, which is the SQL, there's obviously a limitation, because I have to take that SQL, copy and paste it into my SQL executor, and run the query to get the results.
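To make that first step concrete, here is a minimal sketch of the zero-shot version. The schema, prompt wording, and model choice are illustrative assumptions rather than Numbers Station's actual implementation; any chat-completion API would work the same way.

```python
# Minimal zero-shot SQL generation sketch (illustrative; schema, prompt wording,
# and model are placeholders, not the product's actual code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA = """
orders(order_id INT, customer_id INT, amount FLOAT, closed INT, created_at DATE)
customers(customer_id INT, name TEXT, segment TEXT)
"""

def generate_sql(question: str) -> str:
    prompt = (
        "Please help me write a SQL query for this request.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_sql("How many active customers did we have last month?"))
```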
Ines Chami [00:06:46]: It's essentially like using ChatGPT: there's no action, no way to actually run the query. This is time-consuming, so how can we do better? The next thing we can add is execution tools in our pipeline, so the model can make tool calls and we can run those tools after the model generates the SQL. Instead of just being done with the text generation, we can actually send that query to the data warehouse: build a tool for query execution and get the result. The naive way to implement this would be manually writing code that implements those steps, where the first step is generation and the second step is execution of the code. That's what we call a manual control flow. It's better than having to run the query manually in my data warehouse, but it's still pretty limited, because let's say the query doesn't compile because the model used, for instance, the wrong table name.
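A sketch of that manual control flow is below, reusing the `generate_sql` helper from the previous snippet and using sqlite3 as a stand-in for the data warehouse connection; both choices are assumptions for illustration.

```python
# Manual control flow sketch: one generation step, one execution step, no recovery.
import sqlite3

def run_sql(sql: str, db_path: str = "analytics.db") -> list[tuple]:
    # Stand-in for a real warehouse query; in practice this would be a warehouse client.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

question = "How many orders did we close last week?"
sql = generate_sql(question)   # step 1: the LLM writes the query
rows = run_sql(sql)            # step 2: we execute it; if it fails, we're stuck
print(rows)
```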
Ines Chami [00:07:43]: With a manual control flow, we're stuck. After the model fails, we could add edge cases and consider all the scenarios that could happen. But this quickly gets very complicated and so we want to have a more flexible and a more robust approach. And that's where essentially agents come in. We talk about agentic systems where essentially the control flow of operations that the model takes is actually non deterministic and it's set by the LLM itself. Instead of automatically running each query that the model generates, we can use tool calling which lets the LLM decide what actions to take next to complete the task. Here, the sequence of step that the model is taking is actually non deterministic and it's set by the LLM and it will decide the best course of actions to take to get to the result. For instance, here, if it writes and tries to run the first query it doesn't compile, it can go back and correct that query based on the error message, run the query again and get a final response.
Ines Chami [00:08:45]: So that's much better. But there's still a caveat, because even if the model outputs something, we're not sure that it's correct. Let's say, for instance, as a user I was expecting a number in the range of 2,000, but I actually get 4,000 here. The calculation that the model used could be completely made up and not match the user's expectation. And so that's where the knowledge layer comes into play, and the key to fixing this accuracy problem is to bring context into the model through retrieval-augmented generation, by leveraging this knowledge layer that I was alluding to earlier. Here we need to use RAG because we can't just feed all the context into the prompt: there's a lot of context in an organization, and there just isn't enough context length in the models to put all of that in the prompt.
Ines Chami [00:09:33]: So we use retrieval to get all those fragments of SQL database schema, pretty much anything that we think is relevant for the model into the prompt to make the model generation more accurate. And so an example item that we would store in the knowledge layer here is this metric calculation. So if a user is asking about the number of active customers, the definition that we have in this example is contextualized to the business. And for instance, it has this filter that closed is not zero and that definition the model will not guess out of the box. We have to get it from somewhere in the organization, whether it's a tableau report, whether it was in an email, but that information lives somewhere and we have to learn it to make the SQL agent accurate. So that's part of the ingestion and the knowledge layer creation. Okay, I'll probably skip quickly through this, but this is how RAG would work. So if we have all that knowledge created, the idea is to store the data that we want in an index, which could Be like a text index or an embedding index.
Ines Chami [00:10:40]: And then when we get a new query from the user, the RAG tool is essentially going to run embedding or text similarity search over that large index, get top K results based on the context that we want to have in the model, and then we can include those results in the prompt so that the model has more business knowledge before it actually runs the query. We can then equip our LLM agents for SQL with this additional tool. Now it has a SQL execution RAG tool to get the retrieval context. Then in terms of control flow, a user comes ask a question. We're going to first get the context, then write the query and generate a response that is now more in line with the user's expectation. So, putting a lot together in terms of implementation, I'm doing a very simplified example of how this would work. But essentially, if you recall, the LLM is a combination of LLM action and decision. So if I want to implement this, the first thing we're going to do is implement this prompt that the LLM has to set its role and responsibilities.
Ines Chami [00:11:48]: So for instance, here the LLM is told to write and generate SQL query, I mean generate and execute SQL queries to answer the question. So that's just the prompt that sets the LLM responsibility for the tool. We have to register all the tools that the LLM has access to. So in this specific example, we said we have a SQL execution tool and we have a tool to search the knowledge layer to get the metadata about the, about the specific request of the user. So that's just the RAG piece. And for the decision component, it's really about making the LLM decide the set of action. I skipped a little bit some of the details here, but the key idea is that we don't have anything that tells the LLM to stop or what to do next. It's really up to whether it generated a tool call or whether it didn't generate a response for us to know that the LLM is done with the task.
Ines Chami [00:12:37]: And so we have really this agentic piece that the LLM is deciding what it's going to do over the course of actions. I put a little demo, but putting it all together, you can get this type of experience where the model goes and run all these different tasks. Okay, so just sharing some numbers. But this approach is really much better than the vanilla approach of just prompting an LLM. The big, big piece that improves this is really this knowledge layer piece of making sure that we're using contextualized SQL and not just having an LLM guess out of the box. And also all this retry mechanism of being able to correct errors with this agentic framework really does help a lot. So we did see like huge improvements on real benchmarks from customers, which are usually more realistic for the SQL that happens in the real world, not the easy public benchmarks. Okay, so in terms of next steps, I wanted to also talk about this multi agent type of workflow.
Ines Chami [00:13:44]: So far we've covered how to create one agent, the SQL agent. But we know that in reality people want to do more things, like finding reports or generating charts; there are many types of requests that go to these data teams. How can we actually build a system that has all these capabilities together? Just to show you what another agent could look like, this is an example of building a charting agent. In this toy example, I'm illustrating the same steps as before: we have a prompt, which here sets the responsibility of the charting agent, and the agent here has to write charts in Python, though it could be any charting framework we want to use. We also have tools like before, but instead of SQL execution, since we're using Python, we need Python execution. We also provide, for instance, a tool that searches documentation.
Ines Chami [00:14:38]: So it's the equivalent of the knowledge layer for the charting agent, where it can see example charts that have been generated before and learn from that to generate charts that are more in line with what the user wants in terms of decision path. It's the same idea that the charting agent is going to decide its own course of operations. To get to the final result, it might search for documentation. Maybe it's not going to need documentation. It will write the query and then it's going to run the query and finally return the chart and be done. We now have two agents. As an example, just in the video you can see you can customize the color. It's very flexible in terms of all the things you can do.
Ines Chami [00:15:20]: Now we have a SQL agent, we have a charting agent. We still don't know how to put all these agents together in a single chat. That's the last thing I want to cover today is how do we create a multi agent system where all the agents talk to each other. You'll see online there are many ways and conversations patterns that people can create, like group chats, pairwise chat, there's like very different combination. For our specific use case, we're using a sort of hierarchical chat where there is a planner which is responsible for decomposing the tasks that the user gives to the, to the AI into multiple subtasks that these sub agents are responsible for solving. And that planner is also responsible for routing to the different sub agents. So if we have that type of architecture, we can essentially create many agents depending on all the expertise that we want to give to the user. So we can have an agent that searches over reports, an agent that searches over existing emails or conversation.
Ines Chami [00:16:22]: We can have an agent that writes queries, generates charts, generates slides that. So it's really like the realm of possibilities is very wide. Anything essentially that can be interacted with through API calls is like up for ally here. But once we have implemented all those individual agents, we can coordinate the operations with the planner. So ultimately the user only talks to the planner. But the planner has a lot of capabilities because it can talk to all the sub agents. The nice thing also is they all share the same history. So you can also do things like jump from one agent to the other.
Ines Chami [00:16:59]: They can help each other. If I'm asking for a chart directly, I can first go to the SQL agent, write the query, then the chart is going to take it over. It's just a very nice and natural interaction for the user. I think that's all I have. I even finished early. I went too fast. But that gives some time for questions. I'll just stop here.
Andrew Tanabe [00:17:23]: Great, thank you so much, Ines. What a super in-depth presentation there, and really nice to get a sense of the actual architecture that you're looking at and some of the demos that you were showing. Really cool to see it in action, thank you so much. We do have more than a few questions here in the Q&A, and I'll just grab the first one. This question is really about permissioning, in a sense. There are a lot of big companies out there that are working on this problem, and smaller companies as well.
Andrew Tanabe [00:18:00]: And there's often a dichotomy between doing the agent performance side well versus doing the permissioning side well. And as you know, in the enterprise space, people are almost more concerned with the permissioning. Right. So just the question is really around how do you manage access to the right data? Who gets what?
Ines Chami [00:18:19]: Yeah, that's a great question. The way we see this is that this problem has already been somewhat solved, or people are already working on it in the organization: they have governance in place, they have access control for all these tools. So all we do is, whenever we use a tool, we just make sure to respect that access control. We don't try to rebuild our own internal version because it's just not our scope, and we already have that information in the source system. So it's really just about respecting whatever privacy is set in the original tool.
Ines Chami [00:18:50]: And that can be done in a straightforward way. Like basically when someone authenticates based on their permission, they have access to certain things and other nodes.
Andrew Tanabe [00:19:00]: Got it. Why reinvent the wheel when you don't even know the actual requirements there? That makes a lot of sense. Going into another question here: you mentioned early on that there was a really big disconnect. I love that comment around the disconnect between users' expectations and the reality of where these agents are right now and what their performance looks like. I'm curious how you as a company, and as a solution, go about handling that disconnect. Is this something that you try to address through UI solutions, training? How do you approach that from a user perspective?
Ines Chami [00:19:46]: From what we've learned, it's mostly that knowledge layer piece. It's very easy for me to build a demo where the model generates great SQL; it can do many things. But it's this accuracy that matters for people to actually use it. Also, remember that the end users here are not as technical as the people who build these dashboards, so we can't give them answers that are wrong; it has to be grounded in something that is real. So having this knowledge layer to back the model and help it provide accurate answers is the way we've solved this problem. Even if we don't have an answer, we're capable of saying "I actually don't know, and you'd better not believe what I say next." Having that sort of confidence is, we've learned, what it takes
Ines Chami [00:20:27]: to actually have people using the product daily and being confident in the results that it provides.
Andrew Tanabe [00:20:33]: Cool. So yeah, like a lot of proactive communication and sort of guiding expectations, I guess.
Ines Chami [00:20:39]: Yeah. And in reality, to build this, a lot of the work is just parsing, ingesting, and wrangling all the metadata stored in the tools. So it's not fancy AI work, but we've learned that this is the key to actually getting the AI to work correctly.
Andrew Tanabe [00:20:54]: That's a really great point. Just quickly following up on that one: do you have a lot of setup with the users? Is there sort of a big process? How long does that take in general?
Ines Chami [00:21:07]: Yeah, so I'd say a year ago it was a very big process, a couple of weeks of setup with us manually sitting with customers. We've brought that down to a couple of hours of training, where essentially admin users start using the platform and provide feedback, and we take that feedback and inject it back into the piece that was fully automated. Long term, we want full automation, so we don't even want those few hours as a requirement to start using the platform. But today there's still a little bit of human training to give feedback to the machine, to really tune it towards how the users will ask questions.
Andrew Tanabe [00:21:44]: Yeah, no, that makes sense. Working on similar internal versions here, we've seen a lot of that: when I say this city, I mean this city, not that city; when I say this state, I can spell it three different ways and still mean the same thing. So especially on the SQL side of things, there's a lot of lingo sometimes, that internal knowledge of the users.
Ines Chami [00:22:08]: We don't want to have users to have to list all those things because all that knowledge actually lives in the data that they have. Whether it's a slack in the reports as well. Based on the titles of the chart, we can do a lot with full automation. But that sanity check, we still want to have it at the last mile to make sure everything is working perfectly.
Andrew Tanabe [00:22:29]: Yeah, no, that makes a lot of sense. One last question here, in terms of scalability. There's a big focus here on taking things out of the research lab and into full-fledged production, and you're very much down that road, with real users and business information flowing. That last section you were talking about, these layers of sub-agents and this network, made me think back to one of the earlier presentations today around costs. While per-token costs are going down, the complexity of these systems is going up, and that ends up increasing costs. Do you see scalability concerns in that framework, either from a cost perspective or from other technical issues going forward?
Ines Chami [00:23:21]: I think yes, but at least today we have at most a dozen agents and the experience is still very smooth. If we had 50 or hundreds, obviously this hierarchical design would not work, but the way we envision things is having multiple levels of hierarchy with smaller groups of agents, so you can reduce the number of possibilities and paths it can take. It might become an issue, but for how we're using the product today, even if we get up to 10 or 15 agents, it hasn't been a problem. We also built our own agent framework, and we optimize some of the routing: if we don't need a model call to make a routing decision, we skip it.
Ines Chami [00:24:01]: So it does, like, make the performance faster and the flow a bit more smooth for the user.
Andrew Tanabe [00:24:08]: Cool.
Ines Chami [00:24:09]: Great.
Andrew Tanabe [00:24:09]: Well, Ines, thank you so much. It's been a really exciting talk, and we appreciate you sharing so much insight. Thank you very much.
Ines Chami [00:24:17]: Thank you for having me.