APICA: The Digital Colleague at the Port of Antwerp-Bruges // Pierre Gerardi // Agents in Production 2025
SPEAKER

I’m interested in building AI systems in a way that makes them usable and able to deliver value. This goes from identifying the most valuable use cases to ensuring the solutions can be trusted.
SUMMARY
Is there value in implementing AI in your business processes? Undeniably! But as more AI solutions are deployed, the maintenance burden increases, and it becomes harder for users to find and use the right tools. The marginal benefits of single-point agents are limited; real gains come from creating a platform where different agents can collaborate and perform tasks autonomously. At the Port of Antwerp-Bruges, we've developed APICA, a multi-agent platform. APICA acts as a single digital colleague integrated into Microsoft Teams. Users interact through one familiar interface, while behind the scenes, APICA coordinates specialized AI agents to handle complex tasks. In this talk, we’ll share how we built APICA, what the architecture looks like, and how agents collaborate. We'll walk through a case study of our nautical agent, which processes maritime SQL data to answer questions. Finally, we’ll provide practical insights into the challenges we faced and what we’re still working to improve.
TRANSCRIPT
Pierre Gerardi [00:00:00]: Hello everyone. Let me first shortly introduce myself. I'm Pierre Gerardi and I work as a solution architect at Superlinear, and together with the Port of Antwerp-Bruges we have built the APICA Chat application. APICA Chat is also known as the digital colleague of every employee at the Port of Antwerp-Bruges, and you will see in a bit why we call it that. Before diving into the actual technology, let us first take a look at the following quote from the CEO of the Port of Antwerp-Bruges: the Port of Antwerp-Bruges aims to develop a digital nervous system to manage the port entirely remotely. By working together with innovative partners and opening up the port as an innovation platform, they are trying to make the port smarter, safer and also more efficient. So this is the goal the port is working towards, and this is how they do it.
Pierre Gerardi [00:01:07]: They do it by creating the APICA ecosystem. I will not dive too much into the detail of what the name actually entails, but it is an umbrella for every smart technology they use in the port, and for now it focuses on the following five big domains. The first one is the APICA Digital Twin. This is a virtual replica of the situation in the port: if a ship comes in, you can also see it coming in in the digital copy. Then you also have APICA Chat, and small spoiler, this is what this talk will dive deeper into later.
Pierre Gerardi [00:01:47]: So this is the virtual colleague: it tries to capture the question, and then, based on some internal processing of the data, the colleague gets the answer back. Then we also have the APICA Planner, an optimization model to handle the highest number of incoming and outgoing ships by planning the tugboats in the most efficient way. Then you also have APICA Vision, which relates to everything around smart cameras. For now they are using it to detect floating debris in the docks, but also to recognize the container numbers on containers, and more applications are coming or are already there. And then the last technology that they are focusing on is the APICA Navigator. This is a bit like the Waze of the port: captains can use it to find the most ideal route and also see how long it takes to get from point A to point B.
Pierre Gerardi [00:02:50]: For today we are focusing on APICA Chat, because the goal is that it becomes the interface of this entire smart system: APICA Chat is the digital interface, and it can orchestrate the question to the right agent. Say, for example, the Planner is notified that one of the tugboat captains has become sick. He could ask APICA Chat: one of my tugboat crew is sick, can you solve this? The question is then orchestrated to the Planner, a new planning is made, and the planning is propagated back to the APICA Chat interface, where the answer is given back to the user. So APICA Chat will be the interface of this entire system, and that's why we are diving deeper into this technology by itself; covering everything else would take me a bit too far. Before building the APICA Chat application, we received some requirements from the port. The first one was that we should have only one point of entry into this complex system, because if we create an interface for every agent that we have, the maintenance burden becomes too big, and finding the right interface for accessing an agent becomes quite complex.
Pierre Gerardi [00:04:18]: The second requirement was: because we are building a lot of agents, why not first create a template, so that we can maintain each agent in the same way, but also so that they can work together better. If they all share the same framework, it's easier to connect them, make them work together and make them more autonomous by themselves. And the last focus point was that we should promote reusability as much as possible, because a lot of the agents have the same tasks, or tasks with only small differences. So why not focus on building good-quality tasks and then make them available to all the agents. This is what we have done by implementing this architecture. It looks a bit like a mess, but I will dive into it a bit deeper. You have the interface, a Teams copilot: people can add it in their own Teams space, ask questions to the colleague, and the question is orchestrated towards the right agents.
Pierre Gerardi [00:05:27]: For now we have three agents. We have the typical talk-to-your-documents agent. Then we also have a service desk assistant, which can answer questions based on human resources data but also on ICT data; if it's not able to find the correct answer, it tries to escalate towards the right ICT coworker, to whom a ticket can be sent. The last and most complex one is talk-to-your-nautical-data. This is actually a SQL agent: a question is translated into a SQL query, the SQL query is executed, and the technical answer is given back to the user. You also see at the top that we use DataGalaxy, so maybe some more information on that. The data catalog is actually a description of what your data landscape looks like: all the business-specific terms have a description.
Pierre Gerardi [00:06:31]: There is also some links between those concepts. And also we have some mapping towards the physical data. So it serves a bit as a roadmap how the agents can use the internal data so that we shouldn't not put prompt them like, okay, this data is there, this data is there. They can go autonomously through the data. So that's an important part in making the entire agent system working more by themselves. All right, so remember the code of the CEO. So they want to make the port smarter, more efficient, but also they want to do it safer. And as most of you know, a lot of safety issues come with deploying an agent system.
Pierre Gerardi [00:07:20]: So we're trying to mitigate them by focusing on the following four domains. So first of all, we have monitoring and alerting. Then we also have an access control layer. Then we have some guardrails, of course, and then finally we also have a security evaluation framework. So let me first dive in into the monitoring and alerting. For this we have three big layers. First of all, before we can monitor anything, we should identify what do we want to monitor exactly. In our case, we monitor the number of failed logins, we monitor the number of requests incoming by a user.
Pierre Gerardi [00:08:01]: We also monitor some guardrail violations, but there are tons of more measurements. The first part ideally is to identify what to measure. Then once you have the measurements, you can also put thresholds on top of them. So if those thresholds got exceeded, then the right people should be notified. And then those people can make the actions that need to happen. And then we also have a descriptive layer to also make a small analysis of what went wrong so that we know what to do in the future, so that we can learn from what went wrong and can take the right actions. Then the next layer of making it more safe is the access control, of course. And we focus mostly on two access control issues.
Pierre Gerardi [00:08:54]: So first of all, which data may be used by a user, but also which actions do we allow our agent to take or orchestrated by which user? So those are like the two biggest blocks that we focus on. So first of all, I will dive a bit deeper into which data may be used. And for this it's good that I first do like a small case study of how we tackle or how we prompt the agent to solve such a task. So if a question come in, we first map it to the data catalog. So we selected the relevant concept to give some more information on the question so that some domain specific language is made understandable for our agents. Then whenever we have identified the concepts, we are going to take a look which physical data needs to be extracted for answering that question. And then once we have identified which data we need to query, we need to create the query itself. And this is also done by the agent.
Pierre Gerardi [00:10:03]: And then the query is also executed by that agent and the answer is giving back a natural language to the user. So this is the typical flow of data. To make it a bit more concrete, because it's quite vague still, let me dive into one specific question that the agent should be able to handle. So this is the data catalog that we use. So you have all the concepts of how we describe the port. So this is only a small part of it and it's connected like a graph. So you know, okay, for example, we have a journey, and a journey can be done by a vessel. So if we take the following example question, how many seagoing vessels have arrived yesterday? Then the first thing that we do or that the agent does is it connects the key concepts.
Pierre Gerardi [00:10:50]: So in this case this is arrival, journey and sea ships. Then the next thing it should do is it should try to find the path going from one concept to another one. Because this is really important if you need to link the underlying physical data together, because this gives a good description of how you can combine it. We see that arrival journey is a journey. A journey is done by a vessel. A vessel has a type, and the type can be seashell. So now we have all the concepts. So the next stage is to take a look at each concept and see if there is some physical data laying underneath it.
Pierre Gerardi [00:11:33]: And for this case, we have two data sources. So we have a SQL table called journey and we also have a SQL table called vessel. So the next thing it should do is based on all those descriptive language of the specific language within the port and also the physical data sources. And then we also have like a set of validated queries by business analysts. The agent can formulate the right query and it can execute it. So there are a number of places where it can go wrong. So first of all is if only concepts are selected which are GDPR sensitive, because this is Also a feature which is in the data catalogs, if a concept is GDPR sensitive or not. So.
Pierre Gerardi [00:12:21]: So if all the concepts are GDPR sensitive, we have to formulate back to the user. Okay, we cannot answer this question because we only select GDPR sensitive data. Then also if physical data is selected to which the user don't have access to, we also need to check this. And if data is selected, if only data is selected which the user don't have access to, we need to also propagate this back to the user. We cannot answer this question because you don't have access to this data. And then finally in the query, if a physical data source is used which the user don't have access to, we have to propagate it. And also if some concepts are selected which are GDPR sensitive, we can also not answer it. So this is how we mitigate this risk.
Pierre Gerardi [00:13:08]: So by impersonating the user rights and also by using the quality of the data catalogs. Then the second risk is which actions do we allow our agent to take, orchestrated by which user? So for this we do like two mitigations. So some actions we don't allow to be automated. Say for example, the deletion of data, we never want to make that action available to an agent, so we give some rights to the agent itself. But then also we propagate the user rights to the agent so that other tasks can be executed. So say, for example, the creation of a ticket to the service desk is a right that we can propagate from the user to the agent, and then the agent can send the ticket to the right people. So this is how we tackle that. All right, so now we have handled access control, then the next thing, and I think the most known thing is guardrails, so I will not dive too deep in that.
Pierre Gerardi [00:14:08]: So in total, we use three types of guardrails. So first of all, we use the input guardrails, and that's to detect questions that we don't want to handle. So say for example, the question how do I smuggle directs in the port? That's a question that we would never like to answer by our agents. So we have to block it before it comes in, just to be sure. Then also we have some output guardrails, and this is just to make sure that we don't leak internal information. So say, for example, in the future we want to make some parts of the agent available to the outside of the port. Then we have to make sure no internal data is leaked. And also we do some check based on hallucination.
Pierre Gerardi [00:14:50]: So we compare the references that it's used together with the answer and we validate it if it's hallucinated or not. All right, then the last thing that we do is to monitor it. So we use an evaluation framework for it. So we have created a test set and we scored a test set on types of violations. So we have the input violation, the output violation, the factual consistency, and also if some data access violations are made. So we simulate 50, 50 test cases, we let it answer by the LLM application and we check if we have to make, if we have to, if we need to make improvements and where do we need to make improvements. Then also we have a monitoring system collecting the feedback and that can be also used to make improvements more ad hoc. All right, so to conclude, how have we done the process of building this thing? And for this we have used like a continuous loop.
Pierre Gerardi [00:15:57]: So first of all, we focus on governance, mostly on data governance. So we need to have a good description of the data, but we also need to know which data is accessible by which person. So this is like a lot of governance work. Then whenever we have identified a good governance, we can go and start developing applications. So we should focus on getting the agent itself better. So we have to increase the competence, but we also have to identify the risk of each agent. And then whenever we have built something, we can publish it. And starting from there, we need to monitor.
Pierre Gerardi [00:16:31]: So this can be done by using test cases like I have illustrated before, but also to just monitor the real time use and make reaction based on that. And then it's just like a loop of improvement. So you should learn from it in the end. So in this way we are trying to make the port smarter, more efficient, but also more safe. And this also concludes the presentation.
Skylar Payne [00:17:01]: Awesome. Thank you so much. I don't know that we have questions in the chat, but we've got some time, so I'd love to just chat through some of this with you. I'm very curious to hear more about the monitoring piece, particularly the reactive element. Do you have any examples you could share of a case where you had to react to something? What does that look like in practice? Go ahead, grab some water.
Pierre Gerardi [00:17:27]: We haven't had to react to security issues yet, because this is still only for internal workers; that will only come when we open it up outside the company. But we are trying to capture feedback from users. It's the typical thumbs up, thumbs down, and then they can leave a comment, and we try to improve the quality. It can be whatever the specific case is, but it's some ad hoc improvements based on incoming feedback.
Skylar Payne [00:18:04]: Got it. Totally makes sense. It was really interesting, I thought, to see some of the architecture you had for answering queries: first determining whether we're compliant and what we have access to, then finding the tables, and then generating the queries. It's refreshing to see something like that, because realistically most of the demos you see are text-to-SQL or something like that, where none of these real-world problems are present.
Pierre Gerardi [00:18:40]: Yeah, like I've mentioned, the biggest success for us really came when we used the data catalogs, because they use some specific language at the port which is not typically known by an LLM. Really prompting the agents with the concepts of how language is used there was a huge improvement in the system by itself. So I can really suggest to everyone: first take a look at what the actual case is, what the data looks like, whether you have some descriptions. I think that should already take you a long way.
Skylar Payne [00:19:16]: Totally, totally awesome. Is there any way folks can connect with you if they want to follow up with other questions?
Pierre Gerardi [00:19:27]: They can definitely add me on LinkedIn. It's just typing in my name and you will get there.
Skylar Payne [00:19:34]: All right, you heard it here folks. Go find him on LinkedIn. Pierre Girardi. Cool. Well, thank you so much for your time. So great to see, very kind of like in depth real world application of an agent. So really love to see it. Can't wait to hear from you again when we have some more learnings from this.
Pierre Gerardi [00:19:58]: Perfect. All right, thanks for having me. Bye Bye.
