MLOps Community

Cracking the Black Box: Real-Time Neuron Monitoring & Causality Traces

Posted Jan 27, 2026 | Views 12
# EU AI Act
# Regulations Compliance
# Tikos

Speakers

Mike Oaten
Founder, CEO @ TIKOS

Mike Oaten serves as the CEO of TIKOS, leading the company’s mission to progress trustworthy AI through unique, high-performance AI model assurance technology. A seasoned technical and data entrepreneur, Mike brings experience from successfully co-founding and exiting two previous data science startups: Riskopy Inc. (acquired by Nasdaq-listed Coupa Software in 2017) and Regulation Technologies Limited (acquired by mnAi Data Solutions in 2022).

Mike's expertise spans data, analytics, and ML product and governance leadership. At TIKOS, he leads a VC-backed team developing technology to test and monitor deep-learning models in high-stakes environments, such as defence and financial services, so that they comply with stringent new laws and regulations.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

As AI models move into high-stakes environments like defence and financial services, standard input/output testing, evals, and monitoring are becoming dangerously insufficient. To comply with the EU AI Act, the NIST AI RMF, and other requirements, MLOps teams need to access and analyse the internal reasoning of their models.

In this session, Mike introduces TIKOS's patent-pending AI assurance technology, which moves beyond statistical proxies. He breaks down the architecture of the Synapses Logger, which embeds directly into the neural activation flow to capture weights, activations, and activation paths in real time.


TRANSCRIPT

Mike [00:00:00]: But yeah, I don't see the regulations as a barrier. If you get it right, they're there reflecting what good practice should be anyway. So think of it like that: these are the expectations people have of good systems that integrate with society. Get that right, and we should be able to move along a bit quicker.

Demetrios [00:00:25]: Let's talk regulations compliance. We're gonna make it fun today and hopefully help all of the engineers listening understand what is important.

Mike [00:00:40]: You have the hidden sexiness that lies under the surface of compliance regulations.

Demetrios [00:00:47]: Yes. So be careful, listener. You may fall in love after this episode. Why and what have you been focusing on in the regulatory space recently, Mike?

Mike [00:01:03]: We've been pretty much solely focused on the beast that is the EU AI Act, which is a piece of horizontal legislation. So it's supposed to be applied across the whole territory of the EU, regardless of what business you're in or what your activities are. And it's kind of the tip of the iceberg, because the Act is a piece of legislation. It lays down: this is what you've got to do, these are the things we expect you to achieve, the standards, really, to not get into any trouble. But that's only the start of the story, because hidden away in dark rooms around Europe for many years have been lots of experts working on what's known as harmonized standards.

Mike [00:01:49]: So this is the kind of implementation layer under the Act, and there are thousands of people that have been involved in that process over many years, and they've been writing these standards, which actually are full of real gold. They tell you what to do if you need to comply with the most stringent rules, say for a system that's supposed to be unbiased and fair. Okay, everybody acknowledges that's somewhere you'd like to get to, but what does it really mean in practice? So you've got all of these underlying documents that give you a real paint-by-numbers, nearly, walkthrough of what you should focus on in terms of your systems and your models, and if you can match the standards, you've got a presumption of conformity with the actual Act. So our product is basically reverse engineered against the EU AI Act, but actually it's reverse engineered against this whole raft of harmonized standards which sit below it.

Demetrios [00:02:49]: Okay, so if I'm understanding this correctly, you're saying there's the EU AI Act, which is very fuzzy and interpretable by lawyers; it can go many different ways. But there's also a group of folks across Europe trying to say: here are numerical and tangible ways that we can put this legislation into practice, and we can cover our ass so that if for some reason it seems like our AI is acting biased, we can say, well, we followed all of these steps we were told are best practices.

Mike [00:03:36]: That's right, yeah. So you've got this phrase, presumption of conformity, in this world, which means if you can demonstrate that you followed the standards, then you will automatically be compliant with the Act. All the risk teams in big businesses get very hot under the collar about the risk of all this stuff: regulatory fines, reputational damage, all that kind of thing. So for an MLOps team or a dev team trying to get this stuff into production for business value, you've really got a roadmap. As long as you can follow the instructions in the standards, you don't have to stray from that. Essentially, if you follow it, your risk team will be happy because you've got this presumption of conformity. There's still some interpretation from those standards.

Mike [00:04:23]: It won't apply cleanly to every single case there could possibly be. So you have to interpret it and say, all right, this is my reading of how it would apply in our particular case. There's still work to do with the risk team to define exactly what these testing, evals, and monitoring systems need to do. But the gap's pretty small; most people who are familiar with the space would be able to spec out what those tests should be to make sure that you conform. But yeah, I think it's just like gold, really.

Mike [00:04:58]: Nobody really knows about them. You just find these standards. I mean, one of the things is that you generally have to buy them, so they're not freely available, because they get produced by standards organizations like ISO or, here in the UK, by BSI, the British Standards Institution, which license out all of their know-how in these documents.

Demetrios [00:05:19]: And you mentioned words like evals and observability. How does the way they interpret observability differ from, like, mine? In my head, we're just observing the systems: is the traffic going through or not? Are we predicting things or not? How are the predictions coming, are they drifting, et cetera, et cetera. What kind of observability do you think about?

Mike [00:05:55]: It's more about... okay, so I think the starting point for most engineers doing observation tasks is: we want to make sure our systems work well all the time, we're aware of the failure modes, we patch for that, we work it in and try to fix it. Whereas from the perspective of complying with regulatory demands like the EU AI Act, and there are loads of others in different places with different flavors, they tend to all have the same types of themes in them, and they're coming at it purely from a risk perspective. So the idea there would be: right, we've done an audit of our inventory, all our AI systems and the models inside them, and, let's say you're deploying a system into the EU, we then check whether any of those systems are in fact classified as high risk. Because there are prohibited systems you're not supposed to use at all, then you've got the high-risk ones, and then you've got literally everything else, and everything else kind of gets a pass.

Mike [00:06:59]: It's like, well, if it works, it's fine, it's not going to hurt anyone. The ones in the high-risk category tend to link to whether you can be discriminated against, your basic human rights. So you're not getting a loan because you're this type of person rather than that type of person; that would fall foul of it. Or it's a system which already needs some kind of conformity from a product safety angle. If it's an AI system that's in a car, in a lift, in a toy, in some machinery, then automatically it's classed as high risk, because it already needs to prove it's safe without an AI element. With an AI element, you've got to apply the same kind of stringent rules to it. So all of the thinking around observation then becomes: what's risky here, and therefore what have we got to track? If you take the example of a loan application that might be biased against some applicants, the risk is that you are potentially going to cut across those human rights measures.

Mike [00:08:07]: So then you identify: okay, we've got a system here which is going to be caught by the Act, because it potentially could be unfair to certain people. Then you've got to build your monitoring, tests, and evals around trying to catch that. The status quo is: does it work well, i.e. we don't want a biased model. But in this case the demands of making sure that it isn't biased go up several notches, so there you'd have to engineer to a much higher level of granularity. And on that particular point, our secret sauce with our technology is really around mechanistic interpretability.

Mike [00:08:44]: We're getting inside models when they're running at inference time and pulling out those causal chains of decisions, so we can have a kind of ground truth of what's actually running in the model when it spits out an output to say, oh, you get a loan or you don't. And then we build tooling, observation and testing tooling, on the back of those raw traces.

Demetrios [00:09:05]: So this may be a bit naive, but if I am sending everything to a model provider like Anthropic or OpenAI, am I not offloading the risk onto them? Or is it like, oh no, they're a black-box system and should be treated as such, and so it's much more dangerous if I have them as part of my flow?

Mike [00:09:35]: You inherit whatever their regulatory risks are, but you have the disadvantage that you can't see it. Sure, they'll produce model cards for all their models, and you can do some due diligence around that and some testing, some drag-race type testing with these kinds of metrics. But those are only input/output type tests. That world is pretty sophisticated, but it won't necessarily comply with the highest demands of the EU AI Act. So in those cases, the risk assessment when you're planning that system might be: okay, we don't want to inherit a closed commercial model where we can't get access to the internals, so we use an open-weights model.

Mike [00:10:20]: So you've then got a kind of performance decision to make. If we get an open-weights model, we have to make sure that it's going to do the job well, and there's often a trade-off between those things: what's the risk of the system falling foul of compliance and the regulations, and what are the downsides of all of that, versus what's the performance hit we might take if we have to use an open-weights model. But there are more and more people considering that, because it gives you more insight into the internal operations of the model, you get more control over it, and you get more flexibility around fine-tuning and sort of RAG-based systems, that sort of thing. So for LLMs in very high-stakes situations like defense, financial services, healthcare, there's a move, I wouldn't say wholesale, but there's a growing movement to take the open-weights models, because you've got this level of control and you can then prove to the regulators what you're doing with them to a much higher level of granularity.

Demetrios [00:11:26]: And so I remember back in the day, when we used to call it machine learning, there were some folks from the financial services world who would come on the podcast and talk about how every time they released a new model there was a whole rigmarole of a process they had to go through to get it approved. And then there were things they could do once a model was approved and able to go out into production, but, like, if they needed to retrain it or things like that. I'm wondering if and how that world has changed since the advent of LLMs.

Mike [00:12:13]: Not a lot for financial services, that's my understanding. I mean, we've got two target markets, financial services and defense, and we're having early conversations with clients in both of those sectors in the UK, so I can speak more specifically about the situation in the UK. There has been a tendency to lean on and use more linear systems like gradient-boosted models and random forests and that kind of thing, and they've become very sophisticated at doing things like fraud and credit decisioning. But those are inherently more explainable and interpretable because they are not opaque models.

Mike [00:13:01]: So they're more linear. Basically there was a branch in that decision way back, to say: okay, compliance tells us we have to be able to stand behind the decisions to a certain level of understanding, so this whole class of models, all deep learning, is off the books because it just won't ever give us that. So they've gone this other way historically and have made those models very, very sophisticated, and they do their job very well. But there's now, I think, a drive to say: okay, we're missing out here, we're leaving value on the table. If we had a better model, then we should be using it. So let's try and make that work. And the regulations in that space...

Mike [00:13:48]: There's, I'll sound like a bit of a geek now, SS1/23, from the PRA, the Bank of England's Prudential Regulation Authority, who have basically put a whole piece in around explainability, and they're saying LIME, SHAP, input/output feature-relevance type metrics aren't really good enough. You're getting a variety of correlation-type insights, but you're not getting causality, so you don't know where and when these things might go wrong or what's going on inside. And because of that same argument around using language models, I think they are used, but the risk teams are very cautious about where they're being deployed. Back-office stuff, parsing a load of PDFs and doing some structured output into some other part of the system, is kind of okay; it's just admin-type processing.

Mike [00:14:40]: But if it gets close to the regulatory sensitive areas, like customers and bias, fairness, that sort of thing, then they get much more highly scrutinized.

Demetrios [00:14:53]: Yeah. I was fascinated by the financial sector back when LLMs came out. I spoke to one of the lead machine learning engineers at AngelList, and he came on the podcast and told me how he was using them. He was like, dude, LLMs are amazing. They're much better than any of the models I built myself, and it's way faster to get into production because I just have to hit an API versus figuring out the whole platform side of things. But I was kind of looking at him like, so how are you dealing with the compliance side of this? Because it's in the financial sector, and it does feel like that is a dangerous area; you're kind of walking on thin ice the whole time.

Mike [00:15:50]: Yeah, yeah. A big driver, I think, is this sort of race dynamic. Boards and people who make investment decisions around it are in a race, aren't they, against the next guy and the next girl. If you were overly prudent and you just didn't invest R&D into this, it could be a company killer. So the boards are, I think, quite scared: we need to be in the race, but we need to do it in such a measured way that we're very aware of what the risks are, the compliance risks, and just the business risks of falling behind. So they've been recruiting, I think, quite a lot; especially in the UK, a lot of financial services businesses have been beefing up their responsible AI, AI ethics, AI committee type staff.

Mike [00:16:39]: Because historically it's not necessarily a horizontal decision team. You've got your risk team, your business team, your IT or dev team or whatever, and the AI challenge in that sector kind of sits across all those things. So they've been recruiting to form forums which can get the expertise into shaping: what do we invest in, what risks are we happy to take, what kind of goals are we setting, and what's the progress against those goals vis-à-vis their competition.

Demetrios [00:17:17]: All right, y'all, real quick, let me talk to you about Hyperbolic's GPU cloud. It delivers NVIDIA H100s at $1.69 per hour and H200s at $1.99 per hour, and this is with no sales calls, no long-term commitments or hidden fees. You can spin up one GPU or scale to thousands in minutes with VMs, bare-metal clusters, and high-speed networking. You've also got attachable storage, and you only pay for what you use, up to 75% less than legacy providers. And oh yeah, by the way, if you need steady production-grade inference, you can choose dedicated model hosting with single-tenant GPUs and predictable performance without running your own hardware.

Demetrios [00:18:10]: Try now at app Hyperbolic AI. Let's get back into this show. Yeah, talk to me a little bit more about this stakeholder alignment and how you've been seeing technology help in that regard because it does feel like what you're trying to do is help bridge a people issue with tech in a way or just make people's lives easier to communicate against certain standards.

Mike [00:18:44]: Yeah, it's probably the biggest challenge we have, I think, because there isn't an established sales process; what we're selling isn't really in a category yet. So we're still trying to work out who all of those key decision makers are and how we corral them into a decision-making group. It's not easy. And I think the main challenge is that historically, inside a business there's a business need: we want to invest in a system to make it happen. They'd deal with the tech team, scope it out, build it, and then at some point later the risk team, the GRC team, would come along and be seen as the fun police, basically, and say, oh no, prove this model against these metrics. And then the scope of the project could shrink, or a load of tech debt builds up because you've got to start wrapping everything in loads of additional tests if you change any component.

Mike [00:19:49]: So I don't think historically the cultures between those teams have been like, let's just go down the pub for a couple of beers at the end of the day. That's my take on it anyway. So we're trying to work out: how do we get into that world and make these people love each other dearly? What can we give both parties that is helpful to the other? That's basically been the strategy.

Mike [00:20:11]: So, for the tech teams, we're trying to produce materials which explain in absolute, definite detail: you will eventually need to make a system of this type comply with the following, and this is exactly the walkthrough in the regulations of why you've got to do that. That's the justification for including that particular thing in your testing regime, or in the product or the service, basically. And then we tell them: whilst it's all very boring, when you show that to your risk team, they're just gonna throw their arms around you. Because the problem the risk team has is exactly the other way around.

Mike [00:20:53]: They come in and they don't really understand the nuance of the tech, but they do understand what those systems have to achieve. So we give the other party the secret sauce, basically all the keys to the kingdom. And then when they both meet up, they've got this shared ground. Even though they might not internalize all the details, they're armed with the right information to say: well, we put these two things together and we're going to get where we need to go. Because I think the frustration of building stuff that doesn't get into production is nearly taken for granted. Oh yeah, we're working on something, and we've got X amount of projects on, and this many are in prototype, and this many got this far, and this many got that far. And it's like...

Mike [00:21:39]: Can we actually get it through and push it to production? Everything which doesn't make it but still has a valid business case is some cost, isn't it? Apart from what you might learn along the way, it's waste and redundancy. So if you can get rid of as much of that as possible and get your prototype-to-production conversion rate higher, then everybody's going to be happy, because as builders you see the stuff you build go out into the world and do good things, and you don't have it knocked down or re-scoped as the project goes along because of this ugly, spiky compliance issue that keeps coming in.

Demetrios [00:22:20]: Well, it does feel like those two sides of the business, these different stakeholders, are almost at odds with each other; there are different incentives and different goals there, so they're going to have that friction. But I would love it if you could walk me through, since I've never been inside one of these companies that is in a highly regulated space and putting AI into production, what it is that the technical team now needs to do. I know you mentioned having a certain set of tests, and I'm guessing there are special kinds of documentation that hit certain criteria. But can you give me a concrete use case and then the concrete things they would need to do?

Mike [00:23:19]: Yeah, okay. We've got a project we're scoping at the moment in the defense sector, which is for an autonomous surface vessel. It's going to be in the class under 24 meters. These are near-shore vessels doing survey work or whatever they're doing; they're out there, there's nobody on board, and they've got to not hit anything. That's the main thing: don't hit anything.

Mike [00:23:48]: So to scope that project, on day one we have to understand what the domain of operation is. Once you spec that, you then spec the prevailing regulations that this system, once it's been deployed, needs to comply with, of which there are several. There are the Maritime and Coastguard regulations, which have AI sections in them. There's the MoD's internal version of that. There are the rules of the sea; there's lots of it. If there were people on that vessel, they would have to comply with all these regulations. You take the people off, and now it's autonomous, and it still has to comply with all of these things.

Mike [00:24:36]: So sensing, visibility...

Demetrios [00:24:42]: Help me understand, because if there were people on the boat they would need to wear life preservers and they would need to have horns that they honk. But if it's autonomous, what is it on the level of compliance that they need to be doing besides just not hitting something? I'm trying to connect those dots. Does the code need to comply with all of these different regulatory bodies' idea of what autonomous code should look like?

Mike [00:25:17]: No, it's more about what the capabilities of the vessel are, what it's trying to do. So you specify its domain of operations: when we deploy this autonomous vessel, we are going to operate within a certain area. We are going to report our status back to some reporting station every X seconds or minutes, whatever. We are going to alert other seafaring vessels to our presence. We are going to scan through the following sensors: radar, lidar, sonar. We're going to ingest GPS data from other vessels.

Mike [00:25:59]: So you work out what it's going to do. It's very much just like speccing any other system; it just happens that the system is a boat on the sea. And then, once you know what your domain of operation is, you isolate off the sub-parts of that system. So you say: right, for the bit which is going to be tracking the GPS of other vessels, what does that sub-component of the system do? What are the autonomous bits, and how do those autonomous bits relate to the prevailing regulations? The reason you need to do all of that scoping work right at the beginning is that it literally leads the rest of the project, in this case because it's such a safety- and security-focused project. It's really that kind of mindset: you have to do all of that scoping and translation of the regulations, and then you map what your automated systems are doing. Now, you've got lots of electronics on boats anyway.

Mike [00:26:57]: So those things are covered by safety cases and safety tests and certifications and inspections. You're just mapping that kind of world onto whatever autonomous, AI, or agentic-type functionality sits within it. But that's the kind of world, compared to: build something, is it a good model, is it great at predicting A, B or C, then ship it and just check it works all the time. When you get into these high-risk scenarios, it literally is a wholesale mindset change: you don't build a damn thing, and at least the first third is all scoping, checking, compliance work, and scoping to fit the rules. Yeah.
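
To make the scoping exercise Mike describes a little more concrete, here is a minimal, hypothetical sketch of what that mapping might look like once written down: each autonomous sub-component, what it does, and which prevailing regulations it has to be assessed against. The subsystem names, inputs, and regulation labels below are illustrative assumptions, not details from the actual project.

```python
# Hypothetical scoping map for an autonomous surface vessel project.
# Subsystem names, inputs, and regulation labels are illustrative only.
OPERATING_DOMAIN = {
    "vessel_class": "autonomous surface vessel, under 24 m, near-shore survey",
    "subsystems": {
        "collision_avoidance": {
            "inputs": ["radar", "lidar", "sonar", "GPS/AIS of other vessels"],
            "autonomous": True,
            "regulations": [
                "rules of the sea (collision regulations)",
                "maritime and coastguard AI provisions",
                "MoD internal AI assurance rules",
            ],
        },
        "status_reporting": {
            "inputs": ["position", "health telemetry"],
            "autonomous": True,
            "regulations": ["reporting-station requirements"],
        },
    },
}

# Each autonomous subsystem then gets its own safety case, tests, and evidence
# mapped against those regulations before any model is built.
for name, spec in OPERATING_DOMAIN["subsystems"].items():
    print(f"{name}: assess against {', '.join(spec['regulations'])}")
```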

Demetrios [00:27:38]: Because you could end up burning a whole lot of money if you don't go that way.

Mike [00:27:46]: Yeah. And I suppose our business is predicated on the fact that, whilst there aren't that many systems which get caught by the EU AI Act at the moment, it is the gold standard, because people have been working on it since the late teens; there are thousands of experts that have fed into the harmonized standards library that sits below it. So even if you're not caught by it, if you're taking a belt-and-braces, copper-bottomed approach, we don't want unknown unknowns cropping up in our risk as a business, we want to ship the most responsible and trustworthy system we possibly can, then you'd need to hit these sorts of standards anyway. In which case the working practice starts to look more like I just explained: loads of scoping and loads of checking. So our business is trying to take that pain away, basically. We can plug into all these different rules and regulations, and then the metrics can map quickly to all of these specifics.

Mike [00:28:49]: And then when you're building, you know that you're going to be compliant with these higher standards without all that pain.

Demetrios [00:28:56]: Cool. Talk to me more about the product then. Do I come to the product and say I'm building an autonomous boat, and what do I need to know? Like, how do I interface with the product itself so that it can give me the right regulations and map to the standards I need to hit?

Mike [00:29:20]: Okay. So the first bit's easy: the integration. We give you an SDK and that hooks into your deep learning model. So you've basically got to have your own model that you can get into, one you developed yourself, or you've got access to an open-weights model. The SDK has some boilerplate in there that just hooks in when you run that model, and we then pull out the inference data, that kind of raw vector-style data. What happens next depends on the deployment. In lots of these high-risk scenarios, they want their hands completely around the data governance piece.

Mike [00:30:03]: They don't want any data going missing. So on our product roadmap, we don't do this at the moment, but it's basically a completely deployable SaaS solution: you put it in your world, in your client tenant, integrate it with your models, and the data doesn't go anywhere; it's just processed by our kit in your world. But at the moment we're not there, so the inference data comes to us, that raw inference data. We then do a couple of proprietary data transformations on it.

Mike [00:30:33]: Information minimization is basically the first step: we squeeze out all of the noise and hang onto the signal. So every time the model runs, we get an inference log for that particular output and store it. We then do this information minimization, and we end up with a trace, basically a causality trace of what's happened inside, a representation of what happened inside that deep learning model when it ran at that particular time. You can then run it in your tests 10,000 times and we'll end up with 10,000 cases, and then we start to do post-processing on those to analyze them and work out, against certain metrics, what is good and what is bad.
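
As a rough illustration of the capture-and-minimize flow described here, the sketch below uses PyTorch forward hooks to log a compact activation signature for every inference. The class name, the choice of hooked layers, and the top-k reduction are illustrative assumptions, not the Synapses Logger's actual implementation.

```python
# Minimal sketch: capture per-inference activations with forward hooks and
# reduce each layer's output to a compact signature ("information minimization").
# Names and the top-k reduction are illustrative, not TIKOS's actual method.
import torch
import torch.nn as nn


class ActivationTraceLogger:
    def __init__(self, model: nn.Module, top_k: int = 8):
        self.model = model
        self.top_k = top_k      # keep only the strongest activations per layer
        self.trace = {}         # layer name -> (indices, values)
        self._handles = [
            module.register_forward_hook(self._make_hook(name))
            for name, module in model.named_modules()
            if isinstance(module, nn.Linear)      # hook whichever layers matter
        ]

    def _make_hook(self, name):
        def hook(module, inputs, output):
            flat = output.detach().flatten()
            k = min(self.top_k, flat.numel())
            _, indices = torch.topk(flat.abs(), k)
            # Store a compact signature instead of the full activation tensor.
            self.trace[name] = (indices.tolist(), flat[indices].tolist())
        return hook

    def run(self, x: torch.Tensor):
        """One inference -> (output, trace); the trace is what gets stored and audited."""
        self.trace = {}
        with torch.no_grad():
            output = self.model(x)
        return output, dict(self.trace)

    def close(self):
        for handle in self._handles:
            handle.remove()


# Usage: every call produces an output plus a per-run trace to log.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
logger = ActivationTraceLogger(model)
output, trace = logger.run(torch.randn(1, 16))
```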

Mike [00:31:17]: The goal for a user is to define a golden set: in this system, when we build it, normal and expected use looks like this; for inputs like this, we expect the outputs to look like this. That's gone through other types of accuracy testing you might have done. Once you're happy with that, you run a load of test cases, you hook our kit in, we drag out the data, and then we do different analyses over the top to find out whether the runs are robust, accurate, unbiased, whether the transparency standards are there, that kind of thing. It depends what specific metric we're testing against, basically.

Mike [00:32:05]: Basically, you end up with an in-profile set and an out-of-profile set. So, normal use: look at the trace, fine, this one's fine because it's in profile, great, let it go. Next one comes through.

Mike [00:32:16]: Right, there's something wrong here. What's that? Oh, let's say there was a prompt injection against an open-weights LLM; there was something weird with the input that meant the internal activations in the model were just squiffy. We can classify what squiffy looks like for certain things, and then we can say: oh, that looks like a prompt injection, so we can halt the process at that point. In a live deployment we'd be able to gate that output, because we say: look, red flag, there's something up here, because it's out of profile.
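
Here is a minimal sketch of that in-profile / out-of-profile idea, assuming traces have already been reduced to fixed-length numeric signatures per layer. The per-layer mean/std profile and the z-score threshold are illustrative stand-ins for whatever analysis the real product applies.

```python
# Sketch of golden-set profiling and gating; the statistics used are illustrative.
import numpy as np


def build_profile(golden_traces: list[dict[str, np.ndarray]]) -> dict:
    """Summarize the golden set as a per-layer mean and std of trace signatures."""
    profile = {}
    for layer in golden_traces[0]:
        stacked = np.stack([trace[layer] for trace in golden_traces])
        profile[layer] = (stacked.mean(axis=0), stacked.std(axis=0) + 1e-8)
    return profile


def out_of_profile(trace: dict[str, np.ndarray], profile: dict, z_max: float = 4.0) -> bool:
    """Flag an inference whose internal signature strays too far from normal use."""
    for layer, (mean, std) in profile.items():
        if np.abs((trace[layer] - mean) / std).max() > z_max:
            return True
    return False


# Usage: gate a live output when its trace looks anomalous,
# e.g. the squiffy activations a prompt injection might produce.
golden = [{"layer1": np.random.randn(8)} for _ in range(1000)]
profile = build_profile(golden)
candidate = {"layer1": np.random.randn(8) * 10}   # hypothetical anomalous trace
if out_of_profile(candidate, profile):
    print("Red flag: out of profile, hold this output for review.")
```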

Mike [00:32:49]: Yeah, so that's the data bit. And then we've got a SaaS front-end platform which does the dashboarding: load up your models, configure your tests, report on them in graphical format, and hang on to all the source logs. So that's how it works. Then your question was about how you tune that to the specific needs of various regulations. We're building it to fit all of those requirements for the EU AI Act, and the difference between those and, let's say, the NIST AI RMF, the American AI Risk Management Framework, is more that in that case it's not the law.

Mike [00:33:30]: It's recommended best practice, but there's so much crossover, because they've had the same people working on the principles: accuracy, safety, security, they all crop up in the same kinds of ways. Very often it's only a small tweak: what is robustness through the eyes of the EU AI Act versus robustness through the eyes of the NIST AI RMF? They're very, very similar, so, little tweaks. These are like a layer that sits on top. So, to summarize, there are modules. If you want to comply with, say, the FCA in the UK, the financial regulator, they've got a whole suite of "your system must do this and must not do this." Basically we're building modules so that, as a user, you just choose what you need and plug them in.
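
To illustrate the module idea, here is a small sketch of how regulation-specific modules could sit as a thin layer over shared underlying checks. The regulation names are the ones mentioned above; the specific checks and thresholds are hypothetical.

```python
# Sketch of regulation "modules" as a thin layer over shared checks.
# Check definitions and thresholds are hypothetical.
from typing import Callable

CHECKS: dict[str, Callable[[dict], bool]] = {
    "robustness": lambda report: report["out_of_profile_rate"] < 0.01,
    "bias":       lambda report: report["max_group_disparity"] < 0.05,
    "logging":    lambda report: report["traces_retained"] is True,
}

MODULES: dict[str, list[str]] = {
    "eu_ai_act":   ["robustness", "bias", "logging"],
    "nist_ai_rmf": ["robustness", "bias"],     # same metrics, slightly different scope
    "fca_uk":      ["bias", "logging"],
}


def evaluate(framework: str, report: dict) -> dict[str, bool]:
    """Run only the checks the chosen module requires."""
    return {name: CHECKS[name](report) for name in MODULES[framework]}


# Usage with a hypothetical analysis report produced from stored traces.
report = {"out_of_profile_rate": 0.002, "max_group_disparity": 0.03, "traces_retained": True}
print(evaluate("eu_ai_act", report))
```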

Demetrios [00:34:20]: Yeah, what I didn't understand is: you have all of this rich data, and then you have this compliance, or these modules as you're calling them. How do you map those two together? How do you connect those two worlds?

Mike [00:34:39]: Okay, so take bias as an example. I'll paraphrase, but the law basically says: when you do a risk assessment of your system, if you think, going back to the loan example, that your system could cut across basic human rights, meaning you're being discriminated against because of your sex, gender, race, whatever, then that system falls under the bias requirements in the EU AI Act. In practice, what it means is you have to build a system which can identify when bias might be creeping into your system. So how do you do that? Well, if you're using a deep learning model as a classifier in some part of your system, that can be tested. This is a separate part; we can do this or the client can do it. You could use a fairly standard assessment to say: okay, is this potentially biased, is there any cause for concern around this model as it's operating at the moment? If yes, then our systems can go and do this internals analysis, which can report any individual output from the model which is out of scope. So you have your in-scope set: these are all normal and unbiased, because you've got a golden set from which we can get a kind of thumbprint, you know what I mean? A fingerprint of that behavior.

Mike [00:36:05]: So we know the profile of what normal looks like. And then if that model runs a loan application decision that, compared to the normal profile around these features, gives us something out of scope, that indicates the model has now fallen foul. You have to have logging at that granular level, because your risk management of that system says: to be compliant with the Act, we must be aware every and any time that the model could be biased. The only way to do that is to get that information at inference time and make this calculation of whether it's in scope or out of scope. And if it's out of scope, and that output goes as far as the customer, you have to save all that stuff. And because you never know when it's wrong or could be wrong, you've got to save the logs for every time that system runs in production. Every time, for every model.

Demetrios [00:37:06]: Because you need that auditability, right? To backtrack.

Mike [00:37:09]: Yeah, exactly. So having that chain of custody all the way back down to the internal activations inside a model is necessary, because you will have internal audit, which is your third line of defense in financial services, in this example. They have essentially nothing to do with the operations of the business; they walk in like internal police: right, we're going to test your systems. They need access to this stuff. External regulators need it, either for doing investigations or because you report this kind of stuff to them, usually in aggregate. And, probably most pertinently, any end user, a citizen who gets an output from one of these systems, has grounds for contestability and redress.

Mike [00:37:51]: So if they think, I'm not sure this is right, I feel something's wrong here, I'd like to ask a question. The comparison with GDPR would be a subject access request: send me all the data you have on me, because I just don't like what you've got. And then you can ask them to remove it; you've got the right of removal under GDPR. In the EU AI Act it's the same kind of thing: you can contest it.

Mike [00:38:18]: And the only way to contest it is to get all the information and have it presented back to you in an explainable format. So it's no good, as a bank who's denied me a loan, sending me a load of logs with vector matrices or whatever and going, there you are, that's why we couldn't give you the loan. You then have to build systems which explain, in that particular case, what the features in the model were and what their characteristics were as the input data, basically why it was right or wrong. So you have to build a load of explainability kit. It's no good just having the raw information; you have to take it to this explainability level.
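
As a small illustration of that last step, here is a hypothetical helper that turns stored evidence into a plain-language explanation rather than raw vectors; it assumes per-feature attributions have already been computed and stored alongside the trace.

```python
# Hypothetical sketch: render a contested decision in an explainable format.
# Assumes feature attributions were computed and stored with the inference log.
def explain_decision(decision: str, attributions: dict[str, float], top_n: int = 3) -> str:
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    reasons = "; ".join(
        f"{feature} ({'counted towards' if weight > 0 else 'counted against'} approval)"
        for feature, weight in ranked[:top_n]
    )
    return f"Decision: {decision}. Main factors: {reasons}."


# Usage with made-up attributions for one loan application.
print(explain_decision("declined", {
    "income_to_debt_ratio": -0.42,
    "months_at_address": -0.11,
    "credit_history_length": 0.08,
}))
```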

Mike [00:39:00]: Yeah. So the logging is where it all starts. You have to be able to grab everything, otherwise you can't do everything else. And because you never know when something might be wrong, you've got to save it.

Demetrios [00:39:11]: Are you creating different data sets, or more data or less data, depending on the use case or the module you're talking about? Are you logging different things, are you doing it differently? Or is it that you just capture all the data you possibly can, and then depending on the module you apply it one way or the other?

Mike [00:39:33]: You can do either, as long as we save those traces. You get the raw inference data, that's step one. Then we do the data transformation into our own proprietary data structures, and we save those. We don't need to do the next bit if the client doesn't want it. If they say all we want here is a complete system of record, then every time the model runs we just save it, and we know that if we get a complaint from a customer, or a regulator puts in a request to see XYZ, we can pull that from storage and run the analysis over it. So they don't necessarily have to run all of this stuff in real time as a monitoring thing.

Mike [00:40:11]: It can be saved and stored. This is a decision for the client, in line with their risk management approach, really. The belt-and-braces approach is that you'd have everything monitored live, with reporting on all of these sensitive areas live and gating the answers live. That's a huge overhead and a big engineering challenge to make work. Or you take a risk-based approach.
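
A minimal sketch of the system-of-record option Mike describes: persist a compact trace for every run cheaply, and only pull records back out for analysis when an auditor, regulator, or customer asks. The file layout and field names are illustrative assumptions.

```python
# Sketch of "save everything, analyze on demand"; schema is illustrative only.
import json
import time
from pathlib import Path

LOG_PATH = Path("inference_traces.jsonl")


def record_inference(model_id: str, trace: dict, output: str) -> None:
    """Append one compact trace per model run; no heavy analysis happens here."""
    entry = {"ts": time.time(), "model": model_id, "output": output, "trace": trace}
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")


def fetch_for_review(model_id: str, since_ts: float) -> list[dict]:
    """Pull stored traces back out when a complaint or regulator request comes in."""
    with LOG_PATH.open() as f:
        return [
            entry for line in f
            if (entry := json.loads(line))["model"] == model_id and entry["ts"] >= since_ts
        ]


# Usage: log at inference time, analyze later.
record_inference("loan-classifier-v3", {"layer1": [0.1, -0.4]}, "declined")
print(len(fetch_for_review("loan-classifier-v3", since_ts=0.0)))
```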

Demetrios [00:40:36]: Why is it a huge engineering overhead?

Mike [00:40:39]: Well, you might have models that are running, I don't know, hundreds of thousands of times a day, so think about the amount of data being collected and then processed. No, that would be great for us, because it's a huge amount of work and therefore a very, very big contract. So it's not insurmountable, but you have to optimize quite a lot of the data handling, because otherwise it would become overwhelming quite quickly. That's part of our value proposition, I guess: being able to make it more manageable than it would be if you tried to build something like that yourself.

Demetrios [00:41:13]: Dude, this is fascinating to me, because it is a whole world that I do not play in at all. And living in Europe, I just hear people complaining about the EU AI Act, because back in the day it was like, oh, we're not going to open source models to Europe, that's just a pain in our ass. And so we didn't get, I think, Llama 3 when it came out; it wasn't open sourced in Europe or something. So I have seen that side of it. I haven't seen the in-depth side of it that you're talking to me about right now.

Mike [00:41:52]: Yeah. I come from a free-marketeer kind of angle, and I always think regulations anywhere are just pain and you want to just let the market decide. But I think that because this current wave of AI has hit society and become a very immediate thing, you have to think about the socio-technical side. It's not just technology and it's not just society; these two things are blending together quite tightly and quite quickly, so in that space the regulations are just reflecting, or they should just reflect, what everybody wants. Right? I don't want to be interacting with a system which is going to be difficult to understand, or could be unfair to me, or could disadvantage me, or which, if I wanted to ask why I got an answer from it, was unable to tell me. I just think it's common sense that, whether you call it regulations or whether you say this is just the expectations of stakeholders, the users and the system owners and the companies and all the rest of it, we should aspire to make these things work like deterministic software systems. It's really hard, because they're non-deterministic and you live in a world of probabilities, really.

Mike [00:43:08]: But we shouldn't shy away from that as a challenge. That's what we're doing; that's why our company exists. We're trying to attack that black-box issue and make it as transparent, explainable, and deterministic as possible, so that you have this trust, this acceptance that, oh, we can prove things to everybody, and then the adoption rates for this technology speed up, and that compounds. What we might be in danger of now is having all this promise and all this investment, but because you can't cross this trust bridge, everybody's on one side being a little bit cautious about it. Then you don't get widespread adoption, and you can't compound the benefits, which should be amazing for medical research; all the big problems of the world could be attacked, not necessarily solved quickly, but you could make some great progress on lots of them.

Mike [00:43:59]: But you're not going to if you can't bring everybody with you. So you need this trust to get in there somehow, and what we do is only a tiny part of that; there are lots of other things around AI literacy and other policies that will help. But yeah, I don't see the regulations as a barrier. If you get it right, they're there reflecting what good practice should be anyway. So think of it like that: these are the expectations people have of good systems that integrate with society. Get that right.

Mike [00:44:29]: Then we should be able to move along a bit quicker.

Demetrios [00:44:34]: Yeah, we can't have no hacks building our systems around here.

Mike [00:44:40]: No shortcuts, right?

Demetrios [00:44:42]: Nope. Well, dude, this has been great. Is there anything that you wanted to talk about that I didn't ask you about yet?

Mike [00:44:52]: No, I really enjoyed that. I think that was at the right kind of level. We're deeply technical; my co-founder has a ten-year AI research career and a PhD solely focused on AI trust, reasoning, and explainability, so he's been deep in that world for a long time. We could have done more technical bits, but really I wanted the opportunity to fly the flag for why regulations and compliance are a good thing. Done right, they're basically a force multiplier for good systems, not a wall to try and hop over, dig underneath, or get around the sides of.

