MLOps Community

ML and AI as Distinct Control Systems in Heavy Industrial Settings

Posted Jun 25, 2024 | Views 123
# ML
Richard Howes
CTO @ Metaformed

Richard Howes is a dedicated engineer who is passionate about control systems whether it be embedded systems, industrial automation, or AI/ML in a business application. All of these systems require a robust control philosophy that outlines the system, its environment, and how the controller should function within it. Richard has a bachelor's of Electrical Engineering from the University of Victoria where he specialized in industrial automation and embedded systems. Richard is primarily focused on the heavy industrial sectors like energy generation, oil & gas, pulp/paper, forestry, real estate, and manufacturing. He works on both physical process control and business process optimization using the control philosophy principles as a guiding star.

Richard has been working with industrial systems for over 10 years designing, commissioning, operating, and maintaining automated systems. For the last 5 years, Richard has been investing time into the data and data science-related disciplines bringing the physical process as close as possible to the business taking advantage of disparate data sets throughout the organization. Now with the age of AI upon us, he is focusing on integrating this technology safely, reliably, and with distinct organizational goals and ROI.


How can we balance the need for safety, reliability, and robustness with the extreme pace of technology advancement in heavy industry? The key to unlocking the full potential of data will be to have a mixture of experts both from an AI and human perspective to validate anything from a simple KPI to a Generative AI Assistant guiding operators throughout their day. The data generated by heavy industries like agriculture, oil & gas, forestry, real estate, civil infrastructure, and manufacturing is underutilized and struggles to keep up with the latest and greatest - and for good reason. They provide the shelter we live and work in, the food we eat, and the energy to propel society forward. Compared to the pace of AI innovation they move slowly, have extreme consequences for failure, and typically involve a significant workforce. During this discussion, we will outline the data ready to be utilized by ML, AI, and data products in general as well as some considerations for creating new data products for these heavy industries. To account for complexity and uniqueness throughout the organization it is critical to engage operational staff, ensure safety is considered from all angles, and build adaptable ETL needed to bring the data to a usable state.



Richard Howes [00:00:00]: My name's Richard. I work for CBRE right now as a product manager. I focus on data science products primarily, and my coffee is usually taken black after a long run. So hit you hard with the caffeine at the beginning of the day.

Demetrios [00:00:18]: Hey, everybody. We're back for another MLOps Community podcast episode. I'm your host, Demetrios. I'm talking with Richard today. We took a few of the learnings that he has had over the years working in heavy industry and tried to apply them to how we can build more reliable systems with AI and ML at the core. We went on so many tangents, I had a blast. But one huge takeaway is his idea of control systems and how we want to make sure that when we are working with a data scientist and they go and do their exploratory data analysis, you give them all of the most important information that they will need. You likened it to context stuffing of an LLM, really, because you are less likely to fail when whoever it is that you're working with, the other key stakeholder, in this case the data scientist, understands thoroughly what they are trying to accomplish.

Demetrios [00:01:36]: They're going to be much better at getting that outcome if they understand what it is. We've heard it time and time again, but he breaks it down eloquently. Let's get into it with Richard. And as always, if you like this one, share it with one friend or write us on Spotify. They can write in stuff now, and I'm reading all of them. Some of them are pretty funny, so feel free to comment. And I got thick skin. Say whatever you want.

Demetrios [00:02:15]: So, dude, you've got a violin back there. What's the story with that? Because, you know.

Richard Howes [00:02:24]: Yeah, I, uh, I picked it up, like, a year ago. Um, I needed to get up off the desk, right? And my daughter, she's always playing the guitar and the harmonica, so naturally started jamming children's songs to start. But it's a good start. You get the basics down.

Demetrios [00:02:45]: The violin goes so nicely with the guitar. That's not the first instrument that I would expect. You're like, hey, my daughter's jamming the guitar. You know what we need in that Bob Dylan song that she likes to play? The violin solo.

Richard Howes [00:03:03]: Yeah, totally. Let's grab a. Yeah.

Demetrios [00:03:08]: Yeah. Well, maybe. Yeah, maybe you're more on the bluegrass side of things. I guess in that style of music, you can get a fair bit of violin.

Richard Howes [00:03:18]: That's where I'd like to get into. I'm not quite there yet. They're pretty, pretty complicated compared to where my skill level is, but it's pretty fun around the campfire. You just background a few little things, you know.

Demetrios [00:03:31]: Yep, busted out. That's so good. Well, man, you're working on some pretty wild stuff with AI and ML in heavy industries. Can you break down what the state of AI and ML is right now and what you've been working on?

Richard Howes [00:03:47]: Yeah. So there's a ton of stuff going on in heavy industries. And by heavy industries, I mean, like, oil and gas, forestry, manufacturing, those things with lots and lots of equipment in there. And you see anything from the embedded type devices, like IoT doing fancy measurement of, you know, sounds to monitor safety in a workplace, to the enterprise application, where, you know, we've seen lots of chatbots, we've seen a lot of, you know, Q&A RAG systems to try and help us understand our data a little better. But the common ones out there, you know, I think almost everybody's kind of doing them. They're like preventative maintenance. Let's, like, try to predict when machinery is going to fail before it does. Right.

Richard Howes [00:04:33]: QA/QC processes around compliance for the regulatory bodies. Like, that's kind of a complicated endeavor, especially with energy and emissions and carbon reduction. The regulations change frequently, so keeping on top of that's challenging.

Demetrios [00:04:50]: Wait, what does that mean? It's basically like you're using... How does ML/AI get incorporated into that QA?

Richard Howes [00:05:00]: So, you know, first, regulations. Regulations come from legislation, right? Like from some government body, and then the regulators are usually third-party government agencies who regulate the industry. Say, like, a boiler pressure and safety vessel kind of situation. You have the auditors coming in to see if you've done the maintenance and check out your procedures and stuff like that. So it's kind of difficult with the rapidly changing areas, like carbon emissions, to keep up with policy changes. So using a simple RAG bot to help you understand the regulations and maybe queue up changes kind of helps advise you on what to do next for your policy or procedures. You know, the change management in these industries is pretty substantial. So when changes come down the line, it's usually a big to-do. Right.

Richard Howes [00:05:56]: We have a whole team of people managing that for us, so it's important to get it down to the people who are doing the work as soon as possible. So that's the QA/QC kind of thing. Quality checking, all of the compliance and stuff.

Demetrios [00:06:14]: You also mentioned embedded systems. I mean, the predictive maintenance I've heard a ton about before, because it can be so useful when you're working with machines that cost more than a yacht.

Richard Howes [00:06:27]: Yeah. And the coordination of them. Right. Like, so if you had a, like a gas plant, for example, and you're gonna do a turnaround in June, you're trying to plan what kind of stuff needs to come out and what kind of stuff needs to go back in. You want to really time your equipment replacements and updates for that one time when the equipment is down for the year. Right. If you mess that up, your plant is going to be down again, and then you're losing money. So you really want to keep on top of what needs to be replaced when and keep track of the patterns on failure.

Richard Howes [00:07:03]: Right. Most of that stuff comes from the control systems themselves, actually. So get a sense of what the performance of the equipment is.

Demetrios [00:07:13]: Can you explain control system for myself and maybe some people that are listening that might not know what it is.

Richard Howes [00:07:21]: Yeah. You see them every day. The most basic ones, like, control the temperature of your room. Right. You've got the thermostat, you've got the, you know, heating element, maybe the baseboard. You set the temperature to 20 °C, and, you know, it cycles the element on and off to get to the right temperature. That's one system. But typically in, like, a processing facility, there's tens or hundreds of those working together to give you some output. So it's kind of modular in the design in heavy industries.

Richard Howes [00:07:59]: And it's like that for a reason, because then you can test each of those control loops individually to make sure that they're working.
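The thermostat loop described above can be sketched in a few lines of Python. This is only an illustration, not something shown in the episode: the function names, the 0.5 °C deadband, and the toy room model (a heater that adds 1 °C per step and a room that leaks heat toward 10 °C) are all assumptions.

```python
def thermostat_step(temp_c, heater_on, setpoint_c=20.0, band_c=0.5):
    """Return the new heater state for an on/off loop with a deadband."""
    if temp_c < setpoint_c - band_c:
        return True   # too cold: switch the element on
    if temp_c > setpoint_c + band_c:
        return False  # too warm: switch the element off
    return heater_on  # inside the band: keep the current state


def simulate(start_c=15.0, steps=60):
    """Toy room model (assumed): heater adds heat, room leaks toward 10 C."""
    temp, heater = start_c, False
    history = []
    for _ in range(steps):
        heater = thermostat_step(temp, heater)
        heating = 1.0 if heater else 0.0
        temp += heating - 0.05 * (temp - 10.0)
        history.append((round(temp, 2), heater))
    return history


if __name__ == "__main__":
    for temp, heater in simulate()[-5:]:
        print(temp, "heater on" if heater else "heater off")
```

Because `thermostat_step` is a pure function of the reading and the previous state, it can be tested on its own, which is the point made above about checking each control loop individually.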

Demetrios [00:08:07]: One thing that fascinates me with these types of use cases is that it's almost like you mentioned, you're using AI from this point in time. It's not like it's a continuous prediction. It is, as you mentioned, we have all of this equipment that's coming offline. We need to predict which one or five or ten need to stay with us and we need to replace and the other ones that need to go out. But if I'm understanding it correctly, it's not like every day you're predicting this one might go offline tomorrow. So you don't need to build your infrastructure and your system in a way that it's constantly making predictions. Is that correct?

Richard Howes [00:08:52]: Yeah. Usually, you know, preventative maintenance is a batch prediction process. You know, whether you're doing it monthly or annually, there's stuff that you do every day, too. But, you know, when you're scheduling a big retrofit or overhaul to make sure your system's running as you expect it to be, that's usually done once a year. Hopefully just once a year, you don't want to be going down too much. But yeah, it's really important to nail down the critical components and their failure points just to make sure you're operating at an optimal space within your processes. It's a pretty substantial investment just taking things down, because you're going to be checking all the safety systems at the same time. I haven't seen an application there with machine learning and AI to help you identify what the critical components are.

Richard Howes [00:09:44]: Am I missing any? But I imagine that'll come.

Demetrios [00:09:48]: So that is the predictive maintenance on the batch schedule. You're feeding in all this stuff, trying to figure out the anomalies, you're trying to know what has happened in the past and then just decide if something needs to be upgraded or not. You also mentioned, though, there are some embedded systems. What does that look like?

Richard Howes [00:10:11]: That comes from vendors typically, like the companies that I work for. We typically wouldn't develop our own embedded systems, which would be an IoT device, a sensor. You know, if you bought a turbine, those would come with embedded systems to measure the performance of that machine. So they're getting smarter and smarter. I mean, embedded controllers themselves have AI embedded right in their architecture now. So it's, it's kind of getting out of control, not out of control. It's just interesting to see the development over time. Right.

Richard Howes [00:10:47]: Because I've seen anything from vacuum tube relay systems to fully automated dispersed systems over a giant geographical area now. So the change in control systems themselves is quite extravagant in the last 20-ish years, 30 years.

Demetrios [00:11:10]: When we had Paco on here, he was talking about the heavy industry and how he was consulting with a company that was in this field, and I was asking him about the different use cases, and he said that by far the most valuable things that they were doing with AI were ingesting PDFs and then being able to understand the paperwork better. And he was like, it's much more contrary to what you would think. Yeah, you would think that, yeah, this predictive maintenance, or maybe there's this TinyML going on and you're doing some cool stuff with the machinery, but it was more valuable to get all of these PDFs ingested and then get some computer vision on them than it was to have crazy stuff going on with the big machinery. And I'm guessing that's because the machinery is very expensive and it's very high risk if you're doing anything on there that could potentially mess up. Is that how you look at it?

Richard Howes [00:12:14]: Um, sort of. I think it's reliability is a big one and we're hitting that one pretty hard with the quality drum lately. It's, we need to know what the control systems are and the heavy industry has a pretty decent way to document these systems so that everybody understands it. Right. Like the electrician, the plant operator, the plant manager, they can look at the set of documentation they've developed and understand the system for their use.

Demetrios [00:12:45]: Nice.

Richard Howes [00:12:45]: So we use that a lot to kind of get into the control side. But I think you hit the nail on the head with the non-equipment-related stuff. The differentiator in, I think, heavy industry specifically is going to be your data, right? The technology is going to be out there, available for others to buy and use and whatever, there's open source stuff that we can all use. The models are being produced by other people who we have no influence over. So the best thing we can do in heavy industry is just mine, understand, and store that data somewhere so that we can use it later, once these technologies have developed to a point where they are reliable and we can just start rolling them out trusting that they'll work. Regulations will probably come in and help us with that a bit too.

Demetrios [00:13:38]: And what do you feel like are some challenges of getting good data in these industries? And specifically, which types of data are you talking about and what are the challenges?

Richard Howes [00:13:52]: Well, the challenges are, like you said, the PDFs. We've got, everybody, I think, in every company has the mountains of SharePoint sites, the distributed folders all over the place, the stuff on desktops and whatnot. So reining that in, that's going to be where the gold mine is. It's everywhere. Everybody's got their little source of data on their computer or in a SharePoint site someplace. So getting that into a usable format like a knowledge graph, which I've seen pretty slick applications of. I've been recently playing around with Neo4j. They have a lot of stuff that gets you going real quick. So the quicker we can kind of adopt these newer techniques, the faster we're going to be able to implement and test the AI use cases.

Richard Howes [00:14:41]: But in terms of the equipment side of things, the assets and the equipment is, like, the number one piece of data that you need, right? So you might have this big gas plant with control systems all over the place, but if you don't know where to get the manual for this or that, or how to look something up quickly because you're troubleshooting a plant issue, you're going to lose production time. So knowing your assets, knowing where to access the information to solve your problem right now, that's the number one gap. I don't think too many people have, like, the giant repository of equipment manuals sitting around. Like, I remember back in the day operating plants, you'd have, like, a stack of manuals that big. But that was just, like, the six or seven key pieces, right. So getting access to those is probably the biggest challenge.
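As a rough illustration of the "know your assets, know where the manual is" idea, here is a tiny in-memory graph of assets, models, and documents. It is a sketch under stated assumptions, not the Neo4j setup mentioned above: the `AssetGraph` class, the relation names `IS_MODEL` and `HAS_MANUAL`, the asset tag, and the file paths are all hypothetical.

```python
from collections import defaultdict


class AssetGraph:
    """Minimal triple store: (subject, relation) -> list of objects."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].append(obj)

    def find(self, subject, relation):
        # .get() avoids creating empty entries for unknown keys
        return list(self._edges.get((subject, relation), []))

    def manuals_for(self, asset_tag):
        """Documents attached to the asset itself, then to its model."""
        docs = self.find(asset_tag, "HAS_MANUAL")
        for model in self.find(asset_tag, "IS_MODEL"):
            docs.extend(self.find(model, "HAS_MANUAL"))
        return docs


if __name__ == "__main__":
    graph = AssetGraph()
    # Hypothetical asset tag, model name, and paths for illustration only
    graph.add("P-101", "IS_MODEL", "ExamplePump-3000")
    graph.add("ExamplePump-3000", "HAS_MANUAL", "manuals/example_pump_3000.pdf")
    graph.add("P-101", "HAS_MANUAL", "as_built/P-101_datasheet.pdf")
    print(graph.manuals_for("P-101"))
```

Attaching the manual to the model rather than to each asset means one document upload covers every installed unit of that model, while as-built documents stay tied to the specific tag.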

Demetrios [00:15:36]: And that feels like it's such an easy fix, right? And so me naively saying it's easy, I imagine if there are people that are listening that this is their day in, day out, they're like, yeah, easy, right? Because you would think, okay, you just put it in some kind of a database and you're able to search for that. But what makes it hard? Why can you not get those manuals and then just like put them in one spot?

Richard Howes [00:16:04]: Just a few housekeeping things, really. Right. So when something is built, you have, like, the pre-construction drawings. Once it's built, those are as-built drawings. Sometimes you don't update your pre-construction drawings with your as-builts, so that you know exactly what you installed and where and how. Right. Like, how that whole thing went down. So keeping up with your documentation is a challenge. And usually, to be honest, you just open your phone and Google the piece of equipment, which requires you to look at the serial number, the equipment manufacturer, and things like that.

Richard Howes [00:16:41]: So it's kind of cumbersome and there are some solutions. I've seen barcodes or simple things like that, but I haven't seen anything fancy yet with say computer vision or any even repository of this information. I haven't seen that yet. Be a good pet project.

Demetrios [00:17:00]: Wow. Yeah, yeah, it sounds very familiar. Like, I think I've seen something on people being able to take a video of a machine and then they get a walkthrough or they get told like what it is, but that feels also like a bit of a Twitter demo. Like, it sounds cool in practice or in theory, but then in practice it falls down and you don't actually have that. And then you end up, like you said, you have downtime, which costs a lot of money. And so you better make sure it's working if you're trying to use this new fancy tech.

Richard Howes [00:17:41]: That's what I mean by augmenting people's current work, like the operators' or whoever's. Like, if we can give the operations staff, the maintenance staff, access to any of the manuals, the maintenance information about a particular piece of equipment, we could probably get to a solution quicker in those downtime, quick-hustle times of the day. You could say the same thing about our data and AI world too, right? Like, how many people know about all the different vendors out there, what they can do for you? And if you have one, like, now you have to go to their website and decrypt their, their docs and stuff like that. You had a, you had a fella on just recently about docs and, like, I liked what he was throwing down. Make it quick and easy, man. What are the things the people who are reading this doc need to know? Let's get it to them quick so they can move on with their day.

Demetrios [00:18:40]: Yeah, yeah. Give them the dopamine hit of feeling like they've done something. And that is fascinating to think about, how with software development specifically, you have a whole team that is doing, like, the upkeep of the software. And so you've got this whole team on the hardware side of things too, and you have the same piece of it. There's still documentation. It's just, since it is probably a little bit more hardware, like, physical world, you can get caught in this space where the documentation is more in the physical world and less easily searchable.

Richard Howes [00:19:24]: I think some pretty interesting things we're going to see in the near future is the modeling behind these systems to help us understand them a little better. I think we might have briefly mentioned it offline, but they're made up, all these control systems are made up of blocks of PID controllers. They're just a controller similar to your thermostat, right. Simple, so that we can test them; modular, so that we can port them from one place to another. But when you have a bunch of these together, we understand how these systems work from an engineering perspective. You get the mechanical engineer and the petroleum engineer, and they can understand how the refinery works. But are there any hidden issues with the thing we built? I think AI can maybe help us identify the system degradation over time, or control systems that aren't performing as well.

Richard Howes [00:20:16]: I think it might be really good at that. Although we're not quite doing that with AI right now. We're all about reasoning and understanding, getting some text back. You know, if we can kind of think of these systems as maybe a different language, we can get them to help us understand them, like the physical world a little better.
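For readers who have not met a PID (proportional-integral-derivative) controller, here is a minimal textbook-style sketch. It is not taken from any system discussed in the episode: the gains, the setpoint of 50, and the first-order "tank" process in `run_loop` are assumptions chosen only so the loop settles.

```python
class PID:
    """Textbook PID controller (no output limits or anti-windup)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self._integral = 0.0
        self._prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self._integral += error * dt
        # No derivative on the first sample: there is no previous error yet
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def run_loop(steps=600, dt=0.1):
    """Drive an assumed first-order process toward the setpoint."""
    pid = PID(kp=2.0, ki=0.5, kd=0.05, setpoint=50.0)
    value = 0.0
    for _ in range(steps):
        output = pid.update(value, dt)
        value += (output - 0.1 * value) * dt  # inflow minus a small leak
    return value


if __name__ == "__main__":
    print(round(run_loop(), 2))
```

The three gains `kp`, `ki`, and `kd` are the tuning parameters of the loop. A plant is many of these blocks wired together, which is why each one is kept small enough to test in isolation.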

Demetrios [00:20:39]: So this is fascinating. And how would you go about that?

Richard Howes [00:20:44]: There's a few papers out there already that, I think they were back in 2020. I forget the name of them, actually. Sorry. But they were using these traditional PID loops to see if we can understand: is this system performing well or not? And what would be some mitigations to fix it? You know, up the parameters, the weights in this PID control loop. And they were quite successful on, like, a small scale. But getting that to, like, the plant level to understand all of the interconnectedness, that's where it fell short and didn't work. So, you know, is there a solution? I think there is.

Richard Howes [00:21:26]: We as humans can do this. So, I mean, over time, I think we'll get there.

Demetrios [00:21:32]: Yeah. You have to have such a deep understanding of the plant and how everything works together. And so it's not like just one model for one little piece, that's one movement. It is ideally all of the movements together, that it's conducting an orchestra, not just like playing the violin.

Richard Howes [00:21:57]: Basically. Yeah. Basically, yeah. Totally. And that's where all of the key documentation comes into play. They have this diagram that everybody kind of knows about. It's called a piping and instrumentation diagram. And on there is your whole plant and every single control loop or system or sensor, and it clearly articulates exactly how that thing is working.

Richard Howes [00:22:24]: It's useful just to make sure that the whole team, like the maintenance staff, the management, the operators, they all understand it, especially the controls people, the automation folks, you need everybody to work together to make sure these systems work well. And that diagram is one that kind of glues all of them together. And I haven't seen a similar diagram in the AI or software space as much. Something that almost every stakeholder who's doing something on this system can relate to, to kind of help them integrate into the system itself. What am I doing? What am I offering this thing?

Demetrios [00:23:08]: Oh, I see where you're taking this. Hold on, let me see if I understand this, because there's a huge jump there that I like where you're going with it. It's basically saying in the plant, everyone speaks the same language, which is this diagram. Everyone knows if there's something going on, they can refer to the diagram and check out at least how the system works and where the electrical parts are, where the plumbing is, whatever it may be that is pertinent to their concerns and their needs. But you want to generalize that type of a diagram and say, if we have these ML/AI use cases, where's that diagram for everyone? Because there are so many stakeholders in these ML/AI use cases. So you have the, whatever, the DevOps team, the DevSecOps team, you have the platform team or the data scientists, you have the ethicists or the, whatever, regulation people, and everyone has their questions. Everyone wants to understand how the system works. And so you can have one of those types of roadmaps.

Demetrios [00:24:25]: Not necessarily roadmaps, just diagrams on the system.

Richard Howes [00:24:29]: Yeah, the end to end system. What's going in, what's going out and what happens in between. It's like our typical ETL thing, right? I love ETL for that. You can generalize it to anything, but the key point to get that diagram is to understand what you're optimizing at each stage. So it's essentially your boxy diagram. Everybody knows boxy diagrams. It just has standards wrapped around it. What means what.

Richard Howes [00:24:56]: So we do need that end-to-end thing and we need each group, each person in the group, to understand what they're optimizing for. What's my objectives? And have those clearly articulated in the diagram so that others can see and contribute to it. And it identifies dependencies. If we're trying to troubleshoot the system, it allows us to have a common point to reference and be like, yeah, it's failing. These are the potential reasons why.
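One way to make such a diagram machine-readable is to record, for each stage, who owns it, what it optimizes for, and what it depends on. The sketch below is a hypothetical example, not a standard notation: the `Stage` fields, the stage names, and the objectives are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    owner: str
    objective: str  # what this stage is optimizing for
    depends_on: list = field(default_factory=list)


def upstream_of(stages, failing):
    """List every stage the failing stage depends on, nearest first."""
    by_name = {s.name: s for s in stages}
    seen, stack = [], list(by_name[failing].depends_on)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.append(name)
            stack.extend(by_name[name].depends_on)
    return seen


# Hypothetical end-to-end pipeline for illustration only
STAGES = [
    Stage("extract", "data engineering", "complete, timely sensor history"),
    Stage("transform", "data engineering", "clean, unit-consistent features", ["extract"]),
    Stage("train", "data science", "low prediction error on held-out data", ["transform"]),
    Stage("serve", "platform", "batch predictions delivered on schedule", ["train"]),
]

if __name__ == "__main__":
    print(upstream_of(STAGES, "serve"))
```

When "serve" fails, `upstream_of` gives the list of places to look and, through each stage's `owner`, who to bring into the room.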

Demetrios [00:25:25]: Wow, I had never seen something like that. I do like it a lot. I wonder if it. The hard part is, how technically detailed do you make it for someone who is a business stakeholder?

Richard Howes [00:25:39]: The business stakeholder would have their own set of inputs, I would think, right? Like, they'd be probably sponsoring the program, wouldn't they? They're the ones that are hopefully helping us identify the outputs at least. Like, what are my expected outputs? Do I need inference to happen right now, in a couple seconds or under a second? Or can it wait? Is it going to be done in batch? What should it look like? What are the, you know, it can help us understand. Just like a gas plant, you don't want to be putting out crude oil at the output. You want something refined that I could put in my gas tank. So that's where the business stakeholder would come in. It's like, at the higher level, what's going in, what's going out? You know, are you guys overcooking this solution or not? So each stakeholder would have an input, not necessarily a technical one, but they would still have something, and they should be able to understand, you know, at least the different stages and about what's going on in there. I would expect.

Demetrios [00:26:39]: It feels like what you're, what you're talking about is much more conclusive or it is much more detailed because you have what the business stakeholders are looking for for the output. You also have the technical side of things and how that looks, but you're able to have a strong representation of the whole system, especially if it's a complex system. If it's one of these, you know, just make an LLM API call and then you're good. It may not be needed, but when it is a very complex system, you want to have a deep, deep understanding and have a strong diagram that everyone can, everyone can reference.

Richard Howes [00:27:25]: Yeah. Especially because, well, I hope every single one of these products that we build, well, collectively, we. But they have business impact. Right. If something goes wrong, it's not going to be good. You know, you could have risk to everything, you know, society, environment, debt, profit, all of that. So having a diagram like that will help you identify the critical control components. Right.

Richard Howes [00:27:51]: Like, during your MLOps or LLMOps processes, what are these things that we're looking at that would indicate a failure of some kind? And, you know, back to your business stakeholder, what would they have input over? Well, those would be numbers front and center that we're really concerned about. If these conditions go over their prescribed values or whatever, we have an emergency situation, meaning we have to either, like, revert to a different version or do some triage. I mean, you know, we've probably all been there, done that. But knowing what those control points are, having a process that is helping you through that, like an emergency response plan. Right. We have those in heavy industry for every single potential case. In the same way, we probably should have something like that for these products that are being delivered.

Richard Howes [00:28:45]: Right. Mass volume. We need to know right away and what to do about it. And everybody needs to know what to do or what's being done. At least.
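The "control points with prescribed values and a response plan" idea maps naturally onto a small table of thresholds. The following is a sketch under assumptions, not an established practice from the episode: the metric names, the warn and trip limits, and the response texts are all made up for illustration, and treating a missing reading as a problem is a design choice borrowed from fail-safe thinking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlPoint:
    metric: str
    warn_above: float
    trip_above: float
    response: str  # the agreed action when the trip limit is exceeded


def evaluate(control_points, readings):
    """Return (metric, status, action) for every control point out of range."""
    findings = []
    for cp in control_points:
        value = readings.get(cp.metric)
        if value is None:
            # Assumption: a missing reading is flagged rather than ignored
            findings.append((cp.metric, "missing", "investigate the missing reading"))
        elif value > cp.trip_above:
            findings.append((cp.metric, "trip", cp.response))
        elif value > cp.warn_above:
            findings.append((cp.metric, "warn", "notify the owning team"))
    return findings


# Hypothetical control points for an ML product
CONTROL_POINTS = [
    ControlPoint("prediction_error_rate", 0.05, 0.10, "roll back to the previous model version"),
    ControlPoint("p95_latency_seconds", 1.0, 2.0, "switch to batch inference"),
    ControlPoint("input_null_fraction", 0.02, 0.10, "pause the pipeline and triage the source"),
]

if __name__ == "__main__":
    sample = {"prediction_error_rate": 0.12, "p95_latency_seconds": 1.5}
    for finding in evaluate(CONTROL_POINTS, sample):
        print(finding)
```

Keeping the limits and responses in one reviewed table means the business stakeholder can sign off on the numbers, and whoever is on call knows the agreed action without waking the engineer who built the system.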

Demetrios [00:28:56]: It's like this leveling field where it's clear, just as you said, with the factories, you have, the electricians, you have anybody that is trying to figure out what's going on, they can reference this diagram. And so you could potentially have something like that with your AI project or your AI product. And anyone who is trying to figure out what's going on with it or what it looks like under the hood, they can reference this like one source of truth, hopefully.

Richard Howes [00:29:30]: Yeah, yeah. And those are usually, you know, you might say that's the docs that, you know, you typically find for a product. It might be some of the artifacts that are being developed throughout the development and planning and stuff like that. Stuff that the product manager, engineering teams might build. You might be like, oh, we have these already. But, you know, if you were to get, say, this was some regulation, like you were regulated to just like you were in a plant to know your data flows, to know how much information you're pushing through a given pipeline to mitigate risk. Right. I don't think many people would be able to answer those questions in like, a succinct manner.

Richard Howes [00:30:11]: Right. One after another. What's, what's first, what's second? What's third, what's fourth? In order to understand that, you need the whole team in the room, whereas if you have a diagram like this with proper sops on what to do, it's already covered as part of our compliance and risk mitigation plan, is just to have this. So everybody has a common understanding what's going on around here, because just some feedback from colleagues, kind of informally now. Nobody's really sure what AI and ML is really doing, right? We're taking data from someplace, got a model, cool. We put it through the model, we got this output, cool. And then they, you know, they can help wrap a process around it to make a good decision, but it's still a black box for the most part. And if you can bring comfort to them, you know, being like, I did my due diligence, I got these diagrams, I got, what does a shutdown look like? What happened? What are the catastrophic things that can happen potentially, if you can show and demonstrate that you've thought these things through, we got some risk mitigation in place.

Richard Howes [00:31:19]: Here's how it's being done. Get a little bit more buy in from that perspective. And I'm mostly focusing on that because of the industries that I'm in, right? Like, there's risk if we replace a million dollar chiller and we didn't need to. Right? Like, we do that pro or preventative maintenance, trying to predict it, and it's wrong and we just trust it. Well, not great.

Demetrios [00:31:46]: I do like the idea of how you're saying that anyone, you could basically go to anyone and ask what's going on here, and they should be able to know because you have that diagram and you have these, like, standard operating procedures, and also you're able to identify what the key metrics are that everyone cares about and what the metrics are that you're going to be watchful of. So in case something does go wrong, you know that. All right, our metrics, if they're starting to show these numbers, we need the escape route. We need to figure out, like, what are we doing, rolling back, what, what's going on? It's not just some kind of a call to the engineer, the one engineer that built the system in the middle of the night and it's like, hey, things aren't working. And then that one single point of failure, right, can be a liability for sure. So yeah, I like this organizational piece because it shows the maturity and also it gets you the ability to champion for things with a little bit more transparency within your organization. If you are trying to bring some use case to see the light of day.

Richard Howes [00:33:14]: Yeah, totally. If you can start with these kind of basic things, you can be pretty vendor agnostic when you're choosing the actual solution, right? Like if you're at the very beginning stages of some use case, you have an awesome idea for your company. You're going to draw it up, try to get buy in and sell that idea so we can start getting it funded and build it succinctly. Being able to relay this kind of information brings comfort to people. It's like, yeah, I'm going to give you a bunch of money to build something. I kind of trust you more with this money because you've kind of thought through what could happen and what should happen. And that's kind of common in our product development lifecycle. We do all sorts of requirements gathering and things like that.

Richard Howes [00:34:02]: But when you bring in the engineering folks, the data scientists, and they can also back you up from a common point of view, it kind of gives you a little bit more backing. Right. It kind of shows that we're kind of together on this. We all kind of understand it and where it could fail.

Demetrios [00:34:19]: Yeah. Get everyone on the same page. I am fully on board with that.

Demetrios [00:34:23]: Alright, real fast. I want to tell you about the sponsor of today's episode, AWS Trainium and Inferentia. Are you stuck in the performance-cost tradeoff when training and deploying your generative AI applications? Well, good news is you're not alone. Look no further than AWS's Trainium and Inferentia, the optimal infrastructure for generative AI. AWS Trainium and Inferentia provide high performance compute infrastructure for large scale training and inference with LLMs and diffusion models. And here's the kicker. You can save up to 50% on training costs and up to 40% on inference costs. That is 50 with a five and a zero.

Demetrios [00:35:10]: Whoa, that's kind of a big number. Get started today by using your existing code in frameworks such as PyTorch and TensorFlow. That is AWS Trainium and Inferentia. Check it out. And now let's get back into the show.

Demetrios [00:35:26]: There is another direction that I wanted to take this because you mentioned stuff about like satellite imagery and external statistical data and market trends and working with that. I know it's a hard left that we're taking from the organizational piece and the system diagrams and this product management. But can you go down that rabbit hole for a minute?

Richard Howes [00:35:53]: That's a lot of information, hey? Like satellite imagery. Well, not just satellite imagery, not just what's on the top, but what's underneath. We do, like, seismic, there's seismic exploration, and there's lidar data for the forestry industry and things like that. So being able to understand the environment, those are all your inputs for these heavy industries, right? Like the raw resources. And typically we've done spot studies, do a survey from here to there to try to understand if there's opportunity for, say, logging or oil exploration or whatever. I haven't personally built anything along the lines of satellite imagery, but I have some colleagues that sure have, and they get real excited about it just because you can understand things so much faster, right? Having your typical analyst kind of reviewing pictures or videos or just core samples and stuff like that, it's laborious and boring. You know how much coffee we have to drink to get that done? But pushing it through some AI applications and machine learning, like especially vision, we can kind of understand these things at scale and at least get closer to the intended outcome versus before.

Richard Howes [00:37:11]: We're just augmenting people in general.

Demetrios [00:37:14]: Yeah. So this is, these are use cases where you're trying to figure out if a certain site, you can do x on it, and maybe it's logging, maybe it is building something, maybe it can be putting up a wind turbine. And so you're getting all this data. And normally the human would have to manually look through all this data to figure out if it is a viable place to go logging or to build a wind turbine. But now this use case is another one of those high value use cases because you're able to augment the human's ability with machine learning. Is that what I'm understanding?

Richard Howes [00:38:03]: Yeah, pretty much. It's essentially anywhere where there's like copious amounts of information that's kind of a bear to deal with, like the controls data we were talking about earlier, or, you know, hectares and hectares of, you know, lidar about a forest. What kind of species are there? Is there any disease? You can help learn more about the forest and its kind of ecosystem just by firing a few algorithms through and interpreting it on the other side with subject matter experts who can also validate the output of this machine learning and kind of augment it with their professional opinions and whatnot. So it's not like we're removing any of the professionals. We need them. They're the ones with the stamps usually. Right. Like this is a quality assessment. Here's the checks and balances.

Richard Howes [00:38:55]: We've done all that sort of thing. It's just allowing us to do a lot more, a lot quicker.

Demetrios [00:39:00]: And from what it sounds like to me is that a lot of this stuff that is happening, it's not necessarily one off, but it's not like it requires a lot of infrastructure behind it for the technology side and like the ML side. Am I wrong in saying that?

Richard Howes [00:39:22]: Not a lot of infrastructure? You mean like we're doing, you know, batch here or there? It's not a continuous kind of integration situation. Is that what you're getting?

Demetrios [00:39:32]: Yeah, exactly. Yeah, yeah. You're not needing to figure out like the serving, low latency serving. You kind of are doing it almost all offline, and that makes it a little bit easier to deal with.

Richard Howes [00:39:48]: Yeah, totally. There's some, like, real-time drilling applications in, like, the oil and gas industry that kind of help them navigate drilling to the reservoir in, like, a low impact kind of manner. I know of a few applications using AI directly on the drill heads to measure the pressures and temperatures and figure out what's going on down there. Is it what I think it is? You know, kind of deal?

Demetrios [00:40:14]: And so in those cases, that is 100%, like just signal data that's coming from sign in a hole. And, yeah, it's trying to put together like you're seeing it on a screen, but you're also, you've got a sidekick that's saying like, yeah, all looks good to me, or maybe you might want to reassess.

Richard Howes [00:40:38]: Yeah, yeah, exactly. Yeah. Most of the AI/ML use cases where I've seen quite a bit of value revolve around the assets themselves. Right. Like the turbine that comes with the AI, the IoT sensors that allow us to, you know, understand safety and whatnot. The enterprise ones, which, you know, you find typical of prioritizing capital investments or the project management assistant kind of thing. Those are all kind of. They're less interesting to me.

Richard Howes [00:41:13]: They have a bit of impact. But where the real fruit's going to be coming from in the near future is the assets, the equipment. How do we make these things work better than they have in the past? Or how can we augment the people maintaining and using them to do it more effectively, efficiently, protect our planet, protect the people, that kind of thing.

Demetrios [00:41:34]: Dude, this is so cool to think about. What is something that I did not ask you yet that you wanted to talk about?

Richard Howes [00:41:44]: There's a few lessons that I've learned from these heavy industries, and they're mostly back office type things, but they've helped me out of some pretty big problems. When you're first formulating products or trying to find the best use case, really, right? A lot of times, my academic partners, or the super smart folks, the data scientists, they know all the algorithms and whatnot, feeding them as much. It's like an LLM, right? You want to feed as much context as you possibly could fit into that window to get it to give you the best answer, right? So I always try to feed the people I'm working with as much information about the system, the business process or whatever as possible. And the concept of a control philosophy has really hit me hard lately. So in these big gas plants or facilities or whatever, you have a control philosophy. It's just a document. It's like a requirements document kind of thing, but it outlines your operating principles.

Richard Howes [00:42:50]: A control narrative, like what's happening from start to finish kind of thing, has all the diagrams in there, like that piping and instrumentation diagram we talked about earlier, has, what do we do if things go wrong in there? You know, all sorts of stuff about scalability and data quality, like help them understand what's important from the business perspective. Because, you know, when you send the data scientists off to do EDA on some database, that database might have hundreds of tables and so much stuff, how are they supposed to decipher it all? We going to run some algorithms to find out what's important? No. So I like to bring the control philosophy to the table to help narrow that scope. And here's your optimization objectives. This is what we're trying to achieve. And then let them loose on it as well. So they start adding to it so that we have this body of knowledge about a specific problem. And I've used it lots of times.

Richard Howes [00:43:49]: Sometimes that particular idea is successful and we start doing something about it. And other times the data scientists or engineers are like, no, we can't even get data for that. Or you're kind of out to lunch with some of this control narrative, that's not going to happen. Getting from A to B is not possible. How are we supposed to extract all of this data from, like, a remote station when there's no connectivity there? I mean, it doesn't make any sense. So bringing that control philosophy to the table, that just kind of narrows the scope in, narrows the understanding in for the team, kind of sets me up for success quite often.
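One way to picture the control philosophy described above is as a small structured record that the whole team can add to and challenge. The sketch below is a loose illustration only: the field names, the `ControlPhilosophy` and `DataSource` classes, and the example values are assumptions made for this example, not a standard template or anything presented in the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    reachable: bool  # e.g. False for a remote station with no connectivity


@dataclass
class ControlPhilosophy:
    """A shared document: objective, principles, narrative, failure handling."""
    objective: str
    operating_principles: list[str] = field(default_factory=list)
    control_narrative: list[str] = field(default_factory=list)  # start-to-finish steps
    failure_responses: dict[str, str] = field(default_factory=dict)  # what to do if X goes wrong
    data_sources: list[DataSource] = field(default_factory=list)

    def blockers(self) -> list[str]:
        """Data sources the narrative relies on but the team cannot reach."""
        return [s.name for s in self.data_sources if not s.reachable]


# Hypothetical example values.
doc = ControlPhilosophy(
    objective="Reduce unplanned chiller downtime",
    operating_principles=["A human approves any replacement decision"],
    control_narrative=["Collect sensor data", "Score asset health", "Review with operator"],
    failure_responses={"model unavailable": "Fall back to scheduled maintenance"},
    data_sources=[
        DataSource("plant_historian", reachable=True),
        DataSource("remote_station_7", reachable=False),
    ],
)

if __name__ == "__main__":
    print(doc.blockers())
```

Listing unreachable sources early is the point where engineers can say "getting from A to B is not possible" before anything is built.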

Demetrios [00:44:32]: So at the end of the day, you want to make sure that whoever it is you're working with, the data scientists, let's say, in this case, they understand as much as they can about the problem, so that when they do create the solution, they're creating the proper solution. And this goes back to, I think, something that I've heard a ton on here, but I typically reference one of the first conversations we had with Dan Sullivan on how he mentioned he spent, I think, two weeks or three weeks in one of his jobs, one of his early jobs as a data scientist, trying to improve the F1 score, only to emerge from this cave three weeks later and find out that the whole solution that he proposed, it didn't matter if it was 100% accurate or if the F1 score was whatever, because the solution itself was so far off the mark that it wasn't valuable. And so you're trying to. I like this idea of giving them as much context, just basically context stuffing in their context window and making sure that they understand, here's the most important pieces, and this is what would be a huge outcome for us. This is what we're trying to get to. And they can also stress test any of your assumptions on, oh, well, this data we don't have, or this data is not clean, or this data we just can't get even if we wanted to. Those types of things.

Richard Howes [00:46:12]: Yeah, exactly. It allows you to challenge each other's ideas pretty easily. That's kind of the whole point of that piping and instrumentation diagram. It allows you to debate amongst the professionals what is the best, and get people to have a say at the table from a common point of view, because I found it's pretty relative. A lot of these business problems, sometimes they're based on experience, sometimes you're brand new, you're coming into it fresh. So you have a different perspective than the guy who's been there for 20 years doing that process. So if you're gonna have both of those guys at the table, they better have a common viewpoint, like something they can reference and be like, yeah, we need to change this or that so they don't feel like they're battling each other. It's just.

Richard Howes [00:46:56]: It's just this document. We're just trying to get this document as accurate as possible, and then we can build. It's real easy to pull the trigger and start building, but building in the wrong direction will cost you time, money, and, if we put it into production, customers, probably, yeah.

Demetrios [00:47:15]: Trust. Yeah, trust with your customers for sure. So this is the control philosophies. What else did you want to talk about?

Richard Howes [00:47:22]: I've been interested in this quality conference that you're kicking off, and, you know, we see regulations popping up here and there around data and AI and privacy and all this stuff. I'd like to dig a little bit down into quality, I guess. You know, are we concerned just about the AI, just about the ML, just about the data platform? AI quality is really about, like, the outcomes, isn't it?

Demetrios [00:47:55]: Yep, exactly. That's what. And so I think, for me, I don't know if you've been feeling this, but data quality has become something very popular in the data engineering space because there's been so much pain around it for the data engineers. Every data engineer I think in existence has had to go and try and debug some kind of a dashboard, or they've been approached by an analytics engineer and they say, like, hey, what's going on with my data here? This dashboard isn't working. And then the data engineer has to go and figure out why that is. And that's a data quality issue, so it resonates pretty well when it comes to data engineers. Now, AI quality is something that, for me, it feels like if people haven't had experience with it yet, they will soon. Especially when we're putting all these random agents into production, or we're putting all these RAG chatbots into production, and you hear the horror stories and you're trying as much as you can to make sure that you don't land on the front page of the Internet, but how do you make sure that you can properly de-risk that scenario? Right? And so that, for me, is the AI quality piece and ML is this.

Demetrios [00:49:25]: It's in the same boat. Right? And a lot of this is organizational being able to, like, you're talking about, like, explain to stakeholders, if we implement this type of technology, we are at risk in these different ways because of whatever it may be like, beyond just hallucinations. It can be that some nefarious actor is trying to do some prompt injections and.

Richard Howes [00:50:00]: Yeah, exactly.

Demetrios [00:50:02]: Ready for that.

Richard Howes [00:50:03]: Yeah. Or you don't understand the problem completely, and now the input isn't what you expected, and it's not really even working. I kind of liked John Cook's approach with knowledge graphs applied to the business processes so that the engineers and the technical people can get access to that data, as well as all the other business benefits of a knowledge graph for your business. I think things like that, we'll see creeping up more and more as we see compliance being a real thing or even risk to the business outcomes be a thing. They're going to look for solutions to quantify and document their business because like I said before, the data is the differentiator. You know, you might have this technology or that model or whatever, but the data is your secret sauce for your business. It's going to be interesting to see how it levels the playing field. But if you don't document those well enough, how can you expect your AI and ML to operate on top of your business process, which might be fragmented Excel sheets, right? So getting that somewhere is, I think, the key to quality here.

Demetrios [00:51:13]: It's so funny you mentioned that. And it reminds me of a piece, something that I saw somewhere online that was referencing how you've got this RAG chatbot talk-with-your-data use case, but it turns out your data is absolute shit. Like the documentation hasn't been updated in years. So the chatbot is basically worthless.

Richard Howes [00:51:45]: Yeah, yeah, it's kind of cool. I think it's actually a real plus to have these things kind of show up and be in our face because, like, we've been fighting for a long time on data quality and sometimes the business doesn't want the data quality to be improved. Right? I don't know how many people have had this experience before where you do this analysis, you're like, oh, cool, we can save a lot of money if we do x or y. You take it to the business and they're like, yeah, but we're just going to put that under the carpet and not talk about it again, right? So, you know, having these things come up front and center, I think is kind of important. It makes us ask these questions of like, oh, yeah, I guess we do need to do that analysis, or we do need to document that data about whatever process is being used by this machine learning algorithm so that we can feed it good data.

Demetrios [00:52:35]: What a great point.

Richard Howes [00:52:36]: Because it's so in your, yeah, so in your face. Like if you're going to try to get an ML algorithm to classify something into say, four classes or something, but the input is always changing. We're always getting numbers, letters, anybody can put in whatever, you put an essay in there if you want to, we're just going to end up with garbage. So understanding these things and getting the business on board to go back to that whatever platform is creating that data and be like, no, we need four things in there. And now we don't need this ML algorithm to do this classification anymore because we are on top of our data quality.
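The point about fixing the input at the source, rather than classifying free text downstream, can be illustrated with a short Python sketch. The four category names and the `parse_condition` helper are hypothetical; the conversation only says that the field should accept four things.

```python
from enum import Enum


class AssetCondition(Enum):
    """The only four values the source system is allowed to record (hypothetical names)."""
    GOOD = "good"
    FAIR = "fair"
    POOR = "poor"
    FAILED = "failed"


def parse_condition(raw: str) -> AssetCondition:
    """Accept one of the four allowed values; reject anything else at entry time."""
    cleaned = raw.strip().lower()
    try:
        return AssetCondition(cleaned)
    except ValueError:
        allowed = ", ".join(c.value for c in AssetCondition)
        raise ValueError(f"{raw!r} is not one of: {allowed}") from None


if __name__ == "__main__":
    print(parse_condition(" Good "))
    try:
        parse_condition("Pump was making a funny noise on Tuesday")
    except ValueError as err:
        print(err)
```

With the constraint enforced where the data is created, there is nothing left for a downstream model to guess at.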

Demetrios [00:53:14]: Yeah. You don't need the band aid downstream.

Richard Howes [00:53:17]: Yeah, the band aid, yeah. We're using ML and AI as a band aid sometimes, but in a positive light, I guess. Right. If it brings together the conversation and an ultimate, you know, resolution of that data quality problem, I think that's a plus.

Demetrios [00:53:31]: One of the reasons why I am so bullish on data engineers and I am so empathetic to the data engineering job in general, because, man, they got it hard. They, yeah. Definitely have some job security, that is for sure.

Richard Howes [00:53:50]: Yeah, just a bit. It's like, well, I got all your data, so what are you gonna do? But, but they also have enough because how it goes. Yeah. Where it goes, where it flows, who's using it. But like the problem with that is usually they don't own the data. Right. They have no control over a source application's way to process or gather information. Right.

Richard Howes [00:54:17]: So a lot of the times it's the please and thank you process. Work with me kindly, please don't change your attributes crazily or your schema or whatever. Right? Yep. Poor data engineer.

Demetrios [00:54:32]: And if you do. Yeah. Oh man, you're preaching to the choir. That's why we are making Data Engineering Appreciation Day. I think I mentioned this once online, it definitely needs to be a thing if it is not, but go and say thank you to your data engineers that you're working with, because they just get, they get hammered, man.

Richard Howes [00:54:57]: Yeah. If you report on KPIs or use reports or whatever you're doing, just thank a data engineer.

Demetrios [00:55:05]: Yeah.

Richard Howes [00:55:06]: At the end of the day, decisions matter.

Demetrios [00:55:08]: You touched Ada. Yeah, I like that. Well, dude, Richard, this has been awesome, man. Thank you for coming on here and explaining this idea of control philosophies and the standard operating procedures. Also just the big diagrams. Maybe some people are putting it into practice already, and if they are or they're doing something similar to this, just calling it a different name, I would love to hear from those people and how they have been finding it. It feels like it is something that a very mature data/AI/ML team would be doing.

Demetrios [00:55:47]: So I hope that somebody's doing it, and I would love to hear from those people that are doing it.

Richard Howes [00:55:54]: Yeah, me too. It's just such a fundamental practice in physical engineering of machines and stuff like that, that I think that we could learn something from that in this data and AI space, because we also want to build reliable systems. Really reliable systems. That's a good way to do it.

Demetrios [00:56:12]: That's a great one. Awesome, dude. Well, thanks for coming on here.

Richard Howes [00:56:17]: Yeah, thank you so much.

Demetrios [00:56:18]: The valley close?
