How to Build Production-Ready AI Models for Manufacturing // [Exclusive] LatticeFlow Roundtable
Pavol earned his PhD at ETH Zurich, specializing in machine learning, symbolic AI, synthesis, and programming languages. His research won him the prestigious Facebook Fellowship in 2017, as the sole European recipient, along with the Romberg Grant in 2016.
Following his doctorate, Pavol's passion for ensuring the safety and reliability of deep learning models led to the founding of LatticeFlow. Building on more than a decade of research, Pavol and a dynamic team of researchers at LatticeFlow developed a platform that equips companies with the tools to deliver robust, high-performance AI models through automatic diagnosis and improvement of data and models.
Deploying AI models in manufacturing involves navigating several technical challenges such as costly data acquisition, class imbalances, data shifts, leakage, and model degradation over time. How can you uncover the causes of model failures and prevent them effectively? This discussion covers practical solutions and advanced techniques to build resilient, safe, and high-performing AI systems in the manufacturing industry.
Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/
Demetrios [00:00:07]: What is up, good people of earth? How y'all doing out there? I am very excited to be with you today. We are going to be talking all about how to build production-ready AI models for manufacturing, something that has been coming up over and over and over again when I talk to people about AI use cases. And so hopefully, by the end of this, you're going to have some of your questions answered, but maybe you're going to have some food for thought, and you're going to be chewing on some of these questions for a few days afterwards. And I want to bring out our esteemed guests right now. But before we do that, I've got a few announcements on the MLOps Community side of the house. In case you didn't know, we do have a podcast that I recommend you all listen to. We're gonna be putting stuff like this and other great episodes onto the feed. You can check it out.
Demetrios [00:01:06]: There's a little QR code for you that I'll throw up in the left-hand corner. But the most fun that we've been doing is, oh, here's a better one, there's a better QR code for you. So hopefully you can check out the podcast there. But what I have the most fun with is presenting you the merch that we've got. We've got shirts that say "I hallucinate more than ChatGPT." Get yours now. It is an unlimited edition run. And with that, we're going to bring out our guests.
Demetrios [00:01:41]: Let me get that off. Hit that QR code if you want to check out the shirts. Otherwise, if you want something a little more tame, you can go with "My prompts are my special sauce." We've got all kinds of fun merch out there. And the last announcement before we get rolling with this roundtable session: on June 25 in San Francisco, we are having our first in-person conference.
Demetrios [00:02:12]: If you want to join, hit me up. Let me know. It's going to be San Francisco, June 25. We will also be streaming one of the tracks onto this platform that you are on right now. So we will make sure that you are all updated. Enough of the announcements. We've got a lot of ground to cover, and I want to get right to it and bring out our guests. Let's get Pavol out here.
Demetrios [00:02:44]: Where you at? We're also joined today by Mohan and Aniket. Where y'all at? Hey, there they are. What's going on, guys?
Pavol Bielik [00:02:53]: Hello. Hey.
Demetrios [00:02:55]: Normally I do guitar intros, but this time I didn't, because I thought the QR code of our merch was funny enough to have some fun. No? So let's jump into the topic at hand. I'll lead us through. I'll kind of guide the conversation, but I am by no means an expert in this topic, and so I'm here to learn from you all. I would love to start with a little bit... Oh, I just realized we're missing our fourth to make it a foursome. There he is. Hey.
Demetrios [00:03:33]: Sorry, I didn't mean to keep you out. Keep.
Jürgen Weichenberger [00:03:35]: No worries.
Demetrios [00:03:37]: So let's start with a little bit of like ten second intros so everybody can know who you are and what you're doing and why you are on this virtual roundtable stage. And Mohan, since you're big on my screen right now, you can start it off.
Mohan Mahadevan [00:03:56]: Thank you. Thank you. It's a pleasure to be here. First of all, thank you for having me. So, by way of background, I've been working in the computer vision, machine learning, and AI space for the last 25 years. About 19 of those were in the manufacturing industry, in semiconductor manufacturing. Other than that, I've also worked in robotics, fintech, and insurtech. And yeah, that in a nutshell is me.
Pavol Bielik [00:04:24]: So hi everybody, my name is Pavol. A bit less experienced than Mohan, not 25 years, but we've been working, I would say, probably more than ten years on the topics of safety and robustness of AI models, more from an academic background, at ETH, which is where I did my PhD. After that, we started a company with two other professors that tries to bring some of the research and knowledge we accumulated out into practice, to help actually deploy these things in a safe way.
Jürgen Weichenberger [00:04:57]: Okay. Hi. So yeah.
Jürgen Weichenberger [00:04:59]: Jürgen Weichenberger.
Jürgen Weichenberger [00:05:00]: I'm part of Schnard Electrics AI hub where I'm responsible for strategy innovation. I'm hanging around in the AI field about like Mohan for the last 25 years. And also we're starting in the space of generate DVI and actually lots of work in lvms, large vision models but also doing lots in classically say today you would have said old school machine learning stuff but it's still important to have it. And yeah, my teams were working on next generation solutions like teaching GPT time and space.
Aniket Singh [00:05:36]: Hello, I'm Aniket. I work as a vision engineer, an ML engineer, at Ultium Cells, which is one of the EV battery manufacturing plants here in Ohio. I don't have as much experience as everybody else here, but I did my master's last year and have been working for almost a year.
Demetrios [00:06:00]: Well, fellas, when it comes to heavy industries, AI and ML use cases are very prevalent, but I always want to know more about them. I think there are some common use cases that you have probably seen, and some uncommon use cases that I want to hear about. So, Aniket, do you want to start us off with some of the use cases that you've been playing around with?
Aniket Singh [00:06:30]: Sure. So, I mean, there are multiple spaces where we use AI, but especially for us, we use it for battery inspection. We do the electrode inspection as it's actually made. So we do surface-level inspection and dimension-level inspection, but for surface, we use AI for the most part.
Demetrios [00:06:55]: All right, then. What about you, Jürgen? How have you been playing with it? Besides teaching GPT time and space, which sounds like a song. I mean, I gotta be honest with you, that does sound like something out of a sci-fi novel too. But what are some other use cases?
Jürgen Weichenberger [00:07:13]: Battery health, as Aniket said, plays a big role for Schneider, as we are manufacturing UPSs for data centers. So keeping this stuff alive is a super important thing: understanding the cell's health during its life, how you can manage charge and recharge. It is a huge area of business for us. But the other big business area, as it comes to heavy industries, is how you control carbon emissions, carbon reduction. So working on carbon digital twins, and then incorporating this with process optimization and electrification logic and machine learning programs, there's a lot to gain. And we find there's lots more to understand, because people don't even know where all of the GHG is coming from. So those are two big areas for us, battery health and sustainability, GHG emissions, where we do lots of work.
Mohan Mahadevan [00:08:15]: My view is a bit different. So, Aniket, his view is in the manufacturing of the cells. And Jürgen, you were talking of what happens during the monitoring of cells as they're used in these data centers, over their life span. My experience has been more in the semiconductor manufacturing industry, where chips are made. And so there are two pieces to this. AI is used first in the development of the process itself, because that's a very, very difficult journey, and that tends to be the hardest set of problems that we solve: to help the TSMCs and the Intels and the Samsungs of the world build their process, to figure out what will work when they manufacture their next-gen nodes at the critical dimension nodes, as Moore's law scales. And then the other part is volume production. Once a process is optimized, then you go into volume production, and there you're doing much more of a monitoring of the process.
Mohan Mahadevan [00:09:19]: So you sample some, and you make sure that, in both of these cases, where AI comes in is to discover the known knowns, which means that you have some kinds of defects and you know that you can live with some statistical process control of how many defects per wafer or per cell or per hundred cells or whatever. Then you've got the known unknowns, and then you've got the unknown unknowns. And so you've got to be able to capture all of these things in the process so that the right actions can be taken in the manufacturing process.
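To make the statistical process control idea concrete, here is a minimal sketch of a c-chart for defects-per-wafer counts, the kind of control limits Mohan alludes to. The baseline numbers and the classic 3-sigma limits are illustrative assumptions, not anything from his actual process.

```python
import numpy as np

def c_chart_limits(baseline_defect_counts):
    """Classic c-chart: control limits for defects-per-unit count data.

    Assumes counts are roughly Poisson, so the standard deviation is the
    square root of the mean count.
    """
    c_bar = np.mean(baseline_defect_counts)      # average defects per wafer
    ucl = c_bar + 3 * np.sqrt(c_bar)             # upper control limit
    lcl = max(0.0, c_bar - 3 * np.sqrt(c_bar))   # lower limit, floored at zero
    return c_bar, lcl, ucl

# Illustrative numbers only: a baseline run, then a suspicious new wafer.
baseline = [4, 6, 5, 3, 7, 5, 4, 6]
c_bar, lcl, ucl = c_chart_limits(baseline)
new_wafer_defects = 14
if not (lcl <= new_wafer_defects <= ucl):
    print(f"out of control: {new_wafer_defects} defects vs ({lcl:.1f}, {ucl:.1f})")
```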
Demetrios [00:09:54]: Anything else you want to add before we jump to the first question?
Mohan Mahadevan [00:09:59]: Oh, I think there are a lot of nuances and subtleties that we can unpack in some of these disciplines, but happy to jump on and get into the details.
Demetrios [00:10:12]: Cool. Very cool. Well, Pavol, let's get your take on this, and maybe we can go into some of the challenges that you've seen as you've worked with many different use cases and many different customers in this field. What are they struggling with?
Pavol Bielik [00:10:31]: Yeah, I think maybe to follow up on Mohan. What we try to do at LatticeFlow is be in this part of the manufacturing process: once there is already a production line, you would want to set up quality control on top, to try to figure out what are the known unknowns as well as the unknown unknowns. And this is one of the challenges, because there is a process associated with figuring that out, and at this point it is very manual. So you would rely on domain experts that have both the machine learning expertise as well as the domain knowledge. And this is important because often, when we work with some of these applications, for example preventive maintenance, you would look at the defects, and if you showed them to me, I would have really no clue. You would show me, this is how it looks when it's correct,
Pavol Bielik [00:11:21]: this is how it looks when it's incorrect, and I just don't see the difference. And there are people who have ten years of experience who can see it from the image. Or maybe you actually cannot even solve it from one signal; you would need multiple signals put together. And this is all information you need at the design phase, to even be able to design such a system.
Demetrios [00:11:41]: Okay, so the main thing that I'm wondering about when you have these different use cases, and I appreciate that each one of you has a very specific piece: I know that depending on the machine learning use cases that you have and what you are doing with AI, sometimes you're really trying to optimize for certain pieces, like latency, and if you're using LLMs, you can't really do that. If you are, however, trying to do predictive maintenance and figure out which piece needs to come off of the conveyor line, you don't necessarily need really low latency for that either. So what are some of these constraints and things that you are optimizing for as you're thinking through your AI problems? Are there different pieces where you're saying, all right, when we think about trade-offs, we think about trade-offs like this? I'll throw this one over to you first, and whoever wants to jump in afterwards can.
Jürgen Weichenberger [00:12:54]: It heavily depends on the application. We have ultra-low to near-zero latency applications today, specifically in computer vision related use cases, where we're talking a total time to make a decision of 50 milliseconds, with inference time for the AI sub-20 milliseconds. Because what many people don't understand is that it takes time to acquire an image or multiple images; they need to be interpreted; you need to send signals to control systems; control systems need to actuate stuff. So people, when they hear 50 milliseconds, think, oh yeah, 50 milliseconds for the AI. No, we don't have that. We have ten to 20 milliseconds.
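As a rough sketch of the budget arithmetic Jürgen describes: the 50 ms total and sub-20 ms inference figures come from his example, while the per-stage split below is an invented illustration.

```python
# Hypothetical end-to-end latency budget for a vision-based line decision.
# Stage timings are illustrative assumptions, not measured values.
budget_ms = 50.0
stages_ms = {
    "image acquisition": 15.0,          # camera exposure + readout + transfer
    "preprocessing": 5.0,
    "model inference": 18.0,            # must stay under ~20 ms per the example
    "control signal + actuation": 10.0,
}
total = sum(stages_ms.values())
assert total <= budget_ms, f"over budget: {total} ms > {budget_ms} ms"
print(f"headroom: {budget_ms - total:.1f} ms")
```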
Jürgen Weichenberger [00:13:39]: So it puts you in a whole world of pain to figure out the pure hardware setup for that. So that is one big area. Then, when it comes to other scenarios where time is of less importance, it's accuracy. How precisely can you predict scenarios? How accurately can you make distinctions between "this is still within the realm of acceptable" and "this is where it starts to be not acceptable"? We also see lots of safety use cases, where the safety of human beings plays an enormous role. So somebody enters an area where he's not supposed to be; if he doesn't leave the zone within a given time frame, you have to stop production. And for people, it looks like stopping production is not a big deal: I halt it, get the guy out, throw the switch, and it runs again.
Jürgen Weichenberger [00:14:37]: In most processes, it's not that way. It's not that you just flip the switch and it runs again. One of my best examples: take a paper machine. Those things are between 400 and 600 meters long. And if you have to stop that machine, it's not flipping a switch; you have to clean it and restart it, and it's hours to get it up and running. So these are applications where different types of accuracy play an immediate role. And then there are other applications where time is of the essence. And then, when we go into the third big area, around generative solutions, what it becomes there is: can you manage the quantity, the amount of hallucinations?
Jürgen Weichenberger [00:15:25]: So how accurate are you, how few hallucinations do you actually produce when you create content? And I think Mohan was speaking about the semiconductor industry. We have, for example, a use case around generating lithography, which is super important, because this is literally how you run all the stuff on the chip, and you want to make it as small as possible and so on. So you want to hallucinate, ideally, zero. And those are three big areas, with different implications, which really occupy us today.
Mohan Mahadevan [00:16:01]: Yeah, Jürgen makes some excellent points, and I could just expand on one or two of those. So in a manufacturing process, if you've got too many false positives, you'll stop a line, and it may trigger other processes, like Jürgen was saying, which, if you trigger too often, is very expensive. And yet if you miss certain manufacturing defects, you can create a massive loss, because you manufacture defective material. So it's really walking that tightrope. And here's where the initial process of figuring out the statistical process control, of where those limits are, comes in. So one thing I've learned is there's nothing called zero or 100. It's all a question of how many errors you can live with, how often you make them, and figuring out statistical process controls in the manufacturing process. So, to Jürgen's point, there's a cost on both sides, and it's a fine line that we have to walk, and we have to set that up at the point where the process is developed, and then make sure that it is followed throughout as the process runs over time.
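A minimal sketch of the trade-off Mohan describes: choose the decision threshold that minimizes total expected cost when a false positive (a needless line stop) and a false negative (escaped defective material) carry very different prices. The costs and score distributions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative defect scores from a detector: higher = more defect-like.
scores_good = rng.normal(0.3, 0.1, 10_000)   # genuinely good parts
scores_bad = rng.normal(0.7, 0.1, 200)       # genuinely defective parts

COST_FALSE_POSITIVE = 50.0     # assumed cost of stopping the line needlessly
COST_FALSE_NEGATIVE = 5_000.0  # assumed cost of shipping a defect

best = None
for t in np.linspace(0, 1, 201):
    fp = np.sum(scores_good >= t)  # good parts flagged -> line stops
    fn = np.sum(scores_bad < t)    # defects missed -> escape downstream
    cost = fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE
    if best is None or cost < best[1]:
        best = (t, cost)
print(f"threshold {best[0]:.2f} minimizes expected cost ({best[1]:,.0f})")
```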
Demetrios [00:17:19]: That's brilliant to think about, especially when it comes to this idea that it's not black or white; you have a lot of gray area, there's no zero or 100. And so you have to be thinking through these different trade-offs. And if you're stopping the production line, that's great, especially if you make sure that there aren't faulty products going out. But if you're stopping it too much, that's not great, and you have to answer to people who might be a little more hawkish on, hey, let's just stop it. But then, what you're going to say is, people's lives are on the line, so you can't be playing fast and loose with somebody who's got their life there. So next up, I wanted to dive into the idea of going from proof of concept to production. Aniket, can you talk to us about some of your experiences? If you've had any, where you've seen something as a PoC and then you wanted to take it to production, how was that, and what are some challenges that you've had there?
Aniket Singh [00:18:29]: This is commonly one of the biggest challenges, actually, proof of concept, because especially in the battery industry, you don't want to deploy any ML models right away just because you think they're working. It takes months to make sure that your model is robust enough to be deployed on the line, because it's not just about the millions of dollars that you're going to lose in material; it's also too risky to let anything leak through to the cars that those batteries are going to go into. So this is one of the biggest challenges that I think we do have. So we spend a lot of time on the proof of concept, and also doing secondary inspections instead of a live inspection, before deploying the model live for making actual judgments.
Demetrios [00:19:21]: And is there certain infrastructure? And I know one of the big questions that people have is around the teams and how the platform looks: what kind of teams are responsible for the models when they go out? Once you get that green light, there are so many different stakeholders involved in these questions. How do you make sure that everyone is on board and everybody understands what's going on with these AI models that are going out?
Aniket Singh [00:19:53]: First of all, in our use case, when we do the proof of concept and when we know that we are ready, we have to let the production team know that this new model is going in place, and then they take care of their part; they notify whoever needs to confirm that this change is happening. But even while the change is happening, it's usually not as the primary: there is another model that's actually working as the primary while this one runs as the secondary model. And then we compare the results. So our analytics team would be the one comparing, to make sure that, okay, this is actually doing a better job than the current model, so now it's time to take it live. So then we have to notify the people who need to know. And there is a chain of command. Of course, it does take some time.
Aniket Singh [00:20:49]: But it's very important that we go through that step.
Demetrios [00:20:53]: Yeah, along that, there's a great question coming through in the chat from Richard. There's also an awesome one from Hattie that I'm going to ask in a minute, but the one from Richard is perfect. I want to throw this to everyone. Are we looking at purely AI/ML solutions, or hybrid, where legacy hardware and techniques are used in sync with AI? How does AI integrate with what is in play already? Safety and emergency response is what I'm thinking about primarily.
Aniket Singh [00:21:24]: I can talk a little bit about this. So not everything is AI. AI is used, but we also use rule-based systems, which are the OG of inspection. Still, for a lot of things like dimensions, we have to stay with rule-based methods. And in many cases we use rule-based first and then AI after that, or both models at once doing the judgment, because rule-based is pretty fast; usually we don't have inference speed issues with rule-based. So sometimes both models are working, and then we are using the judgment of both of these models.
Aniket Singh [00:22:07]: Sometimes they're being used in combination. So, let's say there is a surface defect that's visible to the rule-based model. It's then sent to the AI model to confirm whether it is actually a defect or not, instead of just sending everything to AI, because that's going to be time consuming. And second, it's going to be very difficult to send everything to AI.
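A minimal sketch of the rule-based-then-AI cascade Aniket outlines: a fast deterministic check screens every part, and only flagged candidates are escalated to the slower model for confirmation. The thresholds and function names are illustrative assumptions.

```python
def rule_based_check(measurements):
    """Fast deterministic screen, e.g. a dimensional tolerance (assumed number)."""
    return measurements["scratch_length_mm"] > 0.5  # True -> possible defect

def ai_confirm(image, model):
    """Escalation step: the model only ever sees rule-flagged parts."""
    return model(image) >= 0.8  # assumed confidence cutoff

def inspect(part, model):
    if not rule_based_check(part["measurements"]):
        return "pass"  # most parts never reach the slower model
    return "defect" if ai_confirm(part["image"], model) else "pass"

# Toy usage with a stand-in model that returns a defect confidence.
part = {"measurements": {"scratch_length_mm": 0.8}, "image": None}
print(inspect(part, model=lambda img: 0.9))  # defect
```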
Jürgen Weichenberger [00:22:35]: Yeah. So there are two groups of faults. Not everything has to and must be AI; this isn't a beauty contest where we just sprinkle it on everything. So we are using lots of sensors, or intelligent sensors, which make things easy, where you can have little wearables you put on workers. Literally, when they are entering zones they're not supposed to enter, they just get notified.
Jürgen Weichenberger [00:23:02]: And that's very effective, and especially also low error rates, to Mohan's words previously. And the other thing we are doing now is to marry the, let's call it, first-principles natural world with the data. Physics-informed neural networks were a natural evolution step, because if you go purely by data, what these algorithms do is optimize, and they always try to find a maximum. The reality shows, when you show this optimized solution to an engineer, he's going to say to you: wow, this is a perfect solution, but I'm never going to do it, because I have zero margin for error. The critical thing is how you teach systems margin for error, because this is what a clever engineer says: I always leave myself a bit of wiggle room, in case something goes a little bit off track. Then we don't immediately have a disaster; we just keep it going. And this is how it is in many processes. Look at space:
Jürgen Weichenberger [00:24:06]: take Elon Musk's fourth launch. Yes, the thing was falling halfway apart on reentry, but it still had enough margin for error to get to a successful splashdown in the Indian Ocean. And this is what pure data-based systems don't understand. They don't understand the concept of margin for error.
Demetrios [00:24:30]: This is a fascinating idea, because it's like in software, you have a lot of wiggle room. In this type of scenario, when you're in industry, you don't have that wiggle room, and you can't do things where you have zero margin for error. Is that what I'm understanding?
Jürgen Weichenberger [00:24:49]: So, there are very few environments where you truly have zero margin for error. Every engineer tries to give himself a space he can operate in, but this space is often defined differently, and it does not necessarily cohere with an optimization algorithm which says, I'm trying to climb the mountain as high as I can. If you take Mount Everest, would they go to 8,843 meters? No, they would say, for me, seven and a half thousand is good enough. If I'm sticking around there, I'm safe, and I can manage it. If I'm going to the top, my margin for error is literally zero. And the thing is, in many, many industries, that's exactly what they're managing: the margin for error, to keep the process going, because continuity in production is more valuable than short periods of peak production followed by long, prolonged periods where you are not running at all because you broke something. And that's what these guys are factoring in.
Jürgen Weichenberger [00:26:03]: And teaching this to machines isn't an easy task.
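One way to read Jürgen's point in optimization terms: don't maximize the objective right up to the hard constraint; maximize against a derated limit that preserves the engineer's wiggle room. A toy sketch using scipy's SLSQP solver; the objective, the limit, and the margin are all invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy process: throughput grows with operating point x, but a physical
# limit requires x <= X_MAX. The engineer derates the limit by MARGIN.
X_MAX = 10.0
MARGIN = 1.5  # wiggle room: never operate closer than this to the hard limit

def neg_throughput(x):
    return -(5.0 * x[0] - 0.1 * x[0] ** 2)  # minimize negative = maximize

constraints = [
    # 'ineq' constraints must evaluate >= 0: enforce x <= X_MAX - MARGIN,
    # i.e. optimize against the derated limit, not the true one.
    {"type": "ineq", "fun": lambda x: (X_MAX - MARGIN) - x[0]},
]
result = minimize(neg_throughput, x0=[1.0], method="SLSQP",
                  constraints=constraints)
print(f"operating point {result.x[0]:.2f} (hard limit {X_MAX}, margin {MARGIN})")
```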
Pavol Bielik [00:26:08]: Yeah, maybe just to follow up on Jürgen, I think this is an excellent point, because, exactly, if you just take the data, you throw it in, you take the latest, greatest optimization algorithms, they just don't have this notion of optimizing while keeping a margin for error. And not only that, but these algorithms are optimized for the average case, for the expectation over the data set. And this is, again, not something that you want to optimize for. You actually want to optimize for the corner cases that will lead to the production line stopping. And this is where the margin for error is really important, and where you now need to go and actually start thinking about, how do I maybe change the algorithm? How do I add this margin for error, for these things to actually work well in practice? And maybe one more thing, to also follow up on what Aniket said in the original question. These are often systems of systems, where the ML algorithm is one piece that maybe has replaced some original algorithm that can now be optimized more.
Pavol Bielik [00:27:15]: And this is also something that, when going from PoC to production, is not always so clear, because then you get problems like: okay, if I change this algorithm in the future, now I have other things that depend on it, and maybe the other parts actually learned to manage the mistakes the AI algorithm makes. And now maybe I improve my AI algorithm, and actually the whole system breaks, even though my AI algorithm is better. And that's because it's just not this one small box; it's a bigger system.
Demetrios [00:27:48]: Yeah. It's like you're taking one step forward but two steps back, in a way, by making that algorithm better. So you have to be careful.
Pavol Bielik [00:27:57]: Yes. And then this also goes back to, how are we making it better? Is it strictly improving over what we had previously, such that we really avoid these regressions? Or is it "on average better," if I look at the traditional metrics by which we evaluate these models from a purely machine learning perspective?
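Pavol's distinction between "better on average" and "strictly improving" can be made mechanical: gate a new model on introducing no regressions on cases the old model already got right, rather than on a higher aggregate score alone. A minimal sketch with illustrative arrays:

```python
import numpy as np

def regression_gate(y_true, old_pred, new_pred):
    """Accept the new model only if it fixes cases without breaking solved ones."""
    old_ok = old_pred == y_true
    new_ok = new_pred == y_true
    regressions = int(np.sum(old_ok & ~new_ok))  # was right, now wrong
    fixes = int(np.sum(~old_ok & new_ok))        # was wrong, now right
    return regressions == 0, {"regressions": regressions, "fixes": fixes}

y = np.array([1, 0, 1, 1, 0, 1])
old = np.array([1, 0, 0, 0, 0, 1])  # two mistakes
new = np.array([1, 0, 1, 1, 1, 1])  # higher accuracy, but breaks a solved case
print(regression_gate(y, old, new))  # (False, {'regressions': 1, 'fixes': 2})
```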
Demetrios [00:28:19]: So the chat is awesome right now. I've got to give a shout-out to everyone that is asking questions. Saeed, what's up? Assad is in the chat asking: in the realm of quality control, how can AI systems be trained to handle the vast variability in defect types and severity seen across semiconductor wafers and EV components?
Jürgen Weichenberger [00:28:45]: As a manufacturing company, Schneider has 200 factories, so we manufacture a lot of our components ourselves. And we have encountered what I call predicting the unknown, because things can fail, mathematically, in many, many, many ways, for which we will never have enough data to train anything to predict them. So we made a conscious decision a while ago where we said: we forget the failure-pattern training, because it's never going to work. You will encounter something new every day which you haven't seen, and you'll just be retraining, retraining, retraining all the time. So we said, okay, let's train the algorithms on how the ideal state is supposed to be: okay, this is perfect. And Pavol mentioned it earlier, it's not always easy to define this, but you say, okay, this is the ideal state, this is how it should look, and anything outside a certain margin that does not look like this is bad.
Jürgen Weichenberger [00:29:48]: And the advantage of that approach is you don't need to know the bad cases beforehand. You can retro-label them: if you encounter something you've never seen, you can label it and add it to your foundation. So the next time, you would not only get "okay, you have a deviation from good," you could also say, yeah, we have this and this case, and it triggers this and this kind of fix. But the advantage is it keeps the system stable and robust, and you can deal with all sorts of abnormalities which you have never encountered before. So that's the tack we took. Since we switched to this, we also found we have far fewer issues with labeled training examples and reference cases, because the amount of good production data we have outnumbers the bad data by, I don't know, a million to one, or even more.
Jürgen Weichenberger [00:30:47]: So that is how we literally switched and tackled the challenge of predicting the unknown.
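A minimal sketch of the train-on-the-ideal-state approach Jürgen describes, using distance to a bank of known-good examples as the anomaly score, so no defect labels are needed up front. The embeddings are faked here; in practice they might come from a pretrained vision backbone, and the quantile threshold is an assumed false-alarm budget.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assume each part is already embedded as a feature vector; here we fake
# embeddings of good production parts.
good_bank = rng.normal(0.0, 1.0, size=(5_000, 32))

def anomaly_score(x, bank):
    """Distance to the closest known-good example: small = looks normal."""
    return np.min(np.linalg.norm(bank - x, axis=1))

# Calibrate the threshold on held-out GOOD parts only; no defect data needed.
held_out_good = rng.normal(0.0, 1.0, size=(500, 32))
scores = np.array([anomaly_score(x, good_bank) for x in held_out_good])
threshold = np.quantile(scores, 0.999)  # assumed acceptable false-alarm rate

unseen_part = rng.normal(3.0, 1.0, size=32)  # a failure mode never seen before
print(anomaly_score(unseen_part, good_bank) > threshold)  # True -> flag it
```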
Mohan Mahadevan [00:30:53]: Yeah, so it sort of boils down to a system design problem, as Jürgen mentioned. The points Jürgen made are very important. If you try to map out all the failure modes, then I think it's an impossible task, because there's always going to be something new that pops up, some variation of something that you haven't seen before, or what have you. And machine learning models, unfortunately, as Pavol mentioned, are function mappings, which means that they learn distributions, and the long tails of those distributions are very, very poorly learned, because there isn't data there for them to learn from. So really, this is about system design. You've got to build what we call anomaly detectors, that just detect, as Jürgen mentioned, the deviation from the ideal state. There are far anomalies that are easy to detect; any kind of simple detector is going to detect those.
Mohan Mahadevan [00:31:48]: It's the near anomalies, near the boundaries of the statistical process control, that become really hard. And that's where the challenge lies, and where you need to build the system design quite carefully. If you do that well, and you have the right feedback loop within the manufacturing system, that means that anything that you detect in that window, you must get it QC'd, and you must be able to QC it with a root cause. So you not only QC what you detected, but you're also able to root-cause it to the manufacturing process itself. And then you can close that loop, and then you can maintain a stable process as you manufacture. Nonetheless, having said this, surprises pop in now and then, and it leads to those late nights where everyone has got to get up and get going to figure out what the hell happened here.
Pavol Bielik [00:32:40]: Right.
Mohan Mahadevan [00:32:40]: And so that happens every now and then in these high-stakes manufacturing environments, where you really have to solve that and get that process in control ASAP.
Demetrios [00:32:51]: We always love hearing about those late nights and 3:00 a.m. calls, especially on Sundays. So I want to address one thing: tons of questions about LLMs in the chat. I'm going to get to that; we'll have a whole LLM section. The GenAI hype train will arrive.
Demetrios [00:33:15]: Don't you worry about a thing. But before we do, staying on this idea of complex systems and the challenges in designing these systems: it is something that you were saying, Pavol, they don't work in isolation, right? You have your model; that's only one piece of the pie. So what are some of the challenges of integrating AI into the equipment, into the processes, when you've gone about doing that? You mentioned before, maybe the model starts working really well, but everything else has almost been picking up the weight, or it's been carrying the model, and so now, when the model starts working better, things start to not work as well. And especially, I feel like when you're dealing with things in the real world, it's not bits and bytes, right? It is actual physical materials that you have to deal with. How do you think about some of these system challenges?
Pavol Bielik [00:34:25]: I think definitely one of the big challenges here is, or at least traditionally has been, trust. Because if you have systems that have been designed by engineers, by humans, you can look inside, you can understand: this is how the decision is made, this is why it was made, maybe now this is why it's failing. And now we're saying, let's replace this little box with AI; you'd better get a better return on investment than just saying, okay, I don't actually know what's going on. So you really have to have a very good reason why you are replacing it, and there are very good reasons why companies are doing this, but then you have to somehow get the trust back. And again, one part of this is the system design, and the other one is just very, very thorough evaluation. And this is also what Aniket mentioned: these are not things that typically happen overnight.
Pavol Bielik [00:35:23]: You have to be very careful about how you set up the process. There is typically a backup system; that is, you're not running it on one model, you are running multiple models, and whenever there is a disagreement, you would start flagging things, and things like this.
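A sketch of the multiple-models-plus-disagreement-flag setup Pavol mentions: act only when enough ensemble members agree, and route disagreements for review. Everything here is an illustrative skeleton.

```python
from collections import Counter

def ensemble_verdict(image, models, min_agreement=1.0):
    """Return a verdict only when enough models agree; otherwise escalate."""
    votes = [m(image) for m in models]
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label, votes
    return "needs_review", votes  # disagreement -> flag for a human

# Toy usage: three stand-in models, one of which disagrees.
models = [lambda img: "ok", lambda img: "ok", lambda img: "defect"]
print(ensemble_verdict(None, models))                     # ('needs_review', ...)
print(ensemble_verdict(None, models, min_agreement=0.6))  # ('ok', ...)
```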
Demetrios [00:35:39]: I do like one thing that you mentioned, about how you have analysts that are looking at the performance too, and you're seeing: is it actually doing better than our baseline, or what we had before? One question that came through is: what are some of those KPIs and metrics you use to evaluate whether something is doing better than before?
Aniket Singh [00:36:05]: Of course, accuracy and a couple of things which we get from confusion matrices are common, but you do have to actually have a human look through it, and not just the numbers, because you're comparing the images. Of course, there's no way to do all the images, but the ones with lower confidence may be the ones where we are having more issues. We check which one is performing better, and whether it's still performing well on the ones where we are not having issues. Just because the model starts to perform better on, let's say, a new type of defect that we started to see and trained well on, it can start to perform worse on the other ones. So we still have to make sure that the ones that were already good are still good, and that on the new ones it is actually performing better. So there's a lot of manual work in this process, beyond the mathematical numbers we get. Just because of how the industry is, we need to be sure; we can't just rely on the numbers.
Demetrios [00:37:12]: Jürgen, I want to throw one over to you, because there was a pretty hairy use case that you mentioned beforehand, that low latency one. I think it was computer vision, low latency. Now, would you say those are some of the hardest ones that you've had to deal with, where you're trying to incorporate AI or ML? Or, if not, which ones are some of the hairiest problems?
Jürgen Weichenberger [00:37:41]: It's the timing. From an AI perspective, it's still computer vision; if you have a well-trained model, it's not that much of a big deal. The thing you need to solve is how you eradicate transfer times, because you still have multiple players: you have a camera or a camera system, you have a control system in place, and you need to execute your AI model somewhere as close as possible. So how do you eliminate the time in between? You can get the individual components to run very, very fast; it's how you stitch them together. So often it's not the individual parts, it's how you stitch them together.
Jürgen Weichenberger [00:38:29]: But we have other problems where the pure challenge on the AI side reaches dimensions which, right now, with standard means, are not really solvable. You would hear the likes of Nvidia saying, oh, you can use our super chips and we can bring lithography from days to minutes and hours. But this is still often not the answer; just throwing more compute power at it doesn't solve the actual problem at hand. And this brings us back to the mathematical realm of whether problems are solvable or not. And we are also encountering scenarios where the algorithms we have available today just don't cut it, or they cannot replicate the concept. For example, what always makes me smile is when the generative AI people speak about how they intend to teach reasoning to a generative model, and they're quoting: yeah, chain of thought is the answer to reasoning. When I see that, I say: just by changing from a programmatic programming language to a natural language, this has nothing to do with teaching a system chain of thought or reasoning.
Jürgen Weichenberger [00:40:03]: It's just that you provide instructions in a different format, but it's still providing precise instructions to a system. And you can see that you can very easily circumvent it, or prove them wrong, by making some tiny alterations. There are the classic examples in computer vision, where you change one single pixel in an image of a cat, and for the human eye it's still 100% a cat, but for the computer vision it's already a dog. And so handling these things, where you have ultra-minor changes, and ensuring that, as with human perception, you can still easily determine, yeah, that's still a cat, it doesn't matter if that bloody pixel is changed or not: for systems, these are massive problems. And it always makes you think. I know there's a lot of marketing going around saying, yeah, yeah, no problem, we have it under control. And we're seeing, in some very simple generative programming examples, it's by no means simple, because the human understands the process and then writes the code to fit the process.
Jürgen Weichenberger [00:41:32]: Making a machine understand the process is unsolved territory today, even trying to break it down, making the chunks small enough. Look at Mohan's example of manufacturing a computer chip: it's 1,700 steps, which have to be synchronized and sequenced. Try asking a GPT: write me the control program which synchronizes these 1,700 steps. What you get is simply not working. And so this is where it gets really, really complex, and where we reach the limits of machine comprehension.
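Jürgen's one-pixel example can be turned into a cheap robustness probe: perturb single pixels at random and count how often the predicted label flips. A minimal sketch, assuming some classify(image) function exists; random edits are a crude stand-in for real single-pixel adversarial attacks, which search for the worst pixel rather than sampling.

```python
import numpy as np

def one_pixel_flip_rate(image, classify, trials=200, seed=0):
    """Fraction of random single-pixel edits that change the predicted label."""
    rng = np.random.default_rng(seed)
    base_label = classify(image)
    h, w, c = image.shape
    flips = 0
    for _ in range(trials):
        perturbed = image.copy()
        y, x = rng.integers(h), rng.integers(w)
        perturbed[y, x] = rng.integers(0, 256, size=c)  # overwrite one pixel
        if classify(perturbed) != base_label:
            flips += 1
    return flips / trials

# Usage with any classifier taking an HxWxC uint8 array (hypothetical):
# rate = one_pixel_flip_rate(cat_image, classify=my_model_predict)
# A non-trivial rate means human-imperceptible edits change the verdict.
```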
Mohan Mahadevan [00:42:14]: No, I think Jürgen makes a lot of good points. From my perspective, it's about both the machine learning models and the foundational models, these large language models and large language-vision models. They definitely have all kinds of problems, but the question is: can we, as humans, introduce enough inductive biases to find small but controlled domains where we can use them impactfully in the industry? It's almost as if you're working around this very rough tool that is very powerful but has all kinds of problems, and you figure out a way: no, no, no, I can cut my thing by just doing this there. So today we are in that kind of a situation. We bring in all these machine learning scientists that try to figure this out, day in and day out, by building systems and trying to figure out how to make these rough-and-ready things work, which in the demos are amazing, right? You can see a demo that looks like, oh yeah, we solved world hunger yesterday, but the reality is very different. So I think, to Jürgen's point, there is a place where, if we can build the right guardrails, we can still get a lot of value out of these things today, even though they're far from, you know, perfect. And that's what you were talking about: protecting against hallucinations and really finding a way to use it. So I think the current state is very clear.
Mohan Mahadevan [00:43:48]: These things are highly unreliable, and if you want to use them, you'd better put a lot of guardrails and protection mechanisms in place. But let's see how these things evolve. I think there's just a lot of science and engineering and hardware and compute getting thrown at it, and a lot of smart people working across the globe. So, yeah, let's see how this evolves. And our challenge, I think, is to stay on top of this. It's so insane, right? The number of new models, the number of new approaches, some good, some not so good. It's both staying on top of them, and also the cost of staying on top of them: it's very expensive, when new multimodal models come out, to figure out if something is an actual improvement or not for our use case.
Pavol Bielik [00:44:32]: Right.
Mohan Mahadevan [00:44:33]: It's an interesting world we live in, but we diverge away from manufacturing a bit.
Demetrios [00:44:39]: That's a great point. It's not just the cost of the API calls, but the human cost of spending time to figure out if it's worthwhile to put a team on this and implement it, or at least mess around, like Aniket was saying, do a PoC with it and then see if it can graduate up to being in production. So there are a few other valid points that I wanted to call out here. In these complex systems, we're by no means at any point where we can say to an LLM: okay, go figure out this whole very complex system and optimize each step. But Mohan, what I'm hearing from you is that if we can limit the scope as much as possible, then potentially we can optimize maybe one small step, or half of a step, or 0.3% of a step, that type of thing. So I do like that idea. Now, since we're on the topic, and I think everyone in the chat has been asking for it, we should start to talk about LLMs and how you all are using them, if at all. And especially, I know there were some use cases that you had mentioned, Jürgen, about using them. Because I heard from my friend Paco the other day, and he was talking about a company he was working with in heavy industry: you would think that a lot of this predictive maintenance and the stuff that we've been talking about for the last 45 minutes would be the highest ROI and the most important thing they could do with AI.
Demetrios [00:46:31]: And he said that when he went into the company, the biggest ROI and the lowest hanging fruit was just digitizing PDFs and making them searchable, which is debatable if it's even AI at all. And so I would love to hear if you all have found that on the admin side there have been areas to leverage AI use cases.
Jürgen Weichenberger [00:47:02]: So obviously, by order of our CEO, we had to look into LLMs and how we can apply them. Believe it or not, the obvious use cases, the lowest-hanging fruit, are in automating simple things. We get lots of emails; customer service gets lots of emails. Going through the emails and creating an appropriate response, which would otherwise have taken days or weeks before it had been processed through the many hands that have to take a look at it. So there are use cases around chatbots and automated answer generation where these things work extremely well, because that's what they are designed for: they can create very nice textual answers. Then we start to bring it into the industrial world. Everybody was thinking coding, creating code with LLMs, should be a super easy thing to do. And in the first instance, for your, let's say, private coding examples, it works fairly well. But when you lift it to an industrial scale, where repeatability, maintainability, and consistency in language play a role, then this whole story becomes very, very different.
Jürgen Weichenberger [00:48:34]: Because of the nature of the autoencoder, specifically in combination with RAG, it tends to continuously change its answers. So if I ask ten times, over time, the question "write me a ten-second wait loop," the likelihood that I get ten different answers for the same task is extremely high. Now translate this: you have field service engineers going into the field, using these tools. And these guys are not dumb; they know how to fix things. So if a guy asks the tool, tell me how to fix it, and he asks it five times and gets five different answers, that brings it back to the point of trust Pavol raised earlier. If you give him five different answers, is the guy going to trust the system, or will he just stop using it and revert back to what he has in his head? Because that is more reliable and more repeatable than following a system which tells him something different every five minutes, just by the way the probabilistic mathematics fall.
Jürgen Weichenberger [00:49:40]: And so, as charming as it is at first glance, using these things at industrial scale, you tap into the probabilistic nature of the algorithms, and therefore the likelihood that they give you multiple answers for the same question is very high. And this creates a huge issue, because, as I said, if you control process control systems, PLCs, you don't want to have ten PLCs with 100 different programs for the same task. So that's what we are finding. And this is where we now have to do open-heart surgery on the GPTs to contain them, to Mohan's words, put the guardrails in place so that it's not going to go ballistic.
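The repeatability problem Jürgen describes comes from sampling: at generation time, tokens are drawn from a probability distribution, so the same prompt can yield different outputs. A toy sketch of temperature sampling over a made-up next-token distribution; setting temperature to zero gives greedy, repeatable decoding, though across model or serving updates even that is not a complete guarantee.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Temperature sampling: temperature -> 0 approaches greedy (argmax)."""
    if temperature == 0:
        return int(np.argmax(logits))  # deterministic choice
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.8, 0.5])  # made-up scores for three candidate tokens
rng = np.random.default_rng()
print({sample_token(logits, 1.0, rng) for _ in range(10)})  # often several tokens
print({sample_token(logits, 0.0, rng) for _ in range(10)})  # always {0}
```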
Demetrios [00:50:33]: Yeah. One question coming through the chat that I think a lot of people have is: are you fine-tuning directly? And if so, what does that look like? I imagine it's not just hitting an OpenAI API. So maybe break down the use case for us, Mohan, and then tell us if you are using it in production, or if you're like Jürgen, a little more: eh, it's fun, it might work, but it's very dangerous.
Mohan Mahadevan [00:51:04]: Yeah, I think honestly Jürgen covered most of the important points here, but I'll just add something from my perspective. Wherever there's a copilot use case, it can be quite useful. Think of a copilot use case where it's helping you do some task, and you, as the human, are the master, and you control what that end result is. So a lot of copilot use cases are seeing a lot of traction from GPTs, where hallucinations are caught by the human, and yet, for the majority of the time, if it gets it right, it's useful to the humans to go faster, to do more things, et cetera. In terms of actually deploying this in production: think of the world as the development world, where we actually develop things in-house before we go to production. In improving the pace of development and internal processes, I think they're very much in play today. In terms of production, building the right guardrails becomes a bit challenging to do right off the bat. You're not quite sure: suddenly a new LLM pops up, suddenly a new multimodal model pops up, suddenly you want to try that, and then it's just the wild west.
Mohan Mahadevan [00:52:31]: We've got some limited sort of production-deployment-type situations, but we are very careful, because for us, trust is important. If we deploy something and it leads to some errors and some obvious hallucinations, then our customers are just highly unlikely to trust anything. You break trust, not just for that particular mistake; you break trust across the whole product. So production-wise, in our world, and right now I work in insurtech, we are there in bits and pieces, where we can control the destiny very well, but we're not there with a standalone multimodal model or an LLM deployed in production.
Demetrios [00:53:19]: So since you brought up the topic of trust, I think there is an interesting question around trust in traditional ML versus trust in LLMs. Pavol, have you seen, in your experience, people finding that with traditional ML, since we've had more time to stress test it, there is a bit more trust there, and you see things going out into production? What is your take on the trust in one versus the other?
Pavol Bielik [00:53:56]: Maybe I would just say: because LLMs are the new hype, everybody knows about them, but this is not a new problem. We had the same problems five years ago, because it does boil down to: there are people using these systems, and if these systems give inconsistent answers, as Jürgen mentioned, because I turned on my light, or the camera shifted one degree to the left, and suddenly it's not a defect, I will just not be very happy with the system. And there are users at the end. Maybe all the people who have Teslas have also seen this: there is a software update, suddenly things work differently than before, and I have to adjust. I think with LLMs it's just more pervasive, because people have more access to them. And the main thing I would add here is, it's a challenge to even build the framework to evaluate these systems in the first place. Coming from an academic background, this is also something we are very interested in.
Pavol Bielik [00:54:57]: And we will also be open-sourcing an initiative around this, to actually build a framework where anyone can go and check their model against a set of properties and benchmarks that these models should satisfy.
Demetrios [00:55:14]: You're going to open source that? Did I just hear you say that?
Pavol Bielik [00:55:16]: Yes.
Demetrios [00:55:17]: All right. We will make sure that everybody finds out when you do. That is super cool to hear. I feel like we are hitting up on time, so I want to be very conscientious. You all are awesome for doing this, and thank you so much. There have been some incredible questions, and we didn't get to all of them, so forgive me for not being able to ask them all. If anyone listening in the chat wants to continue the conversation with any of these incredible panelists, hit them up on LinkedIn. I think they're all active on there, so it would be great to continue the conversation.
Demetrios [00:55:53]: And as I mentioned at the beginning, feel free to join us in California for AIQCON, aka the AI Quality Conference. Friends, we would love to see you there. Folks, this has been a lot of fun, and hopefully you learned some stuff and have more questions when you're thinking about AI in industry. Let's keep the conversation going. I'll see you all later. A huge thank you to every one of you for coming on here and talking with me and enlightening me on these very important topics.