Sign in or Join the community to continue

From Arduinos to LLMs: Exploring the Spectrum of ML

Posted Jun 20, 2023 | Views 764

# LLMs

# TinyML

# Sleek.com

Share

speakers

Soham Chatterjee

Machine Learning Lead @ Sleek

Soham leads the machine learning team at Sleek, where he builds tools for automated accounting and back-office management. As an electrical engineer, Soham has a passion for the intersection of machine learning and electronics, specifically TinyML/Edge Computing. He has several courses on MLOps and TinyMLOps available on Udacity and LinkedIn, with more courses in the works.

+ Read More

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

+ Read More

Abi Aryan

Machine Learning Engineer @ Independent Consultant

Abi is a machine learning engineer and an independent consultant with over 7 years of experience in the industry using ML research and adapting it to solve real-world engineering challenges for businesses for a wide range of companies ranging from e-commerce, insurance, education and media & entertainment where she is responsible for machine learning infrastructure design and model development, integration and deployment at scale for data analysis, computer vision, audio-speech synthesis as well as natural language processing. She is also currently writing and working in autonomous agents and evaluation frameworks for large language models as a researcher at Bolkay.

Prior to consulting, Abi was a visiting research scholar at UCLA working at the Cognitive Sciences Lab with Dr. Judea Pearl on developing intelligent agents and has authored research papers in AutoML and Reinforcement Learning (later accepted for poster presentation at AAAI 2020) and invited reviewer, area-chair and co-chair on multiple conferences including AABI 2023, PyData NYC ‘22, ACL ‘21, NeurIPS ‘18, PyData LA ‘18.

+ Read More

SUMMARY

Explore the spectrum of MLOps from large language models (LLMs) to TinyML. Soham highlights the difficulties of scaling machine learning models and cautions against relying exclusively on open AI's API due to its limitations. Soham is particularly interested in the effective deployment of models and the integration of IoT with deep learning. He offers insights into the challenges and strategies involved in deploying models in constrained environments, such as remote areas with limited power and utilizing small devices like Arduino Nano.

+ Read More

TRANSCRIPT

So my name is Soham. I am currently leading the machine learning team at FinTech startup in Singapore called sleek. I, I try to make coffee with like, you know with high precision grinder with a with a temperature controlled kettle and with you know, with a very sensitive scale.

I use distilled water that has been specifically, you know, where they've added some minerals in some specific Formulas that are in percentages that's supposed to make your coffee taste better. I, I, I don't buy beans that are like older than that, that haven't been roasted like more than a week ago.

What else? Yeah, I mean, I do all of these things and you know, my coffee still tastes the same, actually. It doesn't, it taste good, but, but you know, it doesn't taste like, Like what I expected to taste based on the amount of effort I put in. But you know, the good thing it smells really great when you grind the coffee.

It's really it's very calming, like grinding the coffee and just going through the whole motion of. Making it. And I think that's the best part. Yeah. It doesn't matter how it tastes. It gives, it, it, it gives you a sense of calm. What's up everybody? This is Dimitrios and you are listening to another edition of the Lops Community podcast.

Today I am graced with the presence of Abby. What is going on? Abby? How you doing? No, very well down with fever. I just took meds. Oh, no. Oh, I didn't even realize that. We just recorded this whole episode. And you seemed absolutely all together. Everything was fine. So I didn't even think to ask if you weren't doing that well.

But I guess you hold your composure well under stress and under pressure. Well, but it was an excellent conversation. I was actually excited with this conversation. Maybe that's difference cause there were so many other ones where your bored outta your mind never said, I knew they were going to come

I was just saying that as opposed to other ones where you're bored outta your mind and maybe you are in pain even though you're not sick. I, I don't think I'm being bored outta my mind in any conversation, but this is also like one of the areas that I'm working in now. So there's a little bit of more excitement.

If something is a little bit more recent, you know, there, there are things that you can learn for the sake of learning, and then there are things that you're like, oh, I'm doing this now. I'm looking for a new perspective on those things. And I think he had a very interesting idea of, or at least I could say, his thought process on a lot of things did align very well with what I've been studying and observing with Bible learning.

Yes. Oh my God. So we should probably just mention Soham came on here. He just talked to us about two main themes, one being tiny ml and the other being actually using large language models in production. I loved it because it was the refreshingly real take on the pains of using large language models now in their current state.

And he has been trying to use 'em. I mean, he started using them. When there was G P T two that was out and he's continued to use them, and I mean for me, the biggest highlight was how he talked about, you know, things. It's very easy to stand something up and get that toy demo working really well, but when you want to actually go to scale, that is where it gets really hard and you can't just rely on open AI's API because who knows what their SLAs are.

They haven't actually released any SLAs. And you also start paying a lot of money, like really fast. So as a aside, for those who do not know who Sohu is, he leads the machine learning team at sleek and he builds tools for automated accounting and back office management. He started out as an electrical engineer, which we get into.

And he has this passion for tiny ml and he loves talking about it. So we started out with that and we went down that road for a bit. And the Andinos Arnos, I think is what they're called and how he loves deploying machine learning on them. And then we went to the opposite side of the spectrum and we talked about the large language model.

So basically how small can you make the machine learning and how big can you make the machine learning? Was there any key takeaways that you had before we jump into the actual conversation?

which was the kind of glass that he looked at coming from Tiny ml. He said, you know, the scale is quite similar, at least the scale difference ratio, which is earlier we were trying to deploy these models on very tiny devices. And we had the same challenges. These challenges were how much power is they consuming, how much latency does they have?

How private dear sharing is happening? How much excess and control do you have over these models? How are you compressing your models? And most of these things now translate into the realm of elements as well. So I love that one. So good. So he's also got a ck, I think I found him through CK or he came into the community and he's one of those people that.

It was obvious when he came into the community that he, he knows what's up. He's been doing this for long enough and he's been thinking about it deeply, and so I encourage everyone, if you want more of so ham's work, go and read his sub. We'll leave a link to that in the description. For now, let's jump into this conversation and, oh, wait, no.

Before we go anywhere, it would mean the world to us. If you could share this episode with one friend, that would just be incredible. If you know someone that is, In this field and would appreciate this type of podcast, please share it because that's how we can grow and get out to more and more people and hopefully spread the good word let's go, let's do it. A you ready?

Here we go. . So him just have to make sure it's, am I pronouncing it correctly? So is it, so ham soham. What is it? It's more hum. Like, like you Yeah. Like humming music. Yeah. Yep. So I, but, but, but yeah. Ham is fine. Yeah. In, in yoga classes. So Hum is kind of a popular word that they use. Huh? Do you know that you didn't even Yeah, I've heard that.

I, I, I, I know that, but I haven't, I haven't any of the, like none of the yoga classes I've been to have, have used that word, so, but I've heard other people say that. Yeah, there's, I've been to yoga. What, what does the name mean, by the way? Yeah. So from what I understand, it means I am like what I am.

Tech. So I, I guess that can be interpreted in a lot of ways. So accept it. Yeah. Enjoy it. That's why you so, exactly, exactly. Oh, dude. Awesome. Well, I'm super excited to talk to you because we are going to go down almost like two pads where you do.

A lot of fun stuff with large language models, but you also have this alter ego that is doing stuff with tiny ml, and you mentioned to me before we hit record that one of your passions is to look at how people are using and try and figure out yourself. How to use machine learning at the edge. So let's get into all of that, but before we do, I think it's worth just having a bit of a brief explanation on how you came to find yourself where you're at now.

Yeah. Sure. So so my background is not in computer science, like at all. I have a bachelor's and a master's in electrical engineering. But I, I mean, I can't do anything that is electrical related apart from like, change the occasional like light bulb, ? What, what I learned, so, so when I was in when I was doing my undergrad, I, I this was back in like 2014 or 15 and IOT was like really popular at that time.

And, and what happens in, in any iott devices, it generates a lot of data . And at that time there weren't really a lot of people trying to use that data and do something smart with it, . Or do something intelligent with it. Machine learning and deep learning was, it was just gaining popularity at that time.

A lot of people didn't know how to. Deploy these models really well, or even like how you could use the data and build something smart with it at that time really well. So, so yeah, it was kind of at that time when I, when I started exploring like what we can how we can take this all the data that these iot devices are doing and build something smart with it.

So, so that's, and that's, that path kind of led me down. To studying machine learning and then deep learning and and then combining iot with deep learning. You did your bachelor's in electrical engineering. Yeah. And then you went on to work with a company called Samar Technologies as a deep planning researcher. Yeah. How did that happen? Yeah, so so as part of this student run lab in our university where, where, you know, we were just like learning stuff, experimenting with all this stuff, we, all the cool things that were happening at that time.

Iot deep learning virtual reality, stuff like that. And so the person who eventually went onto you know, like be the head of the. Deep learning lab at SAMMA gave give a talk at that lab. And that's how we got connected. And when I was graduating, he, one day he just randomly messaged me on Messenger.

At that time I was in the middle of class and he messaged me and he was like, Hey, can you come down to the, to our office, which is like a four hour plane ride. I think it was. Wow. And you know, just, just, just like see if the vibes, if you like the vibes and if you wanna join, maybe do an internship or something.

Yeah, I was like a broke student at that time. And he and changed your life and, you know, like he is Yeah, yeah, yeah. Exactly. So I'm really happy that I got got that opportunity. I think it's really hard to get to get a job in machine learning, especially at that time if you didn't have like a PhD or like a like prior experience.

And I'm glad that his name is Malay, by the way. He's still at samma. At Malay. And. Yeah. Shout out to my life for giving me the opportunity and trusting me. So you ended up with a background which was between deep learning, edge computing and quantum computing. That's a very interesting combination.

Yeah. I, I did dabble a bit in co quantum computing. I, I wouldn't say I know like that a lot about it. So the research lab at SAMMA was it was like we, we, we were allowed to experiment a lot and work on things that we thought would benefit the company at that time. So SAMMA is like a pharma IT company.

So we built tools and products for pharmaceutical companies. At that time a lot of people were using quantum computing to do I think protein synthesis and like, oh yeah. So, so yeah, we just tried experimenting and seeing if something worked. Yeah, that, that, that's how I, I dabbled a bit in quantum computing, but yeah, not, I, I wouldn't say, I'm like, I'm qualified to talk much about it.

Because you've worked in tiny ml, or at least you've exploded, what are some of the challenges that you've seen with deploying machine learning models on tiny ml or kind of edge devices? Yeah, and maybe we should mention real fast, just as far as tiny ML goes.

What do you consider Tiny ml? Yeah. Good question. I think so for me, tiny ml, like the really, really tiny ml, it's probably like devices, like an Arduino nano, which has only a few kilobytes of wow. Of ram. So and, and that ram has to hold not only your model, but also your code for running the, executing the model reading data from sensors, you know, processing that data, all of that stuff.

That, that's, that's I think like the really, really tiny stuff. And then what makes it more challenging is if you deploy that device to like a remote area where maybe you don't have, have access to utility power, ? So you have to run it off of a battery. So how do you make your code, like really optimize so that the device can really autonomously for like a year at least?

Yeah. Yeah, so, so that would be like the really, really tiny like at the really edge, ? But then I think you know, phones now are really powerful. But that's also, I guess in some ways I wouldn't call it tiny ml, but it's, it's an embedded device at least. And and then another popular embedded device would be it has very pie.

Yeah, so, so I guess like there's, there's levels of tiny ml. You have like the really, really constrained devices with no power no connectivity. So you can't even make like a query to a server somewhere. And then, you know, those constraints slowly get less and less as you move further up the chain to like a raspberry pie and then a mobile phone and, and so on.

Yeah. And you're playing around with what, when you mess around with tiny ml, what is your favorite? Area to play in on that chain. Yeah, I think I, so I really like working with Arduino Nanos because that's, that's where the, that's where it's most challenging. Yeah. I had a few You were gonna say that?

How do you even see something work on a device like that?

It's, it's probably going to be like an l e D that blinks, ? And, and the joy that a blinking l e d can give you. It's it, I can't describe the feeling. It's a little strange. Yeah. So for the applications that you were exploring on, on these devices? Yeah, sure. So, so one of the first applications that I worked on was at samma.

We were working with a, a medicine manufacturer and manufacturing medicines. It's it's highly confidential. You need to have a lot of safety and there's a lot of protocols around it. So this particular manufacturer, they wanted. To make sure that any box of medicine that was going off of the factory you know, production line had like, not only the medicine, but also medicines usually have like some, a piece of paper with like some information and some warnings on it.

Stuff like that, they wanted to make sure that all the boxes that they were shipping out had all of those things. So and, and, and the reason they need a tiny ml. Was because it, because they couldn't make a query to a model out in some server somewhere, because that would be sending like confidential information outside.

And then factory floor space is also very expensive. So they, they didn't want something really large, ? They wanted something small that they could fit maybe next to a camera somewhere. So, so that was one of the first applications that I worked on with tiny ml. What we ended up building for that was at that time there was this device called a neural compute stick.

And the neural compute stick, it's like a usb it's like a thumb drive that you can stick on into a raspberry pie. Mm-hmm. And and, and, and it's called an accelerator. So what it does is You can offload the the difficult and compute intensive ml workload on that accelerator. And that accelerator can only execute ML models.

It, it can do anything else. And because it can only do that, it's really efficient at that, so it can do it really quickly as well. So, so yeah, so that, that was one of the first applications. More recently I was working with neuromorphic Hardware. So, so when you so, so when you have to deploy a model so, so when you have to deploy an application to a place where there's no utility power and you have to use a battery, you want to make sure that the devices you are using consume as less powered as possible, ?

And sensors are one of the most powered, hungry devices that you can have in a tiny ML system. So, so one of the solutions is to use neuromorphic hardware, which are which are very efficient. So, so yeah, so we built and, and another thing in this field is the more custom you make your hardware, the more you can save on energy.

So what we did was we interfaced with this neuro, we we built a chip that could interface with this neuromorphic camera. And, and it would consume a few milliwatts of power and, and it could monitor like, like for instance, you could deploy it in a jungle and you could monitor for movement of like of cars or, or you know, any illegal activity, stuff like that.

Yeah. And. I would love to say you, I love the fact that you mentioned latency, you mentioned size and you mentioned power, energy as well. Because there's something you said earlier when we started this conversation that the scale difference is very similar. How many things or how many operational challenges have you seen transform or like come from one domain to like this new domain?

Yeah, sure. So latency is the biggest challenge I would say. Because you want your mo you want your architects, and you, you want the results from your model to come really quickly, ? So, so you want to optimize for latency. So that, that's been a, a really big challenge.

One challenge that Well, it's there in tiny ml, but not a lot is the cost. So at work, when I deploy these large language models, we have to really think about whether it makes business sense to to, you know provision an instance with like one or two GPUs just to deploy a model. Because a lot of times just the cost of that doesn't, doesn't make business sense at that time.

So to kind of solve these problems, what people are doing they're trying to find ways to fine tune the models and compress the models, ? And there's a lot of techniques to do that. For instance you know, the what's the module called?

Alpaca. Model, which, which was which was compressed down from like your from a LAMA model. Similar to how in t in Tiny ml we do knowledge distillation. I feel like they've, they've kind of they they've done it similarly and, and you have other projects that are doing quantization. What, what these compression algorithms do is they actually have been shown to increase bias in your models.

Which, which has been a huge challenge in tiny ml and I'm sure that'll also come up now in, in LLMs as well. Oh, fascinating dude. So I love the fact that you are talking about all these constraints and really looking at, when you're using Tiny ml, it's because you have certain constraints that you have to fit in.

Like there's not enough space on the factory floor. Or we can't actually send the data anywhere because that would be illegal. And so it forces you to come up with mm-hmm. Certain ways of doing machine learning that you wouldn't necessarily think about when you're just at a computer and you're creating some kind of SaaS software.

. Like, but yeah, this, this is awesome to look at. I also know that you have. A sub, and that might be what got me hooked on what you've been doing. You talked about building a chrome map on this sub I think and the different phases of building with large language models. At your job, you're using large language models, ?

And you're always thinking about, and from what I understand from talking to you is that it still isn't really clear if you should be using them or not. And so maybe we could go down that road a little bit. But I also want to hear about your, your basically your journey when you talk about in, in your articles on CK building with large language models and how you think about those things.

And so I'll leave it very open for you to go down whatever path you want and then I'll follow up with what, whatever you missed that I still am curious about. Yeah, sure. So the, the reason we started this Chrome extension it's because well, me and Archna who, who I'm building this with we like, like anyone else in the field, got really interested with LLMs and how they work.

But, you know, we weren't seeing a lot of people were just building apps and, and selling and, you know, like hooking it up to. And just, you know, trying to build a startup around it. . And, and in the process we we weren't liking how closed source everything was becoming I, I wanted to go back to like a few like the times like we had like a few years ago where people would build something and they would share how they built it and so that others could learn from it.

Yeah. And I felt like that wasn't happening and I wasn't learning. A lot about how these models about the challenges of deploying these models and, and what are the constraints when, when you go, when you try to build a product around them. So that's why we start, we thought, okay, fine. You know, let's just build like a toy example and, and, you know, like see what kind of challenges we face and, and, you know, try to solve them and, and tell people about it, ?

What we've learned is so, so from so what you're, what what you mentioned from the ck and like these stages of building, like stages of building a product that uses LLMs. We've learned that if you are like a startup and you want to build NLP application, it's very easy now to build that.

It's very cheap to just call call's API and, and you know, have a few prompts and, and just build their apple around prompts. And I think that's the first level of products that we are seeing out there now, which are and, and this is a this is an oversimplification, but it's basically a rapid over prompts, ?

And, and. And, and what we realized is building a product like that is, is not feasible. Because as your product gets more complex and as you start adding more features and as you try to kind of make the output from your model more re from the api, like more reliable. What you end up doing is you have to create these long prompts.

You have to do stuff like like just few short prompting is usually not enough. You have to do like chain of thought prompting and, and even more complex prompts, ? And, and your outputs also become longer when you go to chain of thought prompting and stuff like that. Yeah. And your cost just starts to skyrocket.

The model, it's also not that reliable especially if you try to build chains and agents and where, you know, the output of one L L M E P call feeds into another. The, the reliability becomes even worse with every, so what, what we think will happen, it goes down and down and down.

Yeah. Yeah. Yeah. And, and, and I've seen people to, to fix that. What they've done is they have like watcher models that watches the output checks for you know, checks to see if there's errors happening. There's and you know, people have built these systems where they, they also have models that watches for prompt injection attacks and trust issues and you know, watching for incorrect outputs, stuff like that.

It, it, it doesn't make sense to me. It, I don't think that that's something that's scalable. Mm-hmm. And, and I don't think that that is something that makes a lot of sense from the cost perspective as well. So what I think will happen is people will have to start fine tuning the models and start to own the models so that they have more control over it so that they can deploy it on their own their own server.

So that should reduce costs, ? Because if now what we pay when we make an EPA call is on the tokens, which doesn't really make that much sense. And, and yeah. And, and eventually I think as things get more complex, you have to start building custom models. So I've seen a lot of people like become scared and I was one of them as well.

And I was, I, I thought that, okay, you know, these models with all the emergent properties are going to make data scientists obsolete. Mm-hmm. I don't think that's, that's the case. Because while it make, while it makes it easier to build applications, it pretty soon you'll have to move pretty soon.

You'll have to build custom models. But what, what I really like about the, about the APIs is how easy it is for for a startup to build a product and get it out there really quickly. And, and And you don't, and nowadays you don't have to be a company with a lot of funding or an incumbent with a lot of data to build these products.

You can be so for instance, our Chrome extension, ? It does you know, like some changes like the tone of the text makes the text more formalizes it, stuff like that. Building a product like that with no data would take us months, ? I don't think, I don't think any company in the mind would would try to even attempt to do that because this is not enough training data out there to do it.

But, but now you can do it and you can create something very nice really quickly get a lot of customers and that's, I think the a really powerful thing about LLMs. Especially like in machine learning, there was that one phase where everybody was picking up their most optimal model.

You know, like everybody was doing hyper parameter tuning and then eventually people were like, we don't really need to hyper perimeter tuner much. We really need to build models that are very specific to domain. And is that something which you are seeing with this as well, which is if we want to run these massive models and to be able to make a business case with them and to be able to scale them, we need to have smaller domain specific models as compared to larger models.

The case of the reliability and you know, the other issues which come with scaling. Yeah. The, exactly. Yeah. You just, yeah, that, that's what I've been trying to say. One thing that I definitely wanted to know, but it's, it's totally changing gears, so maybe we can keep going down this LLMs path a little bit, because I do like the idea, and I think it's been, it's become fairly obvious that you can validate ideas very quickly with.

An open AI API call, and once you start to hit scale, it very quickly becomes unreasonable. And so it's almost like your success with a app that you built on open AI is also your downfall, and especially, I really like what you said is when you want to start going. And making a few calls, or you, you're chaining things together or you are making it more complex.

That's where you start to see really difficult problems coming in that it doesn't feel like it's fully solved. And so maybe you have watcher models and I've, I've heard some really cool stuff that people are working on to try and figure out how to do this, but it is very much like, I mean, I guess the big companies have been doing it for a while.

Like if you talk to people at Facebook, they're not just using one large language model. They're using a bunch of 'em together, and each one has its specific task and it's doing certain things, but then you have like so much complexity of trying to get all these models to work together. And how are you calling them?

Is it just like 50 API calls? That doesn't really work because there's not like 50 different models that you can call out to. So, It's a little bit of a conundrum now where we sit. Maybe by the time this podcast comes out, it won't be as bad because there's some new open AI update that changes everything.

But I do think that you, you are hitting on something very, very important that gets swept under the rug a little bit in the conversation. What everyone loves to talk about is how much these large language models have democratized the ability to use. Machine learning and how you can do so much with them.

But then the actual production use cases are a whole different story because like you said, you still need to have these chops to actually bring the model in-house and then really see. Actually what part of this gigantic model is what we need and can we fine tune for that and actually can we prune it down?

Can we see, can we get the same kind of benchmarks? If we're using a much smaller model, we don't need to use this gigantic model. So I love everything about what you're talking about. It is ringing true on so many different levels. And I love the part you mentioned on prompt engineering. Especially because this is another conversation I've been having with many people.

Some are open and saying, let's prompt engineer and get as much optimal performance out of our models as possible. And there's another set of people who are saying, you know, there's Human feedback, which is going into these models, which is updating and making these models more powerful as they come in.

There's a very good chance that by the time GBT five comes out, we may not need to use these prompts as much. So I wanted to get your opinion on prompt engineering, like as a specific discipline as well. And how do you feel like how much prompt engineering would. People do in terms of like just a research project or like a toy project versus a production level application?

Yeah, absolutely. And yeah I, I so it's really hard to talk about these things because like, like you said, you know, like G P T five could come out like, I don't know, in a few days and, and just make everything I said obsolete and, and I'll be like the weird guy with a podcast on YouTube where he's just, you know, said random crime.

We're gonna date it. Today is April 25th. So in case buy some like stroke of God, G P T five comes out tomorrow or between now and when we release this, everyone knows that today when we were talking about this. It is, the world is like we are saying it is, but continue. Sorry, I didn't mean to derail this conversation.

Yeah. No, no, that's all . So about prompt engineering, ? When we started building this extension G pt Ford hadn't come out yet. And the API that was available was the, the Vinci 0 0 2 api. And we, you, you know, we had built like a pretty good We, we'd done some prompt tuning, some prompt engineering and, and had a set of prompts, and they were working really well.

We were getting really good results. But as soon as G P T four came out a lot of those prompts stopped working. Or, or maybe they worked, but they weren't as good. So, so, you know, we had to, so, so we had a choice at that time. Either continue with DaVinci 0 0 2 or, you know, move to the DaVinci 0 0 3 endpoint.

And, and we didn't really so a lot of people say that it's really easy to move, but, but in reality it's not. A lot of the work you've put into pro it's very easy to change the change where you're making the API called, but, but all the prompt engineering and stuff you've done may, may be obsolete and may not work.

So we decided to, you know, stick with the 0 0 2 api. But I think OpenAI they decreased the num the amount of load that that API could handle. And at times we, we, it, it would take like a minute maybe even more to get a response. Especially like I, I think around February or something, it, like, it got really slow.

So, so, you know, we had to move to the 0 0 3 api. So, so, so, yeah. So that, that, that's also another challenge with prompt engineering because a new model could come out, your old prompts could become obsolete. It'll level the playing field for sure, but, but at the same time it reduces the reliability in your product.

It reduces you know, if you have SLAs or KPIs about how fast You know, you want results to be out there. How, how trustworthy and, and how good if you want to benchmark of your results you know, you, you may not be able to maintain that anymore. Especially because Open A actually doesn't have any SLAs or KPIs for the endpoints.

Yeah. . They haven't come out and said anything official about that. So yeah, so there's that, that's also another reason why I think it it's crucial that, that you move away from from APIs and try to I don't think it's worth training these models, but at least find you on these models for your application.

Mm-hmm. That's so funny you say that. There is something that you mentioned there, which I think that is worth. Noting one is that there is no SLAs that open AI has set out with, and we probably all have at some point or another, gone to chat G P T and it's like, ooh, we're experiencing too much of a load.

So let me write you a poem about how we don't have anything working now. It's like, I don't wanna fucking pole man. Yeah. I wanna be able to get this, whatever my task is done. And so if you are. Building a product that is fully reliant on that api. It's just very scary in my eyes. And I'm sure there's SRE out there that are listening and they're just like, that's a no-go.

You cannot do that. And that I think is another reason why you see a lot of, it feels to me like there's a lot of great projects that are out there, but they're toy projects and they're doing, they're in that first phase of, Hey, can we validate quickly with open ai? But then we have to figure out something else because.

We don't know if it's gonna be up or not. And then the other thing that I wanted to mention from what you said, and I was just talking to somebody about this yesterday, is if you have, let's say, and this may have been your scenario, I don't know, but you have 300 prompts that you were using with G P T two and now G P T three comes out.

How are you gonna migrate everything, all the infrastructure you've created around those prompts and everything over to G P T three, and then you realize like, oh, half of these prompts don't actually work like they did in G B T two, you know? So it's just an absolute headache and especially every time you bring in a new model, it is a little bit weird.

And then, so top that off. Like let's think about, oh, if you are using G P T for, for some prompts, but then. You want to go to an open source model, and it totally reacts completely different. It's like night and day. You can't even use the same prompts at all. And so you have to re Yeah, start from scratch again.

And that's, that's just a bit of a mess. And I appreciate that you're pointing these things out because again, like. A lot of this gets swept under the rug when, and especially if you hear VCs talking about how this is the future and there's the l l M ops hype that I've hopped on the wave and I, I love it, but I also am very, very realistic about it.

And so, Yeah. And sorry, I, I I, I didn't mean to interrupt you, but Abbi also mentioned how these models are also getting better in some ways because the, they're incorporating feedback. So, so what, what you have to do is you have to maintain, like maintain these prompts. And you also have to maintain like some outputs.

And you have to constantly benchmark the the outputs that you're getting today with your ideal outputs. And if you move to a new model, you have to do that benchmarking as well. And it, it, it's not easy to, it's possible to do that, but, but it's not that simple. And especially if you have a lot of prompts, .

And if your, if your application is really complex, so, so yeah definitely. I don't see a lot of people saying this, but now your prompts are your gold, ? That that's what differentiates you from other companies. So you should keep your prompts you know, locked down as if they were your e p keys.

That's one thing you should do. Wow. And, and at the same time you should, yeah, because that's you know, if you come up with a, a, a prompt now is the equivalent of a custom trained deep learning model a few years ago, . That, yeah. Your prompts are special sauce you need to have.

Yeah, yeah. Exactly. Exactly. You need to keep it locked down. You need to, ha you need to test your outputs with your benchmark outputs and Yeah. You, you need to set up tests for that. Yeah. I see a new shirt coming. Keep your prompts to yourself. That, that's a good one. Yeah. Yeah. Dude, I wanna know.

So I got I got one for you. You've been, you've been like really sinking your teeth into all of this. Where do you think you have succeeded, where others have failed? When it comes to any of this, whether it's tiny ml or it's working with large language models that's a, that's a good question. So with tiny ml, I think something that I'm really proud of is these standards and best, best practices that, that I've come to learn about Tiny ml.

I've been trying to put that out there and help other people in the field. So I'm really proud of that, I guess. And with lms I think I'm, I'm just happy. I'm not that caught up in the hype. A lot of people are like, So caught up in the hype. They, they don't realize that these applications don't really work.

Like they do work well, but then not, you know, it's not really great for production. And yeah, I think, yeah. Yeah, I think that's where, where we've succeeded. I think you definitely hit the hammer on the nail. I dunno what the hem is. But despite the fact that these models are openly accessible via a p i, they're not open sourced and they are not as accessible as most people would like to believe that they're, yeah.

Yeah, absolutely. It's a there's very few. Scenarios where I think it makes sense to deploy an l l m a hot take. And I might get hate messages after this whole podcasters, but, but yeah. No, it's refreshingly. Yeah, it's, it's refreshingly real takes. And I appreciate it because, man, there's just something that since like G p t chat, g p T came out, and then if you scroll Twitter at all, Your Twitter feed is probably just blowing up with this change in the world.

Nobody knows how to use chat, g p t like I do. Here's the best prompts, or here's like how this is going to change life forever. And there's so much hype in it and it's nice to actually talk to somebody who's like, yeah, I've been doing this for a while, and here's where there's some real hangups that. If you are going to use chat g p t, you have to think about these things and think through these things.

Yep, yep. I think another thing that me and ER are doing really well with this product is we are taking it we are not trying to get caught up in the hype. We're trying to annualize things objectively and trying to Trying to grow the product, how we think it would, or trying to build this application and grow it, how we think any other application would, would grow.

You know, like add features, slowly verify them make things more complex, slowly. So for instance, now we haven't the next thing we want to do is actually fine tune a model and maybe compress the model. And then try that out and see if that helps. We haven't even like, tried on, tried out vector databases or any of them. It, it, it, it, it's great. We have those tools, but we haven't really had the need for them, you know, we could integrate them. Mm-hmm. But, but yeah, it doesn't make sense. Yeah, you don't see it yet.

So, with the compression angle, I find it fascinating that you mentioned it. Adds to the bias. And also I imagine you've been trying to figure out Yeah, like pruning down these models and fig also the, the thing that we didn't say yet, that probably needs to be said when it comes to open source models.

There's a lot that's left to be desired out there on the open source market and. Yeah, there's been some releases or leaks of, you know, like llama. You can't actually use llama. You can play around with it, but if you want to create an app with it, you can't. And yeah, like that's, that's a whole nother topic for another day.

But talk to me about this bias and how did you recognize that you, there is more biases when you try can press it. Sorry, not make it similar. So, so maybe I'll, I'll take an analogy from tiny. Well so, so one of the people who found out about this bias was this researcher called Sarah Hooker.

She has a paper on it. I can't recall the name now. But it's, it's a really nice paper. I think it goes over the problem really quick. Really easy, like in a in a way that's easy to understand. So so what hap, so how the bias comes out is let's say you're trying to let's say you're building a camera trap, ?

So what that'll, so it's like it's checking to see if some animal crosses and, and, you know, identify the species, ? You, you know you could build a model that identifies like common animals, like, I don't know, like elephants seabra. I, I, I don't like animals that are big. We have a lot of pictures of them.

It's easy to and also integrate like other animals that are less common because that's what you want to capture it in a camera. Replica animals like, I don't know, like a snow leopard. Something like that. You, you won't find them in the same habitat, but like, let's just say you have the same model for that.

Now the snow leopard, you'll it'll be in your model, it'll be a part of like the long tail at the end where you have few images and and, and the model. And your large model might be able to recognize it really well. But what happens when you compress your model is your model will try to optimize for the, those examples in the dataset that it seemed a lot and forget these at the cost of these examples that it hasn't seen a lot. . So you'll have a bias towards things that occur more often when you when you compress around your models. And that's something I think we'll see happen with With large language models as well. When we compress them you might lose vocabulary.

You might lose a lot of a lot of the richness, I guess of the larger models when you move, when you try to compress your models. So, good man. Well, this has been awesome man, and I really appreciate you coming on here and giving this real take. I wonder if How things are gonna play out, and if it is going to be something that we see people going back to this in a few years and saying like, oh yeah, that was a little bit of a bubble and that was a bit of a craze, like the crypto craze that happened, or if it's just going to continue advancing.

And all of this stuff that we're talking about, all these pains that we're mentioning now. We're going to overcome them in the next couple years and we're gonna figure out ways to work past them. I'm act absolutely fascinated by that. And yeah, I think that's it and we'll leave it there. Thank you for coming.

Thank you so much for having me.

+ Read More

Watch More

Exploring the Impact of Agentic Workflows

Posted Oct 15, 2024 | Views 7.8K

# AI agents in production

# LLMs

# AI

From Idea to Production ML, From Idea to Production ML, From Idea to Production ML

Posted Apr 28, 2021 | Views 783

# Googler

# Panel

# Interview

# Monitoring

The Evolution of ML Infrastructure

Posted Jan 10, 2023 | Views 1.6K

# ML Infrastructure

# ML Adoption

# Landscape of ML

# ML Investing

# Bessemer Venture Partners

# bvp.com