MLOps Community

Generative AI & Elastic Observability

Posted Apr 29, 2024 | Views 389
# Generative AI
# LLMs
# Elastic
# Elastic.co
Anna Maria Modée
Senior Solution Architect @ Elastic

With over nine years of experience in solution engineering and architecture, Anna Maria works today as a Senior Solution Architect at Elastic, the company behind Elasticsearch, Kibana, Beats, and Logstash. She helps customers and partners design, implement, and optimize solutions that leverage Elastic's products and services for search, observability, and security use cases.

Anna Maria is passionate about learning new technologies, sharing best practices, and collaborating with diverse and talented teams. She is a consultant at heart, an engineer by training, and a Solutions Architect by trade.

SUMMARY

Search is no longer just traditional TF-IDF: the current trend of machine learning and models has opened another dimension for search. This talk gives an overview of:

  • "Classic" search and its limitations
  • What is a model and how can you use it
  • How to use vector search or hybrid search in Elasticsearch
  • Where ChatGPT or similar LLMs come into play with Elastic
TRANSCRIPT

Anna Maria Modée [00:00:02]: Thank you Kaminan for being here and letting us have this awesome meetup. So my topic is slightly different but all the same. But first off, who has heard of Elastic before? Everyone. Excellent. I can skip the introduction. Great. So my topic here is going to be about the one thing that hit the ML world last year, GenAI, and how we manage that in Elastic. Of course my clicker didn't actually connect now, and I also tend to forget, so I have an extra slide for that.

Anna Maria Modée [00:00:42]: My name is Anna Maria Modée. Please call me Mia. You can reach out to me at any time at Mia at Elastic. I'm a solution architect. And how did I get there? Well, I studied computer science at KTH here in Stockholm with an alignment toward machine learning. I did a year abroad in Tokyo and came back and started working as a developer. Specifically, I was developing applications and mobile apps and also working in QA, which means I was destroying the apps I was trying to make, which I turned out to be very good at. But then I got a call.

Anna Maria Modée [00:01:23]: "Do you want to work in pre-sales down in Malaga, Spain?" I was 25 and I said yes. So off I went, and it was actually a very, very good fit, because I got to work on different types of solutions, and what really triggered me when I was studying computer science is the problem solving. So that really hit home. I've been in love with my work as a solution architect (pre-sales, solution engineer, it has many names) ever since. And now I am three and a half years into Elastic. When I joined I thought I would learn the product quickly, because it's just one product, right? It's not like my previous employer, who had 3000-plus projects. How long can it take? I thought it would take me six months tops.

Anna Maria Modée [00:02:15]: I'm a bit more humble now, but yeah, I'm still learning, and we have a pretty rapid release schedule. So what I will talk about today actually goes into that release schedule. We're going to talk a bit about the evolution of search, and it's going to be Elastic specific, and then GenAI and private data and how we tie it all together in Elastic. So this is the meteor that hit, which I mentioned and kind of prefaced, and it has really revolutionized search. Elastic, as you might know, is a search company, a search analytics company, and LLMs, ChatGPT, yeah, it all hit home. But what do we actually do with it then? So, search in the background. We are actually based on the open source project called Lucene. You can see Elastic as Lucene on steroids.

Anna Maria Modée [00:03:12]: We are what you could call the glue that makes Lucene scalable, with petabytes of storage searchable in an instant. But we have come quite a long way from the old type of searching, where we tokenize: we have an analyzer, you lowercase everything, you get a long list of tokens, and you create an inverted index. So what is it that really hit home? We want to have meaning, semantics. Users are looking for context when they're searching. They want to have a human answer as well. So in Elastic, we're kind of like, yeah, we can actually do this. So there's a lot more to Elastic than just traditional search.
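The "old type of searching" described here (analyze, lowercase, tokenize, index) can be illustrated with a toy inverted index. This is only a minimal sketch: the function names `analyze`, `build_inverted_index`, and `search` are hypothetical, and it does not reflect how Lucene or Elasticsearch implement analysis or scoring.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every token of the query."""
    token_sets = [index.get(token, set()) for token in analyze(query)]
    if not token_sets:
        return set()
    return set.intersection(*token_sets)

# Illustrative documents, not taken from the talk.
docs = {
    1: "Elastic makes Lucene scalable",
    2: "Vector search finds meaning",
    3: "Lucene search is fast",
}
index = build_inverted_index(docs)
print(search(index, "LUCENE search"))
```

A query only matches documents that share its exact tokens, which is the limitation the talk contrasts with semantic search: a query for "semantic" finds nothing here even though document 2 is about meaning.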

Anna Maria Modée [00:04:10]: So vector search, of course, enters the arena. And this was already a couple of years ago, two, three years ago, I think. And what we do is that we vectorize the data, of course. So we create a dense vector representation. We also vectorize the query that comes in, do a nearest neighbor search, or approximate nearest neighbor to be more exact, and then we get the result, which is very accurate, but not perfect. And this is kind of important, because sometimes good enough is good enough. So if you want to have an exact measurement, kNN is not going to work. So this is how you practically do it.
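To make the idea concrete, here is a minimal sketch of nearest neighbor search over dense vectors using cosine similarity. It is an exact brute-force search, not the approximate variant mentioned in the talk, and the two-dimensional vectors and the names `cosine` and `knn` are made up for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical embeddings; the query is vectorized the same way as the data.
doc_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
top = knn([1.0, 0.0], doc_vecs, k=2)
print(top)
```

Brute force compares the query against every document, which does not scale. Approximate methods trade a little accuracy for speed, which is the "good enough is good enough" point made above.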

Anna Maria Modée [00:04:55]: We use PyTorch as the framework. You get the model you need, and you can use Eland to very easily upload your model of choice, whether you created it yourself or found it online. Then you can manage your models within Elastic, and you can use these models to create embeddings. And these embeddings live next to the other information that you can also use for traditional search. We have created a new model for vector representation in search called ELSER. So it's very much about making it as easy and approachable as possible to get this working. So we have the developer in mind, but you don't necessarily have to have a data scientist background to work with this. So if you are just interested in getting a good enough result, it's not going to be as perfect as when you create your own models, but you just need something that is very, very good, but maybe not perfect. ELSER is specifically English currently. So if you have a need for another language, we would suggest using E5, for example.

Anna Maria Modée [00:06:17]: So there's still a need to import and manage other models. But this is like a summary of the years of development that we've done in this field. So we started with implementing supervised machine learning in the platform. All the way back in 2017, we introduced vector search in Lucene, so that's very much contributing back to the open source project that we're actually building on. So we're really trying to give back. ELSER came very recently, so it was last year, so we're working a lot on the latest. We also introduced ESRE, which is a toolkit. You could call it the relevance engine for anything AI in Elastic.

Anna Maria Modée [00:07:12]: And on top of this, so maybe those in the back can't really see. I was told to hold on to this, otherwise you wouldn't be able to hear me in the recording. In the lower end, you can see how we're trying to apply this and make it as approachable and user friendly as possible. So even though we have the capabilities, we're making them easier to use by building, for example, anomaly detection, out-of-the-box detection rules, or resources on how to get started, all of those things. Most recently we launched the AI Assistant for our different solutions. So this is how we're trying to do it. But my talk was supposed to be about GenAI, right? And private data more specifically. I'm going to try to make sense of why I introduced embeddings and such.

Anna Maria Modée [00:08:06]: So what's not great about ChatGPT? Someone once said, and I think it rang true, that ChatGPT is like my dear grandma. She's very sweet and always wants to give you an answer, but sometimes her memory isn't really all there, and sometimes she makes things up. You have these kinds of hallucinations. You might not actually get an answer that you can use or that is relevant to what you're searching for, but it creates something. So what you actually want is to get this on your data. But how do you do this? So you have your corpus of private data there. Yeah.

Anna Maria Modée [00:08:50]: And you have the huge corpus that the LLM is based on, in combination with this. Okay, so how about augmenting this LLM, right? Very, very interesting. Oh, like our own private LLM. Wow, magic. So there are a few considerations when it comes to this, right? First of all, it's extremely costly in terms of cycles of your GPU. So if you want to get an actual price tag on all of these numbers, 1 million GPU hours is approximately $2.4 million in cost. That gives you context for these numbers.
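The figure quoted here implies a rate of roughly $2.40 per GPU hour. As a rough back-of-the-envelope sketch, the helper below scales that rate to other workloads; the function name `training_cost` and the 500,000-hour example are assumptions for illustration, not numbers from the talk.

```python
# Figures quoted in the talk.
TOTAL_COST_USD = 2_400_000
GPU_HOURS = 1_000_000

# Implied hourly rate in USD per GPU hour.
RATE_USD_PER_GPU_HOUR = TOTAL_COST_USD / GPU_HOURS

def training_cost(gpu_hours, rate=RATE_USD_PER_GPU_HOUR):
    """Estimate cost in USD for a given number of GPU hours."""
    return gpu_hours * rate

# Hypothetical workload: half the GPU hours quoted above.
print(RATE_USD_PER_GPU_HOUR)
print(training_cost(500_000))
```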

Anna Maria Modée [00:09:36]: So building this not only costs a lot to host, but to augment an LLM like that, you need a really solid business case. But let's just look at fine-tuning it, then. We don't need to build our own, we can just fine-tune it a bit. So there are a few different approaches, of course, to how to do this. You can use self-instruct, which is kind of the most prevalent one. I can't actually see the notes on the slides here. I need to turn around. So you definitely reduce the number of hours needed to run to get this, but there's still a high cost in expertise that's not always accessible.

Anna Maria Modée [00:10:24]: So there are a few things that we see in Elastic that are very difficult to come by. Good data scientists, they don't grow on trees. So if you want to have this, that also means it's going to be difficult to maintain, because if you actually got that data scientist to work and create this for you, what are you going to do if that person leaves? So having something that is more easily maintained and that has a lower threshold is actually very valuable for corporations as a whole when it comes to longevity and operational quality. So we want to help provide this by making it easy for people to get started. And there's also the aspect of having to retrain the model every time your data changes. So the solution, you could say, is RAG, retrieval augmented generation. And you can probably replace Elastic here with any vector database. But there are a lot of upsides.
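The retrieval augmented pattern named here can be sketched in a few lines: retrieve the most relevant private documents, then pass them to the LLM as context with the question. Everything below is a hypothetical illustration. The retriever is a toy word-overlap ranking standing in for a real search or vector query, and `llm` is any callable that takes a prompt string, not a specific vendor API.

```python
def retrieve(query, docs, k=2):
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, context_docs):
    """Put the retrieved documents in front of the question as context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question, docs, llm, k=2):
    """Retrieve context, build the prompt, and hand it to the LLM callable."""
    return llm(build_prompt(question, retrieve(question, docs, k)))

# Made-up private data; the "LLM" here just echoes the prompt it receives.
docs = [
    "the payment service restarted at noon",
    "lunch menu for friday",
    "payment service latency doubled after restart",
]
print(answer("why did the payment service restart", docs, llm=lambda p: p))
```

Because the model is given fresh context at query time, nothing has to be retrained when the data changes, which is the maintenance argument made above.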

Anna Maria Modée [00:11:34]: I think Rebecca mentioned a few of them, why it would be a good idea to keep it in one place. So what you want to do is be able to bring context to the LLM, and then you have a relevant answer that's not hallucinating, that's going to give you an answer quickly and cost effectively. That's the summary, and if you want these slides, feel free to just email me afterwards. Or Patrick, maybe you can provide the methods as well, so you don't need to take a picture of it. But those are a lot of the reasons why we chose to go with the RAG approach and why we think that makes sense for the majority of people. I mentioned before that Elastic has some different solutions apart from search. So we also provide out-of-the-box solutions in Kibana for observability and security. What is it that we see is going to revolutionize these areas or have a high impact when it comes to GenAI? So there are really three things.

Anna Maria Modée [00:12:41]: So you have the detection, and we already provide that with AIOps and MLOps when it comes to anomaly detection. Then of course diagnostics: what's wrong, what's causing this alert? And how do I fix it? That's remediation. We are trying to provide this with assistants, as I mentioned. And this is very easy to connect to your favorite LLM with a connector. A few of the questions, when it comes to observability, are: what is this? What's going on? What is this library? As long as the LLM was trained fairly recently and can recognize the library, it can tell the SRE what that library is supposed to do and whether it makes sense to have in production. It can be given that context, the private data of the actual measurements that you're seeing in Elastic, and basically give a helping hand, like an assistant. When it comes to security, there are a few more use cases that I've heard resonate very, very well.

Anna Maria Modée [00:13:55]: So data scientists may be expensive and difficult to find. SOC security specialists are even more difficult to find. And then we have our dear EU coming with new regulations, telling them to write reports. So this is maybe something that you're not thinking about, but that you can help solve: making it easy to generate the reports that regulation requires, based on any type of data, and understanding a large corpus of complex legal documents. As an example, in security, there's a new regulation, the NIS2 directive coming from the EU, that says that every security incident needs a report, which is extremely time consuming. And that means that you basically need to hire a new security specialist for your team just to be able to keep up with those reports, because there are a lot of security incidents happening, not least because of Russia. Hi.

Anna Maria Modée [00:15:12]: So how do we solve this? We ask GenAI to create these reports based on the template. We can give it the context based on the incidents that we have in our cases, but we can anonymize the data. So nothing that is actually secret is leaving your platform. You can just utilize that LLM to create the boring text that needs to be generated, but still very, very context driven. This saves time. This saves money. This is where MLOps really, really benefits corporations, in my opinion. So what are we working on besides what I told you? Customers want GenAI and private data, and we want to do this for developers.
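The anonymize-then-generate idea can be sketched as a redaction step applied before any incident text is placed in a report prompt. This is a minimal illustration under stated assumptions: the two regular expressions only catch simple email addresses and IPv4 addresses, the names `anonymize` and `build_report_prompt` and the template text are hypothetical, and a real deployment would need a far more thorough approach to sensitive data.

```python
import re

# Simple patterns for illustration; they do not cover every valid format.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def anonymize(text):
    """Replace email addresses and IPv4 addresses with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return IPV4.sub("<IP>", text)

def build_report_prompt(template, incident):
    """Fill the report template with the anonymized incident description."""
    return template.format(incident=anonymize(incident))

# Hypothetical template and incident text.
template = "Write an incident report for: {incident}"
incident = "Login by bob@example.com from 10.0.0.7 failed"
print(build_report_prompt(template, incident))
```

Only the placeholder version of the incident reaches the prompt, so the LLM can still write context-driven text without seeing the original identifiers.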

Anna Maria Modée [00:16:05]: We have a few things that we're doing. We want to continue working with Lucene to make it the best vector database out there. It is already the most downloaded one, but we want to make it even better and make it the best one. That's our mission, at least. We have a few things that we're working on there. We're also aware that developers such as yourselves prefer to be API driven. So we're trying to create even better and simpler APIs and really be API first. Another thing is of course, as part of this, to improve the inference API and really work on the developer experience.

Anna Maria Modée [00:16:56]: So our PMs are actively reaching out to end users to interview them and see how we can improve the user experience. Now we're having interactive sessions, coming back and showing, so this is not something that we take lightly. We really want to make a product for developers, so if you have any feedback please let us know. This is how we improve. And we've also created Search Labs. So this is by developers, for developers. This is not marketing at all. There are a lot of guides you can find, tutorials, collaborations, blog posts on how-tos, or just research on performance and benchmarking. For all of this we have an entire ML team just creating, working, trying, and showing things here.

Anna Maria Modée [00:17:54]: So this is a great start if you want to just look into the capabilities of Elastic and GenAI, or MLOps in general. So we want to make, as I mentioned, Lucene the best vector database and Elastic the most comprehensive and simple search experience for GenAI apps. But not only that, of course. Speed, scale and relevance in search are our key mottos, if you haven't heard those already, and to really be open. So we want to give back to the community, and I hope that with this we've showcased that we want to hear back and that we want to be there for the community. Thank you.
