MLOps Community

LLM, Agents and OpenSource // Thomas Wolf // Agents in Production

Posted Nov 15, 2024 | Views 778
# LLM
# Agents
# OpenSource
Speakers
Thomas Wolf
Co-founder - Chief Science Officer @ Hugging Face

Thomas Wolf is co-founder and Chief Science Officer (CSO) of Hugging Face, where he has been at the inception of the open-source, educational and research efforts. Thomas enjoys creating open-source software that makes complex research, models and datasets widely accessible (for instance by creating the Hugging Face Transformers and Datasets libraries). When he's not building OSS libraries, he can be found pushing for open science in AI/ML research, trying to narrow the gap between academia and industrial labs through projects like the BigScience Workshop on Large Language Models (LLMs), which led to the BLOOM experiments, model and dataset. His current research interests center on LLM accessibility as well as measuring and overcoming the present limitations of Large Language Models.

Euro Beinat
Global Head AI and Data Science @ Prosus Group

I am a technology executive and entrepreneur in data science, machine learning and AI. I work with global corporations and start-ups to develop products and businesses based on data science and machine learning. I am particularly interested in Generative AI and AI as a tool for invention.

SUMMARY

A fireside chat between Euro and Thomas about Agents, their challenges and their future

TRANSCRIPT

Euro Beinat [00:00:07]: Hello, Tom, good to see you. Hi there.

Thomas Wolf [00:00:09]: Hi.

Euro Beinat [00:00:10]: Good to see you. Good to see you again. Everything well? Everything good?

Thomas Wolf [00:00:15]: Everything well. Thanks. Just back from Web Summit. Super happy to be here as well.

Euro Beinat [00:00:20]: Thank you for being here, Tom. We have a lot of people today who are really looking forward to this conversation, and I prepared a number of questions. For everybody who's online and watching the stream today: you will have two buttons in the platform. There's a chat, to talk with everybody else online, and there's a Q&A. So if you want to ask us a question, please use the Q&A, and I'm going to look at the questions as we go through.

Euro Beinat [00:00:55]: Tom, I think everybody knows about Hugging Face. We just heard that you pulled off the biggest heist in history with the hug emoji, which is great. It's a nice way to start. Let me just say a few things about Hugging Face anyway, as a sort of introduction. It was founded in 2016 by Clément Delangue, Julien Chaumond and Thomas Wolf, who's here with us today. Over the last several years it has contributed significantly to democratizing AI by making a whole range of models accessible to everybody. I think Hugging Face hosts around 1 million models at this moment.

Euro Beinat [00:01:33]: You might have the latest and greatest numbers, but anyway, this is probably a good estimate, and it sees millions of downloads annually. I think one of the Llama models alone was downloaded more than 20 million times in a single month. So it's a lot of traffic. Tom specifically has the role of Chief Science Officer at Hugging Face, which is very broad if I understand well: leading scientific research, open-source projects, engaging with the community, advocating for open source and enhancing the user experience of NLP tools. Correct me if I'm wrong, but that should be more or less it, Tom, right?

Thomas Wolf [00:02:11]: Yeah, yeah, that's right. I mean, it's a specificity of Hugging Face. I would say a lot of people here cover a very wide domain, from open source to science to community. You even see them talking openly about what they do. It's one of the nice perks of being a Hugging Face team member.

Euro Beinat [00:02:31]: Great. Hey Tom, I'm going to fire off a number of questions and then we'll go to the audience. But let me start with something about Hugging Face, let's say the evolution of Hugging Face. At some point you became the hub for open source globally, right? That happened at some point. What is the single event that made you believe that was what Hugging Face was becoming?

Thomas Wolf [00:03:00]: Yeah, it's very hard to pinpoint. I would say, like every community-based company, it grew really over time. Very early on my co-founder Julien had this feeling that there would be many models in the future. And at the same time I had this idea that transfer learning, which was the pre-generative-AI idea that one model could do many things, was promising. At that time we were always fine-tuning; we didn't yet have these foundation models. But transfer learning sounded very early on like a good idea, basically materializing this huge amount of training compute. And we saw the community growing progressively over the years, first in NLP, then, with maybe Stable Diffusion, a breakthrough in image in terms of visibility. More recently we see speech interfaces catching up.

Thomas Wolf [00:04:00]: Right. It started at OpenAI, and then a couple of open-source projects as well. And in the future we are moving to robotics. I'm very big on robotics. I think next year is going to be maybe the ChatGPT year of robotics. So it's been quite progressive. I was definitely not expecting, in the first year, when we had the first four or eight models, that we would one day reach several million models. Like you said, we passed one million public models.

Thomas Wolf [00:04:30]: But it's also true that we have a lot of private models. So it's actually several times this number in total, which is crazy.

Euro Beinat [00:04:38]: It is crazy and impressive, of course. And by the way, I'm coming back later to this robotics thread, because there is something super interesting there that might shape the way we look at AI and open-source AI in the next years for sure, but perhaps also sooner. But since we're here talking about models: can you say, at this moment, which models, libraries or tools trend most on Hugging Face, and what do people do with them? Where do you see the users?

Thomas Wolf [00:05:11]: I think the very nice thing about Hugging Face is everyone can just go to the front page, where you have a trending section that lists the trending models of each week, usually based on the last seven days. You can see how it changes a lot. A couple of weeks ago it would have been the release of Flux, an image generation model released by a new lab, Black Forest Labs, which went crazy because of a level of photorealism that had never really been reached before. Then we had Pyramid Flow, the first open-source video model. There are already many, many closed-source ones, but this was the first open-source one that reached an impressive quality. And right now we are back to NLP: I see Qwen Coder, which is a code generation model from Qwen.

Thomas Wolf [00:06:05]: Qwen is a team in Alibaba. They release extremely strong models. We also have another very interesting model for agents, the new OmniParser from Microsoft, which is able to understand the screen and output it in a very synthetic representation that is extremely useful for agents: saying, okay, there is text here that says this, there is an image there, and here is a summary of that image. So it's always very diverse, which I think is really great. We see all modalities progressing together, and we see this mix of models and also datasets being released. Recently, and I think it really started with ChatGPT, so about two years ago, there's this idea that you should also release a demo where people can try the models.
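The kind of compact screen representation Thomas describes can be sketched in a few lines. This is not OmniParser's actual output format, just an illustration of turning detected UI elements into a short text summary an agent can reason over; all names and fields here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ScreenElement:
    kind: str    # "text", "button", "image", ...
    label: str   # recognized text, or a caption summarizing the element
    box: tuple   # (x, y, width, height) in pixels

def summarize_screen(elements):
    """Render parsed UI elements as a compact text list an LLM agent can read."""
    lines = []
    for i, el in enumerate(elements):
        x, y, w, h = el.box
        lines.append(f"[{i}] {el.kind} at ({x},{y},{w}x{h}): {el.label}")
    return "\n".join(lines)

# Example: a parser found a button and an image on the screen.
screen = [
    ScreenElement("button", "Submit order", (40, 500, 120, 32)),
    ScreenElement("image", "photo of a red sneaker", (40, 100, 300, 300)),
]
print(summarize_screen(screen))
```

An agent prompt would then include this summary instead of raw pixels, letting the model refer to elements by index when it decides what to click.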

Thomas Wolf [00:06:56]: Then models will be very accessible. I think something that really OpenAI pioneered and that we expect now also from open source models. So there's a lot of spaces which is the demo that you can read as hugging face that are regularly training, trading.

Euro Beinat [00:07:12]: Right. And you know that the theme of this conference is agents in production. So we are going to talk about agents, clearly, but also agents in production: agents that are not only theory but actually make a difference in applications. So when it comes to agents, what's your view about how they are going to develop? How much room will there be for agentic AI at Hugging Face? Are you planning to create, beyond what you have done so far, open-source frameworks for these? How do you see this evolving in terms of Hugging Face?

Thomas Wolf [00:07:51]: Yeah, it's interesting, also because this term agent covers so many use cases that are very different from one another. Some of these agents, like a code completion agent, are very close to software developer tools. Some of them are much more related to business use cases, the automation of business processes. And some of them are very close to robotics: robots that move around in your kitchen are also agents. So I think it's really hard to think about one single framework or library that would cover all of this.

Thomas Wolf [00:08:29]: What is sure is there is some common points, for instance the use of of tools. Many of these agents need to interact with API, with software for robots. They need to interact with real world tools in the future. And so that's a common point. So I think tool use is something we've started to explore a bit. Hosting tools on hugging face I think it's very useful. But in the end what I believe also is to build application today it's very interesting to focus on your verticals, maybe focus on the set of tools that you need for your agen. So I think general, very general agents might come at some point and we see, you know, the leading companies like Entropy Copen AI playing with this idea of a very general agent but we also see that it's not yet fully functional, you know, or one does not always sometimes think for a very long time but the answer are not fully fully there yet.

Thomas Wolf [00:09:27]: So I think today the real, the good way to tackle an agent project I think is really to understand okay, which tool do I need in which frame framework should I be and be kind of a specific modern generic until this day come of generic agent model.

Euro Beinat [00:09:46]: Do you see a marketplace for tools in particular? Because that requires some form of standardization. There are some commonalities across these tools; everybody's developing tools that feed agents and so on, but everybody does it in their own way. So do you see some sort of standardization, or a marketplace for all this?

Thomas Wolf [00:10:07]: Yeah, that could come. I definitely know several startups trying to tackle this project, trying to understand okay, what are the, you know, the necessary things. It's also a project that OpenAI themselves right try to do with their generic function coding. I don't think they really have decided to double down on this from whatever, from what I've heard and what I see. So it might be a bit more complex than we thought. It might not be just the, I mean just not just. But just the Amazon and just very simple marketplace. There might be some things that you actually need to get right at hugging face.

Thomas Wolf [00:10:43]: I think we really see ourselves mostly as a very basic platform, like a very low level platform. So we would be very happy to. We try to encourage basically people to build ecosystem on top of this. We're always very careful also to not compete with our own ecosystem. Maybe the difference of some other AI startup. We're very happy if we see one company that become a trillion dollar company based on our platform that would be an amazing success for us in terms of ecosystem creation.

Euro Beinat [00:11:12]: Yeah, that's for sure the case. By the way, going back to the complexity of making agents work well: I think we all have the same experience. Going from 0 to 80% of functionality is, I would not say trivial, but doable; everybody knows how to do it. But it needs to go to 99%, or else it doesn't matter, right? And every single percent along the way becomes a harder and harder game and takes time. Even some basic agents, like a SQL agent or data agents, took us, let's say, a year to get right. So I can see that very, very well. Hey, I've got one controversial question for you.

Euro Beinat [00:11:50]: So it came up yesterday. So yesterday we had an event with a few hundred, actually 800 of our data scientists across the group and we made predictions for AI next year. One of these predictions about open source, but it is a strange one in the sense that it captures some sort of contrast between those that believe that open source is going to go up and remain prevalent in the others. But it goes like this. It says that within one year, if you look at all the most capable LLMs out there, so the ones that actually do many tasks, the GPT4s, the, the clothes and so on, if you look at all the leaderboards at this moment, you find open source there. So on average we say they're looking at all the leaderboards about three, let's say Llama and other models are open source models in the top 10. So here the prediction, there's not going to be one in one year. So next year there's not going to be any generic model like this, open source in the top 10.

Euro Beinat [00:12:49]: And then the reason why people say that is that the next runs of these models are so expensive, then it will become difficult to justify against an open source model, or at least it's a sort of, it's so harder to justify commercially. Others on the other hand, say, but it doesn't really matter. Who cares if you're not in top 10, if it's really good at coding, if it's really good at some other task and it's open source, right? So first of all, if you look at this prediction, what do you say? Is it nonsense or does it make sense? But you can also question the premise of the prediction.

Thomas Wolf [00:13:26]: It's a nice prediction in that it is very, take a strong stance like this. It's also surprising because it means people assume they're going to be like 10 companies being able to train billion dollars model next year. So I would be happy to see this race actually increase so much. But yeah, I would say obviously there will always be a difference between closed source model and open source. Just also because just when you think about it, there is this very strange imbalance which is Open source model are open, so close source company can actually get all the knowledge from them, but the reverse doesn't work. So there will always be this distance between one is kind of playing an open game while the other don't. But I think what we've seen this year, and that was surprising, was really catch up on most of the benchmarks. So you may think some benchmark are saturated and that's probably true.

Thomas Wolf [00:14:21]: But it also means that what will be the leaderboard of next year is a big question, right? If it's a leaderboard of like fundamental math or like can this model solve the Riemann conjecture? Sure, yeah, I would expect close source model to be on top. But the next question is, does this leaderboard really matter for any business or agent use case? Right. So my view of AI is that we're going to enter a period like a moment where it stopped being only concentrated in very large LLM and start to be a lot more horizontal with many applications. And we started to see that even at large company with most of them proposing several size of the model that have several costs and several performance and you can even go lower. For instance, last week we released a series of models called Small LM. These are models that are 1.7 billion and lower. And the very nice thing is first they have much better performance than we would expect. Actually these small models have now the equivalent performance to 10 times larger model of last year.

Thomas Wolf [00:15:30]: So llama 2, 10 billion is actually as good as a small LM1 billion. So that's pretty impressive and you can run them on the edge. So there is definitely a future where most of this model will run just directly here. And I think you had another prediction on that, right? Like GPT4 in a smartphone.

Euro Beinat [00:15:46]: Yes.

Thomas Wolf [00:15:47]: And I fully, I fully, I fully agree with this. Well, will it be GPT4, GPT 3.5? I don't know, but it's pretty sure we're going to be able to run really impressive things on this and on a laptop.

Euro Beinat [00:16:00]: Yeah.

Thomas Wolf [00:16:00]: And here it's also pretty sure a lot of this will actually be open source. Because the race on the small local model is mostly pushed by open source companies. It makes sense because if you have a model that is really, you know, your core trade secret, that is extremely valuable, you don't really want to put it there because everyone's going to be able to basically reverse engineer or at least extract the weights for sure. It's so easy to dive in this. And so this is really pushed by small, by open source companies, right? Now it's meta, it's Quent, it's hugging face, it's Mistral. All these companies open source model are fighting on this medium range that will run on smartphone and laptop. And here I would be very, very surprised if we move to fully closed source solution. First and second, I'm pretty sure this is what will drive most of the benefits.

Thomas Wolf [00:16:52]: You know, I would say tangible business or like real life benefits where you don't always need to be, you know, a PhD level physics or biology students who actually do very nice business algorithm or workflow. So yeah, I think we'll see this diversification and just like the Internet, it's going to be a little bit everywhere agents and most of them will be in the medium size I think.

Euro Beinat [00:17:20]: All right, so you are an honorary debater for the debate later today, because you had a very clear stance on the role of GPT-4 on a mobile, which is great, excellent. Of course, all the other debaters are going to be biased, but that's okay, we'll live with that. Anyway, let me pick up some questions from the audience. The first one is from Feder and it's very practical: do you think that Meta will keep releasing its models as open weights going forward, all the models?

Thomas Wolf [00:17:52]: Yeah, I don't have much more insight than you. In the current mindset of Zook and the Meta executive, it's pretty clear that they have this strong stance on open source. They have this strong idea that they don't want to be trapped in the technology of another company. And I also see from my interaction with them that they are very excited and happy about this kind of community ecosystem that's building around it and that's something they did in many other fields. React. They have a history of creating very large, large ecosystem. So yeah, I would be surprised. But yeah, obviously it's hard to predict.

Thomas Wolf [00:18:35]: Right. Last year everything seems like closed and this year everything seemed rather open in my view. So who knows where we're going to go.

Euro Beinat [00:18:43]: So, okay, so we'll ask you again this question in a while. There are two other questions that we debated a lot and, and want to ask you. And so on the first one there is a hotly contested question of whether, let's say the scaling laws are still valid. Right. So some people believe that, you know, we run out of data and the amount of improvement you can get for the same amount of money is sort of decreasing. So there's no way that you're going to have the same jump between GPT3, GPT4 or Claude200, Claude1 and so on. So there is some form of, you know, it seems some people claim we are the point in which is getting close to plateau. Others believe this makes no sense.

Euro Beinat [00:19:34]: Right, there's plenty of Runway in any way going forward. And others like me that have a very strong view that even if we stop doing anything now, we still have 15 years of applications that we haven't really explored. So who cares at the end of the day. But anyway, back to the scaling laws, what's your view?

Thomas Wolf [00:19:55]: Yeah, I mean just to say I think I'm very aligned with you. I think it's really time to build. There's a lot of really cool stuff that can already be built with a GPT4 llama 3.2 level. And it's mostly a matter of integrating that just now in a nice interface. Basically the revolution of ChatGPT was not so much around the model, but just how accessible and easy it was to use. And today what we see as well is that this chat interface are starting to be a little bit frustrating. We're like, we don't want to chat all the time, we just want to have this much more integrated in our daily life easier. So there is definitely like maybe, I don't know if we have 15 years, but we have definitely like you know, $15 trillion of startup, a new project to create around that, around what we have today.

Thomas Wolf [00:20:41]: But then the question is, yeah, do we run out of scaling low? And there are some notes that maybe GPT5 or like was not the level they wanted. Some ideas. I think there is two answers to that I would say one is that we now use data much more efficiently than we used to. And that's also one reason actually it's surprising because we could have thought that, you know, only very large company could train because of the data size, the compute size. But what we see is that first, I mean there's a lot of startup that catch up with just a series a raise and they already train very good, very good model. We also see that people in this very well founded lab are not scared to live and to create their own startup. So this make me think, you know, maybe it's actually, I think being the leader, being the pioneer, also force you to explore all these dead ends. And then the people who just follow with open source model can actually go directly in the right direction, which help a little bit.

Thomas Wolf [00:21:45]: I would say right now there is two interesting direction for this from a science point of view. One is using synthetic data and I think there was up and down. We had a lot of project on this at the beginning of the year at Hugging Face, a nice project called Cosmopedia. I think what we discovered is twofold. One is that it's not fully the silver bullet that you would like to have. It's not yet that we are at this stage where you can just generate a lot with your GPT.4 and training on this GPT5 is enough to keep climbing these stairs in terms of in particular general coverage. But what we discovered as well is that in specific fields, if you want to work on math and code, generating synthetic data is really useful. And for business application in particular, if you are filled with kind of a low data, you know, low resource data, which is quite often the case using some smart synthetic data approach is very interesting to make it like a much larger data set.

Thomas Wolf [00:22:45]: So I think it's field that's really worth exploring for fine tuning for like adapting models, even if it's not the solution for LLM. And the second interesting thing is this trend for like test time compute which is, you know, instead of trying your model to get always the right answer, finding the way to give in time to think and this will maybe help us, you know, move from this. Like everything is in this system. One type of psychological thing where you should output directly the right answer to the system. Two, where you find this way to think. And that's exactly agent. Right. So it's still extremely interesting to explore this direction in my opinion.

Euro Beinat [00:23:28]: Yeah. Now, provided we also find a way to make cost accessible for all that to work. Hey Tom, I think we have space for another question here and I want to go back to something that you mentioned at the beginning and robotics. Right. So I've seen announcement recently. You post a lot about models and robots and tinkering about together and robots and so on. What is it that makes you particularly excited about this and what do you think is going to be the next year when it comes to open source and robotics?

Thomas Wolf [00:24:00]: Oh, it's already happening. But what is very exciting I think is a string of breakthrough. So really research breakthrough that happened, I would say started around one year ago, roughly 12 months ago, let's say the end of November 2023 with some, you know, first indication that robots could start to do tasks that we thought were almost impossible basically with you know, apply and starting to cook things or starting to be able to, you know, fold clothes or like these very complex things where shapes are moving like different are unpredict basically we started to have some view that it could work from a research point of view. And much interesting was the fact that it was not only on this extremely high end, like $300,000 Boston Robotics robot, but something that was like $10,000. So one order of magnitude less. What I found extremely interesting is that we could even reproduce some of these results. He's been pioneered at hugging face by who joined us from Tesla where we're leading a part of the Optimus and decided to push some open source aspect. And we not only pushed software but we decided also to push like low cost hardware.

Thomas Wolf [00:25:20]: The question was how cheap could we go? Because Obviously you know, $10,000 is still quite a high amount. But if you start to be under 1000 and recently we've pushed, Remy has pushed open source robotic arm which is hundred dollars. So that's extremely cheap. Anyone can buy one. They basically. And we start to be able to do things like folding close with $100 open source arm. There is a step change. I think it's even more maybe last year was the image network.

Thomas Wolf [00:25:50]: I would say it started to work in research setting and next year might be what I would call the chat GPT moment which is anyone could basically have a robot home that would do things that were previously or totally impossible to do by machine.

Euro Beinat [00:26:06]: Yeah, yeah. And we see in our domain which is E commerce, what you see in the warehouses. In the warehouses you have two types of robotics. I mean the legs and we consider legs to be solved. But the hands, hand picking things and so on. And that's not solved. So there's a massive business around this ability of solving the hands problem in logistics and warehouses. We are out of time, Tom.

Euro Beinat [00:26:29]: I could continue with dozens of other questions. I'd like to thank you very much on behalf of the team organizing it. On behalf of the community. Thanks a lot for your participation. It's always a pleasure chatting with you. Hopeful to have you again in another edition then.

Thomas Wolf [00:26:44]: Thanks.

Euro Beinat [00:26:45]: Thank you very much. Thank you. All right, with that, the next sessions will open up the four stages of the event. So if you are on Gradwell, there are four stages, and the entire event will flow across these four stages. Stage one will be here; the others are going to be on different stages, starting in about three minutes. But before that we have to solve a plumbing problem, and we have the chief plumber who is going to work together with us. I'll let you watch that, and I'll see you later. Thank you very much.

