AI Meets Memes: Taking ImgFlip's 'This Meme Does Not Exist' to the Next Level with a Large Language Model
Stefan Ojanen is the Director of Product Management at Genesis Cloud, an accelerator cloud start-up focused on providing infrastructure for the AI space. Prior to joining Genesis Cloud, Stefan managed two data science teams at Scorable, a Berlin-based AI start-up that developed a holistic XAI solution for asset managers. Stefan is a founding member of the MLOps community in Munich.
At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
How do you use a Large Language Model (LLM) to create memes? Let's discuss ImgFlip's unique dataset, the selection and fine-tuning of a commercially usable LLM, and the associated challenges. Of course, we'll also demonstrate the model prototype itself, and discuss the challenges we anticipate facing when productionizing an LLM used by millions of users.
Introduction
Next up for the illustrious lightning talks, you've got Stefan coming on. What's up, dude?
Thank you for having me. Here we go. I copied your look. You know I love your shirt, so I figured I'd also go for the summer look today. That's what I'm talking about, man. That is what I'm talking about. I love it. I am so glad that you're feeling the vibe. If anybody else is feeling the vibe and wants to get Hawaiian or tropical, send us a selfie, or post it in Slack or on Twitter and tag us, and I will send you
some swag. I promise that I will send some swag. We might even send some Genesis Cloud swag your way, if I can put words together. So, Stefan, man, I have been wasting a ton of time right now, and my only job is to make sure that we are on time.
Therefore, I'm gonna hand it over to you. I'm gonna let you get cracking with your 10-minute lightning talk, and I'll see you in 10 minutes, dude. All right, thank you so much. Okay, first of all, thank you for joining. My name is Stefan Ojanen, and I'm here representing Genesis Cloud. We have a fun, fun project to showcase as soon as I get my slides changing.
So, the topic of the talk today is AI meets memes: how do we take ImgFlip's 'This Meme Does Not Exist' to the next level using a large language model? All right, lightning talk, so let's go. First of all, what is a meme? I think everyone is familiar with memes, but yeah, it's usually about mimicking something, right?
So it's taking something that exists and making small variations of it. And over the years, you know, you get layers and layers of meta memes. It's an interesting topic for large language models, because in general humor is something that's not very well understood or explained by humanity yet.
So first, a brief history of meme generators using AI. Back in 2018, there was a paper from Stanford researchers called 'Dank Learning'. They used a long short-term memory model to generate memes, and while it worked somewhat, it didn't really work for the structure of the memes.
This is something that the authors acknowledged. Then in 2019, Dylan, the founder of ImgFlip, did this himself using a deep convolutional neural network, and this is the basis for the AI Meme, or 'This Meme Does Not Exist', service that's available on ImgFlip. And then going forward from 2020, with the advent of the GPT series of models, I've seen various hobby projects and even some commercial services
generating memes. But you can also do it yourself, and we are gonna show you how. So how did we end up here? First of all, let's just be honest, I really like memes, so it's a fun project for me. Then we at Genesis Cloud are a provider of AI infrastructure, so it's really important for us to understand, from the user's perspective,
how easy our infrastructure is for them to use, right? And with the advent of ChatGPT and the great public interest in large language models, it really just seemed like a fun project for us to learn from. And I believe that the technology exists for generating memes.
So why not? So here we are. How I got started: a quick reality check. Back then, when we got started with this, ChatGPT was still based on the GPT-3.5 model, but for me it was the most advanced model at the time. So I just wanted to check: would a model like this understand the meme context? So we got started, and already in the first response we saw some hallucination.
It did not really understand what the meme template is for a Philosoraptor meme. As for the captions: the first one, not really great, but the second one, yeah, that's actually pretty good. So I put it in a template, and yeah: is the meaning of life 42, because that's the only answer I'm getting no matter how much I Google.
I think that's somewhat funny. And as a comparison, an actual human-generated popular version of the meme: if the opposite of pro is con, is the opposite of progress Congress? Right? So these silly but smart questions, that's the idea of the meme. So I believe that there is really something there.
So we went ahead with the project. So, attention... no, no, no: data is all you need, right? So we figured, okay, to make this happen we need a meme dataset. Who has the biggest meme dataset? It's ImgFlip, so they provided us with a huge dataset: 650,000 meme templates. Most of them are like same, same but different.
So the number of truly unique templates is actually a lot smaller, and there are like 40 million individual memes created by the users of the service. That's what we are working with. So the goal was to showcase how a contemporary LLM can create quality memes, and we wanted to expand the existing service to cover more templates than it does currently.
And 255 templates would cover like 62% of all memes created, so that's the number we settled on. And we wanted to demonstrate how Genesis Cloud can serve such an LLM for millions of monthly users. First we tried, you know, what you usually do: try BERT. It's kind of a well-understood model.
We didn't get good results with it, so we moved on from that pretty quickly. We got a little bit better results after that, but this is the time when the LLaMA series of models came out; these are instruct models, and this is where we turned our attention. But the problem with the original
LLaMA release is the license: it forbids commercial use. OpenLLaMA has been made available since, but there are other model families too, so you have RedPajama and others. We tried it, but even though it was sometimes funny, it just wasn't good enough. It really struggled with the format.
A lot of the responses were just nonsense. But StableLM, this is where we ended up: we found out that this worked the best. Admittedly it's not as good as GPT-4 yet, but close, and I think we can make it equally good. So now we have instruct models, right? We tried all kinds of approaches, and this is what we found to work the best:
First you have the instruction for the model. This is the overall context: you need to tell it the structure, whether it's a two-part or a three-part meme, and so forth. Then you have the input, the specific context of the meme. This is the prompt engineering part; we actually found that it worked best when you do this per template.
And then you have the actual data. This is not the generated meme; this is the training data used for fine-tuning. Then we basically used the Lit-Parrot project with LoRA and adapter fine-tuning, and we only needed three epochs. Just note that this is a proof-of-concept dataset.
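To make the format concrete, here is a minimal sketch of what one fine-tuning record in that instruction/input/output style could look like. The field names and the example caption are illustrative assumptions, not rows from the actual ImgFlip dataset.

```python
# Hypothetical fine-tuning record in the instruction/input/output format
# described above (illustrative only, not taken from the real dataset).
example_record = {
    # Overall context: which template, how many text boxes, what the joke structure is.
    "instruction": (
        "Write a caption for the 'Philosoraptor' meme template. "
        "It is a two-part meme: a silly but smart-sounding question "
        "split across a top text and a bottom text."
    ),
    # Specific context for this sample; this is the part that gets engineered per template.
    "input": "Topic: the meaning of life",
    # The human-written caption the model should learn to imitate.
    "output": "Is the meaning of life 42? | Because that's the only answer I get no matter how much I Google.",
}
```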
But we still only needed three epochs for the model to converge, and we expect to need no more than 10 epochs with the full dataset. And the training time was really fast: using four GPUs, it was only 21 minutes to fine-tune. And yeah, it's pretty easy also from the ops perspective.
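The talk does its fine-tuning through the Lit-Parrot project's LoRA/adapter scripts; as a rough sketch of the same idea outside that tooling, the snippet below wraps a base model with LoRA using Hugging Face PEFT. The checkpoint name, hyperparameters, and dataset handling are assumptions, not the team's actual configuration.

```python
# Sketch of LoRA fine-tuning in the spirit of the setup described above.
# Model name and hyperparameters are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = "stabilityai/stablelm-tuned-alpha-7b"  # assumed StableLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the frozen base model with small trainable LoRA matrices.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of weights will train

args = TrainingArguments(
    output_dir="meme-lora",
    num_train_epochs=3,             # the proof of concept converged in ~3 epochs
    per_device_train_batch_size=4,  # assumed; tune to GPU memory
    learning_rate=2e-4,
    bf16=True,
)
# `train_dataset` would be the tokenized instruction/input/output records shown earlier.
# Trainer(model=model, args=args, train_dataset=train_dataset).train()
```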
So we were basically able to max out the GPUs and not waste any resources. Then the results. We basically used all three of the models mentioned earlier to create seven iterations of the three templates, so it's not like a big dataset to cherry-pick from. And I graded them myself, and 76% were actually funny to me.
So it's just my subjective judgment; have a look yourself. These are now by the model that we built. So, "waste time eating fruit" or "make this multivitamins": of course you take that. "True self" or "true love": oh, that's a tough one, but I guess that's a compromise you sometimes have to make. Then, yeah, this one about the memory of a beloved character and sending text messages.
Okay, that's a pretty grim choice, I have to say, but sometimes life is like that. "Work harder" or "drink": okay, yeah, that's good. Then "living with parents" or "to eat dinner": yeah, a tough economy can be like that, for sure. And then lastly, out of only seven examples this came out, and I swear that this is not cherry-picked or anything.
Right: a meme talk; left: meme. Okay, that's funny. Then, how do we serve these memes? First we are going to fine-tune the model for all 256 templates. Based on the initial work, we made some assumptions: we measured at best about six tokens a second from the RTX 3090.
We roughly expect 10 tokens per request, and based on their site behavior, roughly 5 million monthly requests. So we calculated that, accounting for seasonality, roughly four GPUs would serve it. We still haven't properly calculated this, but we expect the peak to be around 10%.
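As a quick sanity check, the back-of-envelope sizing works out as in the small calculation below; the throughput, tokens-per-request, and request-volume numbers are the ones quoted in the talk, the rest is just arithmetic.

```python
# Back-of-envelope GPU sizing using the figures quoted in the talk.
tokens_per_sec_per_gpu = 6        # measured on an RTX 3090
tokens_per_request = 10           # rough expectation per meme caption
requests_per_month = 5_000_000    # based on ImgFlip site behavior

seconds_per_month = 30 * 24 * 3600
avg_tokens_per_sec = requests_per_month * tokens_per_request / seconds_per_month
gpus_at_avg_load = avg_tokens_per_sec / tokens_per_sec_per_gpu

print(f"average load: {avg_tokens_per_sec:.1f} tokens/s")  # ~19.3 tokens/s
print(f"GPUs at average load: {gpus_at_avg_load:.1f}")     # ~3.2, round up to 4
```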
So we need a scaling setup for the GPUs. And the setup is pretty standard stuff, you know; it's very similar to anybody's reference architecture. Just use Kubernetes for scaling, put some monitoring in there, and use Triton to serve the model itself. And basically we provide an API that ImgFlip can consume for their website.
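For a sense of what that API boundary might look like, here is a hedged sketch of a client calling a text-generation model hosted on Triton Inference Server; the server URL, model name, and tensor names are assumptions, since they depend entirely on the actual model configuration.

```python
# Hypothetical client call against a Triton-served meme model.
# URL, model name, and input/output tensor names are assumptions;
# they depend on the real deployment's model configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array(["Template: Philosoraptor\nTopic: working from home"], dtype=object)
inp = httpclient.InferInput("text_input", [1], "BYTES")
inp.set_data_from_numpy(prompt)

result = client.infer(model_name="meme_llm", inputs=[inp])
print(result.as_numpy("text_output"))  # generated caption(s)
```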
Yeah, why not just use GPT-4? Well, it's a black box. The OpenAI API becomes a single point of failure for your AI application, and that can be a real problem. You don't control your data; maybe less important for memes, but that can be important for some. You also have limited control over the model, right?
And it's expensive at scale, like really expensive. You know, fine-tuning this cost roughly $1 using Genesis Cloud. Only $1, right? Then, over a month, we generate 50 million tokens; depending on whether you go on-demand or long-term, that's 300 to 500 US dollars over a month.
So not too bad, right? And if you would use the OpenAI API, it's 10 times that; it would be four and a half thousand US dollars. I don't want to trash OpenAI, they do absolutely amazing work, but sometimes when you do it yourself, you can get something similar with much less money.
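Put in per-token terms, those monthly figures imply something like the comparison below; this is just arithmetic on the numbers quoted in the talk, with the self-hosted figure taken at the midpoint of the stated range.

```python
# Per-1K-token cost implied by the monthly figures quoted in the talk.
monthly_tokens = 50_000_000

self_hosted_usd = 400    # midpoint of the $300-500/month Genesis Cloud estimate
openai_api_usd = 4_500   # the "roughly 10x" figure quoted for the OpenAI API

for name, usd in [("self-hosted", self_hosted_usd), ("OpenAI API", openai_api_usd)]:
    print(f"{name}: ${usd / monthly_tokens * 1000:.3f} per 1K tokens")
# self-hosted: $0.008 per 1K tokens
# OpenAI API:  $0.090 per 1K tokens
```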
Okay, what problems do we expect to face? To keep it lean: what will be the minimum infrastructure needed while meeting an SLA and mitigating latency? There are some naive approaches, like looking at GPU utilization, but GPU queue time is a better leading indicator. This is something that we will really investigate, along with just how to deal with a ton of concurrent requests.
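To illustrate why queue time is the more useful signal, here is a toy scaling rule sketched under assumed thresholds and replica bounds; it is not the team's actual autoscaling policy.

```python
# Toy illustration of scaling on queue wait time rather than GPU utilization.
# Threshold values and replica bounds are assumptions for the sketch.
def desired_replicas(current_replicas: int,
                     avg_queue_wait_ms: float,
                     target_wait_ms: float = 200.0,
                     min_replicas: int = 1,
                     max_replicas: int = 4) -> int:
    """Scale in proportion to how far queue wait is from the SLA target.

    Utilization only says a GPU is busy; queue wait starts climbing
    before latency SLAs are broken, which makes it a leading indicator.
    """
    scale = avg_queue_wait_ms / target_wait_ms
    wanted = max(1, round(current_replicas * scale))
    return max(min_replicas, min(max_replicas, wanted))

# e.g. 2 replicas with 450 ms average queue wait -> scale up
print(desired_replicas(current_replicas=2, avg_queue_wait_ms=450))  # -> 4
```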
Okay, so that's the proof of concept, and it's working. We will continue; there will be another talk at another time about how we handle the training for the complete dataset. By then we will showcase the actual live production setup, when it's up and running, and the actual challenges we faced, not just the ones that we anticipated, and then of course show some more memes once the model has evolved a little bit further.
Yeah, so thank you very much for listening to this quick lightning talk. For any questions and feedback, you will find us in Genesis Cloud's virtual booth. And a special thank you to Dylan, the founder of ImgFlip, for providing us with this amazing dataset, and to Mar Maran for doing the actual modeling work.
I'm here just to... Dude, shout out to you for being the first and maybe only person that I know that is using StableLM. That is pretty impressive. Every time that I talk to it, it hallucinates more than my college roommate. Yeah, so in this case, I think it depends on the use case, you know, so it's worthwhile to try them all.
It's really amazing what they've done with Lit-Parrot, you know; it makes it easy to fine-tune these models, and all three of the models are part of that. So I'd say just try it out, and depending on the use case, your mileage may vary, you know.
Awesome. Yeah, and I think memes are very contextual, and humor is difficult, right? So I'd say, if with relatively little effort and a simple setup we were able to fine-tune a model, I think you will also be able to do the same for your business use case, one that might be
less meta. Yeah, awesome, dude. Well, thank you so much for coming on here, and anybody that is interested in everything that you all are doing: like we said before, you can click on that little left sidebar that says Solutions. I think they can get some swag directly from you, but I'm also gonna be able to give out all kinds of swag to them too.
I'm pretty sure you guys are giving out some socks or maybe some mugs, I can't remember exactly what. Yes, there are socks, so join our booth for those. There we go, get over to that booth and claim your free swag. So, dude, thanks so much, Stefan. I am going to keep it cruising.