MLOps Community

Building an LLM Tool with 400K+ Active Users: Learnings that We Wish We Knew from the Start

Posted Mar 15, 2024
# LLM Tool
# Start up
# Codeium
SPEAKERS
Anshul Ramachandran
Head of Enterprise & Partnerships @ Codeium

Anshul is a software engineer who has worked in startups ranging from autonomous vehicles to generative AI after graduating from Caltech. Today, he leads the go-to-market efforts at Codeium, the most advanced AI-powered toolkit for software engineers, which uplevels the work of hundreds of thousands of developers, ranging from hobbyists to Fortune 500 employees.

Adam Becker
IRL @ MLOps Community

I'm a tech entrepreneur and I spent the last decade founding companies that drive societal change.

I am now building Deep Matter, a startup still in stealth mode...

I was most recently building Telepath, the world's most developer-friendly machine learning platform. Throughout my previous projects, I learned that building machine learning-powered applications is hard, especially when you don't have a background in data science. I believe that this is choking innovation, especially in industries that can't support large data teams.

For example, I previously co-founded Call Time AI, where we used artificial intelligence to assemble and study the largest database of political contributions. The company powered progressive campaigns from school board to the Presidency. As of October 2020, we helped Democrats raise tens of millions of dollars. In April 2021, we sold Call Time to Political Data Inc. Our success, in large part, is due to our ability to productionize machine learning.

I believe that knowledge is unbounded, and that everything that is not forbidden by laws of nature is achievable, given the right knowledge. This holds immense promise for the future of intelligence and therefore for the future of well-being. I believe that the process of mining knowledge should be done honestly and responsibly, and that wielding it should be done with care. I co-founded Telepath to give more tools to more people to access more knowledge.

I'm fascinated by the relationship between technology, science and history. I graduated from UC Berkeley with degrees in Astrophysics and Classics and have published several papers on those topics. I was previously a researcher at the Getty Villa where I wrote about Ancient Greek math and at the Weizmann Institute, where I researched supernovae.

I currently live in New York City. I enjoy advising startups, thinking about how they can make for an excellent vehicle for addressing the Israeli-Palestinian conflict, and hearing from random folks who stumble on my LinkedIn profile. Reach out, friend!

SUMMARY

At Codeium, we have scaled and grown a generative AI tool for software developers from nothing to over 400k active individual developers and hundreds of paying enterprise clients. Along the way, we have stumbled into a number of learnings about what it takes for a startup-built LLM product to be sustainable long-term, which we are now able to verbalize and share with everyone else.

TRANSCRIPT

Building an LLM Tool with 400K+ Active Users: Learnings that We Wish We Knew from the Start

AI in Production

Slides: https://drive.google.com/file/d/16B2zXjJZqmwoUkK4ZMCAeceyJToaH61s/view?usp=drive_link

Adam Becker 00:00:04: Okay.

Adam Becker 00:00:05: And next we have. Let's see. Anshul, are you here?

Anshul Ramachandran 00:00:09: I am. How's it going?

Adam Becker 00:00:10: Anshul. So I have a question for you. I could do the whole introduction and what you're going to talk about, but is it true what we saw in the quiz about the dancing and the orphe? Was that you? I don't remember. Was that you?

Anshul Ramachandran 00:00:27: That is me, yes. That is the alter ego that I don't bring to work.

Adam Becker 00:00:33: Can you say just a sentence about it? Like what kind of dance?

Anshul Ramachandran 00:00:38: I've done a bunch of different styles, but I think I've come to the age now where all my friends are getting married. So I realized the next stage of my dance career is just choreographing people's weddings. But it's a good time.

Adam Becker 00:00:49: It's not an easy task, man. I was tasked with that once, and I was looking for somebody like you to help me with that. So I think let's leave that as the introduction and please take it away, man.

Anshul Ramachandran 00:01:03: Yeah, we'll talk. I mean, that's what everyone's here to talk about. Everyone, I'm Anshul. I lead enterprise and partnerships here at Codeium. For those who are not aware, Codeium is an AI-powered toolkit for software engineers. But today, I'm really not going to talk a whole lot about Codeium. What I'm really going to be talking about is some of the things that we've learned, and kind of almost fallen backwards into learning, as we've grown.

Anshul Ramachandran 00:01:27: Kind of the tool from no users about 15 months ago to over 400,000 active users. So this might be a little high level. There might be a lot of takes here that honestly probably deserve entire presentations of their own. But hopefully this is some interesting perspective from a startup that has kind of been building a tool in the space. All right, so really quick agenda. I'll very briefly talk about what Codeium is, but then really spend the time doing those retrospective learnings, and hopefully there's time for some Q&A. So what is Codeium? As I mentioned, we're what we like to call an AI-powered toolkit for software engineers. So you can think of the classic autocomplete functionalities that you've seen in tools like GitHub Copilot, an in-IDE chat experience that is kind of ChatGPT integrated with the information that's in your editor, context awareness that allows you to search and navigate using generative AI and embeddings and RAG and all those things. What we've done is we've made this kind of available just broadly.

Anshul Ramachandran 00:02:24: We've deployed to over 40 IDEs as plugins, support for over 70 languages, support for every type of SCM, and being able to pull in context and understanding from those. And we've trained all of our LLMs from scratch to be really good at code. So we're kind of an end-to-end product in that sense. And I think that's going to be something I'm going to touch on a number of times in this talk. And from the enterprise point of view, we're also deployment agnostic. Right. Some of the stuff that we're able to do is deploy as a SaaS tool, fully self-hosted, everything in between. So we're really trying to be that AI toolkit that anybody, anywhere can use.

Anshul Ramachandran 00:03:02: For individuals, we decided to make the whole thing free, and so I always recommend anyone who codes to give it a shot and give it a try. Just some numbers. As I mentioned, we have about 400,000 active users on this individual plan, and it's growing pretty rapidly. As I said, we started about 15 months ago. In terms of scale, you can kind of think: for code, on every keystroke we're trying to produce an inference call through a multibillion-parameter model for autocomplete. So just from the scale of the amount of data that we're processing, it's actually incredibly large. Code is a very interesting modality in that sense. Lots of paying customers, including big ones, and just generally, I think developers like us, some of the ratings show the same, and we're always very, very user focused.

Anshul Ramachandran 00:03:43: But really, that's all I kind of wanted to mention in terms of Codeium. Really, the bulk of today I hope to spend giving some takes on how we've viewed the industry and viewed building LLM-based tools in general. So I'm going to preface everything saying that we actually got a little bit lucky. We did not start with Codeium. Our company has actually been around for many years, and we actually started as a GPU virtualization, optimization, and orchestration company. So we were building all the infra layer for GPUs. It just happened that generative AI kind of blew up and we were sitting on the perfect infrastructure for it.

Anshul Ramachandran 00:04:18: So not everything that I'm going to say is probably something that's easily drag-and-droppable. There's definitely some things that very uniquely we could do given our infrastructure background. But I think a lot of other takes will be quite pertinent to people who are thinking about building in this space. I would say the first kind of learning that we're able to verbalize, kind of looking back, is really understanding how a tool fits in with a human's workflow, and what about generative AI changed the nature of how tools themselves are going to look. I'm going to do this a little bit abstract, but if you kind of think of, like, a human: how does a human do work? Right? Humans have brains. That's like the really great thing about us. We can reason about stuff, have some amount of state. We're able to have some internal models of how we do things and how certain tasks should be done, and we're able to apply those models on knowledge sources, right? So for code, I might know how to write a quicksort algorithm, and I know where in the code base some utility functions might be.

Anshul Ramachandran 00:05:18: And I can put all these things together to actually do work. And we do work on some surfaces, right? For code, we might be in editors, but generally in tools. This isn't very software specific. We work somewhere, and that's how a human works. But if you kind of think about where the inefficiencies are here, humans have a subset of models. Our models are limited. A lot of growing as a developer or gaining in seniority is tied to the fact that we're building more internal models of how things are done. And our brains themselves are limited.

Anshul Ramachandran 00:05:48: We are able to keep state and stuff, but we have a limited amount of memory, a limited amount of state that we can keep together. So there are clear inefficiencies in this general, very abstract way of how people do work. And so that's really why we build tools. Tools, to be fair, have brains of their own, right? They're able to keep state. They have a lot of memory. They have access, theoretically, to the same kind of knowledge base. A tool can look at the code base just like I could look at a code base. And tools have a different set of models, right? They're able to execute things or encode how to do certain tasks and be able to pull from that kind of set of models.

Anshul Ramachandran 00:06:26: And the way a tool and a human interact is on the surface: there are different forms of interaction where the human and the tool can work together, where through these interactions, humans can pass intent of what they want to do to the tool, and the tool can surface responses back to the human. And this very general-purpose abstraction of how we do work and how humans and tools really work together is what I call, to give it a name, a human-tool interface. And what generative AI really changed is the model side of tools. If we really kind of pinpoint what happened here: until now, tools have had really good accuracy on a really small subset of tasks. If you have a tool to do a certain thing, it's really, really good at that. You can think of any kind of SAST tool: really, really good at finding security vulnerabilities, but it's really only good for that one task. You can't use that then to, I don't know, do PR review.

Anshul Ramachandran 00:07:24: Or, I don't know, whatever other tasks we want to do. And the beauty of what an LLM is, is that we had a general-purpose model that had pretty good precision on a very high recall of tasks. And that is essentially what entirely changed with generative AI, and why we talk a lot about the LLM part and the model part of this entire interaction between humans and tools. Building state-of-the-art proprietary LLMs is a huge reason why one tool is potentially better than a different tool. If you have a better LLM that can do the same set of generic tasks with higher quality, you're just going to have a better LLM-based tool. But I think one of the takes that I have, and one of the reasons why I kind of build up from scratch here, is that this is still one small part of the human-tool interface, per se. It's really important to also be good in all the other places to actually build an experience that is a product that people will actually like to use. You can make a tool have a really smart brain even in the LLM world.

Anshul Ramachandran 00:08:26: Like sure, the models can get really powerful, but if you're really smart about how you use context and intent from humans, not just to prompt the model but to feed into the model at inference time, you can get just significantly better results using the exact same model. So it's not just improving the LLM and which model is better than another. For Codeium, we built full code base awareness, where we do a bunch of preprocessing of the code base, embedding, indexing, so that at inference time we can retrieve snippets from all around a code base and feed them into a state-of-the-art proprietary LLM to get even better suggestions. And making sure you cover all these different kinds of interactions, all the different ways people want to interact: autocomplete's great, it's passive, but people like to chat with their code, provide instructive comments. And then being available in more places, right? If you're available on more IDEs, you have more surfaces where a human can interact with the AI, the model underneath the hood of the tool, and the better the tool will be overall. And so I think one of the things that we kind of realized is that, yes, it's important to invest a lot of time into building a really good model, but it's equally important, when trying to build a tool that sticks, to invest in these other aspects, these other components of how a tool interacts with the human. The second kind of meta learning that we have kind of gotten used to over time is being realistic on the capabilities, evolution, and concerns that people, and especially enterprises, have about AI technology.
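
For readers who want to picture the code base awareness described above, here is a minimal sketch of the retrieve-then-feed pattern: embed and index snippets offline, then at inference time retrieve the nearest ones and prepend them to the model input. This is an illustration under stated assumptions, not Codeium's implementation; `embed` is a toy placeholder for a trained code encoder, and a production system would use a real vector store.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy placeholder embedding; a real system uses a trained code encoder."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

class CodebaseIndex:
    """Offline step: embed and index snippets from across the code base."""

    def __init__(self, snippets: list[str]) -> None:
        self.snippets = snippets
        self.matrix = np.stack([embed(s) for s in snippets])

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Cosine similarity; vectors are unit-norm, so a dot product suffices.
        scores = self.matrix @ embed(query)
        top = np.argsort(scores)[::-1][:k]
        return [self.snippets[i] for i in top]

def build_model_input(index: CodebaseIndex, cursor_context: str) -> str:
    """Online step: feed retrieved snippets to the model at inference time."""
    retrieved = index.retrieve(cursor_context)
    context_block = "\n\n".join(f"# relevant snippet\n{s}" for s in retrieved)
    return f"{context_block}\n\n{cursor_context}"

index = CodebaseIndex(["def parse_config(path): ...", "class HttpClient: ...",
                       "def retry(fn, attempts=3): ...", "def log_event(e): ..."])
print(build_model_input(index, "def fetch_with_retry(url):"))
```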

Anshul Ramachandran 00:09:53: I think a lot of us are based in the Valley or places where there's a lot of AI hype, and we see and hear all the cool Twitter demos and all the stuff that is happening with LLMs and what's possible. But I think we just need to be realistic about what can be productionized, what is valuable, and what people who need to deploy these internally or to their customers really care about. And I think the number one thing here is just recognizing that this technology is not mature. And what we always kind of talk about internally is really: what are those capabilities and applications of LLMs that fit in this Venn diagram between what is useful to our potential user, which in our case is a software engineer, and where the technology is robust enough today to match the trust that is required from that user? Because if you lose a user's trust, then it doesn't really matter that you can do something incredible one in a hundred times. And I think, just to give some examples to really hone in on this, one of our competitors, GitHub Copilot, tried a feature about a year ago of summarizing PRs with an LLM. It's definitely an incredibly useful thing for a software engineer, because as an engineer by training myself, I know I hate writing really good PR summaries. But if you think about what role the PR summary has in the software development lifecycle, it is an incredibly important part of that very last step before code gets into production. And if the AI gets that wrong even once, like it misses a file or misses some context, well, the developer will stop trusting it, reviewers will start ignoring it, and they'll start looking at the code anyways. And I think that's kind of the problem that this feature ran into: the technology is good, but it's not good enough to hit that bar that's required.

Anshul Ramachandran 00:11:41: On the flip side, you can write a haiku that describes a PR: probably really robust today, but not that useful. Things that kind of fit in the middle, at least in the code world, are things like code autocomplete, which obviously I don't have to go into detail on. I think everyone's kind of seen what autocomplete can do. It's something there: it's clearly useful because we can speed through a bunch of boilerplate, but it doesn't have to hit that same bar of quality for a developer to use it. Because if they don't like a suggestion, they just have to keep typing and move on. There's not a whole lot of backtracking that they have to do. And that's one part.

Anshul Ramachandran 00:12:22: But then also recognizing that this technology is really quickly improving. I mean, I think all of us have seen the OpenAI Sora stuff from last week and everything. The rate of improvement of this technology is just crazy. And I think one of the things that directly means, from a product development point of view, is that focusing a lot on prompt engineering is a really poor investment, because what that essentially does is try to find ways to get a very particular model that is maybe state of the art today to be really good at a task. But maybe tomorrow's model is just bigger, better trained, with different objectives. And now all of a sudden, all that work done in prompt engineering is not really relevant anymore. Maybe that's already all encoded in this new, improved model. And so what we'd like to focus on is: what are those fundamental building blocks that, even as the underlying LLM technology improves, we can still leverage? Things like how you reason about a code base to pull in context from all over the place to make sure we're giving very relevant snippets at inference time. That's something that's going to stick around no matter what model we end up swapping in on the back end.
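
One way to read that advice architecturally (our framing here, not a claim about Codeium's internals) is to keep the durable building blocks, context gathering and assembly, behind a stable interface so the underlying model can be swapped without rewriting the product. A minimal sketch with hypothetical names:

```python
from typing import Protocol

class CompletionModel(Protocol):
    """The stable interface the product codes against; models come and go."""
    def complete(self, context: str) -> str: ...

class TodayModel:
    def complete(self, context: str) -> str:
        return f"[v1 completion given {len(context)} chars of context]"

class TomorrowModel:
    def complete(self, context: str) -> str:
        return f"[v2 completion given {len(context)} chars of context]"

def suggest(model: CompletionModel, retrieved: list[str], cursor: str) -> str:
    # The durable investment: how context is gathered and assembled.
    context = "\n".join(retrieved) + "\n" + cursor
    return model.complete(context)

# Swapping the back end is a one-line change; the context pipeline survives.
print(suggest(TodayModel(), ["def util(): ..."], "def main():"))
print(suggest(TomorrowModel(), ["def util(): ..."], "def main():"))
```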

Anshul Ramachandran 00:13:27: And so those are the kinds of investments that we like to think about, so that we can continue to be a sustainable, stable product as this technology is moving at light speed. And the last part is really recognizing that if you really want to have customers that are willing to pay, which is usually enterprises, you have to understand a bit of their concerns. And I think we always get really excited about the new capabilities, but the stuff that companies care about and potential buyers care about are the things that are not as sexy, for lack of a better word. Right. People really care about security, right. If you're a SaaS tool and they don't really know where the code is going and they're working in a regulated industry, that's problematic. And there are all these questions about IP, which are as true in code as they are in other applications like written text. And so being really careful about what you're actually training on is super important.

Anshul Ramachandran 00:14:27: And all of these kind of come to say that I think we have a very intrinsic thought here at Codeium that prioritizing control over the model, for all of these reasons, is incredibly important, because then you can control what you're putting into the model, and you can control how you actually deploy this model. You don't have to rely on a third-party API where you don't have control over the latency, which a bunch of the speakers so far today have mentioned as a very critical part of getting adoption and getting people to like these tools. And I think just being very cognizant about what people care about that has nothing to do with the technology is almost equally important to building a product that is sustainable. And the last learning is: I think it's really great that AI and LLMs are as big as they are, and it's such a great opportunity for startups. But I think we have to be realistic that we're a startup going against big tech companies that are equally interested in this technology because of the massive potential that it could have for their businesses. And so I think, realistically, we just have to be the best product on every axis, with no caveats, if we're really going to make a dent. The way that we kind of think about it is these three different components of being the best product. You have to address the complete set of subtasks that can be accelerated, because if you don't have certain things and a big company has an equivalent product that has some of those features, it's going to be a tough sell to convince people to use you.

Anshul Ramachandran 00:15:57: You need to be the absolute best product for the particular user, for the particular task. So this means building for your application. We just heard all the interesting things that you have to think about when doing voice and speech. Similarly, coding is a very unique modality where knowledge is distributed. People are editing existing text, which is not the same as other applications of LLMs. So really focusing on how you can leverage those unique capabilities of the domain that you're working in. Then also thinking a lot about personalization: any kind of generic system is generically good. But if you're thinking about, in our case, a company that has tens to hundreds of millions of lines of code that are private, that is way more important information to them than all the public data that we've trained on. And so how we can actually leverage those existing knowledge stores to make the absolute best product for the developers at that company is a really crucial lever for being able to build the best product.

Anshul Ramachandran 00:16:54: And the last thing is, given all those concerns about security and legal and costs, the AI just must be a clear win when people ask those questions. For us particularly, these translate to these three. But I think the really exciting thing here is that, yes, a lot of big tech players are kind of playing in the space, but no one has a multi-year background in LLMs. This is a relatively new thing. So it's a really exciting time to have the opportunity as startups to create the best product. Because today you're not going to be able to go out and pitch to a VC that I'm going to create the greatest social network or the greatest search engine. Those are long gone. People have head starts there. But this is so brand new that there's a real opportunity to actually build the best products from scratch that will hopefully be able to turn into sustainable long-term businesses.

Anshul Ramachandran 00:17:49: And the other part is, as a startup, we have to get users. We're going to lose on distribution to big companies, because obviously our competitors are some of the greatest distributors of software in history. So somehow you actually need to get users, because without users there's no feedback. And in a space where technology is moving so quickly, you then don't actually make progress as quickly. And really developing in public and shipping quick and getting feedback quick allows us to rapidly ship products on our side. We decided to ship in-IDE chat almost a year ago. That was about seven, eight months before any other product was able to do so. And we chose to do that so that we could leverage a really large user base to get feedback on what was actually working, from a UX point of view and from a model point of view.

Anshul Ramachandran 00:18:35: And so you have to find a way to attract users in the beginning, maybe through a truly differentiated product with a huge moat, or, realistically, an incredibly low price. One of the things that we did from the beginning, because of our infra background: we can actually serve these models at scale at orders of magnitude less serving cost than any of our competitors. So we decided to just make this product free. Any individual should just be able to get unlimited autocomplete, chat, all of this completely for free, on any of their IDEs, forever. And that's a bit of a crazy statement to make. But having that differentiator to attract users has now led us to actually have the largest free user base out of any AI tool. And that has been super helpful for being able to iterate really quickly, to the point that whenever we push a new model or we try to A/B test a new model, we get about a million acceptances or rejections on autocomplete within the span of 15 minutes. And so that kind of feedback rate is super important to be able to iterate quickly in such a rapidly moving space.
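
As a rough illustration of how such a feedback stream can drive model comparisons: bucket users deterministically into variants, tally acceptances and rejections per variant, and compare rates. This is a toy sketch with illustrative names, not Codeium's telemetry pipeline; a production system would add significance testing and guardrails.

```python
import hashlib
import random
from collections import defaultdict

class AutocompleteABTest:
    """Assigns users to model variants and tallies accept/reject feedback."""

    def __init__(self, variants: list[str]) -> None:
        self.variants = variants
        self.shown: dict[str, int] = defaultdict(int)
        self.accepted: dict[str, int] = defaultdict(int)

    def assign(self, user_id: str) -> str:
        # Stable hash so the same user always lands in the same bucket.
        bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return self.variants[bucket % len(self.variants)]

    def record(self, variant: str, accepted: bool) -> None:
        self.shown[variant] += 1
        self.accepted[variant] += int(accepted)

    def acceptance_rates(self) -> dict[str, float]:
        return {v: self.accepted[v] / max(self.shown[v], 1)
                for v in self.variants}

test = AutocompleteABTest(["model_a", "model_b"])
for i in range(100_000):  # simulated stream of accept/reject events
    variant = test.assign(f"user{i}")
    true_rate = 0.30 if variant == "model_a" else 0.32
    test.record(variant, accepted=random.random() < true_rate)
print(test.acceptance_rates())  # e.g. {'model_a': ~0.30, 'model_b': ~0.32}
```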

Anshul Ramachandran 00:19:42: But at the same time, you need to make money somehow. We made the individual product free. And realistically, enterprises are looking to adopt these kinds of technologies to standardize their tool set from an AI perspective across all of their knowledge workers. And so they're a great place to actually monetize. But I think one of the things that we always think about is how do we make our product truly an enterprise-grade product, and not just our individual free product with seat management slapped on top of it. And I think there are a lot of things that you could consider doing. Some of the stuff that we did: we created a way to fully self-host our entire solution end to end, including models. And that has allowed us to really interact with a lot of regulated industries that otherwise could not.

Anshul Ramachandran 00:20:27: So there are all these different ways that you could differentiate an enterprise product and kind of go to market that way. So I'm just going to recap the three high-level points here. I think being cognizant of the tools and all the different aspects of a tool is super important, because it's going to be hard to compete on the generic foundation model side of things, but there are a lot of parts of a tool that will make it better than another. Being realistic on what GenAI can do today and how fast it's moving. And then just understanding that as a startup, there are a lot of things to think about that maybe a larger company doesn't, but I think there's a lot of opportunity as well. So hopefully some of these were interesting. A lot of these points probably could have been talks of their own, but that's all I have.

Anshul Ramachandran 00:21:15: So thanks for listening. And if we have time for questions, Adam, I'd be happy to take them.

Adam Becker 00:21:19: Thank you. I'm not seeing any specific ones, but please stick around in the chat. I'm sure people will love having you there. In many ways, it feels like each of the talks we've been listening to today could have been a keynote. This one feels like a keynote in the sense that it actually touches on all of those themes that are just so salient for this stage.

Anshul Ramachandran 00:21:44: Right?

Adam Becker 00:21:45: Like, I'm sure that they're having a lot of fun on the other stage figuring out all the engineering stuff. But unless you're fully cognizant of that Venn diagram, the one that you presented, you're just living in La La Land, and that's the situation. And I see this left and right, so it's very good to keep that in mind. One of the things that often comes to mind when I think about these things is that people seem to have a vision of their product that I call, like, AGI complete. Sure, at some point you can have an AI that does this thing. But you can't bank on that today, because there are no users willing to pay for that today, because you can't establish enough trust with that today. So you've got to reduce your scope, or reduce your vision, or modify your product, or somehow have it impact your go-to-market strategy.

Adam Becker 00:22:28: Because what you're picturing right now is a fantasy.

Anshul Ramachandran 00:22:32: It sounds like you've got it. Yeah, great: the long-term vision of AGI complete. Think about it. If you really want AGI complete with none of the infrastructure and stuff around it, you're really hoping that you can dump all of the known information in at inference time and get a result with a little bit of prompt engineering. That is a hard sell from the usability point of view, because God knows what the latency of that is going to be, right? So maybe, yes, quality would be great, but practically it's not going to be deployable. So there are all these different things to think about, which, honestly, we didn't think about going in.

Anshul Ramachandran 00:23:08: We decided to punt things to users and learn from them. But yeah, I completely agree that there's all these kind of foundational building blocks that are important to build that aren't just the model.

Adam Becker 00:23:20: Anshul, this has been wonderful. Stick around in the chat, and I hope folks can also find you on the Slack afterwards.

Anshul Ramachandran 00:23:27: You seem to be a trove of wisdom.

Adam Becker 00:23:31: Awesome Anshul. Thanks again.
