MLOps Community

AI-Driven Code: Navigating Due Diligence & Transparency in MLOps

Posted Nov 29, 2024 | Views 327
# Due Diligence
# Transparency
# Sema
SPEAKERS
Matt van Itallie
Founder and CEO @ Sema

Matt Van Itallie is the Founder and CEO of Sema. He and his team have developed Comprehensive Codebase Scans, the most thorough and easily understandable assessment of a codebase and engineering organization. These scans are crucial for private equity and venture capital firms looking to make informed investment decisions. Sema has evaluated code within organizations that have a collective value of over $1 trillion. In 2023, Sema served 7 of the 9 largest global investors, along with market-leading strategic investors, private equity, and venture capital firms, providing them with critical insights.

In addition, Sema is at the forefront of Generative AI Code Transparency, which measures how much code created by GenAI is in a codebase. They are the inventors behind the Generative AI Bill of Materials (GBOM), an essential resource for investors to understand and mitigate risks associated with AI-generated code.

Before founding Sema, Matt was a Private Equity operating executive and a management consultant at McKinsey. He graduated from Harvard Law School and has had some interesting adventures, like hiking a third of the Appalachian Trail and biking from Boston to Seattle.

Full bio: https://alistar.fm/bio/matt-van-itallie

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. He is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes, and he tries to bring creativity into every aspect of his life, whether that's analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.

SUMMARY

Matt Van Itallie, founder and CEO of Sema, discusses how comprehensive codebase evaluations play a crucial role in MLOps and technical due diligence. He highlights the impact of Generative AI on code transparency and explains the Generative AI Bill of Materials (GBOM), which helps identify and manage risks in AI-generated code. This talk offers practical insights for technical and non-technical audiences, showing how proper diligence can enhance value and mitigate risks in machine learning operations.

TRANSCRIPT

[00:00:00] Matt: I'm Matt van Itallie, founder and CEO of Sema, and I drink my coffee black.

Demetrios: Welcome back to another MLOps Community Podcast. I am your host, Demetrios, and today we talked about codebase scans, mainly because Matt has a company that scans codebases when founders are looking to exit, or when companies, hedge funds, or people with a lot of money are looking to buy companies and are doing their technical due diligence. I find this topic fascinating in general, but the nuances, and the eight different things he checks for when scanning a codebase, are really cool to see. Number one, that you can automate it at all, and [00:01:00] number two, how much thought has gone into it. He said it took a whole year to figure out the best ways to translate certain very technical pieces of the codebase into numbers, or a score, that non-technical stakeholders could understand. I don't want to spoil anything, so let's get right into it. As always, if you enjoy this episode, share it with just one friend. This episode is not sponsored, and it is not financial or legal advice. I want to say that right now.

Matt: I love it.

Demetrios: But you don't have any Italian roots, do you?

Matt: So the last name means "from Italy" in Dutch. Hundreds of years ago, ancestors moved from Italy to Holland, and then my grandfather [00:02:00] emigrated here.

Demetrios: Okay. All right. And you kept the Italian roots, not the Dutch roots then, huh?

Matt: Yeah. Well, "van" is like the "von" you've heard in German; it means "from." So the name means "from Italy," but in Dutch. I'm not particularly Dutch, but more Dutch than Italian, and the name has stuck.

Demetrios: Incredible. All right, man. I want to talk to you a lot about these codebase scans, and maybe we can just set the scene with what they are and how you think about them.

Matt: Absolutely, and again, thanks for having me; I'm so looking forward to this conversation. Our CodeScans product is a detailed and also summary view of the non-functional requirements of a codebase. So it's not what the product does, how well it does it, or what kind of product-market fit there is. It is the stuff, quote, under the hood. We [00:03:00] categorize it into two chunks. One is product-related risks: the likelihood that the team will be able to maintain and expand the current codebase. The second is compliance risks: things that lawyers might care about, things that could get you in trouble with external stakeholders. We capture eight components, or modules, that fit into one of those two categories. And because I know this is such a technical audience, and I love the detail, I'll at least run through all eight. One is code quality: various forms of technical debt. A second is development process. Notice I didn't say we just look at the code; we look at the codebase. To us, it's really important to think about the engineering team around it, both as a whole and as individuals, because that's a huge part of a healthy codebase, full stop. So development process is the [00:04:00] consistency, or lack thereof, of development activity: we were working on product X, and all of a sudden we tripled the commits to it.
Maybe there's a good reason, maybe not, but that's an example of a process metric. The team module looks, developer by developer, at subject matter experts and whether they are still in the business. And I'm very proud of this, actually: our customers are private equity firms, deal teams, non-technical CEOs and CFOs, and our job is to cross the chasm and make tech understandable to non-tech audiences. We try very hard to do that, and I think we've done a good job of teaching our customer base. We serve most of the world's best software investors. This is used significantly in technical due diligence when someone's buying a company; that's not the only use case, but it's a predominant one. So back to team: one of [00:05:00] the most important things, and we would say the most important of all eight sections about the health of a codebase, is the presence of subject-matter-expert developers. Everybody listening to this knows that's true, but non-technical audiences think, well, it's the code, it's the IP, it's the words. So one of my favorite quotes is this: buying a codebase without the coders who wrote it and know it in detail is like having a half-written novel without the novelist. If the novel's done, fine, you have the book. But if the novel's half done, it really doesn't work. And code is a half-written novel. Code is never done: there are new features, new functionality, upgrades, patches, et cetera. Of all these eight, while they all matter, the team is the single most important one. [00:06:00] Again, it's the subject matter expertise. So that's number three. Number four is cloud spend and utilization. The cloud is of course a huge accelerator, but it can also be a tax in terms of how expensive it is, so this is an assessment of how optimized it is and whether there are savings opportunities with the public cloud providers. Then there is intellectual property risk from open source code. If any of you have thought about copyleft (Black Duck is the 800-pound gorilla of that kind of IP risk scan), this is about the fact that if you use certain open source libraries with certain licenses, there are circumstances where it creates intellectual property risk for the organization, especially in high-stakes [00:07:00] diligence situations. Next is code security: the security of code the team wrote, think of that as a SAST or DAST scan, and also the security of the open source code, so CVEs. Seventh is a very light-touch cyber assets review. I won't call it a pen test, because it doesn't actually intrude, but it looks at the assets around the domains and subdomains. And finally, the one we've worked on over the last year, is generative AI code in the codebase. I'll say a minute on that, and then of course I'm happy to go whatever direction you want. Think about open source code. Open source is code the team didn't write. It's a super good idea to use it.
It would be effing crazy, I hope I can say that, PG, effing crazy not to use open source, because you'd be [00:08:00] reinventing the wheel. The whole point of engineering is to solve problems; if a problem is solved, why do it yourself? But open source code comes with security risk, maintainability risk, and intellectual property risk. Well, guess what? Replace open source with GenAI code, code that comes from a specific tool, and it's literally the same. The team didn't write it; the team prompted it, but the LLM wrote it. We believe the right framing is to think of it like open source. It is a really good idea to use it in the right circumstances, because it saves time and it's better for the organization, but it absolutely comes with quality risk. GenAI makes up packages, among other problems. It comes with security risk; to be clear, all code comes with security risk, so we're not saying it's better or worse for GenAI, but you do have to manage that risk. And under certain circumstances, it comes with intellectual property risk. So, for folks who know software bills of materials, where an SBOM is basically an inventory of how much open source is in the code, [00:09:00] we were incredibly honored that some pretty amazing customers said: well, you build an SBOM; now please figure out GenAI code. And so we invented the GBOM, a generative AI bill of materials, to understand how much GenAI code there is and how safely it's being used.
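To make the GBOM idea concrete, here is a minimal sketch of what a generative AI bill of materials might look like, mirroring how an SBOM inventories open source components. The field names, origin labels, and rollup are illustrative assumptions for this article, not Sema's actual schema:

```python
from dataclasses import dataclass
import json

@dataclass
class GBOMEntry:
    """One file's provenance in a hypothetical GBOM."""
    path: str
    origin: str             # "human", "open-source", or "genai" (assumed labels)
    genai_tool: str | None  # which assistant produced it, if known
    loc: int                # lines of code

def gbom_summary(entries: list[GBOMEntry]) -> dict:
    """Answer the headline GBOM question: how much of the codebase is GenAI code?"""
    total = sum(e.loc for e in entries)
    genai = sum(e.loc for e in entries if e.origin == "genai")
    by_tool: dict[str, int] = {}
    for e in entries:
        if e.origin == "genai" and e.genai_tool:
            by_tool[e.genai_tool] = by_tool.get(e.genai_tool, 0) + e.loc
    return {
        "total_loc": total,
        "genai_loc": genai,
        "genai_share": round(genai / total, 3) if total else 0.0,
        "genai_loc_by_tool": by_tool,
    }

# Hypothetical inventory for a tiny codebase.
entries = [
    GBOMEntry("src/api.py", "human", None, 400),
    GBOMEntry("vendor/parser.py", "open-source", None, 1200),
    GBOMEntry("src/util.py", "genai", "assistant-x", 250),
]
print(json.dumps(gbom_summary(entries), indent=2))
```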
Demetrios: Yeah, let's get into that. But before we do, there are a few fascinating things in those first seven that I wanted to look at, because basically, if I'm understanding the full picture: I can scan my code and recognize how much I'm spending, so I have a little bit of FinOps in there, and if there are some easy wins, I get recommendations like, hey, you might not want to use this service, or you might want to think about turning it off, or you're not using spot instances, or you've got this GPU over here that's just been [00:10:00] running for the last five days and I don't know why, so think about setting more alerts. How does the actual practice of it look? Do I go in and set all the knobs inside of what you're doing, or does it automatically scan and just give me recommendations?

Matt: For cloud cost optimization in particular, is that right?

Demetrios: Yeah, let's start with that one, but I want to also go down the line.

Matt: There are roughly two phases to the cloud cost module. One phase is an hour of setup, where we just analyze historical data. It takes about an hour to set up, and then detailed results come from just that hour or less of setup. There's a second-level review if you want to get to per-customer analytics. Larger organizations certainly should be taking into account [00:11:00] the profitability, or at least the COGS, of individual accounts; that's another one to four hours of collecting customer-related information to merge the two and come up with per-customer COGS or profitability contribution. Side note: if you're a medium or large company, your board would be fricking thrilled by that, because it shows that as technical leaders you're thinking about the business impact and the business side. But long story short, an hour or so of setup is enough to get to: this is how much you spend and the trends, this is the maturity level of your spend, and here is a rough estimate of how much savings is possible. You usually can't flip a switch to carry out the improvements; some are pretty quick, some require team time, but we have literally never seen it not pay for itself [00:12:00] 5x, 10x, or more.

Demetrios: And there's also this idea of code quality, and I'm wondering if the code quality is just on specific functions, or are you looking at the whole pipeline? Are you looking at the CI/CD pipeline, the tests, the robustness of it? What does that look like?

Matt: Yeah. So we do not do the whole CI/CD pipeline yet. If I were to wave a magic wand, we would be working on DORA metrics, which we haven't added yet. So, your podcast notwithstanding, we don't do DevOps in any meaningful way yet, although it's definitely coming. For us, code quality is sometimes at the line level, sometimes the file, sometimes the block, and sometimes the repository. At the line level, we lint. And what founder hasn't drunk their own Kool-Aid, so I love talking about this, but what I like about our approach to [00:13:00] line-level warnings is that it stays at a very high level. If we were to look at a medium-sized codebase and it had 50 line-level warnings, almost certainly the team is linting aggressively; it's almost impossible to get to that number otherwise. If there are 50,000 line-level warnings, the team is not linting. And linting's not the end of the world, right? We think in general it has some benefits for developers being able to communicate with each other, and day by day, team by team, making the right decision about linters matters. But our job is to explain the whole state of the code to a CEO or a board member who doesn't know the details of line-level warnings. So we translate almost all of these metrics into a one-to-a-hundred score. That's incredibly provocative, and we didn't do it [00:14:00] lightly; it took us about five years of data collection before we were ready. Line-level warnings count for less than one percent of the total, because whether or not you're using a linter doesn't really move the needle on whether the code can deliver business outcomes. By contrast, to take a non-code-quality module for comparison's sake, team retention is worth 25 out of the 100 points, because you're screwed if you don't have the team.
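To illustrate that kind of weighting, here is a minimal sketch of a 0-100 composite score. The module names and most weights are invented for the example; only the two proportions Matt mentions (team retention around 25 points, lint warnings under 1 percent) come from the conversation:

```python
# A minimal sketch of a weighted 0-100 codebase health score.
# Weights are illustrative; only "team retention ~25%" and
# "lint warnings <1%" are taken from the conversation.
WEIGHTS = {
    "team_retention": 0.25,
    "code_security": 0.20,
    "open_source_ip": 0.15,
    "test_coverage": 0.14,
    "development_process": 0.13,
    "cloud_maturity": 0.12,
    "lint_warnings": 0.01,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

def health_score(subscores: dict[str, float]) -> float:
    """Each sub-score is assumed already normalized to 0-100
    (e.g. by benchmarking against peers); the composite is a weighted sum."""
    return sum(WEIGHTS[module] * subscores[module] for module in WEIGHTS)

example = {
    "team_retention": 90,    # subject matter experts still committing
    "code_security": 70,
    "open_source_ip": 85,
    "test_coverage": 40,
    "development_process": 75,
    "cloud_maturity": 60,
    "lint_warnings": 10,     # barely moves the needle either way
}
print(f"Composite score: {health_score(example):.1f}/100")
```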
So that's line-level warnings. At the block level, we do things like unit test coverage. We can only measure unit test coverage by looking at the code itself, so there are other kinds of testing that can't come through with a tool like ours. We also look for duplicate code and excessive complexity, McCabe complexity, which was published in 1976 and is still relevant, which I love; I love when good ideas last. And then at the repository level, we look at [00:15:00] indications that the codebase might need refactoring. I didn't say this at the beginning, but we love data about code. I'm the son of a math teacher and a computer programmer, so I was trained to treat everything as data, including code. But that only goes so far: coding is not purely reducible to data, unlike sales or sports. It is a craft, not a competition. You can't just look at the metrics. Say the number of lines of code increased yesterday versus today. Is that a good thing?

Demetrios: No.

Matt: Maybe, but not always, right? Frequently going down would be better. So, starting from that simple example: we produce this report, we make it as clear as possible, and we automatically produce a set of discussion questions, and it's those discussion questions that need to be answered to really [00:16:00] tell the whole story. Refactoring in particular is, in our view, not reducible to an automatic assessment. It really does need a human review.
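As an aside on the McCabe metric mentioned above: cyclomatic complexity is roughly one plus the number of decision points in a function. A simplified sketch using Python's ast module (production tools handle more constructs and edge cases than this):

```python
import ast

# Node types treated as decision points in this simplified version.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.With, ast.Assert, ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(func: ast.FunctionDef) -> int:
    """McCabe complexity, simplified: 1 + number of decision points."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))

source = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime-ish"
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, cyclomatic_complexity(node))  # classify 4
```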
Demetrios: So I want to go down the rabbit hole of bridging the gap between the technical side of the house and the business side of the house. What are some of the key learnings? You mentioned a score that took you five years to put together; I can only fathom there are some true gems inside your head on how and why you put it together the way you did.

Matt: Yeah. When in doubt (I'm going to shout out my friend Adam, but not give his last name to protect him), explain things using middle school concepts. If you can make it accessible and understandable at the middle school level, whether it's math or analytics or telling a story, then as long as it's defensible, you can make it work at all the other levels as well. And by the way, our system is transparent. Of those eight modules, the one looking for GenAI code actually uses AI: for various reasons it has to be a probabilistic [00:17:00] approach, and we use a deep learning model for it. But everything else is mechanical, deterministic. That's in part because of what we do. Make this up, Demetrios: you're the CTO of a company. You've put your blood, sweat, and tears into making it work, and if this deal goes through, it could be a million, 10 million, a hundred million, 5 billion, 10 billion. And it all rides on this, because the thing about technical due diligence is that it's one or zero. If your sales are a little less than one would like, or there's too much churn, someone might take down the purchase price; but tech due diligence you either pass or you don't. So in that moment it's incredibly stressful, and it's really important to be clear. I obviously am a big fan of AI, and I use it all sorts of ways: professionally, personally, in the company, in the product. But what doesn't work is: [00:18:00] "Demetrios, we looked at your product, and the black box says it's not very good, so we're going to recommend to our clients that they don't buy it." You would have a fricking heart attack, and you should, right? Because it's so high stakes, we can't leave it to something that isn't deterministic. Now I'll answer with some specifics. If your audience can envision a credit score at a personal level, it works like that. And by the way, we're happy to send you a sample, and listeners can see a sanitized version for free. You can also try it out for free; it's not exactly the same as the true analytics, but there's a self-serve, anonymous version, and it's not saved. If you want me to look at your results, you'll have to screenshot them, because we don't save them, to protect confidentiality. So the report has a one-to-a-hundred score, [00:19:00] and you heard some of the very particular weighting that we have thought about and put in: linter results are only worth one percent, developer retention is worth 25 percent, et cetera.

Demetrios: On that developer retention, how do you know if the team is still there?

Matt: Yeah, so we use version control. The sources of data for all eight modules are threefold. The cloud financial module looks at your spend and usage in the cloud. We scan the domains and subdomains for the light-touch cyber section; again, not exactly a pen test. And the rest is the version control system data. We don't just look at the code at the end; we look at the code at all points in time, as time series data, so we see how the code has changed over time. We know that on Tuesday, Matt made three commits [00:20:00] to repository X and added a hundred files. The way we do developer retention is to look at when that person last made a commit anywhere, to any of the repositories. You might have 20 products; if you committed to one of them yesterday, you're almost certainly still around, and you're available to answer questions on other repositories even if you haven't touched those in a while. Our automatic rule of thumb is: if the coder has made a change to the code in the last 90 days, we assume they're there, and if they haven't, we assume they're not. Code is contextual, though; you cannot rely solely on that. Someone could not have coded recently but still be there, because they're now an engineering manager. Someone could have coded recently and not be there, because they left in between. So we start with an automatic [00:21:00] quantitative analysis, and then, as part of the conversation with the advisors who use us, or with us if we're doing it, we adjust the results based on whether or not people are actually there.
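That 90-day rule is easy to approximate from version control history alone. A minimal sketch against local git repositories (the repo paths are placeholders, and as Matt says, the output still needs a human sanity check for managers, departures, and the like):

```python
import subprocess
from datetime import datetime, timedelta, timezone

def authors(repos: list[str], since_days: int | None = None) -> set[str]:
    """Author emails with at least one commit in any repo,
    optionally restricted to the last `since_days` days."""
    args = ["log", "--format=%ae"]
    if since_days is not None:
        since = datetime.now(timezone.utc) - timedelta(days=since_days)
        args.insert(1, f"--since={since:%Y-%m-%d}")
    found: set[str] = set()
    for repo in repos:
        out = subprocess.run(["git", "-C", repo, *args],
                             capture_output=True, text=True, check=True).stdout
        found.update(line.strip() for line in out.splitlines() if line.strip())
    return found

repos = ["./product-a", "./product-b"]   # illustrative paths
ever = authors(repos)
recent = authors(repos, since_days=90)   # the 90-day heuristic
print(f"retention: {len(recent)}/{len(ever)} authors active in the last 90 days")
print("presumed departed:", sorted(ever - recent))
```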
Demetrios: If the person is there, got it. What are the other things that make sure leadership, or a potential acquirer, understands where we're at? I one hundred percent see the value in this if you're doing due diligence. I also see the value when you just want to champion something to the executive team and say: hey, look, we implemented XYZ, and what we're seeing from the codebase scans are these results. It's almost another data point you can point to, to say that what you're doing is working.

Matt: Yeah, no surprise: couldn't [00:22:00] agree more. In addition to the CodeScan product, which has been out in the world for seven years, we are prototyping a SaaS version to make that easier to do on a regular basis. If any of your listeners want to weigh in, find me; we'd be happy to show you the wireframes. We're pretty excited about that. I love analogies: a sales team wouldn't only learn the state of sales during a diligence. They wouldn't work in the dark until someone showed up and said, tell me what your sales are like. No; if you're the chief revenue officer, you look at an executive-level view of sales every day. One of the most ironic things in the world of coding is that engineering teams have built dashboards for every person and every job on earth, but have not really built dashboards for themselves. So you can think of the code scan as [00:23:00] a one-time CTO dashboard, or engineering-leader dashboard, and we're turning that into a SaaS version. It's very easy to do that poorly, which we do not want to do, so we wanted to get it right at a point in time first; now we're building the time series one. Back to your question, and this applies whether or not you're using one of our tools: I think the biggest tips for making this understandable, for getting what you want from the board and the rest of the C-suite, are to always put things in context, and wherever you can, translate things into dollars and cents. For context, let me go back to your fictitious company, Demetrios. Let's say ten developers have ever worked on this codebase, [00:24:00] which for us is relatively small, and let's say we looked for CVEs, the security warnings related to open source usage. Every codebase has CVEs; it's not a question of if, it's how many. If you had a ten-person software team and there were, I'm going to approximate, 15 high-risk CVEs, it's almost certain your team had a CVE detector in place, and that your team had the permission, resources, and capacity to actually fix issues. Because Sema has looked at a trillion and a half dollars' worth of companies, we know that having only 15 high-risk security warnings is very, very good [00:25:00] compared to your peers. The answer is not zero; in case anyone's wondering, everybody has some security warnings. But if you were a ten-person engineering team and you had 500 security warnings, then relative to other ten-person companies you'd be in the bottom quartile, and almost certainly you didn't have a tool, and developers didn't have permission to fix the issues.
And I say that intentionally: I've never met a developer who wakes up and says, you know what I want to do today? I'm going to add security risk. And I want to add technical debt. Everyone, left to their own devices, would want the most pristine codebase possible. I'm ranting a little bit, or hopefully inspiring, maybe a combination of both. The incredible thing about being a software developer is that you get to create for a living. It's an amazing craft that can come with really high job satisfaction and really high comp. The give for that get is that we have to build things that are commercially reasonable, that meet the needs of the [00:26:00] organization we're serving. That is what allows engineers to have these amazing jobs. So there should be technical debt, there should be security debt, there should be intellectual property debt. It's not about having none; it's about having the right amount for the size and stage of your business. So, back to how you explain things to the C-suite. Don't say "we have 15 warnings" or "we have 500 warnings." What is someone who doesn't know the domain supposed to do with that? Say: the benchmarking data says we're in the bottom quartile of companies of our size and stage, and it would behoove us not to be in the bottom quartile.

Demetrios: Yeah. And then they say, why?

Matt: And then they say why. If you're an investor-backed company, being in the bottom quartile on any of these dimensions will make it [00:27:00] harder for you to raise the next round or to get purchased. And being below average on security risk increases the likelihood that you're going to have a catastrophic data leak or breach. You can't prevent those entirely, but you can avoid being the lowest performer. So one piece of advice is: put it in context. The other piece is: wherever possible, translate it into dollars and cents. That's hard, but it's definitely doable. Let's say you had 500 warnings that were legit, and to make it easier for a second, say they were SAST warnings. On average, each of those warnings takes about four hours of developer time to fix. 500 times four is 2,000 hours. The [00:28:00] average year is about 1,500 working hours, so that's one and a third FTEs' worth of work to clean up those warnings. For a giant company, that would be worth it. For a brand-new company, it would be incredibly expensive roadmap time relative to what else people could work on. You can do the same thing with adding tests. Because of AI, maybe we can add five tests an hour instead of two; I don't want to minimize the effort. To get from 5 percent coverage to 15 percent coverage is going to take X thousand hours, and that translates to this many FTEs as well. It is reducible to person time, or tooling cost, which can then get summed up.
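That back-of-the-envelope translation fits in a few lines. The four-hours-per-warning and 1,500-working-hours figures are Matt's approximations from the conversation; the loaded FTE cost is an invented assumption for the example:

```python
def remediation_cost(warnings: int,
                     hours_per_fix: float = 4.0,           # Matt's approximation
                     working_hours_per_year: float = 1500.0,
                     loaded_cost_per_fte: float = 180_000.0) -> dict:
    """Translate a warning count into FTE-years and dollars.
    loaded_cost_per_fte is an assumed fully-loaded salary, not a quoted figure."""
    hours = warnings * hours_per_fix
    fte_years = hours / working_hours_per_year
    return {
        "hours": hours,
        "fte_years": round(fte_years, 2),
        "dollars": round(fte_years * loaded_cost_per_fte),
    }

print(remediation_cost(500))
# {'hours': 2000.0, 'fte_years': 1.33, 'dollars': 240000}
```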
You know, and again, executives are there because they're good at making trade-offs. If you can explain it in the language of the budget they have, implicitly or otherwise, for [00:29:00] improving non-functional requirements, it makes it much easier to get to the right answer for the codebase.

Demetrios: Man, I love this. This is really helping me understand how to, like you said, speak a different language, almost. I do often hear how you need to be able to tie the work you're doing to a certain metric, ideally a metric the business cares about, so translating something into dollars and cents is incredible. But I have found, whenever I'm working with startups, that it's a little optimistic to think you can translate it into dollars and cents like that, because at that infancy stage, you have so many bullets right in front of your face that you've got to be Neo in the [00:30:00] Matrix and try to dodge all of them. You're thinking about many things that are of the utmost importance, because there's a bullet six inches away, and one seven inches away, and one twelve inches away. It's easy to say: okay, let me translate this into dollars and cents, let me reverse-engineer it, if we add X amount of tests, that's going to take this many hours, which is this much developer time. Where I think it falls down is that everybody I've encountered says that, and then the "all right, let's act on that information" part is where it tends to fall down.

Matt: Totally. And I would say, from our experience: if you have not raised or earned 10 million, I wouldn't worry [00:31:00] about anything Sema says, except maybe developer retention, and even then there can still be good reasons for people to move on. The only things that matter are getting to product-market fit and not running out of money. Those are the only two things. Do not worry about code quality. Do not worry about code security. Do not worry about process consistency, or any of these things. Get to escape velocity, and then future dollars, whether customer dollars or venture dollars, can be used to work on these non-functional requirements. The first item on the list I might care about is testing, because testing can increase velocity, which can help you get to product-market fit faster. But I would be incredibly skeptical about adding unit tests, or any kind of testing, [00:32:00] for almost any business in those early stages. It doesn't mean the answer is no, but you'd have to be incredibly intentional that there was a return on investment for the developer's time versus just coding. Not that you shouldn't add it later; testing is one of the best use cases for AI, because AI makes it go so much faster. But that early on, as an engineering leader, I would be inclined to be skeptical of adding tests. You can still fundraise; investors at a Series A don't care, they're expecting you to work on product-market fit. With the right people and the right budget, any code problem is fixable. You need to stay in business and you need to grow, so get those things right, and then hopefully when you make it, you can come back and worry about the non-functional requirements.

Demetrios: Brilliantly said. So let's dive into GenAI. Specifically, [00:33:00] there's a paper I read probably six to eight months ago that will not leave my head. It found something along the lines of 75 percent of code created by GenAI being deleted later. It goes back to this theme you're talking about: move fast, and then come back and refactor, almost. There's a time to move fast, and there's a time to do what you have to do to make sure all your T's are crossed and your I's are dotted. What have you found when you look at GenAI code? Because the other piece you mentioned very aptly is that sometimes GenAI can spit out code that's not legal to be using, so you might want to look at that.

Matt: We are absolutely bullish on using GenAI code, full stop. We think it is better for developers and better for organizations. It was [00:34:00] high enough quality at least a year ago to be more useful than not, and it's only gone up the quality curve since, in terms of accuracy, usefulness, et cetera. We really like and really recommend using GenAI code in almost all situations, but it comes with an incredibly strong recommendation: apply every kind of rigorous review that you normally would, and then maybe some more. That means putting GenAI code through all of your quality gates and security gates. Contrast that with open source: not that anyone would do this, but you don't run an open source framework through your own security scan, you just use it, because it's gone through others' hands. You would check the CVEs, but you wouldn't run a SAST or DAST scan [00:35:00] on your open source libraries.
With the right people and with the right budget, any problem, any code problem is fixable. You need to stay in business and you need to grow. So get those things right. And then, uh, hopefully when you make it, then you can come back and worry about these non functional requirements. Demetrios: Brilliantly said. So let's dive into JNAI and specifically [00:33:00] in my mind, I've had this paper that I read probably six to eight months ago that will not leave my head. And it is all about how something around the lines of 75 percent of code that was created by GenAI is then later deleted. And so it, it really goes back to this theme that you're talking about. Move fast, and then come back and refactor. Almost. And There's a time to move fast and there's a time to do what you have to do to make sure all your T's are crossed and your I's are dotted. What have you found when you are looking at the GenAI code? Because I think the other piece that you mentioned very aptly is sometimes GenAI can spit out code that's not, uh, legal to be using. So you might want to look at that. Matt: We are absolutely. Bullish on using GNI code. Full stop. We think it is better for developers. We think it's better for organizations. Um, I would say it [00:34:00] was high enough quality at least a year ago um, to be more useful than not. And it's only been, uh, it's only gone up the curve. Um, uh, the quality curve since in terms of, uh, accuracy. Usefulness, et cetera. Um, we really like using and really recommend using Geniada code in almost all situations, um, but it comes with an incredibly strong recommendation, which is to apply every kind of rigorous review that you normally would, and then maybe then some. Um, and, uh, of course that means putting Gen AI through any of your quality gates, your security dates, not that anyone would do this, but you can't, um, we, you know, you don't run, uh, an open source framework through a security scan, you use it, right, um, because you're, it's gone through others, um, this, you, I mean, you would run the CVEs, you wouldn't run a SAS or DAT, you wouldn't, you wouldn't put it, [00:35:00] Uh, a SAS or DAS scan, uh, on your, on open source libraries. You most certainly need to put security scans, quality scans, any scans you have on, um, on Gen AI code. Most importantly, because code is a craft, is code review. Uh, most important, so if you are, you know, you don't, we don't code review open source libraries we're using, we just use them. That's a, it's obviously a very big difference here, it should get extra code review. And I'd say the third part is, you know, when SEMA scans for Gen AI code, we actually ask two questions. One is, how much code was Gen AI originated versus not Gen AI originated? Not Gen AI originated could be written by the team, copied from, Uh, Stack Overflow copied from, you know, open source referenced. If it is GenAI originated, question number two is was it modified by a developer? And so we think about GenAI pure, GenAI blended. Pure came straight untouched from the prompt. [00:36:00] Uh, GenAI blended, uh, has been modified at least a bit by a developer. So think of the extreme example. Imagine you were looking at a code base and you asked the developer how they built it. And 100 percent of it was Gen AI, and it hadn't been modified in any way. It just came straight from the prompt. You should be very skeptical that it was contextually appropriate, that it didn't have major quality holes, didn't have any security risks, etc. 
Demetrios: It's basically that video I've seen online where a man teaches his eight-year-old daughter to code with Cursor, and she's able to do it. It's a true feat that she can create a webpage with Cursor and edit things there, but if you're shipping pure GenAI code, you're kind of no different from that eight-year-old.

Matt: Yeah. You know, you can prompt it well, and I use GenAI when I write, certainly when I research. For low-stakes stuff, if it's about right or close enough to get the point across, I don't need to modify it. But for anything high stakes, I rewrite it from scratch, or certainly I'm editing line by line, and most [00:38:00] words are getting modified. The same is true for code: the higher stakes it is, the more you should expect to be line-editing it, and the more skeptical you should be if you see high-stakes code that hasn't been line-edited. That would be not determinative, but certainly a red flag.

Demetrios: Yeah. Have you seen patterns in GenAI code? Because, as you were saying, when we write, I think just about everybody uses some form of GenAI these days. I never liked GenAI writing, because it is so verbose; even if you prompt it well and say "short sentences, one-sentence paragraphs," it still doesn't listen. That's a key giveaway for me, and there are certain words that are key giveaways that you're using GenAI in your writing. Are there things you've noticed in [00:39:00] scans where the code quality or the GenAI output leans a certain way? I've heard stories about how, if you're writing in Python, you're good, because there's been so much training data for Python in the LLMs, but if you're trying to write in Go, maybe you get random Python spliced in. Have you seen any of this on the code quality side, or is that a non-factor?

Matt: It's a really good question. My honest answer is that I have heard those stories, and I have not found any of them conclusive, which does not mean they're wrong. It means that as an individual developer, I would be contemplating, at least once a quarter, switching tools or using more than one tool. Now, if you're working in a commercial setting, you have to make sure you have approval for those tools, and they have to be at the right license level to protect your company's intellectual [00:40:00] property. But there are at least five, arguably ten, very high-quality LLMs, whether specific to coding or not.
And more are coming, so I would be experimenting pretty regularly, swapping tools out and asking: do I like the job this one is doing here, or not? It's kind of a non-answer, but I think you'll see where I'm going: we see it as a real strength when we're looking at a company and they pay for more than one tool for their developers, because it's a recognition of just how powerful these tools are at helping developers do their best work, and an acknowledgement of how early we are in the process. You could mandate that everyone use VS Code; it does many things right, it may not be the best in every situation, but you're not leaving much value on the table if you force everyone to use the same IDE. I don't think that's true for forcing everyone to use a single LLM to code. I think you could really be losing out, and I'd [00:41:00] want you to review that decision pretty regularly. Executives listening: your coders are using GenAI to code, full stop, everybody's using it. So please buy them a commercial-grade license, and please take me seriously that you should let them experiment, use more than one, and have a workshop or a Slack channel discussing which ones work in which circumstances. It very well could be true that language X doesn't work well today with a given tool, and that could change over time. These tools are very powerful, but they're still being built. So please, please, please encourage and support that experimentation.

Demetrios: And have you seen trends with GenAI-created code that is actually copyrighted code?

Matt: So, this is [00:42:00] not legal advice; go talk to your team's lawyers about this. If we think about GenAI code, the plus side of using it is that it increases productivity, throughput, developer satisfaction, et cetera. The downsides can include: one, the code is not high quality enough, and you solve that through code reviews and tooling. Two, the code is not secure enough, and you solve that through your security tooling, code reviews, and making sure developers have time to fix security issues, which is not a given. Third is intellectual property risk from using GenAI to code. And fourth, while I'm at it, is something called exit risk; we'll come back to that. Within intellectual property risk (you can see I'm a very structured guy), there is the risk that your code can't get patent protection: that you're seeking patent protection and can't get it. We think that risk is zero to [00:43:00] none. You should be able to get patent protection, because the patent protects the idea, not the text. Another important IP protection is trade secret, which is like Coca-Cola's formula: you just can't let anyone have access to it. The way you protect that is to make sure you're using the right tool at the right license level. Please play the following clip to anybody on your legal team who needs to hear it.
It is crazy not to give developers enterprise-grade licenses and thereby implicitly let them use non-enterprise tools for free, because those tools train on what people enter, and that is just like posting your company's secret code on the internet, which you should not do. So trade secret is a real risk, but it's solvable by giving people licenses. Some companies seek copyright on their code, just as they seek patents on it. There, the situation is not clear, and we would consider it a risk, [00:44:00] because copyright protects the words themselves, the work itself. In human writing, you can't tell ChatGPT or whomever, "write me a book," and then go get it copyrighted, because the words are not yours; the words are a computer's. It's really only the largest companies that seek copyright protection, but if you're at one: number one, please talk to your lawyers, and number two, come talk to us, because you're going to have to take some safety precautions if you want to continue to copyright your code. That leaves us with infringement. So far we've covered whether you can get copyright protection for your own code once you've written it; now let's talk about infringing, because the LLM may have been trained on code that then passes through to you when it shouldn't. A little structure again: one is copyright infringement, and the other is [00:45:00] open source license infringement. We assess, and again this is not legal advice, talk to your lawyers, that copyright infringement risk is relatively low, because for the model to have been trained on copyrighted code, the LLM makers would have had to get that code in the first place, and it's very unlikely they were training on copyrighted code, which is going to be secret in the first place. That leaves infringement of open source licenses. Just as regular open source comes with licenses that can be risky, if the training set included open source with the wrong kind of license, and that code made it into your own code, it generates risk; how big that risk is remains unclear. So I would say: if your company cares more than average about open source legal risk, then you should likely apply that same [00:46:00] level of care to open source infringement risk from using GenAI code. If your company today doesn't care, and I don't mean you as a developer, I mean your legal team, then the open source legal risk from GenAI is likely smaller than the risk from directly using open source. If you don't care about the larger risk, you likely don't have to worry about the smaller one. Demetrios, I can only hope that was the most boring five minutes of your entire week: intellectual property risk and code. Apologies to the listeners; I hope at least some of you found it interesting and aren't falling asleep at the wheel hearing a little IP talk.

Demetrios: Well, I do like how you break it down as a spectrum, as opposed to just, "yeah, watch out for it, it's in there." It is very nuanced, and it also depends on what your company cares about; then you should think about the [00:47:00] potential risks.
It sounds like the real risk is probably in directly using some type of open source package, not in generative AI code that might draw on open source.

Matt: So, here: this single topic is where there's been the most vigorous debate in the legal community, so let me acknowledge that there's a debate, which is another, even more important reason to talk to your lawyers about this. Logically, what you're saying must be true. If the company goes and uses open source with the wrong kind of license, it's directly risky. How likely that risk is to be realized depends, but it's a direct use. If the LLM creator uses training data with the wrong kind of license, that by itself is not enough to trigger risk for the user. [00:48:00] It's enough to trigger risk for the LLM companies, and they are getting sued over this. But that code would have to make it into your code for the risk to actually be triggered, and someone would have to find it, and someone would have to care. Now, here's my public service announcement for everyone out there. Your code runs on open source directly, because you're using it, but also indirectly, because it's been the training set for all of your GenAI tools. Please have your company make a suitable investment in doing right by the open source communities you're part of, commensurate with the resources you have. If you're a small startup, don't spend anything; the best thing you can do for the world is to make it. And if you're a giant company, look at the contributions the FAANG companies have made to open source; they've put in, I'm sure, [00:49:00] billions of dollars over time. Make a commensurate investment. It doesn't have to be cash; it could be letting developers contribute code, et cetera. That's the right thing to do, given that you're getting this for free. And it probably also helps tactically: were you ever to be challenged over inappropriate use of open source code, it would be good to be able to point out that, hey, we made a mistake, but we take this seriously, we're good citizens. That doesn't literally get you out of the legal risk you might be facing, but it doesn't hurt.

Demetrios: Yeah. You did mention exit risk also.

Matt: Yes. Exit risk is the possibility that, if your company is getting sold, or someone's thinking about investing, a buyer or investor looks at the code you wrote and decides, "we can use AI to build it instead," and your company is no longer valuable to them, or is less valuable.

Demetrios: Oh, wild. Wild. [00:50:00] Wait, say that again. So basically it's like there's no moat here; AI can do this.

Matt: It's the potential of no moat.

Demetrios: Wow.

Matt: You've seen Inception, I hope, right? Christopher Nolan? I like to explain the power of GenAI with those world creators, who literally think it and then the world just unfolds. Certainly with respect to prototypes, GenAI is 95 percent faster than doing it some other way, and you're seeing something that actually works, with real code. It's so much faster.
There are publicly published examples of companies in a diligence process red-teaming this: taking a group of people and saying, could you just build this yourself? And, in at least one instance, deciding not to buy the company they were looking at.

Demetrios: Wow.

Matt: Now, take the 10,000-foot view: why do companies buy or invest in other companies? Because of the [00:51:00] financials and because of the technology. Take eBay. It doesn't matter how they're building the code: eBay is a great brand with a lot of predictable revenue. They could have no open source, they could do it all by hand; I know that's not true, because there's a great team there and they rely on open source, but at that level, whatever they're doing is working. Whereas if you're building, say, an analytics tool that takes unstructured data and produces plain-English insights, well, guess what? Any LLM can do that. Even if you make it to the stage of diligence, they might say: could we just build this ourselves? Now, if you have a whole bunch of customers who love your version of it, they're not likely to switch to a generic LLM tool. So the way we'd say it is: if an investor or acquirer [00:52:00] is buying your book of business, your sales, that is still protected regardless of how you got there. If they're buying the tech, or emphasizing the tech, it is less valuable than it was six or twelve or eighteen months ago, and that trend is only going to continue. That's the sad news. The good news is you can get to a sellable product faster. If you can build products five times faster, certainly three times faster to actually get to production and not just prototype, then you can get to business outcomes and revenue faster. So yes, there could be an uncomfortable conversation about whether your code is reproducible, but you're in diligence conversations in three years rather than five, because you've built a valuable product that much faster. And it comes back to what I said earlier about pure versus blended code: blending is your friend, and not just for its own sake. The more contextualized and specific the code is to the problem you're solving, [00:53:00] the more valuable that technology is relative to building it from scratch. All of the risks of GenAI code are addressed by blending: it solves security issues, quality issues, IP risk for the most part, and exit risk, because the code is now yours. Again, imagine the extreme example: I'd like you to buy my company. Did you write it? No, it's three open source libraries. The value of that tech is zero, because people can get it for free. Now, you could have amazing customers who love it and for some reason are buying it through you; that's an extreme example, but GenAI is bringing us in that same direction.
The code will matter less, and certainly generic code will matter a lot less than truly custom code.

Demetrios: Yeah, it's funny. There's [00:54:00] something there about the code, but then also everything else around the company and the brand. Is the brand strong enough to where, okay, it is generic code, but because the brand is there, people go with it, like you were saying with the eBay example? There are plenty of social media sites, or templates, where you can grab a newsfeed with a simple plugin and have a newsfeed on your website; that doesn't make you Twitter or Facebook, or name your favorite social media network. Granted, that's an extreme example, because I'm sure the codebases of every one of those I just named are very complex and not at all in the same ballpark we're talking about. But it is fascinating to think about how to create that [00:55:00] defensibility. And I'm wondering, since you're playing at the intersection of companies that are getting bought and, basically, the acquirers, because your tool helps acquirers understand what's going on in the codebases of companies: have you seen trends in how many companies are getting bought right now? You know how Carta, I think, puts out figures on how many companies are getting invested in at Series A, Series B, that type of thing. Do you have any information on that, or is it all behind NDAs?

Matt: I can say at a high level that M&A activity is definitely up from last year, and definitely up from two years ago. We're not all the way there yet, but we think 2025 is going to be gigantic [00:56:00] in terms of M&A activity. Investors have so much in funds they haven't invested yet, what we'd call dry powder, and it's been building up. So volume is definitely up, and it's going to blow up sometime in 2025.

Demetrios: And this is all for SaaS companies, primarily?

Matt: It's certainly SaaS companies. I try not to speak about things I'm not super familiar with, but I would expect a general increase in investment activity, because the investors we know well are multi-fund or multi-strategy, so they're doing different things, and it's not just SaaS. Certainly for software companies, we'd expect to see a pretty substantial increase in M&A in the next six to twelve months.

Demetrios: [00:57:00] Incredible.
