MLOps Community

A New Way of Building with AI

Posted Jul 14, 2025
# API
# Latent Space
# Lutra

SPEAKERS

Jiquan Ngiam
CEO and Co-Founder @ Lutra AI

Jiquan Ngiam is the CEO and co-founder of Lutra AI, a company building the first AI Code platform, which uses foundation models to boost workflow automation and productivity. Prior to Lutra AI, he was an engineering lead at Google Brain and on the founding team at Coursera. While pursuing his Ph.D. in Machine Learning at Stanford (advised by Andrew Ng), he co-created the first online courses on Machine Learning. He is inspired by how technology can help improve lives.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

What if AI could build and maintain your software—like a co-worker who never forgets state? In this episode, Jiquan Ngiam chats with Demetrios about agents that actually do the work: parsing emails, updating spreadsheets, and reshaping how we design software itself. Less hype, more hands-on AI—tune in for a glimpse at the future of truly personalized computing.


TRANSCRIPT

Jiquan Ngiam [00:00:00]: But assume that the AI was the main driver of producing and maintaining software, what would you design? We actually asked the AI model, you know, if you wanted to add a table, what should you do? And the AI model was like, yeah, I'd like to have a function that adds tables. We're starting to see more understanding in the ecosystem of how to connect AI to software now. A lot of sandboxes are usually Jupyter-style kernels. They run, they disappear, and it's like, oh, very ephemeral, we're very ephemeral. The state is gone, the variables are gone. It's not great, because that's not how we interact with agentic systems. We chat with it, we come back many hours later and we keep going again, or we go take a nap, or we go to sleep and wake up in the morning, get my coffee and start going again, and it's like, oh, your state is gone now. That's terrible.

Jiquan Ngiam [00:00:48]: Even though there's so much activity, it's also sometimes hard to figure out what's working. Yeah, but then it's just so exciting. I guess it's almost like the early days of, you know, any new technology. People are adopting it, getting it out there. And so I think for us, what we are seeing, at least from our users' point of view, is that people just want it to work. There's a lot of AI that's really fancy and generates really cool things, but at the end of the day what we are seeing is: I just want it to do some work for me, parse my emails, read my invoices, get my Airtable in shape, stuff like that. And even though we are seeing all these amazing fancy things out there, I think the basics are actually important.

Jiquan Ngiam [00:01:30]: Can it do the basic day-to-day operations that we care about? And I think that's kind of what we're seeing. People just want things that work, and I think that's when it comes back down to the ground again: there's hype, there's reality. Let's bring those together and make sure expectations are aligned.

Demetrios [00:01:48]: Because you can get into trouble if you're just going off of the hype and then you're promising the world and not delivering on much.

Jiquan Ngiam [00:01:56]: Exactly, exactly.

Demetrios [00:01:57]: There's all these different potential use cases, but the ones that actually work are few and far between. Have you seen some pretty common use cases that are reliably working at Lutra?

Jiquan Ngiam [00:02:12]: What we do is build an AI agent that completes tasks across your apps. It sounds a bit like a general agent that understands how to work across different software: emails, CRMs, spreadsheets, and so on. We're starting to see a lot of people understand how to use it and get a lot of success with it. The nature of the tasks that we normally see working are those where data is in one system, you've got to pull it out, transform it in some way (normally using AI), and put it back into a different system. For example, get my invoices that are coming through my email, pull out the different line items, and get them into my accounting software or spreadsheet; I'm working with QuickBooks and whatnot. If you think about it, that makes a lot of sense, because our software services are separate and siloed, right? They don't work well together. But it turns out that the AI now has the ability to understand how to interact with each piece of software you work with and glue them together. So we're starting to see those kinds of processes work really well. The thing I just described is actually a process that works really well on Lutra.

Jiquan Ngiam [00:03:18]: I think what we're starting to see is two things in there. One is that models, especially the current generation of models, have gotten really good at understanding software. Code generation (Cursor, Windsurf) did a lot, but at the same time, that ability to produce software can now be something that everyday business users harness to automate work. And number two, we're starting to see more understanding in the ecosystem of how to connect AI to software. We have been working on this for a long time, and I think at the core of it is figuring out what the right AI-computer interface is. I don't think it's going to be point and click. I don't think it's going to be the raw APIs.

Jiquan Ngiam [00:04:07]: It has to be something in between that is abstracted away for the AI to work with. So MCPs, things like this, are coming to bear. I think when you have all those pieces start to come together and click, things start to work quite well.

Demetrios [00:04:19]: Dude, it's funny you mention the ETL process, but now for AI. I'm sure when you thought about that, you realized: oh, it's just the same ETL stuff we've been doing in data engineering, but now we have more ability to decide what we're extracting, what we're transforming, and then where we're going with it.

Jiquan Ngiam [00:04:42]: Maybe I'll give an example of some fascinating use cases down the line. One of the things that we found is that the challenge is, to your point, we used to be all stuck in tabular data tools. You needed to have a schema, a database around it. But a lot of real-world data is not tabular. It's not in a schema like that. So here's one example of an end-to-end process that we see some users trying to do: I have a spreadsheet of people, signups, maybe in Airtable or a Google Sheet. There's some missing data in there.

Jiquan Ngiam [00:05:12]: Some of them didn't fill out the form completely, with maybe their name, their address, or their budget for something. And then I've got to email these hundred people and say, give me that data point you didn't put into your signup form.

Demetrios [00:05:25]: This cannot be. No.

Jiquan Ngiam [00:05:29]: It's an ETL process, because what you do is read the spreadsheet, then produce email drafts for each of the individuals based on what they filled in, and then you send it out. Then they reply. Now it's a different ETL process. You take the emails that they replied to, you extract (using AI) the data they're giving you, and fill it back into the spreadsheet or Airtable where you need it. Those are actually both ETL processes, but if you think about it, they're real work processes that we do every day. Getting some form of a table of.

Demetrios [00:06:03]: Data, missing data, having to go and follow up.

Jiquan Ngiam [00:06:05]: Exactly. So it's fascinating when you start taking that concept, because, one, AI understands how to work with different applications, so you don't have to program in which APIs to call; you just say, here are the tools you can use. And number two, because it can take unstructured data from email content and get the right pieces of information out, you can start really ETL-ing things: extract being pulling data out with these tools, transform being using AI to transform the data, and then shoving it into a different system.

Demetrios [00:06:38]: I like how you made that a verb. We are ETL-ing a whole lot of stuff now. You're not confined anymore. How do you think about integrations?

Jiquan Ngiam [00:06:47]: Got it. Makes sense. The way I think about integrations is actually: what is the right level of abstraction to give an AI so that it can be successful using a tool? A lot of the integrations we have today, the APIs, were not designed for AI; they're designed for developers. Here are all the error codes, read it, program it deterministically to do all of these things. And for an AI model, sometimes that doesn't make sense. So maybe two examples here. One is that we worked a lot with the Google Docs API, because people want to create documents and edit them.

Jiquan Ngiam [00:07:24]: Use AI to write documents for yourself. It turns out that there is a way to get the Google Docs API to add a table into the document. Now, if you can figure that out, kudos to you, because it was impossible to figure out without a lot of tinkering. What you had to do was create a table and then kind of backwards-compute how to fill in every cell. It's insane. It's possible, but you need to sit down and really work through the details of it, and any AI model is going to struggle with that. So the abstraction level that we want to get to for an action like that is one level higher: add a table to the document.

Jiquan Ngiam [00:07:58]: And behind the scenes, we as engineers and developers figure out all the gnarly details of doing it right. So as we build integrations, it's really thinking about that.
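To make the "one level higher" idea concrete, here is a minimal sketch of what such an action could look like, assuming the google-api-python-client; the name add_table and its exact parameters are illustrative, not Lutra's actual interface.

```python
# Hypothetical higher-level action wrapping the raw Google Docs API.
# The AI only ever sees add_table; the gnarly batchUpdate details stay
# hidden inside the integration layer.
from googleapiclient.discovery import build

def add_table(credentials, document_id: str, rows: int, columns: int) -> None:
    """Append a rows x columns table to the end of a Google Doc."""
    docs = build("docs", "v1", credentials=credentials)
    request = {
        "insertTable": {
            "rows": rows,
            "columns": columns,
            # Insert at the end of the document body.
            "endOfSegmentLocation": {"segmentId": ""},
        }
    }
    docs.documents().batchUpdate(
        documentId=document_id, body={"requests": [request]}
    ).execute()
```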

Demetrios [00:08:08]: Right, because that's just one use case. Right?

Jiquan Ngiam [00:08:11]: One use case.

Demetrios [00:08:12]: And there's millions.

Jiquan Ngiam [00:08:13]: There's millions, yes.

Demetrios [00:08:15]: There's millions of Google Docs "add a table" instances that you'd have to backwards-engineer.

Jiquan Ngiam [00:08:21]: Yes. So if you look at the different levels of abstraction, I don't think you have to go all the way to such fine detail, like we need to support this table, that table. We just have to support tables. So it's one level higher than the raw APIs, but not so far abstracted that it only does one task. That's the level to get to. The funny thing about how we think about it is that, instead of sitting down and designing an API for the AI, we actually asked the AI model: if you wanted to add a table, what should you do? And the AI model was like, yeah, I'd like to have a function that adds tables.

Jiquan Ngiam [00:09:03]: Okay, great, let's do that. So it's almost observing: if we give it a set of integrations and actions, what are the inclinations of the model, what does it prefer to do? And then we actually adapt the way we design the actions to what the model prefers. So we design around AI preferences, which is a very different thing to do. One surprisingly non-trivial thing: spreadsheet filling, a very common task. I have a spreadsheet, I need to fill in data for every row, maybe do some research on the web, fill it in. There are many ways to go about doing that.

Jiquan Ngiam [00:09:38]: What is the right abstraction to give an AI model to say, this is the way to update a row? Do you just give it a function that updates a cell by coordinates?

Demetrios [00:09:47]: That's what I was going to say.

Jiquan Ngiam [00:09:47]: Maybe. Turns out that if you do that, you're going to get off-by-one errors. It's going to be like, oh, I think there's a header row, plus one, off by one on everything.

Demetrios [00:09:55]: Oh no.

Jiquan Ngiam [00:09:56]: So now, if you give it an abstraction where I'm going to give you the input data that represents a row, and you tell me the columns to update using the headers as keys, that's going to work. No more off-by-one errors, no more issues there. Once you figure that out, the errors that the models make go away, because it's sometimes impossible to even express the same errors. And then you can see whether the model likes this, in some ways, because you give the model, say, this is the way to update it: is it getting it right, or is it trying to do something else?
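A minimal sketch of the two abstractions being contrasted; the function names and the in-memory "sheet" are illustrative, not Lutra's API.

```python
# Coordinate-based updates invite off-by-one errors ("does row 0
# include the header?"); header-keyed updates make those errors
# impossible to express.

def update_cell(grid: list[list[str]], row: int, col: int, value: str) -> None:
    # Error-prone: the model must guess absolute coordinates.
    grid[row][col] = value

def update_row(rows: list[dict], match: dict, updates: dict) -> None:
    # Safer: address the row by its own data and columns by header name.
    for row in rows:
        if all(row.get(k) == v for k, v in match.items()):
            row.update(updates)

rows = [{"Name": "Ada", "Email": "", "Budget": ""}]
update_row(rows, match={"Name": "Ada"}, updates={"Email": "ada@example.com"})
```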

Demetrios [00:10:29]: You know what this makes me think about? I always think about trying to give the LLM the least amount of scope possible, really directing it, almost ordering it to do one thing and do that one thing well. But you're coming at it with this vision of: let's work together on it. You have a certain level of understanding and capabilities that work fairly well, so take that into your understanding and do it. You're giving it more scope than I would have if I were to attack this problem.

Jiquan Ngiam [00:11:11]: Exactly. This is actually one of the key principles we use in developing AI integrations: look at what the model is doing. If the model is not doing what you think it would do, don't go against it. In fact, take a step back and go: we should redesign our interface so that the thing it wants to do works. And then it actually becomes very easy, because you're not fighting the model anymore; you're just doing what it naturally learned to do from data. And funnily enough, because everyone trains on the same data, all the models exhibit the same tendencies.

Jiquan Ngiam [00:11:44]: And so we actually see that across the platform: once we get the system working with one model from OpenAI, it also works with Anthropic models and Google models, because the tendencies are very similar in all of them.

Demetrios [00:11:56]: So then what are some other ways that you've almost created workflows or programmed with the model?

Jiquan Ngiam [00:12:05]: There are many different ways to get models to interact with applications. Two very common frameworks are ReAct, where you get it to do function calling over and over again (that's the most popular one), and CodeAct, where you get it to write a bit of code, run it, and see the outputs. We are a big fan of the CodeAct approach; we had actually been doing that before the paper came out.

Demetrios [00:12:25]: Oh really?

Jiquan Ngiam [00:12:25]: We thought that should be the approach, because we saw the first demos of GPT-4. I think Greg Brockman was on stage demoing code generation, and I was like, wow, that's going to work. That was very early days, so we have always been big on that approach. And it turns out when you take the CodeAct approach, there are a lot of things you can start doing. For example, you say: hey model, I want you to produce code, and I need the code to respect types.

Jiquan Ngiam [00:12:54]: So when you call this function and that function, and you pass the data around and a variable flows through, we can infer the types, because it's just code. We can say that type must be X; when you pass it down, it has to be a string; oh, at this point you've got to convert it to a number and pass it down. In that approach, the model produces a little snippet of code, we check whether the types match up, and if they don't, we go: not good, go back and do it again. And the model will fix itself. So now it's just putting these guardrails on and telling the model you've got to stay inside the guardrails. And the guardrails come from an external system that is well understood: type checking in programming. The model gets better and better at this over time too.

Jiquan Ngiam [00:13:37]: And that's actually a really good thing that works really well in practice.
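A rough sketch of that generate-check-retry loop, assuming mypy as the external checker; llm.generate is a stand-in for whatever model API is in use, and none of this is Lutra's actual code.

```python
import subprocess
import tempfile

def generate_typed_code(llm, task: str, max_attempts: int = 3) -> str:
    """Ask the model for code, type-check it, and feed errors back."""
    prompt = task
    for _ in range(max_attempts):
        code = llm.generate(prompt)  # hypothetical model call
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["mypy", path], capture_output=True, text=True)
        if result.returncode == 0:
            return code  # types line up; safe to run
        # Guardrail feedback: return the checker's complaints to the model.
        prompt = (f"{task}\n\nYour last attempt failed type checking:\n"
                  f"{result.stdout}\nPlease fix it.")
    raise RuntimeError("could not produce type-correct code")
```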

Demetrios [00:13:41]: Are you using Pydantic AI for this? Because that's a whole value prop that they have.

Jiquan Ngiam [00:13:47]: We are using Pydantic for a lot of type checking and typing things, for sure. We use it a lot for schemas and serialization of data. We use some degree of Pyright, I think, for type checking. But it's also interesting in that there's a balance too. So here's the fun part. Yes, type checking is great; get models to produce type-checked code.

Jiquan Ngiam [00:14:07]: It turns out that in Python, variable shadowing, which is reusing a variable name that you once used, is a very common thing. You reuse an index in loops or something like that. Type checkers don't like that: you said it was a string, now it's not a string anymore, what's going on, guys? It has to be the same type in the whole function. So, going back to our first principle, the model keeps producing code that doesn't respect that particular rule. Great. Throw that rule out the window.

Jiquan Ngiam [00:14:42]: Throw that rule out of the window, but keep the rest of the rules. So there are some rules that you can be guided by. Great, the guardrails are there, but this guardrail, the model really hates it. Then you realize: is it even a good guardrail? It's not a good guardrail, straight away. So it's really AI-first, kind of like AI-first system design.

Jiquan Ngiam [00:15:02]: If you were to design an environment for coding, for execution, for software to run, but assume that the AI was the main driver of producing and maintaining software, what would you design? That's kind of the mindset we have over here.
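For instance, if the checker in play were mypy (an assumption; the transcript mentions Pyright, and the exact tooling isn't public), the shadowing rule he describes can be dropped on its own while everything else stays strict:

```ini
# mypy.ini: keep the guardrails the model respects, drop the one it
# keeps fighting. allow_redefinition lets a variable be re-bound with
# a new type, which models do constantly with names like "result".
[mypy]
strict = True
allow_redefinition = True
```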

Demetrios [00:15:20]: Having the guardrail, but if for some reason it keeps not liking the guardrail, see if you can format that guardrail differently.

Jiquan Ngiam [00:15:28]: Yeah. Make it something that's actually valuable for the model, right? And so that's a lot of the way to think about it. What does the AI model want to do? Is it something reasonable? Is the guardrail one that's really important, and why are you putting it up?

Demetrios [00:15:44]: Yeah. Because there are probably hardline guardrails that you cannot change. But then there are other ones that you just have there for some reason, because you thought it would be more useful, you thought the model output would be more consistent or more accurate. And if for some reason it's not liking that, well, how can we work around it? How can we work with the model as opposed to trying to put yet another guardrail on it?

Jiquan Ngiam [00:16:09]: Yeah. There are two approaches, right? One is to go back to the prompt and prompt it more to do what you wanted it to do; the other is to change the way you set up the system so it just accepts what the model is doing. And the latter turns out to be a lot more effective, we found, because you're not fighting the model anymore. Really figuring that out has been a journey for us, because the models keep changing, keep evolving, getting better and worse at some things.

Demetrios [00:16:35]: This episode right here is sponsored by the good folks at MLflow, the open source platform for teams looking to track, manage, and deploy ML and generative AI projects at scale. If you spend hours keeping track of different LLM prompt versions, debugging agent hallucinations, or filling out spreadsheets of experiment results, MLflow is the right tool for you. This year the MLflow team launched MLflow 3.0, the first major version update in four years, along with a fully managed version now available to everyone, built and maintained by the original creators of MLflow. Shout out to Ben Wilson, we see you, man. Managed MLflow gives you access to a full-featured, enterprise-ready MLOps platform with zero setup required. You get effortless experiment tracking, automated model versioning, evaluation, and seamless deployment. All that, plus cutting-edge features for GenAI workloads like automated tracing, agent evaluation, and one-click monitoring dashboards for your apps and agents, with zero infrastructure headaches. Ready to try it out for free? Visit mlflow.org to get started.

Demetrios [00:17:56]: That's mlflow.org. Let's get back into this episode. Every tool has its own way of working, and I can imagine you are constantly thinking about needing to create more integrations with more tools. How do you look at that? And I know you mentioned MCP before. Is that something where you're just like, all right, well, cool, soon enough we're going to have MCP servers for all these different applications and we won't have to think so hard about the integrations?

Jiquan Ngiam [00:18:30]: So, two things in there. I think MCP is very new, and we do see a nice potential long-term future where it becomes a standard that lots of people adopt. I do see a world in which every company that has an API right now has an MCP server next to it that is somewhat like the API, maybe a bit different, in that it's molded around how AI should use it, with prompts and a set of functions that they curate. But we're not there yet, and we're still figuring out what that should look like. For example, in MCP today there are no output schemas, and we think figuring out the right output schema for an MCP call is really important. So the way we've been thinking about it is: one, we have become MCP-compatible at Lutra, so you can actually use all MCP servers in Lutra today. But two, we want to make it possible for people to create integrations today without waiting for an MCP server to be available. It turns out, what are integrations? Integrations are a translation between how an external system's APIs normally work and the way we internally integrate with AI. And translations are very amenable to being AI-translated.

Jiquan Ngiam [00:19:43]: What we built and shipped in the last month was a way to take API docs from any system, have a service that looks at that documentation and at how we set things up and configure authentication, generate an integration, all the code needed for that, and then immediately, instantly test it out. Testing is an important part: it's not always going to work, because of edge cases and everything. So the model can see live data come in, fix the integration, and then roll it out. That's something we have been doing. It works really well out of the box with standard, CRUD-based APIs and databases that you can bring into Lutra. It gets a bit more tricky when you start to work with really complex systems like CRMs, but we're getting there.
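In outline, that generate-test-fix pipeline might look like the sketch below; llm, run_test, and the prompts are all placeholders rather than Lutra's real implementation.

```python
def build_integration(llm, api_docs: str, test_cases: list, run_test) -> str:
    """Generate integration code from API docs, then repair it against
    live test calls until everything passes."""
    code = llm.generate(f"Write a Python client for this API:\n{api_docs}")
    for _ in range(5):
        failures = [t for t in test_cases if not run_test(code, t)]
        if not failures:
            return code  # all live-data tests pass; ready to roll out
        # Let the model see the real responses and errors, then repair.
        code = llm.generate(
            f"These calls failed against live data:\n{failures}\n"
            f"Fix this integration code:\n{code}"
        )
    raise RuntimeError("integration still failing; needs human attention")
```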

Demetrios [00:20:33]: You know what, it's so funny that you're saying this, because back in 2022 or early 2023, right after ChatGPT came out and there was this boom, before folks were really doing stuff with code, someone in the MLOps Community Slack said: hey, I built this framework, and all it is is you talking to an LLM, telling it what you want. But instead of the LLM generating real code, the prompt was: what would this look like in code, in your own language? And so the LLM would output its own basic gibberish, its own code. It would create its own language, in a way, and then you could feed that back into an LLM, and for some real weird, really wacky reason, it understood it.

Jiquan Ngiam [00:21:32]: Yes, I think the models actually have some tendencies, right? They like to process data in certain ways, or structure code in certain ways. And when you describe that, it actually reminds me a lot of chain-of-thought reasoning. The first step is to produce some pseudocode, which is this internal gibberish, or maybe a plan, and after that produce the actual thing you want to do, and then it kind of understands that too. We've seen that a lot. It's fascinating that reasoning models work that well today, because that whole process is now embedded into the reasoning model. And we definitely see a big step up in performance when we move to that.

Demetrios [00:22:12]: And so how are you using different reasoning models?

Jiquan Ngiam [00:22:16]: We decided to be pretty model-agnostic from day one. So we use models from different providers in our backend; in fact, we swap between them, falling back when one model provider is not available. And it actually works pretty well. The way we think about reasoning models, or models in particular, is that every model has different strengths. It turns out, for example, that Gemini Flash models are really good at data extraction: give them a bunch of data, and really fast, really high-quality data extraction comes out nicely.

Demetrios [00:22:52]: Even the multimodal data.

Jiquan Ngiam [00:22:53]: Even the multimodal, exactly. It's really, really good at that. But you know what, I don't think they're as good at writing, whereas the Claude models actually write really nice content. And ChatGPT now has the best image model. So we think of it as: can we help users, or dynamically figure out, the right model for the right task when they need it? So someone comes to our platform and says, I've got a bunch of PDF files, I need to extract the data and put it into a spreadsheet.

Demetrios [00:23:21]: And you're like, ding, ding, ding, I got just the model for you.

Jiquan Ngiam [00:23:23]: Exactly. You say: we're going to use Gemini Flash to do all the extractions, we're going to use Sonnet to figure out how to map the data, because there's a lot of reasoning behind mapping the data. And if we run into any image stuff, maybe we use a different model there, right? So that's what we do in practice. And the reasoning models really become critical for understanding how to tie different systems together. It's like: I need to work with system A and system B, these are the setups and the API actions I have, what is a way for me to tie them together? Reason about it a bit, write some code, run it. Maybe it works, great; maybe it fails, and if it fails, reason about why it fails, try again, go. So that reasoning is really important in orchestration, and that's where we use it mostly.

Demetrios [00:24:09]: So is it also orchestrating what models to be using?

Jiquan Ngiam [00:24:12]: In practice, yes. But what we do is that in the backend we do our own testing and predefine it: for data extraction, we like Gemini Flash, so we lock that in. Then at the top level it's figuring out what the task is. Is the task creative writing, report generation, web research, or is it data extraction? And depending on the task, we have hand-selected the models based on the eval sets we have.
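A toy version of that two-level routing; the model IDs are examples only (they change often), and the classify call is a hypothetical helper.

```python
# Hand-selected task -> model table, locked in from offline evals.
TASK_MODELS = {
    "data_extraction": "gemini-flash",    # fast, strong extraction
    "writing": "claude-sonnet",           # nicer prose
    "orchestration": "openai-reasoning",  # reasoning-heavy glue work
}

def pick_model(classifier_llm, user_request: str) -> str:
    # Top level: classify the task, then look up the predefined model.
    task = classifier_llm.classify(user_request, labels=list(TASK_MODELS))
    return TASK_MODELS[task]
```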

Demetrios [00:24:41]: Yeah, that's such a great way of doing it. And also not sticking to just one, because you're not like, oh well, I like Gemini the best. No, it's: Gemini is great for these tasks.

Jiquan Ngiam [00:24:54]: Exactly.

Demetrios [00:24:55]: So let's use them. Let's predefine that we're going to use them for that type of thing.

Jiquan Ngiam [00:24:58]: That's right.

Demetrios [00:24:59]: And then if we need others, we'll go and find others. How are you thinking about super complex and long-running jobs?

Jiquan Ngiam [00:25:10]: Yes. The longest task someone has tried to do took, I think, two or three days to run.

Demetrios [00:25:14]: Really?

Jiquan Ngiam [00:25:14]: Imagine you type a prompt, press Enter, and two or three days later, it comes back.

Demetrios [00:25:20]: No way.

Jiquan Ngiam [00:25:20]: So I think there are different kinds of long-running tasks. There's one kind where you're doing a bit of it, you need to see what it is, and you're always replanning, reorienting around the outcomes, and then doing more and more. That's one kind of long-running task: very complex knowledge work, where maybe you're doing a deep research report about something and you're going to talk to different people, collect data. That's pretty hard. There's another kind of long-running task, which we think AI is well suited for today, which is: do that same damn thing, but over 100,000 rows. Do the same thing, repeated over many, many items, but just verify with me that it's doing the right thing.

Jiquan Ngiam [00:26:03]: So right now, I think we're really good at that. Over time we'll get really good at the former too, but the latter is actually where we are today.

Demetrios [00:26:12]: So it's almost like, if you can do it once well, you verify that it can be done, and then you just do it. A lot.

Jiquan Ngiam [00:26:20]: A lot, yes. So for example, the task in this case was, to some extent: I have a spreadsheet of accounts, websites I want to research. Maybe all the people attending a conference, maybe a name list I got from somewhere. Every single person in the ML community: what are they doing right now? Pretty big task. I go to every single person, web-research them, look them up on LinkedIn, look them up on Google Scholar, put all the data pieces together, maybe even score them and give me a final score on: hey, would this person be a good candidate to interview on this podcast?

Demetrios [00:26:55]: I need that.

Jiquan Ngiam [00:26:55]: You need that. But now it's like, okay, I've got 5,000 people, or maybe 100,000 people in my community. Run it on 10 people; I like the scores; go run it on 100,000 people right now. So this is where, interestingly, we tie it back to how you agentically automate work, and a bit of why we think methods like CodeAct shine versus ReAct. Because if you're producing software code, you can go: I want to write a function that does it for one row. I'm going to run that function a few times for you and show you the output.

Jiquan Ngiam [00:27:29]: That's great. Now I'm going to write a for loop that goes through all the rows and just keeps running until it's done. You can start imagining that's only four to five model interactions, and you're off to the races. Now, if you did something more like the ReAct framework, every single row is a single LLM call. You're never going to finish this.

Demetrios [00:27:47]: And it's so much more expensive.

Jiquan Ngiam [00:27:48]: Much more expensive. It doesn't make sense. So this is where we have really been focused on that approach, because it scales very nicely to many different kinds of real-world tasks that people do. The thing I just described is actually a task people do all the.

Demetrios [00:28:02]: Time right now, 100%. And if you're in sales even more.

Jiquan Ngiam [00:28:06]: Now think about it: why is the input a spreadsheet? What if the input was my HubSpot CRM? What if I point it at a particular list of target accounts I care about in my HubSpot CRM? Can it just read my data from that system directly, do this work, and log a note back into the CRM directly? Go. There's no difference conceptually between a CRM and a spreadsheet; at a very high level, they're both databases. The hard part traditionally has been: I need to figure out the APIs and integrations. But if I have the integrations ready to go and the AI understands them, I'm just a prompt away. So we have HubSpot integrations today, and that's actually something people do in their tasks.

Demetrios [00:28:51]: Yeah, I love that you're thinking about attributes, or whatever it is, these custom fields that you have in HubSpot or Salesforce or whatever it may be.

Jiquan Ngiam [00:28:59]: It's an entire object with properties. It could be nested, could be arbitrarily complex, whatever you want it to be. It turns out that this is where integrations get hard, too, because the more a system allows you to configure things to a very complex degree, the more the AI needs to understand the schema behind the configuration. So we spend a lot of time on our end figuring out how to communicate user-defined schemas from external systems to our AI agent, so that when our agent operates, it's able to understand the schemas. For example, in HubSpot or Airtable, say you want to update a dropdown field with the value XYZ. If it's not one of the dropdown options, the system will reject it. So you need to know the schema to work with it. The LLM is going to make the mistake of using the wrong name: it puts a space in "qualified lead", and the value in HubSpot often has no spaces.

Jiquan Ngiam [00:29:57]: It's "qualifiedlead", without spaces, there. So how does your AI agent understand the error that comes back, fix itself, and go again? All of those are things that we spend a lot of time thinking about and architecting for.
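A small sketch of that schema-aware validation; the schema shape here is invented for illustration, and HubSpot's real property metadata differs.

```python
# Surface the user-defined schema to the agent and validate writes
# before they hit the external system.
SCHEMA = {
    "lead_status": {
        "type": "enumeration",
        "options": ["qualifiedlead", "unqualified"],  # note: no spaces
    }
}

def set_property(record: dict, field: str, value: str) -> None:
    options = SCHEMA[field]["options"]
    if value not in options:
        # Give the model a precise, fixable error instead of a raw 400.
        raise ValueError(
            f"{value!r} is not a valid option for {field}; "
            f"choose one of {options}"
        )
    record[field] = value

# The agent tries "qualified lead", reads the error above, and retries
# with "qualifiedlead" on its next turn.
```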

Demetrios [00:30:08]: So simple on our end. We look at it and it's like: qualified lead, no fucking space. And then for the agents it's a little bit different. Yes, man, but that is so cool to think about, how much it can unlock when you don't have to say "export as CSV".

Jiquan Ngiam [00:30:30]: Yes.

Demetrios [00:30:31]: And you can just let it run loose directly. This makes me wonder about taking this data and mixing the agent's steps with traditional data processing, and how you're looking at that. Because you mentioned you have certain models that you like to use for certain tasks, and obviously there are certain tasks that are better done with pandas or Spark or whatever it may be. How do you integrate the two of these?

Jiquan Ngiam [00:31:13]: So this is where it's actually really beautiful in some ways, going back again to the CodeAct framework. If you're in a world where you're orchestrating between apps not just by function calling but by code generation, then pandas, NumPy, whatever, can just be part of the output. And it is so today: when you use Lutra, it actually produces pandas, Pyplot, all the stuff that you like, in this code as well. So you could go: okay, for this part, call the AI function that does extraction; for this part, use pandas to do some kind of analysis; for this part, use statsmodels to do a regression analysis; for this part, generate an HTML file using a model so there's a nice, pretty website presenting the data. So now you can start composing across all your existing tools that do data processing, together with the AI tools that are new, and bring it all into one place, right? And so that's really nice.

Jiquan Ngiam [00:32:10]: The hard part of this turns out to be the runtime. If you think about it, most of the time in the wild, when you see people doing a code-execution approach, the code runs in a very tight, locked-down sandbox. Code Interpreter from OpenAI has been around for a while: you can ask ChatGPT to write some code for you and run it, and it runs that Python in a locked-down sandbox that can do nothing except, let's say, run some Python code. It cannot access the web, it cannot call APIs, it cannot call out to other AI functions or external systems. So we took a step back and asked: how do we make that possible, and how do we consider all the security implications of making a sandbox that can access the external world, such that the AI is working in a sandbox and shouldn't be able to access any secrets or keys, but it can call out to the external world and bring data back? What would that look like? That's what we spent a lot of time designing. And the second thing is that a lot of sandboxes are usually Jupyter-style kernels.

Jiquan Ngiam [00:33:24]: They run, they disappear, and it's like, oh, very ephemeral, we're very ephemeral. The state is gone, the variables are gone. It's not great, because that's not how we interact with agentic systems. We chat with it, we come back many hours later and keep going again; or we go take a nap, or we go to sleep, wake up in the morning, get my coffee, start going again, and it's like, oh, your state is gone now. That's terrible. But we also cannot reasonably keep all the machines running all the time. So how do you make an environment that is more stateless? We have actually designed a stateless, secured, Pythonic sandbox that the AI can operate in.

Jiquan Ngiam [00:34:07]: So that's actually the core of it all; that's the hard part, I guess.

Demetrios [00:34:12]: There's so much that you just said there that I want to dive into, especially the stateful and stateless part and how you're enabling that. Because sometimes you need it and sometimes you don't, and if you get the timing wrong, you can lose data, or lose processes, or a long-running job; or if you keep it on too long, you can just spend a lot of money.

Jiquan Ngiam [00:34:38]: Exactly, exactly. So there are a lot of design decisions in that one. But at a high level, if we abstract ourselves and take a step back: what are computer systems? What's the state that we're talking about, stateful versus stateless? The state is really what's in memory, and the instructions in memory that I want to execute against. If that's all the state we need, we don't have to keep the machine around. What we have to keep around is what's in memory: the memory state and the current instruction state, which is what's on the stack. What are my functions on the stack, what can I run, what can I call, what are all the variables on my stack? One easy way is to serialize the entire machine state and say: spin up a new machine, load it up, and go again.

Jiquan Ngiam [00:35:22]: Kind of slow; you can imagine how that works. Whenever you chat and finish, you just shut down the machine and save the state away. Kind of a poor man's solution there. Another way is to be able to bootstrap machines and load up state much faster. Another way is to rethink the whole runtime: design a whole new runtime that is built around being stateless, and then execute code, run it, write it to memory, but really control the entire interpretation of the code and how the variables are stored. That's another way to do it. We are much closer to that approach: we don't just exec/eval the Python code that comes out, we actually inspect it, look at it, run it, interpret it in the way we want to. That gives us a good amount of control.
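The "poor man's solution" he mentions can be sketched with dill, a library that can snapshot a whole interpreter session; a real stateless runtime interprets the code itself instead of pickling machines.

```python
import dill  # third-party; pip install dill

# ...user chats, code runs, variables and functions accumulate...
invoices = ["inv-001", "inv-002"]

# On idle: persist the memory state, then shut the machine down.
dill.dump_session("session.pkl")

# Hours later, on a fresh machine: restore and keep going.
dill.load_session("session.pkl")
print(invoices)  # the state is back
```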

Jiquan Ngiam [00:36:14]: So, for example, one of the challenges we quickly ran into when we deployed this: someone loads up their 100,000-row spreadsheet, says go do this for me, and it starts running. If you run Python out of the box, or any programming language, there's no output. The terminal is just sitting there. Is it running? I don't know. You run top, you see the CPU is busy, so yes, it's good. Wait, no, it's been going forever. You really want your execution runtime environment to give you feedback. How long has it been running? What actions is it taking? What is it doing right now? I really want to see a real-time indication of what it's doing in the UI, in the UX.

Demetrios [00:36:57]: That would be brilliant. Of course, just like we see ChatGPT streaming the answer back to us while the code runs.

Jiquan Ngiam [00:37:04]: You're on the stream too, but for the execution of the code. And then maybe you even want an estimation. Can we do a static analysis and say: oh, this is how the code is laid out, this is the loop, it's going to be 9,000 iterations, and it looks like about five seconds per iteration? Oh yeah, so it's 9,000 times five seconds. That's how long it will take.

Jiquan Ngiam [00:37:26]: That'd be nice. So these are the things we've been looking at, thinking about, designing around. If you use Lutra today, you start to see hints of that. When Lutra starts to run a piece of code, you actually see real-time live streams of the actions it's taking. You start to see indicators of time estimates, and we want to make all of that better and more exact. And then, more importantly, AI functions are expensive. If this thing is running 9,000 times for you and in the middle it's calling some model, it's going to rack up a lot of tokens fast. So we estimate not just the time but also the cost.

Demetrios [00:37:59]: Oh, I love that.

Jiquan Ngiam [00:38:00]: So: this might cost you 5,000 AI credits to run, because you're going to consume that many tokens, right? Those are all the things that come into the picture when you really start thinking about designing around AI operating software.
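The back-of-envelope version of that estimate, with purely illustrative numbers:

```python
iterations = 9_000
secs_per_iter = 5.0           # measured from the first few iterations
tokens_per_iter = 2_000       # say, one model call per iteration
credits_per_1k_tokens = 0.25  # hypothetical pricing

est_hours = iterations * secs_per_iter / 3600  # 12.5 hours
est_credits = iterations * tokens_per_iter / 1_000 * credits_per_1k_tokens
print(f"ETA ~{est_hours:.1f} h, ~{est_credits:,.0f} credits. Proceed?")
# -> ETA ~12.5 h, ~4,500 credits. Proceed?
```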

Demetrios [00:38:14]: Yeah, I love that so much, on the cost. And it also feels like you have a chance to warn people, or set up alerts, or do something where maybe for the first couple of iterations you don't do anything, but then you take a pause and say: hey, by the way, if this is going to go 9,000 times, it's probably going to cost you this much. Are you okay with that?

Jiquan Ngiam [00:38:38]: Exactly.

Demetrios [00:38:39]: And then you can have that human say yes, which you don't get in traditional SQL and stuff. I remember a horror story from a friend who did some kind of join. He was working at Spotify, and a week later somebody knocked on his door like: hey, why did we just spend $40,000? Our database bill just spiked. And he was like, ooh, that was a big join that I did. I didn't realize how big it was. And so something like this, where it's conscientious and the alerts are built in, and you're ready to say: if I am about to spend over X amount, warn me.

Jiquan Ngiam [00:39:29]: Exactly, exactly. And those are exactly the things that we think a lot about, because we had versions of Lutra earlier on that didn't warn. And it was just like: I sent one prompt and all my credits are gone. What happened?

Demetrios [00:39:42]: They were like, oh, shoot. Because it's just a prompt, right?

Jiquan Ngiam [00:39:45]: Yeah.

Demetrios [00:39:46]: But meanwhile there's a whole lot of stuff going on in the background.

Jiquan Ngiam [00:39:49]: Exactly. So we are entering a world in which a single prompt can set off a whole chain of processes. Today, there are reports that it can do minutes or hours of work for you; in the future, it can do days of work, months of work. There's a lot to design around that, because if one prompt can do all of that, you want to have all these ways to help users understand the implications of that one prompt. That's what we're designing around here; that's what we're really thinking hard about.

Demetrios [00:40:23]: Yeah. And I can also imagine a world where you would want to give some advice or help, where maybe it's like: here's what we've decided, here's the choice, and you can click in. And maybe you say, well, not all 9,000; let's bring this down to 2,000, and we'll just see if I can find any value there.

Jiquan Ngiam [00:40:48]: Exactly. It's like: start small, do a few hundred first, let me review it. Or even better: AI, can you review your own work, and if it's good, let's do the rest of them? And in fact, as an application developer, this is also really nice, because from a business model perspective, a lot of AI, I think, will be more usage-based. The more you use it, the more work the AI does for you, the more you want to charge for that too, right? And so if you can help people understand the value and the magnitude of the work, it also becomes a very natural business thing where you pay by the amount of work it does.

Jiquan Ngiam [00:41:26]: And so it's really nice, and it aligns expectations, right? You do not want it to go off and do 9,000 rows of work if they're all going to be really bad. That's the worst thing.

Demetrios [00:41:37]: That you can do. And then you're paying for it.

Jiquan Ngiam [00:41:39]: It's like you hire an intern, and the intern is doing a terrible job but working really hard, and it's like: ah, I've got to redo everything. That's not good, right? But those are all challenges to design for in this space right now, especially when one prompt can do days of work. Yeah.

Demetrios [00:41:56]: It's just long running jobs, man. Wow.

Jiquan Ngiam [00:41:59]: Yeah.

Demetrios [00:42:00]: You are thinking a lot about system design with AI. How do you feel system design will change in the coming years, especially if AI is only going to get better? The models, the underlying models. When I say AI, that could be interpreted as many things, but if we're looking at the underlying models getting better and being able to understand systems better, how do you feel system design is going to change?

Jiquan Ngiam [00:42:27]: I kind of view the quality of models along a few dimensions. The first, very obvious one is context window length, and it's very important. The reason is that back in the day of GPT-4, when it was, I don't know, 8K or 32K (I'm not sure if people remember this), it was tiny context windows, and we had to do a lot of workarounds. A lot of the work around RAG was actually because the context windows were too small, so our snippets in RAG were very small. But now it's like: shove the whole document in, the PDF file, and go with it.

Jiquan Ngiam [00:43:01]: So as that gets bigger, it really changes the way we think about things and what we can give the model as context, if it can reason about it all really well. The second one, I think, is oddly enough cost, and the third is quality, meaning the quality of the outputs. The reason cost is very interesting and important is most salient if you compare to maybe a year or two ago with GPT-4. It used to be that GPT-4 was, I don't know, on the order of $60 for a million output tokens, or $120, some crazy amount. And now it is on the order of $3 to $5. So that's about a 10 to 20x reduction. Put it this way: say you were to ask the AI to do a task, and it had to make 20 calls to figure out the right things to do.

Jiquan Ngiam [00:43:50]: And each call was about, I don't know, 50 cents. 20 calls at 50 cents is $10. Too expensive. I don't want to ask an AI to read my emails and summarize them for $10; I'm just going to do it myself.

Demetrios [00:44:05]: That's a few cups of coffee right there.

Jiquan Ngiam [00:44:06]: Exactly. But if you take $10 and divide it by 20, suddenly that's way cheaper, right? 50 cents. Yeah, sure, 50 cents, I will do that. I can do that 20 times now. That's great. But then, when you start thinking about it, why do you want to do that? It turns out that a lot of the time the models are not perfect. They're not going to single-shot give you the right answer, and the more you can get the model to decide (I'm going to do a task, what do I need, get the data I need, and now, based on that data, what should I do next?), that iterative way of using a model over and over and over again is very powerful.

Jiquan Ngiam [00:44:39]: We know it works well; it's very agentic, but it's a ton of calls. So if the cost of models keeps going down, then model developers and model users like myself will be very encouraged to call them a lot. And calling them a lot actually improves performance a lot. So cost is actually a big thing right now, because the models are really powerful right now. If you brought cost down by another 10x today, I suspect we would see a lot of gains in just how people do reasoning, by just calling them more. Then the third one is the quality of the outputs, reasoning traces, and so on. I think that's going to get better too. A lot of the work we have done, looking back, has been working around those things.

Jiquan Ngiam [00:45:22]: How do we work around context windows being too small, how do we save on the number of calls we make, and how do we try to be smart about which models we use from a quality perspective? Moving forward, if those are no longer considerations, then we should just use them. We'll stop worrying about what to put in the context window and just put in as much as we can. We'll not worry about how often we make calls; we'll just call as much as we can. And quality-wise, I think it's going to be increasingly better, with more selection too. So those are the three things I pay attention to from a very high-level perspective.
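To make the cost dimension concrete, here is the arithmetic from a moment ago (illustrative prices, not exact rates):

```python
calls_per_task = 20
old_cost_per_call = 0.50                    # GPT-4-era, roughly
new_cost_per_call = old_cost_per_call / 20  # after a ~10-20x price drop

print(f"then: ${calls_per_task * old_cost_per_call:.2f} per task")  # $10.00
print(f"now:  ${calls_per_task * new_cost_per_call:.2f} per task")  # $0.50
```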

Demetrios [00:46:04]: Do you think that's where the biggest bottlenecks are right now?

Jiquan Ngiam [00:46:08]: I think so, yeah. The quality-cost tradeoff curve is, I think, the thing that everyone's pushing on. So there's this frontier, the Pareto-optimal quality-cost frontier, and I think Google is at the frontier right now, which is kind of cool. But there's so much more we want to use it for. For example, when we generate integrations today, we've got to go and figure out the API docs for the new system we want to integrate, figure out Lutra's API docs on what we are doing, generate code in the middle, test and run that code, self-debug it a few times, and then produce an output. We now have to be very prudent about what we shove in as the API docs from the other system. Do we shove in everything wholesale? Do we need to strip things out? Do we need to format it nicely? Do we need to be very careful what we put in there? If the context windows are very big and effective, we don't really have to worry that much.

Jiquan Ngiam [00:47:09]: We just say put it all in. We'll give it more context than it needs, probably, because it's not a concern for us. Same for our own documentation. Now, when we produce some kind of integration, we need to test it, and that runs in a loop.

Demetrios [00:47:25]: You just continuously let it go until you're super confident.

Jiquan Ngiam [00:47:28]: And when it's testing itself, we want to give it debugging information. It's going to make an API call to an external system and observe what's going on. If it's great, that's fine. If it's bad, we want to give it as many debugging traces as possible. Not just standard out and standard error, not just what's wrong, but the traces in the calls themselves. Maybe the headers coming back. All the details that you don't normally see as a user, or even.

Demetrios [00:47:59]: In the terminal search, whatever, stack overflow for that too.

Jiquan Ngiam [00:48:04]: Exactly. So we'll be much more open to just grabbing all of that and stuffing it into the context too. But that requires the models to be able to accept long context, reason about it in a good way, and be cheap enough for us to do that. Because if it's going to cost us a dollar each time, we're not going to do that. If it's going to cost us five cents, great; one cent, oh yeah, no-brainer, do that as much as you want.

Demetrios [00:48:28]: Yeah, it's funny that you mention that, because I wrote a blog post probably six months ago on how the price per token is going down, but the price per answer is going up. And it was that very fact: okay, per token, cool, it's dropped, it's plummeted. But because it has plummeted, that leaves us creating more complex systems and sending more calls. No longer are we just doing a one-off back and forth, hey, give me a poem in the style of Bob Dylan about my last earnings call. We're creating these complex systems, and since there are many calls being made, our price per answer has gone up.

Jiquan Ngiam [00:49:16]: Exactly, exactly. I mean, it's Jevons paradox at work. The cheaper you make it, the more you want to use it. The price per token goes down, but we stuff in so much more context that we go back to the same price again. We want to do more, so the price goes up. And I think there's just a lot of demand for this. So I think it's great that the input context windows have gotten very large.

Jiquan Ngiam [00:49:43]: Things like prompt caching we use a lot as well; that's super important. And I think there's also a trend now that the output windows are getting quite big too, which tells you this trend is only going to keep going.

Demetrios [00:49:56]: Are there any other trends that you think are going to keep going?

Jiquan Ngiam [00:49:59]: Let's see. Multimodal, that's a big one. Because a year or two ago we were blown away by DALL-E. Remember when DALL-E first came out, we were like, oh my God, these images. And now it's like, oh yeah, GPT Image 1, ChatGPT things, of course, table stakes. The bar just gets higher so quickly. Multimodal, video, all of that just keeps coming.

Jiquan Ngiam [00:50:25]: I think we will see a lot more agentic-type things out there. Agentic systems were, just from a quality-cost perspective, not feasible two years ago, when GPT-4 first came out, and a year ago they were really expensive. This year we're seeing prices plummet in a way that makes them feasible.

Demetrios [00:50:49]: What about, you mentioned how you're using Gemini for image understanding or extraction.

Jiquan Ngiam [00:50:57]: Yes.

Demetrios [00:50:58]: Do you ever try with videos?

Jiquan Ngiam [00:51:01]: Not yet. But we do know it works.

Demetrios [00:51:03]: Yeah, yeah, I haven't tried that either, but I realized, oh, that's something cool that potentially also could be. I don't know what the use case would be. I don't know if it's just cool and then doesn't have any real applications. But that's a fascinating one too.

Jiquan Ngiam [00:51:18]: You can imagine: take this podcast recording, put the video in. I think they support 3,000 to 4,000 frames out of the box, which is, you know, enough to slice up the video with the audio too. And then you can say: great, figure out all the parts that were awkward, cut them out, and it just produces it for you.

Demetrios [00:51:36]: Then it gives us a minute long video and I go, no, I need an hour. Shit.

Jiquan Ngiam [00:51:40]: But it gives you all the timestamps, right? And then you're like, okay, run it through FFmpeg at those timestamps. And then, okay, now for the parts where the audio sounded bad, run it through some other filter to improve the audio. So it's just imagining: hey, some of these things that we do with manual editing, the models can start to reason about. And that's fascinating, right? Wow.
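The last step really is that mechanical; given model-chosen timestamps (made up below), plain ffmpeg does the cutting:

```python
import subprocess

# Segments the model decided to keep (hypothetical timestamps).
keep = [("00:00:12", "00:04:55"), ("00:05:40", "00:51:10")]

for i, (start, end) in enumerate(keep):
    subprocess.run(
        ["ffmpeg", "-i", "podcast.mp4",
         "-ss", start, "-to", end,
         "-c", "copy",  # cut without re-encoding
         f"slice_{i}.mp4"],
        check=True,
    )
# The slices can then be concatenated in whatever order the model
# suggests, e.g. with ffmpeg's concat demuxer.
```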

Demetrios [00:52:00]: Actually, one thing that I always love: the real professional podcasts do this, because they have a whole staff and team behind it. If you have an interview, the end product that you see is not the interview as they recorded it. They slice and dice it, move this part to the front and that part to the back; oh, the story fits better like this. And I never do that, because I never have time to really think about it and get that creative and say, oh, that question actually should have been here instead of at the end. But that's a potential use case, because.

Jiquan Ngiam [00:52:46]: You could ask the model: give me time points to slice the video. Now I have ordered slices. Generate a description for every slice, and now give me a reordering of all the slices.

Demetrios [00:52:57]: And then: would this narrative go better here? Or, with this question or this answer, I jumped around. I was talking about integrations, then we talked about something else, and then we went back to integrations, or whatever it may be.

Jiquan Ngiam [00:53:10]: So try it with this video.

Demetrios [00:53:11]: Yeah, I might actually.
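The describe-then-reorder step might look something like this sketch: one model call per slice for a description, then one call to propose a narrative ordering. The OpenAI client, model name, and prompts are assumptions for illustration, and the JSON parsing is deliberately naive.

```python
# Sketch: describe each slice, then ask a model for a better ordering.
import json
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; any capable chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Transcripts of each cut segment, e.g. from the FFmpeg step above.
slices = ["...transcript of slice 0...", "...slice 1...", "...slice 2..."]

# 1. One short description per slice.
descriptions = [
    ask(f"In one sentence, describe what this podcast segment covers:\n{s}")
    for s in slices
]

# 2. Ask for a reordering that tells the story better.
# (Naive: assumes the model returns a bare JSON array like [2, 0, 1].)
order = json.loads(ask(
    "Here are numbered segment descriptions:\n"
    + "\n".join(f"{i}: {d}" for i, d in enumerate(descriptions))
    + "\nReturn only a JSON array of indices giving the most coherent order."
))
reordered = [slices[i] for i in order]
```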

Jiquan Ngiam [00:53:14]: No, that's. That's fascinating. Yeah. But I think those are the things in there. I also think there is this bigger trend that people want to solve as well, which is, you know, we hear a lot about computer use. I think it's very early. Honestly, I've used Operator. Initially I was super hyped, like, oh, this is going to be so cool.

Jiquan Ngiam [00:53:37]: I ran it, and I was twiddling my thumbs. It's kind of getting there. Oh, it needs my help now. Ah, it can't get there. Then after like a day I stopped using it. But I do see a world in which these systems can be trained to understand our world as we see it: how we use our computers, how we navigate around the physical world and everything. So I think we'll start to see a lot of developments there too. Very early days, but definitely some promise in how that works. And oddly enough, I think it still goes back to the fundamentals, which is: are the context windows big enough to put in all the frames, or the history of interactions, to reasonably predict what to do next? So that's kind of interesting.

Demetrios [00:54:25]: I think the other interesting part of that is that I want to be using my computer at the same time. So would it be some VM or background process that's happening in a different sandbox, almost like some separate VM? And then when it gets stuck and it needs my help, it comes to the forefront of my computer.

Jiquan Ngiam [00:54:46]: Yeah, totally. I mean, there are so many ways. There's this article I read yesterday by Pete Koomen, I think, on AI horseless carriages. Have you seen that one?

Demetrios [00:54:59]: Oh, that's great.

Jiquan Ngiam [00:55:00]: I think we are in that world right now, where most of the AI applications are like horseless carriages: the guy still sits up front on the carriage, and the AI engine is machined onto the back. Why is it there? It's retrofitted in ways we don't expect. And you start to reimagine five, ten years from now, when the technology is cheap enough to be deployed everywhere. Voice recognition is amazing, multimodal is amazing. And then you go, what should the UI for a computer be? The GUI that we have today came from. I fondly remember the Windows 3.11 days, through 95, and everything up to today with modern Macs and all. They've been designed for humans to operate.

Jiquan Ngiam [00:55:53]: They have windows, they have start buttons, they have layers of navigation and all. They're very humanistic in some way, to help us understand how these things work. But the AI doesn't need much of that. The AI operates at a more fundamental level. It can write software; it can operate at that level. So I think we're still in this horseless-carriage world where we haven't really figured out what the real interaction model should be yet. For example, take this task of getting the machine to cut up your video for you. What if you give it the task, you give it the MP4 file, and say, go do it?

Jiquan Ngiam [00:56:27]: But you don't want the AI to work without telling you what it's doing. Showing you it clicking around a UI and trying to get work done is kind of nice; it mimics what you're doing and is very understandable, but it's also not very efficient. What it really should do is run some FFmpeg commands, do some slicing, run another model, and so on. But what if it generated its own UI on the fly? The models can produce websites; it could write its own software. What if, on the fly, it produced a representation of what it's doing that is not your video editing software, and not just the raw commands it's running, but something intermediate that represents what the AI is working on, so that you can understand what's happening? How interesting would that be? So not even the VM, right? It's almost saying: create your own one-off experience to explain the task you're working on, in a visual way, so that a human user can understand what's going on. It's like, okay, I'm going to write a little mini Electron app to show you how I'm cutting up the videos.

Jiquan Ngiam [00:57:38]: Okay, the videos are cut up this way now. Oh, great. Now I'm going to edit them. I'm going to modify my Electron app on the fly to show you how I'm modifying the videos. You start to see where that goes. You're not watching it use your current applications; you're watching it do something really rich.

Jiquan Ngiam [00:57:52]: But it's showing you how it's doing it in a very intuitive fashion.
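As a toy version of that idea: instead of puppeting a human application, an agent could emit a small, throwaway status page describing its current plan. This sketch uses static data standing in for what would really be model-generated UI.

```python
# Toy sketch: an agent renders its own one-off UI -- a single HTML file
# describing the edit plan -- rather than clicking around existing apps.
import webbrowser
from pathlib import Path

plan = [  # hypothetical agent state
    ("0:00-0:42", "keep", "cold open"),
    ("0:42-1:01", "cut", "awkward pause"),
    ("1:01-3:07", "keep", "context windows discussion"),
]

rows = "\n".join(
    f"<tr><td>{t}</td><td>{action}</td><td>{why}</td></tr>"
    for t, action, why in plan
)
html = f"""<html><body>
<h1>Video edit plan</h1>
<table border="1">
<tr><th>Timestamps</th><th>Action</th><th>Reason</th></tr>
{rows}
</table>
</body></html>"""

page = Path("agent_status.html")
page.write_text(html)
webbrowser.open(page.resolve().as_uri())  # surface the agent's state to the human
```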

Demetrios [00:57:57]: You know what the problem I have with that is, though? Because I've thought about it and I've had all kinds of conversations with people. There are certain applications that we use that are very specific and very niche, like video editing. The pros, the pro video editors, know the words that describe certain actions inside the tool. They know, all right, when I'm editing, if I want different colors, if I want different LUTs, all of that. Even LUT: someone who doesn't do a lot of video editing is just like, I just want it to look better. And maybe the AI can give you something, but if you want just one little thing changed and you don't know the word to describe it, to prompt it, then it's really hard. But if you know it, all right, I can go in. Normally, what I do as a very, very amateur video editor, I'll just go and type into YouTube:

Demetrios [00:59:07]: I wanted to do this and I can't figure out how. And then I'll search around on YouTube a bit and try to figure out what I'm looking for. Potentially there's a world where I don't search YouTube, I just prompt back and forth, saying, no, I want something more like this. But usually it'll be me going through different YouTube tutorials to see, this is how you do it. And there are these specific tasks where you want one thing changed. I'm sure you've had it with Cursor, or any experience where you just want one thing changed, and that one thing gets changed, but the rest gets blown up.

Jiquan Ngiam [00:59:48]: Totally. It's funny, you remind me of a conversation I had with the designer on my team recently. We use image-generation models to create illustrations for some of the automations people make on Lutra. And he writes up these really long prompts that work really well. Clearly he understands how to express a certain kind of design language way better than I do, because when I try, it looks like garbage. He laughs at me for it. No, that's cool. But to that point, there are a few tokens that we use, words we choose, that have a huge impact.

Jiquan Ngiam [01:00:28]: If you say that particular design pattern to Cursor, it's like, ah, I know what you want, and it does that design pattern for you. So there's an interesting thing in there. I have two thoughts in my mind here. The first is: what is the role of people? What do we do with AI doing all of this work for us? I think one part is having good taste. But even if you have good taste, how do you describe your taste? It turns out that describing your taste means having those tokens. Everyone now knows the Studio Ghibli token; that's an easy one. But there are so many other ways to express a style, a form, or a design pattern that I think are really hard. Number two, I do wonder as well: can the machines help us understand the styles we like? For example, you look through all the YouTube videos and you go, I like that kind of style.

Jiquan Ngiam [01:01:29]: I mean, why not just feed in all the videos and say, I like that style, I don't like that style, this is my mood board, tell me how I should describe this? I think it would actually do a pretty decent job of describing it. Now, the hard part is still figuring out what styles you like.

Demetrios [01:01:42]: But in that fine-tuning too, where you have that last mile of, oh, I need this, but I do like that. I think I've seen a tool like that, where it's almost like a Miro board, but instead of it being a random Miro board, the stuff that you put on it gives the prompt more. Yeah, it's a more enriched prompt. It's not just saying the words, but.
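That mood-board-to-vocabulary step could be sketched roughly like this: hand a model liked and disliked examples and ask it to name the style terms. The client, model, and example descriptions are assumptions for illustration.

```python
# Sketch: turn a like/dislike "mood board" into nameable taste tokens.
from openai import OpenAI

client = OpenAI()

liked = ["fast jump cuts with punchy captions", "warm color grade, film grain"]
disliked = ["corporate stock-footage look", "harsh ring-light lighting"]

prompt = (
    "I like these video styles:\n- " + "\n- ".join(liked)
    + "\nI dislike these:\n- " + "\n- ".join(disliked)
    + "\nName the editing and design terms that capture what I like, "
      "so I can reuse them in future prompts."
)
resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)  # suggested vocabulary to prompt with
```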

Jiquan Ngiam [01:02:10]: You know, back to the horseless-carriages thing. Are we in a world where the problem is that most of the software today has been pre-designed for us to use? Whatever knobs exist in our software, some developer sat down and said, I like those knobs, I'm going to give them to you. But what if the knobs we're given were generated on the fly? What if your video editing software looked at your video and said, oh, this is a podcast, here are the five knobs you want. Oh, this is a video recording of a commercial out on the street; these are the five knobs you want. Or, this is an interview with someone; these are the five knobs you want.

Jiquan Ngiam [01:02:50]: And the knobs it gives you are dynamic, based on its world knowledge. So what if you don't have to know the words, but the words come to us, right? And maybe I'm just ideating here, but that's a different world we'd be in, where software isn't just one-size-fits-all, designed and shipped; software adapts to the context. I don't think we've seen that kind of software yet. We just have not. And so I go to bed wondering what that would look like, because I can imagine, when we say the words personal computer.

Jiquan Ngiam [01:03:27]: I think the word personal actually came from the genesis of computing: a computer that does what we want it to do. But these days the personal computer runs the same software for everyone. It's not very personal, except for me typing my data into it. If it's truly personal computing, then is it AI that looks at what we're doing and makes suggestions to us? You wanted that particular style; you don't know what it's called. You don't even need to know, because the AI has looked at a thousand podcasts and says, yeah, that's a style you should consider. Then you click on it and say, oh, great. Or you're like, that's not the one I want.

Jiquan Ngiam [01:04:07]: Inspire me. Show me five more. And then you start to learn, right? Because the fact of the matter, I think, is that the models have seen more data than is humanly possible for any of us. They've seen every single piece of data out there on the web, in the world; they've ingested it. All the taste tokens we're talking about are in the model. So what better system to tell us, educate us, maybe help us understand some of these things, than the models themselves? More food for thought, I guess.

Jiquan Ngiam [01:04:45]: I don't really know if that's going to happen, but, yeah, exciting worlds.
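A first approximation of those on-the-fly knobs: ask a model which five controls make sense for this particular piece of content and render them from its answer. The prompt, schema, and model name here are hypothetical.

```python
# Sketch: "knobs generated on the fly" -- the model proposes the controls
# relevant to this specific video instead of a fixed, pre-designed menu.
import json
from openai import OpenAI

client = OpenAI()

video_description = "hour-long two-person podcast interview, single static camera"

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice
    messages=[{
        "role": "user",
        "content": (
            f"The user is editing: {video_description}.\n"
            "Propose the five most useful editing controls for this content. "
            'Respond in JSON as {"knobs": [{"name": ..., "description": ..., '
            '"range": ...}]}.'
        ),
    }],
    response_format={"type": "json_object"},
)
knobs = json.loads(resp.choices[0].message.content)["knobs"]
for k in knobs:
    print(k["name"], "-", k["description"])  # render however the UI likes
```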

Demetrios [01:04:48]: I love it, dude.
