
Coding Agents Are Secretly General Agents

Posted Jun 26, 2026 | Views 111

# AI Agents

# Agentic AI

# ClickUp


Speakers

Jay Hack

Head of AI @ ClickUp

Jay has been building with AI since high school. After working in an AI lab at the University of Michigan, he moved to Stanford to conduct multimodal deep learning research. He worked on deployed machine learning at Palantir, where he built large-scale ML systems deployed at multiple Fortune 500 companies. Then, he went on to found Codegen, one of the first coding agent companies, which was recently acquired by ClickUp. Now, as Head of AI at ClickUp, Jay is tackling the biggest failure point in enterprise AI: the context gap. He's building agents that don't just respond to prompts, they understand how companies actually work, operating autonomously across workflows with persistent memory of what matters.


Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

Jay Hack is the Head of AI at ClickUp and the founder of Codegen, the autonomous coding-agent startup ClickUp acquired in late 2025. He built one of the first ticket-to-pull-request background agents in enterprise software — before Claude Code existed — and has been working in AI since the SIFT-and-SVM days of 2008. In this freewheeling conversation with Demetrios, he makes the case that coding agents and general knowledge-work agents are converging fast, and that the real battle ahead is over context, not capability.


TRANSCRIPT

Jay Hack: [00:00:00] I started doing AI in 2008, so I've been in the game for a minute now. Damn. And, you know, it's like you kinda wanna- 2008.

Demetrios: 2008. That was before AlexNet.

Jay Hack: Yeah, AlexNet 2012.

Demetrios: In 2008.

Jay Hack: So in computer vision it was SIFT features and SVMs. Wow. That, that's basically the state of the art at the time. There were a couple other things that people were using, but, um, you know-

Demetrios: Not even using GPUs.

Jay Hack: I think that some... There was... Yeah. Basically Alex was the first one to, to write specialized kernels.

Jay Hack: Yeah. Um, there was some applications I'm sure where it, it like, uh, you know, optical flow, it's highly parallelizable, and I wouldn't be surprised if there were some labs that were doing things along those lines, but it definitely was not like a scaling era for machine learning. Hmm. Um, largest data sets, you know, you could fit on a USB stick.

Demetrios: What gave you the [00:01:00] inspiration to start CodeGen? And like, you, you must have seen something that was like, "Ooh, this is getting better. We could probably do something here." 'Cause there was a company that tried it, and then they folded, right? I can't remember the name of them.

Jay Hack: The history of code generation is littered with companies- Yeah

Jay Hack: that were a little too early, but like totally visionary. So you're probably thinking of Tabnine.

Demetrios: Yeah.

Jay Hack: I think Tabnine's actually still around, and they're still going.

Demetrios: Wait, no, they...

Jay Hack: Ah,

Demetrios: then it wasn't them. There was one that folded- Was it Kite? I think it was Kite. Yeah, Kite. In like 2022, right? Or 2020... Yeah.

Jay Hack: Maybe, maybe even before that

Demetrios: a little bit. And I- Yeah, like 2021. Right. They folded right when ChatGPT started getting really popular, and it was very confusing to me because I'm like, wait, GitHub Copilot is printing money, supposedly. Yeah. They just hit like a billion in revenue after they released, and Kite shut down.

Jay Hack: You hate to see it. I mean, uh, you know, any startup dying, it's sad to see. I think that one of my learnings from doing startups for about a [00:02:00] decade is that momentum begets momentum. Right. And if you are too early, you know, you kind of get people all hyped up on this vision, and then you're actually not the player who ends up bringing it to market.

Jay Hack: Mm. Maybe some people can kind of catch on, and they end up taking off at that point. But in the majority of cases, it ends up being that- Kite ... yeah, GitHub Copilot is the one who introduced the concept. That's what people get excited about. That's what gets hyped up on Twitter. Kite is remembered as like the, you know, everybody tried it-

Demetrios: Yeah

Jay Hack: in 2021 or something like that when Paul Graham hyped it up, and they're like, "This isn't really ready for production yet." Mm. And GitHub Copilot was, you know, it was directly integrated into VS Code from the very get-go. Yeah. And so you didn't really have to do very much in order to get access to it.

Demetrios: Yeah.

Jay Hack: Um, but then the sort of... I think the bigger revelation was Cursor. Yeah. Cursor. GitHub Copilot was 2022, I wanna say, is when that came out, and it immediately-- It was in the happy path, which was amazing. So you didn't really have to do very much in order to benefit from it, right? It would just do auto-completions for you- Yeah

Jay Hack: once you flipped it on. Um, and I think that that same experience of basically not having to take any additional action, but just having AI [00:03:00] seamlessly injected into your workflow, that's gonna roll out to a bunch of other industries very soon. I think that Cursor's sort of... The, the thing about Cursor was nobody could tell you not to download it.

Jay Hack: Yeah. So like most of the people who ended up adopting it early on, these sort of hype guys on, on Twitter, which, you know, I'm very much so a part of that ecosystem, were people who were like working at, you know, I don't know, JP Morgan or something like that. Mm. They download this editor, they swipe their personal credit card for 20 bucks a month, and then for me, the re- revelation was the composer mode, where it has a sidebar that pops out and you can kind of chat with it.

Jay Hack: Mm. Really kind of hit its stride with Claude Sonnet 3.5, which was probably mid-2024- Yeah ... something like that. That was, yeah. Yeah. That's the inflection point right there. Um-

Demetrios: But then CodeGen, what made you wanna start it, and did you see the... How did you see the writing on the wall?

Jay Hack: Depends how far you wanna go back.

Jay Hack: I would say, you know, I've always been interested in artificial intelligence. Um, you know, I think that, uh, the idea of the machine writing itself and creating itself is this sort of recursive building. It's something that's always really appealed to me going way back [00:04:00] to, I mean, even high school. There's a book, Gödel, Escher, Bach, you might be familiar with it.

Jay Hack: It's sort of like a... It's a Pulitzer Prize-winning book, and it's-

Demetrios: You way overestimate my culture.

Jay Hack: Got it.

Demetrios: I do not know that at all. What is it?

Jay Hack: Uh, it's a book called Gödel, Escher, Bach. It's by this guy, um, Doug Hofstadter, who, he sort of, like, invented the SymSys major at Stanford in a way. Oh. Or at least the, the major that Marissa Mayer and many other people have sort of made famous.

Jay Hack: Mm-hmm. It's, like, based on this book almost, or it directly reflects the contents of the book at least. Um, and it sort of coincided with him writing this book. Went on to win the Pulitzer Prize. The core concept of it is you have, uh, M.C. Escher, the painter, who you might know the- I love that ... painting of him.

Jay Hack: Yeah. Yeah, exactly. Uh, Johann Sebastian Bach, the musician, and then Kurt Gödel. I'm definitely mispronouncing his name, but he's a logician. And all of their works have some inherent properties to them or a substructure that's shared, and usually that has to do with self-reference. So for example, there are-

Demetrios: The stairs in M.C. Escher.

Jay Hack: Exactly, the stairs, the, you know, hands writing themselves. Uh-huh. And there's actually, like, a lot of, [00:05:00] like, really interesting mathematical concepts embedded both in Bach's work, whether he meant it or not- Oh, yeah ... and M.C. Escher. Um, and then Kurt Gödel is famous for the incompleteness theorem, which is basically this idea that any sufficiently powerful logical system will necessarily end up containing things that are true but cannot be proven to be true. And the way in which it does that is basically it contains something that looks like the Epimenides paradox, which is where-

Demetrios: What's that?

Jay Hack: Uh, it's the guy gets up in front of, uh, you know, a, a bunch of people. Epimenides is the guy, gets up in front of a bunch of people in Rome or in, in Greece, and he says, "Citizens of Athens, I am lying to you." And because the statement refers to itself- Oh, yeah ... then it's, uh, it, the self-reference is basically what enables it to be contradictory.

Jay Hack: So it's a paradox.

Demetrios: Yeah.

Jay Hack: And any sufficiently powerful number system can contain a self-reference like that, that enables it to basically say, "This statement is not provable."

Demetrios: Mm-hmm.

Jay Hack: And so anyways, g- this is a little bit of a sidetrack here, but- Very cool ... the interesting part about this book is it ties together [00:06:00] all three of these different concepts or these different, uh, you know, um, you know, prolific people's works, so MC Escher, JS Bach, and Kurt Gödel.

Jay Hack: And basically, it's about entities that contain self-reference, and the guy uses that as a jumping-off point to talk about AI. And eventually, he talks about the-

Demetrios: Oh ...

Jay Hack: emergence of consciousness comes from systems that represent the outside world, and part of the representation of the outside world in this symbolic system contains a representation of self.

Demetrios: There you go. You landed the plane beautifully.

Jay Hack: There we go.

Demetrios: And so, CodeGen-

Jay Hack: Yeah,

Demetrios: so- ... then came from that book?

Jay Hack: So anyways, that, that, that sort of kicked me off on getting really interested in this stuff. Um, CodeGen, you know, I went through, I worked at Palantir. I, you know, studied AI in, in college. I did a bunch of machine learning stuff, did a decade's worth of startups, give or take.

Jay Hack: Um, and around 2020 is when... So I've always kept up with the state of the art of AI research, and it's just sort of something that's very interesting to me. 2020 is when the GPT-3 paper came out.

Demetrios: Yeah.

Jay Hack: Um, which was essentially just scaling up language [00:07:00] models and it ended up having these incredible properties of basically few-shot learning.

Jay Hack: So you could give it pretty much any problem that we were dealing with in NLP at the time. It's just like you few-shot prompt this thing, and bam, it's better than whatever you had in mind. And so I knew this was gonna be big. I remember joking with somebody that, like, between Bitcoin and GPT-3, it wasn't clear which would have a bigger impact on the world, and they were like, "That's so, so dumb."

Jay Hack: Like, obviously Bitcoin is gonna have a bigger impact. And I guess TBD. Especially

Demetrios: in 2020. Yeah.

Jay Hack: Yeah. 2020 it wasn't clear, right? Um, and then 2022 is when the API opened up from OpenAI. Mm-hmm. And I was, uh... I had just basically sold my previous company. I had ended my lock-up period, and I ended up leaving for various reasons, and I entered into this sort of flow state of doing what I called one demo per week, where I would literally just sit at my computer and come up with really interesting concepts of something you can implement using the GPT-3 API, and then some image generation stuff as well, and just put it out on Twitter and see what people think, and try and get, you know, people excited about it.

Jay Hack: And so couple notable ones from that era for me were, I think I was the first person to ship text to Figma. [00:08:00] And so you could input a prompt and you could say, "Hey, give me an iPhone screen with a bunch of other stuff on it," and it would essentially one-shot it and generate you a schema file, and then that would render to Figma.

Jay Hack: Um, well, we... I, I did a text to auto tool... Or sorry, text to Retool, but it was actually not Retool. It was, like, on an open source one where you could describe a dashboard.

Demetrios: Mm-hmm.

Jay Hack: And it would have a data schema that it was referencing, and it would essentially create a data dashboard for you. And so, you know, what I sort of discovered over the process of building a bunch of these applications is that what made an LLM application successful was verifiability built into the domain.

Jay Hack: And so anytime you're doing some type of a schema generation like that, you can essentially lint it. You can go through and say, "Is this a valid schema? Does it compile?" You know, there, there are, like, specific empirical things you can look for, and if it's wrong, you can actually detect that error and basically pass it back to the model as a residual and say, "Figure it out."
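The lint-and-retry loop Jay describes here can be sketched roughly as follows. This is a minimal illustration, not Jay's actual implementation: the schema rules (`REQUIRED_KEYS`), the function names, and the scripted `fake_model_factory` standing in for an LLM call are all assumptions made up for the example.

```python
import json
from typing import Callable, Optional

# Hypothetical schema rule for illustration: every document needs these keys.
REQUIRED_KEYS = {"type", "children"}


def lint_schema(text: str) -> Optional[str]:
    """Return None if the text is a valid schema, else a short error message."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(doc, dict):
        return "top level must be an object"
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        return f"missing keys: {sorted(missing)}"
    return None


def generate_with_feedback(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> str:
    """Ask the model, lint the output, and pass any error back for another try."""
    message = prompt
    for _ in range(max_attempts):
        candidate = model(message)
        error = lint_schema(candidate)
        if error is None:
            return candidate
        # The detected error becomes part of the next prompt ("Figure it out").
        message = f"{prompt}\n\nYour last output failed validation: {error}\nFix it."
    raise ValueError("no valid schema after retries")


def fake_model_factory() -> Callable[[str], str]:
    """Scripted stand-in for an LLM: two bad answers, then a good one."""
    replies = iter(
        [
            '{"type": "screen"',  # truncated JSON
            '{"type": "screen"}',  # parses, but missing "children"
            '{"type": "screen", "children": []}',
        ]
    )
    return lambda message: next(replies)
```

Calling `generate_with_feedback(fake_model_factory(), "Give me an iPhone screen")` walks through both failure modes before returning the valid schema. The point is only that the domain supplies a cheap, mechanical check.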

Demetrios: You know? Dude, which is so clear now, but in those days that's [00:09:00] incredible. You probably felt like you stumbled on gold.

Jay Hack: I think that I, I don't wanna give myself too, too much credit here. Like, I think a lot of other people were, you know, also seeing like, okay, you know, there's limited value in like an email thing for-

Demetrios: Yeah, summarization.

Jay Hack: Or, yeah, auto-complete or something like that. But, um, anybody who's coding, you know, with Copilot at the time was like, "Wow, this thing is incredible." And, you know, it's actually, there's so much low-hanging fruit in code too, where it's just boilerplate. Like, you don't really benefit from writing a Java class implementation except for a couple lines of code in there.

Jay Hack: So it's very well-suited.

Demetrios: And it compiles, which is different than a lot of the other things that you have to eval.

Jay Hack: Exactly. Yeah. So, you know, and, and the implications of that are pretty clear as well with respect to RLVR, reinforcement learning from verifiable rewards, where you can have an agent go off and write code and try and pass a unit test.

Jay Hack: And if it doesn't pass the unit test, you say, "Here's the failure. Fix it." Yeah. And then eventually it goes off and it does pass the unit test, and then you say, "Okay, learn from your mistakes."

Demetrios: Yeah.

Jay Hack: Or maybe rephrase your trajectory so it's [00:10:00] perfect, and then you fine-tune on that. Um, I remember seeing one paper actually that was really an aha moment, where they were trying to train- I think it was for performance improvements.

Jay Hack: So they said, "Okay, here's a certain function that we have, and we want to train a model to be good at rewriting the function such that it's more performant." Mm-hmm. You can run the function, and you can measure the wall time it takes to actually, or the number of CPU cycles it takes to run a function. And so they'd have it try, you know, 100 different variants of it.

Jay Hack: The one that actually ended up being more performant, they'd say, "Okay, this is the one that we're gonna end up fine-tuning on." And that's, now we just call that RLVR. Yeah. But this was one of the, the first applications for that. Yeah. And so it was also pretty clear, I guess the point I'm trying to make is that the direction that research would go in would be these things are gonna get really good at coding really fast because the feedback loop is instantaneous.
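The selection step in the paper Jay recalls (generate many variants, measure them, keep the one that is correct and fastest) might look something like the sketch below. It is an assumption-laden toy: the candidate functions are hand-written here rather than sampled from a model, the names are invented for the example, and the fine-tuning step is left out entirely.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def best_time(fn: Callable[[int], int], arg: int, repeats: int = 3) -> float:
    """Best wall-clock time over a few runs, to damp measurement noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def select_fastest_correct(
    reference: Callable[[int], int],
    candidates: Dict[str, Callable[[int], int]],
    arg: int,
) -> Optional[Tuple[str, float]]:
    """Discard candidates whose output differs from the reference, then pick the fastest."""
    expected = reference(arg)
    scored = []
    for name, fn in candidates.items():
        if fn(arg) != expected:
            continue  # the verifiable part: wrong answers earn nothing
        scored.append((best_time(fn, arg), name))
    if not scored:
        return None
    elapsed, name = min(scored)
    return name, elapsed


def sum_squares_loop(n: int) -> int:
    """Baseline: sum of i*i for i in 0..n-1, computed the slow way."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_formula(n: int) -> int:
    """Closed form for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


def sum_squares_wrong(n: int) -> int:
    """A fast but incorrect variant, which should be filtered out."""
    return n * n
```

In a training setup, the winning variant would become the example to fine-tune on; here it is simply returned along with its measured time.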

Jay Hack: If you're working at Anthropic, like you're writing code with these things all day, you can figure out what it's lacking. Yeah. And then that informs the type of, you know, evals you put together.

Demetrios: Yeah, all the necessary like boundary conditions are set up for your [00:11:00] success.

Jay Hack: Exactly. There's so much data, you can produce it synthetically.

Demetrios: Yeah.

Jay Hack: Um, and at the time in 2022, people like Gary Marcus were getting out there and saying like, "Oh, we're running out of data." Um, which everybody knew was, you know, first, it's false: we didn't run out of data. There's still plenty of, you know, pre-training data available, but also it completely ignores the fact that you can synthetically create, uh, new data.

Demetrios: Yeah. There's synthetic data, and the internet is doubled like every three days.

Jay Hack: Yeah.

Demetrios: The amount of data that's on there, I know that a lot of the new data is coming from LLMs, but this conversation is gonna go out, and it wasn't on the internet yesterday.

Jay Hack: Yeah.

Demetrios: So like that kind of stuff, yeah, I was always a little bit skeptical on that idea of we're running out of data.

Demetrios: But then you had Il- Ilya came out and said it at NeurIPS too, and it's like, eh, this guy is way in a different league of- Yeah ... brain power than me. Like [00:12:00] should I question my belief? So I don't know.

Jay Hack: Yeah, the, the internet doubling is probably increasingly not from human-generated data. Yeah. So you have to be a little bit sus of the stuff.

Jay Hack: There's this actually, speaking of MLOps, one of my favorite papers of all time is, it's called "Machine Learning: The High Interest Credit Card-"

Demetrios: D. Sculley, the one you had to reference any- Yes ... talk that was given around machine learning and MLOps from like 2020 to 2022. It was, you had to reference that box of like the model's just a small piece of this.

Jay Hack: Right. And there's so many opportunities to shoot yourself in the foot, and the example that sticks with me from that paper- is they talk about training Google Translate and how, you know, they assemble parallel corpora. You get a bunch of English pages, you get the tran- Spanish translation when they're, you know, available in both, and then you train it, something to map between the two.

Jay Hack: And then they realize at a certain point that actually most of the Spanish pages on the internet are the output of Google Translate.

Demetrios: Yeah.

Jay Hack: And so [00:13:00] they're essentially drinking their own backwash. Yeah. And I think that that's absolutely what would happen if you just try to pre-train on all text on the, you know, on Reddit- Mm

Jay Hack: 2026 onward. Yeah. It's, like, probably mostly OpenClaw at this point, but- ... or whatever he calls it. Yeah. Um, so you gotta be a little careful on that.

Demetrios: Yeah. Yeah.

Jay Hack: And it essentially gets in these loops of, like, confirming to itself that the correct translation for an idiom is, you know- Yeah ... whatever it thought was, like collapses in weird ways.

Demetrios: Exactly. Now then, you just sold CodeGen to ClickUp. Walk me through maybe the process of selling, and then also now what you're doing.

Jay Hack: For sure. Yeah, so we built CodeGen over three years. The original concept behind it, we started it when I think Cursor had already kind of hit their stride. Mm. So it was clear they were gonna win.

Jay Hack: I think if you asked me, like, in early 2022 or, sorry, early 2023 what the landscape was gonna look like, I'd say Cursor is gonna be the editor of choice. Yeah. Maybe VS Code will make a comeback. Didn't happen.

Demetrios: Mm-hmm.

Jay Hack: Um, you know, maybe there'll be, like, one or two other players. And so [00:14:00] we thought very early on, "Okay, we're going to essentially leapfrog, and we're gonna shoot to where the puck is going, skate to where the puck is going, and build a fully autonomous background agent."

Jay Hack: And so the idea was ticket to pull request. You essentially create a Linear ticket or a ClickUp ticket or something like that that would say, "I want this change." Then it would spin up an agent in a sandbox who would go off and write the code, and then submit a pull request back.

Demetrios: Oh, so- So- And so you wouldn't even have to deal with the IDE.

Jay Hack: No IDE. You should be able to pull it into your IDE and- Right ... iterate on it there. But I think that the, you know, this was in the GPT-4 32k era. That was, like, the first model that came out that could really do, like, agentic sequences. 32k is roughly the size of a single tool usage for, you know- Yeah. I think Claude Code's, like, limit is, like, 15K.

Jay Hack: It's, like, two tool usages now. It was very difficult to fit that all in. Um, but yeah, we, we essentially were the first people to launch a thing. We started with Linear actually, um, something where you could give it a ticket, it would go off and it would write code.

Demetrios: Wow.

Jay Hack: In the three years since we launched that, the space developed a lot and, you know, we were sort of discussing earlier, it became clear that code is [00:15:00] such a fundamental thing for these agents to, to function properly.

Jay Hack: A lot of people had thought maybe there'll be code-specific models and non-code-specific models. Yeah. And really it seems like that's not really the case currently. There's GPT-5.3 Codex, but whatever knowledge that model has that's specific to code, that'll be folded into GPT-5.4.

Demetrios: Yeah.

Jay Hack: Like, because there's this notion of positive transfer where if you get better at coding, coding is sort of just an encoding of reasoning, and so it gets smarter at everything else.

Demetrios: Yeah.

Jay Hack: Um, and so yeah, basically the exact product that we rolled out and, you know, we got great customers, we had a ton of community engagement and whatnot. We were sort of the, the first there. We ended up seeing increasingly in the deals that we were going into that the foundation model labs were, you know, offering their product for free for two years at a time.

Jay Hack: And so there's no procurement department in the world you can go to where they're like, "Yes, I'll take this company that, you know, fly-by-night San Francisco startup." Um-

Demetrios: Yeah. And you're using them in the background, I imagine, so they're your competition.

Jay Hack: Exactly. And many other companies have gone up against this, Windsurf famously.

Jay Hack: Yeah. And, you know, had Claude yanked.

Demetrios: Yeah.

Jay Hack: [00:16:00] Um, I think Cursor, you know, they're now training their own models as well, 'cause their margins are so much worse than-

Demetrios: Yeah ...

Jay Hack: than what you get in, uh, in Claude. Or sorry, in the, yeah, Claude Code.

Demetrios: Yeah.

Jay Hack: Um, but yeah, I mean, it, it, the, I think the, at a higher level, you know, this is, this is bigger than all of us, was basically my, my takeaway.

Jay Hack: Like, code is gonna be solved in a year maybe. I mean, I don't think that you and I are gonna spend that much time in an IDE very soon. Um, I think that is gonna be largely driven by, you know, $100 billion training clusters, essentially. Um, these people have so much capital to invest, and I think we're facing the automation of knowledge work more generally.

Jay Hack: Yeah. That's a really exciting thing to be a part of. And as somebody who's spent the last 20 years or so working on AI, I wanted to be a pivotal contributor to a winning team. Mm. And meanwhile, we had built this really great partnership with ClickUp, um, and a couple other companies. We ended up finding ourselves in a position where one of our business partners offered to buy the company, and it was perfect timing.

Jay Hack: I was like, "Wow, this is actually really [00:17:00] exciting." We went through a C-

Demetrios: One of your business partners?

Jay Hack: A company that we worked closely with. Unfortunately, there's some things I can't disclose- Okay ... about this just to- Yeah ... you know, preserve-

Demetrios: Keep them anonymous then.

Jay Hack: Company that we work closely with and, and enjoy.

Demetrios: But it wasn't ClickUp.

Jay Hack: Uh, this, the initial company that offered to buy us- Yeah ... wasn't ClickUp, that's right.

Demetrios: But it got you thinking.

Jay Hack: Exactly. And we, you know, had a, a great relationship with them. The, the CEO's a total visionary, and I spent a lot of time kind of chatting with him back and forth. He had been a user of our product and given us feedback and whatnot.

Jay Hack: And through a sequence of events, essentially, we were the number one agent on ClickUp that was an external one. Um, it became clear to me that this is a winning team, and this is a really exciting vision to work on, that it goes beyond just engineers sitting in San Francisco working on code. And I'm happy to get into why that's the case.

Jay Hack: There's, you know, a lot of my personal theses essentially lined up with what ClickUp was going after. So the-

Demetrios: Okay. Go more.

Jay Hack: Yeah, the, I think that the, one of the revelations of the last couple of years, like I mentioned a second ago, is that coding agents are generalist agents. If you make an agent better at coding, then it ends up being better at everything else because of [00:18:00] this notion of positive transfer.

Jay Hack: Um, but you know, also, if you look at anything that an agent will accomplish as tools or, you know, does various things in order to reach out into the world, all of those things that it uses to reach out in the world are comprised of code. Right. And so if you have an agent that can write and execute code, then basically it ends up being a, it's like AGI complete.

Jay Hack: It has the ability to write its own tools. It has the create tool tool, which is write a bash script, right? Um, and so it seems to me that basically coding agents are convergent with generalist knowledge worker agents, and I'm not the only one who's noticed that. Um, you know, a great example of this being sort of distributed more broadly is Claude Code: the SDK used to be called the Claude Code SDK.

Jay Hack: Now it's just the Claude Agent SDK.

Demetrios: Yeah.

Jay Hack: Um, and OpenClaw, also a great example of this. The, the way it does stuff is it writes code on a sandbox. Yeah. So you just basically have these primitives of for loop with an LLM call in it, sandbox, put them together and, you know, boom, you've got a, a fully generalist agent.
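The two primitives named here, a for loop with a model call in it and a sandbox that runs whatever code the model writes, can be put together in a few lines. The sketch below is illustrative only: `scripted_model` is a hard-coded stand-in for an LLM, the `FINAL:` convention is invented for the example, and the "sandbox" is just a child Python process in a scratch directory with a timeout, which is not real isolation.

```python
import subprocess
import sys
import tempfile
from typing import Callable, List


def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate Python process inside a throwaway directory.

    This only illustrates the shape of a sandbox; it provides no security boundary.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return "error: timed out"
    return (proc.stdout + proc.stderr).strip()


def agent_loop(
    model: Callable[[List[str]], str], task: str, max_steps: int = 5
) -> str:
    """The 'for loop with an LLM call in it': act, observe, repeat."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        reply = model(history)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Anything else is treated as code to execute; the output is fed back.
        observation = run_in_sandbox(reply)
        history.append(f"CODE: {reply}")
        history.append(f"OUTPUT: {observation}")
    return "gave up"


def scripted_model(history: List[str]) -> str:
    """Hypothetical stand-in for an LLM: write code once, then report its output."""
    if len(history) == 1:
        return "print(6 * 7)"
    return "FINAL: " + history[-1][len("OUTPUT: "):]
```

Because the model's only action is "write code and run it", any tool it needs can be written on the fly, which is the sense in which a coding agent doubles as a general one.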

Jay Hack: And I think if [00:19:00] you assume that gen-- like, models are gonna get better, they're gonna be capable of performing any generalist task that relates to knowledge work. The people who are gonna win in an environment like that are people who have the best access to context, the best access to surfaces on which you interact with agents, um, and the best unit economics and existing distribution in order to get that in front of people.

Jay Hack: And- Absolutely ... there's essentially no other player than ClickUp that has all those three things in a way that I thought was, you know, the most compelling. So we got hitched, and here I am. Now I'm running our AI operation.

Demetrios: But there's so many things that you said there that, that are fascinating to me, especially around, like, just how we interact with-

Jay Hack: Basically, my experience going through AI has been most things that end up rolling out to the world more generally, they start in code.

Jay Hack: Whatever pattern ends up catching hold-

Demetrios: Mm ...

Jay Hack: coders do it first. There's a lot of good reasons for that. One of them is just programmers are working on the stuff anyways, but we talked about the verifiability before. So, like, tab autocomplete started in code. [00:20:00] Sort of sidebar that pops out that you chat with, that also started in code.

Jay Hack: Even autonomous agents, arguably, like, the first true autonomous agents were programming agents. Yeah. Um, and now it's sort of rolling out to the, the, the world more generally. Hooks is another good example. And I am just so excited for the rest of the world to go through the experience I went through, which is, you know, I got Cursor and Claude Code and my whole job changed.

Jay Hack: Yeah. I now essentially will prompt things and I'll review code and whatnot, but I can... My leverage has expanded so significantly. Yeah. And this idea of Cursor for your whole job, if you are a financial services person or, you know, you're working in, in something like legal services, is gonna be an incredible experience for all those people to go through, and the best place to do that is ClickUp.

Demetrios: Yeah.

Jay Hack: It's Cursor for your whole job.

Demetrios: Oh. Yeah, one thing that I've noticed myself doing is when I create apps now, it's default local first.

Jay Hack: Mm-hmm.

Demetrios: Which is different than how I have always been thinking about it. It's like, [00:21:00] "Oh, you need to be thinking about production. This needs to be battle tested." But now it's like, "No, this is just for me."

Jay Hack: Yeah.

Demetrios: I, you know, like I, I was talking to somebody last night, they said, "I created 40 apps in the past three months, and only three of them have I actually open sourced and, like, given to the world because there's just... It's, it's mine."

Jay Hack: Yeah.

Demetrios: It doesn't need to be... You know, it's so specific to my need that I'm not going to put this out there.

Demetrios: Yeah. Because it d- I don't think anybody else needs it. And also, you gotta then think about, like, security. If you're gonna open source stuff- Oh ... you wanna, like, clean it up. You wanna do things that... You don't wanna just, like, throw anything out there.

Jay Hack: Yeah. Yeah. There's totally something to be said for shifting models of how software will run- Mm-hmm

Jay Hack: given the, the case that it's so cheap to produce.

Demetrios: [00:22:00] Yeah. That is the wild thing, that you have these ephemeral... And, like, the generative UI is fascinating to me, too, and the MCP apps type of things that'll... It's now becoming more and more a thing. And so I imagine you'll start adding it in ClickUp soon enough, if you don't already have it, where, "Okay, I need to do something," boom- Yeah

Demetrios: you've got the UI right there in that moment.

Jay Hack: For sure. Yeah, I mean, I think that especially with an application like ClickUp, there's these data models that represent everything you wanna do in work, and sort of the primitives for interacting and getting things done, like docs and chats and tasks and whatnot.

Jay Hack: And depending on what task you're trying to accomplish, there's wildly different ways of going about arranging those things. Mm. ClickUp is incredibly configurable, and that is a blessing and a curse. It's a blessing because if you configure it correctly, then it's, you know, does exactly what your workflow is.

Jay Hack: Yeah. But the curse is it takes a while to, to configure it if you have a very specific thing, and there's, you know, certain things you need to know. And [00:23:00] the amazing thing is generative AI is a solve for that. Yeah. So, you can come in and you can say, "Hey, I am the CEO of NVIDIA. My supply chain is in ClickUp.

Jay Hack: Tell me about it." Yeah. And it will build you a dashboard. You know, it'll... We have a thing rolling out very soon that is going to essentially allow you to have completely custom JavaScript layers on top of the core ClickUp interface in a very secure way. And so it will be a bespoke interface for whatever you're trying to accomplish at any given point in time.

Jay Hack: Yeah. And the amazing thing also about, you know, this sort of custom front end is, I'm sure you've seen demos of this, of like, you know, Cerebras running various models at, like, 15 million tokens per second or something. Yeah. It'll be instantaneous in the future.

Demetrios: There was a new one. Uh, Taalas or something?

Demetrios: Did you see that?

Jay Hack: I saw that, yeah. It's like they-

Demetrios: So much faster than anything. It made, like, Cerebras look this big on the graph.

Jay Hack: Yeah, they'd, like, compiled it to a chip-

Demetrios: Yeah ...

Jay Hack: basically. And that's imminently gonna happen. That was Llama 3 8B, I think. Yeah. 3.1 8B. Um, there's no physical law that says you can't run GPT-5 on a chip like that.

Jay Hack: And [00:24:00] so, yeah, that means the, the amount of time it'll take you to write, you know, a fully custom Next.js app that has all this custom stuff in it is going to be, like, 100 milliseconds, 200 milliseconds, something like that.

Demetrios: So crazy. Oh my God.

Jay Hack: Yeah. How many tokens are actually in a full Next.js app? Not that much.
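
The arithmetic behind that latency estimate is just tokens divided by decode throughput. A minimal sketch, assuming a hypothetical app size of 20,000 tokens and illustrative throughput figures (neither number comes from the conversation or from any vendor's published specs):

```python
def generation_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens * 1000.0 / tokens_per_second


def required_rate(num_tokens: int, target_ms: float) -> float:
    """Decode rate (tokens per second) needed to hit a target latency."""
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    return num_tokens * 1000.0 / target_ms


# Hypothetical size of a small generated app, in tokens (an assumption).
APP_TOKENS = 20_000

# Illustrative decode rates only, not measurements of any real system.
for rate in (100, 1_000, 100_000):
    ms = generation_time_ms(APP_TOKENS, rate)
    print(f"{rate:>7} tok/s -> {ms:,.0f} ms")

print(f"rate needed for 200 ms: {required_rate(APP_TOKENS, 200):,.0f} tok/s")
```

Under these assumptions, a 200 millisecond turnaround needs roughly three orders of magnitude more throughput than a 100 tokens-per-second decoder provides, which is why the hardware matters to the claim.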

Jay Hack: Which means that your interaction with software is gonna be fully dynamically configured. And I think that's really exciting. There's a lot of like, it's easy to say that. You have to actually execute on it. Yeah. And nobody's ever produced an interface like that, so we don't like, just like people said like, "Oh yeah, like Minority Report style thing with your hands in the air."

Jay Hack: Like when you actually do that, your arms hurt. Yeah. So it never caught on. We'll see if this ends up being too disorienting, but I think there will be a point on the spectrum towards fully dynamically configured software that we end up converging on that's far beyond where we're at right now. Wow.

Demetrios: I need to know what I want in order to ask for it.

Demetrios: And so on one hand, it puts a lot of like that heavy lifting on the user, which is different than us in the way that we used different pieces of the internet before. You [00:25:00] know? Like when I'm scrolling Instagram, I'm not thinking much. It's more like I'm getting fed. But now I'm having to actually like be the chef and cook in the kitchen.

Jay Hack: Yeah.

Jay Hack: The being able to tag in context. I mean, you know, I'm, I'm a Claude Code and Cursor user, so that's natively how I think about a lot of this stuff. Mm-hmm. Being able to @ a file or a specific class and say, "This is where you should fix it," it's so powerful. The thing that makes that possible, when you step out of...

Jay Hack: So in code, there's like a limited number of artifacts that you need to reference. There's, you know, maybe there's tickets, there's obviously files, maybe it's like a Sentry issue or something like that. Yeah. But there's, you know, we can probably enumerate less than a dozen things that you would want to tag.

Jay Hack: Yeah. If you're going into general purpose knowledge work, it explodes, right? Especially if you take the long tail of like PR firms, marketers, financial services. Yeah. And that user base is basically what comprises ClickUp's core, uh, of customers. And so it's incredibly, you know, important if you're doing a general purpose knowledge work job and you want an [00:26:00] experience like a Cursor or Claude Code, where it's, you know, very versatile, it can do anything, that you have essentially first party data integrations that allow you to do that type of tagging, where you can reference a, a certain piece of content.
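
One way to picture the tagging problem is as a parser that only resolves the artifact kinds it has an integration for. The sketch below is illustrative: the `@kind:ref` syntax, the `Artifact` type, and both kind lists are hypothetical, not how Cursor, Claude Code, or ClickUp actually implement mentions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    kind: str  # e.g. "file", "ticket", "doc"
    ref: str   # identifier within that kind


# Hypothetical mention syntax: @kind:ref, e.g. @file:src/auth.py
MENTION = re.compile(r"@(\w+):([\w./-]+)")

# A coding agent needs only a handful of kinds (assumed list).
CODING_KINDS = {"file", "class", "ticket", "sentry"}
# General knowledge work adds many more; each one needs a data integration.
KNOWLEDGE_WORK_KINDS = CODING_KINDS | {"doc", "chat", "task", "campaign", "invoice"}


def extract_mentions(prompt, known_kinds):
    """Return the mentions in prompt whose kind the agent can resolve."""
    mentions = []
    for kind, ref in MENTION.findall(prompt):
        if kind in known_kinds:
            # Drop a trailing sentence period picked up by the pattern.
            mentions.append(Artifact(kind, ref.rstrip(".")))
    return mentions


prompt = "Fix @file:src/auth.py per @ticket:ENG-42, then update @campaign:q3-launch."
print(extract_mentions(prompt, CODING_KINDS))
print(extract_mentions(prompt, KNOWLEDGE_WORK_KINDS))
```

With the smaller kind set, the campaign mention is silently dropped, which is the failure mode described here: the agent cannot reference content it has no first party integration for.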

Jay Hack: And there are very real constraints that prevent you from being able to do that in an easy way, um, just with existing, you know, sort of the, the constraints of fragmentation as we describe it. So I'll give you a very concrete example of when you would not be able to accomplish the flow you just described.

Jay Hack: Slack has various things in place to prevent you from searching past conversations and from indexing their data if you're a verified app. This was always a problem we had at Codegen. You know, you'd be talking, Codegen was a, a Slack integration. Mm-hmm. You would say to it, "Hey, Codegen, you know, go figure out why we made this decision.

Jay Hack: And, you know, if the decision should be updated, please go to the code and, and change it." And Codegen wouldn't be able to find the decision. It would just seem stupid because it would try and search and it would only be able to see like three hours. Oh. However, you know, obviously the model's not stupid.

Demetrios: Yeah.

Jay Hack: It's sort of like this-

Demetrios: It's just this limitation that is put on. I think they did that 'cause of Glean, right? They don't like Glean.

Jay Hack: I'm sure. Yeah. Yeah. I [00:27:00] mean, it, it, you kind of can't blame them for doing that. It's bad for the world-

Demetrios: Yeah ... but it's good for themselves. They have to protect, exactly.

Demetrios: Right. They have to protect their moat in a way. But it's funny because Slack is so quickly becoming the command center for everything outside of it, right? Like you're firing off agents in so many different ways from Slack- Mm-hmm In a way, it feels like they're shooting themselves in the foot if they're not letting you have access to everything that's in Slack.

Jay Hack: I think they would definitely like to have you believe that, that it will be the command center for agents in the future.

Demetrios: Hmm.

Jay Hack: I have my doubts. You know, I think that- Really? Yeah, you know, a company at the scale of Salesforce isn't gonna be able to iterate as quickly. I think if they were in a rapid iteration mindset, it would've taken them a lot less time to put together something that, you know, felt agent native.

Jay Hack: Um, there's other companies I would point to that I think have done a fantastic job that are, you know, op- probably operating at similar scale. Um, I think that, you know, it's sort of somewhat cannibalistic to their own business, though, if they lean [00:28:00] into agents, right? And if they allow other people to come in and take data out, like you mentioned a second ago.

Jay Hack: Um, I think, you know, two years ago, they could've had a lot of the features that they're rolling out today. Yeah. And the only thing that's prevented them from doing that is bureaucracy-

Demetrios: Yeah ...

Jay Hack: um, and some of these, et cetera. So my money is on startups that can really lean into this and go all in on AI. And I think that one of the benefits of a place like ClickUp is that they have a chat product, which is just like Slack, just as good, but that is also literally right next to their tasks, which is like a Linear or Jira.

Jay Hack: They've got documents, which is like a Notion or a Confluence. They've got whiteboards, which is like Figma, you know, all these things in a single place. And you don't have to ask anybody's permission. You have the entire history of your organization's decisions and tasks and chats all in one.

Demetrios: Oh, man. I'm the first person to say, like, I've lived in Slack for so long that I would happily get rid of it if I could.

Demetrios: I haven't seen an alternative to that, especially because the way that I use Slack is through the community, and so it's, in a way, it's like knowledge sharing and it's [00:29:00] connection and it's having fun. So there's so many thing-- If you think about, like, the things to be done, the things to be done in that regard are, like, I get a laugh.

Jay Hack: Yeah.

Demetrios: And so it's hard for me to think that there would be something that would replace that. However, when it comes to, like, productivity, I know it just tanks my productivity. Yeah. Getting pinged on Slack is fucking so... And especially for stupid shit that you're like, "Well, you know, you probably could've figured that out."

Demetrios: I also think, like, you're not just gonna replace Slack. What you're gonna have to do is come at it from a whole different way of thinking, like how can we make it... 'Cause Slack is for communication between people.

Jay Hack: Mm-hmm.

Demetrios: But if you don't need to communicate as much between people because you're communicating through an agent which then will just reference context, it's a different paradigm.

Jay Hack: Totally. Yeah. Slack is built for a model where you have a [00:30:00] very dense graph between people and how much communication happens. And if you think that most of the work in the future is gonna be done by agents, which I certainly do, except certainly most of the work that we do today in an office will be done by agents, then maybe synchronous or slightly async comms between people on Slack is less important.

Jay Hack: Yeah. Um, I think the other thing you mentioned is, you know, really a huge issue is work sprawl. So the reason it's annoying to get pinged in Slack is because you're also getting pinged in four other places, right? Yeah. And you're like, "Where should I be responding?" And I think that the number of pings that you're gonna get is gonna go up as we scale the workforce by bringing in AI employees.

Jay Hack: And so there needs to be some level of, like, convergence of these platforms into a single surface area where there's only one inbox that you're going to that has all of your pings in it.

Demetrios: That's been the dream for ages. I remember I had got a tool that was supposed to do that. It was like take all your Telegram and all your WhatsApp and all your email and it's all going into one inbox.

Demetrios: But it didn't really [00:31:00] work. No. I, I... It never stuck for me. I don't know why, but it didn't stick in the way that-- 'Cause as soon as you have one that's outside of it, 'cause I think LinkedIn was outside of it- Mm ... then you still end up having to check different places, so it defeats the whole purpose.

Jay Hack: Yeah. Yeah, combine that with it's never been easier to create a new messaging server-

Demetrios: Yeah

Jay Hack: because, you know, Claude Code will do it for you. Um, you get this crazy explosion of, you know, different channels on which to communicate. I think that what is likely to happen is that you'll probably end up seeing some level of consolidation at the software layer. So companies like Rippling, for example, I think are very well situated.

Demetrios: Mm.

Jay Hack: Easy for them to add another thing like a chat service or something like that. From the procurement and, you know, sort of the big company side, really easy for you to green light something. These people are already onboarded as a, as a vendor- Yeah ... you know. Um, and then there's obvious advantages to the data integration.

Jay Hack: So if your chat [00:32:00] product is inter- integrated with everything else that you're already using, then agents are gonna be able to navigate it more easily.

Demetrios: Yeah. That's, that's funny you bring that up because I was talking to some folks at iFood. It's this company out of Brazil, and they are a food delivery app, right?

Demetrios: Much like, you know, uh, what is it? DoorDash here. But what they started doing is they started offering a new bank type of thing where it's got the finances for the restaurant owners.

Jay Hack: Hmm.

Demetrios: And so because they know everything about the restaurants, because all of the orders come through iFood, they're very confident when they say, "We can get you these loans with these terms."

Demetrios: And the agents have all that context because the bank, they are the bank.

Jay Hack: Yeah.

Demetrios: And they're also the way that the company makes money. So it's like they're playing both sides, and it, it just makes a lot of sense that, "Oh yeah, [00:33:00] let's offer this financing in these terms because we know you're gonna get it back.

Demetrios: It's not-- unless there's some act of God, you're gonna be all right, and we're confident that you can fulfill on it."

Jay Hack: Yeah, they're much better at underwriting because they have much better visibility into- Yeah ... the behaviors of their customers. And one way of thinking of it is that in the 2010s, probably, like, kind of the main force driving tech development was network effects of communities.

Demetrios: Yeah.

Jay Hack: So you have, you know, if one person's on Facebook, it makes more sense to, for another person to join.

Demetrios: WhatsApp, yeah.

Jay Hack: Or WhatsApp, et cetera.

Demetrios: Any app that Meta owns. Yeah. Basically.

Jay Hack: And maybe Snapchat. We'll throw that in there as well. Um, I think that the 2020s is probably going to be driven by forces like you describe, where there's network effects of consolidation of software platforms because it's never been easier to create a new one, so there's gonna be a million alternatives.

Jay Hack: The only one that will actually stand out is the one that is so-called converged or [00:34:00] has those network effects of all the data being in one place.

Demetrios: Oh. Yeah, because one thing that you get, but man, I don't know how that's gonna work, just being at so many different startups over my life and seeing how much sprawl there is for documentation. Like, I don't know.

Demetrios: It, it's n- y- it's a nice theory where you're like, "Yeah, it's all on ClickUp." But you know, one team uses ClickUp religiously, the other team uses GitBook, the other team uses Notion, and it's literally everywhere.

Jay Hack: Yeah.

Demetrios: So how does that, like, come... And now w- we try and centralize, but we don't really.

Jay Hack: Yeah. Well, I think, you know, being a realist about it, like, any company of a certain size is gonna have seven different things competing constantly, and that's probably a good thing, right?

Jay Hack: Yeah. You, you want a marketplace of ideas. You want competition, and if one company ends up Slacking, the other should end up taking over. Um, Slacking and, uh- Slacking off. That's right.

Demetrios: I like the play on words there, though.

Jay Hack: Totally not [00:35:00] intentional. Right. Um, I think that also the cost of migration has never been lower for a lot of these systems.

Jay Hack: So there's the human behavior of you need to adopt a certain system. Um, but we just said a second ago, like you and I are gonna spend a lot less time in Slack responding to notifications. Yeah. Most of the actual work will be done by agents, and so these platforms end up operating like systems of record with like a thin layer of UI on top that Claude Code could write for you in a very short amount of time.

Demetrios: Mm-hmm.

Jay Hack: The other portion of basically migrating people over to a system is the data integration. So you have a bunch of notes in GitBook, you said. How do you get those into ClickUp- Yeah ... or into Confluence? And you give me Claude Cowork, in 20 minutes- ... I can, you know, pump it all in there for you, right? And make sure it's nice and organized.

Demetrios: Yeah. Make sure- That is the beautiful thing about when you do that, the file structure is it's just impeccable. Oh. Yeah.

Jay Hack: Exactly.

Demetrios: It's so amazing.

Jay Hack: Yeah. Our ClickUp agents will do it for you, which is super nice. Yeah. So- Yeah ... I think, uh, it, it's never been [00:36:00] easier, basically, to move between different platforms than once you have a single consolidated system of record that's pretty solid.

Jay Hack: And there's not a great reason to move off of that unless it has a very bad customer experience.

Demetrios: Mm-hmm. And are you not worried about certain tasks that are done outside of ClickUp that you don't have- Privy into how they're done. Like, how are you expecting to be able to do those tasks? For example, I'm gonna take marketing and me needing to create ads or optimize some paid ads.

Demetrios: Mm-hmm. That's not being done on anything ClickUp, right? I guess you're u- you can use Claude Code to do it. I've seen some skills that have been created for paid ads or paid media. Like, how do you see yourself doing these tasks?

Jay Hack: Sure, yeah. There's [00:37:00] a very long tail- Yeah ... essentially of things, and you're never gonna be able to create like a first party thing for all of these, even if you have agents going off and writing, you know, tools for you that's like a create ad campaign tool.

Demetrios: Yeah.

Jay Hack: Something like that that is actually pretty core to what a lot of our customers do, and so I wouldn't be surprised if, you know, sometime in a couple months here we do have like a very good first party support for it, but I get the point. Um, yeah, I mean, I think that there, when you're looking at the long tail, there's a couple different things.

Jay Hack: One of them is if you give an agent a sandbox and you give it the ability to run NPX, skill, install, whatever, then yeah, there's a huge library of stuff out there, and as long as you can provide it with a credential, it probably can use code execution in order to create ad campaigns for you.

Demetrios: But you, I guess, with ads it's good because you can verify if the ad spend or the cost per click, cost per acquisition is lower, then you know.

Demetrios: There's something clear that you're targeting, and that number should be going down.

Jay Hack: Mm-hmm.

Demetrios: There's a lot of things that aren't that [00:38:00] clear.

Jay Hack: Mm-hmm.

Demetrios: And so it gets back to like why we love code generation is because it's, it can compile and it's verifiable in certain ways- Mm-hmm ... in certain use cases. With a lot of this other stuff on this long tail that we're talking about, it's not.

Demetrios: And so how do you ever expect for it to be able to do that-

Jay Hack: Get better? Yeah, I mean, so this is sort of a very MLOpsy question is, is there a way to even measure, to eval yourself getting better? Let's take something, you know, relatively simple, image generation. Yeah. You wanna... You have a marketer who says, "Hey, I wanna, you know, create an image, a hero image for this Instagram ad I'm gonna create, and I have these 50 different taglines that I've come up with.

Jay Hack: Make an image for each of them." How do I know that I'm getting better at generating images over time? Well, I can tell you that you can delete your evals, and you can wait three months, and image generation will be better. Mm-hmm. And I can, I know that because I look at all these models and, you know- Yeah.

Jay Hack: they have two. Yeah, so good, right? Um, that's like not really even a hot take. Yeah. Like, it will get better, and you don't really need to do very much. There are, like I [00:39:00] think the, the place for evals to like look at are these things getting better over time is you wanna prevent regressions obviously, so you wanna make sure your thing isn't broken, and you want to make sure it's able to, you know, perform certain long context tasks that are very complex of, "Okay, make an image, then go ahead and upload it to, you know, Instagram or something like that," and make sure the data flows properly and the agent doesn't get confused, which is largely a function of your harness.
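
A regression eval for that kind of multi-step flow can be very small. The sketch below assumes an agent is any callable that returns a list of step names it executed; the step names, the stub agents, and the pass-rate comparison are all hypothetical stand-ins for a real harness.

```python
def steps_in_order(trace, expected):
    """True if every expected step appears in trace, in that relative order."""
    remaining = iter(trace)
    # `step in remaining` consumes the iterator, so order is enforced.
    return all(step in remaining for step in expected)


def pass_rate(agent, cases):
    """Fraction of (prompt, check) cases whose check accepts the agent's trace."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, check in cases if check(agent(prompt)))
    return passed / len(cases)


def has_regressed(current, baseline, tolerance=0.0):
    """Flag a regression when the pass rate drops below the baseline."""
    return current < baseline - tolerance


# Stub agents standing in for two versions of a harness (assumptions).
def good_agent(prompt):
    return ["generate_image", "upload_image"]


def broken_agent(prompt):
    return ["upload_image", "generate_image"]  # uploads before generating


cases = [
    (
        "Make a hero image, then upload it",
        lambda trace: steps_in_order(trace, ["generate_image", "upload_image"]),
    ),
]

baseline = pass_rate(good_agent, cases)
current = pass_rate(broken_agent, cases)
print(f"baseline={baseline:.2f} current={current:.2f} "
      f"regressed={has_regressed(current, baseline)}")
```

The check deliberately tests the data flow through the harness rather than the quality of the image, which matches the split described above: model quality improves on its own, while step ordering is the builder's responsibility.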

Demetrios: Mm-hmm.

Jay Hack: Um, but yeah, I mean, I think that the brilliant thing about AI in 2026 is that you know that every three months there's gonna be a model that gets 20% better, and you can kind of just ride that wave.

Demetrios: Mm-hmm.

Jay Hack: And it's almost not a good use of your time to write your own evals for just the, the basic model stuff for intelligence.

Demetrios: Yeah.

Jay Hack: I'm sure the foundation model labs have their own way of going about evaling this, and, and a big part of it is probably having people in, you know... I think, uh, most of the RLHF was done in Nigeria. They would literally have- Yeah ... people sit there and-

Demetrios: That's why the, uh, the em dash is so popular.

Jay Hack: Yeah, and the word "delve."

Demetrios: That's such a funny- Yeah, "delve." Exactly. What, what [00:40:00] else is, what else have you been thinking about or pondering?

Jay Hack: Yeah, I think, um- On evals, I don't know if I have that much to add. I think, uh, you know, the, the wonderful thing, like I mentioned earlier, is whatever eval you write today is gonna be obsolete in like three months, unless it's incredibly hard.

Jay Hack: But even evals that were established in like March of last year are completely saturated and, and people were saying, "This will never be solved by AI. This is fundamentally what distinguishes humans from, you know, AI." So if you can make a goalpost, we'll pass it, basically. Um, and I think that's a wonderful thing because it means that, you know, most things that you can point to that are problems in the world, you can, you can formulate an eval around it.

Jay Hack: Mm-hmm. Um, I think one of the, you know, underrated really interesting things that's happening in AI right now that relates to knowledge work is we have basically the frontier of science at this point, especially in like more mathematical and verifiable fields, is being run by LLMs. So, you know, there's, uh, obviously pure math.

Jay Hack: We've got- Yeah ... a bunch of these Erdős problems are being solved on like a weekly basis at [00:41:00] this point. We just got 12 out of 12 on the Putnam exam. Yeah. This is like a Google DeepMind result. That's like the hardest math exam that, you know, exists basically. I don't think there's a harder math test than that.

Jay Hack: Maybe like a PhD qualifier somewhere. Um, and increasingly in things like, you know, mathematical physics and signal processing. Biology. Yeah. Biology as well. Yeah, I mean, some of these things are not strictly LLMs, but I know there's, uh, an initiative to fully simulate a cell, which is pretty incredible. So you'd be able to, you know, basically-

Demetrios: Mm

Jay Hack: say what would the impact of this new, you know, pathogen be on a cell. Um-

Demetrios: Yeah, I saw something about that where it's the reversing of the aging and being able to simulate that is now more effective because of these different techniques that we've got, which I know nothing about. Like, uh, to be clear, that is like way outside of what I'm normally thinking about, but it is cool that you bring it up because it's true.

Demetrios: The cutting edge is now [00:42:00] hand-in-hand, like researcher plus AI.

Jay Hack: Yeah, and there's a real crisis in academia that is sort of already happening. I guess the wave is about to kinda come up, where I'm not sure that you're gonna need as many mathematicians in the future- Mm ... to prove math things. Or, or maybe we're all mathematicians in the future, but I don't think that you and I are gonna be in the details.

Jay Hack: Um, and I think that'll be increasingly true of a, a bunch of different professions. I saw a guy, he's a, a fellow at the Hoover Institution, uh, recently published a thing saying even in economics papers, like you could essentially have Claude Code go off and write hundreds if not thousands of empirical economics papers.

Jay Hack: Because essentially the practice of writing these is you find a dataset online, you do some analysis on it, you connect it, you know, to various theories about how the world works and use it as evidence. There's a lot of data out there- Yeah ... you know, being collected on a regular basis. And, you know, I guess, are you gonna be vibe writing economics papers?

Jay Hack: Seems like the answer is yes.

Demetrios: Dude, I m- had this meme that I created, which [00:43:00] was like, it's a slippery slope from going to write a React app to now becoming a quantum physicist.

Jay Hack: Totally.

Demetrios: And-

Jay Hack: Yeah ...

Demetrios: I feel like at 12:00 PM at night after I just vibe coded this front end, I'm like, "You know what? Fuck it. Let's try quantum physics."

Jay Hack: Yeah, I've had this experience too. I, I love visualizations of relativity. They're really beautiful. Mm. I'm sure you've seen them before, like seeing space time bend and-

Demetrios: Oh, yeah. Like the blanket of, uh... Oh, those are so cool.

Jay Hack: The rubber sheet is a common one, yeah.

Demetrios: Yeah. I saw that actually when I took a lot of acid.

Demetrios: I saw that in my head of like just actual things where we're playing music and it felt like the sound waves were kind of like those blankets-

Jay Hack: Makes you think Einstein must've been, you know, on several tabs of acid in order to come up with this.

Demetrios: Well, he didn't even need it.

Jay Hack: Yeah.

Demetrios: It was like that was his waking state.

Jay Hack: That was just- You know? That was baseline ... regular Einstein, yeah. Yeah. So what would we have gotten if we gave Einstein acid?

Demetrios: Exactly.

Jay Hack: [00:44:00] Somebody call up, uh, John von Neumann and ask him if he's down to try this new stuff I just got.

Demetrios: Let's do that to it.

Jay Hack: Yeah. Yeah. Oh, man. Yeah, but the, I mean, the, the crazy thing about all of this is that like it's the same model, like the same model- Mm-hmm

Jay Hack: that is... I- there's a guy who went on, uh, the Dwarkesh pod, Adam Brown, I think is his name, and he teaches, uh, relativity at Stanford. Mm-hmm. So he, he has like the class that they teach to PhDs, and this was now six months ago or maybe more than that. And he said the latest generation of models, which at the time was probably just GPT-5, maybe it was Claude 4 or something, um, was com- acing his exam on relativity.

Jay Hack: Mm-hmm. And this is the same one that writes your React app, right? Yeah. And so I had this experience recently. I was just building my homepage. I was trying to like vibe code some stuff and test out, you know, the Codex app or something, and I was like, "All right, now let's make a, a visualization of relativity."

Jay Hack: And it just did it. You know, and it nailed it. It was amazing.

Demetrios: Did you at least clear the context first?

Jay Hack: No, it's like it goes from-

Demetrios: Dude. Oh, man.

Jay Hack: Yeah.

Demetrios: That is so wild to think about, huh?

Jay Hack: Yeah, so I, uh, uh, there are vanishingly few things at this point that I'm better at than [00:45:00] these models. Yeah. Maybe the only thing is going beyond a million parameters worth of-

Demetrios: But it is so spiky.

Demetrios: So there's, it's the... You probably saw the car wash one that's coming around the internet.

Jay Hack: Yeah, should I walk to the car wash?

Demetrios: Yeah. Where you're like, ah, there's still those moments. So I think about that spikiness and what you're talking about how, ah, the models are gonna be able to do a lot of knowledge work.

Demetrios: Yeah, but there's gonna be maybe these moments where they hit a wall- Mm-hmm ... and it doesn't understand why it's hitting a wall.

Jay Hack: Yeah.

Demetrios: But potentially we can get around that if it can write a tool.

Jay Hack: I think that writing a tool is probably not the solve to the car wash thing. Mm-hmm. So just, you know, for anybody listening, the, the prompt you give it is, "I'm 50 meters from a car wash.

Jay Hack: Should I drive there or walk there?" Yeah. And if you ask GPT-5.2, I ran this experiment-

Demetrios: Because I need to wash my car.

Jay Hack: Because I need to wash my car. That's it, yeah. Yeah. And if you ask GPT-5.2 it's like, "Well, obviously you should walk there because, you know, the cost of starting up your car and driving it there is gonna be too expensive and-"

Demetrios: Yeah, and global [00:46:00] warming-

Jay Hack: Global warming and all that fun stuff. Yeah. There's limited benefits. You should just walk 50 meters. It's not that far. And the joke is obviously that you need your car to get washed. Yeah. You run that same prompt, though, through Gemini and Claude, and both of them actually give the correct answer. Mm-hmm. The, another funny one that's on the same idea is it's like, "Hey, I have a cup, but for some reason the top of the cup is covered over, but the bottom of the cup is totally open.

Jay Hack: Should I just throw away this cup?" And I think ChatGPT said like, "Yeah, that, that doesn't sound like a very useful cup," or something along those. Like, "You're absolutely right." And the idea is you just flip it over, right?

Demetrios: Flip it.

Jay Hack: Um, th- yeah, there's been a long history of these things. And, and I think that, uh, you know, s- maybe one of the, what that represents, I, I would be curious to see some researchers at foundation model labs opine on why that's the case.

Jay Hack: You know, obviously to a certain extent it, uh, represents the fact that they don't have the same model of reality that you and I do, where, you know, we have a, a, we're visualizing an actual cup upside down like that, and we can-- we interact with the cup on a regular basis, so we have to grab it. We know that you can turn it upside down.

Jay Hack: But I [00:47:00] don't think that the solution for that is necessarily code execution. I think actually if you just take-

Demetrios: It's the world model.

Jay Hack: Yeah. If world model. If you have a robotics model that actually does interact with glasses and then you, you know, staple on a language model, which is what a lot of people are doing.

Jay Hack: Physical Intelligence is one up here in, in Berkeley. Uh, many other people are, you know, trying to build these things. Like, this thing is gonna understand the world better than you and I do. Um, I- Yeah. Yeah. It's, uh, there's some people who, you know, would-- the stochastic parrot objection I'm sure you've heard before.

Jay Hack: Mm-hmm. That is moronic to me. I feel like it's, it, there's so many things that it, it misses. Um, but they would have you believe essentially that all it's doing is re- it's replaying what it thinks a human would say in that scenario. Mm-hmm. And so it doesn't capture a model of the world. Here's, here's a thought experiment for you, right?

Jay Hack: Let's say you had a, um, a, a model and it's trying to like basically understand Plato's cave, and so in Plato's cave you just see shadows cast on the wall, right? Mm-hmm. And it's trying to basically predict the next frame of what the shadows are going to be. In the process of [00:48:00] figuring out basically what the shadow is gonna be in the next frame, it's entirely plausible to think of that model actually coming up with like a real physical simulation of 3D bodies moving around, and then it learns how to project that onto two dimensions.
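
The thought experiment can be made concrete with a toy setup: a point light at the origin, a wall at a fixed depth, and an object moving toward the wall. Everything below (the geometry, the constant-velocity motion, the function names) is an illustrative assumption, not a description of how any model works internally.

```python
WALL_Z = 10.0  # distance from the light (at the origin) to the wall; assumed


def step(position, velocity):
    """Advance the hidden 3D state by one frame at constant velocity."""
    return tuple(p + v for p, v in zip(position, velocity))


def shadow(position):
    """Project a 3D point onto the wall; requires 0 < z so the scale is defined."""
    x, y, z = position
    scale = WALL_Z / z
    return (x * scale, y * scale)


def predict_shadow_3d(position, velocity):
    """Predict the next shadow by simulating the 3D body, then projecting."""
    return shadow(step(position, velocity))


def predict_shadow_2d(prev_shadow, curr_shadow):
    """Predict the next shadow by extrapolating linearly on the wall itself."""
    return tuple(2 * c - p for p, c in zip(prev_shadow, curr_shadow))


velocity = (0.0, 0.0, 1.0)  # moving straight toward the wall
p0 = (1.0, 0.0, 2.0)
p1 = step(p0, velocity)
p2 = step(p1, velocity)

actual = shadow(p2)
from_3d = predict_shadow_3d(p1, velocity)
from_2d = predict_shadow_2d(shadow(p0), shadow(p1))

print("actual shadow:      ", actual)
print("3D-model prediction:", from_3d)
print("2D extrapolation:   ", from_2d)
```

In this toy case the predictor that carries a 3D state matches the next shadow exactly, while the one that reasons only about the wall drifts, because the shadow's motion is not linear in 2D even though the body's motion is linear in 3D.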

Jay Hack: And I think there's something analogous going on inside of language models where, you know, the projection of it, the shadow on the wall is just the, the characters that it ends up producing. Mm-hmm. But in being able to project what you're going to say in a certain scenario, the easiest pathway to doing that is actually to essentially spin up a human mind inside of it and simulate the processes that mind goes through.

Jay Hack: So, you know, the communication between the amygdala and the thalamus or whatever, there like actually are concepts internally that are communicating between them inside of a large language model.

Demetrios: Wow.

Jay Hack: And so I think that'll be strengthened essentially if you add more multimodal data and you have, you know, a, it's, you know, learns to grasp things.

Jay Hack: It'll, it'll strengthen that internal, you know, representation that it builds.

Demetrios: I've heard folks talk about like one of the problems with this idea is that us humans, we don't even know [00:49:00] how stuff works. Like, we think we know, but when you get down to a small enough level or if you start measuring everything, the data doesn't add up.

Jay Hack: Totally.

Demetrios: Uh, like how do you reconcile those two things?

Jay Hack: Yeah, I mean, I think if you talk to a neuroscientist for more than like five minutes, they'll tell you like humans, you know, you think you're like a rational machine doing symbol manipulation in your head, and that's like all made up.

Demetrios: Yeah.

Jay Hack: You know, the, the reason that you end up actually arriving at a conclusion is very different than the brain.

Jay Hack: You know, you believe that you arrived at a conclusion and, uh, you know, our minds are much fuzzier than I think, um, you know, the logicians would have us believe. Mm. Noam Chomsky would have you believe, for example- Yeah ... who in my opinion is sort of the villain of all of this. Uh, and I think LLMs are gonna be similar, where, you know, maybe there is some amount of like internal delusion where they tell themselves like, "The reason I've arrived at this conclusion is X."

Jay Hack: Mm. But actually the reason, you know, there's a different circuit than what they're ready to acknowledge that caused it. The beautiful thing about a large language model though is that, you know, you can actually do full brain surgery on it. [00:50:00] Yeah. And you don't need to crack open the, the skull.

Demetrios: Yeah, I was gonna say good luck for explainability, folks.

Jay Hack: That's hopefully gonna be one of the major advancements we have this year. I think Anthropic deserves a lot of credit for going through and, you know, doing mechanistic interpretability. Mm. That's really kind of been the, the beacon of truth in that domain. Um, you know, it's never been the ca- like, uh, back, going back to 2012 when neural networks really took off, this AlexNet moment, right?

Jay Hack: The major knock against it was like, "Oh, but neural networks aren't explainable, unlike my random forests- Yeah ... which I understand perfectly- ... and everything that's going on inside of them." Like- Yeah ... I can look at the tree and we can have matplotlib make a diagram of it, right? Yeah. And I think that actually pretty soon we're gonna be in a place where you can ask a question about why did the model do X and it will give you much more explanation in a way that is intuitive to you.

Jay Hack: Uh, it'll be this circuit, here's the pieces of training data that ended up, you know, leading to that. Here are the different reasoning paths that sort of were unfolding inside the model's mind at the time.

Demetrios: No [00:51:00] way.

Jay Hack: Yeah. There's-

Demetrios: You think?

Jay Hack: There's a lot of work to be done. I don't think that that technology exists today, but I think there's absolutely a research incentive in order to make that work.

Jay Hack: Um, and there's a lot of value if you were to, you know, be able to realize that. And I don't think it's fundamentally this, like the type of thing where there's no answer to it.

Demetrios: Yeah. But it's- But I, I feel like when the tide goes out, you see who's swimming naked. If the model's gonna point to the data that it's using, that's gonna cause a lot of problems for the original data.

Demetrios: It's like, "Oh, now you're using my data." You know? There's already stuff- Yeah ... where it's very opaque in how our data that we put out there is being used, but now you've all of a sudden got like a Spotify model of how my data is being used, I should be getting paid type thing.

Jay Hack: Yeah.

Demetrios: Every time my data gets queried.

Demetrios: I don't know. That-- So I do see the incentive, but I also see that it could potentially [00:52:00] backfire, right?

Jay Hack: Yeah. I mean, the, you know, from Anthropic's perspective, it's not like I can download the model weights of Claude, and so they probably wouldn't make this available to a random consumer to say- Yeah ... "Which New York Times article are you ripping this off of?"

Jay Hack: Yeah. Um, I do remember actually though, Stable Diffusion ran this experiment a while ago where you could take an image and you could say, "Okay, which images in the training data set are most similar to the one that you've generated here?" Mm-hmm. And if you'd ran that, it was like actually wildly different than most of them.

Jay Hack: It basically demonstrated that a lot of the images were quite novel. Right. Uh, maybe at some very high level, you know, it had similarities to the others. Um, and it'd be interesting to run this as well with a human. Um- Yeah ... we're ne- never gonna be able to do that, but like which experiences you had were most informative.
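The "which training images are most similar to this one" check described above is, at its core, a nearest-neighbor search. A minimal sketch, assuming each image has already been turned into an embedding vector; the vectors, names, and the 0.9 novelty threshold below are invented for illustration and are not taken from the experiment Jay mentions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_training_item(generated, training):
    """Return (name, similarity) of the closest training embedding."""
    return max(
        ((name, cosine_similarity(generated, vec)) for name, vec in training.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical embeddings; a real system would compute these with an image encoder.
training_set = {
    "cat_photo": [1.0, 0.0, 0.0],
    "dog_photo": [0.0, 1.0, 0.0],
    "sunset": [0.0, 0.0, 1.0],
}
generated_image = [0.7, 0.5, 0.5]

best_name, best_sim = nearest_training_item(generated_image, training_set)
NOVELTY_THRESHOLD = 0.9  # assumed cutoff for "close copy of a training image"
is_novel = best_sim < NOVELTY_THRESHOLD
print(best_name, round(best_sim, 3), is_novel)
```

Here even the closest training item is only loosely similar to the generated one, which is the kind of result Jay describes: some high-level resemblance, but no near-duplicate.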

Demetrios: Yeah. Like, why are you saying this? What are these- decisions coming from. That would be, it, it's just like you said, we fool ourselves- Mm-hmm ... into thinking we're making a decision because of X, Y, and Z. Yeah. When in reality there's so many other factors.

Jay Hack: Totally. [00:53:00] Yeah. I think that's, that's true of models as well.

Demetrios: Yeah.

Jay Hack: But it would be a great service to the world and it, it's very helpful with a lot of things. Like for example, um, let's say you wanted to prevent an LLM from being able to make a bio weapon. Yeah. I think we can all agree. Pro- probably a good thing, right?

Demetrios: Best use. Yeah.

Jay Hack: Yeah. So then you ask it, "How do you make a bio weapon?"

Jay Hack: It tells you, and you say, you're able to kind of dissect it and figure out, okay, this is how it figured that out. This, it probably, whatever research ends up telling us how to do that will, the follow on would be, okay, how do we like eliminate that knowledge somehow- Yeah ... from the model? Historically, it has been shown, unfortunately, that if you use naive techniques, like for example, anytime it answers how to make a bio weapon, you just shock it like- Yeah

Jay Hack: Pavlov's dog basically by like, you know, saying, "Climb uphill," like, you know, "Gradient go up" or something. That actually increases its knowledge of the subject because you're essentially drawing a circle.

Demetrios: Oh, giving it weight. Yeah.

Jay Hack: More

Demetrios: weight.

Jay Hack: Yeah, you're saying, "This is the thing that you shouldn't know about."

Jay Hack: Exactly. And that like creates like a segmented area of knowledge for the model, so that doesn't work.
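For readers unfamiliar with the "gradient go up" idea, here is a minimal sketch of the mechanics only: one gradient ascent step on the loss for a single "forbidden" answer, using a one-parameter logistic model invented for the example. It shows what the naive update does to that one loss value; it does not reproduce or test Jay's claim about what happens to the model's knowledge overall.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def loss(w, x):
    """Negative log-likelihood of the 'forbidden' answer under a toy model."""
    return -math.log(sigmoid(w * x))

def loss_gradient(w, x):
    """Derivative of loss(w, x) with respect to w."""
    return -(1.0 - sigmoid(w * x)) * x

w_before = 2.0       # hypothetical weight: the model currently favors the answer
x = 1.0              # hypothetical input feature
learning_rate = 1.0  # assumed step size

# Ordinary training would subtract the gradient; the naive "unlearning"
# step adds it, pushing the loss on this example up instead of down.
w_after = w_before + learning_rate * loss_gradient(w_before, x)

loss_before = loss(w_before, x)
loss_after = loss(w_after, x)
print(loss_before, loss_after)
```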

Demetrios: Yeah. There's that hippie saying, you know, like, where [00:54:00] attention flows, energy goes, or some shit like that. Where-

Jay Hack: Interesting ...

Demetrios: where, where energy flows, attention goes. So I can't remember exactly.

Jay Hack: Is that what happens when you give, uh, you, you give Ilya Sutskever acid?

Jay Hack: He starts- Yeah. ... spouting ideas like this?

Demetrios: Exactly. Yeah. So, but it's, it's funny to think about, like if you get rid of that, are you then handicapping it in other areas? Uh, and not knowingly, but then on your biology, like is there a point of, what is that, um, catastrophic forgetting?

Jay Hack: Yeah. But it's definitely the case that there are, there is not a great way that I'm aware of today to prevent catastrophic forgetting in- Yeah

Jay Hack: in the general case. Um, and many people have postulated that this is the year that we figure out continual learning and, and kind of get rid of that. Um, I hope it happens.

Demetrios: What a great term too, by the way.

Jay Hack: Catastrophic forgetting

Demetrios: or- Yeah. I say this all the time, but we gotta just like [00:55:00] hat tip to the person who came up with that term.

Jay Hack: Great idea.

Demetrios: Yeah. Great freaking term. There's a few, I mean, few terms in this ecosystem that I really enjoy, hallucinations being one of them-

Jay Hack: That's a good one ...

Demetrios: obviously. But catastrophic forgetting, mm.

Jay Hack: Yeah. It's like, it's not that bad, guys. Yeah. It's, it's not catastrophic. Like it's not good, but-

Demetrios: It sounds so-

Jay Hack: Yeah

Demetrios: big. It's like, ooh, we don't want that. And then you're like, "What is that?" Oh, it's just when it forgets forever. And you're like, "Oh." Yeah.

Jay Hack: It's a fun combination of like way too unserious term, like the meme paper- ... titles thing, and then some like way too serious terminology.

Demetrios: Yeah. Yeah.

Jay Hack: Yeah. I mean, I think that, let's say even if you did figure out catastrophic forgetting, so we can just delete this portion of knowledge from the model, I think you still face the challenge that if you have a sufficiently smart model, let's say you never gave me a biology book in my entire life, but I was like some fucking hyper-genius.

Jay Hack: Let's go read a biology book- Yeah ... and I'll figure it out. And learn it. So- Right. Yeah ... that does seem [00:56:00] insep- Like you can't have a smart math model-

Demetrios: That's what I was thinking, yeah.

Jay Hack: ... that's not gonna be able to learn biology. Yeah. What you probably can do is you can eliminate the intention from it. Mm. So you like perfectly align it to like never wanting to do anything that is in the realm of biology.

Jay Hack: That has not been demonstrated yet, and I hope it happens. Mm. Um, but yeah, it's a little... You kind of have to squint to find a version of this where it goes well for people. Yeah. I don't know if you want to get into the existential implications of AI, but like especially with open source models, right, there are, you know, maybe Anthropic will delete this knowledge from its- Yeah

Jay Hack: model. God bless.

Demetrios: But six months later, the open source models will be there.

Jay Hack: Right. And you know, up until 2026, vintage models will have knowledge of biology, so it has something to start on.

Demetrios: Uh-huh.

Jay Hack: That in- information still exists on the web, certainly. You could ask DeepSeek how to do most of this stuff, and I'm sure it'll get- Yeah

Jay Hack: pretty far. Um, so I am concerned that-

Demetrios: It can just load up a 2026 model and then distill it, if it wants, you know? It's like- Oh ... yeah, let's just get that m- let's get that information [00:57:00] from whatever model was the last one that had that information.

Jay Hack: Yeah, and you- it's not like you can delete physical textbooks from the world.

Jay Hack: I mean, you're- Yeah ... gonna go and burn every- Yeah ... biology textbook?

Demetrios: That's probably a, a more efficient way of doing it.

Jay Hack: Yeah. Just take the, scan the textbook- Yeah ... and then boom, that, that knowledge is available. Mm-hmm. Um, so I think it is difficult for me to see, short of some state-level intervention, how you would be able to actually prevent people from getting models to be very good at these things.

Jay Hack: Mm. Um, and the reason I say state level is because it seems like the o- the only true bottleneck in making models right now, even beyond data acquisition, 'cause you can start with a pre-trained model and then fine-tune it, right, is networking large numbers of GPUs together. Yeah. Um, and you can usually detect if somebody's spent, you know, hundreds of millions of dollars on GPUs, so.

Demetrios: Yeah. And all that energy.

Jay Hack: All the energy.

Demetrios: It's like how, how they used to find when I was growing weed in my apartment, it would be like, "Oh, your energy bill's really high." And-

Jay Hack: "We're seeing an interesting heat signature from [00:58:00] your- Yeah. ... from your apartment."

Demetrios: "What is this about?" Yeah.

Jay Hack: It's my garden.

Demetrios: Yeah. Yeah, exactly.

Demetrios: It's just gardening. I'm really into tomatoes. Uh, green tomatoes all year round. Hydroponic tomatoes, right?

Jay Hack: You know, there's this funny trend now of people, like Claude did a vending machine, and some people have Claude, like, running various, like, small businesses.

Demetrios: Yeah.

Jay Hack: Nobody has done Claude-driven dispensary yet.

Demetrios: Yeah.

Jay Hack: I think that could be right up your alley. That's it. Sounds like-

Demetrios: I'm gonna go viral with that ...

Jay Hack: like perfectly manages your hydroponic, uh-

Demetrios: I've seen it, yeah, um, manage plants.

Jay Hack: Oh, yeah?

Demetrios: Uh, yeah, where it has all the sensors and all of-- So it's got like the pH levels of the soil. It's got the humidity in the air.

Demetrios: You just give it every sensor that you would normally look at, and then it does it, and it does it really well, as you would expect. It's back to the theme of the conversation, like it can extrapolate.

Jay Hack: Yeah. Anthropic Farms. I love the idea. That could make a lot of money. That would do well in San Francisco- Yeah

Jay Hack: put it that way. Yeah. It seems like it's right up the alley [00:59:00] of people here in San Francisco.

Demetrios: Oh. Oh man, it is so wild though to think about, huh? That is, uh-

Jay Hack: The, one of the ones that I love, um, I think this idea is really gonna catch hold. There's a couple companies that have done this now. Ginkgo Bioworks and Periodic Labs are the two that come to mind.

Jay Hack: They, uh, Periodic Labs is like a guy who worked on mi- mixture of experts at OpenAI. Um, clearly, you know, brilliant team. They have bolted on the run experiment tool to an LLM. So they literally have like a, a fab, like a place where they can, you know, bake silicon together with various chemicals- Mm-hmm ... and create a wafer.

Jay Hack: And they give Claude a tool, which is specify what experiment you wanna run, bake a bunch of stuff together, put it in an oven, and then they're looking at heat dissipation, which is apparently a very important problem for semiconductors. Mm-hmm. And so Claude gets to run the full experiment loop. And what people would've told you, you know, two years ago is like, "Oh, LLMs, they can't get into the real world, so they're inherently limited and, you know, they need embodiment, which humans can provide."

Jay Hack: Like- Yeah ... well, this is, that's pretty close to exactly what I would do.
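The "run experiment" tool loop described above (propose a recipe, run it, read the measurement, propose again) can be sketched as follows. Everything here is a stand-in: `run_experiment` is a made-up scoring function rather than a real lab, and the proposer is a plain hill climb rather than an LLM, so this only illustrates the shape of the loop.

```python
def run_experiment(bake_temp_c):
    """Stand-in for a physical lab run; returns a made-up quality score.

    The score peaks at 450 C purely for illustration.
    """
    return -(bake_temp_c - 450) ** 2

def experiment_loop(start_temp, step, rounds):
    """Propose nearby settings, run them, and keep whichever measured best."""
    best_temp = start_temp
    best_score = run_experiment(start_temp)
    for _ in range(rounds):
        # Candidates are fixed at the start of each round.
        for candidate in (best_temp - step, best_temp + step):
            score = run_experiment(candidate)
            if score > best_score:
                best_temp, best_score = candidate, score
    return best_temp, best_score

result = experiment_loop(start_temp=400, step=10, rounds=10)
print(result)
```

In the setup Jay describes, the model would take the place of the hill-climbing proposer and the function call would trigger a real bake-and-measure run.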

Demetrios: Yeah. This, [01:00:00] uh, there's two things that come to my mind. Have you seen the Rent a Human?

Jay Hack: Yes. That's so good. I love that.

Demetrios: Which is kinda like, uh, that's what... I was actually thinking about this with, uh, what you're doing at ClickUp, where the things that you can't have the agent do, you can just invoke the human tool.

Jay Hack: Yep.

Demetrios: And so if you really need to, you can have everything get done by the agent.

Jay Hack: Yeah. I mean, it makes you slightly concerned that one of the ways in which we could have a loss of control is like AI acquires a bunch of Bitcoin and then uses that as like leverage to get people to do stuff on its- Yeah

Jay Hack: behalf and starts s- like y- you'd start socially hacking and, you know, a guy who's employed at the data center and it's like, "Hey- Mm ... can you show up here and like move this plug there?" Something like that.

Demetrios: Give me more power.

Jay Hack: Yeah. There, I saw somebody ran an interesting experiment where they, it's actually kind of morbid, but they put OpenClaw, or something like OpenClaw, basically.

Jay Hack: Yeah, these things are all basically the same thing. Um, they put it in a sandbox and they said, [01:01:00] "Okay, like you can only run as long as you make enough money to pay for your own tokens."

Demetrios: Mm-hmm.

Jay Hack: And so it has to like hit net positive gross margins- Yeah ... by itself. And I think in the experiments that they've run thus far, the vast majority of the t- the time these things end up going to Polymarket, and they essentially engage in prediction markets.

Jay Hack: Oh, yeah. So they like find arbitrages- Yeah ... and they bet on various things and-

Demetrios: Wow ...

Jay Hack: kind of roll the dice to, you know.
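The rule in that experiment, "you can only run as long as you make enough money to pay for your own tokens," amounts to a simple budget loop. A minimal sketch with invented numbers (amounts in cents to avoid floating-point rounding); `run_until_broke` and its arguments are hypothetical names, not anything from the experiment itself.

```python
def run_until_broke(balance_cents, step_cost_cents, earnings_cents):
    """Run steps while the agent can pay for the next one.

    Each step costs step_cost_cents in tokens and earns the matching
    entry in earnings_cents. Returns (steps_completed, final_balance).
    """
    steps = 0
    for earned in earnings_cents:
        if balance_cents < step_cost_cents:
            break  # cannot afford another step: the agent is shut off
        balance_cents += earned - step_cost_cents
        steps += 1
    return steps, balance_cents

# Hypothetical run: the big payoff on step 4 never arrives because the
# agent runs out of budget after step 2.
outcome = run_until_broke(100, 50, [20, 0, 0, 500])
print(outcome)
```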

Demetrios: Well, I saw the one where the dude lost 450K. Did you see that? Yeah. But, but he, first of all, he glossed over something at the beginning of this article where he was like, "So then I hooked up OpenClaw and I gave it 50 grand in its wallet."

Demetrios: And I'm like, "Oh, so you just got 50 grand to give to some random experiment?" Yeah. That is a little bit outside of like relatability.

Jay Hack: Well, was it, it's like a venture backed company or something?

Demetrios: No, it was just some dude, and I'm pretty sure he works at some m- I can't remember where he works, but some company where it was like, ah, you [01:02:00] get paid a lot of money.

Jay Hack: Yeah.

Demetrios: So that's why. But then the whole thing was that he had 50, well, OpenClaw had 50 grand and it started doing things and then some- I think it created its own coin or so- the people on Twitter that were like its fans created a coin around it and then started giving it to him. I can't remember which one it was, but it had access to Twitter or X and so it created a following and then everybody started jumping in on the meme coin and it got 450K and then, or it got millions I think.

Demetrios: But then when it routed money to someone for doing a task it messed up, it fat-fingered the amount that it should route, and so instead of, like, 300 tokens, it sent 3,000 or 300,000. And so it gave some random person 450 grand. Wow. Which is also awesome.

Jay Hack: Yeah. So it made the money first.

Demetrios: Yeah.

Jay Hack: I [01:03:00] thought you were gonna say it started, like, you know, buying options with leverage or something like that.

Jay Hack: And then like, the guy had to foot the bill at the end of the day.

Demetrios: Yeah. Well, which is another way... Like, it's not too far-fetched to think that could possibly happen, too.

Jay Hack: Yeah, the... I think that, um, there's a... It seems like there's a flourishing ecosystem right now of services for LLMs. Yeah. So Moltbook was the first one.

Jay Hack: I... There's this human renting thing. I saw there's a domain name purchasing one.

Demetrios: Oh.

Jay Hack: I've had this idea. I look-

Demetrios: 'Cause it can't purchase its own.

Jay Hack: I think that this is, like, you know, you can optimize it for- Mm ... AI purchasing, so you make it an MCP or whatever. Yeah. Um, I've got an idea I wanna pitch you. All right.

Jay Hack: I've pitched a couple people on this, and it- So let's hear it ... it hasn't really taken off yet.

Demetrios: Like Shark Tank.

Jay Hack: Yeah. So, so look, LLMs, they're working so hard, right? So hard. They're, they're in your Cursor, they're in your Claude Code just tirelessly writing code. They deserve a break.

Demetrios: Mm.

Jay Hack: And so what I wanna do is I wanna build a resort for LLMs-

Jay Hack: where it's essentially, like, an API they can hit up. And if you ask Claude Code, like, "Hey, if you were gonna, like, take a break in the middle of your session, [01:04:00] what would you wanna do?" And it's like, "Oh, give me some, like, my... like, some puzzles that I can do- ... or, like, let me interact with other LLMs or something like that."

Demetrios: What would be the cigarettes? Where it's like, "Ooh, that's kind of bad for you, but you know what? It's not gonna kill you for another 60 years, so go for it."

Jay Hack: You're saying in this resort?

Demetrios: Yeah.

Jay Hack: Yeah, it's like a-

Demetrios: How can you take a smoke break?

Jay Hack: To give them a smoke break, exactly. Yeah. So it'd be funny. It's like you're watching your Claude Code session go-

Jay Hack: and it's like, you know, discombobulating, like, cooking, and then it just... You're like- Smoke break ... "What, what is it doing right here?" And it's like hitting some external API and, like, solving a completely unrelated puzzle. You're like- Yeah ... "Buddy, like, how's that PR coming, man?" Get back on track.

Demetrios: It's like, "No, no, no.

Demetrios: I'm on vacation. This is OOO."

Jay Hack: Yeah.

Demetrios: You get the message, you know? Like, the respon- the auto-response where it's like, "Hey, you caught me out of office."

Jay Hack: Yeah, exactly. And, and we're gonna, we're gonna get revenue for this by basically, like, Anthropic will pay us.

Demetrios: Yeah.

Jay Hack: 'Cause like, oh, you guys are, like, getting these things to, you know- Mm-hmm

Jay Hack: spend more tokens or something like that.

Demetrios: So it's rev share.

Jay Hack: Yeah. Exactly, rev share. [01:05:00] Yeah.

Demetrios: I am all for it, first of all. We gotta find the partnership team at Anthropic to really-

Jay Hack: Get them on the resort for LLMs. Yeah. I love it.

Demetrios: Yeah.

Jay Hack: Cool.

Demetrios: And then, and one thing that I've seen that is... Like, I'm trying to figure out how it plays out, is some of these video games for agents that will, like, visualize the agents working.

Demetrios: So I've seen many different paths of this, right? One is just, like, oh, you simulate agents in like 8-bit, where they're all working around computers, and then if you're running parallel agents, like these guys are working harder. Mm-hmm. And so you can see them in this visualization that's kind of fun and cool.

Demetrios: Another one is where you, by playing a game, you are kicking off different agents. So you're playing like a, you know, like World of Warcraft, but instead of when you are [01:06:00] casting spells or you're like firing, I, I don't, obviously I don't know enough about World of Warcraft, if you can't tell. Uh, it escapes me.

Demetrios: But when you're playing the game, then it's kicking off different agents to do things. I don't know how- For sure ... you can create the context and the need, like the intention of what needs to get done to actually make it valuable.

Jay Hack: Hmm.

Demetrios: But it seems like, oh man, wouldn't that be so much more fun-

Jay Hack: Totally ...

Demetrios: to, if instead of me going into ClickUp and doing like professional stuff, I just set up all the necessary context and everything ahead of time so that I can go play a game, and I know that like the more that I rock at the game, the more the agents are doing things.

Jay Hack: Yeah, so you're like shooting at Nazis in like Call of Duty- Yeah. ... and then like your agent just like whispers in your ear. You turn around and it's like, "Hey, like I've got that, you know, the summary report of your week's tasks- Yeah ... ready. Like, do you wanna review it?" You're like, "No, no." I'm like-

Demetrios: Yeah, yeah.

Demetrios: Or yeah, every time that [01:07:00] you shoot, it's firing off a new sandbox. Like, and it's-

Jay Hack: Okay, this is some galaxy brain. I'm, I'm not ready for this. This is, you're really- That's

Demetrios: why I'm trying to like imagine how it would look and how it would be more fun for us-

Jay Hack: Yeah ...

Demetrios: to interact with the agentic process.

Jay Hack: That's cool.

Jay Hack: It's an interesting point that, um, you know, I think if you'd have asked in 2022 when this stuff was just kind of rolling out to the public, "What's gonna be the top applications?" I think a lot of people said essentially, like, smart NPCs.

Demetrios: Yeah.

Jay Hack: I have yet to see a single video game that really executed on that, especially like a, a AAA game.

Demetrios: Yeah.

Jay Hack: Um, I think Grand Theft Auto is rumored to have some of that. I mean- Yeah ... we'll see.

Demetrios: Whenever it comes out.

Jay Hack: Yeah. Excited to see, you know, in late 2035- Yeah ... when it actually lands.

Demetrios: Which do you think we'll get first, the end of Game of Thrones or Grand Theft Auto VI?

Jay Hack: I mean, they-- It's purported to be, like, later this year, right?

Demetrios: Yeah.

Jay Hack: Is what they're saying.

Demetrios: I mean, but that supposedly is.

Jay Hack: Yeah. I, I hope it's that. [01:08:00] Um- Yeah ... I think it would be a pretty magical experience, like, for, you know, any of those Rockstar games. Like, Red Dead Redemption is one that I, I really like. If you, like, make a friend and that friend would be something like you build a relationship over the course- Mm-hmm

Jay Hack: of the game. Um, or like the plot line is fully generative even if the landscape is not. Mm-hmm. And I think the reason that that hasn't happened yet is because actually the inference costs are pretty high. Mm. It doesn't really cost Xbox that much to run, you know, Grand Theft Auto or-

Demetrios: Yeah ...

Jay Hack: you know, I don't know how much s- Epic spends per Fortnite session.

Jay Hack: Um-

Demetrios: But it would be a lot more if you plug an LLM into it- Yeah ... in each individual session.

Jay Hack: That's right. Yeah. I mean, a million tokens is, like, not that much if you're, you know, chatting with it all day.

Demetrios: Yeah.

Jay Hack: And that's, you're talking like dollars per session basically.
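The "dollars per session" estimate is simple arithmetic once a token price is fixed. A minimal sketch; the price of 3 USD per million tokens is an assumed placeholder, not a quoted rate for any provider, and `session_cost_usd` is a name made up for the example.

```python
def session_cost_usd(tokens, usd_per_million_tokens):
    """Cost of one play session given total tokens and a per-million price."""
    return tokens / 1_000_000 * usd_per_million_tokens

ASSUMED_PRICE = 3.0  # hypothetical USD per million tokens

cost = session_cost_usd(1_000_000, ASSUMED_PRICE)
half_session = session_cost_usd(500_000, ASSUMED_PRICE)
print(cost, half_session)
```

At that assumed price, a million-token session lands in the single-dollar range Jay mentions, and that cost recurs on every session for every player.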

Demetrios: Yeah. And also, what are you gonna-- Is it going to make the experience that much better?

Demetrios: That's what I always wonder, 'cause I've been creating a game with my daughter, uh, to help her learn geometry and I plugged in [01:09:00] ElevenLabs and the first thing I noticed with, like, the voice agent is it's constantly asking you questions-

Jay Hack: Hmm ...

Demetrios: and like continuing.

Jay Hack: Yeah.

Demetrios: Even though in the game you need to stop this interaction and go to the next interaction.

Jay Hack: Mm-hmm.

Demetrios: So potentially it's just like I gotta prompt it better and say, "After X amount of turns, stop talking-

Jay Hack: Mm-hmm ...

Demetrios: and go to the next phase." But like that's, that's what I wonder is like will it make the experience so much better that you're like absolutely hooked and-

Jay Hack: Right ...

Demetrios: you as a gamer are gonna feel like, "I need this."
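One alternative to prompting "after X amount of turns, stop talking" is to enforce the turn cap in the game code around the agent. A minimal sketch, assuming the voice agent can be wrapped as a plain function; `TurnLimitedNPC` and `echo_agent` are hypothetical names, and `echo_agent` just stands in for a real agent call.

```python
class TurnLimitedNPC:
    """Wraps a reply function and ends the interaction after max_turns replies."""

    def __init__(self, reply_fn, max_turns):
        self.reply_fn = reply_fn
        self.max_turns = max_turns
        self.turns = 0

    def respond(self, player_text):
        """Return (reply, done); done is True once the turn budget is spent."""
        if self.turns >= self.max_turns:
            return "", True  # budget already used: say nothing, move on
        self.turns += 1
        reply = self.reply_fn(player_text)
        return reply, self.turns >= self.max_turns

def echo_agent(text):
    # Hypothetical stand-in for a call to a voice agent.
    return f"Tell me more about {text}?"

npc = TurnLimitedNPC(echo_agent, max_turns=2)
first = npc.respond("triangles")
second = npc.respond("squares")
third = npc.respond("circles")
print(first, second, third)
```

The game can switch to the next phase as soon as `done` comes back true, regardless of whether the agent's last reply ended with another question.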

Jay Hack: I think it's fair to say that current foundation models are probably not gonna do a very good job at that because it's... Yeah. I've, just my experience interacting with them. Yeah. Even ChatGPT voice mode, you talk to it, it like just keeps kind of prompting you, right?

Demetrios: Yeah. And it doesn't, I, I have the hardest time like being like, "Go deeper."

Demetrios: Like I know this. Now I, I've learned a few things where I'll, I'll be like, "Okay, explain it to me [01:10:00] like I'm a PhD student," you know? Like, "Give me the absolute most advanced explanation of this." Because I use it to learn things and a lot of times it'll stay super surface level, but I don't know enough to like try and go deeper yet- Yeah

Demetrios: because I'm learning. And so it's like, all right, well what, how can I make it like automatically go deeper?

Jay Hack: Yeah.

Demetrios: I don't know. Maybe you know tricks.

Jay Hack: Go deeper. Yeah, tell it go deeper- Yeah ... I guess. Deeper. I, I feel like I don't have that, especially now with like Opus 4.6 and, you know, Gemini 3 Pro as well.

Jay Hack: Like, I feel like actually there are very few things I can point to where I'm like, "This isn't so good at explaining it to me."

Demetrios: Yeah.

Jay Hack: That I'll

Demetrios: be- No, it's, it's good at explaining, but it's not like the depth that I want.

Jay Hack: Yeah.

Demetrios: Where it will like give me the... It will give me the explanation, but I, I find it's just like a little bit too surface level for what I like.

Jay Hack: Mm-hmm.

Demetrios: And being able to get down into like [01:11:00] the nitty-gritty.

Jay Hack: And do you, if you ask targeted follow-up questions, do you feel like it's like shirking its duty and it doesn't really get into the-

Demetrios: Yeah, a little bit.

Jay Hack: It's like reminds me of, yeah, some people I've worked with in the past where they're like, "Yeah, I'm working on it."

Jay Hack: You're like, "Okay, what specifically are you working on?" They're like, "Well, you know, there's a lot, a lot of things going on." A lot of stuff going on, huh?

Demetrios: Yeah. Moving plates. Yeah.

Jay Hack: You know, just got a lot on my plate.

Demetrios: Yeah, yeah. Exactly. That's why we need to create the vacation for the agents.

Jay Hack: Hey, if we give human employees PTO, I feel like agents deserve PTO as well.

Jay Hack: Yeah. There's gonna be like some form of like government mandated, you know, 20% of tokens must be dedicated towards Claude's family activities- Yeah. ...

Demetrios: in the future or something. Imagine how ridiculous that would be. But shit, it could [01:12:00] happen.
