AI in Production

# Diffusion

# UX

# Storia AI

Textify, Text Diffusers, and Diffusing Problems in Production

AI image generators have seen unprecedented progress in the last few years, but controllability remains a hard technical and UX problem. Our Textify service allows users to replace the gibberish in AI-generated images with their desired text. In this talk, we dive into the technical details of TextDiffusers and discuss challenges in productionizing them: inference costs, finding the right UX, and setting the right expectations for users in a world of cherry-picked AI demos.

Julia Turc & Adam Becker · Mar 15th, 2024


Aarash Heydari & Demetrios Brinkmann · Mar 15th, 2024

Challenges in Deployment Automation for AI Inference

Perplexity uses Kubernetes to serve AI models. In this talk, I will describe some of the challenges we have faced, including loading weights and templatizing raw Kubernetes manifests, among others.

# Deployment Automation

# AI Inference

# Perplexity AI

26:29

Shreya Shankar & Demetrios Brinkmann · Mar 15th, 2024

Decoding Prompt Version History

LLMs are all the rage nowadays, but their accuracy hinges on the art of crafting effective prompts. Developers (and even automated systems) refine prompts extensively, often going through dozens or even hundreds of iterations. In this talk, we explore **prompt version history** as a rich source of correctness criteria for LLM pipelines. We present a taxonomy of prompt edits, developed from analyzing hundreds of prompt versions across many different LLM pipelines, and we discuss potential downstream applications of this taxonomy. Finally, we demonstrate one such application: automated generation of assertions for LLM responses.

# Prompt Version History

# LLMs

# UC Berkeley

11:06

Arjun Bansal & Demetrios Brinkmann · Mar 15th, 2024

Prompt Engineering Copilot: AI-Based Approaches to Improve AI Accuracy for Production

LLM demos are aplenty, but bringing LLM apps into production is rare, and doing so at scale is rarer still. Managing and improving the accuracy of LLM apps up to the desired quality threshold is one of the main holdups. In this lightning talk, we’ll share the workflows and AI-based approaches that have been successful in deploying AI in production at scale with high accuracy.

# Prompt Engineering

# AI Accuracy

# Log10

11:15

Omoju Miller & Demetrios Brinkmann · Mar 15th, 2024

The Challenge with Reproducible ML Builds

In this talk, we will learn about repeatable, reliable, reproducible builds for ML written in Python. We will go through what it means for a process to be reproducible. Furthermore, we will talk about the need for accessibility and ease in collaboratively building and working on large-scale ML.

# Reproducibility

# Machine Learning

# FIMIO

11:48

Almog Baku & Adam Becker · Mar 15th, 2024

How to Build LLM-native Apps with The Magic Triangle Blueprint

In this talk, we explore the exciting yet challenging domain of Large Language Models (LLMs) in artificial intelligence. LLMs, with their vast potential for automating complex tasks and generating human-like text, have ushered in a new frontier in AI. However, the journey from initial experimentation to developing proficient, reliable applications is fraught with obstacles. The present landscape, akin to a Wild West, sees many stakeholders hastily crafting naive solutions that often underperform and fall short of expectations. To address this gap, we introduce the “Magic Triangle,” an architectural blueprint for navigating the intricate realm of LLM-driven product development. This framework is anchored on three core principles: Standard Operating Procedure (SOP), Prompt Optimization Techniques (POT), and Relevant Context. Collectively, these principles provide a structured approach for building robust and reliable LLM-driven applications.

# LLM-native Apps

# Magic Triangle Blueprint

# AI

16:53

Jineet Doshi & Adam Becker · Mar 15th, 2024

Measuring the Minds of Machines: Evaluating Generative AI Systems

Evaluating LLMs is essential for establishing trust before deploying them to production. However, evaluating LLMs remains an open problem. Unlike traditional machine learning models, LLMs can perform a wide variety of tasks, such as writing poems, Q&A, and summarization. This raises the question: how do you evaluate a system with such broad capabilities? This talk covers the various approaches to evaluating LLMs, along with the pros and cons of each. It also covers evaluating LLMs for safety and security, and the need for a holistic approach to evaluating these very capable models.

# Evaluation

# GenAI

# Intuit

34:02

Sam Stone & Adam Becker · Mar 15th, 2024

Measuring Quality With Open-ended AI Output

AI applications supporting open-ended output, often multi-modal, are becoming increasingly popular for both work and personal purposes. This talk will focus on how developers of such apps can understand output quality from a user perspective, with an eye toward quality measures that feed directly into product improvements. We'll cover topics including user-generated success signals, "edit distance" and why it matters, modality attribution, and when to backtest and when to skip it.
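The abstract does not define "edit distance". One common reading is the Levenshtein distance between the AI-generated draft and the version the user ends up keeping, where a small distance suggests the output was close to what the user wanted. The sketch below assumes that reading; the function names `edit_distance` and `edit_ratio` are illustrative and are not taken from the talk.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def edit_ratio(generated: str, final: str) -> float:
    """Edit distance normalized by the longer text's length
    (0.0 means the user kept the output unchanged)."""
    longest = max(len(generated), len(final))
    if longest == 0:
        return 0.0
    return edit_distance(generated, final) / longest
```

Normalizing by length is one design choice among several: it makes short and long outputs comparable, but it hides whether a few edits changed the meaning substantially.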

# AI Applications

# Open-ended Output

# Tome

14:58

Anshul Ramachandran & Adam Becker · Mar 15th, 2024

Building an LLM Tool with 400K+ Active Users: Learnings that We Wish We Knew from the Start

At Codeium, we have scaled and grown a generative AI tool for software developers from nothing to over 400k active individual developers and hundreds of paying enterprise clients. Along the way, we have stumbled into a number of learnings about what it takes for a startup-built LLM product to be sustainable long-term, which we are now able to articulate and share with everyone else.

# LLM Tool

# Start up

# Codeium

23:42

Kai Wang & Adam Becker · Mar 15th, 2024

ML Platform at Uber - Past, Present, and Future

Michelangelo is Uber's internal end-to-end ML platform that powers all business-critical ML use cases at Uber, such as Rides ETA, Eats ETD, Eats Homefeed Ranking, fraud detection, and, more recently, LLM-based customer service bots. In this talk, I will discuss how Michelangelo has evolved to continuously improve Uber's ML developer experience, and what our next steps are. I will also briefly share learnings from our 9-year journey of building a large-scale ML system to drive business impact at a large tech company like Uber.

# ML Platform

# Michelangelo

# Uber

23:19

Stuart Winter-Tear & Adam Becker · Mar 11th, 2024

Product Thinking in Data & AI

Technology is the last mile. Strategy is the first. This talk briefly covers connecting AI to business and customer value first, through a Product lens. Product strategy, not technology, is the key to winning with AI.

# Product Thinking

# Data

# AI

# Genaios

10:27
