MLOps Community

AI Innovations: The Power of Feature Platforms // MLOps Mini Summit #6

Posted May 08, 2024 | Views 360
# AI Innovations
# Tecton.ai
# Fennel.ai
# meetcleo.com
SPEAKERS
Mahesh Murag
Product Manager @ Tecton
Jose Navarro
MLOps Engineer @ Cleo

MLOps Engineer at Cleo.

Nikhil Garg
Co-founder/CEO @ Fennel

Nikhil is the co-founder/CEO of Fennel, a real-time ML infra company. Previously, he was at Meta, where he led teams building PyTorch and, before that, applied ML teams that handled AI/ML for many product lines across Meta. Outside of work, he is the father of an 18-month-old and a major Eminem fan, in that order.

Ben Epstein
- @ -
Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

Building a Unified Feature Platform for Production AI. Mahesh walks through Tecton’s journey to build a unified feature platform that powers large-scale, real-time AI applications with only Python. He dives into how Tecton has navigated key tradeoffs, like balancing developer experience with scalability and flexibility, while adapting to rapidly evolving data and ML engineering best practices.

Journey of a Made-to-Measure Feature Platform at Cleo. Jose shows how the platform team at Cleo has built a production-ready feature platform using Feast.

DIY: How to Build a Feature Platform at Home. Nikhil decomposes a modern feature platform into its key components, describes a few common options for each component, the gotchas in assembling them, and some key architectural tradeoffs.

TRANSCRIPT

Ben Epstein [00:00:06]: Hey, everybody, welcome to the live stream. We are hitting you back to back, two weeks in a row, because we were gone for a bit and we're trying to come back strong. And this session this week, we're going to be focused all on Python, all on feature stores, feature platforms, feature engineering. So it's going to be a really good session. Super tactical, very helpful if you're trying to build any feature-type pipelines at your company. So I'm really excited for this session. We have Mahesh with us, a product manager from Tecton, and we have Jose with us, an MLOps engineer at Cleo, both talking about how to build feature platforms and how to scale feature engineering. So it's going to be a really good session.

Ben Epstein [00:00:40]: We're going to kick it off with Mahesh to talk about Python based feature stores and then we'll ask some questions, see how that's going, and then we'll kick it over to Jose. So whenever you guys are ready, Mahesh, feel free to kick us off.

Mahesh Murag [00:00:53]: Thanks, Ben, really appreciate it and really happy to be here. Today I'll be talking about Tecton, which is the company that I work at; it's an industry-leading feature platform. And specifically what I'm interested in sharing is that over the past few months and few years, we've arrived at what we think is a really good culmination of our learnings about the feature platform space, specifically our journey to building a platform that helps power large-scale, real-time AI applications with only Python. So there are a bunch of trade-offs that we've navigated over the past few years to get here, balancing things like the developer experience, while also letting people maximize the scalability and the flexibility of their system, all the while adapting to, as all of you know, a very rapidly evolving data and ML engineering space as best practices change pretty much every year. So a quick background on me. I am a product manager here at Tecton managing the compute platform teams, which is very relevant to what we're going to be talking about today. And prior to that, I was at Scale AI, a little bit upstream of what I'm doing now at Tecton, but very deep in helping companies get the best quality annotated data for whatever applications they have. So I have spent a lot of time in the data and ML space, working on problems bridging those two and making sure data can be activated to build the best-performing models.

Mahesh Murag [00:02:25]: Cool. So I'll go ahead and get started. Before we dive in, I want to talk about what a feature platform is. It's kind of a simple explanation if you really think about it. Here's kind of a diagram, left to right, of a few different components that are really important in a feature platform and in any ML team. On the left side, there's data. That's your raw batch historical data, and any streaming data that you might be bringing in in near real time from whatever your data system is.

Mahesh Murag [00:03:00]: Then there's on-demand data or request-time data that is coming in, let's say as soon as a user clicks a button or has some interaction; that's data that's coming in in real time. And then there are features. So now that you have that data coming in and entering your feature platform, you actually want to transform it into something that's more useful, something that's predictive and actually has an impact on the downstream model performance. And then last, but of course not least, is your actual models. So now that you have predictive features, you want to make sure that they can feed into your models, both on the inference side as well as on the training side. So you're training the models with point-in-time accurate data from the past years of user interactions, and then anytime there's an interaction from this point on, you want to bring in the freshest and most accurate data that you've been working on. And a feature platform helps with all of this. It also bridges the gap between the online, which is where your production AI app runs, and the offline, which is where your organization's data science or machine learning engineering teams are actually working, iterating, experimenting, and then productionizing. A feature platform helps bring all of these things together into one place where every feature is reproducible, it's written as code, and it's visible to any team member that's part of your organization.

Mahesh Murag [00:04:31]: Now that we know what a feature platform is, let's talk about what it actually does under the hood. If you're a user using a feature platform, there are really three key components or value propositions that you want to get out of it. The first is defining features. It should be as easy as possible to define a feature or data transformation within your feature platform that's visible to all the users, that's iterable and easy to change, and to quickly get a feedback loop of how this actually impacts your model performance and how you can continue improving on the features that you've defined. Then a really important part, which is near and dear to me and my team, is computing features. This can be really complex. This is orchestrating transformation pipelines, running jobs in different contexts, and making sure they don't blow up or get too expensive. But the core value prop here is turning the raw data that you've defined, or that your data engineers or data scientists have expressed in that first step, into real features that work at scale and are constantly updated.

Mahesh Murag [00:05:40]: Then finally, the most important step is the retrieval. There's retrieval offline, where you have your historical data. You want to train a model, you want to see what was the state of the world at some point, let's say a year ago, and what were the features up to that point. Then there's the online retrieval flow, which is in your application. How can you retrieve and compute features in real time with sub-second latency for super, super high-scale queries per second? But there's another definition of what a feature platform does that Tecton and myself are very interested in solving. And that's the bridge between two different roles within your ML organization. So in an ML org, there are data scientists. These are the folks that are ideating; they're working on new models and features.

Mahesh Murag [00:06:34]: Their whole interest is moving quickly and not being burdened by the typical engineering work of productionization, of uptime, of making sure that these systems are running reliably, they're monitored, et cetera. Their core focus is taking in the data, running experiments, and then defining features that actually make whatever models they're working on more productive and more performant for whatever the end application is. The other role is the engineering team. This can be software engineers, ML engineers, platform engineers. These are the folks that are looking to keep the system up and running, to make sure that the data is as fresh as possible, that there isn't much downtime, that features can be defined as code and can be referenced at any point in time in the future, and then actually having monitoring and CI/CD and all of these engineering best practices built around the actual platform. And typically, historically, over the past few years, these things have been separated. The ML engineering team had to be concerned with how the data science team is working, and actually work really, really tightly in lockstep to make sure that data science's work would actually make it to production. Typically, we saw that enterprises often wouldn't make it to real-time AI just because it was really hard to bridge these two sets of user problems and get to the final state of features being used in real-time AI applications.

Mahesh Murag [00:08:05]: So what we think a feature platform is, and what it actually does, is it acts as a bridge between these two and helps close the gap between the data science persona, someone that's iterating on models and features, and the engineering persona that wants to help the model be successful and stay successful without any downtime into the future. So this is something that we care a lot about here at Tecton. These are two different jobs to be done that we think are super important, and we think a feature platform is the key component of making sure that these folks are successful, so that the end application is also successful. But there are a lot of challenges with it. And the main issue is that data science and software engineering are completely separate worlds with different needs and different expectations. So for a data scientist, something that they care a lot about is the speed of iteration and the ease of iteration. They want access to data in any environment. They want to iterate rapidly on whatever their feature definitions are, the transformations, the kinds of aggregations that they're writing.

Mahesh Murag [00:09:17]: And they would love to be able to do these in notebooks, entirely in Python, ideally without worrying about any complicated data frameworks or engineering systems. They also care about bringing in Python dependencies into wherever they're working. They use pip install a lot to bring in things like pandas or NumPy, all the way to more complicated things like NLP packages or spaCy, and in the modern world, in the past year or two, even more advanced feature engineering: things like embedding models, or ways to transform unstructured data into things that might make your models more performant. But the key thing here is bringing in these external dependencies without having to worry about a really complicated setup. And then last but not least, the most important thing is model quality. Every feature, every new data point that they bring into the system should help make that model performance increasingly better over time. Then there's the software engineering persona.

Mahesh Murag [00:10:18]: This is again a platform team, a machine learning engineering team. These are the folks that are concerned with making sure that these systems stay up and can be reproduced, debuggable, explainable, and that there isn't downtime. And if there is downtime, it should be really easy to see what caused it. So the first thing that they care about is iterating reliably. This includes things like integrating with Git or CI/CD best practices, and making sure that features can live as code and can be committed and iterated on and versioned. The other thing is meeting production performance requirements. So typically these teams have some sort of internal and external SLA guarantee. They want to make sure the system is up and running with however many nines of uptime.

Mahesh Murag [00:11:09]: And then each request that comes in for new features can be served with really reliable latency as well. Finally, they care a lot about the end user experience. So that's stuff like the speed of feature retrieval. That's stuff internally like explainable code, and externally stuff like the best possible application behavior, without having to hang up on feature retrieval and wait for these complex queries to run. So Tecton and feature platforms in general, something that we've been working on over the past few years, is bridging the gap between these two things. Today, I'll talk about how we've done that and, for any organization, especially the enterprise, but also all the way down to new teams that are working together towards a production AI application, what are the things that they should care about. Cool. So the first thing that we'll talk about is that first value prop of defining features, and more importantly, defining features as code. So what this means is, for example, in this screenshot, you can see how a data scientist who's working with Tecton can just import the Tecton library and a couple of additional things like datetime, which might be useful, and then define declaratively what they think the feature behavior should be.

Mahesh Murag [00:12:31]: So in this case, they're defining a couple of aggregations with different time windows of one day, three days, and seven days, and they're defining that transformation. So all that's happening right here in that very last line is they're saying, hey, we care about the user ID, the timestamp, and the amount columns. Let's filter down to that. And then on top of that, we'll do these aggregations that execute with really high performance and can run on a daily basis to make sure that data makes it to production. Then when you're ready, the data scientists and folks that are working with this can just quickly run tecton apply. This integrates with whatever system is integrated with Tecton or your feature platform, and kicks off a CI/CD pipeline to run jobs that validate that the data is connected, actually send this to your Git repo, and make sure that it's all published in one central repository. Then this lets folks reuse all of this and make it shareable. There's a concept of a workspace within Tecton where any user within your organization can fork workspaces, reuse those feature definitions, and materialize data so they don't have to rebuild a lot of this from scratch.
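
To make the aggregation semantics concrete for readers, here is a minimal pandas sketch of what 1-, 3-, and 7-day rolling sums over a user_id/timestamp/amount table compute. This is purely illustrative and is not Tecton's API; the column names and sample data are made up for the example.

```python
# A rough sketch of the *semantics* of the 1/3/7-day aggregations described
# above (not Tecton's API): rolling sums of `amount` per user over time.
import pandas as pd

transactions = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2024-05-01", "2024-05-02", "2024-05-05", "2024-05-03"]
    ),
    "amount": [10.0, 25.0, 5.0, 40.0],
})

# Keep only the columns the feature cares about, as in the talk.
slim = transactions[["user_id", "timestamp", "amount"]]

# Rolling time-window sums per user, evaluated at each event's timestamp.
features = (
    slim.sort_values("timestamp")
        .set_index("timestamp")
        .groupby("user_id")["amount"]
        .rolling("7D")            # swap "1D" / "3D" for the other windows
        .sum()
        .rename("amount_sum_7d")
        .reset_index()
)
print(features)
```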

Mahesh Murag [00:13:49]: What does this help us with? If we go back to this view of the data scientists and the software engineer, this helps the software engineering team with reliable iteration and Git and CI/CD best practices. It makes them really happy, since they can see who's working on features, when they have changed, and what made them change. But the thing is, what about the data scientists? If you introduce a lot of friction into the productionization process, data scientists are probably not going to be very happy, since this gets in the way of that rapid iteration step. So how can we help them as well? Again, going back to that first key value proposition we talked about earlier, data scientists really care about experimentation and moving quickly as they work on new features. We made it super easy to experiment with new features in a notebook with the exact same interface and declarative methods and objects as we just showed in that VS Code view or in your feature repository itself. So it's actually 100% copy-pasteable, from that iteration and experimentation phase in a notebook like this, to when you're ready to productionize: all you have to do is put that into a Python file and then run tecton apply to actually get those into production and into a serving environment.

Mahesh Murag [00:15:12]: So everything is declarative. We can interpret everything you put in the notebook in that production state. And then as a data scientist, you can just really quickly iterate in that notebook, make quick changes to the transformation, and see what the actual impact on your data is in real time. So in this screenshot, it's a bit small, but there's that get-features-in-range SDK call, which immediately shows you what your feature data looks like and what it might actually be computed as in your production environment. So if you're a data scientist, you can move really, really quickly and just spend your time iterating here. And then the very, very last step is to apply it to production. So this looks like a Jupyter notebook, which you can then apply to production and commit to your Git repo to make sure that it's versioned. And then finally, Tecton will take care of all that orchestration, spinning up the jobs to materialize the data and then spinning up servers that can produce these features with really low latency.

Mahesh Murag [00:16:17]: Great. That helps our data scientists with the iterating rapidly step, and that's something that we've seen organizations with big teams have a lot of success with. But how do we actually help with the next value proposition, which is the performance and productionization? So that's where this value prop of computing features comes in. This is where you want to make sure your transformations are executed both offline and online, ideally, and the way that Tecton does it is without making any changes to those feature definitions between your online and offline environments. But how do you do this reliably and make sure that these features are available without any downtime? When we first looked at this, we looked at what's best in class right now, what the best practices are for big data applications, and historically that has been Spark. Tecton has always integrated really well with Spark, with environments like Databricks and EMR, where we actually integrate with Spark all the way from that batch data computation for historical features, to Spark Structured Streaming for things like Kafka and Kinesis, and then Spark for actually serving these features, making them available, and running these jobs in real time.

Mahesh Murag [00:17:37]: So Tecton takes care of all that retry logic and the orchestration, to make sure that backfills are occurring successfully and efficiently, handling all of this in the background, behind the scenes of that declarative interface that we just showed off. This also means that we orchestrate all these jobs and we make sure that they run successfully and that they're visible in a nice interface. So this helps with that second software engineering bullet point of meeting the production performance requirements. But we've seen that data scientists hate Spark. They don't want to have to deal with Spark jobs and Spark dependencies. If anyone's ever seen a Spark error trace, it's painful to look at and to read, and I don't think most data scientists want to mess with that. So how do we un-annoy data scientists? Well, we know that they love one thing, which is computing features with Python. This is what they're used to in their data science, in their notebooks, and something that they're used to iterating quickly on.

Mahesh Murag [00:18:39]: But there are problems. Historically, we've seen that Python doesn't scale. It historically hasn't worked with streaming data and isn't quite optimized for data warehouses. There hasn't really been a solution that's Python native that also meets those performance requirements that the engineers have. That's something that I'm really excited to talk about, since our team at Tecton has been really hard at work on this. Specifically, we think that we've solved a huge chunk of these problems with our latest launch of a platform called Rift. Rift is a Python-native managed compute engine for developing and productionizing your features at scale. It comes with a couple of really, really big value propositions, the first of which is Python in any context.

Mahesh Murag [00:19:34]: So with Python transformations for your batch features, streaming features, and real-time features, you don't have to worry about these external warehouse dependencies or having a Spark context. It truly is out of the box and something that you can get started with really, really quickly. This also comes with the benefits we've been talking about around fast iteration. Our Rift engine can execute locally in your SDK for offline feature retrieval, running those queries locally in memory within your Python environment. That could be a Jupyter notebook or even a Google Colab notebook, et cetera. Then it's the same one easy step to productionize this using tecton apply. Then finally, this builds on top of all those enterprise performance optimizations that we've been working on for the past five years, for millisecond-fresh aggregations over millions of events, and we're talking QPS on the scale of hundreds of thousands or even a million plus. And under the hood, this uses all the optimizations that we've been working on for our enterprise customers over the past five years.

Mahesh Murag [00:20:43]: So we've seen a number of huge organizations go to production using a lot of the stuff that we've worked on here. So what does this mean for a user? It means that you can now have Python-level simplicity with enterprise-grade performance, and you don't have to make that trade-off between the engineer that cares a lot about reliability, performance, and latency, and that data scientist that cares about developer experience, simplicity, and ease of use. You can have the best of both worlds. So this means that entirely with Python, you can experiment with your features, run and register your feature definitions, and then do offline and online retrieval with no dependencies. And this is all built on top of the tip of the iceberg of all these other optimizations and capabilities that we've seen are 100% necessary for any organization to actually make it to production. As mentioned earlier, we've seen a lot of our customers or initial prospects falter at these last steps: say they have a feature store and they're writing data to it, but they're missing a ton of the other things it takes to make an organization successful, like uptime monitoring, alerting, sharing features between different teams, a web UI that makes it really easy to monitor all of this, and production SLAs.

Mahesh Murag [00:22:05]: SLA's one thing that's really important to our team is optimizing those aggregations within Tectone. So we have an aggregation engine that helps you define all the most common aggregations of things like count and Prox percentile, things like that, all within this really efficient engine that uses data compaction to make sure the performance is optimized. Then things like caching and a really low latency serving environment, an ingestion API for streaming features. All this stuff that's under the hood, but the interface for it is now this really easy python environment. Of course, if you're a smart shop, we wanted to make sure that it's as easy as possible to stay integrated with what you've already been working with. If you have Spark, or if you're integrated with a data warehouse like Snowflake or Bigquery, you can keep using Rift and Tecton to push those queries down into your data warehouse. Or optionally just use Python compute to keep it as simple as possible. What does that mean for these two Personas? It means we now keep those production and reliability requirements, the iteration and code quality features as code attributes, as well as the Ci CD best practices things that the software engineering team cares a lot about.

Mahesh Murag [00:23:26]: And then you also keep the really, really nice things that Python brings to data scientists. So that's stuff like the rapid iteration in notebooks, bringing in any dependency using Python and pandas. It makes it really easy for these two teams to now collaborate in one place. The final step on this, really quickly, is how do you now retrieve those features and make that as easy as possible? Online, we make it really easy to retrieve these with an HTTP API that serves features with millisecond latency for whatever your inference is. And then offline, you can now work entirely in Python, with an optional Spark dependency that's completely not required, and construct these historical features with point-in-time accuracy to train your models offline. So coming back to this, we now have optimized both for the user experience and speed of retrieval, and for model quality, to make sure that the models are trained to the best degree and performance possible. So now this is a unified interface with a unified user experience that lets data scientists iterate with any dependency that they want towards the highest quality model, while letting the platform and ML and software engineering teams reliably productionize with good CI/CD and best practices, and making sure the features actually make it reliably to the models being used in production.
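
To make "point-in-time accuracy" concrete for readers: offline retrieval has to join each training label with the latest feature value known as of that label's timestamp, never a later one. Here is a minimal, library-agnostic pandas sketch of that join using merge_asof; the tables and column names are invented for illustration and this is not Tecton's retrieval API.

```python
import pandas as pd

# Training events (labels) with the time at which each prediction was made.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-05-02", "2024-05-06", "2024-05-04"]),
    "label": [0, 1, 0],
})

# Precomputed feature values with the time they became available.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2024-05-01", "2024-05-05", "2024-05-03"]),
    "amount_sum_7d": [35.0, 40.0, 40.0],
})

# For each label, take the most recent feature value at or before event_time,
# which avoids leaking future information into the training data.
training_df = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
)
print(training_df)
```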

Mahesh Murag [00:24:56]: So we're really happy with this. And of course, if folks are interested in trying it out, I'm excited to share that Rift is in public preview and pretty much anyone can go use it. You can register at Tecton AI Explorer. We have this really easy quick start that runs entirely in a Google Colab notebook, since this is Python native. So we definitely recommend that folks go try this out. And before I hand off to the other speakers, I want to talk about what we think is coming in the future as well. We think that production AI is evolving.

Mahesh Murag [00:25:30]: It evolves all the time, it moves really quickly. But of course, predictive ML is going to continue to have a very big impact on business applications. This includes the entire lifecycle: you have your source data from your application, you then define features based on this, you can send these features to whatever your store might be, then train a model with this to make predictions, and then use those predictions for whatever your application might be as well. Then what's been happening recently, with generative AI, is we've been seeing a different but very, very similar set of user patterns emerge over the past couple of years. Specifically, there's still data. This is unstructured data: documents, things like user reviews or images or videos. A lot of our customers have wanted to turn these into embeddings, things that they actually can use as context for whatever their end model might be.

Mahesh Murag [00:26:36]: In this case, embeddings can be fed into and stored in a vector DB, which is really similar to what we've been doing historically with a key-value store for structured features. And then they can be fed into LLMs and other applications to make them as performant as possible and to bring intelligence into whatever your end application might be. But our theory here, and something that we've seen validated in industry, is that a feature platform is still the bridge all the way from the data to the end application. A lot of these patterns of retrieving data and making it available in real time haven't really changed. It's just the nature of what the data looks like and the specific model that's being run. But in the end, we think this set of patterns is going to continue to be really, really important for any enterprise or organization that needs to be successful here. I'll end here and make sure that other folks have time, but it's been really great to be here, and definitely go check out Rift and Tecton at Tecton AI Explorer. Thank you.
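
For readers who want a concrete picture of the embedding flow Mahesh describes (unstructured text in, vectors stored, nearest-neighbor lookup out), here is a tiny self-contained sketch. The embed() function is a stand-in for a real embedding model, and the whole index is an illustrative in-memory toy, not any specific vector database.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 16) -> np.ndarray:
    # Stand-in for a real embedding model call (hypothetical, deterministic).
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class InMemoryVectorIndex:
    """Toy key-value style vector store with cosine-similarity lookup."""
    def __init__(self):
        self._keys, self._vectors = [], []

    def upsert(self, key: str, vector: np.ndarray) -> None:
        self._keys.append(key)
        self._vectors.append(vector)

    def query(self, vector: np.ndarray, k: int = 3) -> list[str]:
        sims = np.stack(self._vectors) @ vector  # cosine sim for unit vectors
        top = np.argsort(sims)[::-1][:k]
        return [self._keys[i] for i in top]

index = InMemoryVectorIndex()
for review in ["great budgeting app", "cash advance saved me", "love the chatbot"]:
    index.upsert(review, embed(review))

print(index.query(embed("budget help")))
```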

Ben Epstein [00:27:43]: Awesome. Thank you so much for that talk. That was really, really cool. Really great to see how Tecton is growing and supporting more complex use cases. I have a quick question, which is around how, when you're defining your features in code, you can have features that are dependent on other features, like chaining those features. Does Tecton figure out the dependencies between those?

Mahesh Murag [00:28:08]: How does that work? Yeah, absolutely. This is super top of mind, especially in an organization that has different teams with dependencies, but that's something that we do support, especially with those on-demand features. You can bring in different batch and streaming features and make decisions in real time, or do transformations as soon as you have data coming in. Then we're also supporting batch and stream features that depend on each other, letting you compute some intermediate data and then feed that into another feature definition.

Ben Epstein [00:28:41]: Awesome. That's really cool. And the second thing you referenced around generative AI, one option is obviously embeddings, but another one that comes to mind is actually what was talked about last week in our session, where one of the presenters was leveraging LLMs to just create structured data out of unstructured data. It doesn't matter if it's audio or if it's text, but they ended up getting it into text form and then extracting a bunch of useful features. Those features could then be Tecton features, which could then be used to train a standard XGBoost model or anything like that.

Mahesh Murag [00:29:13]: Yeah, absolutely. It's actually really timely that you bring that up. One thing that we're really excited about and are launching really, really soon, I think this week, is an embeddings interface on top of Tecton. You can bring in any data and use that same interface, that same declarative framework I was talking about, to run your embeddings model at scale with the same guarantees that we have in production for our existing customers. Stay tuned for more on that.

Jose Navarro [00:29:41]: Awesome.

Ben Epstein [00:29:42]: Yeah, I'm looking forward to it. Thanks so much. Yeah, thanks for coming on. I think we'll jump over to Jose and we'll get the next session. Thanks, Mahesh.

Mahesh Murag [00:29:50]: Thank you.

Ben Epstein [00:29:51]: All right, thanks for coming on, Jose. We have Jose with us, who is an MLOps engineer at Cleo, and he's also going to show us about feature stores using Feast, an open source feature store platform. So I'm really excited to do that. So let's dive into that and then we'll save some questions for the end. Jose, whenever you're ready, you can kick us off.

Jose Navarro [00:30:10]: So, my name is Jose. I am an MLOps engineer working for Cleo. And on top of that, I am also involved with the MLOps community, running local events in Bristol, UK. And today I would like to talk about the feature engineering platform that we have been building in the last ten months on top of Feast, which is an open source feature store platform. So yeah, an overview of the quick talk that I'll give today, and I'll try to keep it short: I'll give you an overview of what Cleo is so everyone has an understanding of what we do. I'd also like to start with the background of where we were before having a feature store and some of the challenges that brings along. Then I will speak a little bit about the architecture of the feature platform that we built with Feast and some of the decisions that we made, and then show a couple of use cases and the results. Starting with what Cleo is: Cleo is an AI-first fintech startup.

Jose Navarro [00:31:25]: Our product is deployed in the US and it allows users to better control their finances so that we can empower them to build a life beyond the paycheck. So we're trying to help young people go from living paycheck to paycheck into a better financial state. So we have things like budgets: you can set up budgets, and the app helps you stick to them. You can talk to the app; it's a chatbot, AI type of app. So you can talk to it about your finances, and it will talk to you about your expenses and send you potential recommendations, like don't spend too much money on Amazon or Uber. So it will help you stick to your budget and your goals. One popular feature is the cash advance.

Jose Navarro [00:32:23]: So we have a subscription model, and we'll help users try to avoid expensive overdrafts in the US by getting a cash advance from time to time, and also help them with a credit builder. So we have our own credit card that can help you increase your credit score over time. So, a little bit of background about how ML was done before this feature store, which shows the typical issues that you have in ML before having feature stores, as Mahesh was explaining before. We have a user who has an account at Cleo, logs into Cleo, and then suddenly they want to request a cash advance. So our ML service has to calculate, at request time, lots of features about that user and their previous transactions, and they have to be calculated online, which is quite expensive, connecting to the database. And once those features are calculated, then it can make a decision about what type of loan that user can get. One of the challenges with this approach is obviously the latency it takes to get back to the user. Also, because those features get developed within the API, they don't get shared across other services, so we're potentially losing reusability for those types of features.

Jose Navarro [00:33:58]: Then a usual problem as well is that feature transformations are duplicated between the serving code and the training code, which can cause lots of problems. And on top of that, since the ML services are connected to the main database to calculate these features, any issues that happen in the ML service can affect the performance of the main database, so they can spread across multiple other areas of the app. And not only issues, but also sudden increases in traffic could affect the main app. So with that in mind, what we wanted is basically what Mahesh was explaining before. We want a GitOps approach. We would like to have single-digit latency for online features, and to try to keep the cost of our infrastructure as low as possible.

Jose Navarro [00:34:53]: Then there is something specific about our use cases as well: we wanted to have full bucket strategies for our online features, something that I can explain in a little more detail in the use cases. So we decided to go with Feast for this solution. Feast is an open source feature store, by the way, and I have to give thanks, on behalf of the community, to Tecton for being the main maintainer for a long time, until quite recently. And Feast is pretty modular, so it allows you to plug in all the elements that Tecton was showing us before. So one of the things that I would like to mention as a benefit with Feast is the flexibility it gives you to decide exactly which elements you want to implement first in your system.

Jose Navarro [00:35:50]: So you can start building your feature engineering platform bit by bit, depending on your priorities. You can have an online store first, or maybe it's just the offline store that you want. Maybe you want to build the streaming capabilities first, maybe not, maybe the batch. But on the other hand, because it's so flexible, a consideration is that you have to be mindful about all the bits that you can plug into it, consider exactly what you need, and start building the feature platform in a way that you don't end up in a mess with lots of things that you actually don't need. So, quickly, I'd like to explain how we built it. We started with the feature definitions. As I said, one of the requirements was to have a centralized way to define our features. So we have a repository in Git where the features get defined by the data scientists, and through GitHub Actions they get feast planned and feast applied into the registry.
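
As a rough illustration of what such a centralized Feast feature definition can look like, here is a minimal sketch. It follows Feast's Python API as I understand it, but the entity, source path, and feature names are invented for the example and exact class names vary by Feast version, so treat the details as assumptions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entity and offline source; Cleo's real definitions will differ.
user = Entity(name="user", join_keys=["user_id"])

transactions_source = FileSource(
    path="s3://example-bucket/transactions.parquet",  # placeholder path
    timestamp_field="event_timestamp",
)

user_transaction_stats = FeatureView(
    name="user_transaction_stats",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[
        Field(name="total_spend_7d", dtype=Float32),
        Field(name="transaction_count_7d", dtype=Int64),
    ],
    source=transactions_source,
)
```

In CI, `feast plan` previews the registry changes for a pull request and `feast apply` writes them to the registry once merged, which matches the GitHub Actions flow Jose describes.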

Jose Navarro [00:37:03]: Also worth mentioning is that we have two physically separated Feast deployments, one for production and one for development. So PRs that are currently in development are planned and applied to the dev part of the infrastructure, whereas the master branch, or the main branch, is the one that gets applied in production. Then, continuing with the storage bit, we have an offline store and an online store. For the offline store we have Amazon S3, with Athena querying the data in there, and for the online store we went for Redis, mainly because of the latency requirements that we had; it is very fast. And I also mentioned that we wanted to keep costs under control, and Redis has a way of having a small Redis cluster with storage in memory, but it also gives you the capability of having a slightly bigger SSD hard drive, so it can basically move data from the in-memory storage into the SSD if the in-memory storage starts filling up.

Jose Navarro [00:38:20]: So data that is used quite often stays in memory, whereas data that is accessed less moves onto the SSD. That allows you to have a slightly bigger-storage Redis cluster without breaking the bank. In terms of data ingestion, for the periodic writes we use Airflow DAGs. One thing to note is that our Airflow DAGs basically connect to our data warehouse and then store the resulting data in S3, so in the offline store, and then we use the materialize step to materialize it into the online store. And then we also have streaming capabilities that we built with Kafka and Spark. Giving an overview of a couple of use cases that we already have in production: as I said, the cash advance is one of our main and most popular products.
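
For readers unfamiliar with the Feast materialize step Jose mentions, here is a minimal sketch of how it is typically invoked. The repo path and time range are placeholders, and this assumes a standard Feast feature repository with the offline and online stores configured in feature_store.yaml.

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Assumes feature_store.yaml in the current directory points at the
# offline store (e.g. S3/Athena) and the online store (e.g. Redis).
store = FeatureStore(repo_path=".")

# Copy the latest feature values from the offline store into the online
# store so they can be served with low latency.
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=1),
    end_date=datetime.utcnow(),
)

# Or, as commonly run from an Airflow task / the CLI:
#   feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```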

Jose Navarro [00:39:18]: And the example that I showed at the beginning was an example of a cash advance classifier using the database to calculate the features directly online. Whereas now we have moved the transformations of those features to either streaming or batch, depending on how important those features are: whether they need to be calculated as soon as new events arrive, or whether we can have delays in those features being calculated. So we have features that are calculated daily or features that are calculated hourly. And then the ML service just requests those features with get online features in single-digit latency. Not only that, but those features can now also be reused in other parts of the business. And just to show what kind of improvements we got with these changes: on the one side you have the old service, which was getting more than 3 seconds at P50 and more than 6 seconds at P95, and then the improvements that we have managed with the new version using the feature store. We are currently evaluating the results, just to make sure that we are not missing user loans and not giving too little or too much money to the same user, but it will be live in front of users very soon.
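
The "get online features" call Jose refers to is Feast's online retrieval API. Here is a minimal sketch of what that request can look like; the feature references and entity key are the hypothetical ones from the earlier definition sketch, not Cleo's actual features.

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Low-latency lookup against the online store (Redis in Cleo's setup).
online_features = store.get_online_features(
    features=[
        "user_transaction_stats:total_spend_7d",
        "user_transaction_stats:transaction_count_7d",
    ],
    entity_rows=[{"user_id": 1234}],
).to_dict()

print(online_features)
```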

Jose Navarro [00:40:57]: Finally, another use case that has only been made possible through the feature store, thanks to the streaming capabilities, is this: Cleo uses Plaid, which is a platform to connect to banks, so we can get data from our users for their transactions, their balance, et cetera. To get the real-time user balance, it's a paid API call, and it's an API call that we make quite a lot. However, Plaid also has an agreement with its customers, in this case with us, where they send us the customer balance from time to time, but they don't specify when they are going to send it. So we have this type of architecture now, where we can ingest those user balances that are free as Kafka events, keep the state of those balances in a Spark context together with other features that we already have, keep that in the online store, and use it to reduce the number of times that Cleo has to request an up-to-date balance. So the operational cost for the business has been reduced thanks to this type of capability as well. Those are the two use cases and the overview of the feature platform that I wanted to present today.
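
The cost saving here comes from a cache-aside pattern: check the online store for a recent balance pushed via the free Kafka feed before falling back to the paid Plaid call. Here is a small, library-agnostic sketch of that logic; the store interface, freshness threshold, and fetch_balance_from_plaid function are all hypothetical stand-ins.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS = timedelta(minutes=30)  # assumed acceptable staleness

def get_balance(user_id: int, online_store, fetch_balance_from_plaid) -> float:
    """Return a user's balance, preferring the streamed value in the online store."""
    cached = online_store.get(f"balance:{user_id}")  # e.g. {"value": ..., "ts": ...}
    now = datetime.now(timezone.utc)
    if cached and now - cached["ts"] <= FRESHNESS:
        return cached["value"]  # fresh enough: no paid API call needed

    # Fall back to the paid Plaid API and cache the result for next time.
    value = fetch_balance_from_plaid(user_id)
    online_store.put(f"balance:{user_id}", {"value": value, "ts": now})
    return value
```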

Jose Navarro [00:42:27]: I assume that we will leave questions for later.

Mahesh Murag [00:42:31]: Yeah.

Jose Navarro [00:42:32]: Thank you very much. Awesome.

Ben Epstein [00:42:34]: Thank you so much for sharing, Jose. That was really, really cool. And I actually think it's a great transition into our third talk, which is going to be more about feature store fundamentals, how to build them, and key architectural considerations, and Nikhil is going to help us with that session. We'll save some questions for the end. Thanks so much, Jose. And Nikhil, whenever you're ready, we'll jump right into your talk.

Nikhil Garg [00:42:57]: Hey folks, great to meet you all. My name is Nikhil. I am currently building this startup called Fennel. Before that, in a previous life, I worked at Facebook. In my last role over there, I ran a big part of the team that was building PyTorch at Facebook, and before that I was closer to applied machine learning. I was a competitive programmer in an earlier life, and now I'm taking the culmination of everything I've learned in my career, from competitive programming, which was hardcore algorithmic problems, to working on ML infra back in 2016, to PyTorch, et cetera, and building Fennel out of that. Fennel is a feature platform, basically what Tecton does, but in our obviously biased opinion slightly better; the jury is obviously out on that. It's an end-to-end feature platform, very much inspired by the traditions of PyTorch: a lot of focus on good, beautiful abstractions that make it easy for people to do development. And then once you have that right, you don't shy away from doing the hard engineering work to actually make it come to life. We have written our own compute engine in Rust that can operate on both batch and streaming data.

Nikhil Garg [00:44:11]: It's quite a state-of-the-art compute engine, and we are thinking about open sourcing it at some point in time. And it's not just me; the rest of the team is also ex-Facebook, so we bring a lot of ideas from technologies we saw, used, and developed at Facebook. We'll not talk very much about Fennel, I promise; we'll primarily talk about feature store and feature platform technology. I do want to spend 30 seconds on the difference between a feature store and a feature platform. I felt the speaker from Tecton did a great job at this. I consider Feast to be a feature store. Unlike Tecton or Fennel, feature stores primarily do storage and serving, but computation is typically happening outside.

Nikhil Garg [00:44:53]: I think the speaker from Cleo, Jose, mentioned that they were using Airflow DAGs for batch and Spark Streaming for streaming computation. Generally, that's the pattern: computation is happening outside of a feature store like Feast, which then means that there's a bunch of other pieces of infrastructure that you have to build, maintain, and stitch together. A feature platform, however, is a much more end-to-end system. Everything from authoring, data integration, and computation, to storage and serving like a feature store, to testing and monitoring, all of that is inside a feature platform. All things being equal, you probably want a feature platform; otherwise you're going to have to do a bunch of integration and infrastructure work yourself. Though it is also true that they're a lot harder to build, which is where this talk comes in.

Nikhil Garg [00:45:41]: In this talk I'm going to be going through the key components for building a feature platform, not a feature store, but a feature platform, and talk about what the options available in open source are for each of them, and any gotchas you should be aware of. There are seven components: an authoring layer, what I call a compiler, a metadata DB, a compute engine, a network of data connectors, an indexing layer, and a catalog. Maybe you are aware of these but by different names; I'll describe how all of that fits together. So the first thing is you need an authoring layer, some way for people to express the computation they want to do, the pipelines and the features, however they want to do that.

Nikhil Garg [00:46:24]: Then you need a notion of a compiler: some process, some system which can take all of these definitions and convert them into jobs or Kafka topics or Spark Streaming jobs or anything else that needs to be run, the entities that need to be created to do the computation. Usually you would store that in a metadata store of some sort, and the compute engine will pick up the definitions from the metadata store and actually do the compute. The compute engine will have connectors built in with OLAP, OLTP, Kafka, all sorts of systems, data lakes as well, so you need a notion of data connectors. And the values are materialized in what I'm calling an index store; you can almost think of it as a KV store, effectively. And finally you need a console as well, a web application where you can log in, see all the features that exist, search and discover them, and also inspect data, and the same thing happens for the live serving as well. So this is what I mean by the seven components. I'll now go through each component one by one.

Nikhil Garg [00:47:31]: So the first component you need is an authoring layer. In my view, the authoring layer is the heart and soul of a feature platform, and it very much dictates what the capabilities of the platform are and what the experience is like. And there's a tension here. You want this to be powerful enough to write very complex features. You do not want to use a feature platform where you can only write one kind of feature but cannot really express other kinds of more advanced features, because when you become more mature in your machine learning journey, you'll want to do that. At the same time, you do want the authoring layer to be easy to use and learn. And there's obviously a tension between those.

Nikhil Garg [00:48:09]: I think typically the options are SQL, Python, pandas. A lot of people end up building DSLs of some sort. I've also seen some people use YAML, which I think is a horrible choice, honestly. There's a lot of variety, and no two feature stores or feature platforms are the same when it comes to the authoring layer. There are similarities, but really each of them makes its own independent decisions. There are, unfortunately, no good OSS options for the authoring layer itself. Feast is obviously popular; we were just hearing how Cleo uses it.

Nikhil Garg [00:48:43]: However, Feast does not let you express the computation of features, and so you still need a different authoring layer for that computation; you do not have a unified layer. Obviously, the challenge is, no matter what authoring layer you choose, you have to think about event time, processing, streaming, and also features that depend on each other, read-side features. There's a lot of complexity that you need to think through if you're trying to build your own authoring layer for expressing these kinds of features. After the authoring layer you have the compiler; it's the component that takes all the definitions from the authoring layer and converts them into the actual pipelines and jobs and so on. As far as I'm aware, there are really two very high-level options: either you dedicate a whole repository to your feature definitions, and it sounds like Tecton does that, or you parse those feature and pipeline definitions as ASTs and send them via protobufs from the front-end definitions to the compiler.

Nikhil Garg [00:49:48]: We at Fennel chose the latter. There are some benefits to that as well; it's slightly more work to do yourself, obviously. And obviously the compiler is very specific to the authoring layer: if you have a different authoring experience, you're going to have to build that component separately, so there is no standard open source option for this. The simplest option is you could write Python with some decorators, and those decorators will have some side effects. Maybe you're creating a global list of all the features, and what that decorator does is add the feature to that global list or something, and then you do some operations on that.
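
Here is a minimal sketch of the "decorator with side effects" registration pattern Nikhil describes, where authoring-layer definitions accumulate in a global registry that a simple compiler can later turn into jobs. Names and structure are invented for illustration.

```python
# Global registry that the "compiler" later reads to plan jobs/pipelines.
_FEATURE_REGISTRY: list[dict] = []

def feature(name: str, depends_on: tuple[str, ...] = ()):
    """Authoring-layer decorator: registering the definition is its side effect."""
    def decorator(fn):
        _FEATURE_REGISTRY.append(
            {"name": name, "depends_on": list(depends_on), "fn": fn}
        )
        return fn
    return decorator

@feature("txn_count_7d", depends_on=("transactions",))
def txn_count_7d(transactions):
    # Placeholder transformation body.
    return len(transactions)

def compile_plan():
    """Poor man's compiler: turn registered definitions into a job plan."""
    return [
        {"job": f"compute_{entry['name']}", "inputs": entry["depends_on"]}
        for entry in _FEATURE_REGISTRY
    ]

print(compile_plan())
```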

Nikhil Garg [00:50:21]: One big challenge here, if you are building a poor man's compiler, and maybe compiler is too strong a word at that point, is that you want atomicity when adding and removing these assets. For instance, if you are removing jobs, let's take a Spark Streaming job as an example: those are stateful systems, and you cannot remove one, realize there's a failure, and then revert it anymore. So you need to do all of that atomically for the system to always be in a valid state. Next up, you need some sort of a metadata DB where you can store the state of the system: what the definitions are, which jobs exist, what the status of those is. Postgres and MySQL are great options from open source; as far as I know, nearly everyone just uses those. We at Fennel also use that. From what it sounded like, Feast gives you a few options, and relational databases are one of the common ones.

Nikhil Garg [00:51:15]: Once you have your definitions compiled into assets and stored in the metadata DB, you then need a compute engine. This is where things get really interesting. You have two choices here at a high level. One choice is you try to have a single unified compute layer, what is called a Kappa architecture in data engineering, where the same compute paradigm operates on batch and streaming. Or you could choose to have a different compute engine each for batch and streaming, and it sounds like that's what Cleo did in the talk we just heard: Cleo went with Spark Streaming for streaming and Airflow DAGs operating directly on the warehouse for batch, but then it's very hard for those to interoperate with each other. At Fennel, we went with a Kappa architecture and built our own Rust-based compute engine from scratch. There are lots of choices here depending on what your requirements are.

Nikhil Garg [00:52:06]: If you don't care about streaming, Snowflake and Spark are quite good. If you care about streaming, Spark Streaming and Flink are options, but they're honestly a pain to deal with; I would not recommend you go near them. I think DuckDB is emerging as a new option for lower-volume datasets. If I had to build something today in open source, and if my datasets were small and I didn't care about streaming, I'd honestly just go with DuckDB, because it also comes preloaded with a bunch of data connectors. You may need some orchestrator like Airflow, however, and so that then becomes a part of your compute engine itself; it sounds like that's what Cleo was doing as well. And honestly, there are lots and lots of challenges here.
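
As a concrete example of the DuckDB-for-small-batch option Nikhil mentions, here is a minimal sketch that computes a simple per-user aggregate straight from a Parquet file. The file path and column names are placeholders.

```python
import duckdb

con = duckdb.connect()  # an in-memory database is enough for a batch job

# DuckDB reads Parquet (local, or S3 via the httpfs extension) without a
# separate loading step.
features = con.execute(
    """
    SELECT
        user_id,
        SUM(amount) AS total_spend_7d,
        COUNT(*)    AS transaction_count_7d
    FROM read_parquet('transactions.parquet')      -- placeholder path
    WHERE event_timestamp >= now() - INTERVAL 7 DAY
    GROUP BY user_id
    """
).df()

print(features.head())
```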

Nikhil Garg [00:52:46]: You want to do point-in-time correct computation. You have stateful aggregations. You want to be able to take backups and snapshots so that you can recover if there are process failures or node failures; data arrives out of order; you want to provide exactly-once processing guarantees. The moment streaming gets involved, these become even harder. But even outside of that, there are significant challenges here. You definitely need a notion of an ecosystem of data connectors, some way to ingest data from various places. Your data is probably spread over warehouses, Postgres, OLTP, data lakes, a message bus, maybe a combination of many of those. In hindsight, I believe that this looks deceptively simple, but it is one of the harder parts to truly get right.

Nikhil Garg [00:53:30]: And one of the main choices you would want to make, if you were to build it yourself today, is: do you want the data connectors to be CDC-aware or not? If you make them CDC-aware, you're only processing incremental updates, which can save you a lot of cost because you're not re-aggregating the same data over and over again. But then your compute engine needs to be CDC-aware as well, which is even harder to do. At Fennel, we chose to make everything CDC-aware so that we could lower costs. But if I were to build this in open source, without full VC funding backing the effort, I would not want to do that; I'd maybe keep it without CDC. In open source you do have a few options here. I think Airbyte and Kafka Connect are good general-purpose connectors. You could also rely on DuckDB, because DuckDB has native interop with MySQL, Postgres, and I think S3 as well by now.

Nikhil Garg [00:54:24]: If you are an Amazon shop and are not afraid of Java JARs and the general JVM ecosystem, I think AWS Glue is a good option to consider; they have some Python APIs as well. Glue is obviously a lot more heavyweight compared to the others and also does not help you with streaming. No matter what choice you make, you'll have to worry about large data volumes, and even if, let's say for streaming, you are able to handle the ongoing throughput, when you're backfilling those same datasets you then need to support a 10x or 100x larger throughput, and that becomes a challenge as well. You ideally want the authoring experience of data ingestion from, let's say, Delta Lake and Kinesis to look similar, if you can get that, but that requires a unifying abstraction, which is obviously harder.

Nikhil Garg [00:55:10]: And finally CDC. So there are a bunch of challenges with data connectors as well. The index layer is basically the KV layer. Typically people use Redis and Dynamo; in the GCP world people typically use Bigtable. Redis works great; the problem with it is that it's very costly. SSD tiering helps a bit, but it's still quite costly in the end, especially if you're not running it yourself and are using Amazon or Redis Cloud to run that instance for you. If I were to build this alone for an open source project myself, I'd basically start with Redis and then build it in a way where I have standardized interfaces, so that I could replace it with Dynamo or Bigtable or something more cost-effective.
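
A small sketch of what such a standardized online-store interface can look like, with a Redis-backed implementation that could later be swapped for Dynamo or Bigtable. The interface and key scheme are invented for illustration; the Redis calls are standard redis-py hash commands.

```python
from typing import Optional, Protocol

class OnlineStore(Protocol):
    """Minimal KV interface so the backing store can be swapped out."""
    def put(self, entity_key: str, feature_values: dict) -> None: ...
    def get(self, entity_key: str) -> Optional[dict]: ...

class RedisOnlineStore:
    def __init__(self, client):
        # `client` is expected to be a redis.Redis(decode_responses=True) instance.
        self._client = client

    def put(self, entity_key: str, feature_values: dict) -> None:
        self._client.hset(entity_key, mapping=feature_values)

    def get(self, entity_key: str) -> Optional[dict]:
        values = self._client.hgetall(entity_key)
        return values or None

# A DynamoDB- or Bigtable-backed class would implement the same two methods,
# leaving the rest of the platform untouched.
```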

Nikhil Garg [00:55:53]: At Fennel we run RocksDB on SSDs, and we do our own partitioning, sharding, and replication on top of that. Once again, like I said, we don't shy away from doing the hard engineering work ourselves as long as it improves the end experience; that ends up providing a lot of cost benefits, but it's a lot more work to do. So I would not do that if I was part of a one- or two-person team building this using open source tools. One challenge here that you should be careful about is that you want to take backups and have some sort of replication so that your systems don't go down, and depending on the scale, you are shuffling data across the network, which has a cost. And a lot of applications also need continuous windows for aggregation, which are a lot harder to do, because then you have to store the full events and fetch the full event list each time you're doing an aggregation, and so network traffic adds up quite rapidly there. And finally you need a catalog layer. I mean, this is a vanilla web app; you can build it however you want. But what is important is that in addition to discovering existing features, people in our experience at Fennel also use the same console for monitoring.

Nikhil Garg [00:57:05]: And so you need to have good integration with Grafana and Prometheus and things like that, as well as exploration and some sort of a lineage view for all your features, datasets, and so on. And if you are building one, just put it together using React or whatever; that should work. Just be careful to make sure that you have some good integration with your monitoring and alerting system, because that, we have discovered, is an important consideration here. This gives you a really poor man's feature platform, but in practice there are a hundred different things that you need to think about. You need to think about quality monitoring, data lineage, role-based access control, GDPR, auto-scaling, unit testing; there are a million things like that that end up becoming important. So you can get something going.

Nikhil Garg [00:57:54]: But you know, this is deeper than that. My recommendation is it's probably not trying to build feature platform in house, you probably better off using an existing one. There are a couple in open source, couple proprietary as well. The reason I say that is because it's just lot harder than it looks, and I've made that same mistake myself. When we started federal, I mean we had a strong team of very experienced, good developers and we thought that within three months or so with four developers, we'll be able to build something that we'd start deploying. And boy, we were so wrong. It ended up taking us 15 months and six engineers to get to that particular milestone of the stability, the polish that we wanted. Obviously, if you were to do this yourself, you would not create as much polish and as many functionalities as we have added.

Nikhil Garg [00:58:47]: But still, my point is it's a lot harder than it looks. That said, I do imagine that there's innovation happening in data intra world more generally. I think in particular stream processing is going to become a lot easier. And as that happens, maybe three years from now, five years from now, building a feature platform in house will become truly more of a three week exercise as opposed to maybe a year long exercise as it is right now. That's it. Good luck building this for yourself and thank you for your time.

Ben Epstein [00:59:17]: Awesome, thank you so much for coming on.

Ben Epstein [00:59:20]: We actually have a special guest with loads of interesting questions. I'm pretty sure that feature platforms are Demetrios' single favorite topic of all time.

Demetrios [00:59:29]: How did you know? You got me. Yeah, I almost made a song called "We're Going to Real Time."

Jose Navarro [00:59:38]: Yeah.

Demetrios [00:59:38]: And that's real time. We're going to real time.

Ben Epstein [00:59:41]: That's why you're not on the live streams anymore. We have a couple of questions around real-time use cases. Back when I was building out a feature store a bunch of years ago, Feast was really just starting out, it was at the 0.0.1, 0.0.2 stage, and Tecton was pretty young as well. I'm curious how real time has become a critical component of both of your stacks, in Tecton and in Feast, as well as what customers have needed in the real-time space. Because when I started, it was so young we didn't even really know what we wanted. I'm sure that's matured a lot.

Demetrios [01:00:17]: Wait, before Mahesh answers this one, or Jose, or whoever, can everybody just take note of how Ben basically put me on mute right there when he said, "that's why you're not allowed on anymore."

Ben Epstein [01:00:28]: I didn't actually.

Jose Navarro [01:00:29]: That wasn't even.

Ben Epstein [01:00:30]: I don't, I think that was Steve. I don't think I did that.

Nikhil Garg [01:00:32]: No.

Demetrios [01:00:32]: He basically said, all right, cool. Thanks for joining us. Don't sing anymore. We got to actually ask some questions.

Ben Epstein [01:00:40]: The reason you can't sing, just to be clear for everyone watching, is because we have a live AI conference coming up in California, Demetrios. We have to save it for the people who are going to be at the conference.

Demetrios [01:00:51]: Very true. And so I will drop a link in the chat so that everybody knows, in case you want to join us in San Francisco on June 25. Your ears are going to bleed, but it'll be worth it. All right, so now, Mahesh, sorry to derail the conversation. Maybe Ben needs to repeat the question, because I can't remember what he asked.

Ben Epstein [01:01:12]: Yeah, I was asking about how real-time use cases have evolved over time, and how both Tecton and Feast have evolved with them since back in the day.

Mahesh Murag [01:01:21]: Yeah, absolutely. I can speak to how Tecton especially has changed over time. I would say the large majority of our customers really, really care about real time, and the majority of our architecture is structured in a way that makes real time possible. For example, on the data-semantics side, it often takes extra time for your batch data to actually land in your online store, and when you're doing retrieval, that should change how you retrieve that data, because you don't want to assume that any feature data has landed in your online store before it actually has. As an example, our offline retrieval takes that into account and makes sure we're not leaking any data at the time of retrieval. That's something a lot of our customers have had problems with and something we've built in from scratch. Then, on the performance side, real time has continued to be really important. We're investing a ton right now into a concept called data compaction, which makes real-time retrieval really, really fast by orchestrating background jobs that automatically compact your data into big tiles.
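
The data-leakage point can be illustrated with a small sketch of a point-in-time join that filters on when a feature value actually landed in the store, not just on its event time. This is a generic illustration with hypothetical column names, not Tecton's actual implementation.

```python
import pandas as pd


def point_in_time_join(requests: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """requests: entity_id, request_ts; features: entity_id, event_ts, landed_ts, value.

    For each request, pick the latest feature value that had actually landed
    by request time, so training data only sees what serving could have seen.
    """
    rows = []
    for _, req in requests.iterrows():
        visible = features[
            (features["entity_id"] == req["entity_id"])
            & (features["landed_ts"] <= req["request_ts"])
        ]
        value = None
        if not visible.empty:
            value = visible.sort_values("event_ts").iloc[-1]["value"]
        rows.append({**req.to_dict(), "value": value})
    return pd.DataFrame(rows)
```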

Mahesh Murag [01:02:30]: So with those tiles, at the time of real-time retrieval it's really fast; you only need to get a couple of rows of data from whatever your online store is. In our experience it's something like an 80% latency improvement, one of my engineers told me yesterday. So we're still investing a ton, but it's really, really important and something we're going to keep making better over time.
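
As a generic illustration of the tiling idea (not Tecton's actual compaction mechanism), a background job can fold raw events into per-interval partial aggregates so that serving only reads a handful of rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TILE_SECONDS = 3600  # one-hour tiles; an arbitrary choice for this sketch

Tile = Tuple[str, int]  # (key, tile index)


def compact(events: List[Tuple[str, float, float]]) -> Dict[Tile, float]:
    """Background job: fold (key, ts, value) events into per-tile partial sums."""
    tiles: Dict[Tile, float] = defaultdict(float)
    for key, ts, value in events:
        tiles[(key, int(ts // TILE_SECONDS))] += value
    return dict(tiles)


def serve_sum(tiles: Dict[Tile, float], key: str, start_ts: float, end_ts: float) -> float:
    """Online read: sum a few tiles instead of scanning raw events.

    This sketch ignores partial tiles at the window edges, which a real system
    has to handle (e.g. by combining tiles with a small tail of raw events).
    """
    first, last = int(start_ts // TILE_SECONDS), int(end_ts // TILE_SECONDS)
    return sum(tiles.get((key, t), 0.0) for t in range(first, last + 1))
```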

Demetrios [01:02:51]: Señor Jose, what do you got for us?

Jose Navarro [01:02:55]: Well, we're still early on with this type of work with Feast. What we have is more like streaming cases, where we have events happening, we react to them, and we store them into a state where we can recalculate things and start querying them a bit earlier. That has definitely enabled new use cases, like one of the ones I just presented: by keeping a state of the passive balance checks of our users, we can stop actively requesting how much money a user has in their bank account. We don't have to pay extra money all the time, because we have a state of what has been happening recently and we can decide whether it's worth spending that money or not. If a passive check has come in within the last hour, maybe it's not worth going and fetching a new balance, and we can just rely on the balance we already have. So those are the types of use cases we have. We don't have what Mahesh just mentioned, like data compaction and things like that. We're still early on, but at least the streaming capabilities have allowed us to reach some sort of, not real time, but low-latency data coming through into the online store.
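
The gating logic being described might look roughly like this: check the timestamp of the most recent passive balance check in the online store and only pay for an active refresh when it is stale. Function and feature names are hypothetical, not Cleo's or Feast's actual API.

```python
import time
from typing import Callable, Optional

FRESHNESS_SECONDS = 3600  # reuse a passive balance seen within the last hour


def get_balance(
    user_id: str,
    read_feature: Callable[[str, str], Optional[dict]],  # (entity, feature) -> row
    fetch_live_balance: Callable[[str], float],           # the paid external call
) -> float:
    """Prefer a fresh passive balance over a paid live refresh."""
    row = read_feature(user_id, "last_passive_balance_check")
    if row and time.time() - row["checked_at"] < FRESHNESS_SECONDS:
        return row["balance"]           # fresh enough: skip the external call
    return fetch_live_balance(user_id)  # stale or missing: pay for a live check
```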

Demetrios [01:04:29]: I like the tongue-twister game you have to play there, Jose. You're not calling it real time, you're calling it something a little different, but it's kind of like it. And so, this is awesome. Thank you all for joining us. Nikhil, Mahesh, Jose, really appreciate it. As Ben mentioned, this is one of my favorite topics, so it is very fun to chat with you all about it. We've got to go, though. If anybody is still with us and wants to meet others in this session, you can hit the match button on the left-hand sidebar, and it will basically Omegle you with another person who is here with us today.

Demetrios [01:05:11]: So I will see you all later, have a great rest of your afternoon and hit us up on slack or I'll see you on June 25. Peace.
