Real-Time Forecasting Faceoff: Time Series vs. DNNs
SPEAKERS

Josh is a data scientist on the Marketplace team at Lyft, working on forecasting and modeling of marketplace signals that power products like pricing and driver incentives. Josh got his PhD in Operations Research in 2013, with minors in Statistics and Economics. Prior to joining Lyft, he worked as a research scientist in the Operations Research Lab at General Motors, focusing on optimization, simulation, and forecast modeling related to vehicle manufacturing, supply chains, and car sharing systems.

At the moment, Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
SUMMARY
In real-time forecasting (e.g., geohash-level demand and supply forecasts for an entire region), time series-based forecasting methods are widely adopted due to their simplicity and ease of training. This discussion explores how Lyft uses time series forecasting to respond to real-time market dynamics, covering practical tips and tricks for implementing these methods, an in-depth look at their adaptability for online re-training, and their interpretability and support for user intervention. By examining these topics, listeners will understand how time series forecasting can outperform DNNs, and how to use it effectively for dynamic market conditions and decision-making applications.
TRANSCRIPT
Josh Xi [00:00:00]: Josh Xi, with Lyft for a little bit over five years. Staff data scientist. How do I want to take my coffee? A latte is good. I like it creamy, with milk. I recently found this 6% fat milk that's super creamy, almost like half-and-half, but it tastes just so good.
Demetrios [00:00:24]: Well, hello. Hello everyone. We are back for another MLOps community podcast. I'm your host Demetrios and today we talk about time series machine learning models versus deep neural networks. And Josh has some strong opinions on them, but we would love to hear if you also are opinionated about one or the other. He just told me as soon as we stopped recording that he would love to get comments in case he's missing something and he wants to know how and what folks are doing, if they are doing it differently than him. Let's get into this conversation. First off, tell me what you're working on because I find it fascinating and it is deep, deep in the weeds and I love it.
Demetrios [00:01:19]: So I want to know everything about it.
Josh Xi [00:01:21]: So I'm in this big org called Marketplace, and that's basically the team that manages the supply-demand balance of the Lyft marketplace. Essentially we are a platform team. You have the demand side and the supply side, and you want to achieve market balance between the two. There are different levers you can use to do that, like pricing. Typically, if the demand is too high, you can increase the price to suppress the demand, or you can offer driver incentives to acquire more drivers. Both can happen either in real time or long term. Long-term pricing is more like coupons: you send out coupons so you can attract more riders onto the platform.
Josh Xi [00:02:02]: But all of these models or levers take lots of signals that go into them. The most basic one would be to forecast the demand or forecast the supply. My team is called Market Signal, so we basically provide all these key signals or features that go into those models. You need to know what's happening now, but you also need to know what's going to happen in the future. So there's lots of forecasting behind those levers.
Demetrios [00:02:32]: And I think I've heard something from folks who left Lyft about just the sheer amount of models that you're running at any given time. And maybe it's not that they're different models, it's just different models in different parts of the world, or parts of the US. So it's very similar models, but kind of different, I guess. Same same, but different.
Josh Xi [00:02:58]: Yeah, it's actually a huge amount of models. Lyft is very data driven, so all of these levers are running in different regions at the same time. One set of problems my team is facing is what we call real-time forecasting. Basically, for every geohash (for those who are not aware, a geohash is a standard zone definition, kind of like a zip code, but with different levels).
Josh Xi [00:03:31]: Like we're usually looking at geohash 6 which is like a one by one mile ish like cell in a location. So every city you typically have like a couple thousand sometimes to even like 10,000 geohashes. And so basically for every geohash we need to generate a forecast in real time for the next say five minutes up to an hour. So you will probably have millions of data points to forecast at the same time.
Demetrios [00:04:01]: Wow. Okay, so each geohash. Which makes sense, because you want something that is accurate down to the block.
Josh Xi [00:04:09]: Yeah. So typically if you're thinking about like pricing or like driver incentive, like essentially you're trying to reallocate drivers in a way or attract them to certain busy area. So it needs to have certain like granularity. So, so if you just forecast the whole region, it doesn't really help the drivers to know where they should go. So in order to have the actual action you can take to make some sort of impact, you have to sort of have that granularity. So for that reason we typically start from GeoHash 6 level, but depends on use cases. Some of them we might aggregate them to a high level to sort of fit those needs because the bigger the area they are, the more density you have and potentially the more accuracy you can achieve.
Demetrios [00:05:02]: And how often are you using features or data outside of the platform? Something like: there's the Super Bowl going on, and there's probably going to be a lot of demand around the stadium when the game is about to start, from two hours before the game and then two hours after the game.
Josh Xi [00:05:25]: That's a really good question actually. So we try our best to sort of getting external sources. Events is definitely one of them. So we do working with some data contractor to get some input. So events outport information like flight, landing, departure, anything that we can, we think that could be helpful. But in reality there's lots of challenges just to ingest those data. So taking events as example, like we can easily typically we have information about like the event start time, like which is the ticket time. When people normally will go to the event.
Josh Xi [00:06:08]: But the tricky part is the event end time. Um, like imagine you're sitting in the football game and it's like two minutes left on the clock. And that two minute can sometimes mean just two minutes or sometimes that can mean half an hour if there's overtime or any sort of uncertainty. That depends on like how close the two the current score is between the two teams.
Demetrios [00:06:31]: Right, Totally. So there's certainty for that for all the international listeners. That is when American football, two minutes is not actually two minutes for football around the rest of the globe. Soccer. Right, two minutes. Yeah, I guess you have a little bit of leeway at the end of the game, but not like in American football. But anyway, sorry, I didn't mean to cut you off. I just wanted to.
Josh Xi [00:06:55]: Oh yeah, that's totally fine. Yeah, soccer. But yeah, for soccer sometimes I go into overtime. So that might be means another half an hour or longer. So. Yeah, so we've been having trouble trying to figure out what's the right like event end time. That's definitely a big challenge. Other thing like we mentioned about weather data, so we also have been doing lots of sort of analysis on how weather impact the demand essentially.
Josh Xi [00:07:29]: Well, I think most people, their first instinct is like bad weather, more demand, which is maybe generally true. But when we are really looking at the data, like looking at creating all these precipitation features, temperature features, or do all kind of combination on top of them and trying to find correlation with our demand data, it's actually lower than what we expect.
Demetrios [00:07:55]: No way.
Josh Xi [00:07:57]: One of the conjecture here is actually sometimes not really about what the temperature is or how much rain you will have. It's more about like people's expectation of what's going to happen and what really happened. So the forecast says, okay, it's going to rain and it's pretty accurate. Everybody is more likely mentally prepared or they already have some way to sort of prepare for the big ring, heavy ring. But if something caught them off guard, everyone's like okay, let me call a taxi so. Or of course I'm uber lift. So I think that makes a huge difference. And also snow, one of the in most interesting finding that was like back in 2014, that's when I actually first start to look into taxi data versus the weather.
Josh Xi [00:08:47]: And like everyone's like, okay, snow is going to make huge difference on taxi demand. So we were look in the first half of the winter we were looking at the data, we saw some correlation, but somehow second half of the amount of snow has really Poor correlation with the taxi demand. And you know what happened. So that year was like a super bad winter. And in the second half of the season most of the cities start running out of like their salt to clean up the streets. So they don't clean the streets, nobody can travel. People just give up travel or they just start decide to stay at home. So there's really not much correlation between snow precipitation versus like how much demand you will have.
Josh Xi [00:09:34]: So those are external information. Usually it's very hard to put into the model. So yeah, that will definitely, those kind of factors definitely affect how we build out forecasting models or if like certain models will going to work better than others.
Demetrios [00:09:52]: I thought you were going to say when it rains, people just stay home so the demand is lower, which I can imagine. A certain subset of folks stay home. But it is more like what you were saying. If it says it's about to rain, then I am more mentally prepared to go out in my rain jacket or with my umbrella. But if it is supposed to be sunny and then it starts raining, it's like, oh no, I didn't bring my umbrella, I didn't bring the. I'm not prepared for this. I need to get home or get to wherever I need to be quick.
Josh Xi [00:10:27]: Yeah, that, that's totally true. And also I wonder like if pandemic changed people's like travel behavior. So like before that like everybody's like sort of mandatory to go to office. So even if it's raining, they're more or less likely to sort of, okay, it's raining. I still need to go to office these days. Okay, if this rains, I'm in a bad mood, you know, I only need to go in the office two days a week. Let me pick a different day or something.
Demetrios [00:10:56]: Yeah, yeah, completely. It changes how likely we are to do something. When we talk about this data acquisition though, that just seems like a mess, man. It seems like something that is so difficult A, to get the right data and then B, to clean it or to transform it and then create insights from it. Especially for this external data, I imagine you have some kind of pipelines set up and you're probably constantly tweaking them or you're doing, you're playing around with it to see can we create better features from this or something that gives us more insight that we can feed to the model. And I off base there when I say that.
Josh Xi [00:11:38]: Yeah, yeah, that's definitely something we do a lot internally. We have teams like looking at the event data to curate them. But in the end it's like so much labor to just get them right. We focus a lot on the top events and that's the one that actually works well in some of our models. But for most of them it doesn't. So that sort of kind of lead to what I want to sort of thinking about in my head to talk today is like why traditional like time series forecasting works better in reality like the end a lot of people's concepts like okay, the model is going to learn the feature on its own so you can plug in as many data as you want. So let the model learn what's important, what's not. That usually means lots of training data you will need and lots of computation power you need versus time series type of model.
Josh Xi [00:12:38]: Like Autoregression, ARIMA is trying to predict basically the future by mostly focus or almost exclusively focusing on the history. Because at least even the market is not st. Like even the market is not stable. They are not sort of keep having the same trend, but within a short amount of time things tend to repeat on its own. Time series models, looking at what happened yesterday, the day before, same time last week, it actually already covered lots of what's going to happen in your feature. So this type of models is something we actually been sort of has been our top choice so far. And because they're sort of very interpretable because your future is some sort of a weighting average of your history. So based on how much you believe the future going to change relatively to the history, you can relatively, I would say easily to apply any human intervention to do some sort of adjustment say based on what we know, okay, last week there's no super bowl, but you know, next week there will be a Super Bowl.
Josh Xi [00:13:57]: So you can actually easily to adjust your demand in that case. But with dnn, I guess in the case of super bowl we can do the same thing. If you know DNN is never trained on super bowl so you can make adjustment too. But there's also all other kinds of local events that's happening and some of them might be much harder to make adjustments and you're not sure if DNN captured that or not when feeding those models. So you have to versus autoregression, you know, you sort of know for sure. It's like okay, if I don't because typically like autoregression does not. It's just history values. So it does not really explicitly put any events information there.
Josh Xi [00:14:44]: So you can just assume there's no events like in my model. So you can easily add on certain things or there's more advanced models like decomposition models. You can decompose trends, seasonality and so you sort of know like okay, which part I need to adjust. If I believe there's a local events happening, it's a spike, it's, it's not in the seasonality, it's not in the trend. So you can take those two components, add on your spike so that make it a bit much easier to adjust your model and adjust your forecast and make it a little bit more accurate.
Demetrios [00:15:25]: And it almost seems like with those spikes you can also leverage the history to say okay this looks like a spike that we had maybe two weeks ago or last week, so we'll trend in that direction. What I'm really interested in, there's. There's kind of two big questions that I have with DNNs and then also I've heard a lot about time series foundational models. And so the first off I guess the simple question is have you played around with time series foundational models? Have you had any success with them? Because I've generally heard from people in the community that they are not very valuable and they haven't had success with them.
Josh Xi [00:16:09]: So actually I, I haven't. I played well. I actually played a little bit with time ch time GPT. I know my co workers also have played a lot with some of them. So I, I actually learned my firsthand information fountain not myself but. But my learnings based on my conversation with them is it's accurate when you're looking at some of our use cases because we have actually have forecasting for both real time spatial temporal models like what I described earlier that like super granular in that sense and because every minute you're looking at forecast next 5 minutes, 10 minutes up to an hour and it's all these thousands of cells so there's lots of variances, all kinds of things going on. And you also need it to be fast because every minute you just keep refreshing your forecast versus there's another type of forecasting which is more like offline or short term near term. Every company or team might use different names but they are looking more at like say a regional like the total host San Francisco or break up into a few sub regions and looking at like hourly daily signal values for the upcoming week or two.
Josh Xi [00:17:28]: So most of the learning actually is it works well for those type of models works well for the later use cases. The first use cases haven't been that well so far. I think biscuit that's mostly talking about accuracy but I think beyond accuracy, another really big issue is the real time like cadence. It's just happening so fast and those type of models are a little bit too big to sort of spin up quickly and just keep emitting features. And I'm not sure if there's already a sort of engine infra designed to leverage those models for use cases like that.
Demetrios [00:18:14]: Yeah, especially at that latency that you need. It makes a ton of sense. And that was the question that I wanted to talk about with the DNNs is what kind of infrastructure is needed to serve those types of models or trying to utilize those models in your use cases? And is that not adding extra headache to the problem when later you find out like damn, these don't even perform that much better? Why are we breaking our back to support the DNNs? It's just because maybe some folks want to have it on their resume. Is it like what is it about? I. I know it's not that you, you all probably wanted to like thoroughly test it, but at the end of the day if it's much harder to support and they're not giving you that much of a lift, then I'm assuming that it's much harder to support. I would love to hear from you what in reality the difference is between supporting these two models.
Josh Xi [00:19:17]: Let's compare two type of models which actually is something we've been testing a lot on and one is sort of auto regression which I mentioned earlier. Your future is weighting average of your history and the other one let's dn. There's this very famous paper from Microsoft Lab focusing on spatial temporal DNS. There's also other models like LTSM use for time series broadcasting too. Although most LTSM papers focus on just one single time series, in our case we're talking about many like thousands of cells in a spatial like in the city or something. So you also have to capture the spatial correlation there. So make the model a little more complicated. So the Microsoft paper is one of the most well studied or used paper in lots of the application or research.
Josh Xi [00:20:17]: So from training I think the two make a huge difference. First is autoregression to do not use gpu, just cpu, a single CPU. If you want to do any back testing training it's fast and spatial GN you use GPU and so cost wise it can be at least 100 times difference. That's from our firsthand experience now serving them. So if you want to just like server, like DM model, you do not necessarily need to use GPU because your model weight's already pre calculated. So you can just spin up machine and just take the weights, reconstruct your model and take your inputs a little bit more computation. But I wouldn't say cost that much extra because thinking about it, typically in our case we do forecast like the cadence like every minute. So it's every minute you try and predict for the next say 30 minutes.
Josh Xi [00:21:24]: So as long as you can finish your whole forecast in like 30 seconds it should be good enough to go. So I think just a general machine loading the model weights, reconstruct the model, take your input, do the calculation. 30 seconds is sufficient time series models much less weights. So so of course much faster so soon. Definitely no issue. So there's why there's not much sort of difference from cost perspective on the inference part.
Demetrios [00:21:57]: How about on the data side when you're training it what kind of data you need for each of these? Because I feel like you need a lot of features for the autoregression but maybe you don't need the features or you don't need it as clearly in the deep neural network.
Josh Xi [00:22:13]: Oh yeah, that's a good point. That's another difference on the training side is the training data. So for Arima we can just take a couple weeks of data in our experience and video like features which is future values versus historical values. If you're trying to use history for saying time of week from one to two, three weeks ago, then you also need much a little bit longer history. For DNN it's doable with small amount of data. Then the question is like the weights might not be as good in the beginning because you have more weights. You are more likely to like run into training situations. So ideally you want to take much longer history and auto regression.
Josh Xi [00:23:08]: You can even start without actually sort of any weight. You can just assuming a moving average it was like 20% on last three minutes and 20% same time of day from yesterday, 20% from the day, sorry the week before something just making up your own prior of how the weighted average will be. What you can do is is the action is online. Back to the online difference. Although there's no cost, there's actually a huge sort of advantage with autoregression or classical models is you can refit them. Those are linear models. You can refit them at a super fast speed because it's just a few weights, right? There's probably 20, 30 historical features. You take average arm so you can just you have a new observation from the past 5 minutes or 10 minutes half an hour which depends on what sort of retraining cadence you want to do.
Josh Xi [00:24:08]: And you can just plug into this linear model and just run a refit, adjust your weight so you can do online learning, make the model sort of adaptive to whatever it's changing in the real time marketplace. So any sort of changes like a spike's picking up, you can use refit a model to quickly put more weights on maybe your recent values to catch up the spikes with den it's a much larger model. So you can do retraining too. Right. Because that's how we train the model anyway. It's like a batch training or something. You can always take a small sample data and retrain the model. But the problem comes in to the cost because retraining in the DN model is actually very expensive too.
Josh Xi [00:24:51]: So similar to the training case, anything you want to do retraining online, trying to adapt to any changing, changing situations happening in the marketplace that will incur much higher cost. So that's actually another sort of shortcoming of DN in practice because that cost you can probably, you can only do less often retraining, which also means you are sacrificing your accuracy with, with less retrain. So yeah, it's. And it's similar experience that can also mean 10 times or a hundred times differences in your training cost.
Demetrios [00:25:32]: Well, especially if like you're saying you're retraining continuously and constantly just retraining when you're learning new things about the world. So I imagine like how often are you retraining? Is that pipeline just set up to be triggered every couple seconds? Are we talking every couple hours?
Josh Xi [00:25:51]: Every day, Every minute right now? Like it's just a choice because the machine is basically standby for our models the whole time. So I know like inside the marketplace team, my team has one of those expensive team like using machine power. Yeah.
Demetrios [00:26:13]: But it's worth it, I guess if you're continuously getting that accuracy, just that 0.3% accuracy lift is gaining you a whole lot of revenue. Yeah. So you, it makes sense that you want it to be as accurate as possible.
Josh Xi [00:26:30]: Yeah, we, we do some sort of a cost benefit trade off more or less. So how much we've been running these machines, how much they cost versus like the estimated impact of the accuracies. So those are trade off in our daily work we need to look at too. Yeah.
Demetrios [00:26:50]: And so I imagine it's gotta be automatically retriggered for this retraining. How do you then go and add extra value? Is it that you introduce a completely new model and Then that gets thrown into this retraining loop. Or is it that you just are continuously updating the model and adding new data sources like where do you plug in to make sure that whatever model is out there is performing the best possible.
Josh Xi [00:27:20]: With online autoregression models, it has been working well for most of the use cases like capturing that spike and making it adjust faster to the marketplace. The challenges so far we have learned is actually probably mostly around events. Our models do capture that. Like when events start to like increase demand, our model will capture that. But there's a little bit delay in our model here. So depends on how we set the learning rate in our model relatively to the historical data. It's basically you have a old weight, you have learning rate based on what you observe. It's battling, okay, should I put more weights on my recent spike or should I just trust more in my history? So it has been very hard to tuning those parameters because every region is different.
Josh Xi [00:28:24]: So every region you typically have their own models. But we haven't really had a good approach to finding that perfect parameter tuning for each model. Ideally you can just keep doing everything offline, keep testing and running lots of machines that cost lots of money. So we have some sweet range. Typically this parameters works well. So we use them for all the models we are running. So on top of that, what we can do is you can always observe how your model perform in the last few minutes. Right.
Josh Xi [00:29:02]: Once we see the real data, what you forecast it and what the error is. So for external adjustment instead of using sort of okay, you can't instead of like having people looking at the data and making adjustments or say there's a football here. So let me add more demand. Another approach actually just looking at the forecast errors and see how constant or that error has been. And you can add some sort of bias correction or some sort of adjustment by setting up some sort of heuristic rules to okay, if it's say last 10 minutes, it's constantly on the forecast. Let's bump up the demand. So the adjustment can be like using ratio based on how much it was. Like if the forecast has been constantly 90% ish of what the actual is, you can sort of divide by 0.9 or if the so but if you're thinking about like actual human intervention, so far we haven't really done test that out because the type of problems we are facing is like spatially huge.
Demetrios [00:30:23]: Right?
Josh Xi [00:30:23]: You have thousands of cells in the region. It's really hard to just looking at each one of them. So Everything is sort of automated based on more like just accuracy versus another sort of direction we're looking into is actually more ensemble models. So autoregression is just one of the model. It's easy to explain. So that's what we pick. But there's other sort of linear. In the linear model area you can run refit quickly or you can also basically change the model setup based on your assumption of how data is spatially distributed.
Josh Xi [00:31:01]: You can basically apply more new models to sort of running them at the same time. As long there's relatively small you, you don't. It doesn't really cost that much and you have different models running at the same time and you have different performances. And then based on the recent performances you put more weight on the one that give you better forecast. So that's, that's another approach to sort of make bring up the accuracy.
Demetrios [00:31:30]: So it's not like you're taking the average of the five models that you're running. It is that you're taking the model that is being the most accurate.
Josh Xi [00:31:40]: Yeah, you can also weight them. So the best one give 80% the weight on the forecast and the second best give them 20%. So it's like a weighted average of like the two best models. That's another sort of approach we are taking. So so far we, I would say my team have been focusing mostly on using those approach to get better accuracy. Yeah, because just human intervention. So we only have a few data scientists on a team. It's hard to just like okay, let's look at what's happening here today, what's happening there tomorrow.
Josh Xi [00:32:24]: That's a little bit like beyond what we can do right now.
Demetrios [00:32:29]: Yeah. And it does seem that the idea of the more simple models versus the deep neural networks are easier to interpret.
Josh Xi [00:32:42]: So actually another way we can actually do which is actually sort of what we hoping to achieve this year is to online models like very granular. But I also mentioned about like another type of forecasting which is people running at regional level forecasting or sub regional. Like instead of looking at the cells they look at like maybe 10 subregion in the city and they looking at hourly forecast for the next few days. And so from those models you can get the trend, the seasonality and they also typically those model also they have more people to look at the impact of the like event. So they will do like sort of some sort of a manu adjustment based on knowledge of okay around event time this is how much spike you will see. And so we are sort of thinking taking that as output and then feed into actually the last layer of our real time forecasting. And sort of once we forecast right we have say we are looking at a cell level and for this event happening in this sub region we can aggregate all the cells and see what's the total demand. We forecast it and see how far that is from the offline forecast and then you can take that difference and do a multiplier or something and that's how we can do adjustment.
Josh Xi [00:34:13]: Yeah, I think it's more end challenge because we have to have the real time system talking to the offline system. But the concept is pretty straightforward. Taking your output of the real time forecasting, compare that to the offline forecasting with human intervention and just check the differences between the two.
Demetrios [00:34:36]: And so is the idea there that you know what it should be because of the seasonality or the. You know what it should be because last year at this time it was this or. And you're looking at a. You're looking, you're opening the aperture and by opening the aperture it's going to give you another feature that you can feed into the model.
Josh Xi [00:35:02]: Yeah, because autoregression typically look at the loss what happens last week and sort of capturing the average sort of seasonality based on last week or two weeks of data. So if you offline. Offline has more longer, they look at longer histories and they deal with events more explicitly. So if they believe the event's going to bump up the demand and they already did their adjustment and their final outcome. So our assumption here is they have a better capture of those like special events or seasonality because they've seen it.
Demetrios [00:35:44]: Before last year or last quarter or whatever. And I get it.
Josh Xi [00:35:50]: Yeah, yeah. And offline models typically they do training looking at two years of data because it's typically a few times serious and over like just a few sub regions. So it's not really that much data to sort of train using a longer history. So so it's totally doable. And also for their use cases typically they run on a weekly or daily basis. They run forecast once every week looking at the daily or hourly forecast for the upcoming week or two so they don't have to just like in real time keep generating those forecasts. So cost to them is a less concern. So they have more.
Josh Xi [00:36:34]: Their model tend to be a bit more complicated dealing with seasonality, external events and they also look at a regional level. So and they, they can also sort of pick out the major events in the region and sort of to do adjust on Top of that. So they definitely have more, a little bit easier in terms of making adjustments, building more complicated models because they've seen it.
Demetrios [00:37:07]: Yeah, that makes sense. It's like, it's not so new and it's not like just out of left field and you have that data. I, I didn't realize that it, you get two years of data.
Josh Xi [00:37:19]: Yeah. Or even more if you want to.
Demetrios [00:37:22]: Yeah. So if there's a local sports team and they have, during their football season, they have a game every Sunday and then you know, all right, when it is football season, this is more or less what the demand is going to be like because we've seen it for the last two or more years.
Josh Xi [00:37:40]: Yeah. And those. And it's sort of more or less embedded in the feature like the time series already. And so sometimes it's more like the, the thing that happen only like a few times a year and that's very hard for the time series to pick up. So in that case like likely human intervention will be needed. So I think for whoever working on long term forecasting, they, they the last step, they focus a lot on those like big but only occur at least a couple times a year events.
Demetrios [00:38:18]: Yeah, I can see that. So the other thing that I wanted to ask is when you are testing out these new models, do you do like side by side analysis? Are you doing some kind of champion challenger release? Is it? I think there was something else that I had heard of, but I can't remember how it worked where you're just giving it dummy data and you're simulating what it would predict and you're seeing if it would be more accurate than what you have lived.
Josh Xi [00:38:50]: Yeah. For time series forecasting, one of the most common technique is called backtesting. I wonder if that's what you were thinking. So basically you can pretend this is, you were sort of sitting in the past at some time and then at that time on you run a model training and then you pretend this is okay, what the model's weight's going to be and then you start to like following the time like a simulation moving forward and then generate forecasts because actually everything has already happened in real life. So you can actually see what the model predicted versus what the history actually was observed and you can calculate the bias or any sort of metrics, performance metrics for your accuracy. So that's how we sort of compare models or every time we're trying to make tweaking to our models or evaluate new models, backtesting is what we do to sort of help us decide, okay, if this model actually will perform better than another one, how big of a.
Demetrios [00:40:01]: Factor is it that these models are also having a geospatial aspect to them?
Josh Xi [00:40:09]: Simple answer. It's very difficult. So yeah, so for dn, if it's single time series, you do long short term memory and you don't have to worry about how data correlated spatially. So model is simpler, less model weights training, faster forecast, faster for spatial models. Now another thing people actually need to do is to sort of dealing with the correlation between different spaces. Because you can treat every cell as its own model. But then if there's 3,000 cells, you have 3,000 models, you do not do that. So more practical solution is sort of treating the whole cells as, sorry, the whole region as a sort of different dimension in a single input.
Josh Xi [00:41:05]: So the spatial temporal paper from Microsoft Lab that I mentioned earlier, what they do is actually they use some convolutional neural network approach. So basically every time snapshot you have say what your demand is, what the supply is and at different locations and you can consider that as an image. So your latitude is your X axis in your image, your longitude is the Y axis in image. Every cell is some sort of value on your image. That's how image processes processing is basically handling their data. So we can use the same approach. And in CNN you can apply a sets of kernels to sort of basically learning the spatial correlation like 3 by 3 cells or 5 by 5, sorry, 5 by 5 cells to get their correlation across the space and applying different weights. So you can learn like okay, maybe when this guy is going up, that guy is going up.
Josh Xi [00:42:16]: Like oh, when this location have a big demand, the other location have the nearby location have a big demand all opposite. So it's learning all these spatial correlations. So you end up having model weights in your neural network to capture those informations. But to learn those weights well, which means you need more training data set. And although one thing we learned is those correlation can change over time. So you're trying to get learn more about the correlation. But then if last year's correlation is different from this year's correlation, it does not do well. You sort of can only learn a limited amount of correlation and this could.
Demetrios [00:43:02]: Be like some kind of construction that's happening for half of the year. And so you're creating a correlation that is only because of the construction, right?
Josh Xi [00:43:13]: Yeah. And also maybe you know, some offices, people relocate to another building or pandemic that's a huge hit. Right. People all of A sudden travel differently and people moving constantly from location to location. Economy, macro economy can affect like a strip mall sort of running out of business and changing people out of control. Although let's say those are less issue. But yeah, you just. You can only use a limited amount of data in that sense.
Demetrios [00:43:48]: So I could see how the creating patterns for the spatial data. For example, in my head I play it out as if there is a stadium and you have an event happening at that stadium. It's the north and the south side that people are going to be entering. So you would expect that there's some correlation there. That when one side is busy, there's also the other side busy.
Josh Xi [00:44:15]: Yeah, that that can happen sometime. That actually remind me a very interesting learning. We also learned is venues they operate differently time from time. Somebody jumping. Okay, last month we did this with these two like entrance right south north. So that's how they're. You're directing people to depart from the venue. Of course was like okay, that was horrible.
Josh Xi [00:44:41]: That was a mistake. Let's change like let's make east south plus some other location. And one month they're like let's do a bus shuttle like so to move people from this location to another next month. Shuttle service is not reliable. People are still complaining. So the venues interestingly keep changing too. That also happen a lot at airports. Like.
Josh Xi [00:45:05]: Like sometimes they use lot A for like Uber Lyft pickup. Sometimes they use lot B. And that changing over time too. So that keep messing up our data Unless we know exactly how the venues or the airport's been operating. But sometimes hard because there's always an information delay.
Demetrios [00:45:26]: Which also makes sense if you think about a football game versus a Taylor Swift concert and how the crowds of these events operate and then the times of these events. And so maybe you're thinking okay, it generally when there's an event, if you just have it as event at stadium and you don't know more information about that, is it a football game or is it a concert? Then you're going to get burnt. And so you probably have to have more granularity onto what type of event is it.
Josh Xi [00:46:03]: Yeah, and yeah that that's definitely so true. And sometimes it's hard and also like the same singer, sometimes you might attract different crowds. Sorry, the same man. But by different singers they might try attract different crowds. And certain crowds might more leaning towards riding Uber lift. Certain crowds my prefer like driving themselves. So we also noticed that too like sometimes it's the same venue, same time the week, but somehow they have very different demands.
Demetrios [00:46:39]: That's where your forecast model, I'm guessing it just gets blown out of the water. And so that's what you're talking about where you need to kind of look at the. The misses or the negatives instead of looking at what it is doing correctly.
Josh Xi [00:46:55]: Yeah, that's like the refitting or bias adjustment is so important because we know we can't model them correctly and so the best we can do is learning from our mistakes. Okay, last five minutes. We on the forecast by this much. So let's do a refit of model or let's apply some bias correction because sometimes we look at the errors like we don't even know why we see this big bias in our forecast. It's just mind blowing. Yeah, so there's like in reality it's like so like simpler model interesting. Make it a little bit easier because you can apply adjustments easier and faster versus complicated model dms it's theoretically very interesting. But if you're dealing with especially when it comes to data is related to human behavior that tend to conluded with many other unknown factors.
Josh Xi [00:47:54]: So it's very hard to really forecast them well versus I would say like if you're dealing with image processing or language those are more structured. Right. People talk following certain grammar, you have some variation but then it does not really fall too much outside the box. So that's why it's more predictable because it's not so outside the box so it's probably easier to train them well. And also you have so much data like language. Oh I think when a language change totally will probably take thousands a year. Right. People all of a sudden start talking differently using different idea concepts.
Josh Xi [00:48:37]: At least for the last five, 10 years. I wouldn't say we will talk that much differently. But when it comes to travel, certain human behavior is like oh five years ago. It's a totally different world. You can't use the data to train a model anymore.
Demetrios [00:48:51]: Yeah. And the you say that but you obviously have not spoken to many Gen Z people. I guess you don't know that's true.
Josh Xi [00:49:05]: Yeah, the.
Demetrios [00:49:06]: No, but if you compare Shakespearean language to our language. Yeah that is a monumental shift. But that also took hundreds of years or a thousand years. I can't remember how long ago that was. And so if you're looking at this, there, there was something that I wanted to ask you though about how often you're trying to root cause these anomalies that will come through. Like you said, sometimes you just don't know why was There so much bias in this prediction. Why did we not get it right? How much of your day goes into figuring out why you didn't get it right versus just business as usual.
Josh Xi [00:49:49]: Yeah. So it depends on the use cases. So I think my team, since we are looking at cell level like millions of data points, we don't really have that much capacity to really look into that level. So what we typically do is we have certain metrics at more like regional level. Like you aggregate all cells together, look at the forecast versus actual. We have certain metrics we trend. If it's like the values is too far off or if there's some sort of drift in the trend. A set of sort of metrics we monitor.
Josh Xi [00:50:28]: If they trigger certain alarm, that's the time we sort of will spend effort to look into that other time is typically because all of these actually goes into downstream models like pricing. Right. Or driving incentives. So they have their performance metrics on their side too. Typically they're meeting their sort of certain pricing targets. If their price, somehow their model just start to spitting out like price that's really outside their bounds or something. They will look into that and see if that's triggering by forecast or other factors. If they do leave it something on the forecasting side and they will come back to us and we will sort of look into that.
Josh Xi [00:51:15]: So that's actually how things are sort of set up for the real time side due to the granularity, I think challenge. Yeah. But for offline forecasting, that's not my team. But there's other cases where they do this regional level hourly value for the next say to one week or two. They, they have people sitting down review like region by region. So that's actually a more like frequent cadence on their side. Yeah. And but then they typically are very sensitive to the differences because that's actually their downstream typically sort of decides the company to decide.
Josh Xi [00:52:04]: Okay. How much money they want to spend next week. Because if there's a big gap in incentives or sorry, if there's a big gap between supply and demand, they need to decide. Okay. How much money they need to spend to acquire drivers or acquire riders. And those are like a big chunk of money.
Demetrios [00:52:22]: Yeah. There's big decisions being made on that. So you want to make sure those decisions are correct.
Josh Xi [00:52:27]: Yeah. And also I think it's technically, it's also more feasible. Typically you can look at top regions or you can look at regions that you know there's going to be big events going on. So we can easily check. Okay. If a forecast during that time at that location is accurate. I guess we can do the same thing at cell level if we want to say, okay, this is the event, but then our forecast actually happens every minute for next 30 minutes or an hour. Ish.
Josh Xi [00:52:57]: So the only time we will see that forecast is like 30 minutes before that. And so I don't know if that's feasible. If the event happens at midnight, I don't know if we're going to wait to like midnight and quickly evaluate the performance. So everything's automated. We are more looking at okay, as the forecast happening, we observe the actual if the difference is bad or not. If it's bad, then let's do adjustments in the model like asap. So that's a sort of very different concept.