MLOps Community

PyTorch for Control Systems and Decision Making

Posted Dec 03, 2024 | Views 324
# PyTorch
# Control Systems and Decision Making
# Meta
Speakers
Vincent Moens
Research Engineer @ Meta

Vincent Moens is a research engineer on the PyTorch core team at Meta, based in London. As the maintainer of TorchRL (https://github.com/pytorch/rl) and TensorDict (https://github.com/pytorch/tensordict), Vincent plays a key role in supporting the decision-making community within the PyTorch ecosystem.

Alongside his technical role in the PyTorch community, Vincent also actively contributes to AI-related research projects.

Prior to joining Meta, Vincent worked as an ML researcher at Huawei and AIG.

Vincent holds a Medical Degree and a PhD in Computational Neuroscience.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

PyTorch is widely adopted across the machine learning community for its flexibility and ease of use in applications such as computer vision and natural language processing. However, supporting reinforcement learning, decision-making, and control communities is equally crucial, as these fields drive innovation in areas like robotics, autonomous systems, and game-playing. This podcast explores the intersection of PyTorch and these fields, covering practical tips and tricks for working with PyTorch, an in-depth look at TorchRL, and discussions on debugging techniques, optimization strategies, and testing frameworks. By examining these topics, listeners will understand how to effectively use PyTorch for control systems and decision-making applications.

TRANSCRIPT

Vincent Moens [00:00:00]: Vincent Moens, research engineer at Meta, and I usually take a cappuccino while I get to work, and a little bit later I take a flat white. Welcome.

Demetrios [00:00:12]: Back to the MLOps Community podcast. I am your host Demetrios, and today we are talking with one of the people that makes PyTorch tick. This was a conversation that went so deep into the compiler and really it was tips and tricks for people who are using PyTorch and they want to be able to optimize their PyTorch use to the max. I'm hoping that you are able to walk away with at least nine tricks that make you a better PyTorch user. I don't want to spoil any of this conversation because it was all so good and I will not do it justice. So let's get right into it. I think I saw somewhere on LinkedIn that you shared oh, here's a few ways that I've seen could be better when people are dealing with PyTorch, or here's a few tricks that you might not have thought about, or easy wins. I can't remember exactly how you phrased it, but it was just like, you might want to do this next time you're messing around in PyTorch, and you probably remember what you posted.

Demetrios [00:01:28]: I don't remember because it's been over six months. Right. But tell us what that was.

Vincent Moens [00:01:33]: Yeah, so it's actually funny because we were gathering together with the folks in PyTorch at this offsite in New York, and when chatting with them we kind of realized that we had very different ideas of what was the most efficient way, basically, to take a tensor from RAM and send it to CUDA. And if you look at the PyTorch docs, we have this thing where we basically tell people, oh well, if the tensor is in pin memory and you call .to("cuda") with non_blocking=True, your transfer will be asynchronous with respect to the host, which means that it's basically going to be faster because you can send more than one tensor at a time from CPU to GPU. So okay, that sounds good. Let's try to profile that. Let's take a tensor, actually two, three or more tensors, and call pin_memory() then .to("cuda") with non_blocking=True, and compare that with just the bare tensor.to("cuda"). And you might be surprised that the simple tensor.to("cuda"), or the list of tensors sent to CUDA, might actually be faster than putting your tensor in pin memory and then sending it to CUDA. And we dug a little bit into that. It was like, oh well, that seems a little bit strange. And so we looked at the documentation from our friends at Nvidia and all these kind of things.

Vincent Moens [00:02:58]: And so we came up with this tutorial about proper usage of pin memory and what pin memory actually is. So pin memory is a reserved part of your RAM where you're going to put things on hold for a little while before you send them to CUDA. You reserve that space and your CUDA device knows that it has to gather the data from there to take it on device. The thing is, copying your data from RAM to this special place that is pin memory doesn't come for free, it's a copy of the data. And anytime you're sending data from RAM to CUDA, it's going to transition through pin memory whether you want it or not. So if you call pin_memory(), you're doing it yourself. If you don't call pin_memory(), it's going to happen under the hood, like via little chunks of data or things like that, but it's going to happen anyway. So the thing is, if you call tensor.pin_memory(), you're doing something explicitly that would be done otherwise implicitly.

Vincent Moens [00:03:55]: And so the thing is that the doc was a little bit misleading there.

Demetrios [00:04:00]: So basically the definition of over engineering.

Vincent Moens [00:04:04]: Yeah, exactly. And then I was chatting with my colleagues and I said, oh well, that's a surprising fact. And some of them told me, well you know, we should not worry about that too much because that's already pretty advanced. So we went on GitHub and we did the experiment, you know, of searching for tensor.pin_memory().to(...), and we had a ton of results of people doing that. Because the doc kind of says you should, without saying it explicitly, because if you read between the lines, it's not really saying this is the proper way of doing it, but you might read it that way. And there's kind of this common belief that, yeah, it's a good thing to do. And many people don't benchmark that, obviously, because we kind of told them to do it that way. So we're like, okay, let's try to correct things a little bit and write a proper tutorial where we experiment with things and we try to give a consistent message.

Vincent Moens [00:04:56]: So I invite you to check the tutorial. It's full of info and interesting facts about that. But the TLDR here is basically: benchmark what you're doing. Pin memory might be working for you for some reason, and if it doesn't, most of the time, if you're sending from CPU to CUDA, just doing .to("cuda") with non_blocking=True will already give you a pretty decent amount of speedup that is sufficient to get away with. So yeah, that's the idea.
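
As a rough illustration of the comparison described above, here is a minimal benchmark sketch (not taken from the PyTorch tutorial itself; it assumes a CUDA device is available, and numbers depend entirely on your hardware, which is exactly why measuring your own case matters):

```python
# Hedged sketch: compare a plain non-blocking transfer with an explicit pin_memory() step.
import torch
import torch.utils.benchmark as benchmark

tensors = [torch.randn(1024, 1024) for _ in range(4)]

def plain_transfer(ts):
    # tensors go through pinned staging buffers implicitly, in chunks
    return [t.to("cuda", non_blocking=True) for t in ts]

def pinned_transfer(ts):
    # explicit copy into pinned memory first, then the device transfer
    return [t.pin_memory().to("cuda", non_blocking=True) for t in ts]

for name, fn in [("plain non_blocking", plain_transfer), ("pin_memory first", pinned_transfer)]:
    timer = benchmark.Timer(
        stmt="fn(tensors); torch.cuda.synchronize()",  # sync so async copies are counted
        globals={"fn": fn, "tensors": tensors},
    )
    print(name, timer.timeit(50))
```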

Demetrios [00:05:30]: Have you found any other areas where you didn't necessarily mean to direct people that way? Like here in the docs, it was just a little ambiguous and so people took it for granted and then felt like they should be doing it.

Vincent Moens [00:05:45]: Yeah.

Demetrios [00:05:46]: And where.

Vincent Moens [00:05:47]: Yeah, so another example that comes to mind is modifying or copying tensors in place. It's something that people do all the time, like add_ something or a ReLU module with inplace=True. You see that all of the time. And the thing is that the gain in terms of performance is marginal, if anything. And it's a pain to deal with that if you're writing a compiler, because you know that in PyTorch right now there's a very heavy focus on torch.compile, and compiling things where you have a single storage that happens to point to different things isn't very easy to deal with. And so the official guideline is: do not do things in place. You know, like it's probably not going to be faster, and if you benchmark it, it's very likely that it's not going to be faster. And even if it is, if you try to compile that thing, it's going to be a little bit messy on our end and therefore also probably on your end, you know, because the compiled graph might not be as efficient as it could.

Vincent Moens [00:06:56]: So I think the compile guys are super skilled and so they dealt with that a long time ago. But the official guideline is basically try not to do things in place, you know, it might break things, it's not going to be faster.

Demetrios [00:07:08]: Yeah, where folks are looking for that extra little lift, that's probably not the place to be looking.

Vincent Moens [00:07:15]: Yeah, exactly. Yeah.

Demetrios [00:07:16]: So where is the place to be looking?

Vincent Moens [00:07:20]: Yeah, so now I'm going to put my hat on of the RL guy, because I'm working on TorchRL, you know, the reinforcement learning library. And one thing that is very specific about RL is that we're dealing with models that are rather tiny and that are called very, very frequently. So that basically means that we're heavily CPU-overhead bound, in the sense that very simple things like getting attributes on your module and things like that might take a crazy amount of time compared to the mathematical operations that you're doing, like matrix multiplications and things like that. And in this case, sometimes when I talk with people who are doing LLMs, they don't believe me, but simple things like calling module.eval() take a crazy amount of time, because you're going to iterate over all the submodules that you have and set an attribute and things like that. And nn.Module is written in such a way that getting attributes is already kind of long. And so sometimes you see those training loops in RL where people are doing module.train(), and then they do something, and then they do module.eval() and they do something else, and then they redo module.train(), and all of these things introduce sometimes an overhead that is like about 10, 15% of your runtime. And if you profile that, you're like, oh my gosh, what am I doing? I'm training my model for basically two days, and in those two days I have like four hours that are spent just going module.train() and module.eval().

Vincent Moens [00:08:48]: And that seems totally crazy. And if you're not compiling, if we are on the eager side of things, a very easy way to throw that away is basically to create two copies of your module. So you're going to create one copy of your module with regular parameters, and then you're going to instantiate a second copy. There's a way to do that with what we call meta tensors, which are tensors that are not on any device, like they're kind of fake tensors. Or if your model is very small, you can just create tensors that you're going to throw away later on. So the next thing that you're going to do is that you're going to take the parameters from the first module and you're going to copy those parameters onto the second module. Now you have two modules that point to the same data storages, the same parameters.

Vincent Moens [00:09:34]: If you modify the parameters on the first module, they're modified on the second module too. And the thing is that now you have those two modules, but they're different. And so you can call eval() on the second one. And now you have one module that behaves in eval mode and one module that behaves in train mode. And then you just use those two things. And rather than calling eval and train on the same module, you just name the first one module_train and the second one module_eval, and you're done. And then the magic happens. And there's a lot of things that you can do.

Vincent Moens [00:10:04]: Like that. Same thing with module.requires_grad_(False), which people do to detach parameters temporarily, and then they do requires_grad_(True) to put the parameters back in the graph. Again, you can do that with this very simple trick I just explained and create two copies of a model, one that requires grad and the other one that doesn't. There's a lot of things that you can do like that, and usually, at least in RL where again the models are tiny, called very frequently, etc., it can bring you a very decent speedup.
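
One way to sketch the trick Vincent describes (an illustration, not TorchRL's actual implementation; it assumes PyTorch >= 2.1 for load_state_dict(..., assign=True), and the module here is made up):

```python
# Hedged sketch: two module "views" over the same parameters, one kept in train mode,
# one kept in eval mode, so the training loop never has to flip .train()/.eval().
import copy
import torch
from torch import nn

module_train = nn.Sequential(nn.Linear(16, 32), nn.Dropout(0.1), nn.Linear(32, 4))

# Structure-only copy on the "meta" device: no real parameter storage is allocated.
module_eval = copy.deepcopy(module_train).to("meta")

# Re-bind tensors that share storage with the original parameters
# (assign=True swaps the tensors in instead of copying data into the meta placeholders).
module_eval.load_state_dict(module_train.state_dict(), assign=True)
module_eval.eval()                       # the eval flag lives on the module, not on the tensors
module_eval.requires_grad_(False)        # same idea for the requires_grad variant

x = torch.randn(8, 16)
y_train = module_train(x)                # dropout active, grads tracked
y_eval = module_eval(x)                  # dropout off, no grads, same underlying weights
```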

Demetrios [00:10:35]: How much is too much of these copies? Because I can see someone that is taking it overboard and saying, all right, I'm going to do this. And then all of a sudden there's a sprawl and they have 30 or 40 different copies.

Vincent Moens [00:10:48]: Well, it takes virtually zero space in memory. You know, like those things are just like regular Python classes and there's no extra data being copied anywhere. So you can do as many as you want. I don't think there's any upper bound that I know of.

Demetrios [00:11:07]: Well, so you did say and kind of glossed over TorchRL. I want to know everything about TorchRL. And also it is interesting that you mentioned these are being called frequently, these are CPU bound. It feels like a whole different world than the one most people, or the hype, play in these days with the LLMs. Right, so yeah, give me the whole trade-offs and what it is and where and why we use it.

Vincent Moens [00:11:35]: Yeah, sure. So the story of TorchRL starts about four years ago when we had this idea that we wanted to build some more domain libraries. And we came up with things like, you know, TorchRec for recommendation systems and TorchRL, you know, for RL. And the story goes that I was basically hired specifically for that thing here in London. And so I sat down and I started reading documents, you know, of people at Meta and people outside of Meta who had written about what they wanted to see in an RL library. And there were a lot of things, like people were sometimes saying, oh, we just want functionals, things like value functions and these kind of things that can be hard to do right, can be hard to do efficiently, but if we have those functionalities, it's going to help us tremendously. But we don't want anything too high level because, like, we're scientists, you know, it's a very researchy space, so we're going to deal with that ourselves.

Vincent Moens [00:12:32]: And then other people were like, oh, well, we want distributed things, like distributed replay buffers. A replay buffer is basically a dynamic storage that we have in RL. You fill it, you empty it all the time because your model is changing. So the data you're looking at is changing with the model because it's very dynamic. So they wanted these kind of solutions distributed across nodes. So there were a lot of asks and nothing really consistent. And then we sat a little bit and we said, well, wait a minute. The thing is that RL is very heterogeneous.

Vincent Moens [00:13:02]: In RL, you have things like AlphaGo, you know, like MCTS kind of stuff, you know, that is very trendy, very complicated. You have other things like, you know, DQN to play Atari games. But you also have people who are using RL to do, I don't know, portfolio management or autonomous driving or robotics, or, you know, it goes a bit all over the space of applications, and also like robots playing soccer, this kind of thing. So it's really crazy how many things you can do with RL. But the thing is that if you're writing a library, I always compare myself to my colleagues at TorchVision, who I think have a much easier life because they're like, oh, well, I know exactly what's going to come into my modules. You know, it's going to be an image and the output is going to be something like a label or maybe, I don't know, something that has to do with an image, you know, so the space of possibilities is very limited. You know, it's going to be an image or a video, plus maybe a bounding box, plus maybe a label. But you're never going to find something as crazy as, I don't know, a balance sheet or something like that.

Vincent Moens [00:14:10]: In RL, you can have anything, like the data is basically anything. And the other thing is also that RL is not about the media. RL is about the algorithm. And the algorithm basically means that what you call a policy, this module that is making decisions in your environment, can be very different depending on the algorithm. PPO, which we use for instance for RLHF, has a policy that outputs an action probabilistically given a certain probability distribution, and it also outputs the log probability of that action conditioned on the distribution. But that's not the case for DQN.

Vincent Moens [00:14:49]: DQN doesn't actually output an action. DQN outputs a value that then you transform into an action, and you don't have this concept of log probability of the action. So you have two very similar things, two modules that are basically aimed at making a decision in the environment, and they have a very different signature. And we're like, okay, how are we going to build a library around things that have kind of this common flavor of being a policy, or a replay buffer that stores data, or, I don't know, a value function, when the signatures are so different? And that seemed like an impossible problem to solve. And then we said, okay, what we're going to do is that we're going to say, okay, any policy in TorchRL is going to receive sort of a dictionary as input. It's going to do its job and it's going to output a dictionary. And once we have abstracted what the signature of the policy is, we can say, okay, the policy just goes there in the pipeline.

Vincent Moens [00:15:45]: And same thing with the replay buffer. The replay buffer receives a dictionary and does its own business with that dictionary. And when you sample from that replay buffer, you're going to get another dictionary. And we're like, okay, that seems like a pretty decent way to go forward. But dealing with dictionaries is not very convenient if you have two dictionaries. Like, if I have an environment and I do 100 steps in this environment and then I try to stack those dictionaries together and represent them contiguously in memory, I might use something like pytree. And pytree is definitely like a proper solution to do that. But it's not very intuitive how you're going to call pytree over your set of dictionaries to stack things together.

Vincent Moens [00:16:23]: And so we thought, okay, why don't we come up with another abstraction that allows us to easily stack those dictionaries together? And we came up with this thing called TensorDict. So initially, TensorDict was just a class living in TorchRL that basically behaves like a dictionary, but has some extra features from torch.Tensor. And those extra features are, like I was saying, stacking tensordicts together, reshaping them, but also other stuff like sending a tensordict from your CPU to CUDA or from CUDA to CPU, which we do a lot in RL because we need to save things in the replay buffer, and other things like communicating between two different nodes. So it has a lot of distributed features and it has a lot of utils. Like, if you have a tensordict, you can call tensordict.bytes() and then you're going to get the amount of data that is contained in your tensordict, such that you can say, oh well, if one single element that I get here is like 1 megabyte big and I need to store 1 million of them, my replay buffer needs to be as big as that. Do I have enough RAM or don't I, to make this kind of decision? Also, if you have a tensordict that contains Llama 3.2, you can look at that and just call bytes() and you're going to know how big your Llama model is, basically. So it has all of these features. So the story is that we had this thing living in TorchRL and the early users were like, man, this is amazing. I'm using that all of the time.

Vincent Moens [00:17:48]: But not only in RL, in other things like unsupervised learning or semi-supervised learning.
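
A small sketch of the TensorDict behavior described above (it assumes the tensordict package is installed; the keys and shapes are made up for illustration):

```python
# Hedged sketch: a dictionary-like container with tensor-like features.
import torch
from tensordict import TensorDict

step = TensorDict(
    {"observation": torch.randn(4), "action": torch.randn(2), "reward": torch.randn(1)},
    batch_size=[],
)

# Stack 100 "steps" into one contiguous, batched structure.
rollout = torch.stack([step.clone() for _ in range(100)])
print(rollout["observation"].shape)      # torch.Size([100, 4])

# Tensor-like utilities: reshape, move every entry across devices, measure storage.
rollout = rollout.reshape(10, 10)
if torch.cuda.is_available():
    rollout = rollout.to("cuda")
print(rollout.bytes())                   # total bytes stored, handy for sizing a replay buffer
```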

Demetrios [00:17:56]: Because it had that modular ability or it was able to have that flexibility. Since you said we're just going to have these dictionary types and we don't really care much about how you're using it, just we give you everything you need to use it, how you want to use it.

Vincent Moens [00:18:14]: Yeah, exactly. And from the beginning we really tried for TensorDict to be distinct from RL. Like, there were probably one or two bad decisions I made at the beginning, you know, in terms of designing TensorDict, and I quickly backtracked and said, no, no, this is overfitted to RL, we're not going to do that because it doesn't feel right. And so those users came to us and said, man, you should do another library with TensorDict. Which we did. So TensorDict is its own thing, separate from TorchRL. Obviously there's a big chunk of the community using TensorDict that are users of TorchRL, also because of probably historical reasons, but today the users of TensorDict are also a lot of people working in GenAI, specifically like diffusion models and things like that.

Vincent Moens [00:19:05]: TensorDict can also store non-tensor data, which is very helpful. Like again, if you're doing, for instance, portfolio optimization, which I mentioned before, because you can store, I don't know, headlines that go with your financial data or things like that. You can basically store anything in a tensordict. If you do tensordict.to("cuda"), obviously if you have a string, the string will not be sent to CUDA, that doesn't mean anything because it's not a tensor, but all the tensors will behave as expected. And the other use case for TensorDict: I'm talking a lot about TensorDict to store data, but you can also use TensorDict to store parameters. And then again, to come back to RL, we started using TensorDict for another kind of thing, which is functional calls. So functional calls are basically that you take the parameters out of your module, you store them in a data structure, and then you can do things like stacking parameters together and calling a vectorized map. So instead of looping over different configurations of your model, you execute all the configurations of the model all at once over the stack of parameters, which is much faster because things are vectorized.

Vincent Moens [00:20:07]: And using TensorDict, that is very easy, because again, if I have five different models, I ask TensorDict to gather the parameters, I stack those parameters together in a set of contiguous tensors, and then I just call my module functionally and dispatch those things. And that's much faster, you know, than looping over things. So yeah, that's kind of the story of TorchRL. And that's what made, you know, TorchRL something that is very useful all across the board, you know, for our use cases. So what I'm always surprised and happy about is when I'm talking with users and people come to you and they say, oh, I'm using TorchRL to do drug design. And you're like, gosh, this is amazing. Like, I had no idea you could do drug design with that thing. And you know, it really kind of gives me the impression that we did a good job, you know, in providing a library that could be used, you know, for a very different set of things.
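
The ensembling utilities in torch.func illustrate the same "stack the parameters, then vectorize" idea; the sketch below uses torch.func directly rather than TensorDict's own functional API, so take it as one possible way to see the mechanism:

```python
# Hedged sketch: run five configurations of the same model in one vectorized call.
import torch
from torch import nn
from torch.func import functional_call, stack_module_state

models = [nn.Linear(8, 4) for _ in range(5)]
params, buffers = stack_module_state(models)      # each weight becomes a [5, ...] batched tensor

base = nn.Linear(8, 4).to("meta")                 # stateless "skeleton" used for the forward pass

def run(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(8)
out = torch.vmap(run, in_dims=(0, 0, None))(params, buffers, x)
print(out.shape)                                  # torch.Size([5, 4]) -- one row per model
```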

Demetrios [00:21:01]: But it does sound like that was almost one of the design principles that you had in mind: how can we make sure that this is not overfit on RL? And did you, if you remember, because I know it was four years ago, how did you figure out that this is what we're looking for and this is the abstraction layer that we want to play in?

Vincent Moens [00:21:22]: Yeah, so that's a good point. Definitely, RL libraries all have this kind of solution where they have a data container that they use, you know, to carry data. And even some of them, I guess by chance, also name it tensordict; it's kind of like the generic name, so you might find other tensordicts in other libraries. I guess that here the specificity of TensorDict is really how it blends within the PyTorch ecosystem. So compared to other solutions, we really try for TensorDict to have this one-to-one equivalence with torch.Tensor. So think about this: you have your single tensor and you're writing an optimizer for a single tensor. So you read the Adam paper, for instance, and you go to the algorithm section and you're like, okay, Kingma is giving me this example of how to write my optimizer, how to write my update for one particular tensor.

Vincent Moens [00:22:16]: So I'm going to rewrite that in PyTorch for one particular tensor, and then you write this function and instead of passing a tensor, you pass a tensordict, and all those mathematical operations, like tensor times epsilon plus blah blah, you do that with a tensordict and it's going to work. Like, you can literally write code for a single tensor and just pass a tensordict instead, and things are going to work out of the box. And that's really what distinguishes TensorDict from existing solutions in other RL frameworks, which really build a class for RL use cases, which is great because they don't need to think about anything else. I guess that in our case we're like, okay, let's try to invest in making that thing bigger and a little bit more ambitious for the broader community, which I think is very interesting to talk about here.
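
A toy version of that "write it once for a tensor" idea (illustrative only; the update rule here is plain SGD rather than the Adam update he mentions, and it assumes the tensordict package):

```python
# Hedged sketch: the same function accepts a torch.Tensor or a TensorDict of parameters.
import torch
from tensordict import TensorDict

def sgd_step(param, grad, lr=0.1):
    # written as if param and grad were single tensors
    return param - grad * lr

p, g = torch.randn(3), torch.randn(3)
print(sgd_step(p, g))                             # works on a plain tensor

params = TensorDict({"w": torch.randn(4, 3), "b": torch.randn(4)}, batch_size=[])
grads = params.apply(lambda t: torch.randn_like(t))   # stand-in gradients with matching structure
print(sgd_step(params, grads))                    # same code, applied entry-wise to the whole dict
```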

Demetrios [00:23:08]: So you started working on this four years ago in 2020. And that was right at like this, this explosion in my mind of folks recognizing that there was really something here and the developer experience was nice. You just kind of walked us through how you were thinking about creating abstractions and it sounds like you were servicing people inside of Facebook first. Right. And then. Or inside of Meta, but then for a greater use case, you recognized. Yeah, there's something here. How much of the developer experience played a part in your mind as you were building versus just we want it to do what it needs to do.

Demetrios [00:23:56]: Because it really sounds, as you're explaining it all, it really sounds like you're thinking through those two vectors. You're thinking through it needs to do what it needs to do. But also we want to make it so people don't have to go and create new stack overflow threads because now this tensor doesn't line up and we made it overly complicated.

Vincent Moens [00:24:17]: Yeah. So first I would say that TorchRL is a little bit peculiar in the sense that we really developed it for the open source community. It's something that I really cherish a lot. So let me backtrack a little bit. I actually have a background in medicine. I used to be a medical doctor back in the day, and then I did a PhD in neuroscience. So I kind of have a scientific background, if you want. And I really have this strong belief that it's through collaborative development that we can move things forward.

Vincent Moens [00:24:51]: And I think that's what also motivates a lot of the PyTorch folks. About 50% of the contributions in PyTorch are from the open source community, you know, so we know that just us is not enough to bring PyTorch to full speed. And I think it's the same in science, you know, like a single lab will not discover anything. It's because you have multiple people publishing and talking with each other that you build knowledge. And I'm really convinced, and I'm really proud, to work for a company that puts so much emphasis on open source. I think that's really amazing and I think it's a winning bet for many reasons. But anyway, so yeah, TorchRL is very open source oriented. Now, regarding developer experience, I think what made the success of PyTorch was really, well, first, the eager mode story: you didn't need to compile a graph to execute PyTorch, and so you could put breakpoints and prints and things like that, or you put if statements, and they work out of the box. So that was very handy.

Vincent Moens [00:25:59]: But the other thing was that PyTorch had a lot of stuff included, like nn.Module or optimizers, and it's all packaged in. So once you install PyTorch you have all of those tools that are presented to you and they're very Pythonic, they're very easy to use. We once talked with a very high-end user who told us: what I love about PyTorch is that in 15 minutes you can get started, you know, like you just need one longish tutorial and it's going to go through, you know, nn.Module and data loaders and optimizers, and you can build your first model and train that on MNIST or something, you know, and you're done. You know, like 15 minutes and you have understood the basics, which I find mind-boggling if you think about it, that we're talking about what most people think is one of the most complex achievements of humankind, namely AI, and you can learn so much about that in such a little time. You know, it's kind of nicely done, it's not trivial. I think that we developers try to keep that in mind, this kind of spirit, when we develop new stuff. So I talked about TensorDict and that's definitely what I'm thinking about. I'm like, let's try to find the most PyTorch-y API, or something that will be easy to use and does not require a learning curve that is too long and does not change the way you think about things too much.

Vincent Moens [00:27:31]: Another thing, for instance, that we recently open sourced is distributed tensors, DTensors. And the idea there is that you have one tensor that you shard across multiple nodes, and that is represented as a single object. And when you manipulate that object, you're actually manipulating the sharded version of that across multiple nodes. I think it's a brilliant idea because it simplifies things a lot for you. You're looking at that thing, you're like, is it a tensor, is it a DTensor? I don't care, I can write code for that. It's going to work out of the box whether it's distributed or not. This kind of thing, I think, is super duper handy. And I would bet that this is the kind of stuff that is going to guarantee PyTorch's usability in the long term.

Demetrios [00:28:14]: And when would that be useful? Specifically the distributed.

Vincent Moens [00:28:19]: Well, anytime you're writing code and you're like, oh, well, I want to apply this operation across different nodes. And for instance, you just want to sum two tensors and you want your code to work either on a single node or on multiple nodes. Well, use tensor_one + tensor_two, and if those two things are DTensors, it's going to work as it works with a single tensor on a single machine. So basically, if you want to unit test your code, you don't need to have: if I'm in the distributed setting, do this, and if I'm not, do that, these kind of things that are very clunky, and then you read that thing and you're like, oh my gosh, what's happening there? And when you're exploring the code, it's very difficult. Here you kind of abstract that distributed factor away and you're basically separating things, you know, a little bit like what I was saying with TorchRL, you know, when I was saying, well, we have one policy and it does different things, and then you're like, I try to abstract that away and separate things and, you know, get the nice API where you don't need to care too much about that. I think this is the same thing with DTensor, in my understanding.
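
A rough sketch of what that looks like with DTensor (the exact import paths vary a bit by PyTorch version, and the script needs to be launched with torchrun so a process group exists; everything else here is illustrative):

```python
# Hedged sketch: run with e.g. `torchrun --nproc_per_node=2 dtensor_demo.py`
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard  # torch.distributed._tensor on older versions

mesh = init_device_mesh("cpu", (2,))                         # 2 ranks; CPU/gloo keeps the sketch simple

a = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0)])   # sharded along dim 0 across ranks
b = distribute_tensor(torch.ones(8, 4), mesh, [Shard(0)])

out = a + b                     # the same code you would write for plain tensors
print(out.shape)                # torch.Size([8, 4]) -- the global shape; storage is distributed
```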

Demetrios [00:29:22]: It really sounds like you're looking at probably these, the most common use cases or the most common ways that people are utilizing this and recognizing what someone is looking for as far as the output, or basically like looking at the steps that they're doing and then saying to yourself, can we make these 10 steps five, and can we still keep all of the usability and all of the knobs so that if somebody wants to go and they want to figure things out, they can, but there's probably some bloat there. And so how do we get rid of the bloat?

Vincent Moens [00:30:04]: Yeah, there's that, and there's also, like, obviously, you know, we're obviously working very closely with a lot of folks in the community, you know, like within Meta, you know, we have the Llama team and GenAI, but also outside of Meta, a lot of those big companies and startups are using PyTorch and are collaborating with us, you know, in implementing stuff. And so we kind of have a lot of feedback from users who are telling us, and we're very, you know, aware of that and really seeking that out actively, you know, going to people and saying, okay, what are the pain points right now? Like, if you had the magic wand and you could solve something in PyTorch or make something easier, what would you do? And usually it boils down to what you were saying, you know, like boilerplate code. Every time I'm doing this, I'm going in the doc or in some tutorial and copy-pasting those five lines. If people tell you that, it's not good, like that means that you're missing a point. There's something easier that you should provide as an API. And we try to engage with people to understand what those things could be.

Demetrios [00:31:10]: So that brings up another really great point is where have you seen some challenges that you haven't gotten to fix or like debugging challenges or things that you just feel like, oof, these are some hairy problems. We haven't made the developer experience perfect here yet, but we are maybe going to, or we're thinking about it. And here's a few best practices that you can share with us right now that you've seen others doing.

Vincent Moens [00:31:43]: I mentioned a couple of times already torch.compile. I think torch.compile is getting to a stage right now where it's very mature, you know, and actually I was really surprised in RL. You know, I tried to use it about six months ago for the various things that we have, and we saw some amazing speedups when using compile correctly. And I think that the real learning curve here was how to use compile correctly. So there are two sides to that story. The first side is that compile is being actively developed. So the way compile works, in a nutshell, is that it looks at your code, goes through your code and tries to interpret your Python code to compile it.

Vincent Moens [00:32:27]: And if you have things that are not accounted for by compile, because the operation is not registered, compile is going to graph break. So it's going to say, okay, here I don't know what to do, I'm going to fall back on Python. And everything that comes before that is going to be one graph and everything that comes after that is going to be another graph, obviously assuming that nothing that comes after has another graph break. Now you compile those two things and you have a graph break in the middle. So the game for both the developers and the users is to have as few graph breaks as possible. So the first thing is, from the developer side, what we like is having people coming to us and saying, hey guys, I looked at, you know, the report of my compiled code.

Vincent Moens [00:33:07]: It seems that you did not account for that. A good example is something that we pinpointed to our friends in the compile team a few weeks ago, which was, you know, there is this thing called Hydra. You probably know of Hydra. You know, it's a library to do configurations of experiments and things like that. Hydra relies on something called OmegaConf. OmegaConf has this dictionary of configs, you know, things like that. And when you call a config object dot something, so when you get an attribute of that config, it was graph breaking. That's not very good, because sometimes people do in their code: if config.clip_grad, then clip the grad, these kind of things.

Vincent Moens [00:33:48]: But that means that every time you reach that point you're going to graph break, and that's not ideal, maybe you don't want to do that. So we realized that there were graph breaks at this point and that was slowing the code for the users, and we fixed that. So right now our friends from Torch Dynamo fixed that thing, and if you have a Hydra config that is being called in your graph, it's not a graph break anymore. So that's on our end. But on the user end, it might also be the case that, like, okay, I'm going to report to them that this thing and that thing could be optimized, but in the meantime I'm going to try to find a workaround, find a version of my code that does not cause this kind of graph break. And so we have something called TORCH_LOGS.

Vincent Moens [00:34:29]: So if you Google that, torch logs, you're going to find all the flags that you can put. So this is basically an environment variable that you set before running your code. And it basically tells PyTorch to tell you all of the things, for instance, that cause graph breaks. So it's going to tell you: at this line in your code, there's something that was not registered, so I'm going to put a graph break there. And you read that and you're like, okay, that seems like something I could do differently. A typical example is an if statement or these kind of things. An if statement can sometimes be rewritten with, like, torch.where, and that's probably going to help you to solve your graph break. But you have a lot of other things that you can see.

Vincent Moens [00:35:14]: The other thing that torch.compile does is that it puts a certain number of guards on your code. So when you execute the code the first time within torch.compile, it's going to say, okay, I have this kind of input, for instance an integer, and there's somewhere in the code, if the integer equals 5, I'm making this up, but it could be the case, if this integer is 5, do something. And then it puts a guard on the integer and says, okay, before executing the compiled code, I'm going to check that the integer is 5. If the integer is 5, I don't need to put any graph break because I'm happy, I can do a single graph. But then you change your integer and the integer becomes 4. Then compile is going to do two things. The first thing is that it's going to put a graph break at the if statement. And the other thing is that it's going to recompile things for you and do a second version of the compiled code to rerun this particular use case where the integer is now 4.

Vincent Moens [00:36:04]: Now, the thing is, you can look at that, you can look at what the guards are and what the recompiles are, and you can try to solve these things and be proactive, basically, in making compile happy. So it can be a little bit of work, but once you get to work with it, it becomes very intuitive, actually. I was really surprised, because at the beginning I was like, oh my gosh, they're really asking me to do a lot of stuff. But it became a little bit fun. I actually like to optimize my models and make them as performant as possible. So you have this boost of dopamine: you rerun your model, it's two times faster just because you rewrote one line of code, and you're like, oh my gosh, this is fantastic.
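
A small sketch of that integer-guard scenario, with the logging switched on programmatically; the TORCH_LOGS environment variable is the shell equivalent, e.g. TORCH_LOGS="graph_breaks,recompiles" (the function below is made up to trigger the behavior):

```python
# Hedged sketch: make torch.compile report graph breaks and recompiles.
import torch
import torch._logging

torch._logging.set_logs(graph_breaks=True, recompiles=True)

def f(x, n):
    if n == 5:          # compile guards on the Python int n
        return x * 2
    return x + 1

compiled = torch.compile(f)
x = torch.randn(4)
compiled(x, 5)          # first run: traced and compiled with a guard on n
compiled(x, 4)          # guard fails: a recompile happens, and it now shows up in the logs
```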

Vincent Moens [00:36:39]: So in RL, by doing this kind of stuff, we got models to run sometimes six to eight times faster than they used to without compile. So that was very, very rewarding.

Demetrios [00:36:50]: Well, even just talking about that environment variable. What was that one again?

Vincent Moens [00:36:56]: Torch underscore logs.

Demetrios [00:36:58]: So that is incredible because you can see where you need to change things around. And it's almost like that cliche. You can't track what you don't measure. And so if you're not seeing where it's breaking, then how are you supposed to know how to make your code better and how to compile it better? What is the best way to be a productive community member? And when we see there is something that for sure could be better, bring that to the team so that it at least is seen and hopefully actioned on.

Vincent Moens [00:37:43]: Yeah, well, very simply, GitHub issues. You go on the PyTorch repo and you submit an issue. If it's not PyTorch, it's another library like TorchVision, go on TorchVision's GitHub and submit the issue. And one thing also is, well, try to be reasonable with what I'm going to say, but people are very busy. Everyone's busy. You're busy, I'm busy, everyone's busy. So keep in mind that if people do not answer, it's not necessarily that they have anything against you or they don't think that this problem is important to address or something.

Vincent Moens [00:38:15]: It's probably just that it slipped through or they're busy doing something else. So if there's something that annoys you and you submit an issue and no one has answered or no one has moved the needle, there are two things that you can do. First thing is ping them. You wait two weeks, three weeks, and if you really see that nothing's moving, you just say, hey guys. You tag the person who's assigned to that issue and you say, hey guys, I was wondering, how are things moving over there? It's usually better to do that than just dropping it, because you're not helping anyone by just saying, oh well, I'm going to try to find something else, you know. And the other thing is you can be an active contributor and you can have a look at that and try to figure it out, you know, do a little bit of homework, not necessarily fixing the issue, but dig a little bit more, you know, and say, okay, I'm going to try to narrow it down for them, you know, because one thing is saying: if I execute that code, it's broken. The other thing is to do the extra step, you know, of going through the code and understanding what's going on. It's sometimes a little bit harder with torch.compile because it's very intricate.

Vincent Moens [00:39:23]: And even for myself, like, finding what we call the minimal reproducible example is not easy with compile. To be honest, it's sometimes very, very difficult, because it depends so much on the whole stack of things that you're putting together that producing this minimal reproducible example isn't easy. But fortunately the guys from torch.compile have a lot of tooling to understand what's going on. So if you submit an issue with compile, they will probably come back to you and say, install that thing, or set TORCH_LOGS equal to something and copy-paste the print of what you get there. And then you're going to copy-paste, and it will be like, okay, it's this line of code, can you tell me what this line of code is doing? And this conversation can keep on going and then eventually you will be able to solve that. So the take-home message here is really: if you're dealing with torch.compile and you have an issue and you cannot find a minimal reproducible example, don't worry about that, submit the issue, let people know and start the conversation as early as you can. And there's probably something that can be fixed too.

Demetrios [00:40:29]: What do you think of when you think of a very well put together GitHub issue? And let's take TorchRL, for example. What really makes you see an issue and go, wow, that is nice that they went through all of that?

Vincent Moens [00:40:47]: Yeah. So what I like to know is, for personal curiosity, but also sometimes it's important for the issue, what you're trying to do. Just saying if I run this, it breaks, okay, but what was the intention? You know, like for instance, if I create this object and I put this parameter equals blah blah, it breaks, and blah blah doesn't really make sense, you know, if you look at the docstring, it's not something that is expected, and you're like, okay, so what was it that you were trying to do? Because maybe we can help you. Maybe what you're trying to do is not like the way we thought about it and maybe that's totally valid to do it that way, we just didn't think about it. But you know, like, try to explain what the broader context is.

Vincent Moens [00:41:28]: Once you have done that, if you can, try to find a minimal reproducible example. Saying "if I run python myscript.py it breaks at this point" is not going to help me, because I don't necessarily want to install a new virtual environment with all of the dependencies that you have, on the same machine that you have, you know, like a Docker image or something, and try to rerun the exact same thing to see where the problem is. So if you can, you know, with a minimal amount of dependencies and a minimal amount of code, rerun the problem and find what the problem was, that's much better. Obviously, like I was saying with torch.compile, sometimes it's just not possible. And that's okay, but just let us know: I tried to do a minimal reproducible example, couldn't do it in like 30 minutes or so, or 20 minutes or whatever, so here's the issue, and that's all right.

Vincent Moens [00:42:20]: We can find a way. And the last thing is, if you want and you are interested, try to dig in the code. Look first at the doc, make sure that it's not documented already. Like if there's a huge warning in the docstrings that says you should not be doing that, and you do that, that's not perfect. So read the doc of the class that's breaking and these kind of things, try to see if it's already documented, try to see if there's not another issue that documents that problem. And finally, if you can, well, help us identify the problem so that we don't have to, but that's totally optional, obviously. But I know that a lot of users like to do that because they like to understand what the code's really doing, and that gives them the sense of mastering the library better. And so that has some value, I think, for users too, you know, to do this kind of extra step. But it requires time sometimes.

Vincent Moens [00:43:11]: So. Yeah, yeah.

Demetrios [00:43:13]: Talk to me about testing Pytorch models. I know that we wanted to get into this and I'm wondering a little bit about if you yourself have any tips and tricks, because up until now this has been just a treasure trove of tips and tricks. I'm learning a ton, so thankfully for that. But when it comes to libraries or frameworks, with testing the specific Pytorch models, is there anything that you have to tell us that you can enlighten us with?

Vincent Moens [00:43:44]: Yeah, sure. So in the PyTorch code base, people usually use the built-in unittest library, but in TorchVision and TorchRL we like to use pytest. So, you know, in TorchVision you have, for instance, a set of transforms, you know, to transform your image from color to grayscale and these kind of things. We have the same kind of transforms in TorchRL to do common operations like reshaping, permuting the dimensions, sending data from one device to the other, these kind of things. So all of those transforms share kind of the same signatures. Usually you want to execute them on a single thread or multiple threads. You want them to be usable maybe in the replay buffer, because we want those transforms either to play with the environment when you're collecting data, but also to transform the data when you're sampling it from the dataset. So we want those things to work in various scenarios.

Vincent Moens [00:44:39]: So what we have is an abstract class that tests the transforms, and we ask the users, the developers, actually the contributors, when they develop a new transform, to basically copy-paste that base class and write an example of that for the transform that they're implementing. And like that, we make sure that the transform ticks the boxes that it needs to: it needs to work in a single thread, it needs to work in multiple threads, it needs to work with a replay buffer, it needs to work as a module, et cetera, et cetera. So we really ask people to abide by this rule and use that abstract kind of checklist. And we do the same in tensordict. So in the tensordict library we have the base class called TensorDict and it has all these nice features. But one thing I didn't talk about is that there's a data class version of that, which is called tensorclass. So you decorate your data class: instead of doing @dataclass, you do @tensorclass.

Vincent Moens [00:45:37]: And then magically your data class now has a lot of attributes, like .to() to send it to a certain device, or you can add tensorclasses together. You can do some very fancy stuff, like you would do again with a regular tensor. So when you write tests, we have all this battery of tests for TensorDict. And what we do is that we iterate the tests over TensorDict, and then we iterate the tests over a tensorclass, and then we do that again for another type of tensordict that stores H5 databases. So we have all of those tests that we basically repeat over the various instances of tensordict that we have. And that proved to be very useful to catch edge cases, to say, oh well, we're designing this new method, like for instance this new arithmetic operation that was not implemented before, let's test that arithmetic operation over all the tensordict classes that we have. And then we do that.
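
A toy sketch of both ideas together: a @tensorclass data class plus a pytest parametrized over the different container types. The names and checks here are made up for illustration; this is not TorchRL's actual test suite:

```python
# Hedged sketch: one test body exercised against TensorDict and a @tensorclass container.
import pytest
import torch
from tensordict import TensorDict, tensorclass

@tensorclass
class Frame:
    observation: torch.Tensor
    reward: torch.Tensor

def make_tensordict():
    return TensorDict({"observation": torch.randn(3), "reward": torch.randn(1)}, batch_size=[])

def make_tensorclass():
    return Frame(observation=torch.randn(3), reward=torch.randn(1), batch_size=[])

@pytest.mark.parametrize("make_data", [make_tensordict, make_tensorclass])
def test_stack_and_device(make_data):
    data = make_data()
    stacked = torch.stack([data.clone(), data.clone()])   # same op, checked on every container
    assert stacked.batch_size == torch.Size([2])
    assert stacked.to("cpu").batch_size == torch.Size([2])
```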

Vincent Moens [00:46:34]: And then usually we catch: oh well, here's a warning that is being raised, here this behavior is not really properly defined with, for instance, an H5. Typically with a regular tensordict, tensordict.to("cuda") is going to send your tensordict to CUDA, and yeah, that makes sense, it was a tensordict, it's still a tensordict. Now, if your tensordict contains an H5 database and you send that H5 database to CUDA, what's the expected behavior? Like, what should that thing do? So by testing things systematically, we kind of go, oh yeah, I did not think about that, what should we do there? And then it pushes you to write the proper doc that goes with it, etc.

Vincent Moens [00:47:13]: Etc. So try to be systematic in your test. That's basically the idea. Implementing a new feature, writing a new block of tests and just saying, I'm done with it, I'm happy. Usually if you don't have a systematic way to test things and your project is ambitious, it's not going to cut it. And from the contributor side of things, that's probably something I should add here, which is, if you're working on new features, the proper way to do that is first to raise an issue on the repo. To go on the repo, raise an issue and say, I have this idea about the feature. What do you guys think about that? Because it might be the case that you're going to work a week on that feature, you're going to think, oh, it's marvelous, everyone needs that, and either it already exists, or people don't think it's actually that useful because there's already a way to do that or whatever.

Demetrios [00:48:01]: It's not possible.

Vincent Moens [00:48:02]: Yeah, engage with the maintainers and try to seek advice. And once you're done with that, don't hesitate to ask: how should I test that? Because then they're going to go to you and say, oh well, you know, we have this way of testing things. Typically, like I was saying, in TensorDict and TorchRL we have this battery of tests that we run automatically. Here is where the doc is, here is how you should do it, here is an example of a PR that does what you want to do. And by gathering that information, you can save yourself actually a lot of time and save the reviewer a lot of time, because that person is going to have to, you know, review your code and tell you, oh well, you need to modify that. And that's, you know, time spent for everyone. So if you can avoid that, that's better.

Demetrios [00:48:45]: What are the things you're most excited about working on in the next six months when it comes to Pytorch?

Vincent Moens [00:48:53]: Yeah, so I'm really excited about TensorDict. I think that we really have an API that is so generic. And the reason why I'm really excited about that is that I'm looking back, I sometimes try to do that exercise, putting myself like six years ago and looking at PyTorch in its early days, and I'm thinking, if we had had TensorDict at that time, what would have gone differently? What kind of thing would have been easier to code? And there's a lot of things, like functional calls would have been easier. I don't know if you noticed, but in PyTorch, when you're building an optimizer, you actually don't pass a state dict, but you pass a list of parameters. That's kind of weird, because when you represent your parameters to serialize them, you build a state dict. So why don't we use the same thing for both? If we had had TensorDict at the time within PyTorch, the PyTorch ecosystem, we would probably have said a state dict is a tensordict and the optimizer takes a tensordict as input. At least I hope so. And that would have been like everything is unified.

Vincent Moens [00:50:01]: We would have said we represent parameters always as a tensordict. And that simplifies things in terms of writing the optimizer, in terms of passing things, in terms of serializing your optimizer or your model; things would have been, I think, probably better for some things. And I come from a field where people do a lot of fancy stuff like higher-order derivatives and things like that. And again, I think when I work with TensorDict, it's so much easier to do this kind of stuff. So I'm really excited about that. And the other thing is, one of the things I'm working on right now is basically the planning side of things. So, you know, planning is something that is super important, obviously in RL, to play games like Go or chess. Also in robotics, your robot needs to plan the movements it's going to do if you don't want it to crash into a wall or something.

Vincent Moens [00:50:58]: But also, obviously, in the realm of LLMs and things, planning is becoming ever more important. With Strawberry and all these kind of things, we definitely saw that there's a huge trend there too. And I think TorchRL, as it stands, has the right API to solve these kind of problems, because we deal very easily with things like representations of trees of thoughts and these kind of things. And I'm really excited about seeing where we can go in that space using these kind of primitives. So that's going to be part of my focus in the next months. But yeah, hopefully we will be able to ship some exciting stuff in the future.

Demetrios [00:51:40]: Are there any frameworks or libraries or Anything that you're using that we should know about?

Vincent Moens [00:51:47]: Let me think about that.

Demetrios [00:51:49]: And while you're thinking, it's just because I'm subscribed to this like, tools Dev or something, devtools.something newsletter. And one of the coolest things is that they just have a bunch of random new frameworks and libraries and tools that come out every week. And it's like, oh, wow, 90% of them are not useful for me, but I still see them and know that they're out there. And it's kind of like, ah, cool. I guess that seems like it's a novel way of doing things. Right. And so it's interesting when that happens. And I know that whenever we have meetups, some of the most passionate conversations amongst the people in the meetups are, have you seen this tool? Oh, it has this feature that you can do and you should really check it out next time.

Demetrios [00:52:40]: And so that's almost like some of the fun stuff that happens when you get together with people.

Vincent Moens [00:52:46]: Yeah. So one thing I'm using a lot for my experiments, and that I mentioned before, is Hydra. I really think that it simplifies experimenting with things a lot. So basically, the one very nice thing about Hydra, and one of the reasons it's used a lot in the research community, but I think it's underused... what I'm trying to say there is, with Hydra, one very powerful thing is that you can easily do sweeps of your parameter configuration. So you can easily say, okay, I want to try a learning rate of 0.1, 0.01, 0.001, and it's just one command line.

Vincent Moens [00:53:26]: You know, when you're running your experiment, you just say, try those three different things, and then you can sync that with your wandb account and it's going to run the three experiments and then you get the results. And it's amazing. You get that and you can look at it and you're like, okay, this one is better than this other one. Another thing that we're using a lot is this library called submitit. And that's something that I'm using also heavily to run experiments. And submitit, the idea there is that, if you know about Slurm, on a cluster you're basically telling Slurm, okay, run those different experiments.

Vincent Moens [00:53:58]: Submitit is a way to write this kind of stuff, but in Python. And so you can very easily interact with your model and say, okay, I'm going to create this instance of my model, but you're going to run that with this different set of parameters. It becomes very handy and much more intuitive for people who are used to using Python. So, basically, you can configure things with Slurm in a very flexible way without having to digest the whole, like, Slurm doc or something like that.

Demetrios [00:54:27]: Which I've heard is very painful sometimes.

Vincent Moens [00:54:30]: Yeah. It's not always obvious, especially if you look at things like what's defined as a job and as a task. And, like, it can be, oh my gosh, what are we talking about? Or how many GPUs per node per task? And you're like, oh, I'm losing my mind around that. So hopefully submitit can help, you know, to simplify these things a little.
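
For reference, a minimal sketch of both tools mentioned here; script, config, and resource names are made up, the Hydra sweep is launched from the command line with -m / --multirun, and the submitit part assumes access to a Slurm cluster:

```python
# Hedged sketch of a Hydra app; sweep it with e.g.:
#   python train.py -m +lr=0.1,0.01,0.001
import hydra
from omegaconf import DictConfig

@hydra.main(config_path=None, config_name=None, version_base=None)
def main(cfg: DictConfig) -> None:
    lr = cfg.get("lr", 1e-3)
    print(f"training with lr={lr}")   # your training loop would go here

if __name__ == "__main__":
    main()
```

```python
# Hedged sketch of submitit: submit a Python function as a Slurm job without writing sbatch files.
import submitit

def train(lr: float) -> float:
    return lr * 2.0   # placeholder for a real training run

executor = submitit.AutoExecutor(folder="submitit_logs")
executor.update_parameters(timeout_min=60, gpus_per_node=1)   # Slurm resources, expressed in Python
job = executor.submit(train, 0.001)
print(job.job_id, job.result())
```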

Demetrios [00:54:49]: And for Hydra, you are using it with another, like, experiment tracker such as wandb or MLflow and that.

Vincent Moens [00:54:57]: Yeah.

Demetrios [00:54:58]: So they're complementary.

Vincent Moens [00:55:00]: Yeah, they're totally complementary. Yeah. Actually, Hydra integrates very, very well with these kinds of tools. Yeah.

Demetrios [00:55:05]: Oh, fascinating. All right, cool. Well, I got to check that out. I haven't seen that, man. Vincent, awesome, dude.
