There is a chart from Alec Radford that became quite famous for comparing how different optimizers behave side by side.
Let’s try replicating it ourselves. We will use the framework developed in our previous post for this.
Since this GIF is four years old now, I’ll also add a few more optimizers for comparison.
It looks almost the same! I wasn’t able to find the exact setup used to generate the GIF (the starting point and the learning rate for each optimizer), so I made a best-effort guess at these values, and I think my results replicate the behavior shown in the original image.
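To make "the setup" concrete, here is a minimal sketch of what such a comparison boils down to: a surface, a starting point, and one learning rate per optimizer. This is not the framework from the previous post and not the original GIF's configuration. The saddle surface `f(x, y) = x**2 - y**2`, the starting point, the learning rates, and all function names are assumptions chosen purely for illustration.

```python
import math

def grad(x, y):
    # Gradient of the assumed saddle surface f(x, y) = x**2 - y**2.
    return 2.0 * x, -2.0 * y

def run_sgd(start, lr, steps):
    # Plain gradient descent: step against the gradient.
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path

def run_momentum(start, lr, steps, beta=0.9):
    # Classical momentum: keep a decaying running sum of past steps.
    x, y = start
    vx = vy = 0.0
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx = beta * vx - lr * gx
        vy = beta * vy - lr * gy
        x, y = x + vx, y + vy
        path.append((x, y))
    return path

def run_adam(start, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates per coordinate.
    p = list(start)
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    path = [tuple(p)]
    for t in range(1, steps + 1):
        g = grad(*p)
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(tuple(p))
    return path

# Assumed setup: start almost on the ridge of the saddle (tiny y offset),
# with the same guessed learning rate for every optimizer.
START, LR, STEPS = (1.0, 1e-3), 0.01, 100
paths = {
    "sgd": run_sgd(START, LR, STEPS),
    "momentum": run_momentum(START, LR, STEPS),
    "adam": run_adam(START, LR, STEPS),
}

for name, path in paths.items():
    x, y = path[-1]
    print(f"{name:>8}: x = {x:.4f}, y = {y:.4f}")
```

Each `path` is a list of `(x, y)` points that could be drawn on top of a contour plot. Changing `START` or `LR` changes how quickly each optimizer escapes along `y`, which is why the exact values matter when trying to reproduce someone else's animation.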
Since we’re here, let’s play around and learn how these optimizers behave in different settings!