How I think about the neural network backpropagation algorithm

My husband Bernie says that every mathematician has a favorite mathematical object, and if that is so, then my favorite mathematical objects are matrices. I try to chunk all my mathematical understandings into matrix expressions, essentially translating everything into my native language. It makes it easier for me to understand things, and to remember what I’ve understood.

In this writeup, I derive the backpropagation algorithm in terms of matrices and matrix products; at heart, backpropagation is just an implementation of the chain rule for a certain kind of composite function.
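For concreteness, here is a minimal numpy sketch of what the matrix view looks like for a network with one hidden layer and a quadratic cost. The function names, the logistic activation, and the column-vector conventions are my own illustration, not notation from the writeup itself:

```python
import numpy as np

def sigma(z):
    # Logistic activation (an illustrative choice)
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

def backprop(x, y, W1, b1, W2, b2):
    """One backward pass for a one-hidden-layer network, quadratic cost.
    x, y are column vectors; W1, W2 are weight matrices."""
    # Forward pass: each layer is an affine map followed by the activation.
    z1 = W1 @ x + b1
    a1 = sigma(z1)
    z2 = W2 @ a1 + b2
    a2 = sigma(z2)

    # Backward pass: the chain rule, layer by layer. Each delta is the
    # gradient of the cost with respect to that layer's z; pushing it back
    # one layer is just a matrix product with the transposed weight matrix.
    delta2 = (a2 - y) * sigma_prime(z2)
    delta1 = (W2.T @ delta2) * sigma_prime(z1)

    # Weight gradients are outer products; the deltas themselves are the
    # bias gradients.
    grad_W2 = delta2 @ a1.T
    grad_W1 = delta1 @ x.T
    return grad_W1, delta1, grad_W2, delta2
```

The appeal of this view is that the entire backward pass reduces to transposed-weight-matrix products and elementwise scalings by activation derivatives.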

This writeup owes a lot to the chapter on backpropagation in Michael Nielsen’s online book, “Neural Networks and Deep Learning”. It is just a slightly different way to look at things that sticks in my head a little better.


Mathematical Derivation of the Extended Kalman Filter

This writeup is an extension of the writeup I posted last week, Mathematical Derivation of the Bayes and Kalman Filters. Both these writeups were written when I was studying Probabilistic Robotics by Sebastian Thrun, Wolfram Burgard, and Dieter Fox; I think that book is awesome, and I really wanted to understand all the mathematical details. This writeup should be viewed as a supplement to Section 3.3 of that book.

The Extended Kalman Filter extends the basic Kalman filter, which requires the transition and measurement models at each step to be linear, to the case where those models are nonlinear. EKFs aren’t as widely applicable as certain other popular Bayes Filter methods (cough particle filters), because they can’t represent as wide a range of belief distributions. Still, they work well for certain problems and are apparently widely used in practice.
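As a rough illustration, here is a minimal numpy sketch of a single EKF predict/correct step, following the notation of Probabilistic Robotics (g and h for the nonlinear transition and measurement models, G and H for their Jacobians, R and Q for the corresponding noise covariances). The function signature is my own; treat it as a sketch of the idea, not the book’s algorithm verbatim:

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, g, G, h, H, R, Q):
    """One EKF predict/correct step.
    mu, Sigma : previous belief (mean vector and covariance matrix)
    u, z      : control input and measurement
    g, h      : nonlinear transition and measurement functions
    G, H      : functions returning the Jacobians of g and h
    R, Q      : transition and measurement noise covariances
    """
    # Predict: push the mean through the nonlinear model g, but propagate
    # the covariance through the linearization (the Jacobian G).
    mu_bar = g(u, mu)
    G_t = G(u, mu)
    Sigma_bar = G_t @ Sigma @ G_t.T + R

    # Correct: Kalman gain computed from the measurement Jacobian,
    # evaluated at the predicted mean.
    H_t = H(mu_bar)
    S = H_t @ Sigma_bar @ H_t.T + Q
    K = Sigma_bar @ H_t.T @ np.linalg.inv(S)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H_t) @ Sigma_bar
    return mu_new, Sigma_new
```

The only change from the linear Kalman filter is the linearization: the means pass through the true nonlinear models, while the covariances pass through the Jacobians.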

Next week I’ll post something different, but first I needed to get all the Kalman Filter stuff out of my system. Do you have any requests for writeups on other applied math topics?

This is a “level 3” writeup: for grad students and hardcore practitioners.


Mathematical Derivation of the Bayes and Kalman Filters

I read a lot of technical documents, but sometimes I just don’t get them without a lot of extra work. In particular, I’m prone to getting stuck on mathematical conclusions that I don’t follow (usually because there is a step or two that the author assumed without proving). I’ve found that if I work through the details, and write them up as clearly as possible, the writeup itself has value for me down the line when I need to remember the fine points.

I’m going to start posting some of these writeups here, where they can unstick someone else. This first writeup was written earlier this year when I was studying Probabilistic Robotics by Sebastian Thrun, Wolfram Burgard, and Dieter Fox; I think that book has the best, most intuitive treatment of the Kalman Filter I’ve ever read.

This is a “level 3” writeup: for grad students and hardcore practitioners.
