My husband Bernie says that every mathematician has a favorite mathematical object, and if that is so, then my favorite mathematical objects are matrices. I try to chunk all my mathematical understandings into matrix expressions, essentially translating everything into my native language. It makes it easier for me to understand things, and to remember what I’ve understood.
In this writeup, I derive the backpropagation algorithm, which is an implementation of the chain rule for certain kinds of composite functions, in terms of matrices and matrix products.
This writeup owes a lot to the chapter on backpropagation in Michael Nielsen’s online book, “Neural Networks and Deep Learning“. It is just a slightly different way to look at things that sticks in my head a little better.
ReLU is a switch and other things: https://archive.org/details/afrozenneuralnetwork