Thursday, March 16, 2017

Derivative of softmax with cross-entropy as loss function

The following diagram and step-by-step derivation show how to obtain the derivative of softmax when cross-entropy is used as the loss function. During back-propagation, the sensitivity map entering the output layer combines the derivative of the loss function with the derivative of the output layer's activation function; that combination is the theoretical basis for the beautiful result derived in this post.
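The well-known result of this derivation is that the gradient of the cross-entropy loss with respect to the softmax input logits simplifies to $p - y$, the softmax output minus the one-hot target. The sketch below (an illustrative check, not the post's own code) verifies this analytic gradient against a numerical finite-difference gradient:

```python
import math

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, y):
    # y is a one-hot target vector; p is the softmax output.
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

# Example logits and a one-hot target (illustrative values).
z = [1.0, 2.0, 0.5]
y = [0.0, 1.0, 0.0]
p = softmax(z)

# Check that dL/dz_i == p_i - y_i via central finite differences.
eps = 1e-6
for i in range(len(z)):
    z_plus = z[:]; z_plus[i] += eps
    z_minus = z[:]; z_minus[i] -= eps
    numeric = (cross_entropy(softmax(z_plus), y)
               - cross_entropy(softmax(z_minus), y)) / (2 * eps)
    analytic = p[i] - y[i]
    assert abs(numeric - analytic) < 1e-6
```

Because the softmax Jacobian and the cross-entropy gradient cancel so cleanly, back-propagation through the output layer needs only a vector subtraction.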