This post implements it in pure Python to better understand its inner workings, then compares the results against other popular implementations as a cross-check.

Cross entropy can be used to define a loss function in machine learning and is commonly used when training classification models.

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p. (source)
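To make the definition above concrete, here is a minimal sketch for discrete distributions, assuming p and q are given as plain lists of probabilities over the same events. It computes H(p, q) = -Σ p(x) · log2 q(x), so the result is in bits. The function name `cross_entropy_bits` and the example distributions are made up for illustration.

```python
import math


def cross_entropy_bits(p, q):
    """Cross entropy H(p, q) in bits for two discrete distributions.

    p is the true distribution, q the estimated one. Both are lists of
    probabilities over the same events, in the same order.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same set of events")

    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Events that never occur under p contribute nothing.
            continue
        if q_i == 0.0:
            # q assigns zero probability to an event that does occur:
            # no finite code length works, so the cross entropy is infinite.
            return math.inf
        total -= p_i * math.log2(q_i)
    return total


# Illustrative distributions (assumed for this example only).
p = [0.5, 0.25, 0.25]
q_uniform = [1 / 3, 1 / 3, 1 / 3]

print(cross_entropy_bits(p, p))          # code optimized for the true distribution
print(cross_entropy_bits(p, q_uniform))  # code optimized for a uniform estimate
```

When q equals p, the result is the entropy of p (1.5 bits here). Coding with the uniform estimate costs log2(3) ≈ 1.585 bits per event on average, which is the extra price of optimizing for the wrong distribution.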