Information Theory
------------------
Cross-Entropy
Cross-entropy measures the average number of bits (or nats, depending on the logarithm base) needed to encode samples drawn from a true distribution when using a code optimized for a model distribution. It is the standard training loss for language models and the basis of perplexity.
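For discrete distributions, with p denoting the true distribution and q the model distribution (names chosen here for illustration), the standard definition is

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

With natural logarithms the units are nats, and perplexity is exp(H(p, q)). A minimal Python sketch, assuming a one-hot target as in next-token prediction:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) in nats: expected code length under model q for samples from p."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(-np.sum(p * np.log(q + eps)))  # eps guards against log(0)

# One-hot true distribution (the observed token) vs. the model's prediction.
p = [0.0, 1.0, 0.0]
q = [0.1, 0.7, 0.2]
h = cross_entropy(p, q)   # -log(0.7) ≈ 0.357 nats
ppl = float(np.exp(h))    # perplexity ≈ 1.43
print(h, ppl)
```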