*I'm not going to dive deep into the mathematics or theoretical derivations; this blog is focused on breaking down the tricky bits of GANs in a way that actually makes sense.*
Okay, let’s tackle one of the trickiest parts of GANs—how they’re trained, especially that weird bit about the Generator not having a “normal” loss function like cross-entropy or mean squared error. I’ll admit, when I first dug into this, it threw me for a loop. Like, how do you train something without a clear “here’s what you did wrong” signal?

GANs are built on two players: the Generator (G), which pumps out fake samples (think counterfeit images), and the Discriminator (D), which plays detective and decides if something’s real or fake. They take turns training, like a back-and-forth game where one improves while the other watches. The Discriminator’s job is pretty straightforward, but the Generator? That’s where the confusion creeps in. Let’s break it down step by step.
The Discriminator is our anchor here—it’s just a binary classifier, something we’ve seen a million times in machine learning. Its goal is to label real stuff as real and fake stuff as fake. We train it with a loss we all know: binary cross-entropy. Here’s how it looks:
$$ L_D = -E_{x\sim p_{data}}[\log D(x)] - E_{z\sim p_z}[\log(1-D(G(z)))] $$
In human terms:

- For a real sample x, it wants D(x) close to 1 ("Yep, that's legit").
- For a fake sample G(z), it wants D(G(z)) close to 0 ("Nope, that's a fake").
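If it helps to see that loss as code, here's a minimal PyTorch-style sketch of one Discriminator update. Everything here is an illustrative assumption rather than code from a specific implementation: `netD` and `netG` are the two models (with `netD` ending in a sigmoid so its output lands in (0, 1)), `real_images` is a batch of real data, and `nz` is the latent size.

```python
import torch
import torch.nn as nn

# Plain binary cross-entropy: the same L_D written out above
criterion = nn.BCELoss()

def train_discriminator_step(netD, netG, real_images, optimizerD, nz=100):
    """One Discriminator update; netD is assumed to output a probability."""
    batch_size = real_images.size(0)
    optimizerD.zero_grad()

    # Real samples, target label 1 -> the -E[log D(x)] term
    real_labels = torch.ones(batch_size, 1)
    loss_real = criterion(netD(real_images), real_labels)

    # Fake samples, target label 0 -> the -E[log(1 - D(G(z)))] term.
    # detach() keeps this update from touching the Generator at all.
    z = torch.randn(batch_size, nz)
    fake_images = netG(z).detach()
    fake_labels = torch.zeros(batch_size, 1)
    loss_fake = criterion(netD(fake_images), fake_labels)

    loss_D = loss_real + loss_fake  # this is L_D
    loss_D.backward()
    optimizerD.step()
    return loss_D.item()
```

Nothing exotic so far: it's a binary classifier trained on "real vs. fake" batches, exactly like any other classification problem.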
Now, the Generator is where things get funky. When we train it, the Discriminator's weights are frozen: it just sits there, ready to judge. The Generator spits out a fake image G(z), but here's the catch: we don't have a "perfect" image to compare it against, so no ground truth, no direct loss like "you're 0.7 pixels off." So how do we tell it how to improve?

Here's the genius part: we use the Discriminator's opinion as the Generator's loss. The Generator's goal is to trick D, so its loss is tied to how well D thinks the fake image is real:
$$ L_G = -E_{z\sim p_z}[\log D(G(z))] $$
In other words, it wants D(G(z)) to be as close to 1 as possible. But wait: D isn't a simple loss function; it's a whole model. How do we backpropagate through that to update the Generator?
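To make the mechanics concrete, here's a hedged sketch of one Generator update in the same assumed PyTorch-style setup as before (again, `netD`, `netG`, `optimizerG`, and `nz` are placeholders). The key detail: we never step the Discriminator's optimizer here, so its weights stay frozen, but gradients still flow *through* D on their way back to G.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_generator_step(netD, netG, optimizerG, batch_size, nz=100):
    """One Generator update; the Discriminator's weights are left untouched."""
    optimizerG.zero_grad()

    # No detach() this time: gradients need to flow back into G
    z = torch.randn(batch_size, nz)
    fake_images = netG(z)

    # L_G = -E[log D(G(z))]: same BCE machinery, but the target is 1 ("real"),
    # because the Generator wants the Discriminator fooled
    real_labels = torch.ones(batch_size, 1)
    loss_G = criterion(netD(fake_images), real_labels)

    # Backprop runs through netD's layers (their gradients are computed but
    # never applied, since we only step optimizerG) and lands on netG's weights
    loss_G.backward()
    optimizerG.step()
    return loss_G.item()
```

In this sketch, autograd treats the frozen Discriminator as just another chunk of the computation graph between G's weights and the loss, which is the intuition we'll unpack next.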