Do Pyro's optimizers (such as pyro.optim.Adam) update standard PyTorch parameters?

I built a network with two linear layers: the first is a standard PyTorch nn.Linear, and the second is a probabilistic linear layer built with PyroModule and PyroSample. The setup uses AutoDiagonalNormal(model) as the guide, pyro.optim.Adam as the optimizer, and SVI(model, guide, optimizer, loss=Trace_ELBO()) for training. According to an AI assistant's analysis, SVI only optimizes the variational parameters, so model.fc1.weight should not change. However, when I actually run the training loop, model.fc1.weight does change. How should this behavior be understood? Any insight would be appreciated.

Python

import torch
import torch.nn as nn

import pyro
import pyro.distributions as dist
import pyro.optim
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.nn import PyroModule, PyroSample

# ==============================
# 1. Model definition
# ==============================
class MixedBayesianNet(PyroModule):
    def __init__(self, input_dim=1, hidden_dim=20, output_dim=1):
        super().__init__()

        # First layer: ordinary PyTorch parameters (expected NOT to be updated by variational inference)
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Second layer: Using PyroSample (will participate in variational inference)
        self.fc2 = PyroModule[nn.Linear](hidden_dim, output_dim)
        self.fc2.weight = PyroSample(
            dist.Normal(0., 1.).expand([output_dim, hidden_dim]).to_event(2)
        )
        self.fc2.bias = PyroSample(
            dist.Normal(0., 1.).expand([output_dim]).to_event(1)
        )

        # Observation noise (also using PyroSample)
        self.log_sigma = PyroSample(dist.Normal(0., 1.))

    def forward(self, x, y=None):
        h = torch.relu(self.fc1(x))
        mean = self.fc2(h)  # shape: [N, 1]
        sigma = self.log_sigma.exp().clamp(min=1e-5)

        with pyro.plate("data", x.shape[0]):
            pyro.sample("obs", dist.Normal(mean, sigma).to_event(1), obs=y)

        return mean.squeeze(-1)

# ==============================
# 2. Generate toy data
# ==============================
N = 200
x = torch.linspace(-4, 4, N).unsqueeze(-1)
true_function = lambda x: 2 * torch.sin(1.5 * x) + 0.5 * x ** 2
y_true = true_function(x)
y = y_true + torch.randn_like(y_true) * 0.6  # Add noise

# ==============================
# 3. Training setup
# ==============================
model = MixedBayesianNet(input_dim=1, hidden_dim=20, output_dim=1)
guide = AutoDiagonalNormal(model)
optimizer = pyro.optim.Adam({"lr": 0.01})
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

# ==============================
# 4. Training + Record changes in fc1 weights
# ==============================
num_steps = 1500
losses = []
fc1_weight_norm_before = torch.norm(model.fc1.weight).item()
fc1_weight_history = []

print("Training starts...")
for step in range(num_steps):
    loss = svi.step(x, y)
    losses.append(loss)
    # Record the norm of fc1.weight (to observe if it changes)
    if step % 100 == 0:
        fc1_weight_norm = torch.norm(model.fc1.weight).item()
        fc1_weight_history.append(fc1_weight_norm)
        print(f"Step {step:4d} | loss: {loss:8.3f} | fc1.weight norm: {fc1_weight_norm:.6f}")
fc1_weight_norm_after = torch.norm(model.fc1.weight).item()

print(f"Significant change in fc1.weight? → {'Yes' if abs(fc1_weight_norm_after - fc1_weight_norm_before) > 1e-4 else 'No'}")

The results are as follows:

Step 0 | loss: 1947.197 | fc1.weight norm: 2.766367
Step 100 | loss: 412.229 | fc1.weight norm: 2.951347
Step 200 | loss: 373.792 | fc1.weight norm: 2.928797
Step 300 | loss: 357.056 | fc1.weight norm: 2.862792
Step 400 | loss: 316.699 | fc1.weight norm: 2.782768
Step 500 | loss: 314.913 | fc1.weight norm: 2.747339
Step 600 | loss: 303.155 | fc1.weight norm: 2.722679
Step 700 | loss: 286.747 | fc1.weight norm: 2.782685
Step 800 | loss: 269.360 | fc1.weight norm: 2.877304
Step 900 | loss: 278.083 | fc1.weight norm: 2.965363
Step 1000 | loss: 270.794 | fc1.weight norm: 3.018404
Step 1100 | loss: 273.312 | fc1.weight norm: 3.038775
Step 1200 | loss: 255.095 | fc1.weight norm: 3.099301
Step 1300 | loss: 251.477 | fc1.weight norm: 3.118012
Step 1400 | loss: 255.908 | fc1.weight norm: 3.142138
Significant change in fc1.weight? → Yes