Questions about the DMM model


I am trying to understand this code. This link has some details. I have some basic questions.

  1. In the given code with the polyphonic music dataset, what exactly are they achieving by reducing the NLL?
    a. Are they predicting any future notes of the piano? It doesn’t look like that to me.
    b. What does the latent variable z signify for this dataset?
    c. Can I tweak this code to make predictions for any other dataset?

  2. Why is z_dim chosen as 100? In general, if my final result at each time step is a single value (e.g., 130 or 200) rather than a vector, can I set z_dim to 1? Or are these different things?
    In the research paper mentioned in this link, even for the Health dataset (where z is the patient's health state at each time point), z_dim appears to be around 100. Why is that?

  3. Is z_0, defined in the __init__ method of the DMM class, the prior? If I know my result will lie within a range of values (e.g., between 100 and 200), where can I provide that information in this code?

Thanks in advance.

As described in part 1 of the SVI tutorial, maximizing the ELBO approximately maximizes the marginal likelihood of the training data under the model. You might think of this as minimizing a kind of reconstruction loss.

z has no direct human-interpretable meaning in this model; it’s just a representation of the observed data up to each point in time. Please see this tutorial for making predictions with pyro.infer.Predictive.

The hyperparameters in the DMM example, including z_dim, are mostly taken from the original paper, where I believe they were selected using held-out validation data. You’re free to use whatever values work best for your problem.
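To see why z_dim and the size of the final result are separate choices: in a DMM-style model, an emitter network maps the latent state z_t (of size z_dim) to the parameters of the observation distribution, so the output can be a single value whatever z_dim is. The `ScalarEmitter` below is a hypothetical sketch, not the example's own `Emitter` class, and it assumes a Normal observation with one value per time step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScalarEmitter(nn.Module):
    """Hypothetical emitter: maps a latent state of size z_dim to the
    loc and scale of a one-dimensional Normal observation."""

    def __init__(self, z_dim, hidden_dim=32):
        super().__init__()
        self.hidden = nn.Linear(z_dim, hidden_dim)
        self.to_loc = nn.Linear(hidden_dim, 1)
        self.to_scale = nn.Linear(hidden_dim, 1)

    def forward(self, z_t):
        h = torch.relu(self.hidden(z_t))
        loc = self.to_loc(h).squeeze(-1)
        # softplus keeps the scale strictly positive
        scale = F.softplus(self.to_scale(h)).squeeze(-1) + 1e-4
        return loc, scale


batch = 8
shapes = {}
for z_dim in (1, 100):
    emitter = ScalarEmitter(z_dim)
    z_t = torch.randn(batch, z_dim)  # one latent state per sequence in the batch
    loc, scale = emitter(z_t)
    shapes[z_dim] = (tuple(loc.shape), tuple(scale.shape))
    assert (scale > 0).all()

print(shapes)
```

Here z_dim only changes the input width of the first layer; the observation stays one value per time step. Choosing z_dim is then a question of model capacity, best settled on held-out data as described in the answer above.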

As discussed in the intro and SVI tutorials, the distribution at each unobserved pyro.sample site in a model represents the prior distribution for that random variable. In this case, z_0 is not a random variable, so I'm not sure what you mean. You could certainly compute z_0 from a new pyro.sample statement in the model if you don't want to fix the initial value. You can represent a uniform distribution on [100, 200] with pyro.distributions.Uniform(100, 200).


Thank you so much. I will try implementing it this way and get back to you.

@eb8680_2 As you suggested, I had a look at the pyro.infer.Predictive class.

My understanding from the examples:
Pass the y_true values as the obs argument of the pyro.sample statement in the model. Train the model and guide (e.g., with SVI) on x_train: x_train goes through a neural network, and its output parameterizes the distribution used in that pyro.sample statement.


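        # the network output is the mean of the likelihood for "obs"
        mean = self.linear(x_train).squeeze(-1)
        # obs=y_true conditions this site on the observed targets
        obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y_true)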


After training, call Predictive on x_test, with the desired num_samples, to retrieve samples of 'obs'. Then use summary to compute the mean and standard deviation of the predictions.

My assumption:
In the Deep Markov Model, the observations (x values) are the input features, and the latent states (z values) are the variables that would have produced these inputs. I would like to treat those latent states as my outputs of interest, and my output values are continuous.

But in the Deep Markov Model, we pass the observations x_train to the obs argument of the pyro.sample statement, instead of y_true or y_train as in the other examples. How, then, does the program get any information about my desired output?

So in this case, how should I use the Predictive class with my x_test to obtain y_pred?