Questions in DMM model


I am trying to understand this code. This link has some details. I have some basic questions.

  1. In the given code with the polyphonic music dataset, what exactly are they achieving by reducing the NLL?
    a. Are they predicting any future notes of the piano? It doesn’t look like that to me.
    b. What does the latent variable z signify for this dataset?
    c. Can I tweak this code to make predictions for any other dataset?

  2. Why is z_dim chosen as 100? Generally, if my final result is a single value (e.g. 130 or 200) instead of a vector at each timestep, can I set z_dim to 1? Or are they different things?
    In the research paper mentioned in this link, z_dim is around 100 even for the Health dataset (where z is the patient’s health state at any time point). Why is that?

  3. Is z_0, defined in the init section of the DMM class, the prior? If I know my result is going to be within a range of values (e.g. between 100 and 200), where can I give that information in this code?

Thanks in advance.

As described in part 1 of the SVI tutorial, maximizing the ELBO approximately maximizes the marginal likelihood of the training data under the model. You might think of this as minimizing a kind of reconstruction loss.
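For reference, the standard identity behind that statement can be written as follows. The notation is chosen here for illustration and is not taken from the example code: x is the observed sequence, z the latent sequence, p_θ the model and q_φ the guide.

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]}_{\mathrm{ELBO}(\theta, \phi)}
  + \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)
  \;\ge\; \mathrm{ELBO}(\theta, \phi)
```

Because the KL term is non-negative, the ELBO is a lower bound on log p(x). The loss that Trace_ELBO minimizes is an estimate of the negative ELBO, so it is an upper bound on the negative log-likelihood of the data.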

z has no direct human-interpretable meaning in this model; it’s just a representation of the observed data up to each point in time. Please see this tutorial for making predictions with pyro.infer.Predictive.

The hyperparameters in the DMM example, including z_dim, are mostly taken from the original paper, where I believe they were selected using held-out validation data. You’re free to use whatever values work best for your problem.

As discussed in the intro and SVI tutorials, the distribution at each unobserved pyro.sample site in a model represents the prior distribution for that random variable. In this case, z_0 is not a random variable, so I’m not sure what you mean - you could certainly compute z_0 from a new pyro.sample statement in the model if you don’t want to fix the initial value. You can represent a uniform distribution on [100, 200] with pyro.distributions.Uniform(100, 200).


Thank you so much. I will try to implement this way and get back.

@eb8680_2 as you suggested, I had a look at the pyro.infer.Predictive class.

My understanding from the examples:
Pass the y_true values as the ‘obs’ argument of the pyro.sample statement in the model. Train the model and guide (e.g. with SVI) on x_train. x_train goes through a neural network, and the output is the parameter of the distribution used in that pyro.sample statement:


        mean = self.linear(x_train).squeeze(-1)
        obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y_true)


After training, call Predictive on x_test with the required number of samples and retrieve the ‘obs’ site. Use a summary function to calculate the mean and the standard deviation of the predictions.

My assumption:
In the Deep Markov Model, the observations (x values) are the input features and the latent states (z values) are the variables that would have produced these inputs. I would like to consider those latent states as my outputs that are of interest. And my output values are continuous.

But in the case of the Deep Markov Model, we pass the observations “x_train” to the obs argument of the pyro.sample statement instead of “y_true or y_train” as in the other examples. How, then, does the program get any information about my desired output?

So, in this case, how should I use the Predictive class to make the predictions with my x_test to get the y_pred?


Can you clarify your notation? What is the new variable y (and specifically y_pred) in relation to x and z, and what is latent and observed at train and test time?

Sorry for the confusion. I have used them in general machine learning terms.

In general machine learning terms,
x_train is the input data or features used for training.
y_train is the output data during training.
x_test is the input data or features used for testing.
y_test is the ground truth results for the x_test data.
y_pred is the predicted results for the x_test data.
So the loss is calculated by comparing y_test and y_pred.

In our case, x is the observations, since they are the input. y corresponds to z (the latent variables). y_pred is the predicted z values (I am trying to use the z values as the outputs). I have true z values (ground truth) to compare with the predicted z values.

Train time: when we do svi.step with train inputs.
Test time: when we don’t do svi.step and when we use Predictive with test inputs.

latent: z values
observed: x values

It sounds like there are no latent random variables in your model, since z is observed, so I’m not sure the DMM example is relevant to your problem. Have you had a chance to look through our introductory tutorials?

Yes, I looked into the introductory tutorials. But my problem at hand is to try out the DMM to make predictions. Given x (observations), I have to predict z at every time step (I have the true z to compare against). As you said, to do predictions, z has to be observed.

My problem is like:

Given a patient’s health conditions, we have to predict how long the patient will live. If I don’t give the “number of remaining days” to the model, it outputs z values roughly in the range (0.04, 0.9). But I need values like 150 or 170.

@pyrobeginner I’m tackling a similar problem. Did you get it to work as intended?