[Pyro + GPytorch] Cox Process batching over sequences

glennkroegel · December 9, 2021, 6:30pm

Background:

I am trying to model the rate function of my own dataset over a 0-24 h time window. However, I have several days of data and would like to use a single function to capture the variance across all of them.

Question:

I found the Cox process tutorial on the GPyTorch website to be exactly what I was looking for.

However, in the example they fit the function based on a single sample, which they create synthetically at the start of the tutorial.
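To experiment with the multi-day setting, several days can be simulated from one shared rate instead of the tutorial's single synthetic sequence. This is a minimal sketch using Lewis-Shedler thinning, which may differ from how the tutorial generates its data. `rate`, `RATE_MAX` and `sample_day` are hypothetical names, and the rate itself is an arbitrary choice:

```python
import numpy as np

def rate(t):
    # Hypothetical shared daily intensity (events per hour) on [0, 24), peaking at 06:00
    return 2.0 + np.sin(2.0 * np.pi * t / 24.0)

RATE_MAX = 3.0  # upper bound on rate(t), required by thinning

def sample_day(rng, horizon=24.0):
    """Draw one day of event times from rate(t) by Lewis-Shedler thinning."""
    # 1) homogeneous Poisson candidates at the bounding rate
    n = rng.poisson(RATE_MAX * horizon)
    candidates = np.sort(rng.uniform(0.0, horizon, size=n))
    # 2) keep each candidate with probability rate(t) / RATE_MAX
    keep = rng.uniform(size=n) < rate(candidates) / RATE_MAX
    return candidates[keep]

rng = np.random.default_rng(0)
days = [sample_day(rng) for _ in range(5)]  # ragged: one array of event times per day
print([len(d) for d in days])
```

Each element of `days` holds one day's sorted event times in [0, 24).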

In my case I have several such sequences, one per day (and would want to train one function for all of them).

Current hack:

My current solution is simply to loop over every sequence (one per day).

    for i in range(num_iter):
        for sequence in dataset:  # this inner loop is not in the tutorial
            # one SVI step per day; infer and quadrature_times are set up as in the tutorial
            loss = infer.step(sequence, quadrature_times)
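Whether this loop gives the right variances depends on how each step weights the data. The sketch below swaps in a conjugate constant-rate Gamma-Poisson model (not the tutorial's GP) so that every answer is closed-form. It assumes each `infer.step` call scores one day's likelihood at full weight plus the prior/KL term once. The counts and prior values are hypothetical:

```python
import numpy as np

# Swapped-in toy model (not the tutorial's GP): a constant rate r with a Gamma(a, b) prior.
# Day d contributes a Poisson count n_d over a window of T hours.
a, b, T = 2.0, 1.0, 24.0
counts = np.array([50, 41, 47, 55, 44])  # hypothetical daily event counts
D = len(counts)

def gamma_mean_var(shape, rate):
    # Mean and variance of a Gamma(shape, rate) distribution
    return shape / rate, shape / rate**2

# Exact posterior given all D days: every day's likelihood enters once.
full = gamma_mean_var(a + counts.sum(), b + D * T)

# Optimum of the per-step objective averaged over days, when each step weights one day's
# likelihood at full strength and also keeps the prior/KL term at full strength:
# this is the posterior for a single "average" day.
looped = gamma_mean_var(a + counts.mean(), b + T)

# Scaling each day's likelihood by D inside the loop recovers the full posterior.
scaled = gamma_mean_var(a + D * counts.mean(), b + D * T)

print("full  ", full)
print("looped", looped)
print("scaled", scaled)
```

Under these assumptions the looped fit gets roughly the right mean, but its variance corresponds to one day of data rather than all of them. In Pyro, scaling the per-day likelihood (for example with `pyro.poutine.scale(scale=num_days)`) or subsampling days inside a `pyro.plate` is one way to apply that correction.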

Question:

Even if this is inefficient, does it still address the problem correctly? That is, are the resulting mean and variance mathematically correct for my goal?
How would you do this properly with batching?
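One way to batch is to pad each day's events to a common length and mask the padding, so that all days are scored in one vectorized call and the total loss is the sum over days. The minimal NumPy sketch below uses a hypothetical fixed `log_rate` to stand in for one draw of the GP. In the tutorial the rate comes from the model and the integral is presumably approximated on `quadrature_times`; here it is closed-form:

```python
import numpy as np

def log_rate(t):
    # Hypothetical stand-in for one sampled latent log-intensity on [0, 24)
    return np.log(2.0 + np.sin(2.0 * np.pi * t / 24.0))

# Closed-form integral of exp(log_rate) over [0, 24]; the sine term integrates to zero
RATE_INTEGRAL = 2.0 * 24.0

def pad_days(days):
    """Stack ragged per-day event times into a (num_days, max_events) array plus a mask."""
    max_len = max(len(d) for d in days)
    times = np.zeros((len(days), max_len))
    mask = np.zeros((len(days), max_len), dtype=bool)
    for i, d in enumerate(days):
        times[i, : len(d)] = d
        mask[i, : len(d)] = True
    return times, mask

def batched_loglik(times, mask):
    """Poisson-process log-likelihood of every day at once, shape (num_days,)."""
    # Padded slots hold dummy times; the mask zeroes their contribution
    event_term = np.where(mask, log_rate(times), 0.0).sum(axis=1)
    # Every day shares the same intensity, so each day pays the same integral term
    return event_term - RATE_INTEGRAL

days = [np.array([1.0, 6.0, 13.5]), np.array([8.0]), np.array([2.0, 20.0])]
times, mask = pad_days(days)
per_day = batched_loglik(times, mask)
print(per_day.shape)  # (3,)
```

Summing `per_day` gives the same total as looping over the days one at a time. Any GP prior or KL term would still be added only once for the whole batch.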

martinjankowiak · December 9, 2021, 8:46pm

if you really believe that the function is the same for each day, take all your timestamps modulo 24 hours (% in python) and you should be good to go
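The following is a minimal sketch of that preprocessing. It assumes timestamps are stored as hours since the start of recording, and the array values are made up. `np.divmod` returns the day index and the hour of day together. The day index is kept so that events can still be grouped per day:

```python
import numpy as np

# Hypothetical event timestamps in hours since the start of the recording
timestamps = np.array([3.5, 27.0, 49.25, 70.0])

# Floor division and modulo by 24 in one call: day index and hour of day
day, hour = np.divmod(timestamps, 24.0)
print(day.tolist())   # [0.0, 1.0, 2.0, 2.0]
print(hour.tolist())  # [3.5, 3.0, 1.25, 22.0]

# Group the mod-24 times back into one array per day
per_day_events = [hour[day == k] for k in np.unique(day)]
```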

glennkroegel · December 12, 2021, 2:32pm

Thanks for the reply.

That’s what I have done. My event times run from 0 to 24 within each day.

However, I am still confused about how the model interprets this when there are multiple days, each with events in 0-24.

Are you saying I can concatenate all the days into one flat vector, with the time values repeating? My understanding is that this would be interpreted as a single day with N times as many events (for N days) around each time. That is why I used the for loop above, to avoid this, but I am not sure that is correct either.
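The concern about concatenation can be checked directly for a Poisson process, which is what a Cox process is once you condition on the latent rate. Given one shared intensity, independent days multiply, so their log-likelihoods add. The event terms become one longer sum over the concatenated times, but the integral of the intensity must be counted once per day. This is a minimal sketch, with a hypothetical fixed `log_rate` standing in for the latent function:

```python
import numpy as np

def log_rate(t):
    # Hypothetical log-intensity standing in for a single draw of the GP
    return np.log(2.0 + np.sin(2.0 * np.pi * t / 24.0))

# Integral of exp(log_rate) over one 24 h window; the sine term integrates to zero
RATE_INTEGRAL = 2.0 * 24.0

def day_loglik(times):
    # Inhomogeneous Poisson log-likelihood for one day: sum of log rate minus integral
    return np.sum(log_rate(times)) - RATE_INTEGRAL

def flat_loglik(all_times, num_days):
    # Same quantity on the concatenated mod-24 times: the event term is a longer sum,
    # but the integral term is counted once per day
    return np.sum(log_rate(all_times)) - num_days * RATE_INTEGRAL

days = [np.array([1.0, 6.0, 13.5]), np.array([8.0]), np.array([2.0, 20.0, 23.9])]
looped = sum(day_loglik(d) for d in days)
flat = flat_loglik(np.concatenate(days), num_days=len(days))
print(np.isclose(looped, flat))  # True
```

If the flat vector is used with the integral counted only once, the model fits the superposition of the days instead. That superposition has an intensity equal to the number of days times the per-day intensity. The concatenated fit is therefore not wrong as such, but its rate must be divided by the number of days to read it as a per-day rate.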

martinjankowiak · December 12, 2021, 4:07pm

well it depends on what assumptions you want to make. are different days EXACTLY the same? or only APPROXIMATELY the same? if the latter, you’re better off not doing any mod arithmetic and using a locally periodic kernel instead
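A locally periodic kernel is commonly written as a periodic kernel multiplied by a squared-exponential envelope. Points exactly one period apart are then strongly correlated when they are close in absolute time, and that correlation fades as days get further apart. The minimal NumPy sketch below works on absolute (not mod-24) times in hours, and its parameter names and values are illustrative assumptions. In GPyTorch, something similar could presumably be built as the product of `gpytorch.kernels.PeriodicKernel` and `gpytorch.kernels.RBFKernel`, though GPyTorch's parameterization of the periodic term may differ from the formula used here:

```python
import numpy as np

def locally_periodic(t1, t2, period=24.0, per_lengthscale=1.0,
                     decay_lengthscale=240.0, variance=1.0):
    """Periodic kernel times a squared-exponential envelope, on absolute times in hours."""
    lag = np.abs(t1[:, None] - t2[None, :])
    # Repeats every `period` hours
    periodic = np.exp(-2.0 * np.sin(np.pi * lag / period) ** 2 / per_lengthscale**2)
    # Lets the daily pattern drift slowly over absolute time
    envelope = np.exp(-0.5 * lag**2 / decay_lengthscale**2)
    return variance * periodic * envelope

t = np.array([10.0, 34.0, 10.0 + 24.0 * 30])  # 10:00 on day 0, day 1 and day 30
K = locally_periodic(t, t)
print(np.round(K[0], 3))
```

Here the same clock time on consecutive days is almost perfectly correlated, while 30 days apart the correlation is close to zero. `decay_lengthscale` sets how quickly the daily pattern is allowed to change.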