Hi, I’m working on a use case that I’m trying to solve with a Pyro Bayesian neural net, but I can’t get good results. Is anyone interested in taking a look at my approach and criticising it, or helping me improve it?

My main concern is that I cannot get my model anywhere near the performance of some extremely naive approaches (see below). I read in other posts that if my number of parameters is much higher than my number of datapoints, I shouldn’t expect good performance; but that doesn’t hold in this situation.

**The dataset**

The dataset is a simplified and preprocessed version of a Kaggle competition dataset. The goal is to predict the arrival delay of flights from 24 features, such as “flight distance”, “departure time”, “arrival time”, “origin”, and “destination”.

An example of some of the features:

The training set consists of 4,291,428 flight records (with 24 features each).

The testing set has 922,928 additional samples.

My target distribution looks like the graph below:

**The model**

I’m modelling the delays with an Exponential distribution, and I’m fitting a Bayesian neural net to output its rate as a function of the features.
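Concretely, the model looks roughly like the sketch below (class name, layer sizes, and priors here are illustrative, not my exact setup; I train it with SVI and an autoguide):

```python
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample


class DelayNet(PyroModule):
    """Bayesian MLP mapping flight features to the rate of an Exponential."""

    def __init__(self, n_features=24, n_hidden=32):
        super().__init__()
        # Normal(0, 1) priors over all weights and biases (illustrative choice).
        self.hidden = PyroModule[nn.Linear](n_features, n_hidden)
        self.hidden.weight = PyroSample(
            dist.Normal(0.0, 1.0).expand([n_hidden, n_features]).to_event(2))
        self.hidden.bias = PyroSample(
            dist.Normal(0.0, 1.0).expand([n_hidden]).to_event(1))
        self.out = PyroModule[nn.Linear](n_hidden, 1)
        self.out.weight = PyroSample(
            dist.Normal(0.0, 1.0).expand([1, n_hidden]).to_event(2))
        self.out.bias = PyroSample(
            dist.Normal(0.0, 1.0).expand([1]).to_event(1))

    def forward(self, x, y=None):
        h = torch.relu(self.hidden(x))
        # Softplus keeps the Exponential rate strictly positive.
        rate = nn.functional.softplus(self.out(h)).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            pyro.sample("obs", dist.Exponential(rate), obs=y)
        return rate
```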

**Evaluation**

To evaluate the output, I pick a delay threshold (e.g. 60 min) and use my model to predict the probability of being delayed by more than this threshold. Then I use an uplift curve to compare models: I take the probability predictions, split them into deciles, and compute the precision within each decile. See an example in the “The baseline” section.
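In code, the evaluation is roughly the following (function names are just for illustration; `rate_samples` stands for posterior samples of the Exponential rate with shape `(n_samples, n_flights)`):

```python
import numpy as np
import pandas as pd


def prob_delayed(rate_samples, threshold=60.0):
    # For Exponential(rate), P(delay > t) = exp(-rate * t); average the
    # exceedance probability over the posterior samples (axis 0).
    return np.exp(-rate_samples * threshold).mean(axis=0)


def decile_precision(probs, delayed):
    # Split the predicted probabilities into deciles and compute the
    # fraction of actually delayed flights within each decile.
    df = pd.DataFrame({"p": probs, "y": delayed})
    df["decile"] = pd.qcut(df["p"], 10, labels=False, duplicates="drop")
    return df.groupby("decile")["y"].mean().sort_index(ascending=False)
```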

**The baseline**

As a baseline I’m using the following approaches:

- a groupby mean on the categorical features (see the sketch after this list).
- fitting a non-Bayesian neural net whose loss is the mean negative log-likelihood of the data under different likelihood distributions (e.g. a Gamma).
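For reference, the groupby baseline is roughly this (the column names `arrival_delay`, `origin`, and `destination` are placeholders for my actual feature names):

```python
import pandas as pd


def groupby_baseline(train, test, cats=("origin", "destination"), threshold=60.0):
    # Empirical P(delay > threshold) per categorical group, used as the
    # prediction for every test flight falling in the same group.
    train = train.assign(delayed=train["arrival_delay"] > threshold)
    rates = train.groupby(list(cats))["delayed"].mean().rename("p").reset_index()
    global_rate = train["delayed"].mean()  # fallback for groups unseen in training
    preds = test.merge(rates, how="left", on=list(cats))["p"]
    return preds.fillna(global_rate).to_numpy()
```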

Next, the results of the groupby baseline:

**The code (plus data, plus instructions)**