Is a Gaussian Process a good choice for my model?

mattiasthalen · July 6, 2021, 1:20pm

Hi,

I’m getting a bit further with my models and have started to think that a GP might be suitable for my current objective.

Some background:
There is a strong (R²: +0.95) linear/quadratic relationship between velocity and load for exercises like the squat, bench press & deadlift.

I have a dataset where each row is a session. It contains a column for each measurement in that session, i.e. each (velocity, load) pair, plus the exercise, is a feature. I also have a label column that contains the max load lifted in each session.

My objective is to train a model with N measurements per observation and estimate what the max load would be.

I’m limited by a rather small number of observations, think < 100 in total, ~30 per exercise.
Is GP a good way to go or should I look into something else?

theo · July 6, 2021, 3:46pm

Could you elaborate a bit more on the model? Are you trying to model the maximum load for the session given the velocity and load for each exercise in the session? Is each session a different person? Are the same 3 lifts done each session?

mattiasthalen · July 6, 2021, 6:53pm

The goal is to predict the max load for a lift as I’m doing it, and also to predict what the max load would have been for those times when I didn’t go for a max on an exercise.

The exercises aren’t really dependent on each other, but they are correlated in the sense that the behavior is similar. Some are more like a straight line and others are more curved. But the basics are the same (e.g. velocity > 0, load >= 0, inverse relationship). I might do them all in one session, or just one of them. The main point is that I want to avoid having one model per exercise and instead have per-exercise parameters within a single model, so more like a hierarchical model.

This will be for one person only, otherwise I would include something like athlete_id. But it’s just for myself.

There is a concept called minimum velocity threshold (MVT), i.e. the lowest velocity at which a movement can be performed, with maximum intent, and still succeed. With linear regression, the max load for a given session is usually predicted as the load at which the fitted line reaches the MVT. The caveat with MVT is that it moves over time (it decreases as technical proficiency with the exercise improves) and also fluctuates day to day, but the same is true for the parameters in a linear/quadratic regression. Some days the slope is steeper and other days the intercept is higher.
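
For illustration, the usual MVT approach can be sketched in a few lines of NumPy. The loads and velocities below are made-up numbers chosen to lie on a straight line, the threshold of 0.24 is an assumed value rather than something measured, and `predict_max_load` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical sets from one session (made-up values that lie exactly on a line).
load = np.array([20.0, 60.0, 100.0, 130.0])
velocity = np.array([1.20, 0.88, 0.56, 0.32])

# Least-squares fit of velocity = intercept + slope * load.
slope, intercept = np.polyfit(load, velocity, deg=1)


def predict_max_load(mvt, slope, intercept):
    """Load at which the fitted line reaches the velocity threshold."""
    return (mvt - intercept) / slope


mvt = 0.24  # assumed threshold, not a value from this thread
est_max = predict_max_load(mvt, slope, intercept)
print(f"estimated max load: {est_max:.1f}")
```

Because the slope, the intercept, and the MVT all drift from day to day, this point estimate carries no uncertainty, which is part of why a Bayesian model is attractive here.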

theo · July 6, 2021, 9:25pm

Is it not dependent on the number of reps? Or are you doing a single rep of these exercises? It reminds me of RPE (Rate of Perceived Exertion), but accurately personalised to yourself.

It’s up to you how exactly you design it, but for a single exercise you could definitely put a GP prior on velocity and load to recover those curved relationships you mentioned. The key would be to pick a kernel based on how smooth you want the curve. The squared-exponential kernel is usually a good place to start for smooth relationships. In fact, you could have a single GP prior on both the velocity and load dimensions to do it jointly.
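
As a rough sketch of the single-exercise case, here is plain NumPy GP regression with a squared-exponential kernel, treating velocity as a function of load (one possible reading of the suggestion above). The data, the kernel hyperparameters, and the noise level are all assumed values for illustration, and the helper names are hypothetical. A GP library would normally also fit the hyperparameters instead of fixing them.

```python
import numpy as np


def sq_exp_kernel(a, b, variance=1.0, lengthscale=40.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_new, noise_sd=0.02):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_tt = sq_exp_kernel(x_train, x_train) + noise_sd**2 * np.eye(len(x_train))
    k_nt = sq_exp_kernel(x_new, x_train)
    k_nn_diag = np.diag(sq_exp_kernel(x_new, x_new))
    chol = np.linalg.cholesky(k_tt)
    # alpha = k_tt^{-1} y, computed via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_nt.T)
    mean = k_nt @ alpha
    var = k_nn_diag - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Made-up sets for one exercise: slightly curved velocity/load relationship.
load = np.array([20.0, 60.0, 100.0, 130.0])
velocity = np.array([1.20, 0.90, 0.55, 0.32])

# The GP has zero mean, so centre the velocities and add the offset back.
offset = velocity.mean()
grid = np.array([20.0, 80.0, 130.0, 300.0])
mean, sd = gp_posterior(load, velocity - offset, grid)
mean = mean + offset

for x, m, s in zip(grid, mean, sd):
    print(f"load {x:5.0f}: velocity {m:.2f} +/- {s:.2f}")
```

The lengthscale controls how smooth the curve is. Far from the observed loads the predictive standard deviation grows back towards the prior, so extrapolating to a max load well beyond the heaviest set will be uncertain.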

Incorporating the lift type as a categorical variable is a bit trickier. You could have it as a fixed effect, either additive or as a term to multiply the GP term.

For something a bit fancier, try multi-task GPs. I have only ever implemented this in GPflow, but there might be a numpyro implementation somewhere.
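
To give a feel for what a multi-task GP does, here is a minimal NumPy sketch of one common construction, the intrinsic coregionalization model, where the covariance between two points is a between-exercise covariance multiplied by a kernel on load. The task encoding, the 0.8 correlation, and the function names are all assumptions for illustration; this is not GPflow's API.

```python
import numpy as np


def sq_exp_kernel(a, b, variance=1.0, lengthscale=40.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def multitask_kernel(load_a, task_a, load_b, task_b, task_cov):
    """Covariance task_cov[task, task'] * k(load, load') for every pair."""
    return task_cov[np.ix_(task_a, task_b)] * sq_exp_kernel(load_a, load_b)


# Hypothetical encoding: 0 = squat, 1 = bench press.
# The off-diagonal entry says how strongly the two curves share shape.
task_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])

load = np.array([20.0, 130.0, 20.0, 80.0])
task = np.array([0, 0, 1, 1])

K = multitask_kernel(load, task, load, task, task_cov)
print(np.round(K, 3))
```

With the off-diagonal entry at 0 this reduces to one independent GP per exercise, and at 1 it is a single shared curve, so the task covariance plays a role similar to partial pooling in a hierarchical model.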

I might have overcomplicated this. Perhaps you, or someone else in the forum, has a better idea.

mattiasthalen · July 7, 2021, 11:18am

The reps are 1-3 per set. I work up to a heavyish single every session, then deload and do the bulk of my work, regulated by how the single went. Oh, it’s definitely related to RPE, but using velocity to autoregulate load.

What confuses me a bit is how such a model would look in practice.
In my previous models I used the function $\text{velocity} = \alpha + \beta_0 \times \text{load} + \beta_1 \times \text{load}^2$, which works when each set is an observation for the model. But now I’m looking to pivot the sets to columns, so each session is an observation.

In other words, my data looks like this:

Observation	Exercise	Velocity Set 1	Velocity Set n	Weight Set 1	Weight Set n	Max Load
0	Squat	1.20	0.32	20	130	140
1	Squat	1.15	0.28	20	130	145
2	Bench Press	1.08	0.20	20	80	90
3	Bench Press	1.03	0.18	20	80	92.5

theo · July 7, 2021, 11:54am

Unless the order of the sets matters (then you could include the set number like a time variable) and the max load is constant for the session, I would model it as you have been, without pivoting, but with a GP instead of the quadratic.
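
If the data already sits in the pivoted layout, going back to one row per set is straightforward. Here is a small sketch in plain Python using two rows from the table above. The dictionary keys and the `to_long` helper are hypothetical names, and only the "Set 1" and "Set n" columns shown in the table are included.

```python
# One dict per session, mirroring two rows of the pivoted table.
wide_rows = [
    {"observation": 0, "exercise": "Squat", "max_load": 140,
     "sets": [("1", 1.20, 20), ("n", 0.32, 130)]},
    {"observation": 2, "exercise": "Bench Press", "max_load": 90,
     "sets": [("1", 1.08, 20), ("n", 0.20, 80)]},
]


def to_long(rows):
    """Return one record per (session, set) instead of one per session."""
    long_rows = []
    for row in rows:
        for set_label, velocity, weight in row["sets"]:
            long_rows.append({
                "observation": row["observation"],
                "exercise": row["exercise"],
                "set": set_label,
                "velocity": velocity,
                "weight": weight,
                "max_load": row["max_load"],
            })
    return long_rows


long_rows = to_long(wide_rows)
for record in long_rows:
    print(record)
```

Each record keeps its session id, so a session-level effect (or the session's max load) can still be attached to every set.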

Others may have better ideas…

Perhaps you might want to add a date to the sessions. Then you could have an underlying time variable and track your progress over time

mattiasthalen · July 7, 2021, 5:28pm

I do have a timestamp for each session. Tbh I probably have more features than I need: rest times, peak velocity, distance, peak velocity location, and duration.