(sources at GitHub)

This post is about negative splits, which is a fancy way of saying that you start running slower and then increase your speed during the race. The name comes from the fact that when you look at the splits, the list of times it took you to run each kilometer (or mile), the numbers decrease.

The splits strategy is somewhat disputed, but many coaches recommend using negative splits (look here).

The question of this post is whether running negative or positive splits has any relation with the results.

The race(s)

This time I had to change the races used. The thing with the races in Santiago is that they have uphill and downhill segments, which could affect the results. So I went for a flat race, and flat Berlin is. The chosen race was the Berlin Half Marathon.

The specific races were from the years 2011, 2013 and 2015 (see the results site).

These races had the following numbers of runners (M for men, W for women):

##    
##     BerlinHb2011 BerlinHb2013 BerlinHb2015
##   M        13726        14229        14636
##   W         6330         7786         8658

that ran at certain speeds (km/h):

## $BerlinHb2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.154   9.061  10.150  10.320  11.310  20.880 
## 
## $BerlinHb2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.745   9.448  10.550  10.700  11.730  21.240 
## 
## $BerlinHb2015
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.915   9.481  10.590  10.740  11.770  21.190

The splits

The race results contain partial times for the first 10 km, which allows calculating the average speed over two segments: the first 10 km and the rest of the race. Splits are negative if the speed on the second segment is higher than on the first. Two attributes were added to the data: dsign, which is -1 (for negative splits), 0 or +1, and delta_pct, which is the change of speed expressed in percent (positive when the second segment is slower).
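A minimal sketch of how the two attributes could be derived from a finish time and the 10 km partial. The exact formula behind delta_pct is not shown in this post; the sketch assumes it is the slowdown relative to the first segment's speed, which matches the sign convention used here (negative values mean a faster second half). The function name and arguments are illustrative.

```python
HALF_MARATHON_KM = 21.0975  # official half marathon distance
FIRST_SEGMENT_KM = 10.0     # the results include a partial time at 10 km


def split_attributes(t10_min, total_min):
    """Return (v1, v2, delta_pct, dsign) for one runner.

    t10_min   -- time at the 10 km mark, in minutes
    total_min -- finish time, in minutes
    ASSUMPTION: delta_pct = 100 * (v1 - v2) / v1, so a positive value
    means the runner slowed down (positive split).
    """
    second_km = HALF_MARATHON_KM - FIRST_SEGMENT_KM
    v1 = FIRST_SEGMENT_KM / (t10_min / 60.0)          # km/h on the first 10 km
    v2 = second_km / ((total_min - t10_min) / 60.0)   # km/h on the rest
    delta_pct = 100.0 * (v1 - v2) / v1
    dsign = (delta_pct > 0) - (delta_pct < 0)         # -1, 0 or +1
    return v1, v2, delta_pct, dsign


# Hypothetical runner: 10 km in 55 min, finish in 120 min.
v1, v2, delta_pct, dsign = split_attributes(55.0, 120.0)
print(round(v1, 2), round(v2, 2), round(delta_pct, 2), dsign)
```

Dividing by the first-segment speed is only one possible choice; dividing by the overall speed would give slightly different percentages but the same sign.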

This is how the sign of the splits is distributed:

##     event
##      BerlinHb2011 BerlinHb2013 BerlinHb2015
##   -1         2316         6399         6483
##   1         17740        15616        16811

And the total percentage of negative splits is 23.25%.
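As a quick check, the overall figure can be reproduced from the counts in the table above. The sketch below only re-does that arithmetic; the dictionary and function names are illustrative.

```python
# Counts copied from the dsign table above, one entry per race.
negative = {"BerlinHb2011": 2316, "BerlinHb2013": 6399, "BerlinHb2015": 6483}
positive = {"BerlinHb2011": 17740, "BerlinHb2013": 15616, "BerlinHb2015": 16811}


def negative_share(neg, pos):
    """Percentage of runners with negative splits."""
    return 100.0 * neg / (neg + pos)


per_race = {race: negative_share(negative[race], positive[race]) for race in negative}
overall = negative_share(sum(negative.values()), sum(positive.values()))

for race, pct in per_race.items():
    print(f"{race}: {pct:.2f}%")
print(f"overall: {overall:.2f}%")
```

Computed per race, the share of negative splits is noticeably lower in 2011 than in 2013 and 2015.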

The overall range of the percentage change in speed (delta_pct) is:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    0.276    3.807    4.515    8.271   60.510

and by race:

## $BerlinHb2011
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    3.006    7.044    7.443   11.630   60.510 
## 
## $BerlinHb2013
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -100.6000   -0.5542    2.5780    3.2540    6.5550   50.2200 
## 
## $BerlinHb2015
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -86.0400  -0.3607   2.6520   3.1850   6.3650  42.2600

and its distribution looks like this:

The 2011 race looks different from the other two years in shape and center, for reasons unknown to me.

Most people run positive splits, typically between 0% and about 8% slower in the second half.

Splits and speed

This plot illustrates the relation between speed and the change in speed (delta_pct):

From this plot it's clear that there is a relation between the speed difference and the overall speed. Those with lower splits tend to run faster. But at the same time splits don't seem to explain the best times: the fastest runners tend to have small splits, around zero.

Let's do the numbers, comparing the speeds of runners with negative splits against those with non-negative splits, for women:

## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 29.93, df = 8198, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.5788349 0.6599688
## sample estimates:
## mean of x mean of y 
## 10.244138  9.624736

and men:

## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 33.954, df = 19185, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.6164811 0.6920167
## sample estimates:
## mean of x mean of y 
##  11.54288  10.88863

So in both cases we have a significant difference (p-values almost 0): runners with negative splits were on average about 0.6 km/h faster.
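For readers who want to see what the Welch test computes, here is a minimal sketch of the t statistic and the Welch-Satterthwaite degrees of freedom. The speeds are made up for illustration and are not taken from the race data; the p-value is left out because it needs the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny   # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Made-up speeds in km/h, NOT taken from the race data.
v_neg = [10.1, 10.8, 9.9, 11.2, 10.5]
v_pos = [9.4, 9.9, 9.1, 10.3, 9.6, 9.8]
t, df = welch_t(v_neg, v_pos)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Unlike the classic Student test, this version does not assume both groups have the same variance, which is why the degrees of freedom are usually not a whole number.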

That was using just a negative/non-negative attribute. What about the size of the speed change?

Relation with other factors

Let’s compare with Sex:

## 
##  Welch Two Sample t-test
## 
## data:  vM and vW
## t = 0.80468, df = 52630, p-value = 0.421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05981364  0.14313333
## sample estimates:
## mean of x mean of y 
##  4.529098  4.487438

So there doesn’t seem to be an association between Sex and the speed change.

Then the relation with Age:

## lm(formula = delta_pct ~ Age, data = df)

##               Estimate  Std. Error  t value      Pr(>|t|)
## (Intercept) 1.99655813 0.099077172 20.15155  4.903596e-90
## Age         0.06117036 0.002324882 26.31117 8.867464e-152

And there is a relation with Age: about 0.6 percentage points more slowdown (delta_pct) for each decade of age.

Comparing association with speed

First look at the relation between speed and speed change on splits:

## lm(formula = complete ~ delta_pct, data = df)

##                Estimate   Std. Error    t value Pr(>|t|)
## (Intercept) 11.01116582 0.0079639887 1382.61947        0
## delta_pct   -0.09155948 0.0009969797  -91.83686        0

So the split delta accounts for 11.4% of the total speed variability among runners.
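The "accounts for" figure is presumably the R² of the fit (an assumption, since only the coefficient table is shown here). A minimal sketch of a one-variable least-squares fit and its R², using made-up pairs rather than the race data:

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up (delta_pct, speed in km/h) pairs, NOT taken from the race data.
delta_pct = [-2.0, 0.5, 3.0, 6.0, 9.0, 12.0]
speed = [11.8, 10.9, 11.2, 10.1, 10.6, 9.7]
a, b, r2 = simple_ols(delta_pct, speed)
print(f"intercept = {a:.3f}, slope = {b:.4f}, R^2 = {r2:.3f}")
```

R² is the fraction of the variance in speed that the fitted line reproduces, so a value of 0.114 would correspond to the 11.4% quoted above.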

Comparing with Age:

## lm(formula = complete ~ Age, data = df)

##                Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept) 11.39360173 0.0267821824 425.41723  0.000000e+00
## Age         -0.01933209 0.0006284537 -30.76136 2.580757e-206

Age alone explains just 1.4% of the total variability.

Considering also Sex, with delta_pct:

## lm(formula = complete ~ Sex + delta_pct, data = df)

##                Estimate   Std. Error    t value Pr(>|t|)
## (Intercept) 11.46368706 0.0086571097 1324.19335        0
## SexW        -1.29515316 0.0128251465 -100.98545        0
## delta_pct   -0.09184154 0.0009272741  -99.04465        0

Here the explained variability is 23.4%.

And with Sex and Age:

## lm(formula = complete ~ Sex + Age, data = df)

##                Estimate   Std. Error   t value Pr(>|t|)
## (Intercept) 12.09752538 0.0259742116 465.75140        0
## SexW        -1.34793835 0.0136330736 -98.87267        0
## Age         -0.02502357 0.0005889715 -42.48689        0

Here the explained variability is 14.3%.

How well do the splits predict the speed?

This is the residuals plot for the fit with Sex and delta_pct:

As was seen in the second figure, the speed change between the first and second half seems to be a better predictor of slower speeds than of higher ones.

Let’s see the range of delta_pct for the top 5% of male runners:

## $BerlinHb2011
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    1.825    3.831    3.734    5.965   18.040 
## 
## $BerlinHb2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7.7820 -0.7309  1.0590  1.1630  2.9570 19.0100 
## 
## $BerlinHb2015
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -17.2200  -0.7451   0.7912   1.0450   2.4950  38.0200

The centers of these ranges are closer to 0 (and lower) than those for all runners.
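How the top 5% was selected is not shown here. One plausible way, sketched below, is to keep the runners at or above the 95th percentile of speed within a race. The runner records and the helper name are made up for illustration.

```python
from statistics import median, quantiles


def top_fraction(runners, n=20):
    """Keep runners at or above the last of the n-quantile cut points.

    With n=20 the last cut point is the 95th percentile of speed,
    so the result is (roughly) the fastest 5%.
    """
    cutoff = quantiles([r["speed"] for r in runners], n=n)[-1]
    return [r for r in runners if r["speed"] >= cutoff]


# Made-up runners, NOT taken from the race data: speed in km/h and delta_pct.
runners = [{"speed": 8.0 + 0.1 * i, "delta_pct": 10.0 - 0.1 * i} for i in range(100)]
fastest = top_fraction(runners)
print(len(fastest), median(r["delta_pct"] for r in fastest))
```

Applying the cut separately to each race, rather than to the pooled data, keeps a fast year from crowding out the others.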

And if we test for a difference between negative and non-negative splits:

## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 2.0686, df = 1013.3, p-value = 0.03883
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.006702763 0.254118218
## sample estimates:
## mean of x mean of y 
##  15.50940  15.37898

We find that in this group the split sign still has a significant association with speed, with positive splits being slower (p-value of 0.039).

Conclusion

Negative splits are a good thing for running. But like most good things, they should not be overdone. The best results are associated with just a slight increase in speed in the second half.

One notable thing is that having negative, or lower, splits has a stronger association with overall speed than age does. This suggests that some of the age disadvantage can be neutralized with pace management.

regards

(sources and latest version here)

In this post I review how race times change with age.

The variable I will use is speed, though, not time. The problem with times is that they are not linear.

For example, this is the relation between time (in minutes) and speed (km/h) in a 10 km race:

For times above 60 minutes the relation is fairly linear, but for shorter times it definitely is not.
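The non-linearity comes from speed being inversely proportional to time. A short sketch of the conversion for a 10 km race (the function name is illustrative):

```python
DISTANCE_KM = 10.0


def speed_kmh(time_min, distance_km=DISTANCE_KM):
    """Average speed in km/h for a finish time given in minutes."""
    return distance_km / (time_min / 60.0)


for t in (30, 40, 50, 60, 70, 80, 90):
    print(f"{t:3d} min -> {speed_kmh(t):5.2f} km/h")

# The same 10-minute improvement is worth very different amounts of speed:
fast_gain = speed_kmh(30) - speed_kmh(40)   # from 40 to 30 minutes
slow_gain = speed_kmh(80) - speed_kmh(90)   # from 90 to 80 minutes
print(f"{fast_gain:.2f} km/h vs {slow_gain:.2f} km/h")
```

Going from 40 to 30 minutes adds 5 km/h, while going from 90 to 80 minutes adds less than 1 km/h, which is why a straight line fits the slow end of the curve much better than the fast end.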

Brooks race

The data come from the Brooks #50 race.

The times of this race by gender (Sexo) look like this:

The trend line helps to see that for men the speed (in km/h) decreases with age, as expected. For women, however, there is a hump-shaped curve, with high values in the 35-45 age range.

The slope values are:

-0.032 km/h per year for men, and
0.000 km/h per year for women

The number of runners:

##     Division
## Sexo 15-25 26-35 36-45 46-55 56-65 66-70
##    F   170   431   252   101    20     3
##    M   142   490   424   196    71    17

My impression is that, as age increases, only the most committed women keep competing. This shows in the number of participants dropping faster with age than it does for men. And that "filtering" of participants produces better results in the middle categories. But that will be the topic of another article.
So for the rest of this post I will only use the men's data.

Other races

The other races reviewed are the 2014, 2015 and 2016 editions of the Corrida de Santiago (run at the same time as the Maratón de Santiago). This is a 10k race.

The participants (men) by category were:

##          Division
## event     15-25 26-35 36-45 46-55 56-65 66-70
##   mds2014   836  1456   922   489   197    31
##   mds2015   682  1167   796   463   202    42
##   mds2016   635  1047   771   452   203    41

The results (speed in km/h):

## $mds2014
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.293   8.793   9.926  10.180  11.290  20.700 
## 
## $mds2015
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.283   8.633   9.795  10.080  11.190  20.870 
## 
## $mds2016
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.118   8.664   9.850  10.150  11.170  21.200

Distributed by category (Division):

And the linear regression on age gives the following results:

##             mds2014 mds2015 mds2016
## (Intercept) 11.3562 11.1724 11.2527
## Age         -0.0337 -0.0307 -0.0305

These results are consistent with those of the Brooks #50 race, although for that race the age of each participant is not available and the regression was done on the mean age of each category, whereas for MDS it was done with the age of each participant.

Another distance

To check with another distance, we look at the MDS half marathon for the same years (2014, 2015 and 2016).

The participants (men) by category were:

##          Division
## event     15-25 26-35 36-45 46-55 56-65 66-70
##   mds2014  1084  2582  2013  1036   319    26
##   mds2015   961  2698  2215  1210   401    38
##   mds2016   792  2564  2172  1196   429    38

The results (speed in km/h):

## $mds2014
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.89    8.93    9.93   10.10   11.10   19.10 
## 
## $mds2015
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.82    8.74    9.78    9.94   10.90   19.20 
## 
## $mds2016
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.71    8.85    9.91   10.10   11.10   18.80

Distributed by category (Division):

And the linear regression on age gives the following results:

##             mds2014 mds2015 mds2016
## (Intercept) 10.6885 10.6319 10.7061
## Age         -0.0164 -0.0187 -0.0174

It is striking that the mean speeds are very similar in the 10k and 21k races. The drop in speed with age is less pronounced in the 21k races than in the 10k ones, almost half.
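The "almost half" can be checked directly from the Age coefficients in the two regression tables above. The sketch only re-does that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Age coefficients (km/h per year) copied from the regression tables above.
slopes_10k = {"mds2014": -0.0337, "mds2015": -0.0307, "mds2016": -0.0305}
slopes_21k = {"mds2014": -0.0164, "mds2015": -0.0187, "mds2016": -0.0174}

mean_10k = mean(list(slopes_10k.values()))
mean_21k = mean(list(slopes_21k.values()))
ratio = mean_21k / mean_10k

print(f"10k: {10 * mean_10k:.3f} km/h per decade")
print(f"21k: {10 * mean_21k:.3f} km/h per decade")
print(f"21k slope / 10k slope = {ratio:.2f}")
```

Averaged over the three years, the 21k slope comes out at a bit more than half of the 10k slope.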

Comments

A drop in speed with age was to be expected, and it shows for men in the 10k and 21k races.

For women something different happens, with the 35 to 55 age categories having the best results. That requires an explanation.

On the other hand, if we look at how the maximums of each category change for men, the drop in speed appears to be larger than the slope based on the mean indicates.

Impression

I have the impression that the reduction of speed with age is larger than the regression indicates, and that the self-selection effect is significant. That is, in the older categories only the most dedicated runners remain, which raises their average, while among the younger ones more runners with little dedication take part, which lowers the average. Separating that influence will require another analysis.

Proyecciones, especulaciones y otras divagaciones

Monday, May 23, 2016

Negative splits

Tuesday, May 3, 2016

Speed by age
