Monday, May 23, 2016

Negative splits

(sources at GitHub)
This post is about negative splits, which is a fancy way of saying that you start the race running slower and then increase your speed. The name comes from the fact that when you look at the splits, the list of times it took you to run each kilometer (or mile), the numbers decrease.
The splits strategy is somewhat disputed, but many coaches recommend using negative splits (look here).


The question of this post is whether running negative or positive splits has any relation to the results.

The race(s)

This time I had to change the races used. The thing with the races in Santiago is that they inevitably have uphill and downhill segments, which could affect the results. So I went for a flat race, and flat Berlin is. The chosen race was the Berlin Halb Marathon.
The specific races were from the years 2011, 2013 and 2015 (see the results site).
These races had the following numbers of runners:
##    
##     BerlinHb2011 BerlinHb2013 BerlinHb2015
##   M        13726        14229        14636
##   W         6330         7786         8658
that ran at certain speeds (km/h):
## $BerlinHb2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.154   9.061  10.150  10.320  11.310  20.880 
## 
## $BerlinHb2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.745   9.448  10.550  10.700  11.730  21.240 
## 
## $BerlinHb2015
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.915   9.481  10.590  10.740  11.770  21.190

The splits

The race results contain partial times for the first 10 km, which allows calculating the average speed for two segments: the first 10 km and the rest of the race. Splits are negative if the speed on the second segment is faster than on the first. Two attributes were added to the data: dsign, which is -1 (for negative splits), 0 or +1, and delta_pct, which is the change in speed expressed in percent (positive when the runner slows down in the second segment).
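As a rough illustration of how the two attributes could be derived, here is a minimal Python sketch. It is not the code behind this post: the function name, the example times and the exact formula for delta_pct (slowdown relative to the first segment's speed) are assumptions, chosen so that positive values correspond to positive splits.

```python
HALF_MARATHON_KM = 21.0975  # official half marathon distance
FIRST_SEGMENT_KM = 10.0     # the results report a partial time at 10 km


def split_stats(t10_min, finish_min):
    """Return (v1, v2, dsign, delta_pct) for one runner.

    v1 and v2 are the average speeds in km/h on the two segments.
    delta_pct is ASSUMED to be the slowdown relative to the first
    segment, so positive values mean a positive split.
    """
    v1 = FIRST_SEGMENT_KM / (t10_min / 60.0)
    v2 = (HALF_MARATHON_KM - FIRST_SEGMENT_KM) / ((finish_min - t10_min) / 60.0)
    delta_pct = (v1 - v2) / v1 * 100.0
    dsign = (delta_pct > 0) - (delta_pct < 0)  # -1, 0 or +1
    return v1, v2, dsign, delta_pct


# Hypothetical runner: 10 km in 55:00, finish in 1:58:00
v1, v2, dsign, delta_pct = split_stats(55.0, 118.0)
print(f"v1={v1:.2f} km/h  v2={v2:.2f} km/h  dsign={dsign:+d}  delta_pct={delta_pct:.2f}%")
```

Note that the second segment is a bit longer than the first (about 11.1 km), which is why the comparison is made on speeds rather than on raw segment times.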
This is how the sign of the splits is distributed:
##     event
##      BerlinHb2011 BerlinHb2013 BerlinHb2015
##   -1         2316         6399         6483
##   1         17740        15616        16811
And the total percentage of negative splits is 23.25%.
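The overall percentage can be checked directly from the counts in the table. The short Python sketch below only redoes that arithmetic; the variable and function names are mine, and the numbers are copied from the table above.

```python
# Counts copied from the dsign table (rows -1 and 1, one column per race)
negative = {"BerlinHb2011": 2316, "BerlinHb2013": 6399, "BerlinHb2015": 6483}
positive = {"BerlinHb2011": 17740, "BerlinHb2013": 15616, "BerlinHb2015": 16811}


def pct_negative(neg, pos):
    """Share of negative splits, in percent."""
    return 100.0 * neg / (neg + pos)


per_race = {race: pct_negative(negative[race], positive[race]) for race in negative}
overall = pct_negative(sum(negative.values()), sum(positive.values()))

for race, pct in per_race.items():
    print(f"{race}: {pct:.2f}% negative splits")
print(f"overall: {overall:.2f}% negative splits")
```

Broken down by race, the share is roughly 11.5% in 2011 against about 29% and 28% in 2013 and 2015.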
The percentage change in speed (delta_pct) has the following range overall:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    0.276    3.807    4.515    8.271   60.510
and by race:
## $BerlinHb2011
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    3.006    7.044    7.443   11.630   60.510 
## 
## $BerlinHb2013
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -100.6000   -0.5542    2.5780    3.2540    6.5550   50.2200 
## 
## $BerlinHb2015
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -86.0400  -0.3607   2.6520   3.1850   6.3650  42.2600
and looks like this:

The 2011 race looks different from the other two years in shape and center, for reasons unknown to me.
Most people run positive splits, ranging from 0% to around 8% slower in the second half.

Splits and speed

This plot illustrates the relation of speed with the change in speed (delta_pct): 


From these charts it's clear that there is a relation between the change in speed and the overall speed. Those with lower splits tend to run faster. But at the same time, splits don't seem to explain the best times: the fastest runners tend to have small splits, around zero.
Let's do the numbers, comparing the speeds of runners with negative splits against those with non-negative splits, for women:
## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 29.93, df = 8198, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.5788349 0.6599688
## sample estimates:
## mean of x mean of y 
## 10.244138  9.624736
and men:
## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 33.954, df = 19185, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.6164811 0.6920167
## sample estimates:
## mean of x mean of y 
##  11.54288  10.88863
So in both cases we have a significant difference (p-values almost 0).
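For readers who want to see what is behind the t and df values in those outputs, here is a minimal Python sketch of Welch's statistic using only the standard library. The function name and the sample speeds are made up for illustration; the real test was run on the full vNeg and vPos vectors. The p-value, which comes from the t distribution with df degrees of freedom, is left out of the sketch.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    sx, sy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / (sx + sy) ** 0.5
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Made-up speeds in km/h, only to exercise the function
v_neg = [10.8, 11.2, 10.1, 11.9, 10.5, 11.4]
v_pos = [9.6, 10.2, 9.1, 10.9, 9.8, 10.0, 9.4]
t, df = welch_t(v_neg, v_pos)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Unlike the classic Student test, this version does not assume that both groups have the same variance, which is why df is usually not a whole number.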
That was using just a negative/non-negative attribute. What about the size of the speed change?

Relation with other factors

Let’s compare with Sex:
## 
##  Welch Two Sample t-test
## 
## data:  vM and vW
## t = 0.80468, df = 52630, p-value = 0.421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05981364  0.14313333
## sample estimates:
## mean of x mean of y 
##  4.529098  4.487438
So there doesn’t seem to be an association between Sex and the speed change.
Then the relation with Age:


## lm(formula = delta_pct ~ Age, data = df)
##               Estimate  Std. Error  t value      Pr(>|t|)
## (Intercept) 1.99655813 0.099077172 20.15155  4.903596e-90
## Age         0.06117036 0.002324882 26.31117 8.867464e-152
And there is a relation with Age: delta_pct grows by about 0.6 percentage points for each decade of age.
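To make the size of that effect concrete, the fitted line can be evaluated at a few ages. The Python sketch below just plugs the coefficients from the output above into the linear formula; the function name and the chosen ages are mine.

```python
# Coefficients copied from the lm() output for delta_pct ~ Age
INTERCEPT = 1.99655813
SLOPE_PER_YEAR = 0.06117036


def expected_delta_pct(age):
    """Fitted delta_pct (in percent) for a runner of the given age."""
    return INTERCEPT + SLOPE_PER_YEAR * age


per_decade = SLOPE_PER_YEAR * 10
print(f"change per decade: {per_decade:.2f} percentage points")
for age in (25, 35, 45, 55):
    print(f"age {age}: expected delta_pct = {expected_delta_pct(age):.2f}%")
```

These are averages from a straight-line fit, so they say nothing about how much individual runners of the same age differ from each other.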

Comparing association with speed

First look at the relation between speed and speed change on splits:
## lm(formula = complete ~ delta_pct, data = df)
##                Estimate   Std. Error    t value Pr(>|t|)
## (Intercept) 11.01116582 0.0079639887 1382.61947        0
## delta_pct   -0.09155948 0.0009969797  -91.83686        0
So the split delta accounts for 11.4% of the total speed variability among runners.
Comparing with Age:
## lm(formula = complete ~ Age, data = df)
##                Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept) 11.39360173 0.0267821824 425.41723  0.000000e+00
## Age         -0.01933209 0.0006284537 -30.76136 2.580757e-206
Which explains just 1.4% of the total variability.
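The two percentages above can be cross-checked from the printed t values alone, because for a fit with a single predictor R² = t² / (t² + n - 2). In the Python sketch below the sample size is an assumption: I take n as the sum of the runner counts listed earlier, although the fits may have used fewer rows (for example, if some runners have no Age).

```python
def r2_from_t(t_value, n):
    """R^2 of a one-predictor linear fit from the slope's t value."""
    df = n - 2  # residual degrees of freedom
    return t_value ** 2 / (t_value ** 2 + df)


# ASSUMED sample size: sum of the runner counts per race and sex
N = 13726 + 14229 + 14636 + 6330 + 7786 + 8658

r2_delta = r2_from_t(-91.83686, N)  # t value of delta_pct
r2_age = r2_from_t(-30.76136, N)    # t value of Age
print(f"delta_pct: R^2 = {100 * r2_delta:.1f}%")
print(f"Age:       R^2 = {100 * r2_age:.1f}%")
```

The shortcut only holds for models with one predictor, so it does not apply to the fits that also include Sex.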
Considering also Sex, with delta_pct:
## lm(formula = complete ~ Sex + delta_pct, data = df)
##                Estimate   Std. Error    t value Pr(>|t|)
## (Intercept) 11.46368706 0.0086571097 1324.19335        0
## SexW        -1.29515316 0.0128251465 -100.98545        0
## delta_pct   -0.09184154 0.0009272741  -99.04465        0
it’s 23.4%.
And with Sex and Age:
## lm(formula = complete ~ Sex + Age, data = df)
##                Estimate   Std. Error   t value Pr(>|t|)
## (Intercept) 12.09752538 0.0259742116 465.75140        0
## SexW        -1.34793835 0.0136330736 -98.87267        0
## Age         -0.02502357 0.0005889715 -42.48689        0
it’s 14.3%.

How well do the splits predict the speed?

This is the residuals plot for the fit with Sex and delta_pct:



As was seen in the second figure, the speed change between the first and second half seems to be a better predictor of slower speeds than of higher ones.
Let’s see what the range of delta_pct is for the top 5% of male runners:
## $BerlinHb2011
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -125.000    1.825    3.831    3.734    5.965   18.040 
## 
## $BerlinHb2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7.7820 -0.7309  1.0590  1.1630  2.9570 19.0100 
## 
## $BerlinHb2015
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -17.2200  -0.7451   0.7912   1.0450   2.4950  38.0200
The centers of these ranges are closer to 0 (and shifted toward negative values) compared with the range for all runners.
And if we test for a difference between negative and non-negative splits:
## 
##  Welch Two Sample t-test
## 
## data:  vNeg and vPos
## t = 2.0686, df = 1013.3, p-value = 0.03883
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.006702763 0.254118218
## sample estimates:
## mean of x mean of y 
##  15.50940  15.37898
We find that in this group the split sign still has a significant association with speed, with positive splits being slower (p-value of 0.039).

Conclusion

Negative splits are a good thing for running. But like most good things, they should not be overdone. The best results are associated with just a slight increase in speed in the second half.
One notable thing is that having negative, or lower, splits has a stronger association with overall speed than age does, meaning that some of the age disadvantage can be neutralized with pace management.
regards



Tuesday, May 3, 2016

Speed by age

Saturday, April 16, 2016

What makes a runner fast?

This is a question that worries, to a greater or lesser extent, everyone who runs.
And there is no shortage of answers. Every runner has their own theories and convictions, and there are also the websites, blogs and magazines that give their recommendations.
Since the question interests me too (in the variant "and how did all those people manage to reach the finish line before me?"), I set out to investigate the subject.
I will start with the simplest thing: take one race and compare the runners by age category and gender (sex).

The race

The results are from the Brooks #50 race in Vitacura (Strava segment). It is a 10K race (approximately 9.85 km).
The race was run on March 13, 2016, and we have results for 2322 runners.
The finishing times (in minutes) are summarized as follows:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   31.15   51.07   58.32   59.26   66.91   93.65

By gender

The number of runners by gender is:
##    F    M 
##  977 1345
And the results are:
## $F
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36.08   57.88   64.93   65.29   72.17   93.65 
## 
## $M
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   31.15   48.13   54.07   54.89   60.68   93.32
or, put another way:

Hmm ..., there are both men and women among the fastest, and the same goes for the slowest.
And yes, as expected, men are in general faster than women.

By age

The results break down by category as follows:

Category        Count
15 - 25 years     312
26 - 35 years     921
36 - 45 years     676
46 - 55 years     297
56 - 65 years      91
66 - 70 years      20
71 years and +      5
And they look like this:

Oops, surprise. All the categories between 15 and 65 years have a very similar average.

Does that mean there is no age for running?!

Not exactly. Turning the matter over a bit more, I found something when the results for men and women are separated:

For men, the average time increases with each step up in category. In other words, the younger, the faster. And the change looks fairly linear, with no plateaus.
What happens with women is strange. The fastest category is “46 - 55 years”, and a kind of valley forms around that category.

???

Maybe this helps to understand:
In the “15 - 25 years” category there are more women than men! But from there on, the number of women relative to men drops with each category.
It may be that in the later categories only the strongest female runners keep competing, and that is why their average time goes down.

Conclusion

I should go running today; tomorrow I will be slower!!

- P

Friday, April 15, 2016

Some data sites ..


  • World
    • World Bank Data site : http://data.worldbank.org/