dev_to, March 7, 2026

GPX Runner's data decoded with PHP


Every runner with a sports watch carries a small data recorder on their wrist. After every run, it produces a GPX file: a detailed log of GPS positions, timestamps, elevation, and often heart rate. Most runners glance at the summary on their phone (distance, time, average pace) and move on. But that GPX file holds much more.

With the hi-folks/statistics PHP package, you can extract per-kilometer splits and apply real statistical analysis to your running performance, from pacing consistency and elevation impact to cardiac drift and long-term improvement trends.

In this article, we will parse a GPX file, build per-kilometer splits, and analyze a 10K run step by step using descriptive statistics, correlation, regression, outlier detection, and more.

```
composer require hi-folks/statistics
```

The package requires PHP 8.2+ and is available on GitHub: https://github.com/Hi-Folks/statistics.

A GPX (GPS Exchange Format) file is XML. Most sports watches (Garmin, Polar, Suunto, Apple Watch, Coros) can export one. Each file contains a sequence of trackpoints recorded every 1–5 seconds during your run. A trackpoint looks like this (the coordinate values are elided here):

```xml
<trkpt lat="..." lon="...">
  <ele>122.4</ele>
  <time>2025-03-15T07:30:45Z</time>
  <extensions>
    <gpxtpx:TrackPointExtension>
      <gpxtpx:hr>152</gpxtpx:hr>
    </gpxtpx:TrackPointExtension>
  </extensions>
</trkpt>
```

From these raw points you can derive:

- Distance between consecutive points (using the Haversine formula)
- Pace per kilometer (time elapsed over each 1 km segment)
- Elevation gain and loss per kilometer
- Heart rate averages per kilometer (from the Garmin extension)

PHP's built-in SimpleXML makes parsing straightforward. Here are the helper functions we use:

```php
function parseGpx(string $filePath): array
{
    $xml = simplexml_load_file($filePath);
    if ($xml === false) {
        throw new RuntimeException("Cannot parse GPX file: {$filePath}");
    }

    $namespaces = $xml->getNamespaces(true);
    $points = [];

    foreach ($xml->trk->trkseg->trkpt as $trkpt) {
        $point = [
            'lat' => (float) $trkpt['lat'],
            'lon' => (float) $trkpt['lon'],
            'ele' => isset($trkpt->ele) ? (float) $trkpt->ele : 0.0,
            'time' => isset($trkpt->time) ? strtotime((string) $trkpt->time) : 0,
            'hr' => null,
        ];

        // Extract heart rate from Garmin TrackPointExtension
        if (isset($namespaces['gpxtpx'])) {
            $extensions = $trkpt->extensions;
            if ($extensions) {
                $gpxtpx = $extensions->children($namespaces['gpxtpx']);
                if (isset($gpxtpx->TrackPointExtension->hr)) {
                    $point['hr'] = (int) $gpxtpx->TrackPointExtension->hr;
                }
            }
        }

        $points[] = $point;
    }

    return $points;
}
```

To calculate the distance between two GPS coordinates, we use the Haversine formula, the standard method for computing great-circle distance on a sphere:

```php
function haversineDistance(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $R = 6371000; // Earth radius in meters
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    return $R * 2 * atan2(sqrt($a), sqrt(1 - $a));
}
```

Then we walk through the trackpoints, accumulating distance until we hit each kilometer mark, and record the split:

```php
// Requires: use HiFolks\Statistics\Stat;
function buildKmSplits(array $trackpoints): array
{
    // Nothing to split without at least two trackpoints
    if (count($trackpoints) < 2) {
        return [];
    }

    $splits = [];
    $currentKm = 1;
    $kmDistance = 0;
    $kmStartTime = $trackpoints[0]['time'];
    $kmEleGain = 0;
    $kmEleLoss = 0;
    $kmHrValues = [];

    for ($i = 1; $i < count($trackpoints); $i++) {
        $prev = $trackpoints[$i - 1];
        $curr = $trackpoints[$i];

        $segDist = haversineDistance($prev['lat'], $prev['lon'], $curr['lat'], $curr['lon']);
        $kmDistance += $segDist;

        $eleDiff = $curr['ele'] - $prev['ele'];
        if ($eleDiff > 0) {
            $kmEleGain += $eleDiff;
        } else {
            $kmEleLoss += abs($eleDiff);
        }

        if ($curr['hr'] !== null) {
            $kmHrValues[] = $curr['hr'];
        }

        if ($kmDistance >= 1000) {
            $kmTime = $curr['time'] - $kmStartTime;
            $splits[] = [
                'km' => $currentKm,
                'time' => $kmTime,
                'pace' => $kmTime, // each split covers ~1 km, so seconds = seconds per km
                'eleGain' => round($kmEleGain, 1),
                'eleLoss' => round($kmEleLoss, 1),
                'avgHr' => count($kmHrValues) > 0 ? (int) round(Stat::mean($kmHrValues)) : null,
            ];

            // Carry the overshoot into the next kilometer and reset the accumulators
            $currentKm++;
            $kmDistance -= 1000;
            $kmStartTime = $curr['time'];
            $kmEleGain = 0;
            $kmEleLoss = 0;
            $kmHrValues = [];
        }
    }

    return $splits;
}
```

For this article, we use a simulated 10K run with realistic characteristics: a hilly middle section, slight positive split tendency, and heart rate drifting upward with fatigue. If you have a real GPX file, just swap in parseGpx() and buildKmSplits().

```php
// === Option 1: Parse a real GPX file ===
// $trackpoints = parseGpx('your-run.gpx');
// $splits = buildKmSplits($trackpoints);

// === Option 2: Simulated 10K run ===
$splits = [
    ['km' => 1, 'time' => 322, 'pace' => 322, 'eleGain' => 5, 'eleLoss' => 2, 'avgHr' => 145],
    ['km' => 2, 'time' => 318, 'pace' => 318, 'eleGain' => 8, 'eleLoss' => 3, 'avgHr' => 150],
    ['km' => 3, 'time' => 335, 'pace' => 335, 'eleGain' => 22, 'eleLoss' => 4, 'avgHr' => 158],
    ['km' => 4, 'time' => 348, 'pace' => 348, 'eleGain' => 28, 'eleLoss' => 5, 'avgHr' => 164],
    ['km' => 5, 'time' => 340, 'pace' => 340, 'eleGain' => 15, 'eleLoss' => 18, 'avgHr' => 162],
    ['km' => 6, 'time' => 312, 'pace' => 312, 'eleGain' => 2, 'eleLoss' => 30, 'avgHr' => 155],
    ['km' => 7, 'time' => 325, 'pace' => 325, 'eleGain' => 3, 'eleLoss' => 8, 'avgHr' => 158],
    ['km' => 8, 'time' => 338, 'pace' => 338, 'eleGain' => 12, 'eleLoss' => 5, 'avgHr' => 165],
    ['km' => 9, 'time' => 352, 'pace' => 352, 'eleGain' => 18, 'eleLoss' => 3, 'avgHr' => 170],
    ['km' => 10, 'time' => 330, 'pace' => 330, 'eleGain' => 4, 'eleLoss' => 15, 'avgHr' => 172],
];
```

The package includes utility classes for column extraction and time formatting:

```php
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Freq;
use HiFolks\Statistics\Utils\Arr;
use HiFolks\Statistics\Utils\Format;

[$paces, $eleGains, $hrValues, $kmNumbers] = Arr::extract(
    $splits,
    ['pace', 'eleGain', 'avgHr', 'km']
);
```

The full PHP example with the complete code is here: examples/article-gpx-running-analysis.php

Before diving into analysis, let's get the big picture:

```php
$totalTime = array_sum(array_column($splits, 'time'));
$totalEleGain = array_sum(array_column($splits, 'eleGain'));
$totalEleLoss = array_sum(array_column($splits, 'eleLoss'));

echo "Distance: " . count($splits) . " km" . PHP_EOL;
echo "Total time: " . Format::secondsToTime($totalTime) . PHP_EOL;
echo "Average pace: " . Format::secondsToTime(Stat::mean($paces)) . "/km" . PHP_EOL;
echo "Elevation gain: +" . $totalEleGain . " m" . PHP_EOL;
echo "Elevation loss: -" . $totalEleLoss . " m" . PHP_EOL;
echo "Average HR: " . round(Stat::mean($hrValues)) . " bpm" . PHP_EOL;
```

Output:

```
Distance: 10 km
Total time: 0:55:20
Average pace: 0:05:32/km
Elevation gain: +117 m
Elevation loss: -93 m
Average HR: 160 bpm
```

This is the summary your watch shows you. Now let's look at what the numbers actually reveal.

The average pace is useful, but it hides the variation. Did you hold a steady 5:32/km throughout, or did you yo-yo between 5:12 and 5:52?

```php
$meanPace = Stat::mean($paces);
$medianPace = Stat::median($paces);
$stdevPace = Stat::stdev($paces);
$quartiles = Stat::quantiles($paces);

echo "Mean pace: " . Format::secondsToTime($meanPace) . "/km" . PHP_EOL;
echo "Median pace: " . Format::secondsToTime($medianPace) . "/km" . PHP_EOL;
echo "Std deviation: " . round($stdevPace, 1) . " sec" . PHP_EOL;
echo "Fastest km: " . Format::secondsToTime(min($paces)) . "/km" . PHP_EOL;
echo "Slowest km: " . Format::secondsToTime(max($paces)) . "/km" . PHP_EOL;
echo "Quartiles: Q1=" . Format::secondsToTime($quartiles[0]) . "/km"
    . " Q2=" . Format::secondsToTime($quartiles[1]) . "/km"
    . " Q3=" . Format::secondsToTime($quartiles[2]) . "/km" . PHP_EOL;
```

Output:

```
Mean pace: 0:05:32/km
Median pace: 0:05:33/km
Std deviation: 13 sec
Fastest km: 0:05:12/km (km 6)
Slowest km: 0:05:52/km (km 9)
Quartiles: Q1=0:05:21/km Q2=0:05:33/km Q3=0:05:42/km
```

How to interpret the results:

- A standard deviation of 13 seconds means most of your km were within ~13 seconds of the average.
  That's moderate consistency for a hilly course.
- The mean and median are close (5:32 vs 5:33), so your pacing was roughly symmetric, with no extreme skew toward fast or slow km.
- The range (5:12 to 5:52 = 40 seconds) tells you the spread from your best to worst km. Compare this with the IQR (Q1 to Q3 = 21 seconds): the core of your pacing was much tighter than the extremes suggest.

Every coach talks about pacing strategy. A positive split means you slowed down in the second half; a negative split means you got faster. The Coefficient of Variation (CV) puts a single number on your consistency:

```php
$cv = Stat::coefficientOfVariation($paces, 2);

$halfPoint = intdiv(count($splits), 2);
$firstHalfPaces = array_slice($paces, 0, $halfPoint);
$secondHalfPaces = array_slice($paces, $halfPoint);

$meanFirst = Stat::mean($firstHalfPaces);
$meanSecond = Stat::mean($secondHalfPaces);
$splitDiff = $meanSecond - $meanFirst;
$splitPct = round(($splitDiff / $meanFirst) * 100, 1);

echo "Coefficient of Variation: " . $cv . "%" . PHP_EOL;
echo "First half avg pace: " . Format::secondsToTime($meanFirst) . "/km" . PHP_EOL;
echo "Second half avg pace: " . Format::secondsToTime($meanSecond) . "/km" . PHP_EOL;
```

Output:

```
Coefficient of Variation: 3.91%
First half avg pace: 0:05:33/km (km 1-5)
Second half avg pace: 0:05:31/km (km 6-10)
Negative split: 1.2 sec/km faster (0.4% improvement)
```

How to interpret the results:

- A CV below 5% is considered good pacing for a hilly course. Elite runners often achieve CV under 2% on flat courses.
- Our runner managed a slight negative split: the second half was marginally faster. This is often the sign of disciplined pacing: holding back on the uphills in km 3–5 and then capitalizing on the downhill at km 6.
- Compare your CV across different runs to track whether your pacing discipline is improving over time.

How much does climbing cost you? This is the question every trail runner wants answered. We have per-km elevation gain and per-km pace.
Let's see if hills measurably affect your speed:

```php
$corrEle = Stat::correlation($eleGains, $paces);
$regEle = Stat::linearRegression($eleGains, $paces);
$r2Ele = Stat::rSquared($eleGains, $paces, false, 4);

echo "Correlation (elevation gain vs pace): " . round($corrEle, 4) . PHP_EOL;
echo "Linear regression: pace = " . round($regEle[0], 2) . " x eleGain + " . round($regEle[1], 1) . PHP_EOL;
echo "R-squared: " . $r2Ele . PHP_EOL;
```

Output:

```
Correlation (elevation gain vs pace): 0.8053
Linear regression: pace = 1.18 x eleGain + 318.2
R-squared: 0.6485
```

How to interpret the results:

- A Pearson correlation of 0.81 is strong: more uphill clearly means slower pace.
- The slope (1.18) tells you that each additional meter of elevation gain within a kilometer is associated with roughly 1.2 seconds slower pace. On a km with 28 m of climbing (km 4), the model predicts you'd run ~33 seconds slower than on a flat km.
- R-squared of 0.65 means elevation gain explains about 65% of the variation in your pace. The remaining 35% comes from other factors: fatigue, wind, terrain surface, mental state.
- Track this slope over multiple runs. As your hill fitness improves, this number should decrease, meaning hills slow you down less.

Heart rate tells a story that pace alone cannot. Even if your pace stays constant, a rising heart rate signals that your body is working harder. This is cardiac drift, caused by dehydration, heat, and accumulated fatigue.
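
Both the hill-cost fit above and the cardiac drift fit that follows rest on the same two calculations: a Pearson correlation and an ordinary least-squares line. As a minimal sketch in plain PHP, independent of the package, here is that arithmetic applied to the elevation and pace columns of the simulated run. The `fitLine()` helper is a hypothetical name introduced for this illustration; it is not part of hi-folks/statistics and makes no claim about how the package computes its results.

```php
<?php
/**
 * Least-squares line y = slope * x + intercept, plus Pearson r.
 * Hypothetical helper for illustration only (not a package function).
 */
function fitLine(array $x, array $y): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Need two equal-length series with at least 2 points.');
    }

    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    // Sums of squared deviations and of cross-products
    $sxx = $syy = $sxy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $dy = $y[$i] - $meanY;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
        $sxy += $dx * $dy;
    }

    $slope = $sxy / $sxx;

    return [
        'slope' => $slope,
        'intercept' => $meanY - $slope * $meanX,
        'r' => $sxy / sqrt($sxx * $syy),
    ];
}

// Columns copied from the simulated 10K splits
$eleGains = [5, 8, 22, 28, 15, 2, 3, 12, 18, 4];
$paces = [322, 318, 335, 348, 340, 312, 325, 338, 352, 330];

$fit = fitLine($eleGains, $paces);

printf(
    "slope=%.2f intercept=%.1f r=%.4f r2=%.4f\n",
    $fit['slope'],
    $fit['intercept'],
    $fit['r'],
    $fit['r'] ** 2
);
// slope=1.18 intercept=318.2 r=0.8053 r2=0.6485
```

The same helper works for any pair of columns, for example kilometer number against average heart rate.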

```php
$meanHr = Stat::mean($hrValues);
$stdevHr = Stat::stdev($hrValues);

// Cardiac drift: HR vs km number
$corrHrKm = Stat::correlation($kmNumbers, $hrValues);
$regHrKm = Stat::linearRegression($kmNumbers, $hrValues);
$r2HrKm = Stat::rSquared($kmNumbers, $hrValues, false, 4);

// HR vs pace
$corrHrPace = Stat::correlation($hrValues, $paces);
```

Output:

```
Mean HR: 160 bpm
Median HR: 160 bpm
Std dev: 8.5 bpm
Min HR: 145 bpm | Max HR: 172 bpm

Cardiac drift (HR vs km):
Correlation: 0.8506
Regression: HR = 2.38 x km + 146.8
R-squared: 0.7235
HR drift per km: +2.4 bpm/km

HR vs pace correlation: 0.6912
```

How to interpret the results:

- A correlation of 0.85 between km number and heart rate confirms significant cardiac drift: your heart worked progressively harder as the run continued.
- The regression slope (+2.4 bpm/km) means your heart rate rose by about 2.4 beats per minute each kilometer. Over 10 km, that's a ~24 bpm increase from start to finish.
- The HR vs pace correlation (0.69) is moderate: heart rate is influenced by pace, but also by elevation, fatigue, and heat. A perfect correlation would mean pace alone determines HR, which is never true in real-world conditions.

Using Freq::frequencyTableBySize(), we can see how many kilometers were spent in each heart rate zone:

```php
$hrZones = Freq::frequencyTableBySize($hrValues, 10);

foreach ($hrZones as $range => $count) {
    echo " " . $range . " bpm: " . str_repeat("#", $count) . " (" . $count . " km)" . PHP_EOL;
}
```

Output:

```
Heart Rate Zone Distribution:
 145 bpm: ## (2 km)
 155 bpm: ##### (5 km)
 165 bpm: ### (3 km)
```

Each label is the lower bound of a 10 bpm bin. This tells you that 5 of your 10 km were in the 155–164 bpm zone, your aerobic sweet spot. Only 3 km pushed into the 165+ range (threshold/anaerobic), primarily on the uphill and late-fatigue segments.

Not every kilometer is created equal. Some are unusually fast (downhill? adrenaline?) or slow (steep hill? red light? cramp?).
The z-score tells you exactly how unusual each km was:

```php
$zscores = Stat::zscores($paces, 2);

foreach ($splits as $i => $split) {
    echo " km " . $split['km'] . ": " . Format::secondsToTime($split['pace']) . "/km"
        . " z=" . sprintf("%+.2f", $zscores[$i]) . PHP_EOL;
}
```

Output:

```
 km 1: 0:05:22/km z=-0.77
 km 2: 0:05:18/km z=-1.08
 km 3: 0:05:35/km z=+0.23
 km 4: 0:05:48/km z=+1.23
 km 5: 0:05:40/km z=+0.62
 km 6: 0:05:12/km z=-1.54
 km 7: 0:05:25/km z=-0.54
 km 8: 0:05:38/km z=+0.46
 km 9: 0:05:52/km z=+1.54
 km 10: 0:05:30/km z=-0.15
```

How to interpret the results:

- Negative z-scores mean faster than average; positive means slower. The further from zero, the more unusual.
- Km 6 (z = -1.54) was the standout fast km, 20 seconds faster than average. Looking at the data, it had only 2 m of elevation gain but 30 m of loss. Gravity did the work.
- Km 9 (z = +1.54) was the slowest: 18 m of climbing plus accumulated fatigue in the late stages.
- No km had |z| above 2.0, so there are no statistical outliers. This is confirmed by both Stat::outliers() and Stat::iqrOutliers():

```php
$zOutliers = Stat::outliers($paces, 2.0);  // []
$iqrOutliers = Stat::iqrOutliers($paces);  // []
```

If you had stopped at a traffic light or taken a water break, the affected km would show up as an outlier, and you'd know to exclude it from your pace analysis.

Percentiles tell you what your pace range actually looks like across this run:

```php
$percentiles = [10, 25, 50, 75, 90];

foreach ($percentiles as $p) {
    echo " P" . $p . ": " . Format::secondsToTime(Stat::percentile($paces, $p, 0)) . "/km" . PHP_EOL;
}
```

Output:

```
 P10: 0:05:13/km
 P25: 0:05:21/km
 P50: 0:05:33/km
 P75: 0:05:42/km
 P90: 0:05:52/km
```

How to interpret the results:

- P10 (5:13/km) is the pace you only sustain on your fastest 10% of km: your peak speed on this run.
- P50 (5:33/km) is your median pace, the truest single number for "how fast did I run?"
- P90 (5:52/km) marks your slowest 10% of km: your weakest stretches, usually hills or the final push.
- The gap between P25 and P75 (21 seconds) is your interquartile range, the "core band" of your pacing. A narrower band means more consistent running.

skewness() and kurtosis() reveal the shape of your pace distribution:

```php
$skewness = Stat::skewness($paces, 4);
$kurtosis = Stat::kurtosis($paces, 4);

echo "Skewness: " . $skewness . PHP_EOL;
echo "Kurtosis: " . $kurtosis . PHP_EOL;
```

Output:

```
Skewness: 0.0481
Kurtosis: -0.9316
```

How to interpret the results:

- Skewness near zero (0.05) means your pace distribution is approximately symmetric. You did not have a long tail of slow km or fast km; the variation was balanced.
- Negative kurtosis (-0.93) means your pace values are more uniformly spread out than a normal distribution: fewer km clustered tightly around the mean, and the extremes are not very extreme. This is typical for a hilly course where terrain forces variation.
- If your skewness were strongly positive (> 0.5), it would mean a tail of slow km, possibly from steep climbs or late-run fatigue. A negative skewness would mean a tail of fast km, perhaps from starting too fast.

Your 10 km give you an average pace, but with more data (more km), that average would stabilize. The confidence interval tells you the range where your "true" comfortable pace likely falls:

```php
$ci = Stat::confidenceInterval($paces, 0.95, 0);
$sem = Stat::sem($paces, 1);

echo "95% CI: " . Format::secondsToTime($ci[0]) . "/km to " . Format::secondsToTime($ci[1]) . "/km" . PHP_EOL;
echo "Standard Error of the Mean: " . $sem . " sec" . PHP_EOL;
```

Output:

```
95% CI for your true pace: 0:05:24/km to 0:05:40/km
Standard Error of the Mean: 4.1 sec
```

How to interpret the results:

- We are 95% confident that your true comfortable pace for this effort level and course profile is between 5:24/km and 5:40/km.
- The SEM of 4.1 seconds is the engine behind this interval. With only 10 km, there's meaningful uncertainty.
  On a half-marathon (21 km) with the same spread, the SEM would shrink to about 2.8 seconds, and on a marathon (42 km) to about 2 seconds, so your confidence interval would become very tight.
- This is useful for race planning: instead of saying "I run 5:32/km pace", you can say "my pace is 5:24–5:40/km on this type of terrain", a more honest and useful estimate.

The most powerful analysis comes from loading multiple GPX files across weeks or months. Each run gives you an average pace, and over time, you can see the trend. Here we simulate 8 weeks of training data:

```php
$weeks = [1, 2, 3, 4, 5, 6, 7, 8];
$weeklyPaces = [350, 342, 337, 333, 330, 328, 326, 325];

$trendReg = Stat::linearRegression($weeks, $weeklyPaces);
$trendR2 = Stat::rSquared($weeks, $weeklyPaces, false, 4);
$trendCorr = Stat::correlation($weeks, $weeklyPaces);

echo "Trend regression: pace = " . round($trendReg[0], 2) . " x week + " . round($trendReg[1], 1) . PHP_EOL;
echo "R-squared: " . $trendR2 . PHP_EOL;
echo "Correlation: " . round($trendCorr, 4) . PHP_EOL;
```

Output:

```
Trend regression: pace = -3.39 x week + 349.1
R-squared: 0.9176
Correlation: -0.9579
Improvement rate: 3.4 seconds/km per week
Predicted pace at week 12: 0:05:08/km (Extrapolation — use with caution!)
```

How to interpret the results:

- The slope (-3.39) means you're improving by about 3.4 seconds per km per week on average. That's meaningful and measurable progress.
- R-squared of 0.92 means the linear model explains most of the variance, but not all of it. The remaining 8% hints that the improvement pattern isn't perfectly linear: there's curvature in the data.
- The negative correlation (-0.96) confirms that pace goes down as the weeks go up, which is exactly what improvement looks like.
- The prediction for week 12 (5:08/km) is an extrapolation. Linear trends don't continue forever: you will never reach 0:00/km. But for short-term planning (next 2–3 weeks), the projection can be a reasonable target.
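
The week-12 projection in the output above follows directly from the fitted line. As a rough sketch, using the rounded coefficients printed in that output (slope -3.39, intercept 349.1), the snippet below evaluates the line at week 12 and finds the week at which it would cross zero. `formatPace()` and `projectLinear()` are hypothetical helpers written for this illustration, not package functions, and the output format differs slightly from Format::secondsToTime().

```php
<?php
// Hypothetical helpers for illustration; neither is part of hi-folks/statistics.

/** Format a pace given in seconds per km as "m:ss/km". */
function formatPace(float $secondsPerKm): string
{
    $total = (int) round($secondsPerKm);

    return sprintf('%d:%02d/km', intdiv($total, 60), $total % 60);
}

/** Evaluate the straight-line trend: pace = slope * week + intercept. */
function projectLinear(float $slope, float $intercept, int $week): float
{
    return $slope * $week + $intercept;
}

// Rounded coefficients copied from the trend output above
$slope = -3.39;
$intercept = 349.1;

$week12 = projectLinear($slope, $intercept, 12);
echo "Week 12: " . formatPace($week12) . PHP_EOL;
// Week 12: 5:08/km

// First whole week at which the straight line reaches zero or below
$zeroWeek = (int) ceil(-$intercept / $slope);
echo "Line reaches 0:00/km at week " . $zeroWeek . PHP_EOL;
// Line reaches 0:00/km at week 103
```

The zero crossing around week 103 is the same straight line that produced the week-12 figure, which is why the extrapolation deserves caution even when it looks plausible.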

The linear model predicts a constant improvement of 3.4 seconds per week, forever. Taken to the extreme, it would eventually predict a pace of 0:00/km, which is obviously impossible.

The real issue is more subtle: athletic improvement follows a diminishing returns curve. Early gains come fast (beginner effect, neuromuscular adaptation), but as you get fitter, each additional second of improvement requires more training volume and specificity.

Look at the data closely: the improvements become progressively smaller. From week 1 to week 2 the gain is 8 seconds, but from week 7 to week 8 it's only 1 second. The rate is clearly slowing down, a pattern that linear regression cannot capture because it assumes a constant slope. This is why the linear R² (0.9176) leaves room for improvement.

The logarithmicRegression() method fits the model y = a × ln(x) + b, which naturally produces fast initial improvement that gradually flattens:

```php
// Logarithmic model: pace = a * ln(week) + b
$logReg = Stat::logarithmicRegression($weeks, $weeklyPaces);

// R-squared is computed against ln(week), the transformed predictor
$logWeeks = array_map(fn($v) => log($v), $weeks);
$logR2 = Stat::rSquared($logWeeks, $weeklyPaces, false, 4);

echo "Logarithmic regression: pace = " . round($logReg[0], 2) . " x ln(week) + " . round($logReg[1], 1) . PHP_EOL;
echo "R-squared: " . $logR2 . PHP_EOL;
```

Output:

```
Logarithmic regression: pace = -12.33 x ln(week) + 350.2
R-squared: 0.9987
```

The logarithmic model has a much higher R² (0.9987 vs 0.9176), so it fits the observed data substantially better than the linear model for this dataset. One caveat for this simple example: with only 8 points, an extremely high R² is easy to obtain. Even so, the gap suggests that the relationship is nonlinear: improvement appears rapid initially and then slows over time, a pattern consistent with diminishing returns.
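
To make the logarithmic fit less of a black box: fitting y = a × ln(x) + b amounts to ordinary least squares after replacing each week number with its natural logarithm. The sketch below does exactly that in plain PHP. It is written under that assumption and is not the package's implementation; `fitLogarithmic()` is a hypothetical name. On the eight simulated weeks it should land on roughly the same coefficients and R² as the output above.

```php
<?php
/**
 * Fit y = a * ln(x) + b by ordinary least squares on (ln x, y).
 * Hypothetical helper for illustration; not a hi-folks/statistics function.
 * All x values must be positive.
 */
function fitLogarithmic(array $x, array $y): array
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Need two equal-length series with at least 2 points.');
    }

    // Transform the predictor, then proceed as for a straight line
    $lnX = array_map(fn($v) => log($v), $x);
    $meanX = array_sum($lnX) / $n;
    $meanY = array_sum($y) / $n;

    $sxx = $syy = $sxy = 0.0;
    foreach ($lnX as $i => $lx) {
        $dx = $lx - $meanX;
        $dy = $y[$i] - $meanY;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
        $sxy += $dx * $dy;
    }

    $a = $sxy / $sxx;

    return [
        'a' => $a,
        'b' => $meanY - $a * $meanX,
        // For a one-predictor least-squares fit, R-squared equals r squared
        'r2' => ($sxy * $sxy) / ($sxx * $syy),
    ];
}

// Simulated weekly average paces from the article (seconds per km)
$weeks = [1, 2, 3, 4, 5, 6, 7, 8];
$weeklyPaces = [350, 342, 337, 333, 330, 328, 326, 325];

$logFit = fitLogarithmic($weeks, $weeklyPaces);

printf(
    "pace = %.2f x ln(week) + %.1f, R-squared = %.4f\n",
    $logFit['a'],
    $logFit['b'],
    $logFit['r2']
);
```

Because week 1 maps to ln(1) = 0, the intercept b is simply the model's pace for the first week, which makes the coefficients easy to sanity-check against the raw data.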

While the linear model already provides a strong fit (R² ≈ 0.92), the much higher R² for the logarithmic model indicates that accounting for curvature captures the structure of the data more accurately.

The difference becomes clear when you project forward:

```php
$linearPrediction = $trendReg[0] * 12 + $trendReg[1];
$logPrediction = $logReg[0] * log(12) + $logReg[1];

echo "Linear prediction week 12: " . Format::secondsToTime($linearPrediction) . "/km" . PHP_EOL;
echo "Logarithmic prediction week 12: " . Format::secondsToTime($logPrediction) . "/km" . PHP_EOL;
```

Output:

```
Linear prediction week 12: 0:05:08/km
Logarithmic prediction week 12: 0:05:20/km
```

The logarithmic model predicts 5:20/km, 12 seconds more conservative than the linear model's 5:08/km. At week 20 the gap widens further: the linear model would predict 4:41/km (unlikely for most recreational runners), while the logarithmic model predicts 5:13/km, a more realistic plateau.

| Aspect | Linear | Logarithmic |
|---|---|---|
| Model | pace = a × week + b | pace = a × ln(week) + b |
| Assumes | Constant improvement forever | Fast early gains, gradual plateau |
| Short-term (4 weeks) | Good approximation | Good approximation |
| Long-term (12+ weeks) | Over-optimistic | More realistic |
| Best for | Short training blocks, beginners with stable gains | Multi-month planning, experienced runners |
| R² on this data | 0.9176 | 0.9987 |

Recommendation: compare R² values for both models on your own data. If the logarithmic R² is higher, your improvement is already following a curve and the logarithmic model will give more trustworthy projections. If R² values are similar, you're still in the early "linear" phase of improvement, but use the logarithmic model for any prediction beyond 4–6 weeks.

Rather than assuming which model fits best, let's run all four regression types on the same data and compare them objectively.
The package provides logarithmicRegression(), powerRegression(), and exponentialRegression() alongside linearRegression(), and each fits a different curve shape:

```php
// Linear: pace = a * week + b
[$aLin, $bLin] = Stat::linearRegression($weeks, $weeklyPaces);
$r2Lin = Stat::rSquared($weeks, $weeklyPaces, false, 4);

// Logarithmic: pace = a * ln(week) + b
[$aLog, $bLog] = Stat::logarithmicRegression($weeks, $weeklyPaces);
$logWeeks = array_map(fn($v) => log($v), $weeks);
$r2Log = Stat::rSquared($logWeeks, $weeklyPaces, false, 4);

// Power: pace = a * week^b
[$aPow, $bPow] = Stat::powerRegression($weeks, $weeklyPaces);
$logPaces = array_map(fn($v) => log($v), $weeklyPaces);
$r2Pow = Stat::rSquared($logWeeks, $logPaces, false, 4);

// Exponential: pace = a * e^(b * week)
[$aExp, $bExp] = Stat::exponentialRegression($weeks, $weeklyPaces);
$r2Exp = Stat::rSquared($weeks, $logPaces, false, 4);
```

Output:

```
Model         R²       Week 12   Week 20   Week 52
─────────────────────────────────────────────────────────
Linear        0.9176   0:05:08   0:04:41   0:02:53
Logarithmic   0.9987   0:05:20   0:05:13   0:05:02
Power         0.9985   0:05:20   0:05:14   0:05:03
Exponential   0.9232   0:05:09   0:04:45   0:03:27
```

The R² column settles the question of fit without any assumptions:

- Logarithmic (R² = 0.9987) and Power (R² = 0.9985) are virtually tied and both fit the data near-perfectly. They capture the curvature that the other two models miss.
- Linear (R² = 0.9176) and Exponential (R² = 0.9232) leave about 8% of the variance unexplained: they force a shape that doesn't match the data's natural curve.

But R² only measures how well a model fits past data. The prediction columns reveal which models are trustworthy for the future:

- At week 20, linear predicts 4:41/km and exponential predicts 4:45/km. Both assume improvement continues at nearly the same rate. For a recreational runner who started at 5:50/km, breaking 5:00/km in just 20 weeks is ambitious; breaking 4:45 is unrealistic.
- At week 52 (one year), linear predicts 2:53/km, within a couple of seconds of world-record marathon pace (about 2:51/km for Kelvin Kiptum's 2:00:35). Exponential predicts 3:27/km. Both are absurd for the same runner.
- Logarithmic and Power predict 5:02/km and 5:03/km at week 52: a realistic plateau where the runner has improved by about 48 seconds over a year and further gains require significantly more effort.

The logarithmic and power models converge on nearly identical predictions because they both model the same fundamental pattern: fast early gains that progressively flatten. For running pace data, either is a sound choice. Logarithmic is slightly simpler to interpret (the coefficient a directly tells you "seconds of improvement per unit of ln(week)"), which is why we recommend it as the default for trend analysis.

These charts plot each model's forecast beyond the training data, so you can see at a glance where the predictions stay realistic and where they drift into fantasy.

All four models: The chart shows all four models fitted to the same training data. The actual pace values (blue dots) end at week 8; everything beyond is a prediction. Notice how the linear and exponential lines keep diving, while the logarithmic and power curves flatten into a realistic plateau.

Linear: The straight line fits the training period reasonably well, but projects an impossible pace of 2:53/km at week 52, a reminder that constant improvement is a mathematical fiction.

Logarithmic: The curve mirrors how runners actually improve: rapid early gains that gradually flatten, predicting a realistic 5:02/km plateau after one year of training.

Power: Nearly indistinguishable from the logarithmic model, the power curve confirms the diminishing-returns pattern, with two different equations arriving at the same conclusion.

Exponential: Slightly better than linear but still too optimistic, the exponential model bends just enough to look plausible in the short term while still predicting an unrealistic 3:27/km at week 52.

Linear vs logarithmic: When we isolate the linear and logarithmic models, the difference is subtle over the training period but significant in the projection zone. Both track the actual data closely through week 8, but the linear model keeps promising improvement that will never come.

To implement this with real GPX files, load each file, compute the average pace, and build your arrays:

```php
// glob() returns the files sorted by name; this assumes one run per week
$gpxFiles = glob('runs/2025-*.gpx');

$weeks = [];
$weeklyPaces = [];

foreach ($gpxFiles as $file) {
    $trackpoints = parseGpx($file);
    $splits = buildKmSplits($trackpoints);

    // Skip runs shorter than 1 km: they produce no splits to average
    if (count($splits) === 0) {
        continue;
    }

    [$paces] = Arr::extract($splits, ['pace']);
    $weeks[] = count($weeks) + 1;
    $weeklyPaces[] = Stat::mean($paces);
}

// Compare both models
$linear = Stat::linearRegression($weeks, $weeklyPaces);
$logarithmic = Stat::logarithmicRegression($weeks, $weeklyPaces);
```

Run the example and then try it with your own GPX files. Here's what to watch for:

- CV below 5%: your pacing is disciplined. Above 8%: investigate what's causing the variation (hills? starting too fast?).
- Elevation slope: track this number over months. As your hill strength improves, each meter of climb should cost fewer seconds.
- Cardiac drift slope: a lower bpm/km slope means better aerobic fitness and hydration. Compare this across similar runs.
- Z-scores: any km with |z| > 2 deserves investigation. Was it a genuine outlier (stoppage, cramp) or a terrain feature?
- Week-over-week trend: a negative slope means improvement. Plateaus are normal; a positive slope (getting slower) may signal overtraining or insufficient recovery.

The full script with all helper functions and simulated data is available in the repository. You can run it with:

```
php examples/article-gpx-running-analysis.php
```

Source: examples/article-gpx-running-analysis.php

| Class | Method | What it does |
|---|---|---|
| Stat | mean(), median(), stdev() | Basic descriptive statistics on pace, HR, elevation |
| Stat | quantiles(), percentile() | Pace distribution: where do your km fall? |
| Stat | coefficientOfVariation() | Single number for pacing consistency |
| Stat | correlation() | Elevation vs pace, HR vs pace, HR vs time |
| Stat | linearRegression() | Quantify hill cost, cardiac drift rate, improvement trend |
| Stat | logarithmicRegression() | Model diminishing returns (pace improvement plateau) |
| Stat | powerRegression(), exponentialRegression() | Alternative non-linear trend models |
| Stat | rSquared() | How well does elevation/time explain your pace? |
| Stat | zscores() | Flag unusual km segments |
| Stat | outliers(), iqrOutliers() | Detect anomalous km (stops, sprints) |
| Stat | skewness(), kurtosis() | Distribution shape of your pace |
| Stat | confidenceInterval(), sem() | Estimate your true pace range |
| Freq | frequencyTableBySize() | Heart rate zone distribution |
| Arr | extract() | Extract columns from split data |
| Format | secondsToTime() | Human-readable pace and time formatting |

Install it and start exploring your own runs:

```
composer require hi-folks/statistics
```