Beyond the Numbers: The 3 Durability Benchmarks That Build Real Performance
The data available to a modern triathlete is genuinely extraordinary. Power meters, continuous glucose monitors, aerodynamic sensors, HRV trackers. A single Sunday ride produces more physiological data than an athlete in the 1990s would have collected in a season.
What is not extraordinary is what the data is measuring. The metrics that most athletes track most obsessively — FTP, VO2 max, twenty-minute peak power — represent what the body can do when it is fresh, well-rested, and performing in a controlled test environment. That is a specific and useful number. It is also not the number that determines race performance for the large majority of long-course athletes.
The race does not happen in the first hour. It happens in the fourth hour of the bike and the second half of the run. An athlete who can produce 370 watts for twenty minutes at the start of a session may be unable to sustain 260 watts after riding 100 kilometres. Their heart rate drifts upward, their power falls, and their mechanics deteriorate. The ceiling was high. The floor, the power available under accumulated race-specific fatigue, was not. Most athletes respond to this by trying to raise the ceiling further. The more productive intervention is raising the floor.
Two athletes with different fitness profiles illustrate this concretely. The first has a fresh threshold of 300 watts and a fatigued threshold of 220 watts — an 80-watt gap that the race will expose. The second has a fresh threshold of 280 watts and a fatigued threshold of 260 watts. The first athlete looks better in a fresh test. The second athlete will almost certainly have a better race. Building durability means compressing that gap between fresh and fatigued output, not just pushing the ceiling higher.
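The arithmetic behind the two athletes can be made explicit. The Python sketch below is illustrative only. The `ThresholdPair` name and the "retention" percentage (fatigued threshold expressed as a share of fresh threshold) are hypothetical conveniences introduced here, not a standard metric named in this article.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPair:
    """Hypothetical container for one athlete's fresh and fatigued thresholds."""
    name: str
    fresh_w: float     # threshold power measured fresh and rested
    fatigued_w: float  # threshold power under accumulated race-specific fatigue

    @property
    def gap_w(self) -> float:
        # The fresh-to-fatigued gap that durability training aims to compress
        return self.fresh_w - self.fatigued_w

    @property
    def retention_pct(self) -> float:
        # Share of the fresh ceiling still available late in the race (assumed metric)
        return 100.0 * self.fatigued_w / self.fresh_w


athletes = [ThresholdPair("first", 300, 220), ThresholdPair("second", 280, 260)]
for a in athletes:
    print(f"{a.name}: gap {a.gap_w:.0f} W, retains {a.retention_pct:.1f}% of fresh threshold")
```

The first athlete's higher ceiling comes with a floor 40 watts lower than the second athlete's, which is the number the back half of the race actually draws on.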
01 | Benchmarks Instead of Tests
Standard fitness testing produces numbers inflated by freshness and the psychological effects of preparation. Two days of rest, a deliberate warm-up, and the specific intention of performing maximally all produce a result that cannot be reproduced at the 90-kilometre mark of a bike leg. It is useful as a reference point and misleading as a guide to race-day capacity.
Benchmarks work differently. A benchmark is a standard session run routinely — on a Tuesday morning before work, with no taper and no fanfare. The same session is repeated across a preparation block and compared over time. What is being measured is not the highest possible output. It is the relationship between output and perceived effort, and how that relationship changes as fitness develops. The goal is not a higher number. It is the same number at a lower physiological cost, or a higher number at the same cost. Both indicate genuine adaptation. Neither requires special preparation to observe.
02 | The Swim Benchmark
The swim benchmark is simple to describe and demanding to execute correctly.
The standard session is 20 x 100 metres with five to ten seconds of rest between repetitions at a Moderate to Medium effort. For full-distance preparation, this extends to 40 x 100 metres. Pull buoy and paddles are mandatory for the entire session.
The pull buoy is not a training aid for swimmers who cannot hold their hips up. For a triathlete in a wetsuit, the legs are not the primary propulsive force: the wetsuit lifts the hips, and the legs provide balance and stability. A triathlete kicking hard for 3.8 kilometres is depleting the legs before they are needed for four to seven hours of subsequent effort. The pull buoy replicates the wetsuit's lift and removes the cardiovascular demand of the kick, isolating the upper-body muscular endurance that the race actually tests.
The paddles enforce honesty in the catch. Because they enlarge the hand's surface area, a dropped elbow and a slipping hand are immediately detectable: the resistance disappears. A clean entry and a solid press and push hold the resistance. The equipment teaches the mechanics while loading them, rather than separating the two.
Pool swimming is artificially forgiving. Every 25 or 50 metres, a wall push provides momentum and a brief recovery. The water is controlled and still. Open water offers none of this: no walls, no turns, and, depending on conditions, chop, current, and the turbulence of thousands of other athletes. A delicate stroke with a high turnover that works in a lane often fails when the water is fighting back. What open water requires is torque: the specific upper-body strength to grab hold of the water and pull the body past the hand rather than gliding through undisturbed water. The long continuous set with paddles and pull buoy builds exactly this, developing a stroke robust enough to sustain its mechanics when the conditions stop cooperating.
There is also a mental dimension to this session that is specific to long-course preparation. Staring at a black line for 40 repetitions is monotonous, and the mind begins to wander somewhere around repetition fifteen. This is not a problem to solve by making the session more entertaining. It is the stimulus. Ironman racing requires sustained focus across several hours of repetitive effort. An athlete who cannot maintain technical discipline through the boredom of a long pool set is practising the exact mental habit that produces a deteriorating stroke in the back half of the race swim. The skill is treating repetition thirty-five with the same technical precision as repetition one.
The assessment is straightforward. A durable athlete holds pace within one to two seconds per 100 metres across the full set, with RPE (rating of perceived exertion) remaining broadly consistent from the first ten repetitions to the last. An athlete who holds 1:30 per 100 metres through the first twenty repetitions at an RPE of 7 but is fighting to hold the same pace at RPE 9 by repetition thirty has demonstrated the specific vulnerability that race conditions exploit: the ability to start well and the inability to sustain it.
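For athletes who log their splits, this assessment can be sketched as a simple check. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, "within one to two seconds" is read as the gap between the fastest and slowest 100 metres being at most 2 seconds, and "broadly consistent" RPE is assumed to mean that the average RPE of the last ten repetitions is no more than one point above the average of the first ten.

```python
from statistics import mean


def assess_swim_benchmark(splits_s, rpes, pace_tol_s=2.0, rpe_tol=1.0):
    """Hypothetical check of a 20 x 100 or 40 x 100 pull set.

    splits_s: time for each 100 m repetition, in seconds (1:30 -> 90).
    rpes: RPE reported for each repetition, on a 1-10 scale.
    pace_tol_s and rpe_tol are assumed thresholds, not prescribed values.
    """
    if len(splits_s) != len(rpes) or len(splits_s) < 20:
        raise ValueError("expected matching split and RPE lists of at least 20 reps")

    # Spread between the slowest and fastest repetition across the whole set
    pace_spread = max(splits_s) - min(splits_s)
    # How much harder the last ten repetitions felt than the first ten
    rpe_drift = mean(rpes[-10:]) - mean(rpes[:10])

    return {
        "pace_spread_s": pace_spread,
        "rpe_drift": rpe_drift,
        "durable": pace_spread <= pace_tol_s and rpe_drift <= rpe_tol,
    }
```

The second case in the paragraph above, the same 1:30 pace held to the end but at a climbing RPE, fails this check on RPE drift alone even though the splits look perfect, which is the point of recording effort alongside pace.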
03 | The Bike Benchmark
The bike benchmark is a 60-minute continuous effort in three blocks, and the defining constraint is that the power, heart rate, and speed fields are covered before starting. Record the session, but do not look at the data while riding.
The three blocks run as follows. The first twenty minutes at Moderate effort: breathing is comfortable, a complex explanation is possible without gasping, the legs are working but not loaded. The second twenty minutes at Medium effort: breathing is audible and rhythmic, brief communication is comfortable but extended conversation is not. The final twenty minutes at Mad effort: at or near the sustainable ceiling, only short responses are available.
The logic behind removing the data is specific. When riding to a target number, the number governs the session in both directions. On a good day, it holds performance below capacity: the athlete is capable of 280 watts but settles for the 260-watt target. On a bad day, it forces performance beyond what the body can actually sustain: the athlete holds 260 watts when 240 is all the physiology can deliver, building a recovery debt that distorts the following days. Both outcomes reduce the quality of the information the session produces and the quality of the training stimulus.
Removing the data forces auto-regulation. The athlete must listen to leg tension, breathing rate, and accumulated effort to calibrate the three levels. This is precisely the skill the race requires — executing a four-to-eight-hour effort in variable conditions where a device reading will periodically be unavailable, misleading, or irrelevant to the actual decision being made.
After the session, the data becomes the diagnostic. A successful session produces a staircase: clearly distinct power levels in each block that rise as the effort prescription rises. Moderate produces 200 watts, Medium produces 230, Mad produces 260. That athlete's internal effort scale is well calibrated. A session that produces a flat or declining power graph despite a rising subjective effort level, where Mad felt like an RPE of 9 but the power barely moved from the Medium block, identifies a specific problem: high fatigue perception with low mechanical output. The central governor is limiting production below the available capacity. This is a trainable quality, but it requires honest identification before it can be addressed. Repeating the session with the screen covered, rather than adjusting FTP settings, is the appropriate response.
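The post-ride reading can be sketched as a small classifier over the three block averages. This Python sketch is an assumption-laden illustration, not a prescribed tool: the function name, the labels it returns, and the 10-watt minimum step that counts as "clearly distinct" are all hypothetical, and the RPE values in the example are illustrative.

```python
def read_bike_benchmark(block_watts, block_rpes, min_step_w=10.0):
    """Hypothetical reading of the covered-screen 60-minute benchmark.

    block_watts: average power for the Moderate, Medium and Mad blocks, in order.
    block_rpes: RPE the athlete reported for each block, in the same order.
    min_step_w: assumed minimum rise (watts) for a step to count as distinct.
    """
    if len(block_watts) != 3 or len(block_rpes) != 3:
        raise ValueError("expected exactly three blocks")

    # Rise in power from Moderate to Medium, and from Medium to Mad
    power_steps = [later - earlier for earlier, later in zip(block_watts, block_watts[1:])]
    # Did the athlete feel each block as harder than the one before?
    rpe_rising = all(later > earlier for earlier, later in zip(block_rpes, block_rpes[1:]))

    if all(step >= min_step_w for step in power_steps):
        return "staircase"  # effort scale calibrated
    if rpe_rising:
        return "high_perception_low_output"  # effort rose, power did not follow
    return "unclear"  # neither pattern; look at the raw files


# Example: the calibrated staircase from the text, and a flat top step at RPE 9
print(read_bike_benchmark([200, 230, 260], [4, 6, 8]))
print(read_bike_benchmark([200, 230, 234], [4, 6, 9]))
```

Only block averages are used here, which is deliberately crude; a coach looking at the full power file would also see pacing within each block.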
04 | The Run Benchmark
Most long runs train the wrong quality. The session accumulates time, but as fatigue builds through the second half the form degrades — hips drop, ground contact time increases, cadence falls — and the final thirty to forty-five minutes rehearse deteriorated mechanics rather than developing anything useful. An athlete who does this consistently is not building durability. They are practising the collapse that durability training is specifically designed to prevent.
The run benchmark strips the ego from long running and replaces continuous volume with quality volume.
The standard session is 15 repetitions of three minutes running at Moderate effort followed by one minute of walking. Total duration is 60 minutes. For full-distance athletes, this extends to 40 or more repetitions covering over two and a half hours. The pace is race-comparable: half-Ironman or Ironman target pace rather than anything faster.
The walk interval is not a recovery concession. It performs two specific functions. In a continuous run of extended duration, the cardiovascular and structural systems accumulate load without a break: by ninety minutes the core is fatigued, impact absorption is compromised, and the joints are absorbing load that the supporting musculature is no longer cushioning adequately. The first function of the one-minute walk is to interrupt that accumulation, letting heart rate drop briefly. The second is a posture reset: spinal alignment can be consciously corrected and hip position recovers before the next running segment begins. The structural integrity of every subsequent three-minute block is higher for the interruption than it would be in continuous running.
The assessment is direct: can the form in the final few three-minute blocks match the form in the opening ones? If it cannot, because the hips are dropping and the stride is shortening earlier in each segment than it did in the opening blocks, the durability limit has been reached. Stopping at that point rather than continuing in compromised form is the correct decision. Continuing past the durability limit trains the athlete to run poorly under fatigue rather than training them to hold form through it. Progression across a training block shows up as that limit arriving later: more blocks completed with opening-quality form before the stop is needed. That is durability development.
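Form is mostly judged by feel and by eye, but if a watch or foot pod records cadence for each running block, one of the signs named earlier (falling cadence) can be tracked as a rough proxy. The Python sketch below is hypothetical throughout: cadence says nothing directly about hip position or stride length, and the three-block opening baseline and 3 percent drop threshold are arbitrary placeholders for a coach's judgement.

```python
from statistics import mean


def durability_limit_block(cadences_spm, opening_blocks=3, max_drop_pct=3.0):
    """Hypothetical stop signal for the 3-minute run / 1-minute walk benchmark.

    cadences_spm: average cadence (steps per minute) for each running block, in order.
    opening_blocks: how many early blocks define the fresh baseline (assumed).
    max_drop_pct: allowed cadence drop below baseline before stopping (assumed).
    Returns the 1-based number of the first block below the floor, or None
    if cadence held through the whole session.
    """
    if len(cadences_spm) <= opening_blocks:
        raise ValueError("need more blocks than the opening baseline")

    # Baseline form from the opening blocks, and the lowest acceptable cadence
    baseline = mean(cadences_spm[:opening_blocks])
    floor = baseline * (1 - max_drop_pct / 100)

    # Walk the remaining blocks, numbering them from the start of the session
    for block_no, cadence in enumerate(cadences_spm[opening_blocks:], start=opening_blocks + 1):
        if cadence < floor:
            return block_no  # durability limit reached: stop here
    return None
```

Across a preparation block, the returned block number moving later, or the function returning None, is the progression the run benchmark is looking for.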
The connection between this kind of fatigue-state form practice and the specific mechanics that break down in the late race run is covered in detail in the articles on form under fatigue and run off the bike.
The athlete who slows down the least across the back half of a long race is not necessarily the one with the highest FTP or the fastest pool split. They are the one who has trained the floor of their performance rather than exclusively the ceiling. These three sessions, repeated consistently across a preparation block, measure and develop that quality in each discipline. If you want to work with a coach who builds this kind of durability development into the preparation from the start, Sense Endurance Coaching is where to begin.
If you are preparing from a plan, the same benchmarks and the same training logic are embedded in the structure. You can find the full range on the training plans page. A high ceiling is useful. A high floor wins races.