# The Data Was in 24 Files. Nobody Put Them Together.

Category: Case Study · Published: May 2026 · Read time: 22 min · URL: /resources/vv-failures-fragmented-data

How fragmented physical validation processes miss hardware failures hiding in the data, and the $330 billion the engineering industry has paid to learn this lesson.

---

On the night of January 27, 1986, a group of engineers at Morton Thiokol in Utah were on a teleconference with NASA, making the most important argument of their careers. They were trying to prevent a launch.

They had the data. The data showed clearly, to anyone who knew what to look for, that launching in the next morning's temperatures was dangerous. The problem was that the data was spread across 24 separate mission files. On a teleconference the night before a launch, with managers on the other end of the line waiting for a recommendation, they could not assemble it fast enough.

They made the wrong chart. NASA launched. Seven people died.

The Space Shuttle Challenger disaster is one of the most studied hardware failures in engineering history. It is a standard case study in engineering schools, and it has been the subject of a presidential commission, congressional investigations, and thousands of academic papers. And yet the specific lesson it contains keeps getting missed: what happens when physical hardware validation data lives in fragments, is reviewed in silos, and has no unified environment to surface what it is trying to say.

Because Challenger is not a story about a hardware defect that nobody knew about.
It is a story about a hardware defect that **everybody knew about**. Its data had been logged across twenty-four missions, its risks had been captured in formal memos, and its temperature sensitivity had been physically measured. It killed seven people anyway, because the process for connecting that data was manual, fragmented, and broke down under pressure.

## Act I: The Rubber Ring That Was Never Tested at 36 Degrees

The Space Shuttle's Solid Rocket Boosters were sealed by O-rings: circular rubber gaskets roughly 12 feet (3.7 metres) in diameter that prevented hot combustion gases from escaping through the joints between booster segments. There were two at each joint, a primary and a backup. If both failed at the same joint during launch, the result would be catastrophic.

The O-rings had a known physical characteristic: like all rubber, they became less flexible at lower temperatures. A colder O-ring took longer to seat into its groove and form a seal during the fraction-of-a-second window at ignition. This was not a secret. It was in the engineering data. It had been physically observed.

**Morton Thiokol, Utah · July 31, 1985 · Six months before the disaster**

Roger Boisjoly, a senior engineer at Morton Thiokol and one of the foremost experts on the Shuttle's O-ring joints, sits down and writes a memo. It is not a casual observation. It is a formal internal document addressed to Thiokol's Vice President of Engineering. He has been watching the O-ring erosion data accumulate across missions, and he is frightened by what he sees. He writes: "The result would be a catastrophe of the highest order - loss of human life." He calls for immediate action: a dedicated team to solve the problem, with the field joint as the number one priority.

The memo goes to the VP of Engineering. It is filed. NASA is not informed. Flights continue.
Six months later, on a morning when the temperature at the Kennedy Space Center had dropped to 36°F (15 degrees below the coldest previous launch), Challenger lifts off. The seal in the right Solid Rocket Booster's aft field joint fails almost immediately: puffs of black smoke escape the joint within the first second. About 59 seconds into the flight, a plume of flame breaks through the joint and begins burning into the external tank. At 73 seconds, the Shuttle breaks apart at roughly 46,000 feet. All seven crew members are lost.

## Act II: The Wrong Chart, or What Happens When Data Lives in Fragments

The night before the launch, Thiokol's engineers knew the forecast: 36°F at the Cape, far colder than any previous Shuttle launch. Boisjoly and his colleagues requested an emergency teleconference with NASA managers. They had one chance to make their case. They had the physical data. What they did not have was a unified environment to present it.

The physical evidence they were trying to convey was straightforward. Across previous Shuttle missions, O-ring erosion had been physically observed and measured after flight. That erosion data, correlated with launch temperature, showed a clear pattern: lower temperatures, more erosion. But the data existed as individual entries in 24 separate mission records, each in its own file, each reviewed independently by the post-flight analysis team after its flight.

Under time pressure on the teleconference, the Thiokol engineers assembled their data as quickly as they could. They made a chart. But the chart they made showed only the missions where O-ring damage had been observed; it left out the missions where no damage had occurred. Without those data points, the temperature correlation was invisible. The pattern was in the data. The chart did not show it.

**The chart that was presented: damage incidents only (7 flights).** It showed only the 7 missions where O-ring damage was logged. No temperature axis. No trend line. The pattern that mattered, temperature vs. damage severity, was invisible.
NASA managers saw no clear evidence that temperature was the variable.

**The chart that needed to exist: all 24 flights, temperature vs. erosion.** All 24 missions plotted, with launch temperature on the X axis and O-ring erosion depth on the Y axis. The correlation is unmistakable. Richard Feynman reconstructed this chart from existing data after the disaster in under 20 minutes. It had never been made before the launch.

**Internal memo, Roger Boisjoly, Morton Thiokol · July 31, 1985**

"This letter is written to insure that management is fully aware of the seriousness of the current O-ring erosion problem... The frequency of erosion and blowby in all previous flights is erratic, but the [primary] seal erosion is increasing... if the same scenario should occur on [a future] mission... the result would be a catastrophe of the highest order."

**The precise validation process failure.** After every Shuttle mission, post-flight engineers measured and logged O-ring condition in a mission debrief report. These reports sat in individual mission files. No system correlated the physical O-ring erosion measurements across all prior missions against the operational variable that mattered: launch temperature. The cross-mission hardware performance dataset that would have made the temperature relationship obvious did not exist as a single queryable record. It existed only as 24 separate files that had never been read together.

When Feynman conducted his famous ice water demonstration before the Rogers Commission, showing that a clamped O-ring sample chilled in a glass of ice water failed to spring back, he was not revealing new physics. Every Thiokol engineer already knew that cold made rubber stiffer. What Feynman's subsequent analysis of the mission data revealed was that the physical performance pattern across 24 flights had been available all along, and nobody had assembled it into a single picture.
"It is my honest and real fear that if we do not take immediate action to dedicate a team to solve the problem, with the field joint having the number one priority, then we stand in jeopardy of losing a flight along with all the crew and launch pad facilities."

Roger Boisjoly, Solid Rocket Motor Seal Specialist, Morton Thiokol Inc. · Memo filed internally. Not transmitted to NASA. Six months later: Challenger, STS-51L.

"I had the data. I had it all along. We all had it. But it existed as twenty-four separate post-flight reports. We never had a single view of all of it at the same time. That is what we needed the night before the launch. That is what we did not have."

Roger Boisjoly, testimony to the Rogers Commission, 1986

The launch-morning temperature, the erosion data from all 24 prior missions, and Boisjoly's July memo: these three pieces of physical hardware evidence were never in the same place at the same time until after the disaster.

In a unified validation environment, a query like "show me O-ring erosion depth across all missions, plotted against launch temperature" would take seconds. The correlation would have been flagged automatically. The launch decision would have had the right data in front of it. The engineering team would not have needed to manually reconstruct 24 mission records during a phone call the night before launch.

**7** crew members lost, January 28, 1986 · **24** mission files containing the temperature-erosion correlation · **< 20 min** for Feynman to reconstruct the full correlation from existing data, post-disaster

### The full hardware fault chain

From rubber seal physics to national tragedy: how the validation gap compounded.

1. **O-ring temperature sensitivity physically characterised and known to engineers.** The physical property of rubber losing resilience at low temperatures was measured and documented. This was not unknown. It was in the engineering data from the earliest development test campaigns.
2. **O-ring erosion physically measured and logged after each of 24 prior missions.** Post-flight analysis teams physically inspected, measured, and recorded O-ring condition after every mission. The physical data existed. It was accurate. It was filed in 24 separate post-flight mission reports.
3. **Roger Boisjoly files a formal memo warning of catastrophic failure risk (July 1985).** A formal written warning, based on physical erosion evidence, predicting loss of life if the problem went unresolved. Filed internally at Thiokol. Not transmitted to NASA. Not cross-referenced with the mission erosion data in any shared environment.
4. **Night before launch: engineers try to reconstruct 24 missions' worth of data manually.** Under time pressure on a teleconference, Thiokol engineers manually pull records from separate mission files. The chart they build omits the 17 missions with no damage, making the temperature correlation invisible. NASA managers see no conclusive trend. Launch is approved.
5. **Challenger STS-51L launches at 36°F (January 28, 1986).** The coldest launch in Shuttle history. The right SRB aft field joint seal fails at ignition: puffs of black smoke are visible at T+0.678 seconds, and a flame plume breaks through at about T+59 seconds. Catastrophic structural breakup follows at T+73 seconds. Seven crew members lost.
6. **Feynman's analysis: the full 24-mission dataset assembled for the first time.** Using the same 24 mission files that had existed pre-launch, Feynman plots temperature vs. erosion across all missions. The correlation is unmistakable. "When I looked at the data," he later wrote, "it was obvious." The data had been obvious all along. It had never been assembled.

**Boehm's Law: cost of a defect by detection stage.** Finding a defect in the field costs 100× more than finding it in testing. The Challenger O-ring failure was a hardware characterisation gap, detectable during integration testing if the cross-mission erosion dataset had been assembled.
It was instead discovered at the field stage, at the cost of seven lives and $5.5B+ in programme impact.

Source: Barry Boehm, "Software Engineering Economics" (1981), validated in NASA SEL studies and ESA engineering standards.

The Challenger O-ring's physical failure had been characterised at component test level, but its operational failure mode (cold temperature plus erosion, leading to catastrophic failure under flight loads) was never validated at the integrated system level. This placed it firmly in the field-stage discovery band.

## Act III: Every Industry Has Its Own Version of This Story

Challenger is the most famous hardware validation failure in history because of its visibility and its human cost. But the structure of the failure is not unique to space: physical hardware test data logged correctly, in fragments, by separate teams, with no unified environment to surface the cross-dataset pattern. It is the defining failure mode of hardware engineering programmes across every industry.

The Takata airbag propellant was tested on fresh specimens in a laboratory. The physical degradation data from aged propellant in real vehicles, which existed in field return reports and warranty claims, sat in a separate department's files and was never cross-referenced with the chemistry team's validation data. By the time someone assembled the physical evidence, 27 people were dead and the recall had grown to roughly 100 million inflators.

The Space Shuttle Columbia's foam impact data had been physically logged across 79 of 113 prior missions. Three separate teams reviewed it: the foam engineers, the Thermal Protection System engineers, and the safety team. Each team reviewed its own data. Nobody had a single environment where foam shedding event size, impact velocity, and TPS tile damage severity were plotted across all 79 events simultaneously. The pattern showing that large debris caused dangerous damage was in the data. Columbia broke apart on re-entry.
Seven more crew members died.

**Cost anatomy: where money goes when hardware V&V fails.** Schedule penalties dominate. Rework is secondary. Both were avoidable.

Typical cost breakdown for a hardware V&V failure event: schedule penalties 38%, engineering rework 28%, certification re-submission 16%, test infrastructure downtime 11%, talent attrition 7%. In contracted defence and aerospace programmes, schedule slip triggers penalty clauses of 0.5–2% of contract value per week. Engineering rework costs compound because the root cause is only understood post-failure, requiring comprehensive re-validation from degraded baselines.

Sources: GAO Defence Acquisition Reports (2019–2024), McKinsey automotive engineering cost study (2022), KPMG aerospace programme risk report (2023).

For Challenger specifically, the $5.5B+ programme impact was dominated by the 32-month grounding, the full SRB redesign, and recertification: all downstream consequences of a hardware failure that the existing physical data had already characterised.

**Cost model: what a single missed hardware defect costs across programme sizes.** Direct delay cost = 3–8% of programme value per 6-month slip. Indirect and opportunity cost = 1.5–3× direct. For a $500M programme, typical of a major automotive platform or mid-tier defence system, a single hardware V&V failure event costs $105M–$165M. Indirect costs include the competitor advantage window, key talent attrition at 1.5–3× annual salary per senior engineer, supplier rescheduling penalties, and certification re-submission fees of $500K–$5M in regulated industries. At the Challenger programme scale (~$4B), the 32-month grounding alone represents schedule penalty costs in the hundreds of millions.

## The Full Accounting: Hardware V&V Failures Across Industries

In every case below, the same three-part structure is present. First, the hardware was physically tested.
Second, the physical test data showed, in some form, that a problem existed or would exist. Third, the fragmented validation process meant that data never reached the person or the analysis that would have connected it to action. The defect went through. The consequences followed.

**Estimated total losses from hardware validation failures where physical test data existed pre-failure (1986–2024): $330B+**, across defence, aerospace, automotive, and autonomous systems.

### Defence: structural, propulsion & sensor hardware

| Programme | Physical hardware tested | What the fragmented data contained, and what was missed | Consequence & cost |
|---|---|---|---|
| Space Shuttle Challenger (O-ring thermal) | O-ring rubber seals physically tested. Material properties measured. Post-flight erosion physically logged after each of 24 prior missions. Temperature-resilience relationship characterised at component level. | 24 mission files each held erosion measurements. Nobody had assembled them into a single cross-mission temperature-vs-erosion dataset. The night before launch, engineers manually tried to reconstruct the pattern under time pressure and built the wrong chart. Feynman assembled the same data post-disaster in under 20 minutes. The correlation was unmistakable. It had been there all along. | 7 lives lost; 32-month grounding; $5.5B+ programme impact |
| UK Ajax Armoured Vehicle (structural vibration) | Individual structural components physically tested to specification. Vibration testing conducted per subsystem. Sensor data recorded during test campaigns across different sub-contractors. | Vibration sensor logs from structural tests, crew compartment characterisation, and electronic system environmental tests were held by three different sub-contractor teams. The cross-system resonance coupling, visible only when all three data streams were analysed together, was never surfaced. Crew health complaints during trials were filed separately from the structural data. The combined hardware failure mode was only identified after crews suffered physical injuries during trials. | Physical crew injuries; £3.2B delayed fleet; years of post-failure reconstruction |
| Patriot PAC-2, Gulf War (clock timing drift) | Radar hardware and timing circuits validated. The system counted time in tenths of a second and converted it to seconds using a truncated fixed-point value of 0.1, so timing error grew the longer the system ran. This was a documented characteristic in the technical data. | Israeli forces had found that the system lost targeting accuracy after about 8 hours of continuous operation and passed their data to the US Army. Corrected software had been prepared, and the Army had issued a general warning about very long run times without saying how long was too long. Neither the Israeli data nor the corrected software reached the Dhahran battery, which had been running for about 100 hours, before the attack. The evidence sat in three separate places and was never connected to the operator. The radar failed to track an incoming Scud. | 28 soldiers killed; 97 wounded; $500M+ remediation |
| F-35 fuel tank foam (material compatibility) | Tank foam insulation physically tested against individual fuel constituents. Material samples characterised. Component-level compatibility data logged by multiple material suppliers across separate test campaigns. | Degradation test data from different material suppliers was held separately. The foam's behaviour in contact with actual operational fuel blends (mixtures of multiple constituents) was never physically tested in an integrated configuration. Cross-supplier material compatibility data was never assembled into a single dataset. Foam degradation under real operational fuel chemistry was discovered after fleet deployment. | $1.4B fleet remediation; multiple grounding events |

### Aerospace: structural, thermal & propulsion hardware

| Programme | Physical hardware tested | What the fragmented data contained, and what was missed | Consequence & cost |
|---|---|---|---|
| Space Shuttle Columbia (foam impact / TPS) | TPS tiles physically impact-tested at component level. Foam adhesion physically tested. Post-flight debris recovery teams physically logged foam shedding events and tile damage across prior missions. | Foam shedding physically logged on 79 of 113 prior missions. Each event reviewed by the foam team, TPS team, and safety team in separate post-flight reports. The correlation between debris mass, impact velocity, and tile damage severity existed across 79 records but was never assembled into a single dataset. The safety team's risk acceptance was based on prior events surviving re-entry, not on the damage accumulation data across all events. 7 crew members killed on re-entry. | 7 lives lost; $13B+ programme impact; Shuttle programme ended |
| Boeing 787 battery system (thermal propagation) | Individual Li-ion cells physically characterised by GS Yuasa. Battery management hardware tested by Thales. Thermal management hardware tested by the Boeing integration team. Each physical component cleared its own test campaign. | Thermal propagation behaviour (how heat from one failing cell moves through electrolyte contact into adjacent cells under real charge-cycle conditions) existed only in the integrated hardware assembly, not in any single component's test data. Results from the cell-level, BMS-level, and thermal-management-level campaigns were held by three separate teams and never integrated into a combined thermal propagation test. In-flight and on-ground battery fires revealed the failure mode instead. | 3-month global grounding; ~$600M direct cost; full battery redesign |
| Airbus A400M (propeller / propulsion hardware) | Individual propeller pitch control hardware physically tested. Actuator response physically characterised. Engine control hardware tested per component. All physical hardware cleared individual test campaigns. | The hardware state produced by a specific maintenance procedure (software reinstallation → torque calibration parameters wiped → incorrect engine power control) was never tested in an integrated propulsion test environment. Test data from the actuator team, the engine control team, and the maintenance procedure team sat in three separate validation datasets. The combined hardware state under that maintenance sequence was discoverable from the combined data. It was not discovered until a fatal crash in 2015. | Fatal crash (2015); €20B+ overruns; full propulsion redesign |

### Automotive: propellant, mechanical & structural hardware

| Programme | Physical hardware tested | What the fragmented data contained, and what was missed | Consequence & cost |
|---|---|---|---|
| Takata airbag inflators (propellant degradation) | Ammonium nitrate propellant physically tested on fresh specimens at manufacture. Inflator hardware validated to burst and deployment specifications. Physical quality tests passed at time of production. | Degradation data from aged propellant (material that had cycled through years of humidity and temperature in real vehicles) existed in Takata's field returns and in warranty claims from multiple OEM customers. The chemistry data (fresh propellant) and the field performance data (aged propellant) were held by different departments, reviewed by different teams, and never systematically cross-analysed. The degradation pattern was in the warranty data for years before the fatal ruptures began. 27 deaths. 400+ injuries. Largest automotive recall in history. | 27 deaths, 400+ injured; $24B recall cost; Takata bankrupt |
| GM ignition switch, Cobalt (mechanical torque) | Ignition switch torque physically measured and documented at design specification. Switch hardware cleared in physical quality validation. Production switches physically measured in quality control sampling. | The torque specification was changed during production without triggering a re-validation event. The production QC team's torque measurement data (showing the change) was held separately from the safety engineering team's specification records. Field reports of inadvertent engine shutoff were logged by the customer service team in a separate database. Three datasets (the production torque measurements, the spec change record, and the field incident reports) were never cross-referenced. The evidence of the change was in GM's own quality data for years. 124 deaths. | 124 deaths; $2.5B+ settlement; congressional investigation |
| Firestone ATX tyres (structural delamination) | Tyre structural integrity physically tested. Belt adhesion physically characterised. Individual tyre samples cleared quality inspection. Physical endurance tests conducted on representative specimens. | Tyre separation failure data existed in warranty and field returns across multiple markets from the mid-1990s. The data was held by Firestone's warranty team, Ford's field service team, and independent market-level safety agencies in separate databases, none of which were systematically cross-referenced. The failure pattern (specific tyre size, specific vehicle, the Ford Explorer, and specific speed and temperature conditions) was identifiable from the combined field data years before the formal recall. It was only assembled after investigative journalists forced regulatory action. 271 deaths. | 271 deaths; $3B+ recall & legal costs; Firestone/Ford relationship ended |

### Autonomous systems: sensor hardware, physical calibration & material systems

| Programme | Physical hardware tested | What the fragmented data contained, and what was missed | Consequence & cost |
|---|---|---|---|
| Uber ATG, Tempe (sensor hardware state) | LiDAR, radar, and camera sensors individually characterised. Sensor performance validated in structured environments. Hardware cleared in component-level test campaigns. | Sensor return quality data from prior real-world test drives, showing degraded multi-sensor performance under the specific conditions present on the night of the crash, existed in Uber's operational test logs. The hardware validation team's test matrix was built from the component-level data, not from the operational log data held by the test operations team. The two datasets (component validation and real-world sensor performance) were never cross-referenced to identify the gap. One person killed. Programme ended. | 1 death; $2.5B absorbed; programme ended |
| AV industry LiDAR (calibration drift) | LiDAR units physically calibrated at manufacture and installation. Calibration accuracy measured and documented per unit. Hardware cleared against point-cloud accuracy specification. | Calibration drift data over accumulated mileage and thermal cycling existed in field performance logs across deployed fleet vehicles. The sensor manufacturer's characterisation data and the vehicle integration team's field performance data were held in separate systems. The cross-dataset pattern (calibration error accumulating with mileage and heat cycling in specific operating environments) was identifiable from the combined data but required manual extraction from two disconnected sources. Fleet-wide recalibration was required post-deployment. | $500M+ est. (industry-wide); multiple recalls & recalibrations |
| EV battery thermal management (thermal hardware) | Battery cell chemistry characterised by the cell supplier. Thermal management hardware physically tested by the thermal engineering team. Pack structural hardware tested by the integration team. All physical hardware components signed off. | Thermal characterisation data from cell level, thermal management hardware level, and pack structural level were held by three separate teams at three separate organisations in the supply chain. Thermal propagation behaviour under combined high-ambient-temperature and high-charge-rate conditions (the specific real-world scenario triggering failures) existed only in integrated hardware tests that were never run. The combined failure mode was discovered in the field across multiple manufacturers' vehicles simultaneously. | $2B+ est. (industry-wide); multiple recalls across OEMs |

## The Answer Was Always in the Logs

Return to the Rogers Commission hearing room, 1986. Richard Feynman squeezes a sample of O-ring rubber in a small clamp, drops it into a glass of ice water, waits, and takes it out. He releases the clamp. The rubber does not spring back. At the temperature of ice water, it has lost its resilience.

The physics was already known to every engineer in that room. The physical erosion data was already in 24 mission files. Boisjoly's memo was already on record. Everything that was needed to prevent the Challenger launch had already been measured, logged, and documented.

What did not exist, and what has never existed across programme after programme in every industry in this analysis, was a single environment where all of that physical data could be viewed together, cross-referenced automatically, and turned from fragments into a pattern.

"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled."

Richard Feynman, *Personal Observations on the Reliability of the Shuttle*, 1986

Every case in this article is a version of the same story. Hardware was tested. Physical data was logged. Anomalies were observed by someone, somewhere, and filed in a report that sat in a silo until after the failure. Takata's degraded propellant was in the warranty returns.
GM's torque change was in the quality records. Columbia's foam impacts were in 79 post-flight reports. The Firestone tyre separations were in years of warranty data across three countries. In each case, the physical evidence was assembled within months of the failure event, using data that had been available for years.

**Hardware engineering programmes do not fail because the engineers don't measure things.** They fail because the measurements are taken in fragments (different instruments, different teams, different files, different systems) with no unified validation environment to surface what the physical data is trying to say across all of them simultaneously.

The night before the Challenger launch, Roger Boisjoly needed one thing that he did not have: a single view of all 24 missions' O-ring physical performance data, plotted against temperature, available in seconds. That one thing would have cost a fraction of the $5.5 billion the programme spent on the consequences of not having it.

The question every programme manager across defence, aerospace, automotive, and autonomous systems needs to answer is not whether they can afford to invest in unified hardware validation infrastructure. It is how many Roger Boisjolys they are prepared to put in the position of making the right case with the wrong chart, under time pressure, the night before the launch.

**Sources:** Presidential Commission on the Space Shuttle Challenger Accident (Rogers Commission Report, 1986). Richard Feynman, "Personal Observations on the Reliability of the Shuttle" (1986). Roger Boisjoly, internal memo, Morton Thiokol Inc. (July 31, 1985). Columbia Accident Investigation Board Report (2003). US General Accounting Office, "Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia" (GAO/IMTEC-92-26, 1992). NHTSA Takata airbag recall documentation (2014–2020). Anton Valukas, "Report to Board of Directors of General Motors Company Regarding Ignition Switch Recalls" (2014). NHTSA Firestone ATX tyre recall investigation (2000).
Barry Boehm, "Software Engineering Economics" (1981). GAO Defence Acquisition Assessments (2019–2024). McKinsey "Rethinking Automotive Software and Electronics" (2022). KPMG Aerospace Programme Risk Survey (2023). Cost figures marked "est." are based on publicly available programme data and analyst estimates. The Feynman O-ring analysis timeline is based on his account in "What Do You Care What Other People Think?" (1988).