A new paper from MIT's FutureTech group, "Crashing Waves vs. Rising Tides" (Mertens, Thompson et al., April 2026), is the most rigorous third-party examination of enterprise AI performance to date. Across 17,000 evaluations, conducted by actual domain workers on real tasks drawn from the U.S. Department of Labor's O*NET database, the researchers put a precise number on something enterprise leaders have been sensing: AI performance has a ceiling, and it sits well below what scaled deployment requires.
For executives with accountability for AI ROI, this research is worth reading carefully.
What the MIT Data Actually Shows
The headline number is 60%. That is the share of tasks where current large language models produce output a manager would describe as minimally sufficient. Only 26% of outputs reached superior quality.
Here is the detail that matters most. These superior results were achieved when models were provided with all the structured information required for each task before evaluation. That structured context provision is precisely what the Kendall Framework builds. Without it, enterprise AI operates below even that 60% threshold, on proprietary tasks, undocumented processes, and organizationally complex workflows that no model arrives ready to handle.
MIT's data projects that AI will reach 80% to 95% success on text-based tasks by 2029, under optimal conditions. The Kendall Framework is how enterprises create those optimal conditions today, and sustain them over time, at the quality level operations actually require.
The Enterprise AI Accuracy Ceiling: Named, Measured, and Documented
At The Kendall Project, we describe this limitation as the Context Ceiling: the AI accuracy plateau that enterprise deployments reach when context is absent, inconsistent, or ungoverned. The MIT paper does not use that term, but it describes the same phenomenon with precision, across task types and durations.
Success rates on tasks requiring three to four hours of human effort sat at roughly 50% in mid-2024, rising to around 65% by late 2025. Across task types and durations, the pattern held.
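As a back-of-envelope check, those two data points are consistent with the halving rate the paper reports. The sketch below assumes the late-2025 reading sits roughly 1.5 years after the mid-2024 one and that failure rates decay exponentially; both are simplifying assumptions, not figures from the paper.

```python
import math

# Data points cited above (success rates on 3-4 hour tasks):
#   mid-2024:  ~50% success -> 50% failure
#   late 2025: ~65% success -> 35% failure
f0, f1 = 0.50, 0.35  # failure rates at the two observations
dt = 1.5             # assumed years between mid-2024 and late 2025

# Assuming exponential decay f(t) = f0 * 2**(-t / T),
# the implied halving time T of the failure rate is:
T = dt * math.log(2) / math.log(f0 / f1)
print(f"implied failure-rate halving time: {T:.1f} years")  # ~2.9, inside the 2.4-3.2 range
```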
What the paper does not explain is why the ceiling exists. The researchers controlled for task type and task duration, and the ceiling remained. The constraint is not model capability. The constraint is context.
The "Given the Right Information" Qualifier Is the Whole Story
The MIT methodology provided models with all the information required for each task before evaluation. Even under those optimal conditions, only 60% of outputs were minimally sufficient. In enterprise environments, tasks are proprietary, organizational, and largely undocumented. The 60% figure represents a ceiling for real-world enterprise settings, not a floor.
This is the argument for an AI Bill of Materials. The model is not the constraint. The context supply chain is. For organizations operating at scale, with proprietary processes, undocumented institutional knowledge, and complex decision workflows, that supply chain does not exist by default. It has to be built.
Enterprise AI ROI: Why the Gap Widens Over Time
The MIT team found that AI capability is improving as a rising tide: broad-based, continuous gains across task types rather than dramatic, concentrated surges. Failure rates are halving every 2.4 to 3.2 years. The trajectory projects to roughly 80% success on text-based tasks by 2029.
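That projection can be sanity-checked from the halving-rate figures alone. The sketch below assumes exponential decay of failure rates and treats the late-2025 reading of roughly 60% success as the baseline; that starting point is an assumption for illustration, not the paper's exact methodology.

```python
import math

start_year = 2025.75          # assumed baseline: late 2025, ~60% success
f_now, f_target = 0.40, 0.20  # failure rates at 60% and 80% success

# Halving times reported by the paper (years); 0.40 -> 0.20 is one halving,
# so the time to reach 80% success is one halving time from the baseline.
for T in (2.4, 3.2):
    years = T * math.log2(f_now / f_target)
    print(f"halving time {T} yr -> 80% success around {start_year + years:.0f}")
```

With the reported range, the crossing lands in 2028 to 2029, matching the paper's roughly-2029 projection.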
For enterprise leaders evaluating AI investment timelines, the practical implication is direct: organizations that build context infrastructure now compound that advantage as frontier model capability rises. The Kendall Framework is designed to build that infrastructure before the tide outpaces it.
Why "Minimally Sufficient" Is Not an Enterprise AI Standard
In regulated industries, complex operations, and high-stakes decision environments, minimally sufficient output is not deployable. Regulatory compliance, clinical documentation, and procurement analysis all require performance above a basic threshold. The MIT paper quantifies the distance between what generic AI deployment produces and what enterprise operations require.
Context operations are what close that distance. Structured, governed, reusable context, delivered to the model at the point of task execution, is the mechanism. That is what Context Sprints produce and what the Context 360 method is designed to build.
What This Research Confirms for Enterprise AI Strategy
MIT. 17,000 evaluations. Real domain workers. Real operational tasks. The Context Ceiling is observable data now, not assertion. The Kendall Framework is built on exactly this finding: model capability is not the limiting factor, and organizations investing in context infrastructure today are building a compounding operational advantage.
The tide does not wait for the warehouse to be ready.