📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
AI systems have achieved near-complete automation of core engineering tasks in AI research, while automation of research itself is still emerging. This shifts the landscape of AI development by reducing reliance on human engineering.
Recent benchmark results indicate that AI systems can now automate the majority of engineering tasks involved in AI research, while the automation of research activities themselves remains incomplete, according to Thorsten Meyer’s analysis of recent benchmark progress.
Multiple benchmarks measuring AI capabilities on AI R&D tasks have shown rapid progress, with core engineering tasks reaching near-complete automation. For example, CORE-Bench, which tests an AI system’s ability to reproduce published research, improved from 21.5% in September 2024 to 95.5% by December 2025, and the benchmark’s author has described it as ‘solved.’ Similarly, MLE-Bench, which assesses AI performance in Kaggle competitions, rose from 16.9% in October 2024 to 64.4% in February 2026, which the analysis reads as competitiveness with mid-tier human practitioners. Together, these results suggest that research reproduction and engineering are rapidly ceasing to be the bottleneck.
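To make the pace concrete, the quoted scores can be turned into percentage-point gains and average monthly rates. The following is a minimal sketch that uses only the figures cited above. The helper names (`months_between`, `summarize`) are illustrative, and because the text gives only month and year, the day of month is set to 1 as a placeholder.

```python
from datetime import date

# Scores as quoted in the text: (name, start date, start %, end date, end %).
# Only month and year are given, so day=1 is a placeholder assumption.
BENCHMARKS = [
    ("CORE-Bench", date(2024, 9, 1), 21.5, date(2025, 12, 1), 95.5),
    ("MLE-Bench", date(2024, 10, 1), 16.9, date(2026, 2, 1), 64.4),
]


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarize(name, start, start_score, end, end_score):
    """Return the percentage-point gain and the average gain per month."""
    months = months_between(start, end)
    gain = end_score - start_score
    return {
        "name": name,
        "months": months,
        "gain": round(gain, 1),
        "per_month": round(gain / months, 1),
    }


SUMMARIES = [summarize(*row) for row in BENCHMARKS]

for s in SUMMARIES:
    print(
        f"{s['name']}: +{s['gain']} points over {s['months']} months "
        f"(about {s['per_month']} points per month)"
    )
```

An average monthly rate is a linear simplification. It says nothing about whether the gains arrived steadily or in jumps, and the two benchmarks measure different skills, so the rates are not directly comparable.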
Meanwhile, progress in kernel design, such as AI-generated GPU kernels and automated code conversion, suggests that engineering tasks are becoming increasingly automated and production-ready. However, the extent to which AI can fully automate research activities, such as hypothesis generation and scientific discovery, remains uncertain. Thorsten Meyer notes that while engineering is nearing full automation, research may involve additional creative or strategic elements that AI systems cannot yet fully replicate.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swaths, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swaths, perhaps the entirety” claim. They differ on the residual.
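The optimistic "shots on goal" reading can be illustrated with a simple probability sketch. It assumes each attempt is independent and succeeds with the same small probability, which is a modeling assumption and not something the source claims. The value `HYPOTHETICAL_P` is invented purely for illustration.

```python
def p_at_least_one(p_per_attempt, attempts):
    """Chance of one or more successes across independent attempts."""
    if not 0.0 <= p_per_attempt <= 1.0:
        raise ValueError("p_per_attempt must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement rule: 1 minus the chance that every attempt fails.
    return 1.0 - (1.0 - p_per_attempt) ** attempts


# Hypothetical per-attempt success rate, chosen only for illustration.
HYPOTHETICAL_P = 0.001

for n in (10, 100, 1_000, 10_000):
    chance = p_at_least_one(HYPOTHETICAL_P, n)
    print(f"{n:>6} attempts -> chance of at least one discovery: {chance:.3f}")
```

Under this model, volume alone turns rare moments into near-certain ones. The pessimistic reading corresponds to the per-attempt probability being effectively zero for the kind of insight in question, in which case more attempts change nothing: `p_at_least_one(0.0, n)` is 0 for any `n`.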
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions receive less development, and they matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Comparison of the two readings. Surviving labels: productivity multiplier, years, recursive loop operational.]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
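The asymmetric cost-of-being-wrong argument can be sketched as a small decision table. All cost values below are hypothetical numbers on an arbitrary scale, chosen only to show the shape of the argument. They are not estimates from the source, and the action and scenario names are illustrative labels.

```python
# Hypothetical costs on an arbitrary scale (lower is better).
# "build" = build institutional capacity now; "wait" = bet on the moat.
COSTS = {
    ("build", "moat_holds"): 1,    # capacity cost paid, capacity still useful
    ("build", "moat_closes"): 1,   # capacity cost paid, capacity necessary
    ("wait", "moat_holds"): 0,     # nothing spent, nothing lost
    ("wait", "moat_closes"): 10,   # caught unprepared
}
ACTIONS = ("build", "wait")


def worst_case(action):
    """Highest cost the action can incur across both scenarios."""
    return max(cost for (a, _), cost in COSTS.items() if a == action)


def expected_cost(action, p_closes):
    """Probability-weighted cost, given the chance that the moat closes."""
    return (
        p_closes * COSTS[(action, "moat_closes")]
        + (1.0 - p_closes) * COSTS[(action, "moat_holds")]
    )


def minimax_action():
    """Action with the smallest worst-case cost."""
    return min(ACTIONS, key=worst_case)


print("worst case:", {a: worst_case(a) for a in ACTIONS})
print("minimax choice:", minimax_action())
for p in (0.05, 0.1, 0.5):
    print(p, {a: round(expected_cost(a, p), 2) for a in ACTIONS})
```

With these illustrative numbers, building has the smaller worst case, and waiting has the lower expected cost only when the probability of the moat closing is below 10%. Different cost assumptions move that threshold, which is why the argument depends on the asymmetry and not on any particular forecast.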
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Innovation
The rapid automation of engineering tasks in AI research suggests a potential acceleration in AI development cycles, reducing costs and increasing reproducibility. This could lead to faster iteration on models and algorithms, shifting the competitive landscape. However, the partial automation of research itself raises questions about the future role of human scientists in scientific discovery and innovation, possibly transforming the nature of AI research teams.
Recent Benchmark Progress and AI Capabilities
Over the past 18 months, multiple independent measures (CORE-Bench, MLE-Bench, and advances in kernel design) have shown consistent progress toward automating core AI research tasks. They cover different skills: research reproduction, Kaggle competition performance, and low-level hardware optimization. Their parallel gains suggest that AI’s engineering capabilities are approaching human-level proficiency in many areas, driven by rapid model improvements and automation tools.
Thorsten Meyer’s analysis highlights that the structural pattern across these benchmarks suggests the bottleneck in AI research is shifting from engineering to the research process itself, which may involve more creative and strategic elements that are harder to automate fully.
“AI can today automate vast swaths, perhaps the entirety, of AI engineering. It is not yet clear how much of AI research it can automate, given that some aspects of research may be distinct from the engineering skills.”
- Jack Clark, Import AI #455, as quoted by Thorsten Meyer
Unresolved Questions on Research Automation
It remains unclear how fully AI can automate the creative, strategic, and hypothesis-driven aspects of research. While engineering tasks are approaching full automation, the complexity of scientific discovery and innovation may still require human insight, and the timeline for this transition is uncertain.
Future Milestones and Research Directions
Over the next 32 months, expected developments include further saturation of engineering benchmarks, increased deployment of automated kernel design tools, and the emergence of new frameworks for measuring AI’s research capabilities. Researchers and institutions will likely focus on understanding and accelerating the automation of research processes, aiming to close the residual gap.
Key Questions
What does automation of engineering mean for AI research teams?
It suggests that many routine and complex engineering tasks can now be handled by AI, potentially reducing costs and increasing speed in developing new models and systems.
Will AI fully replace human researchers in the near future?
While automation is advancing rapidly in engineering, the automation of creative and strategic research activities remains uncertain and may take longer to achieve.
How reliable are current benchmarks in measuring AI’s research capabilities?
Benchmarks like CORE-Bench and MLE-Bench have shown significant progress and are approaching saturation points, but they primarily measure specific skills and may not fully capture all aspects of research automation.
What are the risks of over-relying on AI for research and engineering?
Potential risks include reduced human oversight, loss of scientific diversity, and overdependence on automated systems that may miss novel or unexpected insights.
Source: ThorstenMeyerAI.com