MIT Study: AI Models Struggle with Complex Tasks, Prove Only Minimally Effective

The Role of AI in Modern Workplaces

A growing number of American office workers have tried using artificial intelligence in their daily tasks, and many may feel uncertain about their job security as a result. However, recent research from MIT suggests that while AI has made significant strides, it is still far from replacing human workers entirely.

According to the findings, AI technology is improving but still struggles with complex tasks. The study likens AI models to a disenchanted intern: they meet basic requirements but fail to produce high-quality work without human oversight.

Evaluating AI Performance

MIT researchers analyzed 41 large language models (LLMs), including versions of Claude, Gemini, and ChatGPT, on more than 11,000 primarily text-based tasks across various job roles. The models' outputs were evaluated by people with real-world experience in those fields. The goal was to determine how often AI could produce acceptable results without any human intervention.

The study found that AI has become more reliable for many types of work, but it still faces challenges when standards or stakes are higher. On a 1–9 scoring scale, where a 7 is considered “minimally sufficient,” AI models reached that benchmark in about 65% of tasks as of late 2025. This indicates that while AI can handle basic tasks, its output often requires human refinement to meet higher quality standards.

Limitations of AI in Complex Tasks

One of the key findings from the MIT study is that AI struggles with more complex tasks. Even when given more time, AI models had less than a 50% chance of achieving a “superior” quality score (a 9). This suggests that jobs requiring multiple steps, creativity, or precision are still better suited to human workers.

This aligns with current trends in corporate America, where companies tend to automate routine tasks rather than replace skilled professionals. In fact, some highly technical skills, especially digital ones, have seen wage increases due to the demand for human expertise.

Industry Examples and Challenges

There have been several instances where companies have faced challenges with AI adoption. For example, Deloitte produced reports for government clients in Australia and Canada that contained fabricated information. Media outlets like CNET and Sports Illustrated have also been caught using AI to generate inaccurate stories under false bylines. Additionally, some law firms have used AI to prepare legal documents, leading to public apologies after fake citations were discovered in court filings.

These examples highlight the ongoing need for human oversight in AI-driven workflows. While AI can assist with basic tasks such as drafting, email writing, and data analysis, it has not yet reached the level of superior performance where human input is no longer necessary.

Future Prospects for AI

Despite these limitations, the MIT researchers noted that AI’s success rate in the tasks analyzed has increased by up to 11 percentage points each year due to more advanced models. By 2029, they estimate that most AI models will be able to complete between 80% and 95% of text-based tasks at the minimally sufficient benchmark.

However, whether AI will ever achieve excellent or perfect performance remains uncertain. The researchers caution that widespread automation, especially in fields with low tolerance for errors, may still be far off.
