What if artificial intelligence took the reins of a company, from CEO to technician?
This is the challenge taken up by researchers from Carnegie Mellon University in Pennsylvania, who shared their findings in a study published on arXiv. Their experiment, which entrusted the management of a virtual company to AIs, reveals promising potential but also persistent shortcomings.
A company entirely driven by AIs
For this simulation, the researchers deployed AI models, such as:
- Claude 3.5 Sonnet (Anthropic),
- GPT-4o (OpenAI),
- Gemini 2.0 (Google),
- Nova (Amazon),
- Llama (Meta),
- Qwen (Alibaba).
These agents took on various roles: financial manager, project coordinator, or even software developer. A complementary platform simulated virtual colleagues, forcing the AIs to collaborate to complete their missions, just as in a real professional environment.
The tasks assigned were varied: exploring databases, selecting offices via online visits, or drafting documents. The objective was clear: to test the AIs’ ability to manage an organization independently.
Encouraging, but limited, results
The AIs’ performance revealed marked disparities.
- Claude 3.5 Sonnet topped the ranking, fully completing 24% of tasks and achieving a partial completion rate of 34.4%. Its API usage cost $6.34.
- Gemini 2.0 came second, completing only 11.4% of tasks, at an API cost of just $0.79. The other models did not cross the 10% threshold.
The results are relatively low, and the cost varies widely. In short, companies won't be run by an army of AI-controlled robots anytime soon!
Obstacles encountered by AIs
Despite some encouraging results, the AIs ran into several pitfalls. The researchers identified recurring weaknesses:
- Difficulties interpreting instructions: agents struggled to understand implicit directives, such as recognizing that a “.docx” file corresponds to a Word document.
- Lack of social ease: collaborating with virtual colleagues proved arduous, as the AIs lacked subtlety in their interactions.
- Online navigation problems: complex interface elements, such as pop-up windows, often tripped the agents up.
- Simplistic behaviors: when faced with difficult tasks, some AIs sidestepped the hard part and mistakenly concluded that they had completed their mission.
A promising future? Yes! But still distant…
This experiment highlights an essential point: while AIs excel at specific tasks, they are still far from being able to manage a company autonomously. The researchers believe that progress in understanding nuance and social interaction will be necessary to bring AIs closer to true independence.
The study nevertheless opens up captivating horizons. As the technology advances, AIs could become powerful allies in managing organizations. For now, they remain capable but imperfect tools in the service of human teams. And you, would you entrust your company to an AI?
Source: arXiv


