Microsoft Research built the debug-gym platform to test how well artificial intelligence models can debug software. The results show that AI still lags far behind human developers in this area: although the models' code-writing skills have improved considerably, they have not achieved comparable success at debugging.
Artificial intelligence lags far behind humans in software debugging
Even today’s most advanced models, such as Claude 3.7 Sonnet, OpenAI o1 and OpenAI o3-mini, could not achieve consistent success in the debug-gym tests. According to the researchers, the main reason is that these models have not been specifically trained to use debugging tools.

Microsoft says this gap needs to be closed and that models can be improved with training targeted at debugging. One proposed solution is to develop small auxiliary models that specialize in debugging and work alongside the large models.
Overall, it is still too early for software developers to hand their work over entirely to artificial intelligence. Despite advances in code generation, human programmers remain far ahead in tasks that demand close attention and an understanding of context, such as debugging complex systems.
So what do you think? Will artificial intelligence be able to replace software developers in the near future, and should it be used in software development at all? Share your opinions with us in the comments section below.