| dc.description.abstract |
Large Language Models (LLMs) are reshaping automated software development, particularly in code generation and debugging. This paper presents a critical review and proposes a novel multi-agent framework addressing key limitations in current LLM-based systems. A systematic analysis of 179 peer-reviewed studies (2018–2025) highlights performance benchmarks, methodological trends, and persistent challenges. Our framework integrates specialized agents for planning, coding, debugging, and reviewing, supported by adaptive retrieval and persistent learning mechanisms. Notable innovations include Adaptive Graph-Guided Retrieval for scalable codebase navigation and Persistent Debug Memory for learning from historical debugging data. Experimental results demonstrate 67.3% fix accuracy on real-world debugging tasks, significantly outperforming Claude (14.2%) and GPT-4.1 (13.8%), with 92% precision and 85% recall on codebases up to 10 million lines. Performance gains range from 3.1% to 25.4% across standard metrics (e.g., CodeBLEU, Pass@k). Case studies confirm applicability across web, systems, and domain-specific development. Despite these advances, challenges persist in hardware-dependent debugging (23.4% success), dynamic-language errors (41.2%), and semantic consistency in complex architectures. We also identify the need for more robust, standardized real-world evaluation protocols. This work contributes (1) a comprehensive review of LLM-driven development, (2) a modular, scalable framework, (3) standardized evaluation strategies, and (4) practical insights for production deployment. While LLMs significantly augment development workflows, human oversight remains essential in high-stakes contexts. |
en_US |