Impact of large language models on quality and efficiency of code generation: Systematic Literature Review
DOI: https://doi.org/10.37135/ns.01.15.10

Keywords: Code editors, code generation, review methodology, large language models, systematic literature review

Abstract
Large Language Models (LLMs) are revolutionizing efficiency and automation in programming code generation. This study provides an overview of the state of the art of these models and of their specific applications in programming. The methodology identified 549 initial publications, which were screened in successive phases to select 27 key articles for qualitative analysis. The results show that LLMs significantly improve the quality of generated code, enhance programmer productivity, and support learning through tools such as ChatGPT, Codex, and GitHub Copilot. The analysis highlights that 66.7% of the reviewed articles were published in 2024, reflecting growing interest in the field. The United States leads research output with 11 publications, followed by the Netherlands and Switzerland. The selected studies were synthesized into a critical evaluation of the quality and applicability of LLMs in code writing and debugging. The review examines the main advantages of LLMs, such as increased productivity and optimized code generation, but also addresses challenges such as accuracy issues and the risk of generating erroneous or insecure code. LLMs can significantly improve the quality of generated code and ease the learning curve for new programmers; nevertheless, rigorous human oversight remains necessary to ensure the reliability of the solutions these models generate.