Impact of large language models on quality and efficiency of code generation: Systematic Literature Review

Authors

DOI:

https://doi.org/10.37135/ns.01.15.10

Keywords:

Code editors, code generation, review methodology, large language models, systematic literature review

Abstract

Large Language Models (LLMs) are revolutionizing efficiency and automation in programming code generation. This study provides an overview of the state of the art of these models and of their specific applications in programming. The methodology identified 549 initial publications, which were screened in successive phases to select 27 key articles for qualitative analysis. The results show that LLMs significantly improve the quality of generated code, enhance programmer productivity, and support learning through tools such as ChatGPT, Codex, and GitHub Copilot. The analysis highlights that 66.7% of the reviewed articles were published in 2024, reflecting growing interest in this field. The United States leads research output with 11 publications, followed by the Netherlands and Switzerland. The selected studies were synthesized critically to evaluate the quality and applicability of LLMs in code writing and debugging. This review examines the main advantages of LLMs, such as increased productivity and optimized code generation, but it also addresses challenges such as accuracy issues and the risk of generating erroneous or insecure code. LLMs can significantly enhance the quality of generated code and ease the learning curve for new programmers; nevertheless, rigorous human oversight remains necessary to ensure the reliability of the solutions these models generate.

References

Bae, J., Kwon, S., & Myeong, S. (2024). Enhancing Software Code Vulnerability Detection Using GPT-4o and Claude-3.5 Sonnet: A Study on Prompt Engineering Techniques. Electronics, 13(13), 2657. https://doi.org/10.3390/electronics13132657

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), 610–623. Association for Computing Machinery. https://doi.org/10.1145/3442188.3445922

Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., … Zaremba, W. (2021). Evaluating Large Language Models Trained on Code. arXiv Computer Science. https://doi.org/10.48550/arXiv.2107.03374

De Fitero-Dominguez, D., Garcia-Lopez, E., Garcia-Cabot, A., & Martinez-Herraiz, J.-J. (2024). Enhanced automated code vulnerability repair using large language models. Engineering Applications of Artificial Intelligence, 138, 109291. https://doi.org/10.1016/j.engappai.2024.109291

Foster, M. J., & Jewell, S. T. (2017). Assembling the pieces of a systematic review: A guide for librarians. Rowman & Littlefield Publishers.

Fritsch, R. F., & Jatowt, A. (2024). LLMTemporalComparator: A Tool for Analysing Differences in Temporal Adaptations of Large Language Models. arXiv Computer Science. https://doi.org/10.48550/arXiv.2410.04195

Haddaway, N. R., Page, M. J., Pritchard, C. C., & McGuinness, L. A. (2022). PRISMA2020: An R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and Open Synthesis. Campbell Systematic Reviews, 18(2), e1230. https://doi.org/10.1002/cl2.1230

Haindl, P., & Weinberger, G. (2024). Students’ Experiences of Using ChatGPT in an Undergraduate Programming Course. IEEE Access, 12, 43519–43529. https://doi.org/10.1109/ACCESS.2024.3380909

Harzing, A. W. (2007). Publish or Perish. https://harzing.com/resources/publish-or-perish

Jošt, G., Taneski, V., & Karakatič, S. (2024). The Impact of Large Language Models on Programming Education and Student Learning Outcomes. Applied Sciences, 14(10), 4115. https://doi.org/10.3390/app14104115

Kau, A., He, X., Nambissan, A., Astudillo, A., Yin, H., & Aryani, A. (2024). Combining Knowledge Graphs and Large Language Models. arXiv Computer Science. https://doi.org/10.48550/arXiv.2407.06564

Kazemitabaar, M., Chow, J., Ma, C. K. T., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–23. https://doi.org/10.1145/3544548.3580919

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., … Alonso-Fernández, S. (2021). Declaración PRISMA 2020: Una guía actualizada para la publicación de revisiones sistemáticas [The PRISMA 2020 statement: An updated guideline for reporting systematic reviews]. Revista Española de Cardiología, 74(9), 790–799. https://doi.org/10.1016/j.recesp.2021.06.016

Pornprasit, C., & Tantithamthavorn, C. (2024). Fine-tuning and prompt engineering for large language models-based code review automation. Information and Software Technology, 175, 107523. https://doi.org/10.1016/j.infsof.2024.107523

Tao, N., Ventresque, A., Nallur, V., & Saber, T. (2024). Enhancing Program Synthesis with Large Language Models Using Many-Objective Grammar-Guided Genetic Programming. Algorithms, 17(7), 287. https://doi.org/10.3390/a17070287

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv Computer Science. https://doi.org/10.48550/arXiv.1706.03762

Wang, R., Xu, S., Tian, Y., Ji, X., Sun, X., & Jiang, S. (2024). SCL-CVD: Supervised contrastive learning for code vulnerability detection via GraphCodeBERT. Computers & Security, 145, 103994. https://doi.org/10.1016/j.cose.2024.103994

Weber, I. (2024). Large Language Models as Software Components: A Taxonomy for LLM-Integrated Applications. arXiv Computer Science. https://doi.org/10.48550/arXiv.2406.10300

Yun, S., Lin, S., Gu, X., & Shen, B. (2024). Project-specific code summarization with in-context learning. Journal of Systems and Software, 216, 112149. https://doi.org/10.1016/j.jss.2024.112149

Zhou, X., Liang, P., Zhang, B., Li, Z., Ahmad, A., Shahin, M., & Waseem, M. (2025). Exploring the problems, their causes and solutions of AI pair programming: A study on GitHub and Stack Overflow. Journal of Systems and Software, 219, 112204. https://doi.org/10.1016/j.jss.2024.112204

Published

2025-01-08

Issue

Vol. 8 No. 1 (2025)

Section

Research Articles and Reviews

How to Cite

Impact of large language models on quality and efficiency of code generation: Systematic Literature Review. (2025). Novasinergia, ISSN 2631-2654, 8(1), 52–66. https://doi.org/10.37135/ns.01.15.10