Multi-Objective Grammar-Guided Genetic Programming for Grammar-Obeying Program Synthesis

Tao, Ning

Multi-Objective Grammar-Guided Genetic Programming for Grammar-Obeying Program Synthesis

Author(s)

Tao, Ning

Uri

http://hdl.handle.net/10197/28256

Date Issued

2025

Date Available

2025-05-29T14:26:59Z

Abstract

Grammar-obeying program synthesis is a subfield of program syn- thesis (i.e., the task of automatically discovering an executable piece of code) that requires generated code to follow a Backus-Naur Form (BNF) grammar. Having code that obeys a grammar limits the structure of the code and the set of callable functions/method- s/libraries, thus having an impact on (i) security: e.g., using ma- licious functionalities, blacklisted libraries, and questionable con- structs, (ii) the computing environment: e.g., if the hardware is not able to run some functions or if some functions are too costly in terms of memory or energy, and (iii) code quality: improving code style and readability, reducing code smells, and enhancing consistency. Grammar-Guided Genetic Programming (G3P) is recognised as one of the most successful approaches for grammar-obeying pro- gram synthesis that evolves programs in arbitrary languages to solve program synthesis problems based on a set of input-output examples and predefined BNF grammars. Despite its success, the restriction on the evolutionary system to only leverage input-output error rate during its assessment of the programs it derives lim- its its scalability to larger and more complex program synthesis problems. Alternatively, while large language models (LLMs) are showing increasing success in program synthesis from a task de- scription, they often struggle to generate accurate code due to am- biguity in task specifications, complex programming syntax, and lack of reliability in the generated code. Furthermore, their gener- ative nature limits their ability to fix erroneous code with iterative LLM prompting. In this thesis, I focus on enhancing the performance of G3P for grammar-obeying program synthesis tasks. I begin by evaluating the state-of-the-art systems, including G3P and LLMs, in grammar- obeying program synthesis. Next, I explore how to map LLM- generated code into grammar-obeying forms, followed by seeding the grammar-mapped LLM-generated code into G3P’s initial pop- ulation to improve the evolutionary process. Building on these foundations, I investigate the potential of using code similarity measures to guide the evolution of programs, leading to the devel- opment of Bi-Objective Grammar-Guided Genetic Programming (BOG3P) and Multi-Objective Grammar-Guided Genetic Program- ming (MOG3P), which combine input-output error rate with mul- tiple similarity measures as objective functions. Finally, I integrate LLM-generated code into the MOG3P framework in two ways: (i) using LLM-generated code as a seed for the evolutionary pro- cess through a grammar-mapping phase, and (ii) leveraging multi- ple similarity measures towards LLM-generated code to guide the search process throughout the evolution. Additionally, I further extend the MOG3P framework by incorporating code from multi- ple LLMs to enhance the diversity and effectiveness of the search process.

Type of Material

Doctoral Thesis

Publisher

University College Dublin

Qualification Name

Ph.D.

Copyright (Published Version)

2025 the Author

Subjects

Large language models...

Natural language task...

AI generated programs...

Grammar compliance

Computer programming

Language

English

Status of Item

Peer reviewed

This item is made available under a Creative Commons License

https://creativecommons.org/licenses/by-nc-nd/3.0/ie/

Name

Ning_Tao_Thesis_15205926.pdf

Size

980.96 KB

Format

Adobe PDF

Checksum (MD5)

433f42847aeb8c0d612f8c2bdc481509

Owning collection

Computer Science Theses

Options

Multi-Objective Grammar-Guided Genetic Programming for Grammar-Obeying Program Synthesis