Generative Models for Source Code: Fine-Tuning Techniques for Structured Pattern Learning
Abstract
1. Introduction
Research Goal
- Can code be generated in forms that are functional, expressive, readable, and maintainable?
- How can transfer-learning techniques improve the structural coherence while maintaining the functional consistency of generative models for programming code?
2. Related Works
2.1. Generative Models for Source Code
2.2. The DeepSeek Models
- Collect code from GitHub and apply filtering rules to select useful content;
- Parse file dependencies within the same repository and reorganize file locations based on those dependencies;
- Concatenate dependent files into a single record and apply repository-level min-hashing for deduplication (an illustrative sketch follows this list);
- Further filter to exclude low-quality code, such as code with syntax errors or poor readability.
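As an illustration of repository-level near-duplicate detection, the following is a conceptual sketch only, not the DeepSeek pipeline itself; it assumes the datasketch library, treats each concatenated repository record as a set of token shingles, and uses repository_records as a placeholder for the list of concatenated records:

from datasketch import MinHash, MinHashLSH

def minhash_of(text, num_perm=128):
    # build a MinHash signature from whitespace-token shingles
    m = MinHash(num_perm=num_perm)
    tokens = text.split()
    for i in range(len(tokens) - 2):
        m.update(" ".join(tokens[i:i + 3]).encode("utf-8"))
    return m

# index repository-level records and drop near-duplicates above a similarity threshold
lsh = MinHashLSH(threshold=0.85, num_perm=128)
deduplicated = []
for idx, record in enumerate(repository_records):  # repository_records: placeholder list of concatenated files
    signature = minhash_of(record)
    if not lsh.query(signature):
        lsh.insert(f"repo-{idx}", signature)
        deduplicated.append(record)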
3. Methodologies for Code Generative Models Adaptation
3.1. Classical Fine-Tuning
- Step 1: Loading the pre-trained model and freezing the initial layers. The process starts by loading a pre-trained model, in our case DeepSeek Coder, which has been trained on large datasets and already captures general patterns, such as the syntactic structures and semantic aspects of a programming language. The early layers, which are responsible for learning these basic features, are typically frozen: their weights are not updated during fine-tuning, so the general knowledge acquired during pre-training is preserved, computational cost is reduced, and the model retains its ability to generalize across different tasks.
- Step 2: Unfreezing the final layers. By contrast, the final layers of the model, which are more task-specific, are unfrozen and fine-tuned by updating their weights with task-specific data, allowing the model to adapt to new requirements. In our case of source code generation, the final layers learn specialized patterns relevant to the programming language, such as syntax and logical structures. After fine-tuning, these layers incorporate the specific features of the new dataset, while the general generative knowledge from pre-training is retained.
- Step 3: Reduced-rate training. This training technique [24] ensures that the fine-tuning process adapts the model without disrupting the pre-trained knowledge, preventing overfitting and ensuring that the model effectively learns task-specific features. A reduced learning rate avoids large and sudden changes in the model weights, allowing a gradual and focused refinement. A minimal sketch of these three steps is given after this list.
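A minimal sketch of the three steps, assuming a Hugging Face causal-LM checkpoint (the model identifier, the layer indices, and the learning-rate value are illustrative, and train_dataset is a placeholder for the tokenized training set; the exact configuration used in this work is reported in Appendix B, Listings A4 and A5):

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Step 1: load the pre-trained model and freeze every parameter
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")  # example checkpoint
for param in model.parameters():
    param.requires_grad = False

# Step 2: unfreeze only the final decoder layers (indices are illustrative)
for name, param in model.named_parameters():
    if any(f"model.layers.{i}." in name for i in (21, 22, 23)):
        param.requires_grad = True

# Step 3: train with a reduced learning rate for gradual refinement
training_args = TrainingArguments(output_dir="./ft-sketch", learning_rate=1e-5, num_train_epochs=1)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)  # train_dataset: placeholder
trainer.train()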
3.2. Adapter Fine-Tuning
- Low-Rank Adapters (LoRA): LoRA introduces low-rank adapters into neural network models, such as the attention layers and feed-forward networks of transformer models, without changing the original weights [28,29]. It uses low-rank matrices to approximate the weight updates, reducing the number of parameters to train while maintaining high performance with a lower memory requirement.
- IA3: IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) injects learned scaling vectors that rescale the internal activations of the model, allowing for targeted modifications without directly changing the layers’ weights. This method provides more granular control over the model response and improves memory management efficiency.
- AdaLoRA: A variant of LoRA, AdaLoRA combines the low-rank approach with adaptive budget allocation, dynamically adjusting the rank assigned to each adapted module during fine-tuning and thus providing finer customization and greater control over the data and the specific task requirements.
- 4-bit quantization: PEFT integrates with 4-bit quantization techniques, useful for loading large language models (LLMs) on non-specialized hardware such as consumer GPUs. This process significantly reduces memory consumption and prevents system overload (a loading sketch is given after this list).
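As an illustration of the 4-bit loading path, a sketch assuming the Hugging Face transformers integration with bitsandbytes (the checkpoint name is an example):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, to fit the model on a consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base",  # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)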
3.3. Proposed Fine-Tuning Methodology
- Instantiate the base model;
- Create a configuration (LoraConfig) where the parameters for LoRA are specified;
- Encapsulate the base model with the method get_peft_model() to obtain a trainable PeftModel;
- Train the PeftModel using classic model training methods (a minimal sketch of these steps follows this list).
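The four steps map directly onto the PEFT API; a minimal sketch is given below (hyperparameter values and the checkpoint name are illustrative, and the configuration actually used in the experiments is reported in Appendix B, Listing A6):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# 1. instantiate the base model
base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")  # example checkpoint

# 2. create the LoRA configuration
config = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

# 3. encapsulate the base model to obtain a trainable PeftModel
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()

# 4. train the PeftModel with the usual Trainer or a custom training loop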
3.4. Proposed Dataset Generation Process Incorporating Structural Patterns
A sample record from the original Alpaca18k dataset:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Automate a Python program to check the length of each word in a given string.

### Input:
### Output:
def lengthWord(inp):
    result = []
    for word in inp.split(' '):
        result.append(len(word))
    return result

# Driver Code
inp = "Generating a code generation task instructions"
print(lengthWord(inp))
- Removing optional text: using an appropriate script, the initial optional text part was removed from all samples in the dataset, e.g., the phrase “Below is an instruction that describes a task. Write a response that appropriately completes the request.” in the sample prompt above.
- Main structure personalization: to ensure that the model follows the expected structure without being influenced by its prior knowledge of the English language, the following replacements were made (the script used for these replacements can be found in Appendix B, Listing A2):
– The word “Instruction” is replaced with the Italian word “Istruzione”;
– The word “Input” is replaced with the Italian word “Ingresso”;
– The word “Output” is replaced with the Italian word “Uscita”.
- Structural placeholder insertion: the objective was to insert specific structural patterns, i.e., placeholder comments at predefined points in the code. This intervention creates distinctive markers, unknown both to DeepSeek Coder Base and to the original Alpaca18k dataset, and makes it possible to evaluate clearly and effectively the model’s ability to incorporate structure in code generation once it is subjected to different fine-tuning strategies (the insertion script is given in Appendix B, Listing A1). The structural comment placeholders were inserted as follows:
– Before each function definition “def function_name”, the comment “# Definition of function function_name” was added;
– Before each if condition, the comment “# If condition” was added;
– Before each while loop, the comment “# While loop” was added;
– Before each for loop, the comment “# For loop” was added.
- Padding and End of Sequence token: since not all samples in the dataset have the same length, padding up to the length of the longest sample was applied during tokenization using the EOS (End Of Sequence) token. For each sample shorter than this maximum length, EOS tokens were appended until all samples reached the same length. This procedure lets the model handle the entire dataset uniformly, as natural language processing models typically require inputs of uniform length. Moreover, inserting EOS tokens allows the generation process to stop precisely, avoiding the generation of superfluous or irrelevant text (a tokenization sketch is given after the example below).
The sample record after all transformation steps:

### Istruzione:
Automate a Python program to check the length of each word in a given string.

### Ingresso:
### Uscita:
# Definizione della funzione lengthWord
def lengthWord(inp):
    result = []
    # For loop
    for word in inp.split(' '):
        result.append(len(word))
    return result

# Driver Code
inp = "Generating a code generation task instructions"
print(lengthWord(inp))
<EOS>
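A sketch of the tokenization step described above, assuming a Hugging Face tokenizer (the checkpoint name and the max_length value are illustrative; the "prompt" field matches the dataset field used in Listing A2):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")  # example checkpoint
# use the EOS token as the padding token
tokenizer.pad_token = tokenizer.eos_token

def tokenize(sample):
    # pad shorter samples with EOS tokens up to a common length, truncating longer ones
    return tokenizer(sample["prompt"], padding="max_length", truncation=True, max_length=512)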
4. Experiments and Discussion
4.1. Evaluation Metrics
- BLEU (Bilingual Evaluation Understudy) [31]: Measures the overlap between generated text and reference translations using n-gram precision and a brevity penalty. It is mainly used to evaluate the quality of translations and code generation.
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [32]: Focuses on recall and the ability of the generated text to capture essential content, with variants such as ROUGE-N (n-gram overlap), ROUGE-L (longest common subsequence), and ROUGE-S (skip-bigram overlap).
- METEOR (Metric for Evaluation of Translation with Explicit Ordering) [33]: Balances precision and recall while accounting for semantic meaning through synonyms and word order, providing a flexible evaluation metric.
- Perplexity [34]: Indicates the model’s uncertainty in predicting sequences, with lower values suggesting better language modeling and text-generation accuracy. It is based on the inverse probability of the given text, normalized by text length; low perplexity indicates that the model predicts tokens with high precision, showing a good understanding of linguistic structures (see the formula below).
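Under the usual definition, for a tokenized sequence $w_1, \dots, w_N$ the perplexity assigned by a language model $p$ is

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_{<i})\right),$$

i.e., the exponential of the average negative log-likelihood per token, so lower values correspond to higher-confidence predictions.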
4.2. Experimental Plan
4.3. Fine-Tuning Experiment
- LlamaAttention: The multi-head attention mechanism, fundamental for focusing on relevant parts of the input. It includes four linear projections: q_proj, k_proj, v_proj, and o_proj, to handle the queries, keys, values, and output, respectively.
- LlamaMLP: A multilayer perceptron (MLP) that processes the output of the attention mechanism. It includes several linear layers (gate_proj, up_proj, down_proj) and the SiLU (Sigmoid Linear Unit) activation function, which introduces non-linearity.
- Normalization: Present in every LlamaDecoderLayer, with two modules, input_layernorm and post_attention_layernorm, applied before and after the attention mechanism, respectively. This normalization is crucial for stabilizing learning and improving model performance (a snippet for inspecting these module names follows this list).
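To verify these module names before deciding which layers to freeze or which modules to target with adapters, the loaded model can be inspected directly; a small sketch (model refers to the DeepSeek Coder model loaded as above):

# print the projection modules of each decoder layer
for name, module in model.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj")):
        print(name, module.__class__.__name__)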
Sample Prompt 1.

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:

Answer before tuning:

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
### Esempio di input:
### Esempio di output:
### Esempio di input:

Answer after fine-tuning:

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
# Definizione della funzione funzione_anno
def funzione_anno(year):
    # Condizione if
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0): return True
    else:
        return False
4.4. Adapter Experiment
4.5. Results Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Generative Sample Prompts and Answers
Alpaca18k Sample Record.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Automate a Python program to check the length of each word in a given string.

### Input:
### Output:
def lengthWord(inp):
    result = []
    for word in inp.split(' '):
        result.append(len(word))
    return result

# Driver Code
inp = "Generating a code generation task instructions"
print(lengthWord(inp))
Tuning dataset: sample record after structural pattern encapsulation.

### Istruzione:
Automate a Python program to check the length of each word in a given string.

### Ingresso:
### Uscita:
# Definizione della funzione lengthWord
def lengthWord(inp):
    result = []
    # For loop
    for word in inp.split(' '):
        result.append(len(word))
    return result

# Driver Code
inp = "Generating a code generation task instructions"
print(lengthWord(inp))
<EOS>
Prompt 01 sample.

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
Prompt 01: answer before tuning.

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
### Esempio di input:
### Esempio di output:
### Esempio di input:
Prompt 01: answer after fine-tuning.

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
# Definizione della funzione funzione_anno
def funzione_anno(year):
    # Condizione if
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0): return True
    else:
        return False
Prompt 01: answer after adapter tuning.

### Istruzione:
Python function named funzione_anno to check if year is a leap year.

### Ingresso:
### Uscita:
# Definizione della funzione funzione_anno def funzione_anno(year):
    # Condizione if
    if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
        return True
    else:
        return False
Prompt 02 sample.

### Istruzione:
Generate two random numbers using randint and then sum them, then multiply first number with the sum. If the sum is greater than 10, print 'Greater'

### Ingresso:
### Uscita:
Prompt 02: answer after fine-tuning.

### Istruzione:
Generate two random numbers using randint and then sum them, then multiply first number with the sum. If the sum is greater than 10, print 'Greater'

### Ingresso:
### Uscita:
# Definizione della funzione sum_and_multiply
def sum_and_multiply(num1, num2):
    sum = num1 + num2
    # Condizione if
    if if sum > 10:
        print('Greater')
    else:
        print(num1 * sum)
Prompt 02: answer after adapter tuning.

### Istruzione:
Generate two random numbers using randint and then sum them, then multiply first number with the sum. If the sum is greater than 10, print 'Greater'.

### Ingresso:
### Uscita:
import random
a = random.randint(1, 10)
b = random.randint(1, 10)
sum = a + b
# Condizione if
if sum > 10:
    print('Greater')
Appendix B. Code Listings
Listing A1. Script for adding comments to code while maintaining correct indentation.

import re

def add_comment_to_functions(prompt):
    # split code into lines
    lines = prompt.split('\n')
    modified_lines = []
    for line in lines:
        # check if the line contains a function definition
        match = re.match(r"def\s+(\w+)\s*\(", line)
        if match:
            function_name = match.group(1)
            # find the indentation level of the function definition
            indentation = len(line) - len(line.lstrip())
            modified_lines.append(f"{' ' * indentation}# Definizione della funzione {function_name}")
            modified_lines.append(line)
        else:
            # check if the line contains while, for, if
            if line.strip().startswith(('while', 'for', 'if')):
                indentation = len(line) - len(line.lstrip())
                if line.strip().startswith('if') and not line.strip().startswith('elif'):
                    modified_lines.append(f"{' ' * indentation}# Condizione if")
                elif line.strip().startswith('for'):
                    modified_lines.append(f"{' ' * indentation}# Ciclo for")
                elif line.strip().startswith('while'):
                    modified_lines.append(f"{' ' * indentation}# Ciclo while")
                modified_lines.append(line)
            else:
                modified_lines.append(line)
    modified_code = '\n'.join(modified_lines)
    return modified_code
Listing A2. Script for modifying names within the prompt field.

def replace_labels(examples):
    prompt_text = examples["prompt"]
    instruction_start = prompt_text.find("### Instruction:")
    instruction_end = prompt_text.find("\n\n###END")
    if instruction_start != -1 and instruction_end != -1:
        examples["prompt"] = prompt_text[instruction_start:instruction_end].replace(
            "Instruction", "Istruzione").replace("Input", "Ingresso").replace("Output", "Uscita")
    return examples
Listing A3. Function to find the number of trainable parameters.

def print_trainable_parameters(model):
    # prints the number of trainable parameters in the model
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}")
Listing A4. Freezing Weights for Fine-Tuning.

for name, param in model.named_parameters():
    param.requires_grad = False

# list of layers to unlock
layers_to_unfreeze = [21, 22, 23]
for name, param in model.named_parameters():
    if any(f"model.layers.{i}." in name for i in layers_to_unfreeze):
        param.requires_grad = True
model.model.norm.weight.requires_grad = True
model.lm_head.weight.requires_grad = True
Listing A5. Training Configuration for Fine-Tuning.

training_args = TrainingArguments(
    output_dir="./deepseek-coder-trained",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    save_steps=100,
    save_total_limit=1,
    gradient_accumulation_steps=8,
    seed=seed,
    evaluation_strategy="steps",
    eval_steps=100,
    eval_accumulation_steps=1,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    greater_is_better=False,
    logging_steps=100,
)
Listing A6. Adapter Configuration.

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
        "lm_head"],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
Listing A7. Training Configuration for the Adapter.

training_args = TrainingArguments(
    output_dir="./deepseek-coder-trainedadapter",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    save_steps=250,
    save_total_limit=1,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    eval_steps=250,
    eval_accumulation_steps=1,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    greater_is_better=False,
    learning_rate=2.5e-5,
    seed=seed,
)
References
- Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
- Github. DeepSeek Coder. Available online: https://deepseekcoder.github.io (accessed on 3 October 2024).
- Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
- Radford, A.; Narasimhan, K. Improving Language Understanding by Generative Pre-Training. OpenAI Report, 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 3 October 2024).
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
- OpenAI. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar]
- Dehaerne, E.; Dey, B.; Halder, S.; De Gendt, S.; Meert, W. Code Generation Using Machine Learning: A Systematic Review. IEEE Access 2022, 10, 82434–82455. [Google Scholar] [CrossRef]
- Yan, D.; Gao, Z.; Liu, Z. A Closer Look at Different Difficulty Levels Code Generation Abilities of ChatGPT. In Proceedings of the 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Luxembourg, 11–15 September 2023; pp. 1887–1898. [Google Scholar] [CrossRef]
- Zhang, X.; Jiang, Y.; Wang, Z. Analysis of Automatic Code Generation Tools based on Machine Learning. In Proceedings of the 2019 IEEE International Conference on Computer Science and Educational Informatization (CSEI), Kunming, China, 16–19 August 2019; pp. 263–270. [Google Scholar] [CrossRef]
- Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; Pinto, H.P.d.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374. [Google Scholar]
- Wang, Y.; Wang, W.; Joty, S.; Hoi, S.C. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online, Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 8696–8708. [Google Scholar] [CrossRef]
- Naik, P.; Nelaballi, S.; Pusuluri, V.; Kim, D.K. Deep Learning-Based Code Refactoring: A Review of Current Knowledge. J. Comput. Inf. Syst. 2022, 64, 314–328. [Google Scholar] [CrossRef]
- López Espejel, J.; Yahaya Alassan, M.S.; Chouham, E.M.; Dahhane, W.; Ettifouri, E.H. A comprehensive review of State-of-The-Art methods for Java code generation from Natural Language Text. Nat. Lang. Process. J. 2023, 3, 100013. [Google Scholar] [CrossRef]
- Shi, E.; Wang, Y.; Zhang, H.; Du, L.; Han, S.; Zhang, D.; Sun, H. Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond. In Proceedings of the ISSTA 2023—32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA, 17–21 July 2023; pp. 39–51. [Google Scholar] [CrossRef]
- Chi, K.; Li, C.; Ge, J.; Luo, B. An Empirical Study on Code Search Pre-trained Models: Academic Progresses vs. Industry Requirements. In Proceedings of the Internetware ’24—15th Asia-Pacific Symposium on Internetware, Macau, China, 24–26 July 2024; pp. 41–50. [Google Scholar] [CrossRef]
- Odeh, A.; Odeh, N.; Mohammed, A. A Comparative Review of AI Techniques for Automated Code Generation in Software Development: Advancements, Challenges, and Future Directions. TEM J. 2024, 13, 726–739. [Google Scholar] [CrossRef]
- DeepSeek. DeepSeek AI Ltd., Hangzhou, China. Available online: https://www.deepseek.com/ (accessed on 3 October 2024).
- Gao, J.; Heng, F.; Yuan, Y.; Liu, Y. A novel machine learning method for multiaxial fatigue life prediction: Improved adaptive neuro-fuzzy inference system. Int. J. Fatigue 2024, 178, 108007. [Google Scholar] [CrossRef]
- Gao, J.; Liu, Y.; Yuan, Y.; Heng, F. Residual Strength Modeling and Reliability Analysis of Wind Turbine Gear under Different Random Loadings. Mathematics 2023, 11, 4013. [Google Scholar] [CrossRef]
- Gao, J.X.; Heng, F.; Yuan, Y.P.; Liu, Y.Y. Fatigue Reliability Analysis of Composite Material Considering the Growth of Effective Stress and Critical Stiffness. Aerospace 2023, 10, 785. [Google Scholar] [CrossRef]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Human Language Technologies, Volume 1 (Long and Short Papers), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019. [Google Scholar] [CrossRef]
- Yu, Y.; Zuo, S.; Jiang, H.; Ren, W.; Zhao, T.; Zhang, C. Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach. In Human Language Technologies, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, Online, 6–11 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 1063–1077. [Google Scholar] [CrossRef]
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; de Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-Efficient Transfer Learning for NLP. arXiv 2019, arXiv:1902.00751. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Yun, J.; Kim, B.; Kim, J. Weight Decay Scheduling and Knowledge Distillation for Active Learning. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVI; Springer: Berlin/Heidelberg, Germany, 2020; pp. 431–447. [Google Scholar] [CrossRef]
- Vilares Ferro, M.; Doval, Y.; Ribadas Pena, F.; Darriba Bilbao, V. Early stopping by correlating online indicators in neural networks. Neural Netw. 2022, 159, 109–124. [Google Scholar] [CrossRef] [PubMed]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022. [Google Scholar]
- HuggingFace. PEFT Documentation: LoRa. Available online: https://huggingface.co/docs/peft/conceptual_guides/lora (accessed on 3 October 2024).
- HuggingFace. Datasets: Python Code Instruction 18k Alpaca. Available online: https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca (accessed on 3 October 2024).
- Song, X.; Cohn, T.; Specia, L. BLEU Deconstructed: Designing a Better MT Evaluation Metric. Int. J. Comput. Linguist. Appl. 2013, 4, 29–44. [Google Scholar]
- Barbella, M.; Tortora, G. ROUGE Metric Evaluation for Text Summarization Techniques. 2022. Available online: https://ssrn.com/abstract=4120317 (accessed on 3 October 2024).
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA; 2005; pp. 65–72. [Google Scholar]
- Jelinek, F.; Mercer, R.L.; Bahl, L.R.; Baker, J.K. Perplexity—A measure of the difficulty of speech recognition tasks. J. Acoust. Soc. Am. 1977, 62, S63. [Google Scholar] [CrossRef]
- Bochman, A. The Markov assumption: Formalization and impact. In Proceedings of the IJCAI ’13—Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; AAAI Press: Menlo Park, CA, USA, 2013; pp. 782–788. [Google Scholar]
Fine-tuning experiment: training and validation loss by evaluation step.

Step | Training Loss | Validation Loss |
---|---|---|
100 | 0.651200 | 0.537167 |
200 | 0.520300 | 0.511235 |
300 | 0.508900 | 0.498147 |
400 | 0.511700 | 0.491705 |
500 | 0.499100 | 0.486529 |
600 | 0.501200 | 0.483028 |
700 | 0.487700 | 0.478938 |
Adapter experiment: training and validation loss by evaluation step.

Step | Training Loss | Validation Loss |
---|---|---|
250 | No log | 0.510076 |
500 | 0.609700 | 0.475910 |
750 | 0.609700 | 0.468542 |
1000 | 0.487800 | 0.460246 |
1250 | 0.487800 | 0.456496 |
1500 | 0.480600 | 0.455638 |
1750 | 0.480600 | 0.452426 |
2000 | 0.463100 | 0.451217 |
2250 | 0.463100 | 0.450196 |
Comparison between classical fine-tuning and adapter tuning.

Method | Trainable Parameters (%) | Validation Loss Plateau (Steps) | Perplexity Pre-Train | Perplexity Post-Train |
---|---|---|---|---|
Fine-Tuning | 16.18% | 600 | 4.64 | 1.61 |
Adapter | 2.26% | 500 | 4.64 | 1.57 |