Automated Function Level Python Code Summarization Using a Transformer Based Model





Create Date 26/07/2025
Last Updated 26/07/2025


Authors: Yaswanth Kharidu¹ (MCA student), Ambati Tulasi² (Assistant Professor, Ad-hoc)

¹ Department of IT & CA, ² Department of CS & SE,

Andhra University College of Engineering, Visakhapatnam, AP.

Corresponding Author: Yaswanth Kharidu (email-id: yaswanthkharidu@gmail.com)

Abstract - This work presents a transformer-based method for generating function-level summaries of Python code, trained on synthetically generated data. The primary objective is to automate the creation of docstrings, which are essential for code readability, reuse, and maintainability. Existing datasets for code summarization are either scarce or noisy, which limits the performance and generalizability of data-driven models. To address this challenge, we designed a pipeline that synthetically generates a dataset of Python functions paired with human-readable summaries that mimic real-world documentation patterns. We fine-tune the CodeT5-small transformer model in a sequence-to-sequence (seq2seq) learning framework to perform the summarization task. The dataset is preprocessed to remove noise and normalize formatting, and is then tokenized into inputs suitable for the model. Training is conducted over multiple epochs, during which the model progressively improves its mapping from code to natural-language descriptions. Evaluation combines automated metrics (BLEU, ROUGE-1, ROUGE-2, ROUGE-L, and Exact Match) with human evaluation scores from manual inspection to assess the quality and coherence of the generated summaries. The results show consistent improvements in accuracy, with occasional fluctuations typical of realistic model behavior. To improve accessibility and usability, we developed a lightweight Streamlit web application that lets users enter custom Python code and receive automatically generated docstrings.

Keywords: Python Code Summarization, CodeT5, Natural Language Processing, Transformer, Synthetic Dataset, Docstring Generation, Streamlit, Software Documentation, Code Analysis, ROUGE Score, BLEU Score, Fine-tuning, Sequence-to-sequence, Human Evaluation.
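
To make the automated metrics named in the abstract concrete, the following is a minimal sketch of Exact Match, ROUGE-1, and ROUGE-L for a single generated summary against its reference. It is an illustration under stated assumptions, not the evaluation code used in this work: tokenization is plain lowercased whitespace splitting, the ROUGE scores are reported as F1, and the helper names (tokenize, exact_match, rouge_1, rouge_l) and the example sentences are hypothetical. BLEU and ROUGE-2 are omitted for brevity.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumed; real ROUGE tokenizers differ)."""
    return text.lower().split()


def exact_match(prediction, reference):
    """True when both summaries yield the same token sequence."""
    return tokenize(prediction) == tokenize(reference)


def f1(overlap, pred_len, ref_len):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or pred_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(prediction, reference):
    """Unigram-overlap F1, with each token counted at most as often as it
    appears in both summaries."""
    pred, ref = tokenize(prediction), tokenize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return f1(overlap, len(pred), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """F1 based on the longest common subsequence of tokens."""
    pred, ref = tokenize(prediction), tokenize(reference)
    return f1(lcs_length(pred, ref), len(pred), len(ref))


# Hypothetical reference docstring and model output, for illustration only.
reference = "Return the sum of two numbers."
prediction = "Return the sum of the two values."

print("Exact Match:", exact_match(prediction, reference))
print("ROUGE-1 F1:", round(rouge_1(prediction, reference), 3))
print("ROUGE-L F1:", round(rouge_l(prediction, reference), 3))
```

Exact Match is strict and rewards only identical output, whereas the ROUGE scores give partial credit for shared words and shared word order. This is one reason to report the metrics together and to complement them with human evaluation.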
