User-Centric Voice Cloning Platform for Scalable Audiobook Narration






Create Date 26/07/2025
Last Updated 26/07/2025


Sharath S¹

¹M.Tech in Software Engineering, Dept. of Information Science & Engineering, R.V. College of Engineering, Bangalore, India.
Email: sharaths0901@gmail.com

Dr. Vanishree K²

²Associate Professor, Dept. of Information Science & Engineering, R.V. College of Engineering, Bangalore, India.
Email: vanishreek@rvce.edu.in

Abstract: Personalized speech synthesis is emerging as a transformative technology in human–computer interaction, particularly for audiobook narration. This paper presents a deep learning-based voice cloning system that generates speaker-specific, expressive speech using neural text-to-speech (TTS) techniques. The proposed system integrates XTTS v2 for high-fidelity, multilingual synthesis with a pre-trained speaker encoder that extracts voice characteristics from short user-provided samples. The pipeline runs fully offline, enabling private, real-time inference without internet connectivity. Given a text input, the system produces speech that closely mimics the target speaker’s vocal timbre and prosody. It also allows control over pitch, speed, and expressiveness, supporting personalized narration styles. A Streamlit-based graphical interface lets users upload voice samples, enter text, play back audio in real time, view the waveform, and download the result. The modular design supports multiple speaker presets and can be extended to emotion-aware synthesis and multi-speaker narration. Experimental results, validated through subjective listening tests and waveform analyses, show that the system consistently generates intelligible, natural-sounding speech. The solution demonstrates the feasibility of secure, offline voice cloning for personalized audiobook creation. Future work will focus on improving speaker-embedding fidelity, emotional control, and deployment on mobile and edge platforms.

Key Words: Voice Cloning, XTTS v2, Neural Text-to-Speech, Speaker Embedding, Deep Learning, Personalized Speech Synthesis, Local Inference, Audio Generation, Speech Processing.
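
The core step described in the abstract, cloning a voice from a short reference sample and narrating a given text with XTTS v2, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the open-source Coqui TTS Python package as the XTTS v2 runtime, which the abstract does not name. The NarrationRequest class, the validate and synthesize helpers, and the file names are hypothetical. Pitch and expressiveness control and the Streamlit interface are left out.

```python
from dataclasses import dataclass
from pathlib import Path

# Model identifier used by the Coqui TTS package for XTTS v2.
# Assumption: the paper does not say which XTTS v2 runtime it uses.
XTTS_V2_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


@dataclass(frozen=True)
class NarrationRequest:
    """Hypothetical container for one cloning job (not from the paper)."""
    text: str
    speaker_wav: str            # path to the short reference recording
    language: str = "en"
    out_path: str = "narration.wav"


def validate(request: NarrationRequest) -> NarrationRequest:
    """Reject obviously unusable input before loading the large model."""
    if not request.text.strip():
        raise ValueError("text must not be empty")
    if Path(request.speaker_wav).suffix.lower() != ".wav":
        raise ValueError("speaker sample is expected to be a .wav file")
    return request


def synthesize(request: NarrationRequest) -> str:
    """Clone the reference voice and write the narration to disk."""
    # Imported lazily so the helpers above work without the TTS package.
    from TTS.api import TTS

    validate(request)
    # The weights are fetched on first use and cached locally afterwards,
    # so later runs can work without a network connection.
    tts = TTS(model_name=XTTS_V2_MODEL)
    tts.tts_to_file(
        text=request.text,
        speaker_wav=request.speaker_wav,
        language=request.language,
        file_path=request.out_path,
    )
    return request.out_path
```

With the package installed, a call such as synthesize(NarrationRequest(text="Chapter one.", speaker_wav="my_voice.wav")) would write narration.wav. In a setup like the one the abstract describes, a front end would collect the text and the voice sample and then pass them to a function of this kind.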


