Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT





Find us on Google Scholar

Peer Review Policy
Article Processing Charges
Publication Procedure
Research Topics
FAQ
Copyright Infringement
Refund and Cancellation Policy

Find us on Google Scholar

Peer Review Policy

Article Processing Charges

Publication Procedure

Research Topics

FAQ

Refund and Cancellation Policy

Version
Download 134
File Size 378.64 KB
File Count 1
Create Date 29/07/2025
Last Updated 29/07/2025

Download

Description

Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT

Author: DHANIMIREEDI GREESHMANTH (MCA student),

M. BALA NAGA BHUSHANAMU2 (Asst. Prof)

Department of Information Technology & Computer application, Andhra University College of Engineering, Visakhapatnam, AP.

Corresponding Author: Dhanimireedi Greeshmanth

(email-id:greeshmanthdhanimireddi@gmail.com)

ABSTRACT

In recent years, voice assistants have emerged as powerful tools for enabling human-machine interaction through natural spoken language. These systems, powered by advances in artificial intelligence and speech processing, offer users the convenience of hands-free control, instant information retrieval, and intelligent dialogue management. However, many existing voice assistants are highly dependent on cloud infrastructure and continuous internet access, limiting their functionality in rural or offline scenarios.

This project introduces VoiceBridge, a multi-modal AI-powered voice assistant that integrates OpenAI Whisper for speech-to-text conversion, gTTS (Google Text-to-Speech) for voice synthesis, and GPT-4o for intelligent conversational replies. The system is implemented using a Python Flask backend and a browser-based frontend, offering users a complete speech-driven interaction experience.

Unlike traditional assistants, VoiceBridge emphasizes modularity, privacy, and future support for offline capabilities. It serves as an efficient, scalable, and platform-independent solution for personalized AI communication. The assistant is capable of transcribing audio, generating text responses using GPT, and converting those responses into speech, creating a complete input-output cycle.

This paper presents the system architecture, functional modules, implementation workflow, and observed performance characteristics. The solution is intended for integration into educational, accessibility, and personal productivity applications with minimal resource consumption.

Keywords

Voice Assistant, Whisper, GPT-4o, Text-to-Speech, Flask, Artificial Intelligence, Speech Recognition, gTTS, Natural Language Processing, Conversational AI

Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT

Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT

Why IJSREM?

Publication Time Period

Publication Procedure

Processing Fee's

Follow Us

Working Hours

Contact Us

Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT

Voicebridge: An AI-Based Multi-Modal Voice Assistant Using Whisper, GTTS and GPT

What is DOI

Site Map

Frequently Asked Questions

Why IJSREM?

Publication Time Period

Publication Procedure

Processing Fee's

Follow Us

Working Hours

Contact Us