Extracting Audio from Image Using Machine Learning







Create Date 23/04/2024
Last Updated 23/04/2024


Description

Extracting Audio from Image Using Machine Learning

Mr. A. Balaji¹, Battula Naga Jahnavi², Dammavalam Kavya Lakshmi Naga Sri³, Borra Tirumala Teja⁴, Gaddam Tulasi Venkata Naga Rajani⁵

¹Assistant Professor, Department of Computer Science and Engineering, Tirumala Engineering College

²,³,⁴,⁵Student, Department of Computer Science and Engineering, Tirumala Engineering College

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - This study introduces a new method for extracting sound from images using machine learning. Multi-modal learning has attracted growing interest because it can combine information from different sources, such as images and sound. Our research uses the complementary properties of visual and auditory signals to predict audio content from an image, which opens up possibilities for improving accessibility, creating content, and building immersive user experiences. We first review previous research in multi-modal learning, audio-visual processing, and related tasks such as image captioning and sound source localization. Building on this background, we propose an approach that combines convolutional neural networks (CNNs) for image analysis with recurrent neural networks (RNNs) or transformers for sequence modeling. The system is trained on a dataset of paired images and audio tracks, which allows it to learn the relationships between visual and auditory data. We evaluate the proposed method with well-known metrics and compare it against other methods, showing that it extracts audio from images accurately and quickly. Qualitative analysis further shows that the model produces clear audio representations from a variety of visual inputs. We then discuss the findings, pointing out both the advantages and the drawbacks of our method, and identify areas for further study, such as more advanced architectures and the incorporation of semantic information to improve audio extraction. In summary, this study contributes to the growing field of multi-modal learning by presenting a promising model for extracting audio from images through machine learning. Our results highlight the potential of this technology to improve accessibility, inspire creativity, and increase user engagement in different fields.

Key Words: Audio Extraction, Machine Learning, Computer Vision, Deep Learning, Convolutional Neural Networks
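
The abstract describes a CNN image encoder feeding an RNN or transformer that models the audio sequence, but it does not give layer sizes, the audio representation, or the training setup. The following is only a minimal data-flow sketch under stated assumptions: a single convolutional layer with global average pooling stands in for the CNN, an Elman-style recurrent decoder stands in for the sequence model, and each output vector is assumed to represent one audio frame (for example, one spectrogram column). All function names, dimensions, and the randomly initialized weights are hypothetical, and training on paired images and audio is omitted entirely.

```python
import math
import random


def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation followed by ReLU (one CNN-style filter)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def encode_image(image, kernels):
    """Assumed encoder: one filter per kernel, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_relu(image, kernel)
        total = sum(sum(row) for row in fmap)
        features.append(total / (len(fmap) * len(fmap[0])))
    return features


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def decode_audio_frames(features, w_in, w_rec, w_out, num_frames):
    """Assumed decoder: an Elman-style RNN conditioned on the image features
    at every step; each hidden state is projected to one audio frame."""
    hidden = [0.0] * len(w_rec)
    conditioned = matvec(w_in, features)  # same image input at every step
    frames = []
    for _ in range(num_frames):
        recurrent = matvec(w_rec, hidden)
        hidden = [math.tanh(c + r) for c, r in zip(conditioned, recurrent)]
        frames.append(matvec(w_out, hidden))
    return frames


def random_matrix(rows, cols, rng, scale=0.5):
    """Untrained placeholder weights; a real system would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # placeholder 8x8 grayscale image
kernels = [random_matrix(3, 3, rng) for _ in range(4)]        # 4 hypothetical filters
w_in = random_matrix(6, 4, rng)    # image features (4) -> hidden (6)
w_rec = random_matrix(6, 6, rng)   # hidden -> hidden
w_out = random_matrix(5, 6, rng)   # hidden -> audio frame (5 bins, assumed)

features = encode_image(image, kernels)
frames = decode_audio_frames(features, w_in, w_rec, w_out, num_frames=10)
print(len(features), len(frames), len(frames[0]))  # prints "4 10 5"
```

With untrained weights the output frames carry no meaningful audio; the sketch only illustrates how a fixed-size image embedding can condition a sequence model that emits a variable number of audio frames.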


Why IJSREM?

Publication Time Period

Publication Procedure

Processing Fees

Follow Us

Working Hours

Contact Us


