“Sound Snap – AI-Powered Audio-to-MIDI Conversion Platform”
Darshan Jibhau Thakare, Raj Bharat Nemade, Tanmay Prashant Mohan, Jayesh Suresh Patil, Ms. S. S. Pathare
1darshanthakare90@gmail.com, Student of Diploma of Engineering, IT, K. K. Wagh Polytechnic, Nashik, India
2rajnemade000@gmail.com, Student of Diploma of Engineering, IT, K. K. Wagh Polytechnic, Nashik, India
3tanmaymohan70@gmail.com, Student of Diploma of Engineering, IT, K. K. Wagh Polytechnic, Nashik, India
4 8030jayeshpatil@gmail.com, Student of Diploma of Engineering, IT, K. K. Wagh Polytechnic, Nashik, India
5sspathare@kkwagh.edu.in, Student of Diploma of Engineering, IT, K. K. Wagh Polytechnic, Nashik, India
Abstract - Currently, audio-to-MIDI conversion is often performed using complex Digital Audio Workstations (DAWs) or offline transcription tools that require manual processing, advanced technical knowledge, and significant computing resources. While several software solutions provide audio transcription capabilities, many lack real-time feedback, web accessibility, and seamless playback integration. Additionally, traditional workflows involve multiple steps such as exporting audio, running conversion software separately, manually importing MIDI files into players, and configuring instrument packs. These processes are time-consuming, error-prone, and not user-friendly for beginners or musicians seeking quick results.
The proposed system, Audio to MIDI Studio, offers a modern and automated web-based solution for converting audio recordings into MIDI files using Spotify’s Basic Pitch model. The application enables users to record audio directly in the browser or upload existing audio files. The backend processes the audio using an asynchronous job pipeline built with FastAPI, where the audio is preprocessed, normalized, and transcribed into MIDI format.
The system provides real-time job status updates using WebSocket communication, ensuring a responsive user experience. Once transcription is complete, the generated MIDI file can be played directly in the browser using html-midi-player, with selectable SoundFont instrument packs. Users can also download the MIDI file for further editing or production use.
This automated workflow reduces manual effort, eliminates dependency on complex desktop tools, and provides a fast, secure, and efficient method for audio-to-MIDI transcription. By integrating modern web technologies with machine learning-based transcription, Audio to MIDI Studio delivers a production-ready, scalable, and user-friendly solution for musicians, educators, and developers.
Key Words: Audio to MIDI conversion, Basic Pitch, FastAPI, WebSocket, MIDI playback, real-time transcription, web-based music processing.
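The asynchronous job pipeline summarized in the abstract (preprocess, normalize, transcribe, with status updates pushed to the client) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the system's actual implementation: the `Job` class, the stage names, `peak_normalize`, and `fake_transcribe` are hypothetical names introduced for illustration, an `asyncio.Queue` stands in for the WebSocket channel, and the placeholder transcriber stands in for the Basic Pitch model call.

```python
import asyncio
from dataclasses import dataclass, field


def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak`."""
    top = max((abs(s) for s in samples), default=0.0)
    if top == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [s * peak / top for s in samples]


@dataclass
class Job:
    """Hypothetical job record; `updates` stands in for a WebSocket channel."""
    job_id: str
    status: str = "queued"
    updates: asyncio.Queue = field(default_factory=asyncio.Queue)

    async def set_status(self, status):
        self.status = status
        await self.updates.put(status)  # a real server would push this to the client


def fake_transcribe(samples):
    # Placeholder for the model call (assumption): returns (pitch, start, end) notes.
    return [("C4", 0.0, 0.5)]


async def run_job(job, samples, transcribe):
    """Run one job through the stages, publishing a status update at each step."""
    await job.set_status("preprocessing")
    normalized = peak_normalize(samples)
    await job.set_status("transcribing")
    # Run the blocking transcription in a worker thread so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    notes = await loop.run_in_executor(None, transcribe, normalized)
    await job.set_status("completed")
    return notes


async def demo():
    job = Job("job-1")
    notes = await run_job(job, [0.1, -0.5, 0.25], fake_transcribe)
    seen = []
    while not job.updates.empty():
        seen.append(job.updates.get_nowait())
    return seen, notes
```

Calling `asyncio.run(demo())` returns the list of published status updates together with the placeholder notes. Running the transcription step in an executor is one way to keep status updates flowing while a long model call is in progress; other designs, such as a separate worker process, would serve the same purpose.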