GestOS: A Real-Time Multimodal Hand Gesture and Voice-Based Human-Computer Interaction System
Om Bande1, Swapnil Dhagdi2, Vighnesh Belka3, Omprakash Bedage4, Prof. M.S. Bhosale5
1 Department of Information Technology, Sinhgad College of Engineering, Pune-41
2 Department of Information Technology, Sinhgad College of Engineering, Pune-41
3 Department of Information Technology, Sinhgad College of Engineering, Pune-41
4 Department of Information Technology, Sinhgad College of Engineering, Pune-41
5 Department of Information Technology, Sinhgad College of Engineering, Pune-41
Abstract - Hand gesture recognition and voice-based interaction are emerging as natural alternatives to traditional input devices in human-computer interaction (HCI). However, many existing systems suffer from high latency, scale poorly, or depend on a single interaction modality. This paper presents GestOS, a real-time multimodal HCI system that combines hand gesture recognition and voice command processing to control a computer without physical input devices. The system uses MediaPipe for efficient hand landmark detection and applies rule-based gesture classification to the detected landmarks to keep latency low. In parallel, a grammar-constrained voice recognition module built on the Windows Speech API restricts recognition to a fixed command set, which keeps accuracy high and false positives few. Because Python's Global Interpreter Lock (GIL) prevents threads within one process from executing Python code in parallel, the system adopts a multiprocessing architecture in which the vision, voice, and command-processing modules run as separate processes and execute concurrently. Experimental evaluation shows that the system responds in real time with low latency and recognizes gestures reliably in controlled settings. The approach offers a cost-effective, scalable, and practical solution for next-generation HCI applications such as accessibility systems, smart environments, and touchless computing interfaces.
Key Words: Hand Gesture Recognition, Human-Computer Interaction (HCI), MediaPipe, Multimodal Interaction, Voice Recognition, Real-Time Systems, Computer Vision, Gesture-Based Control, Speech Interface, Touchless Interaction, Human-Machine Interface, Assistive Technology, OpenCV, Multiprocessing Architecture, Low-Latency Systems, AI-Based Interaction, Natural User Interfaces (NUI).