IRIS: GESTURE NAVIGATION CONTROL
Aswin R∗1, Bhavya S Kumar∗2, Boomika S∗3, Thomas Jacob∗4, Varsha Varghese∗5, Linda Sebastian∗6
College of Engineering Kidangoor, Kottayam, Kerala, India
aswinrjuly2004@gmail.com∗1, bhavyaskumar21@gmail.com∗2, boomika_b22118cse_b@ce-kgr.org∗3, thomasjacobtj2003@gmail.com∗4, varsha_b22104cse_b@ce-kgr.org∗5, lindasebastian@ce-kgr.org∗6
ABSTRACT: This project presents a real-time multi-modal human-computer interaction system that integrates gesture-based cursor control with voice-to-text conversion. The objective is to provide an intuitive, touch-free interface for navigating on-screen elements while enabling accurate speech transcription and translation for commands or dictation. The system operates using only a device's built-in camera and microphone, requiring no external hardware. For gesture navigation, the camera continuously captures live video frames of the user's hand. These frames are pre-processed for noise reduction and color normalization, then analyzed by a machine-learning pipeline. Using computer-vision frameworks such as MediaPipe Hands together with deep neural networks, the system detects and tracks key hand landmarks in real time. Recognized gestures, such as pointer movement, click, drag, or scroll, are mapped to operating-system events through coordinate-mapping and motion-smoothing algorithms, delivering stable and responsive cursor control. For voice input, audio signals are cleaned and normalized before being processed by a lightweight speech-to-text engine, which outputs editable digital text. This feature supports tasks such as composing messages, executing commands, and controlling applications through natural voice input. By combining these two input modes, the system provides a fully contactless user experience suited to accessibility applications. Built with Python and open-source libraries such as TensorFlow, OpenCV, and PyAutoGUI, the architecture remains scalable and cross-platform, running efficiently on laptops and low-power edge devices.
INDEX TERMS: Human–Computer Interaction (HCI), Gesture Recognition, Voice Recognition, Multimodal Interaction, Hand Tracking, MediaPipe, Computer Vision, Speech-to-Text, Cursor Control, Touchless Interface, Real-Time Systems, OpenCV.
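The coordinate-mapping and motion-smoothing step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes hand landmarks arrive as (x, y) pairs normalized to [0, 1] (as MediaPipe Hands reports them), and the screen resolution and smoothing factor are hypothetical values chosen for the example.

```python
# Sketch: map normalized hand-landmark coordinates to screen pixels and
# apply exponential smoothing to suppress hand jitter before moving the
# cursor. Screen size and ALPHA are illustrative assumptions.

SCREEN_W, SCREEN_H = 1920, 1080  # assumed display resolution
ALPHA = 0.3                      # smoothing factor: lower = smoother but laggier

def map_to_screen(norm_x, norm_y):
    """Map a [0, 1]-normalized landmark coordinate to screen pixels."""
    return norm_x * SCREEN_W, norm_y * SCREEN_H

def smooth(prev, target, alpha=ALPHA):
    """Exponential moving average of successive cursor positions."""
    return prev + alpha * (target - prev)

def track(norm_points):
    """Run the map-and-smooth pipeline over a stream of landmark points,
    returning the smoothed cursor path."""
    sx, sy = map_to_screen(*norm_points[0])
    path = [(sx, sy)]
    for nx, ny in norm_points[1:]:
        tx, ty = map_to_screen(nx, ny)
        sx, sy = smooth(sx, tx), smooth(sy, ty)
        path.append((sx, sy))
    return path
```

In a live system, each smoothed position would be forwarded to an OS-event layer such as `pyautogui.moveTo(sx, sy)` on every camera frame.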