LapiDude: Biometric-Authenticated Multilingual Framework for Secure Desktop Automation
RANGANI HIMABINDU, 1
SHAIK ROOHI REHANA BEGUM,2
SINGAMSETTY TARUNI,3
VALLA BHARGAV,4
PONUGUPATI PRATAP,5
PULIPATI SYAMBABU6
1 Assistant Professor, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India
2,3,4,5,6 Student, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India.
Abstract— Over the years, voice-based interaction has become an important interface paradigm in modern computing systems. While voice interaction is now widespread on mobile devices and smart assistants, desktop environments still depend largely on manual input via keyboards and pointing devices.
Additionally, most desktop automation tools do not verify that the user issuing commands is who they claim to be, which raises security concerns when system-level operations are executed through automated processes. A further limitation of current systems is the lack of multilingual support: the majority of available voice interaction frameworks are designed only for English-speaking users.
This work proposes LapiDude, a new biometric-authenticated multilingual framework for secure voice-driven desktop automation. It combines speaker verification with multilingual command recognition, allowing users to control desktop applications through natural spoken phrases while ensuring that only authorized speakers can issue commands. Speaker verification uses deep speaker embeddings from a pretrained voice representation model, matching incoming voice samples against stored administrator voiceprints via cosine similarity to authenticate identity.
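The verification step described above can be sketched as follows. This is a minimal illustration only: the function names, the list-of-voiceprints interface, and the use of the 0.65 figure as an acceptance threshold are assumptions for demonstration, and a real system would compute the embeddings with a pretrained speaker model rather than receive them as plain lists.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(sample_emb, enrolled_embs, threshold=0.65):
    """Accept the speaker only if the incoming sample matches at least
    one enrolled administrator voiceprint with similarity >= threshold.
    Returns (accepted, best_score)."""
    best = max(cosine_similarity(sample_emb, e) for e in enrolled_embs)
    return best >= threshold, best
```

Taking the maximum over all enrolled voiceprints lets a single administrator enroll several samples, which makes the check more robust to recording conditions.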
After successful authentication, the spoken command is transcribed to text and passed through a multilingual keyword-based intent mapping mechanism that understands commands in English, Hindi, and Telugu. Recognized commands are then mapped to the corresponding desktop automation functions (e.g., launching applications, controlling processes, and handling windows).
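A keyword-based intent mapper of this kind can be sketched as a lookup table of trigger phrases followed by a dispatcher. The intents, the romanized Hindi/Telugu phrases, and the action strings below are hypothetical placeholders, not the paper's actual command set; a real dispatcher would call OS automation APIs instead of returning labels.

```python
# Hypothetical multilingual keyword table: each intent lists trigger
# phrases in English plus romanized Hindi and Telugu (illustrative only).
INTENT_KEYWORDS = {
    "open_browser": ["open browser", "browser kholo", "browser teruvu"],
    "close_window": ["close window", "window band karo", "window mooseyi"],
}

def map_intent(transcript):
    """Return the first intent whose trigger phrase occurs in the text,
    or None if the command is unrecognized."""
    text = transcript.lower()
    for intent, phrases in INTENT_KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None

def execute(intent):
    """Dispatch a recognized intent to a desktop automation action.
    Here actions are stub labels standing in for real OS calls."""
    actions = {
        "open_browser": "launching browser",
        "close_window": "closing active window",
    }
    return actions.get(intent, "no action")
```

Because matching is substring-based, the same intent fires regardless of which supported language the phrase was spoken in, which is what makes the keyword approach simple to extend to additional languages.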
The framework runs entirely on local hardware, which improves privacy and reduces response latency. Experimental evaluation shows secure recognition (average similarity < 0.65) with average command execution latency under 200 ms; these results demonstrate that the proposed framework is an efficient and secure mechanism for multilingual voice-based desktop automation.
Key Words - Voice Biometrics, Desktop Automation, Multilingual Voice Commands, Human–Computer Interaction, Speaker Authentication, Pattern Recognition