Vision-Based Hand Gesture Recognition using Single Shot Detector and Deep Dilated Masks
Rahul R 1, Prof. K Sharath 2
1 Student, Department of MCA, Bangalore Institute of Technology, Karnataka, India
2 Professor, Department of MCA, Bangalore Institute of Technology, Karnataka, India
ABSTRACT:
As the global population grows, so does the need for state-of-the-art human-computer interaction technologies. Such technologies can improve people's lives by providing more satisfactory and efficient ways of interacting with their surroundings. Gesture-based technologies can be especially beneficial for impaired or disabled persons, since they provide a safer and more comfortable mode of interaction. The problem is inherently challenging because of the wide range of individual variation in how each gesture is performed. In this paper, we introduce an approach that fuses RGB and depth data for hand gesture recognition using deep learning techniques. Our approach combines the strengths of RGB video and depth information acquired by a Kinect sensor to extract a robust feature representation. First, we collect image data consisting of both RGB video and depth information. Hand gesture detection and tracking are then performed on the data streams with a single-shot detector convolutional neural network (SSD-CNN). During detection, a small convolutional kernel is applied at each position of an m × n feature layer, returning an output value that indicates the presence of a gesture. Each new feature layer can produce a set of gesture detection predictions using an array of convolutional filters. Deep dilation techniques improve the visibility and accuracy of gesture recognition: the resulting improvements in the image masks, which represent the gestures, make detection more accurate and robust. We believe that this approach is a step forward for gesture recognition and can lead to more intuitive and efficient human-computer interaction systems.
Keywords: RGB: Red, Green, and Blue; SSD: Single Shot Detector; CNN: Convolutional Neural Network; DDM: Deep Dilated Masks; RNN: Recurrent Neural Networks.
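The abstract describes a kernel evaluated at every position of an m × n feature layer, and the use of dilation. The following is a minimal sketch of that idea in plain Python, not the authors' network: the function name `score_map`, the single-channel input, the zero padding, and the all-ones kernel are assumptions made only for illustration. A real SSD head would use learned multi-channel filters and would also predict bounding boxes.

```python
def score_map(feature_map, kernel, dilation=1):
    """Apply a square, odd-sized kernel at every (row, col) of a 2-D map.

    Positions outside the map are treated as zero, so the output has the
    same m x n size as the input. With dilation > 1 the kernel taps are
    spaced `dilation` cells apart, which widens the receptive field
    without adding weights. (Hypothetical helper, for illustration only.)
    """
    m, n = len(feature_map), len(feature_map[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    y = i + (a - half) * dilation
                    x = j + (b - half) * dilation
                    if 0 <= y < m and 0 <= x < n:
                        total += kernel[a][b] * feature_map[y][x]
            out[i][j] = total
    return out


# Toy 5 x 5 feature map with a single active cell in the centre.
fm = [[0.0] * 5 for _ in range(5)]
fm[2][2] = 1.0
ones = [[1.0] * 3 for _ in range(3)]

dense = score_map(fm, ones, dilation=1)    # responds only near the centre
dilated = score_map(fm, ones, dilation=2)  # also responds at the corners
```

In this toy case the dilated kernel at the corner position (0, 0) still reaches the active centre cell, while the undilated kernel does not, which illustrates how dilation enlarges the area each prediction can see.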