Deep Learning-based 3D Hand Pose and Shape Estimation from a Single Depth Image: Methods, Datasets and Application

  • 3D hand pose and shape estimation from a single depth image is a challenging computer vision and graphics problem with many applications, such as human-computer interaction and animation of a personalized hand shape in augmented reality (AR). The problem is challenging due to several factors, for instance high degrees of freedom, viewpoint variations and varying hand shapes. Hybrid approaches based on deep learning followed by model fitting preserve the structure of the hand. However, a pre-calibrated hand model limits the generalization of these approaches. To address this limitation, we proposed a novel hybrid algorithm for simultaneous estimation of the 3D hand pose and the bone lengths of a hand model, which allows training on datasets that contain varying hand shapes. Direct joint regression methods, in contrast, achieve high accuracy but do not incorporate the structure of the hand in the learning process. Therefore, we introduced a novel structure-aware algorithm which learns to estimate the 3D hand pose jointly with new structural constraints. These constraints include finger lengths, distances of joints along the kinematic chain and inter-finger distances. Learning these constraints helps to maintain a structural relation between the estimated joint keypoints. Previous methods addressed only the problem of 3D hand pose estimation. We opened a new research topic and proposed the first deep network which jointly estimates 3D hand shape and pose from a single depth image. Manually annotating real data for shape is laborious and sub-optimal. Hence, we created a million-scale synthetic dataset with accurate joint annotations and mesh files for the depth maps. However, the performance of this deep network is restricted by the limited representation capacity of the hand model. Therefore, we proposed a novel regression-based approach in which the dense 3D hand mesh is recovered from the sparse 3D hand pose, and weak supervision is provided by a depth image synthesizer.
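The three structural constraints named above (finger lengths, distances of joints along the kinematic chain, and inter-finger distances) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the 21-joint layout, the joint indices, the exact definition of each term, and the names `FINGERS`, `chain_distances`, `finger_length`, `inter_finger_distances` and `structure_loss` are all assumptions made for illustration.

```python
import math

# Hypothetical 21-joint layout (an assumption, not taken from the thesis):
# index 0 is the wrist, and each finger owns four consecutive joints
# ordered from base to tip.
FINGERS = {
    "thumb": [1, 2, 3, 4],
    "index": [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring": [13, 14, 15, 16],
    "little": [17, 18, 19, 20],
}
WRIST = 0


def chain_distances(joints, finger):
    """Distances between consecutive joints from the wrist to the fingertip."""
    chain = [WRIST] + FINGERS[finger]
    return [math.dist(joints[a], joints[b]) for a, b in zip(chain, chain[1:])]


def finger_length(joints, finger):
    """Finger length, taken here as the summed bone lengths from base to tip."""
    ids = FINGERS[finger]
    return sum(math.dist(joints[a], joints[b]) for a, b in zip(ids, ids[1:]))


def inter_finger_distances(joints):
    """Distances between corresponding joints of neighbouring fingers."""
    names = list(FINGERS)
    return {
        (f, g): [
            math.dist(joints[a], joints[b])
            for a, b in zip(FINGERS[f], FINGERS[g])
        ]
        for f, g in zip(names, names[1:])
    }


def structure_loss(pred, target):
    """Mean squared error between the structural terms of two poses."""

    def terms(joints):
        values = []
        for name in FINGERS:
            values.append(finger_length(joints, name))
            values.extend(chain_distances(joints, name))
        for dists in inter_finger_distances(joints).values():
            values.extend(dists)
        return values

    p, t = terms(pred), terms(target)
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
```

In a setup like this, a term such as `structure_loss` would typically be added to the ordinary joint-position loss, so that a prediction with plausible joint positions but implausible bone proportions is still penalized.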
The above-mentioned approaches regressed 3D hand meshes from 2D depth images via 2D convolutional neural networks, which leads to artefacts in the estimates due to perspective distortions in the images. To overcome this limitation, we proposed a novel voxel-based deep network with 3D convolutions, trained in a weakly-supervised manner. Finally, we present an application: in-air signature acquisition and verification based on deep hand pose estimation. Experiments showed that depth itself is an important feature, which is sufficient for verification.
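To make the voxel-based representation mentioned above more concrete, the following Python sketch back-projects a depth map into 3D camera space and fills a binary occupancy grid, which is one common input format for 3D convolutions. It is a sketch under stated assumptions, not the network input used in the thesis: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the convention that zero marks missing depth, the occupancy encoding, and the function names `back_project` and `voxelize` are all hypothetical.

```python
import math


def back_project(depth, fx, fy, cx, cy):
    """Lift each valid depth pixel (u, v, z) to a 3D camera-space point.

    Assumes a pinhole camera; zero marks missing depth in this sketch.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxelize(points, center, cube_size, resolution):
    """Binary occupancy grid over a cube of side `cube_size` around `center`."""
    grid = [
        [[0] * resolution for _ in range(resolution)]
        for _ in range(resolution)
    ]
    half = cube_size / 2.0
    for point in points:
        # Map each coordinate from [c - half, c + half) to a voxel index.
        idx = [
            math.floor((value - (c - half)) / cube_size * resolution)
            for value, c in zip(point, center)
        ]
        # Points outside the cube are discarded.
        if all(0 <= i < resolution for i in idx):
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid
```

Because the grid is built from metric 3D points rather than from pixel coordinates, the same hand occupies the same voxels regardless of where it appears in the image, which is the motivation for moving away from 2D convolutions on the raw depth image.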

Author:Muhammad Jameel Nawaz Malik
Advisor:Didier Stricker
Document Type:Doctoral Thesis
Language of publication:English
Publication Date:2020/11/20
Year of Publication:2020
Publishing Institute:Technische Universität Kaiserslautern
Granting Institute:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2020/11/11
Date of the Publication (Server):2020/11/18
Tag:hand pose, hand shape, depth image, convolutional neural networks
Number of pages:IX, 157
Faculties / Organisational entities:Fachbereich Informatik
CCS-Classification (computer science):I. Computing Methodologies / I.4 IMAGE PROCESSING AND COMPUTER VISION (REVISED)
DDC-Classification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence:Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)