TY - THES
A1 - Malik, Muhammad Jameel Nawaz
T1 - Deep Learning-based 3D Hand Pose and Shape Estimation from a Single Depth Image: Methods, Datasets and Application
N2 - 3D hand pose and shape estimation from a single depth image is a challenging computer vision and graphics problem with many applications such as
human-computer interaction and animation of a personalized hand shape in
augmented reality (AR). This problem is challenging due to several factors,
for instance high degrees of freedom, viewpoint variations and varying hand
shapes. Hybrid approaches based on deep learning followed by model fitting
preserve the structure of the hand. However, a pre-calibrated hand model limits
the generalization of these approaches. To address this limitation, we proposed a novel hybrid algorithm for simultaneous estimation of 3D hand pose
and bone lengths of a hand model, which allows training on datasets that contain varying hand shapes. On the other hand, direct joint regression methods
achieve high accuracy but they do not incorporate the structure of the hand in
the learning process. Therefore, we introduced a novel structure-aware algorithm which learns to estimate 3D hand pose jointly with new structural constraints. These constraints include finger lengths, distances of joints along
the kinematic chain and inter-finger distances. Learning these constraints
helps to maintain a structural relation between the estimated joint keypoints.
Previous methods addressed only the problem of 3D hand pose estimation. We
opened a new research topic and proposed the first deep network which jointly
estimates 3D hand shape and pose from a single depth image. Manually annotating real data for shape is laborious and sub-optimal. Hence, we created a
million-scale synthetic dataset with accurate joint annotations and mesh files
of depth maps. However, the performance of this deep network is restricted by
the limited representation capacity of the hand model. Therefore, we proposed a
novel regression-based approach in which the 3D dense hand mesh is recovered
from a sparse 3D hand pose, and weak supervision is provided by a depth image synthesizer. The above-mentioned approaches regressed 3D hand meshes
from 2D depth images via 2D convolutional neural networks, which leads to
artefacts in the estimations due to perspective distortions in the images. To
overcome this limitation, we proposed a novel voxel-based deep network with
3D convolutions trained in a weakly supervised manner. Finally, we present an
interesting application, namely in-air signature acquisition and verification
based on deep hand pose estimation. Experiments showed that depth itself is
an important feature, which is sufficient for verification.
KW - hand pose
KW - hand shape
KW - depth image
KW - convolutional neural networks
Y1 - 2020
UR - https://kluedo.ub.uni-kl.de/frontdoor/index/index/docId/6145
UR - https://nbn-resolving.org/urn:nbn:de:hbz:386-kluedo-61455
ER -