Top View Deep Learning Object Detection using Active Perception in Construction Environment

  • To increase situational awareness of the crane operator, the aim of this thesis is to develop a vision-based deep learning object detection from crane load-view using an adaptive perception in the construction area. Conventional worker detection methods are based on simple shape or color features from the worker's appearances. Nonetheless, these methods can fail to recognize the workers who do not wear the protective gears. To find out an image representation of the object from the top view manually or handcrafted feature is crucial. We, therefore, employed deep learning methods to automatically learn those features. To yield optimal results, deep learning methods require mass amount of data. Due to the data deficit especially in the construction domain, we developed the photorealistic world to create the data in addition to our samples collected from the real construction area. The simulated platform does not benefit only from diverse data types, but also concurrent research development which speeds up the pipeline at a low cost. Our research findings indicate that the combination of synthetic and real training samples improved the state-of-the-art detector. In line with previous studies to bridge the gap between synthetic and real data, the results of preprocessed synthetic images are substantially better than using the raw data by approximately 10%. Finding the right deep learning model for load-view detection is challenging. By investigating our training data, it becomes evident that the majority of bounding box sizes are very small with a complex background. In addition, we gave the priority to speed over accuracy based on the construction safety criteria. Finally, RetinaNet is chosen out of the three primary object detection models. Nevertheless, the data-driven detection algorithm can fail to handle scale invariance, especially for detectors whose input size is changed in an extremely wide range. The adaptive zoom feature can enhance the quality of the worker detection. To avoid further data gathering and extensive retraining, the proposed automatic zoom method of the load-view crane camera supports the deep learning algorithm, specifically in the high scale variant problem. The finite state machine is employed for control strategies to adapt the zoom level to cope not only with inconsistent detection but also abrupt camera movement during lifting operation. Consequently, the detector is able to detect a small size object by smooth continuous zoom control without additional training. The adaptive zoom control not only enhances the performance of the top-view object detection but also reduces the interaction of the crane operator with camera system, reducing the risk of fatality during load lifting operation.
Metadaten
Author:Tanittha SutjaritvorakulORCiD
URN:urn:nbn:de:hbz:386-kluedo-80264
DOI:https://doi.org/10.26204/KLUEDO/8026
Advisor:Karsten BernsORCiD
Document Type:Doctoral Thesis
Cumulative document:No
Language of publication:English
Date of Publication (online):2024/04/15
Year of first Publication:2024
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2024/03/06
Date of the Publication (Server):2024/04/17
Page Number:VIII, 181
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
DDC-Cassification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence (German):Creative Commons 4.0 - Namensnennung, nicht kommerziell (CC BY-NC 4.0)