Master's Thesis: Bridging the Domain-Gap in Computer Vision Tasks

Through this work, Philips Research explored the feasibility of an augmented-reality-driven interactive user manual for Philips consumer products. It was investigated whether a mobile app can be developed for consumers that estimates the pose of complex, texture-poor Philips consumer products from the device's camera feed using state-of-the-art deep-learning computer-vision algorithms. The estimated object pose can then be used to augment the camera feed with virtual 3D cues.

Training such deep-learning computer-vision models requires large amounts of annotated data. Since acquiring real training data for computer-vision algorithms is a resource-demanding (and sometimes impossible) task, the focus was shifted to training models using only synthetic (computer-generated) image data. A tool called Philips Synthetica was developed, which can generate annotated images based on a computer-aided-design (CAD) model of an object. The properties of synthetic data can be leveraged to create a theoretically unlimited amount of statistically unbiased training data, shaped entirely by the data engineer.
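The abstract does not include Synthetica's source. As a minimal, hypothetical sketch only, a CAD-based generation loop with randomized scene parameters (the idea behind the domain-randomization technique discussed below) could look as follows; all names, parameter ranges, and the `renderer` interface are assumptions for illustration, not the actual tool:

```python
import random

# Hypothetical parameter ranges; the real Philips Synthetica tool and
# its configuration are not described in this abstract.
LIGHT_INTENSITY = (0.2, 3.0)   # arbitrary light-energy units
CAMERA_DISTANCE = (0.4, 1.5)   # metres between camera and object
NUM_DISTRACTORS = (0, 8)       # random clutter objects per scene

def sample_scene_params():
    """Draw one randomized scene configuration for a single render."""
    return {
        "light_intensity": random.uniform(*LIGHT_INTENSITY),
        "light_azimuth_deg": random.uniform(0.0, 360.0),
        "camera_distance": random.uniform(*CAMERA_DISTANCE),
        "camera_elevation_deg": random.uniform(-10.0, 80.0),
        "background_id": random.randrange(1000),  # random backdrop texture
        "num_distractors": random.randint(*NUM_DISTRACTORS),
    }

def generate_dataset(renderer, cad_model, n_images):
    """Render `n_images` annotated samples from one CAD model.

    `renderer` is a stand-in for whatever engine produces an image plus
    its ground-truth annotations (keypoints, bounding box, 6-DoF pose).
    """
    samples = []
    for _ in range(n_images):
        image, annotations = renderer.render(cad_model, **sample_scene_params())
        samples.append((image, annotations))
    return samples
```

Because the renderer knows the exact object and camera poses, every annotation comes for free and is pixel-perfect, which is the main advantage over hand-labelling real photographs.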
However, computer-vision models trained on synthetic data alone are confronted with the domain-gap (or reality-gap) problem: a discrepancy between the source domain (synthetic training data) and the target domain (real test data). This research investigates two techniques for bridging this domain gap: domain randomization and generative adversarial data enhancement using CycleGANs. We show, on several computer-vision architectures (Convolutional Pose Machines and YOLOv3) and computer-vision tasks (keypoint prediction, and object localization and classification), that models trained with our GAN-enhanced data outperform models trained with the original data. Moreover, we conclude that domain-randomized data is beneficial for task performance in the real domain, especially when combined with (semi-)photo-realistic synthetic data. Finally, a proof-of-concept mobile iOS app for the Philips use case is presented, which utilizes the best-performing pose-estimation model from the experiments.
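As a rough illustration of the CycleGAN enhancement step mentioned above, the sketch below runs synthetic renders through a trained synthetic-to-real generator before they are used for training. The checkpoint name, export format, and 256x256 input size are assumptions for illustration, not details taken from the thesis:

```python
import torch
from torchvision import transforms
from PIL import Image

# Hypothetical checkpoint of the synthetic-to-real generator G, exported
# with torch.jit.save; the actual training setup is part of the thesis.
GENERATOR_PATH = "cyclegan_G_synth2real.pt"

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # scale to [-1, 1]
])

def enhance(generator, image_path):
    """Translate one synthetic render towards the real-image domain."""
    x = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        y = generator(x)                            # generator output in [-1, 1]
    y = (y.squeeze(0) * 0.5 + 0.5).clamp(0.0, 1.0)  # back to [0, 1]
    return transforms.ToPILImage()(y)

generator = torch.jit.load(GENERATOR_PATH).eval()
enhance(generator, "synthetic_render_0001.png").save("enhanced_0001.png")
```

Since CycleGAN translation is (approximately) geometry-preserving, the keypoint and bounding-box annotations of the original synthetic images can be reused unchanged for the enhanced images.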
