Research Projects on Computer Vision

Attentive Partial Convolution for RGBD Inpainting

Inpainting, the task of reconstructing missing pixels in images, plays a pivotal role in image editing and augmented reality (AR) experiences. This work addresses three prominent challenges in AR technology: diminished reality (DR), which removes unwanted objects from the user's view; latency in AR head-mounted displays, which leaves gaps of missing pixels; and imperfections in the depth maps produced by the Time-of-Flight (ToF) sensors on AR devices. Each of these degrades the realism and immersion of AR experiences by affecting both the texture and geometric accuracy of rendered content. We introduce a Partial Convolution-based framework for RGBD (Red, Green, Blue, Depth) image inpainting that restores missing pixels in the color (RGB) and depth channels simultaneously. Unlike traditional methods that focus on RGB inpainting alone, our approach incorporates depth information, which is essential for realistic AR, recovering both spatial structure and visual detail. This joint restoration is key to immersive AR experiences, ensuring seamless integration of virtual and real-world elements. Our contributions include an improved Partial Convolution model with attentive normalization and an updated loss function, which surpasses existing models in accuracy and realism on inpainting tasks.
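To make the core operation concrete: a partial convolution computes each output pixel from only the valid (non-missing) pixels under the kernel, re-normalizes by the fraction of valid pixels, and grows the mask as holes get filled. The sketch below is a minimal single-channel NumPy version for illustration; the actual model is a learned deep network with attentive normalization, which this does not reproduce.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """One single-channel partial convolution step (illustrative sketch).

    x:      (H, W) image channel with holes (values under mask == 0 are ignored)
    mask:   (H, W) binary mask, 1 = valid pixel, 0 = missing
    weight: (k, k) convolution kernel
    Returns the convolved channel and the updated (grown) mask.
    """
    k = weight.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)   # zero out holes before convolving
    mp = np.pad(mask, pad)
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            m = mp[i:i + k, j:j + k]
            valid = m.sum()
            if valid > 0:
                patch = xp[i:i + k, j:j + k]
                # re-normalize by the fraction of valid pixels under the kernel
                out[i, j] = (weight * patch).sum() * (k * k / valid) + bias
                new_mask[i, j] = 1.0   # this output pixel is now considered filled
    return out, new_mask
```

In an RGBD setting, the same masked, re-normalized convolution is applied across the stacked color and depth channels, so holes in both modalities shrink layer by layer.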

GAN-NeRF for Hi-Resolution Novel View Synthesis

We introduce GAN-NeRF, an approach to 3D scene reconstruction that merges the capabilities of Neural Radiance Fields (NeRF) and Generative Adversarial Networks (GANs). Our method capitalizes on the distinct advantages of each model to produce high-resolution 3D reconstructions even with limited input data. Recognizing the difficulty of acquiring many images with precise camera parameters, we propose a strategy that uses a small set of such images to train three foundational NeRF-based models (NeRF, Mip-NeRF, and Mip-NeRF 360). These models serve as the backbone, followed by a refinement phase employing a conditional GAN. While NeRFs excel at synthesizing novel views, they often exhibit blur and artifacts without extensive training data. To combat this, we integrate a GAN, which excels at producing realistic textures and fine detail, significantly enhancing the fidelity of the 3D reconstructions. GAN-NeRF thus yields a comprehensive and lifelike 3D model, blending NeRFs' structural understanding with GANs' textural richness. Extensive experiments validate the efficacy of GAN-NeRF, showcasing its ability to generate high-quality 3D reconstructions with minimal input requirements. This makes it particularly promising for scenarios where gathering abundant high-quality images with precise camera parameters is challenging or infeasible. Notably, the refined images and quantitative results from GAN-NeRF consistently outperform other NeRF-based methods across various datasets, achieving higher SSIM and PSNR and lower LPIPS and KID scores.
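For reference, PSNR (one of the metrics reported above, where higher is better) has a standard closed form; a minimal NumPy version for images scaled to [0, 1]:

```python
import numpy as np

def psnr(reference, rendered, max_val=1.0):
    """Peak Signal-to-Noise Ratio between a ground-truth view and a
    rendered (or GAN-refined) view, in dB. Identical images give +inf."""
    mse = np.mean((reference.astype(np.float64)
                   - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, LPIPS, and KID are more involved (structural, learned-feature, and distributional metrics respectively) and are typically computed with standard library implementations rather than by hand.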

Reverse Pass-through VR for Enhanced Social Interaction

Virtual Reality (VR) headsets have become increasingly indispensable in today's digital landscape. However, they occlude the user's eyes, disrupting visual connection and fostering social isolation. Our goal is to address this issue with a Reverse Pass-through framework tailored for VR headsets, which displays the user's eyes to their surroundings via an outward-facing screen. Our approach fuses left- and right-eye images to fully restore eye features and capture subtle facial expressions. To accomplish this, we curated a new VR-Eyes image dataset that mimics captures from infrared cameras embedded inside the headset. Crucially, our method employs lightweight models, enabling easy deployment on VR devices with fast inference. We believe this is among the first efforts to tackle the social isolation induced by VR headsets through this approach.
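The abstract does not specify the network architecture. As an illustration of why lightweight operators matter for on-device inference, the sketch below compares parameter counts of a standard convolution against a depthwise-separable one, a common building block in lightweight mobile networks; its use in this project is our assumption, not stated in the source.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (weights + biases)."""
    return c_in * c_out * k * k + c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1 x 1 pointwise conv -- the MobileNet-style lightweight factorization."""
    depthwise = c_in * k * k + c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise
```

For a typical 3x3 layer with 64 input and 128 output channels, the separable form uses roughly 8x fewer parameters, which is the kind of saving that makes fast on-headset inference feasible.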
