End-to-end optimization of single-shot monocular metasurface camera for RGBD imaging

RGBD imaging sensors recover the texture and depth information of a target scene from a compressed two-dimensional image, and have emerged in various fields, including object detection [1], autonomous driving [2,3], and medical imaging [4]. However, capturing light-field information beyond 2D intensity generally requires multiple sensors, which significantly increases the size, weight, and power consumption of imaging systems. For example, traditional 3D cameras rely on active illumination, such as structured-light projection or time-of-flight measurement, or on the disparity information provided by multiple sensors, to reconstruct multidimensional light-field information [5–7]. Owing to their system complexity, bulky size, and high cost, traditional RGBD imaging schemes have hindered large-scale deployment in mobile and wearable platforms and in consumer electronics. Consequently, compact, passive, low-cost monocular RGBD imaging systems are highly desirable [8–10].

Metasurfaces use subwavelength optical scatterers to modulate incident light, offering greater design freedom, stronger dispersion-engineering capability, and higher integration than traditional diffractive optical elements. They have therefore been widely applied in polarization imaging [11,12], spectral imaging [13,14], achromatic imaging [15], and wide-field imaging [16]. The advent of metasurfaces provides a new avenue for monocular RGBD imaging. However, severe wavelength-dependent aberrations make metasurface design for RGBD imaging highly challenging. Recently, researchers have used computational imaging techniques to correct aberrations and thereby efficiently retrieve target scenes from degraded images [17–19]. The combination of metasurface design and image reconstruction algorithms can be described as an optical encoder paired with an electronic decoder; such systems have been shown to eliminate optical aberrations, overcome physical limitations, and interpret light-field information to accomplish specific imaging tasks [20–31]. Metasurface-based 3D imaging has likewise achieved efficient depth estimation at short range, but these studies have not yet realized simultaneous reconstruction of scene RGBD images [32–35]. Moreover, it remains unclear how different optical coding strategies compare with one another and which strategy is best for a given task. Inspired by recent deep optics [36–39], end-to-end joint optimization frameworks have been introduced into computational imaging, further exploring the potential for optimizing imaging-system design. Nevertheless, realizing compact, high-performance monocular RGBD vision sensors with such an end-to-end framework remains a great challenge.

In this paper, we propose an RGBD metasurface design scheme based on end-to-end joint optimization and implement a single-shot monocular RGBD camera. A differentiable light-field imaging model enables end-to-end joint automatic optimization of the optical design and the computational reconstruction, achieving an integrated, intelligent design of the entire imaging pipeline. Our methodology exploits the dispersion-modulation capability and design freedom of the metasurface to encode scene color and defocus-depth information, and uses deep neural networks to learn priors from the encoded images to recover the RGBD information. The proposed single-shot monocular metasurface camera achieves RGBD imaging over a depth range of 0.5 m. Compared with traditional lens-based RGBD imaging, our metasurface-based approach achieves an approximately 2 dB improvement in chromatic imaging and a fourfold increase in depth-estimation accuracy, while the sensing area is reduced by at least 10 times. The proposed methodology shows great potential for the intelligent design of compact metasurface RGBD imaging devices.
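The core idea of end-to-end joint optimization can be illustrated with a minimal toy sketch (an illustrative assumption, not the paper's actual model): a linear "optical encoder" matrix A stands in for the metasurface forward model, a linear "decoder" W stands in for the reconstruction network, and gradients of the reconstruction loss flow through both so that the optics and the algorithm are optimized together.

```python
# Toy end-to-end joint optimization sketch (hypothetical, for illustration only):
# A is a stand-in for the differentiable optical forward model (metasurface),
# W is a stand-in for the learned reconstruction. Both are updated jointly by
# gradient descent so that W @ A approximates the identity on random "scenes".
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 8                          # scene size, measurement size (compressive)
A = rng.normal(size=(m, n)) * 0.1     # optical encoder parameters
W = rng.normal(size=(n, m)) * 0.1     # electronic decoder parameters
lr = 0.5

X = rng.normal(size=(n, 256))         # batch of random toy scenes

def loss(A, W, X):
    R = W @ A @ X - X                 # reconstruction residual
    return np.mean(R ** 2)

initial = loss(A, W, X)
for _ in range(1000):
    R = W @ A @ X - X
    gW = 2 * R @ (A @ X).T / R.size   # dL/dW (decoder gradient)
    gA = 2 * W.T @ R @ X.T / R.size   # dL/dA (gradient flows through the optics)
    W -= lr * gW
    A -= lr * gA
final = loss(A, W, X)
print(initial, final)                 # joint training reduces the loss
```

In the actual system, A would be replaced by a differentiable light-field imaging model parameterized by the metasurface design, and W by a deep neural network, with the same principle: a single loss back-propagated through both the reconstruction and the optics.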
