Multi-Camera Stereo-Based Pedestrian Tracking From Moving Sensors

Figure 1: Illustration of our tracking scenario. The green arrows depict a direct line of sight between a camera and the pedestrian, indicating visibility. Conversely, the red arrows represent situations where the pedestrian is not visible to the camera.

Team:	R. Ali, M. Mehltretter
Year:	2022

Tracking the positions of pedestrians over time is highly relevant to many applications, for example in autonomous driving, robotics and safety surveillance. Most tracking approaches address multiple object tracking using an image sequence from a single camera. With only one viewpoint, however, occlusions pose a major challenge. They may prevent a method from keeping track of a pedestrian, a potentially safety-critical problem illustrated in Figure 1, where the black car cannot see the pedestrian attempting to cross the road in front of it. Additionally, a monocular setup generally does not allow reasoning about the observed scene in 3D. Consequently, the distance from the camera to any object in the scene, a significant measure that often needs to be taken into account, cannot be estimated reliably.

In this project, we address the pedestrian tracking task using multiple sets of stereo cameras mounted on moving cars together with stationary stereo surveillance cameras. Extending pedestrian tracking from a single viewpoint to multiple viewpoints overcomes the two limitations pointed out above: occlusions and being restricted to reasoning in 2D. An exemplary scenario in which multiple cameras located on moving cars and on traffic poles collaboratively carry out tracking is illustrated in Figure 1.
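
To make the benefit of stereo observations concrete, the following minimal sketch shows how a rectified stereo pair yields metric depth, and how a detection could then be expressed in a world frame shared by all cameras. It illustrates the standard pinhole stereo geometry only and is not the project's implementation; the function names, the intrinsics (`focal_px`, `cx`, `cy`), the baseline and the camera pose are all hypothetical values chosen for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (assumed pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def backproject(u, v, depth_m, focal_px, cx, cy):
    """Map pixel (u, v) with known depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


def camera_to_world(point_cam, rotation, translation):
    """Apply p_world = R * p_cam + t, with R given as three rows."""
    return tuple(
        sum(r * p for r, p in zip(row, point_cam)) + t
        for row, t in zip(rotation, translation)
    )


# Hypothetical example: a pedestrian detected at pixel (740, 360)
# with a disparity of 20 px, seen by a camera with a 0.5 m baseline.
depth = disparity_to_depth(20.0, focal_px=1000.0, baseline_m=0.5)
p_cam = backproject(740, 360, depth, focal_px=1000.0, cx=640, cy=360)

# Assumed camera pose: no rotation, shifted 10 m along the world x-axis.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
p_world = camera_to_world(p_cam, identity, (10.0, 0.0, 0.0))
```

Once every camera reports detections in the same world frame, observations from a car whose view is blocked can be complemented by those from another car or a pole-mounted camera, provided the poses of the moving cameras are known at each time step.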

Our method is a deep learning-based approach that combines data-driven and model-based principles to achieve robust object tracking. It employs a transformer architecture, which offers a flexible temporal context and strong representation capabilities, to fuse spatial and temporal information for accurate tracking.
