Automatic 3D reconstruction of complex cross-roads from aerial image sequences by semantic modeling of static and moving context objects (2015)

Team:	S.G. Kosov (IPI); J. Leitloff (IPF)
Year:	2011
Funding:	Deutsche Forschungsgemeinschaft (DFG)
Duration:	1/2011 - 12/2013
Completed:	yes

Joint project with the Institute of Photogrammetry and Remote Sensing at Karlsruhe Institute of Technology (IPF)

Motivation

In recent years, interest in the automatic interpretation of optical airborne images has grown, and considerable research effort has gone into the automatic detection of roads. Whereas automatic methods exist for road extraction in rural areas, road extraction in urban areas remains a challenging problem. One of the major difficulties is related to crossroads, where road extraction algorithms struggle for several reasons. Firstly, some model assumptions underlying automatic road extraction techniques, e.g. the existence of parallel image edges corresponding to road edges, may be violated. Secondly, at complex crossroads, e.g. at motorway exits, there may be streets at different height levels. Thirdly, occlusions by static objects (buildings, trees) and moving objects (cars) cause problems that are aggravated at crossroads, particularly in an urban context with dense traffic (Fig. 1).

To deal with occlusion by static objects, it is necessary to include a digital surface model (DSM) derived from multiple-overlap aerial images in the classification process. Furthermore, context should be considered in the classification process. Cars are frequently used as context objects to support road extraction: where a car is found, it is extremely likely that there is also a road, even though the road surface is not visible there. At crossroads, car trajectories obtained from tracking cars in image sequences can give very valuable hints for the correct reconstruction of the road boundaries, and can even provide information about road lanes. Conversely, cars are detected and tracked more reliably if it is known where a road is.

The purpose of this project is therefore to tackle road extraction at crossroads and car detection and tracking in a single process. The goal is to develop an approach that simultaneously derives 3D vector descriptions of crossroads and reliable car trajectories at the crossroads from multiple-overlap aerial images.


Figure 1: Aerial image sequence of an inner-city crossroads scene.

Methodology

The project deals with the semantic reconstruction of complex crossroads from aerial image sequences. Prior knowledge about the positions of the crossroads is assumed to be provided. The use of context knowledge about scene dynamics enhances the classification quality.

We make use of Conditional Random Fields (CRFs) for classification. CRFs provide a statistical model of context, which has a smoothing effect on the classification results, yet the model is general enough to avoid over-smoothing in areas where the features extracted from the image indicate a class change. CRFs belong to the class of undirected graphical models, in which the scene is represented by a graph whose nodes are the random variables involved in the classification process (here: the class labels of individual pixels and the observed image features) and whose edges model dependencies between the random variables corresponding to the nodes.

The features used in the classification are derived from multi-overlap aerial images, DSMs, and a preliminary analysis for detecting and tracking cars. As this preliminary analysis will be unreliable without knowledge of existing roads, its results are only used as a cue for an overall classification that will finally provide a more reliable classification of cars.

A major innovation of the proposed project is that the 3D structure of the scene will be considered in the classification process. This is necessary to be able to deal with occlusions in a systematic way. To this end, a multi-layer CRF will be built that uses two nodes for the class labels at a given position in object space: one corresponding to the ‘occluded level’ of the scene (containing objects that do not occlude other objects but may be subject to occlusion, e.g. road surface, grass, roof) and one corresponding to the ‘occlusion level’ (containing objects that are bound to occlude the other objects; here we mainly deal with trees, bridges, and cars).
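The two-level idea can be illustrated with a small toy example. The following Python sketch is not the project's implementation; it is a minimal illustration under several assumptions: the class lists, the extra "void" label (meaning that nothing lies on top), all cost values, the scalar feature per site, the contrast-sensitive Potts term and the use of iterated conditional modes (ICM) for inference are chosen here purely for demonstration. Each site carries one label per level, neighbouring sites are linked within each level, and the two labels at a site are linked by an inter-level compatibility cost.

```python
import math

BASE = ["road", "grass", "roof"]           # 'occluded level' labels (illustrative)
OCCL = ["void", "tree", "bridge", "car"]   # 'occlusion level'; "void" is an assumption

def pairwise(la, lb, fa, fb, w=1.0, sigma=0.5):
    # Contrast-sensitive Potts term: differing labels cost less where the
    # features differ strongly, which limits over-smoothing at class changes.
    return 0.0 if la == lb else w * math.exp(-((fa - fb) ** 2) / (2 * sigma ** 2))

def neighbours(r, c, rows, cols):
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc

def local_energy(layer, r, c, label, labels, unary, feats, inter):
    # Sum of all energy terms that involve node (layer, r, c) taking `label`.
    rows, cols = len(feats), len(feats[0])
    e = unary[layer][r][c][label]
    for nr, nc in neighbours(r, c, rows, cols):
        e += pairwise(label, labels[layer][nr][nc], feats[r][c], feats[nr][nc])
    other = labels[1 - layer][r][c]        # label of the other level at this site
    e += inter[label][other] if layer == 0 else inter[other][label]
    return e

def total_energy(labels, unary, feats, inter):
    rows, cols = len(feats), len(feats[0])
    e = 0.0
    for r in range(rows):
        for c in range(cols):
            b, o = labels[0][r][c], labels[1][r][c]
            e += unary[0][r][c][b] + unary[1][r][c][o] + inter[b][o]
            for nr, nc in ((r, c + 1), (r + 1, c)):   # count each edge once
                if nr < rows and nc < cols:
                    for layer in (0, 1):
                        e += pairwise(labels[layer][r][c], labels[layer][nr][nc],
                                      feats[r][c], feats[nr][nc])
    return e

def icm(labels, unary, feats, inter, sweeps=5):
    # Iterated conditional modes: greedily relabel one node at a time.
    rows, cols = len(feats), len(feats[0])
    sizes = (len(BASE), len(OCCL))
    for _ in range(sweeps):
        changed = False
        for layer in (0, 1):
            for r in range(rows):
                for c in range(cols):
                    best = min(range(sizes[layer]), key=lambda k: local_energy(
                        layer, r, c, k, labels, unary, feats, inter))
                    if best != labels[layer][r][c]:
                        labels[layer][r][c] = best
                        changed = True
        if not changed:
            break
    return labels

# Toy 1x3 scene: a car covers the middle site, so its ground class is unobserved.
feats = [[0.2, 0.25, 0.9]]                 # made-up scalar feature per site
unary = [
    [[[0.1, 2.0, 2.0], [1.0, 0.9, 1.0], [2.0, 0.1, 2.0]]],                  # BASE costs
    [[[0.1, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 0.1], [0.1, 2.0, 2.0, 2.0]]],   # OCCL costs
]
inter = [                                  # inter[base][occl]: a car is cheap on road only
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.5, 2.0, 2.0],
    [0.0, 0.5, 3.0, 3.0],
]
# Start from the per-node minimum of the unary costs.
labels = [[[min(range(len(u)), key=u.__getitem__) for u in row] for row in layer]
          for layer in unary]
e_before = total_energy(labels, unary, feats, inter)
labels = icm(labels, unary, feats, inter)
e_after = total_energy(labels, unary, feats, inter)
base_names = [BASE[k] for k in labels[0][0]]
occl_names = [OCCL[k] for k in labels[1][0]]
print(base_names, occl_names, round(e_before, 3), round(e_after, 3))
```

In this toy scene the unary costs alone slightly favour "grass" under the car. The inter-level cost (a car is far more plausible above road than above grass), combined with the similar feature value of the neighbouring road site, moves the occluded site to "road", which mirrors the idea of using cars as context for road extraction. ICM only finds a local minimum of the energy; it is used here because it is short to write, not because the source names it.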
