Institute of Photogrammetry and GeoInformation
CROSSES - Crowd Simulation System for Emergency Situations (2002)

Team: Markus Gerke and Bernd-M. Straub
Year: 2002
Funding: EU IST project, contract number IST-1999-10510
Duration: 01.01.2000 to 31.12.2002
Finished: yes

Research Group: Automatic Image Analysis

Contact Person: Markus Gerke, Bernd-M. Straub

The following gives a short overview of the CROSSES project's objectives and introduces the project partners and their tasks in the project.

  • General objective of the project
  • Technical objectives and partners
  • Real city modelling
  • Our Partners
  • Literature and links for further information

General objective of the project
The main objective of CROSSES is to provide Virtual Reality tools for training people to respond efficiently to urban emergency situations. When people are confronted with emergency situations, their reactions are generally difficult to control, and emergency plans elaborated in advance may prove inefficient. There is therefore a need to be prepared and trained in order to limit the side effects of inappropriate behaviour and plans. Today, effective training means real confrontation with actual situations, which is (hopefully) infrequent and may be dangerous. CROSSES intends to demonstrate the use of a simulator that recreates a situation realistically enough for training to be performed efficiently in virtual reality.

Snapshot of the CROSSES Prototype

AVI video of the running CROSSES prototype (4 MB, low resolution)

AVI video, higher resolution (11 MB)

Technical objectives and partners
The technical challenge of CROSSES is thus to provide non-specialist users with an operational training system that allows real-time simulation scenarios including (1) a highly "real" representation of the whole geographical environment (buildings, streets, etc.), (2) realistic dynamic modeling of the crowd, including graphical rendering and behavioural simulation, and (3) precise modeling of all the sound parameters of a real situation.

The first technical objective is therefore to provide efficient tools for the "real" reconstruction of an actual urban environment. The Institute for Photogrammetry and Engineering Surveys, in co-operation with ISTAR, is responsible for this work package. A semi-automatic approach will be proposed, exploiting (and improving) state-of-the-art techniques for 3D extraction from high-resolution aerial images. The aim of this work package is the automated reconstruction of the geometry and the texture of the city at multiple levels of detail from aerial and terrestrial imagery and cadastral information, using photogrammetric and computer vision techniques. In order to achieve realistic visualisation and to increase the obtainable level of automation, color imagery with a ground sampling distance of 5 to 10 cm is used for the centre of the area of interest (1-2 km²), and imagery with a ground sampling distance of 20 to 25 cm is used for the whole field of operations (10-20 km²). Further information on this topic is given below.
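The ground sampling distance (GSD) determines both the scanning resolution and the data volume. The short Python sketch below relates them, assuming analogue photographs scanned with square pixels and ignoring image overlap. The photo scale of 1:5000 and the 20 µm scan pixel in the example are hypothetical values, not figures from the project, and the function names are illustrative.

```python
def gsd_from_scan(scan_pixel_um: float, scale_denominator: float) -> float:
    """GSD in metres for a photo of scale 1:scale_denominator scanned with
    square pixels of scan_pixel_um micrometres."""
    return scan_pixel_um * 1e-6 * scale_denominator


def pixel_count(area_km2: float, gsd_m: float) -> float:
    """Pixels (per band) needed to cover area_km2 at the given GSD.
    Back-of-the-envelope estimate: square pixels, no image overlap."""
    area_m2 = area_km2 * 1_000_000.0  # 1 km² = 10^6 m²
    return area_m2 / (gsd_m * gsd_m)  # each pixel covers gsd_m² on the ground


# Hypothetical example: a 1:5000 photo scanned at 20 µm gives a 10 cm GSD.
print(f"GSD: {gsd_from_scan(20, 5000):.2f} m")

# Extremes of the ranges quoted in the text.
for label, area, gsd in [
    ("centre, 1 km² at 10 cm", 1.0, 0.10),
    ("centre, 2 km² at 5 cm", 2.0, 0.05),
    ("whole area, 10 km² at 25 cm", 10.0, 0.25),
    ("whole area, 20 km² at 20 cm", 20.0, 0.20),
]:
    print(f"{label}: {pixel_count(area, gsd) / 1e6:.0f} million pixels per band")
```

Halving the GSD quadruples the number of pixels: the centre area alone amounts to between 100 and 800 million pixels per band.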

The second technical objective is to provide a methodology and a set of tools for the simulation of a realistic, dynamically evolving, virtual population (crowd) living in this city environment. In the CROSSES project, the Computer Graphics Lab (LIG) at the Swiss Federal Institute of Technology (EPFL) in Lausanne, in co-operation with the MIRALab research group at the University of Geneva, will aim at an automatic, low-cost modeling process for a realistic crowd, including the behaviour of the crowd and a realistic appearance of the artificial humans. The virtual population will be based on personalized avatars that evolve dynamically according to realistic behavioural rules.

To complete the realism of the simulation, the third technical objective is to mix sound and graphical visualization for truly immersive simulation. The Institute of Communication Acoustics (IKA), Ruhr-University Bochum, is responsible for this work package. The auditory sense is the most important sense for inter-individual communication and serves as a multi-directional warning organ. Special emphasis will therefore be put on reproducing the spatial properties of the sound field (auralization of crowds, rendering of background sounds, etc.).

The integration of the different subsystems into an experimental prototype will be done by the project co-ordinator MATRA S&I. System integration in CROSSES relies mainly on data integration and component integration.

The Central Scotland Police will provide scenario descriptions. One possible scenario is a leak of dangerous gas at a chemical plant in Grangemouth (Scotland). It requires the 3D modeling of the chemical plant (buildings, pipes, …) around the location of the leak and the modeling of the people near the leak, in particular of the first injured people, who do not necessarily form a crowd but have to be observed in detail.

Real city modelling
A key property of the simulation system developed within this project is the use of a "real" urban environment. Within the CROSSES project this work is done by IPI, ISTAR and Matra S&I.

Besides the users of the simulation system, a number of other users, such as regional and town planners and environmental investigators, have expressed an increasing demand for three-dimensional urban GIS (Geographic Information System) data; this demand was investigated in an OEEPE survey [Fuchs et al., 1998].
The next image shows a view of the database. For visualization, the so-called orthoimage is mapped onto the Digital Surface Model (DSM). Further information on the production of the DSM and the orthoimages from aerial images is given below.

A VRML view onto the database with buildings as "cuboids" and the orthoimage as ground texture

There are many approaches to generating 3D information from aerial and ground imagery. However, despite numerous research efforts aiming at a higher degree of automation, a significant amount of manual editing remains necessary. City models of very high resolution are mostly acquired manually using analytical photogrammetric techniques. This procedure requires special hardware, is slow and time-consuming, and is therefore very expensive.

One main goal of the CROSSES project is to develop algorithms for the automatic extraction of "real" world objects from the acquired image data. The results of the (still ongoing) work are shown below. We want to investigate ways to improve and accelerate the generation of 3D city models through automation. Due to the complexity of this task, our approach focuses on the reconstruction of terrain, buildings (including facades) and trees.

The next image shows another view of the database. Instead of the orthoimage, information about the properties of the ground surface is visualized as texture: green for sealed areas and gray for non-sealed areas. The yellow areas mark the parts of the scene where the simulated people can walk. The boundaries of these areas were captured manually.

A VRML view onto the database with buildings as "cuboids" and visualized information about the ground properties

The next diagram shows the workflow for the generation of the city model and the responsibilities for the individual steps.

Process diagram for the generation of the 3D high resolution city database

The tasks have been divided among three of the CROSSES partners (ISTAR, IPI and MS&I) and can be grouped as follows:

  • Production of the DSM and true orthoimages (ISTAR)
  • Automatic extraction of objects:
      • Detection of buildings and trees (ISTAR)
      • Reconstruction of buildings and trees (IPI)
  • Real-time rendering (ISTAR)

Production of the DSM and orthoimages from aerial images:
We used analog aerial imagery (photographs) with an overlap of 80% in both directions. From these images, the Digital Surface Model (DSM) and the orthoimage of the Grangemouth area were produced by ISTAR's largely automatic production line.
Here is a brief description of this production chain:

  • Digitization of Aerial Photographs
    We have to scan the photographs so that the image has a resolution of 10 cm in the world coordinate system.
  • Image calibration and spatial triangulation
    The geometric image calibration corrects the images for distortions caused by the camera lens, the orientation of the aircraft and the shape of the earth, and establishes the relation between the image coordinate system and the geodetic world coordinate system. To compute the acquisition parameters more accurately, we use Ground Control Points (GCPs), which are clearly visible in the imagery, and apply a reverse calculation mode. The coordinates of the GCPs in the geodetic world coordinate system are known from GPS measurements.
  • Matching for DSM computation
    In a stereoscopic pair, objects on the earth's surface are seen from two different points of view. The matching process identifies corresponding points in the two images, known as homologous points, by means of cross-correlation. Before the correlation, the images are projected into a common geometry, known as epipolar geometry, so that the image displacements caused by relief (the disparity) occur only in the horizontal direction.
  • Restitution
    Once all stereoscopic pairs have been correlated, the resulting disparity map must be converted into altitude data within a world coordinate system.
  • Merge
    After the restitution, a DSM is available for each stereoscopic pair. These DSMs are merged into the final DSM. Finally, areas where the correlation process has failed are detected and corrected manually.

Shaded DSM over Grangemouth
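The matching and restitution steps can be illustrated with a minimal Python sketch. It assumes the two images have already been resampled to epipolar geometry, so the search for the homologous point runs along a single row. It uses normalised cross-correlation (NCC), a common form of the cross-correlation mentioned above, which may differ from ISTAR's actual correlator. The restitution uses the textbook normal-case relation p = f·B / (H - h) between x-parallax p, focal length f, base B, flying height H and terrain height h. Converting a pixel disparity into an absolute x-parallax (pixel size, offsets of the epipolar images) is omitted, and all names and camera values are hypothetical.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally long windows (1.0 = identical up to gain and offset)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat windows carry no texture to match


def match_along_row(left_row, right_row, x, half_win, max_disp):
    """Disparity at column x of an epipolar row: the shift d (0..max_disp) maximising NCC.

    In epipolar geometry the homologous point lies on the same row, so only a
    horizontal search is needed. The point in the right image is assumed at x - d.
    """
    ref = left_row[x - half_win : x + half_win + 1]
    best_d, best_score = 0, -2.0
    for d in range(max_disp + 1):
        lo = x - d - half_win
        if lo < 0:  # search window would leave the image
            break
        score = ncc(ref, right_row[lo : x - d + half_win + 1])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


def height_from_parallax(p_mm, f_mm, base_m, flying_height_m):
    """Normal case of stereophotogrammetry: p = f * B / (H - h)  =>  h = H - f * B / p."""
    return flying_height_m - f_mm * base_m / p_mm


# Synthetic epipolar rows: the same texture appears 3 pixels further left in the right image.
left = [0] * 8 + [5, 9, 2, 7, 1] + [0] * 7
right = left[3:] + [0, 0, 0]
print(match_along_row(left, right, x=10, half_win=2, max_disp=6))  # best shift is 3

# Hypothetical camera: f = 150 mm, B = 230 m, H = 750 m; a 15 m roof gives p = f*B/(H - 15).
p_roof = 150.0 * 230.0 / (750.0 - 15.0)
print(f"{height_from_parallax(p_roof, 150.0, 230.0, 750.0):.1f} m")
```

Because the search is one-dimensional, the cost per point grows only with the disparity range, which is the practical benefit of resampling to epipolar geometry first.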

  • Producing a true orthoimage
    An orthoimage is an aerial image that has been corrected for geometric distortions caused by sensor motion, terrain elevation and variations in viewing angle. We have produced a true orthoimage, in the sense that even the buildings have been orthorectified using the DSM, whereas traditional orthoimages use only the DTM (Digital Terrain Model).

Extract of the true colors orthoimage over a school
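The difference between a traditional and a true orthoimage can be sketched with the collinearity equations of an idealised vertical photograph (identity rotation, no lens distortion). Each output cell (X, Y) receives the grey value of the image point to which its 3D position projects; a true orthoimage takes the height from the DSM, a traditional one from the DTM. The camera parameters and the `image_lookup` and `height_model` callables are hypothetical, and a real true-orthoimage production also has to detect areas hidden by buildings, which is not shown here.

```python
def project_nadir(X, Y, Z, cam, f_mm):
    """Collinearity equations for a vertical photo with identity rotation matrix.

    cam = (X0, Y0, Z0): projection centre in metres. Returns image coordinates in mm.
    """
    X0, Y0, Z0 = cam
    scale = f_mm / (Z0 - Z)  # image scale depends on the height of the ground point
    return (X - X0) * scale, (Y - Y0) * scale


def orthorectify(cells, height_model, image_lookup, cam, f_mm):
    """Map each ground cell (X, Y) to the grey value at its projected image position.

    height_model(X, Y): DSM height for a true orthoimage, DTM height for a traditional one.
    image_lookup(x, y): grey value at image coordinates (e.g. bilinear interpolation).
    """
    return {
        (X, Y): image_lookup(*project_nadir(X, Y, height_model(X, Y), cam, f_mm))
        for X, Y in cells
    }


# Hypothetical camera: f = 150 mm, 750 m above the datum (photo scale about 1:5000).
cam, f = (0.0, 0.0, 750.0), 150.0
x_dtm, _ = project_nadir(100.0, 0.0, 0.0, cam, f)   # roof point projected with the DTM height
x_dsm, _ = project_nadir(100.0, 0.0, 15.0, cam, f)  # same point with its DSM height (15 m roof)
print(f"image x with DTM: {x_dtm:.2f} mm, with DSM: {x_dsm:.2f} mm")
```

The 0.41 mm shift in the image corresponds to roughly 2 m on the ground at 1:5000; this relief displacement is why building roofs appear shifted in a traditional orthoimage.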

  • Extracting ground information
    The DTM (Digital Terrain Model) is produced by a semi-automatic algorithm that evaluates local discontinuities in the DSM in order to separate above-ground features such as buildings and trees from the terrain. The DEM (Digital Elevation Model) is defined here as DSM - DTM and therefore represents the above-ground features.
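The text does not detail ISTAR's semi-automatic DTM algorithm. As a stand-in, the sketch below uses a different, widely used simple technique: a grey-scale morphological opening (moving minimum followed by moving maximum) with a window wider than the largest building, which removes above-ground objects from the DSM. The DEM as defined above is then the difference DSM - DTM. The window radius and the test values are hypothetical; terrain features narrower than the window, such as sharp ridges, are cut off as well.

```python
def _window_filter(grid, radius, reducer):
    """Moving-window min or max over a (2r+1) x (2r+1) square, clipped at the grid border."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            reducer(
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def dtm_by_opening(dsm, radius):
    """Grey-scale opening: erosion (min) then dilation (max) removes objects narrower than the window."""
    return _window_filter(_window_filter(dsm, radius, min), radius, max)


def normalised_heights(dsm, dtm):
    """DEM in the sense of the text: DSM - DTM, i.e. the heights of above-ground features."""
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]


# Flat terrain at 10 m with a 2 x 2 pixel building reaching 25 m (invented values).
dsm = [[10.0] * 7 for _ in range(5)]
for r, c in [(2, 3), (2, 4), (3, 3), (3, 4)]:
    dsm[r][c] = 25.0
dtm = dtm_by_opening(dsm, radius=2)
dem = normalised_heights(dsm, dtm)
print(dem[2][3], dem[0][0])  # 15.0 0.0
```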

Automatic extraction of objects:
The extraction of objects from images can be subdivided into two essential phases: the detection phase and the reconstruction phase. Detection answers the question "where is which object?", i.e. it localises and recognises buildings and trees in the images. Reconstruction has to answer the question "what are the individual features of an object?".

These research topics have been shared between ISTAR and IPI as follows: ISTAR focuses mainly on the detection of vegetation and buildings, whereas IPI focuses on their reconstruction.

Detection of buildings and trees:

  • Detection of trees:
    The spectral properties of vegetation (trees as well as grass) are very characteristic compared with those of mineral surfaces. To distinguish vegetation pixels from the others, we use their response in three wavelength bands: near infrared (NIR), red (R) and green (G). In addition, their height, texture and shape make trees easy to recognise among other vegetation.
    From the near infrared and red channels we compute the NDVI (Normalized Difference Vegetation Index): NDVI = (NIR - R)/(NIR + R), which takes values in [-1, 1]. This image enhances vegetation, so that vegetation can be isolated with a threshold. Trees are then extracted from these isolated pixels using the DEM: a threshold on the DEM separates above-ground features from features on the ground. Because the DEM and the NDVI threshold are not precise enough, errors remain in this tree detection (surroundings of buildings, containers, grass on hills...). Nevertheless, the pixels of this first analysis form the first tree hypothesis and are used as training areas (characteristic samples for the class "trees") for the supervised classification methods described below. The image is then corrected by combining the results of:
    - a classical Maximum Likelihood Classifier [Haralick, 1983], which uses the third channel (green) in addition to NIR and R in order to remove pixels that are spectrally different from most of the selected pixels (almost all of which are trees);
    - a classifier whose distance measure depends on the homogeneity in a wide neighbourhood of each pixel, in order to remove pixels whose texture differs from that of trees (grass, containers...).
    These steps are visualized in the following images. The left image shows the NDVI over the stadium, the centre image shows the intersection of the thresholded NDVI and the thresholded DEM, and the right image shows the final result of the tree detection.



NDVI image and results of automatic tree detection
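The first tree hypothesis described above, the intersection of an NDVI threshold and a DEM threshold, can be sketched in a few lines of Python. The threshold values (NDVI > 0.3, height > 2 m) are illustrative assumptions, not the values used in the project; bands are given as nested lists of non-negative digital numbers.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index, in [-1, 1] for non-negative inputs; 0 if both are 0."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def initial_tree_mask(nir_band, red_band, dem, ndvi_min=0.3, height_min=2.0):
    """First tree hypothesis: NDVI above ndvi_min AND DEM height above height_min (metres).

    The default thresholds are illustrative assumptions, not the project's values.
    """
    return [
        [ndvi(n, r) > ndvi_min and h > height_min for n, r, h in zip(nrow, rrow, hrow)]
        for nrow, rrow, hrow in zip(nir_band, red_band, dem)
    ]


# One image row with invented values: a tree, a lawn and a flat roof.
nir = [[200, 200, 60]]
red = [[50, 50, 80]]
dem = [[8.0, 0.1, 9.0]]
print(initial_tree_mask(nir, red, dem))  # [[True, False, False]]
```

The lawn passes the NDVI test but fails the height test, and the roof fails the NDVI test, which is exactly the complementary role the two thresholds play in the text.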

Before the classification, all training pixels are projected onto their first two principal components so that their histogram can be segmented in this 2D space. This yields a few characteristic classes, which are assigned to "trees" or "not trees" after the classification.
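A classical Gaussian maximum likelihood classifier of the kind cited above can be sketched in plain Python for three bands (NIR, R, G). This sketch assumes Gaussian class distributions and equal prior probabilities; the class names and training pixels are invented for illustration, and neither the PCA-based histogram segmentation nor the texture-based second classifier is shown.

```python
import math


def mean_and_covariance(samples):
    """Sample mean vector and unbiased covariance matrix of a list of 3-band pixels."""
    n = len(samples)
    mu = [sum(s[i] for s in samples) / n for i in range(3)]
    cov = [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1) for j in range(3)]
        for i in range(3)
    ]
    return mu, cov


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(m):
    """Inverse of a non-singular 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def train(classes):
    """{name: [(nir, red, green), ...]} -> {name: (mean, inverse covariance, log det covariance)}."""
    models = {}
    for name, samples in classes.items():
        mu, cov = mean_and_covariance(samples)
        models[name] = (mu, inv3(cov), math.log(det3(cov)))
    return models


def classify(pixel, models):
    """Equal priors: maximise -0.5 * log|cov| - 0.5 * squared Mahalanobis distance."""
    best_name, best_score = None, -math.inf
    for name, (mu, icov, logdet) in models.items():
        d = [p - m for p, m in zip(pixel, mu)]
        maha = sum(d[i] * icov[i][j] * d[j] for i in range(3) for j in range(3))
        score = -0.5 * logdet - 0.5 * maha
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Invented training pixels (NIR, R, G): tree crowns vs. other above-ground objects.
models = train({
    "trees": [(180, 40, 60), (185, 42, 58), (175, 38, 63), (182, 45, 61), (178, 41, 57), (188, 39, 64)],
    "other": [(90, 85, 80), (95, 80, 84), (85, 88, 78), (92, 83, 86), (88, 90, 79), (97, 86, 82)],
})
print(classify((181, 41, 60), models), classify((91, 85, 81), models))
```

Using the full covariance matrix, rather than a plain distance to the class mean, lets the classifier account for the correlation between the bands within each class.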

  • Detection of buildings:
    The approach envisaged here is based on the theory of geodesic snakes [Caselles, 1997].
    From the tree extraction we know which blobs in the DEM may correspond to trees. The remaining blobs are assigned to potential buildings.
    Our method for the detection of buildings is still under development, but we aim at initialising one geodesic snake for each of these "building blobs" (unlike classical snakes, geodesic snakes depend only weakly on their initial state), after eliminating, on the basis of their attributes, those blobs that do not appear to be buildings.
    At the moment, we have developed a semi-automatic algorithm in which the user clicks inside a building on an orthoimage; the result is the 2D outline of the building and of any chimneys inside it (geodesic snakes can change topology because the curve is represented implicitly as the intersection of a surface with a plane, i.e. as a level set):

Evolution of the geodesic snake during iterations

Some examples of roof extraction with geodesic snakes
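For reference, the classical geodesic active contour model of [Caselles, 1997] can be summarised as follows. The contour C is the zero level set of a function u(x, y, t), i.e. the intersection of the surface u with the plane u = 0, which is what allows it to split or merge. g is an edge-stopping function computed from the Gaussian-smoothed image G_σ * I (the exponent 2 is one common choice), the contour seeks a curve of minimal weighted length L_R, and c is a constant balloon velocity. The two additional forces described in the following paragraph are not part of this classical formulation and are not reproduced here.

```latex
g(I) = \frac{1}{1 + \lvert \nabla (G_\sigma * I) \rvert^{2}}

L_R(C) = \int_{0}^{L(C)} g\big(I(C(s))\big)\, \mathrm{d}s

\frac{\partial u}{\partial t}
  = \lvert \nabla u \rvert \, \operatorname{div}\!\left( g(I) \, \frac{\nabla u}{\lvert \nabla u \rvert} \right)
  + c \, g(I) \, \lvert \nabla u \rvert
```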

The algorithm uses the classical theory of 2D geodesic snakes, to which we have added two forces, one of which introduces the third dimension: the first is linked to the homogeneity of the roof intensity (in the colored orthoimages), the second to the homogeneity of the roof height (in the DEM).
Moreover, we have added a multiresolution scheme that allows roofs with high-frequency intensity variations to be handled.
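The selection of candidate building blobs described above (blobs in the DEM that are not trees, filtered by attributes) can be sketched as a connected-component labelling. The height threshold, the 4-connectivity and the use of area as the only attribute are assumptions for illustration; the project's actual attributes are not specified in the text.

```python
from collections import deque


def label_blobs(mask):
    """4-connected components of a boolean grid, returned as lists of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(pixels)
    return blobs


def candidate_buildings(dem, tree_mask, height_min=2.5, min_area_px=4):
    """Above-ground blobs that are not trees, keeping only those with a plausible area (hypothetical thresholds)."""
    above = [
        [h > height_min and not t for h, t in zip(hrow, trow)]
        for hrow, trow in zip(dem, tree_mask)
    ]
    return [blob for blob in label_blobs(above) if len(blob) >= min_area_px]


# Invented DEM: a 2 x 3 building, a 1-pixel object (5 m) and two tree pixels (10 m).
dem = [
    [0, 0, 0, 0, 0, 0],
    [0, 8, 8, 8, 0, 10],
    [0, 8, 8, 8, 0, 10],
    [5, 0, 0, 0, 0, 0],
]
trees = [[False] * 6, [False] * 5 + [True], [False] * 5 + [True], [False] * 6]
print(candidate_buildings(dem, trees))
```

Each surviving blob would then serve as the initialisation region for one geodesic snake.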

Reconstruction of buildings and trees:
The reconstruction of buildings and trees is the second phase in the interpretation of the image data.

  • Reconstruction of trees:
    In this part, we describe a method to extract individual trees with their characteristics (position and diameter) from the blobs found by the tree detection module.
    We assume that trees stand together in groups and that each tree has individual spectral properties that allow it to be separated from adjacent trees. This leads to the assumption that a detected blob contains one or more trees. The task is now to extract single trees from these groups and to find parameters that fix their positions.

CIR subset of the scene (left), result of the detection of groups of trees (centre), result of the reconstruction of trees (right)

The strategy can be summed up as follows:
- First, hypotheses for the positions of trees are created by means of morphological operators: by successively applying an erosion with a circular structuring element, regions are found that may contain individual trees.
- In the next step, reduced parts of these regions are used as training pixels. After these pixels have been transformed into feature space using the infrared and red channels, further pixels with similar spectral characteristics can be found in the whole region; they are then assigned to this tree.
- Finally, the height of the tree is obtained from the DSM.

The images above show an example: the left image shows a subset of the orthoimage, the centre image shows the detected tree regions, and the right image shows the final result, the individual trees.
Further information on this method, including an evaluation, can be found in [Straub and Heipke, 2001].
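The first step of the tree reconstruction strategy, successive erosion with a circular structuring element, can be sketched as follows. This is one simple reading of that step, a form of ultimate erosion: the blob is eroded repeatedly, and every component that would disappear in the next erosion is recorded as one tree hypothesis at its centroid. It is not necessarily the authors' exact procedure; the radius and the test blob are hypothetical, and the spectral growing and height steps are not shown.

```python
from collections import deque


def label_blobs(mask):
    """4-connected components of a boolean grid, returned as lists of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(pixels)
    return blobs


def erode(mask, radius):
    """Binary erosion with a disc of the given radius; pixels outside the grid count as background."""
    rows, cols = len(mask), len(mask[0])
    disc = [(dy, dx) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]
    return [[all(0 <= r + dy < rows and 0 <= c + dx < cols and mask[r + dy][c + dx]
                 for dy, dx in disc)
             for c in range(cols)]
            for r in range(rows)]


def tree_seeds(blob_mask, radius=1, max_iter=50):
    """Ultimate-erosion style tree hypotheses, returned as centroids (row, col) in pixels."""
    seeds, current = [], blob_mask
    for _ in range(max_iter):
        eroded = erode(current, radius)
        for comp in label_blobs(current):
            if not any(eroded[y][x] for y, x in comp):  # component vanishes: one tree hypothesis
                n = len(comp)
                seeds.append((sum(y for y, _ in comp) / n, sum(x for _, x in comp) / n))
        if not any(any(row) for row in eroded):
            break
        current = eroded
    return seeds


# Two touching crowns joined by a one-pixel bridge (invented test blob).
rows = [".........", ".###.###.", ".#######.", ".###.###.", "........."]
blob = [[ch == "#" for ch in row] for row in rows]
print(tree_seeds(blob))  # one hypothesis per crown
```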

  • Reconstruction of buildings:
    In the real world many different building types can be found. Besides orthogonal buildings, there are some specially shaped ones (e.g. round or pentagonal), which is often the case in town centres. For building reconstruction from aerial imagery, one needs on the one hand a geometric model that can describe as many buildings as possible and on the other hand a model that allows an automated reconstruction. To reach this goal we have chosen an orthogonal polygon as the building model; the building outline can be described by such a model. As we work with aerial images, the building outline is represented by the roof outline.
    The task is to fit an orthogonal closed polygon to the regions obtained in the detection phase. This is done using invariant geometric moments, which lead directly to the parameters (width, length, orientation and position in x and y) that fix a rectangle in the given 2D building region. As the buildings are not restricted to simple rectangles, a so-called "decomposition" has to take place: the difference between the region containing the building and the area of the rectangular model is minimized. The difference areas are also described by rectangles and are then removed from the model.
    The next images show a subset of the scene (left) and the roof outlines of the reconstructed buildings (right). For further information on this method and some examples see [Gerke, Heipke & Straub, 2001].

CIR subset of the scene (left), result of the building reconstruction (right)
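The moment-based rectangle fit can be sketched with central second-order moments: their principal axes give the orientation, and the variance along each axis gives the side length of an equivalent rectangle (a filled rectangle of side a has variance a²/12 along that side, or (a² - 1)/12 for a row of a pixel centres). This is a standard moment-based construction, shown under the assumption that the region is roughly rectangular; it is not necessarily the exact set of invariant moments used in [Gerke, Heipke & Straub, 2001], and the decomposition step is not shown.

```python
import math


def rectangle_from_moments(pixels):
    """Equivalent rectangle of a region given as (x, y) pixel centres.

    Returns (cx, cy, length, width, theta): centroid, side lengths in pixels and the
    orientation of the long side in radians, measured from the x axis.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second-order moments, normalised by the area (translation invariant).
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of [[mu20, mu11], [mu11, mu02]]: variances along the principal axes.
    half_sum = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    var_long, var_short = half_sum + root, half_sum - root
    # A row of a pixel centres has variance (a^2 - 1) / 12, hence a = sqrt(12 * var + 1).
    length = math.sqrt(12 * var_long + 1)
    width = math.sqrt(12 * max(var_short, 0.0) + 1)
    return cx, cy, length, width, theta


# A 10 x 4 pixel block, e.g. a detected flat-roof region (invented).
region = [(x, y) for x in range(10) for y in range(4)]
print(rectangle_from_moments(region))
```

For rotated regions the discretisation correction is only approximate, but the orientation and centroid remain well defined.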

Real-time rendering:

The chosen output format for 3D visualisation is VRML:

- The DTM is triangulated (Delaunay triangulation and decimation with the VTK libraries).
- The three orthoimages (NIR, R and G) have been transformed into classical RGB channels, filtered and resampled for better rendering. They have then been applied as a colored texture to the DTM.
- Each building is an individual IndexedFaceSet.
- The facades are applied as textures to the corresponding faces of the buildings and can be seen at two levels of detail.
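As an illustration of the one-IndexedFaceSet-per-building choice, the sketch below writes a box-shaped building (the "cuboids" of the VRML views shown earlier) as a VRML97 Shape node. The function name, the grey material and the requirement x0 < x1, z0 < z1 are hypothetical; facade textures and levels of detail are omitted. VRML's y axis points up, so the footprint lies in the x-z plane, and the faces are listed counter-clockwise as seen from outside, matching the IndexedFaceSet default ccw TRUE.

```python
def cuboid_vrml(x0, z0, x1, z1, ground, height, color=(0.8, 0.8, 0.8)):
    """VRML97 Shape with one IndexedFaceSet for a box building (requires x0 < x1, z0 < z1).

    The bottom face is omitted because it rests on the terrain.
    """
    top = ground + height
    points = [
        (x0, ground, z0), (x1, ground, z0), (x1, ground, z1), (x0, ground, z1),  # 0-3: base
        (x0, top, z0), (x1, top, z0), (x1, top, z1), (x0, top, z1),              # 4-7: roof
    ]
    # Walls (i, i+4, j+4, j) and the roof (4, 7, 6, 5): counter-clockwise seen from outside.
    faces = [(i, i + 4, (i + 1) % 4 + 4, (i + 1) % 4) for i in range(4)] + [(4, 7, 6, 5)]
    point_str = ", ".join(f"{x:g} {y:g} {z:g}" for x, y, z in points)
    index_str = ", ".join(", ".join(str(k) for k in face) + ", -1" for face in faces)
    r, g, b = color
    return (
        "Shape {\n"
        f"  appearance Appearance {{ material Material {{ diffuseColor {r:g} {g:g} {b:g} }} }}\n"
        "  geometry IndexedFaceSet {\n"
        f"    coord Coordinate {{ point [ {point_str} ] }}\n"
        f"    coordIndex [ {index_str} ]\n"
        "  }\n"
        "}\n"
    )


# Hypothetical 12 m x 8 m footprint, 9 m high, standing on terrain at height 0.
print("#VRML V2.0 utf8\n" + cuboid_vrml(0, 0, 12, 8, 0, 9))
```

Keeping each building in its own node makes it straightforward to attach per-building facade textures or swap in a more detailed version later.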

To facilitate navigation through such a large amount of data, we have chosen a quadtree description of the terrain (GeoVRML) that depends on the distance from the trainee and on the trainee's viewing direction.
A snapshot of one of the VRML scenes is shown at the beginning of this page.
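The distance-dependent quadtree refinement can be illustrated with a generic sketch: a terrain tile is split into four children while it is large compared with its distance from the trainee. This is not the GeoVRML node itself, the viewing-direction criterion is left out, and the split factor, tile sizes and viewer position are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Tile:
    x: float     # lower-left corner (m)
    y: float
    size: float  # edge length (m)
    level: int   # quadtree depth, 0 = root


def select_tiles(tile, viewer, max_level, split_factor=2.0):
    """Return the tiles to render around viewer = (x, y).

    A tile is split while size * split_factor exceeds the distance from the viewer to its
    centre and max_level has not been reached, so detail is concentrated near the viewer.
    """
    cx, cy = tile.x + tile.size / 2, tile.y + tile.size / 2
    distance = math.hypot(cx - viewer[0], cy - viewer[1])
    if tile.level >= max_level or tile.size * split_factor <= distance:
        return [tile]
    half = tile.size / 2
    children = [
        Tile(tile.x + dx * half, tile.y + dy * half, half, tile.level + 1)
        for dy in (0, 1) for dx in (0, 1)
    ]
    return [t for child in children for t in select_tiles(child, viewer, max_level, split_factor)]


# Hypothetical 1024 m terrain square, trainee standing near its lower-left corner.
tiles = select_tiles(Tile(0.0, 0.0, 1024.0, 0), viewer=(10.0, 10.0), max_level=3)
for level in range(4):
    print(level, sum(1 for t in tiles if t.level == level))
```

The selected tiles always cover the whole terrain exactly once; only their size varies with the distance from the trainee.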

References:
[Fuchs et al., 1998]
C. Fuchs, E. Gülch, W. Förstner
OEEPE Survey on 3D-City Models.
OEEPE Publication No. 35: 9-123. Bundesamt für Kartographie und Geodäsie, Frankfurt, 1998
[Haralick, 1983]
R. M. Haralick
Pattern Recognition and Classification
Manual of Remote Sensing, 2nd Edition, Vol. 1, Ch.18, American Society of Photogrammetry, 1983
[Caselles, 1997]
V. Caselles, R. Kimmel, and G. Sapiro
On geodesic active contours
International Journal of Computer Vision, 22(1):61-79, February 1997

Our Partners

  • Computer Graphics Lab, EPFL Swiss Federal Institute of Technology, Lausanne, Switzerland.
  • ISTAR, Sophia Antipolis, France.
  • MATRA S&I, France (Project Co-ordinator)
  • MIRALab, University of Geneva, Switzerland.
  • IKA, Institute of Communication Acoustics, Ruhr-University Bochum, Germany.

Literature and Links for further information

Computer Vision

Computer Graphics

  • Crystal Space - A free 3D graphics engine
  • VRML - the Virtual Reality Modeling Language
  • SceneLib - A C/C++ library for Windows 95/98/NT/2000.


  • Open GIS Consortium
  • GIS Colloquium at the Zentrum für Geoinformationssysteme (Centre for Geoinformation Systems), Universität Hannover

Recommended by our sound experts at the Institute of Communication Acoustics (IKA):

  • Course notes by David Worrall: "Physics and Psychophysics of Music"
  • Course notes by Richard O. Duda: "3-D Audio for Human Computer Interface"
  • ...more Audio and Three Dimensional Sound Links