Group leader

Dr.-Ing. Cristobal Curio
Phone: +49 7121 271 4005
Cognitive Systems at Reutlingen University


Group members

CE Overview Poster


04/11/2013 New HMI detectability concept supporting situational awareness featured in IEEE Intelligent Transportation Systems Magazine.
27/09/2013 Computational perception article on view-dependencies of social interaction understanding published in Frontiers Perception.
26/08/2013 Congratulations to Christian Herdtweck on his successful PhD defense!
29/06/2013 Our approach to 3D driver head pose monitoring will be presented at the TP.CG.2013 conference at Eurographics UK.
24/06/2013 Our cognitive vision architecture for efficient monocular car viewpoint estimation [pdf] has been presented at the Intelligent Vehicles conference.
10/06/2013 We participated in the EU-FET-Open (Future and Emerging Technologies) project TANGO. The project has been evaluated as excellent.
20/01/2013 Visual cues determining emotional dynamic facial expression recognition (Journal of Vision)
13/09/2012 Best Conference Paper Award at the IEEE conference on Multisensor Fusion and Information Integration (MFI) 2012.
06/03/2012 Enhancing drivers' environment perception presented as a talk (best paper finalist), together with a novel monocular visual odometry approach, at the Intelligent Vehicles conference.
05/10/2011 Public release of medial feature grouping and superpixel segmentation code.
09/2011 HMI system for image retrieval presented at the INTERACT 2011 conference.
03/2011 Paper presentation at the IEEE Automatic Face & Gesture Conference, Santa Barbara. Watch our latest facial analysis video with the Microsoft Kinect sensor on YouTube.
10/2010 Our book Dynamic Faces: Insights from Experiments and Computation has been published by MIT Press.
10/2010 Two SIGGRAPH Asia 2010 Technical Sketches accepted for presentation.
05/2010 Paper at CVPR Conference Workshop on Feature Grouping.
09/2009 We won a DAGM paper prize for work on automatic 3D surface tracking for the generation of a 4D morphable face model at the Conference of the German Association for Pattern Recognition (DAGM).


WS 2012/13 Medizinische Bildverarbeitung (Medical Image Processing) at the Computer Science Department, Tübingen.
SS 2012 Machine Learning II at the Graduate School of Neural Information Processing, Tübingen.
WS 2010/11 Statistical Methods in Artificial Intelligence at the Computer Science Department, Tübingen.
WS 2009/10 Advanced Topics in Machine Learning at the Computer Science Department, Tübingen.

External activities

KI-2013 From Research to Innovation and Practical Applications (Program Committee)
Intelligent Vehicles 2013 PC Associate Editor
WIAF-2012 Workshop What's in a face? at ECCV, technical PC.
Associate Editor
06/2011 Organization of a workshop on interactive pedestrian behavior analysis and synthesis at the IEEE-sponsored Intelligent Vehicles Symposium, Baden-Baden, June 5. Final program.
PC at the 1st IEEE Workshop on Information Theory in Computer Vision and Pattern Recognition at ICCV 2011
Program Committee
Area Chair
COSYNE conference workshop


Center for Integrated Neuroscience, Section Computational Motorics (Prof. Dr. M.A. Giese)
Autonomous Systems Lab ETHZ (Prof. Dr. R. Siegwart)
Zentrum für Neurowissenschaften Zürich (Prof. Dr. med. A. Luft)
MPI for Intelligent Systems (Prof. M. Black)

Five most recent Publications

Herdtweck C and Wallraven C (December-2013) Estimation of the Horizon in Photographed Outdoor Scenes by Human and Machine PLoS ONE 8(12) 1-14.
Dobricki M and de la Rosa S (December-2013) The structure of conscious bodily self-perception during full-body illusions PLoS ONE 8(12) 1-9.
Dobs K, Schultz J, Bülthoff I and Gardner JL (November-10-2013): Attending to expression or identity of dynamic faces engages different cortical areas, 43rd Annual Meeting of the Society for Neuroscience (Neuroscience 2013), San Diego, CA, USA.
Nieuwenhuizen FM, Chuang LL and Bülthoff HH (November-2013) myCopter: Enabling Technologies for Personal Aerial Transportation Systems: Project status after 2.5 years, 5. Internationale HELI World Konferenz "HELICOPTER Technologies", "HELICOPTER Operations" at the International Aerospace Supply Fair AIRTEC 2013, Airtec GmbH, Frankfurt a. Main, Germany, 1-3.
Bieg H-J (October-1-2013) Abstract Talk: Oculomotor Decisions: Saccadic or Smooth Response?, 14th Conference of Junior Neuroscientists of Tübingen (NeNa 2013): Do the Results, Justify the Methods, Schramberg, Germany 19.



Applied Computer-Vision

Our goal is to build Computer-Vision systems that can enhance and support the visual perception and decision making of humans, for example through a driver assistance system. We design computational architectures and develop machine vision algorithms to resolve situations such as the one shown to the left. Interpreting the turning action of a pedestrian in a time-critical driving situation requires advanced dynamic prediction mechanisms that also take the driver's state into account.
Major technological challenges are:
1) Large object motion and occlusion
2) Articulated objects with varying appearance, occurring in crowds
3) Moving observers, i.e. non-stationary cameras
4) Robust driver state monitoring
See also related work of the Cognitive Engineering group:
Design and optimization of Human-Machine Interfaces with Machine-Vision and Virtual Realities
3D reconstruction technologies for industrial and research applications
Studying interactive behavior

Research topics in applied Computer-Vision

Our research areas address problems in low- and high-level vision:
  • Perception-inspired recognition approaches (e.g. ego-motion)
  • Modelling Cognitive Loops for robust Environment Perception
  • Medial feature detection, Perceptual Grouping, code
  • Superpixel Segmentation, code
  • Markerless Human Body Tracking
  • Classification and regression for human pose detection and tracking
  • Generative Model-based Object Recognition
  • Combining Generative and Discriminative Recognition Approaches
  • Technical pipelines for the recording and generation of interacting human actions (3D Poses, Video) 

Modelling Cognitive Loops for robust Environment Perception

Robust Bayesian odometry from monocular image sequences with implicit independent motion classification
Implementing seeing machines is a challenge. Artificial cognitive systems require robustness in order to operate safely and to assist humans in a useful way, for example in a driver assistance context. In this project we learn cognitive components from minimal sensor cues in order to imitate robust human perception functions. Areas of study include robust monocular visual odometry, monocular object viewpoint estimation in clutter, and the combination thereof. Further projects have addressed monocular human body pose detection and tracking.
Robust object viewpoint estimation in clutter is an important task for tracking and predicting the future behavior of pedestrians. In a recent study, we combined monocular egomotion estimation and circular car viewpoint estimation in a joint dynamic system.
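The central difficulty in estimating viewpoints is that angles wrap around at 360°, so they cannot be averaged naively. A minimal sketch of the circular averaging such an estimator relies on (illustrative only, not the published regression-forest code):

```python
import math

def circular_mean(angles_rad):
    """Average angles on the circle via their unit vectors: the naive
    arithmetic mean of 350 deg and 10 deg is 180 deg, while the vector
    mean correctly yields 0 deg. Leaf predictions of a circular
    regression forest can be fused the same way."""
    s = sum(math.sin(a) for a in angles_rad)
    c = sum(math.cos(a) for a in angles_rad)
    return math.atan2(s, c) % (2.0 * math.pi)

# Noisy viewpoint votes straddling the 0/360 deg wrap-around:
votes = [math.radians(a) for a in (350.0, 355.0, 5.0, 10.0)]
print(math.degrees(circular_mean(votes)) % 360.0)  # ~0 (mod 360)
```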

Visual odometry

Visual odometry has been promoted as a fundamental component for intelligent vehicles. Relying solely on monocular image cues would be desirable, yet this is challenging, especially in dynamically varying urban areas, due to scale ambiguities, independent motions, and measurement noise. We propose to use probabilistic learning with auxiliary depth cues. Specifically, we developed an expert model that specializes monocular egomotion estimation units on typical scene structures, i.e. statistical variations of scene depth layouts. The framework adaptively selects the best-fitting expert. For on-line estimation of egomotion, we adopted a probabilistic subspace flow estimation method. Learning in our framework consists of two components: 1) partitioning datasets of video and ground-truth odometry data based on unsupervised clustering of dense stereo depth profiles, and 2) training a cascade of subspace flow expert models. A probabilistic quality measure computed from the experts' estimates provides a selection rule, overall leading to improved egomotion estimation on long test sequences.

Herdtweck C and Curio C (June-2013) Monocular Car Viewpoint Estimation with Circular Regression Forests, IEEE Intelligent Vehicles Symposium (IV 2013), IEEE, Piscataway, NJ, USA, 403-410.
Herdtweck C and Curio C (September-2012) Monocular Heading Estimation in Non-Stationary Urban Environment, IEEE Conference on Multisensor Fusion and Information Integration (MFI 2012), 1-7. Best Paper Award.
Herdtweck C and Curio C (June-2012) Experts of Probabilistic Flow Subspaces for Robust Monocular Odometry in Urban Areas, IEEE Intelligent Vehicles Symposium (IV 2012), 1-7.
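To illustrate the selection rule described above, here is a toy sketch (invented data and dimensions, not the published training pipeline): each expert owns a low-dimensional flow subspace, and the expert whose subspace best reconstructs the current flow vector receives the highest quality score.

```python
import numpy as np

def fit_subspace(flows, dim=2):
    """Fit an expert's flow subspace: top singular vectors of its
    training flows (a stand-in for the probabilistic subspace model)."""
    _, _, vt = np.linalg.svd(np.asarray(flows, dtype=float), full_matrices=False)
    return vt[:dim]

def quality(flow, basis):
    """Probabilistic quality measure: high when the expert's subspace
    reconstructs the observed flow well."""
    flow = np.asarray(flow, dtype=float)
    residual = flow - basis.T @ (basis @ flow)
    return float(np.exp(-residual @ residual))

# Two scene-layout regimes with different dominant flow subspaces:
expert_a = fit_subspace([[1, 0, 0, 0], [0, 1, 0, 0], [2, -1, 0, 0]])
expert_b = fit_subspace([[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]])

flow = [1.0, 2.0, 0.0, 0.0]            # flow drawn from regime A
scores = [quality(flow, e) for e in (expert_a, expert_b)]
print(int(np.argmax(scores)))           # 0: expert A is selected
```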

Mid-level features for artificial scene understanding

Feature grouping principle.
Feature detection and grouping results.
Scene labeling improvements with complex shape-centered features (SCIPs) (Engel and Curio, 2010).


Image encoding using interest points is a common technique in computer vision. We derived a scale- and rotation-invariant shape-centered interest point (SCIP) detector. By detecting singularities in Gradient Vector Flow (GVF) fields, we find points of high symmetry in the image.
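A compact sketch of the detection idea (a simplified Xu-Prince GVF update on an invented synthetic image; the released code is more elaborate): diffuse the edge gradients over the image and look for locations where the resulting field vanishes, which for a symmetric shape happens at its center.

```python
import numpy as np

def gvf(edge_map, mu=0.2, iters=400):
    """Diffuse edge gradients into homogeneous regions
    (simplified Gradient Vector Flow update)."""
    fy, fx = np.gradient(edge_map)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        lap_u = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        lap_v = (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                 + np.roll(v, 1, 1) + np.roll(v, -1, 1) - 4 * v)
        u = u + mu * lap_u - (u - fx) * mag2
        v = v + mu * lap_v - (v - fy) * mag2
    return u, v

# Synthetic scene: a bright disk. Its center is a singularity of the
# GVF field, i.e. a point of high symmetry (a SCIP candidate).
yy, xx = np.mgrid[:33, :33]
disk = ((xx - 16) ** 2 + (yy - 16) ** 2 <= 10 ** 2).astype(float)
u, v = gvf(disk)
mag = np.hypot(u, v)
print(mag[16, 16] < mag[16, 21])  # True: the field vanishes at the center
```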


Grouping edge and shape-centered features

Due to the nature of the underlying GVF field we can employ our features to group together edge-based interest points such as SIFTs. This feature grouping provides a strong descriptor for SCIPs and can help to encode valuable information about the image for computer vision tasks.
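As a toy illustration of this grouping (the geometry here is entirely invented): each edge-based interest point follows the symmetry field until it converges on a shape center, and points arriving at the same center form one group.

```python
import math

centers = [(10.0, 10.0), (40.0, 10.0)]  # two hypothetical SCIP centers

def field(x, y):
    """Unit vector toward the nearest shape center: a stand-in for the
    inward-pointing component of the GVF field."""
    cx, cy = min(centers, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
    d = math.hypot(cx - x, cy - y) or 1.0
    return (cx - x) / d, (cy - y) / d

def group(point, step=0.5, iters=200):
    """Walk an edge point along the field and report which SCIP it reaches."""
    x, y = point
    for _ in range(iters):
        dx, dy = field(x, y)
        x, y = x + step * dx, y + step * dy
    return min(range(len(centers)),
               key=lambda i: (centers[i][0] - x) ** 2 + (centers[i][1] - y) ** 2)

edge_points = [(12.0, 14.0), (8.0, 6.0), (43.0, 12.0)]
print([group(p) for p in edge_points])  # [0, 0, 1]
```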


Properties of grouped mid-level features

We demonstrate the main properties of our features such as scale and rotation invariance and further robustness against noise and clutter in a series of experiments. We show that the information they encode is to a certain degree complementary to SIFT. Furthermore, we evaluate them in an edge map reconstruction task to assess the amount of image information they encode.


Application example: Image Interpretation of Street Scenes

Finally, we demonstrated the power of our novel feature grouping scheme (SCIP+SIFT) in a multi-category classification task on natural images from the StreetScenes database, showing large improvements over employing SIFT features alone.
Engel D and Curio C (2010) Shape Centered Interest Points for Feature Grouping, CVPR 2010 Workshop on Perceptual Organization in Computer Vision (POCV 2010), IEEE, Piscataway, NJ, USA, 9-16. Code page

Visualization of the shape-centered feature grouping process (Engel and Curio 2010)

Superpixel image segmentation

Superpixels by Engel et al. (2009)
Image segmentation plays an important role in computer vision and human scene perception. Image oversegmentation is a common technique to overcome the problem of managing the high number of pixels and the reasoning among them. Specifically, a local and coherent cluster that contains a statistically homogeneous region is denoted a superpixel. In this work we propose a novel algorithm that segments an image into superpixels employing a new kind of shape-centered feature, based on Gradient Vector Flow (GVF) fields, which serves as seed points for the segmentation. The features are located at image positions of salient symmetry. We compare our algorithm to state-of-the-art superpixel algorithms and demonstrate a performance increase on the standard Berkeley Segmentation Dataset.
Engel D, Spinello L, Triebel R, Siegwart R, Bülthoff HH and Curio C (2009) Medial Features for Superpixel Segmentation, Eleventh IAPR Conference on Machine Vision Applications (MVA 2009), MVA Organizing Committee, Tokyo, Japan, 248-252. Code page
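A minimal sketch of growing superpixels from such seed points, here reduced to nearest-seed assignment in a joint intensity/position space (the released code implements the full medial-feature algorithm):

```python
import numpy as np

def seeded_superpixels(img, seeds, w=0.2):
    """Assign every pixel to the nearest seed in joint
    (intensity, position) space: a minimal stand-in for growing
    superpixels from medial/symmetry seed points."""
    H, W = img.shape
    yy, xx = np.mgrid[:H, :W]
    labels = np.zeros((H, W), dtype=int)
    dist_best = np.full((H, W), np.inf)
    for k, (sy, sx) in enumerate(seeds):
        # Intensity term plus spatially normalized position term
        d = (img - img[sy, sx]) ** 2 + w * ((yy - sy) ** 2 + (xx - sx) ** 2) / (H * W)
        better = d < dist_best
        labels[better] = k
        dist_best[better] = d[better]
    return labels

# Two homogeneous regions, one seed placed inside each:
img = np.zeros((8, 8))
img[:, 4:] = 1.0
labels = seeded_superpixels(img, seeds=[(4, 1), (4, 6)])
print(labels[0, 0], labels[7, 7])  # 0 1
```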

Combining Generative Model-Based and Discriminative Recognition Approaches

Curio and Giese 2005
Application Example: Posture Tracking by Posture Detection
We have developed an articulated tracking architecture that combines the advantages of several approaches (Curio & Giese, 2005).
View-based (feature-based) posture recognition is used to initialize an articulated model at the beginning of the tracking stage, and whenever model-based tracking fails. Since the proposal derived from view-based recognition is always further refined by model-based tracking, the tracking is relatively accurate, and more robust than purely view-based approaches.
Details
A: Low-dimensional image moments are extracted from the image. Movements correspond to curves in this feature space.
B: A low-dimensional posture space is derived from the 2D shape model by PCA. Movements correspond to curves in this space that are approximated by spline interpolation between training data points.
C: One-to-many SVM regression maps curves in the feature manifold to curves in the posture manifold.
D: Particle filtering based on importance sampling over a hidden low-dimensional state space. Different values of the state variable correspond to points on the posture manifold.
E: Likelihood computation by synthesizing silhouettes that correspond to the current state and determining their region overlap with the current image silhouette.
F1: The first initial condition for model-based tracking is derived from the most likely recognized posture based on steps B-E.
F2: The second initial condition is derived from the best-fitting model configuration of the previous time step. Model-based tracking is driven by a gradient vector field (GVF). From the estimates generated from the two initial conditions (F1 and F2), the one with maximum similarity to the image features is selected for further tracking.

Curio C and Giese M (2005) Combining View-based and Model-based Tracking of Articulated Human Movements, Workshop on Motion and Vision Computing, 261-268.
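Steps D and E above can be sketched as a minimal particle filter (with an invented 1-D "posture" state; the real system operates on the low-dimensional posture manifold and scores silhouette overlap):

```python
import math
import random

random.seed(0)

def likelihood(state, observation, sigma=0.3):
    """Stand-in for step E: the silhouette-overlap score is replaced
    by a Gaussian likelihood around the observed value."""
    return math.exp(-0.5 * ((state - observation) / sigma) ** 2)

def particle_filter(observations, n=500):
    particles = [random.uniform(-1.0, 1.0) for _ in range(n)]
    estimates = []
    for z in observations:
        # Step D: propagate particles with process noise
        particles = [p + random.gauss(0.0, 0.1) for p in particles]
        weights = [likelihood(p, z) for p in particles]
        total = sum(weights)
        estimates.append(sum(p * w for p, w in zip(particles, weights)) / total)
        # Importance resampling: concentrate particles on likely states
        particles = random.choices(particles, weights=weights, k=n)
    return estimates

est = particle_filter([0.0, 0.1, 0.2, 0.3, 0.4])
print(est[-1])  # close to the final observation 0.4
```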

Discriminative object detection in clutter based on pooled image features

The advantage of view-based approaches for tracking is that they do not require complex optimization, and that they can provide initial body posture guesses for automatic tracking via simple feed-forward mappings (Curio 2004). While such discriminative approaches are successful for recognition, retrieving robust posture estimates remains a challenge. This has motivated the development of combinations of discriminative and generative model-based approaches (see above, for example). Shape-centered descriptions are a further alternative for making these approaches robust in clutter. Furthermore, appropriate novel outdoor databases will enable us to achieve these goals.

Model-based object recognition

One way to track articulated figures (e.g. human bodies and facial expressions) in real videos is to fit geometric body models to specific image features, like contours. Another possibility to recognize, and possibly also to track actions is to recognize characteristic body postures. This second strategy is likely also used by the human brain.
An advantage of model-based tracking is that it can be very accurate, since it exploits local image features with high spatial resolution. The disadvantage is that it often becomes unstable because the image information is ambiguous, and because the underlying optimization problems have non-unique optima. A particular problem is the automatic initialization of model-based tracking at the start of the tracking process. Typically the initial posture of the human body is not known, and the space of possible configurations of body models is usually extremely high-dimensional.

Establishing learning-based algorithms for outdoor applications

Curio 2004
Movement trajectories of humans can be recorded with advanced motion capture systems, like the VICON system that is used extensively in our department. These systems require that the recorded subjects be equipped with special reflective markers. Inertial tracking technology (e.g. the commercial MOVEN suit) is a promising alternative that can be worn beneath normal clothing and used in natural, open environments (see below). For the development of applications that target a broader market, the requirement of a motion capture system is often not acceptable. Instead, many unconstrained applications require tracking human movement trajectories from regular video streams. For a number of applications, e.g. in biomedical engineering, the automotive industry, and human-machine interfaces, it also seems desirable to track movements without equipping actors with special sensors. The reconstruction of movement trajectories from normal video data without specific markers on the subjects' bodies is called markerless tracking. A promising research direction is to map directly from image features to human body poses. Nevertheless, novel databases that provide both image footage and ground-truth 3D motion capture with no visible markers are urgently needed for open, unconstrained environments. See the example of our current pipeline below.
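The direct feature-to-pose mapping idea can be sketched as a nearest-neighbour regression from image features to poses (toy data invented for illustration; real training pairs come from synchronized video and motion capture):

```python
# Toy training pairs (invented): image feature vector -> joint angles.
train = [
    ((0.1, 0.9), (10.0, 170.0)),  # e.g. standing posture
    ((0.8, 0.2), (90.0, 45.0)),   # e.g. reaching posture
    ((0.5, 0.5), (45.0, 100.0)),  # intermediate posture
]

def predict_pose(feature):
    """Nearest-neighbour regression: return the pose whose stored
    feature vector is closest to the query feature."""
    _, pose = min(train,
                  key=lambda fp: sum((a - b) ** 2 for a, b in zip(fp[0], feature)))
    return pose

print(predict_pose((0.75, 0.25)))  # (90.0, 45.0)
```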

Outdoor motion capture pipelines for prototyping and simulation


18. Herdtweck C and Curio C (June-2013) Monocular Car Viewpoint Estimation with Circular Regression Forests, IEEE Intelligent Vehicles Symposium (IV 2013), IEEE, Piscataway, NJ, USA, 403-410.
17. Herdtweck C and Curio C (September-2012) Monocular Heading Estimation in Non-stationary Urban Environment, IEEE International Conference on Multisensor Fusion and Information Integration (MFI 2012), IEEE, Piscataway, NJ, USA, 244-250.
16. Engel D and Curio C (June-2012) Detectability prediction in dynamic scenes for enhanced environment perception, IEEE Intelligent Vehicles Symposium (IV 2012), IEEE, Piscataway, NJ, USA, 178-183.
15. Herdtweck C and Curio C (June-2012) Experts of probabilistic flow subspaces for robust monocular odometry in urban areas, IEEE Intelligent Vehicles Symposium (IV 2012), IEEE, Piscataway, NJ, USA, 661-667.
14. Engel D and Curio C (June-2011) Pedestrian Detectability: Predicting Human Perception Performance with Machine Vision, IEEE Intelligent Vehicles Symposium (IV 2011), IEEE, Piscataway, NJ, USA, 429-435.
13. Engel D and Curio C (June-2010) Shape Centered Interest Points for Feature Grouping, CVPR 2010 Workshop on Perceptual Organization in Computer Vision (POCV 2010), IEEE, Piscataway, NJ, USA, 9-16.
12. Curio C and Engel D (May-2010): A Computational Mid-Level Vision Approach For Shape-Specific Saliency Detection, 10th Annual Meeting of the Vision Sciences Society (VSS 2010), Naples, FL, USA, Journal of Vision, 10(7) 1160.
11. Engel D and Curio C (January-2010): Towards robust scene analysis: A versatile mid-level feature framework, 4th International Conference on Cognitive Systems (CogSys 2010), Zürich, Switzerland.
10. Engel D, Spinello L, Triebel R, Siegwart R, Bülthoff HH and Curio C (May-2009) Medial Features for Superpixel Segmentation, Eleventh IAPR Conference on Machine Vision Applications (MVA 2009), MVA Organizing Committee, Tokyo, Japan, 248-252.
9. Engel D and Curio C (December-2008) Scale-invariant medial features based on gradient vector flow fields, 19th International Conference on Pattern Recognition (ICPR 2008), IEEE Service Center, Piscataway, NJ, USA, 1-4.
8. Engel D and Curio C (July-2007): A Biologically Motivated Approach to Human Body Pose Tracking in Clutter, 10th Tübinger Wahrnehmungskonferenz (TWK 2007), Tübingen, Germany.
7. Curio C and Giese M (January-2005) Combining View-based and Model-based Tracking of Articulated Human Movements, IEEE Computer Society Workshop on Motion and Vision Computing (MOTION 2005), IEEE Computer Society, Los Alamitos, CA, USA, IEEE Computer Society Workshop on Motion and Vision Computing, 261-268.

Last updated: Friday, 23.02.2018