Face & Motion Capture

For recording faces and human activities in general, we have two facilities. The VideoLab was designed for recording human activities from several viewpoints; its five cameras can record precisely synchronized color videos.

It has also been used to create a database of facial expressions and action units, and it supports research on the recognition of facial expressions as well as investigations of visual-auditory interactions in communication. In the ScanLab, several commercial 3D scanning systems capture the shape and color of faces. The data are used to produce stimuli for psychophysical experiments and to build models that support machine learning and computer vision algorithms.


Video Lab

The VideoLab has been in use since May 2002. It was designed for recordings of human activities from several viewpoints. It currently consists of 5 Basler CCD cameras that can record precisely synchronized color videos. In addition, the system is equipped for synchronized audio recording. The whole system was built from off-the-shelf components and was designed for maximum flexibility and extendibility. The VideoLab has been used to create several databases of facial expressions and action units, and it was instrumental in several projects on the recognition of facial expressions and gestures across viewpoints and on multi-modal action learning and recognition, as well as in investigations of visual-auditory interactions in communication.

One of the major challenges in creating the VideoLab was choosing hardware components that allow on-line, synchronized recording of uncompressed video streams. Each of the 5 Basler A302bc digital video cameras produces up to 26 MB per second (782x582 pixels at 60 frames per second); this data is streamed over CameraLink interfaces and continuously written to dedicated hard disks. Currently, the computers use a striped RAID-0 configuration to maximize write speed. The computers are connected and controlled via a standard Ethernet LAN; synchronization at the microsecond level is achieved using external hardware triggering.
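As a quick sanity check of these figures, the data rate follows directly from the sensor resolution and frame rate. The sketch below assumes the cameras deliver raw single-byte-per-pixel (Bayer) color data, which is our reading of the 26 MB/s figure rather than a documented specification:

```python
# Back-of-the-envelope check of the per-camera data rate quoted above
# (782x582 pixels at 60 fps, assuming 1 byte per pixel for raw Bayer color).
WIDTH, HEIGHT, FPS = 782, 582, 60
BYTES_PER_PIXEL = 1  # assumption: single-chip color camera delivering raw Bayer data

bytes_per_second = WIDTH * HEIGHT * FPS * BYTES_PER_PIXEL
print(f"Per camera:  {bytes_per_second / 2**20:.1f} MiB/s")      # ~26.0 MiB/s

# Five cameras recording simultaneously:
print(f"All cameras: {5 * bytes_per_second / 2**20:.1f} MiB/s")  # ~130 MiB/s
```

The total of roughly 130 MiB/s across all cameras explains why each machine writes to its own striped RAID-0 array rather than to a shared store.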

For precise control of multi-camera recordings, we developed our own distributed recording software. In addition to the frame-grabber drivers, which provide basic recording functionality on a customized Linux operating system, we wrote a collection of distributed, multi-threaded real-time C programs that control the hardware and handle buffering and write-out of the video and audio data to the hard disks. All software components communicate with each other via standard Ethernet LAN. On top of this low-level control software, we have implemented a graphical user interface that exposes the full functionality of the VideoLab, using Matlab and the open-source Psychtoolbox-3 software as a framework.
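The control pattern described above can be sketched as follows; the host names, port, and command format are purely illustrative and do not reflect the actual protocol of our software:

```python
# Minimal sketch of the control pattern described above: a master process
# sends start/stop commands over the LAN to the recording nodes, while frame
# synchronization itself is handled by the external hardware trigger.
import socket

RECORDER_NODES = [("cam-pc-1", 5000), ("cam-pc-2", 5000)]  # hypothetical hosts

def send_command(command: str) -> None:
    """Send a one-line command (e.g. 'START trial42') to every recorder node."""
    for host, port in RECORDER_NODES:
        with socket.create_connection((host, port), timeout=2.0) as sock:
            sock.sendall((command + "\n").encode("ascii"))

send_command("START trial42")   # nodes open files and wait for the trigger
# ... hardware trigger drives the cameras at 60 Hz ...
send_command("STOP")            # nodes flush their buffers and close files
```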


ScanLab

Left: ABW scanner setup. Right: sample data from the ABW structured light scanner.

The ScanLab houses several commercial 3D scanning systems as well as a marker-based facial motion capture system. The data they produce are used for generating stimuli for psychophysical experiments and for building models to support machine learning and computer vision algorithms:

OptiTrack Facial Capture Lab
For easy recording of facial motion data without real-time requirements, a marker-based motion capture system developed by NaturalPoint was installed in the same space as the ABW Structured Light Scanner, reusing its scaffolding and seating. It consists of 5 “Flex” USB cameras (640x480 pixels, 100 fps) and one additional camera for recording reference video. Retro-reflective markers (3 mm diameter) are illuminated by camera-mounted near-infrared LEDs, enabling accurate motion analysis and reconstruction in NaturalPoint’s “Expression” software.

Face Scanners
The following systems are available for capturing the shape and color of faces:

Cyberware Head Scanner
This scanner (Cyberware, Inc., USA) uses laser triangulation for recording 3D data and a line sensor for capturing color information, both producing 512 x 512 data points (typical depth resolution 0.1 mm) and covering 360º of the head in a cylindrical projection within 20 s. It was used extensively to build the MPI Head Database (http://faces.kyb.tuebingen.mpg.de/) and is still used for adding new heads to the database.
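Working with such scans typically starts by converting the cylindrical range map to Cartesian coordinates. The sketch below assumes one radius sample per (height, angle) grid cell; the array layout and scaling are chosen for illustration and are not the scanner's actual file format:

```python
# Converting a cylindrical range scan (as produced by the Cyberware scanner)
# to Cartesian coordinates. Each of the 512x512 samples stores a radius at
# a given longitude (column) and height (row).
import numpy as np

def cylindrical_to_cartesian(radius: np.ndarray, height_step: float = 1.0):
    """radius: (rows, cols) array; columns span 360 degrees of longitude."""
    n_rows, n_cols = radius.shape
    theta = np.linspace(0.0, 2.0 * np.pi, n_cols, endpoint=False)
    y = np.arange(n_rows)[:, None] * height_step        # height of each row
    x = radius * np.cos(theta)[None, :]
    z = radius * np.sin(theta)[None, :]
    return np.stack([x, np.broadcast_to(y, radius.shape), z], axis=-1)

points = cylindrical_to_cartesian(np.random.rand(512, 512))  # dummy data
print(points.shape)  # (512, 512, 3) -> one 3D point per scan sample
```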

ABW Structured Light Scanner
This is a customized version of an industrial scanning system (ABW GmbH, Germany), modified for use as a dedicated face scanner. It consists of two LCD line projectors, three video cameras, and three DSLR cameras. Using structured light and a calibrated camera/projector setup, 3D data can be calculated from the video images by triangulation. Covering a face from ear to ear, the system produces up to 900,000 3D points and 18 megapixels of color information. One recording takes about 2 s, making this scanner much more suitable for recording facial expressions than the Cyberware system. It was used extensively for building a facial expression model and for collecting static FACS data.
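The underlying triangulation step can be illustrated as a ray-plane intersection: each decoded stripe defines a plane of light from the projector, and the 3D point lies where the camera ray through a pixel meets that plane. The calibration values below are placeholders, not the ABW system's actual parameters:

```python
# Triangulation principle behind structured-light scanning: intersect the
# camera ray through a pixel with the light plane of the decoded stripe.
import numpy as np

def intersect_ray_plane(ray_origin, ray_dir, plane_point, plane_normal):
    """Return the point where a camera ray meets a projector light plane."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    t = np.dot(plane_point - ray_origin, plane_normal) / np.dot(ray_dir, plane_normal)
    return ray_origin + t * ray_dir

camera_center = np.array([0.0, 0.0, 0.0])
pixel_ray     = np.array([0.05, -0.02, 1.0])   # from camera intrinsics (made up)
stripe_point  = np.array([0.3, 0.0, 0.0])      # a point on the light plane
stripe_normal = np.array([1.0, 0.0, 0.3])      # from projector calibration

print(intersect_ray_plane(camera_center, pixel_ray, stripe_point, stripe_normal))
# -> a 3D point roughly 0.86 units in front of the camera for these values
```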

ABW Dynamic Scanner
Based on the same structured-light principle as the static ABW scanner, this system uses a high-speed stripe-pattern projector, two high-speed video cameras, and a color camera synchronized to strobe illumination. It can currently perform 40 3D measurements per second (with color information), producing detailed face scans over time. Ten seconds of recording produce 2 GB of raw data.

3dMD Speckle Pattern Scanner
This turn-key system (3dMD Ltd, UK) is mainly designed for medical purposes. Using four video cameras in combination with infrared speckle-pattern flashes, plus two color cameras synchronized with photography flashes, it can capture a face from ear to ear in 2 ms, making it highly suitable for recording infants and children. The system was used in collaboration with the University Clinic for Dentistry and Oral Medicine to study infant growth.

Passive 4D Stereo Scanner
With three synchronized HD machine-vision cameras (two grayscale, one color), this system, developed by Dimensional Imaging, UK, reconstructs high-quality dense stereo data of moving faces at 25 frames per second. Because the system is passive, high-quality studio lighting can be used to illuminate the subject, which yields better color data and more subject comfort than the ABW Dynamic Scanner. While care must be taken with focus, exposure, and calibration of the cameras, the system imposes fewer limitations on rigid head motion.
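The core geometric relation behind such dense stereo reconstruction is that, for a rectified camera pair, depth is inversely proportional to disparity (Z = f * B / d). A minimal sketch with made-up calibration values:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# Focal length and baseline below are invented values for illustration.
import numpy as np

focal_px = 2400.0      # focal length in pixels (hypothetical)
baseline_m = 0.12      # distance between the two cameras in meters (hypothetical)

disparities = np.array([160.0, 240.0, 320.0])   # matched pixel offsets
depths = focal_px * baseline_m / disparities
print(depths)  # [1.8 1.2 0.9] -> depth in meters for each match
```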


Gaze-tracking Facilities

Gaze-tracking allows us to determine which regions of a visual scene a participant considers relevant to the task at hand. We employ a range of video-based eye-trackers that can be flexibly tailored to various experimental demands, from the investigation of eye-movement coordination to the acquisition of gaze strategies during flight training.

An EyeLink II (2000 Hz, SR Research; Figure 1a) head-mounted eye-tracker is used for experiments that require high temporal resolution (e.g., saccade velocity measurements, micro-saccade detection) as well as high accuracy (~0.3°). It is often used with CRT and TFT displays with high refresh rates (i.e., 100-120 Hz). Given the eye-tracking camera's proximity to the eyes, it also allows accurate eye-tracking in large field-of-view display setups (i.e., PanoLab, BackProjectionLab). In such setups, the observer's head can be supported by a chin-rest to ensure high tracking accuracy. This setup is employed to analyze low-level gaze behavior (e.g., saccade kinematics) during object recognition and steering tasks. Alternatively, the same eye-tracker can be combined with our body-motion tracking equipment (i.e., Vicon, ART) to allow gaze-tracking of head- and body-unrestrained users (120 Hz). For this purpose, we have developed open-source software, based on geometric and regression-based algorithms, that estimates the point-of-regard on large displays in real-time for mobile users (Figure 2).
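A minimal sketch of the regression-based variant of such a mapping, assuming a second-order polynomial fit from eye-feature coordinates to screen coordinates (the calibration data below are random placeholders, and this is a generic technique rather than our actual implementation):

```python
# Regression-based point-of-regard estimation: fit a second-order polynomial
# mapping from eye-feature coordinates to known screen target positions.
import numpy as np

def design_matrix(ex, ey):
    """Second-order polynomial terms of the eye-feature coordinates."""
    return np.column_stack([np.ones_like(ex), ex, ey, ex * ey, ex**2, ey**2])

# Calibration: eye features (e.g. pupil position) vs. known target positions
eye_xy = np.random.rand(20, 2)                     # placeholder calibration data
screen_xy = np.random.rand(20, 2) * [1920, 1080]   # placeholder target positions

A = design_matrix(eye_xy[:, 0], eye_xy[:, 1])
coeffs, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)   # (6, 2) coefficients

# At runtime, map a new eye measurement to a screen position:
gaze = design_matrix(np.array([0.4]), np.array([0.6])) @ coeffs
print(gaze)  # estimated (x, y) point-of-regard in pixels
```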

Remote eye-trackers consist of stereo cameras that estimate eye and head pose using markerless face-detection algorithms. The observer's gaze vector is computed by combining head orientation, estimated from facial features, with pupil orientation measured by the remote cameras (up to 1.5 m, 60 Hz, ~1.0° accuracy). These systems remove the need for head-mounted gear and maximize participant comfort. The Tobii T60 XL and the Seeing Machines FaceLab are used in our visual psychophysics laboratories and fixed-base flight simulators (i.e., HeliLab; Figure 1b), respectively. They are particularly suitable for experiments that investigate how gaze dwells are distributed across regions of interest in the visual scene.

Lightweight goggles with integrated cameras for simultaneous eye-tracking and visual scene capture can also be used for mobile gaze-tracking in outdoor (and non-display) environments. The SMI-ETG (30 Hz, ~1.0°; Figure 1c) is particularly useful for observational studies of users' gaze in naturalistic, large field-of-view environments (e.g., driving in the real world). Gaze is calibrated to a front-facing scene camera that captures the visual scene available to the user during task performance. Post-recording analyses allow task-relevant regions of interest in the visual scene to be segmented and labelled. The EyeSeeCam vHIT (Interacoustics) is a lightweight and comfortable head-mounted eye-tracker that allows video-based eye-movement recordings with high temporal (220 Hz) and spatial (~0.5°) accuracy. Based on infrared recordings of the eye, horizontal, vertical, and also torsional eye position (i.e., ocular counterroll) can be determined, giving a full three-dimensional characterization of eye position in the head. Synchronized measurement of head motion through an integrated inertial measurement unit (IMU) allows eye and head motion to be correlated. The system is currently used in motion-sickness studies and could also be used for vestibular function tests.
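One common way such synchronized eye and head recordings can be related, sketched below with synthetic signals, is to estimate the vestibulo-ocular reflex (VOR) gain as the slope of eye velocity against head velocity; this is a generic analysis, not necessarily the one used in our studies:

```python
# Correlating eye and head motion from synchronized recordings: estimate
# VOR gain as the (negated) regression slope of eye velocity on head velocity.
import numpy as np

fs = 220.0                                          # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
head_pos = 10.0 * np.sin(2 * np.pi * 1.0 * t)       # synthetic head yaw (deg)
eye_pos = -9.5 * np.sin(2 * np.pi * 1.0 * t)        # compensatory eye-in-head

head_vel = np.gradient(head_pos, 1.0 / fs)          # deg/s
eye_vel = np.gradient(eye_pos, 1.0 / fs)

gain = -np.polyfit(head_vel, eye_vel, 1)[0]
print(f"VOR gain ~ {gain:.2f}")                     # ~0.95 for this synthetic data
```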


Biosensors

Electrophysiological signals, measured from skin electrodes, allow us to evaluate brain, heart, and muscle activity, eye movements, respiration, galvanic skin response, and many other physiological and physical parameters. The core of our electrophysiological recording equipment consists of a 32-channel multi-purpose USBamp system (g.tec, Austria) and a wireless 64-channel EEG system (Brain Products, Germany). In addition, we employ stand-alone sensors for dedicated experimental setups.

The 32-channel multi-purpose USBamp system consists of two modular 16-channel amplifiers that are primarily used for EEG recording. Nonetheless, it is also compatible with a range of sensors for measuring blood-pulse, skin-conductance activity and respiration. Thus, this system is useful for measuring EEG and peripheral biosignals simultaneously.

The wireless 64-channel active-electrode EEG system is ideal for laboratory spaces that require user mobility, such as motion simulators. It consists of modular DC amplifiers, and additional amplifiers can be added to extend the system to 128 channels. LEDs on the electrodes indicate the local impedance, facilitating EEG preparation and ensuring consistent accuracy across the electrodes. The same system includes an LED-based digitizer that allows quick localization mapping of the EEG electrodes, enabling more accurate EEG source-localization analyses.

In addition, a portable multi-purpose 16-channel V-Amp amplifier (Brain Products, Germany) is used for mobile recording of electrophysiological signals (e.g., EMG, ECG, EEG) as well as biosignals such as respiration rate. This system can be extended with different sensors for multiple physiological measures, including respiration, temperature, skin conductivity, and blood pulse. For example, we employ a piezo-electric crystal sensor in a robust belt system to record chest or abdominal respiration waveforms; such a setup can be used to measure breathing frequency and to provide bio-feedback in a virtual reality application. The Q Sensor is a stand-alone wireless device that allows the wearer to conveniently record skin conductance as a function of sympathetic nervous system activity. It samples at up to 32 Hz and is used as a tool for psychophysiology experiments. To assess postural stability, which can correlate with motion-sickness levels, a Nintendo Wii Balance Board is used; research has shown that the Balance Board is a viable alternative to force-plate measurements. Internally, it consists of four force sensors and samples center-of-pressure data at 60 Hz.
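As an illustration of how a breathing frequency can be derived from a respiration-belt waveform, the sketch below picks the dominant spectral peak of a synthetic signal; this is a generic approach, not our specific analysis pipeline:

```python
# Estimating breathing frequency from a respiration waveform via the
# dominant spectral peak. The signal here is synthetic: 0.25 Hz, i.e.
# 15 breaths per minute, plus noise.
import numpy as np

fs = 32.0                                     # sampling rate in Hz
t = np.arange(0, 60.0, 1.0 / fs)              # one minute of data
resp = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(resp - resp.mean()))   # remove DC, take magnitude
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)

peak_hz = freqs[np.argmax(spectrum)]
print(f"Breathing rate: {peak_hz * 60:.1f} breaths/min")   # ~15.0
```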


Graphics Engines

We primarily use the Unity game engine from Unity Technologies for developing and deploying our experiments. With the help of the commercial middleware MiddleVR from I’m in VR, we can run Unity-based experiments in all of our VR labs and transition existing Unity experiments from one VR setup to another with relatively little effort; for example, an HMD-based experiment can later be run in the PanoLab cluster-rendering setup. Unreal Engine 4 (UE4) is used as an alternative when high visual fidelity is needed, such as for helicopter and driving simulation visuals. We have extended UE4 with VRPN tracking capabilities, which allow us to run UE4 in a mobile VR setup in our large tracking hall. A custom UDP plugin allows UE4 to be used with our motion simulators, such as the CableRobot Simulator and CyberPod. The discontinued Virtools software from Dassault Systèmes is no longer used for new experiments.
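The kind of UDP coupling mentioned above might look like the following sketch; the packet layout, address, and port are invented for illustration and do not describe the actual plugin's protocol:

```python
# Sketch of coupling a rendering engine to a motion simulator over UDP:
# a fixed-size binary packet carrying a timestamp and a 6-DoF pose.
import socket
import struct
import time

SIMULATOR_ADDR = ("192.168.0.50", 9000)      # hypothetical simulator host

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_pose(x, y, z, roll, pitch, yaw):
    """Pack a timestamped pose as 7 little-endian doubles and send it."""
    packet = struct.pack("<7d", time.time(), x, y, z, roll, pitch, yaw)
    sock.sendto(packet, SIMULATOR_ADDR)

send_pose(0.0, 0.0, 1.2, 0.0, 0.05, 1.57)    # one pose update per rendered frame
```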
