The analysis and synthesis of images is a particularly challenging problem for automatic systems. It typically involves large amounts of high-dimensional data with significant between-feature correlations, corrupted by non-uniform sensor noise. The fact that, nevertheless, humans and animals have developed very efficient visual systems makes the analysis and synthesis of images an important field of research that both inspires new machine learning research and allows testing existing techniques for their practical applicability. Our own research in this area can be subdivided into three main directions: (classical) computer vision, image processing and – more recently – computational photography. The boundaries between these areas are not sharply defined, as they all deal with processing image data and often rely on similar techniques.
By computer vision we mean the task of extracting high-level information from images, e.g. the presence of objects or the classification of events. Since we started our own work in this area in 2002, computer vision has become a showcase example of how machine learning can all but take over a field of research previously dominated by hand-crafted techniques. Today, the use of machine learning methods is common practice among computer vision researchers. Our own research in this field therefore follows a dual agenda: on the one hand, we develop new methods for computer vision problems based on recent machine learning techniques. In particular, this includes the use of structured input and output spaces, a field that has not yet become mainstream in computer vision research and thus allows us to break new ground rather than follow existing trends. On the other hand, we focus on methods that are not limited to the solution of specific problems, but that are applicable to other areas as well.
Our work on learning structured features is an example of this dual strategy: we have generalized frequent item set mining and graph mining, two pattern mining techniques that are currently successful in data mining research. By defining the quantity of interest for a pattern (an item set or a graph) as how discriminative it is for a prediction task, rather than how frequently it occurs in the data corpus, we obtained discriminative item set mining and discriminative graph mining. We demonstrated that applying these techniques to regions of natural images yields powerful structured features for detecting objects in images and classifying actions in videos. At the same time, the methodology carries over to other domains where structured information is processed, e.g. bioinformatics.
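The shift from frequency-based to discriminative pattern scoring can be illustrated with a minimal sketch. The scoring function below (absolute difference of class-conditional supports) and all names are illustrative assumptions, not the actual criterion used in our work:

```python
from itertools import combinations

def mine_discriminative_itemsets(pos, neg, max_size=2, top_k=3):
    """Rank candidate item sets by how well they separate two classes
    (here: absolute difference of class-conditional support), instead
    of by their raw frequency as in classical frequent item set mining."""
    items = sorted({i for t in pos + neg for i in t})
    scored = []
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            c = set(cand)
            sup_pos = sum(c <= t for t in pos) / len(pos)  # support in class +
            sup_neg = sum(c <= t for t in neg) / len(neg)  # support in class -
            scored.append((abs(sup_pos - sup_neg), cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:top_k]]

# toy transactions: sets of "visual words" occurring in image regions
pos = [{1, 2, 3}, {1, 2}, {1, 2, 4}]
neg = [{3, 4}, {2, 4}, {3}]
print(mine_discriminative_itemsets(pos, neg))
```

Note that the item set {1, 2} is ranked highly although a purely frequency-based miner would prefer more common but uninformative singletons.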
An important focus of our recent research has been the question of how automatic systems can learn to localize objects in images. We developed a method that replaces the usual, suboptimal two-step procedure with a joint structured output regression formulation that allows consistent end-to-end training. Subsequently, we developed a method that exploits image and object context when training such detection systems for multiple classes: it automatically determines relevant dependencies between the predictors for different classes using multiple kernel learning.
Further projects that originated in computer-vision-related problems concern clustering and taxonomy discovery, learning optimal kernel combinations, and learning image interest operators from eye movements. These projects are also described in the sections on Kernel Algorithms and Machine Learning in Neuroscience.
Taking a medium-term perspective, we expect that machine learning will influence other image-related domains as strongly as it has influenced computer vision. We work toward this by devoting an increasing share of our attention to computer graphics, image processing, and the newly established field of computational photography. A central theme of our research in these areas is the exploitation of image statistics, often in conjunction with image reconstruction.
Steganalysis aims to detect hidden signals that have been embedded invisibly into image data. Most previous approaches relied on an explicit model of the suspected steganographic method, which severely limits their applicability. In contrast, we developed a model-free method that relies only on relatively universal image statistics, characterized by how well image pixels can be predicted from their neighbors. This makes it possible to detect previously unknown forms of steganography.
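The underlying idea can be sketched with a deliberately simplified statistic. The fixed 4-neighbor mean predictor below is an illustrative stand-in for the actual learned predictors; in natural images the residual is small, while pixel-level embedding perturbations raise it:

```python
def neighbor_residual_score(img):
    """Simplified steganalysis-style statistic: predict each interior
    pixel as the mean of its 4 neighbors and return the mean absolute
    prediction residual. Embedded payloads that perturb individual
    pixels tend to raise this score relative to clean natural images."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            pred = (img[y - 1][x] + img[y + 1][x]
                    + img[y][x - 1] + img[y][x + 1]) / 4.0
            total += abs(img[y][x] - pred)
            n += 1
    return total / n

# a smooth gradient image vs. the same image with +/-1 "LSB-like" changes
clean = [[x + y for x in range(8)] for y in range(8)]
noisy = [[x + y + (x + y) % 2 for x in range(8)] for y in range(8)]
print(neighbor_residual_score(clean), neighbor_residual_score(noisy))
```

A detector built on such statistics needs no model of the embedding scheme, only a threshold calibrated on clean images.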
Medical imaging applications, such as positron emission tomography (PET), are of special interest to our research, as images in this field cannot be captured directly but must be reconstructed numerically from noisy sensor measurements. This reconstruction can benefit from statistical image models, which should allow inferring higher-quality images from fewer measurements, thereby saving time and cost. In this line of research, we recently developed a new non-monotonic method for maximum-likelihood PET image reconstruction that improves reconstruction speed and quality over previous approaches.
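For context, a minimal sketch of the classical ML-EM iteration, the standard maximum-likelihood baseline that accelerated, non-monotonic variants improve upon (the toy system matrix and data below are illustrative, not from the actual work):

```python
def mlem(A, y, n_iter=50):
    """Classical ML-EM update for emission tomography:
    x_j <- x_j / s_j * sum_i A_ij * y_i / (A x)_i,  with s_j = sum_i A_ij.
    A[i][j] models the probability that an emission in voxel j
    is detected in measurement bin i; y holds the measured counts."""
    m, n = len(A), len(A[0])
    x = [1.0] * n                                             # positive start image
    sens = [sum(A[i][j] for i in range(m)) for j in range(n)]  # sensitivities s_j
    for _ in range(n_iter):
        proj = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(m)]
        back = [sum(A[i][j] * ratio[i] for i in range(m)) for j in range(n)]
        x = [x[j] * back[j] / sens[j] for j in range(n)]
    return x

# toy 2-voxel system with noiseless data generated from x_true = [2, 5]
A = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x_true = [2.0, 5.0]
y = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
print(mlem(A, y))
```

The multiplicative update keeps the image nonnegative and increases the Poisson likelihood monotonically; the price is slow convergence, which is exactly what faster, non-monotonic schemes address.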
In computer graphics we have also continued to study reconstruction problems, where the task is to infer a three-dimensional object shape from two-dimensional views. This again is an area where integrating statistical knowledge about shape and surface smoothness can improve an algorithm's performance. Specifically, we developed a method for 3D face reconstruction that creates a face model of a specific person from a monocular video, or even from a single image, without user interaction. Progress was also made in mesh tracking, where we developed a fast technique that allows realistic animation of time-varying, flexible objects.
Computational photography is an area of research that aims at enhancing photographic imaging processes beyond the capabilities of traditional film-based cameras. It combines aspects of computer vision, image processing, and computer graphics. As an entry point into this field we have worked on color constancy, the problem of making photos of natural scenes look consistent regardless of the lighting conditions under which they were taken. Our Bayesian approach to automatic white balancing improves the visual impression of scenes by means of a prior that is learned empirically from a reference dataset.
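A toy sketch of the idea of combining image evidence with a learned illuminant prior; the linear blend, the gray-world evidence, and the `prior_weight` knob are illustrative assumptions, not the actual Bayesian model:

```python
def map_illuminant(obs_rgb_mean, prior_mean, prior_weight=0.5):
    """Toy MAP-style illuminant estimate: blend the scene's average
    color (gray-world evidence) with an illuminant prior learned from
    a reference dataset. prior_weight is a hypothetical knob trading
    image evidence against the prior."""
    est = [(1 - prior_weight) * o + prior_weight * p
           for o, p in zip(obs_rgb_mean, prior_mean)]
    g = est[1]
    return [c / g for c in est]  # normalize so the green gain is 1

def white_balance_pixel(rgb, illum):
    """Divide out the estimated illuminant (von Kries-style correction)."""
    return [c / i for c, i in zip(rgb, illum)]

illum = map_illuminant([0.8, 0.5, 0.3], [1.0, 1.0, 1.0])
print(illum, white_balance_pixel([0.6, 0.45, 0.26], illum))
```

The prior keeps the estimate from being pulled toward implausible illuminants by scenes whose average color is itself strongly tinted.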
Another focus of our research has been cross-modal image prediction, i.e. predicting images in one modality from images in another. An important application area is PET-MR imaging, where the attenuation correction of PET scans requires estimating a synthetic X-ray image. Using statistical techniques and the recent structured-output support vector machine framework, we developed methods for such cross-modal prediction tasks. As a test application, we considered the task of predicting color from grayscale images.
While many of the above projects exploit structure in the joint distribution of pixel values arising from corresponding structure in the objects being imaged, one can also exploit dependencies in the joint distribution of sensor noise. A common problem in long-exposure or high-ISO photography is thermal noise accumulated during exposures. This noise is non-stationary, e.g. due to changes in the ambient temperature; however, it is highly structured, and it can be significantly reduced by a method that combines a sample of a camera's thermal and readout noise distribution with an image prior to generate plausible low-noise images.
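A deliberately simplified sketch of this two-ingredient recipe, subtracting a noise sample and regularizing with an image prior; the neighborhood-mean shrinkage and the `prior_weight` parameter are illustrative stand-ins for the actual learned noise distribution and image prior:

```python
def denoise_long_exposure(img, dark_frame, prior_weight=0.25):
    """Simplified dark-frame correction plus smoothness prior:
    subtract a sample of the camera's structured thermal/readout noise,
    then shrink each interior pixel toward its 4-neighborhood mean.
    prior_weight is a hypothetical regularization strength."""
    h, w = len(img), len(img[0])
    corrected = [[img[y][x] - dark_frame[y][x] for x in range(w)]
                 for y in range(h)]
    out = [row[:] for row in corrected]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nbr = (corrected[y - 1][x] + corrected[y + 1][x]
                   + corrected[y][x - 1] + corrected[y][x + 1]) / 4.0
            out[y][x] = (1 - prior_weight) * corrected[y][x] + prior_weight * nbr
    return out

# demo: a flat scene of brightness 10 plus a structured thermal pattern
dark = [[(x * y) % 3 for x in range(6)] for y in range(6)]
shot = [[10 + dark[y][x] for x in range(6)] for y in range(6)]
print(denoise_long_exposure(shot, dark)[2][2])  # -> 10.0
```

Because the thermal pattern is structured rather than independent per pixel, a single measured noise sample removes most of it; the prior then only has to clean up the residual.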
We have also recently initiated work in the area of blind image deconvolution. In particular, we developed a method that removes image blur caused by an unknown point spread function, provided that multiple exposures of the same object or scene are available. This has important applications, e.g. in high-resolution astronomical imaging.