Alumni of the Group Cognitive Engineering
Computer vision aims to teach machines and algorithms to 'see', with the ultimate goal of creating 'intelligent' applications and devices that can assist humans in a wide array of scenarios. My approach investigates computer vision on three layers: low-level features, mid-level representations and high-level applications. Each layer depends on the previous ones while also generating constraints and requirements for them. At the application layer, human-machine interfaces come into play and link human perception to computer vision. By studying all layers we can gain much deeper insight into the interplay of different methods than by examining an isolated problem. Furthermore, we are able to factor constraints imposed by the different layers and by the users into the design of the algorithms, instead of optimizing a single method based purely on algorithmic performance measures. The different modules of my thesis are tightly connected and interdependent within the framework of shape-centered representations. The connections between the modules make it possible to feed information back from higher to lower layers and to optimize the design choices there. My work on these three layers includes:
At the low level, shape-centered interest points are formed at locations of high local symmetry, as opposed to corner interest points, which occur along the outline of shapes. Experiments show that they are robust to common natural image transformations such as scaling, rotation, and the introduction of noise and clutter.
At the mid level, I presented two strategies for building robust image representations. First, I introduced a novel feature grouping method. The scheme offers a powerful way to combine the advantages of shape-centered interest points, namely robustness and a tight connection to a unique shape, with those of corner-based interest points, namely strong descriptors. Second, I introduced a novel set of medial feature superpixels. They provide a feed-forward way to divide the image into small, visually homogeneous regions, offering a compact and efficient mid-level representation of the image information.
At the application level, I bridge the gap between computer vision and the human observer by introducing three applications that employ the shape-centered representations from the two previous layers. The first is a multi-class scene labeling scheme that produces dense annotations of images, combining a local prediction step with a global optimization scheme. Second, I developed a novel image retrieval tool that operates on high-level semantic information, allowing more efficient image search in large labeled datasets. Finally, I put forward and investigated the novel idea of predicting the detectability of a pedestrian in a driver assistance context.
I am currently working on open questions in the fields of efficient crowd-sourcing and active learning. Here, the human user enters directly into the loop via crowd-sourcing services such as Amazon Mechanical Turk. This approach can allow us to efficiently optimize our tools and to distribute complex tasks between human and machine intelligence, leaving only a few hard examples for the human while the machine learning takes care of the vastly larger easy part of a problem.
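As a rough illustration of this division of labor, the following Python sketch routes each example either to the machine or to a human annotator based on classifier confidence. It is only a minimal sketch under stated assumptions, not the method used in this work: the function name `split_by_confidence`, the fixed `threshold`, and the example probabilities are all hypothetical, and a real system would post the human queue to a crowd-sourcing service rather than just return it.

```python
from typing import List, Sequence, Tuple


def split_by_confidence(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Route each example to the machine or to a human annotator.

    probabilities[i] holds the classifier's class probabilities for example i.
    Returns (machine_labeled, human_queue): machine_labeled is a list of
    (example_index, predicted_class) pairs, human_queue a list of indices.
    The threshold is a hypothetical tuning parameter, not a value from the text.
    """
    machine_labeled: List[Tuple[int, int]] = []
    human_queue: List[int] = []
    for index, class_probs in enumerate(probabilities):
        # Most likely class according to the classifier.
        best_class = max(range(len(class_probs)), key=lambda c: class_probs[c])
        if class_probs[best_class] >= threshold:
            # Easy example: accept the machine's prediction.
            machine_labeled.append((index, best_class))
        else:
            # Hard example: defer to a human annotator.
            human_queue.append(index)
    return machine_labeled, human_queue


# Made-up probabilities for four examples and two classes.
probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92], [0.40, 0.60]]
auto, ask_human = split_by_confidence(probs, threshold=0.9)
print(auto)       # [(0, 0), (2, 1)]
print(ask_human)  # [1, 3]
```

Raising the threshold sends more examples to the human queue and lowers the risk of wrong machine labels; lowering it does the opposite. In an active learning setting, the human answers would then be used to retrain the classifier before the next round.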