Sanja Fidler (2010) Recognizing visual object categories with subspace methods and a learned hierarchical shape vocabulary. PhD thesis.
Abstract
The topic of the thesis is visual object class recognition and detection in images. In the first part of the thesis, we developed an approach that combines reconstructive and discriminative subspace methods for robust object classification. In the second part, we developed a framework for learning a hierarchical compositional shape vocabulary for representing multiple object classes and detecting them in images.

Linear subspace methods that provide sufficient reconstruction of the data, such as PCA (Principal Component Analysis), offer an efficient way of dealing with the missing pixels, outliers, and occlusions that often appear in images. Discriminative methods such as LDA (Linear Discriminant Analysis) and CCA (Canonical Correlation Analysis), on the other hand, are better suited for classification and regression tasks but are highly sensitive to corrupted data. If a test image contains outliers (e.g., the object in the image is partly occluded), discriminative methods are likely to assign it to the wrong class. In this thesis, we propose an approach that combines discriminative and reconstructive methods in a way that enables near-perfect classification performance even when objects are partly occluded at test time. The idea behind the proposed approach is to augment the subspace basis given by a discriminative method with a small set of additional basis vectors computed by a reconstructive method. In the space spanned by the augmented basis, we are able to detect and remove outlying pixels using a robust subsampling scheme and classify images based on the inliers. The proposed approach is thus capable of robust classification and regression with a high breakdown point. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms the standard discriminative methods on images with missing pixels, occlusions, and outliers.
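The robust step described above (estimate coefficients in an augmented basis, discard outlying pixels, classify from the inliers) can be illustrated with a minimal NumPy sketch. It is not the thesis's algorithm: a random Gaussian matrix stands in for the augmented basis (LDA vectors followed by a few PCA vectors), a generic RANSAC-style hypothesize-and-verify loop stands in for the robust subsampling scheme, and a nearest-centroid rule on the discriminative coefficients is an assumed classifier. The names `robust_coefficients` and `classify`, as well as the parameters `n_trials`, `subset_size` and `inlier_tol`, are hypothetical.

```python
import numpy as np


def robust_coefficients(x, mean, B, n_trials=200, subset_size=None,
                        inlier_tol=0.1, rng=None):
    """Estimate a with x ~= mean + B @ a while ignoring outlying pixels.

    Hypothesize-and-verify sketch: fit a on many small random pixel subsets,
    keep the hypothesis whose residual is below inlier_tol on the most pixels,
    then refit by least squares on those inlier pixels only.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, k = B.shape
    m = subset_size or k + 2              # a few more pixels than unknowns
    y = x - mean
    best_inliers = None
    for _ in range(n_trials):
        idx = rng.choice(d, size=m, replace=False)
        a, *_ = np.linalg.lstsq(B[idx], y[idx], rcond=None)
        inliers = np.abs(y - B @ a) < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    a, *_ = np.linalg.lstsq(B[best_inliers], y[best_inliers], rcond=None)
    return a, best_inliers


def classify(a, centroids, n_disc):
    """Assumed rule: nearest class centroid on the discriminative coefficients."""
    dists = np.linalg.norm(centroids - a[:n_disc], axis=1)
    return int(np.argmin(dists))


# --- Synthetic demo: every number below is made up for illustration ---
rng = np.random.default_rng(0)
d, n_disc, n_rec = 400, 2, 3                 # pixels, discriminative dims, reconstructive dims
B = rng.normal(size=(d, n_disc + n_rec))     # stand-in for [LDA vectors | PCA vectors]
mean = rng.normal(size=d)
centroids = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # 3 hypothetical classes

true_class = 1
a_true = np.concatenate([centroids[true_class], rng.normal(size=n_rec)])
x = mean + B @ a_true + 0.01 * rng.normal(size=d)

# "Occlude" 30 % of the pixels by adding large errors to them.
occluded = rng.choice(d, size=3 * d // 10, replace=False)
x_occ = x.copy()
x_occ[occluded] += rng.uniform(5.0, 10.0, size=occluded.size)

a_hat, inliers = robust_coefficients(x_occ, mean, B, rng=rng)
a_ls, *_ = np.linalg.lstsq(B, x_occ - mean, rcond=None)   # non-robust baseline
print("predicted class:", classify(a_hat, centroids, n_disc))
print("coefficient error, robust vs. plain least squares:",
      np.linalg.norm(a_hat - a_true), np.linalg.norm(a_ls - a_true))
```

Fitting each hypothesis on only a few pixels is what makes the scheme tolerant of a large fraction of corrupted pixels: it only needs one hypothesis drawn entirely from clean pixels, after which the residual test separates inliers from outliers.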
In the second and main part of the thesis, we present a novel hierarchical framework for representing, learning and detecting object classes in images. Hierarchies are important because they allow feature sharing between objects at multiple levels of representation, lead to better generalization within and across object classes, can code exponential variability in a very compact way, and enable fast inference. This makes them potentially suitable for learning and recognizing a large number of object classes. However, the success of hierarchical approaches has so far been hindered by the use of hand-crafted features or predetermined grouping rules. In this thesis, we present a novel framework for learning a hierarchical compositional shape vocabulary for representing multiple object classes. The approach takes simple contour fragments and learns their frequent spatial configurations. These are recursively combined into increasingly complex and class-specific shape compositions, each allowing a high degree of shape variability. At the top level of the vocabulary, the compositions are sufficiently large and complex to represent the whole shapes of the objects. We learn the vocabulary layer after layer, gradually increasing the size of the window of analysis and reducing the spatial resolution at which the shape configurations are learned. Compositions are formed by first learning spatial relations between pairs of parts (features from the previous layer) and then learning their frequent higher-order co-occurrences. The lower layers are learned jointly on images of all classes, whereas the higher layers of the vocabulary are learned incrementally, by presenting the algorithm with one object class after another.
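The two-step composition learning described above (pairwise spatial relations first, then their frequent higher-order co-occurrences, then a drop in resolution before the next layer) can be illustrated with a toy sketch. The simplifications are our own assumptions: parts are `(type, x, y)` tuples, a spatial relation is an exact integer displacement counted inside a square window rather than a learned spatial distribution, "frequent" means a hypothetical count threshold, and a composition is a fixed-size set of relations around a common centre part. All function names, the window radius and the thresholds are hypothetical; this is not the thesis's learning algorithm.

```python
from collections import Counter
from itertools import combinations


def neighbours(parts, i, radius):
    """Yield (type, dx, dy) of every other part inside the square window around part i."""
    _, x, y = parts[i]
    for j, (u, xx, yy) in enumerate(parts):
        if j != i and abs(xx - x) <= radius and abs(yy - y) <= radius:
            yield u, xx - x, yy - y


def learn_pair_relations(images, radius, min_count):
    """Step 1: keep (centre type, neighbour type, dx, dy) pairs seen at least min_count times."""
    counts = Counter()
    for parts in images:
        for i, (t, _, _) in enumerate(parts):
            for u, dx, dy in neighbours(parts, i, radius):
                counts[(t, u, dx, dy)] += 1
    return {rel for rel, c in counts.items() if c >= min_count}


def learn_compositions(images, relations, radius, min_count, order=2):
    """Step 2: keep sets of `order` relations that co-occur around the same centre part."""
    counts = Counter()
    for parts in images:
        for i, (t, _, _) in enumerate(parts):
            seen = {(t, u, dx, dy) for u, dx, dy in neighbours(parts, i, radius)}
            for combo in combinations(sorted(seen & relations), order):
                counts[combo] += 1
    return {combo for combo, c in counts.items() if c >= min_count}


def detect(images, compositions, radius):
    """Re-describe each image with next-layer parts (type = composition index)."""
    vocab = sorted(compositions)
    layer = []
    for parts in images:
        found = []
        for i, (t, x, y) in enumerate(parts):
            seen = {(t, u, dx, dy) for u, dx, dy in neighbours(parts, i, radius)}
            found += [(k, x, y) for k, combo in enumerate(vocab) if set(combo) <= seen]
        layer.append(found)
    return layer


def downsample(images, factor=2):
    """Reduce the spatial resolution before the next layer is learned."""
    return [[(t, x // factor, y // factor) for t, x, y in parts] for parts in images]


def corner(x, y):
    """Toy layer-1 'L-corner' made of contour fragments of types 0, 1 and 2."""
    return [(0, x, y), (1, x + 2, y), (2, x, y + 2)]


images = [corner(0, 0) + corner(20, 5), corner(7, 30), corner(40, 40) + [(1, 60, 60)]]
relations = learn_pair_relations(images, radius=3, min_count=3)
compositions = learn_compositions(images, relations, radius=3, min_count=3)
layer2 = downsample(detect(images, compositions, radius=3))
print(sorted(compositions))
print(layer2[0])
```

Keeping the same `radius` after `downsample` means each next-layer window covers a larger area of the original image, which is one simple way to mimic a growing window of analysis. Because every fragment of the toy corner can act as the centre part, this run learns the same corner three times; a real vocabulary would need some way of pruning such redundant compositions.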
The experimental results show that the learned multi-class object representation scales favorably with the number of object classes and achieves state-of-the-art detection performance with both faster inference and shorter training times. Additionally, the learned multi-class representation is very compact, requiring only a few megabytes of disk storage. We also demonstrate the usefulness of the features learned in the intermediate layers of the hierarchy for object classification.
Item Type: | Thesis (PhD thesis) |
Keywords: | computer vision, subspace methods, robust classification, visual object class recognition and detection, visual learning, hierarchical representation, part-based representations, compositional hierarchies |
Number of Pages: | 181 |
Language of Content: | English |
Mentor / Comentors: | doc. dr. Gašper Fijavž (ID 246), Mentor; prof. dr. Tomaž Košir (ID 3490), Comentor |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00007960660) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 1112 |
Date Deposited: | 28 Jun 2010 08:58 |
Last Modified: | 13 Aug 2011 00:37 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/1112 |