Leveraging structure in Computer Vision tasks for flexible Deep Learning models

A. Royer, Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models, IST Austria, 2020.

Download
OA 2020_Thesis_Royer.pdf 30.22 MB
Restricted thesis_sources.zip

Thesis | Published | English
Series Title
IST Austria Thesis
Abstract
Deep neural networks have established a new standard for data-dependent feature extraction pipelines in the Computer Vision literature. Despite their remarkable performance in the standard supervised learning scenario, i.e., when models are trained with labeled data and tested on samples that follow a similar distribution, neural networks have been shown to struggle with more advanced generalization abilities, such as transferring knowledge across visually different domains, or generalizing to new unseen combinations of known concepts. In this thesis we argue that, in contrast to the usual black-box behavior of neural networks, leveraging more structured internal representations is a promising direction for tackling such problems. In particular, we focus on two forms of structure. First, we tackle modularity: we show that (i) compositional architectures are a natural tool for modeling reasoning tasks, in that they efficiently capture their combinatorial nature, which is key for generalizing beyond the compositions seen during training. We investigate how to learn such models, both formally and experimentally, for the task of abstract visual reasoning. Then, we show that (ii) in some settings, modularity allows us to efficiently break down complex tasks into smaller, easier modules, thereby improving computational efficiency; we study this behavior in the context of generative models for colorization, as well as for small object detection. Second, we investigate the inherently layered structure of representations learned by neural networks, and analyze its role in the context of transfer learning and domain adaptation across visually dissimilar domains.
Date Published
2020-09-14
Acknowledgement
Last but not least, I would like to acknowledge the support of the IST IT and scientific computing team for helping provide a great work environment.
Page
197

Cite this

Royer A. Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models. IST Austria; 2020. doi:10.15479/AT:ISTA:8390
Royer, A. (2020). Leveraging structure in Computer Vision tasks for flexible Deep Learning models. IST Austria. https://doi.org/10.15479/AT:ISTA:8390
Royer, Amélie. Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models. IST Austria, 2020. https://doi.org/10.15479/AT:ISTA:8390.
A. Royer, Leveraging structure in Computer Vision tasks for flexible Deep Learning models. IST Austria, 2020.
Royer A. 2020. Leveraging structure in Computer Vision tasks for flexible Deep Learning models, IST Austria, 197p.
Royer, Amélie. Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models. IST Austria, 2020, doi:10.15479/AT:ISTA:8390.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Main File(s)
File Name
2020_Thesis_Royer.pdf 30.22 MB
Access Level
OA Open Access
Date Uploaded
2020-09-14
MD5 Checksum
c914d2f88846032f3d8507734861b6ee
File Name
thesis_sources.zip 74.23 MB
Access Level
Restricted Closed Access
Date Uploaded
2020-09-14
MD5 Checksum
ae98fb35d912cff84a89035ae5794d3c

