
Deep-Learning-Overview slides

Deep Learning¶

an overview¶

Christian Herta, HTW Berlin

Talk slides and teaching material for deep learning at
http://christianherta.de

"Traditional" machine learning¶

Engineering of Features:

"Traditional" approach for image and speech¶

Deep learning is feature learning¶

learning of representations¶

depth is the number of transformation steps.

"Machine Perception"¶

Read (high dimensional) data and transform them into a "higher" representation to perform tasks / reach goals.

High dimensional data¶

Images / Videos
Sound (Voice / Music)
Natural Language
Time Series

What kinds of representations?¶

(typical) representation:

vector (or sequence of vectors)

$$ \vec h = (h_1, h_2, \dots, h_n) $$

(distributed representations)

(Simple) Feed Forward Neural Network¶


Transforming the input vector through many layers.
The hidden state of each layer corresponds to a representation of the input.

Layer of a simple feed forward neural network¶

Affine transformation

$$ \vec z = \hat W \cdot \vec x + \vec b $$

followed by an element-wise application of a non-linear function $\sigma (\dots)$

$$ \vec h = \sigma ( \vec z ) $$
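The two formulas above can be sketched in a few lines of NumPy. This is only an illustration: the logistic function as the choice of $\sigma$, the layer sizes, and the random parameters are assumptions and are not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, one common choice for sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b, sigma=sigmoid):
    """One feed-forward layer: affine map followed by an element-wise non-linearity."""
    z = W @ x + b      # affine transformation z = W x + b
    return sigma(z)    # hidden representation h = sigma(z)

# Illustrative sizes and random parameters (assumptions, not from the slides).
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # input vector with 4 components
W = rng.normal(size=(3, 4))   # weight matrix mapping 4 inputs to 3 hidden units
b = np.zeros(3)               # bias vector

h = layer(x, W, b)
print(h.shape)  # (3,)
```

Stacking several such calls, each with its own `W` and `b`, gives the layer-by-layer transformation of the input described earlier.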

Prediction is easy if we have good representations¶

e.g. for classification, the representations in the last hidden layer are linearly separable
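A small worked example of this idea is XOR: the four input points cannot be separated by a line, but their hidden representations can. The sketch below uses a hand-picked hidden layer with a ReLU non-linearity; the weights are chosen by hand for illustration (an assumption), whereas in practice they would be learned.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

# XOR: the four inputs and their labels are not linearly separable in input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-picked hidden layer (illustrative assumption, not learned).
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
H = relu(X @ W.T + b)   # each row is the hidden representation of one input

# A single linear decision rule on the hidden representation solves XOR.
w_out = np.array([1.0, -2.0])
b_out = -0.5
pred = (H @ w_out + b_out > 0).astype(int)

print(H.tolist())     # [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
print(pred.tolist())  # [0, 1, 1, 0]
```

In the hidden space the two positive examples collapse onto the same point, and one straight line separates them from the two negative examples.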

Word Embeddings¶

representations for words (learned from sentences)


"low" dimensional space ($\sim 10^2$)

Syntactic and semantic information is encoded in the space (directions)
With simple vector arithmetic we can answer questions like
- Man is related to Woman like King to ?
- Germany ($\vec G$) is related to Berlin ($\vec B$) like Ukraine ($\vec U$) to ?

The nearest word to $\vec U - \vec G + \vec B$ is Kyiv.
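The analogy lookup can be sketched as a nearest-neighbour search by cosine similarity. The embeddings below are hand-made 4-dimensional toy vectors (hypothetical, chosen so the arithmetic is easy to follow); real word embeddings are learned from text and have on the order of $10^2$ dimensions. Excluding the three query words from the search is a common convention and is assumed here.

```python
import numpy as np

# Hand-made toy embeddings (hypothetical). Dimensions, in order:
# [Germany-related, Ukraine-related, France-related, is-a-capital]
emb = {
    "Germany": np.array([1.0, 0.0, 0.0, 0.0]),
    "Berlin":  np.array([1.0, 0.0, 0.0, 1.0]),
    "Ukraine": np.array([0.0, 1.0, 0.0, 0.0]),
    "Kyiv":    np.array([0.0, 1.0, 0.0, 1.0]),  # capital of Ukraine
    "France":  np.array([0.0, 0.0, 1.0, 0.0]),
    "Paris":   np.array([0.0, 0.0, 1.0, 1.0]),
}

def nearest(query, emb, exclude=()):
    """Return the word whose vector has the highest cosine similarity to query."""
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in exclude:
            continue
        sim = vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# U - G + B: start at Ukraine, remove the "Germany" direction, add "Berlin".
query = emb["Ukraine"] - emb["Germany"] + emb["Berlin"]
answer = nearest(query, emb, exclude={"Ukraine", "Germany", "Berlin"})
print(answer)  # Kyiv
```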

Feature Transformations

learning representations through many layers

Convolutional Neural Networks¶

typical neural network for image and video processing

TODO

Image Recognition¶

Classification of Images
ImageNet Dataset

Recurrent Neural Networks¶

for sequence data
have an internal state that acts like a memory

e.g. Natural Language Processing:

RNN Language Models
- represent sentences
- can generate (new) unseen sentences
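The "internal state which acts like a memory" can be illustrated with a minimal recurrent update. The sketch assumes a vanilla (Elman-style) RNN with a tanh non-linearity; the slides do not specify the cell type, and the names `rnn_step`, `run_rnn`, the sizes, and the random parameters are all illustrative.

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One recurrent update: the new state depends on the old state and the current input."""
    return np.tanh(W_hh @ h_prev + W_xh @ x + b)

def run_rnn(xs, h0, W_hh, W_xh, b):
    """Apply the same step to every element of a sequence, carrying the state along."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x, W_hh, W_xh, b)
        states.append(h)
    return np.stack(states)

# Illustrative sizes and random parameters (assumptions, not from the slides).
rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 4
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # state-to-state weights
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))      # input-to-state weights
b = np.zeros(n_hidden)
xs = rng.normal(size=(T, n_in))                          # a sequence of T input vectors

states = run_rnn(xs, np.zeros(n_hidden), W_hh, W_xh, b)
print(states.shape)  # (4, 5)
```

Because the same weights are reused at every time step, the final state summarizes the whole sequence seen so far. A language model would add an output layer that maps each state to a distribution over the next word.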

Language Model (RNN unrolled in time)¶


Generative Models

Generative Adversarial Networks
Variational Autoencoder

Encoder Decoder Models¶

Neural Machine Translation¶


(image from http://opennmt.net/)

Image Captions¶


Encoder: Transforming the image into a vector representation.
Decoder: Language model RNN transforms the vector representation into a sentence.

Image to Image Translation

by Conditional Generative Models:
e.g. [Conditional Adversarial Networks](https://phillipi.github.io/pix2pix/)

Input: The user draws a sketch
Output: A photorealistic picture is generated
