"""This is an example of using Hierarchical RNN (HRNN) to classify MNIST digits.

HRNNs can learn across multiple levels of temporal hierarchy over a complex sequence.
Usually, the first recurrent layer of an HRNN encodes a sentence (e.g. of word vectors)
into a sentence vector. The second recurrent layer then encodes a sequence of
such vectors (encoded by the first layer) into a document vector. This
document vector is considered to preserve both the word-level and
sentence-level structure of the context.
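The sentence-to-document hierarchy described above can be sketched with two stacked
recurrent levels (a minimal illustration only; the sentence/document lengths and
layer sizes below are made-up assumptions, not parameters of this example):

```python
from keras.layers import Input, LSTM, TimeDistributed
from keras.models import Model

# A toy document: 10 sentences, each 20 words, each word a 50-dim vector (assumed shapes).
document = Input(shape=(10, 20, 50))
# Level 1: the same LSTM encodes each sentence of word vectors into a sentence vector.
sentence_vectors = TimeDistributed(LSTM(64))(document)   # -> (batch, 10, 64)
# Level 2: a second LSTM encodes the sequence of sentence vectors into a document vector.
document_vector = LSTM(32)(sentence_vectors)             # -> (batch, 32)
encoder = Model(document, document_vector)
```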

# References

- [A Hierarchical Neural Autoencoder for Paragraphs and Documents](https://arxiv.org/abs/1506.01057)
    Encodes paragraphs and documents with HRNN.
    Results have shown that HRNN outperforms standard
    RNNs and may play some role in more sophisticated generation tasks like
    summarization or question answering.
- [Hierarchical recurrent neural network for skeleton based action recognition](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7298714)
    Achieved state-of-the-art results on skeleton based action recognition with 3 levels
    of bidirectional HRNN combined with fully connected layers.

In the MNIST example below, the first LSTM layer encodes every
column of pixels of shape (28, 1) to a column vector of shape (128,). The second LSTM
layer then encodes these 28 column vectors of shape (28, 128) into an image vector
representing the whole image. A final Dense layer is added for prediction.

After 5 epochs: train acc: 0.9858, val acc: 0.9864
"""
from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM

# Training parameters.
batch_size = 32
num_classes = 10
epochs = 5

# Embedding dimensions.
row_hidden = 128
col_hidden = 128

# The data, shuffled and split between train and test sets.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshapes data to 4D for Hierarchical RNN.
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Converts class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

row, col, pixel = x_train.shape[1:]

# 4D input.
x = Input(shape=(row, col, pixel))

# Encodes a row of pixels using TimeDistributed Wrapper.
encoded_rows = TimeDistributed(LSTM(row_hidden))(x)

# Encodes columns of encoded rows.
encoded_columns = LSTM(col_hidden)(encoded_rows)

# Final predictions and model.
prediction = Dense(num_classes, activation='softmax')(encoded_columns)
model = Model(x, prediction)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Training.
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluation.
scores = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])