Merge pull request #5 from fchollet/master

update
This commit is contained in:
Oleg Sinyavskiy 2016-01-13 17:34:42 -08:00
commit bd2ff26b37
60 changed files with 3210 additions and 488 deletions

@ -11,6 +11,8 @@ matrix:
env: KERAS_BACKEND=theano
- python: 2.7
env: KERAS_BACKEND=tensorflow
- python: 2.7
env: KERAS_BACKEND=theano INTEGRATION_TESTS=true
install:
# code below is taken from http://conda.pydata.org/docs/travis.html
# We do this conditionally because it saves us some downloading if the
@ -55,6 +57,10 @@ script:
# set up keras backend
- sed -i -e 's/"backend":[[:space:]]*"[^"]*/"backend":\ "'$KERAS_BACKEND'/g' ~/.keras/keras.json;
- echo -e "Running tests with the following config:\n$(cat ~/.keras/keras.json)"
- PYTHONPATH=$PWD:$PYTHONPATH py.test tests/
- if [[ "$INTEGRATION_TESTS" == "true" ]]; then
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/integration_tests;
else
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/ --ignore=tests/integration_tests;
fi
after_success:
- coveralls

@ -22,13 +22,13 @@ The more information you provide, the easier it is for us to validate that there
## Requesting a Feature
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add. Keep in mind that we want features that will be useful to the majority of our users and not just a small subset. If you're just targeting a minority of users, consider writing an add-on library for Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of your feature. Of course, you don't need to write any real code at this point!
3. After disussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
3. After discussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
## Pull Requests
@ -49,12 +49,12 @@ We love pull requests. Here's a quick guide:
- with the Theano backend, on Python 2.7 and Python 3.5
- with the TensorFlow backend, on Python 2.7
7. When committing, use appropriate, descriptive commit messages. Make sure that your branch history is not a string of "bug fix", "fix", "oops", etc. When submitting your PR, squash your commit history into 1-3 easy to follow commits, to make sure the project history stays clean and readable.
7. When committing, use appropriate, descriptive commit messages. Make sure that your branch history is not a string of "bug fix", "fix", "oops", etc. When submitting your PR, squash your commits into a single commit with an appropriate commit message, to make sure the project history stays clean and readable. See ['rebase and squash'](http://rebaseandsqua.sh/) for technical help on how to squash your commits.
8. Update the documentation. If introducing new functionality, make sure you include code snippets demonstrating the usage of your new feature.
9. Submit your PR. If your changes have been approved in a previous discussion, and if you have have complete (and passing) unit tests, your PR is likely to be merged promptly. Otherwise, well...
9. Submit your PR. If your changes have been approved in a previous discussion, and if you have complete (and passing) unit tests, your PR is likely to be merged promptly. Otherwise, well...
## Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. Existing examples show idiomatic Keras code: make sure to keep your own script in the same spirit.
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. [Existing examples](https://github.com/fchollet/keras/tree/master/examples) show idiomatic Keras code: make sure to keep your own script in the same spirit.

@ -4,9 +4,10 @@
## You have just found Keras.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running either on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
@ -36,7 +37,7 @@ Keras is compatible with: __Python 2.7-3.5__.
## Getting started: 30 seconds to Keras
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
The core data structure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
Here's the `Sequential` model (a linear pile of layers):
@ -109,6 +110,7 @@ Keras uses the following dependencies:
- Optional but recommended if you use CNNs: cuDNN.
*When using the Theano backend:*
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
@ -118,6 +120,7 @@ sudo pip install git+git://github.com/Theano/Theano.git
```
*When using the TensorFlow backend:*
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).

@ -80,6 +80,10 @@ def get_method_signature(method):
for a in args:
st += str(a) + ', '
for a, v in kwargs:
if type(v) == str:
v = '\'' + v + '\''
elif type(v) == unicode:
v = 'u\'' + v + '\''
st += str(a) + '=' + str(v) + ', '
if kwargs or args:
return st[:-2] + ')'
@ -246,4 +250,7 @@ for module, module_name in MODULES:
print('...inserting autogenerated content into template:', path)
else:
print('...creating new page with autogenerated content:', path)
subdir = os.path.dirname(path)
if not os.path.exists(subdir):
os.makedirs(subdir)
open(path, 'w').write(module_page)

@ -23,6 +23,15 @@ It probably looks like this:
Simply change the field `backend` to either `"theano"` or `"tensorflow"`, and Keras will use the new configuration next time you run any Keras code.
You can also define the environment variable ``KERAS_BACKEND``, which will
override what is defined in your config file:
```bash
KERAS_BACKEND=tensorflow python -c "from keras import backend; print backend._BACKEND"
Using TensorFlow backend.
tensorflow
```
## Using the abstract Keras backend to write new code
If you want the Keras modules you write to be compatible with both Theano and TensorFlow, you have to write them via the abstract Keras backend API. Here's an intro.
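As a minimal sketch (an editorial illustration, assuming only the `K.mean`, `K.l2_normalize`, `K.relu` and `K.concatenate` functions that appear elsewhere in this changeset), backend-agnostic code imports the backend module and calls it instead of Theano or TensorFlow directly:

```python
from keras import backend as K

def antirectify(x):
    # mean-center each sample, L2-normalize it, then concatenate
    # its positive and negative parts; runs on either backend
    x = x - K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    return K.concatenate([K.relu(x), K.relu(-x)], axis=1)
```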

docs/templates/faq.md

@ -20,6 +20,8 @@
[How can I record the training / validation loss / accuracy at each epoch?](#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch)
[How can I use stateful RNNs?](#how-can-i-use-stateful-rnns)
---
### How can I run Keras on GPU?
@ -105,22 +107,22 @@ You can build a Theano function that will return the output of a certain layer g
```python
# with a Sequential model
get_3rd_layer_output = theano.function([model.layers[0].input],
get_3rd_layer_output = theano.function([model.layers[0].input],
model.layers[3].get_output(train=False))
layer_output = get_3rd_layer_output(X)
# with a Graph model
get_conv_layer_output = theano.function([model.inputs[i].input for i in model.input_order],
model.outputs['conv'].get_output(train=False),
model.nodes['conv'].get_output(train=False),
on_unused_input='ignore')
conv_output = get_conv_output(input_data_dict)
conv_output = get_conv_layer_output([input_data_dict[i] for i in model.input_order])
```
---
### Isn't there a bug with Merge or Graph related to input concatenation?
Yes, there was a known bug with tensor concatenation in Thenao that was fixed early 2015.
Yes, there was a known bug with tensor concatenation in Theano that was fixed early 2015.
Please upgrade to the latest version of Theano:
```bash
@ -153,7 +155,7 @@ Find out more in the [callbacks documentation](callbacks.md).
### How is the validation split computed?
If you set the `validation_split` arugment in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
If you set the `validation_split` argument in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
---
@ -176,4 +178,52 @@ hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
```
---
---
### How can I use stateful RNNs?
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If `X1` and `X2` are successive batches of samples, then `X2[i]` is the follow-up sequence to `X1[i]`, for every `i`.
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a `batch_input_shape` argument to the first layer in your model. It should be a tuple of integers, e.g. `(32, 10, 16)` for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set `stateful=True` in your RNN layer(s).
To reset the states accumulated:
- use `model.reset_states()` to reset the states of all layers in the model
- use `layer.reset_states()` to reset the states of a specific stateful RNN layer
Example:
```python
X # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10
model = Sequential()
model.add(LSTM(32, batch_input_shape=(32, 10, 16), stateful=True))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(X[:, :10, :], np.reshape(X[:, 10, :], (32, 16)))
# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(X[:, 10:20, :], np.reshape(X[:, 20, :], (32, 16)))
# let's reset the states of the LSTM layer:
model.reset_states()
# another way to do it in this case:
model.layers[0].reset_states()
```
Note that the methods `predict`, `fit`, `train_on_batch`, `predict_classes`, etc. will *all* update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
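A hedged illustration (not part of the original FAQ, reusing the model defined above and a hypothetical follow-up array `X_new` of shape `(32, 20, 16)`): the states persist across `predict` calls until you reset them.

```python
# stateful prediction: the second call continues from the states
# left behind by the first one
model.reset_states()
p1 = model.predict(X_new[:, :10, :], batch_size=32)
p2 = model.predict(X_new[:, 10:, :], batch_size=32)
model.reset_states()
```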

@ -2,9 +2,10 @@
## You have just found Keras.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running either on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
@ -34,7 +35,7 @@ Keras is compatible with: __Python 2.7-3.5__.
## Getting started: 30 seconds to Keras
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](/models/#sequential) and [`Graph`](/models/#graph).
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
Here's the `Sequential` model (a linear pile of layers):
@ -107,6 +108,7 @@ Keras uses the following dependencies:
- Optional but recommended if you use CNNs: cuDNN.
*When using the Theano backend:*
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
@ -116,6 +118,7 @@ sudo pip install git+git://github.com/Theano/Theano.git
```
*When using the TensorFlow backend:*
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).
@ -157,4 +160,4 @@ Keras was initially developed as part of the research effort of project ONEIROS
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
------------------
------------------

@ -27,3 +27,5 @@ For a few examples of such functions, check out the [objectives source](https://
- __hinge__
- __binary_crossentropy__: Also known as logloss.
- __categorical_crossentropy__: Also known as multiclass logloss. __Note__: using this objective requires that your labels are binary arrays of shape `(nb_samples, nb_classes)`.
- __poisson__: mean of `(predictions - targets * log(predictions))`
- __cosine_proximity__: the opposite (negative) of the mean cosine proximity between predictions and targets.
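To make the two new objectives concrete, here is a hedged NumPy sketch (an illustration only; the canonical definitions are in the objectives source linked above):

```python
import numpy as np

def poisson(y_true, y_pred, eps=1e-7):
    # mean of (predictions - targets * log(predictions))
    return np.mean(y_pred - y_true * np.log(y_pred + eps))

def cosine_proximity(y_true, y_pred, eps=1e-7):
    # negative of the mean cosine similarity between predictions and targets
    t = y_true / (np.linalg.norm(y_true, axis=-1, keepdims=True) + eps)
    p = y_pred / (np.linalg.norm(y_pred, axis=-1, keepdims=True) + eps)
    return -np.mean(np.sum(t * p, axis=-1))
```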

examples/antirectifier.py

@ -0,0 +1,106 @@
'''The example demonstrates how to write custom layers for Keras.
We build a custom activation layer called 'Antirectifier',
which modifies the shape of the tensor that passes through it.
We need to specify two methods: `output_shape` and `get_output`.
Note that the same result can also be achieved via a Lambda layer.
Because our custom layer is written with primitives from the Keras
backend (`K`), our code can run both on TensorFlow and Theano.
'''
from __future__ import print_function
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Layer, Activation
from keras.datasets import mnist
from keras import backend as K
from keras.utils import np_utils
class Antirectifier(Layer):
'''This is the combination of a sample-wise
L2 normalization with the concatenation of the
positive part of the input with the negative part
of the input. The result is a tensor of samples that are
twice as large as the input samples.
It can be used in place of a ReLU.
# Input shape
2D tensor of shape (samples, n)
# Output shape
2D tensor of shape (samples, 2*n)
# Theoretical justification
When applying ReLU, assuming that the distribution
of the previous output is approximately centered around 0.,
you are discarding half of your input. This is inefficient.
Antirectifier allows you to return all-positive outputs like ReLU,
without discarding any data.
Tests on MNIST show that Antirectifier allows you to train networks
with half as many parameters yet with comparable
classification accuracy to an equivalent ReLU-based network.
'''
@property
def output_shape(self):
shape = list(self.input_shape)
assert len(shape) == 2 # only valid for 2D tensors
shape[-1] *= 2
return tuple(shape)
def get_output(self, train):
x = self.get_input(train)
x -= K.mean(x, axis=1, keepdims=True)
x = K.l2_normalize(x, axis=1)
pos = K.relu(x)
neg = K.relu(-x)
return K.concatenate([pos, neg], axis=1)
# global parameters
batch_size = 128
nb_classes = 10
nb_epoch = 40
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
# build the model
model = Sequential()
model.add(Dense(256, input_shape=(784,)))
model.add(Antirectifier())
model.add(Dropout(0.1))
model.add(Dense(256))
model.add(Antirectifier())
model.add(Dropout(0.1))
model.add(Dense(10))
model.add(Activation('softmax'))
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# train the model
model.fit(X_train, Y_train,
batch_size=batch_size, nb_epoch=nb_epoch,
show_accuracy=True, verbose=1,
validation_data=(X_test, Y_test))
# next, compare with an equivalent network
# with 2x bigger Dense layers and ReLU

@ -3,13 +3,13 @@
References:
- Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush,
"Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks",
http://arxiv.org/abs/1503.08895
http://arxiv.org/abs/1502.05698
- Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus,
"End-To-End Memory Networks",
http://arxiv.org/abs/1503.08895
Reaches 93% accuracy on task 'single_supporting_fact_10k' after 70 epochs.
Reaches 98.6% accuracy on task 'single_supporting_fact_10k' after 120 epochs.
Time per epoch: 3s on CPU (core i7).
'''
@ -153,12 +153,14 @@ input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
output_dim=64,
input_length=story_maxlen))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)
# embed the question into a sequence of vectors
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
output_dim=64,
input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)
# compute a 'match' between input sequence elements (which are vectors)
# and the question vector sequence
@ -172,6 +174,7 @@ input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
output_dim=query_maxlen,
input_length=story_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)
# sum the match vector with the input vector:
response = Sequential()
@ -185,9 +188,9 @@ answer = Sequential()
answer.add(Merge([response, question_encoder], mode='concat', concat_axis=-1))
# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer.add(LSTM(64))
answer.add(LSTM(32))
# one regularization layer -- more would probably be needed.
answer.add(Dropout(0.25))
answer.add(Dropout(0.3))
answer.add(Dense(vocab_size))
# we output a probability distribution over the vocabulary
answer.add(Activation('softmax'))
@ -196,6 +199,6 @@ answer.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note: you could use a Graph model to avoid repeat the input twice
answer.fit([inputs_train, queries_train, inputs_train], answers_train,
batch_size=32,
nb_epoch=70,
nb_epoch=120,
show_accuracy=True,
validation_data=([inputs_test, queries_test, inputs_test], answers_test))

examples/deep_dream.py

@ -0,0 +1,198 @@
'''Deep Dreaming in Keras.
Run the script with:
```
python deep_dream.py path_to_your_base_image.jpg prefix_for_results
```
e.g.:
```
python deep_dream.py img/mypic.jpg results/dream
```
It is preferable to run this script on a GPU, for speed.
If running on CPU, prefer the TensorFlow backend (much faster).
Example results: http://i.imgur.com/FX6ROg9.jpg
'''
from __future__ import print_function
from scipy.misc import imread, imresize, imsave
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse
import h5py
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D, MaxPooling2D
from keras import backend as K
parser = argparse.ArgumentParser(description='Deep Dreams with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
help='Path to the image to transform.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
help='Prefix for the saved results.')
args = parser.parse_args()
base_image_path = args.base_image_path
result_prefix = args.result_prefix
# dimensions of the generated picture.
img_width = 600
img_height = 600
# path to the model weights file.
weights_path = 'vgg16_weights.h5'
# some settings we found interesting
saved_settings = {
'bad_trip': {'features': {'conv4_1': 0.05,
'conv4_2': 0.01,
'conv4_3': 0.01},
'continuity': 0.1,
'dream_l2': 0.8,
'jitter': 5},
'dreamy': {'features': {'conv5_1': 0.05,
'conv5_2': 0.02},
'continuity': 0.1,
'dream_l2': 0.02,
'jitter': 0},
}
# the settings we will use in this experiment
settings = saved_settings['dreamy']
# util function to open, resize and format pictures into appropriate tensors
def preprocess_image(image_path):
img = imresize(imread(image_path), (img_width, img_height))
img = img.transpose((2, 0, 1)).astype('float64')
img = np.expand_dims(img, axis=0)
return img
# util function to convert a tensor into a valid image
def deprocess_image(x):
x = x.transpose((1, 2, 0))
x = np.clip(x, 0, 255).astype('uint8')
return x
# this will contain our generated image
dream = K.placeholder((1, 3, img_width, img_height))
# build the VGG16 network with our dream as input
first_layer = ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height))
first_layer.input = dream
model = Sequential()
model.add(first_layer)
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# load the weights of the VGG16 networks
# (trained on ImageNet, won the ILSVRC competition in 2014)
# note: when there is a complete match between your model definition
# and your weight savefile, you can simply call model.load_weights(filename)
f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
if k >= len(model.layers):
# we don't look at the last (fully-connected) layers in the savefile
break
g = f['layer_{}'.format(k)]
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')
# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])
# continuity loss util function
def continuity_loss(x):
assert K.ndim(x) == 4
a = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, 1:, :img_height-1])
b = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, :img_width-1, 1:])
return K.sum(K.pow(a + b, 1.25))
# define the loss
loss = K.variable(0.)
for layer_name in settings['features']:
# add the L2 norm of the features of a layer to the loss
assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
coeff = settings['features'][layer_name]
x = layer_dict[layer_name].get_output()
shape = layer_dict[layer_name].output_shape
# we avoid border artifacts by only involving non-border pixels in the loss
loss -= coeff * K.sum(K.square(x[:, :, 2: shape[2]-2, 2: shape[3]-2])) / np.prod(shape[1:])
# add continuity loss (gives image local coherence, can result in an artful blur)
loss += settings['continuity'] * continuity_loss(dream) / (3 * img_width * img_height)
# add image L2 norm to loss (prevents pixels from taking very high values, makes image darker)
loss += settings['dream_l2'] * K.sum(K.square(dream)) / (3 * img_width * img_height)
# feel free to further modify the loss as you see fit, to achieve new effects...
# compute the gradients of the dream wrt the loss
grads = K.gradients(loss, dream)
# set up helper functions to extract the loss and gradients
# from the computational graph as Numpy arrays
f_grads = K.function([dream], grads)
def eval_grads(x):
x = x.reshape((1, 3, img_width, img_height))
return np.array(f_grads([x])).flatten().astype('float64')
f_loss = K.function([dream], [loss])
def eval_loss(x):
x = x.reshape((1, 3, img_width, img_height))
return f_loss([x])[0].astype('float64')
# add a random jitter to the initial image. This will be reverted at decoding time
random_jitter = (settings['jitter'] * 2) * (np.random.random((3, img_width, img_height)) - 0.5)
x = preprocess_image(base_image_path)
x += random_jitter
# run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the loss
for i in range(5):
start_time = time.time()
x, min_val, info = fmin_l_bfgs_b(eval_loss, x.flatten(),
fprime=eval_grads, maxfun=7)
print('Current loss value:', min_val)
# decode the dream and save it
x = x.reshape((3, img_width, img_height))
x -= random_jitter
img = deprocess_image(x)
fname = result_prefix + '_at_iteration_%d.png' % i
imsave(fname, img)
end_time = time.time()
print('Image saved as', fname)
print('Iteration %d completed in %ds' % (i, end_time - start_time))

@ -85,7 +85,7 @@ for iteration in range(1, 60):
print('----- Generating with seed: "' + sentence + '"')
sys.stdout.write(generated)
for iteration in range(400):
for i in range(400):
x = np.zeros((1, maxlen, len(chars)))
for t, char in enumerate(sentence):
x[0, t, char_indices[char]] = 1.

@ -0,0 +1,256 @@
'''Neural style transfer with Keras.
Before running this script, download the weights for the VGG16 model at:
https://drive.google.com/file/d/0Bz7KyqmuGsilT0J5dmRCM0ROVHc/view?usp=sharing
(source: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3)
and make sure the variable `weights_path` in this script matches the location of the file.
Run the script with:
```
python neural_style.py path_to_your_base_image.jpg path_to_your_reference.jpg prefix_for_results
```
e.g.:
```
python neural_style.py img/tuebingen.jpg img/starry_night.jpg results/my_result
```
It is preferable to run this script on a GPU, for speed.
If running on CPU, prefer the TensorFlow backend (much faster).
Example result: https://twitter.com/fchollet/status/686631033085677568
# Details
Style transfer consists in generating an image
with the same "content" as a base image, but with the
"style" of a different picture (typically artistic).
This is achieved through the optimization of a loss function
that has 3 components: "style loss", "content loss",
and "total variation loss":
- The total variation loss imposes local spatial continuity between
the pixels of the combination image, giving it visual coherence.
- The style loss is where the deep learning comes in: it is defined
using a deep convolutional neural network. Precisely, it consists of a sum of
L2 distances between the Gram matrices of the representations of
the base image and the style reference image, extracted from
different layers of a convnet (trained on ImageNet). The general idea
is to capture color/texture information at different spatial
scales (fairly large scales --defined by the depth of the layer considered).
- The content loss is an L2 distance between the features of the base
image (extracted from a deep layer) and the features of the combination image,
keeping the generated image close enough to the original one.
# References
- [A Neural Algorithm of Artistic Style](http://arxiv.org/abs/1508.06576)
'''
from __future__ import print_function
from scipy.misc import imread, imresize, imsave
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse
import h5py
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D, MaxPooling2D
from keras import backend as K
parser = argparse.ArgumentParser(description='Neural style transfer with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
help='Path to the image to transform.')
parser.add_argument('style_reference_image_path', metavar='ref', type=str,
help='Path to the style reference image.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
help='Prefix for the saved results.')
args = parser.parse_args()
base_image_path = args.base_image_path
style_reference_image_path = args.style_reference_image_path
result_prefix = args.result_prefix
weights_path = 'vgg16_weights.h5'
# these are the weights of the different loss components
total_variation_weight = 1.
style_weight = 1.
content_weight = 0.025
# dimensions of the generated picture.
img_width = 400
img_height = 400
assert img_height == img_width, 'Due to the use of the Gram matrix, width and height must match.'
# util function to open, resize and format pictures into appropriate tensors
def preprocess_image(image_path):
img = imresize(imread(image_path), (img_width, img_height))
img = img.transpose((2, 0, 1)).astype('float64')
img = np.expand_dims(img, axis=0)
return img
# util function to convert a tensor into a valid image
def deprocess_image(x):
x = x.transpose((1, 2, 0))
x = np.clip(x, 0, 255).astype('uint8')
return x
# get tensor representations of our images
base_image = K.variable(preprocess_image(base_image_path))
style_reference_image = K.variable(preprocess_image(style_reference_image_path))
# this will contain our generated image
combination_image = K.placeholder((1, 3, img_width, img_height))
# combine the 3 images into a single Keras tensor
input_tensor = K.concatenate([base_image,
style_reference_image,
combination_image], axis=0)
# build the VGG16 network with our 3 images as input
first_layer = ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height))
first_layer.input = input_tensor
model = Sequential()
model.add(first_layer)
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# load the weights of the VGG16 networks
# (trained on ImageNet, won the ILSVRC competition in 2014)
# note: when there is a complete match between your model definition
# and your weight savefile, you can simply call model.load_weights(filename)
f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
if k >= len(model.layers):
# we don't look at the last (fully-connected) layers in the savefile
break
g = f['layer_{}'.format(k)]
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')
# get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.get_output()) for layer in model.layers])
# compute the neural style loss
# first we need to define 4 util functions
# the gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
assert K.ndim(x) == 3
features = K.batch_flatten(x)
gram = K.dot(features, K.transpose(features))
return gram
# the "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image
def style_loss(style, combination):
assert K.ndim(style) == 3
assert K.ndim(combination) == 3
S = gram_matrix(style)
C = gram_matrix(combination)
channels = 3
size = img_width * img_height
return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
# an auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image
def content_loss(base, combination):
return K.sum(K.square(combination - base))
# the 3rd loss function, total variation loss,
# designed to keep the generated image locally coherent
def total_variation_loss(x):
assert K.ndim(x) == 4
a = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, 1:, :img_height-1])
b = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, :img_width-1, 1:])
return K.sum(K.pow(a + b, 1.25))
# combine these loss functions into a single scalar
loss = K.variable(0.)
layer_features = outputs_dict['conv4_2']
base_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(base_image_features,
combination_features)
feature_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
for layer_name in feature_layers:
layer_features = outputs_dict[layer_name]
style_reference_features = layer_features[1, :, :, :]
combination_features = layer_features[2, :, :, :]
sl = style_loss(style_reference_features, combination_features)
loss += (style_weight / len(feature_layers)) * sl
loss += total_variation_weight * total_variation_loss(combination_image)
# get the gradients of the generated image wrt the loss
grads = K.gradients(loss, combination_image)
# set up helper functions to extract the loss and gradients
# from the computational graph as Numpy arrays
f_grads = K.function([combination_image], grads)
def eval_grads(x):
x = x.reshape((1, 3, img_width, img_height))
return np.array(f_grads([x])).flatten().astype('float64')
f_loss = K.function([combination_image], [loss])
def eval_loss(x):
x = x.reshape((1, 3, img_width, img_height))
return f_loss([x])[0].astype('float64')
# run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the neural style loss
x = np.random.uniform(0, 255, (1, 3, img_width, img_height))
for i in range(10):
print('Start of iteration', i)
start_time = time.time()
x, min_val, info = fmin_l_bfgs_b(eval_loss, x.flatten(),
fprime=eval_grads, maxfun=20)
print('Current loss value:', min_val)
# save current generated image
img = deprocess_image(x.reshape((3, img_width, img_height)))
fname = result_prefix + '_at_iteration_%d.png' % i
imsave(fname, img)
end_time = time.time()
print('Image saved as', fname)
print('Iteration %d completed in %ds' % (i, end_time - start_time))

@ -1 +1 @@
__version__ = '0.3.0'
__version__ = '0.3.1'

@ -39,7 +39,7 @@ def hard_sigmoid(x):
def linear(x):
'''
The function returns the variable that is passed in, so all types work
The function returns the variable that is passed in, so all types work.
'''
return x

@ -4,12 +4,16 @@ import os
import json
from .common import epsilon, floatx, set_epsilon, set_floatx
_keras_dir = os.path.expanduser(os.path.join('~', '.keras'))
_keras_base_dir = os.path.expanduser('~')
if not os.access(_keras_base_dir, os.W_OK):
_keras_base_dir = '/tmp'
_keras_dir = os.path.join(_keras_base_dir, '.keras')
if not os.path.exists(_keras_dir):
os.makedirs(_keras_dir)
_BACKEND = 'theano'
_config_path = os.path.expanduser(os.path.join('~', '.keras', 'keras.json'))
_config_path = os.path.expanduser(os.path.join(_keras_dir, 'keras.json'))
if os.path.exists(_config_path):
_config = json.load(open(_config_path))
_floatx = _config.get('floatx', floatx())
@ -31,6 +35,11 @@ else:
# add new line in order for bash 'cat' display the content correctly
f.write(json.dumps(_config) + '\n')
if 'KERAS_BACKEND' in os.environ:
_backend = os.environ['KERAS_BACKEND']
assert _backend in {'theano', 'tensorflow'}
_BACKEND = _backend
if _BACKEND == 'theano':
print('Using Theano backend.')
from .theano_backend import *

@ -236,6 +236,41 @@ def permute_dimensions(x, pattern):
return tf.transpose(x, perm=pattern)
def resize_images(X, height_factor, width_factor, dim_ordering):
'''Resize the images contained in a 4D tensor of shape
- [batch, channels, height, width] (for 'th' dim_ordering)
- [batch, height, width, channels] (for 'tf' dim_ordering)
by a factor of (height_factor, width_factor). Both factors should be
positive integers.
'''
if dim_ordering == 'th':
new_height = shape(X)[2].value * height_factor
new_width = shape(X)[3].value * width_factor
X = permute_dimensions(X, [0, 2, 3, 1])
X = tf.image.resize_nearest_neighbor(X, (new_height, new_width))
return permute_dimensions(X, [0, 3, 1, 2])
elif dim_ordering == 'tf':
new_height = shape(X)[1].value * height_factor
new_width = shape(X)[2].value * width_factor
return tf.image.resize_nearest_neighbor(X, (new_height, new_width))
else:
raise Exception('Invalid dim_ordering: ' + dim_ordering)
def repeat_elements(x, rep, axis):
'''Repeats the elements of a tensor along an axis, like np.repeat
If x has shape (s1, s2, s3) and axis=1, the output
will have shape (s1, s2 * rep, s3)
'''
x_shape = x.get_shape().as_list()
# slices along the repeat axis
splits = tf.split(axis, x_shape[axis], x)
# repeat each slice the given number of reps
x_rep = [s for s in splits for i in range(rep)]
return tf.concat(axis, x_rep)
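To illustrate the shape semantics described in the docstring, a small NumPy analogue (an editorial addition, not part of the diff):

```python
import numpy as np

x = np.zeros((2, 3, 4))        # shape (s1, s2, s3)
y = np.repeat(x, 2, axis=1)    # repeat each element twice along axis 1
print(y.shape)                 # (2, 6, 4) == (s1, s2 * rep, s3)
```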
def repeat(x, n):
'''Repeat a 2D tensor:
@ -252,6 +287,10 @@ def tile(x, n):
def flatten(x):
return tf.reshape(x, [-1])
def batch_flatten(x):
'''Turn a n-D tensor into a 2D tensor where
the first dimension is conserved.
'''
@ -274,9 +313,6 @@ def squeeze(x, axis):
def temporal_padding(x, padding=1):
'''Pad the middle dimension of a 3D tensor
with "padding" zeros left and right.
Appologies for the inane API, but Theano makes this
really hard.
'''
pattern = [[0, 0], [padding, padding], [0, 0]]
return tf.pad(x, pattern)
@ -313,12 +349,16 @@ def set_value(x, value):
class Function(object):
def __init__(self, inputs, outputs, updates=[]):
assert type(inputs) in {list, tuple}
assert type(outputs) in {list, tuple}
assert type(updates) in {list, tuple}
self.inputs = list(inputs)
self.outputs = list(outputs)
with tf.control_dependencies(self.outputs):
self.updates = [tf.assign(p, new_p) for (p, new_p) in updates]
def __call__(self, inputs):
assert type(inputs) in {list, tuple}
names = [v.name for v in self.inputs]
feed_dict = dict(zip(names, inputs))
session = _get_session()
@ -410,7 +450,7 @@ def rnn(step_function, inputs, initial_states,
new_states = successive_states[-1]
outputs = tf.transpose(outputs, (1, 0, 2))
return last_output, outputs, states
return last_output, outputs, new_states
def switch(condition, then_expression, else_expression):
@ -499,6 +539,12 @@ def dropout(x, level, seed=None):
return tf.nn.dropout(x * 1., retain_prob, seed=seed)
def l2_normalize(x, axis):
if axis < 0:
axis = axis % len(x.get_shape())
return tf.nn.l2_normalize(x, dim=axis)
# CONVOLUTIONS

@ -11,7 +11,7 @@ theano.config.floatX = _FLOATX
def _on_gpu():
'''Returns whether the session is set to
'''Return whether the session is set to
run on GPU or not (i.e. on CPU).
'''
return theano.config.device[:3] == 'gpu'
@ -19,7 +19,7 @@ def _on_gpu():
if _on_gpu():
'''Import cuDNN only if running on GPU:
not having Cuda install should not
not having Cuda installed should not
prevent from running the present code.
'''
from theano.sandbox.cuda import dnn
@ -243,11 +243,39 @@ def permute_dimensions(x, pattern):
return x.dimshuffle(pattern)
def repeat(x, n):
'''Repeat a 2D tensor:
def repeat_elements(x, rep, axis):
'''Repeat the elements of a tensor along an axis, like np.repeat.
if x has shape (samples, dim) and n=2,
the output will have shape (samples, 2, dim)
If x has shape (s1, s2, s3) and axis=1, the output
will have shape (s1, s2 * rep, s3).
'''
return T.repeat(x, rep, axis=axis)
def resize_images(X, height_factor, width_factor, dim_ordering):
'''Resize the images contained in a 4D tensor of shape
- [batch, channels, height, width] (for 'th' dim_ordering)
- [batch, height, width, channels] (for 'tf' dim_ordering)
by a factor of (height_factor, width_factor). Both factors should be
positive integers.
'''
if dim_ordering == 'th':
output = repeat_elements(X, height_factor, axis=2)
output = repeat_elements(output, width_factor, axis=3)
return output
elif dim_ordering == 'tf':
output = repeat_elements(X, height_factor, axis=1)
output = repeat_elements(output, width_factor, axis=2)
return output
else:
raise Exception('Invalid dim_ordering: ' + dim_ordering)
def repeat(x, n):
'''Repeat a 2D tensor.
If x has shape (samples, dim) and n=2,
the output will have shape (samples, 2, dim).
'''
tensors = [x] * n
stacked = T.stack(*tensors)
@ -259,6 +287,10 @@ def tile(x, n):
def flatten(x):
return T.flatten(x)
def batch_flatten(x):
'''Turn a n-D tensor into a 2D tensor where
the first dimension is conserved.
'''
@ -354,6 +386,7 @@ class Function(object):
allow_input_downcast=True, **kwargs)
def __call__(self, inputs):
assert type(inputs) in {list, tuple}
return self.function(*inputs)
@ -369,7 +402,7 @@ def gradients(loss, variables):
def rnn(step_function, inputs, initial_states,
go_backwards=False, masking=True):
'''Iterates over the time dimension of a tensor.
'''Iterate over the time dimension of a tensor.
Parameters
----------
@ -412,7 +445,7 @@ def rnn(step_function, inputs, initial_states,
if masking:
# if all-zero input timestep, return
# all-zero output and unchanged states
switch = T.any(input)
switch = T.any(input, axis=-1, keepdims=True)
output = T.switch(switch, output, 0. * output)
return_states = []
for state, new_state in zip(states, new_states):
@ -509,9 +542,13 @@ def dropout(x, level, seed=None):
return x
# CONVOLUTIONS
def l2_normalize(x, axis):
norm = T.sqrt(T.sum(T.square(x), axis=axis, keepdims=True))
return x / norm
# CONVOLUTIONS
def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
image_shape=None, filter_shape=None):
'''
@ -540,12 +577,15 @@ def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
if _on_gpu() and dnn.dnn_available():
if border_mode == 'same':
assert(strides == (1, 1))
np_kernel = kernel.eval()
pad_x = (np_kernel.shape[2] - strides[0]) // 2
pad_y = (np_kernel.shape[3] - strides[1]) // 2
conv_out = dnn.dnn_conv(img=x,
kerns=kernel,
border_mode=(pad_x, pad_y))
border_mode='full')
np_kernel = kernel.eval()
shift_x = (np_kernel.shape[2] - 1) // 2
shift_y = (np_kernel.shape[3] - 1) // 2
conv_out = conv_out[:, :,
shift_x:x.shape[2] + shift_x,
shift_y:x.shape[3] + shift_y]
else:
conv_out = dnn.dnn_conv(img=x,
kerns=kernel,
@ -566,8 +606,9 @@ def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
image_shape=image_shape,
filter_shape=filter_shape)
if border_mode == 'same':
shift_x = (kernel.shape[2] - 1) // 2
shift_y = (kernel.shape[3] - 1) // 2
np_kernel = kernel.eval()
shift_x = (np_kernel.shape[2] - 1) // 2
shift_y = (np_kernel.shape[3] - 1) // 2
conv_out = conv_out[:, :,
shift_x:x.shape[2] + shift_x,
shift_y:x.shape[3] + shift_y]

@ -8,6 +8,7 @@ import warnings
from collections import deque
from .utils.generic_utils import Progbar
from keras import backend as K
class CallbackList(object):
@ -43,21 +44,27 @@ class CallbackList(object):
callback.on_batch_begin(batch, logs)
self._delta_ts_batch_begin.append(time.time() - t_before_callbacks)
delta_t_median = np.median(self._delta_ts_batch_begin)
if self._delta_t_batch > 0. and delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1:
if self._delta_t_batch > 0. and delta_t_median > 0.95 * \
self._delta_t_batch and delta_t_median > 0.1:
warnings.warn('Method on_batch_begin() is slow compared '
'to the batch update (%f). Check your callbacks.' % delta_t_median)
'to the batch update (%f). Check your callbacks.'
% delta_t_median)
self._t_enter_batch = time.time()
def on_batch_end(self, batch, logs={}):
if not hasattr(self, '_t_enter_batch'):
self._t_enter_batch = time.time()
self._delta_t_batch = time.time() - self._t_enter_batch
t_before_callbacks = time.time()
for callback in self.callbacks:
callback.on_batch_end(batch, logs)
self._delta_ts_batch_end.append(time.time() - t_before_callbacks)
delta_t_median = np.median(self._delta_ts_batch_end)
if self._delta_t_batch > 0. and delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1:
if self._delta_t_batch > 0. and delta_t_median > 0.95 * \
self._delta_t_batch and delta_t_median > 0.1:
warnings.warn('Method on_batch_end() is slow compared '
'to the batch update (%f). Check your callbacks.' % delta_t_median)
'to the batch update (%f). Check your callbacks.'
% delta_t_median)
def on_train_begin(self, logs={}):
for callback in self.callbacks:
@ -249,7 +256,8 @@ class ModelCheckpoint(Callback):
if mode not in ['auto', 'min', 'max']:
warnings.warn('ModelCheckpoint mode %s is unknown, '
'fallback to auto mode' % (self.mode), RuntimeWarning)
'fallback to auto mode.' % (self.mode),
RuntimeWarning)
mode = 'auto'
if mode == 'min':
@ -276,7 +284,8 @@ class ModelCheckpoint(Callback):
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print('Epoch %05d: %s improved from %0.5f to %0.5f, saving model to %s'
print('Epoch %05d: %s improved from %0.5f to %0.5f,'
' saving model to %s'
% (epoch, self.monitor, self.best,
current, filepath))
self.best = current
@ -299,23 +308,46 @@ class EarlyStopping(Callback):
patience: number of epochs with no improvement
after which training will be stopped.
verbose: verbosity mode.
mode: one of {auto, min, max}. In 'min' mode,
training will stop when the quantity
monitored has stopped decreasing; in 'max'
mode it will stop when the quantity
monitored has stopped increasing.
'''
def __init__(self, monitor='val_loss', patience=0, verbose=0):
def __init__(self, monitor='val_loss', patience=0, verbose=0, mode='auto'):
super(Callback, self).__init__()
self.monitor = monitor
self.patience = patience
self.verbose = verbose
self.best = np.Inf
self.wait = 0
if mode not in ['auto', 'min', 'max']:
warnings.warn('EarlyStopping mode %s is unknown, '
'fallback to auto mode.' % (self.mode), RuntimeWarning)
mode = 'auto'
if mode == 'min':
self.monitor_op = np.less
self.best = np.Inf
elif mode == 'max':
self.monitor_op = np.greater
self.best = -np.Inf
else:
if 'acc' in self.monitor:
self.monitor_op = np.greater
self.best = -np.Inf
else:
self.monitor_op = np.less
self.best = np.Inf
def on_epoch_end(self, epoch, logs={}):
current = logs.get(self.monitor)
if current is None:
warnings.warn('Early stopping requires %s available!' %
(self.monitor), RuntimeWarning)
if current < self.best:
if self.monitor_op(current, self.best):
self.best = current
self.wait = 0
else:
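A hedged usage sketch of the new `mode` argument (an editorial illustration, with a hypothetical model and data): monitoring validation accuracy, which should be maximized rather than minimized:

```python
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_acc', patience=2, mode='max')
# model.fit(X_train, Y_train, validation_split=0.1,
#           show_accuracy=True, callbacks=[early_stopping])
```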
@ -327,9 +359,16 @@ class EarlyStopping(Callback):
class RemoteMonitor(Callback):
'''Experimental callback used to stream events to a server.
'''Callback used to stream events to a server.
Requires the `requests` library.
# Arguments
root: root url to which the events will be sent (at the end
of every epoch). Events are sent to
`root + '/publish/epoch/end/'`. Calls are HTTP POST,
with a `data` argument which is a JSON-encoded dictionary
of event data.
'''
def __init__(self, root='http://localhost:9000'):
self.root = root
@ -369,13 +408,120 @@ class LearningRateScheduler(Callback):
'''Learning rate scheduler.
# Arguments
schedule: a function that gets an epoch index as input
schedule: a function that takes an epoch index as input
(integer, indexed from 0) and returns a new
learning rate as output.
learning rate as output (float).
'''
def __init__(self, schedule):
super(LearningRateScheduler, self).__init__()
self.schedule = schedule
def on_epoch_begin(self, epoch, logs={}):
self.model.optimizer.lr.set_value(self.schedule(epoch))
assert hasattr(self.model.optimizer, 'lr'), \
'Optimizer must have a "lr" attribute.'
lr = self.schedule(epoch)
assert type(lr) == float, 'The output of the "schedule" function should be float.'
K.set_value(self.model.optimizer.lr, lr)
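A hedged usage sketch (editorial illustration): a schedule that halves the learning rate every 10 epochs and returns a Python float, as the assertion above requires:

```python
from keras.callbacks import LearningRateScheduler

def schedule(epoch):
    return float(0.01 * (0.5 ** (epoch // 10)))

lr_scheduler = LearningRateScheduler(schedule)
# model.fit(X_train, Y_train, callbacks=[lr_scheduler])
```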
class TensorBoard(Callback):
''' Tensorboard basic visualizations.
This callback writes a log for TensorBoard, which allows
you to visualize dynamic graphs of your training and test
metrics, as well as activation histograms for the different
layers in your model.
TensorBoard is a visualization tool provided with TensorFlow.
If you have installed TensorFlow with pip, you should be able
to launch TensorBoard from the command line:
```
tensorboard --logdir=/full_path_to_your_logs
```
You can find more information about TensorBoard
[here](https://www.tensorflow.org/versions/master/how_tos/summaries_and_tensorboard/index.html).
# Arguments
log_dir: the path of the directory where to save the log
files to be parsed by tensorboard
histogram_freq: frequency (in epochs) at which to compute activation
histograms for the layers of the model. If set to 0,
histograms won't be computed.
'''
def __init__(self, log_dir='./logs', histogram_freq=0):
super(Callback, self).__init__()
if K._BACKEND != 'tensorflow':
raise Exception('TensorBoard callback only works '
'with the TensorFlow backend.')
self.log_dir = log_dir
self.histogram_freq = histogram_freq
def _set_model(self, model):
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF
self.model = model
self.sess = KTF._get_session()
if self.histogram_freq:
mod_type = self.model.get_config()['name']
if mod_type == 'Sequential':
layers = {l.get_config()['name']: l for l in self.model.layers}
elif mod_type == 'Graph':
layers = self.model.nodes
else:
raise Exception('Unrecognized model:',
self.model.get_config()['name'])
for l in layers:
cur_layer = layers[l]
if hasattr(cur_layer, 'W'):
tf.histogram_summary('{}_W'.format(l), cur_layer.W)
if hasattr(cur_layer, 'b'):
tf.histogram_summary('{}_b'.format(l), cur_layer.b)
if hasattr(cur_layer, 'get_output'):
tf.histogram_summary('{}_out'.format(l),
cur_layer.get_output())
self.merged = tf.merge_all_summaries()
self.writer = tf.train.SummaryWriter(self.log_dir,
self.sess.graph_def)
def on_epoch_begin(self, epoch, logs={}):
self.seen = 0
self.totals = {}
def on_batch_end(self, batch, logs={}):
batch_size = logs.get('size', 0)
self.seen += batch_size
for k, v in logs.items():
if k in self.totals:
self.totals[k] += v * batch_size
else:
self.totals[k] = v * batch_size
def on_epoch_end(self, epoch, logs={}):
import tensorflow as tf
if self.model.validation_data and self.histogram_freq:
if epoch % self.histogram_freq == 0:
if self.params.get('show_accuracy'):
test_function = self.model._test_with_acc
else:
test_function = self.model._test
names = [v.name for v in test_function.inputs]
feed_dict = dict(zip(names, self.model.validation_data))
result = self.sess.run([self.merged], feed_dict=feed_dict)
summary_str = result[0]
self.writer.add_summary(summary_str, epoch)
all_values = self.totals.copy()
all_values.update(logs)
for name, value in all_values.items():
if name in ['batch', 'size']:
continue
summary = tf.Summary()
summary_value = summary.value.add()
summary_value.simple_value = value
summary_value.tag = name
self.writer.add_summary(summary, epoch)
self.writer.flush()
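A hedged usage sketch of the new callback (editorial illustration; as asserted above it requires the TensorFlow backend, and the paths are hypothetical):

```python
from keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='./logs', histogram_freq=1)
# model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
#           show_accuracy=True, callbacks=[tb])
```

Afterwards, launch the viewer with `tensorboard --logdir=./logs`.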

@ -10,7 +10,6 @@ def load_data():
origin = "http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
path = get_file(dirname, origin=origin, untar=True)
nb_test_samples = 10000
nb_train_samples = 50000
X_train = np.zeros((nb_train_samples, 3, 32, 32), dtype="uint8")

@ -14,7 +14,10 @@ class ParanoidURLopener(FancyURLopener):
def get_file(fname, origin, untar=False):
datadir = os.path.expanduser(os.path.join('~', '.keras', 'datasets'))
datadir_base = os.path.expanduser(os.path.join('~', '.keras'))
if not os.access(datadir_base, os.W_OK):
datadir_base = os.path.join('/tmp', '.keras')
datadir = os.path.join(datadir_base, 'datasets')
if not os.path.exists(datadir):
os.makedirs(datadir)

@ -2,12 +2,12 @@ from __future__ import absolute_import
from six.moves import cPickle
import gzip
from .data_utils import get_file
import random
from six.moves import zip
import numpy as np
def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113,
def load_data(path="imdb.pkl", nb_words=None, skip_top=0,
maxlen=None, test_split=0.2, seed=113,
start_char=1, oov_char=2, index_from=3):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/imdb.pkl")
@ -39,7 +39,10 @@ def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_spli
new_labels.append(y)
X = new_X
labels = new_labels
if not X:
raise Exception('After filtering for sequences shorter than maxlen=' +
str(maxlen) + ', no sequence was kept. '
'Increase maxlen.')
if not nb_words:
nb_words = max([max(x) for x in X])
@ -57,10 +60,10 @@ def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_spli
nX.append(nx)
X = nX
X_train = X[:int(len(X)*(1-test_split))]
y_train = labels[:int(len(X)*(1-test_split))]
X_train = X[:int(len(X) * (1 - test_split))]
y_train = labels[:int(len(X) * (1 - test_split))]
X_test = X[int(len(X)*(1-test_split)):]
y_test = labels[int(len(X)*(1-test_split)):]
X_test = X[int(len(X) * (1 - test_split)):]
y_test = labels[int(len(X) * (1 - test_split)):]
return (X_train, y_train), (X_test, y_test)

@ -19,5 +19,4 @@ def load_data(path="mnist.pkl.gz"):
data = cPickle.load(f, encoding="bytes")
f.close()
return data # (X_train, y_train), (X_test, y_test)

@ -1,18 +1,17 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from .data_utils import get_file
import random
from six.moves import cPickle
from six.moves import zip
import numpy as np
def load_data(path="reuters.pkl", nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113,
def load_data(path="reuters.pkl", nb_words=None, skip_top=0,
maxlen=None, test_split=0.2, seed=113,
start_char=1, oov_char=2, index_from=3):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/reuters.pkl")
f = open(path, 'rb')
X, labels = cPickle.load(f)
f.close()
@ -53,11 +52,11 @@ def load_data(path="reuters.pkl", nb_words=None, skip_top=0, maxlen=None, test_s
nX.append(nx)
X = nX
X_train = X[:int(len(X)*(1-test_split))]
y_train = labels[:int(len(X)*(1-test_split))]
X_train = X[:int(len(X) * (1 - test_split))]
y_train = labels[:int(len(X) * (1 - test_split))]
X_test = X[int(len(X)*(1-test_split)):]
y_test = labels[int(len(X)*(1-test_split)):]
X_test = X[int(len(X) * (1 - test_split)):]
y_test = labels[int(len(X) * (1 - test_split)):]
return (X_train, y_train), (X_test, y_test)
@ -66,8 +65,3 @@ def get_word_index(path="reuters_word_index.pkl"):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/reuters_word_index.pkl")
f = open(path, 'rb')
return cPickle.load(f)
if __name__ == "__main__":
make_reuters_dataset()
(X_train, y_train), (X_test, y_test) = load_data()

@ -9,52 +9,54 @@ def get_fans(shape):
return fan_in, fan_out
def uniform(shape, scale=0.05):
return K.variable(np.random.uniform(low=-scale, high=scale, size=shape))
def uniform(shape, scale=0.05, name=None):
return K.variable(np.random.uniform(low=-scale, high=scale, size=shape),
name=name)
def normal(shape, scale=0.05):
return K.variable(np.random.randn(*shape) * scale)
def normal(shape, scale=0.05, name=None):
return K.variable(np.random.normal(loc=0.0, scale=scale, size=shape),
name=name)
def lecun_uniform(shape):
def lecun_uniform(shape, name=None):
''' Reference: LeCun 98, Efficient Backprop
http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
'''
fan_in, fan_out = get_fans(shape)
scale = np.sqrt(3. / fan_in)
return uniform(shape, scale)
return uniform(shape, scale, name=name)
def glorot_normal(shape):
def glorot_normal(shape, name=None):
''' Reference: Glorot & Bengio, AISTATS 2010
'''
fan_in, fan_out = get_fans(shape)
s = np.sqrt(2. / (fan_in + fan_out))
return normal(shape, s)
return normal(shape, s, name=name)
def glorot_uniform(shape):
def glorot_uniform(shape, name=None):
fan_in, fan_out = get_fans(shape)
s = np.sqrt(6. / (fan_in + fan_out))
return uniform(shape, s)
return uniform(shape, s, name=name)
def he_normal(shape):
def he_normal(shape, name=None):
''' Reference: He et al., http://arxiv.org/abs/1502.01852
'''
fan_in, fan_out = get_fans(shape)
s = np.sqrt(2. / fan_in)
return normal(shape, s)
return normal(shape, s, name=name)
def he_uniform(shape):
def he_uniform(shape, name=None):
fan_in, fan_out = get_fans(shape)
s = np.sqrt(6. / fan_in)
return uniform(shape, s)
return uniform(shape, s, name=name)
def orthogonal(shape, scale=1.1):
def orthogonal(shape, scale=1.1, name=None):
''' From Lasagne. Reference: Saxe et al., http://arxiv.org/abs/1312.6120
'''
flat_shape = (shape[0], np.prod(shape[1:]))
@ -63,22 +65,23 @@ def orthogonal(shape, scale=1.1):
# pick the one with the correct shape
q = u if u.shape == flat_shape else v
q = q.reshape(shape)
return K.variable(scale * q[:shape[0], :shape[1]])
return K.variable(scale * q[:shape[0], :shape[1]], name=name)
def identity(shape, scale=1):
def identity(shape, scale=1, name=None):
if len(shape) != 2 or shape[0] != shape[1]:
raise Exception("Identity matrix initialization can only be used for 2D square matrices")
raise Exception('Identity matrix initialization can only be used '
'for 2D square matrices.')
else:
return K.variable(scale * np.identity(shape[0]))
return K.variable(scale * np.identity(shape[0]), name=name)
def zero(shape):
return K.zeros(shape)
def zero(shape, name=None):
return K.zeros(shape, name=name)
def one(shape):
return K.ones(shape)
def one(shape, name=None):
return K.ones(shape, name=name)
from .utils.generic_utils import get_from_module

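A minimal sketch of the new `name` argument added to every initializer above; the shape and name are arbitrary examples.

```python
from keras import backend as K
from keras.initializations import glorot_uniform

# the optional `name` is forwarded to K.variable, which gives the weight a
# readable name in backends (e.g. TensorFlow graphs) that track variable names
W = glorot_uniform((784, 128), name='dense_1_W')
print(K.get_value(W).shape)  # (784, 128)
```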
@ -23,16 +23,41 @@ class Sequential(Layer):
self.layer_cache = {}
for layer in layers:
self.add(layer)
self._cache_enabled = True
def __call__(self, X, train=False):
def __call__(self, X, mask=None, train=False):
# turn off layer cache temporarily
tmp_cache_enabled = self.cache_enabled
self.cache_enabled = False
# recursively search for a layer which is not a Sequential model
layer = self
while issubclass(layer.__class__, Sequential):
layer = layer.layers[0]
# set temporary input to first layer
tmp = self.layers[0].get_input
self.layers[0].get_input = lambda _: X
tmp_input = layer.get_input
tmp_mask = None
layer.get_input = lambda _: X
if hasattr(layer, 'get_input_mask'):
tmp_mask = layer.get_input_mask
layer.get_input_mask = lambda _: mask
Y = self.get_output(train=train)
# return input to first layer to what it was
self.layers[0].get_input = tmp
# return input from first layer to what it was
layer.get_input = tmp_input
if hasattr(layer, 'get_input_mask'):
layer.get_input_mask = tmp_mask
self.cache_enabled = tmp_cache_enabled
return Y
@property
def cache_enabled(self):
return self._cache_enabled
@cache_enabled.setter
def cache_enabled(self, value):
self._cache_enabled = value
for l in self.layers:
l.cache_enabled = value
def set_previous(self, layer):
self.layers[0].previous = layer
@ -79,7 +104,7 @@ class Sequential(Layer):
@property
def state_updates(self):
"""
Returns the `updates` from all layers in the sequence that are
Return the `updates` from all layers in the sequence that are
stateful. This is useful for separating _training_ updates and
_prediction_ updates for when we need to update a layer's internal state
during a stateful prediction.
@ -207,7 +232,7 @@ class Graph(Layer):
@property
def state_updates(self):
"""
Returns the `updates` from all nodes in that graph for nodes that are
Return the `updates` from all nodes in that graph for nodes that are
stateful. This is useful for separating _training_ updates and
_prediction_ updates for when we need to update a layer's internal state
during a stateful prediction.
@ -288,7 +313,7 @@ class Graph(Layer):
if dtype == 'float':
layer.input = K.placeholder(shape=layer.input_shape, name=name)
else:
if len(input_shape) == 1:
if (input_shape and len(input_shape) == 1) or (batch_input_shape and len(batch_input_shape) == 2):
layer.input = K.placeholder(shape=layer.input_shape,
dtype='int32',
name=name)
@ -375,9 +400,7 @@ class Graph(Layer):
dot_axes: Same meaning as `dot_axes` argument of `add_node()`
outputs: Used when `merge_mode=None`. Names for the output nodes.
create_output: Same meaning as `create_output` argument of `add_node()`.
When creating an output, `merge_mode` must be specified.
'''
layer.layer_cache = self.layer_cache
if name in self.namespace:
raise Exception('Duplicate node identifier: ' + name)
for o in outputs:
@ -408,7 +431,8 @@ class Graph(Layer):
raise Exception('Unknown identifier: ' + input)
s = Siamese(layer, layers, merge_mode,
concat_axis=concat_axis,
dot_axes=dot_axes)
dot_axes=dot_axes,
is_graph=True)
self.namespace.add(name)
self.nodes[name] = s
self.node_config.append({'name': name,
@ -425,7 +449,7 @@ class Graph(Layer):
self.namespace.add(sh_name)
self.nodes[sh_name] = sh
self.node_config.append({'name': sh_name,
'inputs': [s],
'inputs': [name],
'create_output': create_output})
if create_output:
self.add_output(sh_name, input=sh_name)

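A sketch of what the extended `__call__` above enables: calling a `Sequential` container directly on a symbolic tensor (with an optional mask) while the layer cache is temporarily disabled. The shapes and layer choices are assumptions for illustration.

```python
from keras import backend as K
from keras.layers import containers
from keras.layers.core import Dense

inner = containers.Sequential([Dense(16, input_dim=32), Dense(8)])

x = K.placeholder(shape=(None, 32))
# mask defaults to None; train=False gives the test-time graph
y = inner(x, mask=None, train=False)
```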
@ -584,7 +584,7 @@ class AveragePooling2D(_Pooling2D):
class UpSampling1D(Layer):
'''Repeats each temporal step `length` times along the time axis.
'''Repeat each temporal step `length` times along the time axis.
# Input shape
3D tensor with shape: `(samples, steps, features)`.
@ -609,7 +609,7 @@ class UpSampling1D(Layer):
def get_output(self, train=False):
X = self.get_input(train)
output = K.concatenate([X] * self.length, axis=1)
output = K.repeat_elements(X, self.length, axis=1)
return output
def get_config(self):
@ -620,7 +620,7 @@ class UpSampling1D(Layer):
class UpSampling2D(Layer):
'''Repeats the rows and columns of the data
'''Repeat the rows and columns of the data
by size[0] and size[1] respectively.
# Input shape
@ -668,15 +668,8 @@ class UpSampling2D(Layer):
def get_output(self, train=False):
X = self.get_input(train)
if self.dim_ordering == 'th':
output = K.concatenate([X] * self.size[0], axis=2)
output = K.concatenate([output] * self.size[1], axis=3)
elif self.dim_ordering == 'tf':
output = K.concatenate([X] * self.size[0], axis=1)
output = K.concatenate([output] * self.size[1], axis=2)
else:
raise Exception('Invalid dim_ordering: ' + self.dim_ordering)
return output
return K.resize_images(X, self.size[0], self.size[1],
self.dim_ordering)
def get_config(self):
config = {'name': self.__class__.__name__,

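A small usage sketch for the `UpSampling2D` change above (now implemented via `K.resize_images`); shapes assume the default 'th' dim ordering.

```python
from keras.models import Sequential
from keras.layers.convolutional import UpSampling2D

model = Sequential()
# (None, 3, 16, 16) -> (None, 3, 32, 32): each row and column is repeated twice
model.add(UpSampling2D(size=(2, 2), input_shape=(3, 16, 16)))
print(model.layers[-1].output_shape)  # (None, 3, 32, 32)
```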
@ -35,36 +35,70 @@ class Layer(object):
def __init__(self, **kwargs):
allowed_kwargs = {'input_shape',
'trainable',
'batch_input_shape'}
'batch_input_shape',
'cache_enabled',
'name'}
for kwarg in kwargs:
assert kwarg in allowed_kwargs, "Keyword argument not understood: " + kwarg
assert kwarg in allowed_kwargs, 'Keyword argument not understood: ' + kwarg
if 'input_shape' in kwargs:
self.set_input_shape((None,) + tuple(kwargs['input_shape']))
if 'batch_input_shape' in kwargs:
self.set_input_shape(tuple(kwargs['batch_input_shape']))
self.trainable = True
if 'trainable' in kwargs:
self._trainable = kwargs['trainable']
self.trainable = kwargs['trainable']
self.name = self.__class__.__name__.lower()
if 'name' in kwargs:
self.name = kwargs['name']
if not hasattr(self, 'params'):
self.params = []
self.cache_enabled = True
if 'cache_enabled' in kwargs:
self.cache_enabled = kwargs['cache_enabled']
def __call__(self, X, train=False):
@property
def name(self):
return self._name
@name.setter
def name(self, name):
self._name = name
@property
def cache_enabled(self):
return self._cache_enabled
@cache_enabled.setter
def cache_enabled(self, value):
self._cache_enabled = value
def __call__(self, X, mask=None, train=False):
# set temporary input
tmp = self.get_input
tmp_input = self.get_input
tmp_mask = None
if hasattr(self, 'get_input_mask'):
tmp_mask = self.get_input_mask
self.get_input_mask = lambda _: mask
self.get_input = lambda _: X
Y = self.get_output(train=train)
# return input to what it was
self.get_input = tmp
if hasattr(self, 'get_input_mask'):
self.get_input_mask = tmp_mask
self.get_input = tmp_input
return Y
def set_previous(self, layer, connection_map={}):
'''Connect a layer to its parent in the computational graph.
'''
assert self.nb_input == layer.nb_output == 1, "Cannot connect layers: input count and output count should be 1."
assert self.nb_input == layer.nb_output == 1, 'Cannot connect layers: input count and output count should be 1.'
if hasattr(self, 'input_ndim'):
assert self.input_ndim == len(layer.output_shape), "Incompatible shapes: layer expected input with ndim=" +\
str(self.input_ndim) + " but previous layer has output_shape " + str(layer.output_shape)
assert self.input_ndim == len(layer.output_shape), ('Incompatible shapes: layer expected input with ndim=' +
str(self.input_ndim) +
' but previous layer has output_shape ' +
str(layer.output_shape))
if layer.get_output_mask() is not None:
assert self.supports_masked_input(), "Cannot connect non-masking layer to layer with masked output"
assert self.supports_masked_input(), 'Cannot connect non-masking layer to layer with masked output.'
self.previous = layer
self.build()
@ -132,12 +166,12 @@ class Layer(object):
if hasattr(self, 'previous'):
# to avoid redundant computations,
# layer outputs are cached when possible.
if hasattr(self, 'layer_cache'):
if hasattr(self, 'layer_cache') and self.cache_enabled:
previous_layer_id = '%s_%s' % (id(self.previous), train)
if previous_layer_id in self.layer_cache:
return self.layer_cache[previous_layer_id]
previous_output = self.previous.get_output(train=train)
if hasattr(self, 'layer_cache'):
if hasattr(self, 'layer_cache') and self.cache_enabled:
previous_layer_id = '%s_%s' % (id(self.previous), train)
self.layer_cache[previous_layer_id] = previous_output
return previous_output
@ -188,11 +222,12 @@ class Layer(object):
of the layer (i.e. it should match the
output of `get_weights`).
'''
assert len(self.params) == len(weights), 'Provided weight array does not match layer weights (' + \
str(len(self.params)) + ' layer params vs. ' + str(len(weights)) + ' provided weights)'
assert len(self.params) == len(weights), ('Provided weight array does not match layer weights (' +
str(len(self.params)) + ' layer params vs. ' +
str(len(weights)) + ' provided weights)')
for p, w in zip(self.params, weights):
if K.get_value(p).shape != w.shape:
raise Exception("Layer shape %s not compatible with weight shape %s." % (K.get_value(p).shape, w.shape))
raise Exception('Layer shape %s not compatible with weight shape %s.' % (K.get_value(p).shape, w.shape))
K.set_value(p, w)
def get_weights(self):
@ -207,11 +242,13 @@ class Layer(object):
def get_config(self):
'''Return the parameters of the layer, as a dictionary.
'''
config = {"name": self.__class__.__name__}
config = {'name': self.__class__.__name__}
if hasattr(self, '_input_shape'):
config['input_shape'] = self._input_shape[1:]
if hasattr(self, '_trainable'):
config['trainable'] = self._trainable
config['cache_enabled'] = self.cache_enabled
config['custom_name'] = self.name
return config
def get_params(self):
@ -285,8 +322,8 @@ class Masking(MaskedLayer):
self.input = K.placeholder(ndim=3)
def get_output_mask(self, train=False):
if K._BACKEND == "tensorflow":
raise Exception("Masking is Theano-only for the time being.")
if K._BACKEND == 'tensorflow':
raise Exception('Masking is Theano-only for the time being.')
X = self.get_input(train)
return K.any(K.ones_like(X) * (1. - K.equal(X, self.mask_value)),
axis=-1)
@ -297,8 +334,8 @@ class Masking(MaskedLayer):
axis=-1, keepdims=True)
def get_config(self):
config = {"name": self.__class__.__name__,
"mask_value": self.mask_value}
config = {'name': self.__class__.__name__,
'mask_value': self.mask_value}
base_config = super(Masking, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -344,8 +381,8 @@ class TimeDistributedMerge(Layer):
raise Exception('Unknown merge mode')
def get_config(self):
config = {"name": self.__class__.__name__,
"mode": self.mode}
config = {'name': self.__class__.__name__,
'mode': self.mode}
base_config = super(TimeDistributedMerge, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -458,6 +495,7 @@ class Merge(Layer):
if p not in self.params:
self.params.append(p)
self.constraints.append(c)
super(Merge, self).__init__()
@property
def output_shape(self):
@ -507,7 +545,7 @@ class Merge(Layer):
for i in range(len(self.layers)):
X = self.layers[i].get_output(train)
if X.name is None:
raise ValueError('merge_mode="join" only works with named inputs')
raise ValueError('merge_mode="join" only works with named inputs.')
else:
inputs[X.name] = X
return inputs
@ -537,7 +575,7 @@ class Merge(Layer):
output = output.dimshuffle((0, 'x'))
return output
else:
raise Exception('Unknown merge mode')
raise Exception('Unknown merge mode.')
def get_input(self, train=False):
res = []
@ -662,13 +700,59 @@ class Reshape(Layer):
super(Reshape, self).__init__(**kwargs)
self.dims = tuple(dims)
def _fix_unknown_dimension(self, input_shape, output_shape):
'''Find and replace a single missing dimension in an output shape
given an input shape.
A near direct port of the internal numpy function _fix_unknown_dimension
in numpy/core/src/multiarray/shape.c
# Arguments
input_shape: shape of array being reshaped
output_shape: desired shaped of the array with at most
a single -1 which indicates a dimension that should be
derived from the input shape.
# Returns
The new output shape with a -1 replaced with its computed value.
Raises a ValueError if the total array size of the output_shape is
different from the input_shape, or if more than one unknown dimension
is specified.
'''
output_shape = list(output_shape)
msg = 'total size of new array must be unchanged'
known, unknown = 1, None
for index, dim in enumerate(output_shape):
if dim < 0:
if unknown is None:
unknown = index
else:
raise ValueError('can only specify one unknown dimension')
else:
known *= dim
original = np.prod(input_shape, dtype=int)
if unknown is not None:
if known == 0 or original % known != 0:
raise ValueError(msg)
output_shape[unknown] = original // known
elif original != known:
raise ValueError(msg)
return tuple(output_shape)
@property
def output_shape(self):
return (self.input_shape[0],) + self.dims
return (self.input_shape[0],) + self._fix_unknown_dimension(self.input_shape[1:], self.dims)
def get_output(self, train=False):
X = self.get_input(train)
return K.reshape(X, (-1,) + self.dims)
return K.reshape(X, (-1,) + self.output_shape[1:])
def get_config(self):
config = {'name': self.__class__.__name__,
@ -725,7 +809,7 @@ class Flatten(Layer):
'''Flatten the input. Does not affect the batch size.
# Input shape
Arbitrary, although all dimensions in the input shaped must be fixed.
Arbitrary, although all dimensions in the input shape must be fixed.
Use the keyword argument `input_shape`
(tuple of integers, does not include the samples axis)
when using this layer as the first layer in a model.
@ -739,11 +823,18 @@ class Flatten(Layer):
@property
def output_shape(self):
input_shape = self.input_shape
if not all(input_shape[1:]):
raise Exception('The shape of the input to "Flatten" '
'is not fully defined '
'(got ' + str(input_shape[1:]) + '). '
'Make sure to pass a complete "input_shape" '
'or "batch_input_shape" argument to the first '
'layer in your model.')
return (input_shape[0], np.prod(input_shape[1:]))
def get_output(self, train=False):
X = self.get_input(train)
return K.flatten(X)
return K.batch_flatten(X)
class RepeatVector(Layer):
@ -772,8 +863,8 @@ class RepeatVector(Layer):
return K.repeat(X, self.n)
def get_config(self):
config = {"name": self.__class__.__name__,
"n": self.n}
config = {'name': self.__class__.__name__,
'n': self.n}
base_config = super(RepeatVector, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -918,9 +1009,9 @@ class ActivityRegularization(Layer):
return self.get_input(train)
def get_config(self):
config = {"name": self.__class__.__name__,
"l1": self.l1,
"l2": self.l2}
config = {'name': self.__class__.__name__,
'l1': self.l1,
'l2': self.l2}
base_config = super(ActivityRegularization, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -995,7 +1086,7 @@ class TimeDistributedDense(MaskedLayer):
input_dim = self.input_shape[2]
self.W = self.init((input_dim, self.output_dim))
self.b = K.zeros((self.output_dim))
self.b = K.zeros((self.output_dim,))
self.params = [self.W, self.b]
self.regularizers = []
@ -1033,17 +1124,17 @@ class TimeDistributedDense(MaskedLayer):
return outputs
def get_config(self):
config = {"name": self.__class__.__name__,
"output_dim": self.output_dim,
"init": self.init.__name__,
"activation": self.activation.__name__,
"W_regularizer": self.W_regularizer.get_config() if self.W_regularizer else None,
"b_regularizer": self.b_regularizer.get_config() if self.b_regularizer else None,
"activity_regularizer": self.activity_regularizer.get_config() if self.activity_regularizer else None,
"W_constraint": self.W_constraint.get_config() if self.W_constraint else None,
"b_constraint": self.b_constraint.get_config() if self.b_constraint else None,
"input_dim": self.input_dim,
"input_length": self.input_length}
config = {'name': self.__class__.__name__,
'output_dim': self.output_dim,
'init': self.init.__name__,
'activation': self.activation.__name__,
'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,
'b_regularizer': self.b_regularizer.get_config() if self.b_regularizer else None,
'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,
'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,
'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
'input_dim': self.input_dim,
'input_length': self.input_length}
base_config = super(TimeDistributedDense, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -1090,6 +1181,10 @@ class AutoEncoder(Layer):
self.decoder.set_previous(self.encoder)
if weights is not None:
self.set_weights(weights)
def build(self):
self.params = []
self.regularizers = []
self.constraints = []
@ -1103,11 +1198,9 @@ class AutoEncoder(Layer):
self.params.append(p)
self.constraints.append(c)
if weights is not None:
self.set_weights(weights)
def set_previous(self, node):
self.encoder.set_previous(node)
def set_previous(self, node, connection_map={}):
self.encoder.set_previous(node, connection_map)
super(AutoEncoder, self).set_previous(node, connection_map)
def get_weights(self):
weights = []
@ -1148,10 +1241,10 @@ class AutoEncoder(Layer):
return self.decoder.get_output(train)
def get_config(self):
return {"name": self.__class__.__name__,
"encoder_config": self.encoder.get_config(),
"decoder_config": self.decoder.get_config(),
"output_reconstruction": self.output_reconstruction}
return {'name': self.__class__.__name__,
'encoder_config': self.encoder.get_config(),
'decoder_config': self.decoder.get_config(),
'output_reconstruction': self.output_reconstruction}
class MaxoutDense(Layer):
@ -1275,6 +1368,8 @@ class Lambda(Layer):
if py3:
self.function = marshal.dumps(function.__code__)
else:
assert hasattr(function, 'func_code'), ('The Lambda layer "function"'
' argument must be a Python function.')
self.function = marshal.dumps(function.func_code)
if output_shape is None:
self._output_shape = None
@ -1285,6 +1380,7 @@ class Lambda(Layer):
self._output_shape = marshal.dumps(output_shape.__code__)
else:
self._output_shape = marshal.dumps(output_shape.func_code)
super(Lambda, self).__init__()
@property
def output_shape(self):
@ -1295,18 +1391,16 @@ class Lambda(Layer):
else:
output_shape_func = marshal.loads(self._output_shape)
output_shape_func = types.FunctionType(output_shape_func, globals())
shape = output_shape_func(self.previous.output_shape)
shape = output_shape_func(self.input_shape)
if type(shape) not in {list, tuple}:
raise Exception("output_shape function must return a tuple")
raise Exception('output_shape function must return a tuple')
return tuple(shape)
def get_output(self, train=False):
X = self.get_input(train)
func = marshal.loads(self.function)
func = types.FunctionType(func, globals())
if hasattr(self, 'previous'):
return func(self.previous.get_output(train))
else:
return func(self.input)
return func(X)
class MaskedLambda(MaskedLayer, Lambda):
@ -1330,7 +1424,7 @@ class LambdaMerge(Lambda):
def __init__(self, layers, function, output_shape=None):
if len(layers) < 2:
raise Exception('Please specify two or more input layers '
'(or containers) to merge')
'(or containers) to merge.')
self.layers = layers
self.params = []
self.regularizers = []
@ -1359,6 +1453,7 @@ class LambdaMerge(Lambda):
self._output_shape = marshal.dumps(output_shape.__code__)
else:
self._output_shape = marshal.dumps(output_shape.func_code)
super(Lambda, self).__init__()
@property
def output_shape(self):
@ -1372,7 +1467,7 @@ class LambdaMerge(Lambda):
output_shape_func = types.FunctionType(output_shape_func, globals())
shape = output_shape_func(input_shapes)
if type(shape) not in {list, tuple}:
raise Exception('output_shape function must return a tuple')
raise Exception('output_shape function must return a tuple.')
return tuple(shape)
def get_params(self):
@ -1442,29 +1537,32 @@ class Siamese(Layer):
merge_mode: Same meaning as `mode` argument of Merge layer
concat_axis: Same meaning as `concat_axis` argument of Merge layer
dot_axes: Same meaning as `dot_axes` argument of Merge layer
is_graph: Should be set to True when used inside `Graph`
'''
def __init__(self, layer, inputs, merge_mode='concat',
concat_axis=1, dot_axes=-1):
concat_axis=1, dot_axes=-1, is_graph=False):
if merge_mode not in ['sum', 'mul', 'concat', 'ave',
'join', 'cos', 'dot', None]:
raise Exception('Invalid merge mode: ' + str(merge_mode))
if merge_mode in {'cos', 'dot'}:
if len(inputs) > 2:
raise Exception(merge_mode + ' merge takes exactly 2 layers')
raise Exception(merge_mode + ' merge takes exactly 2 layers.')
self.layer = layer
self.trainable = layer.trainable
self.is_graph = is_graph
self.inputs = inputs
self.params = []
self.layer.set_previous(inputs[0])
self.merge_mode = merge_mode
self.concat_axis = concat_axis
self.dot_axes = dot_axes
layer.set_previous(inputs[0])
self.params = []
self.regularizers = []
self.constraints = []
self.updates = []
layers = [layer]
if merge_mode:
if merge_mode and not is_graph:
layers += inputs
for l in layers:
params, regs, consts, updates = l.get_params()
@ -1475,6 +1573,7 @@ class Siamese(Layer):
if p not in self.params:
self.params.append(p)
self.constraints.append(c)
super(Siamese, self).__init__()
@property
def output_shape(self):
@ -1512,15 +1611,18 @@ class Siamese(Layer):
def get_params(self):
return self.params, self.regularizers, self.constraints, self.updates
def set_layer_input(self, index):
l = self.layer
while not hasattr(l, 'previous'):
l = l.layers[0]
l.previous = self.inputs[index]
def set_layer_input(self, head):
layer = self.layer
from ..layers.containers import Sequential
while issubclass(layer.__class__, Sequential):
layer = layer.layers[0]
layer.previous = self.inputs[head]
def get_output_at(self, head, train=False):
self.set_layer_input(head)
return self.layer.get_output(train)
X = self.inputs[head].get_output(train)
mask = self.inputs[head].get_output_mask(train)
Y = self.layer(X, mask)
return Y
def get_output_shape(self, head, train=False):
self.set_layer_input(head)
@ -1532,7 +1634,7 @@ class Siamese(Layer):
X = self.get_output_at(i, train)
if X.name is None:
raise ValueError('merge_mode="join" '
'only works with named inputs')
'only works with named inputs.')
o[X.name] = X
return o
@ -1621,7 +1723,7 @@ class Siamese(Layer):
def get_weights(self):
weights = self.layer.get_weights()
if self.merge_mode:
if self.merge_mode and not self.is_graph:
for m in self.inputs:
weights += m.get_weights()
return weights
@ -1630,7 +1732,7 @@ class Siamese(Layer):
nb_param = len(self.layer.params)
self.layer.set_weights(weights[:nb_param])
weights = weights[nb_param:]
if self.merge_mode:
if self.merge_mode and not self.is_graph:
for i in range(len(self.inputs)):
nb_param = len(self.inputs[i].params)
self.inputs[i].set_weights(weights[:nb_param])
@ -1642,17 +1744,18 @@ class Siamese(Layer):
'inputs': [m.get_config() for m in self.inputs],
'merge_mode': self.merge_mode,
'concat_axis': self.concat_axis,
'dot_axes': self.dot_axes}
'dot_axes': self.dot_axes,
'is_graph': self.is_graph}
base_config = super(Siamese, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
class SiameseHead(Layer):
'''This layer should be added only on top of a Siamese layer
with merge_mode = None
with merge_mode = None.
Outputs the output of the Siamese layer at a given index,
specified by the head argument
specified by the head argument.
# Arguments
head: The index at which the output of the Siamese layer
@ -1661,6 +1764,7 @@ class SiameseHead(Layer):
def __init__(self, head):
self.head = head
self.params = []
super(SiameseHead, self).__init__()
def get_output(self, train=False):
return self.get_input(train)
@ -1686,7 +1790,7 @@ class SiameseHead(Layer):
def add_shared_layer(layer, inputs):
'''Use this function to add a shared layer across
multiple Sequential models without merging the outputs
multiple Sequential models without merging the outputs.
'''
input_layers = [l.layers[-1] for l in inputs]
s = Siamese(layer, input_layers, merge_mode=None)
@ -1694,3 +1798,126 @@ def add_shared_layer(layer, inputs):
sh = SiameseHead(i)
inputs[i].add(s)
inputs[i].add(sh)
class Highway(Layer):
'''Densely connected highway network,
a natural extension of LSTMs to feedforward networks.
# Input shape
2D tensor with shape: `(nb_samples, input_dim)`.
# Output shape
2D tensor with shape: `(nb_samples, input_dim)`.
# Arguments
init: name of initialization function for the weights of the layer
(see [initializations](../initializations.md)),
or alternatively, Theano function to use for weights
initialization. This parameter is only relevant
if you don't pass a `weights` argument.
transform_bias: value for the bias to take on initially (default -2)
activation: name of activation function to use
(see [activations](../activations.md)),
or alternatively, elementwise Theano function.
If you don't specify anything, no activation is applied
(ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
The list should have 4 elements, matching the layer's params
`[W, b, W_carry, b_carry]` (see `build` below).
W_regularizer: instance of [WeightRegularizer](../regularizers.md)
(eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of [WeightRegularizer](../regularizers.md),
applied to the bias.
activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),
applied to the network output.
W_constraint: instance of the [constraints](../constraints.md) module
(eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the [constraints](../constraints.md) module,
applied to the bias.
input_dim: dimensionality of the input (integer).
This argument (or alternatively, the keyword argument `input_shape`)
is required when using this layer as the first layer in a model.
# References
- [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf)
'''
input_ndim = 2
def __init__(self, init='glorot_uniform', transform_bias=-2,
activation='linear', weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None, input_dim=None, **kwargs):
self.init = initializations.get(init)
self.transform_bias = transform_bias
self.activation = activations.get(activation)
self.W_regularizer = regularizers.get(W_regularizer)
self.b_regularizer = regularizers.get(b_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.W_constraint = constraints.get(W_constraint)
self.b_constraint = constraints.get(b_constraint)
self.constraints = [self.W_constraint, self.b_constraint]
self.initial_weights = weights
self.input_dim = input_dim
if self.input_dim:
kwargs['input_shape'] = (self.input_dim,)
self.input = K.placeholder(ndim=2)
super(Highway, self).__init__(**kwargs)
def build(self):
input_dim = self.input_shape[1]
self.W = self.init((input_dim, input_dim))
self.W_carry = self.init((input_dim, input_dim))
self.b = K.zeros((input_dim,))
# initialize with a vector of values `transform_bias`
self.b_carry = K.variable(np.ones((input_dim,)) * self.transform_bias)
self.params = [self.W, self.b, self.W_carry, self.b_carry]
self.regularizers = []
if self.W_regularizer:
self.W_regularizer.set_param(self.W)
self.regularizers.append(self.W_regularizer)
if self.b_regularizer:
self.b_regularizer.set_param(self.b)
self.regularizers.append(self.b_regularizer)
if self.activity_regularizer:
self.activity_regularizer.set_layer(self)
self.regularizers.append(self.activity_regularizer)
if self.initial_weights is not None:
self.set_weights(self.initial_weights)
del self.initial_weights
@property
def output_shape(self):
return (self.input_shape[0], self.input_shape[1])
def get_output(self, train=False):
X = self.get_input(train)
transform_weight = activations.sigmoid(K.dot(X, self.W_carry) + self.b_carry)
act = self.activation(K.dot(X, self.W) + self.b)
act *= transform_weight
output = act + (1 - transform_weight) * X
return output
def get_config(self):
config = {'name': self.__class__.__name__,
'init': self.init.__name__,
'transform_bias': self.transform_bias,
'activation': self.activation.__name__,
'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,
'b_regularizer': self.b_regularizer.get_config() if self.b_regularizer else None,
'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,
'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,
'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
'input_dim': self.input_dim}
base_config = super(Highway, self).get_config()
return dict(list(base_config.items()) + list(config.items()))

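A hedged sketch of stacking the new `Highway` layer; the layer sizes and surrounding model are illustrative assumptions (the highway block preserves its input width).

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Highway

model = Sequential()
model.add(Dense(64, input_dim=100))
# output width equals input width (64); carry gate bias starts at transform_bias = -2
model.add(Highway(activation='relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```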
@ -8,10 +8,10 @@ from ..constraints import unitnorm
class Embedding(Layer):
'''Turn positive integers (indexes) into denses vectors of fixed size.
'''Turn positive integers (indexes) into dense vectors of fixed size.
eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
This layer can only be used as the first layer in a model.
This layer can only be used as the first layer in a model.
# Input shape
2D tensor with shape: `(nb_samples, sequence_length)`.
@ -38,7 +38,7 @@ class Embedding(Layer):
This is useful for [recurrent layers](recurrent.md) which may take
variable length input. If this is `True` then all subsequent layers
in the model need to support masking or an exception will be raised.
input_length: Length of input sequences, when it is constantself.
input_length: Length of input sequences, when it is constant.
This argument is required if you are going to connect
`Flatten` then `Dense` layers upstream
(without it, the shape of the dense outputs cannot be computed).

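A short sketch of why `input_length` matters, as the corrected docstring above notes: without it, a downstream `Flatten`/`Dense` cannot infer its output shape. The vocabulary and sizes are illustrative.

```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.core import Flatten, Dense

model = Sequential()
# 10000-word vocabulary, 128-dim vectors, sequences of exactly 20 indices
model.add(Embedding(10000, 128, input_length=20))
model.add(Flatten())   # flattened size (20 * 128) is only known thanks to input_length
model.add(Dense(1))
```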
@ -4,7 +4,9 @@ from .. import backend as K
class BatchNormalization(Layer):
'''Normalize the activations of the previous layer at each batch.
'''Normalize the activations of the previous layer at each batch,
i.e. applies a transformation that maintains the mean activation
close to 0 and the activation standard deviation close to 1.
# Input shape
Arbitrary. Use the keyword argument `input_shape`
@ -18,7 +20,13 @@ class BatchNormalization(Layer):
epsilon: small float > 0. Fuzz parameter.
mode: integer, 0 or 1.
- 0: feature-wise normalization.
- 1: sample-wise normalization.
If the input has multiple feature dimensions,
each will be normalized separately
(e.g. for an image input with shape
`(channels, rows, cols)`,
each combination of a channel, row and column
will be normalized separately).
- 1: sample-wise normalization. This mode assumes a 2D input.
momentum: momentum in the computation of the
exponential average of the mean and standard deviation
of the data, for feature-wise normalization.
@ -42,22 +50,12 @@ class BatchNormalization(Layer):
input_shape = self.input_shape # starts with samples axis
input_shape = input_shape[1:]
self.gamma = self.init((input_shape))
self.gamma = self.init(input_shape)
self.beta = K.zeros(input_shape)
self.params = [self.gamma, self.beta]
self.running_mean = K.zeros(input_shape)
self.running_std = K.ones((input_shape))
# initialize self.updates: batch mean/std computation
X = self.get_input(train=True)
m = K.mean(X, axis=0)
std = K.mean(K.square(X - m) + self.epsilon, axis=0)
std = K.sqrt(std)
mean_update = self.momentum * self.running_mean + (1-self.momentum) * m
std_update = self.momentum * self.running_std + (1-self.momentum) * std
self.updates = [(self.running_mean, mean_update),
(self.running_std, std_update)]
self.running_std = K.ones(input_shape)
if self.initial_weights is not None:
self.set_weights(self.initial_weights)
@ -76,6 +74,13 @@ class BatchNormalization(Layer):
def get_output(self, train):
X = self.get_input(train)
if self.mode == 0:
m = K.mean(X, axis=0)
std = K.mean(K.square(X - m) + self.epsilon, axis=0)
std = K.sqrt(std)
mean_update = self.momentum * self.running_mean + (1-self.momentum) * m
std_update = self.momentum * self.running_std + (1-self.momentum) * std
self.updates = [(self.running_mean, mean_update),
(self.running_std, std_update)]
X_normed = ((X - self.running_mean) /
(self.running_std + self.epsilon))
elif self.mode == 1:

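A minimal sketch of mode 0 (feature-wise) normalization, whose running mean/std updates now live in `get_output` as shown above; the surrounding model is an assumption.

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Dense(64, input_dim=20))
# mode=0: each of the 64 features is normalized with its own running mean/std
model.add(BatchNormalization(mode=0))
model.add(Activation('relu'))
model.add(Dense(1))
model.compile(optimizer='sgd', loss='mse')
```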
@ -30,7 +30,7 @@ class Recurrent(MaskedLayer):
return_sequences: Boolean. Whether to return the last output
in the output sequence, or the full sequence.
go_backwards: Boolean (default False).
If True, rocess the input sequence backwards.
If True, process the input sequence backwards.
stateful: Boolean (default False). If True, the last state
for each sample at index i in a batch will be used as initial
state for the sample of index i in the following batch.
@ -43,7 +43,7 @@ class Recurrent(MaskedLayer):
`Flatten` then `Dense` layers upstream
(without it, the shape of the dense outputs cannot be computed).
Note that if the recurrent layer is not the first layer
in your model, you would need to specify the input Length
in your model, you would need to specify the input length
at the level of the first layer
(e.g. via the `input_shape` argument)
@ -73,7 +73,7 @@ class Recurrent(MaskedLayer):
To enable statefulness:
- specify `stateful=True` in the layer constructor.
- specify a fixed batch size for your model, by passing
a `batch_input_size=(...)` to the first layer in your model.
a `batch_input_shape=(...)` to the first layer in your model.
This is the expected shape of your inputs *including the batch size*.
It should be a tuple of integers, e.g. `(32, 10, 100)`.
@ -129,7 +129,7 @@ class Recurrent(MaskedLayer):
if K._BACKEND == 'tensorflow':
if not self.input_shape[1]:
raise Exception('When using TensorFlow, you should define ' +
'explicitely the number of timesteps of ' +
'explicitly the number of timesteps of ' +
'your sequences. Make sure the first layer ' +
'has a "batch_input_shape" argument ' +
'including the samples axis.')
@ -205,7 +205,7 @@ class SimpleRNN(Recurrent):
self.W = self.init((input_dim, self.output_dim))
self.U = self.inner_init((self.output_dim, self.output_dim))
self.b = K.zeros((self.output_dim))
self.b = K.zeros((self.output_dim,))
self.params = [self.W, self.U, self.b]
if self.initial_weights is not None:
@ -326,7 +326,7 @@ class GRU(Recurrent):
z = self.inner_activation(x_z + K.dot(h_tm1, self.U_z))
r = self.inner_activation(x_r + K.dot(h_tm1, self.U_r))
hh = self.inner_activation(x_h + K.dot(r * h_tm1, self.U_h))
hh = self.activation(x_h + K.dot(r * h_tm1, self.U_h))
h = z * h_tm1 + (1 - z) * hh
return h, [h]
@ -391,19 +391,19 @@ class LSTM(Recurrent):
self.W_i = self.init((input_dim, self.output_dim))
self.U_i = self.inner_init((self.output_dim, self.output_dim))
self.b_i = K.zeros((self.output_dim))
self.b_i = K.zeros((self.output_dim,))
self.W_f = self.init((input_dim, self.output_dim))
self.U_f = self.inner_init((self.output_dim, self.output_dim))
self.b_f = self.forget_bias_init((self.output_dim))
self.b_f = self.forget_bias_init((self.output_dim,))
self.W_c = self.init((input_dim, self.output_dim))
self.U_c = self.inner_init((self.output_dim, self.output_dim))
self.b_c = K.zeros((self.output_dim))
self.b_c = K.zeros((self.output_dim,))
self.W_o = self.init((input_dim, self.output_dim))
self.U_o = self.inner_init((self.output_dim, self.output_dim))
self.b_o = K.zeros((self.output_dim))
self.b_o = K.zeros((self.output_dim,))
self.params = [self.W_i, self.U_i, self.b_i,
self.W_c, self.U_c, self.b_c,

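A sketch of the stateful setup described in the docstring above: a fixed batch size via `batch_input_shape`, with state carried across batches. The sizes are illustrative.

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense

model = Sequential()
# batch size 32, 10 timesteps, 100 features per step; state persists across batches
model.add(LSTM(128, batch_input_shape=(32, 10, 100), stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
# each stateful recurrent layer exposes a reset_states() method to clear the carried state
```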
@ -5,6 +5,12 @@ import warnings
import pprint
from six.moves import range
import six
import time
import threading
try:
import queue
except ImportError:
import Queue as queue
from . import backend as K
from . import optimizers
@ -53,7 +59,7 @@ def slice_X(X, start=None, stop=None):
'''
if type(X) == list:
if hasattr(start, '__len__'):
# hdf5 dataset only support list object as indices
# hdf5 datasets only support list objects as indices
if hasattr(start, 'shape'):
start = start.tolist()
return [x[start] for x in X]
@ -75,10 +81,12 @@ def weighted_objective(fn):
# score_array has ndim >= 2
score_array = fn(y_true, y_pred)
if mask is not None:
# Cast the mask to floatX to avoid float64 upcasting in theano
mask = K.cast(mask, K.floatx())
# mask should have the same shape as score_array
score_array *= mask
# the loss per batch should be proportional
# to the number of unmasked sampled.
# to the number of unmasked samples.
score_array /= K.mean(mask)
# reduce score_array to 1D
@ -148,6 +156,16 @@ def model_from_config(config, custom_objects={}):
if 'optimizer' in config:
# if it has an optimizer, the model is assumed to be compiled
loss = config.get('loss')
# if a custom loss function is passed replace it in loss
if model_name == 'Graph':
for l in loss:
for c in custom_objects:
if loss[l] == c:
loss[l] = custom_objects[c]
elif model_name == 'Sequential' and loss in custom_objects:
loss = custom_objects[loss]
class_mode = config.get('class_mode')
optimizer_params = dict([(k, v) for k, v in config.get('optimizer').items()])
@ -179,6 +197,8 @@ class Model(object):
Abstract fit function for f(ins).
Assume that f returns a list, labelled by out_labels.
'''
self.training_data = ins
self.validation_data = val_ins
do_validation = False
if val_f and val_ins:
do_validation = True
@ -360,8 +380,21 @@ class Model(object):
`keras.models.from_json(json_string, custom_objects={})`.
'''
import json
def get_json_type(obj):
# if obj is any numpy type
if type(obj).__module__ == np.__name__:
return obj.item()
# if obj is a python 'type'
if type(obj).__name__ == type.__name__:
return obj.__name__
raise TypeError('Not JSON Serializable')
config = self.get_config()
return json.dumps(config, **kwargs)
return json.dumps(config, default=get_json_type, **kwargs)
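
A round-trip sketch showing what `get_json_type` buys: configs containing numpy scalars now serialize cleanly. This assumes a `model_from_json` helper in `keras.models` (the docstring above refers to it as `from_json`).

```python
from keras.models import Sequential, model_from_json
from keras.layers.core import Dense

model = Sequential()
model.add(Dense(2, input_dim=4))

json_string = model.to_json()            # numpy scalars in the config no longer break json.dumps
rebuilt = model_from_json(json_string)   # same architecture, freshly initialized weights
```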
def summary(self):
'''Print out a summary of the model architecture,
@ -391,7 +424,7 @@ class Sequential(Model, containers.Sequential):
self.optimizer = optimizers.get(optimizer)
self.loss = objectives.get(loss)
weighted_loss = weighted_objective(objectives.get(loss))
weighted_loss = weighted_objective(self.loss)
# input of model
self.X_train = self.get_input(train=True)
@ -445,15 +478,15 @@ class Sequential(Model, containers.Sequential):
self._train = K.function(train_ins, [train_loss], updates=updates)
self._train_with_acc = K.function(train_ins, [train_loss, train_accuracy], updates=updates)
self._predict = K.function(predict_ins, [self.y_test], updates=self.state_updates)
self._test = K.function(test_ins, [test_loss])
self._test_with_acc = K.function(test_ins, [test_loss, test_accuracy])
self._test = K.function(test_ins, [test_loss], updates=self.state_updates)
self._test_with_acc = K.function(test_ins, [test_loss, test_accuracy], updates=self.state_updates)
def fit(self, X, y, batch_size=128, nb_epoch=100, verbose=1, callbacks=[],
validation_split=0., validation_data=None, shuffle=True,
show_accuracy=False, class_weight=None, sample_weight=None):
'''Train the model for a fixed number of epochs.
Returns a history object. It `history` attribute is a record of
Returns a history object. Its `history` attribute is a record of
training loss values at successive epochs,
as well as validation loss values (if applicable).
@ -490,6 +523,20 @@ class Sequential(Model, containers.Sequential):
output timesteps, which is useful
in sequence to sequence learning.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
@ -503,11 +550,20 @@ class Sequential(Model, containers.Sequential):
if validation_data:
if len(validation_data) == 2:
X_val, y_val = validation_data
if type(X_val) == list:
assert len(set([len(a) for a in X_val] + [len(y_val)])) == 1
else:
assert len(X_val) == len(y_val)
X_val = standardize_X(X_val)
y_val = standardize_y(y_val)
sample_weight_val = standardize_weights(y_val)
elif len(validation_data) == 3:
X_val, y_val, sample_weight_val = validation_data
if type(X_val) == list:
assert len(set([len(a) for a in X_val] +
[len(y_val), len(sample_weight_val)])) == 1
else:
assert len(X_val) == len(y_val) == len(sample_weight_val)
X_val = standardize_X(X_val)
y_val = standardize_y(y_val)
sample_weight_val = standardize_weights(y_val,
@ -611,6 +667,20 @@ class Sequential(Model, containers.Sequential):
verbose: verbosity mode, 0 or 1.
sample_weight: sample weights, as a numpy array.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, sample_weight=sample_weight)
@ -635,6 +705,20 @@ class Sequential(Model, containers.Sequential):
Arguments: see `fit` method.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, class_weight=class_weight,
@ -651,6 +735,20 @@ class Sequential(Model, containers.Sequential):
Arguments: see `fit` method.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, sample_weight=sample_weight)
@ -713,6 +811,208 @@ class Sequential(Model, containers.Sequential):
self.layers[k].set_weights(weights)
f.close()
def fit_generator(self, generator, samples_per_epoch, nb_epoch,
verbose=1, show_accuracy=False, callbacks=[],
validation_data=None, class_weight=None, nb_worker=1):
'''Fit a model on data generated batch-by-batch by a Python generator.
The generator is run in parallel to the model, for efficiency,
and can be run by multiple workers at the same time.
For instance, this allows you to do real-time data augmentation
on images on CPU in parallel to training your model on GPU.
# Arguments
generator: a Python generator,
yielding either (X, y) or (X, y, sample_weight).
The generator is expected to loop over its data
indefinitely. An epoch finishes when `samples_per_epoch`
samples have been seen by the model.
The output of the generator must be a tuple of either 2 or 3
numpy arrays.
If the output tuple has two elements, they are assumed to be
(input_data, target_data).
If it has three elements, they are assumed to be
(input_data, target_data, sample_weight).
All arrays should contain the same number of samples.
samples_per_epoch: integer, number of samples to process before
starting a new epoch.
nb_epoch: integer, total number of iterations on the data.
verbose: verbosity mode, 0, 1, or 2.
show_accuracy: boolean. Whether to display accuracy (only relevant
for classification problems).
callbacks: list of callbacks to be called during training.
validation_data: tuple of 2 or 3 numpy arrays. If 2 elements,
they are assumed to be (input_data, target_data);
if 3 elements, they are assumed to be
(input_data, target_data, sample weights).
class_weight: dictionary mapping class indices to a weight
for the class.
nb_worker: integer, number of workers to use for running
the generator (in parallel to model training).
If using multiple workers, the processing order of batches
yielded by the generator will be non-deterministic.
If using multiple workers, make sure to protect
any thread-unsafe operation done by the generator
using a Python mutex.
# Returns
A `History` object.
# Examples
```python
def generate_arrays_from_file(path):
while 1:
f = open(path)
for line in f:
# create numpy arrays of input data
# and labels, from each line in the file
x, y = process_line(line)
yield x, y
f.close()
model.fit_generator(generate_arrays_from_file('/my_file.txt'),
samples_per_epoch=10000, nb_epoch=10)
```
'''
max_queue_size = 10 # maximum number of batches in queue
wait_time = 0.05 # in seconds
epoch = 0
do_validation = bool(validation_data)
if show_accuracy:
out_labels = ['loss', 'acc']
else:
out_labels = ['loss']
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
# prepare callbacks
history = cbks.History()
if verbose:
callbacks = [history, cbks.BaseLogger()] + callbacks
else:
callbacks = [history] + callbacks
callbacks = cbks.CallbackList(callbacks)
callbacks._set_model(self)
callbacks._set_params({
'nb_epoch': nb_epoch,
'nb_sample': samples_per_epoch,
'verbose': verbose,
'do_validation': do_validation,
'metrics': metrics,
})
callbacks.on_train_begin()
# util function to validate the batches produced
# by the generator
def input_validation(generator_output):
if not hasattr(generator_output, '__len__'):
_stop.set()
raise Exception('The generator output must be a tuple.')
if len(generator_output) == 2:
X, y = generator_output
if type(X) == list:
assert len(set([len(a) for a in X] + [len(y)])) == 1
else:
assert len(X) == len(y)
sample_weight = None
elif len(generator_output) == 3:
X, y, sample_weight = generator_output
if type(X) == list:
assert len(set([len(a) for a in X] + [len(y), len(sample_weight)])) == 1
else:
assert len(X) == len(y) == len(sample_weight)
else:
_stop.set()
raise Exception('The generator output tuple must have '
'2 or 3 elements.')
return X, y, sample_weight
# start generator thread storing batches into a queue
generator_queue = queue.Queue()
_stop = threading.Event()
def generator_task():
i = 0
while not _stop.is_set():
try:
if generator_queue.qsize() < max_queue_size:
generator_output = next(generator)
generator_queue.put(generator_output)
i += 1
else:
time.sleep(wait_time)
except:
_stop.set()
return
generator_threads = [threading.Thread(target=generator_task) for _ in range(nb_worker)]
for thread in generator_threads:
thread.start()
self.stop_training = False
while epoch < nb_epoch:
callbacks.on_epoch_begin(epoch)
samples_seen = 0
batch_index = 0
while samples_seen < samples_per_epoch:
while not _stop.is_set():
if not generator_queue.empty():
generator_output = generator_queue.get()
break
else:
time.sleep(wait_time)
X, y, sample_weight = input_validation(generator_output)
batch_logs = {}
batch_size = len(X[0])
batch_logs['batch'] = batch_index
batch_logs['size'] = batch_size
callbacks.on_batch_begin(batch_index, batch_logs)
outs = self.train_on_batch(X, y,
accuracy=show_accuracy,
sample_weight=sample_weight,
class_weight=class_weight)
if type(outs) != list:
outs = [outs]
for l, o in zip(out_labels, outs):
batch_logs[l] = o
callbacks.on_batch_end(batch_index, batch_logs)
# construct epoch logs
epoch_logs = {}
batch_index += 1
samples_seen += batch_size
if samples_seen >= samples_per_epoch: # epoch finished
if do_validation:
if hasattr(validation_data, 'next'):
# assumed to be generator
# TODO: call self.evaluate_generator()
_stop.set()
raise NotImplementedError()
else:
# input validation
X, y, sample_weight = input_validation(validation_data)
val_outs = self.evaluate(X, y,
show_accuracy=show_accuracy,
sample_weight=sample_weight,
verbose=0)
if type(val_outs) != list:
val_outs = [val_outs]
# same labels assumed
for l, o in zip(out_labels, val_outs):
epoch_logs['val_' + l] = o
callbacks.on_epoch_end(epoch, epoch_logs)
epoch += 1
if self.stop_training:
break
_stop.set()
callbacks.on_train_end()
return history
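
As the docstring notes, with `nb_worker > 1` the generator is pulled from several threads, so any shared state inside it must be protected. A minimal sketch of one way to do that; `X_train`, `y_train` and the commented call are assumptions for illustration.

```python
import threading
import numpy as np

lock = threading.Lock()

def thread_safe_batches(X, y, batch_size=32):
    """Yield (X_batch, y_batch) forever, guarding the shared cursor with a mutex."""
    i = 0
    while 1:
        with lock:
            start = i
            i = (i + batch_size) % len(X)
        yield X[start:start + batch_size], y[start:start + batch_size]

# model.fit_generator(thread_safe_batches(X_train, y_train),
#                     samples_per_epoch=len(X_train), nb_epoch=10, nb_worker=2)
```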
class Graph(Model, containers.Graph):
'''Arbitrary connection graph.
@ -774,7 +1074,7 @@ class Graph(Model, containers.Graph):
self.loss = loss
self._train = K.function(train_ins, [train_loss], updates=updates)
self._test = K.function(test_ins, [test_loss])
self._test = K.function(test_ins, [test_loss], updates=self.state_updates)
self._predict = K.function(inputs=ins, outputs=ys_test,
updates=self.state_updates)
@ -783,7 +1083,7 @@ class Graph(Model, containers.Graph):
class_weight={}, sample_weight={}):
'''Train the model for a fixed number of epochs.
Returns a history object. It `history` attribute is a record of
Returns a history object. Its `history` attribute is a record of
training loss values at successive epochs,
as well as validation loss values (if applicable).
@ -812,6 +1112,9 @@ class Graph(Model, containers.Graph):
'''
X = [data[name] for name in self.input_order]
y = [standardize_y(data[name]) for name in self.output_order]
if len(set([len(a) for a in X] + [len(a) for a in y])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
sample_weight_list = [standardize_weights(y[i],
sample_weight=sample_weight.get(self.output_order[i])) for i in range(len(self.output_order))]
@ -856,8 +1159,10 @@ class Graph(Model, containers.Graph):
'''
sample_weight = [standardize_weights(data[name],
sample_weight=sample_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._test_loop(self._test, ins, batch_size, verbose)
return outs[0]
@ -868,6 +1173,9 @@ class Graph(Model, containers.Graph):
Arguments: see `fit` method.
'''
ins = [data[name] for name in self.input_order]
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._predict_loop(self._predict, ins, batch_size, verbose)
return dict(zip(self.output_order, outs))
@ -880,6 +1188,9 @@ class Graph(Model, containers.Graph):
sample_weight=sample_weight.get(name),
class_weight=class_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return self._train(ins)
def test_on_batch(self, data, sample_weight={}):
@ -890,13 +1201,20 @@ class Graph(Model, containers.Graph):
sample_weight = [standardize_weights(data[name],
sample_weight=sample_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return self._test(ins)
def predict_on_batch(self, data):
'''Generate predictions for a single batch of samples.
'''
ins = [data[name] for name in self.input_order]
return self._predict(ins)
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._predict(ins)
return dict(zip(self.output_order, outs))
def save_weights(self, filepath, overwrite=False):
'''Save weights from all layers to an HDF5 file.
@ -938,3 +1256,198 @@ class Graph(Model, containers.Graph):
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
self.set_weights(weights)
f.close()
def fit_generator(self, generator, samples_per_epoch, nb_epoch,
verbose=1, callbacks=[],
validation_data=None, class_weight={}, nb_worker=1):
'''Fit a model on data generated batch-by-batch by a Python generator.
The generator is run in parallel to the model, for efficiency,
and can be run by multiple workers at the same time.
For instance, this allows you to do real-time data augmentation
on images on CPU in parallel to training your model on GPU.
# Arguments
generator: a generator.
The output of the generator must be either a dictionary
mapping inputs and outputs names to numpy arrays, or
a tuple of dictionaries (input_data, sample_weight).
All arrays should contain the same number of samples.
The generator is expected to loop over its data
indefinitely. An epoch finishes when `samples_per_epoch`
samples have been seen by the model.
samples_per_epoch: integer, number of samples to process before
going to the next epoch.
nb_epoch: integer, total number of iterations on the data.
verbose: verbosity mode, 0, 1, or 2.
callbacks: list of callbacks to be called during training.
validation_data: dictionary mapping input names and outputs names
to appropriate numpy arrays to be used as
held-out validation data.
All arrays should contain the same number of samples.
class_weight: dictionary mapping class indices to a weight
for the class.
nb_worker: integer, number of workers to use for running
the generator (in parallel to model training).
If using multiple workers, the processing order of batches
generated by the generator will be non-deterministic.
If using multiple workers, make sure to protect
any thread-unsafe operation done by the generator
using a Python mutex.
# Returns
A `History` object.
# Examples
```python
def generate_arrays_from_file(path):
while 1:
f = open(path)
for line in f:
# create numpy arrays of input data
# and labels, from each line in the file
x1, x2, y = process_line(line)
yield {'input_1': x1, 'input_2': x2, 'output': y}
f.close()
graph.fit_generator(generate_arrays_from_file('/my_file.txt'),
samples_per_epoch=10000, nb_epoch=10)
```
'''
max_queue_size = 10 # maximum number of batches in queue
wait_time = 0.05 # in seconds
epoch = 0
do_validation = bool(validation_data)
out_labels = ['loss']
metrics = ['loss', 'val_loss']
if not class_weight:
class_weight = {}
# prepare callbacks
history = cbks.History()
if verbose:
callbacks = [history, cbks.BaseLogger()] + callbacks
else:
callbacks = [history] + callbacks
callbacks = cbks.CallbackList(callbacks)
callbacks._set_model(self)
callbacks._set_params({
'nb_epoch': nb_epoch,
'nb_sample': samples_per_epoch,
'verbose': verbose,
'do_validation': do_validation,
'metrics': metrics,
})
callbacks.on_train_begin()
# util function to validate the batches produced
# by the generator
def input_validation(generator_output):
if type(generator_output) in [list, tuple]:
if len(generator_output) == 2:
data, sample_weight = generator_output
else:
_stop.set()
raise Exception('The generator output tuple must have '
'2 dictionary elements: '
'(data, sample_weight).')
elif type(generator_output) == dict:
data = generator_output
sample_weight = {}
else:
_stop.set()
raise Exception('The generator output must be '
'a data dictionary or a tuple '
'(data, sample_weight).')
assert type(data) == dict
assert type(sample_weight) == dict
if len(set([len(data[name]) for name in data.keys()] +
[len(sample_weight[name]) for name in sample_weight.keys()])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return data, sample_weight
# start generator thread storing batches into a queue
generator_queue = queue.Queue()
_stop = threading.Event()
def generator_task():
i = 0
while not _stop.is_set():
try:
if generator_queue.qsize() < max_queue_size:
generator_output = next(generator)
generator_queue.put(generator_output)
i += 1
else:
time.sleep(wait_time)
except:
_stop.set()
return
generator_threads = [threading.Thread(target=generator_task) for _ in range(nb_worker)]
for thread in generator_threads:
thread.start()
self.stop_training = False
while epoch < nb_epoch:
callbacks.on_epoch_begin(epoch)
samples_seen = 0
batch_index = 0
while samples_seen < samples_per_epoch:
while not _stop.is_set():
if not generator_queue.empty():
generator_output = generator_queue.get()
break
else:
time.sleep(wait_time)
data, sample_weight = input_validation(generator_output)
batch_logs = {}
batch_size = len(data[list(data.keys())[0]])
batch_logs['batch'] = batch_index
batch_logs['size'] = batch_size
callbacks.on_batch_begin(batch_index, batch_logs)
outs = self.train_on_batch(data,
sample_weight=sample_weight,
class_weight=class_weight)
if type(outs) != list:
outs = [outs]
for l, o in zip(out_labels, outs):
batch_logs[l] = o
callbacks.on_batch_end(batch_index, batch_logs)
# construct epoch logs
epoch_logs = {}
batch_index += 1
samples_seen += batch_size
if samples_seen >= samples_per_epoch: # epoch finished
if do_validation:
if hasattr(validation_data, 'next'):
# assumed to be generator
# TODO: call self.evaluate_generator()
_stop.set()
raise NotImplementedError()
else:
# input validation
data, sample_weight = input_validation(validation_data)
val_outs = self.evaluate(data,
sample_weight=sample_weight,
verbose=0)
if type(val_outs) != list:
val_outs = [val_outs]
# same labels assumed
for l, o in zip(out_labels, val_outs):
epoch_logs['val_' + l] = o
callbacks.on_epoch_end(epoch, epoch_logs)
epoch += 1
if self.stop_training:
break
_stop.set()
callbacks.on_train_end()
return history
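A minimal sketch (not part of this patch) of the mutex protection mentioned in the docstring above: when `nb_worker > 1`, a thread-unsafe generator can be wrapped so that calls to `next()` are serialized. The wrapper class and the reuse of `generate_arrays_from_file` from the docstring example are illustrative only.

```python
import threading

class ThreadSafeIterator(object):
    '''Serializes calls to next() on a wrapped generator with a lock.'''
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

    next = __next__  # Python 2 compatibility

# hypothetical usage with the generator from the docstring example:
# graph.fit_generator(ThreadSafeIterator(generate_arrays_from_file('/my_file.txt')),
#                     samples_per_epoch=10000, nb_epoch=10, nb_worker=4)
```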

@ -35,7 +35,7 @@ def hinge(y_true, y_pred):
def categorical_crossentropy(y_true, y_pred):
'''Expects a binary class matrix instead of a vector of scalar classes
'''Expects a binary class matrix instead of a vector of scalar classes.
'''
return K.mean(K.categorical_crossentropy(y_pred, y_true), axis=-1)
@ -44,15 +44,25 @@ def binary_crossentropy(y_true, y_pred):
return K.mean(K.binary_crossentropy(y_pred, y_true), axis=-1)
def poisson_loss(y_true, y_pred):
def poisson(y_true, y_pred):
return K.mean(y_pred - y_true * K.log(y_pred + K.epsilon()), axis=-1)
def cosine_proximity(y_true, y_pred):
assert K.ndim(y_true) == 2
assert K.ndim(y_pred) == 2
y_true = K.l2_normalize(y_true, axis=1)
y_pred = K.l2_normalize(y_pred, axis=1)
return -K.mean(y_true * y_pred, axis=1)
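An illustrative numpy mirror (not part of this patch) of the new `cosine_proximity` objective: after L2 normalization the loss is minus the mean of the element-wise product, so aligned rows reach the minimum (-cos/ndim) and orthogonal rows give 0.

```python
import numpy as np

def cosine_proximity_np(y_true, y_pred):
    # normalize each row, then take minus the mean of the element-wise product
    y_true = y_true / np.linalg.norm(y_true, axis=1, keepdims=True)
    y_pred = y_pred / np.linalg.norm(y_pred, axis=1, keepdims=True)
    return -np.mean(y_true * y_pred, axis=1)

a = np.array([[1., 0.], [0., 2.]])
b = np.array([[0., 3.], [5., 0.]])
print(cosine_proximity_np(a, a))  # aligned rows: -0.5 each (-cos/ndim with ndim=2)
print(cosine_proximity_np(a, b))  # orthogonal rows: 0.
```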
# aliases
mse = MSE = mean_squared_error
rmse = RMSE = root_mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
cosine = cosine_proximity
from .utils.generic_utils import get_from_module
def get(identifier):

@ -275,12 +275,66 @@ class Adam(Optimizer):
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon}
class Adamax(Optimizer):
'''Adamax optimizer from Adam paper's Section 7. It is a variant
of Adam based on the infinity norm.
Default parameters follow those provided in the paper.
# Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor.
# References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
*args, **kwargs):
super(Adamax, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [(self.iterations, self.iterations+1.)]
t = self.iterations + 1
lr_t = self.lr / (1 - K.pow(self.beta_1, t))
for p, g, c in zip(params, grads, constraints):
# zero init of 1st moment
m = K.variable(np.zeros(K.get_value(p).shape))
# zero init of exponentially weighted infinity norm
u = K.variable(np.zeros(K.get_value(p).shape))
m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
u_t = K.maximum(self.beta_2 * u, K.abs(g))
p_t = p - lr_t * m_t / (u_t + self.epsilon)
self.updates.append((m, m_t))
self.updates.append((u, u_t))
self.updates.append((p, c(p_t))) # apply constraints
return self.updates
def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"beta_1": float(K.get_value(self.beta_1)),
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon}
# aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax
def get(identifier, kwargs=None):

@ -6,7 +6,7 @@ from six.moves import range
def pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.):
"""
Pad each sequence to the same length:
Pad each sequence to the same length:
the length of the longest sequence.
If maxlen is provided, any sequence longer
@ -15,6 +15,19 @@ def pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncati
Supports post-padding and pre-padding (default).
Parameters:
-----------
sequences: list of lists where each element is a sequence
maxlen: int, maximum length
dtype: type to cast the resulting sequence.
padding: 'pre' or 'post', pad either before or after each sequence.
truncating: 'pre' or 'post', remove values from sequences larger than
maxlen, either at the beginning or at the end of the sequence.
value: float, value used to pad the sequences.
Returns:
x: numpy array with dimensions (number_of_sequences, maxlen)
"""
lengths = [len(s) for s in sequences]
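A short illustration (not part of this patch) of the documented behaviour, with pre-padding/truncating by default and post-padding on request:

```python
from keras.preprocessing.sequence import pad_sequences

seqs = [[1], [1, 2], [1, 2, 3, 4]]
print(pad_sequences(seqs, maxlen=3))
# [[0 0 1]
#  [0 1 2]
#  [2 3 4]]
print(pad_sequences(seqs, maxlen=3, padding='post'))
# [[1 0 0]
#  [1 2 0]
#  [2 3 4]]
```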
@ -47,39 +60,53 @@ def make_sampling_table(size, sampling_factor=1e-5):
This generates an array where the ith element
is the probability that a word of rank i would be sampled,
according to the sampling distribution used in word2vec.
The word2vec formula is:
p(word) = min(1, sqrt(word.frequency/sampling_factor) / (word.frequency/sampling_factor))
We assume that the word frequencies follow Zipf's law (s=1) to derive
We assume that the word frequencies follow Zipf's law (s=1) to derive
a numerical approximation of frequency(rank):
frequency(rank) ~ 1/(rank * (log(rank) + gamma) + 1/2 - 1/(12*rank))
where gamma is the Euler-Mascheroni constant.
Parameters:
-----------
size: int, number of possible words to sample.
'''
gamma = 0.577
rank = np.array(list(range(size)))
rank[0] = 1
inv_fq = rank * (np.log(rank) + gamma) + 0.5 - 1./(12.*rank)
f = sampling_factor * inv_fq
return np.minimum(1., f / np.sqrt(f))
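A quick sanity check (not part of this patch) of the Zipf-based approximation above: rarer (higher-rank) words get a larger sampling probability.

```python
from keras.preprocessing.sequence import make_sampling_table

table = make_sampling_table(10)
assert table[1] <= table[5] <= table[9]  # sampling probability grows with rank
```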
def skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None):
'''
Take a sequence (list of indexes of words),
def skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None):
'''
Take a sequence (list of indexes of words),
returns couples of [word_index, other_word_index] and labels (1s or 0s),
where label = 1 if 'other_word' belongs to the context of 'word',
and label = 0 if 'other_word' is randomly sampled.
@param vocabulary_size: int. maximum possible word index + 1
@param window_size: int. actually half-window. The window of a word wi will be [i-window_size, i+window_size+1]
@param negative_samples: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc.
@param categorical: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]),
Parameters:
-----------
vocabulary_size: int. maximum possible word index + 1
window_size: int. actually half-window. The window of a word wi will be [i-window_size, i+window_size+1]
negative_samples: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc.
categorical: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]),
if True labels will be categorical eg. [[1,0],[0,1],[0,1] .. ]
Note: by convention, index 0 in the vocabulary is a non-word and will be skipped.
Returns:
--------
couples, labels: where `couples` are int pairs and
`labels` are either 0 or 1.
Notes:
------
By convention, index 0 in the vocabulary is a non-word and will be skipped.
'''
couples = []
labels = []
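A usage sketch (not part of this patch) with a toy sequence; `vocab_size` and the index values are illustrative only:

```python
from keras.preprocessing.sequence import skipgrams

vocab_size = 5
sequence = [1, 2, 3, 4, 1, 2]  # index 0 is reserved and would be skipped
couples, labels = skipgrams(sequence, vocab_size,
                            window_size=2, negative_samples=1.)
# `couples` is a list of [word, other_word] pairs,
# `labels` the matching 1 (context) / 0 (negative sample) targets.
```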

@ -39,7 +39,30 @@ def one_hot(text, n, filters=base_filter(), lower=True, split=" "):
class Tokenizer(object):
def __init__(self, nb_words=None, filters=base_filter(), lower=True, split=" "):
def __init__(self, nb_words=None, filters=base_filter(),
lower=True, split=' '):
'''The class allows one to vectorize a text corpus, by turning each
text into either a sequence of integers (each integer being the index
of a token in a dictionary) or into a vector where the coefficient
for each token could be binary, based on word count, based on tf-idf...
# Arguments
nb_words: the maximum number of words to keep, based
on word frequency. Only the most common `nb_words` words will
be kept.
filters: a string where each element is a character that will be
filtered from the texts. The default is all punctuation, plus
tabs and line breaks, minus the `'` character.
lower: boolean. Whether to convert the texts to lowercase.
split: character or string to use for token splitting.
By default, all punctuation is removed, turning the texts into
space-separated sequences of words
(words may include the `'` character). These sequences are then
split into lists of tokens. They will then be indexed or vectorized.
`0` is a reserved index that won't be assigned to any word.
'''
self.word_counts = {}
self.word_docs = {}
self.filters = filters
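A usage sketch (not part of this patch) for the documented workflow: fit the `Tokenizer` on a corpus, then turn texts into integer sequences or a document-term matrix. The example output values are illustrative.

```python
from keras.preprocessing.text import Tokenizer

texts = ['The cat sat on the mat.', 'The dog sat on the log.']
tokenizer = Tokenizer(nb_words=10)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)        # e.g. [[1, 3, 2, 4, 1, 5], ...]
matrix = tokenizer.texts_to_matrix(texts, mode='binary')
```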
@ -51,7 +74,10 @@ class Tokenizer(object):
def fit_on_texts(self, texts):
'''
required before using texts_to_sequences or texts_to_matrix
@param texts: can be a list or a generator (for memory-efficiency)
# Arguments
texts: can be a list of strings,
or a generator of strings (for memory-efficiency)
'''
self.document_count = 0
for text in texts:
@ -141,12 +167,12 @@ class Tokenizer(object):
if self.word_index:
nb_words = len(self.word_index) + 1
else:
raise Exception("Specify a dimension (nb_words argument), or fit on some text data first")
raise Exception("Specify a dimension (nb_words argument), or fit on some text data first.")
else:
nb_words = self.nb_words
if mode == "tfidf" and not self.document_count:
raise Exception("Fit the Tokenizer on some data before using tfidf mode")
raise Exception("Fit the Tokenizer on some data before using tfidf mode.")
X = np.zeros((len(sequences), nb_words))
for i, seq in enumerate(sequences):

@ -5,11 +5,13 @@ import sys
import six
def get_from_module(identifier, module_params, module_name, instantiate=False, kwargs=None):
def get_from_module(identifier, module_params, module_name,
instantiate=False, kwargs=None):
if isinstance(identifier, six.string_types):
res = module_params.get(identifier)
if not res:
raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
raise Exception('Invalid ' + str(module_name) + ': ' +
str(identifier))
if instantiate and not kwargs:
return res()
elif instantiate and kwargs:
@ -23,28 +25,6 @@ def make_tuple(*args):
return args
def printv(v, prefix=''):
if type(v) == dict:
if 'name' in v:
print(prefix + '#' + v['name'])
del v['name']
prefix += '...'
for nk, nv in v.items():
if type(nv) in [dict, list]:
print(prefix + nk + ':')
printv(nv, prefix)
else:
print(prefix + nk + ':' + str(nv))
elif type(v) == list:
prefix += '...'
for i, nv in enumerate(v):
print(prefix + '#' + str(i))
printv(nv, prefix)
else:
prefix += '...'
print(prefix + str(v))
class Progbar(object):
def __init__(self, target, width=30, verbose=1):
'''
@ -110,7 +90,7 @@ class Progbar(object):
info += ' - %s:' % k
if type(self.sum_values[k]) is list:
avg = self.sum_values[k][0] / max(1, self.sum_values[k][1])
if avg > 1e-3:
if abs(avg) > 1e-3:
info += ' %.4f' % avg
else:
info += ' %.4e' % avg

@ -26,12 +26,14 @@ def container_from_config(original_layer_dict, custom_objects={}):
if name == 'Merge':
mode = layer_dict.get('mode')
concat_axis = layer_dict.get('concat_axis')
dot_axes = layer_dict.get('dot_axes')
layers = layer_dict.get('layers')
layer_list = []
for layer in layers:
init_layer = container_from_config(layer)
layer_list.append(init_layer)
merge_layer = Merge(layer_list, mode)
merge_layer = Merge(layer_list, mode, concat_axis, dot_axes)
return merge_layer
elif name == 'Sequential':
@ -69,10 +71,11 @@ def container_from_config(original_layer_dict, custom_objects={}):
kwargs[kwarg] = layer_dict[kwarg]
return AutoEncoder(**kwargs)
else:
else: # this is a non-topological layer (e.g. Dense, etc.)
layer_dict.pop('name')
for k, v in layer_dict.items():
# a dictionary argument may be a regularizer or constraint
if isinstance(v, dict):
vname = v.pop('name')
if vname in [x for x, y in inspect.getmembers(constraints, predicate=inspect.isclass)]:
@ -83,6 +86,9 @@ def container_from_config(original_layer_dict, custom_objects={}):
# not a regularizer of constraint, don't touch it
v['name'] = vname
# the "name" keyword argument of layers is saved as "custom_name"
if 'custom_name' in layer_dict:
layer_dict['name'] = layer_dict.pop('custom_name')
base_layer = get_layer(name, layer_dict)
return base_layer

@ -7,7 +7,7 @@ from six.moves import zip
def to_categorical(y, nb_classes=None):
'''Convert class vector (integers from 0 to nb_classes)
to binary class matrix, for use with categorical_crossentropy
to binary class matrix, for use with categorical_crossentropy.
'''
y = np.asarray(y, dtype='int32')
if not nb_classes:
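A one-line illustration (not part of this patch) of the conversion:

```python
from keras.utils.np_utils import to_categorical

print(to_categorical([0, 2, 1], nb_classes=3))
# [[ 1.  0.  0.]
#  [ 0.  0.  1.]
#  [ 0.  1.  0.]]
```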

@ -1,10 +1,17 @@
import pydot
# old pydot will not work with python3, must use one
# that works with python3 such as pydot2 or pydot
import itertools
from keras.layers.containers import Graph, Sequential
from keras.layers.core import Merge
try:
# pydot-ng is a fork of pydot that is better maintained
import pydot_ng as pydot
except ImportError:
# fall back on pydot if necessary
import pydot
if not pydot.find_graphviz():
raise RuntimeError("Failed to import pydot. You must install pydot"
" and graphviz for `pydotprint` to work.")
def layer_typename(layer):
return type(layer).__module__ + "." + type(layer).__name__
@ -120,7 +127,7 @@ class ModelToDot(object):
self.g = pydot.Dot()
self.g.set('rankdir', 'TB')
self.g.set('concentrate', True)
self.g.set_node_defaults(shape='record', fontname="Fira Mono")
self.g.set_node_defaults(shape='record')
if hasattr(model, 'outputs'):
# Graph
@ -136,8 +143,8 @@ class ModelToDot(object):
def to_graph(model, **kwargs):
"""
`recursive` controls wether we recursively explore container layers
`show_shape` controls wether the shape is shown in the graph
`recursive` controls whether we recursively explore container layers
`show_shape` controls whether the shape is shown in the graph
"""
return ModelToDot()(model, **kwargs)
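A hypothetical usage sketch (not part of this patch) of the helper defined above, rendering a small model to PNG (requires graphviz and pydot or pydot-ng):

```python
from keras.models import Sequential
from keras.layers.core import Dense
from keras.utils.visualize_util import to_graph

model = Sequential()
model.add(Dense(8, input_dim=4))
model.add(Dense(1))
dot = to_graph(model, recursive=True, show_shape=True)
dot.write_png('model.png')
```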

@ -3,12 +3,12 @@ from setuptools import find_packages
setup(name='Keras',
version='0.3.0',
version='0.3.1',
description='Theano-based Deep Learning library',
author='Francois Chollet',
author_email='francois.chollet@gmail.com',
url='https://github.com/fchollet/keras',
download_url='https://github.com/fchollet/keras/tarball/0.3.0',
download_url='https://github.com/fchollet/keras/tarball/0.3.1',
license='MIT',
install_requires=['theano', 'pyyaml', 'six'],
extras_require={

@ -0,0 +1,46 @@
from __future__ import print_function
import numpy as np
import pytest
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense, Flatten, Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils.np_utils import to_categorical
def test_image_classification():
'''
Classify random 16x16 color images into several classes using logistic regression
with a convolutional hidden layer.
'''
np.random.seed(1337)
input_shape = (3, 16, 16)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=input_shape,
classification=True,
nb_class=4)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# convolution kernel size
nb_conv = 3
# size of pooling area for max pooling
nb_pool = 2
model = Sequential([
Convolution2D(nb_filter=8, nb_row=nb_conv, nb_col=nb_conv, input_shape=input_shape),
MaxPooling2D(pool_size=(nb_pool, nb_pool)),
Flatten(),
Activation('relu'),
Dense(y_test.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')
history = model.fit(X_train, y_train, nb_epoch=10, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.85)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,131 @@
from __future__ import print_function
import numpy as np
import pytest
import string
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense, Dropout, Dense
from keras.layers.recurrent import GRU, LSTM
from keras.utils.np_utils import to_categorical
def test_temporal_classification():
'''
Classify temporal sequences of float numbers of length 3 into 2 classes using
a single layer of GRU units and softmax applied to the last activations of the units.
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2]),
activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
history = model.fit(X_train, y_train, nb_epoch=5, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
def test_temporal_regression():
'''
Predict float numbers (regression) based on sequences of float numbers of length 3 using
a single layer of GRU units.
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(2,),
classification=False)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='adam')
history = model.fit(X_train, y_train, nb_epoch=5, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.75)
def test_sequence_to_sequence():
'''
Apply the same Dense layer to each element of the time dimension of the input
and make predictions for the output sequence elements.
This does not make use of the temporal structure of the sequence
(see TimeDistributedDense for more details)
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(3, 5),
classification=False)
model = Sequential()
model.add(TimeDistributedDense(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_stacked_lstm_char_prediction():
'''
Learn the alphabetical char sequence with a stacked LSTM.
Predict the whole alphabet based on the first two letters ('ab' -> 'ab...z').
See a non-toy example in examples/lstm_text_generation.py.
'''
np.random.seed(1336)
# generate alphabet: http://stackoverflow.com/questions/16060899/alphabet-range-python
alphabet = string.ascii_lowercase
number_of_chars = len(alphabet)
# generate char sequences of length 'sequence_length' out of alphabet and store the next char as label (e.g. 'ab'->'c')
sequence_length = 2
sentences = [alphabet[i: i + sequence_length] for i in range(len(alphabet) - sequence_length)]
next_chars = [alphabet[i + sequence_length] for i in range(len(alphabet) - sequence_length)]
# Transform sequences and labels into 'one-hot' encoding
X = np.zeros((len(sentences), sequence_length, number_of_chars), dtype=np.bool)
y = np.zeros((len(sentences), number_of_chars), dtype=np.bool)
for i, sentence in enumerate(sentences):
for t, char in enumerate(sentence):
X[i, t, ord(char)-ord('a')] = 1
y[i, ord(next_chars[i])-ord('a')] = 1
# learn the alphabet with stacked LSTM
model = Sequential([
LSTM(16, return_sequences=True, input_shape=(sequence_length, number_of_chars)),
LSTM(16, return_sequences=False),
Dense(number_of_chars, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, y, batch_size=1, nb_epoch=60, verbose=1)
# prime the model with 'ab' sequence and let it generate the learned alphabet
sentence = alphabet[:sequence_length]
generated = sentence
for iteration in range(number_of_chars-sequence_length):
x = np.zeros((1, sequence_length, number_of_chars))
for t, char in enumerate(sentence):
x[0, t, ord(char) - ord('a')] = 1.
preds = model.predict(x, verbose=0)[0]
next_char = chr(np.argmax(preds) + ord('a'))
generated += next_char
sentence = sentence[1:] + next_char
# check that it did generate the alphabet correctly
assert(generated == alphabet)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,63 @@
from __future__ import print_function
import numpy as np
import pytest
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense
from keras.utils.np_utils import to_categorical
def test_vector_classification():
'''
Classify random float vectors into 2 classes with logistic regression
using a 2-layer neural network with ReLU hidden units.
'''
np.random.seed(1337)
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='relu'),
Dense(y_train.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=15, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.8)
def test_vector_regression():
'''
Perform float data prediction (regression) using a 2-layer MLP
with a tanh hidden layer and a linear output layer.
'''
np.random.seed(1337)
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
output_shape=(2,),
classification=False)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='tanh'),
Dense(y_train.shape[-1])
])
model.compile(loss='hinge', optimizer='adagrad')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert (history.history['val_loss'][-1] < 0.9)
if __name__ == '__main__':
pytest.main([__file__])

@ -64,6 +64,26 @@ class TestBackend(object):
check_single_tensor_operation('expand_dims', (4, 3, 2), dim=1)
check_single_tensor_operation('squeeze', (4, 3, 1), axis=2)
def test_repeat_elements(self):
reps = 3
for ndims in [1, 2, 3]:
shape = np.arange(2, 2+ndims)
arr = np.arange(np.prod(shape)).reshape(shape)
arr_th = KTH.variable(arr)
arr_tf = KTF.variable(arr)
for rep_axis in range(ndims):
np_rep = np.repeat(arr, reps, axis=rep_axis)
th_rep = KTH.eval(
KTH.repeat_elements(arr_th, reps, axis=rep_axis))
tf_rep = KTF.eval(
KTF.repeat_elements(arr_tf, reps, axis=rep_axis))
assert th_rep.shape == np_rep.shape
assert tf_rep.shape == np_rep.shape
assert_allclose(np_rep, th_rep, atol=1e-05)
assert_allclose(np_rep, tf_rep, atol=1e-05)
def test_value_manipulation(self):
val = np.random.random((4, 2))
xth = KTH.variable(val)
@ -261,9 +281,11 @@ class TestBackend(object):
check_two_tensor_operation('binary_crossentropy', (4, 2), (4, 2), from_logits=True)
check_two_tensor_operation('categorical_crossentropy', (4, 2), (4, 2), from_logits=True)
check_two_tensor_operation('binary_crossentropy', (4, 2), (4, 2), from_logits=False)
check_two_tensor_operation('categorical_crossentropy', (4, 2), (4, 2), from_logits=False)
check_single_tensor_operation('l2_normalize', (4, 3), axis=-1)
check_single_tensor_operation('l2_normalize', (4, 3), axis=1)
# def test_conv2d(self):
# '''conv2d works "properly" with Theano and TF but outputs different
# values in each case. Cause unclear (input / kernel shape format?)

@ -11,6 +11,7 @@ def test_cifar():
def test_reuters():
(X_train, y_train), (X_test, y_test) = reuters.load_data()
(X_train, y_train), (X_test, y_test) = reuters.load_data(maxlen=10)
def test_mnist():
@ -19,6 +20,7 @@ def test_mnist():
def test_imdb():
(X_train, y_train), (X_test, y_test) = imdb.load_data()
(X_train, y_train), (X_test, y_test) = imdb.load_data(maxlen=40)
if __name__ == '__main__':

@ -16,10 +16,10 @@ def test_layer_call():
W = np.asarray(K.eval(layer.W)).astype(K.floatx())
X = K.placeholder(ndim=2)
Y = layer(X)
F = K.function([X], [Y])
f = K.function([X], [Y])
x = np.ones((nb_samples, input_dim)).astype(K.floatx())
y = F([x])[0].astype(K.floatx())
y = f([x])[0].astype(K.floatx())
t = np.dot(x, W).astype(K.floatx())
assert_allclose(t, y, rtol=.2)
@ -31,16 +31,30 @@ def test_sequential_call():
model.add(Dense(output_dim=output_dim, input_dim=input_dim))
model.compile('sgd', 'mse')
# test flat model
X = K.placeholder(ndim=2)
Y = model(X)
F = K.function([X], [Y])
f = K.function([X], [Y])
x = np.ones((nb_samples, input_dim)).astype(K.floatx())
y1 = F([x])[0].astype(K.floatx())
y1 = f([x])[0].astype(K.floatx())
y2 = model.predict(x)
# results of __call__ should match model.predict
assert_allclose(y1, y2)
# test nested model
model2 = Sequential()
model2.add(model)
model2.compile('sgd', 'mse')
Y2 = model2(X)
f = K.function([X], [Y2])
y1 = f([x])[0].astype(K.floatx())
y2 = model2.predict(x)
# results of __call__ should match model.predict
assert_allclose(y1, y2)
if __name__ == '__main__':
pytest.main([__file__])

@ -188,17 +188,44 @@ def test_upsampling_2d():
input_nb_row = 11
input_nb_col = 12
input = np.ones((nb_samples, stack_size, input_nb_row, input_nb_col))
for length_row in [2, 3, 9]:
for length_col in [2, 3, 9]:
layer = convolutional.UpSampling2D(size=(length_row, length_col))
layer.input = K.variable(input)
for train in [True, False]:
out = K.eval(layer.get_output(train))
assert out.shape[2] == length_row * input_nb_row
assert out.shape[3] == length_col * input_nb_col
layer.get_config()
for dim_ordering in ['th', 'tf']:
if dim_ordering == 'th':
input = np.random.rand(nb_samples, stack_size, input_nb_row,
input_nb_col)
else: # tf
input = np.random.rand(nb_samples, input_nb_row, input_nb_col,
stack_size)
for length_row in [2, 3, 9]:
for length_col in [2, 3, 9]:
layer = convolutional.UpSampling2D(
size=(length_row, length_col),
input_shape=input.shape[1:],
dim_ordering=dim_ordering)
layer.input = K.variable(input)
for train in [True, False]:
out = K.eval(layer.get_output(train))
if dim_ordering == 'th':
assert out.shape[2] == length_row * input_nb_row
assert out.shape[3] == length_col * input_nb_col
else: # tf
assert out.shape[1] == length_row * input_nb_row
assert out.shape[2] == length_col * input_nb_col
# compare with numpy
if dim_ordering == 'th':
expected_out = np.repeat(input, length_row, axis=2)
expected_out = np.repeat(expected_out, length_col,
axis=3)
else: # tf
expected_out = np.repeat(input, length_row, axis=1)
expected_out = np.repeat(expected_out, length_col,
axis=2)
assert_allclose(out, expected_out)
layer.get_config()
if __name__ == '__main__':

@ -1,5 +1,6 @@
import pytest
import numpy as np
from keras.models import Sequential
from numpy.testing import assert_allclose
from keras import backend as K
@ -100,6 +101,11 @@ def test_time_dist_merge():
_runner(layer)
def test_highway():
layer = core.Highway(input_shape=(10,))
_runner(layer)
def test_autoencoder():
layer_1 = core.Layer()
layer_2 = core.Layer()
@ -108,11 +114,37 @@ def test_autoencoder():
_runner(layer)
def test_autoencoder_second_layer():
# regression test for issue #1275
encoder = core.Dense(input_dim=10, output_dim=2)
decoder = core.Dense(input_dim=2, output_dim=10)
model = Sequential()
model.add(core.Dense(input_dim=20, output_dim=10))
model.add(core.AutoEncoder(encoder=encoder, decoder=decoder,
output_reconstruction=False))
model.compile(loss='mse', optimizer='sgd')
def test_maxout_dense():
layer = core.MaxoutDense(10, 10, input_shape=(20,))
_runner(layer)
def test_naming():
layer = core.Dense(2, input_dim=2)
assert layer.name == 'dense'
model = Sequential()
model.add(core.Dense(2, input_dim=2, name='my_dense'))
model.add(core.Dense(2, name='my_dense'))
assert model.layers[0].name == 'my_dense'
assert model.layers[1].name == 'my_dense'
model.compile(optimizer='rmsprop', loss='mse')
model.train_on_batch(np.random.random((2, 2)), np.random.random((2, 2)))
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_sequences():
@ -175,6 +207,29 @@ def _runner(layer):
layer.trainable = True
layer.trainable = False
def test_siamese_all():
right_input_layer = core.Dense(7, input_dim=3)
left_input_layer = core.Dense(7, input_dim=3)
shared_layer = core.Dense(5, input_dim=7)
for mode in ['sum', 'mul', 'ave', 'concat']:
siamese_layer = core.Siamese(shared_layer, [left_input_layer, right_input_layer], merge_mode=mode)
siamese_layer.output_shape
siamese_layer.get_output()
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_siamese_theano_only():
right_input_layer = core.Dense(7, input_dim=3)
left_input_layer = core.Dense(7, input_dim=3)
shared_layer = core.Dense(5, input_dim=7)
for mode in ['dot', 'cos']:
siamese_layer = core.Siamese(shared_layer, [left_input_layer, right_input_layer], merge_mode=mode,
dot_axes=([1], [1]))
siamese_layer.output_shape
siamese_layer.get_output()
if __name__ == '__main__':
pytest.main([__file__])

@ -24,7 +24,7 @@ def test_unitnorm_constraint():
class_mode='binary')
lookup.train_on_batch(X1, np.array([[1], [0]], dtype='int32'))
norm = np.linalg.norm(K.get_value(lookup.params[0]), axis=1)
assert_allclose(norm, np.ones_like(norm).astype('float32'))
assert_allclose(norm, np.ones_like(norm).astype('float32'), rtol=1e-05)
if __name__ == '__main__':

@ -0,0 +1,41 @@
import pytest
import numpy as np
from keras import backend as K
from keras.layers import core
from keras.layers import noise
input_shape = (10, 10)
batch_input_shape = (10, 10, 10)
def test_GaussianNoise():
layer = noise.GaussianNoise(sigma=1., input_shape=input_shape)
_runner(layer)
def test_GaussianDropout():
layer = noise.GaussianDropout(p=0.2, input_shape=input_shape)
_runner(layer)
def _runner(layer):
assert isinstance(layer, core.Layer)
layer.build()
conf = layer.get_config()
assert (type(conf) == dict)
param = layer.get_params()
# Typically a list or a tuple, but may be any iterable
assert hasattr(param, '__iter__')
layer.input = K.variable(np.random.random(batch_input_shape))
output = layer.get_output(train=False)
output_np = K.eval(output)
assert output_np.shape == batch_input_shape
output = layer.get_output(train=True)
output_np = K.eval(output)
assert output_np.shape == batch_input_shape
if __name__ == '__main__':
pytest.main([__file__])

@ -1,9 +1,10 @@
import pytest
import numpy as np
from keras.layers.core import Dense, Activation
from numpy.testing import assert_allclose
from keras.layers import normalization
from keras.models import Sequential
from keras.models import Sequential, Graph
from keras import backend as K
@ -83,6 +84,9 @@ def test_batchnorm_config():
norm = normalization.BatchNormalization(input_shape=(10, 10), mode=1,
epsilon=0.1, momentum=0.9)
conf = norm.get_config()
del conf['cache_enabled']
del conf['trainable']
del conf['custom_name']
conf_target = {"input_shape": (10, 10),
"name": normalization.BatchNormalization.__name__,
"epsilon": 0.1, "mode": 1, "momentum": 0.9}
@ -97,5 +101,27 @@ def test_batchnorm_save_weights():
norm.set_weights(weights)
def test_batchnorm_nested():
# regression test for issue #1386
g = Graph()
g.add_input("input", input_shape=[20])
g.add_node(Dense(10), "dense", "input")
g.add_node(normalization.BatchNormalization(), "bn", "dense")
g.add_node(Activation('relu'), "activ", "bn")
g.add_output("output", "activ")
g2 = Graph()
g2.add_input("input", input_shape=[10])
g2.add_node(Dense(15), "dense", "input")
g2.add_node(normalization.BatchNormalization(), "bn", "dense")
g2.add_node(Activation('relu'), "activ", "bn")
g2.add_output("output", "activ")
model = Sequential()
model.add(g)
model.add(g2)
model.compile(loss="mse", optimizer="adadelta")
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,53 @@
import numpy as np
from numpy.testing import assert_allclose
import pytest
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.sequence import make_sampling_table
from keras.preprocessing.sequence import skipgrams
def test_pad_sequences():
a = [[1], [1, 2], [1, 2, 3]]
# test padding
b = pad_sequences(a, maxlen=3, padding='pre')
assert_allclose(b, [[0, 0, 1], [0, 1, 2], [1, 2, 3]])
b = pad_sequences(a, maxlen=3, padding='post')
assert_allclose(b, [[1, 0, 0], [1, 2, 0], [1, 2, 3]])
# test truncating
b = pad_sequences(a, maxlen=2, truncating='pre')
assert_allclose(b, [[0, 1], [1, 2], [2, 3]])
b = pad_sequences(a, maxlen=2, truncating='post')
assert_allclose(b, [[0, 1], [1, 2], [1, 2]])
# test value
b = pad_sequences(a, maxlen=3, value=1)
assert_allclose(b, [[1, 1, 1], [1, 1, 2], [1, 2, 3]])
def test_make_sampling_table():
a = make_sampling_table(3)
assert_allclose(a, np.asarray([0.00315225, 0.00315225, 0.00547597]),
rtol=.1)
def test_skipgrams():
# test with no window size and binary labels
couples, labels = skipgrams(np.arange(3), vocabulary_size=3)
for couple in couples:
assert couple[0] in [0, 1, 2] and couple[1] in [0, 1, 2]
# test window size and categorical labels
couples, labels = skipgrams(np.arange(5), vocabulary_size=5, window_size=1,
categorical=True)
for couple in couples:
assert couple[0] - couple[1] <= 3
for l in labels:
assert len(l) == 2
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,34 @@
from keras.preprocessing.text import Tokenizer, one_hot
import pytest
import numpy as np
def test_one_hot():
text = 'The cat sat on the mat.'
encoded = one_hot(text, 5)
assert len(encoded) == 6
assert np.max(encoded) <= 4
assert np.min(encoded) >= 0
def test_tokenizer():
texts = ['The cat sat on the mat.',
'The dog sat on the log.',
'Dogs and cats living together.']
tokenizer = Tokenizer(nb_words=10)
tokenizer.fit_on_texts(texts)
sequences = []
for seq in tokenizer.texts_to_sequences_generator(texts):
sequences.append(seq)
assert np.max(np.max(sequences)) < 10
assert np.min(np.min(sequences)) == 1
tokenizer.fit_on_sequences(sequences)
for mode in ['binary', 'count', 'tfidf', 'freq']:
matrix = tokenizer.texts_to_matrix(texts, mode)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,198 @@
import pytest
import os
import sys
import numpy as np
np.random.seed(1337)
from keras import callbacks
from keras.models import Graph, Sequential
from keras.layers.core import Dense
from keras.utils.test_utils import get_test_data
from keras import backend as K
from keras.utils import np_utils
input_dim = 2
nb_hidden = 4
nb_class = 2
batch_size = 5
train_samples = 20
test_samples = 20
def test_ModelCheckpoint():
filepath = 'checkpoint.h5'
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
# case 1
monitor = 'val_loss'
save_best_only = False
mode = 'auto'
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 2
mode = 'min'
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 3
mode = 'max'
monitor = 'val_acc'
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 4
save_best_only = True
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
def test_EarlyStopping():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
mode = 'max'
monitor = 'val_acc'
patience = 0
cbks = [callbacks.EarlyStopping(patience=patience, monitor=monitor, mode=mode)]
history = model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=20)
mode = 'auto'
monitor = 'val_acc'
patience = 2
cbks = [callbacks.EarlyStopping(patience=patience, monitor=monitor, mode=mode)]
history = model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=20)
def test_LearningRateScheduler():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
cbks = [callbacks.LearningRateScheduler(lambda x: 1. / (1. + x))]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=5)
assert (float(K.get_value(model.optimizer.lr)) - 0.2) < K.epsilon()
@pytest.mark.skipif((K._BACKEND != 'tensorflow') or (sys.version_info[0] == 3),
reason="Requires tensorflow backend")
def test_TensorBoard():
import shutil
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF
old_session = KTF._get_session()
filepath = './logs'
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
# case 1 Sequential wo accuracy
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
# case 2 Sequential w accuracy
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
# case 3 Graph
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Graph()
model.add_input(name='X_vars', input_shape=(input_dim, ))
model.add_node(Dense(nb_hidden, activation="sigmoid"),
name='Dense1', input='X_vars')
model.add_node(Dense(nb_class, activation="softmax"),
name='last_dense',
input='Dense1')
model.add_output(name='output', input='last_dense')
model.compile(optimizer='sgd', loss={'output': 'mse'})
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit({'X_vars': X_train, 'output': y_train},
batch_size=batch_size,
validation_data={'X_vars': X_test, 'output': y_test},
callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
KTF._set_session(old_session)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,85 @@
import pytest
import numpy as np
from keras import initializations
from keras import backend as K
SHAPE = (100, 100)
def _runner(init, shape, target_mean=None, target_std=None,
target_max=None, target_min=None):
variable = init(shape)
output = K.get_value(variable)
lim = 1e-2
if target_std is not None:
assert abs(output.std() - target_std) < lim
if target_mean is not None:
assert abs(output.mean() - target_mean) < lim
if target_max is not None:
assert abs(output.max() - target_max) < lim
if target_min is not None:
assert abs(output.min() - target_min) < lim
def test_uniform():
_runner(initializations.uniform, SHAPE, target_mean=0.,
target_max=0.05, target_min=-0.05)
def test_normal():
_runner(initializations.normal, SHAPE, target_mean=0., target_std=0.05)
def test_lecun_uniform():
scale = np.sqrt(3. / SHAPE[0])
_runner(initializations.lecun_uniform, SHAPE,
target_mean=0., target_max=scale, target_min=-scale)
def test_glorot_uniform():
scale = np.sqrt(6. / (SHAPE[0] + SHAPE[1]))
_runner(initializations.glorot_uniform, SHAPE, target_mean=0.,
target_max=scale, target_min=-scale)
def test_glorot_normal():
scale = np.sqrt(2. / (SHAPE[0] + SHAPE[1]))
_runner(initializations.glorot_normal, SHAPE,
target_mean=0., target_std=scale)
def test_he_uniform():
scale = np.sqrt(6. / SHAPE[0])
_runner(initializations.he_uniform, SHAPE, target_mean=0.,
target_max=scale, target_min=-scale)
def test_he_normal():
scale = np.sqrt(2. / SHAPE[0])
_runner(initializations.he_normal, SHAPE,
target_mean=0., target_std=scale)
def test_orthogonal():
_runner(initializations.orthogonal, SHAPE,
target_mean=0.)
def test_identity():
_runner(initializations.identity, SHAPE,
target_mean=1./SHAPE[0], target_max=1.)
def test_zero():
_runner(initializations.zero, SHAPE,
target_mean=0., target_max=0.)
def test_one():
_runner(initializations.one, SHAPE,
target_mean=1., target_max=1.)
if __name__ == '__main__':
pytest.main([__file__])

@ -6,13 +6,13 @@ np.random.seed(1337)
from keras import backend as K
from keras.models import Graph, Sequential, model_from_json, model_from_yaml
from keras.layers.core import Dense, Activation, Merge, Lambda, LambdaMerge
from keras.layers.core import Dense, Activation, Merge, Lambda, LambdaMerge, Siamese, add_shared_layer
from keras.layers import containers
from keras.utils import np_utils
from keras.utils.test_utils import get_test_data
import os
from keras.utils.layer_utils import model_summary
input_dim = 32
nb_hidden = 16
@ -20,24 +20,63 @@ nb_class = 4
batch_size = 32
nb_epoch = 1
train_samples = 2000
test_samples = 500
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=4)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
def _get_test_data():
np.random.seed(1234)
train_samples = 2000
test_samples = 500
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=4)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
return (X_train, y_train), (X_test, y_test)
####################
# SEQUENTIAL TEST #
####################
def test_sequential_fit_generator():
(X_train, y_train), (X_test, y_test) = _get_test_data()
def data_generator(train):
if train:
max_batch_index = len(X_train) // batch_size
else:
max_batch_index = len(X_test) // batch_size
i = 0
while 1:
if train:
yield (X_train[i * batch_size: (i + 1) * batch_size], y_train[i * batch_size: (i + 1) * batch_size])
else:
yield (X_test[i * batch_size: (i + 1) * batch_size], y_test[i * batch_size: (i + 1) * batch_size])
i += 1
i = i % max_batch_index
model = Sequential()
model.add(Dense(nb_hidden, input_shape=(input_dim,)))
model.add(Activation('relu'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=False)
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=True)
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=False, validation_data=(X_test, y_test))
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=True, validation_data=(X_test, y_test))
loss = model.evaluate(X_train, y_train, verbose=0)
assert(loss < 0.9)
def test_sequential():
(X_train, y_train), (X_test, y_test) = _get_test_data()
model = Sequential()
model.add(Dense(nb_hidden, input_shape=(input_dim,)))
model.add(Activation('relu'))
@ -55,8 +94,8 @@ def test_sequential():
model.train_on_batch(X_train[:32], y_train[:32])
loss = model.evaluate(X_train, y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate(X_test, y_test, verbose=0)
assert(loss < 0.8)
model.predict(X_test, verbose=0)
model.predict_classes(X_test, verbose=0)
@ -74,7 +113,7 @@ def test_sequential():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate(X_train, y_train, verbose=0)
nloss = model.evaluate(X_test, y_test, verbose=0)
assert(loss == nloss)
# test json serialization
@ -87,6 +126,7 @@ def test_sequential():
def test_merge_sum():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -108,8 +148,8 @@ def test_merge_sum():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -133,13 +173,15 @@ def test_merge_sum():
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_merge_dot():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(input_dim=input_dim, output_dim=nb_hidden))
left.add(Activation('relu'))
@ -172,6 +214,8 @@ def test_merge_dot():
def test_merge_concat():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -193,8 +237,8 @@ def test_merge_concat():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -221,11 +265,12 @@ def test_merge_concat():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_merge_recursivity():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -256,8 +301,8 @@ def test_merge_recursivity():
model.fit([X_train, X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test, X_test], verbose=0)
@ -269,11 +314,12 @@ def test_merge_recursivity():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate([X_train, X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_merge_overlap():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -293,7 +339,7 @@ def test_merge_overlap():
model.train_on_batch(X_train[:32], y_train[:32])
loss = model.evaluate(X_train, y_train, verbose=0)
loss = model.evaluate(X_test, y_test, verbose=0)
assert(loss < 0.9)
model.predict(X_test, verbose=0)
model.predict_classes(X_test, verbose=0)
@ -305,11 +351,12 @@ def test_merge_overlap():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate(X_train, y_train, verbose=0)
nloss = model.evaluate(X_test, y_test, verbose=0)
assert(loss == nloss)
def test_lambda():
(X_train, y_train), (X_test, y_test) = _get_test_data()
def func(X):
s = X[0]
for i in range(1, len(X)):
@ -344,8 +391,8 @@ def test_lambda():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -370,7 +417,7 @@ def test_lambda():
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
os.remove(fname)
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
@ -388,14 +435,136 @@ def test_sequential_count_params():
model.add(Dense(nb_units))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
assert(n == model.count_params())
model.compile('sgd', 'binary_crossentropy')
assert(n == model.count_params())
def test_siamese_1():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
model = Sequential()
model.add(Siamese(Dense(nb_hidden), [left, right], merge_mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
model.predict_proba([X_test, X_test], verbose=0)
model.get_config(verbose=0)
# test weight saving
fname = 'test_siamese_1.h5'
model.save_weights(fname, overwrite=True)
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
model = Sequential()
model.add(Siamese(Dense(nb_hidden), [left, right], merge_mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.load_weights(fname)
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_siamese_2():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
add_shared_layer(Dense(nb_hidden), [left, right])
left.add(Dense(nb_hidden))
right.add(Dense(nb_hidden))
add_shared_layer(Dense(nb_hidden), [left, right])
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
model.predict_proba([X_test, X_test], verbose=0)
model.get_config(verbose=0)
# test weight saving
fname = 'test_siamese_2.h5'
model.save_weights(fname, overwrite=True)
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
add_shared_layer(Dense(nb_hidden), [left, right])
left.add(Dense(nb_hidden))
right.add(Dense(nb_hidden))
add_shared_layer(Dense(nb_hidden), [left, right])
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.load_weights(fname)
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
###############
# GRAPH TEST #
###############
@ -412,6 +581,35 @@ def test_sequential_count_params():
output_shape=(1,))
def test_graph_fit_generator():
def data_generator_graph(train):
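# infinite generator yielding batches as dicts keyed by the graph's input/output names, as fit_generator expects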
while 1:
if train:
yield {'input1': X_train_graph, 'output1': y_train_graph}
else:
yield {'input1': X_test_graph, 'output1': y_test_graph}
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input1')
graph.add_node(Dense(4), name='dense2', input='input1')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output1',
inputs=['dense2', 'dense3'],
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4)
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4)
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4, validation_data={'input1': X_test_graph, 'output1': y_test_graph})
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4, validation_data={'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph}, verbose=0)
assert(loss < 3.0)
def test_1o_1i():
# test a non-sequential graph with 1 input and 1 output
np.random.seed(1337)
@ -435,7 +633,7 @@ def test_1o_1i():
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph}, verbose=0)
assert(loss < 2.5)
# test validation split
@ -507,6 +705,89 @@ def test_1o_2i():
graph.get_config(verbose=1)
def test_siamese_3():
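# Graph API variant: add_shared_node applies a single Dense(16) to both inputs and sums the results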
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared', inputs=['input1', 'input2'], merge_mode='sum')
graph.add_node(Dense(4), name='dense1', input='shared')
graph.add_node(Dense(4), name='dense2', input='dense1')
graph.add_output(name='output1', input='dense2')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_siamese_4():
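# chain several shared nodes; only the last one merges its branches (sum) before the final Dense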
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared1', inputs=['input1', 'input2'])
graph.add_shared_node(Dense(4), name='shared2', inputs=['shared1'])
graph.add_shared_node(Dense(4), name='shared3', inputs=['shared2'], merge_mode='sum')
graph.add_node(Dense(4), name='dense', input='shared3')
graph.add_output(name='output1', input='dense',
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_siamese_5():
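# shared node with explicit output names: each branch keeps its own output and they are only merged at the graph output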
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared1', inputs=['input1', 'input2'])
graph.add_shared_node(Dense(4), name='shared2', inputs=['shared1'])
graph.add_shared_node(Dense(4), name='shared3', inputs=['shared2'], outputs=['shared_output1', 'shared_output2'])
graph.add_node(Dense(4), name='dense1', input='shared_output1')
graph.add_node(Dense(4), name='dense2', input='shared_output2')
graph.add_output(name='output1', inputs=['dense1', 'dense2'],
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_2o_1i_weights():
# test a non-sequential graph with 1 input and 2 outputs
graph = Graph()

@ -2,7 +2,7 @@ from __future__ import print_function
import pytest
from keras.utils.test_utils import get_test_data
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils.np_utils import to_categorical
@ -32,28 +32,34 @@ def _test_optimizer(optimizer, target=0.9):
history = model.fit(X_train, y_train, nb_epoch=12, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=2)
return history.history['val_acc'][-1] > target
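# the helper now asserts directly: get_config() must return a plain dict and the final val_acc must beat the target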
config = optimizer.get_config()
assert type(config) == dict
assert history.history['val_acc'][-1] > target
def test_sgd():
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
assert(_test_optimizer(sgd))
_test_optimizer(sgd)
def test_rmsprop():
assert(_test_optimizer(RMSprop()))
_test_optimizer(RMSprop())
def test_adagrad():
assert(_test_optimizer(Adagrad()))
_test_optimizer(Adagrad())
def test_adadelta():
assert(_test_optimizer(Adadelta()))
_test_optimizer(Adadelta())
def test_adam():
assert(_test_optimizer(Adam()))
_test_optimizer(Adam())
def test_adamax():
_test_optimizer(Adamax())
if __name__ == '__main__':

@ -22,10 +22,19 @@ def check_layer_output_shape(layer, input_data):
# Core #
########
def test_Reshape():
layer = Reshape(dims=(2, 3))
input_data = np.random.random((2, 6))
layer = Reshape(dims=(2, 3))
check_layer_output_shape(layer, input_data)
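# the -1 cases below exercise wildcard dims, which are presumably inferred from the remaining input dimensions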
layer = Reshape(dims=(-1,))
check_layer_output_shape(layer, input_data)
layer = Reshape(dims=(-1, 2))
check_layer_output_shape(layer, input_data)
layer = Reshape(dims=(2, -1))
check_layer_output_shape(layer, input_data)
def test_Permute():
layer = Permute(dims=(1, 3, 2))

@ -1,129 +0,0 @@
from __future__ import print_function
import numpy as np
import pytest
np.random.seed(1337)
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense, TimeDistributedDense, Flatten
from keras.layers.recurrent import GRU
from keras.layers.convolutional import Convolution2D
from keras.utils.np_utils import to_categorical
def test_vector_classification():
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='relu'),
Dense(y_train.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=15, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.8)
def test_vector_regression():
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
output_shape=(2,),
classification=False)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='tanh'),
Dense(y_train.shape[-1])
])
model.compile(loss='hinge', optimizer='adagrad')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert (history.history['val_loss'][-1] < 0.9)
def test_temporal_classification():
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2]),
activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
def test_temporal_regression():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(2,),
classification=False)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='adam')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_sequence_to_sequence():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(3, 5),
classification=False)
model = Sequential()
model.add(TimeDistributedDense(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_image_classification():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 8, 8),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Convolution2D(8, 8, 8, input_shape=(3, 8, 8), activation='sigmoid'),
Flatten(),
Dense(y_test.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
if __name__ == '__main__':
pytest.main([__file__])