Merge pull request #5 from fchollet/master

update
This commit is contained in:
Oleg Sinyavskiy 2016-01-13 17:34:42 -08:00
commit bd2ff26b37
60 changed files with 3210 additions and 488 deletions

@ -11,6 +11,8 @@ matrix:
env: KERAS_BACKEND=theano
- python: 2.7
env: KERAS_BACKEND=tensorflow
- python: 2.7
env: KERAS_BACKEND=theano INTEGRATION_TESTS=true
install:
# code below is taken from http://conda.pydata.org/docs/travis.html
# We do this conditionally because it saves us some downloading if the
@ -55,6 +57,10 @@ script:
# set up keras backend
- sed -i -e 's/"backend":[[:space:]]*"[^"]*/"backend":\ "'$KERAS_BACKEND'/g' ~/.keras/keras.json;
- echo -e "Running tests with the following config:\n$(cat ~/.keras/keras.json)"
- PYTHONPATH=$PWD:$PYTHONPATH py.test tests/
- if [[ "$INTEGRATION_TESTS" == "true" ]]; then
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/integration_tests;
else
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/ --ignore=tests/integration_tests;
fi
after_success:
- coveralls

@ -22,13 +22,13 @@ The more information you provide, the easier it is for us to validate that there
## Requesting a Feature
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
You can also use Github issues to request features you would like to see in Keras, or changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add. Keep in mind that we want features that will be useful to the majority of our users and not just a small subset. If you're just targeting a minority of users, consider writing an add-on library for Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of your feature. Of course, you don't need to write any real code at this point!
3. After disussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
3. After discussing the feature you may choose to attempt a Pull Request. If you're at all able, start writing some code. We always have more work to do than time to do it. If you can write some code then that will speed the process along.
## Pull Requests
@ -49,12 +49,12 @@ We love pull requests. Here's a quick guide:
- with the Theano backend, on Python 2.7 and Python 3.5
- with the TensorFlow backend, on Python 2.7
7. When committing, use appropriate, descriptive commit messages. Make sure that your branch history is not a string of "bug fix", "fix", "oops", etc. When submitting your PR, squash your commit history into 1-3 easy to follow commits, to make sure the project history stays clean and readable.
7. When committing, use appropriate, descriptive commit messages. Make sure that your branch history is not a string of "bug fix", "fix", "oops", etc. When submitting your PR, squash your commits into a single commit with an appropriate commit message, to make sure the project history stays clean and readable. See ['rebase and squash'](http://rebaseandsqua.sh/) for technical help on how to squash your commits.
8. Update the documentation. If introducing new functionality, make sure you include code snippets demonstrating the usage of your new feature.
9. Submit your PR. If your changes have been approved in a previous discussion, and if you have have complete (and passing) unit tests, your PR is likely to be merged promptly. Otherwise, well...
9. Submit your PR. If your changes have been approved in a previous discussion, and if you have complete (and passing) unit tests, your PR is likely to be merged promptly. Otherwise, well...
## Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. Existing examples show idiomatic Keras code: make sure to keep your own script in the same spirit.
Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of examples. [Existing examples](https://github.com/fchollet/keras/tree/master/examples) show idiomatic Keras code: make sure to keep your own script in the same spirit.

@ -4,9 +4,10 @@
## You have just found Keras.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running either on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
@ -36,7 +37,7 @@ Keras is compatible with: __Python 2.7-3.5__.
## Getting started: 30 seconds to Keras
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
The core data structure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
Here's the `Sequential` model (a linear pile of layers):
@ -109,6 +110,7 @@ Keras uses the following dependencies:
- Optional but recommended if you use CNNs: cuDNN.
*When using the Theano backend:*
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
@ -118,6 +120,7 @@ sudo pip install git+git://github.com/Theano/Theano.git
```
*When using the TensorFlow backend:*
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).

@ -80,6 +80,10 @@ def get_method_signature(method):
for a in args:
st += str(a) + ', '
for a, v in kwargs:
if type(v) == str:
v = '\'' + v + '\''
elif type(v) == unicode:
v = 'u\'' + v + '\''
st += str(a) + '=' + str(v) + ', '
if kwargs or args:
return st[:-2] + ')'
@ -246,4 +250,7 @@ for module, module_name in MODULES:
print('...inserting autogenerated content into template:', path)
else:
print('...creating new page with autogenerated content:', path)
subdir = os.path.dirname(path)
if not os.path.exists(subdir):
os.makedirs(subdir)
open(path, 'w').write(module_page)

@ -23,6 +23,15 @@ It probably looks like this:
Simply change the field `backend` to either `"theano"` or `"tensorflow"`, and Keras will use the new configuration next time you run any Keras code.
You can also define the environment variable ``KERAS_BACKEND``, which will
override what is defined in your config file:
```bash
KERAS_BACKEND=tensorflow python -c "from keras import backend; print backend._BACKEND"
Using TensorFlow backend.
tensorflow
```
## Using the abstract Keras backend to write new code
If you want the Keras modules you write to be compatible with both Theano and TensorFlow, you have to write them via the abstract Keras backend API. Here's an intro.
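As a minimal sketch (an editorial illustration, assuming only the `K.mean`, `K.l2_normalize`, `K.relu` and `K.concatenate` functions that appear elsewhere in this changeset), backend-agnostic code imports the backend module and calls it instead of Theano or TensorFlow directly:

```python
from keras import backend as K

def antirectify(x):
    # mean-center each sample, L2-normalize it, then concatenate
    # its positive and negative parts; runs on either backend
    x = x - K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    return K.concatenate([K.relu(x), K.relu(-x)], axis=1)
```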

docs/templates/faq.md

@ -20,6 +20,8 @@
[How can I record the training / validation loss / accuracy at each epoch?](#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch)
[How can I use stateful RNNs?](#how-can-i-use-stateful-rnns)
---
### How can I run Keras on GPU?
@ -105,22 +107,22 @@ You can build a Theano function that will return the output of a certain layer g
```python
# with a Sequential model
get_3rd_layer_output = theano.function([model.layers[0].input],
get_3rd_layer_output = theano.function([model.layers[0].input],
model.layers[3].get_output(train=False))
layer_output = get_3rd_layer_output(X)
# with a Graph model
get_conv_layer_output = theano.function([model.inputs[i].input for i in model.input_order],
model.outputs['conv'].get_output(train=False),
model.nodes['conv'].get_output(train=False),
on_unused_input='ignore')
conv_output = get_conv_output(input_data_dict)
conv_output = get_conv_layer_output([input_data_dict[i] for i in model.input_order])
```
---
### Isn't there a bug with Merge or Graph related to input concatenation?
Yes, there was a known bug with tensor concatenation in Thenao that was fixed early 2015.
Yes, there was a known bug with tensor concatenation in Theano that was fixed early 2015.
Please upgrade to the latest version of Theano:
```bash
@ -153,7 +155,7 @@ Find out more in the [callbacks documentation](callbacks.md).
### How is the validation split computed?
If you set the `validation_split` arugment in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
If you set the `validation_split` argument in `model.fit` to e.g. 0.1, then the validation data used will be the *last 10%* of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
---
@ -176,4 +178,52 @@ hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
```
---
---
### How can I use stateful RNNs?
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If `X1` and `X2` are successive batches of samples, then `X2[i]` is the follow-up sequence to `X1[i]`, for every `i`.
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a `batch_input_shape` argument to the first layer in your model. It should be a tuple of integers, e.g. `(32, 10, 16)` for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set `stateful=True` in your RNN layer(s).
To reset the states accumulated:
- use `model.reset_states()` to reset the states of all layers in the model
- use `layer.reset_states()` to reset the states of a specific stateful RNN layer
Example:
```python
X # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10
model = Sequential()
model.add(LSTM(32, batch_input_shape=(32, 10, 16), stateful=True))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(X[:, :10, :], np.reshape(X[:, 10, :], (32, 16)))
# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(X[:, 10:20, :], np.reshape(X[:, 20, :], (32, 16)))
# let's reset the states of the LSTM layer:
model.reset_states()
# another way to do it in this case:
model.layers[0].reset_states()
```
Note that the methods `predict`, `fit`, `train_on_batch`, `predict_classes`, etc. will *all* update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
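A hedged illustration (not part of the original FAQ, reusing the model defined above and a hypothetical follow-up array `X_new` of shape `(32, 20, 16)`): the states persist across `predict` calls until you reset them.

```python
# stateful prediction: the second call continues from the states
# left behind by the first one
model.reset_states()
p1 = model.predict(X_new[:, :10, :], batch_size=32)
p2 = model.predict(X_new[:, 10:, :], batch_size=32)
model.reset_states()
```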

@ -2,9 +2,10 @@
## You have just found Keras.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running either on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
@ -34,7 +35,7 @@ Keras is compatible with: __Python 2.7-3.5__.
## Getting started: 30 seconds to Keras
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](/models/#sequential) and [`Graph`](/models/#graph).
The core datastructure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](http://keras.io/models/#sequential) and [`Graph`](http://keras.io/models/#graph).
Here's the `Sequential` model (a linear pile of layers):
@ -107,6 +108,7 @@ Keras uses the following dependencies:
- Optional but recommended if you use CNNs: cuDNN.
*When using the Theano backend:*
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
@ -116,6 +118,7 @@ sudo pip install git+git://github.com/Theano/Theano.git
```
*When using the TensorFlow backend:*
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).
@ -157,4 +160,4 @@ Keras was initially developed as part of the research effort of project ONEIROS
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
------------------
------------------

@ -27,3 +27,5 @@ For a few examples of such functions, check out the [objectives source](https://
- __hinge__
- __binary_crossentropy__: Also known as logloss.
- __categorical_crossentropy__: Also known as multiclass logloss. __Note__: using this objective requires that your labels are binary arrays of shape `(nb_samples, nb_classes)`.
- __poisson__: mean of `(predictions - targets * log(predictions))`
- __cosine_proximity__: the opposite (negative) of the mean cosine proximity between predictions and targets.
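To make the two new objectives concrete, here is a hedged NumPy sketch (an illustration only; the canonical definitions are in the objectives source linked above):

```python
import numpy as np

def poisson(y_true, y_pred, eps=1e-7):
    # mean of (predictions - targets * log(predictions))
    return np.mean(y_pred - y_true * np.log(y_pred + eps))

def cosine_proximity(y_true, y_pred, eps=1e-7):
    # negative of the mean cosine similarity between predictions and targets
    t = y_true / (np.linalg.norm(y_true, axis=-1, keepdims=True) + eps)
    p = y_pred / (np.linalg.norm(y_pred, axis=-1, keepdims=True) + eps)
    return -np.mean(np.sum(t * p, axis=-1))
```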

examples/antirectifier.py

@ -0,0 +1,106 @@
'''The example demonstrates how to write custom layers for Keras.
We build a custom activation layer called 'Antirectifier',
which modifies the shape of the tensor that passes through it.
We need to specify two methods: `output_shape` and `get_output`.
Note that the same result can also be achieved via a Lambda layer.
Because our custom layer is written with primitives from the Keras
backend (`K`), our code can run both on TensorFlow and Theano.
'''
from __future__ import print_function
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Layer, Activation
from keras.datasets import mnist
from keras import backend as K
from keras.utils import np_utils
class Antirectifier(Layer):
'''This is the combination of a sample-wise
L2 normalization with the concatenation of the
positive part of the input with the negative part
of the input. The result is a tensor of samples that are
twice as large as the input samples.
It can be used in place of a ReLU.
# Input shape
2D tensor of shape (samples, n)
# Output shape
2D tensor of shape (samples, 2*n)
# Theoretical justification
When applying ReLU, assuming that the distribution
of the previous output is approximately centered around 0.,
you are discarding half of your input. This is inefficient.
Antirectifier allows you to return all-positive outputs like ReLU,
without discarding any data.
Tests on MNIST show that Antirectifier allows you to train networks
with half as many parameters yet with comparable
classification accuracy to an equivalent ReLU-based network.
'''
@property
def output_shape(self):
shape = list(self.input_shape)
assert len(shape) == 2 # only valid for 2D tensors
shape[-1] *= 2
return tuple(shape)
def get_output(self, train):
x = self.get_input(train)
x -= K.mean(x, axis=1, keepdims=True)
x = K.l2_normalize(x, axis=1)
pos = K.relu(x)
neg = K.relu(-x)
return K.concatenate([pos, neg], axis=1)
# global parameters
batch_size = 128
nb_classes = 10
nb_epoch = 40
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
# build the model
model = Sequential()
model.add(Dense(256, input_shape=(784,)))
model.add(Antirectifier())
model.add(Dropout(0.1))
model.add(Dense(256))
model.add(Antirectifier())
model.add(Dropout(0.1))
model.add(Dense(10))
model.add(Activation('softmax'))
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# train the model
model.fit(X_train, Y_train,
batch_size=batch_size, nb_epoch=nb_epoch,
show_accuracy=True, verbose=1,
validation_data=(X_test, Y_test))
# next, compare with an equivalent network
# with 2x bigger Dense layers and ReLU

@ -3,13 +3,13 @@
References:
- Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush,
"Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks",
http://arxiv.org/abs/1503.08895
http://arxiv.org/abs/1502.05698
- Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus,
"End-To-End Memory Networks",
http://arxiv.org/abs/1503.08895
Reaches 93% accuracy on task 'single_supporting_fact_10k' after 70 epochs.
Reaches 98.6% accuracy on task 'single_supporting_fact_10k' after 120 epochs.
Time per epoch: 3s on CPU (core i7).
'''
@ -153,12 +153,14 @@ input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
output_dim=64,
input_length=story_maxlen))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)
# embed the question into a sequence of vectors
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
output_dim=64,
input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)
# compute a 'match' between input sequence elements (which are vectors)
# and the question vector sequence
@ -172,6 +174,7 @@ input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
output_dim=query_maxlen,
input_length=story_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)
# sum the match vector with the input vector:
response = Sequential()
@ -185,9 +188,9 @@ answer = Sequential()
answer.add(Merge([response, question_encoder], mode='concat', concat_axis=-1))
# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer.add(LSTM(64))
answer.add(LSTM(32))
# one regularization layer -- more would probably be needed.
answer.add(Dropout(0.25))
answer.add(Dropout(0.3))
answer.add(Dense(vocab_size))
# we output a probability distribution over the vocabulary
answer.add(Activation('softmax'))
@ -196,6 +199,6 @@ answer.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note: you could use a Graph model to avoid repeat the input twice
answer.fit([inputs_train, queries_train, inputs_train], answers_train,
batch_size=32,
nb_epoch=70,
nb_epoch=120,
show_accuracy=True,
validation_data=([inputs_test, queries_test, inputs_test], answers_test))

examples/deep_dream.py

@ -0,0 +1,198 @@
'''Deep Dreaming in Keras.
Run the script with:
```
python deep_dream.py path_to_your_base_image.jpg prefix_for_results
```
e.g.:
```
python deep_dream.py img/mypic.jpg results/dream
```
It is preferable to run this script on a GPU, for speed.
If running on CPU, prefer the TensorFlow backend (much faster).
Example results: http://i.imgur.com/FX6ROg9.jpg
'''
from __future__ import print_function
from scipy.misc import imread, imresize, imsave
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse
import h5py
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D, MaxPooling2D
from keras import backend as K
parser = argparse.ArgumentParser(description='Deep Dreams with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
help='Path to the image to transform.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
help='Prefix for the saved results.')
args = parser.parse_args()
base_image_path = args.base_image_path
result_prefix = args.result_prefix
# dimensions of the generated picture.
img_width = 600
img_height = 600
# path to the model weights file.
weights_path = 'vgg16_weights.h5'
# some settings we found interesting
saved_settings = {
'bad_trip': {'features': {'conv4_1': 0.05,
'conv4_2': 0.01,
'conv4_3': 0.01},
'continuity': 0.1,
'dream_l2': 0.8,
'jitter': 5},
'dreamy': {'features': {'conv5_1': 0.05,
'conv5_2': 0.02},
'continuity': 0.1,
'dream_l2': 0.02,
'jitter': 0},
}
# the settings we will use in this experiment
settings = saved_settings['dreamy']
# util function to open, resize and format pictures into appropriate tensors
def preprocess_image(image_path):
img = imresize(imread(image_path), (img_width, img_height))
img = img.transpose((2, 0, 1)).astype('float64')
img = np.expand_dims(img, axis=0)
return img
# util function to convert a tensor into a valid image
def deprocess_image(x):
x = x.transpose((1, 2, 0))
x = np.clip(x, 0, 255).astype('uint8')
return x
# this will contain our generated image
dream = K.placeholder((1, 3, img_width, img_height))
# build the VGG16 network with our dream as input
first_layer = ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height))
first_layer.input = dream
model = Sequential()
model.add(first_layer)
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# load the weights of the VGG16 networks
# (trained on ImageNet, won the ILSVRC competition in 2014)
# note: when there is a complete match between your model definition
# and your weight savefile, you can simply call model.load_weights(filename)
f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
if k >= len(model.layers):
# we don't look at the last (fully-connected) layers in the savefile
break
g = f['layer_{}'.format(k)]
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')
# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])
# continuity loss util function
def continuity_loss(x):
assert K.ndim(x) == 4
a = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, 1:, :img_height-1])
b = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, :img_width-1, 1:])
return K.sum(K.pow(a + b, 1.25))
# define the loss
loss = K.variable(0.)
for layer_name in settings['features']:
# add the L2 norm of the features of a layer to the loss
assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
coeff = settings['features'][layer_name]
x = layer_dict[layer_name].get_output()
shape = layer_dict[layer_name].output_shape
# we avoid border artifacts by only involving non-border pixels in the loss
loss -= coeff * K.sum(K.square(x[:, :, 2: shape[2]-2, 2: shape[3]-2])) / np.prod(shape[1:])
# add continuity loss (gives image local coherence, can result in an artful blur)
loss += settings['continuity'] * continuity_loss(dream) / (3 * img_width * img_height)
# add image L2 norm to loss (prevents pixels from taking very high values, makes image darker)
loss += settings['dream_l2'] * K.sum(K.square(dream)) / (3 * img_width * img_height)
# feel free to further modify the loss as you see fit, to achieve new effects...
# compute the gradients of the dream wrt the loss
grads = K.gradients(loss, dream)
# set up helper functions to extract the loss and gradients
# from the computational graph as Numpy arrays
f_grads = K.function([dream], grads)
def eval_grads(x):
x = x.reshape((1, 3, img_width, img_height))
return np.array(f_grads([x])).flatten().astype('float64')
f_loss = K.function([dream], [loss])
def eval_loss(x):
x = x.reshape((1, 3, img_width, img_height))
return f_loss([x])[0].astype('float64')
# add a random jitter to the initial image. This will be reverted at decoding time
random_jitter = (settings['jitter'] * 2) * (np.random.random((3, img_width, img_height)) - 0.5)
x = preprocess_image(base_image_path)
x += random_jitter
# run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the loss
for i in range(5):
start_time = time.time()
x, min_val, info = fmin_l_bfgs_b(eval_loss, x.flatten(),
fprime=eval_grads, maxfun=7)
print('Current loss value:', min_val)
# decode the dream and save it
x = x.reshape((3, img_width, img_height))
x -= random_jitter
img = deprocess_image(x)
fname = result_prefix + '_at_iteration_%d.png' % i
imsave(fname, img)
end_time = time.time()
print('Image saved as', fname)
print('Iteration %d completed in %ds' % (i, end_time - start_time))

@ -85,7 +85,7 @@ for iteration in range(1, 60):
print('----- Generating with seed: "' + sentence + '"')
sys.stdout.write(generated)
for iteration in range(400):
for i in range(400):
x = np.zeros((1, maxlen, len(chars)))
for t, char in enumerate(sentence):
x[0, t, char_indices[char]] = 1.

@ -0,0 +1,256 @@
'''Neural style transfer with Keras.
Before running this script, download the weights for the VGG16 model at:
https://drive.google.com/file/d/0Bz7KyqmuGsilT0J5dmRCM0ROVHc/view?usp=sharing
(source: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3)
and make sure the variable `weights_path` in this script matches the location of the file.
Run the script with:
```
python neural_style.py path_to_your_base_image.jpg path_to_your_reference.jpg prefix_for_results
```
e.g.:
```
python neural_style.py img/tuebingen.jpg img/starry_night.jpg results/my_result
```
It is preferable to run this script on a GPU, for speed.
If running on CPU, prefer the TensorFlow backend (much faster).
Example result: https://twitter.com/fchollet/status/686631033085677568
# Details
Style transfer consists in generating an image
with the same "content" as a base image, but with the
"style" of a different picture (typically artistic).
This is achieved through the optimization of a loss function
that has 3 components: "style loss", "content loss",
and "total variation loss":
- The total variation loss imposes local spatial continuity between
the pixels of the combination image, giving it visual coherence.
- The style loss is where the deep learning comes in: it is defined
using a deep convolutional neural network. Precisely, it consists of a sum of
L2 distances between the Gram matrices of the representations of
the base image and the style reference image, extracted from
different layers of a convnet (trained on ImageNet). The general idea
is to capture color/texture information at different spatial
scales (fairly large scales --defined by the depth of the layer considered).
- The content loss is an L2 distance between the features of the base
image (extracted from a deep layer) and the features of the combination image,
keeping the generated image close enough to the original one.
# References
- [A Neural Algorithm of Artistic Style](http://arxiv.org/abs/1508.06576)
'''
from __future__ import print_function
from scipy.misc import imread, imresize, imsave
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse
import h5py
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D, MaxPooling2D
from keras import backend as K
parser = argparse.ArgumentParser(description='Neural style transfer with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
help='Path to the image to transform.')
parser.add_argument('style_reference_image_path', metavar='ref', type=str,
help='Path to the style reference image.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
help='Prefix for the saved results.')
args = parser.parse_args()
base_image_path = args.base_image_path
style_reference_image_path = args.style_reference_image_path
result_prefix = args.result_prefix
weights_path = 'vgg16_weights.h5'
# these are the weights of the different loss components
total_variation_weight = 1.
style_weight = 1.
content_weight = 0.025
# dimensions of the generated picture.
img_width = 400
img_height = 400
assert img_height == img_width, 'Due to the use of the Gram matrix, width and height must match.'
# util function to open, resize and format pictures into appropriate tensors
def preprocess_image(image_path):
img = imresize(imread(image_path), (img_width, img_height))
img = img.transpose((2, 0, 1)).astype('float64')
img = np.expand_dims(img, axis=0)
return img
# util function to convert a tensor into a valid image
def deprocess_image(x):
x = x.transpose((1, 2, 0))
x = np.clip(x, 0, 255).astype('uint8')
return x
# get tensor representations of our images
base_image = K.variable(preprocess_image(base_image_path))
style_reference_image = K.variable(preprocess_image(style_reference_image_path))
# this will contain our generated image
combination_image = K.placeholder((1, 3, img_width, img_height))
# combine the 3 images into a single Keras tensor
input_tensor = K.concatenate([base_image,
style_reference_image,
combination_image], axis=0)
# build the VGG16 network with our 3 images as input
first_layer = ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height))
first_layer.input = input_tensor
model = Sequential()
model.add(first_layer)
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# load the weights of the VGG16 networks
# (trained on ImageNet, won the ILSVRC competition in 2014)
# note: when there is a complete match between your model definition
# and your weight savefile, you can simply call model.load_weights(filename)
f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
if k >= len(model.layers):
# we don't look at the last (fully-connected) layers in the savefile
break
g = f['layer_{}'.format(k)]
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')
# get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.get_output()) for layer in model.layers])
# compute the neural style loss
# first we need to define 4 util functions
# the gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
assert K.ndim(x) == 3
features = K.batch_flatten(x)
gram = K.dot(features, K.transpose(features))
return gram
# the "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image
def style_loss(style, combination):
assert K.ndim(style) == 3
assert K.ndim(combination) == 3
S = gram_matrix(style)
C = gram_matrix(combination)
channels = 3
size = img_width * img_height
return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
# an auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image
def content_loss(base, combination):
return K.sum(K.square(combination - base))
# the 3rd loss function, total variation loss,
# designed to keep the generated image locally coherent
def total_variation_loss(x):
assert K.ndim(x) == 4
a = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, 1:, :img_height-1])
b = K.square(x[:, :, :img_width-1, :img_height-1] - x[:, :, :img_width-1, 1:])
return K.sum(K.pow(a + b, 1.25))
# combine these loss functions into a single scalar
loss = K.variable(0.)
layer_features = outputs_dict['conv4_2']
base_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(base_image_features,
combination_features)
feature_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
for layer_name in feature_layers:
layer_features = outputs_dict[layer_name]
style_reference_features = layer_features[1, :, :, :]
combination_features = layer_features[2, :, :, :]
sl = style_loss(style_reference_features, combination_features)
loss += (style_weight / len(feature_layers)) * sl
loss += total_variation_weight * total_variation_loss(combination_image)
# get the gradients of the generated image wrt the loss
grads = K.gradients(loss, combination_image)
# set up helper functions to extract the loss and gradients
# from the computational graph as Numpy arrays
f_grads = K.function([combination_image], grads)
def eval_grads(x):
x = x.reshape((1, 3, img_width, img_height))
return np.array(f_grads([x])).flatten().astype('float64')
f_loss = K.function([combination_image], [loss])
def eval_loss(x):
x = x.reshape((1, 3, img_width, img_height))
return f_loss([x])[0].astype('float64')
# run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the neural style loss
x = np.random.uniform(0, 255, (1, 3, img_width, img_height))
for i in range(10):
print('Start of iteration', i)
start_time = time.time()
x, min_val, info = fmin_l_bfgs_b(eval_loss, x.flatten(),
fprime=eval_grads, maxfun=20)
print('Current loss value:', min_val)
# save current generated image
img = deprocess_image(x.reshape((3, img_width, img_height)))
fname = result_prefix + '_at_iteration_%d.png' % i
imsave(fname, img)
end_time = time.time()
print('Image saved as', fname)
print('Iteration %d completed in %ds' % (i, end_time - start_time))

@ -1 +1 @@
__version__ = '0.3.0'
__version__ = '0.3.1'

@ -39,7 +39,7 @@ def hard_sigmoid(x):
def linear(x):
'''
The function returns the variable that is passed in, so all types work
The function returns the variable that is passed in, so all types work.
'''
return x

@ -4,12 +4,16 @@ import os
import json
from .common import epsilon, floatx, set_epsilon, set_floatx
_keras_dir = os.path.expanduser(os.path.join('~', '.keras'))
_keras_base_dir = os.path.expanduser('~')
if not os.access(_keras_base_dir, os.W_OK):
_keras_base_dir = '/tmp'
_keras_dir = os.path.join(_keras_base_dir, '.keras')
if not os.path.exists(_keras_dir):
os.makedirs(_keras_dir)
_BACKEND = 'theano'
_config_path = os.path.expanduser(os.path.join('~', '.keras', 'keras.json'))
_config_path = os.path.expanduser(os.path.join(_keras_dir, 'keras.json'))
if os.path.exists(_config_path):
_config = json.load(open(_config_path))
_floatx = _config.get('floatx', floatx())
@ -31,6 +35,11 @@ else:
# add new line in order for bash 'cat' display the content correctly
f.write(json.dumps(_config) + '\n')
if 'KERAS_BACKEND' in os.environ:
_backend = os.environ['KERAS_BACKEND']
assert _backend in {'theano', 'tensorflow'}
_BACKEND = _backend
if _BACKEND == 'theano':
print('Using Theano backend.')
from .theano_backend import *

@ -236,6 +236,41 @@ def permute_dimensions(x, pattern):
return tf.transpose(x, perm=pattern)
def resize_images(X, height_factor, width_factor, dim_ordering):
'''Resize the images contained in a 4D tensor of shape
- [batch, channels, height, width] (for 'th' dim_ordering)
- [batch, height, width, channels] (for 'tf' dim_ordering)
by a factor of (height_factor, width_factor). Both factors should be
positive integers.
'''
if dim_ordering == 'th':
new_height = shape(X)[2].value * height_factor
new_width = shape(X)[3].value * width_factor
X = permute_dimensions(X, [0, 2, 3, 1])
X = tf.image.resize_nearest_neighbor(X, (new_height, new_width))
return permute_dimensions(X, [0, 3, 1, 2])
elif dim_ordering == 'tf':
new_height = shape(X)[1].value * height_factor
new_width = shape(X)[2].value * width_factor
return tf.image.resize_nearest_neighbor(X, (new_height, new_width))
else:
raise Exception('Invalid dim_ordering: ' + dim_ordering)
def repeat_elements(x, rep, axis):
'''Repeats the elements of a tensor along an axis, like np.repeat
If x has shape (s1, s2, s3) and axis=1, the output
will have shape (s1, s2 * rep, s3)
'''
x_shape = x.get_shape().as_list()
# slices along the repeat axis
splits = tf.split(axis, x_shape[axis], x)
# repeat each slice the given number of reps
x_rep = [s for s in splits for i in range(rep)]
return tf.concat(axis, x_rep)
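To illustrate the shape semantics described in the docstring, a small NumPy analogue (an editorial addition, not part of the diff):

```python
import numpy as np

x = np.zeros((2, 3, 4))        # shape (s1, s2, s3)
y = np.repeat(x, 2, axis=1)    # repeat each element twice along axis 1
print(y.shape)                 # (2, 6, 4) == (s1, s2 * rep, s3)
```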
def repeat(x, n):
'''Repeat a 2D tensor:
@ -252,6 +287,10 @@ def tile(x, n):
def flatten(x):
return tf.reshape(x, [-1])
def batch_flatten(x):
'''Turn a n-D tensor into a 2D tensor where
the first dimension is conserved.
'''
@ -274,9 +313,6 @@ def squeeze(x, axis):
def temporal_padding(x, padding=1):
'''Pad the middle dimension of a 3D tensor
with "padding" zeros left and right.
Appologies for the inane API, but Theano makes this
really hard.
'''
pattern = [[0, 0], [padding, padding], [0, 0]]
return tf.pad(x, pattern)
@ -313,12 +349,16 @@ def set_value(x, value):
class Function(object):
def __init__(self, inputs, outputs, updates=[]):
assert type(inputs) in {list, tuple}
assert type(outputs) in {list, tuple}
assert type(updates) in {list, tuple}
self.inputs = list(inputs)
self.outputs = list(outputs)
with tf.control_dependencies(self.outputs):
self.updates = [tf.assign(p, new_p) for (p, new_p) in updates]
def __call__(self, inputs):
assert type(inputs) in {list, tuple}
names = [v.name for v in self.inputs]
feed_dict = dict(zip(names, inputs))
session = _get_session()
@ -410,7 +450,7 @@ def rnn(step_function, inputs, initial_states,
new_states = successive_states[-1]
outputs = tf.transpose(outputs, (1, 0, 2))
return last_output, outputs, states
return last_output, outputs, new_states
def switch(condition, then_expression, else_expression):
@ -499,6 +539,12 @@ def dropout(x, level, seed=None):
return tf.nn.dropout(x * 1., retain_prob, seed=seed)
def l2_normalize(x, axis):
if axis < 0:
axis = axis % len(x.get_shape())
return tf.nn.l2_normalize(x, dim=axis)
# CONVOLUTIONS

@ -11,7 +11,7 @@ theano.config.floatX = _FLOATX
def _on_gpu():
'''Returns whether the session is set to
'''Return whether the session is set to
run on GPU or not (i.e. on CPU).
'''
return theano.config.device[:3] == 'gpu'
@ -19,7 +19,7 @@ def _on_gpu():
if _on_gpu():
'''Import cuDNN only if running on GPU:
not having Cuda install should not
not having Cuda installed should not
prevent from running the present code.
'''
from theano.sandbox.cuda import dnn
@ -243,11 +243,39 @@ def permute_dimensions(x, pattern):
return x.dimshuffle(pattern)
def repeat(x, n):
'''Repeat a 2D tensor:
def repeat_elements(x, rep, axis):
'''Repeat the elements of a tensor along an axis, like np.repeat.
if x has shape (samples, dim) and n=2,
the output will have shape (samples, 2, dim)
If x has shape (s1, s2, s3) and axis=1, the output
will have shape (s1, s2 * rep, s3).
'''
return T.repeat(x, rep, axis=axis)
def resize_images(X, height_factor, width_factor, dim_ordering):
'''Resize the images contained in a 4D tensor of shape
- [batch, channels, height, width] (for 'th' dim_ordering)
- [batch, height, width, channels] (for 'tf' dim_ordering)
by a factor of (height_factor, width_factor). Both factors should be
positive integers.
'''
if dim_ordering == 'th':
output = repeat_elements(X, height_factor, axis=2)
output = repeat_elements(output, width_factor, axis=3)
return output
elif dim_ordering == 'tf':
output = repeat_elements(X, height_factor, axis=1)
output = repeat_elements(output, width_factor, axis=2)
return output
else:
raise Exception('Invalid dim_ordering: ' + dim_ordering)
def repeat(x, n):
'''Repeat a 2D tensor.
If x has shape (samples, dim) and n=2,
the output will have shape (samples, 2, dim).
'''
tensors = [x] * n
stacked = T.stack(*tensors)
@ -259,6 +287,10 @@ def tile(x, n):
def flatten(x):
return T.flatten(x)
def batch_flatten(x):
'''Turn a n-D tensor into a 2D tensor where
the first dimension is conserved.
'''
@ -354,6 +386,7 @@ class Function(object):
allow_input_downcast=True, **kwargs)
def __call__(self, inputs):
assert type(inputs) in {list, tuple}
return self.function(*inputs)
@ -369,7 +402,7 @@ def gradients(loss, variables):
def rnn(step_function, inputs, initial_states,
go_backwards=False, masking=True):
'''Iterates over the time dimension of a tensor.
'''Iterate over the time dimension of a tensor.
Parameters
----------
@ -412,7 +445,7 @@ def rnn(step_function, inputs, initial_states,
if masking:
# if all-zero input timestep, return
# all-zero output and unchanged states
switch = T.any(input)
switch = T.any(input, axis=-1, keepdims=True)
output = T.switch(switch, output, 0. * output)
return_states = []
for state, new_state in zip(states, new_states):
@ -509,9 +542,13 @@ def dropout(x, level, seed=None):
return x
# CONVOLUTIONS
def l2_normalize(x, axis):
norm = T.sqrt(T.sum(T.square(x), axis=axis, keepdims=True))
return x / norm
# CONVOLUTIONS
def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
image_shape=None, filter_shape=None):
'''
@ -540,12 +577,15 @@ def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
if _on_gpu() and dnn.dnn_available():
if border_mode == 'same':
assert(strides == (1, 1))
np_kernel = kernel.eval()
pad_x = (np_kernel.shape[2] - strides[0]) // 2
pad_y = (np_kernel.shape[3] - strides[1]) // 2
conv_out = dnn.dnn_conv(img=x,
kerns=kernel,
border_mode=(pad_x, pad_y))
border_mode='full')
np_kernel = kernel.eval()
shift_x = (np_kernel.shape[2] - 1) // 2
shift_y = (np_kernel.shape[3] - 1) // 2
conv_out = conv_out[:, :,
shift_x:x.shape[2] + shift_x,
shift_y:x.shape[3] + shift_y]
else:
conv_out = dnn.dnn_conv(img=x,
kerns=kernel,
@ -566,8 +606,9 @@ def conv2d(x, kernel, strides=(1, 1), border_mode='valid', dim_ordering='th',
image_shape=image_shape,
filter_shape=filter_shape)
if border_mode == 'same':
shift_x = (kernel.shape[2] - 1) // 2
shift_y = (kernel.shape[3] - 1) // 2
np_kernel = kernel.eval()
shift_x = (np_kernel.shape[2] - 1) // 2
shift_y = (np_kernel.shape[3] - 1) // 2
conv_out = conv_out[:, :,
shift_x:x.shape[2] + shift_x,
shift_y:x.shape[3] + shift_y]

@ -8,6 +8,7 @@ import warnings
from collections import deque
from .utils.generic_utils import Progbar
from keras import backend as K
class CallbackList(object):
@ -43,21 +44,27 @@ class CallbackList(object):
callback.on_batch_begin(batch, logs)
self._delta_ts_batch_begin.append(time.time() - t_before_callbacks)
delta_t_median = np.median(self._delta_ts_batch_begin)
if self._delta_t_batch > 0. and delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1:
if self._delta_t_batch > 0. and delta_t_median > 0.95 * \
self._delta_t_batch and delta_t_median > 0.1:
warnings.warn('Method on_batch_begin() is slow compared '
'to the batch update (%f). Check your callbacks.' % delta_t_median)
'to the batch update (%f). Check your callbacks.'
% delta_t_median)
self._t_enter_batch = time.time()
def on_batch_end(self, batch, logs={}):
if not hasattr(self, '_t_enter_batch'):
self._t_enter_batch = time.time()
self._delta_t_batch = time.time() - self._t_enter_batch
t_before_callbacks = time.time()
for callback in self.callbacks:
callback.on_batch_end(batch, logs)
self._delta_ts_batch_end.append(time.time() - t_before_callbacks)
delta_t_median = np.median(self._delta_ts_batch_end)
if self._delta_t_batch > 0. and delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1:
if self._delta_t_batch > 0. and delta_t_median > 0.95 * \
self._delta_t_batch and delta_t_median > 0.1:
warnings.warn('Method on_batch_end() is slow compared '
'to the batch update (%f). Check your callbacks.' % delta_t_median)
'to the batch update (%f). Check your callbacks.'
% delta_t_median)
def on_train_begin(self, logs={}):
for callback in self.callbacks:
@ -249,7 +256,8 @@ class ModelCheckpoint(Callback):
if mode not in ['auto', 'min', 'max']:
warnings.warn('ModelCheckpoint mode %s is unknown, '
'fallback to auto mode' % (self.mode), RuntimeWarning)
'fallback to auto mode.' % (self.mode),
RuntimeWarning)
mode = 'auto'
if mode == 'min':
@ -276,7 +284,8 @@ class ModelCheckpoint(Callback):
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print('Epoch %05d: %s improved from %0.5f to %0.5f, saving model to %s'
print('Epoch %05d: %s improved from %0.5f to %0.5f,'
' saving model to %s'
% (epoch, self.monitor, self.best,
current, filepath))
self.best = current
@ -299,23 +308,46 @@ class EarlyStopping(Callback):
patience: number of epochs with no improvement
after which training will be stopped.
verbose: verbosity mode.
mode: one of {auto, min, max}. In 'min' mode,
training will stop when the quantity
monitored has stopped decreasing; in 'max'
mode it will stop when the quantity
monitored has stopped increasing.
'''
def __init__(self, monitor='val_loss', patience=0, verbose=0):
def __init__(self, monitor='val_loss', patience=0, verbose=0, mode='auto'):
super(Callback, self).__init__()
self.monitor = monitor
self.patience = patience
self.verbose = verbose
self.best = np.Inf
self.wait = 0
if mode not in ['auto', 'min', 'max']:
warnings.warn('EarlyStopping mode %s is unknown, '
'fallback to auto mode.' % (self.mode), RuntimeWarning)
mode = 'auto'
if mode == 'min':
self.monitor_op = np.less
self.best = np.Inf
elif mode == 'max':
self.monitor_op = np.greater
self.best = -np.Inf
else:
if 'acc' in self.monitor:
self.monitor_op = np.greater
self.best = -np.Inf
else:
self.monitor_op = np.less
self.best = np.Inf
def on_epoch_end(self, epoch, logs={}):
current = logs.get(self.monitor)
if current is None:
warnings.warn('Early stopping requires %s available!' %
(self.monitor), RuntimeWarning)
if current < self.best:
if self.monitor_op(current, self.best):
self.best = current
self.wait = 0
else:
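A hedged usage sketch of the new `mode` argument (an editorial illustration, with a hypothetical model and data): monitoring validation accuracy, which should be maximized rather than minimized:

```python
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_acc', patience=2, mode='max')
# model.fit(X_train, Y_train, validation_split=0.1,
#           show_accuracy=True, callbacks=[early_stopping])
```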
@ -327,9 +359,16 @@ class EarlyStopping(Callback):
class RemoteMonitor(Callback):
'''Experimental callback used to stream events to a server.
'''Callback used to stream events to a server.
Requires the `requests` library.
# Arguments
root: root url to which the events will be sent (at the end
of every epoch). Events are sent to
`root + '/publish/epoch/end/'`. Calls are HTTP POST,
with a `data` argument which is a JSON-encoded dictionary
of event data.
'''
def __init__(self, root='http://localhost:9000'):
self.root = root
@ -369,13 +408,120 @@ class LearningRateScheduler(Callback):
'''Learning rate scheduler.
# Arguments
schedule: a function that gets an epoch index as input
schedule: a function that takes an epoch index as input
(integer, indexed from 0) and returns a new
learning rate as output.
learning rate as output (float).
'''
def __init__(self, schedule):
super(LearningRateScheduler, self).__init__()
self.schedule = schedule
def on_epoch_begin(self, epoch, logs={}):
self.model.optimizer.lr.set_value(self.schedule(epoch))
assert hasattr(self.model.optimizer, 'lr'), \
'Optimizer must have a "lr" attribute.'
lr = self.schedule(epoch)
assert type(lr) == float, 'The output of the "schedule" function should be float.'
K.set_value(self.model.optimizer.lr, lr)
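A hedged usage sketch (editorial illustration): a schedule that halves the learning rate every 10 epochs and returns a Python float, as the assertion above requires:

```python
from keras.callbacks import LearningRateScheduler

def schedule(epoch):
    return float(0.01 * (0.5 ** (epoch // 10)))

lr_scheduler = LearningRateScheduler(schedule)
# model.fit(X_train, Y_train, callbacks=[lr_scheduler])
```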
class TensorBoard(Callback):
''' Tensorboard basic visualizations.
This callback writes a log for TensorBoard, which allows
you to visualize dynamic graphs of your training and test
metrics, as well as activation histograms for the different
layers in your model.
TensorBoard is a visualization tool provided with TensorFlow.
If you have installed TensorFlow with pip, you should be able
to launch TensorBoard from the command line:
```
tensorboard --logdir=/full_path_to_your_logs
```
You can find more information about TensorBoard
[here](https://www.tensorflow.org/versions/master/how_tos/summaries_and_tensorboard/index.html).
# Arguments
log_dir: the path of the directory where to save the log
files to be parsed by tensorboard
histogram_freq: frequency (in epochs) at which to compute activation
histograms for the layers of the model. If set to 0,
histograms won't be computed.
'''
def __init__(self, log_dir='./logs', histogram_freq=0):
super(Callback, self).__init__()
if K._BACKEND != 'tensorflow':
raise Exception('TensorBoard callback only works '
'with the TensorFlow backend.')
self.log_dir = log_dir
self.histogram_freq = histogram_freq
def _set_model(self, model):
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF
self.model = model
self.sess = KTF._get_session()
if self.histogram_freq:
mod_type = self.model.get_config()['name']
if mod_type == 'Sequential':
layers = {l.get_config()['name']: l for l in self.model.layers}
elif mod_type == 'Graph':
layers = self.model.nodes
else:
raise Exception('Unrecognized model:',
self.model.get_config()['name'])
for l in layers:
cur_layer = layers[l]
if hasattr(cur_layer, 'W'):
tf.histogram_summary('{}_W'.format(l), cur_layer.W)
if hasattr(cur_layer, 'b'):
tf.histogram_summary('{}_b'.format(l), cur_layer.b)
if hasattr(cur_layer, 'get_output'):
tf.histogram_summary('{}_out'.format(l),
cur_layer.get_output())
self.merged = tf.merge_all_summaries()
self.writer = tf.train.SummaryWriter(self.log_dir,
self.sess.graph_def)
def on_epoch_begin(self, epoch, logs={}):
self.seen = 0
self.totals = {}
def on_batch_end(self, batch, logs={}):
batch_size = logs.get('size', 0)
self.seen += batch_size
for k, v in logs.items():
if k in self.totals:
self.totals[k] += v * batch_size
else:
self.totals[k] = v * batch_size
def on_epoch_end(self, epoch, logs={}):
import tensorflow as tf
if self.model.validation_data and self.histogram_freq:
if epoch % self.histogram_freq == 0:
if self.params.get('show_accuracy'):
test_function = self.model._test_with_acc
else:
test_function = self.model._test
names = [v.name for v in test_function.inputs]
feed_dict = dict(zip(names, self.model.validation_data))
result = self.sess.run([self.merged], feed_dict=feed_dict)
summary_str = result[0]
self.writer.add_summary(summary_str, epoch)
all_values = self.totals.copy()
all_values.update(logs)
for name, value in all_values.items():
if name in ['batch', 'size']:
continue
summary = tf.Summary()
summary_value = summary.value.add()
summary_value.simple_value = value
summary_value.tag = name
self.writer.add_summary(summary, epoch)
self.writer.flush()
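A hedged usage sketch of the new callback (editorial illustration; as asserted above it requires the TensorFlow backend, and the paths are hypothetical):

```python
from keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='./logs', histogram_freq=1)
# model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
#           show_accuracy=True, callbacks=[tb])
```

Afterwards, launch the viewer with `tensorboard --logdir=./logs`.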

@ -10,7 +10,6 @@ def load_data():
origin = "http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
path = get_file(dirname, origin=origin, untar=True)
nb_test_samples = 10000
nb_train_samples = 50000
X_train = np.zeros((nb_train_samples, 3, 32, 32), dtype="uint8")

@ -14,7 +14,10 @@ class ParanoidURLopener(FancyURLopener):
def get_file(fname, origin, untar=False):
datadir = os.path.expanduser(os.path.join('~', '.keras', 'datasets'))
datadir_base = os.path.expanduser(os.path.join('~', '.keras'))
if not os.access(datadir_base, os.W_OK):
datadir_base = os.path.join('/tmp', '.keras')
datadir = os.path.join(datadir_base, 'datasets')
if not os.path.exists(datadir):
os.makedirs(datadir)

@ -2,12 +2,12 @@ from __future__ import absolute_import
from six.moves import cPickle
import gzip
from .data_utils import get_file
import random
from six.moves import zip
import numpy as np
def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113,
def load_data(path="imdb.pkl", nb_words=None, skip_top=0,
maxlen=None, test_split=0.2, seed=113,
start_char=1, oov_char=2, index_from=3):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/imdb.pkl")
@ -39,7 +39,10 @@ def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_spli
new_labels.append(y)
X = new_X
labels = new_labels
if not X:
raise Exception('After filtering for sequences shorter than maxlen=' +
str(maxlen) + ', no sequence was kept. '
'Increase maxlen.')
if not nb_words:
nb_words = max([max(x) for x in X])
@ -57,10 +60,10 @@ def load_data(path="imdb.pkl", nb_words=None, skip_top=0, maxlen=None, test_spli
nX.append(nx)
X = nX
X_train = X[:int(len(X)*(1-test_split))]
y_train = labels[:int(len(X)*(1-test_split))]
X_train = X[:int(len(X) * (1 - test_split))]
y_train = labels[:int(len(X) * (1 - test_split))]
X_test = X[int(len(X)*(1-test_split)):]
y_test = labels[int(len(X)*(1-test_split)):]
X_test = X[int(len(X) * (1 - test_split)):]
y_test = labels[int(len(X) * (1 - test_split)):]
return (X_train, y_train), (X_test, y_test)

@ -19,5 +19,4 @@ def load_data(path="mnist.pkl.gz"):
data = cPickle.load(f, encoding="bytes")
f.close()
return data # (X_train, y_train), (X_test, y_test)

@ -1,18 +1,17 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from .data_utils import get_file
import random
from six.moves import cPickle
from six.moves import zip
import numpy as np
def load_data(path="reuters.pkl", nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113,
def load_data(path="reuters.pkl", nb_words=None, skip_top=0,
maxlen=None, test_split=0.2, seed=113,
start_char=1, oov_char=2, index_from=3):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/reuters.pkl")
f = open(path, 'rb')
X, labels = cPickle.load(f)
f.close()
@ -53,11 +52,11 @@ def load_data(path="reuters.pkl", nb_words=None, skip_top=0, maxlen=None, test_s
nX.append(nx)
X = nX
X_train = X[:int(len(X)*(1-test_split))]
y_train = labels[:int(len(X)*(1-test_split))]
X_train = X[:int(len(X) * (1 - test_split))]
y_train = labels[:int(len(X) * (1 - test_split))]
X_test = X[int(len(X)*(1-test_split)):]
y_test = labels[int(len(X)*(1-test_split)):]
X_test = X[int(len(X) * (1 - test_split)):]
y_test = labels[int(len(X) * (1 - test_split)):]
return (X_train, y_train), (X_test, y_test)
@ -66,8 +65,3 @@ def get_word_index(path="reuters_word_index.pkl"):
path = get_file(path, origin="https://s3.amazonaws.com/text-datasets/reuters_word_index.pkl")
f = open(path, 'rb')
return cPickle.load(f)
if __name__ == "__main__":
make_reuters_dataset()
(X_train, y_train), (X_test, y_test) = load_data()

@ -9,52 +9,54 @@ def get_fans(shape):
return fan_in, fan_out
def uniform(shape, scale=0.05):
return K.variable(np.random.uniform(low=-scale, high=scale, size=shape))
def uniform(shape, scale=0.05, name=None):
return K.variable(np.random.uniform(low=-scale, high=scale, size=shape),
name=name)
def normal(shape, scale=0.05):
return K.variable(np.random.randn(*shape) * scale)
def normal(shape, scale=0.05, name=None):
return K.variable(np.random.normal(loc=0.0, scale=scale, size=shape),
name=name)
def lecun_uniform(shape):
def lecun_uniform(shape, name=None):
''' Reference: LeCun 98, Efficient Backprop
http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
'''
fan_in, fan_out = get_fans(shape)
scale = np.sqrt(3. / fan_in)
return uniform(shape, scale)
return uniform(shape, scale, name=name)
def glorot_normal(shape):
def glorot_normal(shape, name=None):
''' Reference: Glorot & Bengio, AISTATS 2010
'''
fan_in, fan_out = get_fans(shape)
s = np.sqrt(2. / (fan_in + fan_out))
return normal(shape, s)
return normal(shape, s, name=name)
def glorot_uniform(shape):
def glorot_uniform(shape, name=None):
fan_in, fan_out = get_fans(shape)
s = np.sqrt(6. / (fan_in + fan_out))
return uniform(shape, s)
return uniform(shape, s, name=name)
def he_normal(shape):
def he_normal(shape, name=None):
''' Reference: He et al., http://arxiv.org/abs/1502.01852
'''
fan_in, fan_out = get_fans(shape)
s = np.sqrt(2. / fan_in)
return normal(shape, s)
return normal(shape, s, name=name)
def he_uniform(shape):
def he_uniform(shape, name=None):
fan_in, fan_out = get_fans(shape)
s = np.sqrt(6. / fan_in)
return uniform(shape, s)
return uniform(shape, s, name=name)
def orthogonal(shape, scale=1.1):
def orthogonal(shape, scale=1.1, name=None):
''' From Lasagne. Reference: Saxe et al., http://arxiv.org/abs/1312.6120
'''
flat_shape = (shape[0], np.prod(shape[1:]))
@ -63,22 +65,23 @@ def orthogonal(shape, scale=1.1):
# pick the one with the correct shape
q = u if u.shape == flat_shape else v
q = q.reshape(shape)
return K.variable(scale * q[:shape[0], :shape[1]])
return K.variable(scale * q[:shape[0], :shape[1]], name=name)
def identity(shape, scale=1):
def identity(shape, scale=1, name=None):
if len(shape) != 2 or shape[0] != shape[1]:
raise Exception("Identity matrix initialization can only be used for 2D square matrices")
raise Exception('Identity matrix initialization can only be used '
'for 2D square matrices.')
else:
return K.variable(scale * np.identity(shape[0]))
return K.variable(scale * np.identity(shape[0]), name=name)
def zero(shape):
return K.zeros(shape)
def zero(shape, name=None):
return K.zeros(shape, name=name)
def one(shape):
return K.ones(shape)
def one(shape, name=None):
return K.ones(shape, name=name)
from .utils.generic_utils import get_from_module

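A minimal sketch of the new `name` argument added to every initializer above; the shape and name are arbitrary examples.

```python
from keras import backend as K
from keras.initializations import glorot_uniform

# the optional `name` is forwarded to K.variable, which gives the weight a
# readable name in backends (e.g. TensorFlow graphs) that track variable names
W = glorot_uniform((784, 128), name='dense_1_W')
print(K.get_value(W).shape)  # (784, 128)
```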
@ -23,16 +23,41 @@ class Sequential(Layer):
self.layer_cache = {}
for layer in layers:
self.add(layer)
self._cache_enabled = True
def __call__(self, X, train=False):
def __call__(self, X, mask=None, train=False):
# turn off layer cache temporarily
tmp_cache_enabled = self.cache_enabled
self.cache_enabled = False
# recursively search for a layer which is not a Sequential model
layer = self
while issubclass(layer.__class__, Sequential):
layer = layer.layers[0]
# set temporary input to first layer
tmp = self.layers[0].get_input
self.layers[0].get_input = lambda _: X
tmp_input = layer.get_input
tmp_mask = None
layer.get_input = lambda _: X
if hasattr(layer, 'get_input_mask'):
tmp_mask = layer.get_input_mask
layer.get_input_mask = lambda _: mask
Y = self.get_output(train=train)
# return input to first layer to what it was
self.layers[0].get_input = tmp
# return input from first layer to what it was
layer.get_input = tmp_input
if hasattr(layer, 'get_input_mask'):
layer.get_input_mask = tmp_mask
self.cache_enabled = tmp_cache_enabled
return Y
@property
def cache_enabled(self):
return self._cache_enabled
@cache_enabled.setter
def cache_enabled(self, value):
self._cache_enabled = value
for l in self.layers:
l.cache_enabled = value
def set_previous(self, layer):
self.layers[0].previous = layer
@ -79,7 +104,7 @@ class Sequential(Layer):
@property
def state_updates(self):
"""
Returns the `updates` from all layers in the sequence that are
Return the `updates` from all layers in the sequence that are
stateful. This is useful for separating _training_ updates and
_prediction_ updates for when we need to update a layer's internal state
during a stateful prediction.
@ -207,7 +232,7 @@ class Graph(Layer):
@property
def state_updates(self):
"""
Returns the `updates` from all nodes in that graph for nodes that are
Return the `updates` from all nodes in that graph for nodes that are
stateful. This is useful for separating _training_ updates and
_prediction_ updates for when we need to update a layer's internal state
during a stateful prediction.
@ -288,7 +313,7 @@ class Graph(Layer):
if dtype == 'float':
layer.input = K.placeholder(shape=layer.input_shape, name=name)
else:
if len(input_shape) == 1:
if (input_shape and len(input_shape) == 1) or (batch_input_shape and len(batch_input_shape) == 2):
layer.input = K.placeholder(shape=layer.input_shape,
dtype='int32',
name=name)
@ -375,9 +400,7 @@ class Graph(Layer):
dot_axes: Same meaning as `dot_axes` argument of `add_node()`
outputs: Used when `merge_mode=None`. Names for the output nodes.
create_output: Same meaning as `create_output` argument of `add_node()`.
When creating an output, `merge_mode` must be specified.
'''
layer.layer_cache = self.layer_cache
if name in self.namespace:
raise Exception('Duplicate node identifier: ' + name)
for o in outputs:
@ -408,7 +431,8 @@ class Graph(Layer):
raise Exception('Unknown identifier: ' + input)
s = Siamese(layer, layers, merge_mode,
concat_axis=concat_axis,
dot_axes=dot_axes)
dot_axes=dot_axes,
is_graph=True)
self.namespace.add(name)
self.nodes[name] = s
self.node_config.append({'name': name,
@ -425,7 +449,7 @@ class Graph(Layer):
self.namespace.add(sh_name)
self.nodes[sh_name] = sh
self.node_config.append({'name': sh_name,
'inputs': [s],
'inputs': [name],
'create_output': create_output})
if create_output:
self.add_output(sh_name, input=sh_name)

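A sketch of what the extended `__call__` above enables: calling a `Sequential` container directly on a symbolic tensor (with an optional mask) while the layer cache is temporarily disabled. The shapes and layer choices are assumptions for illustration.

```python
from keras import backend as K
from keras.layers import containers
from keras.layers.core import Dense

inner = containers.Sequential([Dense(16, input_dim=32), Dense(8)])

x = K.placeholder(shape=(None, 32))
# mask defaults to None; train=False gives the test-time graph
y = inner(x, mask=None, train=False)
```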
@ -584,7 +584,7 @@ class AveragePooling2D(_Pooling2D):
class UpSampling1D(Layer):
'''Repeats each temporal step `length` times along the time axis.
'''Repeat each temporal step `length` times along the time axis.
# Input shape
3D tensor with shape: `(samples, steps, features)`.
@ -609,7 +609,7 @@ class UpSampling1D(Layer):
def get_output(self, train=False):
X = self.get_input(train)
output = K.concatenate([X] * self.length, axis=1)
output = K.repeat_elements(X, self.length, axis=1)
return output
def get_config(self):
@ -620,7 +620,7 @@ class UpSampling1D(Layer):
class UpSampling2D(Layer):
'''Repeats the rows and columns of the data
'''Repeat the rows and columns of the data
by size[0] and size[1] respectively.
# Input shape
@ -668,15 +668,8 @@ class UpSampling2D(Layer):
def get_output(self, train=False):
X = self.get_input(train)
if self.dim_ordering == 'th':
output = K.concatenate([X] * self.size[0], axis=2)
output = K.concatenate([output] * self.size[1], axis=3)
elif self.dim_ordering == 'tf':
output = K.concatenate([X] * self.size[0], axis=1)
output = K.concatenate([output] * self.size[1], axis=2)
else:
raise Exception('Invalid dim_ordering: ' + self.dim_ordering)
return output
return K.resize_images(X, self.size[0], self.size[1],
self.dim_ordering)
def get_config(self):
config = {'name': self.__class__.__name__,

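A small usage sketch for the `UpSampling2D` change above (now implemented via `K.resize_images`); shapes assume the default 'th' dim ordering.

```python
from keras.models import Sequential
from keras.layers.convolutional import UpSampling2D

model = Sequential()
# (None, 3, 16, 16) -> (None, 3, 32, 32): each row and column is repeated twice
model.add(UpSampling2D(size=(2, 2), input_shape=(3, 16, 16)))
print(model.layers[-1].output_shape)  # (None, 3, 32, 32)
```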
@ -35,36 +35,70 @@ class Layer(object):
def __init__(self, **kwargs):
allowed_kwargs = {'input_shape',
'trainable',
'batch_input_shape'}
'batch_input_shape',
'cache_enabled',
'name'}
for kwarg in kwargs:
assert kwarg in allowed_kwargs, "Keyword argument not understood: " + kwarg
assert kwarg in allowed_kwargs, 'Keyword argument not understood: ' + kwarg
if 'input_shape' in kwargs:
self.set_input_shape((None,) + tuple(kwargs['input_shape']))
if 'batch_input_shape' in kwargs:
self.set_input_shape(tuple(kwargs['batch_input_shape']))
self.trainable = True
if 'trainable' in kwargs:
self._trainable = kwargs['trainable']
self.trainable = kwargs['trainable']
self.name = self.__class__.__name__.lower()
if 'name' in kwargs:
self.name = kwargs['name']
if not hasattr(self, 'params'):
self.params = []
self.cache_enabled = True
if 'cache_enabled' in kwargs:
self.cache_enabled = kwargs['cache_enabled']
def __call__(self, X, train=False):
@property
def name(self):
return self._name
@name.setter
def name(self, name):
self._name = name
@property
def cache_enabled(self):
return self._cache_enabled
@cache_enabled.setter
def cache_enabled(self, value):
self._cache_enabled = value
def __call__(self, X, mask=None, train=False):
# set temporary input
tmp = self.get_input
tmp_input = self.get_input
tmp_mask = None
if hasattr(self, 'get_input_mask'):
tmp_mask = self.get_input_mask
self.get_input_mask = lambda _: mask
self.get_input = lambda _: X
Y = self.get_output(train=train)
# return input to what it was
self.get_input = tmp
if hasattr(self, 'get_input_mask'):
self.get_input_mask = tmp_mask
self.get_input = tmp_input
return Y
def set_previous(self, layer, connection_map={}):
'''Connect a layer to its parent in the computational graph.
'''
assert self.nb_input == layer.nb_output == 1, "Cannot connect layers: input count and output count should be 1."
assert self.nb_input == layer.nb_output == 1, 'Cannot connect layers: input count and output count should be 1.'
if hasattr(self, 'input_ndim'):
assert self.input_ndim == len(layer.output_shape), "Incompatible shapes: layer expected input with ndim=" +\
str(self.input_ndim) + " but previous layer has output_shape " + str(layer.output_shape)
assert self.input_ndim == len(layer.output_shape), ('Incompatible shapes: layer expected input with ndim=' +
str(self.input_ndim) +
' but previous layer has output_shape ' +
str(layer.output_shape))
if layer.get_output_mask() is not None:
assert self.supports_masked_input(), "Cannot connect non-masking layer to layer with masked output"
assert self.supports_masked_input(), 'Cannot connect non-masking layer to layer with masked output.'
self.previous = layer
self.build()
@ -132,12 +166,12 @@ class Layer(object):
if hasattr(self, 'previous'):
# to avoid redundant computations,
# layer outputs are cached when possible.
if hasattr(self, 'layer_cache'):
if hasattr(self, 'layer_cache') and self.cache_enabled:
previous_layer_id = '%s_%s' % (id(self.previous), train)
if previous_layer_id in self.layer_cache:
return self.layer_cache[previous_layer_id]
previous_output = self.previous.get_output(train=train)
if hasattr(self, 'layer_cache'):
if hasattr(self, 'layer_cache') and self.cache_enabled:
previous_layer_id = '%s_%s' % (id(self.previous), train)
self.layer_cache[previous_layer_id] = previous_output
return previous_output
@ -188,11 +222,12 @@ class Layer(object):
of the layer (i.e. it should match the
output of `get_weights`).
'''
assert len(self.params) == len(weights), 'Provided weight array does not match layer weights (' + \
str(len(self.params)) + ' layer params vs. ' + str(len(weights)) + ' provided weights)'
assert len(self.params) == len(weights), ('Provided weight array does not match layer weights (' +
str(len(self.params)) + ' layer params vs. ' +
str(len(weights)) + ' provided weights)')
for p, w in zip(self.params, weights):
if K.get_value(p).shape != w.shape:
raise Exception("Layer shape %s not compatible with weight shape %s." % (K.get_value(p).shape, w.shape))
raise Exception('Layer shape %s not compatible with weight shape %s.' % (K.get_value(p).shape, w.shape))
K.set_value(p, w)
def get_weights(self):
@ -207,11 +242,13 @@ class Layer(object):
def get_config(self):
'''Return the parameters of the layer, as a dictionary.
'''
config = {"name": self.__class__.__name__}
config = {'name': self.__class__.__name__}
if hasattr(self, '_input_shape'):
config['input_shape'] = self._input_shape[1:]
if hasattr(self, '_trainable'):
config['trainable'] = self._trainable
config['cache_enabled'] = self.cache_enabled
config['custom_name'] = self.name
return config
def get_params(self):
@ -285,8 +322,8 @@ class Masking(MaskedLayer):
self.input = K.placeholder(ndim=3)
def get_output_mask(self, train=False):
if K._BACKEND == "tensorflow":
raise Exception("Masking is Theano-only for the time being.")
if K._BACKEND == 'tensorflow':
raise Exception('Masking is Theano-only for the time being.')
X = self.get_input(train)
return K.any(K.ones_like(X) * (1. - K.equal(X, self.mask_value)),
axis=-1)
@ -297,8 +334,8 @@ class Masking(MaskedLayer):
axis=-1, keepdims=True)
def get_config(self):
config = {"name": self.__class__.__name__,
"mask_value": self.mask_value}
config = {'name': self.__class__.__name__,
'mask_value': self.mask_value}
base_config = super(Masking, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -344,8 +381,8 @@ class TimeDistributedMerge(Layer):
raise Exception('Unknown merge mode')
def get_config(self):
config = {"name": self.__class__.__name__,
"mode": self.mode}
config = {'name': self.__class__.__name__,
'mode': self.mode}
base_config = super(TimeDistributedMerge, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -458,6 +495,7 @@ class Merge(Layer):
if p not in self.params:
self.params.append(p)
self.constraints.append(c)
super(Merge, self).__init__()
@property
def output_shape(self):
@ -507,7 +545,7 @@ class Merge(Layer):
for i in range(len(self.layers)):
X = self.layers[i].get_output(train)
if X.name is None:
raise ValueError('merge_mode="join" only works with named inputs')
raise ValueError('merge_mode="join" only works with named inputs.')
else:
inputs[X.name] = X
return inputs
@ -537,7 +575,7 @@ class Merge(Layer):
output = output.dimshuffle((0, 'x'))
return output
else:
raise Exception('Unknown merge mode')
raise Exception('Unknown merge mode.')
def get_input(self, train=False):
res = []
@ -662,13 +700,59 @@ class Reshape(Layer):
super(Reshape, self).__init__(**kwargs)
self.dims = tuple(dims)
def _fix_unknown_dimension(self, input_shape, output_shape):
'''Find and replace a single missing dimension in an output shape
given an input shape.
A near direct port of the internal numpy function _fix_unknown_dimension
in numpy/core/src/multiarray/shape.c
# Arguments
input_shape: shape of array being reshaped
output_shape: desired shaped of the array with at most
a single -1 which indicates a dimension that should be
derived from the input shape.
# Returns
The new output shape with a -1 replaced with its computed value.
Raises a ValueError if the total array size of the output_shape is
different from the input_shape, or if more than one unknown dimension
is specified.
'''
output_shape = list(output_shape)
msg = 'total size of new array must be unchanged'
known, unknown = 1, None
for index, dim in enumerate(output_shape):
if dim < 0:
if unknown is None:
unknown = index
else:
raise ValueError('can only specify one unknown dimension')
else:
known *= dim
original = np.prod(input_shape, dtype=int)
if unknown is not None:
if known == 0 or original % known != 0:
raise ValueError(msg)
output_shape[unknown] = original // known
elif original != known:
raise ValueError(msg)
return tuple(output_shape)
@property
def output_shape(self):
return (self.input_shape[0],) + self.dims
return (self.input_shape[0],) + self._fix_unknown_dimension(self.input_shape[1:], self.dims)
def get_output(self, train=False):
X = self.get_input(train)
return K.reshape(X, (-1,) + self.dims)
return K.reshape(X, (-1,) + self.output_shape[1:])
def get_config(self):
config = {'name': self.__class__.__name__,
@ -725,7 +809,7 @@ class Flatten(Layer):
'''Flatten the input. Does not affect the batch size.
# Input shape
Arbitrary, although all dimensions in the input shaped must be fixed.
Arbitrary, although all dimensions in the input shape must be fixed.
Use the keyword argument `input_shape`
(tuple of integers, does not include the samples axis)
when using this layer as the first layer in a model.
@ -739,11 +823,18 @@ class Flatten(Layer):
@property
def output_shape(self):
input_shape = self.input_shape
if not all(input_shape[1:]):
raise Exception('The shape of the input to "Flatten" '
'is not fully defined '
'(got ' + str(input_shape[1:]) + '). '
'Make sure to pass a complete "input_shape" '
'or "batch_input_shape" argument to the first '
'layer in your model.')
return (input_shape[0], np.prod(input_shape[1:]))
def get_output(self, train=False):
X = self.get_input(train)
return K.flatten(X)
return K.batch_flatten(X)
class RepeatVector(Layer):
@ -772,8 +863,8 @@ class RepeatVector(Layer):
return K.repeat(X, self.n)
def get_config(self):
config = {"name": self.__class__.__name__,
"n": self.n}
config = {'name': self.__class__.__name__,
'n': self.n}
base_config = super(RepeatVector, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -918,9 +1009,9 @@ class ActivityRegularization(Layer):
return self.get_input(train)
def get_config(self):
config = {"name": self.__class__.__name__,
"l1": self.l1,
"l2": self.l2}
config = {'name': self.__class__.__name__,
'l1': self.l1,
'l2': self.l2}
base_config = super(ActivityRegularization, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -995,7 +1086,7 @@ class TimeDistributedDense(MaskedLayer):
input_dim = self.input_shape[2]
self.W = self.init((input_dim, self.output_dim))
self.b = K.zeros((self.output_dim))
self.b = K.zeros((self.output_dim,))
self.params = [self.W, self.b]
self.regularizers = []
@ -1033,17 +1124,17 @@ class TimeDistributedDense(MaskedLayer):
return outputs
def get_config(self):
config = {"name": self.__class__.__name__,
"output_dim": self.output_dim,
"init": self.init.__name__,
"activation": self.activation.__name__,
"W_regularizer": self.W_regularizer.get_config() if self.W_regularizer else None,
"b_regularizer": self.b_regularizer.get_config() if self.b_regularizer else None,
"activity_regularizer": self.activity_regularizer.get_config() if self.activity_regularizer else None,
"W_constraint": self.W_constraint.get_config() if self.W_constraint else None,
"b_constraint": self.b_constraint.get_config() if self.b_constraint else None,
"input_dim": self.input_dim,
"input_length": self.input_length}
config = {'name': self.__class__.__name__,
'output_dim': self.output_dim,
'init': self.init.__name__,
'activation': self.activation.__name__,
'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,
'b_regularizer': self.b_regularizer.get_config() if self.b_regularizer else None,
'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,
'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,
'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
'input_dim': self.input_dim,
'input_length': self.input_length}
base_config = super(TimeDistributedDense, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
@ -1090,6 +1181,10 @@ class AutoEncoder(Layer):
self.decoder.set_previous(self.encoder)
if weights is not None:
self.set_weights(weights)
def build(self):
self.params = []
self.regularizers = []
self.constraints = []
@ -1103,11 +1198,9 @@ class AutoEncoder(Layer):
self.params.append(p)
self.constraints.append(c)
if weights is not None:
self.set_weights(weights)
def set_previous(self, node):
self.encoder.set_previous(node)
def set_previous(self, node, connection_map={}):
self.encoder.set_previous(node, connection_map)
super(AutoEncoder, self).set_previous(node, connection_map)
def get_weights(self):
weights = []
@ -1148,10 +1241,10 @@ class AutoEncoder(Layer):
return self.decoder.get_output(train)
def get_config(self):
return {"name": self.__class__.__name__,
"encoder_config": self.encoder.get_config(),
"decoder_config": self.decoder.get_config(),
"output_reconstruction": self.output_reconstruction}
return {'name': self.__class__.__name__,
'encoder_config': self.encoder.get_config(),
'decoder_config': self.decoder.get_config(),
'output_reconstruction': self.output_reconstruction}
class MaxoutDense(Layer):
@ -1275,6 +1368,8 @@ class Lambda(Layer):
if py3:
self.function = marshal.dumps(function.__code__)
else:
assert hasattr(function, 'func_code'), ('The Lambda layer "function"'
' argument must be a Python function.')
self.function = marshal.dumps(function.func_code)
if output_shape is None:
self._output_shape = None
@ -1285,6 +1380,7 @@ class Lambda(Layer):
self._output_shape = marshal.dumps(output_shape.__code__)
else:
self._output_shape = marshal.dumps(output_shape.func_code)
super(Lambda, self).__init__()
@property
def output_shape(self):
@ -1295,18 +1391,16 @@ class Lambda(Layer):
else:
output_shape_func = marshal.loads(self._output_shape)
output_shape_func = types.FunctionType(output_shape_func, globals())
shape = output_shape_func(self.previous.output_shape)
shape = output_shape_func(self.input_shape)
if type(shape) not in {list, tuple}:
raise Exception("output_shape function must return a tuple")
raise Exception('output_shape function must return a tuple')
return tuple(shape)
def get_output(self, train=False):
X = self.get_input(train)
func = marshal.loads(self.function)
func = types.FunctionType(func, globals())
if hasattr(self, 'previous'):
return func(self.previous.get_output(train))
else:
return func(self.input)
return func(X)
class MaskedLambda(MaskedLayer, Lambda):
@ -1330,7 +1424,7 @@ class LambdaMerge(Lambda):
def __init__(self, layers, function, output_shape=None):
if len(layers) < 2:
raise Exception('Please specify two or more input layers '
'(or containers) to merge')
'(or containers) to merge.')
self.layers = layers
self.params = []
self.regularizers = []
@ -1359,6 +1453,7 @@ class LambdaMerge(Lambda):
self._output_shape = marshal.dumps(output_shape.__code__)
else:
self._output_shape = marshal.dumps(output_shape.func_code)
super(Lambda, self).__init__()
@property
def output_shape(self):
@ -1372,7 +1467,7 @@ class LambdaMerge(Lambda):
output_shape_func = types.FunctionType(output_shape_func, globals())
shape = output_shape_func(input_shapes)
if type(shape) not in {list, tuple}:
raise Exception('output_shape function must return a tuple')
raise Exception('output_shape function must return a tuple.')
return tuple(shape)
def get_params(self):
@ -1442,29 +1537,32 @@ class Siamese(Layer):
merge_mode: Same meaning as `mode` argument of Merge layer
concat_axis: Same meaning as `concat_axis` argument of Merge layer
dot_axes: Same meaning as `dot_axes` argument of Merge layer
is_graph: Should be set to True when used inside `Graph`
'''
def __init__(self, layer, inputs, merge_mode='concat',
concat_axis=1, dot_axes=-1):
concat_axis=1, dot_axes=-1, is_graph=False):
if merge_mode not in ['sum', 'mul', 'concat', 'ave',
'join', 'cos', 'dot', None]:
raise Exception('Invalid merge mode: ' + str(merge_mode))
if merge_mode in {'cos', 'dot'}:
if len(inputs) > 2:
raise Exception(merge_mode + ' merge takes exactly 2 layers')
raise Exception(merge_mode + ' merge takes exactly 2 layers.')
self.layer = layer
self.trainable = layer.trainable
self.is_graph = is_graph
self.inputs = inputs
self.params = []
self.layer.set_previous(inputs[0])
self.merge_mode = merge_mode
self.concat_axis = concat_axis
self.dot_axes = dot_axes
layer.set_previous(inputs[0])
self.params = []
self.regularizers = []
self.constraints = []
self.updates = []
layers = [layer]
if merge_mode:
if merge_mode and not is_graph:
layers += inputs
for l in layers:
params, regs, consts, updates = l.get_params()
@ -1475,6 +1573,7 @@ class Siamese(Layer):
if p not in self.params:
self.params.append(p)
self.constraints.append(c)
super(Siamese, self).__init__()
@property
def output_shape(self):
@ -1512,15 +1611,18 @@ class Siamese(Layer):
def get_params(self):
return self.params, self.regularizers, self.constraints, self.updates
def set_layer_input(self, index):
l = self.layer
while not hasattr(l, 'previous'):
l = l.layers[0]
l.previous = self.inputs[index]
def set_layer_input(self, head):
layer = self.layer
from ..layers.containers import Sequential
while issubclass(layer.__class__, Sequential):
layer = layer.layers[0]
layer.previous = self.inputs[head]
def get_output_at(self, head, train=False):
self.set_layer_input(head)
return self.layer.get_output(train)
X = self.inputs[head].get_output(train)
mask = self.inputs[head].get_output_mask(train)
Y = self.layer(X, mask)
return Y
def get_output_shape(self, head, train=False):
self.set_layer_input(head)
@ -1532,7 +1634,7 @@ class Siamese(Layer):
X = self.get_output_at(i, train)
if X.name is None:
raise ValueError('merge_mode="join" '
'only works with named inputs')
'only works with named inputs.')
o[X.name] = X
return o
@ -1621,7 +1723,7 @@ class Siamese(Layer):
def get_weights(self):
weights = self.layer.get_weights()
if self.merge_mode:
if self.merge_mode and not self.is_graph:
for m in self.inputs:
weights += m.get_weights()
return weights
@ -1630,7 +1732,7 @@ class Siamese(Layer):
nb_param = len(self.layer.params)
self.layer.set_weights(weights[:nb_param])
weights = weights[nb_param:]
if self.merge_mode:
if self.merge_mode and not self.is_graph:
for i in range(len(self.inputs)):
nb_param = len(self.inputs[i].params)
self.inputs[i].set_weights(weights[:nb_param])
@ -1642,17 +1744,18 @@ class Siamese(Layer):
'inputs': [m.get_config() for m in self.inputs],
'merge_mode': self.merge_mode,
'concat_axis': self.concat_axis,
'dot_axes': self.dot_axes}
'dot_axes': self.dot_axes,
'is_graph': self.is_graph}
base_config = super(Siamese, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
class SiameseHead(Layer):
'''This layer should be added only on top of a Siamese layer
with merge_mode = None
with merge_mode = None.
Outputs the output of the Siamese layer at a given index,
specified by the head argument
specified by the head argument.
# Arguments
head: The index at which the output of the Siamese layer
@ -1661,6 +1764,7 @@ class SiameseHead(Layer):
def __init__(self, head):
self.head = head
self.params = []
super(SiameseHead, self).__init__()
def get_output(self, train=False):
return self.get_input(train)
@ -1686,7 +1790,7 @@ class SiameseHead(Layer):
def add_shared_layer(layer, inputs):
'''Use this function to add a shared layer across
multiple Sequential models without merging the outputs
multiple Sequential models without merging the outputs.
'''
input_layers = [l.layers[-1] for l in inputs]
s = Siamese(layer, input_layers, merge_mode=None)
@ -1694,3 +1798,126 @@ def add_shared_layer(layer, inputs):
sh = SiameseHead(i)
inputs[i].add(s)
inputs[i].add(sh)
class Highway(Layer):
'''Densely connected highway network,
a natural extension of LSTMs to feedforward networks.
# Input shape
2D tensor with shape: `(nb_samples, input_dim)`.
# Output shape
2D tensor with shape: `(nb_samples, input_dim)`.
# Arguments
init: name of initialization function for the weights of the layer
(see [initializations](../initializations.md)),
or alternatively, Theano function to use for weights
initialization. This parameter is only relevant
if you don't pass a `weights` argument.
transform_bias: value for the bias to take on initially (default -2)
activation: name of activation function to use
(see [activations](../activations.md)),
or alternatively, elementwise Theano function.
If you don't specify anything, no activation is applied
(ie. "linear" activation: a(x) = x).
weights: list of numpy arrays to set as initial weights.
The list should have 4 elements, matching the layer's params
`[W, b, W_carry, b_carry]` (see `build` below).
W_regularizer: instance of [WeightRegularizer](../regularizers.md)
(eg. L1 or L2 regularization), applied to the main weights matrix.
b_regularizer: instance of [WeightRegularizer](../regularizers.md),
applied to the bias.
activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),
applied to the network output.
W_constraint: instance of the [constraints](../constraints.md) module
(eg. maxnorm, nonneg), applied to the main weights matrix.
b_constraint: instance of the [constraints](../constraints.md) module,
applied to the bias.
input_dim: dimensionality of the input (integer).
This argument (or alternatively, the keyword argument `input_shape`)
is required when using this layer as the first layer in a model.
# References
- [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf)
'''
input_ndim = 2
def __init__(self, init='glorot_uniform', transform_bias=-2,
activation='linear', weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None, input_dim=None, **kwargs):
self.init = initializations.get(init)
self.transform_bias = transform_bias
self.activation = activations.get(activation)
self.W_regularizer = regularizers.get(W_regularizer)
self.b_regularizer = regularizers.get(b_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.W_constraint = constraints.get(W_constraint)
self.b_constraint = constraints.get(b_constraint)
self.constraints = [self.W_constraint, self.b_constraint]
self.initial_weights = weights
self.input_dim = input_dim
if self.input_dim:
kwargs['input_shape'] = (self.input_dim,)
self.input = K.placeholder(ndim=2)
super(Highway, self).__init__(**kwargs)
def build(self):
input_dim = self.input_shape[1]
self.W = self.init((input_dim, input_dim))
self.W_carry = self.init((input_dim, input_dim))
self.b = K.zeros((input_dim,))
# initialize with a vector of values `transform_bias`
self.b_carry = K.variable(np.ones((input_dim,)) * self.transform_bias)
self.params = [self.W, self.b, self.W_carry, self.b_carry]
self.regularizers = []
if self.W_regularizer:
self.W_regularizer.set_param(self.W)
self.regularizers.append(self.W_regularizer)
if self.b_regularizer:
self.b_regularizer.set_param(self.b)
self.regularizers.append(self.b_regularizer)
if self.activity_regularizer:
self.activity_regularizer.set_layer(self)
self.regularizers.append(self.activity_regularizer)
if self.initial_weights is not None:
self.set_weights(self.initial_weights)
del self.initial_weights
@property
def output_shape(self):
return (self.input_shape[0], self.input_shape[1])
def get_output(self, train=False):
X = self.get_input(train)
transform_weight = activations.sigmoid(K.dot(X, self.W_carry) + self.b_carry)
act = self.activation(K.dot(X, self.W) + self.b)
act *= transform_weight
output = act + (1 - transform_weight) * X
return output
def get_config(self):
config = {'name': self.__class__.__name__,
'init': self.init.__name__,
'transform_bias': self.transform_bias,
'activation': self.activation.__name__,
'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,
'b_regularizer': self.b_regularizer.get_config() if self.b_regularizer else None,
'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,
'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,
'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
'input_dim': self.input_dim}
base_config = super(Highway, self).get_config()
return dict(list(base_config.items()) + list(config.items()))

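A hedged sketch of stacking the new `Highway` layer; the layer sizes and surrounding model are illustrative assumptions (the highway block preserves its input width).

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Highway

model = Sequential()
model.add(Dense(64, input_dim=100))
# output width equals input width (64); carry gate bias starts at transform_bias = -2
model.add(Highway(activation='relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```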
@ -8,10 +8,10 @@ from ..constraints import unitnorm
class Embedding(Layer):
'''Turn positive integers (indexes) into denses vectors of fixed size.
'''Turn positive integers (indexes) into dense vectors of fixed size.
eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
This layer can only be used as the first layer in a model.
This layer can only be used as the first layer in a model.
# Input shape
2D tensor with shape: `(nb_samples, sequence_length)`.
@ -38,7 +38,7 @@ class Embedding(Layer):
This is useful for [recurrent layers](recurrent.md) which may take
variable length input. If this is `True` then all subsequent layers
in the model need to support masking or an exception will be raised.
input_length: Length of input sequences, when it is constantself.
input_length: Length of input sequences, when it is constant.
This argument is required if you are going to connect
`Flatten` then `Dense` layers upstream
(without it, the shape of the dense outputs cannot be computed).

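A short sketch of why `input_length` matters, as the corrected docstring above notes: without it, a downstream `Flatten`/`Dense` cannot infer its output shape. The vocabulary and sizes are illustrative.

```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.core import Flatten, Dense

model = Sequential()
# 10000-word vocabulary, 128-dim vectors, sequences of exactly 20 indices
model.add(Embedding(10000, 128, input_length=20))
model.add(Flatten())   # flattened size (20 * 128) is only known thanks to input_length
model.add(Dense(1))
```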
@ -4,7 +4,9 @@ from .. import backend as K
class BatchNormalization(Layer):
'''Normalize the activations of the previous layer at each batch.
'''Normalize the activations of the previous layer at each batch,
i.e. applies a transformation that maintains the mean activation
close to 0 and the activation standard deviation close to 1.
# Input shape
Arbitrary. Use the keyword argument `input_shape`
@ -18,7 +20,13 @@ class BatchNormalization(Layer):
epsilon: small float > 0. Fuzz parameter.
mode: integer, 0 or 1.
- 0: feature-wise normalization.
- 1: sample-wise normalization.
If the input has multiple feature dimensions,
each will be normalized separately
(e.g. for an image input with shape
`(channels, rows, cols)`,
each combination of a channel, row and column
will be normalized separately).
- 1: sample-wise normalization. This mode assumes a 2D input.
momentum: momentum in the computation of the
exponential average of the mean and standard deviation
of the data, for feature-wise normalization.
@ -42,22 +50,12 @@ class BatchNormalization(Layer):
input_shape = self.input_shape # starts with samples axis
input_shape = input_shape[1:]
self.gamma = self.init((input_shape))
self.gamma = self.init(input_shape)
self.beta = K.zeros(input_shape)
self.params = [self.gamma, self.beta]
self.running_mean = K.zeros(input_shape)
self.running_std = K.ones((input_shape))
# initialize self.updates: batch mean/std computation
X = self.get_input(train=True)
m = K.mean(X, axis=0)
std = K.mean(K.square(X - m) + self.epsilon, axis=0)
std = K.sqrt(std)
mean_update = self.momentum * self.running_mean + (1-self.momentum) * m
std_update = self.momentum * self.running_std + (1-self.momentum) * std
self.updates = [(self.running_mean, mean_update),
(self.running_std, std_update)]
self.running_std = K.ones(input_shape)
if self.initial_weights is not None:
self.set_weights(self.initial_weights)
@ -76,6 +74,13 @@ class BatchNormalization(Layer):
def get_output(self, train):
X = self.get_input(train)
if self.mode == 0:
m = K.mean(X, axis=0)
std = K.mean(K.square(X - m) + self.epsilon, axis=0)
std = K.sqrt(std)
mean_update = self.momentum * self.running_mean + (1-self.momentum) * m
std_update = self.momentum * self.running_std + (1-self.momentum) * std
self.updates = [(self.running_mean, mean_update),
(self.running_std, std_update)]
X_normed = ((X - self.running_mean) /
(self.running_std + self.epsilon))
elif self.mode == 1:

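A minimal sketch of mode 0 (feature-wise) normalization, whose running mean/std updates now live in `get_output` as shown above; the surrounding model is an assumption.

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Dense(64, input_dim=20))
# mode=0: each of the 64 features is normalized with its own running mean/std
model.add(BatchNormalization(mode=0))
model.add(Activation('relu'))
model.add(Dense(1))
model.compile(optimizer='sgd', loss='mse')
```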
@ -30,7 +30,7 @@ class Recurrent(MaskedLayer):
return_sequences: Boolean. Whether to return the last output
in the output sequence, or the full sequence.
go_backwards: Boolean (default False).
If True, rocess the input sequence backwards.
If True, process the input sequence backwards.
stateful: Boolean (default False). If True, the last state
for each sample at index i in a batch will be used as initial
state for the sample of index i in the following batch.
@ -43,7 +43,7 @@ class Recurrent(MaskedLayer):
`Flatten` then `Dense` layers upstream
(without it, the shape of the dense outputs cannot be computed).
Note that if the recurrent layer is not the first layer
in your model, you would need to specify the input Length
in your model, you would need to specify the input length
at the level of the first layer
(e.g. via the `input_shape` argument)
@ -73,7 +73,7 @@ class Recurrent(MaskedLayer):
To enable statefulness:
- specify `stateful=True` in the layer constructor.
- specify a fixed batch size for your model, by passing
a `batch_input_size=(...)` to the first layer in your model.
a `batch_input_shape=(...)` to the first layer in your model.
This is the expected shape of your inputs *including the batch size*.
It should be a tuple of integers, e.g. `(32, 10, 100)`.
@ -129,7 +129,7 @@ class Recurrent(MaskedLayer):
if K._BACKEND == 'tensorflow':
if not self.input_shape[1]:
raise Exception('When using TensorFlow, you should define ' +
'explicitely the number of timesteps of ' +
'explicitly the number of timesteps of ' +
'your sequences. Make sure the first layer ' +
'has a "batch_input_shape" argument ' +
'including the samples axis.')
@ -205,7 +205,7 @@ class SimpleRNN(Recurrent):
self.W = self.init((input_dim, self.output_dim))
self.U = self.inner_init((self.output_dim, self.output_dim))
self.b = K.zeros((self.output_dim))
self.b = K.zeros((self.output_dim,))
self.params = [self.W, self.U, self.b]
if self.initial_weights is not None:
@ -326,7 +326,7 @@ class GRU(Recurrent):
z = self.inner_activation(x_z + K.dot(h_tm1, self.U_z))
r = self.inner_activation(x_r + K.dot(h_tm1, self.U_r))
hh = self.inner_activation(x_h + K.dot(r * h_tm1, self.U_h))
hh = self.activation(x_h + K.dot(r * h_tm1, self.U_h))
h = z * h_tm1 + (1 - z) * hh
return h, [h]
@ -391,19 +391,19 @@ class LSTM(Recurrent):
self.W_i = self.init((input_dim, self.output_dim))
self.U_i = self.inner_init((self.output_dim, self.output_dim))
self.b_i = K.zeros((self.output_dim))
self.b_i = K.zeros((self.output_dim,))
self.W_f = self.init((input_dim, self.output_dim))
self.U_f = self.inner_init((self.output_dim, self.output_dim))
self.b_f = self.forget_bias_init((self.output_dim))
self.b_f = self.forget_bias_init((self.output_dim,))
self.W_c = self.init((input_dim, self.output_dim))
self.U_c = self.inner_init((self.output_dim, self.output_dim))
self.b_c = K.zeros((self.output_dim))
self.b_c = K.zeros((self.output_dim,))
self.W_o = self.init((input_dim, self.output_dim))
self.U_o = self.inner_init((self.output_dim, self.output_dim))
self.b_o = K.zeros((self.output_dim))
self.b_o = K.zeros((self.output_dim,))
self.params = [self.W_i, self.U_i, self.b_i,
self.W_c, self.U_c, self.b_c,

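A sketch of the stateful setup described in the docstring above: a fixed batch size via `batch_input_shape`, with state carried across batches. The sizes are illustrative.

```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense

model = Sequential()
# batch size 32, 10 timesteps, 100 features per step; state persists across batches
model.add(LSTM(128, batch_input_shape=(32, 10, 100), stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
# each stateful recurrent layer exposes a reset_states() method to clear the carried state
```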
@ -5,6 +5,12 @@ import warnings
import pprint
from six.moves import range
import six
import time
import threading
try:
import queue
except ImportError:
import Queue as queue
from . import backend as K
from . import optimizers
@ -53,7 +59,7 @@ def slice_X(X, start=None, stop=None):
'''
if type(X) == list:
if hasattr(start, '__len__'):
# hdf5 dataset only support list object as indices
# hdf5 datasets only support list objects as indices
if hasattr(start, 'shape'):
start = start.tolist()
return [x[start] for x in X]
@ -75,10 +81,12 @@ def weighted_objective(fn):
# score_array has ndim >= 2
score_array = fn(y_true, y_pred)
if mask is not None:
# Cast the mask to floatX to avoid float64 upcasting in theano
mask = K.cast(mask, K.floatx())
# mask should have the same shape as score_array
score_array *= mask
# the loss per batch should be proportional
# to the number of unmasked sampled.
# to the number of unmasked samples.
score_array /= K.mean(mask)
# reduce score_array to 1D
@ -148,6 +156,16 @@ def model_from_config(config, custom_objects={}):
if 'optimizer' in config:
# if it has an optimizer, the model is assumed to be compiled
loss = config.get('loss')
# if a custom loss function is passed replace it in loss
if model_name == 'Graph':
for l in loss:
for c in custom_objects:
if loss[l] == c:
loss[l] = custom_objects[c]
elif model_name == 'Sequential' and loss in custom_objects:
loss = custom_objects[loss]
class_mode = config.get('class_mode')
optimizer_params = dict([(k, v) for k, v in config.get('optimizer').items()])
@ -179,6 +197,8 @@ class Model(object):
Abstract fit function for f(ins).
Assume that f returns a list, labelled by out_labels.
'''
self.training_data = ins
self.validation_data = val_ins
do_validation = False
if val_f and val_ins:
do_validation = True
@ -360,8 +380,21 @@ class Model(object):
`keras.models.from_json(json_string, custom_objects={})`.
'''
import json
def get_json_type(obj):
# if obj is any numpy type
if type(obj).__module__ == np.__name__:
return obj.item()
# if obj is a python 'type'
if type(obj).__name__ == type.__name__:
return obj.__name__
raise TypeError('Not JSON Serializable')
config = self.get_config()
return json.dumps(config, **kwargs)
return json.dumps(config, default=get_json_type, **kwargs)
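
A round-trip sketch showing what `get_json_type` buys: configs containing numpy scalars now serialize cleanly. This assumes a `model_from_json` helper in `keras.models` (the docstring above refers to it as `from_json`).

```python
from keras.models import Sequential, model_from_json
from keras.layers.core import Dense

model = Sequential()
model.add(Dense(2, input_dim=4))

json_string = model.to_json()            # numpy scalars in the config no longer break json.dumps
rebuilt = model_from_json(json_string)   # same architecture, freshly initialized weights
```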
def summary(self):
'''Print out a summary of the model architecture,
@ -391,7 +424,7 @@ class Sequential(Model, containers.Sequential):
self.optimizer = optimizers.get(optimizer)
self.loss = objectives.get(loss)
weighted_loss = weighted_objective(objectives.get(loss))
weighted_loss = weighted_objective(self.loss)
# input of model
self.X_train = self.get_input(train=True)
@ -445,15 +478,15 @@ class Sequential(Model, containers.Sequential):
self._train = K.function(train_ins, [train_loss], updates=updates)
self._train_with_acc = K.function(train_ins, [train_loss, train_accuracy], updates=updates)
self._predict = K.function(predict_ins, [self.y_test], updates=self.state_updates)
self._test = K.function(test_ins, [test_loss])
self._test_with_acc = K.function(test_ins, [test_loss, test_accuracy])
self._test = K.function(test_ins, [test_loss], updates=self.state_updates)
self._test_with_acc = K.function(test_ins, [test_loss, test_accuracy], updates=self.state_updates)
def fit(self, X, y, batch_size=128, nb_epoch=100, verbose=1, callbacks=[],
validation_split=0., validation_data=None, shuffle=True,
show_accuracy=False, class_weight=None, sample_weight=None):
'''Train the model for a fixed number of epochs.
Returns a history object. It `history` attribute is a record of
Returns a history object. Its `history` attribute is a record of
training loss values at successive epochs,
as well as validation loss values (if applicable).
@ -490,6 +523,20 @@ class Sequential(Model, containers.Sequential):
output timesteps, which is useful
in sequence to sequence learning.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
@ -503,11 +550,20 @@ class Sequential(Model, containers.Sequential):
if validation_data:
if len(validation_data) == 2:
X_val, y_val = validation_data
if type(X_val) == list:
assert len(set([len(a) for a in X_val] + [len(y_val)])) == 1
else:
assert len(X_val) == len(y_val)
X_val = standardize_X(X_val)
y_val = standardize_y(y_val)
sample_weight_val = standardize_weights(y_val)
elif len(validation_data) == 3:
X_val, y_val, sample_weight_val = validation_data
if type(X_val) == list:
assert len(set([len(a) for a in X_val] +
[len(y_val), len(sample_weight_val)])) == 1
else:
assert len(X_val) == len(y_val) == len(sample_weight_val)
X_val = standardize_X(X_val)
y_val = standardize_y(y_val)
sample_weight_val = standardize_weights(y_val,
@ -611,6 +667,20 @@ class Sequential(Model, containers.Sequential):
verbose: verbosity mode, 0 or 1.
sample_weight: sample weights, as a numpy array.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, sample_weight=sample_weight)
@ -635,6 +705,20 @@ class Sequential(Model, containers.Sequential):
Arguments: see `fit` method.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, class_weight=class_weight,
@ -651,6 +735,20 @@ class Sequential(Model, containers.Sequential):
Arguments: see `fit` method.
'''
if type(X) == list:
if len(set([len(a) for a in X] + [len(y)])) != 1:
raise Exception('All input arrays and the target array must '
'have the same number of samples.')
else:
if len(X) != len(y):
raise Exception('The input data tensor (X) and '
'the target tensor (y) must have '
'the same number of samples. Found: '
'len(X) = {}, len(y) = {}'.format(len(X), len(y)))
if sample_weight is not None:
assert len(sample_weight) == len(y), ('"sample_weight" must have '
'the same number of samples '
'as X and y.')
X = standardize_X(X)
y = standardize_y(y)
sample_weight = standardize_weights(y, sample_weight=sample_weight)
@ -713,6 +811,208 @@ class Sequential(Model, containers.Sequential):
self.layers[k].set_weights(weights)
f.close()
def fit_generator(self, generator, samples_per_epoch, nb_epoch,
verbose=1, show_accuracy=False, callbacks=[],
validation_data=None, class_weight=None, nb_worker=1):
'''Fit a model on data generated batch-by-batch by a Python generator.
The generator is run in parallel to the model, for efficiency,
and can be run by multiple workers at the same time.
For instance, this allows you to do real-time data augmentation
on images on CPU in parallel to training your model on GPU.
# Arguments
generator: a Python generator,
yielding either (X, y) or (X, y, sample_weight).
The generator is expected to loop over its data
indefinitely. An epoch finishes when `samples_per_epoch`
samples have been seen by the model.
The output of the generator must be a tuple of either 2 or 3
numpy arrays.
If the output tuple has two elements, they are assumed to be
(input_data, target_data).
If it has three elements, they are assumed to be
(input_data, target_data, sample_weight).
All arrays should contain the same number of samples.
samples_per_epoch: integer, number of samples to process before
starting a new epoch.
nb_epoch: integer, total number of iterations on the data.
verbose: verbosity mode, 0, 1, or 2.
show_accuracy: boolean. Whether to display accuracy (only relevant
for classification problems).
callbacks: list of callbacks to be called during training.
validation_data: tuple of 2 or 3 numpy arrays. If 2 elements,
they are assumed to be (input_data, target_data);
if 3 elements, they are assumed to be
(input_data, target_data, sample weights).
class_weight: dictionary mapping class indices to a weight
for the class.
nb_worker: integer, number of workers to use for running
the generator (in parallel to model training).
If using multiple workers, the processing order of batches
yielded by the generator will be non-deterministic.
If using multiple workers, make sure to protect
any thread-unsafe operation done by the generator
using a Python mutex.
# Returns
A `History` object.
# Examples
```python
def generate_arrays_from_file(path):
while 1:
f = open(path)
for line in f:
# create numpy arrays of input data
# and labels, from each line in the file
x, y = process_line(line)
yield x, y
f.close()
model.fit_generator(generate_arrays_from_file('/my_file.txt'),
samples_per_epoch=10000, nb_epoch=10)
```
'''
max_queue_size = 10 # maximum number of batches in queue
wait_time = 0.05 # in seconds
epoch = 0
do_validation = bool(validation_data)
if show_accuracy:
out_labels = ['loss', 'acc']
else:
out_labels = ['loss']
metrics = ['loss', 'acc', 'val_loss', 'val_acc']
# prepare callbacks
history = cbks.History()
if verbose:
callbacks = [history, cbks.BaseLogger()] + callbacks
else:
callbacks = [history] + callbacks
callbacks = cbks.CallbackList(callbacks)
callbacks._set_model(self)
callbacks._set_params({
'nb_epoch': nb_epoch,
'nb_sample': samples_per_epoch,
'verbose': verbose,
'do_validation': do_validation,
'metrics': metrics,
})
callbacks.on_train_begin()
# util function to validate the batches produced
# by the generator
def input_validation(generator_output):
if not hasattr(generator_output, '__len__'):
_stop.set()
raise Exception('The generator output must be a tuple.')
if len(generator_output) == 2:
X, y = generator_output
if type(X) == list:
assert len(set([len(a) for a in X] + [len(y)])) == 1
else:
assert len(X) == len(y)
sample_weight = None
elif len(generator_output) == 3:
X, y, sample_weight = generator_output
if type(X) == list:
assert len(set([len(a) for a in X] + [len(y), len(sample_weight)])) == 1
else:
assert len(X) == len(y) == len(sample_weight)
else:
_stop.set()
raise Exception('The generator output tuple must have '
'2 or 3 elements.')
return X, y, sample_weight
# start generator thread storing batches into a queue
generator_queue = queue.Queue()
_stop = threading.Event()
def generator_task():
i = 0
while not _stop.is_set():
try:
if generator_queue.qsize() < max_queue_size:
generator_output = next(generator)
generator_queue.put(generator_output)
i += 1
else:
time.sleep(wait_time)
except:
_stop.set()
return
generator_threads = [threading.Thread(target=generator_task) for _ in range(nb_worker)]
for thread in generator_threads:
thread.start()
self.stop_training = False
while epoch < nb_epoch:
callbacks.on_epoch_begin(epoch)
samples_seen = 0
batch_index = 0
while samples_seen < samples_per_epoch:
while not _stop.is_set():
if not generator_queue.empty():
generator_output = generator_queue.get()
break
else:
time.sleep(wait_time)
X, y, sample_weight = input_validation(generator_output)
batch_logs = {}
batch_size = len(X[0])
batch_logs['batch'] = batch_index
batch_logs['size'] = batch_size
callbacks.on_batch_begin(batch_index, batch_logs)
outs = self.train_on_batch(X, y,
accuracy=show_accuracy,
sample_weight=sample_weight,
class_weight=class_weight)
if type(outs) != list:
outs = [outs]
for l, o in zip(out_labels, outs):
batch_logs[l] = o
callbacks.on_batch_end(batch_index, batch_logs)
# construct epoch logs
epoch_logs = {}
batch_index += 1
samples_seen += batch_size
if samples_seen >= samples_per_epoch: # epoch finished
if do_validation:
if hasattr(validation_data, 'next'):
# assumed to be generator
# TODO: call self.evaluate_generator()
_stop.set()
raise NotImplementedError()
else:
# input validation
X, y, sample_weight = input_validation(validation_data)
val_outs = self.evaluate(X, y,
show_accuracy=show_accuracy,
sample_weight=sample_weight,
verbose=0)
if type(val_outs) != list:
val_outs = [val_outs]
# same labels assumed
for l, o in zip(out_labels, val_outs):
epoch_logs['val_' + l] = o
callbacks.on_epoch_end(epoch, epoch_logs)
epoch += 1
if self.stop_training:
break
_stop.set()
callbacks.on_train_end()
return history
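
As the docstring notes, with `nb_worker > 1` the generator is pulled from several threads, so any shared state inside it must be protected. A minimal sketch of one way to do that; `X_train`, `y_train` and the commented call are assumptions for illustration.

```python
import threading
import numpy as np

lock = threading.Lock()

def thread_safe_batches(X, y, batch_size=32):
    """Yield (X_batch, y_batch) forever, guarding the shared cursor with a mutex."""
    i = 0
    while 1:
        with lock:
            start = i
            i = (i + batch_size) % len(X)
        yield X[start:start + batch_size], y[start:start + batch_size]

# model.fit_generator(thread_safe_batches(X_train, y_train),
#                     samples_per_epoch=len(X_train), nb_epoch=10, nb_worker=2)
```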
class Graph(Model, containers.Graph):
'''Arbitrary connection graph.
@ -774,7 +1074,7 @@ class Graph(Model, containers.Graph):
self.loss = loss
self._train = K.function(train_ins, [train_loss], updates=updates)
self._test = K.function(test_ins, [test_loss])
self._test = K.function(test_ins, [test_loss], updates=self.state_updates)
self._predict = K.function(inputs=ins, outputs=ys_test,
updates=self.state_updates)
@ -783,7 +1083,7 @@ class Graph(Model, containers.Graph):
class_weight={}, sample_weight={}):
'''Train the model for a fixed number of epochs.
Returns a history object. It `history` attribute is a record of
Returns a history object. Its `history` attribute is a record of
training loss values at successive epochs,
as well as validation loss values (if applicable).
@ -812,6 +1112,9 @@ class Graph(Model, containers.Graph):
'''
X = [data[name] for name in self.input_order]
y = [standardize_y(data[name]) for name in self.output_order]
if len(set([len(a) for a in X] + [len(a) for a in y])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
sample_weight_list = [standardize_weights(y[i],
sample_weight=sample_weight.get(self.output_order[i])) for i in range(len(self.output_order))]
@ -856,8 +1159,10 @@ class Graph(Model, containers.Graph):
'''
sample_weight = [standardize_weights(data[name],
sample_weight=sample_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._test_loop(self._test, ins, batch_size, verbose)
return outs[0]
@ -868,6 +1173,9 @@ class Graph(Model, containers.Graph):
Arguments: see `fit` method.
'''
ins = [data[name] for name in self.input_order]
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._predict_loop(self._predict, ins, batch_size, verbose)
return dict(zip(self.output_order, outs))
@ -880,6 +1188,9 @@ class Graph(Model, containers.Graph):
sample_weight=sample_weight.get(name),
class_weight=class_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return self._train(ins)
def test_on_batch(self, data, sample_weight={}):
@ -890,13 +1201,20 @@ class Graph(Model, containers.Graph):
sample_weight = [standardize_weights(data[name],
sample_weight=sample_weight.get(name)) for name in self.output_order]
ins = [data[name] for name in self.input_order] + [standardize_y(data[name]) for name in self.output_order] + sample_weight
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return self._test(ins)
def predict_on_batch(self, data):
'''Generate predictions for a single batch of samples.
'''
ins = [data[name] for name in self.input_order]
return self._predict(ins)
if len(set([len(a) for a in ins])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
outs = self._predict(ins)
return dict(zip(self.output_order, outs))
def save_weights(self, filepath, overwrite=False):
'''Save weights from all layers to an HDF5 file.
@ -938,3 +1256,198 @@ class Graph(Model, containers.Graph):
weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
self.set_weights(weights)
f.close()
def fit_generator(self, generator, samples_per_epoch, nb_epoch,
verbose=1, callbacks=[],
validation_data=None, class_weight={}, nb_worker=1):
'''Fit a model on data generated batch-by-batch by a Python generator.
The generator is run in parallel to the model, for efficiency,
and can be run by multiple workers at the same time.
For instance, this allows you to do real-time data augmentation
on images on CPU in parallel to training your model on GPU.
# Arguments
generator: a generator.
The output of the generator must be either a dictionary
mapping inputs and outputs names to numpy arrays, or
a tuple of dictionaries (input_data, sample_weight).
All arrays should contain the same number of samples.
The generator is expected to loop over its data
indefinitely. An epoch finishes when `samples_per_epoch`
samples have been seen by the model.
samples_per_epoch: integer, number of samples to process before
going to the next epoch.
nb_epoch: integer, total number of iterations on the data.
verbose: verbosity mode, 0, 1, or 2.
callbacks: list of callbacks to be called during training.
validation_data: dictionary mapping input names and outputs names
to appropriate numpy arrays to be used as
held-out validation data.
All arrays should contain the same number of samples.
class_weight: dictionary mapping class indices to a weight
for the class.
nb_worker: integer, number of workers to use for running
the generator (in parallel to model training).
If using multiple workers, the processing order of batches
generated by the generator will be non-deterministic.
If using multiple workers, make sure to protect
any thread-unsafe operation done by the generator
using a Python mutex.
# Returns
A `History` object.
# Examples
```python
def generate_arrays_from_file(path):
while 1:
f = open(path)
for line in f:
# create numpy arrays of input data
# and labels, from each line in the file
x1, x2, y = process_line(line)
yield {'input_1': x1, 'input_2': x2, 'output': y}
f.close()
graph.fit_generator(generate_arrays_from_file('/my_file.txt'),
samples_per_epoch=10000, nb_epoch=10)
```
'''
max_queue_size = 10 # maximum number of batches in queue
wait_time = 0.05 # in seconds
epoch = 0
do_validation = bool(validation_data)
out_labels = ['loss']
metrics = ['loss', 'val_loss']
if not class_weight:
class_weight = {}
# prepare callbacks
history = cbks.History()
if verbose:
callbacks = [history, cbks.BaseLogger()] + callbacks
else:
callbacks = [history] + callbacks
callbacks = cbks.CallbackList(callbacks)
callbacks._set_model(self)
callbacks._set_params({
'nb_epoch': nb_epoch,
'nb_sample': samples_per_epoch,
'verbose': verbose,
'do_validation': do_validation,
'metrics': metrics,
})
callbacks.on_train_begin()
# util function to validate the batches produced
# by the generator
def input_validation(generator_output):
if type(generator_output) in [list, tuple]:
if len(generator_output) == 2:
data, sample_weight = generator_output
else:
_stop.set()
raise Exception('The generator output tuple must have '
'2 dictionary elements: '
'(data, sample_weight).')
elif type(generator_output) == dict:
data = generator_output
sample_weight = {}
else:
_stop.set()
raise Exception('The generator output must be '
'a data dictionary or a tuple '
'(data, sample_weight).')
assert type(data) == dict
assert type(sample_weight) == dict
if len(set([len(data[name]) for name in data.keys()] +
[len(sample_weight[name]) for name in sample_weight.keys()])) != 1:
raise Exception('All input arrays and target arrays must have '
'the same number of samples.')
return data, sample_weight
# start generator thread storing batches into a queue
generator_queue = queue.Queue()
_stop = threading.Event()
def generator_task():
i = 0
while not _stop.is_set():
try:
if generator_queue.qsize() < max_queue_size:
generator_output = next(generator)
generator_queue.put(generator_output)
i += 1
else:
time.sleep(wait_time)
except:
_stop.set()
return
generator_threads = [threading.Thread(target=generator_task) for _ in range(nb_worker)]
for thread in generator_threads:
thread.start()
self.stop_training = False
while epoch < nb_epoch:
callbacks.on_epoch_begin(epoch)
samples_seen = 0
batch_index = 0
while samples_seen < samples_per_epoch:
while not _stop.is_set():
if not generator_queue.empty():
generator_output = generator_queue.get()
break
else:
time.sleep(wait_time)
data, sample_weight = input_validation(generator_output)
batch_logs = {}
batch_size = len(data[list(data.keys())[0]])
batch_logs['batch'] = batch_index
batch_logs['size'] = batch_size
callbacks.on_batch_begin(batch_index, batch_logs)
outs = self.train_on_batch(data,
sample_weight=sample_weight,
class_weight=class_weight)
if type(outs) != list:
outs = [outs]
for l, o in zip(out_labels, outs):
batch_logs[l] = o
callbacks.on_batch_end(batch_index, batch_logs)
# construct epoch logs
epoch_logs = {}
batch_index += 1
samples_seen += batch_size
if samples_seen >= samples_per_epoch: # epoch finished
if do_validation:
if hasattr(validation_data, 'next'):
# assumed to be generator
# TODO: call self.evaluate_generator()
_stop.set()
raise NotImplementedError()
else:
# input validation
data, sample_weight = input_validation(validation_data)
val_outs = self.evaluate(data,
sample_weight=sample_weight,
verbose=0)
if type(val_outs) != list:
val_outs = [val_outs]
# same labels assumed
for l, o in zip(out_labels, val_outs):
epoch_logs['val_' + l] = o
callbacks.on_epoch_end(epoch, epoch_logs)
epoch += 1
if self.stop_training:
break
_stop.set()
callbacks.on_train_end()
return history
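A minimal sketch (not part of this patch) of the mutex protection mentioned in the docstring above: when `nb_worker > 1`, a thread-unsafe generator can be wrapped so that calls to `next()` are serialized. The wrapper class and the reuse of `generate_arrays_from_file` from the docstring example are illustrative only.

```python
import threading

class ThreadSafeIterator(object):
    '''Serializes calls to next() on a wrapped generator with a lock.'''
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

    next = __next__  # Python 2 compatibility

# hypothetical usage with the generator from the docstring example:
# graph.fit_generator(ThreadSafeIterator(generate_arrays_from_file('/my_file.txt')),
#                     samples_per_epoch=10000, nb_epoch=10, nb_worker=4)
```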

@ -35,7 +35,7 @@ def hinge(y_true, y_pred):
def categorical_crossentropy(y_true, y_pred):
'''Expects a binary class matrix instead of a vector of scalar classes
'''Expects a binary class matrix instead of a vector of scalar classes.
'''
return K.mean(K.categorical_crossentropy(y_pred, y_true), axis=-1)
@ -44,15 +44,25 @@ def binary_crossentropy(y_true, y_pred):
return K.mean(K.binary_crossentropy(y_pred, y_true), axis=-1)
def poisson_loss(y_true, y_pred):
def poisson(y_true, y_pred):
return K.mean(y_pred - y_true * K.log(y_pred + K.epsilon()), axis=-1)
def cosine_proximity(y_true, y_pred):
assert K.ndim(y_true) == 2
assert K.ndim(y_pred) == 2
y_true = K.l2_normalize(y_true, axis=1)
y_pred = K.l2_normalize(y_pred, axis=1)
return -K.mean(y_true * y_pred, axis=1)
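An illustrative numpy mirror (not part of this patch) of the new `cosine_proximity` objective: after L2 normalization the loss is minus the mean of the element-wise product, so aligned rows reach the minimum (-cos/ndim) and orthogonal rows give 0.

```python
import numpy as np

def cosine_proximity_np(y_true, y_pred):
    # normalize each row, then take minus the mean of the element-wise product
    y_true = y_true / np.linalg.norm(y_true, axis=1, keepdims=True)
    y_pred = y_pred / np.linalg.norm(y_pred, axis=1, keepdims=True)
    return -np.mean(y_true * y_pred, axis=1)

a = np.array([[1., 0.], [0., 2.]])
b = np.array([[0., 3.], [5., 0.]])
print(cosine_proximity_np(a, a))  # aligned rows: -0.5 each (-cos/ndim with ndim=2)
print(cosine_proximity_np(a, b))  # orthogonal rows: 0.
```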
# aliases
mse = MSE = mean_squared_error
rmse = RMSE = root_mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
cosine = cosine_proximity
from .utils.generic_utils import get_from_module
def get(identifier):

@ -275,12 +275,66 @@ class Adam(Optimizer):
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon}
class Adamax(Optimizer):
'''Adamax optimizer from Adam paper's Section 7. It is a variant
of Adam based on the infinity norm.
Default parameters follow those provided in the paper.
# Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor.
# References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
*args, **kwargs):
super(Adamax, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [(self.iterations, self.iterations+1.)]
t = self.iterations + 1
lr_t = self.lr / (1 - K.pow(self.beta_1, t))
for p, g, c in zip(params, grads, constraints):
# zero init of 1st moment
m = K.variable(np.zeros(K.get_value(p).shape))
# zero init of exponentially weighted infinity norm
u = K.variable(np.zeros(K.get_value(p).shape))
m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
u_t = K.maximum(self.beta_2 * u, K.abs(g))
p_t = p - lr_t * m_t / (u_t + self.epsilon)
self.updates.append((m, m_t))
self.updates.append((u, u_t))
self.updates.append((p, c(p_t))) # apply constraints
return self.updates
def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"beta_1": float(K.get_value(self.beta_1)),
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon}
# aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax
def get(identifier, kwargs=None):

@ -6,7 +6,7 @@ from six.moves import range
def pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.):
"""
Pad each sequence to the same length:
Pad each sequence to the same length:
the length of the longest sequence.
If maxlen is provided, any sequence longer
@ -15,6 +15,19 @@ def pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncati
Supports post-padding and pre-padding (default).
Parameters:
-----------
sequences: list of lists where each element is a sequence
maxlen: int, maximum length
dtype: type to cast the resulting sequence.
padding: 'pre' or 'post', pad either before or after each sequence.
truncating: 'pre' or 'post', remove values from sequences larger than
maxlen, either at the beginning or at the end of the sequence.
value: float, value used to pad the sequences.
Returns:
x: numpy array with dimensions (number_of_sequences, maxlen)
"""
lengths = [len(s) for s in sequences]
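A short illustration (not part of this patch) of the documented behaviour, with pre-padding/truncating by default and post-padding on request:

```python
from keras.preprocessing.sequence import pad_sequences

seqs = [[1], [1, 2], [1, 2, 3, 4]]
print(pad_sequences(seqs, maxlen=3))
# [[0 0 1]
#  [0 1 2]
#  [2 3 4]]
print(pad_sequences(seqs, maxlen=3, padding='post'))
# [[1 0 0]
#  [1 2 0]
#  [2 3 4]]
```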
@ -47,39 +60,53 @@ def make_sampling_table(size, sampling_factor=1e-5):
This generates an array where the ith element
is the probability that a word of rank i would be sampled,
according to the sampling distribution used in word2vec.
The word2vec formula is:
p(word) = min(1, sqrt(word.frequency/sampling_factor) / (word.frequency/sampling_factor))
We assume that the word frequencies follow Zipf's law (s=1) to derive
We assume that the word frequencies follow Zipf's law (s=1) to derive
a numerical approximation of frequency(rank):
frequency(rank) ~ 1/(rank * (log(rank) + gamma) + 1/2 - 1/(12*rank))
where gamma is the Euler-Mascheroni constant.
Parameters:
-----------
size: int, number of possible words to sample.
'''
gamma = 0.577
rank = np.array(list(range(size)))
rank[0] = 1
inv_fq = rank * (np.log(rank) + gamma) + 0.5 - 1./(12.*rank)
f = sampling_factor * inv_fq
return np.minimum(1., f / np.sqrt(f))
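A quick sanity check (not part of this patch) of the Zipf-based approximation above: rarer (higher-rank) words get a larger sampling probability.

```python
from keras.preprocessing.sequence import make_sampling_table

table = make_sampling_table(10)
assert table[1] <= table[5] <= table[9]  # sampling probability grows with rank
```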
def skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None):
'''
Take a sequence (list of indexes of words),
def skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None):
'''
Take a sequence (list of indexes of words),
returns couples of [word_index, other_word_index] and labels (1s or 0s),
where label = 1 if 'other_word' belongs to the context of 'word',
and label = 0 if 'other_word' is randomly sampled.
@param vocabulary_size: int. maximum possible word index + 1
@param window_size: int. actually half-window. The window of a word wi will be [i-window_size, i+window_size+1]
@param negative_samples: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc.
@param categorical: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]),
Parameters:
-----------
vocabulary_size: int. maximum possible word index + 1
window_size: int. actually half-window. The window of a word wi will be [i-window_size, i+window_size+1]
negative_samples: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc.
categorical: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]),
if True labels will be categorical eg. [[1,0],[0,1],[0,1] .. ]
Note: by convention, index 0 in the vocabulary is a non-word and will be skipped.
Returns:
--------
couples, labels: where `couples` are int pairs and
`labels` are either 0 or 1.
Notes:
------
By convention, index 0 in the vocabulary is a non-word and will be skipped.
'''
couples = []
labels = []
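A usage sketch (not part of this patch) with a toy sequence; `vocab_size` and the index values are illustrative only:

```python
from keras.preprocessing.sequence import skipgrams

vocab_size = 5
sequence = [1, 2, 3, 4, 1, 2]  # index 0 is reserved and would be skipped
couples, labels = skipgrams(sequence, vocab_size,
                            window_size=2, negative_samples=1.)
# `couples` is a list of [word, other_word] pairs,
# `labels` the matching 1 (context) / 0 (negative sample) targets.
```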

@ -39,7 +39,30 @@ def one_hot(text, n, filters=base_filter(), lower=True, split=" "):
class Tokenizer(object):
def __init__(self, nb_words=None, filters=base_filter(), lower=True, split=" "):
def __init__(self, nb_words=None, filters=base_filter(),
lower=True, split=' '):
'''The class allows one to vectorize a text corpus, by turning each
text into either a sequence of integers (each integer being the index
of a token in a dictionary) or into a vector where the coefficient
for each token could be binary, based on word count, based on tf-idf...
# Arguments
nb_words: the maximum number of words to keep, based
on word frequency. Only the most common `nb_words` words will
be kept.
filters: a string where each element is a character that will be
filtered from the texts. The default is all punctuation, plus
tabs and line breaks, minus the `'` character.
lower: boolean. Whether to convert the texts to lowercase.
split: character or string to use for token splitting.
By default, all punctuation is removed, turning the texts into
space-separated sequences of words
(words may include the `'` character). These sequences are then
split into lists of tokens. They will then be indexed or vectorized.
`0` is a reserved index that won't be assigned to any word.
'''
self.word_counts = {}
self.word_docs = {}
self.filters = filters
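A usage sketch (not part of this patch) for the documented workflow: fit the `Tokenizer` on a corpus, then turn texts into integer sequences or a document-term matrix. The example output values are illustrative.

```python
from keras.preprocessing.text import Tokenizer

texts = ['The cat sat on the mat.', 'The dog sat on the log.']
tokenizer = Tokenizer(nb_words=10)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)        # e.g. [[1, 3, 2, 4, 1, 5], ...]
matrix = tokenizer.texts_to_matrix(texts, mode='binary')
```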
@ -51,7 +74,10 @@ class Tokenizer(object):
def fit_on_texts(self, texts):
'''
required before using texts_to_sequences or texts_to_matrix
@param texts: can be a list or a generator (for memory-efficiency)
# Arguments
texts: can be a list of strings,
or a generator of strings (for memory-efficiency)
'''
self.document_count = 0
for text in texts:
@ -141,12 +167,12 @@ class Tokenizer(object):
if self.word_index:
nb_words = len(self.word_index) + 1
else:
raise Exception("Specify a dimension (nb_words argument), or fit on some text data first")
raise Exception("Specify a dimension (nb_words argument), or fit on some text data first.")
else:
nb_words = self.nb_words
if mode == "tfidf" and not self.document_count:
raise Exception("Fit the Tokenizer on some data before using tfidf mode")
raise Exception("Fit the Tokenizer on some data before using tfidf mode.")
X = np.zeros((len(sequences), nb_words))
for i, seq in enumerate(sequences):

@ -5,11 +5,13 @@ import sys
import six
def get_from_module(identifier, module_params, module_name, instantiate=False, kwargs=None):
def get_from_module(identifier, module_params, module_name,
instantiate=False, kwargs=None):
if isinstance(identifier, six.string_types):
res = module_params.get(identifier)
if not res:
raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
raise Exception('Invalid ' + str(module_name) + ': ' +
str(identifier))
if instantiate and not kwargs:
return res()
elif instantiate and kwargs:
@ -23,28 +25,6 @@ def make_tuple(*args):
return args
def printv(v, prefix=''):
if type(v) == dict:
if 'name' in v:
print(prefix + '#' + v['name'])
del v['name']
prefix += '...'
for nk, nv in v.items():
if type(nv) in [dict, list]:
print(prefix + nk + ':')
printv(nv, prefix)
else:
print(prefix + nk + ':' + str(nv))
elif type(v) == list:
prefix += '...'
for i, nv in enumerate(v):
print(prefix + '#' + str(i))
printv(nv, prefix)
else:
prefix += '...'
print(prefix + str(v))
class Progbar(object):
def __init__(self, target, width=30, verbose=1):
'''
@ -110,7 +90,7 @@ class Progbar(object):
info += ' - %s:' % k
if type(self.sum_values[k]) is list:
avg = self.sum_values[k][0] / max(1, self.sum_values[k][1])
if avg > 1e-3:
if abs(avg) > 1e-3:
info += ' %.4f' % avg
else:
info += ' %.4e' % avg

@ -26,12 +26,14 @@ def container_from_config(original_layer_dict, custom_objects={}):
if name == 'Merge':
mode = layer_dict.get('mode')
concat_axis = layer_dict.get('concat_axis')
dot_axes = layer_dict.get('dot_axes')
layers = layer_dict.get('layers')
layer_list = []
for layer in layers:
init_layer = container_from_config(layer)
layer_list.append(init_layer)
merge_layer = Merge(layer_list, mode)
merge_layer = Merge(layer_list, mode, concat_axis, dot_axes)
return merge_layer
elif name == 'Sequential':
@ -69,10 +71,11 @@ def container_from_config(original_layer_dict, custom_objects={}):
kwargs[kwarg] = layer_dict[kwarg]
return AutoEncoder(**kwargs)
else:
else: # this is a non-topological layer (e.g. Dense, etc.)
layer_dict.pop('name')
for k, v in layer_dict.items():
# a dictionary argument may be a regularizer or constraint
if isinstance(v, dict):
vname = v.pop('name')
if vname in [x for x, y in inspect.getmembers(constraints, predicate=inspect.isclass)]:
@ -83,6 +86,9 @@ def container_from_config(original_layer_dict, custom_objects={}):
# not a regularizer of constraint, don't touch it
v['name'] = vname
# the "name" keyword argument of layers is saved as "custom_name"
if 'custom_name' in layer_dict:
layer_dict['name'] = layer_dict.pop('custom_name')
base_layer = get_layer(name, layer_dict)
return base_layer

@ -7,7 +7,7 @@ from six.moves import zip
def to_categorical(y, nb_classes=None):
'''Convert class vector (integers from 0 to nb_classes)
to binary class matrix, for use with categorical_crossentropy
to binary class matrix, for use with categorical_crossentropy.
'''
y = np.asarray(y, dtype='int32')
if not nb_classes:
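A one-line illustration (not part of this patch) of the conversion:

```python
from keras.utils.np_utils import to_categorical

print(to_categorical([0, 2, 1], nb_classes=3))
# [[ 1.  0.  0.]
#  [ 0.  0.  1.]
#  [ 0.  1.  0.]]
```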

@ -1,10 +1,17 @@
import pydot
# old pydot will not work with python3, must use one
# that works with python3 such as pydot2 or pydot
import itertools
from keras.layers.containers import Graph, Sequential
from keras.layers.core import Merge
try:
# pydot-ng is a fork of pydot that is better maintained
import pydot_ng as pydot
except ImportError:
# fall back on pydot if necessary
import pydot
if not pydot.find_graphviz():
raise RuntimeError("Failed to import pydot. You must install pydot"
" and graphviz for `pydotprint` to work.")
def layer_typename(layer):
return type(layer).__module__ + "." + type(layer).__name__
@ -120,7 +127,7 @@ class ModelToDot(object):
self.g = pydot.Dot()
self.g.set('rankdir', 'TB')
self.g.set('concentrate', True)
self.g.set_node_defaults(shape='record', fontname="Fira Mono")
self.g.set_node_defaults(shape='record')
if hasattr(model, 'outputs'):
# Graph
@ -136,8 +143,8 @@ class ModelToDot(object):
def to_graph(model, **kwargs):
"""
`recursive` controls wether we recursively explore container layers
`show_shape` controls wether the shape is shown in the graph
`recursive` controls whether we recursively explore container layers
`show_shape` controls whether the shape is shown in the graph
"""
return ModelToDot()(model, **kwargs)
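A hypothetical usage sketch (not part of this patch) of the helper defined above, rendering a small model to PNG (requires graphviz and pydot or pydot-ng):

```python
from keras.models import Sequential
from keras.layers.core import Dense
from keras.utils.visualize_util import to_graph

model = Sequential()
model.add(Dense(8, input_dim=4))
model.add(Dense(1))
dot = to_graph(model, recursive=True, show_shape=True)
dot.write_png('model.png')
```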

@ -3,12 +3,12 @@ from setuptools import find_packages
setup(name='Keras',
version='0.3.0',
version='0.3.1',
description='Theano-based Deep Learning library',
author='Francois Chollet',
author_email='francois.chollet@gmail.com',
url='https://github.com/fchollet/keras',
download_url='https://github.com/fchollet/keras/tarball/0.3.0',
download_url='https://github.com/fchollet/keras/tarball/0.3.1',
license='MIT',
install_requires=['theano', 'pyyaml', 'six'],
extras_require={

@ -0,0 +1,46 @@
from __future__ import print_function
import numpy as np
import pytest
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense, Flatten, Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils.np_utils import to_categorical
def test_image_classification():
'''
Classify random 16x16 color images into several classes using logistic regression
with a convolutional hidden layer.
'''
np.random.seed(1337)
input_shape = (3, 16, 16)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=input_shape,
classification=True,
nb_class=4)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# convolution kernel size
nb_conv = 3
# size of pooling area for max pooling
nb_pool = 2
model = Sequential([
Convolution2D(nb_filter=8, nb_row=nb_conv, nb_col=nb_conv, input_shape=input_shape),
MaxPooling2D(pool_size=(nb_pool, nb_pool)),
Flatten(),
Activation('relu'),
Dense(y_test.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')
history = model.fit(X_train, y_train, nb_epoch=10, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.85)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,131 @@
from __future__ import print_function
import numpy as np
import pytest
import string
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense, Dropout, Dense
from keras.layers.recurrent import GRU, LSTM
from keras.utils.np_utils import to_categorical
def test_temporal_classification():
'''
Classify temporal sequences of float numbers of length 3 into 2 classes using
a single layer of GRU units and softmax applied to the last activations of the units.
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2]),
activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
history = model.fit(X_train, y_train, nb_epoch=5, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
def test_temporal_regression():
'''
Predict float numbers (regression) based on sequences of float numbers of length 3 using
a single layer of GRU units.
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(2,),
classification=False)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='adam')
history = model.fit(X_train, y_train, nb_epoch=5, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.75)
def test_sequence_to_sequence():
'''
Apply the same Dense layer to each element of the time dimension of the input
and make predictions for the output sequence elements.
This does not make use of the temporal structure of the sequence
(see TimeDistributedDense for more details)
'''
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(3, 5),
classification=False)
model = Sequential()
model.add(TimeDistributedDense(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_stacked_lstm_char_prediction():
'''
Learn the alphabetical char sequence with a stacked LSTM.
Predict the whole alphabet based on the first two letters ('ab' -> 'ab...z').
See a non-toy example in examples/lstm_text_generation.py.
'''
np.random.seed(1336)
# generate alphabet: http://stackoverflow.com/questions/16060899/alphabet-range-python
alphabet = string.ascii_lowercase
number_of_chars = len(alphabet)
# generate char sequences of length 'sequence_length' out of alphabet and store the next char as label (e.g. 'ab'->'c')
sequence_length = 2
sentences = [alphabet[i: i + sequence_length] for i in range(len(alphabet) - sequence_length)]
next_chars = [alphabet[i + sequence_length] for i in range(len(alphabet) - sequence_length)]
# Transform sequences and labels into 'one-hot' encoding
X = np.zeros((len(sentences), sequence_length, number_of_chars), dtype=np.bool)
y = np.zeros((len(sentences), number_of_chars), dtype=np.bool)
for i, sentence in enumerate(sentences):
for t, char in enumerate(sentence):
X[i, t, ord(char)-ord('a')] = 1
y[i, ord(next_chars[i])-ord('a')] = 1
# learn the alphabet with stacked LSTM
model = Sequential([
LSTM(16, return_sequences=True, input_shape=(sequence_length, number_of_chars)),
LSTM(16, return_sequences=False),
Dense(number_of_chars, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, y, batch_size=1, nb_epoch=60, verbose=1)
# prime the model with 'ab' sequence and let it generate the learned alphabet
sentence = alphabet[:sequence_length]
generated = sentence
for iteration in range(number_of_chars-sequence_length):
x = np.zeros((1, sequence_length, number_of_chars))
for t, char in enumerate(sentence):
x[0, t, ord(char) - ord('a')] = 1.
preds = model.predict(x, verbose=0)[0]
next_char = chr(np.argmax(preds) + ord('a'))
generated += next_char
sentence = sentence[1:] + next_char
# check that it did generate the alphabet correctly
assert(generated == alphabet)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,63 @@
from __future__ import print_function
import numpy as np
import pytest
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense
from keras.utils.np_utils import to_categorical
def test_vector_classification():
'''
Classify random float vectors into 2 classes with logistic regression
using a 2-layer neural network with ReLU hidden units.
'''
np.random.seed(1337)
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='relu'),
Dense(y_train.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=15, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.8)
def test_vector_regression():
'''
Perform float data prediction (regression) using a 2-layer MLP
with a tanh hidden layer and a linear output layer.
'''
np.random.seed(1337)
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
output_shape=(2,),
classification=False)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='tanh'),
Dense(y_train.shape[-1])
])
model.compile(loss='hinge', optimizer='adagrad')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert (history.history['val_loss'][-1] < 0.9)
if __name__ == '__main__':
pytest.main([__file__])

@ -64,6 +64,26 @@ class TestBackend(object):
check_single_tensor_operation('expand_dims', (4, 3, 2), dim=1)
check_single_tensor_operation('squeeze', (4, 3, 1), axis=2)
def test_repeat_elements(self):
reps = 3
for ndims in [1, 2, 3]:
shape = np.arange(2, 2+ndims)
arr = np.arange(np.prod(shape)).reshape(shape)
arr_th = KTH.variable(arr)
arr_tf = KTF.variable(arr)
for rep_axis in range(ndims):
np_rep = np.repeat(arr, reps, axis=rep_axis)
th_rep = KTH.eval(
KTH.repeat_elements(arr_th, reps, axis=rep_axis))
tf_rep = KTF.eval(
KTF.repeat_elements(arr_tf, reps, axis=rep_axis))
assert th_rep.shape == np_rep.shape
assert tf_rep.shape == np_rep.shape
assert_allclose(np_rep, th_rep, atol=1e-05)
assert_allclose(np_rep, tf_rep, atol=1e-05)
def test_value_manipulation(self):
val = np.random.random((4, 2))
xth = KTH.variable(val)
@ -261,9 +281,11 @@ class TestBackend(object):
check_two_tensor_operation('binary_crossentropy', (4, 2), (4, 2), from_logits=True)
check_two_tensor_operation('categorical_crossentropy', (4, 2), (4, 2), from_logits=True)
check_two_tensor_operation('binary_crossentropy', (4, 2), (4, 2), from_logits=False)
check_two_tensor_operation('categorical_crossentropy', (4, 2), (4, 2), from_logits=False)
check_single_tensor_operation('l2_normalize', (4, 3), axis=-1)
check_single_tensor_operation('l2_normalize', (4, 3), axis=1)
# def test_conv2d(self):
# '''conv2d works "properly" with Theano and TF but outputs different
# values in each case. Cause unclear (input / kernel shape format?)

@ -11,6 +11,7 @@ def test_cifar():
def test_reuters():
(X_train, y_train), (X_test, y_test) = reuters.load_data()
(X_train, y_train), (X_test, y_test) = reuters.load_data(maxlen=10)
def test_mnist():
@ -19,6 +20,7 @@ def test_mnist():
def test_imdb():
(X_train, y_train), (X_test, y_test) = imdb.load_data()
(X_train, y_train), (X_test, y_test) = imdb.load_data(maxlen=40)
if __name__ == '__main__':

@ -16,10 +16,10 @@ def test_layer_call():
W = np.asarray(K.eval(layer.W)).astype(K.floatx())
X = K.placeholder(ndim=2)
Y = layer(X)
F = K.function([X], [Y])
f = K.function([X], [Y])
x = np.ones((nb_samples, input_dim)).astype(K.floatx())
y = F([x])[0].astype(K.floatx())
y = f([x])[0].astype(K.floatx())
t = np.dot(x, W).astype(K.floatx())
assert_allclose(t, y, rtol=.2)
@ -31,16 +31,30 @@ def test_sequential_call():
model.add(Dense(output_dim=output_dim, input_dim=input_dim))
model.compile('sgd', 'mse')
# test flat model
X = K.placeholder(ndim=2)
Y = model(X)
F = K.function([X], [Y])
f = K.function([X], [Y])
x = np.ones((nb_samples, input_dim)).astype(K.floatx())
y1 = F([x])[0].astype(K.floatx())
y1 = f([x])[0].astype(K.floatx())
y2 = model.predict(x)
# results of __call__ should match model.predict
assert_allclose(y1, y2)
# test nested model
model2 = Sequential()
model2.add(model)
model2.compile('sgd', 'mse')
Y2 = model2(X)
f = K.function([X], [Y2])
y1 = f([x])[0].astype(K.floatx())
y2 = model2.predict(x)
# results of __call__ should match model.predict
assert_allclose(y1, y2)
if __name__ == '__main__':
pytest.main([__file__])

@ -188,17 +188,44 @@ def test_upsampling_2d():
input_nb_row = 11
input_nb_col = 12
input = np.ones((nb_samples, stack_size, input_nb_row, input_nb_col))
for length_row in [2, 3, 9]:
for length_col in [2, 3, 9]:
layer = convolutional.UpSampling2D(size=(length_row, length_col))
layer.input = K.variable(input)
for train in [True, False]:
out = K.eval(layer.get_output(train))
assert out.shape[2] == length_row * input_nb_row
assert out.shape[3] == length_col * input_nb_col
layer.get_config()
for dim_ordering in ['th', 'tf']:
if dim_ordering == 'th':
input = np.random.rand(nb_samples, stack_size, input_nb_row,
input_nb_col)
else: # tf
input = np.random.rand(nb_samples, input_nb_row, input_nb_col,
stack_size)
for length_row in [2, 3, 9]:
for length_col in [2, 3, 9]:
layer = convolutional.UpSampling2D(
size=(length_row, length_col),
input_shape=input.shape[1:],
dim_ordering=dim_ordering)
layer.input = K.variable(input)
for train in [True, False]:
out = K.eval(layer.get_output(train))
if dim_ordering == 'th':
assert out.shape[2] == length_row * input_nb_row
assert out.shape[3] == length_col * input_nb_col
else: # tf
assert out.shape[1] == length_row * input_nb_row
assert out.shape[2] == length_col * input_nb_col
# compare with numpy
if dim_ordering == 'th':
expected_out = np.repeat(input, length_row, axis=2)
expected_out = np.repeat(expected_out, length_col,
axis=3)
else: # tf
expected_out = np.repeat(input, length_row, axis=1)
expected_out = np.repeat(expected_out, length_col,
axis=2)
assert_allclose(out, expected_out)
layer.get_config()
if __name__ == '__main__':

@ -1,5 +1,6 @@
import pytest
import numpy as np
from keras.models import Sequential
from numpy.testing import assert_allclose
from keras import backend as K
@ -100,6 +101,11 @@ def test_time_dist_merge():
_runner(layer)
def test_highway():
layer = core.Highway(input_shape=(10,))
_runner(layer)
def test_autoencoder():
layer_1 = core.Layer()
layer_2 = core.Layer()
@ -108,11 +114,37 @@ def test_autoencoder():
_runner(layer)
def test_autoencoder_second_layer():
# regression test for issue #1275
encoder = core.Dense(input_dim=10, output_dim=2)
decoder = core.Dense(input_dim=2, output_dim=10)
model = Sequential()
model.add(core.Dense(input_dim=20, output_dim=10))
model.add(core.AutoEncoder(encoder=encoder, decoder=decoder,
output_reconstruction=False))
model.compile(loss='mse', optimizer='sgd')
def test_maxout_dense():
layer = core.MaxoutDense(10, 10, input_shape=(20,))
_runner(layer)
def test_naming():
layer = core.Dense(2, input_dim=2)
assert layer.name == 'dense'
model = Sequential()
model.add(core.Dense(2, input_dim=2, name='my_dense'))
model.add(core.Dense(2, name='my_dense'))
assert model.layers[0].name == 'my_dense'
assert model.layers[1].name == 'my_dense'
model.compile(optimizer='rmsprop', loss='mse')
model.train_on_batch(np.random.random((2, 2)), np.random.random((2, 2)))
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_sequences():
@ -175,6 +207,29 @@ def _runner(layer):
layer.trainable = True
layer.trainable = False
def test_siamese_all():
right_input_layer = core.Dense(7, input_dim=3)
left_input_layer = core.Dense(7, input_dim=3)
shared_layer = core.Dense(5, input_dim=7)
for mode in ['sum', 'mul', 'ave', 'concat']:
siamese_layer = core.Siamese(shared_layer, [left_input_layer, right_input_layer], merge_mode=mode)
siamese_layer.output_shape
siamese_layer.get_output()
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_siamese_theano_only():
right_input_layer = core.Dense(7, input_dim=3)
left_input_layer = core.Dense(7, input_dim=3)
shared_layer = core.Dense(5, input_dim=7)
for mode in ['dot', 'cos']:
siamese_layer = core.Siamese(shared_layer, [left_input_layer, right_input_layer], merge_mode=mode,
dot_axes=([1], [1]))
siamese_layer.output_shape
siamese_layer.get_output()
if __name__ == '__main__':
pytest.main([__file__])

@ -24,7 +24,7 @@ def test_unitnorm_constraint():
class_mode='binary')
lookup.train_on_batch(X1, np.array([[1], [0]], dtype='int32'))
norm = np.linalg.norm(K.get_value(lookup.params[0]), axis=1)
assert_allclose(norm, np.ones_like(norm).astype('float32'))
assert_allclose(norm, np.ones_like(norm).astype('float32'), rtol=1e-05)
if __name__ == '__main__':

@ -0,0 +1,41 @@
import pytest
import numpy as np
from keras import backend as K
from keras.layers import core
from keras.layers import noise
input_shape = (10, 10)
batch_input_shape = (10, 10, 10)
def test_GaussianNoise():
layer = noise.GaussianNoise(sigma=1., input_shape=input_shape)
_runner(layer)
def test_GaussianDropout():
layer = noise.GaussianDropout(p=0.2, input_shape=input_shape)
_runner(layer)
def _runner(layer):
assert isinstance(layer, core.Layer)
layer.build()
conf = layer.get_config()
assert (type(conf) == dict)
param = layer.get_params()
# Typically a list or a tuple, but may be any iterable
assert hasattr(param, '__iter__')
layer.input = K.variable(np.random.random(batch_input_shape))
output = layer.get_output(train=False)
output_np = K.eval(output)
assert output_np.shape == batch_input_shape
output = layer.get_output(train=True)
output_np = K.eval(output)
assert output_np.shape == batch_input_shape
if __name__ == '__main__':
pytest.main([__file__])

@ -1,9 +1,10 @@
import pytest
import numpy as np
from keras.layers.core import Dense, Activation
from numpy.testing import assert_allclose
from keras.layers import normalization
from keras.models import Sequential
from keras.models import Sequential, Graph
from keras import backend as K
@ -83,6 +84,9 @@ def test_batchnorm_config():
norm = normalization.BatchNormalization(input_shape=(10, 10), mode=1,
epsilon=0.1, momentum=0.9)
conf = norm.get_config()
del conf['cache_enabled']
del conf['trainable']
del conf['custom_name']
conf_target = {"input_shape": (10, 10),
"name": normalization.BatchNormalization.__name__,
"epsilon": 0.1, "mode": 1, "momentum": 0.9}
@ -97,5 +101,27 @@ def test_batchnorm_save_weights():
norm.set_weights(weights)
def test_batchnorm_nested():
# regression test for issue #1386
g = Graph()
g.add_input("input", input_shape=[20])
g.add_node(Dense(10), "dense", "input")
g.add_node(normalization.BatchNormalization(), "bn", "dense")
g.add_node(Activation('relu'), "activ", "bn")
g.add_output("output", "activ")
g2 = Graph()
g2.add_input("input", input_shape=[10])
g2.add_node(Dense(15), "dense", "input")
g2.add_node(normalization.BatchNormalization(), "bn", "dense")
g2.add_node(Activation('relu'), "activ", "bn")
g2.add_output("output", "activ")
model = Sequential()
model.add(g)
model.add(g2)
model.compile(loss="mse", optimizer="adadelta")
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,53 @@
import numpy as np
from numpy.testing import assert_allclose
import pytest
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.sequence import make_sampling_table
from keras.preprocessing.sequence import skipgrams
def test_pad_sequences():
a = [[1], [1, 2], [1, 2, 3]]
# test padding
b = pad_sequences(a, maxlen=3, padding='pre')
assert_allclose(b, [[0, 0, 1], [0, 1, 2], [1, 2, 3]])
b = pad_sequences(a, maxlen=3, padding='post')
assert_allclose(b, [[1, 0, 0], [1, 2, 0], [1, 2, 3]])
# test truncating
b = pad_sequences(a, maxlen=2, truncating='pre')
assert_allclose(b, [[0, 1], [1, 2], [2, 3]])
b = pad_sequences(a, maxlen=2, truncating='post')
assert_allclose(b, [[0, 1], [1, 2], [1, 2]])
# test value
b = pad_sequences(a, maxlen=3, value=1)
assert_allclose(b, [[1, 1, 1], [1, 1, 2], [1, 2, 3]])
def test_make_sampling_table():
a = make_sampling_table(3)
assert_allclose(a, np.asarray([0.00315225, 0.00315225, 0.00547597]),
rtol=.1)
def test_skipgrams():
# test with no window size and binary labels
couples, labels = skipgrams(np.arange(3), vocabulary_size=3)
for couple in couples:
assert couple[0] in [0, 1, 2] and couple[1] in [0, 1, 2]
# test window size and categorical labels
couples, labels = skipgrams(np.arange(5), vocabulary_size=5, window_size=1,
categorical=True)
for couple in couples:
assert couple[0] - couple[1] <= 3
for l in labels:
assert len(l) == 2
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,34 @@
from keras.preprocessing.text import Tokenizer, one_hot
import pytest
import numpy as np
def test_one_hot():
text = 'The cat sat on the mat.'
encoded = one_hot(text, 5)
assert len(encoded) == 6
assert np.max(encoded) <= 4
assert np.min(encoded) >= 0
def test_tokenizer():
texts = ['The cat sat on the mat.',
'The dog sat on the log.',
'Dogs and cats living together.']
tokenizer = Tokenizer(nb_words=10)
tokenizer.fit_on_texts(texts)
sequences = []
for seq in tokenizer.texts_to_sequences_generator(texts):
sequences.append(seq)
assert np.max(np.max(sequences)) < 10
assert np.min(np.min(sequences)) == 1
tokenizer.fit_on_sequences(sequences)
for mode in ['binary', 'count', 'tfidf', 'freq']:
matrix = tokenizer.texts_to_matrix(texts, mode)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,198 @@
import pytest
import os
import sys
import numpy as np
np.random.seed(1337)
from keras import callbacks
from keras.models import Graph, Sequential
from keras.layers.core import Dense
from keras.utils.test_utils import get_test_data
from keras import backend as K
from keras.utils import np_utils
input_dim = 2
nb_hidden = 4
nb_class = 2
batch_size = 5
train_samples = 20
test_samples = 20
def test_ModelCheckpoint():
filepath = 'checkpoint.h5'
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
# case 1
monitor = 'val_loss'
save_best_only = False
mode = 'auto'
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 2
mode = 'min'
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 3
mode = 'max'
monitor = 'val_acc'
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
# case 4
save_best_only = True
cbks = [callbacks.ModelCheckpoint(filepath, monitor=monitor,
save_best_only=save_best_only, mode=mode)]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=1)
assert os.path.exists(filepath)
os.remove(filepath)
def test_EarlyStopping():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
mode = 'max'
monitor = 'val_acc'
patience = 0
cbks = [callbacks.EarlyStopping(patience=patience, monitor=monitor, mode=mode)]
history = model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=20)
mode = 'auto'
monitor = 'val_acc'
patience = 2
cbks = [callbacks.EarlyStopping(patience=patience, monitor=monitor, mode=mode)]
history = model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=20)
def test_LearningRateScheduler():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
cbks = [callbacks.LearningRateScheduler(lambda x: 1. / (1. + x))]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=5)
assert (float(K.get_value(model.optimizer.lr)) - 0.2) < K.epsilon()
@pytest.mark.skipif((K._BACKEND != 'tensorflow') or (sys.version_info[0] == 3),
reason="Requires tensorflow backend")
def test_TensorBoard():
import shutil
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF
old_session = KTF._get_session()
filepath = './logs'
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=nb_class)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
# case 1 Sequential wo accuracy
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
# case 2 Sequential w accuracy
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Sequential()
model.add(Dense(nb_hidden, input_dim=input_dim, activation='relu'))
model.add(Dense(nb_class, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit(X_train, y_train, batch_size=batch_size, show_accuracy=True,
validation_data=(X_test, y_test), callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
# case 3 Graph
with tf.Graph().as_default():
session = tf.Session('')
KTF._set_session(session)
model = Graph()
model.add_input(name='X_vars', input_shape=(input_dim, ))
model.add_node(Dense(nb_hidden, activation="sigmoid"),
name='Dense1', input='X_vars')
model.add_node(Dense(nb_class, activation="softmax"),
name='last_dense',
input='Dense1')
model.add_output(name='output', input='last_dense')
model.compile(optimizer='sgd', loss={'output': 'mse'})
tsb = callbacks.TensorBoard(log_dir=filepath, histogram_freq=1)
cbks = [tsb]
model.fit({'X_vars': X_train, 'output': y_train},
batch_size=batch_size,
validation_data={'X_vars': X_test, 'output': y_test},
callbacks=cbks, nb_epoch=2)
assert os.path.exists(filepath)
shutil.rmtree(filepath)
KTF._set_session(old_session)
if __name__ == '__main__':
pytest.main([__file__])

@ -0,0 +1,85 @@
import pytest
import numpy as np
from keras import initializations
from keras import backend as K
SHAPE = (100, 100)
def _runner(init, shape, target_mean=None, target_std=None,
target_max=None, target_min=None):
variable = init(shape)
output = K.get_value(variable)
lim = 1e-2
if target_std is not None:
assert abs(output.std() - target_std) < lim
if target_mean is not None:
assert abs(output.mean() - target_mean) < lim
if target_max is not None:
assert abs(output.max() - target_max) < lim
if target_min is not None:
assert abs(output.min() - target_min) < lim
def test_uniform():
_runner(initializations.uniform, SHAPE, target_mean=0.,
target_max=0.05, target_min=-0.05)
def test_normal():
_runner(initializations.normal, SHAPE, target_mean=0., target_std=0.05)
def test_lecun_uniform():
scale = np.sqrt(3. / SHAPE[0])
_runner(initializations.lecun_uniform, SHAPE,
target_mean=0., target_max=scale, target_min=-scale)
def test_glorot_uniform():
scale = np.sqrt(6. / (SHAPE[0] + SHAPE[1]))
_runner(initializations.glorot_uniform, SHAPE, target_mean=0.,
target_max=scale, target_min=-scale)
def test_glorot_normal():
scale = np.sqrt(2. / (SHAPE[0] + SHAPE[1]))
_runner(initializations.glorot_normal, SHAPE,
target_mean=0., target_std=scale)
def test_he_uniform():
scale = np.sqrt(6. / SHAPE[0])
_runner(initializations.he_uniform, SHAPE, target_mean=0.,
target_max=scale, target_min=-scale)
def test_he_normal():
scale = np.sqrt(2. / SHAPE[0])
_runner(initializations.he_normal, SHAPE,
target_mean=0., target_std=scale)
def test_orthogonal():
_runner(initializations.orthogonal, SHAPE,
target_mean=0.)
def test_identity():
_runner(initializations.identity, SHAPE,
target_mean=1./SHAPE[0], target_max=1.)
def test_zero():
_runner(initializations.zero, SHAPE,
target_mean=0., target_max=0.)
def test_one():
_runner(initializations.one, SHAPE,
target_mean=1., target_max=1.)
if __name__ == '__main__':
pytest.main([__file__])

@ -6,13 +6,13 @@ np.random.seed(1337)
from keras import backend as K
from keras.models import Graph, Sequential, model_from_json, model_from_yaml
from keras.layers.core import Dense, Activation, Merge, Lambda, LambdaMerge
from keras.layers.core import Dense, Activation, Merge, Lambda, LambdaMerge, Siamese, add_shared_layer
from keras.layers import containers
from keras.utils import np_utils
from keras.utils.test_utils import get_test_data
import os
from keras.utils.layer_utils import model_summary
input_dim = 32
nb_hidden = 16
@ -20,24 +20,63 @@ nb_class = 4
batch_size = 32
nb_epoch = 1
train_samples = 2000
test_samples = 500
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=4)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
def _get_test_data():
np.random.seed(1234)
train_samples = 2000
test_samples = 500
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=train_samples,
nb_test=test_samples,
input_shape=(input_dim,),
classification=True,
nb_class=4)
y_test = np_utils.to_categorical(y_test)
y_train = np_utils.to_categorical(y_train)
return (X_train, y_train), (X_test, y_test)
####################
# SEQUENTIAL TEST #
####################
def test_sequential_fit_generator():
(X_train, y_train), (X_test, y_test) = _get_test_data()
def data_generator(train):
if train:
max_batch_index = len(X_train) // batch_size
else:
max_batch_index = len(X_test) // batch_size
i = 0
while 1:
if train:
yield (X_train[i * batch_size: (i + 1) * batch_size], y_train[i * batch_size: (i + 1) * batch_size])
else:
yield (X_test[i * batch_size: (i + 1) * batch_size], y_test[i * batch_size: (i + 1) * batch_size])
i += 1
i = i % max_batch_index
model = Sequential()
model.add(Dense(nb_hidden, input_shape=(input_dim,)))
model.add(Activation('relu'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=False)
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=True)
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=False, validation_data=(X_test, y_test))
model.fit_generator(data_generator(True), len(X_train), nb_epoch, show_accuracy=True, validation_data=(X_test, y_test))
loss = model.evaluate(X_train, y_train, verbose=0)
assert(loss < 0.9)
def test_sequential():
(X_train, y_train), (X_test, y_test) = _get_test_data()
model = Sequential()
model.add(Dense(nb_hidden, input_shape=(input_dim,)))
model.add(Activation('relu'))
@ -55,8 +94,8 @@ def test_sequential():
model.train_on_batch(X_train[:32], y_train[:32])
loss = model.evaluate(X_train, y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate(X_test, y_test, verbose=0)
assert(loss < 0.8)
model.predict(X_test, verbose=0)
model.predict_classes(X_test, verbose=0)
@ -74,7 +113,7 @@ def test_sequential():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate(X_train, y_train, verbose=0)
nloss = model.evaluate(X_test, y_test, verbose=0)
assert(loss == nloss)
# test json serialization
@ -87,6 +126,7 @@ def test_sequential():
def test_merge_sum():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -108,8 +148,8 @@ def test_merge_sum():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -133,13 +173,15 @@ def test_merge_sum():
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
@pytest.mark.skipif(K._BACKEND == 'tensorflow',
reason='currently not working with TensorFlow')
def test_merge_dot():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(input_dim=input_dim, output_dim=nb_hidden))
left.add(Activation('relu'))
@ -172,6 +214,8 @@ def test_merge_dot():
def test_merge_concat():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -193,8 +237,8 @@ def test_merge_concat():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -221,11 +265,12 @@ def test_merge_concat():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_merge_recursivity():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -256,8 +301,8 @@ def test_merge_recursivity():
model.fit([X_train, X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test, X_test], verbose=0)
@ -269,11 +314,12 @@ def test_merge_recursivity():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate([X_train, X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_merge_overlap():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
@ -293,7 +339,7 @@ def test_merge_overlap():
model.train_on_batch(X_train[:32], y_train[:32])
loss = model.evaluate(X_train, y_train, verbose=0)
loss = model.evaluate(X_test, y_test, verbose=0)
assert(loss < 0.9)
model.predict(X_test, verbose=0)
model.predict_classes(X_test, verbose=0)
@ -305,11 +351,12 @@ def test_merge_overlap():
model.load_weights(fname)
os.remove(fname)
nloss = model.evaluate(X_train, y_train, verbose=0)
nloss = model.evaluate(X_test, y_test, verbose=0)
assert(loss == nloss)
def test_lambda():
(X_train, y_train), (X_test, y_test) = _get_test_data()
def func(X):
s = X[0]
for i in range(1, len(X)):
@ -344,8 +391,8 @@ def test_lambda():
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_train, X_train], y_train, verbose=0)
assert(loss < 0.7)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
@ -370,7 +417,7 @@ def test_lambda():
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
os.remove(fname)
nloss = model.evaluate([X_train, X_train], y_train, verbose=0)
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
@ -388,14 +435,136 @@ def test_sequential_count_params():
model.add(Dense(nb_units))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
assert(n == model.count_params())
model.compile('sgd', 'binary_crossentropy')
assert(n == model.count_params())
def test_siamese_1():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
model = Sequential()
model.add(Siamese(Dense(nb_hidden), [left, right], merge_mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
model.predict_proba([X_test, X_test], verbose=0)
model.get_config(verbose=0)
# test weight saving
fname = 'test_siamese_1.h5'
model.save_weights(fname, overwrite=True)
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
model = Sequential()
model.add(Siamese(Dense(nb_hidden), [left, right], merge_mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.load_weights(fname)
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
def test_siamese_2():
(X_train, y_train), (X_test, y_test) = _get_test_data()
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
add_shared_layer(Dense(nb_hidden), [left, right])
left.add(Dense(nb_hidden))
right.add(Dense(nb_hidden))
add_shared_layer(Dense(nb_hidden), [left, right])
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_data=([X_test, X_test], y_test))
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=False, verbose=0, validation_split=0.1)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0)
model.fit([X_train, X_train], y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=0, shuffle=False)
loss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss < 0.8)
model.predict([X_test, X_test], verbose=0)
model.predict_classes([X_test, X_test], verbose=0)
model.predict_proba([X_test, X_test], verbose=0)
model.get_config(verbose=0)
# test weight saving
fname = 'test_siamese_2.h5'
model.save_weights(fname, overwrite=True)
left = Sequential()
left.add(Dense(nb_hidden, input_shape=(input_dim,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(nb_hidden, input_shape=(input_dim,)))
right.add(Activation('relu'))
add_shared_layer(Dense(nb_hidden), [left, right])
left.add(Dense(nb_hidden))
right.add(Dense(nb_hidden))
add_shared_layer(Dense(nb_hidden), [left, right])
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(nb_class))
model.add(Activation('softmax'))
model.load_weights(fname)
os.remove(fname)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
nloss = model.evaluate([X_test, X_test], y_test, verbose=0)
assert(loss == nloss)
###############
# GRAPH TEST #
###############
@ -412,6 +581,35 @@ def test_sequential_count_params():
output_shape=(1,))
def test_graph_fit_generator():
def data_generator_graph(train):
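# infinite generator yielding batches as dicts keyed by the graph's input/output names, as fit_generator expects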
while 1:
if train:
yield {'input1': X_train_graph, 'output1': y_train_graph}
else:
yield {'input1': X_test_graph, 'output1': y_test_graph}
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input1')
graph.add_node(Dense(4), name='dense2', input='input1')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output1',
inputs=['dense2', 'dense3'],
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4)
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4)
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4, validation_data={'input1': X_test_graph, 'output1': y_test_graph})
graph.fit_generator(data_generator_graph(True), 1000, nb_epoch=4, validation_data={'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph}, verbose=0)
assert(loss < 3.0)
def test_1o_1i():
# test a non-sequential graph with 1 input and 1 output
np.random.seed(1337)
@ -435,7 +633,7 @@ def test_1o_1i():
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'output1': y_test_graph}, verbose=0)
assert(loss < 2.5)
# test validation split
@ -507,6 +705,89 @@ def test_1o_2i():
graph.get_config(verbose=1)
def test_siamese_3():
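# Graph API variant: add_shared_node applies a single Dense(16) to both inputs and sums the results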
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared', inputs=['input1', 'input2'], merge_mode='sum')
graph.add_node(Dense(4), name='dense1', input='shared')
graph.add_node(Dense(4), name='dense2', input='dense1')
graph.add_output(name='output1', input='dense2')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_siamese_4():
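# chain several shared nodes; only the last one merges its branches (sum) before the final Dense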
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared1', inputs=['input1', 'input2'])
graph.add_shared_node(Dense(4), name='shared2', inputs=['shared1'])
graph.add_shared_node(Dense(4), name='shared3', inputs=['shared2'], merge_mode='sum')
graph.add_node(Dense(4), name='dense', input='shared3')
graph.add_output(name='output1', input='dense',
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_siamese_5():
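# shared node with explicit output names: each branch keeps its own output and they are only merged at the graph output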
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared1', inputs=['input1', 'input2'])
graph.add_shared_node(Dense(4), name='shared2', inputs=['shared1'])
graph.add_shared_node(Dense(4), name='shared3', inputs=['shared2'], outputs=['shared_output1', 'shared_output2'])
graph.add_node(Dense(4), name='dense1', input='shared_output1')
graph.add_node(Dense(4), name='dense2', input='shared_output2')
graph.add_output(name='output1', inputs=['dense1', 'dense2'],
merge_mode='sum')
graph.compile('rmsprop', {'output1': 'mse'})
graph.fit({'input1': X_train_graph, 'input2': X2_train_graph, 'output1': y_train_graph},
nb_epoch=10)
out = graph.predict({'input1': X_test_graph, 'input2': X2_test_graph})
assert(type(out) == dict)
assert(len(out) == 1)
loss = graph.test_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.train_on_batch({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
loss = graph.evaluate({'input1': X_test_graph, 'input2': X2_test_graph, 'output1': y_test_graph})
assert(loss < 3.0)
graph.get_config(verbose=1)
def test_2o_1i_weights():
# test a non-sequential graph with 1 input and 2 outputs
graph = Graph()

@ -2,7 +2,7 @@ from __future__ import print_function
import pytest
from keras.utils.test_utils import get_test_data
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils.np_utils import to_categorical
@ -32,28 +32,34 @@ def _test_optimizer(optimizer, target=0.9):
history = model.fit(X_train, y_train, nb_epoch=12, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=2)
return history.history['val_acc'][-1] > target
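# the helper now asserts directly: get_config() must return a plain dict and the final val_acc must beat the target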
config = optimizer.get_config()
assert type(config) == dict
assert history.history['val_acc'][-1] > target
def test_sgd():
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
assert(_test_optimizer(sgd))
_test_optimizer(sgd)
def test_rmsprop():
assert(_test_optimizer(RMSprop()))
_test_optimizer(RMSprop())
def test_adagrad():
assert(_test_optimizer(Adagrad()))
_test_optimizer(Adagrad())
def test_adadelta():
assert(_test_optimizer(Adadelta()))
_test_optimizer(Adadelta())
def test_adam():
assert(_test_optimizer(Adam()))
_test_optimizer(Adam())
def test_adamax():
_test_optimizer(Adamax())
if __name__ == '__main__':

@ -22,10 +22,19 @@ def check_layer_output_shape(layer, input_data):
# Core #
########
def test_Reshape():
layer = Reshape(dims=(2, 3))
input_data = np.random.random((2, 6))
layer = Reshape(dims=(2, 3))
check_layer_output_shape(layer, input_data)
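# the -1 cases below exercise wildcard dims, which are presumably inferred from the remaining input dimensions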
layer = Reshape(dims=(-1,))
check_layer_output_shape(layer, input_data)
layer = Reshape(dims=(-1, 2))
check_layer_output_shape(layer, input_data)
layer = Reshape(dims=(2, -1))
check_layer_output_shape(layer, input_data)
def test_Permute():
layer = Permute(dims=(1, 3, 2))

@ -1,129 +0,0 @@
from __future__ import print_function
import numpy as np
import pytest
np.random.seed(1337)
from keras.utils.test_utils import get_test_data
from keras.models import Sequential
from keras.layers.core import Dense, TimeDistributedDense, Flatten
from keras.layers.recurrent import GRU
from keras.layers.convolutional import Convolution2D
from keras.utils.np_utils import to_categorical
def test_vector_classification():
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='relu'),
Dense(y_train.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=15, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.8)
def test_vector_regression():
nb_hidden = 10
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(20,),
output_shape=(2,),
classification=False)
model = Sequential([
Dense(nb_hidden, input_shape=(X_train.shape[-1],), activation='tanh'),
Dense(y_train.shape[-1])
])
model.compile(loss='hinge', optimizer='adagrad')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert (history.history['val_loss'][-1] < 0.9)
def test_temporal_classification():
np.random.seed(1337)
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2]),
activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
def test_temporal_regression():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(2,),
classification=False)
model = Sequential()
model.add(GRU(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='adam')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_sequence_to_sequence():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 5),
output_shape=(3, 5),
classification=False)
model = Sequential()
model.add(TimeDistributedDense(y_train.shape[-1],
input_shape=(X_train.shape[1], X_train.shape[2])))
model.compile(loss='hinge', optimizer='rmsprop')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test), verbose=0)
assert(history.history['val_loss'][-1] < 0.8)
def test_image_classification():
(X_train, y_train), (X_test, y_test) = get_test_data(nb_train=500,
nb_test=200,
input_shape=(3, 8, 8),
classification=True,
nb_class=2)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential([
Convolution2D(8, 8, 8, input_shape=(3, 8, 8), activation='sigmoid'),
Flatten(),
Dense(y_test.shape[-1], activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')
history = model.fit(X_train, y_train, nb_epoch=20, batch_size=16,
validation_data=(X_test, y_test),
show_accuracy=True, verbose=0)
assert(history.history['val_acc'][-1] > 0.9)
if __name__ == '__main__':
pytest.main([__file__])