Update documentation.

This commit is contained in:
Francois Chollet 2015-11-28 16:34:35 -08:00
parent 4b39b5f36b
commit 634aedca1a
9 changed files with 425 additions and 352 deletions

README.md (269 lines changed)

@ -1,8 +1,8 @@
# Keras: Theano-based Deep Learning library
# Keras: Deep Learning library for Theano and TensorFlow
## You have just found Keras.
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python / Theano so as not to have to deal with the dearth of ecosystem in Lua. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
@ -12,200 +12,92 @@ Use Keras if you need a deep learning library that:
Read the documentation at [Keras.io](http://keras.io).
Keras is compatible with __Python 2.7-3.4__.
Keras is compatible with:
- __Python 2.7-3.5__ with the Theano backend
- __Python 2.7__ with the TensorFlow backend
------------------
## Guiding principles
- __Modularity.__ A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as little restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple (<100 lines of code). Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Minimalism.__ Each module should be kept short and simple. Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New modules are dead simple to add (as new classes/functions), and existing modules provide ample examples. To be able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Easy extensibility.__ New modules are dead simple to add (as new classes and functions), and existing modules provide ample examples. To be able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Work with Python__. No separate models configuration files in a declarative format (like in Caffe or PyLearn2). Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
- __Work with Python__. No separate models configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
## Examples
### Multilayer Perceptron (MLP):
------------------
## Getting started: 30 seconds to Keras
The core data structure of Keras is a __model__, a way to organize layers. There are two types of models: [`Sequential`](/models/#sequential) and [`Graph`](/models/#graph).
Here's the `Sequential` model (a linear pile of layers):
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
model = Sequential()
```
Stacking layers is as easy as `.add()`:
```python
from keras.layers.core import Dense, Activation
model.add(Dense(output_dim=64, input_dim=100, init="glorot_uniform"))
model.add(Activation("relu"))
model.add(Dense(output_dim=10, init="glorot_uniform"))
model.add(Activation("softmax"))
```
Once your model looks good, configure its learning process with `.compile()`:
```python
model.compile(loss='categorical_crossentropy', optimizer='sgd')
```
If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
```python
from keras.optimizers import SGD
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, input_dim=20, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
```
### Alternative implementation of MLP:
You can now iterate on your training data in batches:
```python
model = Sequential()
model.add(Dense(64, input_dim=20, init='uniform', activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform', activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform', activation='softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)
```
### VGG-like convnet:
Alternatively, you can feed batches to your model manually:
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
model = Sequential()
# input: 100x100 images with 3 channels -> (3, 100, 100) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Convolution2D(32, 3, 3, border_mode='full', input_shape=(3, 100, 100)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
# Note: Keras does automatic shape inference.
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)
model.train_on_batch(X_batch, Y_batch)
```
### Sequence classification with LSTM:
Evaluate your performance in one line:
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
model = Sequential()
model.add(Embedding(max_features, 256, input_length=maxlen))
model.add(LSTM(output_dim=128, activation='sigmoid', inner_activation='hard_sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=16, nb_epoch=10)
score = model.evaluate(X_test, Y_test, batch_size=16)
objective_score = model.evaluate(X_test, Y_test, batch_size=32)
```
### Architecture for learning image captions with a convnet and a Gated Recurrent Unit:
(word-level embedding, caption of maximum length 16 words).
Note that getting this to work well will require using a bigger convnet, initialized with pre-trained weights.
Or generate predictions on new data:
```python
max_caption_len = 16
vocab_size = 10000
# first, let's define an image model that
# will encode pictures into 128-dimensional vectors.
# it should be initialized with pre-trained weights.
image_model = Sequential()
image_model.add(Convolution2D(32, 3, 3, border_mode='full', input_shape=(3, 100, 100)))
image_model.add(Activation('relu'))
image_model.add(Convolution2D(32, 3, 3))
image_model.add(Activation('relu'))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Convolution2D(64, 3, 3, border_mode='full'))
image_model.add(Activation('relu'))
image_model.add(Convolution2D(64, 3, 3))
image_model.add(Activation('relu'))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Flatten())
image_model.add(Dense(128))
# let's load the weights from a save file.
image_model.load_weights('weight_file.h5')
# next, let's define a RNN model that encodes sequences of words
# into sequences of 128-dimensional word vectors.
language_model = Sequential()
language_model.add(Embedding(vocab_size, 256, input_length=max_caption_len))
language_model.add(GRU(output_dim=128, return_sequences=True))
language_model.add(TimeDistributedDense(128))
# let's repeat the image vector to turn it into a sequence.
image_model.add(RepeatVector(max_caption_len))
# the output of both models will be tensors of shape (samples, max_caption_len, 128).
# let's concatenate these 2 vector sequences.
model = Merge([image_model, language_model], mode='concat', concat_axis=-1)
# let's encode this vector sequence into a single vector
model.add(GRU(256, 256, return_sequences=False))
# which will be used to compute a probability
# distribution over what the next word in the caption should be!
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# "images" is a numpy float array of shape (nb_samples, nb_channels=3, width, height).
# "captions" is a numpy integer array of shape (nb_samples, max_caption_len)
# containing word index sequences representing partial captions.
# "next_words" is a numpy float array of shape (nb_samples, vocab_size)
# containing a categorical encoding (0s and 1s) of the next word in the corresponding
# partial caption.
model.fit([images, partial_captions], next_words, batch_size=16, nb_epoch=100)
classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)
```
In the examples folder, you will find example models for real datasets:
- CIFAR10 small images classification: Convolutional Neural Network (CNN) with realtime data augmentation
- IMDB movie review sentiment classification: LSTM over sequences of words
- Reuters newswires topic classification: Multilayer Perceptron (MLP)
- MNIST handwritten digits classification: MLP & CNN
- Character-level text generation with LSTM
Building a network of LSTMs, a deep CNN, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
...and more.
Have a look at these [starter examples](http://keras.io/examples/).
In the [examples folder](https://github.com/fchollet/keras/tree/master/examples) of the repo, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, neural turing machines, etc.
## Current capabilities
------------------
For complete coverage of the API, check out [the Keras documentation](http://keras.io).
A few highlights: convnets, LSTM, GRU, word2vec-style embeddings, PReLU, BatchNormalization...
## Installation
@ -213,16 +105,22 @@ Keras uses the following dependencies:
- numpy, scipy
- pyyaml
- Theano
- See installation instructions: http://deeplearning.net/software/theano/install.html#install
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
When using the Theano backend:
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
**Note**: You should use the latest version of Theano, not the PyPI version. Install it with:
```
sudo pip install git+git://github.com/Theano/Theano.git
```
When using the TensorFlow backend:
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).
To install, `cd` to the Keras folder and run the install command:
```
sudo python setup.py install
@ -233,10 +131,45 @@ You can also install Keras from PyPI:
sudo pip install keras
```
------------------
## Switching from Theano to TensorFlow
By default, Keras will use Theano as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
------------------
## Support
You can ask questions and join the development discussion on the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
------------------
## Contribution Guidelines
Keras welcomes all contributions from the community.
- Keep a pragmatic mindset and avoid bloat. Only add to the source if that is the only path forward.
- New features should be documented. Make sure you update the documentation along with your Pull Request.
- Any new function or class should have a proper docstring.
- The documentation for every new feature should include a usage example in the form of a code snippet.
- All changes should be tested. Make sure any new feature you add has a corresponding unit test.
- Please no Pull Requests about coding style.
- Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of [examples](https://github.com/fchollet/keras/tree/master/examples).
------------------
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
------------------

docs/sources/backend.md (new file, 74 lines)

@ -0,0 +1,74 @@
# Keras backends
## What is a "backend"?
Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products, convolutions and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather than picking a single tensor library and tying the implementation of Keras to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.
At this time, Keras has two backend implementations available: the **Theano** backend and the **TensorFlow** backend.
- [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA/MILA Lab at Université de Montréal.
- [TensorFlow](http://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google, Inc.
## Switching from one backend to another
If you have run Keras at least once, you will find the Keras configuration file at:
`~/.keras/keras.json`
If it isn't there, you can create it.
It probably looks like this:
`{"epsilon": 1e-07, "floatx": "float32", "backend": "theano"}`
Simply change the field `backend` to either `"theano"` or `"tensorflow"`, and Keras will use the new configuration next time you run any Keras code.
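If you prefer to script the change, here is a minimal sketch using only the standard library (this helper is not part of Keras; the file path follows the location given above):
```python
import json
import os

# Hypothetical convenience script (not part of Keras): switch the "backend"
# field of ~/.keras/keras.json instead of editing the file by hand.
config_path = os.path.expanduser('~/.keras/keras.json')
with open(config_path) as f:
    config = json.load(f)
config['backend'] = 'tensorflow'  # or 'theano'
with open(config_path, 'w') as f:
    json.dump(config, f)
```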
## Using the abstract Keras backend to write new code
If you want the Keras modules you write to be compatible with both Theano and TensorFlow, you have to write them via the abstract Keras backend API. Here's an intro.
You can import the backend module via:
```python
from keras import backend as K
```
This instantiates an input placeholder. It's equivalent to `tf.placeholder()` or `T.matrix()`, `T.tensor3()`, etc.
```python
input = K.placeholder(shape=(2, 4, 5))
# also works:
input = K.placeholder(shape=(None, 4, 5))
# also works:
input = K.placeholder(ndim=3)
```
This instantiates a shared variable. It's equivalent to `tf.Variable()` or `theano.shared()`.
```python
val = np.random.random((3, 4, 5))
var = K.variable(value=val)
# all-zeros variable:
var = K.zeros(shape=(3, 4, 5))
# all-ones:
var = K.ones(shape=(3, 4, 5))
```
Most tensor operations you will need can be done as you would in TensorFlow or Theano:
```python
a = b + c * K.abs(d)
c = K.dot(a, K.transpose(b))
a = K.sum(b, axis=2)
a = K.softmax(b)
a = K.concatenate([b, c], axis=-1)
# etc...
```
For more information, see the code at `keras/backend/theano_backend.py` and `keras/backend/tensorflow_backend.py`.
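As a small illustration of writing backend-agnostic code, the function below mirrors a mean squared error objective using only abstract backend ops, so it should behave the same on Theano and TensorFlow:
```python
from keras import backend as K

def mean_squared_error(y_true, y_pred):
    # element-wise squared difference, averaged over the last axis
    return K.mean(K.square(y_pred - y_true), axis=-1)
```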

@ -6,6 +6,7 @@
- [Index](documentation.md)
- [Examples](examples.md)
- [FAQ](faq.md)
- [Backend](backend.md)
---

@ -1,33 +1,38 @@
# Keras: Theano-based Deep Learning library
# Keras: Deep Learning library for Theano and TensorFlow
## Overview
## You have just found Keras.
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses [Theano](http://deeplearning.net/software/theano/) under the hood for optimized tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either [TensorFlow](https://github.com/tensorflow/tensorflow) or [Theano](https://github.com/Theano/Theano). It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
- runs seamlessly on CPU and GPU.
Read the documentation at [Keras.io](http://keras.io).
Keras is compatible with:
- __Python 2.7-3.5__ with the Theano backend
- __Python 2.7__ with the TensorFlow backend
------------------
## Guiding principles
- __Modularity.__ A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as little restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes are all standalone modules that you can combine to create new models.
- __Minimalism.__ Each module should be kept short and simple (<100 lines of code). Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Minimalism.__ Each module should be kept short and simple. Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and ability to innovate.
- __Easy extensibility.__ New modules are dead simple to add (as new classes/functions), and existing modules provide ample examples. To be able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Easy extensibility.__ New modules are dead simple to add (as new classes and functions), and existing modules provide ample examples. To be able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- __Work with Python__. No separate models configuration files in a declarative format (like in Caffe or PyLearn2). Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
- __Work with Python__. No separate models configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.
## Code
Find the code on Github: [fchollet/keras](https://github.com/fchollet/keras).
------------------
## License
Keras is licensed under the [MIT license](http://opensource.org/licenses/MIT).
## Getting started: 30 seconds to Keras
@ -86,60 +91,85 @@ proba = model.predict_proba(X_test, batch_size=32)
Building a network of LSTMs, a deep CNN, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?
Have a look at the [examples](examples.md).
Have a look at these [starter examples](http://keras.io/examples/).
In the [examples folder](https://github.com/fchollet/keras/tree/master/examples) of the repo, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, neural turing machines, etc.
------------------
## Installation
Keras uses the following dependencies:
- __numpy__, __scipy__
- __pyyaml__
- __Theano__
- See [installation instructions](http://deeplearning.net/software/theano/install.html#install).
- __HDF5__ and __h5py__ (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: __cuDNN__.
- numpy, scipy
- pyyaml
- HDF5 and h5py (optional, required if you use model saving/loading functions)
- Optional but recommended if you use CNNs: cuDNN.
When using the Theano backend:
- Theano
- [See installation instructions](http://deeplearning.net/software/theano/install.html#install).
**Note**: You should use the latest version of Theano, not the PyPI version. Install it with:
```
sudo pip install git+git://github.com/Theano/Theano.git
```
Once you have the dependencies installed, clone the repo:
```bash
git clone https://github.com/fchollet/keras.git
When using the TensorFlow backend:
- TensorFlow
- [See installation instructions](https://github.com/tensorflow/tensorflow#download-and-setup).
To install, `cd` to the Keras folder and run the install command:
```
Go to the Keras folder and run the install command:
```bash
cd keras
sudo python setup.py install
```
You can also install Keras from PyPI:
```
sudo pip install keras
```
------------------
## Switching from Theano to TensorFlow
By default, Keras will use Theano as its tensor manipulation library. [Follow these instructions](http://keras.io/backend/) to configure the Keras backend.
------------------
## Support
You can ask questions and join the development discussion on the [Keras Google group](https://groups.google.com/forum/#!forum/keras-users).
------------------
## Contribution Guidelines
Keras welcomes all contributions from the community.
- Keep a pragmatic mindset and avoid bloat. Only add to the source if that is the only path forward.
- New features should be documented. Make sure you update the documentation along with your Pull Request.
- Any new function or class should have a proper docstring.
- The documentation for every new feature should include a usage example in the form of a code snippet.
- All changes should be tested. Make sure any new feature you add has a corresponding unit test.
- Please no Pull Requests about coding style.
- Even if you don't contribute to the Keras source code, if you have an application of Keras that is concise and powerful, please consider adding it to our collection of [examples](https://github.com/fchollet/keras/tree/master/examples).
------------------
## Why this name, Keras?
Keras (κέρας) means _horn_ in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the _Odyssey_, where dream spirits (_Oneiroi_, singular _Oneiros_) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was developed as part of the research effort of project __ONEIROS__ (*Open-ended Neuro-Electronic Intelligent Robot Operating System*).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System).
> _"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_
>_"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them."_ Homer, Odyssey 19. 562 ff (Shewring translation).
> -- Homer, Odyssey 19. 562 ff (Shewring translation).
------------------

@ -3,17 +3,21 @@
```python
keras.layers.convolutional.Convolution1D(nb_filter, filter_length,
init='uniform', activation='linear', weights=None,
border_mode='valid', subsample_length=1,
W_regularizer=None, b_regularizer=None, W_constraint=None,
b_constraint=None, input_dim=None, input_length=None)
init='uniform',
activation='linear',
weights=None,
border_mode='valid',
subsample_length=1,
W_regularizer=None, b_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None, input_length=None)
```
Convolution operator for filtering neighborhoods of one-dimensional inputs. When using this layer as the first layer in a model, either provide the keyword argument `input_dim` (int, e.g. 128 for sequences of 128-dimensional vectors), or `input_shape` (tuple of integers, e.g. `(10, 128)` for sequences of 10 vectors of 128 dimensions each).
- __Input shape__: 3D tensor with shape: `(nb_samples, steps, input_dim)`.
- __Input shape__: 3D tensor with shape: `(samples, steps, input_dim)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, steps, nb_filter)`. `steps` value might have changed due to padding.
- __Output shape__: 3D tensor with shape: `(samples, new_steps, nb_filter)`. `steps` value might have changed due to padding.
- __Arguments__:
- __nb_filter__: Number of convolution kernels to use (dimensionality of the output).
@ -21,7 +25,7 @@ Convolution operator for filtering neighborhoods of one-dimensional inputs. When
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights.
- __border_mode__: 'valid' or 'full'. see scipy.signal.convolve2d.
- __border_mode__: 'valid' or 'same'.
- __subsample_length__: factor by which to subsample output.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
@ -37,16 +41,23 @@ Convolution operator for filtering neighborhoods of one-dimensional inputs. When
```python
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col,
init='glorot_uniform', activation='linear', weights=None,
border_mode='valid', subsample=(1, 1),
W_regularizer=None, b_regularizer=None, W_constraint=None)
init='glorot_uniform',
activation='linear',
weights=None,
border_mode='valid',
subsample=(1, 1),
W_regularizer=None, b_regularizer=None,
W_constraint=None,
dim_ordering='th')
```
Convolution operator for filtering windows of two-dimensional inputs. When using this layer as the first layer in a model, provide the keyword argument `input_shape` (tuple of integers, does not include the sample axis), e.g. `input_shape=(3, 128, 128)` for 128x128 RGB pictures.
- __Input shape__: 4D tensor with shape: `(nb_samples, channels, rows, cols)`.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(nb_samples, nb_filter, rows, cols)`. `rows`, `cols` might have changed due to padding.
- __Output shape__: 4D tensor with shape: `(samples, nb_filter, nb_row, nb_col)` if dim_ordering='th'
or 4D tensor with shape: `(samples, nb_row, nb_col, nb_filter)` if dim_ordering='tf'.
- __Arguments__:
@ -57,13 +68,14 @@ Convolution operator for filtering windows of two-dimensional inputs. When using
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights.
- __border_mode__: 'valid', 'full', or 'same'. [See scipy.signal.convolve2d](http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html).
- __border_mode__: 'valid' or 'same'.
- __subsample__: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3.
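A minimal usage sketch (the 128x128 RGB input shape is an assumption for illustration), using the default 'th' dim ordering:
```python
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D

model = Sequential()
# 32 filters of size 3x3 on 3-channel 128x128 images; 'same' keeps rows and cols unchanged.
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 128, 128)))
# output shape: (samples, 32, 128, 128)
```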
---
@ -71,33 +83,120 @@ Convolution operator for filtering windows of two-dimensional inputs. When using
## MaxPooling1D
```python
keras.layers.convolutional.MaxPooling1D(pool_length=2, stride=None, ignore_border=True)
keras.layers.convolutional.MaxPooling1D(pool_length=2, stride=None, border_mode='valid')
```
- __Input shape__: 3D tensor with shape: `(nb_samples, steps, dim)`.
Max pooling operation for temporal data.
- __Output shape__: 3D tensor with shape: `(nb_samples, downsampled_steps, dim)`.
- __Input shape__: 3D tensor with shape: `(samples, steps, features)`.
- __Output shape__: 3D tensor with shape: `(samples, downsampled_steps, features)`.
- __Arguments__:
- __pool_length__: factor by which to downscale. 2 will halve the input.
- __stride__: integer or None. Stride value.
- __ignore_border__: boolean.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
---
## MaxPooling2D
```python
keras.layers.convolutional.MaxPooling2D(pool_size=(2, 2), ignore_border=True)
keras.layers.convolutional.MaxPooling2D(pool_size=(2, 2), border_mode='valid', dim_ordering='th')
```
- __Input shape__: 4D tensor with shape: `(nb_samples, stack_size, nb_row, nb_col)`.
Max pooling operation for spatial data.
- __Output shape__: 4D tensor with shape: `(nb_samples, stack_size, new_nb_row, new_nb_col)`.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(nb_samples, channels, pooled_rows, pooled_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, pooled_rows, pooled_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __pool_size__: factor by which to downscale (vertical ds, horizontal ds). (2, 2) will halve the image in each dimension.
- __ignore_border__: boolean. When True, (5, 5) input with pool_size=(2, 2) will generate a (2, 2) output, (3, 3) otherwise.
- __pool_size__: tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
- __strides__: tuple of 2 integers, or None. Strides values.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3.
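For example, a sketch of halving the spatial dimensions of a convolutional feature map (the input shape is an assumption):
```python
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 64, 64)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# output shape: (samples, 32, 32, 32)
```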
---
## UpSampling1D
```python
keras.layers.convolutional.UpSampling1D(length=2)
```
Repeats each temporal step `length` times along the time axis.
- __Input shape__: 3D tensor with shape: `(samples, steps, features)`.
- __Output shape__: 3D tensor with shape: `(samples, upsampled_steps, features)`.
- __Arguments__:
- __length__: integer. Upsampling factor.
---
## UpSampling2D
```python
keras.layers.convolutional.UpSampling2D(size=(2, 2), dim_ordering='th')
```
Repeats the rows and columns of the data by size[0] and size[1] respectively.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(samples, channels, upsampled_rows, upsampled_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, upsampled_rows, upsampled_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __size__: tuple of 2 integers. The upsampling factors for rows and columns.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3.
---
## ZeroPadding1D
```python
keras.layers.convolutional.ZeroPadding1D(padding=1)
```
Pads the input with zeros left and right along the time axis.
- __Input shape__: 3D tensor with shape: `(nb_samples, steps, dim)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, padded_steps, dim)`.
- __Arguments__:
- __padding__: integer, the size of the padding.
---
## ZeroPadding2D
```python
keras.layers.convolutional.ZeroPadding2D(padding=(1, 1), dim_ordering='th')
```
Pads the rows and columns of the input with zeros (top, bottom, left and right).
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(samples, channels, padded_rows, padded_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, padded_rows, padded_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __padding__: tuple of 2 integers, the size of the padding for rows and columns respectively.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1; in 'tf' mode it is at index 3.
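A sketch of the common pattern of padding before a 'valid' convolution so that the spatial dimensions are preserved (the input shape passed to the padding layer is an assumption here):
```python
from keras.models import Sequential
from keras.layers.convolutional import ZeroPadding2D, Convolution2D

model = Sequential()
# pad one row and one column of zeros on each side: (3, 64, 64) -> (3, 66, 66)
model.add(ZeroPadding2D(padding=(1, 1), input_shape=(3, 64, 64)))
# a 3x3 'valid' convolution then brings the size back to 64x64
model.add(Convolution2D(32, 3, 3, border_mode='valid'))
# output shape: (samples, 32, 64, 64)
```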
---

@ -76,9 +76,13 @@ get_config()
## Dense
```python
keras.layers.core.Dense(output_dim, init='glorot_uniform', activation='linear', weights=None
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None, input_dim=None)
keras.layers.core.Dense(output_dim,
init='glorot_uniform',
activation='linear',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None)
```
Standard 1D fully-connected layer.
@ -104,9 +108,13 @@ Standard 1D fully-connect layer.
## TimeDistributedDense
```python
keras.layers.core.TimeDistributedDense(output_dim, init='glorot_uniform', activation='linear', weights=None
W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None,
input_dim=None, input_length=None)
keras.layers.core.TimeDistributedDense(output_dim,
init='glorot_uniform',
activation='linear',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None, input_length=None)
```
Fully-connected layer distributed over the time dimension. Useful after a recurrent network set to `return_sequences=True`.
@ -300,9 +308,12 @@ This layer can be used, for instance, to induce activation sparsity in the previ
## MaxoutDense
```python
keras.layers.core.MaxoutDense(output_dim, nb_feature=4, init='glorot_uniform', weights=None,
keras.layers.core.MaxoutDense(output_dim, nb_feature=4,
init='glorot_uniform',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None, input_dim=None)
W_constraint=None, b_constraint=None,
input_dim=None)
```
A dense maxout layer. A `MaxoutDense` layer takes the element-wise maximum of `nb_feature` `Dense(input_dim, output_dim)` linear layers. This allows the layer to learn a convex, piecewise linear activation function over the inputs. See [this paper](http://arxiv.org/pdf/1302.4389.pdf) for more details. Note that this is a *linear* layer -- if you wish to apply an activation function (you shouldn't need to -- maxout units are universal function approximators), add an `Activation` layer after it.
@ -332,14 +343,20 @@ model.add(MaxoutDense(50, nb_feature=10)) # output shape: (nb_samples, 50)
## Merge
```python
keras.layers.core.Merge(layers, mode='sum')
keras.layers.core.Merge(layers, mode='sum', concat_axis=-1, dot_axes=-1)
```
Merge the output of a list of layers (or containers) into a single tensor, following one of three modes: `sum`, `mul` or `concat`.
Merge the output of a list of layers (or containers) into a single tensor.
- __Arguments__:
- __layers__: List of layers or [containers](/layers/containers/).
- __mode__: String, one of `{'sum', 'mul', 'concat'}`. `sum` and `mul` will simply sum/multiply the outputs of the layers (therefore all layers should have an output with the same shape). `concat` will concatenate the outputs along the last dimension (therefore all layers should have an output that only differ along the last dimension).
- __mode__: String, one of `{'sum', 'mul', 'concat', 'ave', 'dot'}`. `sum`, `mul` and `ave` will simply sum/multiply/average the outputs of the layers (therefore all layers should have an output with the same shape). `concat` will concatenate the outputs along the dimension specified by `concat_axis` (therefore all layers should have outputs that only differ along this dimension). `dot` will perform tensor contraction along the axes specified by `dot_axes` (see [the Numpy documentation](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.tensordot.html) for more details).
- __concat_axis__: axis to use in `concat` mode.
- __dot_axes__: axis or axes to use in `dot` mode (see [the Numpy documentation](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.tensordot.html) for more details).
- __Notes__:
- `dot` mode only works with Theano for the time being.
- __Example__:
@ -375,6 +392,8 @@ Given an input of dimensions `(nb_samples, timesteps, input_dim)`, return the in
- __Output shape__: 3D tensor with shape: `(nb_samples, timesteps, features)`.
- __Notes__: Masking only works in Theano for the time being.
## Lambda
```python
keras.layers.core.Lambda(function, output_shape=None)

@ -2,7 +2,12 @@
## Embedding
```python
keras.layers.embeddings.Embedding(input_dim, output_dim, init='uniform', input_length=None, weights=None, W_regularizer=None, W_constraint=None, mask_zero=False)
keras.layers.embeddings.Embedding(input_dim, output_dim,
init='uniform',
weights=None,
W_regularizer=None, W_constraint=None,
mask_zero=False,
input_length=None)
```
Turn positive integers (indexes) into dense vectors of fixed size,
@ -23,28 +28,4 @@ eg. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`
- __mask_zero__: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful for [recurrent layers](recurrent.md) which may take variable length input. If this is `True` then all subsequent layers in the model need to support masking or an exception will be raised.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
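A usage sketch (the vocabulary size and sequence length below are assumptions for illustration):
```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding

vocab_size = 10000  # 1 + maximum integer index occurring in the input data
maxlen = 20         # input sequences padded with zeros up to this length

model = Sequential()
# index 0 is treated as a padding value and masked out for downstream layers
model.add(Embedding(vocab_size, 256, input_length=maxlen, mask_zero=True))
# output shape: (nb_samples, maxlen, 256)
```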
## WordContextProduct
```python
keras.layers.embeddings.WordContextProduct(input_dim, proj_dim=128,
init='uniform', activation='sigmoid', weights=None)
```
This layer turns a pair of words (a pivot word + a context word, ie. a word from the same context as a pivot, or a random, out-of-context word), identified by their indices in a vocabulary, into two dense representations (word representation and context representation).
Then it returns `activation(dot(pivot_embedding, context_embedding))`, which can be trained to encode the probability of finding the context word in the context of the pivot word (or reciprocally depending on your training procedure).
For more context, see Mikolov et al.: [Efficient Estimation of Word representations in Vector Space](http://arxiv.org/pdf/1301.3781v3.pdf)
- __Input shape__: 2D tensor with shape: `(nb_samples, 2)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, 1)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __proj_dim__: int >= 0. Dimension of the dense embedding used internally.
- __init__: name of initialization function for the embeddings (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
- __weights__: list of numpy arrays to set as initial weights. The list should have 2 element, both of shape `(input_dim, proj_dim)`. The first element is the word embedding weights, the second one is the context embedding weights.
---

@ -3,8 +3,13 @@
```python
keras.layers.recurrent.SimpleRNN(output_dim,
init='glorot_uniform', inner_init='orthogonal', activation='sigmoid', weights=None,
truncate_gradient=-1, return_sequences=False, input_dim=None, input_length=None)
init='glorot_uniform', inner_init='orthogonal',
activation='sigmoid',
weights=None,
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
Fully connected RNN where the output is fed back to the input.
@ -14,7 +19,9 @@ Fully connected RNN where output is to fed back to input.
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument.
- __Arguments__:
@ -22,53 +29,11 @@ Fully connected RNN where output is to fed back to input.
- __init__: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __activation__: activation function. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __weights__: list of numpy arrays to set as initial weights. The list should have 3 elements, of shapes: `[(input_dim, output_dim), (output_dim, output_dim), (output_dim,)]`.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __go_backwards__: Boolean (Default False). Process the input sequence backwards.
---
## SimpleDeepRNN
```python
keras.layers.recurrent.SimpleDeepRNN(output_dim, depth=3,
init='glorot_uniform', inner_init='orthogonal',
activation='sigmoid', inner_activation='hard_sigmoid',
weights=None, truncate_gradient=-1, return_sequences=False,
input_dim=None, input_length=None)
```
Fully connected RNN where the output of multiple timesteps (up to "depth" steps in the past) is fed back to the input:
```
output = activation( W.x_t + b + inner_activation(U_1.h_tm1) + inner_activation(U_2.h_tm2) + ... )
```
Not a particularly useful model, included for demonstration purposes.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
- __depth__: int >= 1. Lookback depth (eg. depth=1 is equivalent to SimpleRNN).
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have depth+2 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __go_backwards__: Boolean (Default False). Process the input sequence backwards.
---
@ -79,7 +44,9 @@ Not a particularly useful model, included for demonstration purposes.
keras.layers.recurrent.GRU(output_dim,
init='glorot_uniform', inner_init='orthogonal',
activation='sigmoid', inner_activation='hard_sigmoid',
weights=None, truncate_gradient=-1, return_sequences=False,
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
@ -91,7 +58,9 @@ Gated Recurrent Unit - Cho et al. 2014.
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to true.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
@ -100,11 +69,12 @@ Gated Recurrent Unit - Cho et al. 2014.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 9 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __go_backwards__: Boolean (Default False). Process the input sequence backwards.
- __References__:
- [On the Properties of Neural Machine Translation: EncoderDecoder Approaches](http://www.aclweb.org/anthology/W14-4012)
@ -118,7 +88,10 @@ Gated Recurrent Unit - Cho et al. 2014.
keras.layers.recurrent.LSTM(output_dim,
init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one',
activation='tanh', inner_activation='hard_sigmoid',
weights=None, truncate_gradient=-1, return_sequences=False,
weights=None,
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
@ -130,7 +103,9 @@ Long-Short Term Memory unit - Hochreiter 1997.
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to true.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument (see the sketch at the end of this section).
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
@ -140,11 +115,12 @@ Long-Short Term Memory unit - Hochreiter 1997.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 12 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __go_backwards__: Boolean (Default False). Process the input sequence backwards.
- __References__:
- [Long short-term memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf) (original 1997 paper)
@ -152,42 +128,3 @@ Long-Short Term Memory unit - Hochreiter 1997.
- [Supervised sequence labelling with recurrent neural networks](http://www.cs.toronto.edu/~graves/preprint.pdf)
---
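Tying the notes above together, a minimal sketch of an LSTM over embedded sequences with a fixed number of timesteps (the vocabulary size and sequence length are assumptions), which satisfies the TensorFlow requirement of a known `input_length`:
```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

max_features = 10000  # vocabulary size
maxlen = 80           # fixed number of timesteps

model = Sequential()
model.add(Embedding(max_features, 256, input_length=maxlen))
model.add(LSTM(128))  # returns only the last output (return_sequences=False)
```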
## JZS1, JZS2, JZS3
```python
keras.layers.recurrent.JZS1(output_dim,
init='glorot_uniform', inner_init='orthogonal',
activation='tanh', inner_activation='sigmoid',
weights=None, truncate_gradient=-1, return_sequences=False,
input_dim=None, input_length=None)
```
Top 3 RNN architectures evolved from the evaluation of thousands of models. Serves as alternatives to LSTMs and GRUs. Corresponds to `MUT1`, `MUT2`, and `MUT3` architectures described in the paper: An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al. 2015.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to true.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 9 elements.
- __truncate_gradient__: Number of steps to use in truncated BPTT. See: [Theano "scan"](http://deeplearning.net/software/theano/library/scan.html).
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __go_backwards__: Boolean (Default False). Process the input sequence backwards.
- __References__:
- [An Empirical Exploration of Recurrent Network Architectures](http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf)

@ -12,9 +12,8 @@ model = keras.models.Sequential()
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: str (name of objective function) or objective function. See [objectives](objectives.md).
- __class_mode__: one of "categorical", "binary". This is only used for computing classification accuracy or using the predict_classes method.
- __theano_mode__: A `theano.compile.mode.Mode` ([reference](http://deeplearning.net/software/theano/library/compile/mode.html)) instance controlling specifying compilation options.
- __fit__(X, y, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, show_accuracy=False, callbacks=[], class_weight=None, sample_weight=None): Train a model for a fixed number of epochs.
- __Return__: a history dictionary with a record of training loss values at successive epochs, as well as validation loss values (if applicable), accuracy (if applicable), etc.
- __Return__: a history object. Its `history` attribute is a record of training loss values at successive epochs, as well as validation loss values (if applicable); see the usage sketch after the argument list below.
- __Arguments__:
- __X__: data.
- __y__: labels.
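A brief usage sketch of the returned history object (the training arrays and settings are placeholders; the recorded keys follow the History callback):
```python
# X_train and y_train are assumed to be numpy arrays prepared beforehand
history = model.fit(X_train, y_train, nb_epoch=10, batch_size=32,
                    validation_split=0.1)
print(history.history['loss'])      # training loss at each epoch
print(history.history['val_loss'])  # validation loss at each epoch
```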
@ -154,7 +153,7 @@ model = keras.models.Graph()
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: dictionary mapping the name(s) of the output(s) to a loss function (string name of objective function or objective function. See [objectives](objectives.md)).
- __fit__(data, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, callbacks=[]): Train a model for a fixed number of epochs.
- __Return__: a history dictionary with a record of training loss values at successive epochs, as well as validation loss values (if applicable).
- __Return__: a history object. Its `history` attribute is a record of training loss values at successive epochs, as well as validation loss values (if applicable).
- __Arguments__:
- __data__: dictionary mapping input and output names to appropriate numpy arrays. All arrays should contain the same number of samples.
- __batch_size__: int. Number of samples per gradient update.