Documentation refactor

This commit is contained in:
Francois Chollet 2015-12-12 12:36:00 -08:00
parent 656681e9d8
commit 06f5f43079
28 changed files with 141 additions and 1453 deletions

@@ -1,106 +0,0 @@
## LeakyReLU
```python
keras.layers.advanced_activations.LeakyReLU(alpha=0.3)
```
Special version of a Rectified Linear Unit that allows a small gradient when the unit is not active (`f(x) = alpha*x for x < 0`).
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __alpha__: float >= 0. Negative slope coefficient.
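- __Example__ (a minimal usage sketch; the layer sizes are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Dense
from keras.layers.advanced_activations import LeakyReLU

model = Sequential()
model.add(Dense(64, input_dim=20))   # linear projection
model.add(LeakyReLU(alpha=0.3))      # leaky rectification applied to the Dense output
```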
---
## PReLU
```python
keras.layers.advanced_activations.PReLU()
```
Parametric Rectified Linear Unit. Similar to a LeakyReLU, except that each input unit has its own alpha coefficient, and these coefficients are learned during training.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __References__:
- [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://arxiv.org/pdf/1502.01852v1.pdf)
---
## ELU
```python
keras.layers.advanced_activations.ELU()
```
Exponential linear unit. Negative values push mean unit activations closer to zero, with the advantage of having a noise-robust deactivation state.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __References__:
- [Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)](http://arxiv.org/pdf/1511.07289v1.pdf)
---
## ParametricSoftplus
```python
keras.layers.advanced_activations.ParametricSoftplus()
```
Parametric Softplus of the form: (`f(x) = alpha * log(1 + exp(beta * x))`). This is essentially a smooth version of ReLU where the parameters control the sharpness of the rectification. The parameters are initialized to more closely approximate a ReLU than the standard `softplus`: `alpha` initialized to `0.2` and `beta` initialized to `5.0`. The parameters are fit separately for each hidden unit.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape=...` when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __References__:
- [Inferring Nonlinear Neuronal Computation Based on Physiologically Plausible Inputs](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003143)
---
## Thresholded Linear
```python
keras.layers.advanced_activations.ThresholdedLinear(theta)
```
Parametrized linear unit. Provides a threshold around zero within which values are zeroed out.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __theta__: float >= 0. Threshold location of activation.
- __References__:
- [Zero-Bias Autoencoders and the Benefits of Co-Adapting Features](http://arxiv.org/pdf/1402.3337.pdf)
---
## Thresholded ReLU
```python
keras.layers.advanced_activations.ThresholdedReLu(theta)
```
Parametrized rectified linear unit. Provides a threshold near zero below which values are zeroed out.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape=...` when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __theta__: float >= 0. Threshold location of activation.
- __References__:
- [Zero-Bias Autoencoders and the Benefits of Co-Adapting Features](http://arxiv.org/pdf/1402.3337.pdf)

@@ -1,21 +0,0 @@
Containers are ensembles of layers that can be interacted with through the same API as `Layer` objects.
## Sequential
```python
keras.layers.containers.Sequential(layers=[])
```
The Sequential container is a linear stack of layers. Apart from the `add` method and the `layers` constructor argument, the API is identical to that of the `Layer` class.
This class is also the basis for the `keras.models.Sequential` architecture.
The `layers` constructor argument is a list of Layer instances.
__Methods__:
```python
add(layer)
```
Add a new layer to the stack.
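__Example__ (an illustrative sketch; the layer sizes are arbitrary):
```python
from keras.layers import containers
from keras.layers.core import Dense

# Build a container incrementally; it can then be used anywhere a Layer is expected,
# e.g. as the encoder of an AutoEncoder layer.
encoder = containers.Sequential()
encoder.add(Dense(16, input_dim=32))
encoder.add(Dense(8))
```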

@@ -1,244 +0,0 @@
## Convolution1D
```python
keras.layers.convolutional.Convolution1D(nb_filter, filter_length,
init='uniform',
activation='linear',
weights=None,
border_mode='valid',
subsample_length=1,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None, input_length=None)
```
Convolution operator for filtering neighborhoods of one-dimensional inputs. When using this layer as the first layer in a model, either provide the keyword argument `input_dim` (int, e.g. 128 for sequences of 128-dimensional vectors), or `input_shape` (tuple of integers, e.g. (10, 128) for sequences of 10 vectors of 128 dimensions each).
- __Input shape__: 3D tensor with shape: `(samples, steps, input_dim)`.
- __Output shape__: 3D tensor with shape: `(samples, new_steps, nb_filter)`. `steps` value might have changed due to padding.
- __Arguments__:
- __nb_filter__: Number of convolution kernels to use (dimensionality of the output).
- __filter_length__: The extension (spatial or temporal) of each filter.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights.
- __border_mode__: 'valid' or 'same'.
- __subsample_length__: factor by which to subsample output.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __input_dim__: Number of channels/dimensions in the input. Either this argument or the keyword argument `input_shape` must be provided when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
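- __Example__ (a minimal usage sketch; the filter count and input shape are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.convolutional import Convolution1D

# 64 filters of length 3 over sequences of 10 steps of 32-dimensional vectors
model = Sequential()
model.add(Convolution1D(64, 3, border_mode='valid', input_shape=(10, 32)))
# output shape: (samples, new_steps, 64)
```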
---
## Convolution2D
```python
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col,
init='glorot_uniform',
activation='linear',
weights=None,
border_mode='valid',
subsample=(1, 1),
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
dim_ordering='th')
```
Convolution operator for filtering windows of two-dimensional inputs. When using this layer as the first layer in a model, provide the keyword argument `input_shape` (tuple of integers, does not include the sample axis), e.g. `input_shape=(3, 128, 128)` for 128x128 RGB pictures.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(samples, nb_filter, nb_row, nb_col)` if dim_ordering='th'
or 4D tensor with shape: `(samples, nb_row, nb_col, nb_filter)` if dim_ordering='tf'.
- __Arguments__:
- __nb_filter__: Number of convolution filters to use.
- __nb_row__: Number of rows in the convolution kernel.
- __nb_col__: Number of columns in the convolution kernel.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights.
- __border_mode__: 'valid' or 'same'.
- __subsample__: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1, in 'tf' mode it is at index 3.
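- __Example__ (a minimal usage sketch; the filter count and image size are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D

# 32 filters of size 3x3 on 128x128 RGB images, Theano dim ordering (channels first)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 128, 128)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# output shape: (samples, 32, 64, 64)
```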
---
## MaxPooling1D
```python
keras.layers.convolutional.MaxPooling1D(pool_length=2, stride=None, border_mode='valid')
```
Max pooling operation for temporal data.
- __Input shape__: 3D tensor with shape: `(samples, steps, features)`.
- __Output shape__: 3D tensor with shape: `(samples, downsampled_steps, features)`.
- __Arguments__:
- __pool_length__: factor by which to downscale. 2 will halve the input.
- __stride__: integer or None. Stride value.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
---
## AveragePooling1D
```python
keras.layers.convolutional.AveragePooling1D(pool_length=2, stride=None, border_mode='valid')
```
Average pooling operation for temporal data.
- __Input shape__: 3D tensor with shape: `(samples, steps, features)`.
- __Output shape__: 3D tensor with shape: `(samples, downsampled_steps, features)`.
- __Arguments__:
- __pool_length__: factor by which to downscale. 2 will halve the input.
- __stride__: integer or None. Stride value.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
---
## MaxPooling2D
```python
keras.layers.convolutional.MaxPooling2D(pool_size=(2, 2), border_mode='valid', dim_ordering='th')
```
Max pooling operation for spatial data.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(nb_samples, channels, pooled_rows, pooled_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, pooled_rows, pooled_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __pool_size__: tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
- __strides__: tuple of 2 integers, or None. Strides values.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1, in 'tf' mode it is at index 3.
---
## AveragePooling2D
```python
keras.layers.convolutional.AveragePooling2D(pool_size=(2, 2), border_mode='valid', dim_ordering='th')
```
Average pooling operation for spatial data.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(nb_samples, channels, pooled_rows, pooled_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, pooled_rows, pooled_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __pool_size__: tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
- __strides__: tuple of 2 integers, or None. Strides values.
- __border_mode__: 'valid' or 'same'. **Note:** 'same' will only work with TensorFlow for the time being.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1, in 'tf' mode it is at index 3.
---
## UpSampling1D
```python
keras.layers.convolutional.UpSampling1D(length=2)
```
Repeats each temporal step `length` times along the time axis.
- __Input shape__: 3D tensor with shape: `(samples, steps, features)`.
- __Output shape__: 3D tensor with shape: `(samples, upsampled_steps, features)`.
- __Arguments__:
- __length__: integer. Upsampling factor.
---
## UpSampling2D
```python
keras.layers.convolutional.UpSampling2D(size=(2, 2), dim_ordering='th')
```
Repeats the rows and columns of the data by size[0] and size[1] respectively.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(samples, channels, upsampled_rows, upsampled_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, upsampled_rows, upsampled_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __size__: tuple of 2 integers. The upsampling factors for rows and columns.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1, in 'tf' mode it is at index 3.
---
## ZeroPadding1D
```python
keras.layers.convolutional.ZeroPadding1D(padding=1)
```
Pads the input with zeros left and right along the time axis.
- __Input shape__: 3D tensor with shape: `(nb_samples, steps, dim)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, padded_steps, dim)`.
- __Arguments__:
- __padding__: integer, the size of the padding.
---
## ZeroPadding2D
```python
keras.layers.convolutional.ZeroPadding2D(padding=(1, 1), dim_ordering='th')
```
Pads the rows and columns of the input with zeros, at the top, bottom, left and right.
- __Input shape__: 4D tensor with shape: `(samples, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, rows, cols, channels)` if dim_ordering='tf'.
- __Output shape__: 4D tensor with shape: `(samples, channels, padded_rows, padded_cols)` if dim_ordering='th'
or 4D tensor with shape: `(samples, padded_rows, padded_cols, channels)` if dim_ordering='tf'.
- __Arguments__:
- __padding__: tuple of 2 integers, the size of the padding for rows and columns respectively.
- __dim_ordering__: 'th' or 'tf'. In 'th' mode, the channels dimension (the depth) is at index 1, in 'tf' mode it is at index 3.
---

@@ -1,491 +0,0 @@
## Base class
Note: wherever we refer to a *Tensor*, bear in mind that its type depends on the backend you are using.
```python
keras.layers.core.Layer()
```
__Methods__:
```python
__call__
```
Apply the layer transformation to an input Tensor `X`.
- __Return__: Tensor.
- __Arguments__:
- __X__: Tensor. Input Tensor.
- __train__: bool. Specifies whether output is computed in training mode or in testing mode, which may change the logic. For instance, Dropout and regularization are not applied in testing mode.
```python
set_previous(previous_layer)
```
Connect the input of the current layer to the output of the argument layer.
- __Return__: None.
- __Arguments__:
- __previous_layer__: Layer object.
```python
get_output(train)
```
Get the output of the layer.
- __Return__: Tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether output is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_input(train)
```
Get the input of the layer.
- __Return__: Tensor.
- __Arguments__:
- __train__: Boolean. Specifies whether output is computed in training mode or in testing mode, which can change the logic, for instance if there are any `Dropout` layers in the network.
```python
get_weights()
```
Get the weights of the parameters of the layer.
- __Return__: List of numpy arrays (one per layer parameter).
```python
set_weights(weights)
```
Set the weights of the parameters of the layer.
- __Arguments__:
- __weights__: List of numpy arrays (one per layer parameter). Should be in the same order as what `get_weights(self)` returns.
```python
get_config()
```
- __Return__: Configuration dictionary describing the layer.
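__Example__ (an illustrative sketch, assuming two already-instantiated layers `layer_a` and `layer_b` with identical architectures):
```python
# Copy the parameters of one layer into another layer of the same shape.
weights = layer_a.get_weights()   # list of numpy arrays, one per parameter
layer_b.set_weights(weights)      # expects the same order/shapes as get_weights() returns
config = layer_b.get_config()     # dictionary describing the layer
```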
---
## Dense
```python
keras.layers.core.Dense(output_dim,
init='glorot_uniform',
activation='linear',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None)
```
Standard fully-connected layer.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
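- __Example__ (a minimal usage sketch; the dimensions are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(64, init='glorot_uniform', input_dim=784))  # (nb_samples, 784) -> (nb_samples, 64)
model.add(Activation('relu'))
```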
---
## TimeDistributedDense
```python
keras.layers.core.TimeDistributedDense(output_dim,
init='glorot_uniform',
activation='linear',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None, input_length=None)
```
Fully-connected layer distributed over the time dimension. Useful after a recurrent network set to `return_sequences=True`.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Arguments__:
- __output_dim__: int >= 0.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __Example__:
```python
# input shape: (nb_samples, timesteps, 10)
model.add(LSTM(5, return_sequences=True, input_dim=10)) # output shape: (nb_samples, timesteps, 5)
model.add(TimeDistributedDense(15)) # output shape: (nb_samples, timesteps, 15)
```
---
## AutoEncoder
```python
keras.layers.core.AutoEncoder(encoder, decoder, output_reconstruction=True, weights=None):
```
A customizable autoencoder model. If `output_reconstruction=True`, then dim(input) = dim(output); otherwise dim(output) = dim(hidden).
- __Input shape__: The layer shape is defined by the encoder definitions
- __Output shape__: The layer shape is defined by the decoder definitions
- __Arguments__:
- __encoder__: A [layer](./) or [layer container](./containers.md).
- __decoder__: A [layer](./) or [layer container](./containers.md).
- __output_reconstruction__: If this is False, then when .predict() is called, the output is the deepest hidden layer's activation. Otherwise, the output of the final decoder layer is presented. Be sure your validation data conforms to this logic if you decide to use any.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __Example__:
```python
from keras.layers import containers
# input shape: (nb_samples, 32)
encoder = containers.Sequential([Dense(16, input_dim=32), Dense(8)])
decoder = containers.Sequential([Dense(16, input_dim=8), Dense(32)])
autoencoder = Sequential()
autoencoder.add(AutoEncoder(encoder=encoder, decoder=decoder, output_reconstruction=False))
```
---
## Activation
```python
keras.layers.core.Activation(activation)
```
Apply an activation function to the input.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: Same as input.
- __Arguments__:
- __activation__: name of activation function to use (see: [activations](../activations.md)), or alternatively, elementwise Theano function.
---
## Dropout
```python
keras.layers.core.Dropout(p)
```
Apply dropout to the input. Dropout consists in randomly setting a fraction `p` of input units to 0 at each update during training time, which helps prevent overfitting. Reference: [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: Same as input.
- __Arguments__:
- __p__: float (0 <= p < 1). Fraction of the input that gets dropped out at training time.
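- __Example__ (a minimal usage sketch; the dimensions and drop fraction are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(128, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.5))  # half of the 128 units are randomly dropped at each training update
```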
---
## Reshape
```python
keras.layers.core.Reshape(dims)
```
Reshape the input to a new shape containing the same number of units.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: `(nb_samples, dims)`.
- __Arguments__:
- __dims__: tuple of integers. Dimensions of the new shape.
- __Example__:
```python
# input shape: (nb_samples, 10)
model.add(Dense(100, input_dim=10)) # output shape: (nb_samples, 100)
model.add(Reshape(dims=(10, 10))) # output shape: (nb_samples, 10, 10)
```
---
## Flatten
```python
keras.layers.core.Flatten()
```
Convert an nD input to 1D.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: `(nb_samples, nb_input_units)`.
---
## RepeatVector
```python
keras.layers.core.RepeatVector(n)
```
Repeat the 1D input n times. Dimensions of input are assumed to be `(nb_samples, dim)`. Output will have the shape `(nb_samples, n, dim)`.
Note that the output is still a single tensor; `RepeatVector` does not split the data flow.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: `(nb_samples, n, input_dims)`.
- __Arguments__:
- __n__: int.
---
## Permute
```python
keras.layers.core.Permute(dims)
```
Permute the dimensions of the input data according to the given tuple. Sometimes useful for connecting RNNs and convnets together.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: Same as the input shape, but with the dimensions re-ordered according to the ordering specified by the tuple.
- __Argument__: tuple specifying the permutation scheme (e.g. `(2, 1)` permutes the first and second dimension of the input).
- __Example__:
```python
# input shape: (nb_samples, 10)
model.add(Dense(50, input_dim=10)) # output shape: (nb_samples, 50)
model.add(Reshape(dims=(10, 5))) # output shape: (nb_samples, 10, 5)
model.add(Permute(dims=(2, 1))) # output shape: (nb_samples, 5, 10)
```
---
## ActivityRegularization
```python
keras.layers.core.ActivityRegularization(l1=0., l2=0.)
```
Leaves the input unchanged, but adds a term to the loss function based on the input activity. L1 and L2 regularization supported.
This layer can be used, for instance, to induce activation sparsity in the previous layer.
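__Example__ (a minimal usage sketch; the dimensions and penalty are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation, ActivityRegularization

model = Sequential()
model.add(Dense(64, input_dim=784))
model.add(Activation('sigmoid'))
model.add(ActivityRegularization(l1=0.01))  # L1 penalty on the activations of the previous layer
```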
---
## MaxoutDense
```python
keras.layers.core.MaxoutDense(output_dim, nb_feature=4,
init='glorot_uniform',
weights=None,
W_regularizer=None, b_regularizer=None, activity_regularizer=None,
W_constraint=None, b_constraint=None,
input_dim=None)
```
A dense maxout layer. A `MaxoutDense` layer takes the element-wise maximum of `nb_feature` `Dense(input_dim, output_dim)` linear layers. This allows the layer to learn a convex, piecewise linear activation function over the inputs. See [this paper](http://arxiv.org/pdf/1302.4389.pdf) for more details. Note that this is a *linear* layer -- if you wish to apply an activation function (you shouldn't need to -- maxout units are universal function approximators), an `Activation` layer must be added afterwards.
- __Input shape__: 2D tensor with shape: `(nb_samples, input_dim)`.
- __Output shape__: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Arguments__:
- __output_dim__: int >= 0.
- __nb_feature__: int >= 0. The number of features to create for the maxout. This is equivalent to the number of linear pieces allowed in the activation function.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of [WeightRegularizer](../regularizers.md) (eg. L1 or L2 regularization), applied to the main weights matrix.
- __b_regularizer__: instance of [WeightRegularizer](../regularizers.md), applied to the bias.
- __activity_regularizer__: instance of [ActivityRegularizer](../regularizers.md), applied to the network output.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the main weights matrix.
- __b_constraint__: instance of the [constraints](../constraints.md) module, applied to the bias.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __Example__:
```python
# input shape: (nb_samples, 10)
model.add(Dense(100, input_dim=10)) # output shape: (nb_samples, 100)
model.add(MaxoutDense(50, nb_feature=10)) # output shape: (nb_samples, 50)
```
---
## Merge
```python
keras.layers.core.Merge(layers, mode='sum', concat_axis=-1, dot_axes=-1)
```
Merge the output of a list of layers (or containers) into a single tensor.
- __Arguments__:
- __layers__: List of layers or [containers](/layers/containers/).
- __mode__: String, one of `{'sum', 'mul', 'concat', 'ave', 'dot'}`. `sum`, `mul` and `ave` will simply sum/multiply/average the outputs of the layers (therefore all layers should have outputs with the same shape). `concat` will concatenate the outputs along the dimension specified by `concat_axis` (therefore all layers should have outputs that only differ along this dimension). `dot` will perform tensor contraction along the axes specified by `dot_axes` (see [the Numpy documentation](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.tensordot.html) for more details).
- __concat_axis__: axis to use in `concat` mode.
- __dot_axes__: axis or axes to use in `dot` mode (see [the Numpy documentation](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.tensordot.html) for more details).
- __Notes__:
- `dot` mode only works with Theano for the time being.
- __Example__:
```python
left = Sequential()
left.add(Dense(50, input_shape=(784,)))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(50, input_shape=(784,)))
right.add(Activation('relu'))
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], Y_train, batch_size=128, nb_epoch=20, validation_data=([X_test, X_test], Y_test))
```
---
## Masking
```python
keras.layers.core.Masking(mask_value=0.)
```
Create a mask for the input data by using `mask_value` as the sentinel value which should be masked out.
Given an input of dimensions `(nb_samples, timesteps, input_dim)`, return the input untouched as output, and supply a mask of shape `(nb_samples, timesteps)` where all timesteps which had *all* their values equal to `mask_value` are masked out.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, features)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, timesteps, features)`.
- __Notes__: Masking only works in Theano for the time being.
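- __Example__ (a minimal usage sketch; the sequence length and feature count are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Masking
from keras.layers.recurrent import LSTM

# Timesteps whose 10 feature values are all 0. are skipped by the downstream LSTM.
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(20, 10)))  # (timesteps, features)
model.add(LSTM(32))
```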
---
## Lambda
```python
keras.layers.core.Lambda(function, output_shape=None)
```
Used for evaluating an arbitrary Theano expression on the output of the previous layer.
- __Input shape__: Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Specified by the `output_shape` argument.
- __Arguments__:
- __function__: The expression to be evaluated. Takes one argument: the output of the previous layer.
- __output_shape__: Shape of the tensor returned by `function`. Should be a shape tuple (not including the samples dimension) or a function of the full input shape tuple (including samples dimension).
- __Example__:
```python
# custom softmax function
def sharp_softmax(X, beta=1.5):
return theano.tensor.nnet.softmax(X * beta)
def output_shape(input_shape):
# here input_shape includes the samples dimension
return input_shape # shape is unchanged
model = Sequential()
model.add(Dense(input_dim=10, output_dim=10))
model.add(Lambda(sharp_softmax, output_shape))
model.add(Dense(1))
model.add(Activation('sigmoid'))
```
---
## LambdaMerge
```python
keras.layers.core.LambdaMerge(layers, function, output_shape=None)
```
Merge the output of a list of layers (or containers) into a single tensor, using an arbitrary Theano expression.
- __Arguments__:
- __layers__: List of layers or [containers](/layers/containers/).
- __function__: The expression to be evaluated. Takes one argument: the list of input tensors.
- __output_shape__: Shape of the tensor returned by `function`. Should be a shape tuple (not including samples dimension) or a function of the list of input shape tuples (including samples dimension).
- __Example__:
```python
# root mean square function
def rms(inputs):
# inputs is a list of tensors
s = inputs[0] ** 2
for i in range(1, len(inputs)):
s += inputs[i] ** 2
s /= len(inputs)
s = theano.tensor.sqrt(s)
# return a single tensor
return s
def output_shape(input_shapes):
# return the shape of the first tensor
return input_shapes[0]
left = Sequential()
left.add(Dense(input_dim=10, output_dim=10))
left.add(Activation('sigmoid'))
right = Sequential()
right.add(Dense(input_dim=10, output_dim=10))
right.add(Activation('sigmoid'))
model = Sequential()
model.add(LambdaMerge([left, right], rms, output_shape))
model.add(Dense(1))
model.add(Activation('sigmoid'))
```
---

@@ -1,31 +0,0 @@
## Embedding
```python
keras.layers.embeddings.Embedding(input_dim, output_dim,
init='uniform',
weights=None,
W_regularizer=None, W_constraint=None,
mask_zero=False,
input_length=None)
```
Turn positive integers (indexes) into dense vectors of fixed size,
eg. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`
- __Input shape__: 2D tensor with shape: `(nb_samples, sequence_length)`.
- __Output shape__: 3D tensor with shape: `(nb_samples, sequence_length, output_dim)`.
- __Arguments__:
- __input_dim__: int >= 0. Size of the vocabulary, ie. 1+maximum integer index occurring in the input data.
- __output_dim__: int >= 0. Dimension of the dense embedding.
- __init__: name of initialization function for the weights of the layer (see: [initializations](../initializations.md)), or alternatively, Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
- __weights__: list of numpy arrays to set as initial weights. The list should have 1 element, of shape `(input_dim, output_dim)`.
- __W_regularizer__: instance of the [regularizers](../regularizers.md) module (eg. L1 or L2 regularization), applied to the embedding matrix.
- __W_constraint__: instance of the [constraints](../constraints.md) module (eg. maxnorm, nonneg), applied to the embedding matrix.
- __mask_zero__: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful for [recurrent layers](recurrent.md) which may take variable length input. If this is `True` then all subsequent layers in the model need to support masking or an exception will be raised.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
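- __Example__ (a minimal usage sketch; the vocabulary size, embedding dimension and sequence length are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import GRU

# 10000-word vocabulary, 128-dimensional embeddings, sequences padded to length 50
model = Sequential()
model.add(Embedding(10000, 128, input_length=50, mask_zero=True))  # output: (nb_samples, 50, 128)
model.add(GRU(64))
```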
---

@@ -1,37 +0,0 @@
## GaussianNoise
```python
keras.layers.noise.GaussianNoise(sigma)
```
Apply to the input an additive zero-centred Gaussian noise with standard deviation `sigma`. This is useful to mitigate overfitting (you could see it as a kind of random data augmentation). Gaussian Noise (GN) is a natural choice as a corruption process for real-valued inputs.
Only active at training time.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: Same as input.
- __Arguments__:
- __sigma__: float, standard deviation of the noise distribution.
---
## GaussianDropout
```python
keras.layers.noise.GaussianDropout(p)
```
Apply to the input a multiplicative one-centred Gaussian noise with standard deviation `sqrt(p/(1-p))`. `p` refers to the drop probability, to match the `Dropout` layer syntax.
Only active at training time.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. To specify the number of samples per batch, you can use the keyword argument `batch_input_shape` (tuple of integers, including the samples axis).
- __Output shape__: Same as input.
- __Arguments__:
- __p__: float, drop probability as with Dropout.

@@ -1,19 +0,0 @@
## BatchNormalization
```python
keras.layers.normalization.BatchNormalization(epsilon=1e-6, weights=None)
```
Normalize the activations of the previous layer at each batch.
- __Input shape__: Arbitrary. Use the keyword argument `input_shape` (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
- __Output shape__: Same as input.
- __Arguments__:
- __epsilon__: small float > 0. Fuzz parameter.
- __weights__: Initialization weights. List of 2 numpy arrays, with shapes: `[(input_shape,), (input_shape,)]`
- __References__:
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://arxiv.org/pdf/1502.03167v3.pdf)
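- __Example__ (a minimal usage sketch; the dimensions are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Dense(64, input_dim=100))
model.add(BatchNormalization())  # normalize the Dense activations over each batch
model.add(Activation('relu'))
```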

@@ -1,130 +0,0 @@
## SimpleRNN
```python
keras.layers.recurrent.SimpleRNN(output_dim,
init='glorot_uniform', inner_init='orthogonal',
activation='sigmoid',
weights=None,
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
Fully-connected RNN where the output is fed back to the input.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: initialization function for the recurrent (inner) weights (see: [initializations](../initializations.md)).
- __activation__: activation function. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __weights__: list of numpy arrays to set as initial weights. The list should have 3 elements, of shapes: `[(input_dim, output_dim), (output_dim, output_dim), (output_dim,)]`.
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
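- __Example__ (a minimal usage sketch; the sequence length and dimensions are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.recurrent import SimpleRNN

model = Sequential()
# sequences of 20 timesteps with 8 features each; only the last output is returned
model.add(SimpleRNN(32, input_shape=(20, 8)))  # output: (nb_samples, 32)
# use return_sequences=True to get the full sequence instead:
# model.add(SimpleRNN(32, return_sequences=True, input_shape=(20, 8)))  # (nb_samples, 20, 32)
```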
---
## GRU
```python
keras.layers.recurrent.GRU(output_dim,
init='glorot_uniform', inner_init='orthogonal',
activation='sigmoid', inner_activation='hard_sigmoid',
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
Gated Recurrent Unit - Cho et al. 2014.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 9 elements.
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __References__:
- [On the Properties of Neural Machine Translation: Encoder-Decoder Approaches](http://www.aclweb.org/anthology/W14-4012)
- [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](http://arxiv.org/pdf/1412.3555v1.pdf)
---
## LSTM
```python
keras.layers.recurrent.LSTM(output_dim,
init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one',
activation='tanh', inner_activation='hard_sigmoid',
weights=None,
return_sequences=False,
go_backwards=False,
stateful=False,
input_dim=None, input_length=None)
```
Long Short-Term Memory unit - Hochreiter & Schmidhuber 1997.
- __Input shape__: 3D tensor with shape: `(nb_samples, timesteps, input_dim)`.
- __Output shape__:
- if `return_sequences`: 3D tensor with shape: `(nb_samples, timesteps, output_dim)`.
- else: 2D tensor with shape: `(nb_samples, output_dim)`.
- __Masking__: This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an [Embedding](embeddings.md) layer with the `mask_zero` parameter set to `True`. **Note:** for the time being, masking is only supported with Theano.
- __Notes__: When using the TensorFlow backend, the number of timesteps used must be fixed. Make sure to pass an `input_length` int argument or a complete `input_shape` tuple argument.
- __Arguments__:
- __output_dim__: dimension of the internal projections and the final output.
- __init__: weight initialization function for the output cell. Can be the name of an existing function (str), or a Theano function (see: [initializations](../initializations.md)).
- __inner_init__: weight initialization function for the inner cells.
- __forget_bias_init__: initialization function for the bias of the forget gate. [Jozefowicz et al.](http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf) recommend initializing with ones.
- __activation__: activation function for the output. Can be the name of an existing function (str), or a Theano function (see: [activations](../activations.md)).
- __inner_activation__: activation function for the inner cells.
- __weights__: list of numpy arrays to set as initial weights. The list should have 12 elements.
- __return_sequences__: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- __go_backwards__: Boolean (default False). If True, process the input sequence backwards.
- __stateful__: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
- __input_dim__: dimensionality of the input (integer). This argument (or alternatively, the keyword argument `input_shape`) is required when using this layer as the first layer in a model.
- __input_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten` then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).
- __References__:
- [Long short-term memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf) (original 1997 paper)
- [Learning to forget: Continual prediction with LSTM](http://www.mitpressjournals.org/doi/pdf/10.1162/089976600300015015)
- [Supervised sequence labelling with recurrent neural networks](http://www.cs.toronto.edu/~graves/preprint.pdf)
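- __Example__ (a minimal usage sketch; the sequence length and dimensions are arbitrary illustrative values):
```python
from keras.models import Sequential
from keras.layers.recurrent import LSTM

# Stacked LSTMs: the first returns the full sequence so the second can consume it.
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(100, 32)))  # (nb_samples, 100, 64)
model.add(LSTM(32))                                                # (nb_samples, 32)
```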
---

@@ -1,216 +0,0 @@
## Sequential
Linear stack of layers.
```python
model = keras.models.Sequential()
```
- __Methods__:
- __add__(layer): Add a layer to the model.
- __compile__(optimizer, loss, class_mode="categorical"):
- __Arguments__:
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: str (name of objective function) or objective function. See [objectives](objectives.md).
- __class_mode__: one of "categorical", "binary". This is only used for computing classification accuracy or using the predict_classes method.
- __fit__(X, y, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, show_accuracy=False, callbacks=[], class_weight=None, sample_weight=None): Train a model for a fixed number of epochs.
- __Return__: a history object. Its `history` attribute is a record of training loss values at successive epochs, as well as validation loss values (if applicable).
- __Arguments__:
- __X__: data.
- __y__: labels.
- __batch_size__: int. Number of samples per gradient update.
- __nb_epoch__: int.
- __verbose__: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
- __callbacks__: `keras.callbacks.Callback` list. List of callbacks to apply during training. See [callbacks](callbacks.md).
- __validation_split__: float (0. < x < 1). Fraction of the data to use as held-out validation data.
- __validation_data__: tuple (X, y) to be used as held-out validation data. Will override validation_split.
- __shuffle__: boolean or str (for 'batch'). Whether to shuffle the samples at each epoch. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks.
- __show_accuracy__: boolean. Whether to display class accuracy in the logs to stdout at each epoch.
- __class_weight__: dictionary mapping classes to a weight value, used for scaling the loss function (during training only).
- __sample_weight__: list or numpy array with 1:1 mapping to the training samples, used for scaling the loss function (during training only). For time-distributed data, there is one weight per sample *per timestep*, i.e. if your output data is shaped `(nb_samples, timesteps, output_dim)`, your mask should be of shape `(nb_samples, timesteps, 1)`. This allows you to mask out or reweight individual output timesteps, which is useful in sequence to sequence learning.
- __evaluate__(X, y, batch_size=128, show_accuracy=False, verbose=1, sample_weight=None): Show performance of the model over some validation data.
- __Return__: The loss score over the data, or a `(loss, accuracy)` tuple if `show_accuracy=True`.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __predict__(X, batch_size=128, verbose=1):
- __Return__: An array of predictions for some test data.
- __Arguments__: Same meaning as fit method above.
- __predict_classes__(X, batch_size=128, verbose=1): Return an array of class predictions for some test data.
- __Return__: An array of labels for some test data.
- __Arguments__: Same meaning as fit method above. verbose is used as a binary flag (progress bar or nothing).
- __train_on_batch__(X, y, accuracy=False, class_weight=None, sample_weight=None): Single gradient update on one batch.
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __test_on_batch__(X, y, accuracy=False, sample_weight=None): Single performance evaluation on one batch.
- __Return__: loss over the data, or tuple `(loss, accuracy)` if `accuracy=True`.
- __save_weights__(fname, overwrite=False): Store the weights of all layers to an HDF5 file. If `overwrite==False` and the file already exists, an exception will be thrown.
- __load_weights__(fname): Sets the weights of a model, based on weights stored with __save_weights__. You can only __load_weights__ on a savefile from a model with an identical architecture. __load_weights__ can be called either before or after the __compile__ step.
- __summary__(): Print out a summary of the model architecture, with parameter count information.
__Examples__:
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(2, init='uniform', input_dim=64))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='mse')
'''
Demonstration of verbose modes 1 and 2
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=1)
# outputs
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
37800/37800 [==============================] - 7s - loss: 0.0385
Epoch 1
37800/37800 [==============================] - 8s - loss: 0.0140
Epoch 2
10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=2)
# outputs
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0190
Epoch 1
loss: 0.0146
Epoch 2
loss: 0.0049
'''
'''
Demonstration of show_accuracy
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=2, show_accuracy=True)
# outputs
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0190 - acc.: 0.8750
Epoch 1
loss: 0.0146 - acc.: 0.8750
Epoch 2
loss: 0.0049 - acc.: 1.0000
'''
'''
Demonstration of validation_split
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, validation_split=0.1, show_accuracy=True, verbose=1)
# outputs
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
37800/37800 [==============================] - 7s - loss: 0.0385 - acc.: 0.7258 - val. loss: 0.0160 - val. acc.: 0.9136
Epoch 1
37800/37800 [==============================] - 8s - loss: 0.0140 - acc.: 0.9265 - val. loss: 0.0109 - val. acc.: 0.9383
Epoch 2
10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109 - acc.: 0.9420
'''
```
---
## Graph
Arbitrary connection graph. It can have any number of inputs and outputs, with each output trained with its own loss function. The quantity being optimized by a Graph model is the sum of all loss functions over the different outputs.
```python
model = keras.models.Graph()
```
- __Methods__:
- __add_input__(name, input_shape, dtype='float'): Add an input with the given name and shape.
- __Arguments__:
- __name__: str. unique identifier of the input.
- __input_shape__: Integer tuple, shape of the expected input (not including the samples axis). E.g. (10,) for 10-dimensional vectors, (None, 128) for sequences (of variable length) of 128-dimensional vectors, (3, 32, 32) for 32x32 images with RGB channels.
- __batch_input_shape__: Integer tuple, shape of the expected batch input (including the samples axis).
- __dtype__: `float` or `int`. Type of the expected input data.
- __add_output__(name, input=None, inputs=[], merge_mode='concat'): Add an output connected to `input` or `inputs`.
- __Arguments__:
- __name__: str. unique identifier of the output.
- __input__: str name of the node that the output is connected to. Only specify *one* of either `input` or `inputs`.
- __inputs__: list of str names of the node that the output is connected to.
- __merge_mode__: "sum" or "concat". Only applicable if `inputs` list is specified. Merge mode for the different inputs.
- __add_node__(layer, name, input=None, inputs=[], merge_mode='concat', concat_axis=-1, dot_axes=-1): Add a node connected to `input` or `inputs`.
- __Arguments__:
- __layer__: Layer instance.
- __name__: str. unique identifier of the node.
- __input__: str name of the node/input that the node is connected to. Only specify *one* of either `input` or `inputs`.
- __inputs__: list of str names of the node that the node is connected to.
- __merge_mode__: "sum" or "concat". Only applicable if `inputs` list is specified. Merge mode for the different inputs.
- __concat_axis__: axis to use in `concat` mode.
- __dot_axes__: axis or axes to use in `dot` mode (see [the Numpy documentation](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.tensordot.html) for more details).
- __add_shared_node__(layer, name, inputs=[], merge_mode=None, outputs=[]): Add a shared node connected to `inputs`. A shared node is a layer that will be applied separately to every incoming input, and that uses only one set of weights. The merging operation occurs on the outputs of the layer.
- __Arguments__:
- __layer__: Layer instance.
- __name__: str. unique identifier of the node.
- __inputs__: list of str names of the node that the node is connected to.
- __merge_mode__: Merge mode for the different inputs.
- __outputs__: Optional. List of names for outputs, when merge_mode = None.
- __compile__(optimizer, loss):
- __Arguments__:
- __optimizer__: str (name of optimizer) or optimizer object. See [optimizers](optimizers.md).
- __loss__: dictionary mapping the name(s) of the output(s) to a loss function (string name of objective function or objective function. See [objectives](objectives.md)).
- __fit__(data, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, callbacks=[]): Train a model for a fixed number of epochs.
- __Return__: a history object. Its `history` attribute is a record of training loss values at successive epochs, as well as validation loss values (if applicable).
- __Arguments__:
- __data__: dictionary mapping input names and outputs names to appropriate numpy arrays. All arrays should contain the same number of samples.
- __batch_size__: int. Number of samples per gradient update.
- __nb_epoch__: int.
- __verbose__: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
- __callbacks__: `keras.callbacks.Callback` list. List of callbacks to apply during training. See [callbacks](callbacks.md).
- __validation_split__: float (0. < x < 1). Fraction of the data to use as held-out validation data.
- __validation_data__: dictionary mapping input names and outputs names to appropriate numpy arrays to be used as held-out validation data. All arrays should contain the same number of samples. Will override validation_split.
- __shuffle__: boolean. Whether to shuffle the samples at each epoch.
- __evaluate__(data, batch_size=128, verbose=1): Compute the performance of the model over some validation data.
- __Return__: The loss score over the data.
    - __Arguments__: Same meaning as in the `fit` method above. `verbose` is used as a binary flag (progress bar or nothing).
- __predict__(data, batch_size=128, verbose=1):
- __Return__: A dictionary mapping output names to arrays of predictions over the data.
    - __Arguments__: Same meaning as in the `fit` method above. Only the inputs need to be specified in `data`.
- __train_on_batch__(data): Single gradient update on one batch.
- __Return__: loss over the data.
- __test_on_batch__(data): Single performance evaluation on one batch.
- __Return__: loss over the data.
- __save_weights__(fname, overwrite=False): Store the weights of all layers to an HDF5 file. If `overwrite==False` and the file already exists, an exception will be thrown.
- __load_weights__(fname): Set the weights of the model, based on weights saved by __save_weights__. You can only __load_weights__ on a savefile from a model with an identical architecture. __load_weights__ can be called either before or after the __compile__ step.
- __summary__(): Print out a summary of the model architecture, with parameter count information.
__Examples__:
```python
# graph model with one input and two outputs
graph = Graph()
graph.add_input(name='input', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input')
graph.add_node(Dense(4), name='dense2', input='input')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output1', input='dense2')
graph.add_output(name='output2', input='dense3')
graph.compile(optimizer='rmsprop', loss={'output1':'mse', 'output2':'mse'})
history = graph.fit({'input':X_train, 'output1':y_train, 'output2':y2_train}, nb_epoch=10)
```
```python
# graph model with two inputs and one output
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input1')
graph.add_node(Dense(4), name='dense2', input='input2')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output', inputs=['dense2', 'dense3'], merge_mode='sum')
graph.compile(optimizer='rmsprop', loss={'output':'mse'})
history = graph.fit({'input1':X_train, 'input2':X2_train, 'output':y_train}, nb_epoch=10)
predictions = graph.predict({'input1':X_test, 'input2':X2_test}) # {'output':...}
```
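The examples above do not cover `add_shared_node`; the following is a minimal sketch built only from the signature described earlier (the layer sizes and node names are illustrative, not from the original docs):
```python
from keras.models import Graph
from keras.layers.core import Dense

# the same Dense(16) layer (one set of weights) is applied to both inputs,
# and the two resulting tensors are merged by concatenation
graph = Graph()
graph.add_input(name='input_a', input_shape=(32,))
graph.add_input(name='input_b', input_shape=(32,))
graph.add_shared_node(Dense(16), name='shared_dense',
                      inputs=['input_a', 'input_b'], merge_mode='concat')
graph.add_node(Dense(1), name='score', input='shared_dense')
graph.add_output(name='output', input='score')
graph.compile(optimizer='rmsprop', loss={'output': 'mse'})
```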

@ -1,117 +0,0 @@
## Usage of optimizers
An optimizer is one of the two arguments required for compiling a Keras model:
```python
model = Sequential()
model.add(Dense(64, init='uniform', input_dim=10))
model.add(Activation('tanh'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
```
You can either instantiate an optimizer before passing it to `model.compile()`, as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
---
## Base class
```python
keras.optimizers.Optimizer(**kwargs)
```
All optimizers descended from this class support the following keyword argument:
- __clipnorm__: float >= 0.
Note: this is the base class for building optimizers, not an actual optimizer that can be used for training models.
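For instance, gradient norm clipping can be enabled on any optimizer through this keyword (a short sketch; the clipping value is illustrative and `model` is assumed to be the Sequential model defined in the usage example above):
```python
from keras.optimizers import SGD

# all parameter gradients will be clipped to a maximum norm of 1.
sgd = SGD(lr=0.01, clipnorm=1.)
model.compile(loss='mean_squared_error', optimizer=sgd)
```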
---
## SGD
```python
keras.optimizers.SGD(lr=0.01, momentum=0., decay=0., nesterov=False)
```
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __momentum__: float >= 0. Parameter update momentum.
- __decay__: float >= 0. Learning rate decay over each update.
- __nesterov__: boolean. Whether to apply Nesterov momentum.
---
## Adagrad
```python
keras.optimizers.Adagrad(lr=0.01, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __epsilon__: float >= 0.
---
## Adadelta
```python
keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.
__Arguments__:
- __lr__: float >= 0. Learning rate. It is recommended to leave it at the default value.
- __rho__: float >= 0.
- __epsilon__: float >= 0. Fuzz factor.
For more info, see *"Adadelta: an adaptive learning rate method"* by Matthew Zeiler.
---
## RMSprop
```python
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __rho__: float >= 0.
- __epsilon__: float >= 0. Fuzz factor.
---
## Adam
```python
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
```
Adam optimizer, proposed by Kingma and Lei Ba in [Adam: A Method For Stochastic Optimization](http://arxiv.org/pdf/1412.6980v8.pdf). Default parameters are those suggested in the paper.
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __beta_1__, __beta_2__: floats, 0 < beta < 1. Generally close to 1.
- __epsilon__: float >= 0. Fuzz factor.
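As with any optimizer, a non-default configuration can be passed to `compile()` by instantiating the object first (a minimal sketch; the learning rate and loss shown are illustrative and `model` is assumed to be defined as above):
```python
from keras.optimizers import Adam

# instantiate Adam with a smaller learning rate than the default 0.001
adam = Adam(lr=0.0005)
model.compile(loss='categorical_crossentropy', optimizer=adam)
```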
---

@ -4,51 +4,12 @@ A callback is a set of functions to be applied at given stages of the training procedure.
---
## Base class
```python
keras.callbacks.Callback()
```
- __Properties__:
- __params__: dict. Training parameters (eg. verbosity, batch size, number of epochs...).
- __model__: `keras.models.Model`. Reference of the model being trained.
- __Methods__:
- __on_train_begin__(logs={}): Method called at the beginning of training.
- __on_train_end__(logs={}): Method called at the end of training.
- __on_epoch_begin__(epoch, logs={}): Method called at the beginning of epoch `epoch`.
- __on_epoch_end__(epoch, logs={}): Method called at the end of epoch `epoch`.
- __on_batch_begin__(batch, logs={}): Method called at the beginning of batch `batch`.
- __on_batch_end__(batch, logs={}): Method called at the end of batch `batch`.
The `logs` dictionary will contain keys for quantities relevant to the current batch or epoch. Currently, the `.fit()` method of the `Sequential` model class will include the following quantities in the `logs` that it passes to its callbacks:
- __on_epoch_end__: logs optionally include `val_loss` (if validation is enabled in `fit`), and `val_acc` (if validation and accuracy monitoring are enabled).
- __on_batch_begin__: logs include `size`, the number of samples in the current batch.
- __on_batch_end__: logs include `loss`, and optionally `acc` (if accuracy monitoring is enabled).
---
## Available callbacks
```python
keras.callbacks.ModelCheckpoint(filepath, verbose=0, save_best_only=False)
```
Save the model after every epoch. If `save_best_only=True`, the latest best model according to the validation loss will not be overwritten.
`filepath` can contain named formatting options, which will be filled with the value of `epoch` and keys in `logs` (passed in `on_epoch_end`).
For example: if `filepath` is `weights.{epoch:02d}-{val_loss:.2f}.hdf5`, then multiple files will be saved with the epoch number and the validation loss.
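A minimal usage sketch (the filepath and data names are illustrative), assuming a compiled model and enough validation data for `val_loss` to be available:
```python
from keras.callbacks import ModelCheckpoint

# save the weights after every epoch, keeping only the best model seen so far
checkpointer = ModelCheckpoint(filepath='weights.{epoch:02d}-{val_loss:.2f}.hdf5',
                               verbose=1, save_best_only=True)
model.fit(X_train, y_train, validation_split=0.1, nb_epoch=10,
          callbacks=[checkpointer])
```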
```python
keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0)
```
Stop training when no improvement in the monitored metric (`monitor`) has been seen for `patience` epochs.
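A minimal usage sketch (data names are illustrative), assuming validation data so that `val_loss` can be monitored:
```python
from keras.callbacks import EarlyStopping

# interrupt training if val_loss does not improve for 2 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1)
model.fit(X_train, y_train, validation_split=0.1, nb_epoch=50,
          callbacks=[early_stopping])
```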
{{autogenerated}}
---
## Create a callback
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.
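For example, a callback that records the loss of every batch could look like the following sketch, built only from the methods and `logs` keys described above (data names are illustrative):
```python
import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        # 'loss' is included in the logs passed to on_batch_end
        self.losses.append(logs.get('loss'))

history_cb = LossHistory()
model.fit(X_train, y_train, batch_size=128, nb_epoch=20, callbacks=[history_cb])
print(history_cb.losses)  # list of per-batch training losses
```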

docs/templates/models.md

@ -0,0 +1,114 @@
Keras has two models: __Sequential__, a linear stack of layers, and __Graph__, a directed acyclic graph of layers.
# Using the Sequential model
```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(2, init='uniform', input_dim=64))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='mse')
'''
Train the model for 3 epochs, in batches of 16 samples,
on data stored in the Numpy array X_train,
and labels stored in the Numpy array y_train:
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=1)
'''
What you will see with verbose=1:
Train on 37800 samples, validate on 4200 samples
Epoch 0
37800/37800 [==============================] - 7s - loss: 0.0385
Epoch 1
37800/37800 [==============================] - 8s - loss: 0.0140
Epoch 2
10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=2)
'''
What you will see with verbose=2:
Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0190
Epoch 1
loss: 0.0146
Epoch 2
loss: 0.0049
'''
'''
Demonstration of the show_accuracy argument
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16, verbose=2, show_accuracy=True)
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0190 - acc.: 0.8750
Epoch 1
loss: 0.0146 - acc.: 0.8750
Epoch 2
loss: 0.0049 - acc.: 1.0000
'''
'''
Demonstration of the validation_split argument
'''
model.fit(X_train, y_train, nb_epoch=3, batch_size=16,
validation_split=0.1, show_accuracy=True, verbose=1)
'''
Train on 37800 samples, validate on 4200 samples
Epoch 0
37800/37800 [==============================] - 7s - loss: 0.0385 - acc.: 0.7258 - val. loss: 0.0160 - val. acc.: 0.9136
Epoch 1
37800/37800 [==============================] - 8s - loss: 0.0140 - acc.: 0.9265 - val. loss: 0.0109 - val. acc.: 0.9383
Epoch 2
10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109 - acc.: 0.9420
'''
```
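After training, the same model can be evaluated and used for prediction in the same style (a short sketch; `X_test` and `y_test` are assumed to be held-out Numpy arrays):
```python
# compute the loss on held-out data, then generate output predictions
score = model.evaluate(X_test, y_test, batch_size=16, verbose=1)
predictions = model.predict(X_test, batch_size=16)
```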
# Using the Graph model
```python
# graph model with one input and two outputs
graph = Graph()
graph.add_input(name='input', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input')
graph.add_node(Dense(4), name='dense2', input='input')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output1', input='dense2')
graph.add_output(name='output2', input='dense3')
graph.compile(optimizer='rmsprop', loss={'output1':'mse', 'output2':'mse'})
history = graph.fit({'input':X_train, 'output1':y_train, 'output2':y2_train}, nb_epoch=10)
```
```python
# graph model with two inputs and one output
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input1')
graph.add_node(Dense(4), name='dense2', input='input2')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output', inputs=['dense2', 'dense3'], merge_mode='sum')
graph.compile(optimizer='rmsprop', loss={'output':'mse'})
history = graph.fit({'input1':X_train, 'input2':X2_train, 'output':y_train}, nb_epoch=10)
predictions = graph.predict({'input1':X_test, 'input2':X2_test}) # {'output':...}
```
----
# Model API documentation
{{autogenerated}}

docs/templates/optimizers.md

@ -0,0 +1,25 @@
## Usage of optimizers
An optimizer is one of the two arguments required for compiling a Keras model:
```python
model = Sequential()
model.add(Dense(64, init='uniform', input_dim=10))
model.add(Activation('tanh'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
```
You can either instantiate an optimizer before passing it to `model.compile()` , as in the above example, or you can call it by its name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
---
{{autogenerated}}