keras/docs/sources/optimizers.md

## Usage of optimizers
An optimizer is one of the two arguments required for compiling a Keras model:
```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(20, 64, init='uniform'))
model.add(Activation('tanh'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
```
You can either instantiate an optimizer before passing it to `model.compile()`, as in the example above, or you can pass it by name. In the latter case, the default parameters for the optimizer will be used.
```python
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
---
## Base class
```python
keras.optimizers.Optimizer(**kwargs)
```
All optimizers descending from this class support the following keyword argument:

- __clipnorm__: float >= 0. Gradients will be clipped when their L2 norm exceeds this value.

Note: this is the base class for building optimizers, not an actual optimizer that can be used for training models.
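
All of the concrete optimizers documented below accept this argument. A minimal usage sketch (the values are illustrative only):

```python
from keras.optimizers import SGD

# Every parameter gradient will be clipped to a maximum L2 norm of 1.
sgd = SGD(lr=0.01, clipnorm=1.)
```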
---
## SGD
```python
keras.optimizers.SGD(lr=0.01, momentum=0., decay=0., nesterov=False)
```
__Arguments__:
- __lr__: float >= 0. Learning rate.
- __momentum__: float >= 0. Momentum applied to the parameter updates.
- __decay__: float >= 0. Learning rate decay over each update.
- __nesterov__: boolean. Whether to apply Nesterov momentum.
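
The following sketch (plain Python, not the actual Keras implementation) illustrates how a single momentum update behaves for one parameter `p` with gradient `g`:

```python
def sgd_step(p, g, velocity, lr=0.01, momentum=0.9, nesterov=False):
    """One schematic SGD update with (optionally Nesterov) momentum."""
    velocity = momentum * velocity - lr * g   # accumulate a decaying history of past steps
    if nesterov:
        p = p + momentum * velocity - lr * g  # momentum "look-ahead" before stepping
    else:
        p = p + velocity
    return p, velocity
```

A positive `decay` additionally shrinks the learning rate over the course of training.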
---
## Adagrad
```python
keras.optimizers.Adagrad(lr=0.01, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.

__Arguments__:
- __lr__: float >= 0. Learning rate.
- __epsilon__: float >= 0. Fuzz factor.
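
Schematically (this is not the Keras code itself), Adagrad keeps a per-parameter sum of squared gradients and uses it to scale down the step size for frequently-updated parameters:

```python
import numpy as np

def adagrad_step(p, g, accumulator, lr=0.01, epsilon=1e-6):
    """One schematic Adagrad update with per-parameter learning rates."""
    accumulator = accumulator + g ** 2                 # lifetime sum of squared gradients
    p = p - lr * g / (np.sqrt(accumulator) + epsilon)  # larger history -> smaller step
    return p, accumulator
```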
---
## Adadelta
```python
keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.

__Arguments__:
- __lr__: float >= 0. Learning rate. It is recommended to leave it at the default value.
- __rho__: float >= 0. Decay rate for the running averages of squared gradients and squared updates.
- __epsilon__: float >= 0. Fuzz factor.

For more info, see ["Adadelta: an adaptive learning rate method"](http://arxiv.org/abs/1212.5701) by Matthew Zeiler.
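
As a rough sketch of the method described in the paper (not the Keras code itself), Adadelta rescales each step by the ratio of two running averages, one of squared parameter updates and one of squared gradients:

```python
import numpy as np

def adadelta_step(p, g, acc_grad, acc_delta, lr=1.0, rho=0.95, epsilon=1e-6):
    """One schematic Adadelta update following Zeiler's paper."""
    acc_grad = rho * acc_grad + (1 - rho) * g ** 2        # running average of squared gradients
    step = -np.sqrt(acc_delta + epsilon) / np.sqrt(acc_grad + epsilon) * g
    acc_delta = rho * acc_delta + (1 - rho) * step ** 2   # running average of squared updates
    p = p + lr * step                                     # lr is normally left at 1.0
    return p, acc_grad, acc_delta
```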
---
## RMSprop
```python
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-6)
```
It is recommended to leave the parameters of this optimizer at their default values.

__Arguments__:
- __lr__: float >= 0. Learning rate.
- __rho__: float >= 0. Decay rate for the moving average of squared gradients.
- __epsilon__: float >= 0. Fuzz factor.
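
Schematically (not the Keras code itself), RMSprop resembles Adagrad but replaces the lifetime sum of squared gradients with a decaying average, so that old gradients are gradually forgotten:

```python
import numpy as np

def rmsprop_step(p, g, avg_sq_grad, lr=0.001, rho=0.9, epsilon=1e-6):
    """One schematic RMSprop update using a moving average of squared gradients."""
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g ** 2  # old gradients decay with factor rho
    p = p - lr * g / (np.sqrt(avg_sq_grad) + epsilon)
    return p, avg_sq_grad
```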
---
## Adam
```python
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
```
Adam optimizer, proposed by Kingma and Ba in [Adam: A Method For Stochastic Optimization](http://arxiv.org/pdf/1412.6980v8.pdf). Default parameters are those suggested in the paper.

__Arguments__:
- __lr__: float >= 0. Learning rate.
- __beta_1__, __beta_2__: floats, 0 < beta < 1. Generally close to 1.
- __epsilon__: float >= 0. Fuzz factor.
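
As a sketch of Algorithm 1 from the paper (not the Keras code itself), Adam maintains bias-corrected running estimates of the gradient's first and second moments:

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One schematic Adam update; the timestep t starts at 1."""
    m = beta_1 * m + (1 - beta_1) * g       # first moment (mean) estimate
    v = beta_2 * v + (1 - beta_2) * g ** 2  # second moment (uncentered variance) estimate
    m_hat = m / (1 - beta_1 ** t)           # correct the bias toward zero at early steps
    v_hat = v / (1 - beta_2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return p, m, v
```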
---