keras/examples/imdb_cnn.py

'''This example demonstrates the use of Conv1D for text classification.

Gets to 0.89 test accuracy after 2 epochs.
90s/epoch on Intel i5 2.4GHz CPU.
10s/epoch on Tesla K40 GPU.

'''

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb

# set parameters:
max_features = 5000
maxlen = 400
batch_size = 32
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 2

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
                    embedding_dims,
                    input_length=maxlen))
model.add(Dropout(0.2))

# we add a Conv1D layer, which will learn `filters`
# word-group filters, each spanning kernel_size words:
model.add(Conv1D(filters,
                 kernel_size,
                 padding='valid',
                 activation='relu',
                 strides=1))
# we use global max pooling (max over time):
model.add(GlobalMaxPooling1D())

# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test))
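
# A rough, framework-free sketch of the shape and parameter arithmetic behind
# the model above. The helper names (conv1d_valid_length, count_parameters)
# are hypothetical and not part of Keras; the formulas assume the standard
# layouts for Embedding, Conv1D with padding='valid', and Dense layers with
# biases. It illustrates why, with global max pooling, the parameter count
# does not depend on maxlen.

```python
def conv1d_valid_length(steps, kernel_size, strides=1):
    """Number of output time steps of a 1D convolution with 'valid' padding."""
    return (steps - kernel_size) // strides + 1


def count_parameters(max_features, embedding_dims, filters,
                     kernel_size, hidden_dims):
    """Trainable parameters per layer, assuming biases on Conv1D and Dense."""
    counts = {
        # one embedding vector per vocabulary index
        'embedding': max_features * embedding_dims,
        # each filter spans kernel_size steps across all embedding channels
        'conv1d': kernel_size * embedding_dims * filters + filters,
        # global max pooling leaves one value per filter, whatever maxlen is
        'hidden': filters * hidden_dims + hidden_dims,
        # single sigmoid output unit
        'output': hidden_dims + 1,
    }
    counts['total'] = sum(counts.values())
    return counts


# Same hyperparameters as the script above.
conv_steps = conv1d_valid_length(steps=400, kernel_size=3)
params = count_parameters(max_features=5000, embedding_dims=50,
                          filters=250, kernel_size=3, hidden_dims=250)

print(conv_steps)       # 398
print(params['total'])  # 350751
```

# Note that maxlen only enters conv1d_valid_length: it changes how many time
# steps the pooling layer reduces over, not the number of weights.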