Note
Two similar implementations exist for conv2d: signal.conv2d and nnet.conv2d.
The former implements a traditional 2D convolution, while the latter implements the convolutional layers present in convolutional neural networks (where filters are 3D and pool over several input channels).
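The difference between the two flavours can be sketched in plain Python, without Theano. This is only an illustration under stated assumptions: the helper names `conv2d_valid` and `conv_layer_valid` are hypothetical, nested lists stand in for tensors, and only the 'valid' border mode is covered. The first helper is a traditional 2D convolution of one image with one kernel; the second applies one 3D filter to a multi-channel image and sums the per-channel results into a single feature map.

```python
def conv2d_valid(image, kernel):
    """Traditional 2D 'valid' convolution of one image with one kernel.

    The kernel is flipped along both axes, as in a true convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv_layer_valid(channels, filter3d):
    """CNN-style convolution of a multi-channel image with one 3D filter.

    Each channel is convolved with its own 2D slice of the filter and the
    results are summed over the channels.
    """
    maps = [conv2d_valid(ch, k) for ch, k in zip(channels, filter3d)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

single = conv2d_valid(image, kernel)                         # one channel
multi = conv_layer_valid([image, image], [kernel, kernel])   # two channels
print(single)  # [[4, 4], [4, 4]]
print(multi)   # [[8, 8], [8, 8]]
```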
Note
As of October 21st, 2014, the default GPU image convolution changed: by default, if cuDNN is available, we will use it; otherwise we will fall back to using the gemm version (slower than cuDNN in most cases, and it uses more memory).
Both cuDNN and the gemm version can be disabled using the Theano flags optimizer_excluding=conv_dnn and optimizer_excluding=conv_gemm, respectively. In this case, we will fall back to using the legacy convolution code, which is slower, but does not require extra memory. To verify that cuDNN is used, you can supply the Theano flag optimizer_including=cudnn. This will raise an error if cuDNN is unavailable.
It is not advised to ever disable cuDNN, as this is usually the fastest option. Disabling the gemm version is only useful if cuDNN is unavailable and you run out of GPU memory.
There are two other implementations: An FFT-based convolution integrated into Theano, and an implementation by Alex Krizhevsky available via Pylearn2. See the documentation below on how to use them.
As of November 24th, 2014, you can also use a meta-optimizer to automatically choose the fastest implementation for each specific convolution in your graph. For each instance, it will compile and benchmark each applicable implementation of the ones listed below and choose the fastest one. As performance is dependent on input and filter shapes, this only works for operations introduced via nnet.conv2d with fully specified shape information. Enable it via the Theano flag optimizer_including=conv_meta, and optionally set it to verbose mode via the flag metaopt.verbose=1.
TODO: Give examples on how to use these things! They are pretty complicated.
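As a minimal sketch of how the flags mentioned in this note can be combined, the snippet below assembles a THEANO_FLAGS value and exports it before Theano is imported. The helper `build_theano_flags` is hypothetical (it is not part of Theano). It assumes that separate flags are joined with commas and that several optimizer tags are joined with colons, as in the optimizer_including examples elsewhere on this page.

```python
import os


def build_theano_flags(including=(), excluding=(), extra=None):
    """Assemble a THEANO_FLAGS string (hypothetical helper, not a Theano API)."""
    parts = []
    if including:
        parts.append("optimizer_including=" + ":".join(including))
    if excluding:
        parts.append("optimizer_excluding=" + ":".join(excluding))
    for key, value in (extra or {}).items():
        parts.append("%s=%s" % (key, value))
    return ",".join(parts)


# Enable the meta-optimizer in verbose mode.
meta_flags = build_theano_flags(including=["conv_meta"],
                                extra={"metaopt.verbose": 1})

# Fall back to the legacy convolution code by excluding cuDNN and gemm.
legacy_flags = build_theano_flags(excluding=["conv_dnn", "conv_gemm"])

# The variable must be set before `import theano` for the flags to apply.
os.environ["THEANO_FLAGS"] = meta_flags
print(meta_flags)    # optimizer_including=conv_meta,metaopt.verbose=1
print(legacy_flags)  # optimizer_excluding=conv_dnn:conv_gemm
```

The same string can be exported from the shell instead, which avoids any dependence on import order.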
nnet.conv2d. This is the standard operator for convolutional neural networks working with batches of multi-channel 2D images, available for CPU and GPU. It computes a convolution, i.e., it flips the kernel. Most of the more efficient GPU implementations listed below can be inserted automatically as a replacement for nnet.conv2d via graph optimizations. Some of these graph optimizations are enabled by default, others can be enabled via Theano flags.
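Because some implementations on this page flip the kernel and others do not, it helps to see the two operations side by side. The following pure-Python sketch (the function `filter2d_valid` is a hypothetical name, and nested lists stand in for tensors) computes a 'valid' convolution and a 'valid' correlation of the same data, and checks that a correlation with a pre-flipped kernel reproduces the convolution.

```python
def flip2d(kernel):
    """Reverse a 2D kernel along both axes."""
    return [row[::-1] for row in kernel[::-1]]


def filter2d_valid(image, kernel, flip):
    """'valid' 2D filtering: convolution if flip is True, else correlation."""
    if flip:
        kernel = flip2d(kernel)
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]

convolution = filter2d_valid(image, kernel, flip=True)
correlation = filter2d_valid(image, kernel, flip=False)
print(convolution)  # [[23, 33], [53, 63]]
print(correlation)  # [[37, 47], [67, 77]]

# Correlating with a flipped kernel is the same as convolving.
assert filter2d_valid(image, flip2d(kernel), flip=False) == convolution
```

In practice this means that weights trained with a flipping operator need to be reversed along their spatial axes before being used with a non-flipping one, and vice versa.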
conv2d_fft This is a GPU-only version of nnet.conv2d that uses an FFT transform to perform the work. It flips the kernel just like conv2d. conv2d_fft should not be used directly as it does not provide a gradient. Instead, use nnet.conv2d and allow Theano’s graph optimizer to replace it by the FFT version by setting ‘THEANO_FLAGS=optimizer_including=conv_fft’ in your environment. If enabled, it will take precedence over cuDNN and the gemm version. It is not enabled by default because it has some restrictions on input and uses a lot more memory. Also note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and PyCUDA to run. To deactivate the FFT optimization on a specific nnet.conv2d while the optimization flag is active, you can set its version parameter to 'no_fft'. To enable it for just one Theano function:
import theano
# Enable the FFT convolution optimization for this function only.
mode = theano.compile.get_default_mode()
mode = mode.including('conv_fft')
f = theano.function(..., mode=mode)
cuda-convnet wrapper for 2d correlation
Wrapper for an open-source GPU-only implementation of conv2d by Alex Krizhevsky, very fast, but with several restrictions on input and kernel shapes, and with a different memory layout for the input. It does not flip the kernel.
This is in Pylearn2, where it is normally called from the linear transform implementation, but it can also be used directly from within Theano as a manual replacement for nnet.conv2d.
GpuCorrMM This is a GPU-only 2d correlation implementation taken from Caffe and also used by Torch. It does not flip the kernel.
For each element in a batch, it first creates a Toeplitz matrix in a CUDA kernel. Then, it performs a gemm call to multiply this Toeplitz matrix and the filters (hence the name: MM is for matrix multiplication). It needs extra memory for the Toeplitz matrix, which is a 2D matrix of shape (number of channels * filter width * filter height, output width * output height).
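The Toeplitz-then-gemm idea described above can be sketched on the CPU in plain Python. This is not the GpuCorrMM code: `im2col` and `matmul` are hypothetical helper names, nested lists stand in for tensors, and only a single batch element with unit stride and no padding is handled. The sketch builds the matrix of shape (number of channels * filter height * filter width, output height * output width) and multiplies it by the flattened filters.

```python
def im2col(image, fh, fw):
    """Unroll one multi-channel image into the Toeplitz-style matrix.

    image is a list of channels, each a list of rows. The result has
    C * fh * fw rows and oh * ow columns ('valid' mode, unit stride).
    """
    oh = len(image[0]) - fh + 1
    ow = len(image[0][0]) - fw + 1
    rows = []
    for channel in image:
        for i in range(fh):
            for j in range(fw):
                rows.append([channel[r + i][c + j]
                             for r in range(oh) for c in range(ow)])
    return rows


def matmul(a, b):
    """Naive matrix product, standing in for the gemm call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


image = [  # 2 channels of 3x3 pixels
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
]
filters = [  # 1 filter flattened over (channel, fh, fw): 2 * 2 * 2 weights
    [1, 0, 0, 1, 1, 0, 0, 1],
]

cols = im2col(image, 2, 2)
out = matmul(filters, cols)
print(len(cols), len(cols[0]))  # 8 4
print(out)                      # [[20, 20, 20, 20]]
```

Each row of `out` is one output feature map flattened over (output height, output width). No kernel flipping happens anywhere, which is why the result is a correlation.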
As it provides a gradient, you can use it as a replacement for nnet.conv2d. But usually, you will just use nnet.conv2d and allow Theano’s graph optimizer to automatically replace it by the GEMM version if cuDNN is not available. To explicitly disable the graph optimizer, set THEANO_FLAGS=optimizer_excluding=conv_gemm in your environment. If using it, please see the warning about a bug in CUDA 5.0 to 6.0 below.
dnn_conv GPU-only convolution using NVIDIA’s cuDNN library. This requires that you have cuDNN installed and available, which in turn requires CUDA 6.5 and a GPU with compute capability 3.0 or more.
If cuDNN is available, by default, Theano will replace all nnet.conv2d operations with dnn_conv. To explicitly disable it, set THEANO_FLAGS=optimizer_excluding=conv_dnn in your environment. As dnn_conv has a gradient defined, you can also use it manually.
conv3D 3D Convolution applying multi-channel 3D filters to batches of multi-channel 3D images. It does not flip the kernel.
conv3d_fft GPU-only version of conv3D using FFT transform. conv3d_fft should not be called directly as it does not provide a gradient. Instead, use conv3D and allow Theano’s graph optimizer to replace it by the FFT version by setting THEANO_FLAGS=optimizer_including=conv3d_fft:convgrad3d_fft:convtransp3d_fft in your environment. This is not enabled by default because it does not support strides and uses more memory. Also note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and PyCUDA to run. To enable for just one Theano function:
import theano
# Enable the 3D FFT convolution optimizations for this function only.
mode = theano.compile.get_default_mode()
mode = mode.including('conv3d_fft', 'convgrad3d_fft', 'convtransp3d_fft')
f = theano.function(..., mode=mode)
GpuCorr3dMM This is a GPU-only 3d correlation relying on a Toeplitz matrix and gemm implementation (see GpuCorrMM). It needs extra memory for the Toeplitz matrix, which is a 2D matrix of shape (number of channels * filter width * filter height * filter depth, output width * output height * output depth). As it provides a gradient, you can use it as a replacement for nnet.conv3d. Alternatively, you can use nnet.conv3d and allow Theano’s graph optimizer to replace it by the GEMM version by setting THEANO_FLAGS=optimizer_including=conv3d_gemm:convgrad3d_gemm:convtransp3d_gemm in your environment. This is not enabled by default because it uses some extra memory, but the overhead is small compared to conv3d_fft, there are no restrictions on input or kernel shapes, and strides are supported. If using it, please see the warning about a bug in CUDA 5.0 to 6.0 in GpuCorrMM.
conv3d2d Another conv3d implementation that uses conv2d with data reshaping. It is faster than conv3d in some cases, and it works on the GPU. It flips the kernel.
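To get a feeling for the extra memory mentioned for GpuCorrMM and GpuCorr3dMM, the Toeplitz matrix size can be estimated from the shape formulas given in their descriptions. The helpers below are hypothetical and assume 4-byte (float32) elements and one matrix per batch element; the actual allocation strategy of the GPU code may differ.

```python
def toeplitz_matrix_shape(channels, filter_shape, output_shape):
    """Shape of the Toeplitz matrix for a 2D or 3D correlation.

    rows = channels * product(filter_shape)
    cols = product(output_shape)
    """
    rows = channels
    for size in filter_shape:
        rows *= size
    cols = 1
    for size in output_shape:
        cols *= size
    return rows, cols


def toeplitz_matrix_bytes(channels, filter_shape, output_shape, itemsize=4):
    """Memory for one Toeplitz matrix, assuming `itemsize` bytes per element."""
    rows, cols = toeplitz_matrix_shape(channels, filter_shape, output_shape)
    return rows * cols * itemsize


# 2D case: 3 channels, 3x3 filters, 30x30 output.
shape_2d = toeplitz_matrix_shape(3, (3, 3), (30, 30))
bytes_2d = toeplitz_matrix_bytes(3, (3, 3), (30, 30))

# 3D case: 3 channels, 3x3x3 filters, 30x30x14 output.
shape_3d = toeplitz_matrix_shape(3, (3, 3, 3), (30, 30, 14))
bytes_3d = toeplitz_matrix_bytes(3, (3, 3, 3), (30, 30, 14))

print(shape_2d, bytes_2d)  # (27, 900) 97200
print(shape_3d, bytes_3d)  # (81, 12600) 4082400
```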
This function will build the symbolic graph for convolving a stack of input images with a set of filters. The implementation is modelled after Convolutional Neural Networks (CNN). It is simply a wrapper to the ConvOp but provides a much cleaner interface.
Return type: symbolic 4D tensor
Returns: set of feature maps generated by the convolutional layer. The tensor has shape (batch size, nb filters, output row, output col).
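The shape of the returned tensor can be worked out ahead of time. The sketch below uses a hypothetical helper, `conv2d_output_shape`, and assumes unit strides and the two border modes 'valid' and 'full'. It takes the input shape as (batch size, channels, rows, columns) and the filter shape as (nb filters, channels, filter rows, filter columns).

```python
def conv2d_output_shape(image_shape, filter_shape, border_mode="valid"):
    """Return (batch size, nb filters, output row, output col).

    Assumes unit strides. 'valid' keeps only positions where the filter fits
    entirely inside the image; 'full' keeps every position with any overlap.
    """
    batch, in_channels, rows, cols = image_shape
    nb_filters, filter_channels, f_rows, f_cols = filter_shape
    if in_channels != filter_channels:
        raise ValueError("input and filters disagree on the number of channels")
    if border_mode == "valid":
        out = (rows - f_rows + 1, cols - f_cols + 1)
    elif border_mode == "full":
        out = (rows + f_rows - 1, cols + f_cols - 1)
    else:
        raise ValueError("border_mode must be 'valid' or 'full'")
    return (batch, nb_filters) + out


valid_shape = conv2d_output_shape((64, 3, 32, 32), (16, 3, 5, 5), "valid")
full_shape = conv2d_output_shape((64, 3, 32, 32), (16, 3, 5, 5), "full")
print(valid_shape)  # (64, 16, 28, 28)
print(full_shape)   # (64, 16, 36, 36)
```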
Perform a convolution through fft.
Only supports input whose last dimension (width) is even. All other dimensions can be anything, and the filters can have an even or odd width.
If you must use input which has an odd width, you can either pad it or use the pad_last_dim argument which will do it for you and take care to strip the padding before returning. Don’t use this argument if you are not sure the input is odd since the padding is unconditional and will make even input odd, thus leading to problems.
On valid mode the filters must be smaller than the input.
input: (b, ic, i0, i1)
filters: (oc, ic, f0, f1)
border_mode: 'valid' or 'full'
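The principle behind a convolution "through FFT" is the convolution theorem: zero-pad both operands to the full output length, transform them, multiply pointwise, and transform back. The sketch below illustrates this in 1D for brevity, using a naive O(n^2) discrete Fourier transform built on `cmath` rather than a fast transform. It is not the code used by conv2d_fft, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def conv_full_via_dft(signal, kernel):
    """'full' convolution computed in the frequency domain."""
    n = len(signal) + len(kernel) - 1
    a = dft(list(signal) + [0] * (n - len(signal)))
    b = dft(list(kernel) + [0] * (n - len(kernel)))
    product = [x * y for x, y in zip(a, b)]
    return [round(v.real, 6) for v in dft(product, inverse=True)]


def conv_full_direct(signal, kernel):
    """'full' convolution computed directly, for comparison."""
    n = len(signal) + len(kernel) - 1
    return [sum(signal[k] * kernel[m - k]
                for k in range(len(signal)) if 0 <= m - k < len(kernel))
            for m in range(n)]


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]

full = conv_full_via_dft(signal, kernel)
# The 'valid' result is the part of 'full' where the kernel fits entirely
# inside the signal.
valid = full[len(kernel) - 1:len(signal)]

assert full == conv_full_direct(signal, kernel)
print(full)   # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
print(valid)  # [2.0, 2.0]
```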
3D “convolution” of multiple filters on a minibatch (does not flip the kernel, moves kernel with a user specified stride)
Note: The order of dimensions does not correspond to the one in conv2d. This is for optimization.
Note: The GPU implementation is very slow. You should use conv3d2d or conv3d_fft for a GPU graph instead.
See: Someone made a script that shows how to swap the axes between both 3d convolution implementations in Theano. See the last attachment.
Perform a convolution through fft.
Only supports input whose shape is even on the last dimension. All other dimensions can be anything and the filters can have an even or odd last dimension.
The semantics associated with the last three dimensions are not important as long as they are in the same order between the inputs and the filters. For example, when the convolution is done on a sequence of images, they could be either (duration, height, width) or (height, width, duration).
If you must use input which has an odd width, you can either pad it or use the pad_last_dim argument, which will do it for you and take care to strip the padding before returning. pad_last_dim checks that the last dimension is odd before actually padding it.
On valid mode the filters must be smaller than the input.
input: (b, ic, i0, i1, i2)
filters: (oc, ic, f0, f1, f2)
border_mode: 'valid' or 'full'
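The two FFT operators on this page describe different pad_last_dim behaviour: for conv2d_fft the padding is unconditional, while for conv3d_fft the last dimension is checked for oddness first. The shape-level sketch below contrasts the two policies. The function `padded_last_dim` is a hypothetical helper working on shape tuples, not a Theano API.

```python
def padded_last_dim(shape, conditional):
    """Shape after applying pad_last_dim under one of the two policies.

    conditional=True  pads only when the last dimension is odd
                      (the behaviour described for conv3d_fft).
    conditional=False always pads by one
                      (the behaviour described for conv2d_fft).
    """
    *lead, last = shape
    if not conditional or last % 2 == 1:
        last += 1
    return tuple(lead) + (last,)


odd_input = (2, 3, 8, 8, 7)
even_input = (2, 3, 8, 8, 8)

print(padded_last_dim(odd_input, conditional=True))    # (2, 3, 8, 8, 8)
print(padded_last_dim(even_input, conditional=True))   # (2, 3, 8, 8, 8)
# Unconditional padding turns an even last dimension into an odd one.
print(padded_last_dim((1, 1, 8, 8), conditional=False))  # (1, 1, 8, 9)
```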
Convolve spatio-temporal filters with a movie.
It flips the filters.
Note: Another way to define signals: (batch, time, in channel, row, column). Another way to define filters: (out channel, time, in channel, row, column).
Note: For the GPU, you can use this implementation or conv3d_fft.
See: Someone made a script that shows how to swap the axes between both 3d convolution implementations in Theano. See the last attachment.
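The script referred to above is not reproduced here, but the axis bookkeeping it performs can be sketched. The snippet derives the permutation that maps the conv3d2d signal layout given in the note to the layout used by conv3D. The conv3D order (batch, row, column, time, in channel) is an assumption that is not stated on this page, and the helper names are hypothetical.

```python
# Layout from the note above.
CONV3D2D_SIGNAL_AXES = ("batch", "time", "in channel", "row", "column")
# Assumed layout of the conv3D input (not stated in this section).
CONV3D_INPUT_AXES = ("batch", "row", "column", "time", "in channel")


def axis_permutation(src_axes, dst_axes):
    """For each destination axis, the index of that axis in the source."""
    return tuple(src_axes.index(name) for name in dst_axes)


def permute_shape(shape, perm):
    """Apply an axis permutation to a shape tuple."""
    return tuple(shape[i] for i in perm)


to_conv3d = axis_permutation(CONV3D2D_SIGNAL_AXES, CONV3D_INPUT_AXES)
to_conv3d2d = axis_permutation(CONV3D_INPUT_AXES, CONV3D2D_SIGNAL_AXES)

signal_shape = (8, 16, 3, 32, 32)  # (batch, time, in channel, row, column)
conv3d_shape = permute_shape(signal_shape, to_conv3d)

print(to_conv3d)     # (0, 3, 4, 1, 2)
print(conv3d_shape)  # (8, 32, 32, 16, 3)
assert permute_shape(conv3d_shape, to_conv3d2d) == signal_shape
```

On a symbolic tensor, the same permutation can be passed to `dimshuffle`, for example `signals.dimshuffle(*to_conv3d)`. Since conv3d2d flips the filters and conv3D does not, the filters would also need to be reversed along their time, row and column axes when moving between the two.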