Set up docker for Kaggle

May 1, 2019

Here are some notes on setting up docker for Kaggle (especially on installing and enabling nbextensions). I had to do this from time to time and wanted to write the steps down for the record. I put it here in case it’s useful for someone else.

Why use docker for Kaggle

Kaggle is a good place to learn machine learning and data science. I think its docker image is a good choice of data science development environment in two scenarios:

if I want to run local experiments for Kaggle’s kernel-only competitions, this is exactly the same environment as Kaggle kernels, and it can be kept up to date by simply rebuilding the latest images.
for everything else, the list of packages included in the image is built by the Kaggle community and therefore includes most of the useful tools, both the ones I know about and the ones I don’t; this is much easier than building my own list of packages.

There are two images, CPU-only and GPU. The first is sufficient if you don’t have or need a GPU.

First, install docker if you have not already.
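
Before going further, it can help to confirm that the docker command is actually on the PATH. The following is a minimal sketch; have_cmd is a hypothetical helper name introduced here for illustration, not something docker provides.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
    echo "docker found: $(command -v docker)"
else
    echo "docker not found; install it first"
fi
```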

CPU-only image

This is easier than the GPU image, partly because a prebuilt image is already stored on Google Container Registry, so I can simply pull it from the terminal:

docker pull kaggle/python

Put the following in .bash_profile and run source .bash_profile

kjupyter() {
    docker run -v $PWD:/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python bash -c "pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}

This command basically sets up the directory mount and port forwarding, and runs jupyter notebook from within docker. (There are more details about the docker run command in this blog post from Kaggle.) Now I can run kjupyter from the terminal and go to http://localhost:8888/ for jupyter notebook.
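
To make the flags easier to see, here is a small dry-run sketch that only prints the outer docker run command instead of executing it. kjupyter_cmd and its two optional arguments are hypothetical names used for illustration; the flags themselves are the ones from kjupyter above: -v mounts the current directory at /tmp/working, -w makes that the working directory, -p maps a host port to port 8888 in the container, --rm removes the container on exit, and -it attaches an interactive terminal. The real function also passes the bash -c "..." string, which is left out here.

```shell
# kjupyter_cmd is a hypothetical dry-run helper: it only prints the command.
# Usage: kjupyter_cmd [host_port] [image]
kjupyter_cmd() {
    port="${1:-8888}"            # host port forwarded to the container's 8888
    image="${2:-kaggle/python}"  # image name, as pulled above
    # "$PWD" is quoted in the printed command so paths with spaces still work
    # (a small variation on the original, which leaves it unquoted).
    printf '%s\n' "docker run -v \"\$PWD\":/tmp/working -w=/tmp/working -p ${port}:8888 --rm -it ${image} bash"
}

kjupyter_cmd        # default: host port 8888
kjupyter_cmd 9999   # host port 9999, e.g. if 8888 is already taken
```

With a different host port, the notebook would be reached at that port on localhost instead of 8888.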

About nbextensions

One issue I had with Kaggle’s docker image is that it does not have nbextensions out of the box, and these extensions (e.g., table of contents) are very useful in the notebook environment. These lines in the definition of kjupyter above solved that:

pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user;
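
Because the steps above are separated by ;, the notebook still starts even if one of the install steps fails. One possible variation, which the setup above does not require, is to chain the steps with && so that a failure stops the sequence. The sketch below builds such a string; join_and is a hypothetical helper name.

```shell
# join_and is a hypothetical helper: it joins its arguments with " && ".
join_and() {
    out=""
    for step in "$@"; do
        if [ -z "$out" ]; then out="$step"; else out="$out && $step"; fi
    done
    printf '%s\n' "$out"
}

setup=$(join_and \
    "pip install jupyter_contrib_nbextensions" \
    "pip install jupyter_nbextensions_configurator" \
    "jupyter contrib nbextension install --user")
echo "$setup"
```

The resulting string could be used inside the bash -c "..." argument in place of the ;-separated version.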

GPU image

Unlike the CPU image, the GPU image is not published in the repo, so you can’t run docker pull like before. Instead, follow the instructions here: clone the git repository of Kaggle docker by running

git clone https://github.com/Kaggle/docker-python.git

Under the project root docker-python/, run ./build --gpu.
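
After the build finishes, one way to check that the image is available locally is docker image inspect, which fails if the image is not present. This is a minimal sketch; image_exists is a hypothetical helper name, and kaggle/python-gpu-build is the image name used by kjupyter in this post.

```shell
# image_exists is a hypothetical helper: it succeeds only if docker can find
# the named image locally (it does not pull anything).
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

if image_exists kaggle/python-gpu-build; then
    echo "GPU image is available"
else
    echo "GPU image not found; check the output of ./build --gpu"
fi
```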

Like before, put the following in .bash_profile and run source .bash_profile

kjupyter() {
    docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}

This line in the above definition of kjupyter solves a problem with using the GPU. (This link has some background.)

export LD_LIBRARY_PATH=/usr/local/cuda/lib64
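
Note that the line above replaces any value LD_LIBRARY_PATH already had. I have not checked whether the image sets one, so the following is only a sketch of a more conservative variant that prepends the CUDA directory and keeps whatever was there; prepend_path is a hypothetical helper name.

```shell
# prepend_path is a hypothetical helper: it prints DIR, followed by ":CURRENT"
# only when CURRENT is non-empty (so no stray trailing colon is produced).
prepend_path() {
    printf '%s\n' "$1${2:+:$2}"
}

LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```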

Now you can run kjupyter from the terminal and go to http://localhost:8888/ for jupyter notebook. Nbextensions are taken care of in the same way as in the CPU version.

Update-1: an example where I enable an extension by default; notice the additional jupyter nbextension enable toc2/main --user in the definition below:

kjupyter() {
    docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter nbextension enable toc2/main --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}

Update-2: under some recent versions of conda, the notebook fails to connect to the kernel and hangs. I’ve found that the solution described here works: downgrade tornado with pip install tornado==4.5.3:

kjupyter() {
    docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "pip install tornado==4.5.3; export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter nbextension enable toc2/main --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}

Written by Yang Zhang
