Set up docker for Kaggle
Here are some notes on setting up docker for Kaggle (especially on installing and enabling nbextensions). I had to do this from time to time and wanted to write the steps down for the record. I put it here in case it’s useful for someone else.
Why use docker for Kaggle
Kaggle is a good place to learn machine learning and data science. I think its docker image is a good option for a data science development environment in two scenarios:
- if I want to run local experiments for Kaggle’s kernel-only competitions, the image is exactly the same environment as Kaggle kernels, and it can be kept up to date by simply rebuilding the latest image.
- for everything else, the list of packages included in the image is built by the Kaggle community and therefore includes most of the useful tools, whether I know about them or not; this is much easier than building my own list of packages.
There are two images, CPU-only and GPU. The first is sufficient if you don’t have or need a GPU.
First, install docker if it is not already installed.
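A quick way to check whether docker is already there is to look for it on the PATH. This is only a minimal sketch: the helper name have_docker is mine, and it only checks that a docker command exists, not that the docker daemon is running.

```shell
# Hypothetical helper: returns 0 if a `docker` command is available,
# non-zero otherwise. It does not check that the daemon is running.
have_docker() {
    command -v docker >/dev/null 2>&1
}

if have_docker; then
    echo "docker found"
else
    echo "docker not found; install it first"
fi
```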
CPU-only image
This is easier than the GPU image, partly because a prebuilt image is already published in a public registry, so I can simply download it by typing the following in the terminal:
docker pull kaggle/python
Put the following in .bash_profile
and run source .bash_profile
kjupyter() {
docker run -v $PWD:/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python bash -c "pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}
This command basically sets up the directory mount and port forwarding, and runs jupyter notebook inside the container. (There are more details about the docker run command in this blog post from Kaggle.) Now I can run kjupyter
from the terminal and go to http://localhost:8888/
for jupyter notebook.
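To make the directory mount and port forwarding more concrete, here is a sketch of the same command split across lines, with a comment for each flag. The function name kjupyter_on_port and the optional port argument are my own additions for illustration (handy when port 8888 is already taken on the host); everything else is copied from the definition above.

```shell
# Hypothetical variant of kjupyter: the first argument is the host port
# (defaults to 8888). The container side stays on 8888.
kjupyter_on_port() {
    # -v   mounts the current directory at /tmp/working inside the container
    # -w   makes /tmp/working the working directory
    # -p   maps <host port>:<container port>
    # --rm removes the container on exit; -it attaches an interactive terminal
    docker run \
        -v "$PWD":/tmp/working \
        -w=/tmp/working \
        -p "${1:-8888}":8888 \
        --rm -it kaggle/python \
        bash -c "pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}
```

With this, running kjupyter_on_port 9999 should serve the notebook at http://localhost:9999/ instead.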
About nbextensions
One issue I had with Kaggle’s docker image is that it does not come with nbextensions out of the box, and these extensions (e.g., table of contents) are very useful in the notebook environment. These lines in the definition of kjupyter
above solved that:
pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user;
GPU image
Unlike the CPU image, the GPU image is not available prebuilt in the repo, so you can’t run docker pull
like before. Instead, follow the instructions here and clone the git repository of Kaggle docker by running
git clone https://github.com/Kaggle/docker-python.git
Then, under the project root docker-python/, run
./build --gpu
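The build takes a while, so before going further it can be worth checking that the image actually exists locally. A minimal sketch, assuming the build produces an image named kaggle/python-gpu-build (the name used in the docker run command in this post); the helper name have_image is mine.

```shell
# Hypothetical helper: returns 0 if the named image exists locally,
# non-zero otherwise. `docker image inspect` fails for unknown images.
have_image() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Example use (not run here):
#   have_image kaggle/python-gpu-build && echo "GPU image is ready"
```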
Like before, put the following in .bash_profile
and run source .bash_profile
kjupyter() {
docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}
This line in the above definition of kjupyter
works around a problem with using the GPU. (This link has some background.)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
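Note that the export above overwrites LD_LIBRARY_PATH. I don’t know whether the image sets that variable to anything beforehand, so this is only a precaution: a sketch of prepending the CUDA directory instead of replacing the whole value. The function name prepend_ld_library_path is made up for this example.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH,
# keeping whatever was already set. The ${VAR:+...} expansion adds the
# ":old value" suffix only when the variable is set and non-empty.
prepend_ld_library_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

Inside the bash -c string, the one-line equivalent would need the dollar signs escaped (\$) so that they are expanded in the container rather than on the host.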
Now you can run kjupyter
from the terminal and go to http://localhost:8888/
for jupyter notebook. Nbextensions are taken care of in the same way as in the CPU version.
Update-1: an example where I enable an extension by default; notice the additional jupyter nbextension enable toc2/main --user
in the definition below:
kjupyter() {
docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter nbextension enable toc2/main --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}
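If I want more than one extension enabled by default, the enable step can be generated from a list instead of being typed out each time. This is just a sketch: nbext_enable_cmds is a made-up helper, and codefolding/main in the usage note is only an example of a second extension name.

```shell
# Hypothetical helper: print one "jupyter nbextension enable <name> --user; "
# command per argument, ready to be spliced into the bash -c string.
nbext_enable_cmds() {
    for ext in "$@"; do
        printf 'jupyter nbextension enable %s --user; ' "$ext"
    done
}
```

For example, $(nbext_enable_cmds toc2/main codefolding/main) can be placed inside the double-quoted bash -c string, right after the jupyter contrib nbextension install --user; step.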
Update-2: under some recent versions of conda, the notebook fails to connect to the kernel and hangs. I’ve found that the solution described here works: downgrade tornado with pip install tornado==4.5.3
at the start of the command, as in the definition below:
kjupyter() {
docker run --runtime=nvidia -v $PWD:/tmp/working -v /data:/tmp/working/data -w=/tmp/working -p 8888:8888 --rm -it kaggle/python-gpu-build bash -c "pip install tornado==4.5.3; export LD_LIBRARY_PATH=/usr/local/cuda/lib64; pip install jupyter_contrib_nbextensions; pip install jupyter_nbextensions_configurator; jupyter contrib nbextension install --user; jupyter nbextension enable toc2/main --user; jupyter notebook --notebook-dir=/tmp/working --ip='*' --port=8888 --no-browser --allow-root"
}