Command
If you're on Linux:
# create directory to mount into
mkdir -p ~/docker-mounts/pyspark-notebook
# start pyspark notebook container, and sync it to the mount directory
docker run -it --rm -p 8888:8888 -v ${HOME}/docker-mounts/pyspark-notebook:/home/jovyan --user root -e CHOWN_HOME=yes -e CHOWN_HOME_OPTS='-R' jupyter/pyspark-notebook start.sh jupyter notebook --NotebookApp.token=''
If you're on Windows, you'll have to change the mount directory path (~/docker-mounts/pyspark-notebook) to a path that Windows recognizes.
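If you start the container often, the two Linux commands above can be wrapped in a shell function so the mount directory can be changed without editing the long docker run line. This is a sketch, not part of the image: pyspark_notebook is a hypothetical name. It assumes bash, and it converts the directory to an absolute path before passing it to -v, to be safe with how docker run interprets bind mount sources.

```shell
# Hypothetical helper (not part of the image): start the pyspark notebook
# container with a configurable host directory mounted at /home/jovyan.
pyspark_notebook() {
  # Default to the same directory as the command above.
  local mount_dir="${1:-$HOME/docker-mounts/pyspark-notebook}"

  # Create the host directory if it does not exist yet.
  mkdir -p "$mount_dir" || return 1

  # Resolve to an absolute path before handing it to -v.
  mount_dir="$(cd -- "$mount_dir" >/dev/null && pwd)" || return 1

  # Same flags as the original command; only the mount path is a variable.
  docker run -it --rm -p 8888:8888 \
    -v "${mount_dir}:/home/jovyan" \
    --user root -e CHOWN_HOME=yes -e CHOWN_HOME_OPTS='-R' \
    jupyter/pyspark-notebook \
    start.sh jupyter notebook --NotebookApp.token=''
}
```

Call it as pyspark_notebook to use the default directory, or pass another directory as the first argument to mount that one instead.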
The first run will pull a Docker image that takes up 4.91GB of disk space.
Accessing the notebooks
You can open the notebook server on:
- http://127.0.0.1:8888/tree - Jupyter Notebook server
- http://127.0.0.1:8888/lab - JupyterLab server. Easier for uploading files, etc.
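The server is not reachable until the container has finished starting. If you script around it, a small shell function can poll the server before you open those URLs. This is a sketch under assumptions: wait_for_notebook is a hypothetical name, it assumes bash and curl are available, and it polls the server's /api endpoint (which reports the Jupyter version as JSON). Since the token is disabled, no credentials are sent. Run it from a second terminal, because the docker run command keeps the first one busy.

```shell
# Hypothetical helper: poll the Jupyter server until it responds, or give up.
wait_for_notebook() {
  local url="${1:-http://127.0.0.1:8888/api}"  # /api returns the server version as JSON
  local attempts="${2:-30}"                     # roughly one attempt per second
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failures, -s: no progress output
    if curl -fs "$url" >/dev/null; then
      # Strip the /api suffix so the printed URL is the server root.
      echo "Jupyter is up at ${url%/api}"
      return 0
    fi
    sleep 1
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

Once it prints the server root, the /tree and /lab URLs above should load.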
Notes:
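- Everything in the container's home directory (/home/jovyan) is saved to the mount directory on disk, so it survives after the container is removed.
- I've disabled authentication with the --NotebookApp.token='' flag.
- It runs Spark in local mode (everything inside the one container, no cluster). Good enough for doing small experiments.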
That's it. Enjoy.