Additional Material

In various chapters of the book we refer to demos and lectures that can be found online. Much of this material is in a GitHub site we keep for the book. Below are links to a few of these items.

  1. Chapter 6 (Containers) references a simple Dockerfile for building a trivial web site. The source code for that demo is linked here.
  2. The Kinesis Array of Things example in the book requires a data file and a program that sends the events to Kinesis. This is illustrated in Notebook 18. The folder with more code and the data is Kinesis-spark-Aot.
  3. Chapter 7 (Scaling deployments) contains an MPI C program that sends a token down a line. You can find this program and a related ring version in the GitHub site at aws-hpc-cluster, along with a simple C program that shows how to invoke a Python program.
  4. The small microservice document classifier demo, which uses the EC2 Container Service, is also in the GitHub repo in the directory aws-ml-container. This goes along with Notebooks 10 and 11.
  5. Chapter 7 (Scaling deployments) presents a Kubernetes example that uses a Celery program. There is no notebook for this example, but the basic components of the solution are on GitHub in gcloud-container.

Tutorial Files

Here is a zipped, downloadable tarball containing the lecture slides and exercises from the tutorial given at SC17, along with many of the Jupyter notebooks:

The agenda for the tutorial is shown below. Microsoft provided free accounts on Azure for the students to use.

Part 1. The Cloud and Interactive Scientific Discovery

An introduction to basic cloud access and operation.

    • Azure account setup and introduction to Jupyter
    • Storage Systems: blob stores, including S3, Azure Blob Storage, and OpenStack Swift; SQL and NoSQL storage, including Google Bigtable, AWS DynamoDB, AWS RDS, and Azure Tables
    • Hands-on Lab: Blob and Table Storage using the Azure Portal and Jupyter.
  • 12:00 Lunch

Part 2. Scaling Science in the Cloud

This section focuses on higher level services in the cloud.

  • Virtual Machines and Containers
    • Compute Infrastructure: Virtual Machines, how to launch them, and how to attach storage. Demos from AWS and Jetstream.
    • Containers: Docker Demo.
  • Parallelism in the cloud (discussion and demo)
    • Map Reduce
    • Spark and Hadoop
    • Kubernetes, Mesos, and container services.
    • Microservice concepts and demo
  • Data Analytics
    • Hands-on Lab: YARN on Azure with Spark.
  • Hour.  Machine learning and event stream analysis.
    • Survey discussion
    • Demos from the Microsoft Research tutorial on AI tools for Azure (not included here)


Tutorial Container

We have put together a container based on jupyter/all-spark-notebook with additional installed SDKs for Azure and AWS. The notebook is accurate as of 3/28/2017. However, the server uses a self-signed certificate, so you will need to accept the browser's security exception to run it. To invoke it with Docker do:

docker run -it -p 8888:8888 dbgannon/tutorial

and go to https://ip-of-host:8888 and log in with the password “tutorial”.
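
If you would rather check from a script that the container is answering before opening a browser, something like the following Python sketch may help. It is not part of the tutorial materials. The function names are our own, and it assumes the server is published on port 8888 as in the docker command above. Because the certificate is self-signed, the sketch turns off certificate verification, which is only reasonable for a local test like this one.

```python
import ssl
import urllib.error
import urllib.request


def notebook_url(host: str, port: int = 8888) -> str:
    """Build the HTTPS URL of the notebook server (hypothetical helper)."""
    return f"https://{host}:{port}"


def insecure_context() -> ssl.SSLContext:
    """Return an SSL context that skips certificate verification.

    Assumption: this is used only against the tutorial container's
    self-signed certificate, never against a production service.
    """
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def server_is_up(host: str, port: int = 8888, timeout: float = 5.0) -> bool:
    """Return True if the server answers an HTTPS request at all."""
    try:
        with urllib.request.urlopen(
            notebook_url(host, port),
            context=insecure_context(),
            timeout=timeout,
        ):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, and similar.
        return False
```

Calling server_is_up("ip-of-host") with your host's address in place of ip-of-host should return True once the container has started.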
