We provide accompanying Jupyter notebooks to illustrate the use of various technologies described in this book. These notebooks explain selected techniques and approaches and provide thorough implementation details so that you can quickly start using the technologies covered within. They combine explanation, basic exercises, and substantial additional Python code to provide a conceptual understanding of each technology, give insight into how key parts of the process are implemented through exercises, and then lay out an end-to-end pattern for implementing each in your own work. The notebooks are interactive documents that mix formatted text and Python code samples that can be edited and run in real-time in a Jupyter notebook server, allowing you to run and explore the code for each technology as you read about it.
The notebooks and related files are accessible below. They are freely available for anyone to download at any time and can be run on any appropriately configured computer. In most cases, additional packages need to be installed; each notebook gives instructions for adding the appropriate tools.
You need Python and IPython on your system, along with a local Jupyter server to run the notebooks. You also need the additional Python packages that the notebooks use, plus a few additional programs. The easiest way to get all of this working is to install the free Anaconda Python distribution provided by Continuum Analytics (https://continuum.io/downloads). Anaconda includes a Jupyter server and precompiled versions of many packages used in the notebooks, as well as multiple tools for installing and updating both Python and installed packages. It is separate from any OS-level version of Python and is easy to uninstall completely. It works well on Windows, Mac, and Linux.
Alternatively, you can create your Python environment manually, installing Python, package managers, and Python packages separately. However, packages like numpy and pandas can be difficult to get working, particularly on Windows, and Anaconda simplifies this setup considerably regardless of your OS.
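Whichever route you choose, it can help to confirm that the packages a notebook needs are importable before you start the Jupyter server. The following is a minimal sketch using only the standard library. The package list `REQUIRED` and the helper name `missing_packages` are assumptions made for illustration; each notebook names its own requirements.

```python
import importlib.util
import sys

# Hypothetical package list for illustration only;
# consult each notebook for the packages it actually needs.
REQUIRED = ["numpy", "pandas", "notebook"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

If the script reports missing packages, install them with your environment's package manager (for example, the tools bundled with Anaconda) and run it again.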
We provide for each notebook a web link to an HTML rendering, a brief description, and a link to download the .ipynb file.
- Jupyter introduction. This first look at Jupyter illustrates some basic properties that we use extensively, including the mixing of text and LaTeX math with Python code and the use of in-line graphics. Download notebook file.
- Using Amazon storage services. This notebook and the three that follow show how to manage cloud data storage from a Python script. The scenario assumes you have a CSV file describing some experimental data and the collection of data it describes. The task is to create a table in the cloud, upload the data for each experiment to blob storage, and then add a row to the table containing the metadata for that experiment and a URL for the associated data. Here we use an AWS DynamoDB table and the S3 storage service for the blobs. Download notebook file.
- Using Azure storage services repeats the same exercise using the Azure table and blob services. Download notebook file.
- Google Cloud Storage-1. Here we do not complete the entire exercise, but we illustrate how to use Google Bigtable to create a table. Download notebook file.
- Google Cloud Storage-2. Here we complete the exercise described above, using Google Datastore and blob storage to accomplish the task. Download notebook file.
- Using OpenStack storage services illustrates the use of the CloudBridge Python package to manage basic storage operations on the OpenStack layer of the Jetstream cloud. Download notebook file.
- AWS EC2 shows how to create and manage virtual machines on AWS using the Boto3 Python library.
- Using Globus Transfer and Sharing illustrates how Python can be used to manage file transfers and data sharing with Globus. Download notebook file.
- Using Amazon’s EC2 Container Service shows how to interact with the AWS EC2 Container Service and launch new versions of containers. Download notebook file. Notebook 10 feeds data to the task queue; the data files and the microservice Docker files are in aws-ml-container.
- https://sciengcloud.github.io/ecs-driver Notebook 11 is the client program that feeds data into the queue to be consumed by the microservices. Download notebook file.
- https://sciengcloud.github.io/spark-euler Notebook 12 is a simple illustration of Spark that demonstrates a trivial map-reduce computation. Download notebook file.
- https://sciengcloud.github.io/spark Notebook 13 provides a second demonstration of Spark, this time for a simple k-means clustering algorithm. Download notebook file.
- https://sciengcloud.github.io/sql-magic As described in section ???, SQL commands may be executed in Spark. This notebook illustrates the use of a special set of commands that allow us to embed SQL directly in an IPython notebook. Download notebook file.
- https://sciengcloud.github.io/aws-emr This notebook provides a small tutorial on how to deploy Jupyter on a Spark cluster running on AWS Elastic MapReduce. The notebook illustrates this with a small example of exploring Wikipedia data. Download notebook file.
- https://sciengcloud.github.io/datalab1 This and the following notebook illustrate Google’s Datalab. This notebook is an exploration of contagious disease records from the U.S. CDC, specifically looking at rubella cases over a period of time. Download notebook file.
- https://sciengcloud.github.io/datalab2 The second Datalab notebook examines weather station data and spots an anomaly in one station’s reporting. Download notebook file.
- https://sciengcloud.github.io/kinesis This notebook uses AWS Kinesis together with Spark to detect anomalies in data from the Chicago Array of Things instrument streams. Download notebook file. The data file is in Kinesis-spark-Aot.
- https://sciengcloud.github.io/sparkml Azure’s HDInsight plus Spark are used here to look at food inspection records. Download notebook file.
- https://sciengcloud.github.io/azuremlc In the machine learning chapter the AzureML tool is used to build a simple document classifier as a web service. This notebook is a client that can be used to push data to the web service. Download notebook file.
- https://sciengcloud.github.io/rnn-lstm CNTK was used to train a recurrent neural network with text from business news items. The trained model was saved, and this notebook shows how to reconstitute the network, load the model, and run it. Download notebook file.
- https://sciengcloud.github.io/mxnet MXNet was used to train the ResNet image recognition model. This notebook is based on the MXNet example for loading and running the trained network to identify images from the web. Download notebook file.
- https://sciengcloud.github.io/cntk is coming later.
- https://sciengcloud.github.io/tensorflow This notebook uses TensorFlow to build a very simple logistic regression analyzer that can make predictions of graduate school admissions. Download notebook file.
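
The storage scenario shared by the Amazon, Azure, and Google storage notebooks above (create a table, upload each experiment's data to blob storage, then add a table row holding the metadata and a URL for the data) can be tried out before you have a cloud account. The sketch below is not taken from the notebooks: `InMemoryBlobStore`, `InMemoryTable`, and `register_experiments` are hypothetical stand-ins written for illustration, and the CSV columns, sample values, and base URL are made up. The notebooks use the real cloud SDKs in place of these classes.

```python
import csv
import io


class InMemoryBlobStore:
    """Hypothetical stand-in for a blob service; maps blob names to bytes."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.blobs = {}

    def upload(self, name, data):
        """Store the bytes under the given name and return a URL for them."""
        self.blobs[name] = data
        return f"{self.base_url}/{name}"


class InMemoryTable:
    """Hypothetical stand-in for a cloud table; maps a row key to attributes."""

    def __init__(self):
        self.rows = {}

    def put_item(self, key, item):
        self.rows[key] = dict(item)


def register_experiments(csv_text, data_files, blobs, table):
    """For each CSV row, upload its data, then store the metadata plus URL."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["filename"]
        url = blobs.upload(name, data_files[name])
        table.put_item(row["experiment_id"], {**row, "url": url})


# Made-up sample metadata and data, for illustration only.
csv_text = (
    "experiment_id,filename,date,comment\n"
    "exp1,exp1.dat,2017-01-05,baseline\n"
    "exp2,exp2.dat,2017-01-06,higher temperature\n"
)
data_files = {"exp1.dat": b"1.0 2.0 3.0", "exp2.dat": b"4.0 5.0 6.0"}

blobs = InMemoryBlobStore("https://example.invalid/experiments")
table = InMemoryTable()
register_experiments(csv_text, data_files, blobs, table)

print(table.rows["exp1"]["url"])  # https://example.invalid/experiments/exp1.dat
```

Keeping the blob store and the table behind two small interfaces mirrors how the notebooks repeat one exercise across providers: only the two storage objects change, while the loop over the CSV rows stays the same.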