Saturday, April 07, 2018

AWS ML Week and adventures with SageMaker


I attended AWS ML Week in San Francisco a couple of weeks ago. It was held over 2 days and consisted of presentations and workshops, presented and run by Amazon Web Services (AWS) architects. The event was meant to showcase the ML capabilities of AWS, and was targeted at Data Scientists and Engineers, as well as innovators who want to include Machine Learning (ML) capabilities in their applications.

The AWS ML stack at the time of writing is as shown below. This image comes from one of the presentation slides. The top layer (Application Services) is a set of canned ML models exposed through an API and is aimed at people who want to exploit ML capabilities in their applications without having to go through the hassle of building it themselves. The middle layer (Platform Services) is aimed at the Data Scientist / Engineer types who are training and consuming their own ML models. The bottom layer (Frameworks and Interfaces) is the infrastructure layer, based upon the Amazon Deep Learning AMIs that were released some time back.


The first day of talks covered the Application Services (top) layer, and the second day covered the Platform Services (middle) layer. The Frameworks and Interfaces (bottom) layer was not covered at all, but those of us who've trained Deep Learning (DL) models on AWS have probably used the Amazon Deep Learning AMIs and know enough about them already.

My reasons for attending the event were twofold. First, some colleagues were talking about the cool canned ML algorithms that AWS was coming out with, and I thought attending this kind of event would be a way to quickly learn about them all at a high level. Second, a colleague and I had evaluated SageMaker earlier for our own use, and I had concluded that while it was a good managed notebook environment for development, I wasn't too impressed with its stated goal as a unified platform for distributed model training and deployment. I was hoping that I would learn something new here that would change my mind.

In this post, I will focus on these two aspects in depth.

Application Services


Below is a list of the Application Services with a brief description of each. All of these can be consumed through an API, and provide very general services such as emotion detection in faces, keyword extraction from text, etc. However, it is often possible to compose these undifferentiated services into unique functionality. Applications can then consume the AWS Application Services rather than build that functionality themselves, saving some time and wheel reinvention effort in the process.

  • Rekognition - a group of Computer Vision (CV) services. There is a Rekognition for Images and a Rekognition for Video. Rekognition for Images provides functionality for Object and Scene Detection, Facial Analysis (sentiment, gender, facial features), Face Recognition, Unsafe (NSFW) detection, Celebrity recognition, Text in Images and Face Similarity comparison. Rekognition for Video has all the services of Rekognition for Images plus Person tracking.
  • Transcribe - speech to text conversion. This was in preview at the time but has since become generally available. Unlike the other services, the API is asynchronous only (see the sketch after this list).
  • Translate - language translation. It supported only English and Spanish at the time, but more languages are being added, so this may have changed as well. Like Transcribe, it was in preview at the time but has since become generally available.
  • Comprehend - a set of language services that work on text to detect sentiment, entities, language and key phrases. It also has functionality to build topic models out of a corpus of text.
  • Polly - text to speech conversion. Multiple voices and accents available for customization.
  • Lex - conversational interfaces for text and voice based applications; it is the API underlying the Amazon Echo and Alexa family of devices.
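
As a concrete illustration of the asynchronous-only point above, here is a minimal boto3 sketch of the Transcribe flow: you start a job and then poll for its completion, rather than getting the transcript back in the response. The S3 URI and job name are placeholders of my own.

import time
import boto3

# start an asynchronous transcription job (the S3 URI and job name are placeholders)
transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="my-test-job",
    Media={"MediaFileUri": "s3://my-bucket/audio/sample.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US")

# poll until the job completes or fails
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName="my-test-job")
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(30)

# on success, the transcript is available at a URI in the job description
print(status, job["TranscriptionJob"].get("Transcript", {}).get("TranscriptFileUri"))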

Some examples of applications that could be composed using these components are listed below. Some of these are covered in more depth in the presentation slides (see list at bottom).

  • Using Comprehend to detect the language of non-English tweets, Translate to translate them into English, and Comprehend again to extract key phrases from them (a minimal sketch of this pipeline follows the list).
  • Using Comprehend to generate sentiment on incoming customer service requests.
  • Extracting entities and keyphrases from a text corpus to generate knowledge graphs.
  • Captioning videos in multiple languages simultaneously using Translate.
  • Pollexy project (video) - an application to remind an autistic child to do specific things at different times of the day.
  • Finding missing persons by comparing images on social media against a reference image - this was our workshop example from day 1.
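
Here is a minimal sketch of the first pipeline above using boto3; the tweet text is a placeholder, and I am assuming English as the target language.

import boto3

comprehend = boto3.client("comprehend")
translate = boto3.client("translate")

tweet = "..."  # placeholder for an incoming tweet

# detect the dominant language of the tweet
langs = comprehend.detect_dominant_language(Text=tweet)["Languages"]
source_lang = langs[0]["LanguageCode"]

# translate to English if necessary
if source_lang != "en":
    tweet = translate.translate_text(Text=tweet,
                                     SourceLanguageCode=source_lang,
                                     TargetLanguageCode="en")["TranslatedText"]

# extract key phrases from the (now English) text
phrases = comprehend.detect_key_phrases(Text=tweet, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])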

SageMaker


SageMaker bills itself as a fully-managed platform to build, train and deploy ML models at scale. As a user, I see two main use cases - a managed notebook platform for development, and a unified platform for training and deploying ML models.

For the first use case, as long as you are using Keras, Tensorflow (TF) or MXNet with either Python2 or Python3, you could simply choose the appropriate notebook type and use it. You could also install other frameworks such as Pytorch using pip on the notebook's virtual terminal and use that instead. It's not very different from running Jupyter notebooks on your Deep Learning AMI and possibly a little less flexible, but it can be convenient for enterprise customers with their own Virtual Private Clouds (VPCs) since the SageMaker notebook is available within your Amazon console without having to do complex network finagling. Strangely enough, Amazon does not emphasize this use case at all.

The other use case is as a unified platform for large scale (possibly distributed) model training and model deployment. In this mode, SageMaker acts as a wrapper that calls into user provided functionality at different points in its life cycle. This allows you to run the SageMaker notebook on a relatively low end EC2 instance, since SageMaker spins up a separate high performance EC2 instance (possibly even a GPU instance if needed) just for the duration of the training. Similarly, the trained model is deployed to yet another EC2 instance. In Java object oriented terms, SageMaker does this by exposing an Estimator interface that various ML models must implement. In this mode, SageMaker supports a wide variety of ML algorithms (Deep Learning and traditional), as listed below.

  • Built-in ML algorithms - the following algorithms are provided as part of SageMaker: Linear Learner, Factorization Machines, XGBoost, Image Classification, Sequence2Sequence, KMeans, Principal Components Analysis (PCA), Latent Dirichlet Allocation (LDA), Neural Topic Models (NTM), DeepAR Forecasting (Time Series) and BlazingText (a word2vec implementation). The built-in algorithms are all exposed via a common Estimator interface that uses Docker registry paths to identify a specific algorithm (a minimal sketch follows the list).
  • MXNet and TF Estimators - these SageMaker Estimators allow wrapping of the user's MXNet and TF models (as well as Keras models built using the Keras API embedded in TF, also known as tf.keras). The user provides implementations of certain functions, and SageMaker calls them at different points in its lifecycle. Since TF comes with its own Estimators, which are pre-built DL and ML networks, this opens up even more possibilities. So overall, this allows wrapping of plain TF models, tf.keras models, and pre-built TF Estimators.
  • Bring Your Own Model (BYOM) Estimators - you set up a Docker container in a specific way to expose training and serving functionality via scripts, and the SageMaker Estimator uses these scripts to train and deploy the model. This is the same Estimator that exposes SageMaker's built-in ML functionality.
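
To make the Docker registry path idea concrete, here is a minimal sketch (my own, not from the workshops) of training one of the built-in algorithms, KMeans, through the generic Estimator. The image URI lookup, role, S3 paths and hyperparameters are assumptions on my part, and the exact calls may differ slightly across SDK versions.

import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

session = sagemaker.Session()
role = "arn:aws:iam:..."   # your SageMaker execution role

# built-in algorithms are identified by a Docker registry path (image URI)
image = get_image_uri(session.boto_region_name, "kmeans")

# the training data at the S3 input path must already be in the format the
# algorithm expects (e.g. recordIO-protobuf or CSV)
kmeans = sagemaker.estimator.Estimator(image,
                                       role,
                                       train_instance_count=1,
                                       train_instance_type="ml.c4.xlarge",
                                       output_path="s3://my-bucket/kmeans-output",
                                       sagemaker_session=session)
kmeans.set_hyperparameters(k=10, feature_dim=2048)
kmeans.fit({"train": "s3://my-bucket/kmeans-input"})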

Examples of each of these use cases can be found in the awslabs/amazon-sagemaker-examples repository. We did run through some of these in one of the workshops, but I decided to expose one of my own recent Keras models through SageMaker to figure out the steps involved.

The documentation about wrapping TF/Keras models on the aws/sagemaker-python-sdk repository says that the training script must contain the following function overrides.

  • Exactly one of model_fn, keras_model_fn or estimator_fn - defines the model to be trained; the three options correspond to a plain TF model, a tf.keras model, and a TF Estimator respectively
  • train_input_fn - to preprocess and load training data
  • eval_input_fn - to preprocess and load evaluation data
  • serving_input_fn - required for deploying endpoint through SageMaker

My model takes as input dense vectors of size (2048,) and predicts one of two classes. Since the model was built using tf.keras, I started by defining a keras_model_fn and the other function overrides as follows:

import boto3
import numpy as np
import os
import tensorflow as tf
from tensorflow.python.estimator.export.export import build_raw_serving_input_receiver_fn

INPUT_TENSOR_NAME = 'inputs_input' # needs to match the name of the first layer + "_input"
hyperparams = {
    "learning_rate": 1e-3
}


def keras_model_fn(hyperparams):
    # build model
    model = tf.keras.models.Sequential()
    
    model.add(tf.keras.layers.Dense(2048, input_shape=(2048,), name="inputs"))
    model.add(tf.keras.layers.Activation("relu"))
    model.add(tf.keras.layers.Dropout(0.5))
    
    model.add(tf.keras.layers.Dense(512))
    model.add(tf.keras.layers.Activation("relu"))
    model.add(tf.keras.layers.Dropout(0.5))

    model.add(tf.keras.layers.Dense(128))
    model.add(tf.keras.layers.Activation("relu"))
    model.add(tf.keras.layers.Dropout(0.5))

    model.add(tf.keras.layers.Dense(2))
    model.add(tf.keras.layers.Activation("softmax", name="output"))

    # compile model
    optim = tf.keras.optimizers.Adam(lr=hyperparams["learning_rate"])
    model.compile(optimizer=optim, loss="categorical_crossentropy", 
                  metrics=["accuracy"])
    return model


def train_input_fn(training_dir, hyperparams):
    return _train_eval_input_fn(training_dir, "train_file.csv")


def eval_input_fn(training_dir, hyperparams):
    return _train_eval_input_fn(training_dir, "eval_file.csv")


def _train_eval_input_fn(training_dir, training_file):
    xs, ys = [], []
    ftest = open(os.path.join(training_dir, training_file), "r")
    for line in ftest:
        _, label, vec_str = line.strip().split("\t")
        xs.append(np.array([float(e) for e in vec_str.split(",")]))
        ys.append(int(label))
    ftest.close()
    X = np.array(xs, dtype=np.float32)
    Y = tf.keras.utils.to_categorical(
        np.array(ys), num_classes=2).astype(np.int)
    return tf.estimator.inputs.numpy_input_fn(
        x={INPUT_TENSOR_NAME: X},
        y=Y,
        num_epochs=None,
        shuffle=True)()


def serving_input_fn(hyperparams):
    tensor = tf.placeholder(tf.float32, [1, 2048])
    return build_raw_serving_input_receiver_fn({INPUT_TENSOR_NAME: tensor})()

Later, however, I found that the deployed model was not able to parse request vectors serialized by TF's make_tensor_proto mechanism, so I had to switch to treating it as a TF model instead. This just meant that I had to replace the keras_model_fn with a model_fn function. The other change was that my INPUT_TENSOR_NAME was no longer under the control of the Keras API, so I could rename it to the more readable "inputs". In addition, since I now have to explicitly provide EstimatorSpec objects for each of my operation modes (train, eval, predict), I need an additional import (PredictOutput) and an additional prediction signature key given by SIGNATURE_NAME. Also notice how the model definition has changed from the Sequential model to a functional form, where the input to the network comes from the features parameter. The other functions remain unchanged.

from tensorflow.python.estimator.export.export_output import PredictOutput

INPUT_TENSOR_NAME = "inputs"
SIGNATURE_NAME = "serving_default"

def model_fn(features, labels, mode, hyperparams):
    # build model (notice no input layer, fed from features parameter)
    hidden_1 = tf.keras.layers.Dense(2048, activation="relu")(features[INPUT_TENSOR_NAME])
    hidden_1 = tf.keras.layers.Dropout(0.5)(hidden_1)
    hidden_2 = tf.keras.layers.Dense(512, activation="relu")(hidden_1)
    hidden_2 = tf.keras.layers.Dropout(0.5)(hidden_2)
    hidden_3 = tf.keras.layers.Dense(128, activation="relu")(hidden_2)
    hidden_3 = tf.keras.layers.Dropout(0.5)(hidden_3)
    predictions = tf.keras.layers.Dense(2, activation="softmax", name="output")(hidden_3)
    
    # estimator for predictions
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            predictions={"output": predictions},
            export_outputs={SIGNATURE_NAME: PredictOutput({"output": predictions})})

    # define loss function (using TF)
    loss = tf.losses.softmax_cross_entropy(labels, predictions)
    
    # define training op (using TF)
    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.train.get_global_step(),
        learning_rate=hyperparams["learning_rate"],
        optimizer="Adam")
    
    # generate predictions as TF tensors
    predictions_dict = {"output": predictions}
    
    # generate eval_metric ops
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            tf.cast(labels, tf.float32), predictions)
    }
    
    # estimator for train and eval
    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops)
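
Before handing these functions over to SageMaker, I found it helpful to sanity check them locally by wrapping model_fn in a plain TF Estimator. Here is a minimal sketch of that kind of check (my own code, not something SageMaker requires; the local data directory is a placeholder).

import tensorflow as tf

hyperparams = {"learning_rate": 1e-3}

# wrap the SageMaker-style model_fn in a plain TF Estimator and run a few
# training and evaluation steps against the same input functions that
# SageMaker would call
estimator = tf.estimator.Estimator(
    model_fn=lambda features, labels, mode: model_fn(features, labels, mode, hyperparams),
    model_dir="/tmp/compdetect-local")
estimator.train(input_fn=lambda: train_input_fn("data", hyperparams), steps=10)
estimator.evaluate(input_fn=lambda: eval_input_fn("data", hyperparams), steps=10)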

The above functions are written out into a script file and passed into the SageMaker Estimator. I tested each individual function separately (along the lines of the sketch above) to verify that they work by themselves. I did note that the SageMaker Estimator code (as well as the TF code) is much pickier about data types - it expects a matrix of np.float32 for the features and a vector of np.int for the labels. Here is the code to train and deploy this model using SageMaker.

import json
import numpy as np
import os
import sagemaker
import tensorflow as tf

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

# (1) set up
sagemaker_session = sagemaker.Session()
# role = get_execution_role()
role = "arn:aws:iam:..." # copy-paste the IAM role from your SageMaker console

# (2) upload data to S3
inputs = sagemaker_session.upload_data(path="data/", 
                                       key_prefix="data")

# (3) train model
compdetect_estimator = TensorFlow(entry_point="composite-detector-tf.py",
                                  role=role,
                                  training_steps=100,
                                  evaluation_steps=100,
                                  hyperparameters={"learning_rate": 1e-3},
                                  train_instance_count=1,
                                  train_instance_type="ml.p2.xlarge")
compdetect_estimator.fit(inputs)

# (4) deploy model
compdetect_predictor = compdetect_estimator.deploy(initial_instance_count=1, 
                                                   instance_type='ml.m4.xlarge')

# (5) load data for evaluation and run predictions (load_eval_data is a helper of my own, not shown)
Xeval = load_eval_data()
for i in range(Xeval.shape[0]):
    data = Xeval[i]
    tensor_proto = tf.make_tensor_proto(values=np.asarray(data), 
                                        shape=[1, len(data)], 
                                        dtype=tf.float32)
    pred = compdetect_predictor.predict(tensor_proto)
    Y_ = pred["outputs"]["output"]["floatVal"]
    y = np.argmax(Y_)
    print(i, y)

# (6) delete the endpoint when done
sagemaker.Session().delete_endpoint(compdetect_predictor.endpoint)

Below I explain each of these steps in detail.

  1. The first step is to open a SageMaker session and extract the IAM role from it. There seems to be a bug in get_execution_role(), so I found (and others on the Internet offer the same advice) that just copying the IAM role value from the SageMaker notebook console works just as well.
  2. Lately, I usually have code in my notebooks to copy any data I need (and don't already have locally) down from S3, and to write my models and output datasets back to S3, using the boto3 package (a boto3 sketch of this pattern appears after this list). Here, the upload_data call expects to see the data locally, so I used awscli to copy it down from S3 into a local data subfolder before invoking it. The upload_data call then places the data in a well-known (within the session) S3 bucket that is accessible to the training instance as well.
  3. The next step is to train the model. The entry_point parameter to the TensorFlow Estimator points to the script file with the functions that we set up above. The only hyperparameter we are passing in is the learning rate. The training is done on a single ml.p2.xlarge instance (as indicated by the train_instance_count and train_instance_type parameters). We could have used distributed training by setting train_instance_count to a value larger than 1. Note that tf.keras models exposed through the keras_model_fn cannot be trained in distributed mode. The model trains for 100 steps and is evaluated for 100 steps.
  4. Once trained, the model is deployed to a separate ml.m4.xlarge instance, called an endpoint, with the estimator.deploy call. The endpoint can auto-scale, meaning that SageMaker can automatically spin up additional instances behind the endpoint if usage gets too high.
  5. We can now run predictions against the model by hitting the endpoint. Our endpoint is set up to consume input one record at a time, but we could also set it up to consume fixed size batches if desired. The data has to be serialized using tf.make_tensor_proto and then passed to the predictor.predict call. This was the part that was failing for my Estimator when using the keras_model_fn function; I suspect it has to do with a mismatch between the way TF serializes the data and the way Keras expects it. A sample output from the endpoint is shown below.

  6. Finally, if we no longer need the endpoint, we can just destroy it using the delete_endpoint call. We can also delete it from the console.
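
Going back to step 2 for a moment, here is the sort of boto3 snippet I usually use to pull data down from S3 into a local data/ folder before calling upload_data; the bucket and key names are placeholders, and awscli works just as well, as noted above.

import os
import boto3

# pull the training and evaluation files from S3 into the local data/ folder
# (the bucket and key names below are placeholders)
s3 = boto3.resource("s3")
bucket = s3.Bucket("my-bucket")
if not os.path.isdir("data"):
    os.makedirs("data")
for key in ["data/train_file.csv", "data/eval_file.csv"]:
    bucket.download_file(key, os.path.join("data", os.path.basename(key)))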

So that's what it took to wrap my model into a SageMaker Estimator and train and deploy it. It's not a lot of code, but documentation and Stack Overflow style support are still scarce, so the going is not very smooth. However, having trained and deployed one network through SageMaker, I feel more confident about being able to do the same with others. So while it is still a bit of a pain to work with, it does provide a lot of benefits, and I have reconsidered my original skepticism towards it.

Other stuff: IoT and DeepLens


The last two talks focused on ML on IoT devices using MXNet. Models can either live on board the IoT device, or be accessed from the cloud through a SageMaker endpoint or AWS Greengrass. In line with the focus on IoT, the AWS DeepLens is a camera device that can host ML models, either canned ones from the AWS ML Application Services layer or ones you build yourself. Similar to the Echo/Alexa family of devices, I think DeepLens is meant to catalyze development of novel ML and CV applications for the consumer market. It is expected to be available in June 2018 and can be pre-ordered on Amazon.

Links to Slides


Links to the presentation slides were provided after the event and are all publicly available on Slideshare. It would be awesome (wink wink nudge nudge, AWS guys) if these links were also added to the original event page for AWS ML Week.


So that's all I had for the AWS ML Week. I think I ended up getting what I went there for. First, I now have a good idea of the different services available in the Application Services layer. Second, I have a much better understanding of, and appreciation for, SageMaker. I hope you found my writeup useful as well.
