Making and Deploying an AI Web App in 2023 (Part 8)

Deploy a Serverless AI App with Google Cloud

This is part of a multi-part blog post about how to build an AI Web App. Please refer to Part 1 for more context.

This post uses Google’s Cloud Run. Alternatives include AWS Lambda and Azure Functions.

It’s finally time to deploy our web app so that everyone can reach it. We’ll deploy our app in a serverless fashion. This means that containers are started on demand when requests come in and are shut down again once the app is idle, so they run only for as long as they are needed. By doing it this way, we avoid wasting money on compute when no one is making requests to our server. Furthermore, scaling happens automatically: if our web app suddenly gets very popular, our cloud provider starts as many containers as needed.
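
To make the scaling behaviour concrete, here is a deliberately simplified sketch of how the number of running containers could follow the load. The function name and the concurrency value of 10 are assumptions made for illustration; a real autoscaler also looks at things like CPU usage and configured instance limits.

```python
import math


def instances_needed(concurrent_requests: int, concurrency_per_instance: int) -> int:
    """Rough number of container instances needed for a given load.

    Simplified model (an assumption, not the provider's actual algorithm):
    each instance handles up to `concurrency_per_instance` requests at once.
    """
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    if concurrent_requests <= 0:
        return 0  # no traffic: scale to zero, so no compute is billed
    return math.ceil(concurrent_requests / concurrency_per_instance)


# Illustrative loads only; 10 requests per instance is a made-up value.
for load in (0, 1, 50, 500):
    count = instances_needed(load, concurrency_per_instance=10)
    print(f"{load} concurrent requests -> {count} instance(s)")
```

The case worth noticing is the first one: with no traffic, the instance count drops to zero, which is where the cost savings come from.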

In this post we’ll do everything with the gcloud CLI, but everything can also be done through the graphical interface of the Google Cloud Console website.

Setup

The first step is to install the gcloud CLI and run gcloud init to configure your account.

Then we need the following setup steps:

  1. Create a new project

    gcloud projects create my-example-webapp-23867 --name="AI Web App"
    

    Note that the project ID (my-example-webapp-23867) needs to be globally unique, so this exact command won’t work for you. You will need to choose your own project ID.

  2. Configure gcloud to use your project:

    gcloud config set project my-example-webapp-23867
    
  3. Activate the Cloud Run API

    gcloud services enable run.googleapis.com
    
  4. Activate the Artifact Registry API, which we will use to upload our docker images.

    gcloud services enable artifactregistry.googleapis.com
    

    Note that enabling this API requires billing to be enabled in the project. Using the Artifact Registry is optional, though: you could upload the Docker images somewhere else, such as Docker Hub.

    To activate billing for the project, first set up a billing account on the Google Cloud Console. Using the CLI, you can list the existing billing accounts:

    gcloud billing accounts list
    

    You can then link the project to a specific billing account

    gcloud billing projects link my-example-webapp-23867 --billing-account 0X0X0X-0X0X0X-0X0X0X
    

    (don’t forget to replace the billing account ID with your own)

  5. Create a new artifacts repository

    gcloud artifacts repositories create ai-web-app-artifacts --repository-format=docker --location=us-central1 --description="Docker artifacts"
    

    Choose the location that is closest to you and your users.

    Set up your Docker client to authenticate to this repo with

    gcloud auth configure-docker us-central1-docker.pkg.dev
    

    If you chose a location other than us-central1 in the last step, be sure to reflect that in this auth command. For example, if your region is europe-west1, you would instead run

    gcloud auth configure-docker europe-west1-docker.pkg.dev
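
Since the project ID and the location show up in several of the commands above, it can help to generate them in one place. The following is a minimal sketch, not part of the original setup: the helper names (make_project_id, setup_commands) are hypothetical, and the regular expression reflects my understanding of the project ID rules (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen). It only prints the commands and does not run them.

```python
import re
import secrets

# Assumed project ID rules: 6-30 chars, lowercase letters/digits/hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def make_project_id(prefix: str) -> str:
    """Append a random 5-digit suffix so the ID is unlikely to be taken."""
    project_id = f"{prefix}-{secrets.randbelow(100_000):05d}"
    if not PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return project_id


def setup_commands(project_id: str, billing_account: str,
                   location: str, repository: str) -> list[str]:
    """Return the setup commands from this section for the given values."""
    registry_host = f"{location}-docker.pkg.dev"  # must match the repo location
    return [
        f'gcloud projects create {project_id} --name="AI Web App"',
        f"gcloud config set project {project_id}",
        "gcloud services enable run.googleapis.com",
        f"gcloud billing projects link {project_id} --billing-account {billing_account}",
        "gcloud services enable artifactregistry.googleapis.com",
        f"gcloud artifacts repositories create {repository} "
        f'--repository-format=docker --location={location} --description="Docker artifacts"',
        f"gcloud auth configure-docker {registry_host}",
    ]


# Placeholder billing account ID, as in the commands above.
for command in setup_commands(make_project_id("my-example-webapp"),
                              "0X0X0X-0X0X0X-0X0X0X",
                              "europe-west1", "ai-web-app-artifacts"):
    print(command)
```

Deriving the registry host from the location in a single place avoids the mismatch between the repository location and the configure-docker command.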
    

Deploy

By this point, we’ve done all the necessary setup to deploy our app. Now all that’s left is to upload our docker image and start our serverless function.

We should add a helper script in pyproject.toml to push our image to Google Cloud:

[tool.hatch.envs.default.scripts]
push = [
    "docker tag ai-web-app:latest us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest",
    "docker push us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest"
]
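
The long image name in this script follows a fixed pattern: registry host, project ID, repository, then image name and tag. As a rough illustration of how the pieces fit together, here is a small sketch; the function names are hypothetical and the values are the ones used in this post, so replace them with your own.

```python
def image_reference(location: str, project_id: str, repository: str,
                    image: str, tag: str = "latest") -> str:
    """Build the full Artifact Registry name for a Docker image."""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


def push_commands(local_image: str, remote_image: str) -> list[list[str]]:
    """The two docker commands from the helper script, as argument lists."""
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


remote = image_reference("us-central1", "my-example-webapp-23867",
                         "ai-web-app-artifacts", "ai-web-app")
for command in push_commands("ai-web-app:latest", remote):
    print(" ".join(command))
```

If you change the location or the project ID, only the arguments to image_reference need to change.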

We can then build and push our docker image

hatch run build && hatch run push

And finally we can deploy our serverless function (when asked, choose unauthenticated access; alternatively, pass --allow-unauthenticated to skip the prompt)

gcloud run deploy ai-web-app --image="us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest" --region=us-central1 --memory=2Gi

Note that the --memory=2Gi option is important for this specific case, as the all-MiniLM-L6-v2 model (see Part 1) requires almost all of that memory to run.
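
If you would rather script the deployment, for example to run it from CI, the same command can be assembled in Python. This is only a sketch under the assumptions of this post: the function name deploy_args is made up, and the snippet prints the command instead of running it.

```python
import shlex


def deploy_args(service: str, image: str, region: str,
                memory: str = "2Gi", allow_unauthenticated: bool = True) -> list[str]:
    """Assemble the `gcloud run deploy` call used in this post."""
    args = [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        f"--memory={memory}",  # the embedding model needs close to 2Gi
    ]
    if allow_unauthenticated:
        # Makes the service public, so no interactive prompt is needed.
        args.append("--allow-unauthenticated")
    return args


args = deploy_args(
    "ai-web-app",
    "us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest",
    "us-central1",
)
print(shlex.join(args))
# To actually deploy, you could run: subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if you later hand them to subprocess.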

The output of this deploy command will give you a Service URL for your app, which you should use to access it.

The app should now be up (if it’s not, have a look at the Cloud Run logs for errors), and you can reach it with

curl -X GET "<SERVICE-URL>/search?query=risk%20factors"
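
The same request can be made from Python using only the standard library. This is a minimal sketch: it assumes service_url is the full Service URL printed by the deploy command (including https://), the host name in the example is a made-up placeholder, and the response is returned as raw text because its format depends on how the app from the earlier parts was written.

```python
import urllib.parse
import urllib.request


def search_url(service_url: str, query: str) -> str:
    """Build the /search URL, percent-encoding the query (space -> %20)."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return f"{service_url.rstrip('/')}/search?{params}"


def search(service_url: str, query: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(search_url(service_url, query), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder host: replace it with your own Service URL before calling search().
print(search_url("https://ai-web-app-example.a.run.app", "risk factors"))
```

A generous timeout is useful here, since the first request after an idle period has to wait for a container to start.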

About keeping your budget low:

  • If you set up your Cloud Run function without authentication, it’s a good idea to set up budget alerts and to cap the number of instances (for example with the --max-instances option) to avoid unpleasant billing surprises.
  • If you’re pushing many images to the Artifact Registry, the costs can also quickly add up. You should regularly delete old images, or create a cleanup policy.
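
To illustrate the kind of rule a cleanup could follow, here is a small sketch that decides which images to delete: keep the most recent few, and delete the rest once they are older than a given age. The function, the tags, and the thresholds are all hypothetical, and the snippet does not call the Artifact Registry; it only shows the selection logic.

```python
from datetime import datetime, timedelta, timezone


def images_to_delete(images, keep_latest=3, max_age_days=30, now=None):
    """Pick the tags to delete from a list of (tag, pushed_at) tuples.

    The `keep_latest` newest images are always kept; older ones are
    deleted only if they were pushed more than `max_age_days` ago.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    return [tag for tag, pushed_at in newest_first[keep_latest:] if pushed_at < cutoff]


# Made-up example data.
example = [
    ("v1", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    ("v2", datetime(2023, 2, 1, tzinfo=timezone.utc)),
    ("v3", datetime(2023, 5, 20, tzinfo=timezone.utc)),
    ("v4", datetime(2023, 5, 25, tzinfo=timezone.utc)),
    ("v5", datetime(2023, 5, 30, tzinfo=timezone.utc)),
]
print(images_to_delete(example, keep_latest=2,
                       now=datetime(2023, 6, 1, tzinfo=timezone.utc)))
```

Always keeping the newest images means a rollback target remains available even if you have not deployed for a while.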

To continue this tutorial, go to Part 9.

For comments or questions, use the Reddit discussion or reach out to me directly via email.