Making and Deploying an AI Web App in 2023 (Part 8)
Deploy a Serverless AI App with Google Cloud
This is part of a multi-part blogpost about how to build an AI Web App. Please refer to Part 1 for more context.
This post uses Google’s Cloud Run. Alternatives include AWS Lambda and Azure Functions.
It’s finally time to deploy our web app so that everyone can reach it. We’ll deploy our app in a serverless fashion. This means that containers are started on demand when requests come in and run only for as long as they are needed to respond; when no one is calling our server, the service can scale down to zero. By doing it this way, we avoid wasting money on compute when no one is making requests. Furthermore, scaling happens automatically: if our web app suddenly gets very popular, our cloud provider automatically starts as many containers as needed.
In this post we’ll do everything with the gcloud CLI, but it can all be done with the graphical interface of the Google Cloud Console website.
Setup
The first step is to install the gcloud CLI and run gcloud init to configure your account.
Then we need the following setup steps:
Create a new project
gcloud projects create my-example-webapp-23867 --name="AI Web App"
Note that the project ID (my-example-webapp-23867) needs to be unique, so this exact command won’t work for you: you will need to choose your own project ID.
Configure gcloud to use your project:
gcloud config set project my-example-webapp-23867
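Since the project ID has to be unique, one option is to append a random suffix to a readable prefix. Below is a minimal sketch, assuming Python 3.9+; the helper name make_project_id and the prefix are made up for illustration, and the pattern encodes the documented format rules for project IDs (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen). It cannot tell whether an ID is already taken: gcloud projects create will report that.

```python
import re
import secrets

# Format rules for Google Cloud project IDs: 6-30 characters, lowercase
# letters, digits and hyphens, starting with a letter and not ending
# with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def make_project_id(prefix: str = "my-example-webapp") -> str:
    """Return `prefix` plus a random 5-digit suffix (hypothetical helper)."""
    suffix = f"{secrets.randbelow(100_000):05d}"
    candidate = f"{prefix}-{suffix}"
    # Only the format is checked here; global uniqueness is checked by
    # Google Cloud when the project is actually created.
    if not PROJECT_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"invalid project ID: {candidate!r}")
    return candidate


if __name__ == "__main__":
    print(make_project_id())
```

The printed ID can then be used in place of my-example-webapp-23867 in all the commands of this post.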
Activate the Cloud Run API
gcloud services enable run.googleapis.com
Activate the Artifact Registry API, which we will use to upload our docker images.
gcloud services enable artifactregistry.googleapis.com
Note that enabling this API requires billing to be enabled in the project. The Artifact Registry is not strictly required: you could upload the Docker images somewhere else, such as Docker Hub.
To activate billing for the project, first set up a billing account on the Google Cloud Console. Using the CLI, you can list the existing billing accounts:
gcloud billing accounts list
You can then link the project to a specific billing account:
gcloud billing projects link my-example-webapp-23867 --billing-account 0X0X0X-0X0X0X-0X0X0X
(Don’t forget to replace the billing account ID with your own.)
Create a new artifacts repository
gcloud artifacts repositories create ai-web-app-artifacts --repository-format=docker --location=us-central1 --description="Docker artifacts"
Choose the location closest to you and your users.
Set up your Docker client to authenticate to this repo with
gcloud auth configure-docker us-central1-docker.pkg.dev
If you chose something other than us-central1 for the location in the last step, be sure to reflect that in this auth command. For example, if your region is europe-west1, you would instead run
gcloud auth configure-docker europe-west1-docker.pkg.dev
Deploy
By this point, we’ve done all the necessary setup to deploy our app. Now all that’s left is to upload our docker image and start our serverless function.
We should add a helper script in pyproject.toml to push our image to Google Cloud:
[tool.hatch.envs.default.scripts]
push = [
"docker tag ai-web-app:latest us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest",
"docker push us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest"
]
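The image path in the script above follows the pattern region-docker.pkg.dev/project-id/repository/image:tag, so it has to change whenever you use a different region, project ID or repository name. As a rough sketch (the function names image_uri and push_commands are hypothetical, not part of any Google or Docker tooling), the two command strings can be derived like this:

```python
def image_uri(region: str, project_id: str, repository: str,
              image: str, tag: str = "latest") -> str:
    """Build the Artifact Registry path for a Docker image."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


def push_commands(local_image: str, remote_image: str) -> list[str]:
    """Return the `docker tag` and `docker push` command strings."""
    return [
        f"docker tag {local_image} {remote_image}",
        f"docker push {remote_image}",
    ]


if __name__ == "__main__":
    # Values taken from this post; replace them with your own.
    remote = image_uri(
        region="us-central1",
        project_id="my-example-webapp-23867",
        repository="ai-web-app-artifacts",
        image="ai-web-app",
    )
    for command in push_commands("ai-web-app:latest", remote):
        print(command)
```

With the values from this post, the printed lines are the same two commands as in the push script, which makes it easy to regenerate them for another region or project.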
We can then build and push our docker image
hatch run build && hatch run push
And finally we can deploy our serverless function (when asked, choose unauthenticated access)
gcloud run deploy ai-web-app --image="us-central1-docker.pkg.dev/my-example-webapp-23867/ai-web-app-artifacts/ai-web-app:latest" --region=us-central1 --memory=2Gi
Note that the --memory=2Gi option is important for this specific case, as the all-MiniLM-L6-v2 model (see Part 1) requires almost all of that memory to run.
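If you would rather script the deployment than answer the interactive prompt, the same command can be assembled in Python. This is only a sketch: deploy_command is a hypothetical helper, and it assumes you want public access, which it requests with gcloud’s --allow-unauthenticated flag instead of waiting for the question.

```python
import shlex


def deploy_command(service: str, image: str, region: str,
                   memory: str = "2Gi",
                   allow_unauthenticated: bool = True) -> list[str]:
    """Build the argument list for `gcloud run deploy` (not executed here)."""
    args = [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        f"--memory={memory}",
    ]
    if allow_unauthenticated:
        # Answers the "allow unauthenticated invocations" prompt up front.
        args.append("--allow-unauthenticated")
    return args


if __name__ == "__main__":
    cmd = deploy_command(
        service="ai-web-app",
        image=("us-central1-docker.pkg.dev/my-example-webapp-23867/"
               "ai-web-app-artifacts/ai-web-app:latest"),
        region="us-central1",
    )
    # Print the command for review; it could be run with
    # subprocess.run(cmd, check=True) once it looks right.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if you later pass them to subprocess.run.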
The output of this deploy command will give you a Service URL for your app, which you should use to access it.
The app should now be up (if it’s not, have a look at the Cloud Run logs for errors), and you can reach it with
curl -X GET "https://<SERVICE-URL>/search?query=risk%20factors"
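The same request can be made from Python with only the standard library. The sketch below assumes the /search endpoint and query parameter shown in the curl command; the function names and the 30-second timeout are arbitrary choices for illustration (the first request after a period of inactivity may be slow while a container starts). The response body is returned as raw text, since its exact format depends on the app built in the earlier parts.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def search_url(service_url: str, query: str) -> str:
    """Build the /search URL, encoding spaces as %20 like the curl example."""
    params = urlencode({"query": query}, quote_via=quote)
    return f"{service_url.rstrip('/')}/search?{params}"


def search(service_url: str, query: str, timeout: float = 30.0) -> str:
    """Send a GET request to the deployed app and return the raw body."""
    with urlopen(search_url(service_url, query), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Replace the placeholder with the Service URL printed by the deploy command.
    print(search("https://<SERVICE-URL>", "risk factors"))
```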
About keeping your budget low:
- If you set up your Cloud Run function without authentication, anyone can call it, so it’s a good idea to set up a budget with alerts and avoid unpleasant billing surprises.
- If you’re pushing many images to the Artifact Registry, the costs can also quickly add up. You should regularly delete old images, or create a cleanup policy.
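As an illustration of the cleanup policy idea, the sketch below generates a policy file that deletes untagged images older than a given number of days. Because the push script always reuses the latest tag, older images lose their tag on every push, so targeting untagged images keeps the current one. The JSON layout (name, action, condition with tagState and olderThan) reflects my understanding of the Artifact Registry policy format and should be checked against the official documentation before use; the function name and the 30-day default are arbitrary.

```python
import json


def delete_untagged_policy(days: int = 30,
                           name: str = "delete-old-untagged") -> list[dict]:
    """Build a cleanup policy deleting untagged images older than `days` days.

    The field names are an assumption based on the Artifact Registry
    cleanup policy format; verify them against the current documentation.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return [
        {
            "name": name,
            "action": {"type": "Delete"},
            "condition": {
                "tagState": "untagged",
                "olderThan": f"{days}d",
            },
        }
    ]


if __name__ == "__main__":
    # Write the policy to a file that can be passed to gcloud.
    with open("cleanup-policy.json", "w", encoding="utf-8") as f:
        json.dump(delete_untagged_policy(), f, indent=2)
```

The resulting file would then be applied with something like gcloud artifacts repositories set-cleanup-policies ai-web-app-artifacts --location=us-central1 --policy=cleanup-policy.json; check gcloud’s help for that command first, as it also offers a dry-run mode for testing a policy before it deletes anything.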
To continue this tutorial, go to Part 9.
For comments or questions, use the Reddit discussion or reach out to me directly via email.