Making and Deploying an AI Web App in 2023 (Part 5)

Implement and Test a Python Backend Server

This is part of a multi-part blogpost about how to build an AI Web App. Please refer to Part 1 for more context.

This post uses FastAPI as the web framework and Uvicorn as the web server.

Some alternatives to FastAPI are Flask, Starlette, and Django. Alternatives to Uvicorn include Gunicorn, Hypercorn, nginx.

By this point we have an AI project that we’re happy with, and want to make it easily available. The goal of this post is to write a simple backend service that calls the Python code from Part 4, and returns a result in some readable form, such as JSON.


For our case, we only want to expose the search function (see Part 4 for details).

The first thing to do is to add our dependencies to pyproject.toml. We want to add fastapi and uvicorn to our config file. FastAPI allows us to write Python code to define the API with type hints (what goes in and what comes out), as well as the Python logic to be executed (what happens in between). Uvicorn will allow us to actually serve our Python code as an HTTP service.

Let’s start by making a new file in the root of our project. In this file, we define a function search_endpoint which receives a query and calls our search function to query our database.

from pathlib import Path
from typing import List

from fastapi import FastAPI
from loguru import logger

from ai_web_app.main import Result, index_embeddings, search

app = FastAPI()"Indexing database...")
database_path = Path("./articles.sqlite")
embeddings = index_embeddings(database_path)

async def search_endpoint(query: str, topn: int = 5) -> List[Result]:
    results = search(embeddings, database_path, query, topn=topn)
    return results

To make this easier to run, we should also add a new script to our pyproject.toml file (add the following script to the appropriate section)

serve = "uvicorn app:app --port 8080"

and then we can simply run this command to start the server:

hatch run serve

Running that command should index our database (took ~10s on my computer) and start serving requests on port 8080.

We can test how this looks like by running in another terminal window

curl -X GET ""

If you’ve been following so far, this is what you should get as a response


Integration Tests

To make sure our backend endpoint behaves as expected even if we change it in the future, we should set up some simple integration tests.

For this, we will use pytest-xprocess to launch our server as a separate process, and then make some simple requests and assert that the response is as expected.

First of all, we need to add pytest-xprocess to our dev dependencies in pyproject.toml.

After that, we create a file tests/

import pytest
import requests
from xprocess import ProcessStarter

class TestPythonServer:
    def server(self, xprocess):
        class Starter(ProcessStarter):
            timeout = 600
            pattern = "Application startup complete"
            args = ["hatch", "run", "serve"]

        xprocess.ensure("server", Starter)
        url = ""
        yield url


    def test_valid_search(self, server):
        topn = 5
        response = requests.get(
            server + "/search", params={"query": "symptoms of covid", "topn": topn}

        assert response.status_code == 200
        assert len(response.json()) == topn

    def test_empty_search(self, server):
        response = requests.get(server + "/search")

        assert response.status_code == 422

This file defines a class with 3 methods:

  • server starts up our web server, and waits for it to be available. It’s configured to wait until it sees the words "Application startup complete" in the output of hatch run serve.
  • test_valid_search makes a simple request and assures that the response is valid, and has the correct number of requested results. You could spend a lot of time here coming up with better test cases, but for this simple app there’s not that much that can go wrong.
  • test_empty_search tests a case where the incoming request doesn’t have any query. In this case, the server should return an error. It is important to verify your assumptions of the behavior for failing cases, so you don’t have silent errors in your system.

You can also add this as a new script in the pyproject.toml file:

integration = "no-cov -m integration -x"

One final thing for the integration tests: typically these tests are quite slow to run, as they require an often lengthy setup process. As such, we shouldn’t run these tests every time.

In the file above, we marked the tests with @pytest.mark.integration. This allows us to configure pytest to not run those tests by default. We can do that by adding this to our pyproject.toml file:

addopts = "--strict-markers -m \"not integration\""
markers = [
    "integration: slow tests, shouldn't be run so often"

Now we can differentiate between running unit tests (hatch run cov) and integration tests (hatch run integration).

If you’ve been following these instructions, your code should look like this:

To continue this tutorial, go to Part 6.

For comments or questions, use the Reddit discussion or reach out to me directly via email.