Deployment

Introduction & motivation

Artificial intelligence tools are far more woven into daily life today than they were a decade ago. That shift comes down to advancements in the field, and to software and hardware being built with the user placed at the centre of it all. As more solutions reach the market, the goal stays consistent: take a cumbersome task and make it bearable.

Case study

Navigation, made less laborious

GPS, and its use inside apps like Google Maps and Apple Maps, has turned navigating unfamiliar territory into a far less painstaking adventure. Beyond directions, users get recommendations, ratings and reviews for shops, restaurants and hotels along the way — imperfect at times, but transformative overall.

Apple Maps ↗ Google Maps ↗

Case study

Shazam & audio fingerprinting

Shazam identifies a song playing nearby in seconds. It relies on audio fingerprinting (condensing an audio signal into a compact digital summary of its acoustically relevant characteristics) to find the best match, even against background noise.

The recorded audio is transformed into a spectrogram — a visual representation of signal frequencies over time.
Peak points are extracted from that spectrogram.
Pairs of nearby peaks are combined into hash values, each encoding the two peak frequencies and the time gap between them.
The fingerprint is sent to Shazam's server, which holds fingerprints for millions of songs, and searches for a matching pattern at the right time offsets.
If a strong match is found, the song name comes back within seconds.

Shazam compares your recording's hashes against its database and identifies the song with the highest number of matches — the fingerprint that lines up best with your sample, even when it isn't an exact match.
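The matching step above can be sketched in a few lines of Python. This is a toy illustration, not Shazam's actual implementation: it assumes the spectrogram peaks have already been extracted as (time, frequency) pairs, and the names fingerprint, build_index, best_match and the fan_out parameter are made up for this example.

```python
from collections import Counter, defaultdict


def fingerprint(peaks, fan_out=3):
    """Pair each peak with the next few peaks and hash each pair.

    peaks: list of (time, frequency) tuples sorted by time.
    Returns a list of (hash, anchor_time) tuples.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            # The hash depends only on the two frequencies and the time gap,
            # so it is the same wherever the pair occurs in a recording.
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes


def build_index(songs):
    """Map every hash to the (song, time) positions where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for h, t in fingerprint(peaks):
            index[h].append((name, t))
    return index


def best_match(index, sample_peaks):
    """Vote for (song, time offset) pairs and return the strongest one."""
    votes = Counter()
    for h, t_sample in fingerprint(sample_peaks):
        for name, t_song in index.get(h, []):
            # Hashes from the true song all agree on a single offset.
            votes[(name, t_song - t_sample)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


# Made-up peak data: the sample is song_a from t=2 onwards, re-timed to start at 0.
songs = {
    "song_a": [(0, 100), (1, 200), (2, 150), (3, 300), (4, 250)],
    "song_b": [(0, 110), (1, 210), (2, 160), (3, 310), (4, 260)],
}
sample = [(0, 150), (1, 300), (2, 250)]
print(best_match(build_index(songs), sample))  # ('song_a', 2, 3)
```

Counting votes per time offset, rather than per song alone, is what lets a noisy sample still win: stray hash collisions land on scattered offsets, while genuine matches pile up on one.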

How Shazam IDs 23,000+ songs a minute (WSJ) ↗
The five-second fingerprint (Towards Data Science) ↗
Audio fingerprinting (BMAT) ↗

The pattern repeats across every example: powerful algorithms only matter once they reach the end user with keen, strategic execution. Get that right and the ripple effects follow — high adoption and engagement, organic growth in the user base, minimised training requirements, long-term loyalty, and sensitive data that stays protected along the way.

The build: people counting with YOLO

Today's project: a simple web application that uses a camera and YOLO11n to count people passing through an entrance point in real time.

Prerequisites

VS Code or your preferred IDE
Cursor
GitHub account
Hugging Face account
Vercel account

Key terms

YOLO ("You Only Look Once") — a family of fast object-detection models that locate and classify objects in an image in a single pass.
ByteTrack — a tracking algorithm that links detections of the same object across video frames, so a person keeps the same ID as they move.
WebSockets — a persistent, two-way connection between browser and server, used here to stream live video/analytics without repeated requests.
SSE (Server-Sent Events) — a simpler one-way stream from server to browser; lighter than WebSockets when the browser only needs to receive updates.
Docker — packages an app with everything it needs to run into a single "container," so it behaves the same on your laptop and in the cloud.
CORS — a browser security rule that blocks a webpage from calling an API on a different domain unless that API explicitly allows it.
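To make the ByteTrack entry concrete: stable IDs are what turn per-frame detections into counts. The sketch below assumes the tracker hands over a list of integer track IDs for each frame (with Ultralytics these would typically come from results[0].boxes.id, which can be None when nothing is tracked). The PeopleCounter class is a hypothetical helper, not part of any library.

```python
class PeopleCounter:
    """Keeps a live headcount and a running total of unique track IDs."""

    def __init__(self):
        self.seen_ids = set()

    def update(self, track_ids):
        """Record the IDs visible in one frame and return both metrics."""
        current = set(track_ids)
        self.seen_ids |= current  # Remember every ID ever seen this session.
        return {"current": len(current), "cumulative": len(self.seen_ids)}

    def reset(self):
        """Start the cumulative count again from zero."""
        self.seen_ids.clear()


counter = PeopleCounter()
print(counter.update([1, 2]))  # {'current': 2, 'cumulative': 2}
print(counter.update([2, 3]))  # {'current': 2, 'cumulative': 3}
print(counter.update([]))      # {'current': 0, 'cumulative': 3}
```

One caveat worth keeping in mind: a tracker can hand out a fresh ID when someone leaves and re-enters the frame or is hidden for a while, so the cumulative figure is an upper-leaning estimate rather than an exact visitor count.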

How the pieces fit together

Before touching any prompts, it's worth seeing the full picture: what runs where, and how the pieces talk to each other once everything is deployed.

Figure. Vercel only ever serves the frontend's files. Once loaded, the browser talks directly to the Hugging Face Space over a different domain — which is exactly the situation CORS rules exist for.

Local webcam vs. a webcam once it's deployed

The local-dev prompt below asks for a backend that opens the webcam itself (cv2.VideoCapture). That's correct for testing on your own laptop — but once the backend moves to a Hugging Face Space, it's running on a remote server with no physical webcam attached. cv2.VideoCapture(0) in the cloud has no camera to open.

Figure. Locally, the backend can read the webcam directly. Once deployed, the browser must capture the webcam and stream frames to the backend — the backend on Hugging Face has no camera of its own.

What this means for your prompts

Use the local-dev prompt as written to build and test on your own machine first. Before deploying, either (a) add a follow-up prompt asking your AI assistant to capture video in the browser with getUserMedia and stream frames to the backend over WebSockets instead of opening the camera server-side, or (b) treat the deployed version as a live dashboard/architecture demo and keep the full camera-to-dashboard experience running locally. Either is a reasonable choice — just decide on purpose rather than by surprise.
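For option (a), one small piece of the backend work is turning each message from the browser back into image bytes. The sketch below assumes the frontend sends every frame as a JPEG data URL (the string format that canvas.toDataURL("image/jpeg") produces); decode_frame is a hypothetical helper name, and your generated code may use raw binary messages instead.

```python
import base64


def decode_frame(message: str) -> bytes:
    """Extract the raw image bytes from a 'data:image/...;base64,...' string."""
    header, sep, payload = message.partition(",")
    if not sep or not header.startswith("data:image/") or not header.endswith(";base64"):
        raise ValueError("expected a base64 image data URL")
    # validate=True rejects stray characters instead of silently skipping them.
    return base64.b64decode(payload, validate=True)


# The returned bytes can then be decoded into a frame on the server, e.g.
#   frame = cv2.imdecode(np.frombuffer(jpeg_bytes, np.uint8), cv2.IMREAD_COLOR)
# before being passed to the YOLO model.
```

Rejecting malformed messages early keeps a bad frame from reaching the model, which matters once the endpoint is reachable from the public internet.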

Finding the right camera index

The frontend spec asks for a dropdown to pick a camera index (0, 1, 2). To find out which index corresponds to which physical camera on your machine, run a short script rather than guessing:

find_camera.py

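import cv2

# Probe the first five camera indices and report which ones open.
for i in range(5):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f"Camera index {i} is available")
    else:
        print(f"Camera index {i} is not available")
    cap.release()  # Free the device either way so other apps can use it.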

Run it with python find_camera.py. Built-in laptop webcams are usually index 0; a plugged-in USB webcam often shows up at 1 or higher — but this varies by OS and by what else is plugged in, so testing beats assuming.

Setting up your tools

Before writing a single prompt, get your local machine ready: Cursor, Python, Node.js/npm, and Git.

Installing Cursor

Cursor is a fork of VS Code with AI features built in, and it runs on Windows, macOS, and Linux.

Windows

Go to cursor.com/download ↗ — it detects Windows automatically and offers the .exe installer.
Run the downloaded .exe file and follow the setup wizard.
Launch Cursor from the Start Menu once installation finishes.

Requires Windows 10 or 11 (64-bit).

macOS

Go to cursor.com/download ↗. Pick the build that matches your chip: Apple Silicon (ARM64) for M1/M2/M3/M4 Macs, or Intel (x64) for older Macs.
Open the downloaded .dmg file, then drag the Cursor icon into your Applications folder.
Open Cursor from Applications. If macOS blocks it, go to System Settings → Privacy & Security and click Open Anyway.

Requires macOS 10.15 (Catalina) or newer.

Linux

Go to cursor.com/download ↗ and download the .AppImage (or a .deb/.rpm if your distro prefers packages).
Make the AppImage executable and run it:

chmod +x cursor-*.AppImage
./cursor-*.AppImage

On Ubuntu/Debian, AppImages need FUSE 2 installed first: sudo apt install libfuse2 (on Ubuntu 24.04 and newer the package is named libfuse2t64). Works on Ubuntu 20.04+, Debian, Fedora, and RHEL, on both x64 and ARM64.

After installing (all platforms)

On first launch, Cursor offers to sign in (GitHub, Google, or email) and to import your VS Code extensions, themes, and keybindings — safe to accept either way. Then open the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS) and run "Install 'cursor' command in PATH", so you can later open any project by typing cursor . in a terminal.

Installing Python

The backend (FastAPI + YOLO) runs on Python. Version 3.10 or newer is recommended for compatibility with recent Ultralytics releases.

Windows

Go to python.org/downloads ↗ and download the latest Python 3.x installer.
Run it, and — this step trips people up more than any other — tick "Add python.exe to PATH" on the very first screen before clicking Install.

macOS

Homebrew: brew install python
or download the .pkg installer from python.org/downloads ↗ and run it.

Linux

Most distributions ship with Python 3 already. Install/upgrade with your package manager:

sudo apt install python3 python3-pip python3-venv

Verify: python --version (or python3 --version on macOS/Linux) and pip --version should both print a version number.

Installing Node.js and npm

npm ships bundled with Node.js, so installing Node gives you both. Always choose the LTS (Long-Term Support) version unless you have a specific reason not to — it's what most libraries are tested against.

Windows

Option A — official installer:

Download the LTS .msi from nodejs.org ↗.
Run it as Administrator, keeping the default options (this includes npm and adds Node to your PATH).
Restart PowerShell or Command Prompt.

Option B — winget:

winget install OpenJS.NodeJS.LTS --source winget

macOS

Option A — Homebrew (recommended if you have it):

brew install node

Option B — official installer: download the LTS .pkg from nodejs.org ↗ and run it.

Option C — nvm (best if you'll juggle multiple Node versions — see below).

Linux

Recommended — nvm:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.5/install.sh | bash

Close and reopen your terminal, then:

nvm install --lts
nvm use --lts

Alternatively, Ubuntu/Debian users can use NodeSource's apt repository, and users of any distribution can use its own package manager (apt, dnf, pacman), though distro packages may lag behind the current LTS release.

nvm — one tool, every OS

If you'll work across multiple projects that need different Node versions, install nvm ↗ (macOS/Linux) or nvm-windows ↗ (Windows — a separate project with the same idea, since the original nvm doesn't run natively on Windows). Both let you switch versions per project with nvm use <version>.

Verify: node -v and npm -v should both print a version number. If either says "not recognized," close and reopen your terminal first — most installers need a fresh shell to pick up the updated PATH.

Installing Git

You'll need Git to push code to your Hugging Face Space and to a GitHub repository for Vercel.

Windows

Download and run the installer from git-scm.com/download/win ↗. The default options are fine for almost everyone — just click through the wizard.

macOS

Run git --version in Terminal — macOS will offer to install the Xcode Command Line Tools (which include Git) if it isn't already present. Or install via Homebrew: brew install git.

Linux

sudo apt install git

On Fedora/RHEL use sudo dnf install git instead; on Arch use sudo pacman -S git.

Verify: git --version. First-time setup also needs your identity for commits:

git config --global user.name "Your Name"
git config --global user.email "you@example.com"

Running the project locally

Once your AI assistant (Cursor) has generated the codebase from the local development prompt below, you'll typically end up with a backend folder (Python/FastAPI) and a frontend folder (React/TypeScript). Adjust folder and file names to match whatever Cursor actually generates, but the shape of it looks like this:

Terminal — backend

cd backend
python -m venv venv

# Activate the virtual environment (run only the line for your OS):
#   macOS/Linux:
source venv/bin/activate
#   Windows (PowerShell):
venv\Scripts\Activate.ps1

pip install -r requirements.txt
uvicorn main:app --reload --port 8000

Terminal — frontend (new terminal tab/window)

cd frontend
npm install
npm run dev

Open the URL your terminal prints (typically http://localhost:5173 for a Vite + React frontend) in your browser. The frontend should now be talking to your backend at http://localhost:8000. If the backend can't open your webcam, check that no other application (Zoom, Teams, a browser tab) is already holding the camera.

If something doesn't start

Re-read the terminal error message top to bottom — it usually names the exact missing package or port conflict.
Confirm you're in the right folder (pwd on macOS/Linux and in PowerShell; cd on its own in Windows Command Prompt prints the current path).
Confirm node -v and python --version both work in that same terminal — a fresh terminal window sometimes doesn't have PATH updates yet.
Ask Cursor directly: paste the exact error into the chat and ask it to fix it.

Accounts & deployment

The plan: the backend (the YOLO model + FastAPI server) runs in a Docker container on Hugging Face Spaces; the frontend (the React dashboard) runs on Vercel.

Creating a GitHub account

Go to github.com/signup ↗.
Enter an email, create a password, and choose a username — this username becomes part of your repository URLs, so pick something you're fine having public.
Verify your email address when prompted.
Create a new repository for your frontend code from the + → New repository button once logged in.

Creating a Hugging Face account

Go to huggingface.co/join ↗.
Sign up with an email address (or continue with an existing account), and verify your email — some features are gated behind a verified address.
Generate an access token for git operations: go to Settings → Access Tokens, click New token, and give it at least write access if you'll be pushing code to a Space via git. Copy and store it somewhere safe — you won't be able to see it again.

Deploying the backend on Hugging Face Spaces

Go to huggingface.co/spaces ↗ and click Create new Space.
Give it a name, choose the owner (your account or an organization), and pick a visibility (Public or Private).
Under Space SDK, select Docker — this tells Hugging Face to build and run whatever Dockerfile it finds in the repo, rather than a pre-set framework like Gradio or Streamlit.
Pick hardware — the free CPU basic tier is enough to start with.
Click Create Space. Hugging Face creates a Git repository for it.
Add your files — either:
- Git: clone the Space's repo locally (git clone https://huggingface.co/spaces/<your-username>/<space-name>), copy in your Dockerfile, requirements.txt, and backend code, then git add . && git commit -m "Initial deploy" && git push. When prompted for a password, use the access token you generated above.
- Web UI: use the Space's Files tab to upload files directly, if you'd rather not use git.
Hugging Face automatically rebuilds the container on every push. The first build can take several minutes, especially if your requirements.txt includes large packages like PyTorch or Ultralytics.
By default, Spaces expects your app to listen on port 7860 — make sure your Dockerfile's CMD starts uvicorn on that port (e.g. CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]).
Add any secrets (API keys, config values) under Settings → Variables and secrets on the Space, rather than hard-coding them — they'll be exposed to your container as environment variables.

Where your Dockerfile prompt fits

This is exactly what the "Deployment on the web" prompt below is for — it asks your AI assistant to write a Dockerfile that installs your dependencies, sets up the right user permissions, pre-downloads the YOLO model weights (so the first request isn't slow), and starts the server on the port Hugging Face expects.

Enabling CORS on the backend

Your frontend (on a vercel.app domain) and your backend (on a hf.space domain) are on different origins. By default, browsers block a webpage from calling an API on a different origin unless that API explicitly opts in, and CORS is the mechanism for opting in. Without configuring it, every request from your deployed frontend to your deployed backend will fail, and the only clue will be a CORS error in the browser console, even if the same call worked during local development.

main.py — add before your routes

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.vercel.app"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Replace https://your-frontend.vercel.app with your actual Vercel URL once you know it. You can ask your AI assistant to add this for you — just tell it your frontend's deployed URL.
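If you would rather not hard-code the URL, one option is to read the allowed origins from an environment variable, which you can set under the Space's Variables and secrets. The sketch below is one way to do that; the variable name ALLOWED_ORIGINS and the helper allowed_origins are assumptions made for this example, not something FastAPI or Hugging Face defines.

```python
import os


def allowed_origins(env=os.environ):
    """Parse a comma-separated list of origins from ALLOWED_ORIGINS.

    Falls back to the local Vite dev server when the variable is unset.
    """
    raw = env.get("ALLOWED_ORIGINS", "http://localhost:5173")
    # Browsers send the Origin header without a trailing slash, and the
    # comparison is an exact string match, so strip any trailing slash.
    return [origin.strip().rstrip("/") for origin in raw.split(",") if origin.strip()]


# In main.py this would replace the hard-coded list:
#   app.add_middleware(CORSMiddleware, allow_origins=allowed_origins(), ...)
```

With this in place, setting ALLOWED_ORIGINS to something like https://your-frontend.vercel.app,http://localhost:5173 lets the same backend image serve both the deployed and the local frontend.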

Deploying the frontend on Vercel

Push your frontend code to a GitHub repository (create one if you haven't already).
Sign up or log in at vercel.com ↗ — signing in with your GitHub account makes the next step smoother.
From the Vercel dashboard, click Add New… → Project, then Import Git Repository.
Authorize Vercel to access your GitHub account (or organization), then select the frontend repository from the list.
Vercel auto-detects your framework (React, Vite, Next.js, etc.) and fills in sensible build settings — review them, but you usually won't need to change anything.
Add an environment variable pointing the frontend at your backend, e.g. VITE_API_URL (or whatever your code expects) set to your Hugging Face Space's URL — typically https://<your-username>-<space-name>.hf.space.
Click Deploy. Vercel installs dependencies, runs your build, and publishes it — usually within a couple of minutes.

After the first deploy

Every push to your connected branch automatically triggers a new deployment: pushes to your production branch (commonly main) update your live URL, and pushes to any other branch or pull request get their own preview URL — handy for testing changes before they go live.

Free-tier limits worth knowing

Hugging Face Spaces

The free CPU tier has no GPU and limited RAM, and Spaces "go to sleep" after a period of inactivity. The first request after a sleep can take a while as the container wakes back up.

Vercel

The free (Hobby) tier caps monthly bandwidth and serverless function execution time, and is intended for personal, non-commercial projects.

If a deployment fails

Hugging Face Space won't start

Open your Space and check the Logs tab — build errors (missing dependency, Dockerfile syntax) and runtime errors (crash after startup) show up here, and almost always name the exact problem.
Confirm the Space status badge — it should read Running. "Build error" or "Runtime error" links directly to the relevant part of the log.
Visit your Space's URL directly (e.g. append /docs for FastAPI's interactive docs) to confirm the backend responds at all, separate from any frontend issue.
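The last check can also be scripted, which is handy while a Space is rebuilding or waking up. This is a minimal sketch using only the standard library; backend_is_up is a hypothetical helper, and it assumes a public Space whose FastAPI docs page is served at /docs.

```python
import urllib.request


def backend_is_up(base_url, path="/docs", timeout=10.0):
    """Return True if the backend answers the given path with a 2xx status."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection failures, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
        return False


# Example call, with the placeholders filled in for your own Space:
#   backend_is_up("https://<your-username>-<space-name>.hf.space")
```

A False result right after a push usually just means the container is still building, so it is worth checking the Logs tab before assuming something is broken.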

Vercel deployment fails or the site loads blank

Open the failed deployment in the Vercel dashboard and read the Build Logs — most failures are a missing dependency or a build command that doesn't match your project.
If the build succeeds but the page is blank or broken, open your browser's developer console (F12) and check for CORS errors or failed network requests to your backend.
Double-check the environment variable pointing at your backend URL is set for the right environment (Production vs. Preview) in Settings → Environment Variables.

Definition of done

Before calling the deployment finished, confirm each of these:

Locally, opening the dashboard shows live video with bounding boxes and updating counts.
docker build completes without errors on your own machine before you push to Hugging Face.
Your Hugging Face Space shows a Running status, not a build or runtime error.
Visiting the Space's URL directly returns a response (e.g. the FastAPI docs page), confirming the backend is genuinely live.
Your Vercel deployment shows Ready.
Visiting the Vercel URL loads the dashboard with no CORS errors in the browser console.
The deployed dashboard updates its counts from whatever video source you decided on in How the pieces fit together above.

Vibe coding: the prompts

Two prompts carry this build from a local prototype to a deployed web service. Copy either one into your coding assistant of choice and adapt as needed.

Local development phase

prompt — local-dev.txt

You are an expert full-stack engineer and computer vision developer. I want you to build a complete, production-ready web application for real-time people counting and tracking using a fixed webcam mounted in a standard room.

The core architecture must use YOLO (via the Ultralytics library) paired with the ByteTrack algorithm for persistent object tracking.

Please generate the complete codebase, configuration files, and architectural setup based on the following specifications:

1. System Architecture
- Backend: Python (FastAPI or Flask) to handle the video stream processing, YOLO initialization, and tracking logic.
- Frontend: A modern, clean, responsive dashboard in React and TypeScript.
- Communication: Use WebSockets or Server-Sent Events (SSE) to stream real-time analytics data and live processed video frames from the backend to the frontend UI without UI lag.

2. Backend Computer Vision Requirements
- Load the "yolo11n.pt" (or yolov8n.pt) model for optimized real-time CPU/GPU performance.
- Restrict object detection strictly to the "person" class (Class ID 0).
- Implement the tracking loop using model.track(source, persist=True, tracker="bytetrack.yaml").
- Implement robust exception handling for camera initialization. The code must gracefully attempt to fall back across camera indices (0, 1, 2) and try alternative video backends (like cv2.CAP_DSHOW on Windows) if the default camera stream fails to open.
- Maintain two distinct metrics in memory:
1. Current Count: active unique tracking IDs present in the immediate frame.
2. Cumulative Count: the total historical count of unique tracking IDs seen since the session started.

3. Frontend Dashboard UI Requirements
- Viewport: a main center stage displaying the live annotated video stream with YOLO bounding boxes and tracking IDs drawn on screen.
- Analytics cards: high-visibility, clean stat cards showing "Live Headcount" (current occupancy), "Total Unique Visitors" (cumulative traffic), and "System FPS / Status" (camera connection health).
- Control panel: interactive buttons to start/pause the live camera stream, reset the cumulative counter back to zero, and a dropdown to select the camera index (0, 1, 2).

Ensure all code includes inline documentation explaining how frames are grabbed, processed, tracked, and served to the client web browser. Ensure the webcam loop safely releases hardware resources when closed.

Before you deploy this

This prompt (correctly) has the backend open the webcam directly, which only works when the backend is running on the same machine as the camera. Re-read How the pieces fit together above before running the deployment prompt below — you'll need to decide whether the deployed version captures video in the browser instead, or stays a local-only demo.

Deployment on the web

prompt — deployment.txt

For this project to run well on a web browser instead of a local machine, what needs to be done?

I want a Dockerfile that I can have on Hugging Face, which I will use to run my backend. The frontend I will deploy on Vercel. This Dockerfile should tell Hugging Face how to build my environment, grant the correct user permissions, and pre-download the YOLO model weights.

Discuss · Mentimeter

How can this be scaled further for something impactful?

Development concerns & best practices

As a machine learning practitioner, being informed is key. You have to be sensitive to how you collect data, process it, and use it to train your model — and you have to ensure the security of sensitive data throughout.

Data collection

Data is the foundation of every machine learning model. Poor-quality or biased data leads to poor model performance.

Obtain appropriate consent where required.
Collect only the data necessary for your use case (data minimisation).
Ensure the dataset represents the diversity of the environment the model will operate in.
Avoid datasets that introduce bias or unfair representation.

"Garbage in, garbage out." The quality of your model depends heavily on the quality of your data.

Data processing

Before training a model, data must be cleaned and prepared.

Remove duplicate or corrupted data.
Handle missing values appropriately.
Standardise formats and labels.
Annotate data consistently.
Split datasets into training, validation and testing sets to evaluate performance fairly.

Proper preprocessing improves both model accuracy and generalisation.
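The splitting step can be sketched with the standard library alone. The function name split_dataset and the 70/15/15 default proportions are assumptions chosen for illustration; pick fractions that suit the size of your dataset.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # Fixed seed keeps the split reproducible.
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test : n_test + n_val]
    train = shuffled[n_test + n_val :]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

For video data, consider splitting by recording session rather than by individual frame: neighbouring frames are nearly identical, so a frame-level shuffle can leak test data into training.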

Privacy & data protection

ML applications often process sensitive information — images, video, audio, or personal identifiers.

Store sensitive data securely.
Encrypt data both in transit and at rest.
Limit access using authentication and authorisation.
Anonymise or pseudonymise personal information where possible.
Delete data that is no longer required.

Security

AI systems can become targets for cyberattacks if not properly secured.

Secure APIs with authentication.
Use HTTPS for all communications.
Keep software updated and manage dependencies.
Validate input to prevent malicious requests.
Log and monitor for suspicious activity.

Security should be considered from the start of development, not added as an afterthought.

Model performance & scalability

A model that performs well during development should also perform reliably in production.

Monitor model accuracy over time.
Detect model drift as real-world data changes.
Optimise inference speed for real-time applications.
Use lightweight models where appropriate (e.g. YOLO11n for edge devices).
Containerise applications with Docker for consistent deployment.
Scale services using cloud platforms when demand increases.
