Introduction & motivation
Artificial intelligence tools are far more woven into daily life today than they were a decade ago. That shift comes down to advancements in the field, and to software and hardware being built with the user placed at the centre of it all. As more solutions reach the market, the goal stays consistent: take a cumbersome task and make it bearable.
Navigation, made less laborious
GPS, and its use inside apps like Google Maps and Apple Maps, has turned navigating unfamiliar territory into a far less painstaking adventure. Beyond directions, users get recommendations, ratings and reviews for shops, restaurants and hotels along the way — imperfect at times, but transformative overall.
Shazam & audio fingerprinting
Shazam identifies a song playing nearby in seconds. It uses audio fingerprinting (digitally condensing an audio signal into its acoustically relevant characteristics) to identify the exact track, even against background noise.
- The recorded audio is transformed into a spectrogram — a visual representation of signal frequencies over time.
- Peak points are extracted from that spectrogram.
- Pairs of nearby peaks are combined into hash values, each built from the two peak frequencies and the time gap between them.
- The fingerprint is sent to Shazam's servers, which hold fingerprints for millions of songs and search them for hashes that match at consistent time offsets.
- If a strong match is found, the song name comes back within seconds.
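The first two steps (spectrogram, then peak extraction) can be sketched in Python with NumPy. This is a simplified illustration of the general technique, not Shazam's code. The frame size, hop length, neighbourhood size and threshold are assumed values chosen for readability.

```python
import numpy as np

def spectrogram(signal: np.ndarray, frame_size: int = 1024, hop: int = 512) -> np.ndarray:
    """Magnitude short-time Fourier transform, shape (n_frames, frame_size // 2 + 1)."""
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_size] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins of each real-valued frame
    return np.abs(np.fft.rfft(frames, axis=1))

def extract_peaks(spec: np.ndarray, neighborhood: int = 10,
                  threshold_ratio: float = 0.1) -> list[tuple[int, int]]:
    """Return (frame, bin) points that are the maximum of their local neighbourhood."""
    threshold = spec.max() * threshold_ratio  # skip quiet time-frequency cells
    n_frames, n_bins = spec.shape
    peaks = []
    for t in range(n_frames):
        for f in range(n_bins):
            value = spec[t, f]
            if value < threshold:
                continue
            patch = spec[max(0, t - neighborhood):t + neighborhood + 1,
                         max(0, f - neighborhood):f + neighborhood + 1]
            if value >= patch.max():  # equal neighbours are all kept
                peaks.append((t, f))
    return peaks

sample_rate = 8000
times = np.arange(sample_rate) / sample_rate   # one second of samples
tone = np.sin(2 * np.pi * 1000 * times)        # a pure 1 kHz test tone
peaks = extract_peaks(spectrogram(tone))
print({f * sample_rate / 1024 for _, f in peaks})  # bin index -> Hz  # → {1000.0}
```

Real recordings produce a scattered "constellation" of peaks across many bins. The pure tone simply makes the result easy to check: every peak lands in the bin corresponding to 1 kHz.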
Shazam compares your recording's hashes against its database and returns the song with the most matching hashes at a consistent time offset. That is the fingerprint that lines up best with your sample, even when the two are not identical.
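The hashing and matching steps can be illustrated in plain Python. This is a minimal sketch of the offset-voting idea, with assumed parameters (`fan_out`, `max_dt`) and hypothetical function names. Each hash combines an anchor peak's frequency, a later peak's frequency and the time gap between them. The winning song is the one whose hashes agree with the query at one consistent offset.

```python
from collections import Counter, defaultdict

def hash_peaks(peaks, fan_out=5, max_dt=50):
    """Pair each anchor peak with up to `fan_out` later peaks; hash = (f1, f2, dt)."""
    peaks = sorted(peaks)  # (time, freq) tuples, ordered by time
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:        # only pair with peaks strictly later in time
                continue
            if dt > max_dt:    # outside the target zone; later peaks are further away
                break
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes

def build_index(songs):
    """Map each hash to every (song_id, anchor_time) where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in hash_peaks(peaks):
            index[h].append((song_id, t))
    return index

def best_match(index, query_peaks):
    """Vote for (song, time offset); a true match piles its votes onto one offset."""
    votes = Counter()
    for h, t_query in hash_peaks(query_peaks):
        for song_id, t_song in index.get(h, []):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count

# Two synthetic "songs" described only by their peak constellations
songs = {
    "A": [(t, (t * 7) % 50) for t in range(0, 200, 3)],
    "B": [(t, (t * 11 + 3) % 50) for t in range(0, 200, 3)],
}
index = build_index(songs)
# A "recording" of song A that starts 60 frames in, re-timed to start at 0
query = [(t - 60, f) for t, f in songs["A"] if 60 <= t < 120]
print(best_match(index, query))  # → ('A', 60, 85)
```

Voting on the (song, offset) pair rather than on the song alone is what makes the approach robust. Random hash collisions scatter across many offsets, while a genuine match concentrates its votes at one.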
The pattern holds in both examples: powerful algorithms only matter once they reach the end user through careful, strategic execution. Get that right and the ripple effects follow, including high adoption and engagement, organic growth in the user base, less training needed for users, long-term loyalty, and sensitive data that stays protected along the way.
The build: people counting with YOLO
Today's project: a simple web application that uses a camera and YOLO11n to count people passing through an entrance in real time. You will need:
- VS Code or your preferred IDE
- Cursor
- Hugging Face account
- GitHub account
- Vercel account
Vibe coding: the prompts
Two prompts carry this build from a local prototype to a deployed web service. Copy either one into your coding assistant of choice and adapt as needed.
Local development phase
You are an expert full-stack engineer and computer vision developer. I want you to build a complete, production-ready web application for real-time people counting and tracking using a fixed webcam mounted in a standard room. The core architecture must use YOLO (via the Ultralytics library) paired with the ByteTrack algorithm for persistent object tracking. Please generate the complete codebase, configuration files, and architectural setup based on the following specifications:
1. System Architecture
- Backend: Python (FastAPI or Flask) to handle the video stream processing, YOLO initialization, and tracking logic.
- Frontend: A modern, clean, responsive dashboard in React and TypeScript.
- Communication: Use WebSockets or Server-Sent Events (SSE) to stream real-time analytics data and live processed video frames from the backend to the frontend UI without UI lag.
2. Backend Computer Vision Requirements
- Load the "yolo11n.pt" (or yolov8n.pt) model for optimized real-time CPU/GPU performance.
- Restrict object detection strictly to the "person" class (Class ID 0).
- Implement the tracking loop using model.track(source, persist=True, tracker="bytetrack.yaml").
- Implement robust exception handling for camera initialization. The code must gracefully attempt to fall back across camera indices (0, 1, 2) and try alternative video backends (like cv2.CAP_DSHOW on Windows) if the default camera stream fails to open.
- Maintain two distinct metrics in memory:
  1. Current Count: active unique tracking IDs present in the immediate frame.
  2. Cumulative Count: the total historical count of unique tracking IDs seen since the session started.
3. Frontend Dashboard UI Requirements
- Viewport: a main center stage displaying the live annotated video stream with YOLO bounding boxes and tracking IDs drawn on screen.
- Analytics cards: high-visibility, clean stat cards showing "Live Headcount" (current occupancy), "Total Unique Visitors" (cumulative traffic), and "System FPS / Status" (camera connection health).
- Control panel: interactive buttons to start/pause the live camera stream, reset the cumulative counter back to zero, and a dropdown to select the camera index (0, 1, 2).
Ensure all code includes inline documentation explaining how frames are grabbed, processed, tracked, and served to the client web browser. Ensure the webcam loop safely releases hardware resources when closed.
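Whatever code the assistant generates, the two metrics in this prompt reduce to simple set bookkeeping over tracker IDs. The sketch below is a hypothetical `PeopleCounter` class, not the assistant's or Ultralytics' output. It keeps both counts and formats them as one Server-Sent Events message. The payload keys and the `stats` event name are assumptions. The commented wiring uses the Ultralytics `model.track(...)` results object and would need the library and a camera to run.

```python
import json
from typing import Iterable, Optional

class PeopleCounter:
    """Current headcount (IDs in this frame) and cumulative unique IDs since start."""

    def __init__(self) -> None:
        self.seen_ids: set[int] = set()
        self.current_count = 0

    def update(self, track_ids: Optional[Iterable[int]]) -> None:
        """Feed the tracker IDs present in one frame (None when nothing is tracked)."""
        ids = set(track_ids) if track_ids is not None else set()
        self.current_count = len(ids)
        self.seen_ids |= ids  # an ID is only ever counted once

    @property
    def cumulative_count(self) -> int:
        return len(self.seen_ids)

    def reset(self) -> None:
        """Backs the dashboard's reset button: clears cumulative history only.
        People still in view are re-counted on the next update()."""
        self.seen_ids.clear()

    def to_sse(self, fps: float) -> str:
        """One SSE message: an 'event:' line, a 'data:' line, then a blank line."""
        payload = {"live_headcount": self.current_count,
                   "total_unique_visitors": self.cumulative_count,
                   "fps": round(fps, 1)}
        return f"event: stats\ndata: {json.dumps(payload)}\n\n"

# Hypothetical wiring inside the backend loop (requires ultralytics + a camera):
#   results = model.track(frame, persist=True, classes=[0], tracker="bytetrack.yaml")
#   boxes = results[0].boxes
#   counter.update(boxes.id.int().tolist() if boxes.id is not None else None)

counter = PeopleCounter()
for frame_ids in ([1, 2], [2, 3], None, [3]):  # IDs the tracker reports per frame
    counter.update(frame_ids)
print(counter.current_count, counter.cumulative_count)  # → 1 3
```

On the React side, a browser `EventSource` pointed at the streaming endpoint would receive these messages through `addEventListener("stats", ...)`. Counting unique IDs means the cumulative figure is only as good as the tracker. If ByteTrack loses someone and assigns a new ID, that person is counted twice.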
Deployment on the web
For this project to run well in a web browser instead of only on my local machine, what needs to be done? I want a Dockerfile for a Hugging Face Space, which I will use to run my backend. The frontend I will deploy on Vercel. This Dockerfile should tell Hugging Face how to build my environment, grant the correct user permissions, and pre-download the YOLO model weights.
How can this be scaled further for something impactful?
Development concerns & best practices
As a machine learning practitioner, you need to stay informed. Be deliberate about how you collect data, process it, and use it to train your model, and keep sensitive data secure throughout.
Data is the foundation of every machine learning model. Poor-quality or biased data leads to poor model performance.
- Obtain appropriate consent where required.
- Collect only the data necessary for your use case (data minimisation).
- Ensure the dataset represents the diversity of the environment the model will operate in.
- Avoid datasets that introduce bias or unfair representation.
"Garbage in, garbage out." The quality of your model depends heavily on the quality of your data.
Before training a model, data must be cleaned and prepared.
- Remove duplicate or corrupted data.
- Handle missing values appropriately.
- Standardise formats and labels.
- Annotate data consistently.
- Split datasets into training, validation and testing sets to evaluate performance fairly.
Proper preprocessing improves both model accuracy and generalisation.
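Two of the preprocessing steps above, removing exact duplicates and splitting into training, validation and test sets, can be sketched with the standard library alone. The function names and the 70/15/15 ratios are illustrative assumptions. Deduplicating *before* splitting matters: otherwise the same image can land in both training and test sets and inflate the test score.

```python
import hashlib
import random

def deduplicate(samples: dict[str, bytes]) -> dict[str, bytes]:
    """Drop samples whose raw bytes are identical to one already kept."""
    seen, kept = set(), {}
    for name in sorted(samples):  # sorted so the result does not depend on dict order
        digest = hashlib.sha256(samples[name]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept[name] = samples[name]
    return kept

def split_dataset(names, train=0.7, val=0.15, seed=42):
    """Shuffle deterministically, then cut into train / validation / test lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)   # fixed seed -> reproducible split
    n_train = int(len(names) * train)    # int() truncates; leftovers go to test
    n_val = int(len(names) * val)
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

# 100 hypothetical files, of which the last 10 duplicate earlier ones byte-for-byte
samples = {f"img_{i:03d}.jpg": bytes([i % 90]) for i in range(100)}
unique = deduplicate(samples)
splits = split_dataset(unique)
print(len(unique), [len(part) for part in splits])
```

This only catches byte-identical files. Near-duplicates (re-encoded or resized frames) need perceptual hashing or embedding similarity. For video, splitting by recording session rather than by frame avoids near-identical neighbouring frames leaking across splits.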
ML applications often process sensitive information — images, video, audio, or personal identifiers.
- Store sensitive data securely.
- Encrypt data both in transit and at rest.
- Limit access using authentication and authorisation.
- Anonymise or pseudonymise personal information where possible.
- Delete data that is no longer required.
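One common way to pseudonymise identifiers is a keyed hash (HMAC-SHA256). The same identifier and key always yield the same token, so records can still be linked. Unlike a plain hash, someone without the key cannot confirm a guess by hashing candidate identifiers. The `PSEUDONYM_KEY` environment variable is a hypothetical name, and in practice the key would come from a secrets manager.

```python
import hashlib
import hmac
import os

def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace an identifier with a stable keyed token (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key source; falls back to a random per-process key for this demo,
# which means tokens would not be stable across restarts.
key = os.environ.get("PSEUDONYM_KEY", "").encode() or os.urandom(32)
token = pseudonymise("visitor-42", key)
print(token[:16], "...")
```

Pseudonymised data can still count as personal data (for example under GDPR), because whoever holds the key can re-link it. The other measures in the list, such as encryption, access control and deletion, still apply.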
AI systems can become targets for cyberattacks if not properly secured.
- Secure APIs with authentication.
- Use HTTPS for all communications.
- Keep software updated and manage dependencies.
- Validate input to prevent malicious requests.
- Log and monitor for suspicious activity.
Security should be considered from the start of development, not added as an afterthought.
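Two items from the list can be shown on this project's own backend: validating the camera index the dashboard sends before it reaches the hardware, and checking an API key. The function names, the allowed set `{0, 1, 2}` (taken from the dropdown in the prompt) and the key handling are illustrative assumptions. `hmac.compare_digest` compares in constant time, so response timing does not reveal how much of a guessed key was correct.

```python
import hmac
from typing import Optional

ALLOWED_CAMERA_INDICES = {0, 1, 2}  # matches the dashboard's camera dropdown

def parse_camera_index(raw: str) -> int:
    """Reject anything that is not one of the expected camera indices."""
    try:
        index = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"camera index must be an integer, got {raw!r}") from None
    if index not in ALLOWED_CAMERA_INDICES:
        raise ValueError(f"camera index {index} is not allowed")
    return index

def is_authorised(provided: Optional[str], expected: str) -> bool:
    """Constant-time API key check; a missing key is always rejected."""
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))

print(parse_camera_index("1"), is_authorised("s3cret", "s3cret"))  # → 1 True
```

In a FastAPI or Flask backend these checks would sit in front of the camera-switch and reset endpoints. The expected key would be loaded from an environment variable or secret store rather than hard-coded.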
A model that performs well during development should also perform reliably in production.
- Monitor model accuracy over time.
- Detect model drift as real-world data changes.
- Optimise inference speed for real-time applications.
- Use lightweight models where appropriate (e.g. YOLO11n for edge devices).
- Containerise applications with Docker for consistent deployment.
- Scale services using cloud platforms when demand increases.
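As a rough illustration of monitoring in production, the sketch below keeps a rolling window of frame timestamps and per-frame mean detection confidence. It reports FPS for a status card and raises a crude drift flag when recent confidence falls well below a baseline. The class name, window size, baseline and `max_drop` threshold are all hypothetical. Falling confidence is only a proxy for drift, so a proper check would compare input or prediction distributions against labelled samples.

```python
import time
from collections import deque
from statistics import mean
from typing import Optional

class InferenceMonitor:
    """Rolling FPS plus a simple confidence-based drift alarm."""

    def __init__(self, window: int = 100, baseline_confidence: Optional[float] = None,
                 max_drop: float = 0.15) -> None:
        self.frame_times = deque(maxlen=window)  # timestamps of recent frames
        self.confidences = deque(maxlen=window)  # mean detection confidence per frame
        self.baseline = baseline_confidence      # e.g. measured on a validation set
        self.max_drop = max_drop

    def record(self, mean_confidence: float, timestamp: Optional[float] = None) -> None:
        """Log one processed frame (the caller decides how to treat empty frames)."""
        self.frame_times.append(time.monotonic() if timestamp is None else timestamp)
        self.confidences.append(mean_confidence)

    def fps(self) -> float:
        """Frames per second over the current window."""
        if len(self.frame_times) < 2:
            return 0.0
        elapsed = self.frame_times[-1] - self.frame_times[0]
        return (len(self.frame_times) - 1) / elapsed if elapsed > 0 else 0.0

    def drift_suspected(self) -> bool:
        """True when recent confidence sits more than max_drop below the baseline."""
        if self.baseline is None or not self.confidences:
            return False
        return self.baseline - mean(self.confidences) > self.max_drop

monitor = InferenceMonitor(window=10, baseline_confidence=0.80)
for i in range(10):
    monitor.record(0.6, timestamp=i * 0.125)  # simulated: 8 frames/s, low confidence
print(monitor.fps(), monitor.drift_suspected())  # → 8.0 True
```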