Introduction
Imagine running a sophisticated object detection model, trained on 80+ categories, entirely in your browser. No server uploads, no API calls, no privacy concerns. Just pure client-side inference in real time.
This isn't science fiction. With TensorFlow.js, we can deploy production-ready machine learning models that:
- Run 100% client-side (privacy by design)
- Work offline (no internet required after page load)
- Cost $0 per inference (no cloud bills)
- Process data instantly (no network latency)
In this tutorial, I'll walk through building a real-time object detection app using TensorFlow.js and the COCO-SSD model. We'll cover architecture decisions, performance optimizations, and real-world trade-offs.
Live Demo: Try object detection | Code: GitHub Repository
Why Client-Side ML?
The Privacy Argument
Traditional server-side approach:
User uploads photo → Server processes → Returns results
                              ↓
         Your image is now on someone's server
         Who has access? How long is it stored?
Client-side approach:
User loads model → Inference in browser → Results instantly
                              ↓
         Your image NEVER leaves your device
Real-world applications:
- Healthcare: Skin lesion detection without uploading medical photos
- Finance: Document analysis (checks, invoices) without server exposure
- Personal: Photo organization without sending your family photos to Google
The Performance Argument
| Metric | Server-Side | Client-Side |
|---|---|---|
| Latency | 200-500ms (network + compute) | 50-150ms (compute only) |
| Cost per 1M inferences | $100-500 | $0 |
| Scales with users? | ❌ (need more servers) | ✅ (user's device) |
| Works offline? | ❌ | ✅ |
The catch: Client-side inference requires a reasonably powerful device and a larger initial download.
Architecture Overview
Tech Stack
{
"dependencies": {
"@tensorflow/tfjs": "^4.20.0", // Core TensorFlow.js
"@tensorflow-models/coco-ssd": "^2.2.3", // Pre-trained object detection
"next": "14.2.x", // React framework
"react-webcam": "^7.2.0" // Camera access
}
}
Model Selection: COCO-SSD
COCO-SSD (Common Objects in Context - Single Shot Detector) is a pre-trained model that detects 80 object categories:
// Sample detections
[
{ class: 'person', score: 0.95, bbox: [100, 150, 200, 400] },
{ class: 'laptop', score: 0.87, bbox: [300, 200, 150, 100] },
{ class: 'cup', score: 0.72, bbox: [450, 250, 50, 80] }
]
Why COCO-SSD?
| Criterion | COCO-SSD | MobileNet | YOLO |
|---|---|---|---|
| Model Size | 5.4 MB | 4.3 MB | 60+ MB |
| Inference Speed | 50-70ms | 30-40ms | 200+ ms |
| Accuracy (mAP) | 22% | 18% | 40%+ |
| Browser Support | ✅ | ✅ | ⚠️ (ONNX required) |
| Pre-trained | ✅ | ⚠️ (image classification only) | ✅ |
Trade-off: COCO-SSD is the best balance of size, speed, and accuracy for browser deployment.
Implementation: Step-by-Step
Step 1: Setup TensorFlow.js
npm install @tensorflow/tfjs @tensorflow-models/coco-ssd react-webcam
// lib/loadModel.ts
import * as cocoSsd from '@tensorflow-models/coco-ssd';
import '@tensorflow/tfjs-backend-webgl'; // GPU acceleration
let model: cocoSsd.ObjectDetection | null = null;
export async function loadModel(): Promise<cocoSsd.ObjectDetection> {
if (model) return model; // Cache model after first load
console.log('Loading COCO-SSD model...');
const startTime = performance.now();
model = await cocoSsd.load({
base: 'lite_mobilenet_v2', // Fastest variant (5.4 MB)
});
const loadTime = performance.now() - startTime;
console.log(`Model loaded in ${loadTime.toFixed(2)}ms`);
return model;
}
Model variants:
// lite_mobilenet_v2 (5.4 MB) - fastest
await cocoSsd.load({ base: 'lite_mobilenet_v2' });
// mobilenet_v2 (13 MB) - balanced
await cocoSsd.load({ base: 'mobilenet_v2' });
// mobilenet_v1 (6.2 MB) - legacy
await cocoSsd.load({ base: 'mobilenet_v1' });
Recommendation: Use lite_mobilenet_v2 unless you need the extra 2-3% accuracy gain.
Step 2: Camera Integration
'use client';
import React, { useRef, useEffect, useState } from 'react';
import Webcam from 'react-webcam';
import { loadModel } from '@/lib/loadModel';
import type { DetectedObject, ObjectDetection } from '@tensorflow-models/coco-ssd';
export default function ObjectDetection() {
const webcamRef = useRef<Webcam>(null);
const canvasRef = useRef<HTMLCanvasElement>(null);
const [model, setModel] = useState<ObjectDetection | null>(null);
const [isDetecting, setIsDetecting] = useState(false);
const [fps, setFps] = useState(0);
// Load model on mount
useEffect(() => {
loadModel().then((loadedModel) => {
setModel(loadedModel);
console.log('Model ready for inference');
});
}, []);
return (
<div className="relative">
{/* Webcam */}
<Webcam
ref={webcamRef}
audio={false}
screenshotFormat="image/jpeg"
videoConstraints={{
width: 640,
height: 480,
facingMode: 'user', // Front camera
}}
className="rounded-lg"
/>
{/* Canvas overlay for bounding boxes */}
<canvas
ref={canvasRef}
width={640}
height={480}
className="absolute top-0 left-0"
/>
{/* Controls */}
<div className="mt-4 space-x-4">
<button
onClick={() => setIsDetecting(!isDetecting)}
disabled={!model}
className="px-4 py-2 bg-blue-600 text-white rounded-lg"
>
{isDetecting ? 'Stop Detection' : 'Start Detection'}
</button>
<span className="text-sm text-slate-600">
{fps > 0 && `${fps} FPS`}
</span>
</div>
</div>
);
}
Step 3: Real-Time Detection Loop
useEffect(() => {
if (!isDetecting || !model) return;
let animationFrameId: number;
let lastFrameTime = performance.now();
let frameCount = 0;
const detect = async () => {
// Get video element from Webcam component
const video = webcamRef.current?.video;
const canvas = canvasRef.current;
if (!video || !canvas || video.readyState !== 4) {
animationFrameId = requestAnimationFrame(detect);
return;
}
// Run inference
const predictions = await model.detect(video);
// Draw results
drawPredictions(predictions, canvas);
// Calculate FPS
frameCount++;
const currentTime = performance.now();
if (currentTime - lastFrameTime >= 1000) {
setFps(frameCount);
frameCount = 0;
lastFrameTime = currentTime;
}
// Next frame
animationFrameId = requestAnimationFrame(detect);
};
detect();
// Cleanup
return () => {
if (animationFrameId) {
cancelAnimationFrame(animationFrameId);
}
};
}, [isDetecting, model]);
Key Optimization: requestAnimationFrame synchronizes the loop with the display's refresh rate (typically 60 Hz) for smooth rendering.
Step 4: Drawing Bounding Boxes
function drawPredictions(
predictions: DetectedObject[],
canvas: HTMLCanvasElement
) {
const ctx = canvas.getContext('2d');
if (!ctx) return;
// Clear previous frame
ctx.clearRect(0, 0, canvas.width, canvas.height);
// Configure drawing style
ctx.font = '16px Arial';
ctx.textBaseline = 'top';
predictions.forEach((prediction) => {
const [x, y, width, height] = prediction.bbox;
const { class: className, score } = prediction;
// Draw bounding box
ctx.strokeStyle = '#00FF00';
ctx.lineWidth = 2;
ctx.strokeRect(x, y, width, height);
// Draw label background
const label = `${className} (${(score * 100).toFixed(1)}%)`;
const labelWidth = ctx.measureText(label).width;
const labelHeight = 20;
ctx.fillStyle = '#00FF00';
ctx.fillRect(x, y - labelHeight, labelWidth + 10, labelHeight);
// Draw label text
ctx.fillStyle = '#000000';
ctx.fillText(label, x + 5, y - labelHeight + 2);
});
}
Visual Result:
┌─────────────────────┐
│ person (95.3%)      │
│  ┌───────────────┐  │
│  │               │  │
│  │               │  │
│  │               │  │
│  └───────────────┘  │
└─────────────────────┘
Performance Optimizations
1. Backend Selection (CPU vs. GPU)
TensorFlow.js supports multiple backends:
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgl'; // GPU (fastest)
import '@tensorflow/tfjs-backend-wasm';  // CPU via WebAssembly (fast fallback)
import '@tensorflow/tfjs-backend-cpu';   // Plain JavaScript (slowest)
// Check the active backend
console.log(tf.getBackend()); // 'webgl', 'wasm', or 'cpu'
Benchmark (inference time per frame):
| Backend | Desktop | Mobile |
|---|---|---|
| WebGL (GPU) | 50ms | 120ms |
| WASM (CPU) | 180ms | 400ms |
| CPU (JavaScript) | 800ms | 2000ms |
Recommendation: WebGL whenever possible (95% of modern browsers support it).
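If you prefer to make the fallback order explicit rather than relying on TensorFlow.js defaults, a minimal sketch could look like the following (the function name pickBackend is illustrative; note the WASM backend may also need setWasmPaths() depending on your bundler):
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgl';
import '@tensorflow/tfjs-backend-wasm';
// Try the fastest backend first, fall back gracefully
async function pickBackend(): Promise<string> {
  for (const backend of ['webgl', 'wasm', 'cpu']) {
    try {
      if (await tf.setBackend(backend)) {
        await tf.ready(); // wait until the backend is fully initialized
        return tf.getBackend();
      }
    } catch {
      // Backend not supported on this device/browser, try the next one
    }
  }
  return tf.getBackend();
}
// Usage: const backend = await pickBackend();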
2. Model Warm-Up
First inference is slow due to GPU initialization. Pre-warm the model:
import * as tf from '@tensorflow/tfjs';
async function warmUpModel(model: cocoSsd.ObjectDetection) {
  // Create a dummy 1x1 RGB image tensor (int32, like tf.browser.fromPixels output)
  const dummyImage = tf.zeros([1, 1, 3], 'int32') as tf.Tensor3D;
  // Run one inference to trigger shader compilation and GPU init (discard the result)
  await model.detect(dummyImage);
  dummyImage.dispose(); // Clean up GPU memory
  console.log('Model warmed up');
}
// After loading model
loadModel().then((model) => {
warmUpModel(model);
setModel(model);
});
Impact: First real inference drops from 300ms → 50ms.
3. Frame Skipping
On slower devices, skip frames to maintain responsiveness:
let frameSkipCounter = 0;
const FRAME_SKIP = 3; // Process every 3rd frame
const detect = async () => {
frameSkipCounter++;
if (frameSkipCounter % FRAME_SKIP !== 0) {
animationFrameId = requestAnimationFrame(detect);
return;
}
// Run detection...
};
Trade-off: Reduces CPU/GPU load but detection feels less smooth.
4. Confidence Threshold Filtering
Filter low-confidence predictions:
const predictions = await model.detect(video, undefined, 0.5); // 50% threshold
// Or filter manually
const highConfidence = predictions.filter((p) => p.score > 0.6);
Impact: Fewer false positives, faster rendering.
5. Memory Management
TensorFlow.js creates GPU textures that must be manually disposed:
import * as tf from '@tensorflow/tfjs';
// Release the model's weights when the component unmounts
useEffect(() => {
  return () => {
    model?.dispose(); // coco-ssd exposes dispose() to free the underlying graph model
  };
}, [model]);
// Check memory usage
console.log(tf.memory());
// { numTensors: 124, numDataBuffers: 98, numBytes: 4567890 }
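The same rule applies to tensors you create yourself, for example when converting the current frame with tf.browser.fromPixels() before calling detect(). A minimal sketch, using the video and model references from the component above:
// If you hand detect() a tensor instead of a video element, you own its lifecycle
const input = tf.browser.fromPixels(video); // int32 Tensor3D backed by a GPU texture
const predictions = await model.detect(input);
input.dispose(); // free the texture as soon as inference is done
// tf.memory().numTensors should stay flat across frames; if it keeps growing, something is leaking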
Trade-Offs: Client-Side vs. Server-Side
When Client-Side ML Makes Sense
- ✅ Privacy-sensitive applications (healthcare, finance)
- ✅ High inference volume (cost prohibitive on server)
- ✅ Real-time responsiveness (latency-critical)
- ✅ Offline capability (no internet required)
When Server-Side ML Makes Sense
- ❌ Complex models (>50 MB, won't fit in browser)
- ❌ Frequent updates (retraining weekly, don't want users to re-download)
- ❌ Low-powered devices (IoT, older phones)
- ❌ Centralized monitoring (need aggregate analytics)
Hybrid Approach
Best of both worlds:
// clientSideDetection() and serverSideDetection() are app-specific wrappers
// around the browser model and your inference API, respectively
async function detectObjects(image: HTMLImageElement) {
  // Try client-side first
  if (tf.getBackend() === 'webgl') {
    return await clientSideDetection(image);
  }
  // Fall back to the server when no GPU backend is available
  console.warn('GPU unavailable, using server...');
  return await serverSideDetection(image);
}
Real-World Considerations
1. Model Size vs. User Experience
COCO-SSD download: 5.4 MB
User experience on different connections:
| Connection | Download Time | Acceptable? |
|---|---|---|
| 4G (10 Mbps) | 4.3 seconds | ✅ |
| 3G (1 Mbps) | 43 seconds | ⚠️ |
| 2G (0.1 Mbps) | 7 minutes | ❌ |
Solution: Show loading progress and allow background download:
const [loadingProgress, setLoadingProgress] = useState(0);
// Note: cocoSsd.load() only accepts `base` and `modelUrl`; it does not expose a
// progress callback. One workaround is to pre-fetch the model files with
// tf.loadGraphModel(), which does support onProgress, so the subsequent
// cocoSsd.load() is served from the browser cache. The CDN URL is illustrative.
const modelUrl = 'https://cdn.example.com/model.json';
await tf.loadGraphModel(modelUrl, {
  onProgress: (fraction) => setLoadingProgress(Math.round(fraction * 100)),
});
const model = await cocoSsd.load({ modelUrl }); // files already cached at this point
2. Browser Compatibility
TensorFlow.js support:
| Browser | WebGL Backend | WASM Backend | CPU Backend |
|---|---|---|---|
| Chrome 90+ | ✅ | ✅ | ✅ |
| Firefox 88+ | ✅ | ✅ | ✅ |
| Safari 14+ | ✅ | ✅ | ✅ |
| Edge 90+ | ✅ | ✅ | ✅ |
| Mobile (iOS 14+) | ✅ | ✅ | ✅ |
| Mobile (Android 10+) | ✅ | ✅ | ✅ |
Coverage: 95% of users (as of 2025).
3. Battery Consumption
GPU-accelerated inference drains battery. Check the battery level and throttle when it is low:
// Throttle inference on battery-powered devices.
// navigator.getBattery() is not available in every browser, so feature-detect it.
if ('getBattery' in navigator) {
  const battery = await (navigator as any).getBattery();
  if (battery.level < 0.2) {
    // Reduce FPS or switch to the WASM backend
    await tf.setBackend('wasm');
  }
}
Advanced Use Cases
1. Custom Model Training
Train your own model and convert to TensorFlow.js:
# Train in Python
import tensorflow as tf
model = tf.keras.Sequential([...])
model.fit(X_train, y_train)
# Convert to TensorFlow.js
import tensorflowjs as tfjs
tfjs.converters.save_keras_model(model, 'path/to/model')
// Load custom model
const model = await tf.loadLayersModel('https://yourcdn.com/model.json');
2. Post-Processing: Object Tracking
Track objects across frames (e.g., count people entering a store):
interface TrackedObject {
id: string;
class: string;
lastSeen: number;
bbox: number[];
}
let trackedObjects: TrackedObject[] = [];
function updateTracking(predictions: DetectedObject[]) {
predictions.forEach((pred) => {
// Find closest existing object (IoU matching)
const match = trackedObjects.find((obj) =>
isSameObject(obj.bbox, pred.bbox)
);
if (match) {
match.lastSeen = Date.now();
match.bbox = pred.bbox;
} else {
// New object detected
trackedObjects.push({
id: crypto.randomUUID(),
class: pred.class,
lastSeen: Date.now(),
bbox: pred.bbox,
});
}
});
// Remove objects not seen in 2 seconds
trackedObjects = trackedObjects.filter(
(obj) => Date.now() - obj.lastSeen < 2000
);
}
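The isSameObject() helper above is left undefined in the snippet; one possible implementation matches boxes by Intersection-over-Union (IoU), with the 0.5 threshold being a common but arbitrary choice:
function isSameObject(a: number[], b: number[], threshold = 0.5): boolean {
  // bbox format from COCO-SSD is [x, y, width, height]
  const [ax, ay, aw, ah] = a;
  const [bx, by, bw, bh] = b;
  // Intersection rectangle
  const x1 = Math.max(ax, bx);
  const y1 = Math.max(ay, by);
  const x2 = Math.min(ax + aw, bx + bw);
  const y2 = Math.min(ay + ah, by + bh);
  const intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = aw * ah + bw * bh - intersection;
  return union > 0 && intersection / union >= threshold;
}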
3. Multi-Model Pipeline
Combine multiple models for complex tasks:
// 1. Object detection (COCO-SSD)
const objects = await cocoModel.detect(image);
// 2. Filter for people
const people = objects.filter((obj) => obj.class === 'person');
// 3. Face detection on each person
const faces = await Promise.all(
people.map((person) => {
const croppedImage = cropImage(image, person.bbox);
return faceDetectionModel.detect(croppedImage);
})
);
// 4. Emotion recognition on each face
const emotions = await Promise.all(
faces.map((face) => emotionModel.predict(face))
);
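Here, faceDetectionModel and emotionModel stand in for whatever models you plug into the pipeline, and cropImage() is a helper you'd write yourself. A minimal sketch of cropImage() using an offscreen canvas (most TensorFlow.js models accept a canvas as input):
function cropImage(
  source: HTMLImageElement | HTMLVideoElement | HTMLCanvasElement,
  bbox: number[]
): HTMLCanvasElement {
  const [x, y, width, height] = bbox; // COCO-SSD bbox: [x, y, width, height]
  const canvas = document.createElement('canvas');
  canvas.width = Math.max(1, Math.round(width));
  canvas.height = Math.max(1, Math.round(height));
  const ctx = canvas.getContext('2d');
  if (ctx) {
    // Copy just the bounding-box region into the new canvas
    ctx.drawImage(source, x, y, width, height, 0, 0, canvas.width, canvas.height);
  }
  return canvas;
}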
Lessons Learned
What Worked
- Progressive enhancement: Start with server-side inference and layer client-side on top where the device supports it
- Lazy loading: Don't load the model until the user clicks "Start Detection" (see the sketch after this list)
- User feedback: Show an FPS counter and model loading progress
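A minimal sketch of that lazy-loading pattern, using a dynamic import() so the TensorFlow.js bundle is only fetched when detection actually starts (the handler name is illustrative):
const handleStart = async () => {
  // The bundler code-splits this import; nothing TensorFlow-related is
  // downloaded until the user clicks "Start Detection"
  const { loadModel } = await import('@/lib/loadModel');
  const loadedModel = await loadModel(); // model weights download here
  setModel(loadedModel);
  setIsDetecting(true);
};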
What Was Challenging
- Mobile Safari quirks: WebGL context limits (max 16 simultaneous)
- Memory leaks: Forgetting to dispose() tensors crashed the tab after 5 minutes
- Webcam permissions: Users confused by browser permission prompts
Future Improvements
- WebGPU backend: 2-3x faster than WebGL (Chrome 113+); see the sketch after this list
- Model quantization: Reduce COCO-SSD to 2 MB with INT8 quantization
- WebAssembly SIMD: Faster CPU inference on devices without GPU
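For the WebGPU item, a sketch of opting in, assuming the @tensorflow/tfjs-backend-webgpu package and a browser that exposes navigator.gpu (Chrome 113+):
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu';
// Prefer WebGPU when the browser exposes it, otherwise stay on WebGL
if ('gpu' in navigator) {
  await tf.setBackend('webgpu');
} else {
  await tf.setBackend('webgl');
}
await tf.ready();
console.log('Active backend:', tf.getBackend());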
Try It Yourself
Starter Template: GitHub - TensorFlow.js Object Detection
git clone https://github.com/nicolasavril/tfjs-object-detection.git
cd tfjs-object-detection
npm install
npm run dev
Explore other pre-trained models in the @tensorflow-models family, such as MobileNet (image classification), MoveNet/PoseNet (pose estimation), and BlazeFace (face detection).
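For example, image classification with MobileNet follows the same load-then-predict pattern as COCO-SSD:
// npm install @tensorflow-models/mobilenet
import * as mobilenet from '@tensorflow-models/mobilenet';
const classifier = await mobilenet.load();
const results = await classifier.classify(document.querySelector('img')!);
// [{ className: '...', probability: 0.91 }, ...]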
Conclusion
Client-side machine learning with TensorFlow.js represents a paradigm shift: privacy-first AI whose capacity grows with its user base, because every device brings its own compute. By moving inference to the user's device, we eliminate:
- Privacy concerns (data never leaves device)
- Infrastructure costs (users provide compute)
- Network latency (instant predictions)
The trade-off? The initial model download and device compatibility. But for most use cases, the benefits outweigh the costs.
Three key takeaways:
- Privacy is a feature: Users increasingly value data protection
- Performance matters: WebGL backend is 10x faster than CPU
- User experience first: Show loading states, handle errors gracefully
The future is edge AI: As browsers get more powerful and WebGPU rolls out, client-side ML will become the default, not the exception.
About the Author
Nicolas Avril is a Data Scientist & AI Engineer specializing in computer vision, NLP, and privacy-preserving machine learning. He builds production-ready AI applications with a focus on user privacy and performance optimization.
Connect: LinkedIn | Portfolio | GitHub
Interested in client-side ML? Follow me for more tutorials on TensorFlow.js, privacy-first AI, and modern web development.
Questions about the implementation? Drop a comment below or reach out on GitHub!