It’s hard to imagine the modern world without media content. Avatars, photos, videos — all of these have become natural attributes of our daily lives. Developers usually provide different ways to upload an avatar or fetch it from social networks or other services.
But what if a static image is not enough? What if you want to give your users the ability to create something like a video business card — a short introduction where they present themselves or their product?
At first glance it seems simple: upload the file and just play it. Well, not exactly. A user may upload a file in the wrong compression format or with the wrong orientation. They may upload a video that is far too large (and video files, by their nature, are much heavier than the uploads a typical server is tuned to handle). And ideally, you'd also want to let the user preview the result directly in the browser.
For operations like this, there’s a universal tool — FFmpeg.
What does the duck say?
FFmpeg is a powerful cross-platform library and set of tools for working with multimedia. It can record, convert, and process audio and video, and even stream content in real time. Thanks to its support for a huge number of formats and codecs, FFmpeg has become the de facto standard for media processing.
Great — so we’ll just use it on the backend. For example, like this:
// inputFile, outputFile, width and height come from the upload handler
val command = listOf(
    "ffmpeg", "-y",                    // overwrite output files
    "-i", inputFile.absolutePath,
    "-vf", "scale=${width}:${height}", // resize to the target dimensions
    "-c:v", "libvpx",                  // use libvpx instead of libvpx-vp9, which might not be available
    "-crf", "30",
    "-b:v", "500k",                    // set a specific bitrate instead of 0
    outputFile.absolutePath
)

val process = ProcessBuilder(command)
    .redirectErrorStream(true)         // merge stderr into stdout for easier logging
    .start()

val log = process.inputStream.bufferedReader().readText()
val exitCode = process.waitFor()       // block until FFmpeg finishes
The only problem is that the file must first be uploaded to the server and then wait in a processing queue. Not very reactive. Isn't there a better way?
That’s when chance stepped in. During one of my interviews, we touched on the topic of WebAssembly. Had I used it in practice? (Well, I did have some cases working with client-side workers.) So, on a rainy day, armed with AI assistance, I started digging in. (Spoiler: I spent about 4 hours figuring out the approach, and it looks like I can now apply it in my own projects.)
What does the duck say?
WebAssembly (Wasm) is a technology that lets you run code written in languages like C, C++, or Rust directly in the browser, with near-native performance. It makes it possible to use complex, heavy libraries like FFmpeg without installing anything on the user's machine and without needing a server for processing.
This means I can actually load FFmpeg on the client side and process videos right there. A bit of searching led me to this implementation example:
https://github.com/ffmpegwasm/ffmpeg.wasm/tree/main/apps/nextjs-app
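For reference, the core of that example boils down to loading the wasm build and creating an FFmpeg instance. A minimal sketch based on the @ffmpeg/ffmpeg 0.12 API (the CDN URL and version here are examples):

import { FFmpeg } from "@ffmpeg/ffmpeg";
import { toBlobURL } from "@ffmpeg/util";

// Where the single-threaded core build is served from (example version).
const BASE_URL = "https://unpkg.com/@ffmpeg/core@0.12.6/dist/umd";

const ffmpeg = new FFmpeg();

// toBlobURL fetches each asset and re-serves it as a blob: URL,
// which sidesteps CORS restrictions on the worker script.
await ffmpeg.load({
  coreURL: await toBlobURL(`${BASE_URL}/ffmpeg-core.js`, "text/javascript"),
  wasmURL: await toBlobURL(`${BASE_URL}/ffmpeg-core.wasm`, "application/wasm"),
});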
It wasn't quite everything I wanted to see, though. Let's optimize the example a bit.
First, let’s add client-side caching, since the library is quite large and should ideally be preloaded.
// Cache bucket name and core asset URLs. The URLs point at the published
// @ffmpeg/core build; adjust them if you self-host the files.
const FFMPEG_CACHE_NAME = "ffmpeg-core-v1";
const CORE_JS_URL = "https://unpkg.com/@ffmpeg/core@0.12.6/dist/umd/ffmpeg-core.js";
const CORE_WASM_URL = "https://unpkg.com/@ffmpeg/core@0.12.6/dist/umd/ffmpeg-core.wasm";

// Helper to safely access window.caches only in the browser.
function getCaches(): CacheStorage | null {
  if (typeof window === "undefined") return null;
  if (!("caches" in window)) return null;
  return window.caches;
}

// Check if both JS and WASM are present in our cache bucket.
export async function isCoreCached(): Promise<boolean> {
  const caches = getCaches();
  if (!caches) return false;
  const cache = await caches.open(FFMPEG_CACHE_NAME);
  const [jsRes, wasmRes] = await Promise.all([
    cache.match(CORE_JS_URL),
    cache.match(CORE_WASM_URL),
  ]);
  return !!(jsRes && wasmRes);
}
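The check alone doesn't populate the cache, so alongside it you'd want a preload helper that downloads both files once. A hypothetical sketch (precacheCore is my naming, not the repo's):

// Hypothetical preload helper: download both core files once so later
// visits can start FFmpeg straight from the cache.
export async function precacheCore(): Promise<void> {
  const caches = getCaches();
  if (!caches) return;              // SSR or no Cache API support
  if (await isCoreCached()) return; // already downloaded on a previous visit
  const cache = await caches.open(FFMPEG_CACHE_NAME);
  await cache.addAll([CORE_JS_URL, CORE_WASM_URL]);
}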
Second, let's wrap the processing in a dedicated Web Worker and set up communication with the main thread. The worker receives processing parameters and sends back status updates showing how much has been completed.
// Message handler for commands from the main thread.
// ensureLoaded, doPreview, doSnapshot, cancelCurrent, post and the
// busy flag are defined elsewhere in the worker module.
self.onmessage = async (ev: MessageEvent<WorkerCommand>) => {
  const msg = ev.data;
  try {
    if (msg.type === "load") {
      await ensureLoaded(msg.payload?.coreURL, msg.payload?.wasmURL);
      return;
    }
    if (msg.type === "preview") {
      await ensureLoaded();
      if (busy) throw new Error("FFmpeg worker is busy");
      await doPreview(msg.payload);
      return;
    }
    if (msg.type === "snapshot") {
      await ensureLoaded();
      if (busy) throw new Error("FFmpeg worker is busy");
      await doSnapshot(msg.payload);
      return;
    }
    if (msg.type === "cancel") {
      cancelCurrent();
      return;
    }
    if (msg.type === "terminate") {
      cancelCurrent();
      post({ type: "terminated" });
      return;
    }
  } catch (e: any) {
    post({ type: "error", payload: { message: e?.message || String(e) } });
  }
};
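On the main thread, talking to that worker is plain postMessage traffic. A minimal sketch; the worker file path and the progress/reply message shapes here are assumptions mirroring the handler above:

// Spawn the worker (path is an example; adjust to your project layout).
const worker = new Worker(new URL("./ffmpeg.worker.ts", import.meta.url), {
  type: "module",
});

// Ask the worker to fetch and initialize the FFmpeg core up front.
worker.postMessage({ type: "load" });

// Listen for status updates and results coming back.
worker.onmessage = (ev: MessageEvent) => {
  const msg = ev.data;
  if (msg.type === "progress") {
    console.log(`Processing: ${Math.round(msg.payload.progress * 100)}%`);
  } else if (msg.type === "error") {
    console.error("FFmpeg worker error:", msg.payload.message);
  }
};

// Later, once the user has recorded a clip (e.g. a Blob from MediaRecorder):
function requestPreview(recordedBlob: Blob) {
  worker.postMessage({ type: "preview", payload: { file: recordedBlob } });
}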
And voilà: there's your implementation of client-side video avatar processing. A user records themselves on camera, and the file is then processed right in the browser, for example by trimming and compressing the first 10 seconds and generating a preview image for the video.
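For reference, here's roughly what doPreview and doSnapshot might run inside the worker. This is a sketch against the @ffmpeg/ffmpeg 0.12 API; the scale, codec, and timing values are illustrative, not the repo's exact settings:

import { FFmpeg } from "@ffmpeg/ffmpeg";
import { fetchFile } from "@ffmpeg/util";

// `ffmpeg` is the instance loaded in ensureLoaded(); `post` is the worker's
// reply helper from the snippet above.
declare const ffmpeg: FFmpeg;
declare const post: (msg: unknown) => void;

async function doPreview(payload: { file: Blob }) {
  await ffmpeg.writeFile("input.webm", await fetchFile(payload.file));
  // Keep only the first 10 seconds, downscale, and compress to VP8/WebM.
  await ffmpeg.exec([
    "-i", "input.webm",
    "-t", "10",
    "-vf", "scale=480:-2",
    "-c:v", "libvpx", "-crf", "30", "-b:v", "500k",
    "output.webm",
  ]);
  post({ type: "preview-done", payload: { data: await ffmpeg.readFile("output.webm") } });
}

async function doSnapshot(payload: { file: Blob }) {
  await ffmpeg.writeFile("input.webm", await fetchFile(payload.file));
  // Grab a single frame at the 1-second mark as the preview image.
  await ffmpeg.exec(["-ss", "1", "-i", "input.webm", "-frames:v", "1", "thumb.jpg"]);
  post({ type: "snapshot-done", payload: { data: await ffmpeg.readFile("thumb.jpg") } });
}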
Of course, this approach does put load on the client's machine, and there is still a lot to check: how the process behaves in different browsers and operating systems, and how heavy the load really is. But the main goal I set for myself was simply to see whether the use case works at all. And the answer I got: yes, it does.
GitHub: https://github.com/ninydev-com/ffmpeg-minio-solutions/tree/main/nextjs