
    FastVLM now runs in your browser, and it’s shockingly fast

    FastVLM, Apple’s blazing-fast captioning model, now runs in your browser. Try it on Hugging Face and see real-time results with zero lag.

    Apple’s FastVLM model just became way more accessible, and you can try it right now without downloading a thing.

    A few months back, Apple introduced FastVLM, its ultra-light Vision Language Model built for Apple Silicon. Built with MLX, Apple's open-source machine learning framework, the model promised jaw-dropping speed for tasks like image captioning and object recognition.

    The model is reportedly up to 85 times faster than comparable models at producing its first response, with a vision encoder more than three times smaller. That makes it well suited to low-latency video tasks. And now, Apple has opened the door for public testing.

    Thanks to Hugging Face, you can now test FastVLM-0.5B (the lightweight version) straight from your browser. No terminal, no install: just open the page and start feeding it visuals.

    On an M2 Pro MacBook Pro with 16GB of RAM, it took just a couple of minutes to load. Once running, the model could immediately:

    • Describe people, rooms, and objects
    • Identify facial expressions and emotions
    • Interpret hand gestures or items in view
    • Recognize text or writing
    • Respond to real-time changes in the scene

    You can tweak the input prompt or pick from predefined options like “What is the color of my shirt?” or “What action is happening right now?”

    The system handled scene changes with ease, even when fed chaotic video through a virtual camera. The captions updated quickly and accurately, including when objects and movement overlapped.

    That’s impressive, but what makes it even better is this: the model runs locally, in your browser. No cloud processing. No data upload. And yes, it even works offline.

    FastVLM’s lean footprint and near-instant speed make it a natural fit for assistive tech and wearables. Devices that need to process vision data on the fly with zero network dependency could benefit from models just like this.

    Plus, with privacy baked in by design, the local-processing approach checks boxes for healthcare, accessibility, and personal safety use cases.

    FastVLM-0.5B is just the start. Apple is also working on larger variants with 1.5 billion and 7 billion parameters. These promise better accuracy and the ability to handle more complex tasks, though they may be too heavy to run as smoothly in a browser.

    Still, what’s here already is lightning-fast, eerily accurate, and entirely local. It might be a demo, but it feels like a preview of what’s next.