Voice Model Deep Dives · 5 min read

Smallest.ai Pulse Review: Is it Faster Than Deepgram?

For the last two years, if you asked "What is the fastest Speech-to-Text model?", the answer was always Deepgram Nova-2.

But a new contender has entered the ring. Smallest.ai's Pulse is built specifically to dethrone the king. It doesn't just promise raw transcription speed; it promises conversational agility.

Here is our deep dive into Pulse and how it compares to the market leader.

1. The Speed Metric: TTFB

Time-To-First-Byte (TTFB) is the critical metric for voice agents. It measures the time from the moment the user stops speaking until the first word of the transcript appears.

  • OpenAI Whisper (API): ~1500ms+ (Too slow for live convos).
  • Deepgram Nova-2: ~200-300ms (The gold standard).
  • Smallest.ai Pulse: <100ms (claimed).

In our testing, Pulse feels instantaneous. The transcript seems to appear as you are speaking, almost predicting the flow. This 100ms difference might seem small, but in a voice conversation, it's the difference between "snappy" and "sluggish."
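If you want to benchmark this yourself, the math is simple: record a timestamp at end-of-speech, another when the first transcript byte arrives, and take the difference. Here is a minimal sketch; the "snappy"/"sluggish" thresholds are illustrative cutoffs matching the tiers above, not anyone's official spec:

```python
import time


def measure_ttfb(speech_end_time: float, first_transcript_time: float) -> float:
    """Return time-to-first-byte in milliseconds, given two epoch timestamps."""
    return (first_transcript_time - speech_end_time) * 1000.0


def classify_latency(ttfb_ms: float) -> str:
    # Illustrative thresholds matching the tiers discussed above.
    if ttfb_ms < 100:
        return "instantaneous"   # Pulse's claimed territory
    if ttfb_ms <= 300:
        return "snappy"          # Nova-2 territory
    return "sluggish"            # batch-oriented APIs like Whisper


# Usage: stamp time.time() at VAD end-of-speech and on first transcript event.
ttfb = measure_ttfb(speech_end_time=10.00, first_transcript_time=10.08)
print(classify_latency(ttfb))  # 80ms falls in the "instantaneous" tier
```

In a real harness you would take the median over many utterances, since network jitter can dominate a single measurement.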

2. The Killer Feature: Interruption Handling

This is where Pulse shines. Most STT engines just transcribe. They don't know context. If a user says: "Wait, stop, actually I want the—"

  • Standard STT: Transcribes "Wait stop actually I want the" and sends it to the LLM. The LLM then generates a confused response.
  • Pulse: Has built-in Interruption Detection. It signals the system that the user is interrupting the bot. This allows your agent to stop speaking immediately, mimicking human reflex.
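In agent code, the win is that the interruption arrives as its own event, so you can cancel the bot's speech without waiting for the LLM. Below is a sketch of that loop using asyncio; the event shapes (`"interruption"`, `"transcript"`, `is_final`) are hypothetical stand-ins, not Pulse's actual wire format:

```python
import asyncio


async def handle_stt_events(events, tts_task: asyncio.Task) -> str:
    """Consume STT events; cancel bot speech the moment an interruption fires."""
    final_parts = []
    async for event in events:
        if event["type"] == "interruption" and not tts_task.done():
            tts_task.cancel()  # stop the bot mid-sentence, human-reflex style
        elif event["type"] == "transcript" and event.get("is_final"):
            final_parts.append(event["text"])
    return " ".join(final_parts)


async def demo():
    # Fake event stream mimicking the "Wait, stop" scenario above.
    async def fake_events():
        yield {"type": "transcript", "text": "Wait, stop", "is_final": True}
        yield {"type": "interruption"}
        yield {"type": "transcript", "text": "I want the other plan", "is_final": True}

    tts_task = asyncio.create_task(asyncio.sleep(10))  # stand-in for bot playback
    text = await handle_stt_events(fake_events(), tts_task)
    try:
        await tts_task
    except asyncio.CancelledError:
        pass  # expected: the interruption cancelled playback
    return text, tts_task.cancelled()
```

The key design point: interruption handling lives in the transport loop, upstream of the LLM, so the bot goes quiet in milliseconds rather than after a full generate-and-speak round trip.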

3. Code-Switching & Languages

Pulse supports 36+ languages. But its real strength is Code-Switching. Like Sarvam AI, Pulse is designed to handle mixed-language inputs seamlessly.

  • Input: "Hola, can you help me con mi cuenta?"
  • Pulse: Accurately transcribes the Spanglish without hallucinating.

Deepgram is getting better at this with Nova-3, but Pulse's architecture seems natively designed for this "messy" real-world speech.

4. Batch vs. Streaming

Pulse supports both.

  • Streaming: For live agents (WebSockets).
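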
  • Batch: For analyzing call recordings.

While Deepgram is still superior for massive batch processing of hour-long meetings (thanks to their mature infrastructure), Pulse is optimizing heavily for the streaming use case.
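The practical difference on the client side: batch mode uploads the whole file, while streaming sends small fixed-duration audio frames over the WebSocket. Here's a sketch of the chunking step; the 20ms frame size is a common choice for real-time STT generally, not a documented Pulse requirement:

```python
def chunk_pcm(audio: bytes, sample_rate: int = 16000, frame_ms: int = 20,
              bytes_per_sample: int = 2) -> list[bytes]:
    """Split raw 16-bit mono PCM into fixed-duration frames for streaming.

    The final frame may be shorter than frame_ms if the audio length
    isn't an exact multiple of the frame size.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    return [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]


# One second of 16 kHz 16-bit silence -> fifty 20ms frames of 640 bytes each.
frames = chunk_pcm(b"\x00" * 32000)
```

Each frame would then be sent over the WebSocket as it's captured from the mic; in batch mode you'd skip all of this and POST the recording in one request.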

Verdict: The New Speed King?

Is Pulse faster than Deepgram? Yes, marginally. Is it better? It depends.

  • Use Deepgram if: You need rock-solid stability, established enterprise support, and 99.9% accuracy on long-form English audio.
  • Use Pulse if: You are building a Conversational Agent. The combination of <100ms latency and native interruption handling makes it feel more "alive" than any other STT we've tested.

The gap is closing. Deepgram is no longer the default; it's now a choice.