Smallest.ai Pulse Review: Is it Faster Than Deepgram?
For the last two years, if you asked "What is the fastest Speech-to-Text model?", the answer was always Deepgram Nova-2.
But a new contender has entered the ring. Smallest.ai's Pulse is built specifically to dethrone the king. It doesn't just promise raw transcription speed; it promises conversational agility.
Here is our deep dive into Pulse and how it compares to the market leader.
1. The Speed Metric: TTFB
Time-To-First-Byte (TTFB) is the critical metric for voice agents. It measures the time from the moment you stop speaking until the first word of the transcript appears.
- OpenAI Whisper (API): ~1500ms+ (Too slow for live conversations).
- Deepgram Nova-2: ~200-300ms (The gold standard).
- Smallest.ai Pulse: <100ms (claimed).
In our testing, Pulse feels instantaneous. The transcript seems to appear as you are speaking, almost predicting the flow. A gap of 100-200ms might seem small, but in a voice conversation, it's the difference between "snappy" and "sluggish."
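If you want to sanity-check these numbers yourself, TTFB is easy to compute from your own event timestamps: record when the user's speech ends, then record when the first non-empty partial transcript arrives. Here is a minimal sketch (the event list is simulated; a real harness would feed timestamps from your STT client's callbacks):

```python
def measure_ttfb(speech_end_ts, transcript_events):
    """Return time-to-first-byte in milliseconds: the gap between the end
    of the user's speech and the first non-empty transcript event."""
    for ts, text in transcript_events:
        if text:  # skip empty keep-alive / silence events
            return (ts - speech_end_ts) * 1000
    return None  # no transcript ever arrived

# Simulated timeline: speech ends at t=0s, first partial arrives 90 ms later.
speech_end = 0.0
events = [(0.02, ""), (0.09, "hello"), (0.15, "hello there")]
ttfb_ms = measure_ttfb(speech_end, events)
print(f"TTFB: {ttfb_ms:.0f} ms")  # → TTFB: 90 ms
```

Run this against several utterances and average the result; single measurements are noisy because network jitter alone can swing tens of milliseconds.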
2. The Killer Feature: Interruption Handling
This is where Pulse shines. Most STT engines just transcribe. They don't know context. If a user says: "Wait, stop, actually I want the—"
- Standard STT: Transcribes "Wait stop actually I want the" and sends it to the LLM. The LLM then generates a confused response.
- Pulse: Has built-in Interruption Detection. It signals the system that the user is interrupting the bot. This allows your agent to stop speaking immediately, mimicking human reflex.
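On the agent side, an interruption signal is only useful if your loop acts on it immediately. The sketch below shows the shape of that logic with a toy agent state; the event dictionary format here is a placeholder, not Pulse's actual payload schema:

```python
class VoiceAgent:
    """Toy agent state: tracks whether the bot's TTS is currently playing."""

    def __init__(self):
        self.tts_playing = False
        self.log = []

    def start_speaking(self):
        self.tts_playing = True
        self.log.append("tts_start")

    def on_stt_event(self, event):
        # Hypothetical event shape -- real STT payloads will differ.
        if event.get("type") == "interruption" and self.tts_playing:
            self.tts_playing = False  # barge-in: cut playback immediately
            self.log.append("tts_stopped")
        elif event.get("type") == "transcript":
            self.log.append(f"heard: {event['text']}")

agent = VoiceAgent()
agent.start_speaking()
agent.on_stt_event({"type": "interruption"})
agent.on_stt_event({"type": "transcript", "text": "Wait, stop, actually I want the—"})
print(agent.log)
```

The key design point: the interruption event stops playback *before* the transcript arrives, so the bot falls silent within the STT's detection latency rather than waiting for a full final transcript plus an LLM round-trip.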
3. Code-Switching & Languages
Pulse supports 36+ languages. But its real strength is Code-Switching. Like Sarvam AI, Pulse is designed to handle mixed-language inputs seamlessly.
- Input: "Hola, can you help me con mi cuenta?"
- Pulse: Accurately transcribes the Spanglish without hallucinating.
Deepgram is getting better at this with Nova-3, but Pulse's architecture seems natively designed for this "messy" real-world speech.
4. Batch vs. Streaming
Pulse supports both.
- Streaming: For live agents (WebSockets).
- Batch: For analyzing call recordings.
While Deepgram is still superior for massive batch processing of hour-long meetings (thanks to its mature infrastructure), Pulse is optimizing heavily for the streaming use case.
Verdict: The New Speed King?
Is Pulse faster than Deepgram? On TTFB, yes. Is it better? It depends.
- Use Deepgram if: You need rock-solid stability, established enterprise support, and 99.9% accuracy on long-form English audio.
- Use Pulse if: You are building a Conversational Agent. The combination of <100ms latency and native interruption handling makes it feel more "alive" than any other STT we've tested.
The gap is closing. Deepgram is no longer the default; it's now a choice.
