Choosing a Satellite
HiveMind satellites form a spectrum based on where audio processing happens. Choose the right satellite for your hardware and use case.
Comparison
| Satellite | Mic | VAD | Wakeword | STT | TTS | What crosses the wire |
|---|---|---|---|---|---|---|
| HiveMind-cli | — | — | — | — | — | Text in / text out |
| hivemind-mic-satellite | local | local | hub | hub | hub | Raw audio stream |
| HiveMind-voice-relay | local | local | local | hub | hub | Audio after wakeword |
| HiveMind-voice-sat | local | local | local | local | local | Text utterances only |
| WebSpeech Browser | browser | browser | — | hub | hub | Audio from browser |
Hub requirements by satellite
| Satellite | Hub must provide |
|---|---|
| HiveMind-cli | Any hub (OVOS skills or Persona) |
| hivemind-mic-satellite | hivemind-audio-binary-protocol for STT/TTS/wakeword |
| HiveMind-voice-relay | hivemind-audio-binary-protocol for STT/TTS |
| HiveMind-voice-sat | Any hub (sends text utterances) |
| WebSpeech Browser | hivemind-audio-binary-protocol for STT/TTS |
See Audio Binary Protocol for hub-side setup.
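The table above can be captured as a small lookup, which is handy when planning a hub for a mixed fleet of satellites. The following is only an illustrative sketch: `HUB_SIDE_SERVICES` and `required_hub_services` are hypothetical names, not part of any HiveMind package, and the data is transcribed by hand from the table.

```python
# Hypothetical lookup transcribed from the "Hub requirements" table.
# An empty set means any hub works (no audio binary protocol needed).
HUB_SIDE_SERVICES = {
    "HiveMind-cli": frozenset(),
    "hivemind-mic-satellite": frozenset({"stt", "tts", "wakeword"}),
    "HiveMind-voice-relay": frozenset({"stt", "tts"}),
    "HiveMind-voice-sat": frozenset(),
    "WebSpeech Browser": frozenset({"stt", "tts"}),
}


def required_hub_services(satellites):
    """Return the union of hub-side audio services the given satellites rely on."""
    services = set()
    for name in satellites:
        services |= HUB_SIDE_SERVICES[name]  # KeyError on an unknown satellite name
    return services


def needs_audio_binary_protocol(satellites):
    """True if at least one satellite relies on hub-side audio processing."""
    return bool(required_hub_services(satellites))
```

For example, a hub serving only HiveMind-cli and HiveMind-voice-sat clients yields an empty set, while adding a single hivemind-mic-satellite means the hub must provide STT, TTS, and wakeword detection.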
Decision guide
- Do you need voice input?
  - No → use HiveMind-cli
  - Yes → continue
- Can your device run local STT and TTS models? (needs a capable CPU/GPU)
  - Yes → use HiveMind-voice-sat (most private, works offline after setup)
  - No → continue
- Do you need a wakeword on the device? (saves bandwidth; needed for service-at-scale)
  - Yes → use HiveMind-voice-relay (local wakeword; STT/TTS on hub)
  - No → use hivemind-mic-satellite (cheapest hardware; everything on hub)
- Is this a web browser or web app?
  - Yes → use WebSpeech Browser
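The decision guide can also be read as a short function. This is a minimal sketch rather than an official tool: the function name and parameters are hypothetical, and checking the browser case first is an assumption, since the guide lists it as a separate question.

```python
def choose_satellite(
    needs_voice: bool,
    can_run_local_stt_tts: bool = False,
    needs_local_wakeword: bool = False,
    is_browser: bool = False,
) -> str:
    """Walk the decision guide and return the suggested satellite name."""
    # Assumption: a browser or web app is handled before the other questions.
    if is_browser:
        return "WebSpeech Browser"
    # No voice input: plain text in, text out.
    if not needs_voice:
        return "HiveMind-cli"
    # Device can run its own STT and TTS: only text utterances reach the hub.
    if can_run_local_stt_tts:
        return "HiveMind-voice-sat"
    # Wakeword on the device, STT/TTS on the hub.
    if needs_local_wakeword:
        return "HiveMind-voice-relay"
    # Everything on the hub; the device only captures audio.
    return "hivemind-mic-satellite"
```

For instance, `choose_satellite(needs_voice=True, needs_local_wakeword=True)` follows the "No" branch for local STT/TTS and the "Yes" branch for the wakeword question, landing on HiveMind-voice-relay.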
About hub-owned STT/TTS
When a satellite relies on hub-side audio processing (hivemind-mic-satellite or HiveMind-voice-relay), the hub operator chooses the STT engine, TTS engine, and voice; the satellite cannot override them. This is the "HiveMind as a service" model: speech services are authenticated and centrally governed, like any other message on the protocol.
HiveMind-voice-sat, by contrast, runs its own STT and TTS plugins and sends only the transcribed text to the hub, so it keeps full control over its local audio stack.