Voice Control
Compiled RobotRoss knowledge page generated from RobotRoss source code, architecture notes, and operational documentation.
1. Overview
Voice control is a top-level component of the ATF architecture rather than an add-on. It gives operators spoken access to the same local evidence base that the text query path uses.
2. Speech-to-Text
- Whisper is the intended speech-to-text engine. It converts spoken operator prompts into text queries that are handled locally.
- Spoken prompts should be interpreted against the compiled wiki and the operational ledger.
3. Reasoning Path
- The local model answers from the RobotRoss knowledge corpus and ledger evidence.
- This query path is intended to run on the local system, not through a cloud-hosted inference layer.
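One way to guard the local-only intent is to check the configured inference address before sending a query. The sketch below assumes, purely for illustration, that the local model is reached through a URL; the function name `is_local_endpoint` is hypothetical and does not come from the RobotRoss source.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP address.

    Hypothetical helper: RobotRoss may reach its local model in a different way.
    """
    host = urlparse(url).hostname
    if host is None:
        # No parsable host (for example, a bare string without a scheme).
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-local; no DNS lookup is performed.
        return False
```

A caller could refuse to start the voice path when this check fails, so that a misconfigured address never routes prompts to a hosted service.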
4. Text-to-Speech
- Voxtral is the intended text-to-speech engine for spoken answers.
- Spoken output should summarize the same evidence-backed answer returned in the text channel.
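The three stages described so far (speech-to-text, local reasoning, text-to-speech) can be sketched as one function that passes a single answer object to both channels. This is a minimal sketch, not the RobotRoss implementation: `Answer`, `spoken_summary`, and `handle_voice_query` are hypothetical names, the engines are injected as plain callables rather than real Whisper or Voxtral calls, and the sentence-based summary is a deliberately naive placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Answer:
    """Evidence-backed answer shared by the text and voice channels (hypothetical type)."""
    text: str
    sources: List[str] = field(default_factory=list)


def spoken_summary(answer: Answer, max_sentences: int = 2) -> str:
    """Keep the first few sentences of the text answer and state the source count."""
    # Naive split on periods; a real summarizer would be more careful.
    sentences = [s.strip() for s in answer.text.split(".") if s.strip()]
    summary = ". ".join(sentences[:max_sentences]) + "."
    return f"{summary} Based on {len(answer.sources)} source(s)."


def handle_voice_query(
    audio: bytes,
    transcribe: Callable[[bytes], str],       # stand-in for the speech-to-text engine
    answer_query: Callable[[str], Answer],    # stand-in for the local model over wiki + ledger
    synthesize: Callable[[str], bytes],       # stand-in for the text-to-speech engine
) -> Tuple[Answer, bytes]:
    """Run one spoken prompt through the same query path the text channel uses."""
    query = transcribe(audio).strip()
    if not query:
        raise ValueError("no speech recognized")
    answer = answer_query(query)
    # The spoken output is derived from the same Answer shown in the text channel.
    speech = synthesize(spoken_summary(answer))
    return answer, speech
```

Returning the full `Answer` alongside the audio lets the UI show the text and its sources while the shorter summary is spoken, so both channels stay tied to the same evidence.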
5. Notes and Open Points
- Voice interaction should meet the same provenance expectations as the text query path.
- The UI should present voice as a first-class control surface once the runtime integration is in place.