A voice-first coaching assistant that simulates reflective, ICF-style conversations across three languages — built on Whisper, GPT, and OpenAI TTS, deployed on Hugging Face Spaces.
Most AI assistants are optimised for answers. Professional coaching works the other way — it uses open questions, reflection, and deliberate restraint to surface what the person already knows.
This project explores whether conversational AI can simulate a more reflective coaching experience through multilingual voice interaction, conversational continuity, and coaching-oriented prompt engineering.
Modular six-stage pipeline with separable components.
The architectural choices that aren't obvious from the flow diagram.
Other ASR providers were evaluated and failed on Kannada and Hindi code-switching without fine-tuning.
Whisper's multilingual model handles all three languages in one inference call with no language detection pre-step.
It can also be replaced with a local faster-whisper instance with no interface change if API cost becomes a constraint.
Coaching conversations are personal. A model response to a garbled transcript — replying to something the user never actually said — breaks trust immediately and is hard to recover from mid-session. One deliberate confirmation tap is a worthwhile UX trade-off for that reliability.
The system prompt overrides the model's default "helpful assistant" behaviour with hard rules derived from ICF coaching principles — not soft stylistic instructions. Explicit constraints: ask one question per turn, never give unsolicited advice, acknowledge emotion before exploring action, reflect before probing deeper.
Empathy level, directness, warmth, questioning style, and coaching framework are independently adjustable through the prompt — no code changes needed.
A Streamlit or FastAPI frontend would have required building microphone capture and audio playback from scratch. Gradio ships both natively alongside session state management — the right scope for a conversational AI prototype.
HF Spaces provides a free persistent deployment with a shareable URL from day one, automatic restarts, and no server provisioning.
All OpenAI services. Each has a documented replacement path at the same interface boundary.
| Service | Role | Drop-in replacements |
|---|---|---|
| OpenAI Whisper | Multilingual speech-to-text | faster-whisper (local), Groq Whisper |
| OpenAI GPT | Coaching intelligence + session memory | Claude, Gemini, LLaMA via Groq/Ollama |
| OpenAI TTS | Text-to-speech response delivery | ElevenLabs, Coqui, gTTS (free/lower quality) |
| Gradio | UI — mic, audio playback, session state | Streamlit + custom audio components |
| HF Spaces | Hosting and deployment | Streamlit Cloud, Render, Railway, local |
The pipeline is modular by design. The session architecture, transcript layer, and prompt scaffolding are reusable as-is — only the content changes.
Adjust empathy, directness, warmth, question style, or switch frameworks — GROW, Co-Active, solution-focused. The prompt file is the only thing that changes; the rest of the pipeline is untouched.
Whisper supports 90+ languages natively. Add a language selector to the UI, pass the
language= parameter to Whisper, and update the system prompt language instruction.
No model changes needed.
GPT → Claude or LLaMA, Whisper → Groq or faster-whisper, TTS → ElevenLabs or Coqui. Each layer has a single function boundary — replacing one doesn't touch the others.
Interview prep coach, journaling companion, language learning tutor, leadership simulation. The voice pipeline and session memory are format-agnostic — any reflective conversation structure fits.
The current architecture is session-scoped by design. Persistent memory, vector database retrieval, session summaries, or personalised coaching profiles can be layered in without restructuring the core pipeline.
# clone and run locally git clone <repository-url> pip install -r requirements.txt # add your API keys to .env python app.py # or push to Hugging Face Spaces for a shareable URL
Coaching conversations are personal. The defaults reflect that.
Conversation history lives in memory for the duration of the session and is not stored in a database..
Conversations are not stored after the session ends. Persistent storage would require an explicit architectural addition.
All API keys are stored as HF Spaces secrets — never sent to the client or visible in browser network traffic.
This is a reflective conversational prototype, not a therapist or mental health service. Not designed for crisis support or clinical use.