voice · nlp · prompt engineering

Conversational AI Coach

A voice-first coaching assistant that simulates reflective, ICF-style conversations across three languages — built on Whisper, GPT, and OpenAI TTS, deployed on Hugging Face Spaces.

3 · languages supported
ICF · coaching framework
6 · pipeline layers
HF · hosted on Spaces

Why This Project

Most AI assistants are optimised for answers. Professional coaching works the other way — it uses open questions, reflection, and deliberate restraint to surface what the person already knows.

This project explores whether conversational AI can simulate a more reflective coaching experience through multilingual voice interaction, conversational continuity, and coaching-oriented prompt engineering.

End-to-end conversation flow

Modular six-stage pipeline with separable components.

01
Speak
Microphone input via Gradio
User speaks in English, Hindi, or Kannada. Whisper handles the multilingual transcription; the coaching response language is selected by the user.
02
Transcribe
OpenAI Whisper — multilingual ASR
Whisper converts audio to text in a single multilingual inference call. Outputs the detected language alongside the transcript for downstream prompt construction.
03
Validate transcript
Editable text field — user-controlled submit
The transcript is shown before submission. Nothing reaches GPT until the user confirms. Accent errors, mixed-script confusion, or mispronunciations can be corrected in under two seconds — without breaking the conversation.
04
Generate coaching response
OpenAI GPT + ICF persona system prompt + session history
Full session history is passed with every turn so the model can reference earlier statements. The system prompt enforces ICF coaching behaviour: one question at a time, no unsolicited advice, reflect before probing deeper.
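One plausible shape for that turn assembly; the prompt wording here is a shortened illustration, not the project's actual system prompt:

```python
ICF_SYSTEM_PROMPT = (
    "You are a professional coach following ICF principles. "
    "Ask exactly one open question per turn. Never give unsolicited advice. "
    "Acknowledge emotion before exploring action. Reflect before probing deeper."
)

def build_messages(history: list[tuple[str, str]], user_text: str) -> list[dict]:
    """Assemble the chat payload: system prompt, full session history, new turn."""
    messages = [{"role": "system", "content": ICF_SYSTEM_PROMPT}]
    for user_turn, coach_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": coach_turn})
    messages.append({"role": "user", "content": user_text})
    return messages
```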
05
Synthesise voice
OpenAI TTS
The coaching response is converted to spoken audio using OpenAI TTS voice synthesis.
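A minimal sketch of this stage using the OpenAI SDK's streaming speech endpoint. The per-language voice mapping is an invented illustration (though the voice names are real OpenAI TTS voices):

```python
VOICE_FOR_LANGUAGE = {"english": "alloy", "hindi": "nova", "kannada": "shimmer"}  # illustrative mapping

def pick_voice(language: str) -> str:
    return VOICE_FOR_LANGUAGE.get(language.lower(), "alloy")

def synthesise(text: str, language: str, out_path: str = "reply.mp3") -> str:
    """Convert the coaching response to spoken audio and stream it to a file."""
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=pick_voice(language),
        input=text,
    ) as response:
        response.stream_to_file(out_path)
    return out_path
```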
06
Play response
Gradio audio component + session state
Audio is played back through the same Gradio interface. Session state is maintained across turns for conversational continuity.
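The continuity mechanics reduce to a small state object. In the app this history lives in Gradio's session state; the sketch below is framework-agnostic and illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Session-scoped history: held in memory, discarded when the session ends."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, user_text: str, coach_text: str) -> None:
        self.turns.append((user_text, coach_text))

    def history(self) -> list[tuple[str, str]]:
        """Everything said so far, passed to the model on every turn."""
        return list(self.turns)
```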

Layer decisions and rationale

The architectural choices that aren't obvious from the flow diagram.

Layer 02 · Why Whisper over other ASR providers
OpenAI Whisper · Indic language support

Other ASR providers were evaluated and failed on Kannada and Hindi code-switching without fine-tuning. Whisper's multilingual model handles all three languages in one inference call with no language detection pre-step. It can also be replaced with a local faster-whisper instance with no interface change if API cost becomes a constraint.

Layer 03 · Why the transcript isn't auto-submitted
User trust · ASR error recovery

Coaching conversations are personal. A model response to a garbled transcript — replying to something the user never actually said — breaks trust immediately and is hard to recover from mid-session. One deliberate confirmation tap is a worthwhile UX trade-off for that reliability.

Layer 04 · ICF behavioural constraints in the system prompt
Prompt engineering · Inspired by ICF-style coaching principles

The system prompt overrides the model's default "helpful assistant" behaviour with hard rules derived from ICF coaching principles — not soft stylistic instructions. Explicit constraints: ask one question per turn, never give unsolicited advice, acknowledge emotion before exploring action, reflect before probing deeper.

Prompt-adjustable coaching style

Empathy level, directness, warmth, questioning style, and coaching framework are independently adjustable through the prompt — no code changes needed.

empathy
How much emotion is validated before moving forward
high / moderate / low
directness
Whether the coach reflects or gently challenges
reflective / challenging
warmth
Language register and relational tone
warm / neutral / formal
question style
Coaching question pattern to follow
Socratic / scaling / miracle
framework
Coaching model the prompt is anchored to
ICF / GROW / Co-Active
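Rendering those knobs into the system prompt can be as simple as string templating. The wording below is illustrative, not the project's actual prompt:

```python
def style_prompt(
    empathy: str = "high",
    directness: str = "reflective",
    warmth: str = "warm",
    question_style: str = "Socratic",
    framework: str = "ICF",
) -> str:
    """Render the five adjustable knobs into a system-prompt fragment."""
    return (
        f"Follow the {framework} coaching framework. "
        f"Empathy: {empathy}. Directness: {directness}. Warmth: {warmth}. "
        f"Use {question_style} questioning, one question per turn."
    )
```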
Layer 06 · Why Gradio and Hugging Face Spaces
Gradio · HF Spaces

A Streamlit or FastAPI frontend would have required building microphone capture and audio playback from scratch. Gradio ships both natively alongside session state management — the right scope for a conversational AI prototype.

HF Spaces provides a free persistent deployment with a shareable URL from day one, automatic restarts, and no server provisioning.

External services and modularity

All OpenAI services. Each has a documented replacement path at the same interface boundary.

Service | Role | Drop-in replacements
OpenAI Whisper | Multilingual speech-to-text | faster-whisper (local), Groq Whisper
OpenAI GPT | Coaching intelligence + session memory | Claude, Gemini, LLaMA via Groq/Ollama
OpenAI TTS | Text-to-speech response delivery | ElevenLabs, Coqui, gTTS (free, lower quality)
Gradio | UI: mic, audio playback, session state | Streamlit + custom audio components
HF Spaces | Hosting and deployment | Streamlit Cloud, Render, Railway, local

Extend it for your use case

The pipeline is modular by design. The session architecture, transcript layer, and prompt scaffolding are reusable as-is — only the content changes.

Revoice the persona

Change the coaching personality

Adjust empathy, directness, warmth, question style, or switch frameworks — GROW, Co-Active, solution-focused. The prompt file is the only thing that changes; the rest of the pipeline is untouched.

Extend languages

Add more languages

Whisper supports 90+ languages natively. Add a language selector to the UI, pass the language= parameter to Whisper, and update the system prompt language instruction. No model changes needed.
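As a sketch, adding a language means one dictionary entry plus Whisper's `language` parameter (ISO-639-1 codes), assuming the OpenAI Python SDK:

```python
LANGUAGE_CODES = {"English": "en", "Hindi": "hi", "Kannada": "kn"}  # ISO-639-1; extend here

def transcribe_in(audio_path: str, ui_language: str) -> str:
    """Pin Whisper to the UI-selected language instead of auto-detecting."""
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI()
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            language=LANGUAGE_CODES[ui_language],  # e.g. "kn" for Kannada
        )
    return result.text
```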

Swap the models

Replace any AI service

GPT → Claude or LLaMA, Whisper → Groq or faster-whisper, TTS → ElevenLabs or Coqui. Each layer has a single function boundary — replacing one doesn't touch the others.
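The single function boundary can be made concrete with a structural protocol. The ASR layer is shown here; the LLM and TTS layers would follow the same pattern (names are illustrative, not the project's):

```python
from typing import Protocol

class Transcriber(Protocol):
    """The one-method contract the rest of the pipeline depends on."""
    def transcribe(self, audio_path: str) -> str: ...

class FakeTranscriber:
    """Stand-in backend; a faster-whisper or Groq-backed class would
    satisfy the same contract without touching the pipeline."""
    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"

def run_asr(asr: Transcriber, audio_path: str) -> str:
    # The pipeline sees only the protocol, never a vendor SDK.
    return asr.transcribe(audio_path)
```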

Repurpose the workflow

Different conversation, same architecture

Interview prep coach, journaling companion, language learning tutor, leadership simulation. The voice pipeline and session memory are format-agnostic — any reflective conversation structure fits.

Add persistence

Extend with memory and personalisation

The current architecture is session-scoped by design. Persistent memory, vector database retrieval, session summaries, or personalised coaching profiles can be layered in without restructuring the core pipeline.

# clone and run locally
git clone <repository-url>
cd <repository-directory>
pip install -r requirements.txt
# add your API keys to .env
python app.py

# or push to Hugging Face Spaces for a shareable URL

Designed for a sensitive context

Coaching conversations are personal. The defaults reflect that.

Session-scoped only

Conversation history lives in memory for the duration of the session and is not stored in a database.

No logging by default

Conversations are not stored after the session ends. Persistent storage would require an explicit architectural addition.

Credentials server-side

All API keys are stored as HF Spaces secrets — never sent to the client or visible in browser network traffic.

Not a clinical tool

This is a reflective conversational prototype, not a therapist or mental health service. Not designed for crisis support or clinical use.