TL;DR
10 hours. One night. A 3D conversational agent with lip sync, synthetic voice, and the absurd personality of a Gaulish druid. 3rd place out of 267 teams, Viveris challenge.
This write-up covers both the technical pipeline (how we connected OpenAI, ElevenLabs, and Rhubarb together) and the 3D pipeline (how the avatar was designed in Blender with Mixamo animations). Because a beautiful demo that only runs locally is a demo that dies on presentation day.
Context: Nuit de L’info 2025
Nuit de L’info is a French national hackathon that takes place in a single night — from sunset to sunrise. Teams from across France compete on multiple simultaneous challenges proposed by partner companies.
Our challenge: Viveris, which asked teams to design an original conversational agent with an innovative interface.
Our answer: Jean-Michel Apeupréx, the Resistant Digital Druid — a deliberately absurd Gaulish character who advises villagers on resisting “Big Tech” (the Romans). Embodied in a 3D animated character with lip sync synchronized to a synthesized voice.
The team: Morris II.
The Overall Architecture
┌──────────────────────────────────────────────────────┐
│ Browser │
│ React + Three.js ── Audio Player │
│ (3D Avatar / Lip Sync) │
└────────────────────┬─────────────────────────────────┘
│ HTTP POST /chat
┌────────────────────▼─────────────────────────────────┐
│ FastAPI (Koyeb, 24/7) │
│ │
│ ┌─────────────────┐ ┌──────────────────────┐ │
│ │ OpenAI API │ │ ElevenLabs API │ │
│ │ GPT-4o-mini │ │ TTS (voice fr) │ │
│ └────────┬────────┘ └──────────┬───────────┘ │
│ │ │ │
│ ┌────────▼────────────────────────▼───────────┐ │
│ │ FFmpeg (mp3→wav) + Rhubarb (lip sync JSON) │ │
│ └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
A single Docker service deployed continuously on Koyeb. No local model — the OpenAI and ElevenLabs APIs handle the AI and voice, which keeps memory usage within the constraints of a free hosting tier.
Architecture Choice: Why Containerize Everything From the Start
This was the first strategic decision of the night, made at 9:30 PM before writing a single line of business logic.
The Problem Without Docker
In a hackathon, every team member has a different environment:
- Incompatible Python versions (3.10 vs 3.12)
- ffmpeg missing on some machines (required by Rhubarb)
- Rhubarb Lip Sync only available as a Linux binary
- Windows/Mac/Linux with different paths
Without containerization, you spend 2 hours debugging ModuleNotFoundError instead of building the product.
The Solution: A Dockerfile as a Team Contract
We chose a single service (vs Docker Compose multi-services) because we were targeting a Koyeb deployment at the end of the night — one container, one image to push.
FROM python:3.10-slim
WORKDIR /app
# System dependencies: ffmpeg (mp3→wav conversion) and the Rhubarb binary
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg curl \
    && rm -rf /var/lib/apt/lists/*
# Rhubarb Lip Sync (pre-compiled Linux binary)
COPY bin/ ./bin/
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 10000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "10000"]
The critical point: ffmpeg and the rhubarb binary must be in the Linux image. You can’t install them via pip — that’s why the Dockerfile is essential, even for a hackathon. Result: zero onboarding friction, and a deployment via docker push.
Response Generation Pipeline
The magic of the project was this chain:
[Text Input] → [OpenAI GPT-4o-mini] → [ElevenLabs TTS] → [Rhubarb Lip Sync] → [HTTP] → [Three.js]
1. LLM with OpenAI GPT-4o-mini
We chose GPT-4o-mini via the OpenAI API. No local model — in a hackathon, reliability comes first over data sovereignty.
The main constraint: forcing the model to respond in structured JSON with the text, facial expression, and animation to play. For this, response_format={"type": "json_object"} is a hackathon’s best friend.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=250,
    temperature=0.8,
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
)
data = json.loads(completion.choices[0].message.content)
messages = data.get("messages", [])
# Each message: { text, facialExpression, animation }
The system prompt defines the personality of Jean-Michel Apeupréx: a Gaulish druid who resists “Big Tech” (the Romans), proposes plant-based solutions for computer problems, and never responds seriously.
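The exact production prompt isn't reproduced here, but a minimal sketch of that contract could look like the following. The facialExpression and animation vocabularies are illustrative, not the project's actual lists; the key part is spelling out the schema so json_object mode has something concrete to follow.

```python
# Illustrative sketch of a system prompt enforcing the structured JSON
# contract described above. The expression/animation names are assumptions.
SYSTEM_PROMPT = """You are Jean-Michel Apeupréx, a Gaulish druid resisting
"Big Tech" (the Romans). You propose plant-based remedies for computer
problems and never answer seriously.

Always reply with a JSON object of the form:
{"messages": [{"text": "...", "facialExpression": "smile",
               "animation": "Talking_1"}]}

facialExpression is one of: default, smile, surprised, angry.
animation is one of: Idle, Talking_1, Talking_2, Laughing.
"""
```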
2. Text-to-Speech with ElevenLabs
We used ElevenLabs for voice synthesis — well above Coqui TTS in quality, and the free API tier was sufficient for one night.
async def generate_audio_elevenlabs(text: str, filename: str):
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    headers = {
        "Accept": "audio/mpeg",
        "Content-Type": "application/json",
        "xi-api-key": ELEVEN_LABS_API_KEY
    }
    data = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.4, "similarity_boost": 0.6}
    }
    response = requests.post(url, json=data, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly instead of writing an error body to disk
    with open(filename, "wb") as f:
        f.write(response.content)
The eleven_multilingual_v2 model handles French natively — no need to hack the target language.
3. Lip Sync with Rhubarb
Rhubarb Lip Sync analyzes a .wav file and generates temporal phonemes (mouthCues) in JSON format. This JSON is sent to the frontend to animate the 3D face mesh’s morph targets.
Full audio pipeline:
ElevenLabs → .mp3 → FFmpeg → .wav → Rhubarb → .json (mouthCues)
import subprocess

def exec_command(cmd: list[str]) -> None:
    # Thin wrapper: raise if ffmpeg or rhubarb exits non-zero
    subprocess.run(cmd, check=True)

async def lip_sync_message(message_id: int):
    mp3_file = f"audios/message_{message_id}.mp3"
    wav_file = f"audios/message_{message_id}.wav"
    json_file = f"audios/message_{message_id}.json"
    # MP3 → WAV (Rhubarb requires WAV)
    exec_command(["ffmpeg", "-y", "-i", mp3_file, wav_file])
    # WAV → mouthCues JSON
    exec_command(["rhubarb", "-f", "json", "-o", json_file, wav_file, "-r", "phonetic"])
Rhubarb outputs mouth position codes (A, B, C… X) with their timestamps. The Three.js frontend maps these codes to the avatar’s facial morph targets.
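The frontend does this mapping in JavaScript, but the logic is simple enough to sketch in Python: a lookup table from Rhubarb codes to morph target names (the names below are illustrative; the real ones depend on how the mesh was exported from Blender), plus a query for the cue active at a given playback time.

```python
# Sketch of the mapping from Rhubarb mouth-shape codes to the avatar's
# morph targets. The morph target names are assumptions for illustration.
MOUTH_CUE_TO_MORPH = {
    "A": "mouth_closed",        # closed mouth (P, B, M)
    "B": "mouth_slightly_open",  # clenched teeth (K, S, T...)
    "C": "mouth_open",
    "D": "mouth_wide_open",      # open vowels ("ah")
    "E": "mouth_rounded",        # "oh"
    "F": "mouth_puckered",       # "oo"
    "G": "mouth_f_v",            # extended shape: F, V (teeth on lower lip)
    "H": "mouth_l",              # extended shape: L (tongue raised)
    "X": "mouth_rest",           # idle / silence
}

def active_morph(mouth_cues, t):
    """Return the morph target active at playback time t (seconds), given
    Rhubarb's list: [{"start": 0.0, "end": 0.27, "value": "A"}, ...]."""
    for cue in mouth_cues:
        if cue["start"] <= t < cue["end"]:
            return MOUTH_CUE_TO_MORPH[cue["value"]]
    return MOUTH_CUE_TO_MORPH["X"]
```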
4. The FastAPI Endpoint That Brings It All Together
Single endpoint POST /chat — no WebSocket, no streaming needed for a hackathon:
@app.post("/chat")
async def chat(request: ChatRequest):
    user_message = request.message or "Bonjour"
    # 1. LLM → structured JSON
    completion = client.chat.completions.create(...)
    messages = json.loads(completion.choices[0].message.content)["messages"]
    for i, msg in enumerate(messages):
        # 2. TTS → .mp3
        await generate_audio_elevenlabs(msg["text"], f"audios/message_{i}.mp3")
        # 3. Lip sync → .json
        await lip_sync_message(i)
        # 4. Base64 encoding for transport
        msg["audio"] = audio_file_to_base64(f"audios/message_{i}.mp3")
        msg["lipsync"] = read_json_transcript(f"audios/message_{i}.json")
    return {"messages": messages}
The response contains for each message: the text, audio encoded in Base64, mouthCues, and facial animation parameters.
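The two helpers called in the endpoint (audio_file_to_base64 and read_json_transcript) are near one-liners; a sketch consistent with how they are used:

```python
import base64
import json

def audio_file_to_base64(path: str) -> str:
    """Read an audio file and return its Base64-encoded contents,
    ready to embed in the JSON response."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def read_json_transcript(path: str) -> dict:
    """Load the Rhubarb mouthCues JSON produced for one message."""
    with open(path, "r") as f:
        return json.load(f)
```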
Deployment: From localhost to Koyeb in Production
At 2 AM, the project was running locally. We needed to make it accessible to the judges — and ideally keep it online after the night.
Why Koyeb and Not Heroku/Render
Koyeb offers a free tier with an always-on Docker container (no cold start after 30 minutes of inactivity, unlike Render's free tier). For a demo project you want to be able to show at any time, it's perfect.
The deployment: push the image to GitHub Container Registry, Koyeb detects it and restarts automatically.
# Build and push
docker build -t ghcr.io/sylvaincostes/backend-hackathon:latest .
docker push ghcr.io/sylvaincostes/backend-hackathon:latest
# Koyeb redeploys automatically via webhook
The API is publicly accessible 24/7: https://musical-darlleen-morrisii-3d1ed0cf.koyeb.app/docs
Environment Variables in Production
The OPENAI_API_KEY and ELEVEN_LABS_API_KEY keys are injected via Koyeb secrets — never in the Docker image.
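Inside the container, the code reads those secrets from the environment. A small fail-fast sketch (the require_env helper is an illustration of the pattern, not a Koyeb API or the project's actual code):

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a secret is missing: better a clear crash
    in the Koyeb logs than a cryptic 500 at demo time."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# At startup:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# ELEVEN_LABS_API_KEY = require_env("ELEVEN_LABS_API_KEY")
```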
The 3D Frontend: Three.js + React + Blender
The backend is important, but the visual magic comes from the frontend. The 3D character is a full technical stack in its own right.
The Avatar: Blender + Mixamo
Jean-Michel’s avatar was modeled and rigged by hand in Blender, then animations were imported from Mixamo (Adobe) — an online library of 3D animations in FBX format.
The workflow:
- Character modeling and rigging in Blender
- Export mesh to Mixamo to generate animations (walk, gestures, idle…)
- Import FBX animations into Blender
- Final export to .glb with morph targets for lip sync
The face morph targets (mouth positions) are directly mapped to Rhubarb phoneme codes (A, B, C… X). The frontend receives the mouthCues, and Three.js interpolates the morph targets in real time during audio playback.
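In the app this interpolation lives in the Three.js render loop, but the idea is language-agnostic, so here is a sketch in Python of one simple scheme: a short fade at each cue boundary so the mouth doesn't snap between Rhubarb's discrete shapes. The fade duration is an assumption, not the project's actual value.

```python
def morph_weight(cue, t, fade=0.08):
    """Weight in [0, 1] for one mouthCue at playback time t: ramps up over
    `fade` seconds after the cue starts and back down before it ends,
    smoothing the jump between discrete mouth shapes."""
    if t < cue["start"] or t > cue["end"]:
        return 0.0
    rise = min(1.0, (t - cue["start"]) / fade)
    fall = min(1.0, (cue["end"] - t) / fade)
    return min(rise, fall)
```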
The React UI
The interface is built in React with Tailwind. It handles:
- Conversation state and message history
- Audio controls (music / voice volume sliders via shadcn/ui)
- Camera zoom on the avatar
- A start overlay (required to unlock the browser’s AudioContext — a browser constraint)
const handleStart = () => {
  // Unlock the AudioContext: a user interaction is required
  const audioContext = new (window.AudioContext || window.webkitAudioContext)()
  audioContext.resume()
  // Play a silent sound to "warm up" the context
  const oscillator = audioContext.createOscillator()
  const gainNode = audioContext.createGain()
  gainNode.gain.value = 0.001
  oscillator.connect(gainNode)
  gainNode.connect(audioContext.destination)
  oscillator.start()
  oscillator.stop(audioContext.currentTime + 0.1)
  startMusic() // Background music for the clearing
  setStarted(true)
}
This pattern (silent oscillator) is one of the most reliable ways to unlock the AudioContext on iOS and most modern browsers.
What I Would Have Done Differently
LLM Response Streaming
We waited for GPT-4o-mini to generate the complete response before launching TTS. With streaming, you can pipe tokens to ElevenLabs as soon as the first sentences arrive — perceived latency divided by ~3.
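A sketch of the sentence-splitting half of that idea: accumulate streamed tokens and emit a sentence as soon as it is complete, so TTS can start on the first sentence. The regex-based splitter is a simplification; real punctuation handling (abbreviations, ellipses) is messier.

```python
import re

def stream_sentences(token_stream):
    """Group a stream of LLM tokens into sentences so TTS can start as soon
    as the first sentence is complete, instead of waiting for the full reply.
    token_stream: any iterable of text chunks (e.g. the delta contents of an
    OpenAI streaming response)."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every completed sentence (punctuation followed by whitespace)
        while True:
            match = re.search(r"[.!?]\s", buffer)
            if not match:
                break
            sentence, buffer = buffer[:match.end()].strip(), buffer[match.end():]
            yield sentence
    if buffer.strip():
        yield buffer.strip()
```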
Audio Cache on the Server Side
Every ElevenLabs call for the same text regenerates a new file. A simple cache based on the text’s MD5 hash would avoid redundant calls and reduce latency for recurring greetings.
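A minimal version of such a cache (cached_tts and its generate parameter are illustrative names, not the project's actual code):

```python
import hashlib
import os

def cached_tts(text: str, generate, cache_dir: str = "audio_cache") -> str:
    """Return the path of the cached mp3 for `text`, calling `generate`
    (the real TTS function, e.g. generate_audio_elevenlabs) only on a
    cache miss. Keyed on the MD5 hash of the text, as suggested above."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.md5(text.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, f"{key}.mp3")
    if not os.path.exists(path):
        generate(text, path)
    return path
```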
More Varied Mixamo Animations
We used Mixamo animations as-is. With more time, we would have blended between multiple idle animations to avoid the repetitive effect.
The Numbers
| Metric | Value |
|---|---|
| Total dev time | ~10 hours |
| Docker services | 1 |
| Backend lines of code | ~600 |
| Ranking | 3rd / 267 teams |
| First public URL generated | 4:12 AM |
| Services deployed | 1 (single container Koyeb) |
Conclusion
This hackathon confirmed to me that DevOps isn’t a phase of the project — it’s the starting point. The decision to containerize everything at 9:30 PM is what allowed us to deliver a working demo at 7 AM without chasing missing dependencies.
The “brilliant” code (Three.js, lip sync, bot personality) would have been worthless if we hadn’t been able to deploy it reliably. The judges saw a demo that runs. Not a screenshot.
The backend source code is available on GitHub: SylvainCostes/backend-hackathon.
The API is live: musical-darlleen-morrisii-3d1ed0cf.koyeb.app/docs.