A WhatsApp group bot that records gym workouts, detects duplicates, transcribes voice notes, and replies with GPT-4.1 sarcasm — all running in a single Docker container.
The problem
My friend group has a WhatsApp chat where we log gym sessions. Before this bot, someone was running an n8n workflow that received messages, parsed them, and inserted rows into a PostgreSQL table. It worked, but n8n’s visual editor became a bottleneck as the logic grew — adding a guardrail, handling audio messages, or tweaking a system prompt meant wrestling with node configuration instead of writing code.
I rewrote the whole thing as a Python microservice: same domain, full control.
What it does
The bot listens to a WhatsApp group via webhook. When someone posts a workout (“biceps 45 min”), it:
- Buffers rapid messages: waits 7 seconds after the most recent message so a whole burst is collected into one batch (people often send 2–3 short messages in a row).
- Transcribes audio if the message is a voice note (Whisper-1).
- Analyzes images if the message contains a photo (GPT-4o vision, returns structured JSON).
- Classifies intent — is this a workout log, a summary request, or just noise?
- Generates an INSERT via GPT-4.1 with the exact column schema in the system prompt.
- Validates the SQL through a two-layer guardrail (regex + GPT-4o-mini).
- Runs the INSERT with a pre-check for duplicates.
- Replies to the group with a sarcastic success or duplicate message.
Architecture
WhatsApp
   │
   ▼
Evolution API (self-hosted Docker, exposes webhooks)
   │ POST /webhook
   ▼
FastAPI app
   ├─ dedup by message_id → Redis SET NX (5 min TTL)
   ├─ BackgroundTask → process_message()
   │    ├─ Whisper-1 (audio)
   │    ├─ GPT-4o vision (image)
   │    ├─ Redis buffer (7 s window)
   │    ├─ GPT-4.1 (intent classification)
   │    ├─ GPT-4.1 (SQL generation)
   │    ├─ regex + GPT-4o-mini (guardrail)
   │    ├─ psycopg2 (INSERT into Postgres)
   │    └─ GPT-4.1 (response formatter)
   │
   ├─ Redis 7 (buffer + dedup)
   └─ PostgreSQL 14 (workouts + chat memory)
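The dedup step in the diagram can be sketched as follows. This is a minimal, synchronous illustration, not the bot's actual code: `InMemoryStore` is a stand-in that mimics only the `set(key, value, nx=True, ex=...)` call of redis-py, and the `dedup:` key prefix and the function name `is_first_delivery` are assumptions.

```python
import time


class InMemoryStore:
    """Stand-in for a Redis client; supports only set() with nx/ex."""

    def __init__(self):
        self._expiry = {}

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        expires = self._expiry.get(key)
        # With nx=True the write is refused while a live key exists,
        # mirroring redis-py, which returns None in that case.
        if nx and expires is not None and expires > now:
            return None
        self._expiry[key] = (now + ex) if ex is not None else float("inf")
        return True


def is_first_delivery(store, message_id, ttl_seconds=300):
    """Return True only the first time a message_id is seen within the TTL."""
    # Hypothetical key layout; the 300 s default matches the 5 min TTL above.
    return bool(store.set(f"dedup:{message_id}", "1", nx=True, ex=ttl_seconds))


store = InMemoryStore()
print(is_first_delivery(store, "ABC123"))  # first webhook delivery
print(is_first_delivery(store, "ABC123"))  # retry of the same message
```

Because SET with NX is a single atomic command, a retried webhook cannot slip between a separate "check" and "write" step.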
Key design decisions
1. One Uvicorn worker — non-negotiable
The message buffer works by pushing each incoming message onto a Redis list, then sleeping N seconds before processing. Only the task that holds the last message ID processes the whole buffer — every other task wakes up and exits early.
This invariant breaks with multiple workers: two messages could land in different workers, both wake up thinking they’re the latest, and you get double processing or race conditions. One worker is the simplest correct solution for a single-group bot.
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
2. Python controls the flow — no agentic loops
The LLM is called as a stateless function, never as a decision-maker. Python dictates the pipeline:
intent = await classify_intent(text)          # gpt-4.1, temperature=0
sql = await generate_sql(text, intent)        # gpt-4.1, temperature=0
safe, reason = await guardrail_check(sql)     # regex → gpt-4o-mini
if safe:
    ok, msg = await insert_registro(sql)      # psycopg2
else:
    ok, msg = False, reason                   # blocked: skip the INSERT
response = await format_reply(ok, msg)        # gpt-4.1, temperature=0.7
No tool_use, no response_format, no loops. The LLM returns plain text; Python interprets it. This makes the pipeline predictable, cheap to debug, and easy to test in isolation.
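As an illustration of "Python interprets it", here is one way the plain-text reply of the intent classifier could be mapped onto a fixed set of labels. The label names and the function `parse_intent` are assumptions made for this sketch; the real prompt and labels may differ.

```python
# Hypothetical labels for the three intents described earlier.
KNOWN_INTENTS = {"workout", "summary", "noise"}


def parse_intent(raw: str) -> str:
    """Normalize the model's plain-text reply to a known label.

    Anything unexpected falls back to "noise", so a chatty or malformed
    reply can never push the pipeline into the INSERT branch.
    """
    label = raw.strip().strip("\"'.").lower()
    return label if label in KNOWN_INTENTS else "noise"


print(parse_intent(" Workout.\n"))
print(parse_intent("I believe this is a workout"))
```

Keeping the interpretation in a pure function like this is what makes the step easy to unit-test without calling the API.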
3. Multi-model routing by role
| Role | Model | Why |
|---|---|---|
| Intent classifier, SQL generator, response formatter | gpt-4.1 | Strongest reasoning of the models used; temperature=0 for classification and SQL (near-deterministic output), 0.7 for replies |
| Guardrail | gpt-4o-mini | A cheap second opinion on the generated SQL |
| Image analysis | gpt-4o | Multimodal vision |
| Audio transcription | whisper-1 | OpenAI's dedicated speech-to-text model |
Using a cheap model for the guardrail works because a regex catches most dangerous SQL first (it runs locally in time linear in the statement length, with no network round trip), and the LLM only handles edge cases. The guardrail fails safe: any parse error counts as “blocked”.
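A rough sketch of the two layers, under stated assumptions: the keyword list, the function names, and the "SAFE" reply convention for the LLM layer are all hypothetical, and the table name in the example is illustrative only. The regex layer is deliberately conservative, so a harmless value containing a semicolon or the word "update" would also be rejected.

```python
import re

# Assumed deny-list; the real guardrail may use a different one.
FORBIDDEN = re.compile(
    r"\b(drop|delete|update|alter|truncate|grant|revoke|create)\b",
    re.IGNORECASE,
)
INSERT_ONLY = re.compile(r"^\s*insert\s+into\s", re.IGNORECASE)


def regex_guardrail(sql: str) -> tuple[bool, str]:
    """Layer 1: local checks, no network call. Returns (safe, reason)."""
    statement = sql.strip().rstrip(";").strip()
    if not INSERT_ONLY.match(statement):
        return False, "not an INSERT statement"
    if ";" in statement:
        return False, "multiple statements"
    if "--" in statement or "/*" in statement:
        return False, "SQL comment"
    hit = FORBIDDEN.search(statement)
    if hit:
        return False, f"forbidden keyword: {hit.group(1).lower()}"
    return True, ""


def parse_verdict(raw: str) -> tuple[bool, str]:
    """Layer 2: interpret the LLM reply. Fail safe: only an exact
    "SAFE" passes; anything else, including garbage, means blocked."""
    text = raw.strip()
    if text.upper() == "SAFE":
        return True, ""
    return False, text or "empty guardrail reply"


print(regex_guardrail("INSERT INTO workouts (exercise, minutes) VALUES ('biceps', 45);"))
print(regex_guardrail("INSERT INTO workouts VALUES (1); DROP TABLE workouts;"))
```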
4. Redis buffer — handling message bursts
WhatsApp users send thoughts as sequential short messages. Without a buffer, each line fires a separate pipeline call. The fix:
await buffer_push(group_id, entry)
await asyncio.sleep(settings.buffer_seconds)  # default: 7 s
if not await is_latest_message(group_id, msg_id, timestamp):
    return  # a newer message will process the full buffer
messages = await buffer_get(group_id)
text = "\n".join(m["text"] for m in messages if m.get("text"))
await buffer_delete(group_id)
The entire burst arrives as a single string to the pipeline.
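To see the "latest message wins" behaviour end to end, the snippet above can be simulated without Redis. Everything below is an assumption for illustration: plain dicts replace the Redis list and the "latest id" key, `is_latest_message` takes a simplified signature (no timestamp), and the window is shortened to 0.2 s.

```python
import asyncio

# In-memory stand-ins for the Redis-backed buffer (assumption).
_buffers: dict[str, list[dict]] = {}
_latest: dict[str, str] = {}


async def buffer_push(group_id: str, entry: dict) -> None:
    _buffers.setdefault(group_id, []).append(entry)
    _latest[group_id] = entry["id"]


async def is_latest_message(group_id: str, msg_id: str) -> bool:
    return _latest.get(group_id) == msg_id


async def handle(group_id: str, entry: dict, window: float, out: list[str]) -> None:
    await buffer_push(group_id, entry)
    await asyncio.sleep(window)
    if not await is_latest_message(group_id, entry["id"]):
        return  # a newer message owns the buffer
    messages = _buffers.pop(group_id, [])
    out.append("\n".join(m["text"] for m in messages if m.get("text")))


async def demo() -> list[str]:
    out: list[str] = []
    tasks = []
    for i, text in enumerate(["biceps", "45 min", "felt heavy"]):
        entry = {"id": str(i), "text": text}
        tasks.append(asyncio.create_task(handle("group-1", entry, 0.2, out)))
        await asyncio.sleep(0.01)  # messages arrive 10 ms apart
    await asyncio.gather(*tasks)
    return out


print(asyncio.run(demo()))
```

Three tasks start, but only the one holding the last message ID reaches the join, so the pipeline sees one combined string.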
5. Sync psycopg2 + asyncio.to_thread
psycopg2 is synchronous. Rather than adopting psycopg3 (async-native but more moving parts), all database calls are wrapped in asyncio.to_thread() — they run in a thread pool without blocking the event loop. For a single-group bot the thread-switching overhead is negligible.
async def insert_registro(sql: str) -> tuple[bool, str]:
    def _execute() -> tuple[bool, str]:
        # Blocking psycopg2 work; runs in a worker thread.
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(sql)
        return True, ""

    return await asyncio.to_thread(_execute)
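The effect of `asyncio.to_thread()` can be checked with a small experiment. In this sketch `time.sleep` stands in for a blocking `cur.execute()` call (an assumption; no database is involved), and a heartbeat coroutine shows that the event loop keeps running while the "query" is in flight.

```python
import asyncio
import time


def blocking_query(window: dict) -> str:
    """Stand-in for a synchronous database call."""
    window["start"] = time.monotonic()
    time.sleep(0.2)
    window["end"] = time.monotonic()
    return "done"


async def heartbeat(ticks: list[float]) -> None:
    """Record a timestamp every 20 ms; only possible if the loop is free."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple[str, int]:
    window: dict = {}
    ticks: list[float] = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_query, window),
        heartbeat(ticks),
    )
    # Count heartbeats that happened while the blocking call was running.
    during = sum(1 for t in ticks if window["start"] <= t <= window["end"])
    return result, during


result, ticks_during_query = asyncio.run(main())
print(result, ticks_during_query)
```

If `blocking_query` were called directly inside a coroutine instead, the heartbeat would stall until the call returned.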
6. System prompts use str.replace(), not .format()
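System prompts include JSON examples with literal {} braces. Calling .format() on such a prompt treats those braces as replacement fields and typically raises KeyError (or IndexError or ValueError, depending on what sits between the braces). Using str.replace("{current_datetime}", value) is explicit, leaves every other brace untouched, and sidesteps the footgun entirely.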
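A small demonstration of the difference. The template text and the helper names `render_prompt` and `format_fails` are made up for this example; only the `{current_datetime}` placeholder comes from the text above.

```python
# Hypothetical prompt: one placeholder plus a literal JSON example.
PROMPT_TEMPLATE = (
    "Current time: {current_datetime}\n"
    'Answer with JSON such as {"intent": "workout"}.'
)


def render_prompt(template: str, current_datetime: str) -> str:
    """Substitute only the named placeholder; other braces stay as-is."""
    return template.replace("{current_datetime}", current_datetime)


def format_fails(template: str) -> bool:
    """Return True if str.format() chokes on the literal JSON braces."""
    try:
        template.format(current_datetime="2025-01-01 10:00")
    except (KeyError, IndexError, ValueError):
        return True
    return False


print(render_prompt(PROMPT_TEMPLATE, "2025-01-01 10:00"))
print(format_fails(PROMPT_TEMPLATE))
```

Doubling every brace in the JSON examples would also satisfy .format(), at the cost of prompts that no longer read like the JSON they describe.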
Observability
Logs are structured JSON via structlog. Every request binds group_id and session_id as context vars so every downstream log line carries them automatically:
{"level": "info", "event": "pipeline: response sent", "group_id": "120363XXX@g.us", "elapsed_ms": 1842.3}
docker logs -f bot | jq is all the tooling you need for a prototype.
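The context-binding idea can be shown with the standard library alone. This sketch deliberately swaps structlog for `contextvars` plus `json` so it runs without dependencies; `bind` and `render` are hypothetical helpers, not structlog's API.

```python
import contextvars
import json

# Per-request logging context; each asyncio task sees its own copy.
_log_context = contextvars.ContextVar("log_context", default={})


def bind(**fields) -> None:
    """Attach fields to every later log line in this context."""
    _log_context.set({**_log_context.get(), **fields})


def render(level: str, event: str, **fields) -> str:
    """Build one JSON log line: bound context first, then call-site fields."""
    return json.dumps({"level": level, "event": event, **_log_context.get(), **fields})


bind(group_id="120363XXX@g.us")
print(render("info", "pipeline: response sent", elapsed_ms=1842.3))
```

Downstream code only passes the event and its own fields; the group identifier rides along from wherever it was bound.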
GET /metrics exposes in-memory counters: messages by type, pipeline runs, insert success/duplicate/error, guardrail blocks, and average latency. Swap in prometheus_client when dashboards become necessary.
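In-memory counters of this kind need very little code. The class below is a sketch under assumptions: the `Metrics` name, its methods, and the counter names are invented for illustration, and the values reset whenever the process restarts.

```python
from collections import Counter


class Metrics:
    """Process-local counters plus a running latency average."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()
        self._latency_total_ms = 0.0
        self._latency_samples = 0

    def incr(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    def observe_latency(self, elapsed_ms: float) -> None:
        self._latency_total_ms += elapsed_ms
        self._latency_samples += 1

    def snapshot(self) -> dict:
        """Return a JSON-serializable view, e.g. for a /metrics handler."""
        samples = self._latency_samples
        avg = self._latency_total_ms / samples if samples else 0.0
        return {**self.counters, "avg_latency_ms": round(avg, 1)}


metrics = Metrics()
metrics.incr("insert_success")
metrics.incr("guardrail_blocked")
metrics.observe_latency(1842.3)
print(metrics.snapshot())
```

With a single Uvicorn worker there is exactly one process, so one module-level instance gives a consistent view.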
Stack
Python 3.12 · FastAPI · Uvicorn · OpenAI SDK (gpt-4.1, gpt-4o, gpt-4o-mini, whisper-1) · Evolution API v2 · Redis 7 Alpine · PostgreSQL 14 · psycopg2 · structlog · pydantic-settings · Docker + Compose · Ruff · pytest + pytest-asyncio + respx + fakeredis