Video stack

How Agora powers the live video between kiosks and agents.

Every YumKiosk session includes a live video and audio link between the kiosk and the agent. The link runs on Agora, a managed WebRTC service that handles TURN, media servers, adaptive bitrate, and client SDKs for both browser and mobile. We chose Agora over rolling our own WebRTC because building a reliable video stack from scratch would eat the entire engineering roadmap for a year.

The handshake

When a session transitions from pending to active (i.e., an agent accepted), both sides need to join the same Agora channel. The flow:

  1. Kiosk fetches GET /api/public/agora/token?channel=ses_8f2c1a&role=publisher and gets a signed Agora token.
  2. Agent fetches GET /api/agora/token?channel=ses_8f2c1a&role=publisher and gets the same kind of token.
  3. Both call Agora's SDK client.join(appId, channel, token, uid) with the same channel name (the session ID).
  4. Both publish their local audio + video streams.
  5. Both subscribe to the other side's remote streams.
  6. Video typically appears on both screens within about 500 ms.

The channel name is always the session ID (for example, ses_8f2c1a). Session IDs are unique, so two sessions never end up in the same channel.
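The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual client code: only the two token paths, the channel and role query parameters, and the join(appId, channel, token, uid) argument order come from the text. The RtcClient interface, the TokenResponse shape, and the publishLocalTracks and subscribeToRemote helpers are hypothetical stand-ins for whatever the Agora SDK wrapper really exposes.

```typescript
// Hypothetical sketch of the handshake. Names other than the token paths and
// the join(appId, channel, token, uid) argument order are assumptions.

type Side = "kiosk" | "agent";

// Minimal stand-in for the parts of an RTC client this flow needs (assumed shape).
interface RtcClient {
  join(appId: string, channel: string, token: string, uid: number): Promise<unknown>;
  publishLocalTracks(): Promise<void>; // hypothetical helper for step 4
  subscribeToRemote(): Promise<void>;  // hypothetical helper for step 5
}

// Assumed response payload; the real endpoint may return a different shape.
interface TokenResponse {
  token: string;
  uid: number;
}

// Steps 1-2: the kiosk uses the public route, the agent the non-public one.
function tokenPath(side: Side, sessionId: string): string {
  const base = side === "kiosk" ? "/api/public/agora/token" : "/api/agora/token";
  return `${base}?channel=${encodeURIComponent(sessionId)}&role=publisher`;
}

async function joinSession(
  side: Side,
  sessionId: string,
  appId: string,
  fetchToken: (path: string) => Promise<TokenResponse>,
  client: RtcClient,
): Promise<void> {
  const { token, uid } = await fetchToken(tokenPath(side, sessionId));
  await client.join(appId, sessionId, token, uid); // step 3: channel is the session ID
  await client.publishLocalTracks();               // step 4: publish audio + video
  await client.subscribeToRemote();                // step 5: subscribe to the other side
}
```

Passing fetchToken and client in as parameters keeps the sketch independent of any particular HTTP library or SDK version, and makes the ordering of the steps easy to exercise with fakes.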

Token issuance

Tokens are signed on our backend using the Agora App Certificate (stored in config/services.php under services.agora.certificate). Signing is implemented in App\Http\Controllers\Api\AgoraTokenController, using the taylanunutmaz/agora-token-builder Composer package.

Each token is built from the following inputs:

  • appId: the Agora project ID (public)
  • appCertificate: the shared secret (private, server only). It is used to sign the token and is never embedded in it or sent to clients.
  • channelName: the session ID
  • uid: a random integer user ID
  • role: publisher or subscriber. Both sides use publisher in YumKiosk since both stream.
  • expireTs: Unix timestamp after which the token is no longer valid

Tokens are set to expire 60 minutes after issue. Most sessions are under 5 minutes, so this is plenty. For longer sessions, the client can fetch a fresh token via POST /api/agora/token/refresh and Agora will accept it for the same channel.
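As a rough illustration of the expiry arithmetic and of when a client might ask for a fresh token, consider the sketch below. The field names follow the list above and the 60-minute lifetime is the one stated here. The buildTokenInputs and needsRefresh helpers and the 5-minute refresh margin are assumptions made for the example, not documented YumKiosk behavior. The app certificate is deliberately absent because it never leaves the server.

```typescript
// Hypothetical sketch: helper names and the refresh margin are assumptions.

type Role = "publisher" | "subscriber";

interface TokenInputs {
  appId: string;
  channelName: string; // the session ID
  uid: number;
  role: Role;
  expireTs: number;    // Unix timestamp, in seconds
}

const TOKEN_TTL_SECONDS = 60 * 60; // tokens expire 60 minutes after issue

function buildTokenInputs(
  appId: string,
  channelName: string,
  uid: number,
  issuedAtSec: number,
): TokenInputs {
  return {
    appId,
    channelName,
    uid,
    role: "publisher", // both sides publish in YumKiosk
    expireTs: issuedAtSec + TOKEN_TTL_SECONDS,
  };
}

// True once the token is within `marginSec` of expiring, or already expired.
function needsRefresh(nowSec: number, expireTs: number, marginSec: number = 300): boolean {
  return nowSec >= expireTs - marginSec;
}
```

A client on a long session could check needsRefresh periodically and, when it returns true, call POST /api/agora/token/refresh. In the Agora Web SDK the new token would typically then be handed to the client's renewToken method, though that should be checked against the SDK version in use.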

Network requirements

Agora prefers UDP for low-latency media, specifically UDP ports 49152–65535. If UDP is blocked (common on corporate networks and captive portals), it falls back to TCP over 443 with degraded quality. Most restaurant Wi-Fi is wide open, so this fallback rarely triggers, but see the Network requirements page for what to tell IT.

Adaptive bitrate

Agora adjusts encoding bitrate dynamically based on network quality. A typical YumKiosk call runs at:

  • Video: 360p at 500 Kbps during normal conditions, downshifting to 240p at 200 Kbps on bad connections.
  • Audio: 48 Kbps Opus, mono.

Total bandwidth per call is around 500–700 Kbps in each direction. See Network requirements for planning numbers.
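The per-direction figure can be reproduced from the numbers above with a small calculation. The sketch below uses only the quoted bitrates. The siteUplinkKbps helper and its default of 700 Kbps per call (the top of the quoted range) are hypothetical planning aids, not values taken from the Network requirements page.

```typescript
// Bitrates are the ones quoted above; packet overhead is not modelled.
const AUDIO_KBPS = 48;
const VIDEO_KBPS = { normal: 500, degraded: 200 };

// Media bitrate in one direction for a single call.
function mediaKbpsPerDirection(quality: "normal" | "degraded"): number {
  return VIDEO_KBPS[quality] + AUDIO_KBPS;
}

// Hypothetical planning helper: uplink needed for N concurrent kiosk calls,
// budgeting each call at the top of the quoted 500-700 Kbps range by default.
function siteUplinkKbps(concurrentCalls: number, perCallKbps: number = 700): number {
  return concurrentCalls * perCallKbps;
}
```

Under normal conditions this gives 500 + 48 = 548 Kbps of media per direction, and 200 + 48 = 248 Kbps on a bad connection. The quoted 500–700 Kbps range presumably leaves headroom for protocol overhead and bitrate spikes, though that is an inference rather than something stated here. Three concurrent kiosks budgeted at 700 Kbps each would need about 2,100 Kbps of uplink.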

Privacy and recording

We do not record video by default. Calls are ephemeral — they exist in the Agora media servers for the duration of the session and then are discarded. This matches the typical customer expectation at a kiosk.

Optionally, an owner can enable session recording under Settings → Compliance → Recording. When enabled:

  • A small "REC" indicator appears on the kiosk during the session.
  • The customer must tap Acknowledge before the session starts.
  • Recordings are stored encrypted in Agora's cloud for 30 days, then auto-deleted.
  • Only managers can access recordings, through the session detail view.

Recording is disabled by default because many jurisdictions (California, Illinois, the EU) require two-party consent for audio recording. Turning it on without understanding the legal implications is risky. Talk to a lawyer.

Fallback to audio-only

If the kiosk's camera fails (permission denied, hardware fault, blocked), the session falls back to audio-only. The kiosk displays the agent's avatar instead of a black video feed, and the agent is notified that video isn't available. The call continues with order building as normal.
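The fallback decision can be thought of as a small mapping from camera status to what each side does. The sketch below is illustrative only: the CameraStatus values mirror the failure cases listed above, while the MediaPlan shape and the planMedia function are hypothetical names, not the actual kiosk code.

```typescript
// Hypothetical sketch of the audio-only fallback decision.

type CameraStatus = "ok" | "permission-denied" | "hardware-fault" | "blocked";

interface MediaPlan {
  publishAudio: boolean;
  publishVideo: boolean;
  kioskShows: "agent-video" | "agent-avatar";
  notifyAgentVideoUnavailable: boolean;
}

function planMedia(camera: CameraStatus): MediaPlan {
  const videoOk = camera === "ok";
  return {
    publishAudio: true,       // audio continues in every case
    publishVideo: videoOk,    // skip video when the camera is unusable
    kioskShows: videoOk ? "agent-video" : "agent-avatar",
    notifyAgentVideoUnavailable: !videoOk,
  };
}
```

Keeping the decision in one pure function means every camera failure mode leads to the same audio-only behavior, and order building can proceed regardless of the result.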

Multi-agent sessions

For future features like supervised training, we may need multiple agents in the same channel. Agora supports this natively — multiple publishers in one channel all see and hear each other. For now, YumKiosk is strictly 1:1 kiosk:agent, but the stack doesn't have any technical limit preventing expansion.

Why not WebRTC directly

We seriously considered rolling our own WebRTC stack. There are good open source TURN servers (Coturn) and good open source media servers (LiveKit). The deciding factor was reliability and time-to-market. Agora has been battle-tested at scale, handles edge cases we haven't even thought of yet, and let us ship the first version of YumKiosk in weeks instead of months. We may revisit this decision if Agora pricing becomes a problem at scale, but for now the tradeoff is the right one.