AI Agent Office Case Study

FEATURED INSIGHT

“One inbox, eight agents, zero copy-paste — and every output is audited by another agent before it reaches us.”

NEONVIL & Confidential B2B SaaS Client

A Case Study: An Event-Driven Multi-Agent Office on Serverless Firestore

Business Outcome

A multi-agent 'virtual office' on Firebase: eight specialised AI agents plan, execute, and audit work, with Firestore serving as both message bus and real-time UI feed. The MVP is feature-complete and QA-hardened, and is scheduled for pilot launch.

Event-Driven Pipeline

Eight agents orchestrated by Firestore triggers — no queue, no WebSocket, no polling.

Self-Auditing Outputs

A dedicated auditor agent reviews every labor-unit output against the original plan before it reaches the user.

Zero-Trust Multi-Tenant

LLM keys in Secret Manager, tenant-scoped Firestore rules, and admin-only writes to agent configs and system defaults.

The numbers behind the platform:

8 Specialised AI Agents

29 Executable Skills in Registry

3 Hot-Swappable LLM Providers (Gemini, GPT-4o, Claude Sonnet)

5 Firestore Triggers Drive the Pipeline

Context & Challenges

The client wanted to replace ad-hoc prompt-engineering — users copy-pasting into ChatGPT, then copy-pasting results back into Gmail, Telegram, or Calendar — with a single natural-language inbox where a team of specialised AI agents autonomously plans, executes, audits, and delivers outcomes, with human approval gates on anything high-impact. The constraint: no queue, no WebSocket, no polling — everything event-driven on Firestore.

Five critical engineering challenges:

Unbounded LLM cost and hallucinated outputs with no second pair of eyes: a dedicated auditor agent runs after every labor-unit output, with a retry cap of 3 before escalating for human review.

Heterogeneous LLM providers per agent: only Secret Manager secret IDs persist in Firestore; raw keys resolve at function runtime and are stripped before any agent config is served to the client.

Stale tasks on a serverless pipeline: a scheduled job runs every 12 hours, uses a collection-group query to find tasks stuck in 'executing' for longer than 9 minutes, and nudges the user.

Firestore is not a queue: idempotent triggers guarded by status transitions (pending → executing → awaiting_audit) prevent double-fired audits on concurrent onUpdate events.

Strict multi-tenant isolation with zero client-side trust: Firestore rules lock agents and system defaults to deny-all; every mutation flows through Cloud Functions using the Admin SDK.
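The stale-task check from the third challenge can be sketched as a pure filter over the query results. This is a minimal illustration, not the project's code: the TaskSnapshot shape, the executingSinceMs field, and the findStaleTasks name are assumptions; only the 'executing' status and the 9-minute threshold come from the case study.

```typescript
// Hypothetical task shape; field names are assumptions for illustration.
interface TaskSnapshot {
  id: string;
  status: "pending" | "executing" | "awaiting_audit";
  executingSinceMs: number; // epoch millis when the task entered 'executing'
}

// Threshold from the case study: stuck in 'executing' longer than 9 minutes.
const STALE_AFTER_MS = 9 * 60 * 1000;

// Pure filter a scheduled job could apply to the results of its
// collection-group query before nudging the user.
function findStaleTasks(tasks: TaskSnapshot[], nowMs: number): TaskSnapshot[] {
  return tasks.filter(
    (task) =>
      task.status === "executing" &&
      nowMs - task.executingSinceMs > STALE_AFTER_MS,
  );
}
```

Keeping the filter free of Firestore calls makes the threshold logic testable without an emulator; the scheduled function would only fetch the documents and pass in the current time.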

Project Goals

Kanban Frontend

A real-time Kanban SPA in React 19 + MUI 7 + Redux Toolkit, with @dnd-kit drag-and-drop and Firestore listeners — no WebSocket infrastructure.

Agent Orchestration

An event-driven Cloud Functions Gen 2 backend with 8 specialised agents, a plan-execute-audit loop, and tenant-scoped Secret Manager for LLM keys.

Skills Registry

A 29-skill registry that agents invoke for Gmail, Calendar, Telegram, WhatsApp, web search, content, and memory operations.

Our Solution

Three layers, one substrate. The frontend drops missions into Firestore. The orchestration backend plans, executes, and audits via triggers. The skills layer reaches into external tools through a dispatcher registry. Firestore is both the message bus and the real-time UI feed.

INTERFACE

▲

Kanban Frontend

Drag-and-drop missions, real-time agent feedback

Real-time Kanban (Intake / In Progress / Needs Approval / Done) via Firestore listeners

@dnd-kit for WCAG-compliant drag-and-drop (keyboard and screen-reader)

Outcome dashboard: hours saved, LLM cost, worker credits

Virtualised long lists via react-virtuoso; markdown via react-markdown

AI ORCHESTRATION

Agent Orchestration

Plan, execute, audit — no queue, no WebSocket

Ariv

Vima

Kapp

Theri

Sood

Koor

Punay

Valai

The orchestrator produces a JSON plan with human-in-the-loop flags; the auditor validates every output against that plan

5 Firestore triggers drive the pipeline end-to-end — zero queue infrastructure

Auditor veto loop retries up to 3 times before escalating for human review

A monitor agent scans for stale tasks every 12 hours and nudges the user
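The auditor veto loop above can be sketched as a bounded retry. This is an illustration under stated assumptions: the function and type names are hypothetical, the calls are synchronous for brevity (real LLM calls would be asynchronous), and the cap is read as three retries after the first attempt, which the source does not spell out.

```typescript
// Hypothetical verdict and outcome shapes; not the project's actual types.
type Verdict = { approved: boolean; reason?: string };

type Outcome<T> =
  | { status: "approved"; output: T; attempts: number }
  | { status: "needs_human_review"; lastOutput: T; attempts: number };

// Runs the worker, asks the auditor, and retries on a veto.
// Assumption: maxRetries counts retries AFTER the first attempt,
// so the worker runs at most maxRetries + 1 times.
function executeWithAudit<T>(
  execute: (attempt: number) => T,
  audit: (output: T) => Verdict,
  maxRetries = 3,
): Outcome<T> {
  for (let attempt = 1; ; attempt++) {
    const output = execute(attempt);
    if (audit(output).approved) {
      return { status: "approved", output, attempts: attempt };
    }
    if (attempt > maxRetries) {
      // Cap reached: hand the last output to a human instead of looping.
      return { status: "needs_human_review", lastOutput: output, attempts: attempt };
    }
  }
}
```

The hard cap is what bounds LLM spend per task: a worker that the auditor never approves costs a fixed number of calls before a person takes over.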

EXECUTION LAYER

🧩

Skills & Integrations

The hands that reach into external tools

Gmail

Calendar

Web Search

Content

Memory

29-skill registry with a dispatcher pattern — agents request skills, not services

Multi-provider LLM: Gemini planning, GPT-4o extraction, Claude Sonnet long-form

API keys live in Secret Manager, never in Firestore — only secret IDs persist

Gmail OAuth2, Telegram webhook, WhatsApp Cloud API — platform and per-tenant
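The dispatcher pattern behind the registry can be sketched as a map from skill names to handlers. The class, the skill name, and the handler signature below are hypothetical, and handlers are synchronous for brevity; the sketch only shows how an agent can request a skill by name without knowing which service sits behind it.

```typescript
// Hypothetical handler signature; real skills would likely be async.
type SkillHandler = (args: Record<string, unknown>) => string;

class SkillRegistry {
  private readonly handlers = new Map<string, SkillHandler>();

  // Registers a handler under a unique skill name.
  register(name: string, handler: SkillHandler): void {
    if (this.handlers.has(name)) {
      throw new Error(`Skill already registered: ${name}`);
    }
    this.handlers.set(name, handler);
  }

  // Looks up the skill by name and runs it; unknown names fail loudly.
  dispatch(name: string, args: Record<string, unknown>): string {
    const handler = this.handlers.get(name);
    if (!handler) {
      throw new Error(`Unknown skill: ${name}`);
    }
    return handler(args);
  }
}

// Illustrative registration; "calendar.createEvent" is an invented skill name.
const registry = new SkillRegistry();
registry.register(
  "calendar.createEvent",
  (args) => `event:${String(args["title"])}`,
);
```

Because agents depend only on skill names, a handler can be swapped (for example, platform credentials versus per-tenant credentials) without touching the agent that requests it.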

Effort Allocation

Business Logic (35%)

Codebase (22%)

Requirements & Spec (15%)

Infrastructure (12%)

Security (10%)

QA & Reliability (6%)

Infrastructure & Technologies

MESSAGE-BUS

🔥

Firestore (Native)

Chosen so writes become events and client listeners become pub/sub — one substrate covers the agent message bus and the real-time UI feed, with zero queue or WebSocket infrastructure.
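Because every write becomes an event, a trigger has to decide which writes it should act on. A minimal sketch of a status-transition guard follows, using the pending, executing, and awaiting_audit statuses named in the case study; the TaskDoc shape and the enteredStatus helper are assumptions, not the project's code.

```typescript
type TaskStatus = "pending" | "executing" | "awaiting_audit";

// Hypothetical document shape; only the status values come from the case study.
interface TaskDoc {
  status: TaskStatus;
  note?: string;
}

// True only when this update moved the task INTO the target status.
// Updates that touch other fields while the status stays put return false,
// so an update handler guarded by this check does not re-run its work.
function enteredStatus(
  before: TaskDoc,
  after: TaskDoc,
  target: TaskStatus,
): boolean {
  return before.status !== target && after.status === target;
}
```

An audit trigger would call enteredStatus(before, after, "awaiting_audit") and return early when it is false. The guard does not cover duplicate delivery of the same event; that would need a separate claim step, such as a transactional status write.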

COMPUTE

🟧

Cloud Functions Gen 2

Chosen because labor-unit agents chain LLM calls and fan out to external APIs for longer than 9 minutes. Gen 1's 540-second cap was a blocker; Gen 2's Cloud Run base allows up to 60 minutes for HTTP-triggered functions.

SECURITY

🔐

Secret Manager (GCP)

Chosen to keep LLM provider keys out of Firestore — only secret IDs persist, raw keys resolve at function runtime, and the API never ships them to the client.
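The split between stored, runtime, and client views of an agent config can be sketched as follows. All type and function names are hypothetical, and the injected readSecret function is a synchronous stand-in for a Secret Manager lookup. The sketch also withholds the secret ID from the client, which is an assumption; the source only states that raw keys are never shipped.

```typescript
// What persists in Firestore: a reference to the secret, never the key.
interface StoredAgentConfig {
  agentId: string;
  provider: string;
  model: string;
  secretId: string;
}

// What exists only in function memory while a request runs.
interface ResolvedAgentConfig extends StoredAgentConfig {
  apiKey: string;
}

// What the API returns to the browser (assumed allow-list).
interface ClientAgentConfig {
  agentId: string;
  provider: string;
  model: string;
}

// Stand-in for a Secret Manager read; injected so the sketch stays testable.
type SecretReader = (secretId: string) => string;

function resolveForRuntime(
  stored: StoredAgentConfig,
  readSecret: SecretReader,
): ResolvedAgentConfig {
  return { ...stored, apiKey: readSecret(stored.secretId) };
}

// Copies an explicit allow-list of fields, so a newly added sensitive
// field cannot leak to the client by default.
function toClientView(config: StoredAgentConfig): ClientAgentConfig {
  return {
    agentId: config.agentId,
    provider: config.provider,
    model: config.model,
  };
}
```

An allow-list is used instead of deleting known sensitive fields because it fails closed: anything not named explicitly stays on the server.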

INTEGRITY

📦

Shared Types Workspace

Chosen to eliminate type drift between backend and frontend — a single npm workspace package defines Mission, Task, AgentConfig, and Plan, so schema changes break compilation on both sides rather than failing silently at runtime.

VALIDATION

🛡️

Zod

Chosen because LLM output is unstructured by nature — every plan from the orchestrator and every verdict from the auditor is Zod-parsed at ingest, so malformed responses fail loud rather than poisoning the pipeline.
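What "fail loud at ingest" means can be shown with a dependency-free stand-in for the kind of check a Zod schema performs. This is not the project's schema: the Plan and PlanStep shapes and the requiresApproval flag are assumptions, loosely based on the JSON plan with human-in-the-loop flags described earlier.

```typescript
// Hypothetical plan shape for illustration only.
interface PlanStep {
  skill: string;
  requiresApproval: boolean; // assumed name for the human-in-the-loop flag
}

interface Plan {
  steps: PlanStep[];
}

// Parses raw LLM text into a Plan or throws; nothing half-valid gets through.
function parsePlan(raw: string): Plan {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on non-JSON output
  if (typeof data !== "object" || data === null) {
    throw new Error("Plan must be a JSON object");
  }
  const steps = (data as { steps?: unknown }).steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("Plan needs a non-empty steps array");
  }
  return {
    steps: steps.map((step: unknown, i: number): PlanStep => {
      if (typeof step !== "object" || step === null) {
        throw new Error(`Step ${i} must be an object`);
      }
      const s = step as { skill?: unknown; requiresApproval?: unknown };
      if (typeof s.skill !== "string" || s.skill.length === 0) {
        throw new Error(`Step ${i}: skill must be a non-empty string`);
      }
      if (typeof s.requiresApproval !== "boolean") {
        throw new Error(`Step ${i}: requiresApproval must be a boolean`);
      }
      // Rebuild the step so unexpected extra fields are dropped.
      return { skill: s.skill, requiresApproval: s.requiresApproval };
    }),
  };
}
```

A schema library expresses the same checks declaratively and derives the static type from the schema, which is the main reason to prefer it over hand-written guards like these.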

STATE

🔁

Redux Toolkit + RTK Query

Chosen because the Kanban holds complex cross-slice state (drag × status × auth × agent config) that useState and useContext couldn't keep coherent — a custom Firebase base query unifies REST calls and Firestore listeners behind one cache.

ACCESSIBILITY

🎯

@dnd-kit

Chosen because WCAG-compliant drag-and-drop (keyboard + screen-reader) was a hard requirement; react-dnd doesn't clear that bar.

🔀

Multi-Provider LLM

Chosen because no single model wins every role — Gemini 2.0 Flash plans fast, GPT-4o extracts structured data, Claude Sonnet handles long-form. Providers are selectable per agent per tenant, hot-swappable with no code change.
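Per-agent, per-tenant selection with no code change suggests that the choice lives in data. A minimal sketch, assuming a role-based default table and a tenant override map: the role names, provider labels, model identifier strings, and the selectModel helper are all illustrative, not the project's configuration format.

```typescript
type Role = "planning" | "extraction" | "long_form";
type Provider = "gemini" | "openai" | "anthropic";

interface ModelChoice {
  provider: Provider;
  model: string; // illustrative identifier, not a verified API model name
}

// Defaults mirror the pairing described in the case study.
const ROLE_DEFAULTS: Record<Role, ModelChoice> = {
  planning: { provider: "gemini", model: "gemini-2.0-flash" },
  extraction: { provider: "openai", model: "gpt-4o" },
  long_form: { provider: "anthropic", model: "claude-sonnet" },
};

// tenantId -> agentId -> override. In practice this would be loaded from
// configuration data, which is what makes a swap possible without a deploy.
type TenantOverrides = Map<string, Map<string, ModelChoice>>;

// A tenant-specific override for the agent wins; otherwise use the role default.
function selectModel(
  tenantId: string,
  agentId: string,
  role: Role,
  overrides: TenantOverrides,
): ModelChoice {
  return overrides.get(tenantId)?.get(agentId) ?? ROLE_DEFAULTS[role];
}
```

With this layout, changing a provider for one agent in one tenant is a single data write; other tenants keep the defaults.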
