Building a Team of Background Agents
We built a team of background agents this week. Two agents running in sandboxed containers: one doing market research and outreach, the other working through our coding backlog.
The execution layer is a file system agent. It reads tasks from files, writes findings back, and keeps a journal that persists across sessions. The sandboxes are swappable so we can optimize for whatever runtime makes sense as the tech evolves.
The agents
The market agent handles outreach. It researches prospects using our browser agent and xAI's tools, pulling from X posts and web search to build lead profiles with talking points. It generates personalized context for each prospect, then waits for me to approve and craft the final message before sending.
The code agent specializes in taking in feedback and improving the product. It reads the task list, picks what's ready, and ships PRs. Both agents share the sandbox and journal, so product and customer research blend naturally.
Sandbox
Agents need to run code. That's the whole point. But you can't give an AI direct access to your machine and hope for the best. If the agent hallucinates and tries to rm -rf /, you need that to fail safely.
This is why, to build effectively on tools like OpenClaw, you run everything inside a sandbox. The agent can create files, run scripts, control browsers, but it all happens in an isolated environment. Your host stays protected. The sandbox is the security boundary between what the agent wants to do and what it's actually allowed to touch.
The landscape is moving fast. E2B uses Firecracker microVMs with ~150ms cold starts. Modal gives you serverless containers with gVisor isolation. Fargate handles longer-running workloads with persistent storage. Each has tradeoffs: startup latency, session duration, cost per hour, how much state persists between runs.
We treat the sandbox as a swappable layer. Run on Modal when iterating fast, Fargate when files need to persist across sessions, and swap in whatever emerges next. The interface stays the same. By abstracting the runtime, we're not locked into any single provider as the tech evolves.
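Here's a minimal sketch of what that swappable layer might look like: a small protocol the agent codes against, with backends behind it. The LocalSandbox is a dev-only stand-in (a local subprocess is not a security boundary); the Modal and Fargate adapters we actually run would implement the same three methods against their SDKs.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Protocol


class Sandbox(Protocol):
    """The interface the agent codes against; backends are interchangeable."""

    def run(self, command: str) -> str: ...
    def read_file(self, path: str) -> str: ...
    def write_file(self, path: str, content: str) -> None: ...


class LocalSandbox:
    """Dev-only backend confined to a scratch directory. NOT a security
    boundary; in production this slot holds a Modal or Fargate adapter
    implementing the same three methods."""

    def __init__(self) -> None:
        self.root = Path(tempfile.mkdtemp(prefix="agent-"))

    def run(self, command: str) -> str:
        result = subprocess.run(
            command, shell=True, cwd=self.root,
            capture_output=True, text=True, timeout=60,
        )
        return result.stdout + result.stderr

    def read_file(self, path: str) -> str:
        return (self.root / path).read_text()

    def write_file(self, path: str, content: str) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
```

Because the agent only ever sees the protocol, moving from Modal to Fargate is a change at construction time, not in the agent.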
File system
The file system is the execution layer. The Unix philosophy of "everything is a file" turns out to be a great fit for agents. They're good at reading files and using bash tools, so we lean into that instead of building elaborate state management.
The agent keeps a continuous journal in the workspace. When context gets too large, we compact it into a summary and reset. Tasks, research, and progress all persist to the file system.
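The journal loop is simple enough to sketch. Assume a single journal.md in the workspace and a size threshold standing in for the context budget; the summarize helper is a placeholder for a call to whatever model you're running.

```python
from pathlib import Path

JOURNAL = Path("workspace/journal.md")   # hypothetical location
MAX_CHARS = 100_000                      # rough stand-in for a token budget


def summarize(text: str) -> str:
    """Placeholder: call your model here; truncation stands in for a summary."""
    return text[:2_000]


def compact() -> None:
    """Replace the full journal with a compacted summary, then reset."""
    summary = summarize(JOURNAL.read_text())
    JOURNAL.write_text("## Compacted summary\n" + summary + "\n")


def append_entry(entry: str) -> None:
    """Append a note; compact first if the journal has grown too large."""
    JOURNAL.parent.mkdir(parents=True, exist_ok=True)
    if JOURNAL.exists() and len(JOURNAL.read_text()) > MAX_CHARS:
        compact()
    with JOURNAL.open("a") as f:
        f.write(entry.rstrip() + "\n")
```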
Some patterns that help: let the agent start reading files immediately, even if a sync from the latest branch is still running. In a large repo, the incoming prompt probably isn't touching files that changed in the last 30 minutes. Block writes until sync is done, but let reads happen async. Move everything you can to the container build step. Run your app and test suite once during build so cached files exist for the second run. Pre-warm aggressively.
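One way to express the read/write split, assuming an asyncio agent loop; the sync task and event names are invented for illustration.

```python
import asyncio
from pathlib import Path

sync_done = asyncio.Event()  # set by the branch-sync task when it finishes


async def sync_branch() -> None:
    """Stand-in for pulling the latest branch into the workspace."""
    await asyncio.sleep(5)   # pretend the git fetch/checkout takes a while
    sync_done.set()


async def agent_read(path: str) -> str:
    """Reads start immediately; minutes-stale files are usually fine."""
    return Path(path).read_text()


async def agent_write(path: str, content: str) -> None:
    """Writes wait for the sync so the agent never clobbers incoming changes."""
    await sync_done.wait()
    Path(path).write_text(content)
```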
Queue follow-up prompts instead of inserting them mid-execution. We found it easier to manage, and it lets you send thoughts on next steps while the agent is still working on the current task. Build a way to stop the agent mid-run too.
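A minimal sketch of the queue-and-stop pattern, using a thread-safe queue and an event flag; the planner and executor here are placeholders, not our actual agent loop.

```python
import queue
import threading

prompts: queue.Queue[str] = queue.Queue()  # follow-ups wait here
stop = threading.Event()                   # set to halt the agent mid-run


def send_prompt(text: str) -> None:
    """Called from Slack or the CLI while the agent is busy; never interrupts."""
    prompts.put(text)


def plan(task: str) -> list[str]:
    """Placeholder planner: a real one would break the task into steps."""
    return [task]


def execute(step: str) -> None:
    """Placeholder executor for a single step."""
    print(f"executing: {step}")


def agent_loop() -> None:
    """Drain prompts in order, checking the stop flag between steps."""
    while not stop.is_set():
        task = prompts.get()       # block until the next queued prompt
        for step in plan(task):
            if stop.is_set():      # honor a mid-run stop between steps
                return
            execute(step)
```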
Coding agents keep getting better at reading files and making tool calls. Context windows keep expanding. Research on agentic file systems and dynamic context discovery points in the same direction. By betting on the file system, we get those improvements for free.
Tools
The agent shouldn't be isolated from your existing systems. Most of the value comes from wiring it into what you already have.
We connected ours to the journal, so the agent can pull context from what we've been thinking about. One example is YouTube processing: I love that my agents can learn new concepts from a video I share with them. Research compounds in our context graph instead of disappearing into chat history.
We made our APIs agent-friendly: expose them to the agent, document them in the system prompt, and let the agent figure out when to use each one. You're not building new tools, you're giving the agent access to tools you already built for yourself.
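One way to wire that up is a small registry that wraps each existing endpoint and renders its docs into the system prompt; the tool names, routes, and registry shape below are assumptions for illustration.

```python
# Hypothetical registry: each entry wraps an API the team already runs.
TOOLS = {
    "search_leads": {
        "doc": "POST /api/leads/search {query} -> prospect profiles",
        "fn": lambda query: {"results": []},   # call the real endpoint here
    },
    "append_journal": {
        "doc": "POST /api/journal {entry} -> appends to today's journal",
        "fn": lambda entry: {"ok": True},      # call the real endpoint here
    },
}


def system_prompt() -> str:
    """Render every tool's docs into the prompt; the model picks when to call."""
    lines = ["You can call these internal APIs:"]
    lines += [f"- {name}: {tool['doc']}" for name, tool in TOOLS.items()]
    return "\n".join(lines)
```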
This is where owning the infrastructure pays off. Off-the-shelf agents come with generic integrations. Your agent gets access to your specific systems, your specific context, the things that actually matter for how you work.
Modularity
Having every layer swappable is huge for the future. The sandbox can be Modal, AWS, or Vercel. Whatever performs best. The model can be Opus 6 or GPT-7. Tools plug in through the Atris OS API.
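One cheap way to keep every layer swappable is to name it in config rather than hard-code it; this sketch is illustrative, not the actual Atris wiring, and every value is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Each swappable layer is named in config, not in code."""
    sandbox: str = "modal"        # or "fargate", "vercel", ...
    model: str = "claude-opus"    # or whichever model ships next
    tools: list[str] = field(default_factory=lambda: [
        "search_leads", "append_journal",   # exposed via the tools API
    ])
```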
Some integrations we built from scratch because customization mattered more than saving time. The lead finder scrapes directories and constructs emails from patterns. The outreach flow chains research into email generation with specific formatting rules.
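The pattern-construction half of the lead finder is roughly this, assuming the directory scrape has already told you which convention a company uses; the pattern list shows common conventions, not a definitive set.

```python
# Common corporate email conventions; the inferred pattern comes from
# whatever examples the directory scrape surfaced for that company.
PATTERNS = {
    "first.last": "{first}.{last}@{domain}",
    "firstlast":  "{first}{last}@{domain}",
    "f.last":     "{f}.{last}@{domain}",
    "first":      "{first}@{domain}",
}


def construct_email(first: str, last: str, domain: str, pattern: str) -> str:
    """Fill the inferred pattern for one prospect; verify before sending."""
    first, last = first.lower(), last.lower()
    return PATTERNS[pattern].format(
        first=first, last=last, f=first[0], domain=domain,
    )


# construct_email("Ada", "Lovelace", "example.com", "first.last")
# -> "ada.lovelace@example.com"
```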
Customization matters more than cost here. If you're building agents for your own workflows, they should fit how you actually work.
Examples
Tag the agent in Slack: "Hey, can you take this feedback, log it into today's feedback file, and update the docs for that feature to address it." The agent reads the feedback, appends to the journal, finds the relevant docs, makes the edit, opens a PR. You review it in the morning.
Planning a trip to Louisiana for customer visits: "Find small business owners in the area we can talk to." The agent runs the market research flow, pulls from X and web search, builds a list of prospects with context on each one, saves it to the workspace. By the time you land, you have a shortlist with talking points.
This isn't hypothetical. The infrastructure is live and the agents are running. The only limit is what you connect them to.
Build your own
Every company should have some version of this, or at least work with someone to build it.
Think AI-first. Set up your context for the system before you write code. Design for modularity because a lot of code generated by agents today will get thrown away in the next few months as the models improve. That's fine. Don't over-invest in implementation details.
The most powerful question I asked while building this: "How can this be easier for you to use?" Genuinely ask your coding agent how the system could help it do better work. You'll get feedback you wouldn't think to ask for. The agent becomes a user with opinions about what's missing.
If this is something you want to build, email us and we can work something out.