Chukwudi's blog 👋🏽

Observable file systems for agents

Browser-based agent file system explorer (with macOS 10 skin)

For local agent development, a simulated shell environment offers faster feedback loops than traditional container- or virtualization-based approaches, while still providing sufficient isolation.

Over at Ije, I’ve been building rack88, an autonomous agent for [redacted].

The idea is that rack88 aggregates data from a set of data sources, generates a thesis based on a dialectic framework, and then autonomously makes a decision.

One key capability rack88 needs in order to get work done is the ability to transform and process textual data expressively, and then offload that context to a durable location.

A file system serves this purpose.

More generally, what we want here is a portable and light-weight sandbox environment, backed by a file system.

Windows XP file system

File systems allow the agent to move raw/summarised data, which has been processed into information within the context window, to a durable location.

This also allows the agent to utilise the notion of memory.

Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action — Wikipedia

By being able to query, retrieve and recall details of previous episodes (backed by a file system), we can begin to augment the agent’s memory.

Once the agent can read and write files within its sandbox, it also needs to be able to run arbitrary computations.

This capability allows the agent to run aggregations and basic mathematical operations without having to rely on the LLM’s reasoning capabilities.

Additionally, it would be inefficient to do numerical calculations in token space, regardless of the model.

A caveat to this is that there have been significant advancements in mathematical reasoning in LLMs, notably with models such as DeepSeekMath 7B.

However, I think these reasoning capabilities should be augmented by the agent’s Turing-complete shell environment, rather than the agent relying solely on its internal reasoning, which remains fallible.

We have 5 broad requirements for rack88’s sandbox environment:

  1. A portable and durable storage environment
    • We should be able to easily query and inspect the contents of the agent’s file system, along with its accumulated history and state.
    • It should be simple to move the agent’s file system across different hosts if needed.
  2. File system, network and process isolation
    • The agent should not have access to the host’s file system or network.
    • The agent should not be able to run networking commands within its shell environment, unless explicitly enabled.
    • This reduces the surface area of attack for a rogue/compromised agent.
  3. Light-weight execution environment
    • There should be minimal resource overhead required to (cold/warm) start and stop the sandbox environment. The key resources being disk space, RAM and CPU utilisation.
  4. A (nearly) Turing-complete execution environment
    • The agent should be able to run arbitrary computations and mathematical operations on data, particularly iterative/recursive operations.
  5. We should be able to observe the state of the agent’s file system and execution environment via a graphical user interface.
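
As a rough illustration of requirement 2, a command could be checked against a policy before it ever reaches the shell. This is a minimal sketch under my own assumptions: `SandboxPolicy`, `NETWORK_COMMANDS` and `isCommandAllowed` are hypothetical names, not part of rack88 or Just Bash, and a first-word match like this is only a convenience check, not a substitute for real network isolation.

```typescript
// Hypothetical policy object: none of these names come from rack88 or Just Bash.
interface SandboxPolicy {
  allowNetwork: boolean;
}

// Illustrative, non-exhaustive set of commands that open network connections.
const NETWORK_COMMANDS = new Set(["curl", "wget", "nc", "ssh", "ping"]);

// Returns false when any segment of the command line starts with a
// networking tool while networking is disabled. Splitting on |, ;, & and
// newlines means `ls | curl ...` is caught too. Command substitution such
// as $(curl ...) is NOT caught, which is why this is only a first filter.
function isCommandAllowed(command: string, policy: SandboxPolicy): boolean {
  if (policy.allowNetwork) return true;
  const segments = command.split(/[|;&\n]+/);
  return segments.every((segment) => {
    const firstWord = segment.trim().split(/\s+/)[0] ?? "";
    return !NETWORK_COMMANDS.has(firstWord);
  });
}
```

With networking disabled, `isCommandAllowed("ls /app | wget http://example.com", { allowNetwork: false })` returns `false`, while `ls /app && cat notes.md` passes.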

Technical Implementation

Overview of the agent file system

1. Building the sandbox environment

A few options which I considered for the sandbox environment were:

  1. Docker

    • The core abstraction here is the container image: an OCI-compliant bundle of layer tarballs which, when unpacked, forms the file system required to run the given application.
    • We get file system, network and process isolation (via Linux namespaces and cgroups), but containers still share the host system’s kernel.
    • It’s also easy to get a sandbox environment running with Docker containers.
    • However, Docker comes with the overhead of running the Docker daemon locally, as well as building container images.
  2. Firecracker

    • The core abstraction here is the microVM.
    • We get hardware-virtualization-based isolation, with a dedicated guest kernel per tenant/application/instance.
    • I found this to be interesting (it’s written in Rust too), but too ‘heavy-weight’ for this stage of the project.
  3. Just Bash + AgentFS

    • The core abstraction here is a simulated bash shell environment, implemented in TypeScript, backed by an SQLite based file system.
    • We get minimal process isolation, as the entire shell environment runs in-process, inside the host application. However, we still get file system and network isolation, since the shell can only reach the file system and network access that we explicitly give it.
    • The SQLite based file system means the entire file system is represented as a single file on disk, which makes it easy to move around.
    • It also means that we can query the file system using SQL, which makes it easier to audit the agent’s file system and state. AgentFS supports logging tool calls and their outputs, which we can then retrospectively query and review.
    • It’s also quite easy to get a sandbox environment running with Just Bash + AgentFS, exposed via a Bun HTTP server.
    • I found this to be the ideal solution, hitting the sweet spot between portability, isolation and ease of implementation.

Isolation vs portability trade-off

i. Durable storage environment

AgentFS implements a file system backed by a local SQLite database.

The entirety of the file system is represented on disk as a single .db file.

This satisfies our portability requirement, as we can easily move the agent’s environment between hosts if we wish to.

To contrast this with a Docker container, we could mount a persistent volume on an agent’s container, which would also give us access to the agent’s file system/records.

However, we wouldn’t get the added benefit of the interoperable Node.js file system interface implementation which AgentFS provides.

The existence of this interface means that we can use AgentFS within any Node.js context that expects a file system interface.
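
To illustrate why a shared interface matters, here is a minimal sketch of code written against an interface rather than a concrete file system. `MinimalFs`, `InMemoryFs` and `appendNote` are hypothetical names used only for this example; the actual interface AgentFS implements is richer and is not reproduced here.

```typescript
// Hypothetical, deliberately tiny subset of a file system interface.
interface MinimalFs {
  readFile(path: string): Promise<string>;
  writeFile(path: string, data: string): Promise<void>;
}

// In-memory stand-in. A SQLite-backed implementation could satisfy
// the same interface without the caller changing.
class InMemoryFs implements MinimalFs {
  private files = new Map<string, string>();

  async readFile(path: string): Promise<string> {
    const data = this.files.get(path);
    if (data === undefined) throw new Error(`ENOENT: ${path}`);
    return data;
  }

  async writeFile(path: string, data: string): Promise<void> {
    this.files.set(path, data);
  }
}

// The caller depends only on the interface, not on where bytes are stored.
async function appendNote(fs: MinimalFs, path: string, note: string): Promise<string> {
  let existing = "";
  try {
    existing = await fs.readFile(path);
  } catch {
    // File does not exist yet, so start from empty content.
  }
  const updated = existing + note + "\n";
  await fs.writeFile(path, updated);
  return updated;
}
```

Swapping `InMemoryFs` for another implementation requires no change to `appendNote`, which is the property that makes an interface-compatible file system easy to drop into existing tooling.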

ii. Nearly Turing-complete execution environment

Just Bash provides a TypeScript-based simulated bash shell environment, containing a large number of Linux tools re-implemented in TypeScript.

Importantly, Just Bash contains a Python interpreter, which means that we can run arbitrary computations within this shell environment.

Just Bash also supports using a custom file system such as the one provided by AgentFS.

We can wire up both components together like so:

import { agentfs } from "agentfs-sdk/just-bash";
import { Bash } from "just-bash";

const agentFS = await agentfs({ id: "rack88-agent" });

const shell = new Bash({
    fs: agentFS,
    cwd: "/app",
    python: true,
});

iii. A means of observing the state of the file system and execution environment

Once we have a persistent file system and a Turing-complete environment (bash + Python), we’ll need a way to view the agent’s outputs (files, scripts, etc.).

To solve for this, we can expose the shell environment via an HTTP server, which will be called both by us and by our agent harness (next section).

I’ve set up the shell HTTP server using Bun, as it gives us out-of-the-box TypeScript support, great primitives and a neat developer experience (DX).

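// Runs the requested command in the sandboxed shell and returns the result as JSON.
async function execHandler(req: Request): Promise<Response> {
  const body = await req.json();
  if (typeof body?.command !== "string") {
    return Response.json({ error: "expected a JSON body with a string `command`" }, { status: 400 });
  }
  // Assumes exec resolves to { stdout, stderr, exitCode }; the exit code is
  // renamed here to match the exit_code field in the response shown below.
  const { stdout, stderr, exitCode } = await shell.exec(body.command);
  return Response.json({ stdout, stderr, exit_code: exitCode });
}

const server = Bun.serve({
  port: 3000,
  routes: {
    // Only accept POST, since the command is sent in the request body.
    "/exec": { POST: execHandler },
  },
});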

We can test this with curl; the request below lists the files at /app, the shell’s working directory:

curl localhost:3000/exec -H "Content-Type: application/json" -d '{"command": "ls /app"}'

{"stdout":"memory\nresearch\nscratch\n","stderr":"","exit_code":0}
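
The same request can be made from TypeScript. The sketch below is one possible client, not the one rack88 uses: `ExecResult` mirrors the fields in the curl response above, while `FetchLike` and `runCommand` are hypothetical names. The fetch function is passed in as an argument so that the sketch can be exercised with a stub instead of a live server.

```typescript
// Response shape, taken from the curl output above.
interface ExecResult {
  stdout: string;
  stderr: string;
  exit_code: number;
}

// Minimal structural type for a fetch-like function (hypothetical helper type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends one command to the sandbox server and returns the parsed result.
// `baseUrl` defaults to the port used in the server snippet.
async function runCommand(
  fetchImpl: FetchLike,
  command: string,
  baseUrl = "http://localhost:3000",
): Promise<ExecResult> {
  const res = await fetchImpl(`${baseUrl}/exec`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ command }),
  });
  if (!res.ok) throw new Error(`sandbox returned HTTP ${res.status}`);
  // The server is trusted to return this shape; a stricter client would validate it.
  return (await res.json()) as ExecResult;
}
```

In a browser, Bun or recent Node.js, the global `fetch` can be passed directly, e.g. `await runCommand(fetch, "ls /app")`.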

2. Exposing the environment as a tool for the agent

The rack88 agent harness is written in Rust. It supports calling arbitrary tools with the interface:

use async_trait::async_trait;

#[async_trait]
pub trait Tool {
    type Input;
    type Output;

    // name returns the name of the tool
    fn name(&self) -> String;

    // get_definition returns the tool's definition, including its
    // input parameters in a JSON schema format
    fn get_definition(&self) -> types::ToolDef;

    // parse_input parses an input string from an LLM into
    // the tool's input type.
    fn parse_input(&self, input: String) -> Result<Self::Input, String>;

    // call invokes the tool with the given input and returns its output.
    async fn call(&self, input: Self::Input) -> Result<Self::Output, String>;
}

This interface describes a Tool which has an arbitrary input schema and output schema, and it can parse string inputs into its expected structured input type.

Importantly, a Tool can be called with its input type, and return its output type.

Given this interface, we can define our bash tool as follows:

// BashTool provides the LLM with access to bash script execution
#[derive(Clone)]
pub struct BashTool {
    // sandbox_client makes HTTP requests to the bash shell HTTP server
    sandbox_client: sandbox::Client,
}

impl BashTool {
    pub fn new() -> Result<BashTool, String> {
        let sandbox_client = sandbox::Client::new()?;
        Ok(BashTool { sandbox_client })
    }
}

#[async_trait]
impl Tool for BashTool {
    type Input = sandbox::BashRequest;
    type Output = sandbox::BashResponse;

    fn name(&self) -> String {
        String::from("bash")
    }

    fn parse_input(&self, input_str: String) -> Result<Self::Input, String> {
        let input: Self::Input = serde_json::from_str(&input_str)
            .map_err(|e| format!("failed to parse bash tool input: {}", e))?;
        Ok(input)
    }

    async fn call(&self, input: Self::Input) -> Result<Self::Output, String> {
        let response = self.sandbox_client.run_bash(input).await?;
        Ok(response)
    }

    fn get_definition(&self) -> types::ToolDef {
        types::ToolDef {
            tool_type: String::from("function"),
            function: types::FunctionToolDef {
                name: String::from("bash"),
                description: String::from(
                    "Executes a bash command in a persistent sandbox session. The session persists across calls — environment variables, files written, and working directory all carry over between invocations. Use this tool liberally and expressively.\n\nCapabilities:\n- Full bash scripting: pipes, loops, conditionals, heredocs\n- Python 3 is available for ad-hoc data analysis: `python3 -c '...'` or write a script and run it\n- jq is available for JSON wrangling: parse, filter, transform JSON data from tool outputs or files\n- File system: read, write, and process files across calls. Prefer reading large data progressively (e.g. `head`, `tail`, `grep`, `sed`) rather than all at once.\n\nPrefer bash over asking the user to provide data. If you need to analyse JSON, pipe it through jq. If you need to process a file, read it with bash. If you need to do arithmetic or data analysis, use Python. Treat the sandbox filesystem as your working scratchpad — write intermediate results to files and read them back in later calls.",
                ),
                parameters: types::FunctionToolDefParameters {
                    parameter_type: String::from("object"),
                    required: vec![String::from("command")],
                    properties: serde_json::json!({
                        "command": {
                            "type": "string",
                            "description": "The command to execute in the sandbox"
                        }
                    }),
                },
            },
        }
    }
}

3. Building a user interface for the file system

User interface client interaction with the shell environment

Once we have a server backing the file system and shell environment, as well as the agent interacting with the environment (creating files, running scripts), we can build a user interface for the file system.

We can have the user interface client make an HTTP request to the Bun sandbox server in order to retrieve the contents of the file system.

For example, opening a directory in the user interface dispatches an ls dir/ request from the client to the server, which runs the command in the shell environment and returns the contents of the directory.

The client then renders the contents of the server’s response (i.e. the directory contents) in the user interface.
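
On the client side, this round trip boils down to building a command string and splitting the returned stdout into entries. The helpers below are a hypothetical sketch (`lsCommand` and `parseLsOutput` are names I made up for this example), assuming the server returns plain `ls` output with one name per line, as in the earlier curl response.

```typescript
// Builds the command for a directory, single-quoting the path so that
// spaces and shell metacharacters are passed through literally.
function lsCommand(dir: string): string {
  const quoted = "'" + dir.replace(/'/g, "'\\''") + "'";
  return `ls ${quoted}`;
}

// Turns `ls` stdout into a list of entry names. Plain `ls` output does
// not distinguish files from directories, so only names are returned.
function parseLsOutput(stdout: string): string[] {
  return stdout.split("\n").filter((name) => name.length > 0);
}
```

For the response shown earlier, `parseLsOutput("memory\nresearch\nscratch\n")` returns `["memory", "research", "scratch"]`, which the client can then render as icons or rows.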

This makes it much easier and more ergonomic to inspect the contents of the file system and review the agent’s findings.

I got Claude Code to implement macOS 10 and Windows XP skins for the user interface, along with a terminal interface.

This means that we can dispatch commands within the user interface via a browser-based terminal, similar to a regular local terminal:

🎉 Running shell commands with the browser-based terminal

Under the hood, each time we send a command via the browser terminal, the frontend client simply sends an HTTP request to the Bun sandbox server, which runs the command in the shell environment and returns the results.

This means that we can explore the agent’s file system as though we were navigating a regular file system.

The familiar (and nostalgic) user interface allows us to retain our existing intuition about navigating file systems, while providing a modern and ergonomic experience for working with the agent’s file system.

Viewing markdown text files in the browser file system (Windows XP skin)

4. Testing the agent

Here’s an example turn with the agent, in which it writes and runs a Python script to perform a simple calculation:

Prompting the rack88 agent to call the bash tool
rack88's response after calling the bash tool
The result of the rack88 agent's tool call in the file system

Takeaways

This was quite engaging to work on and I’m excited to see how the agent sandbox space progresses, particularly in local and multi-tenant remote environments.

Bash might be all you need for your sufficiently-capable agent to crunch numbers, run scripts, and navigate file systems.

The current rack88 agent sandbox implementation is geared towards local development in trusted environments.

In a production environment, a few changes which I would make include:

  • Encrypted database files on disk: The SQLite database files should ideally be encrypted at rest with a key controlled by the customer.
  • Access control: OAuth2/RBAC to ensure that commands can only be sent to a given sandbox environment by authenticated and authorized agents/clients.
  • Metrics for the file system server: Latency and throughput metrics to detect outlier commands.
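
As a sketch of what detecting outlier commands could look like, the helper below flags commands whose duration is far above the median. Everything here is an assumption for illustration: `CommandTiming`, `median` and `findOutliers` are hypothetical names, and the "5 times the median" rule is an arbitrary starting point rather than a recommendation.

```typescript
// One recorded command execution. Field names are illustrative.
interface CommandTiming {
  command: string;
  durationMs: number;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns the commands whose duration exceeds `factor` times the median.
function findOutliers(timings: CommandTiming[], factor = 5): CommandTiming[] {
  if (timings.length === 0) return [];
  const threshold = factor * median(timings.map((t) => t.durationMs));
  return timings.filter((t) => t.durationMs > threshold);
}
```

The timings themselves could be collected by wrapping the exec handler with `Date.now()` calls before and after each command; a production setup would more likely export them to a dedicated metrics system.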

I would also explore Firecracker for production-grade isolation.

References

  1. AgentFS + Just Bash
  2. Fly.io: Docker without Docker
  3. Fly.io: Sandboxing and workload isolation
  4. Firecracker: Lightweight Virtualization for Serverless Applications
  5. Claude Code Sandboxing
  6. OCI specification
  7. RyOS