Introduction
AeorDB is a content-addressed file database that treats your data as a filesystem, not as tables and rows. Store any file at any path, query structured fields with sub-millisecond lookups, and version everything with Git-like snapshots and forks – all from a single binary with zero external dependencies.
What Makes AeorDB Different
Content-addressed storage with BLAKE3. Every piece of data is identified by its cryptographic hash. This gives you built-in deduplication, integrity verification, and a Merkle tree that makes versioning essentially free.
A filesystem, not a schema. Data lives at paths like /users/alice.json and /docs/reports/q1.pdf, organized into directories. No schemas to define, no migrations to run. Store JSON, images, PDFs, or raw bytes – the engine handles them all the same way.
Built-in versioning. Create named snapshots, fork your database into isolated branches, diff between any two versions, and export/import self-contained .aeordb files. The content-addressed Merkle tree means historical reads resolve the exact data at the time of the snapshot, not the latest overwrite.
WASM plugin system. Extend the database with WebAssembly plugins for two purposes: parser plugins that extract structured fields from non-JSON files (PDFs, images, XML) for indexing, and query plugins that run custom logic directly at the data layer. Plugins execute in a sandboxed WASM runtime with configurable memory limits.
Native HTTP API. AeorDB exposes its full API over HTTP – no separate proxy, no client library required. Store files with PUT, read them with GET, query with POST /query, and manage versions with the /version/* endpoints. Any HTTP client works.
Embeddable. A single aeordb binary with no external dependencies. Point it at a .aeordb file and you have a running database. Like SQLite, but for files with versioning and a built-in HTTP server.
Lock-free concurrent reads. The engine uses snapshot double-buffering via ArcSwap so readers never block writers and never see partial state. Queries routinely complete in under a millisecond.
Key Features
- Storage: Append-only WAL file, content-addressed BLAKE3 hashing, automatic zstd compression, 256KB chunking for dedup
- Indexing: Scalar bucketing (NVT) with u64, i64, f64, string, timestamp, trigram, phonetic/soundex/dmetaphone index types
- Querying: JSON query API with boolean logic (`and`, `or`, `not`), comparison operators, sorting, pagination, projections, and aggregations
- Versioning: Snapshots, forks, diff/patch, export/import as self-contained `.aeordb` files
- Plugins: WASM parser plugins for any file format, WASM query plugins for custom data-layer logic
- Operations: Background task system, cron scheduler, garbage collection, automatic reindexing
- Auth: Self-contained JWT auth, API keys, user/group management, path-level permissions, or `--auth false` for local use
- Observability: Prometheus metrics at `/admin/metrics`, SSE event stream at `/events/stream`, structured logging
Next Steps
- Installation – build from source
- Quick Start – store, query, and version data in 5 minutes
- Architecture – how the engine works under the hood
Installation
AeorDB is built from source using the Rust toolchain. There are no external dependencies – the binary is fully self-contained.
Prerequisites
- Rust (stable toolchain, 1.75+)
Install Rust if you don’t have it:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Build from Source
Clone the repository and build in release mode:
git clone https://github.com/AeorDB/aeordb.git
cd aeordb
cargo build --release
The binary is located at:
target/release/aeordb
Verify the Build
./target/release/aeordb --help
You should see:
AeorDB command-line interface
Usage: aeordb <COMMAND>
Commands:
start Start the database server
stress Run stress tests against a running instance
emergency-reset Emergency reset: revoke the current root API key and generate a new one
export Export a version as a self-contained .aeordb file
diff Create a patch .aeordb containing only the changeset between two versions
import Import an export or patch .aeordb file into a target database
promote Promote a version hash to HEAD
gc Run garbage collection to reclaim unreachable entries
help Print this message or the help of the given subcommand(s)
Optional: Add to PATH
# Copy to a location in your PATH
sudo cp target/release/aeordb /usr/local/bin/aeordb
# Or symlink
sudo ln -s "$(pwd)/target/release/aeordb" /usr/local/bin/aeordb
No External Dependencies
AeorDB does not require:
- A separate database process (it IS the process)
- Runtime libraries or shared objects
- Configuration files (sensible defaults for everything)
- Docker, containers, or orchestration
The single binary is all you need. Point it at a file path and it creates the database on first run.
Next Steps
- Quick Start – start the server and store your first file
Quick Start
This tutorial walks you through the core operations: storing files, querying, creating snapshots, and cleaning up. All examples use curl and assume auth is disabled for simplicity.
1. Start the Server
aeordb start --database mydb.aeordb --port 3000 --auth false
You should see log output indicating the server is listening on port 3000. The mydb.aeordb file is created automatically if it does not exist.
2. Store a File
Store a JSON file at the path /users/alice.json:
curl -X PUT http://localhost:3000/engine/users/alice.json \
-H "Content-Type: application/json" \
-d '{"name":"Alice","age":30,"city":"Portland"}'
Expected response:
{
"status": "created",
"path": "/users/alice.json",
"hash": "a1b2c3d4..."
}
Store a few more files to have data to query:
curl -X PUT http://localhost:3000/engine/users/bob.json \
-H "Content-Type: application/json" \
-d '{"name":"Bob","age":25,"city":"Seattle"}'
curl -X PUT http://localhost:3000/engine/users/carol.json \
-H "Content-Type: application/json" \
-d '{"name":"Carol","age":35,"city":"Portland"}'
3. Read a File
curl http://localhost:3000/engine/users/alice.json
Expected response:
{"name":"Alice","age":30,"city":"Portland"}
4. List a Directory
Append a trailing slash to list directory contents:
curl http://localhost:3000/engine/users/
Expected response:
{
"path": "/users/",
"entries": [
{"name": "alice.json", "type": "file"},
{"name": "bob.json", "type": "file"},
{"name": "carol.json", "type": "file"}
]
}
5. Add an Index
To query fields, you need to tell AeorDB which fields to index. Store an index configuration at .config/indexes.json inside the directory:
curl -X PUT http://localhost:3000/engine/users/.config/indexes.json \
-H "Content-Type: application/json" \
-d '{
"indexes": [
{"name": "age", "type": "u64"},
{"name": "city", "type": "string"},
{"name": "name", "type": ["string", "trigram"]}
]
}'
This tells the engine to index the age field as a 64-bit unsigned integer, city as an exact string, and name as both an exact string and a trigram (for fuzzy matching). Existing files in the directory are automatically reindexed in the background.
6. Query
Query for users older than 28 in Portland:
curl -X POST http://localhost:3000/query \
-H "Content-Type: application/json" \
-d '{
"path": "/users/",
"where": {
"and": [
{"field": "age", "op": "gt", "value": 28},
{"field": "city", "op": "eq", "value": "Portland"}
]
}
}'
Expected response:
{
"results": [
{"name": "Alice", "age": 30, "city": "Portland"},
{"name": "Carol", "age": 35, "city": "Portland"}
],
"total_count": 2
}
Query Operators
| Operator | Description | Example |
|---|---|---|
| eq | Equals | {"field": "city", "op": "eq", "value": "Portland"} |
| gt | Greater than | {"field": "age", "op": "gt", "value": 25} |
| gte | Greater than or equal | {"field": "age", "op": "gte", "value": 25} |
| lt | Less than | {"field": "age", "op": "lt", "value": 30} |
| lte | Less than or equal | {"field": "age", "op": "lte", "value": 30} |
| between | Range (inclusive) | {"field": "age", "op": "between", "value": [25, 35]} |
| fuzzy | Trigram fuzzy match | {"field": "name", "op": "fuzzy", "value": "Alce"} |
| phonetic | Phonetic match | {"field": "name", "op": "phonetic", "value": "Karrol"} |
7. Create a Snapshot
Save the current state as a named snapshot:
curl -X POST http://localhost:3000/version/snapshot \
-H "Content-Type: application/json" \
-d '{"name": "v1"}'
Expected response:
{
"status": "created",
"name": "v1",
"root_hash": "e5f6a7b8..."
}
You can list all snapshots:
curl http://localhost:3000/version/snapshots
8. Delete a File
curl -X DELETE http://localhost:3000/engine/users/alice.json
Expected response:
{
"status": "deleted",
"path": "/users/alice.json"
}
The file is removed from the current state (HEAD), but the snapshot v1 still contains it. You can restore the snapshot to get it back.
9. Run Garbage Collection
Over time, deleted and overwritten data accumulates in the database file. Run GC to reclaim unreachable entries:
curl -X POST http://localhost:3000/admin/gc
Expected response:
{
"versions_scanned": 2,
"live_entries": 15,
"garbage_entries": 3,
"reclaimed_bytes": 1024,
"duration_ms": 12,
"dry_run": false
}
To preview what would be collected without actually deleting:
curl -X POST "http://localhost:3000/admin/gc?dry_run=true"
Next Steps
- Configuration – CLI flags, auth modes, CORS, index config
- Architecture – understand the storage engine internals
- Versioning – snapshots, forks, diff/patch, export/import
- Indexing – index types, multi-strategy fields, WASM parsers
Configuration
AeorDB is configured through CLI flags at startup and through configuration files stored inside the database itself.
CLI Flags
aeordb start [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --port, -p | 3000 | HTTP listen port |
| --database, -D | data.aeordb | Path to the database file (created if it does not exist) |
| --auth | self-contained | Auth provider URI (see Auth Modes) |
| --hot-dir | database parent dir | Directory for write-ahead hot files (crash recovery journal) |
| --cors | disabled | CORS allowed origins (see CORS) |
| --log-format | pretty | Log output format: pretty or json |
Examples
# Minimal: local development with no auth
aeordb start --database dev.aeordb --port 8080 --auth false
# Production: custom port, explicit hot directory, CORS for your frontend
aeordb start \
--database /var/lib/aeordb/prod.aeordb \
--port 443 \
--hot-dir /var/lib/aeordb/hot \
--cors "https://myapp.com,https://admin.myapp.com" \
--log-format json
Auth Modes
The --auth flag controls how authentication works:
| Value | Behavior |
|---|---|
| false (or null, no, 0) | Auth disabled – all requests are allowed without tokens. Use for local development only. |
| (omitted) | Self-contained mode (default). AeorDB manages its own users, API keys, and JWT tokens. A root API key is printed on first startup. |
| file:///path/to/identity.json | External identity file. AeorDB loads cryptographic keys from the specified file. A bootstrap API key is generated on first use. |
Self-Contained Auth (Default)
On first startup, AeorDB creates an internal identity store and prints a root API key:
Root API key: aeor_ak_7f3b2a1c...
Use this key to create additional users and API keys via the admin API. If you lose the root key, use aeordb emergency-reset to generate a new one.
Obtaining a JWT Token
# Exchange API key for a JWT token
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "aeor_ak_7f3b2a1c..."}'
# Use the token for subsequent requests
curl http://localhost:3000/engine/users/ \
-H "Authorization: Bearer eyJhbG..."
CORS
Global CORS via CLI
The --cors flag sets allowed origins for all routes:
# Allow all origins
aeordb start --cors "*"
# Allow specific origins (comma-separated)
aeordb start --cors "https://myapp.com,https://admin.myapp.com"
Without --cors, no CORS headers are sent and cross-origin browser requests will fail.
Per-Path CORS
For fine-grained control, store a /.config/cors.json file in the database:
curl -X PUT http://localhost:3000/engine/.config/cors.json \
-H "Content-Type: application/json" \
-d '{
"rules": [
{
"path": "/public/",
"origins": ["*"],
"methods": ["GET", "HEAD"],
"headers": ["Content-Type"]
},
{
"path": "/api/",
"origins": ["https://myapp.com"],
"methods": ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"],
"headers": ["Content-Type", "Authorization"]
}
]
}'
Per-path rules are checked first. If no rule matches the request path, the global --cors setting applies.
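The resolution order can be sketched in a few lines. This is an illustrative model, not AeorDB's implementation: prefix matching on the rule's `path` is an assumption (the docs only say a rule applies to requests under its path), and the function and variable names are hypothetical.

```python
# Sketch of CORS rule resolution: per-path rules first, global fallback second.
# Prefix matching on rule paths is an assumption for illustration.
RULES = [
    {"path": "/public/", "origins": ["*"]},
    {"path": "/api/",    "origins": ["https://myapp.com"]},
]
GLOBAL_ORIGINS = ["https://admin.myapp.com"]   # from the --cors flag

def allowed_origins(request_path: str) -> list[str]:
    for rule in RULES:                          # per-path rules checked first
        if request_path.startswith(rule["path"]):
            return rule["origins"]
    return GLOBAL_ORIGINS                       # no match: global setting applies

assert allowed_origins("/public/logo.png") == ["*"]
assert allowed_origins("/private/x") == ["https://admin.myapp.com"]
```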
Index Configuration
Indexes are configured per-directory by storing a .config/indexes.json file under the directory path. When this file changes, the engine automatically triggers a background reindex of all files in that directory.
{
"indexes": [
{"name": "title", "type": "string"},
{"name": "age", "type": "u64"},
{"name": "email", "type": ["string", "trigram"]},
{"name": "created", "type": "timestamp"}
]
}
Index Types
| Type | Description | Use Case |
|---|---|---|
| u64 | Unsigned 64-bit integer | Counts, IDs, sizes |
| i64 | Signed 64-bit integer | Temperatures, offsets, balances |
| f64 | 64-bit floating point | Coordinates, measurements, scores |
| string | Exact string match | Categories, statuses, enum values |
| timestamp | UTC millisecond timestamp | Date ranges, temporal queries |
| trigram | Trigram-based fuzzy text | Typo-tolerant search, substring matching |
| phonetic | Phonetic matching (Soundex) | Name search (“Smith” matches “Smyth”) |
| soundex | Soundex encoding | Alternative phonetic matching |
| dmetaphone | Double Metaphone | Multi-cultural phonetic matching |
Multi-Strategy Indexes
A single field can have multiple index types. Specify type as an array:
{"name": "title", "type": ["string", "trigram", "phonetic"]}
This creates three index files (title.string.idx, title.trigram.idx, title.phonetic.idx) from the same source field. Use the appropriate query operator to target each index type.
Source Resolution
By default, the index name is used as the JSON field name. For nested fields or parser output, use the source array:
{
"parser": "pdf-extractor",
"indexes": [
{"name": "title", "source": ["metadata", "title"], "type": "string"},
{"name": "author", "source": ["metadata", "author"], "type": ["string", "trigram"]},
{"name": "page_count", "source": ["metadata", "pages"], "type": "u64"}
]
}
See Indexing & Queries for the full indexing reference.
Cron Configuration
Schedule recurring background tasks by storing /.config/cron.json:
curl -X PUT http://localhost:3000/engine/.config/cron.json \
-H "Content-Type: application/json" \
-d '{
"schedules": [
{
"id": "weekly-gc",
"task_type": "gc",
"schedule": "0 3 * * 0",
"args": {},
"enabled": true
},
{
"id": "nightly-reindex",
"task_type": "reindex",
"schedule": "0 2 * * *",
"args": {"path": "/data/"},
"enabled": true
}
]
}'
The schedule field uses standard 5-field cron syntax: minute hour day_of_month month day_of_week. Cron schedules can also be managed via the HTTP API at /admin/cron.
Compression
AeorDB uses zstd compression automatically when configured. To enable compression for a directory, add the compression field to the index config:
{
"compression": "zstd",
"indexes": [
{"name": "title", "type": "string"}
]
}
Auto-Detection
When compression is enabled, the engine applies heuristics to decide whether to actually compress each file:
- Files smaller than 500 bytes are stored uncompressed (header overhead negates savings)
- Already-compressed formats (JPEG, PNG, MP4, ZIP, etc.) are stored uncompressed
- Text, JSON, XML, and other compressible types are compressed with zstd
Compression is transparent – reads automatically decompress. The content hash is always computed on the raw uncompressed data, so deduplication works regardless of compression settings.
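The auto-detection heuristic above can be sketched as a small predicate. This is an illustrative model, not AeorDB's code: the function name `should_compress` and the exact magic-byte list are assumptions; only the 500-byte threshold and the "skip already-compressed formats" rule come from the docs.

```python
# Illustrative sketch of the compression auto-detection heuristic.
# Magic-byte prefixes of formats that are already compressed (partial list).
ALREADY_COMPRESSED = [
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"PK\x03\x04",          # ZIP
]

def should_compress(data: bytes) -> bool:
    """Return True if zstd compression is likely to pay off."""
    if len(data) < 500:                      # header overhead negates savings
        return False
    if any(data.startswith(m) for m in ALREADY_COMPRESSED):
        return False                         # already-compressed format
    return True                              # text/JSON/XML etc. compress well

assert not should_compress(b'{"k": 1}')              # too small
assert should_compress(b"a" * 4096)                  # compressible text
```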
Architecture
AeorDB is a single-file database built on an append-only write-ahead log (WAL). The database file contains all data, indexes, and metadata in one place. Understanding the architecture helps you reason about performance, recovery, and versioning behavior.
High-Level Overview
aeordb start
|
+-------+-------+
| HTTP Server |
| (axum) |
+-------+-------+
|
+--------------+--------------+
| | |
+-----+----+ +-----+----+ +------+------+
| Query | | Plugin | | Version |
| Engine | | Manager | | Manager |
+-----+----+ +-----+----+ +------+------+
| | |
+--------------+--------------+
|
+--------+--------+
| Storage Engine |
| (StorageEngine) |
+--------+--------+
|
+--------------+--------------+
| | |
+-----+----+ +-----+----+ +------+------+
| Append | | KV Store | | NVT |
| Writer | | (.kv) | | (in-memory) |
+----------+ +----------+ +-------------+
| |
+--------------+
|
[ mydb.aeordb ] <-- single file on disk
[ mydb.aeordb.kv ] <-- KV index file
The Database File (.aeordb)
The .aeordb file is an append-only WAL. Every write appends a new entry to the end of the file. Entries are never modified in place (except during garbage collection).
File Layout
[File Header - 256 bytes]
Magic: "AEOR"
Hash algorithm, timestamps, KV/NVT pointers, HEAD hash, entry count
[Entry 1] [Entry 2] [Entry 3] ... [Entry N]
Chunks, FileRecords, DirectoryIndexes, Snapshots, DeletionRecords, Voids
The 256-byte file header contains pointers to the KV block, NVT, and the current HEAD hash. Every entry carries its own header with magic bytes, type tag, hash algorithm, compression flag, key, and value.
Entry Types
| Type | Purpose |
|---|---|
| Chunk | Raw file data (256KB blocks) |
| FileRecord | File metadata + ordered list of chunk hashes |
| DirectoryIndex | Directory contents (child entries with hashes) |
| Snapshot | Named point-in-time version reference |
| DeletionRecord | Marks a file as deleted (for version history completeness) |
| Void | Free space marker (reclaimable by future writes) |
The KV Index File (.aeordb.kv)
The KV store is a sorted array of (hash, offset) pairs stored in a separate file. It maps content hashes to byte offsets in the main .aeordb file, providing O(1) lookups when combined with the NVT.
Each entry is hash_length + 8 bytes (40 bytes for BLAKE3-256). The entries are sorted by hash, and the NVT tells you which bucket to look in, so lookups are a single seek + small scan.
KV Resize
When the KV store needs to grow, the engine enters a brief resize mode:
- A temporary buffer KV store is created
- New writes go to the buffer (no blocking)
- The primary KV store is expanded
- Buffer contents are merged into the primary
- Buffer is discarded
Writes never block during resize.
NVT (Normalized Vector Table)
The NVT is an in-memory structure that provides fast hash-to-bucket lookups for the KV store.
How It Works
- Normalize the hash to a scalar: `first_8_bytes_as_u64 / u64::MAX` produces a value in [0.0, 1.0]
- Map the scalar to a bucket: `bucket_index = floor(scalar * num_buckets)`
- The bucket points to a range in the KV store – scan that range for the exact hash
BLAKE3 hashes are uniformly distributed, so buckets stay balanced without manual tuning. The NVT starts at 1,024 buckets and doubles when the average scan length exceeds a threshold.
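The normalize-and-bucket steps can be sketched in a few lines. This is an illustrative model: SHA-256 stands in for BLAKE3 (which is not in the Python stdlib), and the big-endian read of the first 8 bytes is an assumption.

```python
import hashlib

NUM_BUCKETS = 1024  # documented starting size; doubles as scan length grows

def bucket_for(hash_bytes: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    # 1. Normalize: first 8 bytes as u64, divided by u64::MAX -> [0.0, 1.0]
    scalar = int.from_bytes(hash_bytes[:8], "big") / 0xFFFFFFFFFFFFFFFF
    # 2. Map the scalar to a bucket index (clamp the scalar == 1.0 edge case)
    return min(int(scalar * num_buckets), num_buckets - 1)

h = hashlib.sha256(b"/users/alice.json").digest()
assert 0 <= bucket_for(h) < NUM_BUCKETS
```

Because the hash is uniformly distributed, nearby buckets cover equal-sized slices of the sorted KV array, which is what keeps the per-lookup scan short.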
Scaling
| Entries | NVT Buckets | NVT Memory | Avg Scan |
|---|---|---|---|
| 10,000 | 1,024 | 16 KB | ~10 |
| 1,000,000 | 65,536 | 1 MB | ~15 |
| 100,000,000 | 1,048,576 | 16 MB | ~95 |
Hot File WAL (Crash Recovery)
The --hot-dir flag specifies a directory for write-ahead hot files. During a write:
1. The entry is written to a hot file first (fsync’d)
2. The entry is then written to the main `.aeordb` file
3. On success, the hot file entry is cleared
If the process crashes between steps 1 and 2, the hot file is replayed on the next startup to recover uncommitted writes. If --hot-dir is not specified, the hot directory defaults to the same directory as the database file.
Snapshot Double-Buffering
AeorDB uses ArcSwap for lock-free concurrent reads. The in-memory directory state is wrapped in an Arc that readers clone cheaply. When a write completes:
- The writer builds a new directory state
- The new state is swapped in atomically via `ArcSwap::store`
- Readers holding the old `Arc` continue using it until they finish
- The old state is dropped when the last reader releases it
This means:
- Readers never block writers
- Writers never block readers
- Every read sees a consistent point-in-time snapshot
- No read locks, no write locks on the read path
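The pattern can be illustrated with a minimal analogue. ArcSwap is a Rust crate; this Python sketch (names hypothetical) captures the idea: state objects are treated as immutable once published, and a write swaps in a freshly built state behind a single reference, so a reader that grabbed the old state keeps a consistent view.

```python
# Toy analogue of snapshot double-buffering via an atomic reference swap.
class Engine:
    def __init__(self):
        self._state = {}              # treated as immutable once published

    def read_snapshot(self):
        return self._state            # cheap: just grab the current reference

    def write(self, path, value):
        new_state = dict(self._state) # build a NEW state (copy, never mutate)
        new_state[path] = value
        self._state = new_state       # atomic swap; in-flight readers unaffected

e = Engine()
snap = e.read_snapshot()              # reader holds the pre-write state
e.write("/users/alice.json", b"{}")   # writer swaps in a new state
assert snap == {}                     # the old snapshot is unchanged
assert e.read_snapshot() == {"/users/alice.json": b"{}"}
```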
B-Tree Directories
Small directories (under 256 entries) are stored as flat lists of child entries. When a directory exceeds 256 entries, the engine automatically converts it to a B-tree structure. This keeps directory lookups O(log n) even for directories with millions of files.
B-tree nodes are themselves stored as content-addressed entries, so they participate in versioning and structural sharing just like any other data.
Directory Propagation
When a file changes, the engine propagates the update up the directory tree:
Write /users/alice.json
-> update /users/ directory (new child hash for alice.json)
-> update / root directory (new child hash for users/)
-> update HEAD (new root hash)
Each directory gets a new content hash because its contents changed. This is how the Merkle tree works – a change at any leaf creates new hashes all the way to the root. The root hash (HEAD) uniquely identifies the complete state of the database.
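The propagation can be demonstrated with a toy Merkle tree. This is an illustrative sketch, not AeorDB's serialization: SHA-256 stands in for BLAKE3, and `dir_hash` is a hypothetical helper that hashes a directory from its sorted child entries.

```python
import hashlib

def dir_hash(children: dict) -> str:
    """Hash a directory from its sorted (name, child_hash) entries."""
    h = hashlib.sha256()
    for name in sorted(children):
        h.update(name.encode() + children[name].encode())
    return h.hexdigest()

# v1: /users/alice.json under the root
users = {"alice.json": hashlib.sha256(b'{"age":30}').hexdigest()}
root = {"users/": dir_hash(users)}
head_v1 = dir_hash(root)

# Overwrite alice.json: every hash on the path to the root changes.
users["alice.json"] = hashlib.sha256(b'{"age":31}').hexdigest()
root["users/"] = dir_hash(users)
head_v2 = dir_hash(root)

assert head_v1 != head_v2   # a leaf change produces a new root hash (HEAD)
```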
Next Steps
- Storage Engine – entry format, hashing, chunking, and dedup details
- Versioning – how snapshots, forks, and diff/patch work
- Indexing & Queries – how indexes are built and queried
Storage Engine
The storage engine is an append-only WAL (write-ahead log) where the log IS the database. Every write appends a new entry. The file only grows (until garbage collection reclaims unreachable entries). This design gives you crash recovery, versioning, and integrity verification as structural properties rather than bolted-on features.
Entry Format
Every entry on disk shares the same header format:
[Entry Header - 31 bytes fixed + hash_length variable]
magic: u32 (0x0AE012DB - marks the start of a valid entry)
entry_version: u8 (format version, starting at 1)
entry_type: u8 (Chunk, FileRecord, DirectoryIndex, etc.)
flags: u8 (operational flags)
hash_algo: u16 (BLAKE3_256 = 0x0001, SHA256 = 0x0002, etc.)
compression_algo: u8 (None = 0x00, Zstd = 0x01)
encryption_algo: u8 (None = 0x00, reserved for future use)
key_length: u32 (length of the key field)
value_length: u32 (length of the value field)
timestamp: i64 (UTC milliseconds since epoch)
total_length: u32 (total bytes including header, for jump-scanning)
hash: [u8; N] (integrity hash, N determined by hash_algo)
[Key - key_length bytes]
[Value - value_length bytes]
Key properties:
- `magic` (0x0AE012DB) enables recovery scanning – find entry boundaries even in a corrupted file by scanning for magic bytes
- `total_length` enables jump-scanning – skip to the next entry without reading the full key/value
- `hash` covers `entry_type + key + value` – re-hash and compare to detect corruption
- `entry_version` enables format evolution – the engine selects the correct parser based on this byte
For BLAKE3-256 (the default), the hash is 32 bytes, making the full header 63 bytes.
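The fixed portion of the header maps directly onto a 31-byte packed layout. The sketch below parses it with Python's `struct`; the field order and sizes follow the layout above, but the byte order (big-endian here) is an assumption, as this section does not specify the on-disk endianness.

```python
import struct

# u32 magic, u8 version, u8 type, u8 flags, u16 hash_algo, u8 compression,
# u8 encryption, u32 key_length, u32 value_length, i64 timestamp, u32 total_length
HEADER_FMT = ">IBBBHBBIIqI"          # 31 bytes, no padding
assert struct.calcsize(HEADER_FMT) == 31

def parse_header(buf: bytes) -> dict:
    (magic, version, entry_type, flags, hash_algo, comp, enc,
     key_len, val_len, ts, total_len) = struct.unpack_from(HEADER_FMT, buf)
    assert magic == 0x0AE012DB, "not a valid entry boundary"
    hash_len = 32 if hash_algo == 0x0001 else 0   # BLAKE3_256 default
    return {"type": entry_type, "key_len": key_len, "value_len": val_len,
            "total_len": total_len, "hash": buf[31:31 + hash_len]}

# Round-trip a synthetic header: 63-byte header + 10-byte key + 100-byte value
raw = struct.pack(HEADER_FMT, 0x0AE012DB, 1, 1, 0, 0x0001, 0, 0,
                  10, 100, 1775968398000, 173) + bytes(32)
hdr = parse_header(raw)
assert hdr["key_len"] == 10 and len(hdr["hash"]) == 32
```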
Content-Addressed Hashing
Every piece of data is identified by its BLAKE3 hash. Hash inputs are prefixed by type (domain separation) to prevent collisions between different entry types:
| Entry Type | Hash Input | Example |
|---|---|---|
| Chunk | chunk: + raw bytes | BLAKE3("chunk:" + file_bytes) |
| FileRecord (path key) | file: + path | BLAKE3("file:/users/alice.json") |
| FileRecord (content key) | filec: + serialized record | BLAKE3("filec:" + record_bytes) |
| DirectoryIndex (path key) | dir: + path | BLAKE3("dir:/users/") |
| DirectoryIndex (content key) | dirc: + serialized data | BLAKE3("dirc:" + dir_bytes) |
The domain prefix ensures that a chunk’s raw data can never produce the same hash as a file path, even if the bytes are identical.
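Domain separation is just a type-specific prefix fed into the hash before the payload. A minimal sketch, with SHA-256 standing in for BLAKE3 (blake3 is not in the Python stdlib):

```python
import hashlib

def chunk_hash(data: bytes) -> str:
    return hashlib.sha256(b"chunk:" + data).hexdigest()

def file_path_hash(path: str) -> str:
    return hashlib.sha256(b"file:" + path.encode()).hexdigest()

def dir_path_hash(path: str) -> str:
    return hashlib.sha256(b"dir:" + path.encode()).hexdigest()

# Identical bytes under different domains can never collide:
assert chunk_hash(b"/users/") != dir_path_hash("/users/")
```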
Chunking
Files are split into 256KB chunks for storage. Each chunk is content-addressed independently:
Original file (700KB):
[Chunk 1: 256KB] -> hash_a
[Chunk 2: 256KB] -> hash_b
[Chunk 3: 188KB] -> hash_c
FileRecord:
path: "/docs/report.pdf"
chunk_hashes: [hash_a, hash_b, hash_c]
total_size: 700KB
Chunking provides:
- Deduplication: Two files sharing identical 256KB blocks store those blocks only once
- Efficient updates: Modifying 3 bytes of a 10GB file creates one new chunk, not a new copy of the entire file
- Streaming reads: Read a file by iterating its chunk hashes and fetching each chunk
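The chunk-and-dedup behavior can be sketched with a dict standing in for the content-addressed store (SHA-256 standing in for BLAKE3; `chunk_file` is a hypothetical name):

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256KB, as documented

def chunk_file(data: bytes, store: dict) -> list[str]:
    """Split data into chunks, store each by content hash, return ordered hashes."""
    hashes = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)     # dedup: identical chunks stored once
        hashes.append(h)
    return hashes

store = {}
a = chunk_file(bytes(700 * 1024), store)   # 700KB of zeros -> 3 chunks
b = chunk_file(bytes(700 * 1024), store)   # storing the same file again
assert len(a) == 3 and a == b
assert len(store) == 2   # two identical 256KB chunks collapsed to one entry
```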
Dual-Key FileRecords
FileRecords are stored at two keys to support both current reads and historical versioning:
- Path key (`file:/path`) – mutable, always points to the latest version. Used for reads, metadata, indexing, and deletion. O(1) lookup.
- Content key (`filec:` + serialized record) – immutable, content-addressed. The directory tree’s `ChildEntry.hash` points to this key.
When the version manager walks a snapshot’s directory tree, it follows ChildEntry.hash to the content key, which resolves to the FileRecord as it existed at snapshot time – not the current version. This is what makes historical reads correct.
Directories use the same pattern: dir:/path (mutable) and dirc: + data (immutable content key).
FileRecord Format
[FileRecord Value]
path_length: u16
path: [u8; path_length] (full file path)
content_type_len: u16
content_type: [u8; content_type_len] (MIME type)
total_size: u64 (file size in bytes)
created_at: i64 (UTC milliseconds)
updated_at: i64 (UTC milliseconds)
metadata_length: u32
metadata: [u8; metadata_length] (arbitrary JSON metadata)
chunk_count: u32
chunk_hashes: [u8; chunk_count * 32] (ordered BLAKE3 hashes)
Metadata fields come first so you can read file metadata without skipping past the chunk list. Chunk hashes are the tail of the record for streaming reads.
Directory Propagation
When a file is stored or deleted, the change propagates up the directory tree:
Store /users/alice.json:
1. Store chunks -> [hash_a, hash_b]
2. Store FileRecord at path key + content key
3. Update /users/ DirectoryIndex (new ChildEntry for alice.json)
4. Update / root DirectoryIndex (new ChildEntry for users/)
5. Update HEAD in file header (new root hash)
Each directory gets a new content hash because one of its children changed. This chain of updates from leaf to root is what maintains the Merkle tree and makes versioning work.
Void Management
When garbage collection reclaims an entry, the space becomes a Void – a marker for reclaimable space. Voids are tracked by size using deterministic hash keys:
Key: BLAKE3("::aeordb:void:262144")
Value: [list of file offsets where 262144-byte voids exist]
When a new entry needs to be written, the engine checks for a void of sufficient size before appending to the end of the file. If a void is larger than needed, it is split: the entry occupies the front, and a smaller void is created for the remainder (if the remainder is at least 63 bytes – the minimum entry header size).
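The reuse-and-split logic can be sketched as a small allocator. This is an illustrative model with hypothetical names (`place_entry`, `voids`); only the 63-byte minimum and the front-fill-then-split behavior come from the text.

```python
MIN_ENTRY = 63  # documented minimum entry header size

def place_entry(size: int, voids: dict, end_of_file: int):
    """Return (offset, new_end_of_file); voids maps void size -> [file offsets]."""
    for vsize in sorted(voids):
        if vsize >= size and voids[vsize]:
            offset = voids[vsize].pop()
            remainder = vsize - size
            if remainder >= MIN_ENTRY:          # split: track the leftover gap
                voids.setdefault(remainder, []).append(offset + size)
            return offset, end_of_file          # reused a void; file not grown
    return end_of_file, end_of_file + size      # no fit: append at the end

voids = {262144: [1024]}                        # one 256KB void at offset 1024
off, eof = place_entry(1000, voids, end_of_file=500_000)
assert off == 1024                              # entry occupies the void's front
assert voids[262144 - 1000] == [2024]           # remainder tracked as a smaller void
```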
Compression
Compression is a post-hash transform:
Write: raw data -> hash -> compress -> store
Read: load -> decompress -> verify hash -> return
The hash is always computed on the raw uncompressed data. This preserves deduplication (same content = same hash regardless of compression) and integrity verification.
Each entry carries its own compression_algo byte, so compressed and uncompressed entries coexist in the same file. Currently, zstd is the only supported compression algorithm.
fsync Strategy
Not all entries are equally important for durability:
| Data | fsync | Rationale |
|---|---|---|
| Chunks, FileRecords, DeletionRecords | Immediate | The truth – not rebuildable from other data |
| KV store, NVT, DirectoryIndex, Snapshots | Deferred | Derived data – can be rebuilt from a full entry scan |
This gives durability where it matters and performance where it doesn’t.
Crash Recovery
The recovery hierarchy, from least to most damage:
| What’s Lost | Recovery Method |
|---|---|
| Nothing | Read HEAD from KV store, load directory index, ready |
| KV store only | Entry-by-entry scan, rebuild KV store, load latest directory index |
| Directory index only | Scan FileRecords + DeletionRecords, reconstruct from paths + timestamps |
| KV store + directory | Full entry scan, rebuild KV, reconstruct directory |
| Only chunks + FileRecords survive | Full data recovery, version history reconstructed via DeletionRecords |
The magic bytes at the start of every entry enable boundary detection even in partially corrupted files. The total_length field in each header enables efficient forward scanning.
Next Steps
- Architecture – high-level system overview
- Versioning – snapshots, forks, and the Merkle tree
Versioning
AeorDB’s content-addressed Merkle tree makes versioning a structural property of the storage engine rather than an add-on feature. Every write creates new hashes up the directory tree, so every committed state is already a snapshot by definition – you just need to save a pointer to the root hash.
How Versioning Works
The database state at any point in time is fully described by its root hash (HEAD). HEAD is the hash of the root DirectoryIndex, which contains hashes of its children, which contain hashes of their children, all the way down to the chunks of individual files.
HEAD -> root DirectoryIndex
|
+-- /users/ (DirectoryIndex)
| +-- alice.json (FileRecord -> [chunk_a, chunk_b])
| +-- bob.json (FileRecord -> [chunk_c])
|
+-- /docs/ (DirectoryIndex)
+-- readme.md (FileRecord -> [chunk_d])
When you write /users/alice.json, the engine creates:
- New chunks (if the content changed)
- A new FileRecord with new chunk hashes
- A new `/users/` DirectoryIndex with the updated child entry
- A new root DirectoryIndex with the updated `/users/` child entry
- HEAD now points to the new root hash
The old root hash still exists and still points to the old directory tree with the old file content. Nothing was overwritten – new entries were appended.
Snapshots
A snapshot is a named reference to a root hash. Creating a snapshot saves the current HEAD so you can return to it later.
Create a Snapshot
curl -X POST http://localhost:3000/version/snapshot \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
List Snapshots
curl http://localhost:3000/version/snapshots
Response:
{
"snapshots": [
{"name": "v1.0", "root_hash": "a1b2c3...", "created_at": 1775968398000},
{"name": "v2.0", "root_hash": "d4e5f6...", "created_at": 1775968500000}
]
}
Restore a Snapshot
Restoring a snapshot sets HEAD back to the snapshot’s root hash. The current state is not lost – you can snapshot it before restoring if you want to preserve it.
curl -X POST http://localhost:3000/version/restore \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
Delete a Snapshot
curl -X DELETE http://localhost:3000/version/snapshot/v1.0
After deleting a snapshot, entries that were only reachable through that snapshot become eligible for garbage collection.
Forks
Forks are isolated branches of the database. Writes to a fork do not affect HEAD or other forks. This is useful for testing changes, running experiments, or staging updates before promoting them.
Create a Fork
curl -X POST http://localhost:3000/version/fork \
-H "Content-Type: application/json" \
-d '{"name": "experiment"}'
List Forks
curl http://localhost:3000/version/forks
Promote a Fork
When you’re satisfied with the changes in a fork, promote it to HEAD:
curl -X POST http://localhost:3000/version/fork/experiment/promote
Abandon a Fork
curl -X DELETE http://localhost:3000/version/fork/experiment
Tree Walking
The content-addressed Merkle tree enables historical reads. When you walk a snapshot’s directory tree:
- Start from the snapshot’s root hash
- Load the root DirectoryIndex – each child entry has a hash
- Follow child hashes to subdirectories or files
- Each FileRecord’s `ChildEntry.hash` points to a content-addressed (immutable) key
Because file content keys are immutable, walking a snapshot’s tree always resolves to the data as it existed when the snapshot was taken, even if the files have been overwritten or deleted since then.
Snapshot "v1.0" root_hash: aaa111...
-> /users/ dir_hash: bbb222...
-> alice.json content_key: ccc333... (resolves to Alice's v1.0 data)
Current HEAD root_hash: ddd444...
-> /users/ dir_hash: eee555...
-> alice.json content_key: fff666... (resolves to Alice's current data)
Both trees can coexist because they share unchanged chunks and directories (structural sharing). Only the parts that differ consume additional storage.
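The walk above can be sketched with a toy in-memory content-addressed store. This is illustrative only – the store, put, and walk names are not AeorDB’s API, and SHA-256 stands in for BLAKE3 – but it shows why a snapshot’s tree keeps resolving old content after an overwrite:

```python
import hashlib
import json

# Toy content-addressed store: hash -> object. Directories map names to
# child hashes; files hold raw content. (AeorDB uses BLAKE3 and binary
# DirectoryIndex/FileRecord structures; this is a sketch.)
store = {}

def put(obj) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    h = hashlib.sha256(data).hexdigest()[:12]
    store[h] = obj
    return h

# Build snapshot v1.0: /users/alice.json
alice_v1 = put({"file": "alice v1.0 data"})
users_v1 = put({"dir": {"alice.json": alice_v1}})
root_v1 = put({"dir": {"users": users_v1}})

# Overwrite alice.json; only changed nodes get new hashes.
alice_v2 = put({"file": "alice current data"})
users_v2 = put({"dir": {"alice.json": alice_v2}})
root_head = put({"dir": {"users": users_v2}})

def walk(root_hash: str, path: str):
    """Resolve a path by following child hashes from a root."""
    node = store[root_hash]
    for part in path.strip("/").split("/"):
        node = store[node["dir"][part]]
    return node["file"]

# The snapshot still resolves the old content after the overwrite.
print(walk(root_v1, "/users/alice.json"))    # alice v1.0 data
print(walk(root_head, "/users/alice.json"))  # alice current data
```

Because unchanged nodes keep the same hash, both roots share the parts of the tree that did not change.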
Export and Import
AeorDB can export a version as a self-contained .aeordb file and import it into another database.
Export
Export creates a clean, compacted database from a single version – no history, no voids, no deleted entries:
# Export current HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a specific snapshot
aeordb export --database data.aeordb --snapshot v1.0 --output v1.aeordb
# Export via HTTP
curl -X POST http://localhost:3000/admin/export --output backup.aeordb
curl -X POST "http://localhost:3000/admin/export?snapshot=v1.0" --output v1.aeordb
The exported file is a fully functional database that can be opened with aeordb start.
Import
Import applies an export or patch file to a target database:
# Import without promoting HEAD (inspect first)
aeordb import --database target.aeordb --file backup.aeordb
# Import and promote in one step
aeordb import --database target.aeordb --file backup.aeordb --promote
# Import via HTTP
curl -X POST http://localhost:3000/admin/import \
-H "Content-Type: application/octet-stream" \
--data-binary @backup.aeordb
Import does NOT automatically change HEAD. The imported version exists in the database and can be promoted explicitly when ready.
Diff and Patch
Diff extracts only the changeset between two versions. The output is a patch file – not a standalone database.
# Diff between two snapshots
aeordb diff --database data.aeordb --from v1.0 --to v2.0 --output patch.aeordb
# Diff from a snapshot to current HEAD
aeordb diff --database data.aeordb --from v1.0 --output patch.aeordb
# Diff via HTTP
curl -X POST "http://localhost:3000/admin/diff?from=v1.0&to=v2.0" --output patch.aeordb
A patch file contains only new/changed chunks, updated FileRecords, and deletion markers. Chunks shared between the two versions are not included, making patches much smaller than full exports.
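The changeset idea can be sketched over two versions modeled as path-to-content-hash maps (a simplification – real patches also carry the new chunks themselves, and this diff function is not AeorDB’s implementation):

```python
def diff(base: dict, target: dict) -> dict:
    """Changeset from base to target: only new/changed entries plus
    deletion markers. Entries identical in both versions are excluded."""
    changed = {p: h for p, h in target.items() if base.get(p) != h}
    deleted = [p for p in base if p not in target]
    return {"changed": changed, "deleted": deleted}

v1 = {"/a.json": "h1", "/b.json": "h2", "/c.json": "h3"}
v2 = {"/a.json": "h1", "/b.json": "h9", "/d.json": "h4"}

patch = diff(v1, v2)
# Only b.json (changed), d.json (new), and a deletion marker for c.json
# are carried; the shared a.json is omitted entirely.
print(patch)
```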
Applying a Patch
# Apply patch (strict base version check)
aeordb import --database target.aeordb --file patch.aeordb
# Skip base version check
aeordb import --database target.aeordb --file patch.aeordb --force
# Apply and promote
aeordb import --database target.aeordb --file patch.aeordb --promote
If the target database’s HEAD does not match the patch’s base version, the import fails unless --force is used.
Promote
Promote sets HEAD to a specific version hash. This is separate from import so you can inspect the imported data before committing to it:
# CLI
aeordb promote --database data.aeordb --hash f6e5d4c3...
# HTTP
curl -X POST "http://localhost:3000/admin/promote" \
-H "Content-Type: application/json" \
-d '{"hash": "f6e5d4c3..."}'
Next Steps
- Storage Engine – how the Merkle tree and content addressing work at the byte level
- Architecture – system overview and crash recovery
Indexing & Queries
AeorDB indexes are opt-in and configured per-directory. Nothing is indexed by default – you control exactly which fields are indexed, with which strategies, and for which file types. This keeps the engine lean and predictable.
Index Configuration
Create a .config/indexes.json file in any directory to define indexes for files in that directory:
curl -X PUT http://localhost:3000/engine/users/.config/indexes.json \
-H "Content-Type: application/json" \
-d '{
"indexes": [
{"name": "name", "type": ["string", "trigram"]},
{"name": "age", "type": "u64"},
{"name": "city", "type": "string"},
{"name": "email", "type": "trigram"},
{"name": "created_at", "type": "timestamp"}
]
}'
When this file is created or updated, the engine automatically triggers a background reindex of all existing files in the directory.
Index Types
| Type | Order-Preserving | Description |
|---|---|---|
u64 | Yes | Unsigned 64-bit integer. Range-tracking with observed min/max. |
i64 | Yes | Signed 64-bit integer. Shifted to [0.0, 1.0] for NVT storage. |
f64 | Yes | 64-bit floating point. Clamping for NaN/Inf handling. |
string | Partially | Exact string matching. Multi-stage scalar: first byte weighted + length. |
timestamp | Yes | UTC millisecond timestamps. Range-tracking. |
trigram | No | Trigram-based fuzzy text matching. Tolerates typos, supports substring search. |
phonetic | No | General phonetic matching (Soundex algorithm). |
soundex | No | Soundex encoding for English names. |
dmetaphone | No | Double Metaphone for multi-cultural phonetic matching. |
Multi-Strategy Indexes
A single field can be indexed with multiple strategies by passing type as an array:
{"name": "title", "type": ["string", "trigram", "phonetic"]}
This creates three separate index files for the same field:
- title.string.idx – exact match queries
- title.trigram.idx – fuzzy/substring queries
- title.phonetic.idx – phonetic queries
Use the appropriate query operator to target the desired index.
How Indexes Work
AeorDB uses a Normalized Vector Table (NVT) for index lookups. Each indexed field gets its own NVT.
The NVT Approach
- A ScalarConverter maps each field value to a scalar in [0.0, 1.0]
- The scalar maps to a bucket in the NVT
- The bucket points to the matching entries
For numeric types (u64, i64, f64, timestamp), the converter tracks the observed min/max and distributes values uniformly across the [0.0, 1.0] range. This means range queries (gt, lt, between) are efficient – they resolve to a contiguous range of buckets.
For a query like WHERE age > 30:
- converter.to_scalar(30) computes where 30 falls in the bucket range
- All buckets after that point are candidates
- Only those buckets are scanned
This is O(1) for the bucket lookup, with a small linear scan within the bucket.
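A minimal sketch of the min/max normalization and bucket selection described above (the class and function names are illustrative, not AeorDB’s internals):

```python
class ScalarConverter:
    """Maps numeric values to [0.0, 1.0] using the observed min/max."""
    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def to_scalar(self, v: float) -> float:
        # Clamp so out-of-range probes stay inside [0.0, 1.0].
        s = (v - self.lo) / (self.hi - self.lo)
        return min(1.0, max(0.0, s))

def bucket(scalar: float, n_buckets: int) -> int:
    return min(n_buckets - 1, int(scalar * n_buckets))

# Ages observed between 0 and 100, 10 buckets.
conv = ScalarConverter(0, 100)
n = 10

# WHERE age > 30: only buckets at/after to_scalar(30) are candidates.
start = bucket(conv.to_scalar(30), n)
candidates = list(range(start, n))
print(candidates)
```

Because the mapping is order-preserving, a range predicate always resolves to a contiguous run of buckets.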
Two-Tier Execution
Simple queries (single field, direct comparison) use direct scalar lookups – no bitmaps, no compositing. Most queries fall into this tier.
Complex queries (OR, NOT, multi-field boolean logic) build NVT bitmaps and composite them:
- Each field condition produces a bitmask over the NVT buckets
- AND = bitwise AND of masks
- OR = bitwise OR of masks
- NOT = bitwise NOT of a mask
- The final mask identifies which buckets contain results
Memory usage is bounded: a bitmask for 1M buckets is only 128KB, regardless of how many entries exist.
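The compositing step reduces to plain bitwise arithmetic. A sketch using Python integers as arbitrary-width bucket masks (the bucket assignments here are made up for illustration):

```python
N_BUCKETS = 16
ALL = (1 << N_BUCKETS) - 1  # mask with every bucket set

def mask(buckets):
    """Bitmask with the given bucket indices set."""
    m = 0
    for b in buckets:
        m |= 1 << b
    return m

# Each field condition resolves to a candidate-bucket mask:
age_gt_30 = mask([5, 6, 7, 8])    # buckets for age > 30
city_eq_pdx = mask([2, 5, 8])     # buckets for city == "Portland"
role_banned = mask([8])           # buckets for role == "banned"

# AND / OR / NOT composite with bitwise ops:
result = (age_gt_30 & city_eq_pdx) & (ALL & ~role_banned)

hits = [b for b in range(N_BUCKETS) if (result >> b) & 1]
print(hits)  # [5]
```

Only the buckets surviving the final mask are scanned for actual entries.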
Source Resolution
By default, the name field in an index definition is used as the JSON key to extract the value:
{"name": "age", "type": "u64"}
This extracts the age field from {"name": "Alice", "age": 30}.
Nested Fields
For nested JSON or parser output, use the source array to specify the path:
{"name": "author", "source": ["metadata", "author"], "type": "string"}
This extracts metadata.author from a JSON structure like:
{"metadata": {"author": "Jane Smith", "title": "Report"}}
The source array supports:
- String segments for object key lookup: ["metadata", "author"]
- Integer segments for array index access: ["items", 0, "name"]
Plugin Mapper
For complex extraction logic, delegate to a WASM plugin:
{
"name": "summary",
"source": {"plugin": "my-mapper", "args": {"mode": "summary", "max_length": 500}},
"type": "trigram"
}
The plugin receives the parsed JSON and the args object, and returns the extracted field value.
WASM Parser Integration
For non-JSON files (PDFs, images, XML, etc.), a parser plugin converts raw bytes into a JSON object that the indexing pipeline can work with.
Configuration
{
"parser": "pdf-extractor",
"parser_memory_limit": "256mb",
"indexes": [
{"name": "title", "source": ["metadata", "title"], "type": ["string", "trigram"]},
{"name": "author", "source": ["metadata", "author"], "type": "phonetic"},
{"name": "content", "source": ["text"], "type": "trigram"},
{"name": "page_count", "source": ["metadata", "page_count"], "type": "u64"}
]
}
The parser receives a JSON envelope with the file data (base64-encoded) and metadata:
{
"data": "<base64-encoded file bytes>",
"meta": {
"filename": "report.pdf",
"path": "/docs/reports/report.pdf",
"content_type": "application/pdf",
"size": 1048576
}
}
The parser returns a JSON object (like {"text": "...", "metadata": {"title": "...", ...}}), and the source paths in each index definition walk this JSON to extract field values.
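To make the envelope contract concrete, here is a toy text parser written in Python rather than as a WASM plugin. The function name and the exact output shape are illustrative; only the envelope fields come from the format above:

```python
import base64

def toy_text_parser(envelope: dict) -> dict:
    """Decode the base64 file bytes and return an indexable JSON object
    shaped like {"text": ..., "metadata": {...}}."""
    raw = base64.b64decode(envelope["data"])
    text = raw.decode("utf-8", errors="replace")
    return {
        "text": text,
        "metadata": {
            "filename": envelope["meta"]["filename"],
            "size": envelope["meta"]["size"],
        },
    }

envelope = {
    "data": base64.b64encode(b"Quarterly report: revenue up 12%").decode(),
    "meta": {
        "filename": "report.txt",
        "path": "/docs/report.txt",
        "content_type": "text/plain",
        "size": 32,
    },
}

parsed = toy_text_parser(envelope)
print(parsed["text"])                  # Quarterly report: revenue up 12%
print(parsed["metadata"]["filename"])  # report.txt
```

An index definition with source ["text"] or ["metadata", "filename"] would then walk this returned object.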
Global Parser Registry
You can also register parsers globally by content type at /.config/parsers.json:
{
"application/pdf": "pdf-extractor",
"image/jpeg": "image-metadata",
"image/png": "image-metadata"
}
When a file is stored and no parser is configured in the directory’s index config, the engine checks this registry using the file’s content type.
Failure Handling
Parser and indexing failures never prevent file storage. The file is always stored regardless of parse/index errors. If logging is enabled in the index config ("logging": true), errors are written to .logs/ under the directory.
Automatic Reindexing
When you store or update a .config/indexes.json file, the engine automatically enqueues a background reindex task for that directory. The task:
- Reads the current index config
- Lists all files in the directory
- Re-runs the indexing pipeline for each file (in batches of 50, yielding between batches)
- Reports progress via
GET /admin/tasks
During reindexing, queries still work but may return incomplete results. The query response includes a meta.reindexing field with the current progress:
{
"results": [...],
"meta": {
"reindexing": 0.67,
"reindexing_eta": 1775968398803,
"reindexing_indexed": 6700,
"reindexing_total": 10000
}
}
Query API
Queries are submitted as POST /query with a JSON body:
{
"path": "/users/",
"where": {
"and": [
{"field": "age", "op": "gt", "value": 30},
{"field": "city", "op": "eq", "value": "Portland"},
{"not": {"field": "role", "op": "eq", "value": "banned"}}
]
},
"sort": {"field": "age", "order": "desc"},
"limit": 50,
"offset": 0
}
Boolean Logic
The where clause supports full boolean logic:
{
"where": {
"or": [
{"field": "city", "op": "eq", "value": "Portland"},
{
"and": [
{"field": "age", "op": "gt", "value": 25},
{"field": "city", "op": "eq", "value": "Seattle"}
]
}
]
}
}
For backward compatibility, a flat array in where is treated as an implicit and:
{
"where": [
{"field": "age", "op": "gt", "value": 30},
{"field": "city", "op": "eq", "value": "Portland"}
]
}
Query Operators
| Operator | Description | Value Type |
|---|---|---|
eq | Equals | any |
gt | Greater than | numeric, timestamp |
gte | Greater than or equal | numeric, timestamp |
lt | Less than | numeric, timestamp |
lte | Less than or equal | numeric, timestamp |
between | Inclusive range | [min, max] |
fuzzy | Trigram fuzzy match | string (requires trigram index) |
phonetic | Phonetic match | string (requires phonetic/soundex/dmetaphone index) |
Next Steps
- Quick Start – hands-on tutorial with query examples
- Configuration – full index config reference
- Storage Engine – how indexed data is stored
Files & Directories
AeorDB exposes a content-addressable filesystem through its engine routes. Every path under /engine/ represents either a file or a directory.
Endpoint Summary
| Method | Path | Description | Auth | Status Codes |
|---|---|---|---|---|
| PUT | /engine/{path} | Store a file | Yes | 201, 400, 404, 409, 500 |
| GET | /engine/{path} | Read a file or list a directory | Yes | 200, 404, 500 |
| DELETE | /engine/{path} | Delete a file | Yes | 200, 404, 500 |
| HEAD | /engine/{path} | Check existence and get metadata | Yes | 200, 404, 500 |
| POST | /engine-symlink/{path} | Create or update a symlink | Yes | 201, 400, 500 |
PUT /engine/
Store a file at the given path. Parent directories are created automatically. If a file already exists at the path, it is overwritten (creating a new version).
Body limit: 10 GB
Request
- Headers:
  - Authorization: Bearer <token> (required)
  - Content-Type (optional) – auto-detected from magic bytes if omitted
- Body: raw file bytes
Response
Status: 201 Created
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"total_size": 245678,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
Side Effects
- If the path matches /.config/indexes.json (or a nested variant like /data/.config/indexes.json), a reindex task is automatically enqueued for the parent directory. Any existing pending or running reindex for that path is cancelled first.
- Triggers entries_created events on the event bus.
- Runs any deployed store-phase plugins.
Example
curl -X PUT http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/pdf" \
--data-binary @report.pdf
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid input (e.g., empty path) |
| 404 | Parent path references a non-existent entity |
| 409 | Path conflict (e.g., file exists where directory expected) |
| 500 | Internal storage failure |
GET /engine/
Read a file or list a directory. The server determines the type automatically:
- If the path resolves to a file, the file content is streamed with appropriate headers.
- If the path resolves to a directory, a JSON array of children is returned.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
snapshot | string | — | Read the file as it was at this named snapshot |
version | string | — | Read the file at this version hash (hex) |
nofollow | boolean | false | If the path is a symlink, return metadata instead of following |
depth | integer | 0 | Directory listing depth: 0 = immediate children, -1 = unlimited recursion |
glob | string | — | Filter directory listing by file name glob pattern (*, ?, [abc]) |
File Response
Status: 200 OK
Headers:
| Header | Description |
|---|---|
X-Path | Canonical path of the file |
X-Total-Size | File size in bytes |
X-Created-At | Unix timestamp (milliseconds) |
X-Updated-At | Unix timestamp (milliseconds) |
Content-Type | MIME type (if known) |
Body: raw file bytes (streamed)
Directory Response
Status: 200 OK
Each entry includes path, hash, and numeric entry_type fields. Symlink entries also include a target field.
Entry types: 2 = file, 3 = directory, 8 = symlink.
[
{
"path": "/data/report.pdf",
"name": "report.pdf",
"entry_type": 2,
"hash": "a3f8c1...",
"total_size": 245678,
"created_at": 1775968398000,
"updated_at": 1775968398000,
"content_type": "application/pdf"
},
{
"path": "/data/images",
"name": "images",
"entry_type": 3,
"hash": "b2c4d5...",
"total_size": 0,
"created_at": 1775968000000,
"updated_at": 1775968000000,
"content_type": null
},
{
"path": "/data/latest",
"name": "latest",
"entry_type": 8,
"hash": "c3d5e6...",
"target": "/data/report.pdf",
"total_size": 0,
"created_at": 1775968500000,
"updated_at": 1775968500000,
"content_type": null
}
]
Examples
Read a file:
curl http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN" \
-o report.pdf
List a directory:
curl http://localhost:3000/engine/data/ \
-H "Authorization: Bearer $TOKEN"
Recursive Directory Listing
Use the depth and glob query parameters to list files recursively:
# List all files recursively
curl http://localhost:3000/engine/data/?depth=-1 \
-H "Authorization: Bearer $TOKEN"
# List only .psd files anywhere under /assets/
curl "http://localhost:3000/engine/assets/?depth=-1&glob=*.psd" \
-H "Authorization: Bearer $TOKEN"
# List one level deep
curl http://localhost:3000/engine/data/?depth=1 \
-H "Authorization: Bearer $TOKEN"
When depth > 0 or depth = -1, the response is a flat list containing only files. Directory entries are traversed but not included in the output.
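The glob parameter uses the familiar *, ?, and [abc] wildcards, which behave like Python’s fnmatch patterns. A local sketch of filtering a flat recursive listing, assuming (per the parameter description above) that the pattern matches the file name rather than the full path:

```python
from fnmatch import fnmatch

# Flat file listing as produced by a recursive walk (depth=-1).
files = [
    "/assets/logo.psd",
    "/assets/banners/spring.psd",
    "/assets/banners/spring.png",
    "/assets/readme.txt",
]

# glob=*.psd matches against each entry's file name.
matches = [p for p in files if fnmatch(p.rsplit("/", 1)[-1], "*.psd")]
print(matches)  # ['/assets/logo.psd', '/assets/banners/spring.psd']
```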
Versioned Reads
Read a file as it was at a specific snapshot or version:
# Read file at a named snapshot
curl "http://localhost:3000/engine/data/report.pdf?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN"
# Read file at a specific version hash
curl "http://localhost:3000/engine/data/report.pdf?version=a1b2c3..." \
-H "Authorization: Bearer $TOKEN"
If both snapshot and version are provided, snapshot takes precedence. Returns 404 if the file did not exist at that version.
Error Responses
| Status | Condition |
|---|---|
| 404 | Path does not exist as file or directory |
| 500 | Internal read failure |
DELETE /engine/
Delete a file at the given path. Creates a DeletionRecord and removes the file from its parent directory listing. Directories cannot be deleted directly – delete all files within first.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Response
Status: 200 OK
{
"deleted": true,
"path": "/data/report.pdf"
}
Side Effects
- Triggers entries_deleted events on the event bus.
- Updates index entries for the deleted file.
Example
curl -X DELETE http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN"
Error Responses
| Status | Condition |
|---|---|
| 404 | File not found |
| 500 | Internal deletion failure |
Symlinks
AeorDB supports soft symlinks — entries that point to another path. Symlinks are transparent by default: reading a symlink path returns the target’s content.
POST /engine-symlink/
Create or update a symlink.
Request Body:
{
"target": "/assets/logo.psd"
}
Response: 201 Created
{
"path": "/latest-logo",
"target": "/assets/logo.psd",
"entry_type": 8,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
The target path does not need to exist at creation time (dangling symlinks are allowed).
Reading Symlinks
By default, GET /engine/{path} follows symlinks transparently:
# Returns the content of /assets/logo.psd
curl http://localhost:3000/engine/latest-logo \
-H "Authorization: Bearer $TOKEN"
To inspect the symlink itself without following it, use ?nofollow=true:
curl "http://localhost:3000/engine/latest-logo?nofollow=true" \
-H "Authorization: Bearer $TOKEN"
Returns the symlink metadata as JSON instead of the target’s content.
Symlink Resolution
Symlinks can point to other symlinks — chains are followed recursively. AeorDB detects cycles and enforces a maximum resolution depth of 32 hops.
| Scenario | Result |
|---|---|
| Symlink → file | Returns file content |
| Symlink → directory | Returns directory listing |
| Symlink → symlink → file | Follows chain, returns file content |
| Symlink → nonexistent | 404 (dangling symlink) |
| Symlink cycle (A → B → A) | 400 with cycle detection message |
| Chain exceeds 32 hops | 400 with depth exceeded message |
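The resolution rules in the table can be sketched as a small loop. The entries map and the resolve function are illustrative, not AeorDB internals; only the cycle detection, the 32-hop limit, and the dangling-link behavior come from the description above:

```python
MAX_HOPS = 32

def resolve(entries: dict, path: str):
    """Follow symlink chains with cycle detection and a 32-hop limit.
    entries maps path -> ("symlink", target) or ("file", content)."""
    seen = set()
    for _ in range(MAX_HOPS + 1):
        if path in seen:
            raise ValueError("symlink cycle detected")   # -> 400
        seen.add(path)
        entry = entries.get(path)
        if entry is None:
            raise FileNotFoundError(path)                # dangling -> 404
        kind, value = entry
        if kind != "symlink":
            return value
        path = value
    raise ValueError("symlink depth exceeded")           # -> 400

fs = {
    "/latest": ("symlink", "/report.pdf"),
    "/report.pdf": ("file", b"pdf bytes"),
    "/a": ("symlink", "/b"),
    "/b": ("symlink", "/a"),
}

print(resolve(fs, "/latest"))  # b'pdf bytes'
```

Resolving /a here raises the cycle error because the chain revisits a path it has already seen.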
HEAD on Symlinks
HEAD /engine/{path} returns symlink metadata as headers:
X-Entry-Type: symlink
X-Symlink-Target: /assets/logo.psd
X-Path: /latest-logo
X-Created-At: 1775968398000
X-Updated-At: 1775968398000
Deleting Symlinks
DELETE /engine/{path} on a symlink deletes the symlink itself, not the target:
curl -X DELETE http://localhost:3000/engine/latest-logo \
-H "Authorization: Bearer $TOKEN"
{
"deleted": true,
"path": "latest-logo",
"type": "symlink"
}
Symlinks in Directory Listings
Symlinks appear in directory listings with entry_type: 8 and a target field:
{
"path": "/data/latest",
"name": "latest",
"entry_type": 8,
"hash": "c3d5e6...",
"target": "/data/report.pdf",
"total_size": 0,
"created_at": 1775968500000,
"updated_at": 1775968500000,
"content_type": null
}
Symlink Versioning
Symlinks are versioned like files. Snapshots capture the symlink’s target path at that point in time. Restoring a snapshot restores the link, not the resolved content.
HEAD /engine/
Check whether a path exists and retrieve its metadata as response headers, without downloading the body. Works for both files and directories.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Response
Status: 200 OK (empty body)
Headers:
| Header | Value |
|---|---|
X-Entry-Type | file, directory, or symlink |
X-Path | Canonical path |
X-Total-Size | File size in bytes (files only) |
X-Created-At | Unix timestamp in milliseconds (files only) |
X-Updated-At | Unix timestamp in milliseconds (files only) |
Content-Type | MIME type (files only, if known) |
X-Symlink-Target | Target path (symlinks only) |
Example
curl -I http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN"
HTTP/1.1 200 OK
X-Entry-Type: file
X-Path: /data/report.pdf
X-Total-Size: 245678
X-Created-At: 1775968398000
X-Updated-At: 1775968398000
Content-Type: application/pdf
Error Responses
| Status | Condition |
|---|---|
| 404 | Path does not exist |
| 500 | Internal metadata lookup failure |
Query API
The query engine supports indexed field queries with boolean combinators, pagination, sorting, aggregations, projections, and an explain mode.
Endpoint Summary
| Method | Path | Description | Auth | Status Codes |
|---|---|---|---|---|
| POST | /query | Execute a query | Yes | 200, 400, 404, 500 |
POST /query
Execute a query against indexed fields within a directory path.
Request Body
{
"path": "/users",
"where": {
"field": "age",
"op": "gt",
"value": 21
},
"limit": 20,
"offset": 0,
"order_by": [{"field": "name", "direction": "asc"}],
"after": null,
"before": null,
"include_total": true,
"select": ["@path", "@score", "name"],
"explain": false
}
| Field | Type | Required | Description |
|---|---|---|---|
path | string | Yes | Directory path to query within |
where | object/array | Yes | Query filter (see below) |
limit | integer | No | Max results to return (server default applies if omitted) |
offset | integer | No | Skip this many results |
order_by | array | No | Sort fields with direction |
after | string | No | Cursor for forward pagination |
before | string | No | Cursor for backward pagination |
include_total | boolean | No | Include total_count in response (default: false) |
select | array | No | Project specific fields in results |
aggregate | object | No | Run aggregations instead of returning results |
explain | string/boolean | No | "plan", "analyze", or true for query plan |
Query Operators
Each field query is an object with field, op, and value:
{"field": "age", "op": "gt", "value": 21}
Comparison Operators
| Operator | Description | Value Type | Example |
|---|---|---|---|
eq | Exact match | any | {"field": "status", "op": "eq", "value": "active"} |
gt | Greater than | number/string | {"field": "age", "op": "gt", "value": 21} |
lt | Less than | number/string | {"field": "age", "op": "lt", "value": 65} |
between | Inclusive range | number/string | {"field": "age", "op": "between", "value": 21, "value2": 65} |
in | Match any value in a set | array | {"field": "status", "op": "in", "value": ["active", "pending"]} |
Text Search Operators
These operators require the appropriate index type to be configured.
| Operator | Description | Index Required | Example |
|---|---|---|---|
contains | Substring match | trigram | {"field": "name", "op": "contains", "value": "alice"} |
similar | Fuzzy trigram match with threshold | trigram | {"field": "name", "op": "similar", "value": "alice", "threshold": 0.3} |
phonetic | Sounds-like match | phonetic | {"field": "name", "op": "phonetic", "value": "smith"} |
fuzzy | Configurable fuzzy match | trigram | See below |
match | Multi-strategy combined match | trigram + phonetic | {"field": "name", "op": "match", "value": "alice"} |
Fuzzy Operator Options
The fuzzy operator supports additional parameters:
{
"field": "name",
"op": "fuzzy",
"value": "alice",
"fuzziness": "auto",
"algorithm": "damerau_levenshtein"
}
| Parameter | Values | Default |
|---|---|---|
fuzziness | "auto" or integer (edit distance) | "auto" |
algorithm | "damerau_levenshtein", "jaro_winkler" | "damerau_levenshtein" |
Similar Operator Options
{
"field": "name",
"op": "similar",
"value": "alice",
"threshold": 0.3
}
| Parameter | Type | Default | Description |
|---|---|---|---|
threshold | float | 0.3 | Minimum similarity score (0.0 to 1.0) |
Boolean Combinators
Combine multiple conditions using and, or, and not:
AND
All conditions must match:
{
"where": {
"and": [
{"field": "age", "op": "gt", "value": 21},
{"field": "status", "op": "eq", "value": "active"}
]
}
}
OR
At least one condition must match:
{
"where": {
"or": [
{"field": "status", "op": "eq", "value": "active"},
{"field": "status", "op": "eq", "value": "pending"}
]
}
}
NOT
Invert a condition:
{
"where": {
"not": {"field": "status", "op": "eq", "value": "deleted"}
}
}
Nested Boolean Logic
Combinators can be nested arbitrarily:
{
"where": {
"and": [
{"field": "age", "op": "gt", "value": 21},
{
"or": [
{"field": "role", "op": "eq", "value": "admin"},
{"field": "role", "op": "eq", "value": "moderator"}
]
}
]
}
}
Legacy Array Format
An array at the top level is sugar for AND:
{
"where": [
{"field": "age", "op": "gt", "value": 21},
{"field": "status", "op": "eq", "value": "active"}
]
}
Response Format
Standard Query Response
{
"results": [
{
"path": "/users/alice.json",
"total_size": 256,
"content_type": "application/json",
"created_at": 1775968398000,
"updated_at": 1775968398000,
"score": 1.0,
"matched_by": ["age"]
}
],
"has_more": true,
"total_count": 150,
"next_cursor": "eyJwYXRoIjoiL3VzZXJzL2JvYi5qc29uIn0=",
"prev_cursor": "eyJwYXRoIjoiL3VzZXJzL2Fhcm9uLmpzb24ifQ==",
"meta": {
"reindexing": 0.67,
"reindexing_eta": 1775968398803,
"reindexing_indexed": 670,
"reindexing_total": 1000,
"reindexing_stale_since": 1775968300000
}
}
| Field | Type | Description |
|---|---|---|
results | array | Matching file metadata with scores |
has_more | boolean | Whether more results exist beyond the current page |
total_count | integer | Total matching results (only if include_total: true) |
next_cursor | string | Cursor for the next page (if has_more is true) |
prev_cursor | string | Cursor for the previous page |
default_limit_hit | boolean | Present and true when the server’s default limit was applied |
default_limit | integer | The server’s default limit value (present with default_limit_hit) |
meta | object | Reindex progress metadata (present only during active reindex) |
Result Fields
Each result object contains:
| Field | Type | Description |
|---|---|---|
path | string | Full path to the matched file |
total_size | integer | File size in bytes |
content_type | string | MIME type (nullable) |
created_at | integer | Creation timestamp (ms) |
updated_at | integer | Last update timestamp (ms) |
score | float | Relevance score (1.0 = exact match) |
matched_by | array | List of field names that matched |
Sorting
Sort results by one or more fields:
{
"order_by": [
{"field": "name", "direction": "asc"},
{"field": "created_at", "direction": "desc"}
]
}
| Direction | Description |
|---|---|
asc | Ascending (default) |
desc | Descending |
Pagination
Offset-Based
{
"limit": 20,
"offset": 40
}
Cursor-Based
Use after or before with cursor values from a previous response:
{
"limit": 20,
"after": "eyJwYXRoIjoiL3VzZXJzL2JvYi5qc29uIn0="
}
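Cursors should be treated as opaque: pass next_cursor back as after unchanged. A sketch of the client-side loop, with a stub fetch_page function standing in for POST /query (the stub and its in-memory data are purely illustrative; a real client would make HTTP calls):

```python
# Stub standing in for POST /query with an "after" cursor. It pages
# over a local list but returns the same envelope fields.
DATA = ["/users/user%02d.json" % i for i in range(7)]

def fetch_page(limit, after=None):
    start = int(after) if after else 0
    page = DATA[start:start + limit]
    end = start + len(page)
    return {
        "results": [{"path": p} for p in page],
        "has_more": end < len(DATA),
        "next_cursor": str(end) if end < len(DATA) else None,
    }

# Drive the cursor until has_more is false.
paths, cursor = [], None
while True:
    resp = fetch_page(limit=3, after=cursor)
    paths += [r["path"] for r in resp["results"]]
    if not resp["has_more"]:
        break
    cursor = resp["next_cursor"]

print(len(paths))  # 7
```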
Projection (select)
Return only specific fields in each result. Use @-prefixed names for built-in metadata fields:
{
"select": ["@path", "@score", "name", "email"]
}
| Virtual Field | Maps To |
|---|---|
@path | path |
@score | score |
@size | total_size |
@content_type | content_type |
@created_at | created_at |
@updated_at | updated_at |
@matched_by | matched_by |
Envelope fields (has_more, next_cursor, total_count, meta) are never stripped by projection.
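The virtual-field mapping can be pictured as a simple lookup before projection. This sketch assumes selected @-names appear under their @-prefixed keys in the output, which is an assumption about the response shape, not confirmed behavior:

```python
VIRTUAL = {
    "@path": "path", "@score": "score", "@size": "total_size",
    "@content_type": "content_type", "@created_at": "created_at",
    "@updated_at": "updated_at", "@matched_by": "matched_by",
}

def project(result: dict, select: list) -> dict:
    """Keep only the selected fields; @-names map to built-in metadata,
    plain names come from the indexed document fields."""
    out = {}
    for name in select:
        key = VIRTUAL.get(name, name)
        if key in result:
            out[name] = result[key]
    return out

result = {
    "path": "/users/alice.json", "total_size": 256, "score": 1.0,
    "name": "Alice", "email": "alice@example.com", "matched_by": ["age"],
}

print(project(result, ["@path", "@score", "name"]))
```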
Aggregations
Run aggregate computations instead of returning individual results.
Request
{
"path": "/orders",
"where": {"field": "status", "op": "eq", "value": "complete"},
"aggregate": {
"count": true,
"sum": ["total", "tax"],
"avg": ["total"],
"min": ["total"],
"max": ["total"],
"group_by": ["status"]
}
}
| Field | Type | Description |
|---|---|---|
count | boolean | Include a count of matching records |
sum | array | Fields to sum |
avg | array | Fields to average |
min | array | Fields to find minimum |
max | array | Fields to find maximum |
group_by | array | Fields to group results by |
Response
The response shape depends on whether group_by is used. Aggregation results are returned as a JSON object.
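The semantics of count/sum/group_by can be sketched locally. This is a model of what the request computes, not AeorDB’s output format:

```python
from collections import defaultdict

def aggregate(rows, group_by, sum_fields):
    """Group matching rows by the group_by fields, then compute a count
    and per-field sums within each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in group_by)
        groups[key].append(row)
    out = {}
    for key, members in groups.items():
        out[key] = {"count": len(members)}
        for f in sum_fields:
            out[key]["sum_" + f] = sum(m[f] for m in members)
    return out

orders = [
    {"status": "complete", "total": 40.0},
    {"status": "complete", "total": 60.0},
    {"status": "pending", "total": 15.0},
]

print(aggregate(orders, ["status"], ["total"]))
```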
Explain Mode
Inspect the query execution plan without running the full query. Useful for debugging index usage and performance.
{
"path": "/users",
"where": {"field": "age", "op": "gt", "value": 21},
"explain": "plan"
}
| Value | Description |
|---|---|
true or "plan" | Show the query plan |
"analyze" | Execute the query and include timing information |
Examples
Simple equality query
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/users",
"where": {"field": "status", "op": "eq", "value": "active"},
"limit": 10
}'
Fuzzy name search with pagination
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/users",
"where": {"field": "name", "op": "similar", "value": "alice", "threshold": 0.4},
"limit": 20,
"order_by": [{"field": "name", "direction": "asc"}],
"include_total": true
}'
Complex boolean query
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/products",
"where": {
"and": [
{"field": "price", "op": "between", "value": 10, "value2": 100},
{
"or": [
{"field": "category", "op": "eq", "value": "electronics"},
{"field": "category", "op": "eq", "value": "books"}
]
},
{"not": {"field": "status", "op": "eq", "value": "discontinued"}}
]
},
"order_by": [{"field": "price", "direction": "asc"}],
"limit": 50
}'
Aggregation with grouping
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/orders",
"where": {"field": "year", "op": "eq", "value": 2026},
"aggregate": {
"count": true,
"sum": ["total"],
"avg": ["total"],
"group_by": ["status"]
}
}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid query structure, missing field/op, unsupported operation, range query on non-range converter |
| 404 | Query path or index not found |
| 500 | Internal query execution failure |
Version API
AeorDB provides Git-like version control through snapshots (named points in time) and forks (divergent branches of the data).
Endpoint Summary
| Method | Path | Description | Auth | Root Required |
|---|---|---|---|---|
| POST | /version/snapshot | Create a snapshot | Yes | No |
| GET | /version/snapshots | List all snapshots | Yes | No |
| POST | /version/restore | Restore a snapshot | Yes | Yes |
| DELETE | /version/snapshot/{name} | Delete a snapshot | Yes | Yes |
| POST | /version/fork | Create a fork | Yes | No |
| GET | /version/forks | List all forks | Yes | No |
| POST | /version/fork/{name}/promote | Promote fork to HEAD | Yes | Yes |
| DELETE | /version/fork/{name} | Abandon a fork | Yes | Yes |
| GET | /engine/{path}?snapshot={name} | Read file at a snapshot | Yes | No |
| GET | /engine/{path}?version={hash} | Read file at a version hash | Yes | No |
| GET | /version/file-history/{path} | File change history across snapshots | Yes | No |
| POST | /version/file-restore/{path} | Restore file from a version | Yes | Yes |
Snapshots
POST /version/snapshot
Create a named snapshot of the current HEAD.
Request Body:
{
"name": "v1.0",
"metadata": {
"description": "First stable release",
"author": "alice"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique snapshot name |
metadata | object | No | Arbitrary key-value metadata (defaults to empty) |
Response: 201 Created
{
"name": "v1.0",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000,
"metadata": {
"description": "First stable release",
"author": "alice"
}
}
Example:
curl -X POST http://localhost:3000/version/snapshot \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "v1.0", "metadata": {"description": "First stable release"}}'
Error Responses:
| Status | Condition |
|---|---|
| 409 | Snapshot with this name already exists |
| 500 | Internal failure |
GET /version/snapshots
List all snapshots.
Response: 200 OK
[
{
"name": "v1.0",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000,
"metadata": {"description": "First stable release"}
},
{
"name": "v2.0",
"root_hash": "f6e5d4c3b2a1...",
"created_at": 1775969000000,
"metadata": {}
}
]
Example:
curl http://localhost:3000/version/snapshots \
-H "Authorization: Bearer $TOKEN"
POST /version/restore
Restore a named snapshot, making it the current HEAD. Requires root.
Request Body:
{
"name": "v1.0"
}
Response: 200 OK
{
"restored": true,
"name": "v1.0"
}
Example:
curl -X POST http://localhost:3000/version/restore \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Snapshot not found |
| 500 | Internal failure |
DELETE /version/snapshot/{name}
Delete a named snapshot. Requires root.
Response: 200 OK
{
"deleted": true,
"name": "v1.0"
}
Example:
curl -X DELETE http://localhost:3000/version/snapshot/v1.0 \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Snapshot not found |
| 500 | Internal failure |
Forks
Forks create a divergent branch of the data, optionally based on a named snapshot.
POST /version/fork
Create a new fork.
Request Body:
{
"name": "experiment",
"base": "v1.0"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique fork name |
| base | string | No | Snapshot name to fork from (defaults to current HEAD) |
Response: 201 Created
{
"name": "experiment",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000
}
Example:
curl -X POST http://localhost:3000/version/fork \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "experiment", "base": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 409 | Fork with this name already exists |
| 500 | Internal failure |
GET /version/forks
List all active forks.
Response: 200 OK
[
{
"name": "experiment",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000
}
]
Example:
curl http://localhost:3000/version/forks \
-H "Authorization: Bearer $TOKEN"
POST /version/fork/{name}/promote
Promote a fork’s state to HEAD, making it the active version. Requires root.
Response: 200 OK
{
"promoted": true,
"name": "experiment"
}
Example:
curl -X POST http://localhost:3000/version/fork/experiment/promote \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Fork not found |
| 500 | Internal failure |
DELETE /version/fork/{name}
Abandon a fork (soft delete). Requires root.
Response: 200 OK
{
"abandoned": true,
"name": "experiment"
}
Example:
curl -X DELETE http://localhost:3000/version/fork/experiment \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Fork not found |
| 500 | Internal failure |
File-Level Version Access
Read, restore, and view history for individual files at specific historical versions.
Reading Files at a Version
Use query parameters on the standard file read endpoint:
# Read a file as it was at a named snapshot
curl "http://localhost:3000/engine/assets/logo.psd?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN"
# Read a file at a specific version hash
curl "http://localhost:3000/engine/assets/logo.psd?version=a1b2c3d4..." \
-H "Authorization: Bearer $TOKEN"
Returns the file content exactly as it was at that version, with the same headers as a normal file read. If both snapshot and version are provided, snapshot takes precedence.
Error Responses:
| Status | Condition |
|---|---|
| 404 | File did not exist at that version |
| 404 | Snapshot or version not found |
| 400 | Invalid version hash (not valid hex) |
GET /version/file-history/{path}
View the change history of a single file across all snapshots. Returns entries ordered newest-first, each with a change_type indicating what happened to the file at that snapshot.
Response: 200 OK
{
"path": "assets/logo.psd",
"history": [
{
"snapshot": "v2.0",
"timestamp": 1775969000000,
"change_type": "modified",
"size": 512000,
"content_type": "image/vnd.adobe.photoshop",
"content_hash": "f6e5d4c3..."
},
{
"snapshot": "v1.0",
"timestamp": 1775968398000,
"change_type": "added",
"size": 256000,
"content_type": "image/vnd.adobe.photoshop",
"content_hash": "a1b2c3d4..."
}
]
}
Change types:
| Type | Meaning |
|---|---|
| added | File exists in this snapshot but not the previous one |
| modified | File exists in both but content changed |
| unchanged | File exists in both with identical content |
| deleted | File existed in the previous snapshot but not this one |
If the file has never existed in any snapshot, returns 200 with an empty history array.
Example:
curl http://localhost:3000/version/file-history/assets/logo.psd \
-H "Authorization: Bearer $TOKEN"
POST /version/file-restore/{path}
Restore a single file from a historical version to the current HEAD. Requires root.
Before restoring, an automatic safety snapshot is created (named pre-restore-{timestamp}) to preserve the current state. If the safety snapshot cannot be created, the restore is rejected.
Request Body:
{
"snapshot": "v1.0"
}
Or using a version hash:
{
"version": "a1b2c3d4..."
}
| Field | Type | Required | Description |
|---|---|---|---|
| snapshot | string | One required | Snapshot name to restore from |
| version | string | One required | Version hash (hex) to restore from |
If both are provided, snapshot takes precedence.
Response: 200 OK
{
"restored": true,
"path": "assets/logo.psd",
"from_snapshot": "v1.0",
"auto_snapshot": "pre-restore-2026-04-14T05-01-01Z",
"size": 256000
}
The auto_snapshot field contains the name of the safety snapshot created before the restore. You can use this snapshot to recover the pre-restore state if needed.
Example:
curl -X POST http://localhost:3000/version/file-restore/assets/logo.psd \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"snapshot": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 400 | Neither snapshot nor version provided |
| 403 | Non-root user (requires both write and snapshot permissions) |
| 404 | File not found at the specified version |
| 404 | Snapshot or version not found |
| 500 | Failed to create safety snapshot or write restored file |
Pre-Hashed Upload Protocol
AeorDB provides a 4-phase upload protocol for efficient, deduplicated file transfers. Clients split files into chunks, hash them locally, and only upload chunks the server does not already have.
Protocol Overview
- Negotiate – GET /upload/config to learn the hash algorithm and chunk size.
- Dedup check – POST /upload/check with a list of chunk hashes to find which are already stored.
- Upload – PUT /upload/chunks/{hash} for each needed chunk.
- Commit – POST /upload/commit to atomically assemble chunks into files.
Endpoint Summary
| Method | Path | Description | Auth | Body Limit |
|---|---|---|---|---|
| GET | /upload/config | Negotiate hash algorithm and chunk size | No | – |
| POST | /upload/check | Check which chunks the server already has | Yes | 1 MB |
| PUT | /upload/chunks/{hash} | Upload a single chunk | Yes | 10 GB |
| POST | /upload/commit | Atomic multi-file commit from chunks | Yes | 1 MB |
Phase 1: GET /upload/config
Retrieve the server’s hash algorithm, chunk size, and hash prefix. This endpoint is public (no authentication required).
Response
Status: 200 OK
{
"hash_algorithm": "blake3",
"chunk_size": 262144,
"chunk_hash_prefix": "chunk:"
}
| Field | Type | Description |
|---|---|---|
| hash_algorithm | string | Hash algorithm used by the server (e.g., "blake3") |
| chunk_size | integer | Maximum chunk size in bytes (262,144 = 256 KB) |
| chunk_hash_prefix | string | Prefix prepended to chunk data before hashing |
How to Compute Chunk Hashes
The server computes chunk hashes as:
hash = blake3("chunk:" + chunk_bytes)
Clients must use the same formula. The prefix ("chunk:") is prepended to the raw bytes before hashing, not to the hex-encoded hash.
Example
curl http://localhost:3000/upload/config
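As a client-side sketch, the hash formula above could be implemented as follows. Note the assumption: Python's standard library has no BLAKE3, so this falls back to `hashlib.blake2b` as a stand-in when the third-party `blake3` package is not installed; against a real server, BLAKE3 is required.

```python
import hashlib

CHUNK_HASH_PREFIX = b"chunk:"  # value reported by GET /upload/config

def chunk_hash(chunk_bytes: bytes) -> str:
    """Hash a chunk the way the server does: prepend the prefix to the
    raw bytes, then hash. Uses BLAKE3 when available; otherwise falls
    back to blake2b purely so the sketch runs on the stdlib alone."""
    try:
        from blake3 import blake3  # third-party: pip install blake3
        return blake3(CHUNK_HASH_PREFIX + chunk_bytes).hexdigest()
    except ImportError:
        return hashlib.blake2b(CHUNK_HASH_PREFIX + chunk_bytes,
                               digest_size=32).hexdigest()
```

Either branch yields a 64-character hex digest (32 bytes), matching the hash strings shown in the examples below.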
Phase 2: POST /upload/check
Send a list of chunk hashes to determine which ones the server already has (deduplication). Only upload the ones in the needed list.
Request Body
{
"hashes": [
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2",
"f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| hashes | array of strings | Yes | Hex-encoded chunk hashes |
Response
Status: 200 OK
{
"have": [
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
],
"needed": [
"f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
]
}
| Field | Type | Description |
|---|---|---|
| have | array | Hashes the server already has – skip these |
| needed | array | Hashes the server needs – upload these |
Example
curl -X POST http://localhost:3000/upload/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"hashes": ["a1b2c3...", "f6e5d4..."]}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid hex hash in the list |
Phase 3: PUT /upload/chunks/{hash}
Upload a single chunk. The server verifies the hash matches the content before storing.
Request
- URL parameter: {hash} – hex-encoded blake3 hash of "chunk:" + chunk_bytes
- Headers: Authorization: Bearer <token> (required)
- Body: raw chunk bytes
Hash Verification
The server recomputes the hash from the uploaded bytes:
computed = blake3("chunk:" + body_bytes)
If the computed hash does not match the URL parameter, the upload is rejected.
Response
Status: 201 Created (new chunk stored)
{
"status": "created",
"hash": "f6e5d4c3b2a1..."
}
Status: 200 OK (chunk already exists – dedup)
{
"status": "exists",
"hash": "f6e5d4c3b2a1..."
}
Compression
The server automatically applies Zstd compression to chunks when beneficial (based on size heuristics). This is transparent to the client.
Example
curl -X PUT http://localhost:3000/upload/chunks/f6e5d4c3b2a1... \
-H "Authorization: Bearer $TOKEN" \
--data-binary @chunk_001.bin
Error Responses
| Status | Condition |
|---|---|
| 400 | Chunk exceeds maximum size (262,144 bytes) |
| 400 | Invalid hex hash in URL |
| 400 | Hash mismatch between URL and computed hash |
| 500 | Storage failure |
Phase 4: POST /upload/commit
Atomically commit multiple files from previously uploaded chunks. Each file specifies its path, content type, and the ordered list of chunk hashes that compose it.
Request Body
{
"files": [
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": [
"a1b2c3d4e5f6...",
"f6e5d4c3b2a1..."
]
},
{
"path": "/data/image.png",
"content_type": "image/png",
"chunk_hashes": [
"1234abcd5678..."
]
}
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| files | array | Yes | List of files to commit |
| files[].path | string | Yes | Destination path for the file |
| files[].content_type | string | No | MIME type |
| files[].chunk_hashes | array | Yes | Ordered list of hex-encoded chunk hashes |
Response
Status: 200 OK
The response contains a summary of the commit operation.
Example
curl -X POST http://localhost:3000/upload/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"files": [
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": ["a1b2c3d4...", "f6e5d4c3..."]
}
]
}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid input (missing path, bad hash, etc.) |
| 500 | Commit task failure or panic |
Full Upload Workflow
Here is a complete workflow for uploading a file:
# 1. Get server configuration
CONFIG=$(curl -s http://localhost:3000/upload/config)
CHUNK_SIZE=$(echo $CONFIG | jq -r '.chunk_size')
# 2. Split file into chunks and hash them
# (pseudo-code: split report.pdf into 256KB chunks, hash each with blake3)
# chunk_hashes=["hash1", "hash2", ...]
# 3. Check which chunks are needed
DEDUP=$(curl -s -X POST http://localhost:3000/upload/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"hashes": ["hash1", "hash2"]}')
# 4. Upload only the needed chunks
for hash in $(echo $DEDUP | jq -r '.needed[]'); do
curl -X PUT "http://localhost:3000/upload/chunks/$hash" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @"chunk_$hash.bin"
done
# 5. Commit the file
curl -X POST http://localhost:3000/upload/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"files": [{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": ["hash1", "hash2"]
}]
}'
Admin Operations
Administrative endpoints for garbage collection, background tasks, cron scheduling, metrics, health checks, backup/restore, and user/group management. Most admin endpoints require root access.
Endpoint Summary
Garbage Collection
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/gc | Run synchronous garbage collection | Yes |
Background Tasks
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/tasks/reindex | Trigger a reindex task | Yes |
| POST | /admin/tasks/gc | Trigger a background GC task | Yes |
| GET | /admin/tasks | List all tasks with progress | Yes |
| GET | /admin/tasks/{id} | Get a single task | Yes |
| DELETE | /admin/tasks/{id} | Cancel a task | Yes |
Cron Scheduling
| Method | Path | Description | Root Required |
|---|---|---|---|
| GET | /admin/cron | List cron schedules | Yes |
| POST | /admin/cron | Create a cron schedule | Yes |
| PATCH | /admin/cron/{id} | Update a cron schedule | Yes |
| DELETE | /admin/cron/{id} | Delete a cron schedule | Yes |
Backup & Restore
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/export | Export database as .aeordb | Yes |
| POST | /admin/diff | Create patch between versions | Yes |
| POST | /admin/import | Import a backup or patch | Yes |
| POST | /admin/promote | Promote a version hash to HEAD | Yes |
Monitoring
| Method | Path | Description | Root Required |
|---|---|---|---|
| GET | /admin/metrics | Prometheus metrics | Yes (auth required) |
| GET | /admin/health | Health check | No (public) |
API Key Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/api-keys | Create an API key | Yes |
| GET | /admin/api-keys | List all API keys | Yes |
| DELETE | /admin/api-keys/{key_id} | Revoke an API key | Yes |
User Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/users | Create a user | Yes |
| GET | /admin/users | List all users | Yes |
| GET | /admin/users/{user_id} | Get a user | Yes |
| PATCH | /admin/users/{user_id} | Update a user | Yes |
| DELETE | /admin/users/{user_id} | Deactivate a user (soft delete) | Yes |
Group Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/groups | Create a group | Yes |
| GET | /admin/groups | List all groups | Yes |
| GET | /admin/groups/{name} | Get a group | Yes |
| PATCH | /admin/groups/{name} | Update a group | Yes |
| DELETE | /admin/groups/{name} | Delete a group | Yes |
Garbage Collection
POST /admin/gc
Run garbage collection synchronously. Identifies and removes orphaned entries not reachable from the current HEAD.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dry_run | boolean | false | If true, report what would be collected without deleting |
Response: 200 OK
The response contains GC statistics (entries scanned, reclaimed bytes, etc.).
Example:
# Dry run
curl -X POST "http://localhost:3000/admin/gc?dry_run=true" \
-H "Authorization: Bearer $TOKEN"
# Actual GC
curl -X POST http://localhost:3000/admin/gc \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 500 | GC failure |
Background Tasks
POST /admin/tasks/reindex
Enqueue a reindex task for a directory path. Re-scans all files and rebuilds index entries.
Request Body:
{
"path": "/data/"
}
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "pending"
}
Example:
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
POST /admin/tasks/gc
Enqueue a background GC task (non-blocking).
Request Body:
{
"dry_run": false
}
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "gc",
"status": "pending"
}
Example:
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"dry_run": false}'
GET /admin/tasks
List all tasks with their current progress.
Response: 200 OK
[
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"progress": 0.45,
"eta_ms": 1775968500000
}
]
Each task includes progress (0.0-1.0) and eta_ms (estimated completion timestamp) if available.
Example:
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $TOKEN"
GET /admin/tasks/{id}
Get a single task by ID.
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"progress": 0.45,
"eta_ms": 1775968500000
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Task not found |
DELETE /admin/tasks/{id}
Cancel a task.
Response: 200 OK
{
"id": "task-uuid-here",
"status": "cancelled"
}
Example:
curl -X DELETE http://localhost:3000/admin/tasks/task-uuid-here \
-H "Authorization: Bearer $TOKEN"
Cron Scheduling
GET /admin/cron
List all cron schedules.
Response: 200 OK
[
{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}
]
POST /admin/cron
Create a new cron schedule.
Request Body:
{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique schedule identifier |
| schedule | string | Yes | Cron expression |
| task_type | string | Yes | Task type to enqueue ("gc", "reindex") |
| args | object | Yes | Arguments passed to the task |
| enabled | boolean | Yes | Whether the schedule is active |
Response: 201 Created
Example:
curl -X POST http://localhost:3000/admin/cron \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}'
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid cron expression |
| 409 | Schedule with this ID already exists |
PATCH /admin/cron/{id}
Update a cron schedule. All fields are optional – only provided fields are changed.
Request Body:
{
"enabled": false,
"schedule": "0 3 * * *"
}
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Enable or disable the schedule |
| schedule | string | New cron expression |
| task_type | string | New task type |
| args | object | New task arguments |
Response: 200 OK
Returns the updated schedule.
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid cron expression |
| 404 | Schedule not found |
DELETE /admin/cron/{id}
Delete a cron schedule.
Response: 200 OK
{
"id": "nightly-gc",
"deleted": true
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Schedule not found |
Backup & Restore
POST /admin/export
Export the database (or a specific version) as an .aeordb archive file.
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| snapshot | string | Export a named snapshot (default: HEAD) |
| hash | string | Export a specific version by hex hash |
Response: 200 OK
- Content-Type: application/octet-stream
- Content-Disposition: attachment; filename="export-{hash_prefix}.aeordb"
- Body: binary archive data
Example:
# Export HEAD
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $TOKEN" \
-o backup.aeordb
# Export a specific snapshot
curl -X POST "http://localhost:3000/admin/export?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN" \
-o backup-v1.aeordb
# Export by hash
curl -X POST "http://localhost:3000/admin/export?hash=a1b2c3d4..." \
-H "Authorization: Bearer $TOKEN" \
-o backup.aeordb
POST /admin/diff
Create a patch file representing the difference between two versions.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Source snapshot name or hex hash |
| to | string | No | Target snapshot name or hex hash (default: HEAD) |
Response: 200 OK
- Content-Type: application/octet-stream
- Content-Disposition: attachment; filename="patch-{hash_prefix}.aeordb"
- Body: binary patch data
Example:
curl -X POST "http://localhost:3000/admin/diff?from=v1.0&to=v2.0" \
-H "Authorization: Bearer $TOKEN" \
-o patch-v1-v2.aeordb
POST /admin/import
Import a backup or patch file. Body limit: 10 MB.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| force | boolean | false | Force import even if conflicts exist |
| promote | boolean | false | Promote the imported version to HEAD |
Request:
- Headers: Authorization: Bearer <token> (required)
- Body: raw .aeordb file bytes
Response: 200 OK
{
"status": "success",
"backup_type": "export",
"entries_imported": 1500,
"chunks_imported": 3200,
"files_imported": 450,
"directories_imported": 30,
"deletions_applied": 5,
"version_hash": "a1b2c3d4e5f6...",
"head_promoted": true
}
Example:
curl -X POST "http://localhost:3000/admin/import?promote=true" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @backup.aeordb
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid or corrupt backup file |
| 403 | Non-root user |
POST /admin/promote
Promote an arbitrary version hash to HEAD.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| hash | string | Yes | Hex-encoded version hash to promote |
Response: 200 OK
{
"status": "success",
"head": "a1b2c3d4e5f6..."
}
Example:
curl -X POST "http://localhost:3000/admin/promote?hash=a1b2c3d4e5f6..." \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid hash format |
| 404 | Version hash not found in storage |
Monitoring
GET /admin/health
Public health check endpoint. No authentication required.
Response: 200 OK
{
"status": "ok"
}
Example:
curl http://localhost:3000/admin/health
GET /admin/metrics
Prometheus-format metrics endpoint. Requires authentication.
Response: 200 OK
- Content-Type: text/plain; version=0.0.4; charset=utf-8
- Body: Prometheus text exposition format
Example:
curl http://localhost:3000/admin/metrics \
-H "Authorization: Bearer $TOKEN"
API Key Management
POST /admin/api-keys
Create a new API key. The plaintext key is returned only once – store it securely. Requires root.
Request Body:
{
"user_id": "550e8400-e29b-41d4-a716-446655440000"
}
| Field | Type | Required | Description |
|---|---|---|---|
| user_id | string (UUID) | No | User to create the key for (defaults to the calling user) |
Response: 201 Created
{
"key_id": "660e8400-e29b-41d4-a716-446655440001",
"api_key": "aeor_660e8400_a1b2c3d4e5f6...",
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"created_at": "2026-04-13T10:00:00Z"
}
Example:
curl -X POST http://localhost:3000/admin/api-keys \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"user_id": "550e8400-e29b-41d4-a716-446655440000"}'
GET /admin/api-keys
List all API keys (metadata only – no secrets). Requires root.
Response: 200 OK
[
{
"key_id": "660e8400-e29b-41d4-a716-446655440001",
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"created_at": "2026-04-13T10:00:00Z",
"is_revoked": false
}
]
DELETE /admin/api-keys/{key_id}
Revoke an API key. Revoked keys cannot be used to obtain tokens. Requires root.
Response: 200 OK
{
"revoked": true,
"key_id": "660e8400-e29b-41d4-a716-446655440001"
}
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid key ID format |
| 404 | API key not found |
User Management
POST /admin/users
Create a new user. Requires root.
Request Body:
{
"username": "alice",
"email": "[email protected]"
}
| Field | Type | Required | Description |
|---|---|---|---|
| username | string | Yes | Unique username |
| email | string | No | User email address |
Response: 201 Created
{
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"username": "alice",
"email": "[email protected]",
"is_active": true,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
GET /admin/users
List all users. Requires root.
Response: 200 OK
[
{
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"username": "alice",
"email": "[email protected]",
"is_active": true,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
]
GET /admin/users/{user_id}
Get a single user. Requires root.
Response: 200 OK (same shape as the user object above)
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid UUID |
| 404 | User not found |
PATCH /admin/users/{user_id}
Update a user. All fields are optional. Requires root.
Request Body:
{
"username": "alice_updated",
"email": "[email protected]",
"is_active": true
}
Response: 200 OK (returns the updated user)
DELETE /admin/users/{user_id}
Deactivate a user (soft delete – sets is_active to false). Requires root.
Response: 200 OK
{
"deactivated": true,
"user_id": "550e8400-e29b-41d4-a716-446655440000"
}
Group Management
Groups define path-level access control rules using query-based membership.
POST /admin/groups
Create a new group. Requires root.
Request Body:
{
"name": "editors",
"default_allow": "/content/*",
"default_deny": "/admin/*",
"query_field": "role",
"query_operator": "eq",
"query_value": "editor"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique group name |
| default_allow | string | Yes | Path pattern for allowed access |
| default_deny | string | Yes | Path pattern for denied access |
| query_field | string | Yes | User field to query for membership (must be a safe field) |
| query_operator | string | Yes | Comparison operator |
| query_value | string | Yes | Value to match against |
Response: 201 Created
{
"name": "editors",
"default_allow": "/content/*",
"default_deny": "/admin/*",
"query_field": "role",
"query_operator": "eq",
"query_value": "editor",
"created_at": 1775968398000,
"updated_at": 1775968398000
}
GET /admin/groups
List all groups. Requires root.
Response: 200 OK (array of group objects)
GET /admin/groups/{name}
Get a single group. Requires root.
Error Responses:
| Status | Condition |
|---|---|
| 404 | Group not found |
PATCH /admin/groups/{name}
Update a group. All fields are optional. Requires root.
Request Body:
{
"default_allow": "/content/*",
"query_value": "senior-editor"
}
The query_field value is validated against a whitelist of safe fields. Attempting to use an unsafe field returns a 400 error.
Error Responses:
| Status | Condition |
|---|---|
| 400 | Unsafe query field |
| 404 | Group not found |
DELETE /admin/groups/{name}
Delete a group. Requires root.
Response: 200 OK
{
"deleted": true,
"name": "editors"
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Group not found |
Plugin Endpoints
AeorDB supports deploying WebAssembly (WASM) plugins that extend the database with custom logic. Plugins are scoped to a {database}/{schema}/{table} namespace.
Endpoint Summary
| Method | Path | Description | Auth |
|---|---|---|---|
| PUT | /{db}/{schema}/{table}/_deploy | Deploy a WASM plugin | Yes |
| POST | /{db}/{schema}/{table}/{function}/_invoke | Invoke a plugin function | Yes |
| GET | /{db}/_plugins | List all deployed plugins | Yes |
| DELETE | /{db}/{schema}/{table}/{function}/_remove | Remove a deployed plugin | Yes |
PUT /{db}/{schema}/{table}/_deploy
Deploy a WASM plugin to the given namespace. If a plugin already exists at this path, it is replaced.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
- Query parameters:
  - name (optional) – plugin name (defaults to the {table} segment)
  - plugin_type (optional) – plugin type string (defaults to "wasm")
- Headers: Authorization: Bearer <token> (required)
- Body: raw WASM binary bytes
Response
Status: 200 OK
Returns the plugin metadata:
{
"name": "my-plugin",
"path": "mydb/public/users",
"plugin_type": "wasm",
"deployed_at": "2026-04-13T10:00:00Z"
}
Example
curl -X PUT "http://localhost:3000/mydb/public/users/_deploy?name=my-plugin" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @plugin.wasm
Error Responses
| Status | Condition |
|---|---|
| 400 | Empty body or invalid plugin type |
| 400 | Invalid WASM module |
| 500 | Deployment failure |
POST /{db}/{schema}/{table}/{function}/_invoke
Invoke a deployed plugin’s function. The request body is wrapped in a PluginRequest envelope with metadata, then passed to the WASM runtime.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
  - {function} – function name to invoke
- Headers:
  - Authorization: Bearer <token> (required)
  - Content-Type – depends on what the plugin expects
- Body: raw request payload (passed to the plugin as arguments)
Plugin Request Envelope
The server wraps the raw body into:
{
"arguments": "<raw body bytes>",
"metadata": {
"function_name": "compute",
"path": "/mydb/public/users/compute",
"plugin_path": "mydb/public/users"
}
}
Plugin Response
Plugins return a PluginResponse envelope:
{
"status_code": 200,
"content_type": "application/json",
"headers": {
"x-custom-header": "value"
},
"body": "<response bytes>"
}
The server maps these fields to the HTTP response. For security, only safe headers are forwarded:
- Headers starting with x-
- cache-control, etag, last-modified, content-disposition, content-language, content-encoding, vary
If the plugin returns data that is not a valid PluginResponse, it is sent as raw application/octet-stream bytes (backward compatibility).
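The forwarding rule can be expressed as a short filter. This is a sketch of the rule as described above, not AeorDB's source; the function name `forwarded_headers` is hypothetical.

```python
# Fixed allowlist of forwardable headers, per the rule above.
SAFE_HEADERS = {"cache-control", "etag", "last-modified", "content-disposition",
                "content-language", "content-encoding", "vary"}

def forwarded_headers(plugin_headers: dict) -> dict:
    """Keep only the headers the server would forward from a PluginResponse:
    anything whose name starts with `x-` (case-insensitive), plus the
    allowlisted standard headers. Everything else is dropped."""
    return {k: v for k, v in plugin_headers.items()
            if k.lower().startswith("x-") or k.lower() in SAFE_HEADERS}
```

So a plugin setting `x-custom-header` and `ETag` sees both forwarded, while something like `Set-Cookie` is silently dropped.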
WASM Host Functions
Plugins have access to the following host functions for interacting with the database:
- CRUD: read, write, and delete files
- Query: execute queries and aggregations against the engine
- Context: access request metadata
Example
curl -X POST http://localhost:3000/mydb/public/users/compute/_invoke \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"input": "hello"}'
Error Responses
| Status | Condition |
|---|---|
| 404 | Plugin not found at the given path |
| 500 | Plugin invocation failure (runtime error, panic, etc.) |
GET /{db}/_plugins
List all deployed plugins.
Request
- URL parameters: {db} – database name
- Headers: Authorization: Bearer <token> (required)
Response
Status: 200 OK
[
{
"name": "my-plugin",
"path": "mydb/public/users",
"plugin_type": "wasm"
}
]
Example
curl http://localhost:3000/mydb/_plugins \
-H "Authorization: Bearer $TOKEN"
DELETE /{db}/{schema}/{table}/{function}/_remove
Remove a deployed plugin.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
  - {function} – function name
- Headers: Authorization: Bearer <token> (required)
Response
Status: 200 OK
{
"removed": true,
"path": "mydb/public/users"
}
Example
curl -X DELETE http://localhost:3000/mydb/public/users/compute/_remove \
-H "Authorization: Bearer $TOKEN"
Error Responses
| Status | Condition |
|---|---|
| 404 | Plugin not found |
| 500 | Removal failure |
Events & Webhooks
AeorDB publishes real-time events via Server-Sent Events (SSE). Clients can subscribe to a filtered stream of engine events for live updates.
Endpoint Summary
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /events/stream | SSE event stream | Yes |
GET /events/stream
Open a persistent Server-Sent Events connection. The server pushes events as they occur and sends periodic keepalive pings.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
events | string | Comma-separated list of event types to receive (default: all) |
path_prefix | string | Only receive events whose payload contains a path starting with this prefix |
Request
curl -N http://localhost:3000/events/stream \
-H "Authorization: Bearer $TOKEN"
Filtered Stream
Subscribe to only specific event types:
curl -N "http://localhost:3000/events/stream?events=entries_created,entries_deleted" \
-H "Authorization: Bearer $TOKEN"
Filter by path prefix:
curl -N "http://localhost:3000/events/stream?path_prefix=/data/users" \
-H "Authorization: Bearer $TOKEN"
Combine both:
curl -N "http://localhost:3000/events/stream?events=entries_created&path_prefix=/data/" \
-H "Authorization: Bearer $TOKEN"
Response Format
The response is an SSE stream. Each event has the standard SSE fields:
id: evt-uuid-here
event: entries_created
data: {"event_id":"evt-uuid-here","event_type":"entries_created","timestamp":1775968398000,"payload":{"entries":[{"path":"/data/report.pdf"}]}}
Event Envelope
Each event is a JSON object with:
| Field | Type | Description |
|---|---|---|
| event_id | string | Unique event identifier |
| event_type | string | Type of event (see below) |
| timestamp | integer | Unix timestamp (milliseconds) |
| payload | object | Event-specific data |
Event Types
| Event Type | Description | Payload |
|---|---|---|
| entries_created | Files were created or updated | {"entries": [{"path": "..."}]} |
| entries_deleted | Files were deleted | {"entries": [{"path": "..."}]} |
| versions_created | A new version (snapshot/fork) was created | Version metadata |
| permissions_changed | Permissions were updated for a path | {"path": "..."} |
| indexes_changed | Index configuration was updated | {"path": "..."} |
Keepalive
The server sends a keepalive ping every 30 seconds to prevent connection timeouts:
: ping
Path Prefix Matching
The path prefix filter checks two locations in the event payload:
- Batch events: `payload.entries[].path` – matches if any entry's path starts with the prefix
- Single-path events: `payload.path` – matches if the path starts with the prefix
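The filter semantics described in this section can be sketched as follows. This is a minimal illustration, not the engine's code; the function name and signature are hypothetical:

```rust
// Illustrative sketch of the SSE path_prefix filter semantics.
// `entry_paths` models payload.entries[].path; `single_path` models payload.path.
fn matches_prefix(entry_paths: &[&str], single_path: Option<&str>, prefix: &str) -> bool {
    // Batch events: match if any entry's path starts with the prefix.
    if entry_paths.iter().any(|p| p.starts_with(prefix)) {
        return true;
    }
    // Single-path events: match if the payload's path starts with the prefix.
    single_path.map_or(false, |p| p.starts_with(prefix))
}

fn main() {
    assert!(matches_prefix(&["/data/users/alice.json"], None, "/data/users"));
    assert!(!matches_prefix(&["/docs/report.pdf"], None, "/data/users"));
    assert!(matches_prefix(&[], Some("/data/users/config"), "/data/users"));
    println!("prefix filter semantics hold");
}
```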
Connection Behavior
- The connection stays open indefinitely until the client disconnects.
- If the client falls behind (lagged), missed events are silently dropped.
- Reconnecting clients should use the last received `id` for gap detection (standard SSE `Last-Event-ID` header).
JavaScript Example
Note: the browser-native EventSource API does not support custom request headers, so the Authorization header cannot be attached through it. Use a fetch-based reader instead (a production client should also buffer partial lines across chunks):
const response = await fetch(
  'http://localhost:3000/events/stream?events=entries_created',
  { headers: { 'Authorization': 'Bearer ' + token } }
);
const reader = response.body
  .pipeThrough(new TextDecoderStream())
  .getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  for (const line of value.split('\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice('data: '.length));
      console.log('Event:', data.event_type, data.payload);
    }
  }
}
Webhook Configuration
Webhooks are configured in-database by storing webhook configuration files at a well-known path within the engine. The event bus broadcasts all events internally, and webhook delivery uses the same event-type filtering as the SSE endpoint.
Authentication
AeorDB supports multiple authentication modes. All protected endpoints require a JWT Bearer token, obtained by exchanging an API key or by verifying a magic link.
Auth Modes
AeorDB can run in one of three authentication modes, selected at startup:
| Mode | CLI Flag | Description |
|---|---|---|
| disabled | --auth disabled | No authentication. All requests are allowed. |
| self-contained | --auth self-contained | Keys and users stored inside the database (default). |
| file | --auth file://<path> | Identity loaded from an external file. Prints a bootstrap API key on first run. |
Disabled Mode
All authentication middleware is bypassed. Every request is treated as authenticated. Useful for local development.
Self-Contained Mode
The default. Users, API keys, and tokens are all stored within the AeorDB engine itself. The JWT signing key is generated automatically.
File Mode
Identity is loaded from an external file at the specified path. On first startup, a bootstrap API key is printed to stdout so you can authenticate and set up additional users.
Endpoint Summary
| Method | Path | Description | Auth Required |
|---|---|---|---|
| POST | /auth/token | Exchange API key for JWT | No |
| POST | /auth/magic-link | Request a magic link | No |
| GET | /auth/magic-link/verify | Verify a magic link code | No |
| POST | /auth/refresh | Refresh an expired JWT | No |
| POST | /admin/api-keys | Create an API key | Yes (root) |
| GET | /admin/api-keys | List API keys | Yes (root) |
| DELETE | /admin/api-keys/{key_id} | Revoke an API key | Yes (root) |
JWT Tokens
All protected endpoints accept a JWT Bearer token in the Authorization header:
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Token Claims
| Claim | Type | Description |
|---|---|---|
| sub | string | User ID (UUID) or email |
| iss | string | Always "aeordb" |
| iat | integer | Issued-at timestamp (Unix seconds) |
| exp | integer | Expiration timestamp (Unix seconds) |
| scope | string | Optional scope restriction |
| permissions | object | Optional fine-grained permissions |
POST /auth/token
Exchange an API key for a JWT and refresh token. This is the primary authentication flow.
Request Body
{
"api_key": "aeor_660e8400_a1b2c3d4e5f6..."
}
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600,
"refresh_token": "rt_a1b2c3d4e5f6..."
}
| Field | Type | Description |
|---|---|---|
| token | string | JWT access token |
| expires_in | integer | Token lifetime in seconds |
| refresh_token | string | Refresh token for obtaining new JWTs |
API Key Format
API keys follow the format aeor_{key_id_prefix}_{secret}. The key_id_prefix is extracted for O(1) lookup – the server does not iterate all stored keys.
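The prefix-based lookup can be sketched like this (a hypothetical helper, not the server's code): split off the `aeor_` marker, then use the segment before the next underscore as the index key.

```rust
// Sketch of O(1) key lookup: parse "aeor_{key_id_prefix}_{secret}" and
// index stored keys by key_id_prefix instead of scanning all of them.
fn split_api_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("aeor_")?;
    let (prefix, secret) = rest.split_once('_')?;
    if prefix.is_empty() || secret.is_empty() {
        return None;
    }
    Some((prefix, secret))
}

fn main() {
    assert_eq!(
        split_api_key("aeor_660e8400_a1b2c3d4e5f6"),
        Some(("660e8400", "a1b2c3d4e5f6"))
    );
    assert_eq!(split_api_key("not_a_key"), None);
    println!("key parsing works");
}
```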
Example
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "aeor_660e8400_a1b2c3d4e5f6..."}'
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid, revoked, or malformed API key |
| 500 | Token creation failure |
POST /auth/magic-link
Request a magic link for passwordless authentication. The server always returns 200 OK regardless of whether the email exists, to prevent email enumeration.
In development mode, the magic link URL is logged via tracing (no email is actually sent).
Rate Limiting
This endpoint is rate-limited per email address. Exceeding the limit returns 429 Too Many Requests.
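The per-email limiting could work along these lines. The actual algorithm, window, and limits are server implementation details not specified here; this is only an illustrative fixed-window model:

```rust
use std::collections::HashMap;

// Minimal fixed-window rate limiter keyed by email (illustrative only).
struct RateLimiter {
    window_secs: u64,
    max_requests: u32,
    counters: HashMap<String, (u64, u32)>, // email -> (window start, count)
}

impl RateLimiter {
    fn allow(&mut self, email: &str, now: u64) -> bool {
        let entry = self.counters.entry(email.to_string()).or_insert((now, 0));
        if now - entry.0 >= self.window_secs {
            *entry = (now, 0); // start a new window
        }
        if entry.1 < self.max_requests {
            entry.1 += 1;
            true
        } else {
            false // caller would respond 429 Too Many Requests
        }
    }
}

fn main() {
    let mut rl = RateLimiter { window_secs: 60, max_requests: 3, counters: HashMap::new() };
    assert!(rl.allow("user@example.com", 0));
    assert!(rl.allow("user@example.com", 1));
    assert!(rl.allow("user@example.com", 2));
    assert!(!rl.allow("user@example.com", 3)); // limit hit -> 429
    assert!(rl.allow("user@example.com", 61)); // new window opens
    println!("rate limiter ok");
}
```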
Request Body
{
"email": "[email protected]"
}
Response
Status: 200 OK
{
"message": "If an account exists, a login link has been sent."
}
Example
curl -X POST http://localhost:3000/auth/magic-link \
-H "Content-Type: application/json" \
-d '{"email": "[email protected]"}'
Error Responses
| Status | Condition |
|---|---|
| 429 | Rate limit exceeded |
GET /auth/magic-link/verify
Verify a magic link code and receive a JWT. Each code can only be used once.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| code | string | Yes | The magic link code |
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600
}
Example
curl "http://localhost:3000/auth/magic-link/verify?code=abc123..."
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid code, expired, or already used |
| 500 | Token creation failure |
POST /auth/refresh
Exchange a refresh token for a new JWT and a new refresh token. Implements token rotation – the old refresh token is revoked and cannot be reused.
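The rotation semantics can be modeled as follows. This is an illustrative model of the behavior, not the server's implementation:

```rust
use std::collections::HashSet;

// Model of refresh-token rotation: a successful refresh revokes the old
// token and issues a new one; a revoked or unknown token fails (-> 401).
struct RefreshStore {
    active: HashSet<String>,
}

impl RefreshStore {
    fn refresh(&mut self, old: &str, new_token: &str) -> Result<String, &'static str> {
        if !self.active.remove(old) {
            return Err("invalid, revoked, or expired refresh token");
        }
        self.active.insert(new_token.to_string());
        Ok(new_token.to_string())
    }
}

fn main() {
    let mut store = RefreshStore { active: HashSet::from(["rt_old".to_string()]) };
    assert!(store.refresh("rt_old", "rt_new").is_ok());
    // Reusing the old token fails: rotation revoked it.
    assert!(store.refresh("rt_old", "rt_other").is_err());
    assert!(store.refresh("rt_new", "rt_next").is_ok());
    println!("rotation ok");
}
```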
Request Body
{
"refresh_token": "rt_a1b2c3d4e5f6..."
}
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600,
"refresh_token": "rt_new_token_here..."
}
Example
curl -X POST http://localhost:3000/auth/refresh \
-H "Content-Type: application/json" \
-d '{"refresh_token": "rt_a1b2c3d4e5f6..."}'
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid, revoked, or expired refresh token |
| 500 | Token creation failure |
Root User
The root user has the nil UUID (00000000-0000-0000-0000-000000000000). Only the root user can:
- Create and manage API keys
- Create and manage users
- Create and manage groups
- Restore snapshots and manage forks
- Run garbage collection
- Manage tasks and cron schedules
- Export, import, and promote versions
First-Run Bootstrap
When using file:// auth mode, a bootstrap API key is printed to stdout on first run. Use this key to authenticate as root and create additional users and keys:
# Start the server (prints bootstrap key)
aeordb --auth file:///path/to/identity.json
# Exchange the bootstrap key for a token
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "<bootstrap-key>"}'
Authentication Flow Summary
API Key                          Magic Link
   |                                 |
POST /auth/token             POST /auth/magic-link
   |                                 |
   v                                 v
JWT + Refresh                Email with code
   |                                 |
   |                         GET /auth/magic-link/verify
   |                                 |
   v                                 v
Use JWT in                   JWT token
Authorization header                 |
   |                                 |
   v                                 v
Protected endpoints          Protected endpoints
   |
Token expires
   |
POST /auth/refresh
   |
   v
New JWT + New Refresh
(old refresh revoked)
CORS Configuration
AeorDB supports Cross-Origin Resource Sharing (CORS) through a CLI flag and per-path rules stored in the database.
Configuration Methods
CLI Flag
Enable CORS at startup with the --cors flag:
# Allow all origins
aeordb --cors "*"
# Allow specific origins
aeordb --cors "https://app.example.com,https://admin.example.com"
The CLI flag sets the default CORS policy for all routes. When no --cors flag is provided, no CORS headers are sent.
Per-Path Rules (Config File)
For fine-grained control, store per-path CORS rules at /.config/cors.json inside the database:
{
"rules": [
{
"path": "/engine/*",
"origins": ["https://app.example.com"],
"methods": ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization"],
"max_age": 3600,
"allow_credentials": false
},
{
"path": "/query",
"origins": ["*"],
"methods": ["POST"],
"allow_headers": ["Content-Type", "Authorization"],
"max_age": 600,
"allow_credentials": false
},
{
"path": "/events/stream",
"origins": ["https://app.example.com"],
"allow_credentials": true
}
]
}
Upload the config file using the engine API:
curl -X PUT http://localhost:3000/engine/.config/cors.json \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d @cors.json
Rule Schema
Each rule in the rules array supports:
| Field | Type | Default | Description |
|---|---|---|---|
| path | string | (required) | URL path to match. Supports trailing * for prefix matching. |
| origins | array of strings | (required) | Allowed origins. Use ["*"] for any origin. |
| methods | array of strings | ["GET","POST","PUT","DELETE","HEAD","OPTIONS"] | Allowed HTTP methods. |
| allow_headers | array of strings | ["Content-Type","Authorization"] | Allowed request headers. |
| max_age | integer | 3600 | Preflight cache duration in seconds. |
| allow_credentials | boolean | false | Whether to include Access-Control-Allow-Credentials: true. |
Path Matching
Per-path rules are checked in order (first match wins):
- Exact match: `"/query"` matches only `/query`
- Prefix match: `"/engine/*"` matches `/engine/data/file.json`, `/engine/images/photo.png`, etc.
If no per-path rule matches, the CLI default (if any) is used.
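The first-match-wins lookup can be sketched like this (a hypothetical helper over the rule paths from the schema above, not the middleware's code):

```rust
// First-match-wins CORS rule lookup: exact match, or prefix match when the
// rule path ends with "*". Returns the index of the first matching rule.
fn match_rule(rule_paths: &[&str], request_path: &str) -> Option<usize> {
    rule_paths.iter().position(|rule| match rule.strip_suffix('*') {
        Some(prefix) => request_path.starts_with(prefix), // "/engine/*" -> prefix "/engine/"
        None => *rule == request_path,                    // exact match
    })
}

fn main() {
    let rules = ["/engine/*", "/query", "/events/stream"];
    assert_eq!(match_rule(&rules, "/engine/data/file.json"), Some(0));
    assert_eq!(match_rule(&rules, "/query"), Some(1));
    // An exact rule does not prefix-match longer paths.
    assert_eq!(match_rule(&rules, "/query/extra"), None);
    println!("cors matching ok");
}
```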
Precedence
1. Per-path rules from `/.config/cors.json` (first match wins)
2. CLI `--cors` flag defaults
3. No CORS headers (if neither is configured)
CORS Middleware Behavior
The CORS middleware runs as the outermost layer in the middleware stack, ensuring that OPTIONS preflight requests are handled before authentication middleware rejects them for missing tokens.
Preflight Requests (OPTIONS)
When a preflight request arrives:
1. The middleware checks whether the `Origin` header matches an allowed origin.
2. If allowed: returns `204 No Content` with CORS headers: `Access-Control-Allow-Origin`, `Access-Control-Allow-Methods`, `Access-Control-Allow-Headers`, `Access-Control-Max-Age`, and `Access-Control-Allow-Credentials` (if configured).
3. If not allowed: returns `403 Forbidden`.
Normal Requests
For non-preflight requests with an allowed origin:
- The request passes through to the handler normally.
- CORS headers are appended to the response: `Access-Control-Allow-Origin` and `Access-Control-Allow-Credentials` (if configured).
For requests from non-allowed origins, no CORS headers are added (the browser will block the response).
Wildcard Origin
When origins include "*", the Access-Control-Allow-Origin header is set to *. Note: when using allow_credentials: true, browsers require a specific origin rather than *.
Default CORS Headers (CLI Flag)
When only the --cors flag is used (no per-path rules), the defaults are:
| Header | Value |
|---|---|
| Access-Control-Allow-Methods | GET, POST, PUT, DELETE, HEAD, OPTIONS |
| Access-Control-Allow-Headers | Content-Type, Authorization |
| Access-Control-Max-Age | 3600 |
| Access-Control-Allow-Credentials | Not set |
Examples
Development: Allow Everything
aeordb --cors "*"
Production: Specific Origins
aeordb --cors "https://app.example.com,https://admin.example.com"
Per-Path with Credentials
Store in /.config/cors.json:
{
"rules": [
{
"path": "/events/stream",
"origins": ["https://app.example.com"],
"allow_credentials": true,
"max_age": 86400
},
{
"path": "/engine/*",
"origins": ["https://app.example.com", "https://admin.example.com"],
"methods": ["GET", "PUT", "DELETE", "HEAD", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization", "X-Request-ID"]
}
]
}
Parser Plugins
Parser plugins let you transform non-JSON files (plaintext, CSV, PDF, images, etc.) into structured, queryable JSON when they are stored in AeorDB. Parsers are compiled to WebAssembly and deployed per-table, so each data collection can have its own parsing logic.
How It Works
When a file is written to a table that has a parser configured, AeorDB automatically routes the raw bytes through the parser’s WASM module. The parser receives the file data plus metadata and returns a JSON value. That JSON is then indexed by AeorDB’s query engine, making the original non-JSON file fully searchable.
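Conceptually, the write path looks like this. This is a simplified model of the routing, not the engine's code; the type and function names are hypothetical:

```rust
use std::collections::HashMap;

// Simplified model of parser routing on write: look up a parser by the
// file's Content-Type and run it to produce indexable JSON (here, a String).
type Parser = fn(&[u8]) -> Result<String, String>;

fn route_and_parse(
    parsers: &HashMap<&str, Parser>,
    content_type: &str,
    data: &[u8],
) -> Option<Result<String, String>> {
    // No parser configured for this content type: store the file, skip indexing.
    parsers.get(content_type).map(|parse| parse(data))
}

fn plaintext(data: &[u8]) -> Result<String, String> {
    let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
    Ok(format!("{{\"word_count\":{}}}", text.split_whitespace().count()))
}

fn main() {
    let mut parsers: HashMap<&str, Parser> = HashMap::new();
    parsers.insert("text/plain", plaintext);
    let parsed = route_and_parse(&parsers, "text/plain", b"hello wasm world");
    assert_eq!(parsed, Some(Ok("{\"word_count\":3}".to_string())));
    // Unrouted content types are stored without parsing.
    assert_eq!(route_and_parse(&parsers, "image/png", b"\x89PNG"), None);
    println!("routing ok");
}
```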
Writing a Parser: Step by Step
1. Create a Rust Crate
cargo new my-parser --lib
cd my-parser
Edit Cargo.toml:
[package]
name = "my-parser"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"
The crate-type = ["cdylib"] is required – it tells the compiler to produce a dynamic library suitable for WASM.
2. Implement the Parse Function
Use the aeordb_parser! macro to generate the WASM export boilerplate. Your job is to write a function that takes a ParserInput and returns Result<serde_json::Value, String>.
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;
aeordb_parser!(parse);
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
let text = std::str::from_utf8(&input.data)
.map_err(|e| e.to_string())?;
Ok(serde_json::json!({
"text": text,
"metadata": {
"line_count": text.lines().count(),
"word_count": text.split_whitespace().count(),
}
}))
}
}
The aeordb_parser! macro generates:
- A global allocator for the WASM target
- A `handle(ptr, len) -> i64` export that deserializes the parser envelope, calls your function, and returns the serialized response as a packed pointer+length
You never interact with the raw WASM ABI directly.
3. Build for WASM
cargo build --target wasm32-unknown-unknown --release
The compiled module lands at:
target/wasm32-unknown-unknown/release/my_parser.wasm
4. Deploy the Parser
Upload the WASM binary to a table’s plugin deployment endpoint:
curl -X PUT \
http://localhost:3000/mydb/myschema/mytable/_deploy \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/wasm" \
--data-binary @target/wasm32-unknown-unknown/release/my_parser.wasm
5. Configure Content-Type Routing
Create or update /.config/parsers.json to route specific content types to your parser:
{
"parsers": {
"text/plain": "my-parser",
"text/csv": "csv-parser",
"application/pdf": "pdf-parser"
}
}
When a file with a matching Content-Type is stored, AeorDB automatically invokes the corresponding parser.
6. Configure Indexing
Add the parser name to indexes.json so the parsed output is indexed:
{
"indexes": [
{
"field": "text",
"type": "fulltext"
},
{
"field": "metadata.word_count",
"type": "numeric"
}
]
}
The ParserInput Struct
Your parse function receives a ParserInput with two fields:
| Field | Type | Description |
|---|---|---|
| data | Vec<u8> | Raw file bytes (already base64-decoded from the wire envelope) |
| meta | FileMeta | Metadata about the file being parsed |
FileMeta Fields
| Field | Type | Description |
|---|---|---|
| filename | String | File name only (e.g., "report.pdf") |
| path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
| content_type | String | MIME type (e.g., "text/plain") |
| size | u64 | Raw file size in bytes |
| hash | String | Hex-encoded content hash (may be empty) |
| hash_algorithm | String | Hash algorithm used (e.g., "blake3_256") |
| created_at | i64 | Creation timestamp (ms since epoch, default 0) |
| updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
Real-World Example: Plaintext Parser
The built-in plaintext parser (aeordb-parsers/plaintext) demonstrates a production parser:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;
aeordb_parser!(parse);
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
let text = std::str::from_utf8(&input.data)
.map_err(|e| format!("not valid UTF-8: {}", e))?;
let line_count = text.lines().count();
let word_count = text.split_whitespace().count();
let char_count = text.chars().count();
let byte_count = input.data.len();
// Extract first line as a "title" (common convention for text files)
let title = text.lines().next().unwrap_or("").trim().to_string();
// Detect if it looks like source code
let has_braces = text.contains('{') && text.contains('}');
let has_imports = text.contains("import ")
|| text.contains("use ")
|| text.contains("#include");
let looks_like_code = has_braces || has_imports;
Ok(serde_json::json!({
"text": text,
"metadata": {
"filename": input.meta.filename,
"content_type": input.meta.content_type,
"size": byte_count,
"line_count": line_count,
"word_count": word_count,
"char_count": char_count,
},
"title": title,
"looks_like_code": looks_like_code,
}))
}
}
This parser:
- Validates UTF-8 encoding (returns an error for binary data)
- Extracts text statistics (lines, words, characters)
- Pulls the first line as a title
- Heuristically detects source code
Error Handling
Return Err(String) from your parse function to signal a failure. AeorDB will store the error in the parser response and the file will not be indexed. The original file is still stored – only parsing/indexing is skipped.
#![allow(unused)]
fn main() {
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
if input.data.is_empty() {
return Err("empty file".to_string());
}
// ...
}
}
See Also
- Query Plugins – plugins that query the database and return custom responses
- SDK Reference – complete type reference for the plugin SDK
Query Plugins
Query plugins are WASM modules that can read, write, delete, query, and aggregate data inside AeorDB, then return custom HTTP responses. They are the extension mechanism for building custom API endpoints, computed views, data transformations, or any logic that needs to run server-side.
How It Works
A query plugin receives a PluginRequest (containing the HTTP body and metadata) and a PluginContext that provides host functions for interacting with the database. The plugin performs whatever logic it needs – querying data, writing files, aggregating results – and returns a PluginResponse with a status code, body, and content type.
Writing a Query Plugin: Step by Step
1. Create a Rust Crate
cargo new my-plugin --lib
cd my-plugin
Edit Cargo.toml:
[package]
name = "my-plugin"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"
2. Implement the Handler
Use the aeordb_query_plugin! macro and write a function that takes (PluginContext, PluginRequest) and returns Result<PluginResponse, PluginError>.
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
use aeordb_plugin_sdk::aeordb_query_plugin;
aeordb_query_plugin!(handle);
fn handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let results = ctx.query("/users")
.field("name").contains("Alice")
.field("age").gt_u64(21)
.limit(10)
.execute()?;
PluginResponse::json(200, &serde_json::json!({
"users": results,
"count": results.len()
})).map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
}
The aeordb_query_plugin! macro generates:
- A global allocator for the WASM target
- An `alloc(size) -> ptr` export for host-to-guest memory allocation
- A `handle(ptr, len) -> i64` export that deserializes the request, creates a `PluginContext`, calls your function, and returns the serialized response
3. Build and Deploy
cargo build --target wasm32-unknown-unknown --release
curl -X PUT \
http://localhost:3000/mydb/myschema/mytable/_deploy \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/wasm" \
--data-binary @target/wasm32-unknown-unknown/release/my_plugin.wasm
4. Invoke the Plugin
curl -X POST \
http://localhost:3000/mydb/myschema/mytable/_invoke/my-plugin \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "Alice"}'
The PluginRequest Struct
| Field | Type | Description |
|---|---|---|
| arguments | Vec<u8> | Raw argument bytes (the HTTP request body forwarded to the plugin) |
| metadata | HashMap<String, String> | Key-value metadata about the invocation context |
The metadata map typically contains:
| Key | Description |
|---|---|
| function_name | The function name from the invoke URL (e.g., "echo", "read") |
You can use function_name to multiplex a single plugin into multiple operations:
#![allow(unused)]
fn main() {
fn handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let function = request.metadata
.get("function_name")
.map(|s| s.as_str())
.unwrap_or("default");
match function {
"search" => handle_search(ctx, &request),
"stats" => handle_stats(ctx, &request),
_ => Ok(PluginResponse::error(404, &format!("Unknown function: {}", function))),
}
}
}
The PluginContext
PluginContext is your handle for calling AeorDB host functions from inside the WASM sandbox. It is created automatically by the macro and passed to your handler.
File Operations
#![allow(unused)]
fn main() {
// Read a file -- returns FileData { data, content_type, size }
let file = ctx.read_file("/mydb/users/alice.json")?;
// Write a file (create or overwrite)
ctx.write_file("/mydb/output/result.json", b"{\"ok\":true}", "application/json")?;
// Delete a file
ctx.delete_file("/mydb/temp/scratch.json")?;
// Get file metadata -- returns FileMetadata { path, size, content_type, created_at, updated_at }
let meta = ctx.file_metadata("/mydb/users/alice.json")?;
// List directory entries -- returns Vec<DirEntry { name, entry_type, size }>
let entries = ctx.list_directory("/mydb/users/")?;
}
Query
Use ctx.query(path) to get a QueryBuilder with a fluent API:
#![allow(unused)]
fn main() {
let results = ctx.query("/users")
.field("name").contains("Alice")
.field("age").gt_u64(21)
.sort("name", SortDirection::Asc)
.limit(10)
.offset(0)
.execute()?;
// results: Vec<QueryResult { path, score, matched_by }>
}
Aggregate
Use ctx.aggregate(path) to get an AggregateBuilder:
#![allow(unused)]
fn main() {
let stats = ctx.aggregate("/orders")
.count()
.sum("total")
.avg("total")
.min_val("total")
.max_val("total")
.group_by("status")
.limit(100)
.execute()?;
// stats: AggregateResult { groups, total_count }
}
The QueryBuilder
The QueryBuilder provides a fluent API for composing queries. Multiple conditions on the top level are implicitly ANDed.
Field Operators
Start a field condition with .field("name"), then chain an operator:
Equality:
- `.eq(value: &[u8])` – exact match on raw bytes
- `.eq_u64(value)` – exact match on u64
- `.eq_i64(value)` – exact match on i64
- `.eq_f64(value)` – exact match on f64
- `.eq_str(value)` – exact match on string
- `.eq_bool(value)` – exact match on boolean

Comparison:
- `.gt(value: &[u8])`, `.gt_u64(value)`, `.gt_str(value)`, `.gt_f64(value)` – greater than
- `.lt(value: &[u8])`, `.lt_u64(value)`, `.lt_str(value)`, `.lt_f64(value)` – less than

Range:
- `.between(min: &[u8], max: &[u8])` – inclusive range on raw bytes
- `.between_u64(min, max)` – inclusive range on u64
- `.between_str(min, max)` – inclusive range on strings

Set Membership:
- `.in_values(values: &[&[u8]])` – match any of the given byte values
- `.in_u64(values: &[u64])` – match any of the given u64 values
- `.in_str(values: &[&str])` – match any of the given strings

Text Search:
- `.contains(text)` – substring / trigram contains
- `.similar(text, threshold)` – trigram similarity (threshold 0.0–1.0)
- `.phonetic(text)` – Soundex/Metaphone phonetic match
- `.fuzzy(text)` – Levenshtein distance fuzzy match
- `.match_query(text)` – full-text match
Boolean Combinators
#![allow(unused)]
fn main() {
// AND group
ctx.query("/users")
.and(|q| q.field("name").contains("Alice").field("active").eq_bool(true))
.limit(10)
.execute()?;
// OR group
ctx.query("/users")
.or(|q| q.field("role").eq_str("admin").field("role").eq_str("superadmin"))
.execute()?;
// NOT
ctx.query("/users")
.not(|q| q.field("status").eq_str("banned"))
.execute()?;
}
Sorting and Pagination
#![allow(unused)]
fn main() {
ctx.query("/users")
.field("active").eq_bool(true)
.sort("created_at", SortDirection::Desc)
.sort("name", SortDirection::Asc)
.limit(25)
.offset(50)
.execute()?;
}
The PluginResponse
Three builder methods for constructing responses:
PluginResponse::json(status_code, &body)
Serializes any Serialize type to JSON. Sets Content-Type: application/json.
#![allow(unused)]
fn main() {
PluginResponse::json(200, &serde_json::json!({"ok": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
PluginResponse::text(status_code, body)
Returns a plain text response. Sets Content-Type: text/plain.
#![allow(unused)]
fn main() {
Ok(PluginResponse::text(201, "Created by plugin"))
}
PluginResponse::error(status_code, message)
Returns a JSON error response in the form {"error": "<message>"}.
#![allow(unused)]
fn main() {
Ok(PluginResponse::error(404, "User not found"))
}
Real-World Example: Echo Plugin
The built-in echo plugin (aeordb-plugins/echo-plugin) demonstrates multiplexing a single plugin across multiple operations:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
use aeordb_plugin_sdk::aeordb_query_plugin;
aeordb_query_plugin!(echo_handle);
fn echo_handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let function = request.metadata
.get("function_name")
.map(|s| s.as_str())
.unwrap_or("echo");
match function {
"echo" => {
PluginResponse::json(200, &serde_json::json!({
"echo": true,
"metadata": request.metadata,
"body_len": request.arguments.len(),
}))
.map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
"read" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.read_file(path) {
Ok(file) => PluginResponse::json(200, &serde_json::json!({
"size": file.size,
"content_type": file.content_type,
"data_len": file.data.len(),
}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(404, &e.to_string())),
}
}
"write" => {
match ctx.write_file("/plugin-output/result.json", b"{\"written\":true}", "application/json") {
Ok(()) => PluginResponse::json(201, &serde_json::json!({"ok": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"delete" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.delete_file(path) {
Ok(()) => PluginResponse::json(200, &serde_json::json!({"deleted": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"metadata" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.file_metadata(path) {
Ok(meta) => PluginResponse::json(200, &serde_json::json!(meta))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(404, &e.to_string())),
}
}
"list" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.list_directory(path) {
Ok(entries) => PluginResponse::json(200, &serde_json::json!({"entries": entries}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"status" => Ok(PluginResponse::text(201, "Created by plugin")),
_ => Ok(PluginResponse::error(404, &format!("Unknown function: {}", function))),
}
}
}
See Also
- Parser Plugins – plugins that transform non-JSON files into queryable data
- SDK Reference – complete type reference for the plugin SDK
Plugin SDK Reference
Complete type reference for the aeordb-plugin-sdk crate. This covers every public struct, enum, trait, and method available to plugin authors.
Macros
aeordb_parser!(fn_name)
Generates WASM exports for a parser plugin. Your function must have the signature:
#![allow(unused)]
fn main() {
fn fn_name(input: ParserInput) -> Result<serde_json::Value, String>
}
Generated exports:
- `handle(ptr: i32, len: i32) -> i64` – deserializes the parser envelope, calls your function, returns a packed pointer+length to the serialized response
aeordb_query_plugin!(fn_name)
Generates WASM exports for a query plugin. Your function must have the signature:
#![allow(unused)]
fn main() {
fn fn_name(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError>
}
Generated exports:
- `alloc(size: i32) -> i32` – allocates guest memory for the host to write request data
- `handle(ptr: i32, len: i32) -> i64` – deserializes the request, creates a `PluginContext`, calls your function, returns a packed pointer+length to the serialized response
Prelude
Import everything you need with:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
}
This re-exports: PluginError, PluginRequest, PluginResponse, ParserInput, FileMeta, PluginContext, FileData, DirEntry, FileMetadata, QueryResult, AggregateResult, SortDirection.
PluginRequest
Request passed to a query plugin when it is invoked.
| Field | Type | Description |
|---|---|---|
| arguments | Vec<u8> | Raw argument bytes (e.g., the HTTP request body forwarded to the plugin) |
| metadata | HashMap<String, String> | Key-value metadata about the invocation context |
Common metadata keys:
| Key | Description |
|---|---|
| function_name | The function name from the invoke URL |
PluginResponse
Response returned by a plugin after handling a request.
| Field | Type | Description |
|---|---|---|
| status_code | u16 | HTTP-style status code |
| body | Vec<u8> | Raw response body bytes |
| content_type | Option<String> | MIME content type of the body |
| headers | HashMap<String, String> | Additional response headers |
Builder Methods
PluginResponse::json(status_code: u16, body: &T) -> Result<Self, serde_json::Error>
Serializes body (any Serialize type) to JSON. Sets content_type to "application/json".
#![allow(unused)]
fn main() {
PluginResponse::json(200, &serde_json::json!({"ok": true}))
}
PluginResponse::text(status_code: u16, body: impl Into<String>) -> Self
Creates a plain text response. Sets content_type to "text/plain".
#![allow(unused)]
fn main() {
PluginResponse::text(200, "Hello, world!")
}
PluginResponse::error(status_code: u16, message: impl Into<String>) -> Self
Creates a JSON error response: {"error": "<message>"}. Sets content_type to "application/json".
#![allow(unused)]
fn main() {
PluginResponse::error(404, "not found")
}
PluginError
Error enum for the plugin system.
| Variant | Description |
|---|---|
| NotFound(String) | The plugin could not be found |
| ExecutionFailed(String) | The plugin failed during execution |
| SerializationFailed(String) | Request or response could not be serialized/deserialized |
| ResourceLimitExceeded(String) | Plugin exceeded memory, fuel, or other resource limits |
| InvalidModule(String) | An invalid or corrupt WASM module was provided |
| Internal(String) | A generic internal error |
All variants carry a String message. PluginError implements Display, Debug, and Error.
PluginContext
Guest-side handle for calling AeorDB host functions from WASM. Created automatically by aeordb_query_plugin! and passed to the handler.
On non-WASM targets (native compilation), all methods return PluginError::ExecutionFailed – this allows IDE support and unit testing of plugin logic without a WASM runtime.
File Operations
read_file(&self, path: &str) -> Result<FileData, PluginError>
Read a file at the given path. Returns the decoded file bytes, content type, and size.
write_file(&self, path: &str, data: &[u8], content_type: &str) -> Result<(), PluginError>
Write (create or overwrite) a file. Data is base64-encoded on the wire automatically.
delete_file(&self, path: &str) -> Result<(), PluginError>
Delete a file at the given path.
file_metadata(&self, path: &str) -> Result<FileMetadata, PluginError>
Retrieve metadata for a file without reading its contents.
list_directory(&self, path: &str) -> Result<Vec<DirEntry>, PluginError>
List directory entries at the given path.
Query and Aggregation
query(&self, path: &str) -> QueryBuilder
Start building a query against files at the given path. See QueryBuilder.
aggregate(&self, path: &str) -> AggregateBuilder
Start building an aggregation against files at the given path. See AggregateBuilder.
FileData
Raw file data returned by read_file.
| Field | Type | Description |
|---|---|---|
data | Vec<u8> | Decoded file bytes |
content_type | String | MIME content type |
size | u64 | File size in bytes |
DirEntry
A single directory entry returned by list_directory.
| Field | Type | Description |
|---|---|---|
name | String | Entry name (file or directory name, not the full path) |
entry_type | String | "file" or "directory" |
size | u64 | Size in bytes (0 for directories, defaults to 0 if absent) |
FileMetadata
Metadata about a stored file.
| Field | Type | Description |
|---|---|---|
path | String | Full storage path |
size | u64 | File size in bytes |
content_type | Option<String> | MIME content type (if known) |
created_at | i64 | Creation timestamp (ms since epoch) |
updated_at | i64 | Last update timestamp (ms since epoch) |
ParserInput
Input to a parser function.
| Field | Type | Description |
|---|---|---|
data | Vec<u8> | Raw file bytes (base64-decoded from the wire envelope) |
meta | FileMeta | File metadata |
FileMeta
Metadata about the file being parsed (available inside parser plugins).
| Field | Type | Description |
|---|---|---|
filename | String | File name only (e.g., "report.pdf") |
path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
content_type | String | MIME type |
size | u64 | Raw file size in bytes |
hash | String | Hex-encoded content hash (may be empty) |
hash_algorithm | String | Hash algorithm (e.g., "blake3_256", may be empty) |
created_at | i64 | Creation timestamp (ms since epoch, default 0) |
updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
QueryBuilder
Fluent builder for constructing AeorDB queries. Obtained via PluginContext::query(path) or QueryBuilder::new(path).
Field Conditions
Start with .field("name") to get a FieldQueryBuilder, then chain one operator:
Equality
| Method | Signature | Description |
|---|---|---|
eq | (value: &[u8]) -> QueryBuilder | Exact match on raw bytes |
eq_u64 | (value: u64) -> QueryBuilder | Exact match on u64 |
eq_i64 | (value: i64) -> QueryBuilder | Exact match on i64 |
eq_f64 | (value: f64) -> QueryBuilder | Exact match on f64 |
eq_str | (value: &str) -> QueryBuilder | Exact match on string |
eq_bool | (value: bool) -> QueryBuilder | Exact match on boolean |
Greater Than
| Method | Signature | Description |
|---|---|---|
gt | (value: &[u8]) -> QueryBuilder | Greater than on raw bytes |
gt_u64 | (value: u64) -> QueryBuilder | Greater than on u64 |
gt_str | (value: &str) -> QueryBuilder | Greater than on string |
gt_f64 | (value: f64) -> QueryBuilder | Greater than on f64 |
Less Than
| Method | Signature | Description |
|---|---|---|
lt | (value: &[u8]) -> QueryBuilder | Less than on raw bytes |
lt_u64 | (value: u64) -> QueryBuilder | Less than on u64 |
lt_str | (value: &str) -> QueryBuilder | Less than on string |
lt_f64 | (value: f64) -> QueryBuilder | Less than on f64 |
Range
| Method | Signature | Description |
|---|---|---|
between | (min: &[u8], max: &[u8]) -> QueryBuilder | Inclusive range on raw bytes |
between_u64 | (min: u64, max: u64) -> QueryBuilder | Inclusive range on u64 |
between_str | (min: &str, max: &str) -> QueryBuilder | Inclusive range on strings |
Set Membership
| Method | Signature | Description |
|---|---|---|
in_values | (values: &[&[u8]]) -> QueryBuilder | Match any of the given byte values |
in_u64 | (values: &[u64]) -> QueryBuilder | Match any of the given u64 values |
in_str | (values: &[&str]) -> QueryBuilder | Match any of the given strings |
Text Search
| Method | Signature | Description |
|---|---|---|
contains | (text: &str) -> QueryBuilder | Substring / trigram contains search |
similar | (text: &str, threshold: f64) -> QueryBuilder | Trigram similarity search (0.0–1.0) |
phonetic | (text: &str) -> QueryBuilder | Soundex/Metaphone phonetic search |
fuzzy | (text: &str) -> QueryBuilder | Levenshtein distance fuzzy search |
match_query | (text: &str) -> QueryBuilder | Full-text match query |
Boolean Combinators
| Method | Signature | Description |
|---|---|---|
and | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | AND group via closure |
or | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | OR group via closure |
not | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | Negate a condition via closure |
Sorting and Pagination
| Method | Signature | Description |
|---|---|---|
sort | (field: impl Into<String>, direction: SortDirection) -> Self | Add a sort field |
limit | (count: usize) -> Self | Limit result count |
offset | (count: usize) -> Self | Skip the first N results |
Execution
| Method | Signature | Description |
|---|---|---|
execute | (self) -> Result<Vec<QueryResult>, PluginError> | Execute the query via host FFI |
to_json | (&self) -> serde_json::Value | Serialize builder state to JSON (for inspection/debugging) |
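As a hedged illustration of the fluent API above, here is a minimal Python re-implementation sketch of the builder pattern (the real SDK is Rust and executes via host FFI; the JSON shape emitted here is hypothetical, not AeorDB's actual wire format):

```python
import json

class QueryBuilder:
    """Sketch of the fluent query builder described above (illustrative only)."""
    def __init__(self, path):
        self.path = path
        self.conditions = []
        self.sorts = []
        self._limit = None

    def field(self, name):
        return FieldQueryBuilder(self, name)

    def sort(self, field, direction):  # direction stands in for SortDirection
        self.sorts.append({"field": field, "direction": direction})
        return self

    def limit(self, count):
        self._limit = count
        return self

    def to_json(self):
        q = {"path": self.path, "conditions": self.conditions, "sort": self.sorts}
        if self._limit is not None:
            q["limit"] = self._limit
        return q

class FieldQueryBuilder:
    """Returned by .field(); each operator records a condition and hands
    the QueryBuilder back so further calls can be chained."""
    def __init__(self, builder, name):
        self.builder = builder
        self.name = name

    def _op(self, op, value):
        self.builder.conditions.append({"field": self.name, "op": op, "value": value})
        return self.builder

    def eq_str(self, value):  return self._op("eq", value)
    def gt_u64(self, value):  return self._op("gt", value)
    def contains(self, text): return self._op("contains", text)

q = (QueryBuilder("/users/")
     .field("role").eq_str("admin")
     .field("age").gt_u64(30)
     .sort("name", "Asc")
     .limit(10))
print(json.dumps(q.to_json()))
```

In the real SDK, `execute()` would serialize this builder state and hand it to the host; `to_json()` exists for the same kind of inspection shown here.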
QueryResult
A single query result returned by the host.
| Field | Type | Description |
|---|---|---|
path | String | Path of the matching file |
score | f64 | Relevance score (higher is better, default 0.0) |
matched_by | Vec<String> | Names of the indexes/operations that matched (default empty) |
SortDirection
Sort direction for query results.
| Variant | Description |
|---|---|
Asc | Ascending order |
Desc | Descending order |
AggregateBuilder
Fluent builder for constructing AeorDB aggregation queries. Obtained via PluginContext::aggregate(path) or AggregateBuilder::new(path).
Aggregation Operations
| Method | Signature | Description |
|---|---|---|
count | (self) -> Self | Request a count aggregation |
sum | (field: impl Into<String>) -> Self | Request a sum on a field |
avg | (field: impl Into<String>) -> Self | Request an average on a field |
min_val | (field: impl Into<String>) -> Self | Request a minimum value on a field |
max_val | (field: impl Into<String>) -> Self | Request a maximum value on a field |
Grouping and Filtering
| Method | Signature | Description |
|---|---|---|
group_by | (field: impl Into<String>) -> Self | Group results by a field |
filter | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | Add a where condition via closure |
limit | (count: usize) -> Self | Limit the number of groups returned |
Execution
| Method | Signature | Description |
|---|---|---|
execute | (self) -> Result<AggregateResult, PluginError> | Execute the aggregation via host FFI |
to_json | (&self) -> serde_json::Value | Serialize builder state to JSON |
AggregateResult
Aggregation result returned by the host.
| Field | Type | Description |
|---|---|---|
groups | Vec<serde_json::Value> | Per-group aggregation results (default empty) |
total_count | Option<u64> | Total count if count was requested without group_by |
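To make the grouping semantics concrete, here is a small Python sketch of what a count/sum/avg aggregation with `group_by` computes over matching records (illustrative only; the field names and output shape are hypothetical, and the real engine evaluates this host-side over indexed fields):

```python
from collections import defaultdict

def aggregate(records, group_by, value_field):
    """Sketch of a grouped count + sum + avg aggregation (illustrative only)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_by]].append(rec[value_field])
    out = []
    for key, values in groups.items():
        out.append({
            "group": key,
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        })
    return out

records = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "ops", "salary": 90},
]
print(aggregate(records, "dept", "salary"))
```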
See Also
- Parser Plugins – how to write and deploy parser plugins
- Query Plugins – how to write and deploy query plugins
Garbage Collection
AeorDB is an append-only database: writes, overwrites, and deletes all append new entries without modifying existing ones. Over time, this leaves orphaned entries – old file versions, deleted content, stale directory indexes – consuming disk space. Garbage collection (GC) reclaims that space.
What GC Does
GC uses a mark-and-sweep algorithm:
1. Mark phase: Starting from HEAD, all snapshots, and all forks, GC walks every reachable directory tree. It marks every entry that is still live: directory indexes, file records, content chunks, system tables, task queue records, and deletion records.
2. Sweep phase: GC iterates all entries in the key-value store. Any entry whose hash is not in the live set is garbage. Non-live entries are overwritten in place with DeletionRecord or Void entries, then removed from the KV index.
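The two phases can be sketched over a toy content-addressed store. This is a hedged illustration only: the real engine marks from HEAD, every snapshot, and every fork, and overwrites garbage in place rather than deleting keys.

```python
def mark(store, roots):
    """Mark phase: collect every entry reachable from the given roots."""
    live, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h in live or h not in store:
            continue
        live.add(h)
        # e.g. directory index -> file records -> content chunks
        stack.extend(store[h].get("children", []))
    return live

def sweep(store, live):
    """Sweep phase: anything not in the live set is garbage."""
    garbage = [h for h in store if h not in live]
    for h in garbage:
        del store[h]  # AeorDB instead overwrites with Void/DeletionRecord entries
    return len(garbage)

store = {
    "head":      {"children": ["dir1"]},
    "dir1":      {"children": ["fileA"]},
    "fileA":     {"children": ["chunk1"]},
    "chunk1":    {},
    "old_fileA": {"children": ["old_chunk"]},  # orphaned by an overwrite
    "old_chunk": {},
}
live = mark(store, roots=["head"])
print(sweep(store, live))  # 2 orphaned entries collected
```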
What Gets Collected
- Old file versions that have been overwritten
- Content chunks no longer referenced by any file record
- Directory indexes from previous tree states
- Entries orphaned by deletes
What Does NOT Get Collected
- HEAD and all reachable entries from HEAD
- All snapshot root trees and their descendants
- All fork root trees and their descendants
- System table entries (/.system, /.config)
- Task queue records (registry + individual task entries)
- DeletionRecord entries (needed for KV rebuild from .aeordb scan)
Running GC
CLI
# Run GC
aeordb gc --database data.aeordb
# Dry run -- report what would be collected without deleting
aeordb gc --database data.aeordb --dry-run
Example output:
AeorDB Garbage Collection
Database: data.aeordb
Versions scanned: 3
Live entries: 1247
Garbage entries: 89
Reclaimed: 1.2 MB
Duration: 0.3s
Dry run output:
AeorDB Garbage Collection [DRY RUN]
Database: data.aeordb
[DRY RUN] Would collect 89 garbage entries (1.2 MB)
HTTP API
Synchronous GC (blocks until complete):
# Run GC
curl -X POST http://localhost:3000/admin/gc \
-H "Authorization: Bearer $API_KEY"
# Dry run
curl -X POST http://localhost:3000/admin/gc?dry_run=true \
-H "Authorization: Bearer $API_KEY"
Response:
{
"versions_scanned": 3,
"live_entries": 1247,
"garbage_entries": 89,
"reclaimed_bytes": 1258291,
"duration_ms": 312,
"dry_run": false
}
Background GC (returns immediately, runs as a task):
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $API_KEY"
This enqueues a GC task that the background task worker will pick up. Track its progress via the task system.
When to Run GC
- After bulk deletes: If you delete a large number of files, their content chunks become garbage.
- After bulk overwrites: Updating many files leaves old versions behind.
- After version cleanup: If you delete old snapshots, the entries they exclusively referenced become garbage.
- Periodically: Set up a cron schedule for automatic GC.
Example cron configuration (/.config/cron.json):
{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
}
]
}
Concurrency and Safety
GC should not be run concurrently with writes. The sweep phase re-verifies each candidate against the current KV state before overwriting to mitigate races, but for full safety, callers should ensure exclusive access during GC.
Crash safety: If the process crashes mid-sweep, the .aeordb file may contain partially overwritten entries. On restart, the .kv index file will be stale and must be deleted to trigger a full rebuild from the .aeordb file scan. The rebuild replays deletion records and reconstructs the index, so no committed data is lost. Garbage entries that were not yet swept will persist until the next GC run.
Performance
At scale, expect roughly 10K entries/sec of sweep throughput. The mark phase is faster, since it only walks reachable trees. Sweep-phase writes are batched, with a single sync at the end.
See Also
- Task System & Cron – background task execution and scheduling
- Reindexing – rebuilding indexes after config changes
- Backup & Restore – exporting clean versions without garbage
Reindexing
When you change a table’s index configuration (indexes.json), existing files need to be re-processed through the indexing pipeline. AeorDB handles this through background reindex tasks.
Why Reindex
- You added a new index field (e.g., adding a fulltext index on a field that was previously unindexed)
- You changed the index type for a field (e.g., switching from exact to fulltext)
- You added or changed a parser plugin, and existing files need to be re-parsed
- You modified index settings (e.g., changing similarity thresholds)
Automatic Reindexing
Changing indexes.json via the API automatically triggers a background reindex task for the affected directory. You do not need to manually trigger reindexing in most cases.
Manual Reindexing
HTTP API
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
The path argument specifies which directory to reindex. The task worker will:
- Read the indexes.json configuration for that path
- List all file entries in the directory
- Re-read each file and run it through the indexing pipeline
- Track progress and update checkpoints
Progress Tracking
During an active reindex, query responses include a meta.reindexing field indicating that results may be incomplete:
{
"results": [...],
"meta": {
"reindexing": true
}
}
You can also check progress through the task system:
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $API_KEY"
The response includes progress details:
{
"tasks": [
{
"id": "abc123",
"task_type": "reindex",
"status": "running",
"progress": {
"task_id": "abc123",
"task_type": "reindex",
"progress": 0.45,
"eta_ms": 12000,
"indexed_count": 450,
"total_count": 1000,
"message": "indexed 450/1000 files"
}
}
]
}
Circuit Breaker
If 10 consecutive files fail to index, the reindex task trips a circuit breaker and fails with an error:
circuit breaker: 10 consecutive indexing failures
This prevents runaway error loops when the index configuration or parser is fundamentally broken. Fix the underlying issue and trigger a new reindex.
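The breaker amounts to a consecutive-failure counter that any success resets. A hedged Python sketch (the threshold of 10 matches the documented behavior; everything else here is hypothetical):

```python
class CircuitBreaker:
    """Sketch of the consecutive-failure circuit breaker described above."""
    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, ok):
        if ok:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                raise RuntimeError(
                    f"circuit breaker: {self.threshold} consecutive indexing failures")

cb = CircuitBreaker()
for _ in range(9):
    cb.record(ok=False)
cb.record(ok=True)  # one success before the 10th failure resets the counter
print(cb.consecutive_failures)  # 0
```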
Checkpoint and Resume
Reindex tasks save a checkpoint after each batch (50 files). If the server crashes or the task is cancelled and restarted, it resumes from the last checkpoint rather than starting over.
The checkpoint is the name of the last successfully processed file (files are processed in alphabetical order for deterministic ordering).
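The checkpoint-and-resume behavior can be sketched as follows (a hedged illustration: the real task indexes each file, persists the checkpoint, and also checks for cancellation between batches):

```python
def reindex(files, checkpoint=None, batch_size=50):
    """Sketch of checkpointed batch reindexing. Files are processed in
    alphabetical order; the checkpoint is the last file completed."""
    files = sorted(files)
    if checkpoint is not None:
        files = [f for f in files if f > checkpoint]  # resume past the checkpoint
    processed = []
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        processed.extend(batch)       # (index each file here)
        checkpoint = batch[-1]        # save checkpoint after each batch
    return processed, checkpoint

done, ckpt = reindex(["a.json", "c.json", "b.json"], batch_size=2)
print(done, ckpt)
```

Restarting with the saved checkpoint skips everything already processed, so a crash mid-task costs at most one batch of repeated work.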
Cancellation
Cancel a running reindex task:
curl -X POST http://localhost:3000/admin/tasks/{task_id}/cancel \
-H "Authorization: Bearer $API_KEY"
The task checks for cancellation after each batch, so it will stop within one batch cycle.
Batch Processing
Files are processed in batches of 50. After each batch, the task:
- Updates the checkpoint to the last file in the batch
- Computes progress percentage and ETA (using a rolling average of the last 10 batch times)
- Checks for cancellation
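The ETA estimate above (a rolling average of the last 10 batch times, multiplied by the remaining batches) can be sketched as (illustrative; only the window of 10 comes from the documented behavior):

```python
from collections import deque

def eta_ms(batch_times_ms, batches_remaining, window=10):
    """Sketch of the documented ETA estimate: rolling average of the last
    `window` batch times, scaled by the number of remaining batches."""
    recent = deque(batch_times_ms, maxlen=window)  # keeps only the newest entries
    if not recent:
        return None
    avg = sum(recent) / len(recent)
    return int(avg * batches_remaining)

# 20 batches observed; only the last 10 (all 200 ms) count toward the average
times = [400] * 10 + [200] * 10
print(eta_ms(times, batches_remaining=5))  # 1000
```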
See Also
- Task System & Cron – task lifecycle, listing, and scheduling
- Garbage Collection – reclaiming space from orphaned entries
Task System & Cron
AeorDB runs long-running operations (reindexing, garbage collection) as background tasks. Tasks are managed by a task queue, executed by a dedicated worker, and can be triggered manually or on a cron schedule.
Built-in Task Types
| Task Type | Description |
|---|---|
reindex | Re-run the indexing pipeline on all files under a directory |
gc | Run garbage collection (mark-and-sweep) |
Task Lifecycle
pending --> running --> completed
                    --> failed
                    --> cancelled
- Pending: Task is enqueued and waiting for the worker to pick it up.
- Running: Worker has dequeued the task and is executing it.
- Completed: Task finished successfully.
- Failed: Task encountered an error (e.g., circuit breaker tripped, GC failed).
- Cancelled: Task was cancelled by the user between batch iterations.
On server startup, any tasks left in Running state (from a previous crash) are reset to Pending so they can be re-executed.
API
List Tasks
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $API_KEY"
Response:
{
"tasks": [
{
"id": "abc123",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"created_at": 1700000000000,
"progress": {
"task_id": "abc123",
"task_type": "reindex",
"progress": 0.65,
"eta_ms": 8000,
"indexed_count": 650,
"total_count": 1000,
"message": "indexed 650/1000 files"
}
},
{
"id": "def456",
"task_type": "gc",
"status": "completed",
"args": {},
"created_at": 1699999000000
}
]
}
Trigger a Task
Reindex:
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
Garbage Collection:
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $API_KEY"
Cancel a Task
curl -X POST http://localhost:3000/admin/tasks/{task_id}/cancel \
-H "Authorization: Bearer $API_KEY"
Cancellation is cooperative: the task checks for cancellation between batch iterations. It will not interrupt a batch in progress.
Progress Tracking
Running tasks expose in-memory progress information:
| Field | Type | Description |
|---|---|---|
task_id | String | Task identifier |
task_type | String | Task type (e.g., "reindex") |
progress | f64 | Completion fraction (0.0 to 1.0) |
eta_ms | Option<i64> | Estimated time remaining in milliseconds |
indexed_count | usize | Number of items processed so far |
total_count | usize | Total items to process |
message | Option<String> | Human-readable progress message |
Progress is computed using a rolling average of the last 10 batch execution times for ETA calculation.
During an active reindex, query responses include meta.reindexing: true so clients know results may be incomplete.
Cron Scheduling
AeorDB includes a built-in cron scheduler that checks /.config/cron.json every 60 seconds and enqueues matching tasks.
Configuration
Store the cron configuration at /.config/cron.json:
{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
},
{
"id": "hourly-reindex",
"task_type": "reindex",
"schedule": "0 * * * *",
"args": {"path": "/data/"},
"enabled": true
}
]
}
Cron Expression Format
Standard 5-field Unix cron expressions:
minute hour day-of-month month day-of-week
* * * * *
Examples:
- `0 3 * * *` – every day at 3:00 AM
- `*/15 * * * *` – every 15 minutes
- `0 0 * * 0` – every Sunday at midnight
- `30 2 1 * *` – 2:30 AM on the 1st of every month
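A simplified Python sketch of 5-field matching (hedged: this illustration omits ranges, and standard cron treats day-of-month and day-of-week as alternatives when both are restricted; AeorDB's exact semantics may differ):

```python
def field_matches(expr, value, minimum=0):
    """Match one cron field: '*', '*/step', a number, or a comma list."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr, minute, hour, dom, month, dow):
    """Sketch of 5-field matching: minute hour day-of-month month day-of-week.
    Simplified: all five fields must match."""
    fields = expr.split()
    values = [(minute, 0), (hour, 0), (dom, 1), (month, 1), (dow, 0)]
    return all(field_matches(f, v, m) for f, (v, m) in zip(fields, values))

print(cron_matches("0 3 * * *", minute=0, hour=3, dom=15, month=6, dow=4))     # True
print(cron_matches("*/15 * * * *", minute=45, hour=9, dom=1, month=1, dow=1))  # True
```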
Cron API
Create/update the schedule:
curl -X PUT http://localhost:3000/.config/cron.json \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
}
]
}'
Read the schedule:
curl http://localhost:3000/.config/cron.json \
-H "Authorization: Bearer $API_KEY"
Disable a schedule (set enabled: false and re-upload):
# Fetch, modify, re-upload
Deduplication
The cron scheduler checks whether a task with the same type and arguments is already pending or running before enqueuing. This prevents duplicate tasks from stacking up if a previous run hasn’t finished.
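The check can be sketched as (illustrative only; the task record fields are hypothetical):

```python
def enqueue_if_absent(queue, task_type, args):
    """Sketch of the scheduler's deduplication check: skip the enqueue when a
    task with the same type and args is already pending or running."""
    for task in queue:
        if (task["task_type"] == task_type
                and task["args"] == args
                and task["status"] in ("pending", "running")):
            return False  # duplicate; do not stack another run
    queue.append({"task_type": task_type, "args": args, "status": "pending"})
    return True

queue = [{"task_type": "gc", "args": {}, "status": "running"}]
print(enqueue_if_absent(queue, "gc", {}))                       # False
print(enqueue_if_absent(queue, "reindex", {"path": "/data/"}))  # True
```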
CronSchedule Fields
| Field | Type | Description |
|---|---|---|
id | String | Unique identifier for this schedule |
task_type | String | Task type to enqueue (e.g., "gc", "reindex") |
schedule | String | 5-field Unix cron expression |
args | serde_json::Value | Arguments passed to the task |
enabled | bool | Whether this schedule is active (default true) |
Task Retention
Completed tasks are automatically pruned:
- Tasks older than 24 hours are removed
- At most 100 completed tasks are retained
Pruning runs after each task completes.
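The retention policy can be sketched as (illustrative; the 24-hour and 100-task limits match the documented policy, the record fields are hypothetical):

```python
def prune(completed, now_ms, max_age_ms=24 * 3600 * 1000, max_keep=100):
    """Sketch of completed-task retention: drop tasks older than 24 hours,
    then keep at most the 100 most recent."""
    fresh = [t for t in completed if now_ms - t["created_at"] <= max_age_ms]
    fresh.sort(key=lambda t: t["created_at"], reverse=True)
    return fresh[:max_keep]

now = 1_700_000_000_000
tasks = [
    {"id": "old", "created_at": now - 25 * 3600 * 1000},  # 25h old: pruned
    {"id": "new", "created_at": now - 1 * 3600 * 1000},   # 1h old: kept
]
print([t["id"] for t in prune(tasks, now)])  # ['new']
```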
Events
The task system emits events on the event bus:
| Event | Description |
|---|---|
task.started | A task has begun execution |
task.completed | A task finished successfully |
task.failed | A task encountered an error |
gc.completed | GC-specific completion event with statistics |
See Also
- Garbage Collection – details on the GC mark-and-sweep algorithm
- Reindexing – details on the reindex process, circuit breaker, and checkpoints
Backup & Restore
AeorDB supports exporting database versions as self-contained .aeordb files, creating incremental patches between versions, importing backups, and promoting version hashes.
Concepts
- Full export: A clean .aeordb file containing only the live entries at a specific version. No voids, no deletion records, no stale overwrites, no history.
- Patch (diff): A .aeordb file containing only the changeset between two versions – new/changed chunks, updated file records, updated directory indexes, and deletion records for removed files.
- Import: Applying an export or patch into a target database.
- Promote: Setting a version hash as the current HEAD.
Full Export
Export HEAD, a named snapshot, or a specific version hash as a self-contained backup.
CLI
# Export HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a named snapshot
aeordb export --database data.aeordb --output backup.aeordb --snapshot v1
# Export a specific version hash
aeordb export --database data.aeordb --output backup.aeordb --hash abc123def456...
The output file must not already exist – the command will refuse to overwrite.
HTTP API
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "backup.aeordb"}'
With a snapshot:
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "backup.aeordb", "snapshot": "v1"}'
Output
Export complete.
Files: 142
Chunks: 89
Directories: 23
Version: abc123def456...
Diff / Patch
Create an incremental patch containing only the changes between two versions. This is significantly smaller than a full export when only a few files have changed.
CLI
# Diff between two snapshots
aeordb diff --database data.aeordb --output patch.aeordb --from v1 --to v2
# Diff from a snapshot to HEAD
aeordb diff --database data.aeordb --output patch.aeordb --from v1
# Diff using raw hashes
aeordb diff --database data.aeordb --output patch.aeordb --from abc123... --to def456...
The --from and --to arguments accept either snapshot names or hex-encoded version hashes. If --to is omitted, HEAD is used.
HTTP API
curl -X POST http://localhost:3000/admin/diff \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "patch.aeordb", "from": "v1", "to": "v2"}'
Output
Patch created.
Files added: 5
Files modified: 12
Files deleted: 3
Chunks: 8
Directories: 7
From: abc123...
To: def456...
What a Patch Contains
- New chunks: Content chunks that exist in the target version but not the base version
- Added file records: Files present in the target but not the base
- Modified file records: Files that changed between the two versions
- Deletion records: Files present in the base but removed in the target
- Changed directory indexes: Directory entries that differ between versions
Import
Apply a full export or incremental patch to a target database.
CLI
# Import a full export
aeordb import --database data.aeordb --file backup.aeordb
# Import and immediately promote HEAD
aeordb import --database data.aeordb --file backup.aeordb --promote
# Force import a patch even if base version doesn't match
aeordb import --database data.aeordb --file patch.aeordb --force
Flags:
- --promote: Automatically set HEAD to the imported version
- --force: Skip base version verification for patches
HTTP API
curl -X POST http://localhost:3000/admin/import \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"file": "backup.aeordb", "promote": true}'
Patch Base Version Check
When importing a patch, AeorDB verifies that the target database’s current HEAD matches the patch’s base version. If they don’t match, the import fails:
Target database HEAD (aaa111...) does not match patch base version (bbb222...).
Use --force to apply anyway.
Use --force to skip this check if you know what you’re doing.
Output
Full export imported.
Entries: 254
Chunks: 89
Files: 142
Directories: 23
Deletions: 0
Version: abc123...
HEAD has been promoted.
If --promote was not used:
HEAD has NOT been changed.
To promote: aeordb promote --hash abc123...
Promote
Set a specific version hash as the current HEAD.
CLI
aeordb promote --database data.aeordb --hash abc123def456...
The command verifies that the hash exists in the database before promoting.
HTTP API
curl -X POST http://localhost:3000/admin/promote \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"hash": "abc123def456..."}'
Typical Workflows
Regular Backups
# Create a snapshot first
curl -X POST http://localhost:3000/admin/snapshots \
-H "Authorization: Bearer $API_KEY" \
-d '{"name": "daily-2024-01-15"}'
# Export it
aeordb export --database data.aeordb \
--output backups/daily-2024-01-15.aeordb \
--snapshot daily-2024-01-15
Incremental Backups
# First backup: full export
aeordb export --database data.aeordb --output backups/full.aeordb --snapshot v1
# Subsequent backups: just the diff
aeordb diff --database data.aeordb --output backups/patch-v1-v2.aeordb --from v1 --to v2
Restore from Backup
# Import the full backup
aeordb import --database restored.aeordb --file backups/full.aeordb --promote
# Apply incremental patches in order
aeordb import --database restored.aeordb --file backups/patch-v1-v2.aeordb --promote
Migrate Between Servers
# On source server
aeordb export --database data.aeordb --output transfer.aeordb
# Copy to target server
scp transfer.aeordb target-server:/data/
# On target server
aeordb import --database data.aeordb --file transfer.aeordb --promote
See Also
- CLI Commands – full command reference with all flags
- Garbage Collection – clean up orphaned entries after imports
CLI Commands
Complete reference for the aeordb command-line interface.
aeordb start
Start the AeorDB server.
aeordb start [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--port | -p | 3000 | TCP port to listen on |
--database | -D | data.aeordb | Path to the .aeordb database file |
--log-format | | pretty | Log output format: pretty or json
--auth | | (none) | Auth provider URI (see below)
--hot-dir | | (database parent dir) | Directory for write-ahead hot files
--cors | | (disabled) | CORS allowed origins
Auth Modes
The --auth flag accepts several formats:
| Value | Mode | Description |
|---|---|---|
| (not set) | Disabled | No authentication required (dev mode) |
false, null, no, 0 | Disabled | Explicitly disable authentication |
self | Self-contained | AeorDB manages API keys internally |
file:///path/to/identity | File-based | Load identity from a file |
When using self mode, the root API key is printed once on first startup. Save it – it cannot be retrieved again (but can be reset with emergency-reset).
CORS
| Value | Behavior |
|---|---|
| (not set) | CORS disabled |
* | Allow all origins |
https://a.com,https://b.com | Allow specific comma-separated origins |
Examples
# Development mode (no auth, default port)
aeordb start
# Production with auth on port 8080
aeordb start --port 8080 --database /var/lib/aeordb/prod.aeordb --auth self --log-format json
# Custom hot directory and CORS
aeordb start --database data.aeordb --hot-dir /fast-ssd/hot --cors "*"
What Happens on Start
- Opens (or creates) the database file
- Bootstraps the root API key (if --auth self and no key exists yet)
- Resets any tasks left in Running state from a previous crash to Pending
- Starts background workers:
  - Heartbeat: emits DatabaseStats every 15 seconds
  - Cron scheduler: checks /.config/cron.json every 60 seconds
  - Task worker: dequeues and executes background tasks
  - Webhook dispatcher: delivers events to registered webhook URLs
- Binds to the TCP port and begins serving requests
- Shuts down gracefully on CTRL+C
aeordb gc
Run garbage collection to reclaim space from unreachable entries.
aeordb gc [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Path to the .aeordb database file |
--dry-run | | false | Report what would be collected without actually deleting
Examples
# Run GC
aeordb gc --database data.aeordb
# Preview what would be collected
aeordb gc --database data.aeordb --dry-run
Output
AeorDB Garbage Collection
Database: data.aeordb
Versions scanned: 3
Live entries: 1247
Garbage entries: 89
Reclaimed: 1.2 MB
Duration: 0.3s
See Garbage Collection for details on the mark-and-sweep algorithm.
aeordb export
Export a version as a self-contained .aeordb file.
aeordb export [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Source database file |
--output | -o | (required) | Output .aeordb file path |
--snapshot | -s | (none) | Named snapshot to export |
--hash | | (none) | Specific version hash to export (hex-encoded)
If neither --snapshot nor --hash is provided, HEAD is exported.
Examples
# Export HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a named snapshot
aeordb export --database data.aeordb --output backup-v1.aeordb --snapshot v1
# Export a specific hash
aeordb export --database data.aeordb --output backup.aeordb --hash abc123def456...
The output file must not already exist.
See Backup & Restore for full backup workflows.
aeordb diff
Create a patch .aeordb containing only the changeset between two versions.
aeordb diff [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Source database file |
--output | -o | (required) | Output patch file path |
--from | | (required) | Base version (snapshot name or hex hash)
--to | | HEAD | Target version (snapshot name or hex hash)
Examples
# Diff between two snapshots
aeordb diff --database data.aeordb --output patch.aeordb --from v1 --to v2
# Diff from a snapshot to HEAD
aeordb diff --database data.aeordb --output patch.aeordb --from v1
# Diff between raw hashes
aeordb diff --database data.aeordb --output patch.aeordb --from abc123... --to def456...
The --from and --to arguments first try snapshot name lookup, then fall back to interpreting the value as a hex-encoded hash.
See Backup & Restore for incremental backup workflows.
aeordb import
Import an export or patch .aeordb file into a target database.
aeordb import [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Target database file |
--file | -f | (required) | Backup or patch file to import |
--force | | false | Skip base version verification for patches
--promote | | false | Automatically set HEAD to the imported version
Examples
# Import a full backup
aeordb import --database data.aeordb --file backup.aeordb
# Import and promote HEAD
aeordb import --database data.aeordb --file backup.aeordb --promote
# Force-import a patch even if base doesn't match
aeordb import --database data.aeordb --file patch.aeordb --force --promote
Patch Base Verification
When importing a patch (backup_type=2), AeorDB verifies that the target database’s HEAD matches the patch’s base version. Use --force to bypass this check.
See Backup & Restore for restore workflows.
aeordb promote
Promote a version hash to HEAD.
aeordb promote [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Database file |
--hash | | (required) | Hex-encoded version hash to promote
Examples
aeordb promote --database data.aeordb --hash abc123def456...
The command verifies the hash exists in the database before promoting.
aeordb stress
Run stress tests against a running AeorDB instance.
aeordb stress [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --target | -t | http://localhost:3000 | Target server URL |
| --api-key | -a | (required) | API key for authentication |
| --concurrency | -c | 10 | Number of concurrent workers |
| --duration | -d | 10s | Test duration (e.g., 30s, 5m) |
| --operation | -o | mixed | Operation type: write, read, or mixed |
| --file-size | -s | 1kb | File size for writes (e.g., 512b, 1kb, 1mb) |
| --path-prefix | -p | /stress-test | Path prefix for stress test files |
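The `--duration` and `--file-size` values use short unit suffixes. A hedged sketch of how such values might be parsed; the accepted units (`s`/`m`, `b`/`kb`/`mb`) are inferred from the flag descriptions above, and binary multiples (1kb = 1024 bytes) are an assumption:

```python
import re

def parse_duration(s: str) -> float:
    """'30s' -> 30.0 seconds, '5m' -> 300.0."""
    m = re.fullmatch(r"(\d+)([sm])", s.lower())
    if not m:
        raise ValueError(f"bad duration: {s!r}")
    n, unit = int(m.group(1)), m.group(2)
    return float(n * (60 if unit == "m" else 1))

def parse_size(s: str) -> int:
    """'512b' -> 512 bytes, '1kb' -> 1024, '1mb' -> 1048576 (assumed binary units)."""
    m = re.fullmatch(r"(\d+)(b|kb|mb)", s.lower())
    if not m:
        raise ValueError(f"bad size: {s!r}")
    n, unit = int(m.group(1)), m.group(2)
    return n * {"b": 1, "kb": 1024, "mb": 1024 ** 2}[unit]

assert parse_duration("5m") == 300.0
assert parse_size("1kb") == 1024
```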
Examples
# Quick mixed read/write test
aeordb stress --api-key $API_KEY
# Heavy write test for 5 minutes
aeordb stress --api-key $API_KEY --operation write --concurrency 50 --duration 5m --file-size 10kb
# Read-only test against production
aeordb stress --target https://prod.example.com --api-key $API_KEY --operation read --concurrency 100 --duration 30s
aeordb emergency-reset
Revoke the current root API key and generate a new one. Use this if the root key is lost or compromised.
aeordb emergency-reset [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --database | -D | (required) | Database file |
| --force | | false | Skip confirmation prompt |
Examples
# Interactive (prompts for confirmation)
aeordb emergency-reset --database data.aeordb
# Non-interactive
aeordb emergency-reset --database data.aeordb --force
What Happens
- Finds all API keys linked to the root user (nil UUID)
- Revokes each one
- Generates a new root API key
- Prints the new key (shown once, save it immediately)
WARNING: This will invalidate the current root API key.
A new root API key will be generated.
Proceed? [y/N]: y
Revoked 1 existing root API key(s).
==========================================================
NEW ROOT API KEY (shown once, save it now!):
aeordb_abc123def456...
==========================================================
This command requires direct file access to the database – it cannot be run over HTTP. It is intended for recovery scenarios where you have lost the root API key.
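The reset sequence listed above (find root keys, revoke, mint a new one) can be sketched as follows. The nil-UUID convention and `aeordb_` key prefix follow the docs; the key list structure and key length are illustrative assumptions:

```python
import secrets

NIL_UUID = "00000000-0000-0000-0000-000000000000"  # the root user

def emergency_reset(keys: list[dict]) -> str:
    """Revoke every key owned by the root user and return a fresh one."""
    revoked = 0
    for key in keys:
        if key["user_id"] == NIL_UUID and not key["revoked"]:
            key["revoked"] = True
            revoked += 1
    new_key = "aeordb_" + secrets.token_hex(24)   # shown once; save it now
    keys.append({"user_id": NIL_UUID, "secret": new_key, "revoked": False})
    print(f"Revoked {revoked} existing root API key(s).")
    return new_key


keys = [{"user_id": NIL_UUID, "secret": "aeordb_old", "revoked": False}]
new = emergency_reset(keys)
assert keys[0]["revoked"] and new.startswith("aeordb_")
```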
See Also
- Garbage Collection – GC algorithm details
- Backup & Restore – backup workflows
- Task System & Cron – background tasks and scheduling
- Reindexing – reindex process details