Introduction
AeorDB is a content-addressed file database that treats your data as a filesystem, not as tables and rows. Store any file at any path, query structured fields with sub-millisecond lookups, and version everything with Git-like snapshots and forks – all from a single binary with zero external dependencies.
What Makes AeorDB Different
Content-addressed storage with BLAKE3. Every piece of data is identified by its cryptographic hash. This gives you built-in deduplication, integrity verification, and a Merkle tree that makes versioning essentially free.
A filesystem, not a schema. Data lives at paths like /users/alice.json and /docs/reports/q1.pdf, organized into directories. No schemas to define, no migrations to run. Store JSON, images, PDFs, or raw bytes – the engine handles them all the same way.
Built-in versioning. Create named snapshots, fork your database into isolated branches, diff between any two versions, and export/import self-contained .aeordb files. The content-addressed Merkle tree means historical reads resolve the exact data at the time of the snapshot, not the latest overwrite.
WASM plugin system. Extend the database with WebAssembly plugins for two purposes: parser plugins that extract structured fields from non-JSON files (PDFs, images, XML) for indexing, and query plugins that run custom logic directly at the data layer. Plugins execute in a sandboxed WASM runtime with configurable memory limits.
Native HTTP API. AeorDB exposes its full API over HTTP – no separate proxy, no client library required. Store files with PUT, read them with GET, query with POST /query, and manage versions with the /version/* endpoints. Any HTTP client works.
Embeddable. A single aeordb binary with no external dependencies. Point it at a .aeordb file and you have a running database. Like SQLite, but for files with versioning and a built-in HTTP server.
Lock-free concurrent reads. The engine uses snapshot double-buffering via ArcSwap so readers never block writers and never see partial state. Queries routinely complete in under a millisecond.
Key Features
- Storage: Append-only WAL file, content-addressed BLAKE3 hashing, automatic zstd compression, 256KB chunking for dedup
- Indexing: Scalar bucketing (NVT) with u64, i64, f64, string, timestamp, trigram, phonetic/soundex/dmetaphone index types
- Querying: JSON query API with boolean logic (`and`, `or`, `not`), comparison operators, sorting, pagination, projections, and aggregations
- Versioning: Snapshots, forks, diff/patch, export/import as self-contained `.aeordb` files
- Plugins: WASM parser plugins for any file format, WASM query plugins for custom data-layer logic
- Operations: Background task system, cron scheduler, garbage collection, automatic reindexing
- Auth: Self-contained JWT auth, API keys, user/group management, path-level permissions, or `--auth false` for local use
- Observability: Prometheus metrics at `/admin/metrics`, SSE event stream at `/events/stream`, structured logging
Next Steps
- Installation – build from source
- Quick Start – store, query, and version data in 5 minutes
- Architecture – how the engine works under the hood
Installation
AeorDB is built from source using the Rust toolchain. There are no external dependencies – the binary is fully self-contained.
Prerequisites
- Rust (stable toolchain, 1.75+)
Install Rust if you don’t have it:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Build from Source
Clone the repository and build in release mode:
git clone https://github.com/AeorDB/aeordb.git
cd aeordb
cargo build --release
The binary is located at:
target/release/aeordb
Verify the Build
./target/release/aeordb --help
You should see:
AeorDB command-line interface
Usage: aeordb <COMMAND>
Commands:
start Start the database server
stress Run stress tests against a running instance
emergency-reset Emergency reset: revoke the current root API key and generate a new one
export Export a version as a self-contained .aeordb file
diff Create a patch .aeordb containing only the changeset between two versions
import Import an export or patch .aeordb file into a target database
promote Promote a version hash to HEAD
gc Run garbage collection to reclaim unreachable entries
help Print this message or the help of the given subcommand(s)
Optional: Add to PATH
# Copy to a location in your PATH
sudo cp target/release/aeordb /usr/local/bin/aeordb
# Or symlink
sudo ln -s "$(pwd)/target/release/aeordb" /usr/local/bin/aeordb
No External Dependencies
AeorDB does not require:
- A separate database process (it IS the process)
- Runtime libraries or shared objects
- Configuration files (sensible defaults for everything)
- Docker, containers, or orchestration
The single binary is all you need. Point it at a file path and it creates the database on first run.
Next Steps
- Quick Start – start the server and store your first file
Quick Start
This tutorial walks you through the core operations: storing files, querying, creating snapshots, and cleaning up. All examples use curl and assume auth is disabled for simplicity.
1. Start the Server
aeordb start --database mydb.aeordb --port 3000 --auth false
You should see log output indicating the server is listening on port 3000. The mydb.aeordb file is created automatically if it does not exist.
2. Store a File
Store a JSON file at the path /users/alice.json:
curl -X PUT http://localhost:3000/engine/users/alice.json \
-H "Content-Type: application/json" \
-d '{"name":"Alice","age":30,"city":"Portland"}'
Expected response:
{
"status": "created",
"path": "/users/alice.json",
"hash": "a1b2c3d4..."
}
Store a few more files to have data to query:
curl -X PUT http://localhost:3000/engine/users/bob.json \
-H "Content-Type: application/json" \
-d '{"name":"Bob","age":25,"city":"Seattle"}'
curl -X PUT http://localhost:3000/engine/users/carol.json \
-H "Content-Type: application/json" \
-d '{"name":"Carol","age":35,"city":"Portland"}'
3. Read a File
curl http://localhost:3000/engine/users/alice.json
Expected response:
{"name":"Alice","age":30,"city":"Portland"}
4. List a Directory
Append a trailing slash to list directory contents:
curl http://localhost:3000/engine/users/
Expected response:
{
"path": "/users/",
"entries": [
{"name": "alice.json", "type": "file"},
{"name": "bob.json", "type": "file"},
{"name": "carol.json", "type": "file"}
]
}
5. Add an Index
To query fields, you need to tell AeorDB which fields to index. Store an index configuration at .config/indexes.json inside the directory:
curl -X PUT http://localhost:3000/engine/users/.config/indexes.json \
-H "Content-Type: application/json" \
-d '{
"indexes": [
{"name": "age", "type": "u64"},
{"name": "city", "type": "string"},
{"name": "name", "type": ["string", "trigram"]}
]
}'
This tells the engine to index the age field as a 64-bit unsigned integer, city as an exact string, and name as both an exact string and a trigram (for fuzzy matching). Existing files in the directory are automatically reindexed in the background.
6. Query
Query for users older than 28 in Portland:
curl -X POST http://localhost:3000/query \
-H "Content-Type: application/json" \
-d '{
"path": "/users/",
"where": {
"and": [
{"field": "age", "op": "gt", "value": 28},
{"field": "city", "op": "eq", "value": "Portland"}
]
}
}'
Expected response:
{
"results": [
{"name": "Alice", "age": 30, "city": "Portland"},
{"name": "Carol", "age": 35, "city": "Portland"}
],
"total_count": 2
}
Query Operators
| Operator | Description | Example |
|---|---|---|
| eq | Equals | {"field": "city", "op": "eq", "value": "Portland"} |
| gt | Greater than | {"field": "age", "op": "gt", "value": 25} |
| gte | Greater than or equal | {"field": "age", "op": "gte", "value": 25} |
| lt | Less than | {"field": "age", "op": "lt", "value": 30} |
| lte | Less than or equal | {"field": "age", "op": "lte", "value": 30} |
| between | Range (inclusive) | {"field": "age", "op": "between", "value": [25, 35]} |
| fuzzy | Trigram fuzzy match | {"field": "name", "op": "fuzzy", "value": "Alce"} |
| phonetic | Phonetic match | {"field": "name", "op": "phonetic", "value": "Karrol"} |
7. Create a Snapshot
Save the current state as a named snapshot:
curl -X POST http://localhost:3000/version/snapshot \
-H "Content-Type: application/json" \
-d '{"name": "v1"}'
Expected response:
{
"status": "created",
"name": "v1",
"root_hash": "e5f6a7b8..."
}
You can list all snapshots:
curl http://localhost:3000/version/snapshots
8. Delete a File
curl -X DELETE http://localhost:3000/engine/users/alice.json
Expected response:
{
"status": "deleted",
"path": "/users/alice.json"
}
The file is removed from the current state (HEAD), but the snapshot v1 still contains it. You can restore the snapshot to get it back.
9. Run Garbage Collection
Over time, deleted and overwritten data accumulates in the database file. Run GC to reclaim unreachable entries:
curl -X POST http://localhost:3000/admin/gc
Expected response:
{
"versions_scanned": 2,
"live_entries": 15,
"garbage_entries": 3,
"reclaimed_bytes": 1024,
"duration_ms": 12,
"dry_run": false
}
To preview what would be collected without actually deleting:
curl -X POST "http://localhost:3000/admin/gc?dry_run=true"
Next Steps
- Configuration – CLI flags, auth modes, CORS, index config
- Architecture – understand the storage engine internals
- Versioning – snapshots, forks, diff/patch, export/import
- Indexing – index types, multi-strategy fields, WASM parsers
Configuration
AeorDB is configured through CLI flags at startup and through configuration files stored inside the database itself.
CLI Flags
aeordb start [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --port, -p | 3000 | HTTP listen port |
| --database, -D | data.aeordb | Path to the database file (created if it does not exist) |
| --auth | self-contained | Auth provider URI (see Auth Modes) |
| --hot-dir | database parent dir | Directory for write-ahead hot files (crash recovery journal) |
| --cors | disabled | CORS allowed origins (see CORS) |
| --log-format | pretty | Log output format: pretty or json |
Examples
# Minimal: local development with no auth
aeordb start --database dev.aeordb --port 8080 --auth false
# Production: custom port, explicit hot directory, CORS for your frontend
aeordb start \
--database /var/lib/aeordb/prod.aeordb \
--port 443 \
--hot-dir /var/lib/aeordb/hot \
--cors "https://myapp.com,https://admin.myapp.com" \
--log-format json
Auth Modes
The --auth flag controls how authentication works:
| Value | Behavior |
|---|---|
| false (or null, no, 0) | Auth disabled – all requests are allowed without tokens. Use for local development only. |
| (omitted) | Self-contained mode (default). AeorDB manages its own users, API keys, and JWT tokens. A root API key is printed on first startup. |
| file:///path/to/identity.json | External identity file. AeorDB loads cryptographic keys from the specified file. A bootstrap API key is generated on first use. |
Self-Contained Auth (Default)
On first startup, AeorDB creates an internal identity store and prints a root API key:
Root API key: aeor_ak_7f3b2a1c...
Use this key to create additional users and API keys via the admin API. If you lose the root key, use aeordb emergency-reset to generate a new one.
Obtaining a JWT Token
# Exchange API key for a JWT token
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "aeor_ak_7f3b2a1c..."}'
# Use the token for subsequent requests
curl http://localhost:3000/engine/users/ \
-H "Authorization: Bearer eyJhbG..."
CORS
Global CORS via CLI
The --cors flag sets allowed origins for all routes:
# Allow all origins
aeordb start --cors "*"
# Allow specific origins (comma-separated)
aeordb start --cors "https://myapp.com,https://admin.myapp.com"
Without --cors, no CORS headers are sent and cross-origin browser requests will fail.
Per-Path CORS
For fine-grained control, store a /.config/cors.json file in the database:
curl -X PUT http://localhost:3000/engine/.config/cors.json \
-H "Content-Type: application/json" \
-d '{
"rules": [
{
"path": "/public/",
"origins": ["*"],
"methods": ["GET", "HEAD"],
"headers": ["Content-Type"]
},
{
"path": "/api/",
"origins": ["https://myapp.com"],
"methods": ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"],
"headers": ["Content-Type", "Authorization"]
}
]
}'
Per-path rules are checked first. If no rule matches the request path, the global --cors setting applies.
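The resolution order can be sketched in a few lines. This is an illustrative model, not AeorDB's implementation: prefix matching on the rule's `path` is an assumption (the docs only say a rule applies to requests under its path), and the function and variable names are hypothetical.

```python
# Sketch of CORS rule resolution: per-path rules first, global fallback second.
# Prefix matching on rule paths is an assumption for illustration.
RULES = [
    {"path": "/public/", "origins": ["*"]},
    {"path": "/api/",    "origins": ["https://myapp.com"]},
]
GLOBAL_ORIGINS = ["https://admin.myapp.com"]   # from the --cors flag

def allowed_origins(request_path: str) -> list[str]:
    for rule in RULES:                          # per-path rules checked first
        if request_path.startswith(rule["path"]):
            return rule["origins"]
    return GLOBAL_ORIGINS                       # no match: global setting applies

assert allowed_origins("/public/logo.png") == ["*"]
assert allowed_origins("/private/x") == ["https://admin.myapp.com"]
```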
Index Configuration
Indexes are configured per-directory by storing a .config/indexes.json file under the directory path. When this file changes, the engine automatically triggers a background reindex of all files in that directory.
{
"indexes": [
{"name": "title", "type": "string"},
{"name": "age", "type": "u64"},
{"name": "email", "type": ["string", "trigram"]},
{"name": "created", "type": "timestamp"}
]
}
Index Types
| Type | Description | Use Case |
|---|---|---|
| u64 | Unsigned 64-bit integer | Counts, IDs, sizes |
| i64 | Signed 64-bit integer | Temperatures, offsets, balances |
| f64 | 64-bit floating point | Coordinates, measurements, scores |
| string | Exact string match | Categories, statuses, enum values |
| timestamp | UTC millisecond timestamp | Date ranges, temporal queries |
| trigram | Trigram-based fuzzy text | Typo-tolerant search, substring matching |
| phonetic | Phonetic matching (Soundex) | Name search (“Smith” matches “Smyth”) |
| soundex | Soundex encoding | Alternative phonetic matching |
| dmetaphone | Double Metaphone | Multi-cultural phonetic matching |
Multi-Strategy Indexes
A single field can have multiple index types. Specify type as an array:
{"name": "title", "type": ["string", "trigram", "phonetic"]}
This creates three index files (title.string.idx, title.trigram.idx, title.phonetic.idx) from the same source field. Use the appropriate query operator to target each index type.
Source Resolution
By default, the index name is used as the JSON field name. For nested fields or parser output, use the source array:
{
"parser": "pdf-extractor",
"indexes": [
{"name": "title", "source": ["metadata", "title"], "type": "string"},
{"name": "author", "source": ["metadata", "author"], "type": ["string", "trigram"]},
{"name": "page_count", "source": ["metadata", "pages"], "type": "u64"}
]
}
See Indexing & Queries for the full indexing reference.
Cron Configuration
Schedule recurring background tasks by storing /.config/cron.json:
curl -X PUT http://localhost:3000/engine/.config/cron.json \
-H "Content-Type: application/json" \
-d '{
"schedules": [
{
"id": "weekly-gc",
"task_type": "gc",
"schedule": "0 3 * * 0",
"args": {},
"enabled": true
},
{
"id": "nightly-reindex",
"task_type": "reindex",
"schedule": "0 2 * * *",
"args": {"path": "/data/"},
"enabled": true
}
]
}'
The schedule field uses standard 5-field cron syntax: minute hour day_of_month month day_of_week. Cron schedules can also be managed via the HTTP API at /admin/cron.
Compression
AeorDB uses zstd compression automatically when configured. To enable compression for a directory, add the compression field to the index config:
{
"compression": "zstd",
"indexes": [
{"name": "title", "type": "string"}
]
}
Auto-Detection
When compression is enabled, the engine applies heuristics to decide whether to actually compress each file:
- Files smaller than 500 bytes are stored uncompressed (header overhead negates savings)
- Already-compressed formats (JPEG, PNG, MP4, ZIP, etc.) are stored uncompressed
- Text, JSON, XML, and other compressible types are compressed with zstd
Compression is transparent – reads automatically decompress. The content hash is always computed on the raw uncompressed data, so deduplication works regardless of compression settings.
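The auto-detection heuristic above can be sketched as a small predicate. This is an illustrative model, not AeorDB's code: the function name `should_compress` and the exact magic-byte list are assumptions; only the 500-byte threshold and the "skip already-compressed formats" rule come from the docs.

```python
# Illustrative sketch of the compression auto-detection heuristic.
# Magic-byte prefixes of formats that are already compressed (partial list).
ALREADY_COMPRESSED = [
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"PK\x03\x04",          # ZIP
]

def should_compress(data: bytes) -> bool:
    """Return True if zstd compression is likely to pay off."""
    if len(data) < 500:                      # header overhead negates savings
        return False
    if any(data.startswith(m) for m in ALREADY_COMPRESSED):
        return False                         # already-compressed format
    return True                              # text/JSON/XML etc. compress well

assert not should_compress(b'{"k": 1}')              # too small
assert should_compress(b"a" * 4096)                  # compressible text
```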
Architecture
AeorDB is a single-file database built on an append-only write-ahead log (WAL). The database file contains all data, indexes, and metadata in one place. Understanding the architecture helps you reason about performance, recovery, and versioning behavior.
High-Level Overview
aeordb start
|
+-------+-------+
| HTTP Server |
| (axum) |
+-------+-------+
|
+--------------+--------------+
| | |
+-----+----+ +-----+----+ +------+------+
| Query | | Plugin | | Version |
| Engine | | Manager | | Manager |
+-----+----+ +-----+----+ +------+------+
| | |
+--------------+--------------+
|
+--------+--------+
| Storage Engine |
| (StorageEngine) |
+--------+--------+
|
+--------------+--------------+
| | |
+-----+----+ +-----+----+ +------+------+
| Append | | KV Store | | NVT |
| Writer | | (.kv) | | (in-memory) |
+----------+ +----------+ +-------------+
| |
+--------------+
|
[ mydb.aeordb ] <-- single file on disk
[ mydb.aeordb.kv ] <-- KV index file
The Database File (.aeordb)
The .aeordb file is an append-only WAL. Every write appends a new entry to the end of the file. Entries are never modified in place (except during garbage collection).
File Layout
[File Header - 256 bytes]
Magic: "AEOR"
Hash algorithm, timestamps, KV/NVT pointers, HEAD hash, entry count
[Entry 1] [Entry 2] [Entry 3] ... [Entry N]
Chunks, FileRecords, DirectoryIndexes, Snapshots, DeletionRecords, Voids
The 256-byte file header contains pointers to the KV block, NVT, and the current HEAD hash. Every entry carries its own header with magic bytes, type tag, hash algorithm, compression flag, key, and value.
Entry Types
| Type | Purpose |
|---|---|
| Chunk | Raw file data (256KB blocks) |
| FileRecord | File metadata + ordered list of chunk hashes |
| DirectoryIndex | Directory contents (child entries with hashes) |
| Snapshot | Named point-in-time version reference |
| DeletionRecord | Marks a file as deleted (for version history completeness) |
| Void | Free space marker (reclaimable by future writes) |
The KV Index File (.aeordb.kv)
The KV store is a sorted array of (hash, offset) pairs stored in a separate file. It maps content hashes to byte offsets in the main .aeordb file, providing O(1) lookups when combined with the NVT.
Each entry is hash_length + 8 bytes (40 bytes for BLAKE3-256). The entries are sorted by hash, and the NVT tells you which bucket to look in, so lookups are a single seek + small scan.
KV Resize
When the KV store needs to grow, the engine enters a brief resize mode:
- A temporary buffer KV store is created
- New writes go to the buffer (no blocking)
- The primary KV store is expanded
- Buffer contents are merged into the primary
- Buffer is discarded
Writes never block during resize.
NVT (Normalized Vector Table)
The NVT is an in-memory structure that provides fast hash-to-bucket lookups for the KV store.
How It Works
- Normalize the hash to a scalar: `first_8_bytes_as_u64 / u64::MAX` produces a value in [0.0, 1.0]
- Map the scalar to a bucket: `bucket_index = floor(scalar * num_buckets)`
- The bucket points to a range in the KV store – scan that range for the exact hash
BLAKE3 hashes are uniformly distributed, so buckets stay balanced without manual tuning. The NVT starts at 1,024 buckets and doubles when the average scan length exceeds a threshold.
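The normalize-and-bucket steps can be sketched in a few lines. This is an illustrative model: SHA-256 stands in for BLAKE3 (which is not in the Python stdlib), and the big-endian read of the first 8 bytes is an assumption.

```python
import hashlib

NUM_BUCKETS = 1024  # documented starting size; doubles as scan length grows

def bucket_for(hash_bytes: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    # 1. Normalize: first 8 bytes as u64, divided by u64::MAX -> [0.0, 1.0]
    scalar = int.from_bytes(hash_bytes[:8], "big") / 0xFFFFFFFFFFFFFFFF
    # 2. Map the scalar to a bucket index (clamp the scalar == 1.0 edge case)
    return min(int(scalar * num_buckets), num_buckets - 1)

h = hashlib.sha256(b"/users/alice.json").digest()
assert 0 <= bucket_for(h) < NUM_BUCKETS
```

Because the hash is uniformly distributed, nearby buckets cover equal-sized slices of the sorted KV array, which is what keeps the per-lookup scan short.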
Scaling
| Entries | NVT Buckets | NVT Memory | Avg Scan |
|---|---|---|---|
| 10,000 | 1,024 | 16 KB | ~10 |
| 1,000,000 | 65,536 | 1 MB | ~15 |
| 100,000,000 | 1,048,576 | 16 MB | ~95 |
Hot File WAL (Crash Recovery)
The --hot-dir flag specifies a directory for write-ahead hot files. During a write:
1. The entry is written to a hot file first (fsync’d)
2. The entry is then written to the main `.aeordb` file
3. On success, the hot file entry is cleared
If the process crashes between steps 1 and 2, the hot file is replayed on the next startup to recover uncommitted writes. If --hot-dir is not specified, the hot directory defaults to the same directory as the database file.
Snapshot Double-Buffering
AeorDB uses ArcSwap for lock-free concurrent reads. The in-memory directory state is wrapped in an Arc that readers clone cheaply. When a write completes:
- The writer builds a new directory state
- The new state is swapped in atomically via `ArcSwap::store`
- Readers holding the old `Arc` continue using it until they finish
- The old state is dropped when the last reader releases it
This means:
- Readers never block writers
- Writers never block readers
- Every read sees a consistent point-in-time snapshot
- No read locks, no write locks on the read path
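The pattern can be illustrated with a minimal analogue. ArcSwap is a Rust crate; this Python sketch (names hypothetical) captures the idea: state objects are treated as immutable once published, and a write swaps in a freshly built state behind a single reference, so a reader that grabbed the old state keeps a consistent view.

```python
# Toy analogue of snapshot double-buffering via an atomic reference swap.
class Engine:
    def __init__(self):
        self._state = {}              # treated as immutable once published

    def read_snapshot(self):
        return self._state            # cheap: just grab the current reference

    def write(self, path, value):
        new_state = dict(self._state) # build a NEW state (copy, never mutate)
        new_state[path] = value
        self._state = new_state       # atomic swap; in-flight readers unaffected

e = Engine()
snap = e.read_snapshot()              # reader holds the pre-write state
e.write("/users/alice.json", b"{}")   # writer swaps in a new state
assert snap == {}                     # the old snapshot is unchanged
assert e.read_snapshot() == {"/users/alice.json": b"{}"}
```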
B-Tree Directories
Small directories (under 256 entries) are stored as flat lists of child entries. When a directory exceeds 256 entries, the engine automatically converts it to a B-tree structure. This keeps directory lookups O(log n) even for directories with millions of files.
B-tree nodes are themselves stored as content-addressed entries, so they participate in versioning and structural sharing just like any other data.
Directory Propagation
When a file changes, the engine propagates the update up the directory tree:
Write /users/alice.json
-> update /users/ directory (new child hash for alice.json)
-> update / root directory (new child hash for users/)
-> update HEAD (new root hash)
Each directory gets a new content hash because its contents changed. This is how the Merkle tree works – a change at any leaf creates new hashes all the way to the root. The root hash (HEAD) uniquely identifies the complete state of the database.
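The propagation can be demonstrated with a toy Merkle tree. This is an illustrative sketch, not AeorDB's serialization: SHA-256 stands in for BLAKE3, and `dir_hash` is a hypothetical helper that hashes a directory from its sorted child entries.

```python
import hashlib

def dir_hash(children: dict) -> str:
    """Hash a directory from its sorted (name, child_hash) entries."""
    h = hashlib.sha256()
    for name in sorted(children):
        h.update(name.encode() + children[name].encode())
    return h.hexdigest()

# v1: /users/alice.json under the root
users = {"alice.json": hashlib.sha256(b'{"age":30}').hexdigest()}
root = {"users/": dir_hash(users)}
head_v1 = dir_hash(root)

# Overwrite alice.json: every hash on the path to the root changes.
users["alice.json"] = hashlib.sha256(b'{"age":31}').hexdigest()
root["users/"] = dir_hash(users)
head_v2 = dir_hash(root)

assert head_v1 != head_v2   # a leaf change produces a new root hash (HEAD)
```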
Next Steps
- Storage Engine – entry format, hashing, chunking, and dedup details
- Versioning – how snapshots, forks, and diff/patch work
- Indexing & Queries – how indexes are built and queried
Storage Engine
The storage engine is an append-only WAL (write-ahead log) where the log IS the database. Every write appends a new entry. The file only grows (until garbage collection reclaims unreachable entries). This design gives you crash recovery, versioning, and integrity verification as structural properties rather than bolted-on features.
Entry Format
Every entry on disk shares the same header format:
[Entry Header - 31 bytes fixed + hash_length variable]
magic: u32 (0x0AE012DB - marks the start of a valid entry)
entry_version: u8 (format version, starting at 1)
entry_type: u8 (Chunk, FileRecord, DirectoryIndex, etc.)
flags: u8 (operational flags)
hash_algo: u16 (BLAKE3_256 = 0x0001, SHA256 = 0x0002, etc.)
compression_algo: u8 (None = 0x00, Zstd = 0x01)
encryption_algo: u8 (None = 0x00, reserved for future use)
key_length: u32 (length of the key field)
value_length: u32 (length of the value field)
timestamp: i64 (UTC milliseconds since epoch)
total_length: u32 (total bytes including header, for jump-scanning)
hash: [u8; N] (integrity hash, N determined by hash_algo)
[Key - key_length bytes]
[Value - value_length bytes]
Key properties:
- `magic` (0x0AE012DB) enables recovery scanning – find entry boundaries even in a corrupted file by scanning for magic bytes
- `total_length` enables jump-scanning – skip to the next entry without reading the full key/value
- `hash` covers `entry_type + key + value` – re-hash and compare to detect corruption
- `entry_version` enables format evolution – the engine selects the correct parser based on this byte
For BLAKE3-256 (the default), the hash is 32 bytes, making the full header 63 bytes.
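The fixed portion of the header maps directly onto a 31-byte packed layout. The sketch below parses it with Python's `struct`; the field order and sizes follow the layout above, but the byte order (big-endian here) is an assumption, as this section does not specify the on-disk endianness.

```python
import struct

# u32 magic, u8 version, u8 type, u8 flags, u16 hash_algo, u8 compression,
# u8 encryption, u32 key_length, u32 value_length, i64 timestamp, u32 total_length
HEADER_FMT = ">IBBBHBBIIqI"          # 31 bytes, no padding
assert struct.calcsize(HEADER_FMT) == 31

def parse_header(buf: bytes) -> dict:
    (magic, version, entry_type, flags, hash_algo, comp, enc,
     key_len, val_len, ts, total_len) = struct.unpack_from(HEADER_FMT, buf)
    assert magic == 0x0AE012DB, "not a valid entry boundary"
    hash_len = 32 if hash_algo == 0x0001 else 0   # BLAKE3_256 default
    return {"type": entry_type, "key_len": key_len, "value_len": val_len,
            "total_len": total_len, "hash": buf[31:31 + hash_len]}

# Round-trip a synthetic header: 63-byte header + 10-byte key + 100-byte value
raw = struct.pack(HEADER_FMT, 0x0AE012DB, 1, 1, 0, 0x0001, 0, 0,
                  10, 100, 1775968398000, 173) + bytes(32)
hdr = parse_header(raw)
assert hdr["key_len"] == 10 and len(hdr["hash"]) == 32
```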
Content-Addressed Hashing
Every piece of data is identified by its BLAKE3 hash. Hash inputs are prefixed by type (domain separation) to prevent collisions between different entry types:
| Entry Type | Hash Input | Example |
|---|---|---|
| Chunk | chunk: + raw bytes | BLAKE3("chunk:" + file_bytes) |
| FileRecord (path key) | file: + path | BLAKE3("file:/users/alice.json") |
| FileRecord (content key) | filec: + serialized record | BLAKE3("filec:" + record_bytes) |
| DirectoryIndex (path key) | dir: + path | BLAKE3("dir:/users/") |
| DirectoryIndex (content key) | dirc: + serialized data | BLAKE3("dirc:" + dir_bytes) |
The domain prefix ensures that a chunk’s raw data can never produce the same hash as a file path, even if the bytes are identical.
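Domain separation is just a type-specific prefix fed into the hash before the payload. A minimal sketch, with SHA-256 standing in for BLAKE3 (blake3 is not in the Python stdlib):

```python
import hashlib

def chunk_hash(data: bytes) -> str:
    return hashlib.sha256(b"chunk:" + data).hexdigest()

def file_path_hash(path: str) -> str:
    return hashlib.sha256(b"file:" + path.encode()).hexdigest()

def dir_path_hash(path: str) -> str:
    return hashlib.sha256(b"dir:" + path.encode()).hexdigest()

# Identical bytes under different domains can never collide:
assert chunk_hash(b"/users/") != dir_path_hash("/users/")
```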
Chunking
Files are split into 256KB chunks for storage. Each chunk is content-addressed independently:
Original file (700KB):
[Chunk 1: 256KB] -> hash_a
[Chunk 2: 256KB] -> hash_b
[Chunk 3: 188KB] -> hash_c
FileRecord:
path: "/docs/report.pdf"
chunk_hashes: [hash_a, hash_b, hash_c]
total_size: 700KB
Chunking provides:
- Deduplication: Two files sharing identical 256KB blocks store those blocks only once
- Efficient updates: Modifying 3 bytes of a 10GB file creates one new chunk, not a new copy of the entire file
- Streaming reads: Read a file by iterating its chunk hashes and fetching each chunk
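The chunk-and-dedup behavior can be sketched with a dict standing in for the content-addressed store (SHA-256 standing in for BLAKE3; `chunk_file` is a hypothetical name):

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256KB, as documented

def chunk_file(data: bytes, store: dict) -> list[str]:
    """Split data into chunks, store each by content hash, return ordered hashes."""
    hashes = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)     # dedup: identical chunks stored once
        hashes.append(h)
    return hashes

store = {}
a = chunk_file(bytes(700 * 1024), store)   # 700KB of zeros -> 3 chunks
b = chunk_file(bytes(700 * 1024), store)   # storing the same file again
assert len(a) == 3 and a == b
assert len(store) == 2   # two identical 256KB chunks collapsed to one entry
```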
Dual-Key FileRecords
FileRecords are stored at two keys to support both current reads and historical versioning:
- Path key (`file:/path`) – mutable, always points to the latest version. Used for reads, metadata, indexing, and deletion. O(1) lookup.
- Content key (`filec:` + serialized record) – immutable, content-addressed. The directory tree’s `ChildEntry.hash` points to this key.
When the version manager walks a snapshot’s directory tree, it follows ChildEntry.hash to the content key, which resolves to the FileRecord as it existed at snapshot time – not the current version. This is what makes historical reads correct.
Directories use the same pattern: dir:/path (mutable) and dirc: + data (immutable content key).
FileRecord Format
[FileRecord Value]
path_length: u16
path: [u8; path_length] (full file path)
content_type_len: u16
content_type: [u8; content_type_len] (MIME type)
total_size: u64 (file size in bytes)
created_at: i64 (UTC milliseconds)
updated_at: i64 (UTC milliseconds)
metadata_length: u32
metadata: [u8; metadata_length] (arbitrary JSON metadata)
chunk_count: u32
chunk_hashes: [u8; chunk_count * 32] (ordered BLAKE3 hashes)
Metadata fields come first so you can read file metadata without skipping past the chunk list. Chunk hashes are the tail of the record for streaming reads.
Directory Propagation
When a file is stored or deleted, the change propagates up the directory tree:
Store /users/alice.json:
1. Store chunks -> [hash_a, hash_b]
2. Store FileRecord at path key + content key
3. Update /users/ DirectoryIndex (new ChildEntry for alice.json)
4. Update / root DirectoryIndex (new ChildEntry for users/)
5. Update HEAD in file header (new root hash)
Each directory gets a new content hash because one of its children changed. This chain of updates from leaf to root is what maintains the Merkle tree and makes versioning work.
Void Management
When garbage collection reclaims an entry, the space becomes a Void – a marker for reclaimable space. Voids are tracked by size using deterministic hash keys:
Key: BLAKE3("::aeordb:void:262144")
Value: [list of file offsets where 262144-byte voids exist]
When a new entry needs to be written, the engine checks for a void of sufficient size before appending to the end of the file. If a void is larger than needed, it is split: the entry occupies the front, and a smaller void is created for the remainder (if the remainder is at least 63 bytes – the minimum entry header size).
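The reuse-and-split logic can be sketched as a small allocator. This is an illustrative model with hypothetical names (`place_entry`, `voids`); only the 63-byte minimum and the front-fill-then-split behavior come from the text.

```python
MIN_ENTRY = 63  # documented minimum entry header size

def place_entry(size: int, voids: dict, end_of_file: int):
    """Return (offset, new_end_of_file); voids maps void size -> [file offsets]."""
    for vsize in sorted(voids):
        if vsize >= size and voids[vsize]:
            offset = voids[vsize].pop()
            remainder = vsize - size
            if remainder >= MIN_ENTRY:          # split: track the leftover gap
                voids.setdefault(remainder, []).append(offset + size)
            return offset, end_of_file          # reused a void; file not grown
    return end_of_file, end_of_file + size      # no fit: append at the end

voids = {262144: [1024]}                        # one 256KB void at offset 1024
off, eof = place_entry(1000, voids, end_of_file=500_000)
assert off == 1024                              # entry occupies the void's front
assert voids[262144 - 1000] == [2024]           # remainder tracked as a smaller void
```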
Compression
Compression is a post-hash transform:
Write: raw data -> hash -> compress -> store
Read: load -> decompress -> verify hash -> return
The hash is always computed on the raw uncompressed data. This preserves deduplication (same content = same hash regardless of compression) and integrity verification.
Each entry carries its own compression_algo byte, so compressed and uncompressed entries coexist in the same file. Currently, zstd is the only supported compression algorithm.
fsync Strategy
Not all entries are equally important for durability:
| Data | fsync | Rationale |
|---|---|---|
| Chunks, FileRecords, DeletionRecords | Immediate | The truth – not rebuildable from other data |
| KV store, NVT, DirectoryIndex, Snapshots | Deferred | Derived data – can be rebuilt from a full entry scan |
This gives durability where it matters and performance where it doesn’t.
Crash Recovery
The recovery hierarchy, from least to most damage:
| What’s Lost | Recovery Method |
|---|---|
| Nothing | Read HEAD from KV store, load directory index, ready |
| KV store only | Entry-by-entry scan, rebuild KV store, load latest directory index |
| Directory index only | Scan FileRecords + DeletionRecords, reconstruct from paths + timestamps |
| KV store + directory | Full entry scan, rebuild KV, reconstruct directory |
| Only chunks + FileRecords survive | Full data recovery, version history reconstructed via DeletionRecords |
The magic bytes at the start of every entry enable boundary detection even in partially corrupted files. The total_length field in each header enables efficient forward scanning.
Next Steps
- Architecture – high-level system overview
- Versioning – snapshots, forks, and the Merkle tree
Versioning
AeorDB’s content-addressed Merkle tree makes versioning a structural property of the storage engine rather than an add-on feature. Every write creates new hashes up the directory tree, so every committed state is already a snapshot by definition – you just need to save a pointer to the root hash.
How Versioning Works
The database state at any point in time is fully described by its root hash (HEAD). HEAD is the hash of the root DirectoryIndex, which contains hashes of its children, which contain hashes of their children, all the way down to the chunks of individual files.
HEAD -> root DirectoryIndex
|
+-- /users/ (DirectoryIndex)
| +-- alice.json (FileRecord -> [chunk_a, chunk_b])
| +-- bob.json (FileRecord -> [chunk_c])
|
+-- /docs/ (DirectoryIndex)
+-- readme.md (FileRecord -> [chunk_d])
When you write /users/alice.json, the engine creates:
- New chunks (if the content changed)
- A new FileRecord with new chunk hashes
- A new `/users/` DirectoryIndex with the updated child entry
- A new root DirectoryIndex with the updated `/users/` child entry
- HEAD now points to the new root hash
The old root hash still exists and still points to the old directory tree with the old file content. Nothing was overwritten – new entries were appended.
Snapshots
A snapshot is a named reference to a root hash. Creating a snapshot saves the current HEAD so you can return to it later.
Create a Snapshot
curl -X POST http://localhost:3000/version/snapshot \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
List Snapshots
curl http://localhost:3000/version/snapshots
Response:
{
"snapshots": [
{"name": "v1.0", "root_hash": "a1b2c3...", "created_at": 1775968398000},
{"name": "v2.0", "root_hash": "d4e5f6...", "created_at": 1775968500000}
]
}
Restore a Snapshot
Restoring a snapshot sets HEAD back to the snapshot’s root hash. The current state is not lost – you can snapshot it before restoring if you want to preserve it.
curl -X POST http://localhost:3000/version/restore \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
Delete a Snapshot
curl -X DELETE http://localhost:3000/version/snapshot/v1.0
After deleting a snapshot, entries that were only reachable through that snapshot become eligible for garbage collection.
Forks
Forks are isolated branches of the database. Writes to a fork do not affect HEAD or other forks. This is useful for testing changes, running experiments, or staging updates before promoting them.
Create a Fork
curl -X POST http://localhost:3000/version/fork \
-H "Content-Type: application/json" \
-d '{"name": "experiment"}'
List Forks
curl http://localhost:3000/version/forks
Promote a Fork
When you’re satisfied with the changes in a fork, promote it to HEAD:
curl -X POST http://localhost:3000/version/fork/experiment/promote
Abandon a Fork
curl -X DELETE http://localhost:3000/version/fork/experiment
Tree Walking
The content-addressed Merkle tree enables historical reads. When you walk a snapshot’s directory tree:
- Start from the snapshot’s root hash
- Load the root DirectoryIndex – each child entry has a hash
- Follow child hashes to subdirectories or files
- Each FileRecord’s `ChildEntry.hash` points to a content-addressed (immutable) key
Because file content keys are immutable, walking a snapshot’s tree always resolves to the data as it existed when the snapshot was taken, even if the files have been overwritten or deleted since then.
Snapshot "v1.0" root_hash: aaa111...
-> /users/ dir_hash: bbb222...
-> alice.json content_key: ccc333... (resolves to Alice's v1.0 data)
Current HEAD root_hash: ddd444...
-> /users/ dir_hash: eee555...
-> alice.json content_key: fff666... (resolves to Alice's current data)
Both trees can coexist because they share unchanged chunks and directories (structural sharing). Only the parts that differ consume additional storage.
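The walk above can be sketched with a toy in-memory content-addressed store. This is illustrative only – the store, put, and walk names are not AeorDB’s API, and SHA-256 stands in for BLAKE3 – but it shows why a snapshot’s tree keeps resolving old content after an overwrite:

```python
import hashlib
import json

# Toy content-addressed store: hash -> object. Directories map names to
# child hashes; files hold raw content. (AeorDB uses BLAKE3 and binary
# DirectoryIndex/FileRecord structures; this is a sketch.)
store = {}

def put(obj) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    h = hashlib.sha256(data).hexdigest()[:12]
    store[h] = obj
    return h

# Build snapshot v1.0: /users/alice.json
alice_v1 = put({"file": "alice v1.0 data"})
users_v1 = put({"dir": {"alice.json": alice_v1}})
root_v1 = put({"dir": {"users": users_v1}})

# Overwrite alice.json; only changed nodes get new hashes.
alice_v2 = put({"file": "alice current data"})
users_v2 = put({"dir": {"alice.json": alice_v2}})
root_head = put({"dir": {"users": users_v2}})

def walk(root_hash: str, path: str):
    """Resolve a path by following child hashes from a root."""
    node = store[root_hash]
    for part in path.strip("/").split("/"):
        node = store[node["dir"][part]]
    return node["file"]

# The snapshot still resolves the old content after the overwrite.
print(walk(root_v1, "/users/alice.json"))    # alice v1.0 data
print(walk(root_head, "/users/alice.json"))  # alice current data
```

Because unchanged nodes keep the same hash, both roots share the parts of the tree that did not change.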
Export and Import
AeorDB can export a version as a self-contained .aeordb file and import it into another database.
Export
Export creates a clean, compacted database from a single version – no history, no voids, no deleted entries:
# Export current HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a specific snapshot
aeordb export --database data.aeordb --snapshot v1.0 --output v1.aeordb
# Export via HTTP
curl -X POST http://localhost:3000/admin/export --output backup.aeordb
curl -X POST "http://localhost:3000/admin/export?snapshot=v1.0" --output v1.aeordb
The exported file is a fully functional database that can be opened with aeordb start.
Import
Import applies an export or patch file to a target database:
# Import without promoting HEAD (inspect first)
aeordb import --database target.aeordb --file backup.aeordb
# Import and promote in one step
aeordb import --database target.aeordb --file backup.aeordb --promote
# Import via HTTP
curl -X POST http://localhost:3000/admin/import \
-H "Content-Type: application/octet-stream" \
--data-binary @backup.aeordb
Import does NOT automatically change HEAD. The imported version exists in the database and can be promoted explicitly when ready.
Diff and Patch
Diff extracts only the changeset between two versions. The output is a patch file – not a standalone database.
# Diff between two snapshots
aeordb diff --database data.aeordb --from v1.0 --to v2.0 --output patch.aeordb
# Diff from a snapshot to current HEAD
aeordb diff --database data.aeordb --from v1.0 --output patch.aeordb
# Diff via HTTP
curl -X POST "http://localhost:3000/admin/diff?from=v1.0&to=v2.0" --output patch.aeordb
A patch file contains only new/changed chunks, updated FileRecords, and deletion markers. Chunks shared between the two versions are not included, making patches much smaller than full exports.
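The changeset idea can be sketched over two versions modeled as path-to-content-hash maps (a simplification – real patches also carry the new chunks themselves, and this diff function is not AeorDB’s implementation):

```python
def diff(base: dict, target: dict) -> dict:
    """Changeset from base to target: only new/changed entries plus
    deletion markers. Entries identical in both versions are excluded."""
    changed = {p: h for p, h in target.items() if base.get(p) != h}
    deleted = [p for p in base if p not in target]
    return {"changed": changed, "deleted": deleted}

v1 = {"/a.json": "h1", "/b.json": "h2", "/c.json": "h3"}
v2 = {"/a.json": "h1", "/b.json": "h9", "/d.json": "h4"}

patch = diff(v1, v2)
# Only b.json (changed), d.json (new), and a deletion marker for c.json
# are carried; the shared a.json is omitted entirely.
print(patch)
```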
Applying a Patch
# Apply patch (strict base version check)
aeordb import --database target.aeordb --file patch.aeordb
# Skip base version check
aeordb import --database target.aeordb --file patch.aeordb --force
# Apply and promote
aeordb import --database target.aeordb --file patch.aeordb --promote
If the target database’s HEAD does not match the patch’s base version, the import fails unless --force is used.
Promote
Promote sets HEAD to a specific version hash. This is separate from import so you can inspect the imported data before committing to it:
# CLI
aeordb promote --database data.aeordb --hash f6e5d4c3...
# HTTP
curl -X POST "http://localhost:3000/admin/promote" \
-H "Content-Type: application/json" \
-d '{"hash": "f6e5d4c3..."}'
Next Steps
- Storage Engine – how the Merkle tree and content addressing work at the byte level
- Architecture – system overview and crash recovery
Indexing & Queries
AeorDB indexes are opt-in and configured per-directory. Nothing is indexed by default – you control exactly which fields are indexed, with which strategies, and for which file types. This keeps the engine lean and predictable.
Index Configuration
Create a .config/indexes.json file in any directory to define indexes for files in that directory:
curl -X PUT http://localhost:3000/engine/users/.config/indexes.json \
-H "Content-Type: application/json" \
-d '{
"indexes": [
{"name": "name", "type": ["string", "trigram"]},
{"name": "age", "type": "u64"},
{"name": "city", "type": "string"},
{"name": "email", "type": "trigram"},
{"name": "created_at", "type": "timestamp"}
]
}'
When this file is created or updated, the engine automatically triggers a background reindex of all existing files in the directory.
Index Types
| Type | Order-Preserving | Description |
|---|---|---|
u64 | Yes | Unsigned 64-bit integer. Range-tracking with observed min/max. |
i64 | Yes | Signed 64-bit integer. Shifted to [0.0, 1.0] for NVT storage. |
f64 | Yes | 64-bit floating point. Clamping for NaN/Inf handling. |
string | Partially | Exact string matching. Multi-stage scalar: first byte weighted + length. |
timestamp | Yes | UTC millisecond timestamps. Range-tracking. |
trigram | No | Trigram-based fuzzy text matching. Tolerates typos, supports substring search. |
phonetic | No | General phonetic matching (Soundex algorithm). |
soundex | No | Soundex encoding for English names. |
dmetaphone | No | Double Metaphone for multi-cultural phonetic matching. |
Multi-Strategy Indexes
A single field can be indexed with multiple strategies by passing type as an array:
{"name": "title", "type": ["string", "trigram", "phonetic"]}
This creates three separate index files for the same field:
- title.string.idx – exact match queries
- title.trigram.idx – fuzzy/substring queries
- title.phonetic.idx – phonetic queries
Use the appropriate query operator to target the desired index.
How Indexes Work
AeorDB uses a Normalized Vector Table (NVT) for index lookups. Each indexed field gets its own NVT.
The NVT Approach
- A ScalarConverter maps each field value to a scalar in [0.0, 1.0]
- The scalar maps to a bucket in the NVT
- The bucket points to the matching entries
For numeric types (u64, i64, f64, timestamp), the converter tracks the observed min/max and distributes values uniformly across the [0.0, 1.0] range. This means range queries (gt, lt, between) are efficient – they resolve to a contiguous range of buckets.
For a query like WHERE age > 30:
- converter.to_scalar(30) computes where 30 falls in the bucket range
- All buckets after that point are candidates
- Only those buckets are scanned
This is O(1) for the bucket lookup, with a small linear scan within the bucket.
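A minimal sketch of the min/max normalization and bucket selection described above (the class and function names are illustrative, not AeorDB’s internals):

```python
class ScalarConverter:
    """Maps numeric values to [0.0, 1.0] using the observed min/max."""
    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def to_scalar(self, v: float) -> float:
        # Clamp so out-of-range probes stay inside [0.0, 1.0].
        s = (v - self.lo) / (self.hi - self.lo)
        return min(1.0, max(0.0, s))

def bucket(scalar: float, n_buckets: int) -> int:
    return min(n_buckets - 1, int(scalar * n_buckets))

# Ages observed between 0 and 100, 10 buckets.
conv = ScalarConverter(0, 100)
n = 10

# WHERE age > 30: only buckets at/after to_scalar(30) are candidates.
start = bucket(conv.to_scalar(30), n)
candidates = list(range(start, n))
print(candidates)
```

Because the mapping is order-preserving, a range predicate always resolves to a contiguous run of buckets.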
Two-Tier Execution
Simple queries (single field, direct comparison) use direct scalar lookups – no bitmaps, no compositing. Most queries fall into this tier.
Complex queries (OR, NOT, multi-field boolean logic) build NVT bitmaps and composite them:
- Each field condition produces a bitmask over the NVT buckets
- AND = bitwise AND of masks
- OR = bitwise OR of masks
- NOT = bitwise NOT of a mask
- The final mask identifies which buckets contain results
Memory usage is bounded: a bitmask for 1M buckets is only 128KB, regardless of how many entries exist.
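The compositing step reduces to plain bitwise arithmetic. A sketch using Python integers as arbitrary-width bucket masks (the bucket assignments here are made up for illustration):

```python
N_BUCKETS = 16
ALL = (1 << N_BUCKETS) - 1  # mask with every bucket set

def mask(buckets):
    """Bitmask with the given bucket indices set."""
    m = 0
    for b in buckets:
        m |= 1 << b
    return m

# Each field condition resolves to a candidate-bucket mask:
age_gt_30 = mask([5, 6, 7, 8])    # buckets for age > 30
city_eq_pdx = mask([2, 5, 8])     # buckets for city == "Portland"
role_banned = mask([8])           # buckets for role == "banned"

# AND / OR / NOT composite with bitwise ops:
result = (age_gt_30 & city_eq_pdx) & (ALL & ~role_banned)

hits = [b for b in range(N_BUCKETS) if (result >> b) & 1]
print(hits)  # [5]
```

Only the buckets surviving the final mask are scanned for actual entries.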
Source Resolution
By default, the name field in an index definition is used as the JSON key to extract the value:
{"name": "age", "type": "u64"}
This extracts the age field from {"name": "Alice", "age": 30}.
Nested Fields
For nested JSON or parser output, use the source array to specify the path:
{"name": "author", "source": ["metadata", "author"], "type": "string"}
This extracts metadata.author from a JSON structure like:
{"metadata": {"author": "Jane Smith", "title": "Report"}}
The source array supports:
- String segments for object key lookup: ["metadata", "author"]
- Integer segments for array index access: ["items", 0, "name"]
Plugin Mapper
For complex extraction logic, delegate to a WASM plugin:
{
"name": "summary",
"source": {"plugin": "my-mapper", "args": {"mode": "summary", "max_length": 500}},
"type": "trigram"
}
The plugin receives the parsed JSON and the args object, and returns the extracted field value.
WASM Parser Integration
For non-JSON files (PDFs, images, XML, etc.), a parser plugin converts raw bytes into a JSON object that the indexing pipeline can work with.
Configuration
{
"parser": "pdf-extractor",
"parser_memory_limit": "256mb",
"indexes": [
{"name": "title", "source": ["metadata", "title"], "type": ["string", "trigram"]},
{"name": "author", "source": ["metadata", "author"], "type": "phonetic"},
{"name": "content", "source": ["text"], "type": "trigram"},
{"name": "page_count", "source": ["metadata", "page_count"], "type": "u64"}
]
}
The parser receives a JSON envelope with the file data (base64-encoded) and metadata:
{
"data": "<base64-encoded file bytes>",
"meta": {
"filename": "report.pdf",
"path": "/docs/reports/report.pdf",
"content_type": "application/pdf",
"size": 1048576
}
}
The parser returns a JSON object (like {"text": "...", "metadata": {"title": "...", ...}}), and the source paths in each index definition walk this JSON to extract field values.
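To make the envelope contract concrete, here is a toy text parser written in Python rather than as a WASM plugin. The function name and the exact output shape are illustrative; only the envelope fields come from the format above:

```python
import base64

def toy_text_parser(envelope: dict) -> dict:
    """Decode the base64 file bytes and return an indexable JSON object
    shaped like {"text": ..., "metadata": {...}}."""
    raw = base64.b64decode(envelope["data"])
    text = raw.decode("utf-8", errors="replace")
    return {
        "text": text,
        "metadata": {
            "filename": envelope["meta"]["filename"],
            "size": envelope["meta"]["size"],
        },
    }

envelope = {
    "data": base64.b64encode(b"Quarterly report: revenue up 12%").decode(),
    "meta": {
        "filename": "report.txt",
        "path": "/docs/report.txt",
        "content_type": "text/plain",
        "size": 32,
    },
}

parsed = toy_text_parser(envelope)
print(parsed["text"])                  # Quarterly report: revenue up 12%
print(parsed["metadata"]["filename"])  # report.txt
```

An index definition with source ["text"] or ["metadata", "filename"] would then walk this returned object.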
Global Parser Registry
You can also register parsers globally by content type at /.config/parsers.json:
{
"application/pdf": "pdf-extractor",
"image/jpeg": "image-metadata",
"image/png": "image-metadata"
}
When a file is stored and no parser is configured in the directory’s index config, the engine checks this registry using the file’s content type.
Failure Handling
Parser and indexing failures never prevent file storage. The file is always stored regardless of parse/index errors. If logging is enabled in the index config ("logging": true), errors are written to .logs/ under the directory.
Automatic Reindexing
When you store or update a .config/indexes.json file, the engine automatically enqueues a background reindex task for that directory. The task:
- Reads the current index config
- Lists all files in the directory
- Re-runs the indexing pipeline for each file (in batches of 50, yielding between batches)
- Reports progress via
GET /admin/tasks
During reindexing, queries still work but may return incomplete results. The query response includes a meta.reindexing field with the current progress:
{
"results": [...],
"meta": {
"reindexing": 0.67,
"reindexing_eta": 1775968398803,
"reindexing_indexed": 6700,
"reindexing_total": 10000
}
}
Query API
Queries are submitted as POST /query with a JSON body:
{
"path": "/users/",
"where": {
"and": [
{"field": "age", "op": "gt", "value": 30},
{"field": "city", "op": "eq", "value": "Portland"},
{"not": {"field": "role", "op": "eq", "value": "banned"}}
]
},
"sort": {"field": "age", "order": "desc"},
"limit": 50,
"offset": 0
}
Boolean Logic
The where clause supports full boolean logic:
{
"where": {
"or": [
{"field": "city", "op": "eq", "value": "Portland"},
{
"and": [
{"field": "age", "op": "gt", "value": 25},
{"field": "city", "op": "eq", "value": "Seattle"}
]
}
]
}
}
For backward compatibility, a flat array in where is treated as an implicit and:
{
"where": [
{"field": "age", "op": "gt", "value": 30},
{"field": "city", "op": "eq", "value": "Portland"}
]
}
Query Operators
| Operator | Description | Value Type |
|---|---|---|
eq | Equals | any |
gt | Greater than | numeric, timestamp |
gte | Greater than or equal | numeric, timestamp |
lt | Less than | numeric, timestamp |
lte | Less than or equal | numeric, timestamp |
between | Inclusive range | [min, max] |
fuzzy | Trigram fuzzy match | string (requires trigram index) |
phonetic | Phonetic match | string (requires phonetic/soundex/dmetaphone index) |
Next Steps
- Quick Start – hands-on tutorial with query examples
- Configuration – full index config reference
- Storage Engine – how indexed data is stored
Files & Directories
AeorDB exposes a content-addressable filesystem through its engine routes. Every path under /engine/ represents either a file or a directory.
Endpoint Summary
| Method | Path | Description | Auth | Status Codes |
|---|---|---|---|---|
| PUT | /engine/{path} | Store a file | Yes | 201, 400, 404, 409, 500 |
| GET | /engine/{path} | Read a file or list a directory | Yes | 200, 404, 500 |
| DELETE | /engine/{path} | Delete a file | Yes | 200, 404, 500 |
| HEAD | /engine/{path} | Check existence and get metadata | Yes | 200, 404, 500 |
| POST | /engine-symlink/{path} | Create or update a symlink | Yes | 201, 400, 500 |
PUT /engine/
Store a file at the given path. Parent directories are created automatically. If a file already exists at the path, it is overwritten (creating a new version).
Body limit: 10 GB
Request
- Headers:
  - Authorization: Bearer <token> (required)
  - Content-Type (optional) – auto-detected from magic bytes if omitted
- Body: raw file bytes
Response
Status: 201 Created
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"total_size": 245678,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
Side Effects
- If the path matches /.config/indexes.json (or a nested variant like /data/.config/indexes.json), a reindex task is automatically enqueued for the parent directory. Any existing pending or running reindex for that path is cancelled first.
- Triggers entries_created events on the event bus.
- Runs any deployed store-phase plugins.
Example
curl -X PUT http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/pdf" \
--data-binary @report.pdf
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid input (e.g., empty path) |
| 404 | Parent path references a non-existent entity |
| 409 | Path conflict (e.g., file exists where directory expected) |
| 500 | Internal storage failure |
GET /engine/
Read a file or list a directory. The server determines the type automatically:
- If the path resolves to a file, the file content is streamed with appropriate headers.
- If the path resolves to a directory, a JSON array of children is returned.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
snapshot | string | — | Read the file as it was at this named snapshot |
version | string | — | Read the file at this version hash (hex) |
nofollow | boolean | false | If the path is a symlink, return metadata instead of following |
depth | integer | 0 | Directory listing depth: 0 = immediate children, -1 = unlimited recursion |
glob | string | — | Filter directory listing by file name glob pattern (*, ?, [abc]) |
File Response
Status: 200 OK
Headers:
| Header | Description |
|---|---|
X-Path | Canonical path of the file |
X-Total-Size | File size in bytes |
X-Created-At | Unix timestamp (milliseconds) |
X-Updated-At | Unix timestamp (milliseconds) |
Content-Type | MIME type (if known) |
Body: raw file bytes (streamed)
Directory Response
Status: 200 OK
Each entry includes path, hash, and numeric entry_type fields. Symlink entries also include a target field.
Entry types: 2 = file, 3 = directory, 8 = symlink.
[
{
"path": "/data/report.pdf",
"name": "report.pdf",
"entry_type": 2,
"hash": "a3f8c1...",
"total_size": 245678,
"created_at": 1775968398000,
"updated_at": 1775968398000,
"content_type": "application/pdf"
},
{
"path": "/data/images",
"name": "images",
"entry_type": 3,
"hash": "b2c4d5...",
"total_size": 0,
"created_at": 1775968000000,
"updated_at": 1775968000000,
"content_type": null
},
{
"path": "/data/latest",
"name": "latest",
"entry_type": 8,
"hash": "c3d5e6...",
"target": "/data/report.pdf",
"total_size": 0,
"created_at": 1775968500000,
"updated_at": 1775968500000,
"content_type": null
}
]
Examples
Read a file:
curl http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN" \
-o report.pdf
List a directory:
curl http://localhost:3000/engine/data/ \
-H "Authorization: Bearer $TOKEN"
Recursive Directory Listing
Use the depth and glob query parameters to list files recursively:
# List all files recursively
curl http://localhost:3000/engine/data/?depth=-1 \
-H "Authorization: Bearer $TOKEN"
# List only .psd files anywhere under /assets/
curl "http://localhost:3000/engine/assets/?depth=-1&glob=*.psd" \
-H "Authorization: Bearer $TOKEN"
# List one level deep
curl http://localhost:3000/engine/data/?depth=1 \
-H "Authorization: Bearer $TOKEN"
When depth > 0 or depth = -1, the response is a flat list containing only files. Directory entries are traversed but not included in the output.
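The glob parameter uses the familiar *, ?, and [abc] wildcards, which behave like Python’s fnmatch patterns. A local sketch of filtering a flat recursive listing, assuming (per the parameter description above) that the pattern matches the file name rather than the full path:

```python
from fnmatch import fnmatch

# Flat file listing as produced by a recursive walk (depth=-1).
files = [
    "/assets/logo.psd",
    "/assets/banners/spring.psd",
    "/assets/banners/spring.png",
    "/assets/readme.txt",
]

# glob=*.psd matches against each entry's file name.
matches = [p for p in files if fnmatch(p.rsplit("/", 1)[-1], "*.psd")]
print(matches)  # ['/assets/logo.psd', '/assets/banners/spring.psd']
```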
Versioned Reads
Read a file as it was at a specific snapshot or version:
# Read file at a named snapshot
curl "http://localhost:3000/engine/data/report.pdf?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN"
# Read file at a specific version hash
curl "http://localhost:3000/engine/data/report.pdf?version=a1b2c3..." \
-H "Authorization: Bearer $TOKEN"
If both snapshot and version are provided, snapshot takes precedence. Returns 404 if the file did not exist at that version.
Error Responses
| Status | Condition |
|---|---|
| 404 | Path does not exist as file or directory |
| 500 | Internal read failure |
DELETE /engine/
Delete a file at the given path. Creates a DeletionRecord and removes the file from its parent directory listing. Directories cannot be deleted directly – delete all files within first.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Response
Status: 200 OK
{
"deleted": true,
"path": "/data/report.pdf"
}
Side Effects
- Triggers entries_deleted events on the event bus.
- Updates index entries for the deleted file.
Example
curl -X DELETE http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN"
Error Responses
| Status | Condition |
|---|---|
| 404 | File not found |
| 500 | Internal deletion failure |
Symlinks
AeorDB supports soft symlinks — entries that point to another path. Symlinks are transparent by default: reading a symlink path returns the target’s content.
POST /engine-symlink/
Create or update a symlink.
Request Body:
{
"target": "/assets/logo.psd"
}
Response: 201 Created
{
"path": "/latest-logo",
"target": "/assets/logo.psd",
"entry_type": 8,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
The target path does not need to exist at creation time (dangling symlinks are allowed).
Reading Symlinks
By default, GET /engine/{path} follows symlinks transparently:
# Returns the content of /assets/logo.psd
curl http://localhost:3000/engine/latest-logo \
-H "Authorization: Bearer $TOKEN"
To inspect the symlink itself without following it, use ?nofollow=true:
curl "http://localhost:3000/engine/latest-logo?nofollow=true" \
-H "Authorization: Bearer $TOKEN"
Returns the symlink metadata as JSON instead of the target’s content.
Symlink Resolution
Symlinks can point to other symlinks — chains are followed recursively. AeorDB detects cycles and enforces a maximum resolution depth of 32 hops.
| Scenario | Result |
|---|---|
| Symlink → file | Returns file content |
| Symlink → directory | Returns directory listing |
| Symlink → symlink → file | Follows chain, returns file content |
| Symlink → nonexistent | 404 (dangling symlink) |
| Symlink cycle (A → B → A) | 400 with cycle detection message |
| Chain exceeds 32 hops | 400 with depth exceeded message |
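The resolution rules in the table can be sketched as a small loop. The entries map and the resolve function are illustrative, not AeorDB internals; only the cycle detection, the 32-hop limit, and the dangling-link behavior come from the description above:

```python
MAX_HOPS = 32

def resolve(entries: dict, path: str):
    """Follow symlink chains with cycle detection and a 32-hop limit.
    entries maps path -> ("symlink", target) or ("file", content)."""
    seen = set()
    for _ in range(MAX_HOPS + 1):
        if path in seen:
            raise ValueError("symlink cycle detected")   # -> 400
        seen.add(path)
        entry = entries.get(path)
        if entry is None:
            raise FileNotFoundError(path)                # dangling -> 404
        kind, value = entry
        if kind != "symlink":
            return value
        path = value
    raise ValueError("symlink depth exceeded")           # -> 400

fs = {
    "/latest": ("symlink", "/report.pdf"),
    "/report.pdf": ("file", b"pdf bytes"),
    "/a": ("symlink", "/b"),
    "/b": ("symlink", "/a"),
}

print(resolve(fs, "/latest"))  # b'pdf bytes'
```

Resolving /a here raises the cycle error because the chain revisits a path it has already seen.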
HEAD on Symlinks
HEAD /engine/{path} returns symlink metadata as headers:
X-Entry-Type: symlink
X-Symlink-Target: /assets/logo.psd
X-Path: /latest-logo
X-Created-At: 1775968398000
X-Updated-At: 1775968398000
Deleting Symlinks
DELETE /engine/{path} on a symlink deletes the symlink itself, not the target:
curl -X DELETE http://localhost:3000/engine/latest-logo \
-H "Authorization: Bearer $TOKEN"
{
"deleted": true,
"path": "latest-logo",
"type": "symlink"
}
Symlinks in Directory Listings
Symlinks appear in directory listings with entry_type: 8 and a target field:
{
"path": "/data/latest",
"name": "latest",
"entry_type": 8,
"hash": "c3d5e6...",
"target": "/data/report.pdf",
"total_size": 0,
"created_at": 1775968500000,
"updated_at": 1775968500000,
"content_type": null
}
Symlink Versioning
Symlinks are versioned like files. Snapshots capture the symlink’s target path at that point in time. Restoring a snapshot restores the link, not the resolved content.
HEAD /engine/
Check whether a path exists and retrieve its metadata as response headers, without downloading the body. Works for both files and directories.
Request
- Headers:
  - Authorization: Bearer <token> (required)
Response
Status: 200 OK (empty body)
Headers:
| Header | Value |
|---|---|
X-Entry-Type | file, directory, or symlink |
X-Path | Canonical path |
X-Total-Size | File size in bytes (files only) |
X-Created-At | Unix timestamp in milliseconds (files only) |
X-Updated-At | Unix timestamp in milliseconds (files only) |
Content-Type | MIME type (files only, if known) |
X-Symlink-Target | Target path (symlinks only) |
Example
curl -I http://localhost:3000/engine/data/report.pdf \
-H "Authorization: Bearer $TOKEN"
HTTP/1.1 200 OK
X-Entry-Type: file
X-Path: /data/report.pdf
X-Total-Size: 245678
X-Created-At: 1775968398000
X-Updated-At: 1775968398000
Content-Type: application/pdf
Error Responses
| Status | Condition |
|---|---|
| 404 | Path does not exist |
| 500 | Internal metadata lookup failure |
Query API
The query engine supports indexed field queries with boolean combinators, pagination, sorting, aggregations, projections, and an explain mode.
Endpoint Summary
| Method | Path | Description | Auth | Status Codes |
|---|---|---|---|---|
| POST | /query | Execute a query | Yes | 200, 400, 404, 500 |
POST /query
Execute a query against indexed fields within a directory path.
Request Body
{
"path": "/users",
"where": {
"field": "age",
"op": "gt",
"value": 21
},
"limit": 20,
"offset": 0,
"order_by": [{"field": "name", "direction": "asc"}],
"after": null,
"before": null,
"include_total": true,
"select": ["@path", "@score", "name"],
"explain": false
}
| Field | Type | Required | Description |
|---|---|---|---|
path | string | Yes | Directory path to query within |
where | object/array | Yes | Query filter (see below) |
limit | integer | No | Max results to return (server default applies if omitted) |
offset | integer | No | Skip this many results |
order_by | array | No | Sort fields with direction |
after | string | No | Cursor for forward pagination |
before | string | No | Cursor for backward pagination |
include_total | boolean | No | Include total_count in response (default: false) |
select | array | No | Project specific fields in results |
aggregate | object | No | Run aggregations instead of returning results |
explain | string/boolean | No | "plan", "analyze", or true for query plan |
Query Operators
Each field query is an object with field, op, and value:
{"field": "age", "op": "gt", "value": 21}
Comparison Operators
| Operator | Description | Value Type | Example |
|---|---|---|---|
eq | Exact match | any | {"field": "status", "op": "eq", "value": "active"} |
gt | Greater than | number/string | {"field": "age", "op": "gt", "value": 21} |
lt | Less than | number/string | {"field": "age", "op": "lt", "value": 65} |
between | Inclusive range | number/string | {"field": "age", "op": "between", "value": 21, "value2": 65} |
in | Match any value in a set | array | {"field": "status", "op": "in", "value": ["active", "pending"]} |
Text Search Operators
These operators require the appropriate index type to be configured.
| Operator | Description | Index Required | Example |
|---|---|---|---|
contains | Substring match | trigram | {"field": "name", "op": "contains", "value": "alice"} |
similar | Fuzzy trigram match with threshold | trigram | {"field": "name", "op": "similar", "value": "alice", "threshold": 0.3} |
phonetic | Sounds-like match | phonetic | {"field": "name", "op": "phonetic", "value": "smith"} |
fuzzy | Configurable fuzzy match | trigram | See below |
match | Multi-strategy combined match | trigram + phonetic | {"field": "name", "op": "match", "value": "alice"} |
Fuzzy Operator Options
The fuzzy operator supports additional parameters:
{
"field": "name",
"op": "fuzzy",
"value": "alice",
"fuzziness": "auto",
"algorithm": "damerau_levenshtein"
}
| Parameter | Values | Default |
|---|---|---|
fuzziness | "auto" or integer (edit distance) | "auto" |
algorithm | "damerau_levenshtein", "jaro_winkler" | "damerau_levenshtein" |
Similar Operator Options
{
"field": "name",
"op": "similar",
"value": "alice",
"threshold": 0.3
}
| Parameter | Type | Default | Description |
|---|---|---|---|
threshold | float | 0.3 | Minimum similarity score (0.0 to 1.0) |
Boolean Combinators
Combine multiple conditions using and, or, and not:
AND
All conditions must match:
{
"where": {
"and": [
{"field": "age", "op": "gt", "value": 21},
{"field": "status", "op": "eq", "value": "active"}
]
}
}
OR
At least one condition must match:
{
"where": {
"or": [
{"field": "status", "op": "eq", "value": "active"},
{"field": "status", "op": "eq", "value": "pending"}
]
}
}
NOT
Invert a condition:
{
"where": {
"not": {"field": "status", "op": "eq", "value": "deleted"}
}
}
Nested Boolean Logic
Combinators can be nested arbitrarily:
{
"where": {
"and": [
{"field": "age", "op": "gt", "value": 21},
{
"or": [
{"field": "role", "op": "eq", "value": "admin"},
{"field": "role", "op": "eq", "value": "moderator"}
]
}
]
}
}
Legacy Array Format
An array at the top level is sugar for AND:
{
"where": [
{"field": "age", "op": "gt", "value": 21},
{"field": "status", "op": "eq", "value": "active"}
]
}
Response Format
Standard Query Response
{
"results": [
{
"path": "/users/alice.json",
"total_size": 256,
"content_type": "application/json",
"created_at": 1775968398000,
"updated_at": 1775968398000,
"score": 1.0,
"matched_by": ["age"]
}
],
"has_more": true,
"total_count": 150,
"next_cursor": "eyJwYXRoIjoiL3VzZXJzL2JvYi5qc29uIn0=",
"prev_cursor": "eyJwYXRoIjoiL3VzZXJzL2Fhcm9uLmpzb24ifQ==",
"meta": {
"reindexing": 0.67,
"reindexing_eta": 1775968398803,
"reindexing_indexed": 670,
"reindexing_total": 1000,
"reindexing_stale_since": 1775968300000
}
}
| Field | Type | Description |
|---|---|---|
results | array | Matching file metadata with scores |
has_more | boolean | Whether more results exist beyond the current page |
total_count | integer | Total matching results (only if include_total: true) |
next_cursor | string | Cursor for the next page (if has_more is true) |
prev_cursor | string | Cursor for the previous page |
default_limit_hit | boolean | Present and true when the server’s default limit was applied |
default_limit | integer | The server’s default limit value (present with default_limit_hit) |
meta | object | Reindex progress metadata (present only during active reindex) |
Result Fields
Each result object contains:
| Field | Type | Description |
|---|---|---|
path | string | Full path to the matched file |
total_size | integer | File size in bytes |
content_type | string | MIME type (nullable) |
created_at | integer | Creation timestamp (ms) |
updated_at | integer | Last update timestamp (ms) |
score | float | Relevance score (1.0 = exact match) |
matched_by | array | List of field names that matched |
Sorting
Sort results by one or more fields:
{
"order_by": [
{"field": "name", "direction": "asc"},
{"field": "created_at", "direction": "desc"}
]
}
| Direction | Description |
|---|---|
asc | Ascending (default) |
desc | Descending |
Pagination
Offset-Based
{
"limit": 20,
"offset": 40
}
Cursor-Based
Use after or before with cursor values from a previous response:
{
"limit": 20,
"after": "eyJwYXRoIjoiL3VzZXJzL2JvYi5qc29uIn0="
}
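Cursors should be treated as opaque: pass next_cursor back as after unchanged. A sketch of the client-side loop, with a stub fetch_page function standing in for POST /query (the stub and its in-memory data are purely illustrative; a real client would make HTTP calls):

```python
# Stub standing in for POST /query with an "after" cursor. It pages
# over a local list but returns the same envelope fields.
DATA = ["/users/user%02d.json" % i for i in range(7)]

def fetch_page(limit, after=None):
    start = int(after) if after else 0
    page = DATA[start:start + limit]
    end = start + len(page)
    return {
        "results": [{"path": p} for p in page],
        "has_more": end < len(DATA),
        "next_cursor": str(end) if end < len(DATA) else None,
    }

# Drive the cursor until has_more is false.
paths, cursor = [], None
while True:
    resp = fetch_page(limit=3, after=cursor)
    paths += [r["path"] for r in resp["results"]]
    if not resp["has_more"]:
        break
    cursor = resp["next_cursor"]

print(len(paths))  # 7
```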
Projection (select)
Return only specific fields in each result. Use @-prefixed names for built-in metadata fields:
{
"select": ["@path", "@score", "name", "email"]
}
| Virtual Field | Maps To |
|---|---|
@path | path |
@score | score |
@size | total_size |
@content_type | content_type |
@created_at | created_at |
@updated_at | updated_at |
@matched_by | matched_by |
Envelope fields (has_more, next_cursor, total_count, meta) are never stripped by projection.
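The virtual-field mapping can be pictured as a simple lookup before projection. This sketch assumes selected @-names appear under their @-prefixed keys in the output, which is an assumption about the response shape, not confirmed behavior:

```python
VIRTUAL = {
    "@path": "path", "@score": "score", "@size": "total_size",
    "@content_type": "content_type", "@created_at": "created_at",
    "@updated_at": "updated_at", "@matched_by": "matched_by",
}

def project(result: dict, select: list) -> dict:
    """Keep only the selected fields; @-names map to built-in metadata,
    plain names come from the indexed document fields."""
    out = {}
    for name in select:
        key = VIRTUAL.get(name, name)
        if key in result:
            out[name] = result[key]
    return out

result = {
    "path": "/users/alice.json", "total_size": 256, "score": 1.0,
    "name": "Alice", "email": "alice@example.com", "matched_by": ["age"],
}

print(project(result, ["@path", "@score", "name"]))
```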
Aggregations
Run aggregate computations instead of returning individual results.
Request
{
"path": "/orders",
"where": {"field": "status", "op": "eq", "value": "complete"},
"aggregate": {
"count": true,
"sum": ["total", "tax"],
"avg": ["total"],
"min": ["total"],
"max": ["total"],
"group_by": ["status"]
}
}
| Field | Type | Description |
|---|---|---|
count | boolean | Include a count of matching records |
sum | array | Fields to sum |
avg | array | Fields to average |
min | array | Fields to find minimum |
max | array | Fields to find maximum |
group_by | array | Fields to group results by |
Response
The response shape depends on whether group_by is used. Aggregation results are returned as a JSON object.
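The semantics of count/sum/group_by can be sketched locally. This is a model of what the request computes, not AeorDB’s output format:

```python
from collections import defaultdict

def aggregate(rows, group_by, sum_fields):
    """Group matching rows by the group_by fields, then compute a count
    and per-field sums within each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in group_by)
        groups[key].append(row)
    out = {}
    for key, members in groups.items():
        out[key] = {"count": len(members)}
        for f in sum_fields:
            out[key]["sum_" + f] = sum(m[f] for m in members)
    return out

orders = [
    {"status": "complete", "total": 40.0},
    {"status": "complete", "total": 60.0},
    {"status": "pending", "total": 15.0},
]

print(aggregate(orders, ["status"], ["total"]))
```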
Explain Mode
Inspect the query execution plan without running the full query. Useful for debugging index usage and performance.
{
"path": "/users",
"where": {"field": "age", "op": "gt", "value": 21},
"explain": "plan"
}
| Value | Description |
|---|---|
true or "plan" | Show the query plan |
"analyze" | Execute the query and include timing information |
Examples
Simple equality query
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/users",
"where": {"field": "status", "op": "eq", "value": "active"},
"limit": 10
}'
Fuzzy name search with pagination
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/users",
"where": {"field": "name", "op": "similar", "value": "alice", "threshold": 0.4},
"limit": 20,
"order_by": [{"field": "name", "direction": "asc"}],
"include_total": true
}'
Complex boolean query
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/products",
"where": {
"and": [
{"field": "price", "op": "between", "value": 10, "value2": 100},
{
"or": [
{"field": "category", "op": "eq", "value": "electronics"},
{"field": "category", "op": "eq", "value": "books"}
]
},
{"not": {"field": "status", "op": "eq", "value": "discontinued"}}
]
},
"order_by": [{"field": "price", "direction": "asc"}],
"limit": 50
}'
Aggregation with grouping
curl -X POST http://localhost:3000/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"path": "/orders",
"where": {"field": "year", "op": "eq", "value": 2026},
"aggregate": {
"count": true,
"sum": ["total"],
"avg": ["total"],
"group_by": ["status"]
}
}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid query structure, missing field/op, unsupported operation, range query on non-range converter |
| 404 | Query path or index not found |
| 500 | Internal query execution failure |
Version API
AeorDB provides Git-like version control through snapshots (named points in time) and forks (divergent branches of the data).
Endpoint Summary
| Method | Path | Description | Auth | Root Required |
|---|---|---|---|---|
| POST | /version/snapshot | Create a snapshot | Yes | No |
| GET | /version/snapshots | List all snapshots | Yes | No |
| POST | /version/restore | Restore a snapshot | Yes | Yes |
| DELETE | /version/snapshot/{name} | Delete a snapshot | Yes | Yes |
| POST | /version/fork | Create a fork | Yes | No |
| GET | /version/forks | List all forks | Yes | No |
| POST | /version/fork/{name}/promote | Promote fork to HEAD | Yes | Yes |
| DELETE | /version/fork/{name} | Abandon a fork | Yes | Yes |
| GET | /engine/{path}?snapshot={name} | Read file at a snapshot | Yes | No |
| GET | /engine/{path}?version={hash} | Read file at a version hash | Yes | No |
| GET | /version/file-history/{path} | File change history across snapshots | Yes | No |
| POST | /version/file-restore/{path} | Restore file from a version | Yes | Yes |
Snapshots
POST /version/snapshot
Create a named snapshot of the current HEAD.
Request Body:
{
"name": "v1.0",
"metadata": {
"description": "First stable release",
"author": "alice"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique snapshot name |
metadata | object | No | Arbitrary key-value metadata (defaults to empty) |
Response: 201 Created
{
"name": "v1.0",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000,
"metadata": {
"description": "First stable release",
"author": "alice"
}
}
Example:
curl -X POST http://localhost:3000/version/snapshot \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "v1.0", "metadata": {"description": "First stable release"}}'
Error Responses:
| Status | Condition |
|---|---|
| 409 | Snapshot with this name already exists |
| 500 | Internal failure |
GET /version/snapshots
List all snapshots.
Response: 200 OK
[
{
"name": "v1.0",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000,
"metadata": {"description": "First stable release"}
},
{
"name": "v2.0",
"root_hash": "f6e5d4c3b2a1...",
"created_at": 1775969000000,
"metadata": {}
}
]
Example:
curl http://localhost:3000/version/snapshots \
-H "Authorization: Bearer $TOKEN"
POST /version/restore
Restore a named snapshot, making it the current HEAD. Requires root.
Request Body:
{
"name": "v1.0"
}
Response: 200 OK
{
"restored": true,
"name": "v1.0"
}
Example:
curl -X POST http://localhost:3000/version/restore \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Snapshot not found |
| 500 | Internal failure |
DELETE /version/snapshot/{name}
Delete a named snapshot. Requires root.
Response: 200 OK
{
"deleted": true,
"name": "v1.0"
}
Example:
curl -X DELETE http://localhost:3000/version/snapshot/v1.0 \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Snapshot not found |
| 500 | Internal failure |
Forks
Forks create a divergent branch of the data, optionally based on a named snapshot.
POST /version/fork
Create a new fork.
Request Body:
{
"name": "experiment",
"base": "v1.0"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique fork name |
| base | string | No | Snapshot name to fork from (defaults to current HEAD) |
Response: 201 Created
{
"name": "experiment",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000
}
Example:
curl -X POST http://localhost:3000/version/fork \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "experiment", "base": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 409 | Fork with this name already exists |
| 500 | Internal failure |
GET /version/forks
List all active forks.
Response: 200 OK
[
{
"name": "experiment",
"root_hash": "a1b2c3d4e5f6...",
"created_at": 1775968398000
}
]
Example:
curl http://localhost:3000/version/forks \
-H "Authorization: Bearer $TOKEN"
POST /version/fork/{name}/promote
Promote a fork’s state to HEAD, making it the active version. Requires root.
Response: 200 OK
{
"promoted": true,
"name": "experiment"
}
Example:
curl -X POST http://localhost:3000/version/fork/experiment/promote \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Fork not found |
| 500 | Internal failure |
DELETE /version/fork/{name}
Abandon a fork (soft delete). Requires root.
Response: 200 OK
{
"abandoned": true,
"name": "experiment"
}
Example:
curl -X DELETE http://localhost:3000/version/fork/experiment \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 404 | Fork not found |
| 500 | Internal failure |
File-Level Version Access
Read, restore, and view history for individual files at specific historical versions.
Reading Files at a Version
Use query parameters on the standard file read endpoint:
# Read a file as it was at a named snapshot
curl "http://localhost:3000/engine/assets/logo.psd?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN"
# Read a file at a specific version hash
curl "http://localhost:3000/engine/assets/logo.psd?version=a1b2c3d4..." \
-H "Authorization: Bearer $TOKEN"
Returns the file content exactly as it was at that version, with the same headers as a normal file read. If both snapshot and version are provided, snapshot takes precedence.
Error Responses:
| Status | Condition |
|---|---|
| 404 | File did not exist at that version |
| 404 | Snapshot or version not found |
| 400 | Invalid version hash (not valid hex) |
GET /version/file-history/{path}
View the change history of a single file across all snapshots. Returns entries ordered newest-first, each with a change_type indicating what happened to the file at that snapshot.
Response: 200 OK
{
"path": "assets/logo.psd",
"history": [
{
"snapshot": "v2.0",
"timestamp": 1775969000000,
"change_type": "modified",
"size": 512000,
"content_type": "image/vnd.adobe.photoshop",
"content_hash": "f6e5d4c3..."
},
{
"snapshot": "v1.0",
"timestamp": 1775968398000,
"change_type": "added",
"size": 256000,
"content_type": "image/vnd.adobe.photoshop",
"content_hash": "a1b2c3d4..."
}
]
}
Change types:
| Type | Meaning |
|---|---|
| added | File exists in this snapshot but not the previous one |
| modified | File exists in both but content changed |
| unchanged | File exists in both with identical content |
| deleted | File existed in the previous snapshot but not this one |
If the file has never existed in any snapshot, returns 200 with an empty history array.
Example:
curl http://localhost:3000/version/file-history/assets/logo.psd \
-H "Authorization: Bearer $TOKEN"
POST /version/file-restore/{path}
Restore a single file from a historical version to the current HEAD. Requires root.
Before restoring, an automatic safety snapshot is created (named pre-restore-{timestamp}) to preserve the current state. If the safety snapshot cannot be created, the restore is rejected.
Request Body:
{
"snapshot": "v1.0"
}
Or using a version hash:
{
"version": "a1b2c3d4..."
}
| Field | Type | Required | Description |
|---|---|---|---|
| snapshot | string | One required | Snapshot name to restore from |
| version | string | One required | Version hash (hex) to restore from |
If both are provided, snapshot takes precedence.
Response: 200 OK
{
"restored": true,
"path": "assets/logo.psd",
"from_snapshot": "v1.0",
"auto_snapshot": "pre-restore-2026-04-14T05-01-01Z",
"size": 256000
}
The auto_snapshot field contains the name of the safety snapshot created before the restore. You can use this snapshot to recover the pre-restore state if needed.
Example:
curl -X POST http://localhost:3000/version/file-restore/assets/logo.psd \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"snapshot": "v1.0"}'
Error Responses:
| Status | Condition |
|---|---|
| 400 | Neither snapshot nor version provided |
| 403 | Non-root user (requires both write and snapshot permissions) |
| 404 | File not found at the specified version |
| 404 | Snapshot or version not found |
| 500 | Failed to create safety snapshot or write restored file |
Pre-Hashed Upload Protocol
AeorDB provides a 4-phase upload protocol for efficient, deduplicated file transfers. Clients split files into chunks, hash them locally, and only upload chunks the server does not already have.
Protocol Overview
- Negotiate – GET /upload/config to learn the hash algorithm and chunk size.
- Dedup check – POST /upload/check with a list of chunk hashes to find which are already stored.
- Upload – PUT /upload/chunks/{hash} for each needed chunk.
- Commit – POST /upload/commit to atomically assemble chunks into files.
Endpoint Summary
| Method | Path | Description | Auth | Body Limit |
|---|---|---|---|---|
| GET | /upload/config | Negotiate hash algorithm and chunk size | No | – |
| POST | /upload/check | Check which chunks the server already has | Yes | 1 MB |
| PUT | /upload/chunks/{hash} | Upload a single chunk | Yes | 10 GB |
| POST | /upload/commit | Atomic multi-file commit from chunks | Yes | 1 MB |
Phase 1: GET /upload/config
Retrieve the server’s hash algorithm, chunk size, and hash prefix. This endpoint is public (no authentication required).
Response
Status: 200 OK
{
"hash_algorithm": "blake3",
"chunk_size": 262144,
"chunk_hash_prefix": "chunk:"
}
| Field | Type | Description |
|---|---|---|
| hash_algorithm | string | Hash algorithm used by the server (e.g., "blake3") |
| chunk_size | integer | Maximum chunk size in bytes (262,144 = 256 KB) |
| chunk_hash_prefix | string | Prefix prepended to chunk data before hashing |
How to Compute Chunk Hashes
The server computes chunk hashes as:
hash = blake3("chunk:" + chunk_bytes)
Clients must use the same formula. The prefix ("chunk:") is prepended to the raw bytes before hashing, not to the hex-encoded hash.
Example
curl http://localhost:3000/upload/config
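As a client-side sketch, the hash formula above could be implemented as follows. Note the assumption: Python's standard library has no BLAKE3, so this falls back to `hashlib.blake2b` as a stand-in when the third-party `blake3` package is not installed; against a real server, BLAKE3 is required.

```python
import hashlib

CHUNK_HASH_PREFIX = b"chunk:"  # value reported by GET /upload/config

def chunk_hash(chunk_bytes: bytes) -> str:
    """Hash a chunk the way the server does: prepend the prefix to the
    raw bytes, then hash. Uses BLAKE3 when available; otherwise falls
    back to blake2b purely so the sketch runs on the stdlib alone."""
    try:
        from blake3 import blake3  # third-party: pip install blake3
        return blake3(CHUNK_HASH_PREFIX + chunk_bytes).hexdigest()
    except ImportError:
        return hashlib.blake2b(CHUNK_HASH_PREFIX + chunk_bytes,
                               digest_size=32).hexdigest()
```

Either branch yields a 64-character hex digest (32 bytes), matching the hash strings shown in the examples below.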
Phase 2: POST /upload/check
Send a list of chunk hashes to determine which ones the server already has (deduplication). Only upload the ones in the needed list.
Request Body
{
"hashes": [
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2",
"f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| hashes | array of strings | Yes | Hex-encoded chunk hashes |
Response
Status: 200 OK
{
"have": [
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
],
"needed": [
"f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
]
}
| Field | Type | Description |
|---|---|---|
| have | array | Hashes the server already has – skip these |
| needed | array | Hashes the server needs – upload these |
Example
curl -X POST http://localhost:3000/upload/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"hashes": ["a1b2c3...", "f6e5d4..."]}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid hex hash in the list |
Phase 3: PUT /upload/chunks/{hash}
Upload a single chunk. The server verifies the hash matches the content before storing.
Request
- URL parameter: {hash} – hex-encoded blake3 hash of "chunk:" + chunk_bytes
- Headers: Authorization: Bearer <token> (required)
- Body: raw chunk bytes
Hash Verification
The server recomputes the hash from the uploaded bytes:
computed = blake3("chunk:" + body_bytes)
If the computed hash does not match the URL parameter, the upload is rejected.
Response
Status: 201 Created (new chunk stored)
{
"status": "created",
"hash": "f6e5d4c3b2a1..."
}
Status: 200 OK (chunk already exists – dedup)
{
"status": "exists",
"hash": "f6e5d4c3b2a1..."
}
Compression
The server automatically applies Zstd compression to chunks when beneficial (based on size heuristics). This is transparent to the client.
Example
curl -X PUT http://localhost:3000/upload/chunks/f6e5d4c3b2a1... \
-H "Authorization: Bearer $TOKEN" \
--data-binary @chunk_001.bin
Error Responses
| Status | Condition |
|---|---|
| 400 | Chunk exceeds maximum size (262,144 bytes) |
| 400 | Invalid hex hash in URL |
| 400 | Hash mismatch between URL and computed hash |
| 500 | Storage failure |
Phase 4: POST /upload/commit
Atomically commit multiple files from previously uploaded chunks. Each file specifies its path, content type, and the ordered list of chunk hashes that compose it.
Request Body
{
"files": [
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": [
"a1b2c3d4e5f6...",
"f6e5d4c3b2a1..."
]
},
{
"path": "/data/image.png",
"content_type": "image/png",
"chunk_hashes": [
"1234abcd5678..."
]
}
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| files | array | Yes | List of files to commit |
| files[].path | string | Yes | Destination path for the file |
| files[].content_type | string | No | MIME type |
| files[].chunk_hashes | array | Yes | Ordered list of hex-encoded chunk hashes |
Response
Status: 200 OK
The response contains a summary of the commit operation.
Example
curl -X POST http://localhost:3000/upload/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"files": [
{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": ["a1b2c3d4...", "f6e5d4c3..."]
}
]
}'
Error Responses
| Status | Condition |
|---|---|
| 400 | Invalid input (missing path, bad hash, etc.) |
| 500 | Commit task failure or panic |
Full Upload Workflow
Here is a complete workflow for uploading a file:
# 1. Get server configuration
CONFIG=$(curl -s http://localhost:3000/upload/config)
CHUNK_SIZE=$(echo $CONFIG | jq -r '.chunk_size')
# 2. Split file into chunks and hash them
# (pseudo-code: split report.pdf into 256KB chunks, hash each with blake3)
# chunk_hashes=["hash1", "hash2", ...]
# 3. Check which chunks are needed
DEDUP=$(curl -s -X POST http://localhost:3000/upload/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"hashes": ["hash1", "hash2"]}')
# 4. Upload only the needed chunks
for hash in $(echo $DEDUP | jq -r '.needed[]'); do
curl -X PUT "http://localhost:3000/upload/chunks/$hash" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @"chunk_$hash.bin"
done
# 5. Commit the file
curl -X POST http://localhost:3000/upload/commit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"files": [{
"path": "/data/report.pdf",
"content_type": "application/pdf",
"chunk_hashes": ["hash1", "hash2"]
}]
}'
Admin Operations
Administrative endpoints for garbage collection, background tasks, cron scheduling, metrics, health checks, backup/restore, and user/group management. Most admin endpoints require root access.
Endpoint Summary
Garbage Collection
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/gc | Run synchronous garbage collection | Yes |
Background Tasks
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/tasks/reindex | Trigger a reindex task | Yes |
| POST | /admin/tasks/gc | Trigger a background GC task | Yes |
| GET | /admin/tasks | List all tasks with progress | Yes |
| GET | /admin/tasks/{id} | Get a single task | Yes |
| DELETE | /admin/tasks/{id} | Cancel a task | Yes |
Cron Scheduling
| Method | Path | Description | Root Required |
|---|---|---|---|
| GET | /admin/cron | List cron schedules | Yes |
| POST | /admin/cron | Create a cron schedule | Yes |
| PATCH | /admin/cron/{id} | Update a cron schedule | Yes |
| DELETE | /admin/cron/{id} | Delete a cron schedule | Yes |
Backup & Restore
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/export | Export database as .aeordb | Yes |
| POST | /admin/diff | Create patch between versions | Yes |
| POST | /admin/import | Import a backup or patch | Yes |
| POST | /admin/promote | Promote a version hash to HEAD | Yes |
Monitoring
| Method | Path | Description | Root Required |
|---|---|---|---|
| GET | /admin/metrics | Prometheus metrics | Yes (auth required) |
| GET | /admin/health | Health check | No (public) |
API Key Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/api-keys | Create an API key | Yes |
| GET | /admin/api-keys | List all API keys | Yes |
| DELETE | /admin/api-keys/{key_id} | Revoke an API key | Yes |
User Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/users | Create a user | Yes |
| GET | /admin/users | List all users | Yes |
| GET | /admin/users/{user_id} | Get a user | Yes |
| PATCH | /admin/users/{user_id} | Update a user | Yes |
| DELETE | /admin/users/{user_id} | Deactivate a user (soft delete) | Yes |
Group Management
| Method | Path | Description | Root Required |
|---|---|---|---|
| POST | /admin/groups | Create a group | Yes |
| GET | /admin/groups | List all groups | Yes |
| GET | /admin/groups/{name} | Get a group | Yes |
| PATCH | /admin/groups/{name} | Update a group | Yes |
| DELETE | /admin/groups/{name} | Delete a group | Yes |
Garbage Collection
POST /admin/gc
Run garbage collection synchronously. Identifies and removes orphaned entries not reachable from the current HEAD.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dry_run | boolean | false | If true, report what would be collected without deleting |
Response: 200 OK
The response contains GC statistics (entries scanned, reclaimed bytes, etc.).
Example:
# Dry run
curl -X POST "http://localhost:3000/admin/gc?dry_run=true" \
-H "Authorization: Bearer $TOKEN"
# Actual GC
curl -X POST http://localhost:3000/admin/gc \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 403 | Non-root user |
| 500 | GC failure |
Background Tasks
POST /admin/tasks/reindex
Enqueue a reindex task for a directory path. Re-scans all files and rebuilds index entries.
Request Body:
{
"path": "/data/"
}
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "pending"
}
Example:
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
POST /admin/tasks/gc
Enqueue a background GC task (non-blocking).
Request Body:
{
"dry_run": false
}
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "gc",
"status": "pending"
}
Example:
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"dry_run": false}'
GET /admin/tasks
List all tasks with their current progress.
Response: 200 OK
[
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"progress": 0.45,
"eta_ms": 1775968500000
}
]
Each task includes progress (0.0-1.0) and eta_ms (estimated completion timestamp) if available.
Example:
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $TOKEN"
GET /admin/tasks/{id}
Get a single task by ID.
Response: 200 OK
{
"id": "task-uuid-here",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"progress": 0.45,
"eta_ms": 1775968500000
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Task not found |
DELETE /admin/tasks/{id}
Cancel a task.
Response: 200 OK
{
"id": "task-uuid-here",
"status": "cancelled"
}
Example:
curl -X DELETE http://localhost:3000/admin/tasks/task-uuid-here \
-H "Authorization: Bearer $TOKEN"
Cron Scheduling
GET /admin/cron
List all cron schedules.
Response: 200 OK
[
{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}
]
POST /admin/cron
Create a new cron schedule.
Request Body:
{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique schedule identifier |
| schedule | string | Yes | Cron expression |
| task_type | string | Yes | Task type to enqueue ("gc", "reindex") |
| args | object | Yes | Arguments passed to the task |
| enabled | boolean | Yes | Whether the schedule is active |
Response: 201 Created
Example:
curl -X POST http://localhost:3000/admin/cron \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "nightly-gc",
"schedule": "0 2 * * *",
"task_type": "gc",
"args": {"dry_run": false},
"enabled": true
}'
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid cron expression |
| 409 | Schedule with this ID already exists |
PATCH /admin/cron/{id}
Update a cron schedule. All fields are optional – only provided fields are changed.
Request Body:
{
"enabled": false,
"schedule": "0 3 * * *"
}
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Enable or disable the schedule |
| schedule | string | New cron expression |
| task_type | string | New task type |
| args | object | New task arguments |
Response: 200 OK
Returns the updated schedule.
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid cron expression |
| 404 | Schedule not found |
DELETE /admin/cron/{id}
Delete a cron schedule.
Response: 200 OK
{
"id": "nightly-gc",
"deleted": true
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Schedule not found |
Backup & Restore
POST /admin/export
Export the database (or a specific version) as an .aeordb archive file.
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| snapshot | string | Export a named snapshot (default: HEAD) |
| hash | string | Export a specific version by hex hash |
Response: 200 OK
- Content-Type: application/octet-stream
- Content-Disposition: attachment; filename="export-{hash_prefix}.aeordb"
- Body: binary archive data
Example:
# Export HEAD
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $TOKEN" \
-o backup.aeordb
# Export a specific snapshot
curl -X POST "http://localhost:3000/admin/export?snapshot=v1.0" \
-H "Authorization: Bearer $TOKEN" \
-o backup-v1.aeordb
# Export by hash
curl -X POST "http://localhost:3000/admin/export?hash=a1b2c3d4..." \
-H "Authorization: Bearer $TOKEN" \
-o backup.aeordb
POST /admin/diff
Create a patch file representing the difference between two versions.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Source snapshot name or hex hash |
| to | string | No | Target snapshot name or hex hash (default: HEAD) |
Response: 200 OK
- Content-Type: application/octet-stream
- Content-Disposition: attachment; filename="patch-{hash_prefix}.aeordb"
- Body: binary patch data
Example:
curl -X POST "http://localhost:3000/admin/diff?from=v1.0&to=v2.0" \
-H "Authorization: Bearer $TOKEN" \
-o patch-v1-v2.aeordb
POST /admin/import
Import a backup or patch file. Body limit: 10 MB.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| force | boolean | false | Force import even if conflicts exist |
| promote | boolean | false | Promote the imported version to HEAD |
Request:
- Headers: Authorization: Bearer <token> (required)
- Body: raw .aeordb file bytes
Response: 200 OK
{
"status": "success",
"backup_type": "export",
"entries_imported": 1500,
"chunks_imported": 3200,
"files_imported": 450,
"directories_imported": 30,
"deletions_applied": 5,
"version_hash": "a1b2c3d4e5f6...",
"head_promoted": true
}
Example:
curl -X POST "http://localhost:3000/admin/import?promote=true" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @backup.aeordb
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid or corrupt backup file |
| 403 | Non-root user |
POST /admin/promote
Promote an arbitrary version hash to HEAD.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| hash | string | Yes | Hex-encoded version hash to promote |
Response: 200 OK
{
"status": "success",
"head": "a1b2c3d4e5f6..."
}
Example:
curl -X POST "http://localhost:3000/admin/promote?hash=a1b2c3d4e5f6..." \
-H "Authorization: Bearer $TOKEN"
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid hash format |
| 404 | Version hash not found in storage |
Monitoring
GET /admin/health
Public health check endpoint. No authentication required.
Response: 200 OK
{
"status": "ok"
}
Example:
curl http://localhost:3000/admin/health
GET /admin/metrics
Prometheus-format metrics endpoint. Requires authentication.
Response: 200 OK
- Content-Type: text/plain; version=0.0.4; charset=utf-8
- Body: Prometheus text exposition format
Example:
curl http://localhost:3000/admin/metrics \
-H "Authorization: Bearer $TOKEN"
API Key Management
POST /admin/api-keys
Create a new API key. The plaintext key is returned only once – store it securely. Requires root.
Request Body:
{
"user_id": "550e8400-e29b-41d4-a716-446655440000"
}
| Field | Type | Required | Description |
|---|---|---|---|
| user_id | string (UUID) | No | User to create the key for (defaults to the calling user) |
Response: 201 Created
{
"key_id": "660e8400-e29b-41d4-a716-446655440001",
"api_key": "aeor_660e8400_a1b2c3d4e5f6...",
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"created_at": "2026-04-13T10:00:00Z"
}
Example:
curl -X POST http://localhost:3000/admin/api-keys \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"user_id": "550e8400-e29b-41d4-a716-446655440000"}'
GET /admin/api-keys
List all API keys (metadata only – no secrets). Requires root.
Response: 200 OK
[
{
"key_id": "660e8400-e29b-41d4-a716-446655440001",
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"created_at": "2026-04-13T10:00:00Z",
"is_revoked": false
}
]
DELETE /admin/api-keys/{key_id}
Revoke an API key. Revoked keys cannot be used to obtain tokens. Requires root.
Response: 200 OK
{
"revoked": true,
"key_id": "660e8400-e29b-41d4-a716-446655440001"
}
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid key ID format |
| 404 | API key not found |
User Management
POST /admin/users
Create a new user. Requires root.
Request Body:
{
"username": "alice",
"email": "[email protected]"
}
| Field | Type | Required | Description |
|---|---|---|---|
| username | string | Yes | Unique username |
| email | string | No | User email address |
Response: 201 Created
{
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"username": "alice",
"email": "[email protected]",
"is_active": true,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
GET /admin/users
List all users. Requires root.
Response: 200 OK
[
{
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"username": "alice",
"email": "[email protected]",
"is_active": true,
"created_at": 1775968398000,
"updated_at": 1775968398000
}
]
GET /admin/users/{user_id}
Get a single user. Requires root.
Response: 200 OK (same shape as the user object above)
Error Responses:
| Status | Condition |
|---|---|
| 400 | Invalid UUID |
| 404 | User not found |
PATCH /admin/users/{user_id}
Update a user. All fields are optional. Requires root.
Request Body:
{
"username": "alice_updated",
"email": "[email protected]",
"is_active": true
}
Response: 200 OK (returns the updated user)
DELETE /admin/users/{user_id}
Deactivate a user (soft delete – sets is_active to false). Requires root.
Response: 200 OK
{
"deactivated": true,
"user_id": "550e8400-e29b-41d4-a716-446655440000"
}
Group Management
Groups define path-level access control rules using query-based membership.
POST /admin/groups
Create a new group. Requires root.
Request Body:
{
"name": "editors",
"default_allow": "/content/*",
"default_deny": "/admin/*",
"query_field": "role",
"query_operator": "eq",
"query_value": "editor"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique group name |
| default_allow | string | Yes | Path pattern for allowed access |
| default_deny | string | Yes | Path pattern for denied access |
| query_field | string | Yes | User field to query for membership (must be a safe field) |
| query_operator | string | Yes | Comparison operator |
| query_value | string | Yes | Value to match against |
Response: 201 Created
{
"name": "editors",
"default_allow": "/content/*",
"default_deny": "/admin/*",
"query_field": "role",
"query_operator": "eq",
"query_value": "editor",
"created_at": 1775968398000,
"updated_at": 1775968398000
}
GET /admin/groups
List all groups. Requires root.
Response: 200 OK (array of group objects)
GET /admin/groups/{name}
Get a single group. Requires root.
Error Responses:
| Status | Condition |
|---|---|
| 404 | Group not found |
PATCH /admin/groups/{name}
Update a group. All fields are optional. Requires root.
Request Body:
{
"default_allow": "/content/*",
"query_value": "senior-editor"
}
The query_field value is validated against a whitelist of safe fields. Attempting to use an unsafe field returns a 400 error.
Error Responses:
| Status | Condition |
|---|---|
| 400 | Unsafe query field |
| 404 | Group not found |
DELETE /admin/groups/{name}
Delete a group. Requires root.
Response: 200 OK
{
"deleted": true,
"name": "editors"
}
Error Responses:
| Status | Condition |
|---|---|
| 404 | Group not found |
Plugin Endpoints
AeorDB supports deploying WebAssembly (WASM) plugins that extend the database with custom logic. Plugins are scoped to a {database}/{schema}/{table} namespace.
Endpoint Summary
| Method | Path | Description | Auth |
|---|---|---|---|
| PUT | /{db}/{schema}/{table}/_deploy | Deploy a WASM plugin | Yes |
| POST | /{db}/{schema}/{table}/{function}/_invoke | Invoke a plugin function | Yes |
| GET | /{db}/_plugins | List all deployed plugins | Yes |
| DELETE | /{db}/{schema}/{table}/{function}/_remove | Remove a deployed plugin | Yes |
PUT /{db}/{schema}/{table}/_deploy
Deploy a WASM plugin to the given namespace. If a plugin already exists at this path, it is replaced.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
- Query parameters:
  - name (optional) – plugin name (defaults to the {table} segment)
  - plugin_type (optional) – plugin type string (defaults to "wasm")
- Headers: Authorization: Bearer <token> (required)
- Body: raw WASM binary bytes
Response
Status: 200 OK
Returns the plugin metadata:
{
"name": "my-plugin",
"path": "mydb/public/users",
"plugin_type": "wasm",
"deployed_at": "2026-04-13T10:00:00Z"
}
Example
curl -X PUT "http://localhost:3000/mydb/public/users/_deploy?name=my-plugin" \
-H "Authorization: Bearer $TOKEN" \
--data-binary @plugin.wasm
Error Responses
| Status | Condition |
|---|---|
| 400 | Empty body or invalid plugin type |
| 400 | Invalid WASM module |
| 500 | Deployment failure |
POST /{db}/{schema}/{table}/{function}/_invoke
Invoke a deployed plugin’s function. The request body is wrapped in a PluginRequest envelope with metadata, then passed to the WASM runtime.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
  - {function} – function name to invoke
- Headers:
  - Authorization: Bearer <token> (required)
  - Content-Type – depends on what the plugin expects
- Body: raw request payload (passed to the plugin as arguments)
Plugin Request Envelope
The server wraps the raw body into:
{
"arguments": "<raw body bytes>",
"metadata": {
"function_name": "compute",
"path": "/mydb/public/users/compute",
"plugin_path": "mydb/public/users"
}
}
Plugin Response
Plugins return a PluginResponse envelope:
{
"status_code": 200,
"content_type": "application/json",
"headers": {
"x-custom-header": "value"
},
"body": "<response bytes>"
}
The server maps these fields to the HTTP response. For security, only safe headers are forwarded:
- Headers starting with x-
- cache-control, etag, last-modified, content-disposition, content-language, content-encoding, vary
If the plugin returns data that is not a valid PluginResponse, it is sent as raw application/octet-stream bytes (backward compatibility).
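The forwarding rule can be expressed as a short filter. This is a sketch of the rule as described above, not AeorDB's source; the function name `forwarded_headers` is hypothetical.

```python
# Fixed allowlist of forwardable headers, per the rule above.
SAFE_HEADERS = {"cache-control", "etag", "last-modified", "content-disposition",
                "content-language", "content-encoding", "vary"}

def forwarded_headers(plugin_headers: dict) -> dict:
    """Keep only the headers the server would forward from a PluginResponse:
    anything whose name starts with `x-` (case-insensitive), plus the
    allowlisted standard headers. Everything else is dropped."""
    return {k: v for k, v in plugin_headers.items()
            if k.lower().startswith("x-") or k.lower() in SAFE_HEADERS}
```

So a plugin setting `x-custom-header` and `ETag` sees both forwarded, while something like `Set-Cookie` is silently dropped.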
WASM Host Functions
Plugins have access to the following host functions for interacting with the database:
- CRUD: read, write, and delete files
- Query: execute queries and aggregations against the engine
- Context: access request metadata
Example
curl -X POST http://localhost:3000/mydb/public/users/compute/_invoke \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"input": "hello"}'
Error Responses
| Status | Condition |
|---|---|
| 404 | Plugin not found at the given path |
| 500 | Plugin invocation failure (runtime error, panic, etc.) |
GET /{db}/_plugins
List all deployed plugins.
Request
- URL parameters: {db} – database name
- Headers: Authorization: Bearer <token> (required)
Response
Status: 200 OK
[
{
"name": "my-plugin",
"path": "mydb/public/users",
"plugin_type": "wasm"
}
]
Example
curl http://localhost:3000/mydb/_plugins \
-H "Authorization: Bearer $TOKEN"
DELETE /{db}/{schema}/{table}/{function}/_remove
Remove a deployed plugin.
Request
- URL parameters:
  - {db} – database name
  - {schema} – schema name
  - {table} – table name
  - {function} – function name
- Headers: Authorization: Bearer <token> (required)
Response
Status: 200 OK
{
"removed": true,
"path": "mydb/public/users"
}
Example
curl -X DELETE http://localhost:3000/mydb/public/users/compute/_remove \
-H "Authorization: Bearer $TOKEN"
Error Responses
| Status | Condition |
|---|---|
| 404 | Plugin not found |
| 500 | Removal failure |
Events & Webhooks
AeorDB publishes real-time events via Server-Sent Events (SSE). Clients can subscribe to a filtered stream of engine events for live updates.
Endpoint Summary
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /events/stream | SSE event stream | Yes |
GET /events/stream
Open a persistent Server-Sent Events connection. The server pushes events as they occur and sends periodic keepalive pings.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
events | string | Comma-separated list of event types to receive (default: all) |
path_prefix | string | Only receive events whose payload contains a path starting with this prefix |
Request
curl -N http://localhost:3000/events/stream \
-H "Authorization: Bearer $TOKEN"
Filtered Stream
Subscribe to only specific event types:
curl -N "http://localhost:3000/events/stream?events=entries_created,entries_deleted" \
-H "Authorization: Bearer $TOKEN"
Filter by path prefix:
curl -N "http://localhost:3000/events/stream?path_prefix=/data/users" \
-H "Authorization: Bearer $TOKEN"
Combine both:
curl -N "http://localhost:3000/events/stream?events=entries_created&path_prefix=/data/" \
-H "Authorization: Bearer $TOKEN"
Response Format
The response is an SSE stream. Each event has the standard SSE fields:
id: evt-uuid-here
event: entries_created
data: {"event_id":"evt-uuid-here","event_type":"entries_created","timestamp":1775968398000,"payload":{"entries":[{"path":"/data/report.pdf"}]}}
Event Envelope
Each event is a JSON object with:
| Field | Type | Description |
|---|---|---|
| event_id | string | Unique event identifier |
| event_type | string | Type of event (see below) |
| timestamp | integer | Unix timestamp (milliseconds) |
| payload | object | Event-specific data |
Event Types
| Event Type | Description | Payload |
|---|---|---|
| entries_created | Files were created or updated | {"entries": [{"path": "..."}]} |
| entries_deleted | Files were deleted | {"entries": [{"path": "..."}]} |
| versions_created | A new version (snapshot/fork) was created | Version metadata |
| permissions_changed | Permissions were updated for a path | {"path": "..."} |
| indexes_changed | Index configuration was updated | {"path": "..."} |
Keepalive
The server sends a keepalive ping every 30 seconds to prevent connection timeouts:
: ping
Path Prefix Matching
The path prefix filter checks two locations in the event payload:
- Batch events: `payload.entries[].path` – matches if any entry's path starts with the prefix
- Single-path events: `payload.path` – matches if the path starts with the prefix
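The filter semantics described in this section can be sketched as follows. This is a minimal illustration, not the engine's code; the function name and signature are hypothetical:

```rust
// Illustrative sketch of the SSE path_prefix filter semantics.
// `entry_paths` models payload.entries[].path; `single_path` models payload.path.
fn matches_prefix(entry_paths: &[&str], single_path: Option<&str>, prefix: &str) -> bool {
    // Batch events: match if any entry's path starts with the prefix.
    if entry_paths.iter().any(|p| p.starts_with(prefix)) {
        return true;
    }
    // Single-path events: match if the payload's path starts with the prefix.
    single_path.map_or(false, |p| p.starts_with(prefix))
}

fn main() {
    assert!(matches_prefix(&["/data/users/alice.json"], None, "/data/users"));
    assert!(!matches_prefix(&["/docs/report.pdf"], None, "/data/users"));
    assert!(matches_prefix(&[], Some("/data/users/config"), "/data/users"));
    println!("prefix filter semantics hold");
}
```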
Connection Behavior
- The connection stays open indefinitely until the client disconnects.
- If the client falls behind (lagged), missed events are silently dropped.
- Reconnecting clients should use the last received `id` for gap detection (standard SSE `Last-Event-ID` header).
JavaScript Example
Note: the browser-native EventSource API does not support custom request headers, so the Authorization header cannot be attached through it. Use a fetch-based reader instead (a production client should also buffer partial lines across chunks):
const response = await fetch(
  'http://localhost:3000/events/stream?events=entries_created',
  { headers: { 'Authorization': 'Bearer ' + token } }
);
const reader = response.body
  .pipeThrough(new TextDecoderStream())
  .getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  for (const line of value.split('\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice('data: '.length));
      console.log('Event:', data.event_type, data.payload);
    }
  }
}
Webhook Configuration
Webhooks are configured in-database by storing webhook configuration files at a well-known path within the engine. The event bus broadcasts all events internally, and webhook delivery uses the same event-type filtering as the SSE endpoint.
Authentication
AeorDB supports multiple authentication modes. All protected endpoints require a JWT Bearer token, obtained by exchanging an API key or by verifying a magic link.
Auth Modes
AeorDB can run in one of three authentication modes, selected at startup:
| Mode | CLI Flag | Description |
|---|---|---|
| disabled | --auth disabled | No authentication. All requests are allowed. |
| self-contained | --auth self-contained | Keys and users stored inside the database (default). |
| file | --auth file://<path> | Identity loaded from an external file. Prints a bootstrap API key on first run. |
Disabled Mode
All authentication middleware is bypassed. Every request is treated as authenticated. Useful for local development.
Self-Contained Mode
The default. Users, API keys, and tokens are all stored within the AeorDB engine itself. The JWT signing key is generated automatically.
File Mode
Identity is loaded from an external file at the specified path. On first startup, a bootstrap API key is printed to stdout so you can authenticate and set up additional users.
Endpoint Summary
| Method | Path | Description | Auth Required |
|---|---|---|---|
| POST | /auth/token | Exchange API key for JWT | No |
| POST | /auth/magic-link | Request a magic link | No |
| GET | /auth/magic-link/verify | Verify a magic link code | No |
| POST | /auth/refresh | Refresh an expired JWT | No |
| POST | /admin/api-keys | Create an API key | Yes (root) |
| GET | /admin/api-keys | List API keys | Yes (root) |
| DELETE | /admin/api-keys/{key_id} | Revoke an API key | Yes (root) |
JWT Tokens
All protected endpoints accept a JWT Bearer token in the Authorization header:
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Token Claims
| Claim | Type | Description |
|---|---|---|
| sub | string | User ID (UUID) or email |
| iss | string | Always "aeordb" |
| iat | integer | Issued-at timestamp (Unix seconds) |
| exp | integer | Expiration timestamp (Unix seconds) |
| scope | string | Optional scope restriction |
| permissions | object | Optional fine-grained permissions |
POST /auth/token
Exchange an API key for a JWT and refresh token. This is the primary authentication flow.
Request Body
{
"api_key": "aeor_660e8400_a1b2c3d4e5f6..."
}
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600,
"refresh_token": "rt_a1b2c3d4e5f6..."
}
| Field | Type | Description |
|---|---|---|
| token | string | JWT access token |
| expires_in | integer | Token lifetime in seconds |
| refresh_token | string | Refresh token for obtaining new JWTs |
API Key Format
API keys follow the format aeor_{key_id_prefix}_{secret}. The key_id_prefix is extracted for O(1) lookup – the server does not iterate all stored keys.
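The prefix-based lookup can be sketched like this (a hypothetical helper, not the server's code): split off the `aeor_` marker, then use the segment before the next underscore as the index key.

```rust
// Sketch of O(1) key lookup: parse "aeor_{key_id_prefix}_{secret}" and
// index stored keys by key_id_prefix instead of scanning all of them.
fn split_api_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("aeor_")?;
    let (prefix, secret) = rest.split_once('_')?;
    if prefix.is_empty() || secret.is_empty() {
        return None;
    }
    Some((prefix, secret))
}

fn main() {
    assert_eq!(
        split_api_key("aeor_660e8400_a1b2c3d4e5f6"),
        Some(("660e8400", "a1b2c3d4e5f6"))
    );
    assert_eq!(split_api_key("not_a_key"), None);
    println!("key parsing works");
}
```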
Example
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "aeor_660e8400_a1b2c3d4e5f6..."}'
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid, revoked, or malformed API key |
| 500 | Token creation failure |
POST /auth/magic-link
Request a magic link for passwordless authentication. The server always returns 200 OK regardless of whether the email exists, to prevent email enumeration.
In development mode, the magic link URL is logged via tracing (no email is actually sent).
Rate Limiting
This endpoint is rate-limited per email address. Exceeding the limit returns 429 Too Many Requests.
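The per-email limiting could work along these lines. The actual algorithm, window, and limits are server implementation details not specified here; this is only an illustrative fixed-window model:

```rust
use std::collections::HashMap;

// Minimal fixed-window rate limiter keyed by email (illustrative only).
struct RateLimiter {
    window_secs: u64,
    max_requests: u32,
    counters: HashMap<String, (u64, u32)>, // email -> (window start, count)
}

impl RateLimiter {
    fn allow(&mut self, email: &str, now: u64) -> bool {
        let entry = self.counters.entry(email.to_string()).or_insert((now, 0));
        if now - entry.0 >= self.window_secs {
            *entry = (now, 0); // start a new window
        }
        if entry.1 < self.max_requests {
            entry.1 += 1;
            true
        } else {
            false // caller would respond 429 Too Many Requests
        }
    }
}

fn main() {
    let mut rl = RateLimiter { window_secs: 60, max_requests: 3, counters: HashMap::new() };
    assert!(rl.allow("user@example.com", 0));
    assert!(rl.allow("user@example.com", 1));
    assert!(rl.allow("user@example.com", 2));
    assert!(!rl.allow("user@example.com", 3)); // limit hit -> 429
    assert!(rl.allow("user@example.com", 61)); // new window opens
    println!("rate limiter ok");
}
```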
Request Body
{
"email": "[email protected]"
}
Response
Status: 200 OK
{
"message": "If an account exists, a login link has been sent."
}
Example
curl -X POST http://localhost:3000/auth/magic-link \
-H "Content-Type: application/json" \
-d '{"email": "[email protected]"}'
Error Responses
| Status | Condition |
|---|---|
| 429 | Rate limit exceeded |
GET /auth/magic-link/verify
Verify a magic link code and receive a JWT. Each code can only be used once.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| code | string | Yes | The magic link code |
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600
}
Example
curl "http://localhost:3000/auth/magic-link/verify?code=abc123..."
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid code, expired, or already used |
| 500 | Token creation failure |
POST /auth/refresh
Exchange a refresh token for a new JWT and a new refresh token. Implements token rotation – the old refresh token is revoked and cannot be reused.
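The rotation semantics can be modeled as follows. This is an illustrative model of the behavior, not the server's implementation:

```rust
use std::collections::HashSet;

// Model of refresh-token rotation: a successful refresh revokes the old
// token and issues a new one; a revoked or unknown token fails (-> 401).
struct RefreshStore {
    active: HashSet<String>,
}

impl RefreshStore {
    fn refresh(&mut self, old: &str, new_token: &str) -> Result<String, &'static str> {
        if !self.active.remove(old) {
            return Err("invalid, revoked, or expired refresh token");
        }
        self.active.insert(new_token.to_string());
        Ok(new_token.to_string())
    }
}

fn main() {
    let mut store = RefreshStore { active: HashSet::from(["rt_old".to_string()]) };
    assert!(store.refresh("rt_old", "rt_new").is_ok());
    // Reusing the old token fails: rotation revoked it.
    assert!(store.refresh("rt_old", "rt_other").is_err());
    assert!(store.refresh("rt_new", "rt_next").is_ok());
    println!("rotation ok");
}
```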
Request Body
{
"refresh_token": "rt_a1b2c3d4e5f6..."
}
Response
Status: 200 OK
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"expires_in": 3600,
"refresh_token": "rt_new_token_here..."
}
Example
curl -X POST http://localhost:3000/auth/refresh \
-H "Content-Type: application/json" \
-d '{"refresh_token": "rt_a1b2c3d4e5f6..."}'
Error Responses
| Status | Condition |
|---|---|
| 401 | Invalid, revoked, or expired refresh token |
| 500 | Token creation failure |
Root User
The root user has the nil UUID (00000000-0000-0000-0000-000000000000). Only the root user can:
- Create and manage API keys
- Create and manage users
- Create and manage groups
- Restore snapshots and manage forks
- Run garbage collection
- Manage tasks and cron schedules
- Export, import, and promote versions
First-Run Bootstrap
When using file:// auth mode, a bootstrap API key is printed to stdout on first run. Use this key to authenticate as root and create additional users and keys:
# Start the server (prints bootstrap key)
aeordb --auth file:///path/to/identity.json
# Exchange the bootstrap key for a token
curl -X POST http://localhost:3000/auth/token \
-H "Content-Type: application/json" \
-d '{"api_key": "<bootstrap-key>"}'
Authentication Flow Summary
API Key                          Magic Link
   |                                 |
POST /auth/token             POST /auth/magic-link
   |                                 |
   v                                 v
JWT + Refresh                Email with code
   |                                 |
   |                         GET /auth/magic-link/verify
   |                                 |
   v                                 v
Use JWT in                   JWT token
Authorization header                 |
   |                                 |
   v                                 v
Protected endpoints          Protected endpoints
   |
Token expires
   |
POST /auth/refresh
   |
   v
New JWT + New Refresh
(old refresh revoked)
CORS Configuration
AeorDB supports Cross-Origin Resource Sharing (CORS) through a CLI flag and per-path rules stored in the database.
Configuration Methods
CLI Flag
Enable CORS at startup with the --cors flag:
# Allow all origins
aeordb --cors "*"
# Allow specific origins
aeordb --cors "https://app.example.com,https://admin.example.com"
The CLI flag sets the default CORS policy for all routes. When no --cors flag is provided, no CORS headers are sent.
Per-Path Rules (Config File)
For fine-grained control, store per-path CORS rules at /.config/cors.json inside the database:
{
"rules": [
{
"path": "/engine/*",
"origins": ["https://app.example.com"],
"methods": ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization"],
"max_age": 3600,
"allow_credentials": false
},
{
"path": "/query",
"origins": ["*"],
"methods": ["POST"],
"allow_headers": ["Content-Type", "Authorization"],
"max_age": 600,
"allow_credentials": false
},
{
"path": "/events/stream",
"origins": ["https://app.example.com"],
"allow_credentials": true
}
]
}
Upload the config file using the engine API:
curl -X PUT http://localhost:3000/engine/.config/cors.json \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d @cors.json
Rule Schema
Each rule in the rules array supports:
| Field | Type | Default | Description |
|---|---|---|---|
| path | string | (required) | URL path to match. Supports trailing * for prefix matching. |
| origins | array of strings | (required) | Allowed origins. Use ["*"] for any origin. |
| methods | array of strings | ["GET","POST","PUT","DELETE","HEAD","OPTIONS"] | Allowed HTTP methods. |
| allow_headers | array of strings | ["Content-Type","Authorization"] | Allowed request headers. |
| max_age | integer | 3600 | Preflight cache duration in seconds. |
| allow_credentials | boolean | false | Whether to include Access-Control-Allow-Credentials: true. |
Path Matching
Per-path rules are checked in order (first match wins):
- Exact match: `"/query"` matches only `/query`
- Prefix match: `"/engine/*"` matches `/engine/data/file.json`, `/engine/images/photo.png`, etc.
If no per-path rule matches, the CLI default (if any) is used.
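The first-match-wins lookup can be sketched like this (a hypothetical helper over the rule paths from the schema above, not the middleware's code):

```rust
// First-match-wins CORS rule lookup: exact match, or prefix match when the
// rule path ends with "*". Returns the index of the first matching rule.
fn match_rule(rule_paths: &[&str], request_path: &str) -> Option<usize> {
    rule_paths.iter().position(|rule| match rule.strip_suffix('*') {
        Some(prefix) => request_path.starts_with(prefix), // "/engine/*" -> prefix "/engine/"
        None => *rule == request_path,                    // exact match
    })
}

fn main() {
    let rules = ["/engine/*", "/query", "/events/stream"];
    assert_eq!(match_rule(&rules, "/engine/data/file.json"), Some(0));
    assert_eq!(match_rule(&rules, "/query"), Some(1));
    // An exact rule does not prefix-match longer paths.
    assert_eq!(match_rule(&rules, "/query/extra"), None);
    println!("cors matching ok");
}
```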
Precedence
1. Per-path rules from `/.config/cors.json` (first match wins)
2. CLI `--cors` flag defaults
3. No CORS headers (if neither is configured)
CORS Middleware Behavior
The CORS middleware runs as the outermost layer in the middleware stack, ensuring that OPTIONS preflight requests are handled before authentication middleware rejects them for missing tokens.
Preflight Requests (OPTIONS)
When a preflight request arrives:
1. The middleware checks whether the `Origin` header matches an allowed origin.
2. If allowed: returns `204 No Content` with CORS headers: `Access-Control-Allow-Origin`, `Access-Control-Allow-Methods`, `Access-Control-Allow-Headers`, `Access-Control-Max-Age`, and `Access-Control-Allow-Credentials` (if configured).
3. If not allowed: returns `403 Forbidden`.
Normal Requests
For non-preflight requests with an allowed origin:
- The request passes through to the handler normally.
- CORS headers are appended to the response: `Access-Control-Allow-Origin` and `Access-Control-Allow-Credentials` (if configured).
For requests from non-allowed origins, no CORS headers are added (the browser will block the response).
Wildcard Origin
When origins include "*", the Access-Control-Allow-Origin header is set to *. Note: when using allow_credentials: true, browsers require a specific origin rather than *.
Default CORS Headers (CLI Flag)
When only the --cors flag is used (no per-path rules), the defaults are:
| Header | Value |
|---|---|
| Access-Control-Allow-Methods | GET, POST, PUT, DELETE, HEAD, OPTIONS |
| Access-Control-Allow-Headers | Content-Type, Authorization |
| Access-Control-Max-Age | 3600 |
| Access-Control-Allow-Credentials | Not set |
Examples
Development: Allow Everything
aeordb --cors "*"
Production: Specific Origins
aeordb --cors "https://app.example.com,https://admin.example.com"
Per-Path with Credentials
Store in /.config/cors.json:
{
"rules": [
{
"path": "/events/stream",
"origins": ["https://app.example.com"],
"allow_credentials": true,
"max_age": 86400
},
{
"path": "/engine/*",
"origins": ["https://app.example.com", "https://admin.example.com"],
"methods": ["GET", "PUT", "DELETE", "HEAD", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization", "X-Request-ID"]
}
]
}
Parser Plugins
Parser plugins let you transform non-JSON files (plaintext, CSV, PDF, images, etc.) into structured, queryable JSON when they are stored in AeorDB. Parsers are compiled to WebAssembly and deployed per-table, so each data collection can have its own parsing logic.
How It Works
When a file is written to a table that has a parser configured, AeorDB automatically routes the raw bytes through the parser’s WASM module. The parser receives the file data plus metadata and returns a JSON value. That JSON is then indexed by AeorDB’s query engine, making the original non-JSON file fully searchable.
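Conceptually, the write path looks like this. This is a simplified model of the routing, not the engine's code; the type and function names are hypothetical:

```rust
use std::collections::HashMap;

// Simplified model of parser routing on write: look up a parser by the
// file's Content-Type and run it to produce indexable JSON (here, a String).
type Parser = fn(&[u8]) -> Result<String, String>;

fn route_and_parse(
    parsers: &HashMap<&str, Parser>,
    content_type: &str,
    data: &[u8],
) -> Option<Result<String, String>> {
    // No parser configured for this content type: store the file, skip indexing.
    parsers.get(content_type).map(|parse| parse(data))
}

fn plaintext(data: &[u8]) -> Result<String, String> {
    let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
    Ok(format!("{{\"word_count\":{}}}", text.split_whitespace().count()))
}

fn main() {
    let mut parsers: HashMap<&str, Parser> = HashMap::new();
    parsers.insert("text/plain", plaintext);
    let parsed = route_and_parse(&parsers, "text/plain", b"hello wasm world");
    assert_eq!(parsed, Some(Ok("{\"word_count\":3}".to_string())));
    // Unrouted content types are stored without parsing.
    assert_eq!(route_and_parse(&parsers, "image/png", b"\x89PNG"), None);
    println!("routing ok");
}
```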
Writing a Parser: Step by Step
1. Create a Rust Crate
cargo new my-parser --lib
cd my-parser
Edit Cargo.toml:
[package]
name = "my-parser"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"
The crate-type = ["cdylib"] is required – it tells the compiler to produce a dynamic library suitable for WASM.
2. Implement the Parse Function
Use the aeordb_parser! macro to generate the WASM export boilerplate. Your job is to write a function that takes a ParserInput and returns Result<serde_json::Value, String>.
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;
aeordb_parser!(parse);
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
let text = std::str::from_utf8(&input.data)
.map_err(|e| e.to_string())?;
Ok(serde_json::json!({
"text": text,
"metadata": {
"line_count": text.lines().count(),
"word_count": text.split_whitespace().count(),
}
}))
}
}
The aeordb_parser! macro generates:
- A global allocator for the WASM target
- A `handle(ptr, len) -> i64` export that deserializes the parser envelope, calls your function, and returns the serialized response as a packed pointer+length
You never interact with the raw WASM ABI directly.
3. Build for WASM
cargo build --target wasm32-unknown-unknown --release
The compiled module lands at:
target/wasm32-unknown-unknown/release/my_parser.wasm
4. Deploy the Parser
Upload the WASM binary to a table’s plugin deployment endpoint:
curl -X PUT \
http://localhost:3000/mydb/myschema/mytable/_deploy \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/wasm" \
--data-binary @target/wasm32-unknown-unknown/release/my_parser.wasm
5. Configure Content-Type Routing
Create or update /.config/parsers.json to route specific content types to your parser:
{
"parsers": {
"text/plain": "my-parser",
"text/csv": "csv-parser",
"application/pdf": "pdf-parser"
}
}
When a file with a matching Content-Type is stored, AeorDB automatically invokes the corresponding parser.
6. Configure Indexing
Add the parser name to indexes.json so the parsed output is indexed:
{
"indexes": [
{
"field": "text",
"type": "fulltext"
},
{
"field": "metadata.word_count",
"type": "numeric"
}
]
}
The ParserInput Struct
Your parse function receives a ParserInput with two fields:
| Field | Type | Description |
|---|---|---|
| data | Vec<u8> | Raw file bytes (already base64-decoded from the wire envelope) |
| meta | FileMeta | Metadata about the file being parsed |
FileMeta Fields
| Field | Type | Description |
|---|---|---|
| filename | String | File name only (e.g., "report.pdf") |
| path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
| content_type | String | MIME type (e.g., "text/plain") |
| size | u64 | Raw file size in bytes |
| hash | String | Hex-encoded content hash (may be empty) |
| hash_algorithm | String | Hash algorithm used (e.g., "blake3_256") |
| created_at | i64 | Creation timestamp (ms since epoch, default 0) |
| updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
Real-World Example: Plaintext Parser
The built-in plaintext parser (aeordb-parsers/plaintext) demonstrates a production parser:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;
aeordb_parser!(parse);
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
let text = std::str::from_utf8(&input.data)
.map_err(|e| format!("not valid UTF-8: {}", e))?;
let line_count = text.lines().count();
let word_count = text.split_whitespace().count();
let char_count = text.chars().count();
let byte_count = input.data.len();
// Extract first line as a "title" (common convention for text files)
let title = text.lines().next().unwrap_or("").trim().to_string();
// Detect if it looks like source code
let has_braces = text.contains('{') && text.contains('}');
let has_imports = text.contains("import ")
|| text.contains("use ")
|| text.contains("#include");
let looks_like_code = has_braces || has_imports;
Ok(serde_json::json!({
"text": text,
"metadata": {
"filename": input.meta.filename,
"content_type": input.meta.content_type,
"size": byte_count,
"line_count": line_count,
"word_count": word_count,
"char_count": char_count,
},
"title": title,
"looks_like_code": looks_like_code,
}))
}
}
This parser:
- Validates UTF-8 encoding (returns an error for binary data)
- Extracts text statistics (lines, words, characters)
- Pulls the first line as a title
- Heuristically detects source code
Error Handling
Return Err(String) from your parse function to signal a failure. AeorDB will store the error in the parser response and the file will not be indexed. The original file is still stored – only parsing/indexing is skipped.
#![allow(unused)]
fn main() {
fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
if input.data.is_empty() {
return Err("empty file".to_string());
}
// ...
}
}
See Also
- Query Plugins – plugins that query the database and return custom responses
- SDK Reference – complete type reference for the plugin SDK
Query Plugins
Query plugins are WASM modules that can read, write, delete, query, and aggregate data inside AeorDB, then return custom HTTP responses. They are the extension mechanism for building custom API endpoints, computed views, data transformations, or any logic that needs to run server-side.
How It Works
A query plugin receives a PluginRequest (containing the HTTP body and metadata) and a PluginContext that provides host functions for interacting with the database. The plugin performs whatever logic it needs – querying data, writing files, aggregating results – and returns a PluginResponse with a status code, body, and content type.
Writing a Query Plugin: Step by Step
1. Create a Rust Crate
cargo new my-plugin --lib
cd my-plugin
Edit Cargo.toml:
[package]
name = "my-plugin"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"
2. Implement the Handler
Use the aeordb_query_plugin! macro and write a function that takes (PluginContext, PluginRequest) and returns Result<PluginResponse, PluginError>.
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
use aeordb_plugin_sdk::aeordb_query_plugin;
aeordb_query_plugin!(handle);
fn handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let results = ctx.query("/users")
.field("name").contains("Alice")
.field("age").gt_u64(21)
.limit(10)
.execute()?;
PluginResponse::json(200, &serde_json::json!({
"users": results,
"count": results.len()
})).map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
}
The aeordb_query_plugin! macro generates:
- A global allocator for the WASM target
- An `alloc(size) -> ptr` export for host-to-guest memory allocation
- A `handle(ptr, len) -> i64` export that deserializes the request, creates a `PluginContext`, calls your function, and returns the serialized response
3. Build and Deploy
cargo build --target wasm32-unknown-unknown --release
curl -X PUT \
http://localhost:3000/mydb/myschema/mytable/_deploy \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/wasm" \
--data-binary @target/wasm32-unknown-unknown/release/my_plugin.wasm
4. Invoke the Plugin
curl -X POST \
http://localhost:3000/mydb/myschema/mytable/_invoke/my-plugin \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "Alice"}'
The PluginRequest Struct
| Field | Type | Description |
|---|---|---|
| arguments | Vec<u8> | Raw argument bytes (the HTTP request body forwarded to the plugin) |
| metadata | HashMap<String, String> | Key-value metadata about the invocation context |
The metadata map typically contains:
| Key | Description |
|---|---|
| function_name | The function name from the invoke URL (e.g., "echo", "read") |
You can use function_name to multiplex a single plugin into multiple operations:
#![allow(unused)]
fn main() {
fn handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let function = request.metadata
.get("function_name")
.map(|s| s.as_str())
.unwrap_or("default");
match function {
"search" => handle_search(ctx, &request),
"stats" => handle_stats(ctx, &request),
_ => Ok(PluginResponse::error(404, &format!("Unknown function: {}", function))),
}
}
}
The PluginContext
PluginContext is your handle for calling AeorDB host functions from inside the WASM sandbox. It is created automatically by the macro and passed to your handler.
File Operations
#![allow(unused)]
fn main() {
// Read a file -- returns FileData { data, content_type, size }
let file = ctx.read_file("/mydb/users/alice.json")?;
// Write a file (create or overwrite)
ctx.write_file("/mydb/output/result.json", b"{\"ok\":true}", "application/json")?;
// Delete a file
ctx.delete_file("/mydb/temp/scratch.json")?;
// Get file metadata -- returns FileMetadata { path, size, content_type, created_at, updated_at }
let meta = ctx.file_metadata("/mydb/users/alice.json")?;
// List directory entries -- returns Vec<DirEntry { name, entry_type, size }>
let entries = ctx.list_directory("/mydb/users/")?;
}
Query
Use ctx.query(path) to get a QueryBuilder with a fluent API:
#![allow(unused)]
fn main() {
let results = ctx.query("/users")
.field("name").contains("Alice")
.field("age").gt_u64(21)
.sort("name", SortDirection::Asc)
.limit(10)
.offset(0)
.execute()?;
// results: Vec<QueryResult { path, score, matched_by }>
}
Aggregate
Use ctx.aggregate(path) to get an AggregateBuilder:
#![allow(unused)]
fn main() {
let stats = ctx.aggregate("/orders")
.count()
.sum("total")
.avg("total")
.min_val("total")
.max_val("total")
.group_by("status")
.limit(100)
.execute()?;
// stats: AggregateResult { groups, total_count }
}
The QueryBuilder
The QueryBuilder provides a fluent API for composing queries. Multiple conditions on the top level are implicitly ANDed.
Field Operators
Start a field condition with .field("name"), then chain an operator:
Equality:
- `.eq(value: &[u8])` – exact match on raw bytes
- `.eq_u64(value)` – exact match on u64
- `.eq_i64(value)` – exact match on i64
- `.eq_f64(value)` – exact match on f64
- `.eq_str(value)` – exact match on string
- `.eq_bool(value)` – exact match on boolean

Comparison:
- `.gt(value: &[u8])`, `.gt_u64(value)`, `.gt_str(value)`, `.gt_f64(value)` – greater than
- `.lt(value: &[u8])`, `.lt_u64(value)`, `.lt_str(value)`, `.lt_f64(value)` – less than

Range:
- `.between(min: &[u8], max: &[u8])` – inclusive range on raw bytes
- `.between_u64(min, max)` – inclusive range on u64
- `.between_str(min, max)` – inclusive range on strings

Set Membership:
- `.in_values(values: &[&[u8]])` – match any of the given byte values
- `.in_u64(values: &[u64])` – match any of the given u64 values
- `.in_str(values: &[&str])` – match any of the given strings

Text Search:
- `.contains(text)` – substring / trigram contains
- `.similar(text, threshold)` – trigram similarity (threshold 0.0–1.0)
- `.phonetic(text)` – Soundex/Metaphone phonetic match
- `.fuzzy(text)` – Levenshtein distance fuzzy match
- `.match_query(text)` – full-text match
Boolean Combinators
#![allow(unused)]
fn main() {
// AND group
ctx.query("/users")
.and(|q| q.field("name").contains("Alice").field("active").eq_bool(true))
.limit(10)
.execute()?;
// OR group
ctx.query("/users")
.or(|q| q.field("role").eq_str("admin").field("role").eq_str("superadmin"))
.execute()?;
// NOT
ctx.query("/users")
.not(|q| q.field("status").eq_str("banned"))
.execute()?;
}
Sorting and Pagination
#![allow(unused)]
fn main() {
ctx.query("/users")
.field("active").eq_bool(true)
.sort("created_at", SortDirection::Desc)
.sort("name", SortDirection::Asc)
.limit(25)
.offset(50)
.execute()?;
}
The PluginResponse
Three builder methods for constructing responses:
PluginResponse::json(status_code, &body)
Serializes any Serialize type to JSON. Sets Content-Type: application/json.
#![allow(unused)]
fn main() {
PluginResponse::json(200, &serde_json::json!({"ok": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
PluginResponse::text(status_code, body)
Returns a plain text response. Sets Content-Type: text/plain.
#![allow(unused)]
fn main() {
Ok(PluginResponse::text(201, "Created by plugin"))
}
PluginResponse::error(status_code, message)
Returns a JSON error response in the form {"error": "<message>"}.
#![allow(unused)]
fn main() {
Ok(PluginResponse::error(404, "User not found"))
}
Real-World Example: Echo Plugin
The built-in echo plugin (aeordb-plugins/echo-plugin) demonstrates multiplexing a single plugin across multiple operations:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
use aeordb_plugin_sdk::aeordb_query_plugin;
aeordb_query_plugin!(echo_handle);
fn echo_handle(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError> {
let function = request.metadata
.get("function_name")
.map(|s| s.as_str())
.unwrap_or("echo");
match function {
"echo" => {
PluginResponse::json(200, &serde_json::json!({
"echo": true,
"metadata": request.metadata,
"body_len": request.arguments.len(),
}))
.map_err(|e| PluginError::SerializationFailed(e.to_string()))
}
"read" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.read_file(path) {
Ok(file) => PluginResponse::json(200, &serde_json::json!({
"size": file.size,
"content_type": file.content_type,
"data_len": file.data.len(),
}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(404, &e.to_string())),
}
}
"write" => {
match ctx.write_file("/plugin-output/result.json", b"{\"written\":true}", "application/json") {
Ok(()) => PluginResponse::json(201, &serde_json::json!({"ok": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"delete" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.delete_file(path) {
Ok(()) => PluginResponse::json(200, &serde_json::json!({"deleted": true}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"metadata" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.file_metadata(path) {
Ok(meta) => PluginResponse::json(200, &serde_json::json!(meta))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(404, &e.to_string())),
}
}
"list" => {
let path = std::str::from_utf8(&request.arguments)
.map_err(|e| PluginError::ExecutionFailed(e.to_string()))?;
match ctx.list_directory(path) {
Ok(entries) => PluginResponse::json(200, &serde_json::json!({"entries": entries}))
.map_err(|e| PluginError::SerializationFailed(e.to_string())),
Err(e) => Ok(PluginResponse::error(500, &e.to_string())),
}
}
"status" => Ok(PluginResponse::text(201, "Created by plugin")),
_ => Ok(PluginResponse::error(404, &format!("Unknown function: {}", function))),
}
}
}
See Also
- Parser Plugins – plugins that transform non-JSON files into queryable data
- SDK Reference – complete type reference for the plugin SDK
Plugin SDK Reference
Complete type reference for the aeordb-plugin-sdk crate. This covers every public struct, enum, trait, and method available to plugin authors.
Macros
aeordb_parser!(fn_name)
Generates WASM exports for a parser plugin. Your function must have the signature:
#![allow(unused)]
fn main() {
fn fn_name(input: ParserInput) -> Result<serde_json::Value, String>
}
Generated exports:
- `handle(ptr: i32, len: i32) -> i64` – deserializes the parser envelope, calls your function, returns a packed pointer+length to the serialized response
aeordb_query_plugin!(fn_name)
Generates WASM exports for a query plugin. Your function must have the signature:
#![allow(unused)]
fn main() {
fn fn_name(ctx: PluginContext, request: PluginRequest) -> Result<PluginResponse, PluginError>
}
Generated exports:
- `alloc(size: i32) -> i32` – allocates guest memory for the host to write request data
- `handle(ptr: i32, len: i32) -> i64` – deserializes the request, creates a `PluginContext`, calls your function, returns a packed pointer+length to the serialized response
Prelude
Import everything you need with:
#![allow(unused)]
fn main() {
use aeordb_plugin_sdk::prelude::*;
}
This re-exports: PluginError, PluginRequest, PluginResponse, ParserInput, FileMeta, PluginContext, FileData, DirEntry, FileMetadata, QueryResult, AggregateResult, SortDirection.
PluginRequest
Request passed to a query plugin when it is invoked.
| Field | Type | Description |
|---|---|---|
| arguments | Vec<u8> | Raw argument bytes (e.g., the HTTP request body forwarded to the plugin) |
| metadata | HashMap<String, String> | Key-value metadata about the invocation context |
Common metadata keys:
| Key | Description |
|---|---|
| function_name | The function name from the invoke URL |
PluginResponse
Response returned by a plugin after handling a request.
| Field | Type | Description |
|---|---|---|
| status_code | u16 | HTTP-style status code |
| body | Vec<u8> | Raw response body bytes |
| content_type | Option<String> | MIME content type of the body |
| headers | HashMap<String, String> | Additional response headers |
Builder Methods
PluginResponse::json(status_code: u16, body: &T) -> Result<Self, serde_json::Error>
Serializes body (any Serialize type) to JSON. Sets content_type to "application/json".
#![allow(unused)]
fn main() {
PluginResponse::json(200, &serde_json::json!({"ok": true}))
}
PluginResponse::text(status_code: u16, body: impl Into<String>) -> Self
Creates a plain text response. Sets content_type to "text/plain".
#![allow(unused)]
fn main() {
PluginResponse::text(200, "Hello, world!")
}
PluginResponse::error(status_code: u16, message: impl Into<String>) -> Self
Creates a JSON error response: {"error": "<message>"}. Sets content_type to "application/json".
#![allow(unused)]
fn main() {
PluginResponse::error(404, "not found")
}
PluginError
Error enum for the plugin system.
| Variant | Description |
|---|---|
| NotFound(String) | The plugin could not be found |
| ExecutionFailed(String) | The plugin failed during execution |
| SerializationFailed(String) | Request or response could not be serialized/deserialized |
| ResourceLimitExceeded(String) | Plugin exceeded memory, fuel, or other resource limits |
| InvalidModule(String) | An invalid or corrupt WASM module was provided |
| Internal(String) | A generic internal error |
All variants carry a String message. PluginError implements Display, Debug, and Error.
PluginContext
Guest-side handle for calling AeorDB host functions from WASM. Created automatically by aeordb_query_plugin! and passed to the handler.
On non-WASM targets (native compilation), all methods return PluginError::ExecutionFailed – this allows IDE support and unit testing of plugin logic without a WASM runtime.
File Operations
read_file(&self, path: &str) -> Result<FileData, PluginError>
Read a file at the given path. Returns the decoded file bytes, content type, and size.
write_file(&self, path: &str, data: &[u8], content_type: &str) -> Result<(), PluginError>
Write (create or overwrite) a file. Data is base64-encoded on the wire automatically.
delete_file(&self, path: &str) -> Result<(), PluginError>
Delete a file at the given path.
file_metadata(&self, path: &str) -> Result<FileMetadata, PluginError>
Retrieve metadata for a file without reading its contents.
list_directory(&self, path: &str) -> Result<Vec<DirEntry>, PluginError>
List directory entries at the given path.
Query and Aggregation
query(&self, path: &str) -> QueryBuilder
Start building a query against files at the given path. See QueryBuilder.
aggregate(&self, path: &str) -> AggregateBuilder
Start building an aggregation against files at the given path. See AggregateBuilder.
FileData
Raw file data returned by read_file.
| Field | Type | Description |
|---|---|---|
data | Vec<u8> | Decoded file bytes |
content_type | String | MIME content type |
size | u64 | File size in bytes |
DirEntry
A single directory entry returned by list_directory.
| Field | Type | Description |
|---|---|---|
name | String | Entry name (file or directory name, not the full path) |
entry_type | String | "file" or "directory" |
size | u64 | Size in bytes (0 for directories, defaults to 0 if absent) |
FileMetadata
Metadata about a stored file.
| Field | Type | Description |
|---|---|---|
path | String | Full storage path |
size | u64 | File size in bytes |
content_type | Option<String> | MIME content type (if known) |
created_at | i64 | Creation timestamp (ms since epoch) |
updated_at | i64 | Last update timestamp (ms since epoch) |
ParserInput
Input to a parser function.
| Field | Type | Description |
|---|---|---|
data | Vec<u8> | Raw file bytes (base64-decoded from the wire envelope) |
meta | FileMeta | File metadata |
FileMeta
Metadata about the file being parsed (available inside parser plugins).
| Field | Type | Description |
|---|---|---|
filename | String | File name only (e.g., "report.pdf") |
path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
content_type | String | MIME type |
size | u64 | Raw file size in bytes |
hash | String | Hex-encoded content hash (may be empty) |
hash_algorithm | String | Hash algorithm (e.g., "blake3_256", may be empty) |
created_at | i64 | Creation timestamp (ms since epoch, default 0) |
updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
QueryBuilder
Fluent builder for constructing AeorDB queries. Obtained via PluginContext::query(path) or QueryBuilder::new(path).
Field Conditions
Start with .field("name") to get a FieldQueryBuilder, then chain one operator:
Equality
| Method | Signature | Description |
|---|---|---|
eq | (value: &[u8]) -> QueryBuilder | Exact match on raw bytes |
eq_u64 | (value: u64) -> QueryBuilder | Exact match on u64 |
eq_i64 | (value: i64) -> QueryBuilder | Exact match on i64 |
eq_f64 | (value: f64) -> QueryBuilder | Exact match on f64 |
eq_str | (value: &str) -> QueryBuilder | Exact match on string |
eq_bool | (value: bool) -> QueryBuilder | Exact match on boolean |
Greater Than
| Method | Signature | Description |
|---|---|---|
gt | (value: &[u8]) -> QueryBuilder | Greater than on raw bytes |
gt_u64 | (value: u64) -> QueryBuilder | Greater than on u64 |
gt_str | (value: &str) -> QueryBuilder | Greater than on string |
gt_f64 | (value: f64) -> QueryBuilder | Greater than on f64 |
Less Than
| Method | Signature | Description |
|---|---|---|
lt | (value: &[u8]) -> QueryBuilder | Less than on raw bytes |
lt_u64 | (value: u64) -> QueryBuilder | Less than on u64 |
lt_str | (value: &str) -> QueryBuilder | Less than on string |
lt_f64 | (value: f64) -> QueryBuilder | Less than on f64 |
Range
| Method | Signature | Description |
|---|---|---|
between | (min: &[u8], max: &[u8]) -> QueryBuilder | Inclusive range on raw bytes |
between_u64 | (min: u64, max: u64) -> QueryBuilder | Inclusive range on u64 |
between_str | (min: &str, max: &str) -> QueryBuilder | Inclusive range on strings |
Set Membership
| Method | Signature | Description |
|---|---|---|
in_values | (values: &[&[u8]]) -> QueryBuilder | Match any of the given byte values |
in_u64 | (values: &[u64]) -> QueryBuilder | Match any of the given u64 values |
in_str | (values: &[&str]) -> QueryBuilder | Match any of the given strings |
Text Search
| Method | Signature | Description |
|---|---|---|
contains | (text: &str) -> QueryBuilder | Substring / trigram contains search |
similar | (text: &str, threshold: f64) -> QueryBuilder | Trigram similarity search (0.0–1.0) |
phonetic | (text: &str) -> QueryBuilder | Soundex/Metaphone phonetic search |
fuzzy | (text: &str) -> QueryBuilder | Levenshtein distance fuzzy search |
match_query | (text: &str) -> QueryBuilder | Full-text match query |
Boolean Combinators
| Method | Signature | Description |
|---|---|---|
and | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | AND group via closure |
or | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | OR group via closure |
not | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | Negate a condition via closure |
Sorting and Pagination
| Method | Signature | Description |
|---|---|---|
sort | (field: impl Into<String>, direction: SortDirection) -> Self | Add a sort field |
limit | (count: usize) -> Self | Limit result count |
offset | (count: usize) -> Self | Skip the first N results |
Execution
| Method | Signature | Description |
|---|---|---|
execute | (self) -> Result<Vec<QueryResult>, PluginError> | Execute the query via host FFI |
to_json | (&self) -> serde_json::Value | Serialize builder state to JSON (for inspection/debugging) |
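As a hedged illustration of the fluent API above, here is a minimal Python re-implementation sketch of the builder pattern (the real SDK is Rust and executes via host FFI; the JSON shape emitted here is hypothetical, not AeorDB's actual wire format):

```python
import json

class QueryBuilder:
    """Sketch of the fluent query builder described above (illustrative only)."""
    def __init__(self, path):
        self.path = path
        self.conditions = []
        self.sorts = []
        self._limit = None

    def field(self, name):
        return FieldQueryBuilder(self, name)

    def sort(self, field, direction):  # direction stands in for SortDirection
        self.sorts.append({"field": field, "direction": direction})
        return self

    def limit(self, count):
        self._limit = count
        return self

    def to_json(self):
        q = {"path": self.path, "conditions": self.conditions, "sort": self.sorts}
        if self._limit is not None:
            q["limit"] = self._limit
        return q

class FieldQueryBuilder:
    """Returned by .field(); each operator records a condition and hands
    the QueryBuilder back so further calls can be chained."""
    def __init__(self, builder, name):
        self.builder = builder
        self.name = name

    def _op(self, op, value):
        self.builder.conditions.append({"field": self.name, "op": op, "value": value})
        return self.builder

    def eq_str(self, value):  return self._op("eq", value)
    def gt_u64(self, value):  return self._op("gt", value)
    def contains(self, text): return self._op("contains", text)

q = (QueryBuilder("/users/")
     .field("role").eq_str("admin")
     .field("age").gt_u64(30)
     .sort("name", "Asc")
     .limit(10))
print(json.dumps(q.to_json()))
```

In the real SDK, `execute()` would serialize this builder state and hand it to the host; `to_json()` exists for the same kind of inspection shown here.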
QueryResult
A single query result returned by the host.
| Field | Type | Description |
|---|---|---|
path | String | Path of the matching file |
score | f64 | Relevance score (higher is better, default 0.0) |
matched_by | Vec<String> | Names of the indexes/operations that matched (default empty) |
SortDirection
Sort direction for query results.
| Variant | Description |
|---|---|
Asc | Ascending order |
Desc | Descending order |
AggregateBuilder
Fluent builder for constructing AeorDB aggregation queries. Obtained via PluginContext::aggregate(path) or AggregateBuilder::new(path).
Aggregation Operations
| Method | Signature | Description |
|---|---|---|
count | (self) -> Self | Request a count aggregation |
sum | (field: impl Into<String>) -> Self | Request a sum on a field |
avg | (field: impl Into<String>) -> Self | Request an average on a field |
min_val | (field: impl Into<String>) -> Self | Request a minimum value on a field |
max_val | (field: impl Into<String>) -> Self | Request a maximum value on a field |
Grouping and Filtering
| Method | Signature | Description |
|---|---|---|
group_by | (field: impl Into<String>) -> Self | Group results by a field |
filter | (build_fn: FnOnce(QueryBuilder) -> QueryBuilder) -> Self | Add a where condition via closure |
limit | (count: usize) -> Self | Limit the number of groups returned |
Execution
| Method | Signature | Description |
|---|---|---|
execute | (self) -> Result<AggregateResult, PluginError> | Execute the aggregation via host FFI |
to_json | (&self) -> serde_json::Value | Serialize builder state to JSON |
AggregateResult
Aggregation result returned by the host.
| Field | Type | Description |
|---|---|---|
groups | Vec<serde_json::Value> | Per-group aggregation results (default empty) |
total_count | Option<u64> | Total count if count was requested without group_by |
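To make the grouping semantics concrete, here is a small Python sketch of what a count/sum/avg aggregation with `group_by` computes over matching records (illustrative only; the field names and output shape are hypothetical, and the real engine evaluates this host-side over indexed fields):

```python
from collections import defaultdict

def aggregate(records, group_by, value_field):
    """Sketch of a grouped count + sum + avg aggregation (illustrative only)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_by]].append(rec[value_field])
    out = []
    for key, values in groups.items():
        out.append({
            "group": key,
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        })
    return out

records = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "ops", "salary": 90},
]
print(aggregate(records, "dept", "salary"))
```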
See Also
- Parser Plugins – how to write and deploy parser plugins
- Query Plugins – how to write and deploy query plugins
Garbage Collection
AeorDB is an append-only database: writes, overwrites, and deletes all append new entries without modifying existing ones. Over time, this leaves orphaned entries – old file versions, deleted content, stale directory indexes – consuming disk space. Garbage collection (GC) reclaims that space.
What GC Does
GC uses a mark-and-sweep algorithm:
1. Mark phase: Starting from HEAD, all snapshots, and all forks, GC walks every reachable directory tree. It marks every entry that is still live: directory indexes, file records, content chunks, system tables, task queue records, and deletion records.
2. Sweep phase: GC iterates all entries in the key-value store. Any entry whose hash is not in the live set is garbage. Non-live entries are overwritten in place with DeletionRecord or Void entries, then removed from the KV index.
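The two phases can be sketched over a toy content-addressed store. This is a hedged illustration only: the real engine marks from HEAD, every snapshot, and every fork, and overwrites garbage in place rather than deleting keys.

```python
def mark(store, roots):
    """Mark phase: collect every entry reachable from the given roots."""
    live, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h in live or h not in store:
            continue
        live.add(h)
        # e.g. directory index -> file records -> content chunks
        stack.extend(store[h].get("children", []))
    return live

def sweep(store, live):
    """Sweep phase: anything not in the live set is garbage."""
    garbage = [h for h in store if h not in live]
    for h in garbage:
        del store[h]  # AeorDB instead overwrites with Void/DeletionRecord entries
    return len(garbage)

store = {
    "head":      {"children": ["dir1"]},
    "dir1":      {"children": ["fileA"]},
    "fileA":     {"children": ["chunk1"]},
    "chunk1":    {},
    "old_fileA": {"children": ["old_chunk"]},  # orphaned by an overwrite
    "old_chunk": {},
}
live = mark(store, roots=["head"])
print(sweep(store, live))  # 2 orphaned entries collected
```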
What Gets Collected
- Old file versions that have been overwritten
- Content chunks no longer referenced by any file record
- Directory indexes from previous tree states
- Entries orphaned by deletes
What Does NOT Get Collected
- HEAD and all reachable entries from HEAD
- All snapshot root trees and their descendants
- All fork root trees and their descendants
- System table entries (/.system, /.config)
- Task queue records (registry + individual task entries)
- DeletionRecord entries (needed for KV rebuild from .aeordb scan)
Running GC
CLI
# Run GC
aeordb gc --database data.aeordb
# Dry run -- report what would be collected without deleting
aeordb gc --database data.aeordb --dry-run
Example output:
AeorDB Garbage Collection
Database: data.aeordb
Versions scanned: 3
Live entries: 1247
Garbage entries: 89
Reclaimed: 1.2 MB
Duration: 0.3s
Dry run output:
AeorDB Garbage Collection [DRY RUN]
Database: data.aeordb
[DRY RUN] Would collect 89 garbage entries (1.2 MB)
HTTP API
Synchronous GC (blocks until complete):
# Run GC
curl -X POST http://localhost:3000/admin/gc \
-H "Authorization: Bearer $API_KEY"
# Dry run
curl -X POST http://localhost:3000/admin/gc?dry_run=true \
-H "Authorization: Bearer $API_KEY"
Response:
{
"versions_scanned": 3,
"live_entries": 1247,
"garbage_entries": 89,
"reclaimed_bytes": 1258291,
"duration_ms": 312,
"dry_run": false
}
Background GC (returns immediately, runs as a task):
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $API_KEY"
This enqueues a GC task that the background task worker will pick up. Track its progress via the task system.
When to Run GC
- After bulk deletes: If you delete a large number of files, their content chunks become garbage.
- After bulk overwrites: Updating many files leaves old versions behind.
- After version cleanup: If you delete old snapshots, the entries they exclusively referenced become garbage.
- Periodically: Set up a cron schedule for automatic GC.
Example cron configuration (/.config/cron.json):
{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
}
]
}
Concurrency and Safety
GC should not be run concurrently with writes. The sweep phase re-verifies each candidate against the current KV state before overwriting to mitigate races, but for full safety, callers should ensure exclusive access during GC.
Crash safety: If the process crashes mid-sweep, the .aeordb file may contain partially overwritten entries. On restart, the .kv index file will be stale and must be deleted to trigger a full rebuild from the .aeordb file scan. The rebuild replays deletion records and reconstructs the index, so no committed data is lost. Garbage entries that were not yet swept will persist until the next GC run.
Performance
At scale, expect roughly 10K entries/sec of sweep throughput. The mark phase is faster, since it only walks reachable trees. Sweep-phase writes are batched, with a single sync at the end.
See Also
- Task System & Cron – background task execution and scheduling
- Reindexing – rebuilding indexes after config changes
- Backup & Restore – exporting clean versions without garbage
Reindexing
When you change a table’s index configuration (indexes.json), existing files need to be re-processed through the indexing pipeline. AeorDB handles this through background reindex tasks.
Why Reindex
- You added a new index field (e.g., adding a fulltext index on a field that was previously unindexed)
- You changed the index type for a field (e.g., switching from exact to fulltext)
- You added or changed a parser plugin, and existing files need to be re-parsed
- You modified index settings (e.g., changing similarity thresholds)
Automatic Reindexing
Changing indexes.json via the API automatically triggers a background reindex task for the affected directory. You do not need to manually trigger reindexing in most cases.
Manual Reindexing
HTTP API
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
The path argument specifies which directory to reindex. The task worker will:
- Read the indexes.json configuration for that path
- List all file entries in the directory
- Re-read each file and run it through the indexing pipeline
- Track progress and update checkpoints
Progress Tracking
During an active reindex, query responses include a meta.reindexing field indicating that results may be incomplete:
{
"results": [...],
"meta": {
"reindexing": true
}
}
You can also check progress through the task system:
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $API_KEY"
The response includes progress details:
{
"tasks": [
{
"id": "abc123",
"task_type": "reindex",
"status": "running",
"progress": {
"task_id": "abc123",
"task_type": "reindex",
"progress": 0.45,
"eta_ms": 12000,
"indexed_count": 450,
"total_count": 1000,
"message": "indexed 450/1000 files"
}
}
]
}
Circuit Breaker
If 10 consecutive files fail to index, the reindex task trips a circuit breaker and fails with an error:
circuit breaker: 10 consecutive indexing failures
This prevents runaway error loops when the index configuration or parser is fundamentally broken. Fix the underlying issue and trigger a new reindex.
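The breaker amounts to a consecutive-failure counter that any success resets. A hedged Python sketch (the threshold of 10 matches the documented behavior; everything else here is hypothetical):

```python
class CircuitBreaker:
    """Sketch of the consecutive-failure circuit breaker described above."""
    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, ok):
        if ok:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                raise RuntimeError(
                    f"circuit breaker: {self.threshold} consecutive indexing failures")

cb = CircuitBreaker()
for _ in range(9):
    cb.record(ok=False)
cb.record(ok=True)  # one success before the 10th failure resets the counter
print(cb.consecutive_failures)  # 0
```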
Checkpoint and Resume
Reindex tasks save a checkpoint after each batch (50 files). If the server crashes or the task is cancelled and restarted, it resumes from the last checkpoint rather than starting over.
The checkpoint is the name of the last successfully processed file (files are processed in alphabetical order for deterministic ordering).
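The checkpoint-and-resume behavior can be sketched as follows (a hedged illustration: the real task indexes each file, persists the checkpoint, and also checks for cancellation between batches):

```python
def reindex(files, checkpoint=None, batch_size=50):
    """Sketch of checkpointed batch reindexing. Files are processed in
    alphabetical order; the checkpoint is the last file completed."""
    files = sorted(files)
    if checkpoint is not None:
        files = [f for f in files if f > checkpoint]  # resume past the checkpoint
    processed = []
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        processed.extend(batch)       # (index each file here)
        checkpoint = batch[-1]        # save checkpoint after each batch
    return processed, checkpoint

done, ckpt = reindex(["a.json", "c.json", "b.json"], batch_size=2)
print(done, ckpt)
```

Restarting with the saved checkpoint skips everything already processed, so a crash mid-task costs at most one batch of repeated work.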
Cancellation
Cancel a running reindex task:
curl -X POST http://localhost:3000/admin/tasks/{task_id}/cancel \
-H "Authorization: Bearer $API_KEY"
The task checks for cancellation after each batch, so it will stop within one batch cycle.
Batch Processing
Files are processed in batches of 50. After each batch, the task:
- Updates the checkpoint to the last file in the batch
- Computes progress percentage and ETA (using a rolling average of the last 10 batch times)
- Checks for cancellation
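The ETA estimate above (a rolling average of the last 10 batch times, multiplied by the remaining batches) can be sketched as (illustrative; only the window of 10 comes from the documented behavior):

```python
from collections import deque

def eta_ms(batch_times_ms, batches_remaining, window=10):
    """Sketch of the documented ETA estimate: rolling average of the last
    `window` batch times, scaled by the number of remaining batches."""
    recent = deque(batch_times_ms, maxlen=window)  # keeps only the newest entries
    if not recent:
        return None
    avg = sum(recent) / len(recent)
    return int(avg * batches_remaining)

# 20 batches observed; only the last 10 (all 200 ms) count toward the average
times = [400] * 10 + [200] * 10
print(eta_ms(times, batches_remaining=5))  # 1000
```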
See Also
- Task System & Cron – task lifecycle, listing, and scheduling
- Garbage Collection – reclaiming space from orphaned entries
Task System & Cron
AeorDB runs long-running operations (reindexing, garbage collection) as background tasks. Tasks are managed by a task queue, executed by a dedicated worker, and can be triggered manually or on a cron schedule.
Built-in Task Types
| Task Type | Description |
|---|---|
reindex | Re-run the indexing pipeline on all files under a directory |
gc | Run garbage collection (mark-and-sweep) |
Task Lifecycle
pending --> running --> completed
                    --> failed
                    --> cancelled
- Pending: Task is enqueued and waiting for the worker to pick it up.
- Running: Worker has dequeued the task and is executing it.
- Completed: Task finished successfully.
- Failed: Task encountered an error (e.g., circuit breaker tripped, GC failed).
- Cancelled: Task was cancelled by the user between batch iterations.
On server startup, any tasks left in Running state (from a previous crash) are reset to Pending so they can be re-executed.
API
List Tasks
curl http://localhost:3000/admin/tasks \
-H "Authorization: Bearer $API_KEY"
Response:
{
"tasks": [
{
"id": "abc123",
"task_type": "reindex",
"status": "running",
"args": {"path": "/data/"},
"created_at": 1700000000000,
"progress": {
"task_id": "abc123",
"task_type": "reindex",
"progress": 0.65,
"eta_ms": 8000,
"indexed_count": 650,
"total_count": 1000,
"message": "indexed 650/1000 files"
}
},
{
"id": "def456",
"task_type": "gc",
"status": "completed",
"args": {},
"created_at": 1699999000000
}
]
}
Trigger a Task
Reindex:
curl -X POST http://localhost:3000/admin/tasks/reindex \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"path": "/data/"}'
Garbage Collection:
curl -X POST http://localhost:3000/admin/tasks/gc \
-H "Authorization: Bearer $API_KEY"
Cancel a Task
curl -X POST http://localhost:3000/admin/tasks/{task_id}/cancel \
-H "Authorization: Bearer $API_KEY"
Cancellation is cooperative: the task checks for cancellation between batch iterations. It will not interrupt a batch in progress.
Progress Tracking
Running tasks expose in-memory progress information:
| Field | Type | Description |
|---|---|---|
task_id | String | Task identifier |
task_type | String | Task type (e.g., "reindex") |
progress | f64 | Completion fraction (0.0 to 1.0) |
eta_ms | Option<i64> | Estimated time remaining in milliseconds |
indexed_count | usize | Number of items processed so far |
total_count | usize | Total items to process |
message | Option<String> | Human-readable progress message |
Progress is computed using a rolling average of the last 10 batch execution times for ETA calculation.
During an active reindex, query responses include meta.reindexing: true so clients know results may be incomplete.
Cron Scheduling
AeorDB includes a built-in cron scheduler that checks /.config/cron.json every 60 seconds and enqueues matching tasks.
Configuration
Store the cron configuration at /.config/cron.json:
{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
},
{
"id": "hourly-reindex",
"task_type": "reindex",
"schedule": "0 * * * *",
"args": {"path": "/data/"},
"enabled": true
}
]
}
Cron Expression Format
Standard 5-field Unix cron expressions:
minute hour day-of-month month day-of-week
* * * * *
Examples:
- `0 3 * * *` – every day at 3:00 AM
- `*/15 * * * *` – every 15 minutes
- `0 0 * * 0` – every Sunday at midnight
- `30 2 1 * *` – 2:30 AM on the 1st of every month
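A simplified Python sketch of 5-field matching (hedged: this illustration omits ranges, and standard cron treats day-of-month and day-of-week as alternatives when both are restricted; AeorDB's exact semantics may differ):

```python
def field_matches(expr, value, minimum=0):
    """Match one cron field: '*', '*/step', a number, or a comma list."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr, minute, hour, dom, month, dow):
    """Sketch of 5-field matching: minute hour day-of-month month day-of-week.
    Simplified: all five fields must match."""
    fields = expr.split()
    values = [(minute, 0), (hour, 0), (dom, 1), (month, 1), (dow, 0)]
    return all(field_matches(f, v, m) for f, (v, m) in zip(fields, values))

print(cron_matches("0 3 * * *", minute=0, hour=3, dom=15, month=6, dow=4))     # True
print(cron_matches("*/15 * * * *", minute=45, hour=9, dom=1, month=1, dow=1))  # True
```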
Cron API
Create/update the schedule:
curl -X PUT http://localhost:3000/.config/cron.json \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"schedules": [
{
"id": "nightly-gc",
"task_type": "gc",
"schedule": "0 3 * * *",
"args": {},
"enabled": true
}
]
}'
Read the schedule:
curl http://localhost:3000/.config/cron.json \
-H "Authorization: Bearer $API_KEY"
Disable a schedule (set enabled: false and re-upload):
# Fetch, modify, re-upload
Deduplication
The cron scheduler checks whether a task with the same type and arguments is already pending or running before enqueuing. This prevents duplicate tasks from stacking up if a previous run hasn’t finished.
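The check can be sketched as (illustrative only; the task record fields are hypothetical):

```python
def enqueue_if_absent(queue, task_type, args):
    """Sketch of the scheduler's deduplication check: skip the enqueue when a
    task with the same type and args is already pending or running."""
    for task in queue:
        if (task["task_type"] == task_type
                and task["args"] == args
                and task["status"] in ("pending", "running")):
            return False  # duplicate; do not stack another run
    queue.append({"task_type": task_type, "args": args, "status": "pending"})
    return True

queue = [{"task_type": "gc", "args": {}, "status": "running"}]
print(enqueue_if_absent(queue, "gc", {}))                       # False
print(enqueue_if_absent(queue, "reindex", {"path": "/data/"}))  # True
```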
CronSchedule Fields
| Field | Type | Description |
|---|---|---|
id | String | Unique identifier for this schedule |
task_type | String | Task type to enqueue (e.g., "gc", "reindex") |
schedule | String | 5-field Unix cron expression |
args | serde_json::Value | Arguments passed to the task |
enabled | bool | Whether this schedule is active (default true) |
Task Retention
Completed tasks are automatically pruned:
- Tasks older than 24 hours are removed
- At most 100 completed tasks are retained
Pruning runs after each task completes.
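The retention policy can be sketched as (illustrative; the 24-hour and 100-task limits match the documented policy, the record fields are hypothetical):

```python
def prune(completed, now_ms, max_age_ms=24 * 3600 * 1000, max_keep=100):
    """Sketch of completed-task retention: drop tasks older than 24 hours,
    then keep at most the 100 most recent."""
    fresh = [t for t in completed if now_ms - t["created_at"] <= max_age_ms]
    fresh.sort(key=lambda t: t["created_at"], reverse=True)
    return fresh[:max_keep]

now = 1_700_000_000_000
tasks = [
    {"id": "old", "created_at": now - 25 * 3600 * 1000},  # 25h old: pruned
    {"id": "new", "created_at": now - 1 * 3600 * 1000},   # 1h old: kept
]
print([t["id"] for t in prune(tasks, now)])  # ['new']
```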
Events
The task system emits events on the event bus:
| Event | Description |
|---|---|
task.started | A task has begun execution |
task.completed | A task finished successfully |
task.failed | A task encountered an error |
gc.completed | GC-specific completion event with statistics |
See Also
- Garbage Collection – details on the GC mark-and-sweep algorithm
- Reindexing – details on the reindex process, circuit breaker, and checkpoints
Backup & Restore
AeorDB supports exporting database versions as self-contained .aeordb files, creating incremental patches between versions, importing backups, and promoting version hashes.
Concepts
- Full export: A clean .aeordb file containing only the live entries at a specific version. No voids, no deletion records, no stale overwrites, no history.
- Patch (diff): A .aeordb file containing only the changeset between two versions – new/changed chunks, updated file records, updated directory indexes, and deletion records for removed files.
- Import: Applying an export or patch into a target database.
- Promote: Setting a version hash as the current HEAD.
Full Export
Export HEAD, a named snapshot, or a specific version hash as a self-contained backup.
CLI
# Export HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a named snapshot
aeordb export --database data.aeordb --output backup.aeordb --snapshot v1
# Export a specific version hash
aeordb export --database data.aeordb --output backup.aeordb --hash abc123def456...
The output file must not already exist – the command will refuse to overwrite.
HTTP API
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "backup.aeordb"}'
With a snapshot:
curl -X POST http://localhost:3000/admin/export \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "backup.aeordb", "snapshot": "v1"}'
Output
Export complete.
Files: 142
Chunks: 89
Directories: 23
Version: abc123def456...
Diff / Patch
Create an incremental patch containing only the changes between two versions. This is significantly smaller than a full export when only a few files have changed.
CLI
# Diff between two snapshots
aeordb diff --database data.aeordb --output patch.aeordb --from v1 --to v2
# Diff from a snapshot to HEAD
aeordb diff --database data.aeordb --output patch.aeordb --from v1
# Diff using raw hashes
aeordb diff --database data.aeordb --output patch.aeordb --from abc123... --to def456...
The --from and --to arguments accept either snapshot names or hex-encoded version hashes. If --to is omitted, HEAD is used.
HTTP API
curl -X POST http://localhost:3000/admin/diff \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"output": "patch.aeordb", "from": "v1", "to": "v2"}'
Output
Patch created.
Files added: 5
Files modified: 12
Files deleted: 3
Chunks: 8
Directories: 7
From: abc123...
To: def456...
What a Patch Contains
- New chunks: Content chunks that exist in the target version but not the base version
- Added file records: Files present in the target but not the base
- Modified file records: Files that changed between the two versions
- Deletion records: Files present in the base but removed in the target
- Changed directory indexes: Directory entries that differ between versions
Import
Apply a full export or incremental patch to a target database.
CLI
# Import a full export
aeordb import --database data.aeordb --file backup.aeordb
# Import and immediately promote HEAD
aeordb import --database data.aeordb --file backup.aeordb --promote
# Force import a patch even if base version doesn't match
aeordb import --database data.aeordb --file patch.aeordb --force
Flags:
- --promote: Automatically set HEAD to the imported version
- --force: Skip base version verification for patches
HTTP API
curl -X POST http://localhost:3000/admin/import \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"file": "backup.aeordb", "promote": true}'
Patch Base Version Check
When importing a patch, AeorDB verifies that the target database’s current HEAD matches the patch’s base version. If they don’t match, the import fails:
Target database HEAD (aaa111...) does not match patch base version (bbb222...).
Use --force to apply anyway.
Use --force to skip this check if you know what you’re doing.
Output
Full export imported.
Entries: 254
Chunks: 89
Files: 142
Directories: 23
Deletions: 0
Version: abc123...
HEAD has been promoted.
If --promote was not used:
HEAD has NOT been changed.
To promote: aeordb promote --hash abc123...
Promote
Set a specific version hash as the current HEAD.
CLI
aeordb promote --database data.aeordb --hash abc123def456...
The command verifies that the hash exists in the database before promoting.
HTTP API
curl -X POST http://localhost:3000/admin/promote \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"hash": "abc123def456..."}'
Typical Workflows
Regular Backups
# Create a snapshot first
curl -X POST http://localhost:3000/admin/snapshots \
-H "Authorization: Bearer $API_KEY" \
-d '{"name": "daily-2024-01-15"}'
# Export it
aeordb export --database data.aeordb \
--output backups/daily-2024-01-15.aeordb \
--snapshot daily-2024-01-15
Incremental Backups
# First backup: full export
aeordb export --database data.aeordb --output backups/full.aeordb --snapshot v1
# Subsequent backups: just the diff
aeordb diff --database data.aeordb --output backups/patch-v1-v2.aeordb --from v1 --to v2
Restore from Backup
# Import the full backup
aeordb import --database restored.aeordb --file backups/full.aeordb --promote
# Apply incremental patches in order
aeordb import --database restored.aeordb --file backups/patch-v1-v2.aeordb --promote
Migrate Between Servers
# On source server
aeordb export --database data.aeordb --output transfer.aeordb
# Copy to target server
scp transfer.aeordb target-server:/data/
# On target server
aeordb import --database data.aeordb --file transfer.aeordb --promote
See Also
- CLI Commands – full command reference with all flags
- Garbage Collection – clean up orphaned entries after imports
CLI Commands
Complete reference for the aeordb command-line interface.
aeordb start
Start the AeorDB server.
aeordb start [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--port | -p | 3000 | TCP port to listen on |
--database | -D | data.aeordb | Path to the .aeordb database file |
--log-format | | pretty | Log output format: pretty or json
--auth | | (none) | Auth provider URI (see below)
--hot-dir | | (database parent dir) | Directory for write-ahead hot files
--cors | | (disabled) | CORS allowed origins
Auth Modes
The --auth flag accepts several formats:
| Value | Mode | Description |
|---|---|---|
| (not set) | Disabled | No authentication required (dev mode) |
false, null, no, 0 | Disabled | Explicitly disable authentication |
self | Self-contained | AeorDB manages API keys internally |
file:///path/to/identity | File-based | Load identity from a file |
When using self mode, the root API key is printed once on first startup. Save it – it cannot be retrieved again (but can be reset with emergency-reset).
CORS
| Value | Behavior |
|---|---|
| (not set) | CORS disabled |
* | Allow all origins |
https://a.com,https://b.com | Allow specific comma-separated origins |
Examples
# Development mode (no auth, default port)
aeordb start
# Production with auth on port 8080
aeordb start --port 8080 --database /var/lib/aeordb/prod.aeordb --auth self --log-format json
# Custom hot directory and CORS
aeordb start --database data.aeordb --hot-dir /fast-ssd/hot --cors "*"
What Happens on Start
- Opens (or creates) the database file
- Bootstraps the root API key (if --auth self and no key exists yet)
- Resets any tasks left in Running state from a previous crash to Pending
- Starts background workers:
  - Heartbeat: emits DatabaseStats every 15 seconds
  - Cron scheduler: checks /.config/cron.json every 60 seconds
  - Task worker: dequeues and executes background tasks
  - Webhook dispatcher: delivers events to registered webhook URLs
- Binds to the TCP port and begins serving requests
- Shuts down gracefully on CTRL+C
aeordb gc
Run garbage collection to reclaim space from unreachable entries.
aeordb gc [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Path to the .aeordb database file |
--dry-run | | false | Report what would be collected without actually deleting
Examples
# Run GC
aeordb gc --database data.aeordb
# Preview what would be collected
aeordb gc --database data.aeordb --dry-run
Output
AeorDB Garbage Collection
Database: data.aeordb
Versions scanned: 3
Live entries: 1247
Garbage entries: 89
Reclaimed: 1.2 MB
Duration: 0.3s
See Garbage Collection for details on the mark-and-sweep algorithm.
aeordb export
Export a version as a self-contained .aeordb file.
aeordb export [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Source database file |
--output | -o | (required) | Output .aeordb file path |
--snapshot | -s | (none) | Named snapshot to export |
--hash | | (none) | Specific version hash to export (hex-encoded)
If neither --snapshot nor --hash is provided, HEAD is exported.
Examples
# Export HEAD
aeordb export --database data.aeordb --output backup.aeordb
# Export a named snapshot
aeordb export --database data.aeordb --output backup-v1.aeordb --snapshot v1
# Export a specific hash
aeordb export --database data.aeordb --output backup.aeordb --hash abc123def456...
The output file must not already exist.
See Backup & Restore for full backup workflows.
aeordb diff
Create a patch .aeordb containing only the changeset between two versions.
aeordb diff [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Source database file |
--output | -o | (required) | Output patch file path |
--from | | (required) | Base version (snapshot name or hex hash)
--to | | HEAD | Target version (snapshot name or hex hash)
Examples
# Diff between two snapshots
aeordb diff --database data.aeordb --output patch.aeordb --from v1 --to v2
# Diff from a snapshot to HEAD
aeordb diff --database data.aeordb --output patch.aeordb --from v1
# Diff between raw hashes
aeordb diff --database data.aeordb --output patch.aeordb --from abc123... --to def456...
The --from and --to arguments first try snapshot name lookup, then fall back to interpreting the value as a hex-encoded hash.
See Backup & Restore for incremental backup workflows.
aeordb import
Import an export or patch .aeordb file into a target database.
aeordb import [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Target database file |
--file | -f | (required) | Backup or patch file to import |
--force | | false | Skip base version verification for patches
--promote | | false | Automatically set HEAD to the imported version
Examples
# Import a full backup
aeordb import --database data.aeordb --file backup.aeordb
# Import and promote HEAD
aeordb import --database data.aeordb --file backup.aeordb --promote
# Force-import a patch even if base doesn't match
aeordb import --database data.aeordb --file patch.aeordb --force --promote
Patch Base Verification
When importing a patch (backup_type=2), AeorDB verifies that the target database’s HEAD matches the patch’s base version. Use --force to bypass this check.
See Backup & Restore for restore workflows.
aeordb promote
Promote a version hash to HEAD.
aeordb promote [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
--database | -D | data.aeordb | Database file |
--hash | | (required) | Hex-encoded version hash to promote
Examples
aeordb promote --database data.aeordb --hash abc123def456...
The command verifies the hash exists in the database before promoting.
aeordb stress
Run stress tests against a running AeorDB instance.
aeordb stress [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --target | -t | http://localhost:3000 | Target server URL |
| --api-key | -a | (required) | API key for authentication |
| --concurrency | -c | 10 | Number of concurrent workers |
| --duration | -d | 10s | Test duration (e.g., 30s, 5m) |
| --operation | -o | mixed | Operation type: write, read, or mixed |
| --file-size | -s | 1kb | File size for writes (e.g., 512b, 1kb, 1mb) |
| --path-prefix | -p | /stress-test | Path prefix for stress test files |
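The `--duration` and `--file-size` values use short unit suffixes. A hedged sketch of how such values might be parsed; the accepted units (`s`/`m`, `b`/`kb`/`mb`) are inferred from the flag descriptions above, and binary multiples (1kb = 1024 bytes) are an assumption:

```python
import re

def parse_duration(s: str) -> float:
    """'30s' -> 30.0 seconds, '5m' -> 300.0."""
    m = re.fullmatch(r"(\d+)([sm])", s.lower())
    if not m:
        raise ValueError(f"bad duration: {s!r}")
    n, unit = int(m.group(1)), m.group(2)
    return float(n * (60 if unit == "m" else 1))

def parse_size(s: str) -> int:
    """'512b' -> 512 bytes, '1kb' -> 1024, '1mb' -> 1048576 (assumed binary units)."""
    m = re.fullmatch(r"(\d+)(b|kb|mb)", s.lower())
    if not m:
        raise ValueError(f"bad size: {s!r}")
    n, unit = int(m.group(1)), m.group(2)
    return n * {"b": 1, "kb": 1024, "mb": 1024 ** 2}[unit]

assert parse_duration("5m") == 300.0
assert parse_size("1kb") == 1024
```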
Examples
# Quick mixed read/write test
aeordb stress --api-key $API_KEY
# Heavy write test for 5 minutes
aeordb stress --api-key $API_KEY --operation write --concurrency 50 --duration 5m --file-size 10kb
# Read-only test against production
aeordb stress --target https://prod.example.com --api-key $API_KEY --operation read --concurrency 100 --duration 30s
aeordb emergency-reset
Revoke the current root API key and generate a new one. Use this if the root key is lost or compromised.
aeordb emergency-reset [OPTIONS]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --database | -D | (required) | Database file |
| --force | | false | Skip confirmation prompt |
Examples
# Interactive (prompts for confirmation)
aeordb emergency-reset --database data.aeordb
# Non-interactive
aeordb emergency-reset --database data.aeordb --force
What Happens
- Finds all API keys linked to the root user (nil UUID)
- Revokes each one
- Generates a new root API key
- Prints the new key (shown once, save it immediately)
WARNING: This will invalidate the current root API key.
A new root API key will be generated.
Proceed? [y/N]: y
Revoked 1 existing root API key(s).
==========================================================
NEW ROOT API KEY (shown once, save it now!):
aeordb_abc123def456...
==========================================================
This command requires direct file access to the database – it cannot be run over HTTP. It is intended for recovery scenarios where you have lost the root API key.
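The reset sequence listed above (find root keys, revoke, mint a new one) can be sketched as follows. The nil-UUID convention and `aeordb_` key prefix follow the docs; the key list structure and key length are illustrative assumptions:

```python
import secrets

NIL_UUID = "00000000-0000-0000-0000-000000000000"  # the root user

def emergency_reset(keys: list[dict]) -> str:
    """Revoke every key owned by the root user and return a fresh one."""
    revoked = 0
    for key in keys:
        if key["user_id"] == NIL_UUID and not key["revoked"]:
            key["revoked"] = True
            revoked += 1
    new_key = "aeordb_" + secrets.token_hex(24)   # shown once; save it now
    keys.append({"user_id": NIL_UUID, "secret": new_key, "revoked": False})
    print(f"Revoked {revoked} existing root API key(s).")
    return new_key


keys = [{"user_id": NIL_UUID, "secret": "aeordb_old", "revoked": False}]
new = emergency_reset(keys)
assert keys[0]["revoked"] and new.startswith("aeordb_")
```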
See Also
- Garbage Collection – GC algorithm details
- Backup & Restore – backup workflows
- Task System & Cron – background tasks and scheduling
- Reindexing – reindex process details