NEDB Documentation
NEDB is a versioned, time-traveling embedded database — replay-protected, idempotent, relational, and searchable. One hash-chained, nonce-enforced, append-only log is the substrate for idempotency, replay protection, crash recovery, MVCC, and time-travel simultaneously.
What makes it different: most databases store a snapshot of current state. NEDB stores the log of every operation and derives state from it. That one decision makes replay protection, crash recovery, time-travel reads, and on-chain provability all fall out for free — from the same structure.
Key properties
- Replay-protected: every write carries a strictly-monotonic per-client nonce. Stale or duplicate ops are rejected.
- Idempotent: pass an idem key and retries are no-ops; the original result is returned without re-executing.
- MVCC time-travel: read the database exactly as it was at any past sequence with AS OF seq.
- Relational: first-class graph edges with O(1) traversal, and the graph time-travels too.
- Durable: NEDB(path) appends every op to disk (log.aof) and fsyncs it; the database reloads by replaying that log on open.
- Provable: verify() rewalks the BLAKE2b hash chain; the head hash is a commitment to the entire history, anchorable on-chain.
Installation
NEDB ships as a universal Python package — one command, any platform, no compiler needed.
pip install nedb-engine
Verify the install and start the server daemon:
python3 -c "import nedb; print(nedb.__version__)"   # 0.3.1
nedbd   # prints a startup banner: version 0.3.1, http://127.0.0.1:7070, data=./nedb-data, auth=off
Python 3.8+ required. The package ships a universal py3-none-any wheel, so no Rust toolchain or native compilation is needed. The optional Rust core (nedb._native) is an additive roadmap item; the pure-Python engine is the production baseline.
Quickstart
Up and running in under a minute — a durable database with indexes, relations, and time-travel.
from nedb import NEDB

# Durable (persists to disk). Use NEDB() for in-memory.
db = NEDB("./mydata")

# Indexes power fast filtering, sorting, and full-text search
db.create_index("users", "status", "eq")
db.create_index("users", "age", "ordered")
db.create_index("users", "bio", "search")

# Write rows
db.put("users", "alice", {"name": "Alice", "age": 31, "status": "active", "bio": "systems engineer"})
db.put("users", "bob", {"name": "Bob", "age": 40, "status": "active", "bio": "database architect"})

# Idempotent write: safe to retry forever
db.put("orders", "o1", {"total": 42}, client="checkout", nonce=7, idem="charge-o1")

# Query with NQL
db.query('FROM users WHERE status = "active" ORDER BY age DESC')
db.query('FROM users SEARCH "systems"')

# Relations + traversal
db.link("users:alice", "follows", "users:bob")
db.q("users").where("_id", "=", "alice").traverse("follows").run()

# Time-travel
snap = db.seq
db.put("users", "alice", {"age": 32, "status": "active"})
db.get("users", "alice", as_of=snap)["age"]  # → 31

# Integrity
assert db.verify()              # hash chain intact
assert db.verify_determinism()  # state == replay(log)

db.close()
Core Concepts
The append-only log
Every mutation (put, delete, link, put_file) creates an Op that is appended to the OpLog. Each Op is chained to the previous one via a BLAKE2b hash — the head hash is a cryptographic commitment to the entire history. Nothing is ever rewritten or deleted from the log.
State is a pure function of the log. The MVCC store, relations, and indexes are materialized views — they are rebuilt by replaying the log, which means crash recovery and time-travel are free side effects of the same design.
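To make the hash-chain idea concrete, here is a minimal, standalone sketch. It is not NEDB's implementation: the op layout, the `GENESIS` placeholder, and the helper names `op_hash`, `append`, and `verify` are all assumptions chosen for illustration. Only the use of BLAKE2b and the "each op commits to the previous one" structure come from the description above.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder meaning "no previous op"


def op_hash(prev_hash, payload):
    """Hash an op's payload together with the previous op's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(prev_hash.encode("ascii") + body, digest_size=32).hexdigest()


def append(log, payload):
    """Append a new op that commits to everything before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"seq": len(log), "payload": payload,
                "prev_hash": prev, "hash": op_hash(prev, payload)})


def verify(log):
    """Rewalk the chain; an edited or reordered op breaks a link."""
    prev = GENESIS
    for position, op in enumerate(log):
        if op["seq"] != position or op["prev_hash"] != prev:
            return False
        if op["hash"] != op_hash(prev, op["payload"]):
            return False
        prev = op["hash"]
    return True


log = []
append(log, {"op": "put", "key": "users:alice", "doc": {"age": 31}})
append(log, {"op": "put", "key": "users:bob", "doc": {"age": 40}})
intact = verify(log)

log[0]["payload"]["doc"]["age"] = 99  # tamper with history
tampered_ok = verify(log)
print(intact, tampered_ok)  # True False
```

In this toy version, detecting a truncated tail would additionally require comparing the last hash against a head value stored elsewhere, which is the role the head commitment plays in the text.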
Sequences and time-travel
Every Op gets a monotonically-increasing seq — an integer starting at 0. db.seq returns the current sequence number. Any read can be made time-traveling by passing as_of=seq: the engine replays or truncates the log to that point and returns the result that was true then.
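One way to picture an as_of read is a store that keeps every version of a key tagged with the seq that wrote it. The sketch below is a hypothetical toy (the class `VersionedStore` and its layout are assumptions, and it keeps version lists rather than replaying a log), but it shows why a past read is just "the last version at or before that seq".

```python
class VersionedStore:
    """Toy multi-version store: every write is kept, tagged with its seq."""

    def __init__(self):
        self.seq = -1        # no ops yet; the first op gets seq 0
        self.versions = {}   # key -> list of (seq, value); value None means deleted

    def put(self, key, value):
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))
        return self.seq

    def delete(self, key):
        return self.put(key, None)

    def get(self, key, as_of=None):
        limit = self.seq if as_of is None else as_of
        result = None
        for seq, value in self.versions.get(key, []):
            if seq > limit:
                break        # versions are appended in seq order
            result = value
        return result


store = VersionedStore()
store.put("users:alice", {"age": 31})
snap = store.seq
store.put("users:alice", {"age": 32})
store.delete("users:alice")

print(store.get("users:alice", as_of=snap))  # {'age': 31}
print(store.get("users:alice"))              # None
```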
Replay protection and idempotency
Each write takes a client identifier and a nonce. The nonce must strictly exceed the last nonce seen from that client — stale or repeated ops raise ReplayError. If you also pass an idem key, the very first successful write is recorded; subsequent calls with the same key return the original result and append nothing to the log.
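The two checks described above can be sketched in a few lines. This is an illustrative model only: `WriteGate` and its fields are hypothetical, and checking the idem key before the nonce is an assumption (it is what lets a retry with the same nonce return the original result instead of raising).

```python
class ReplayError(Exception):
    """Raised for a stale or repeated nonce (name borrowed from the text)."""


class WriteGate:
    def __init__(self):
        self.last_nonce = {}    # client -> highest nonce accepted so far
        self.idem_results = {}  # idem key -> result of the first successful write
        self.log = []

    def write(self, client, nonce, payload, idem=None):
        # Assumed order: an already-used idem key short-circuits everything.
        if idem is not None and idem in self.idem_results:
            return self.idem_results[idem]
        if nonce <= self.last_nonce.get(client, -1):
            raise ReplayError(f"nonce {nonce} is not newer than the last one from {client!r}")
        self.last_nonce[client] = nonce
        self.log.append((client, nonce, payload))
        result = {"seq": len(self.log) - 1, **payload}
        if idem is not None:
            self.idem_results[idem] = result
        return result


gate = WriteGate()
first = gate.write("checkout", 7, {"total": 42}, idem="charge-o1")
retry = gate.write("checkout", 7, {"total": 42}, idem="charge-o1")  # no-op

try:
    gate.write("checkout", 7, {"total": 1})  # same nonce, no idem key
    rejected = False
except ReplayError:
    rejected = True

print(retry == first, len(gate.log), rejected)  # True 1 True
```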
Collections and keys
NEDB is schema-agnostic. A "collection" is just a string namespace prefix. Every document has an _id field (set automatically from the id argument if missing). Internal store keys are formatted as collection:id.
Relations
Edges are stored as (from, relation, to) triples — the "from" and "to" are full node keys in collection:id form. Relations also time-travel: neighbors(frm, rel, as_of=seq) returns the edges that existed at that sequence.
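A time-traveling edge can be modeled by recording the seq at which it was added and, if applicable, removed. The sketch below is a hypothetical illustration (`EdgeStore` is not an NEDB class, and it scans a list rather than offering O(1) traversal); it only demonstrates the visibility rule.

```python
class EdgeStore:
    """Toy edge store: each edge remembers when it was added and removed."""

    def __init__(self):
        self.seq = -1
        self.edges = []  # each edge: [frm, rel, to, added_seq, removed_seq]

    def link(self, frm, rel, to):
        self.seq += 1
        self.edges.append([frm, rel, to, self.seq, None])

    def unlink(self, frm, rel, to):
        self.seq += 1
        for edge in self.edges:
            if edge[:3] == [frm, rel, to] and edge[4] is None:
                edge[4] = self.seq

    def neighbors(self, frm, rel, as_of=None):
        at = self.seq if as_of is None else as_of
        # Visible at `at` if added by then and not yet removed at that point.
        return [to for f, r, to, added, removed in self.edges
                if f == frm and r == rel and added <= at
                and (removed is None or removed > at)]


graph = EdgeStore()
graph.link("users:alice", "follows", "users:bob")
graph.link("users:alice", "follows", "users:carol")
snap = graph.seq
graph.unlink("users:alice", "follows", "users:bob")

print(graph.neighbors("users:alice", "follows"))              # ['users:carol']
print(graph.neighbors("users:alice", "follows", as_of=snap))  # ['users:bob', 'users:carol']
```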
NEDB()
The main database object. Instantiate once; keep open for the lifetime of your application.
| Parameter | Type | Description |
|---|---|---|
| path | str or None | Directory path for durable storage. If None (default) the database is in-memory only and nothing is written to disk. |
db = NEDB()          # in-memory
db = NEDB("./data")  # durable: creates ./data/log.aof + meta.json

# Use as a context manager for automatic close/flush:
with NEDB("./data") as db:
    db.put("k", "1", {"v": 1})
put()
Insert or replace a document. If a document with the same id already exists it is fully replaced. Returns the stored document (with _id set).
| Parameter | Type | Description |
|---|---|---|
| coll | str | Collection name (e.g. "users") |
| id | str | Document identifier — unique within the collection |
| doc | dict | The document to store. _id is set automatically. |
| client | str | Client identifier for nonce tracking. Default: "local" |
| nonce | int or None | Monotonic write counter for this client. Auto-incremented if None. |
| idem | str or None | Idempotency key. If this key was already used, the original result is returned and nothing is appended. |
# Simple write
doc = db.put("users", "alice", {"name": "Alice", "age": 31})

# Idempotent write: calling this 100 times = calling it once
db.put("orders", "o1", {"total": 99}, client="api", nonce=1, idem="order-o1-v1")

# Explicit nonce (replay-protected from a distributed service)
db.put("events", "e1", {"type": "click"}, client="tracker", nonce=42)
delete()
Remove a document. The deletion is appended to the log as a delete op, so history is preserved and the document remains visible at past sequences via AS OF.
db.delete("users", "alice")
get()
Retrieve a single document by id. Returns None if the document does not exist (or did not exist at as_of).
doc = db.get("users", "alice")           # current HEAD
old = db.get("users", "alice", as_of=3)  # as it was at seq 3
query()
Execute a NQL query string. Returns a list of matching documents. See the NQL Reference for the full grammar.
rows = db.query('FROM users WHERE status = "active" ORDER BY age DESC LIMIT 10')
found = db.query('FROM users SEARCH "engineer"')
old = db.query('FROM users AS OF 5 WHERE age > 25')
create_index()
Create a secondary index on a collection field. Indexes are maintained incrementally on every write and dramatically speed up queries. Existing rows are backfilled immediately.
| kind | Type | Use for |
|---|---|---|
| "eq" | Hash map | Equality filters: WHERE field = value |
| "ordered" | Sorted list | Range queries and sorting: ORDER BY field, WHERE field > n |
| "search" | Inverted index | Full-text search: SEARCH "term" |
db.create_index("users", "status", "eq")           # WHERE status = "active"
db.create_index("users", "created_at", "ordered")  # ORDER BY created_at DESC
db.create_index("users", "bio", "search")          # SEARCH "engineer"
Create indexes before seeding data when possible. Creating them afterwards also works, because existing rows are backfilled. Index configuration is persisted in meta.json (durable mode) so indexes survive restarts.
link() / unlink()
Create or remove a directed graph edge. Node keys are in collection:id format.
neighbors() / inbound()
Traverse edges from a node. Returns a list of node key strings (collection:id). Both support time-travel via as_of.
db.link("users:alice", "follows", "users:bob")
db.link("users:alice", "follows", "users:carol")

db.neighbors("users:alice", "follows")  # ["users:bob", "users:carol"]
db.inbound("users:bob", "follows")      # ["users:alice"]

snap = db.seq
db.unlink("users:alice", "follows", "users:bob")
db.neighbors("users:alice", "follows", as_of=snap)  # ["users:bob", "users:carol"] (time-travel)
put_file() / get_file()
Git-style versioned file storage with Cascade compression: content-defined chunking, content-addressed dedup, and two compression tiers ("warm" fast / "cold" archival). Every version has a Merkle root.
data = open("notes.txt", "rb").read()
new_data = data + b"\nmore notes"  # illustrative second version of the file

v1 = db.put_file("notes.txt", data)                   # warm tier (fast)
v2 = db.put_file("notes.txt", new_data, tier="cold")  # cold tier (max compression)

db.get_file("notes.txt", v1)    # original bytes back
db.file_root("notes.txt", v1)   # Merkle root, anchorable on ITC
db.compression_stats("warm")    # {"ratio": 39.9, "dedup_hits": 20, ...}
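Content-defined chunking and content-addressed dedup can be illustrated with a small standalone model. Everything here is an assumption made for the sketch: the rolling value, the boundary mask, the minimum chunk size, and the `ChunkStore` class are not taken from Cascade, and compression tiers and Merkle roots are left out.

```python
import hashlib


def chunk(data, mask=0x3F, min_size=16):
    """Cut bytes where the low bits of a toy rolling value hit a pattern.

    The low six bits depend only on the last few bytes seen, so boundaries
    follow the content rather than fixed offsets.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Content-addressed storage: identical chunks are stored once."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes
        self.dedup_hits = 0

    def put_file(self, data):
        manifest = []
        for piece in chunk(data):
            digest = hashlib.blake2b(piece, digest_size=32).hexdigest()
            if digest in self.chunks:
                self.dedup_hits += 1
            else:
                self.chunks[digest] = piece
            manifest.append(digest)
        return manifest      # ordered list of chunk digests = one file version

    def get_file(self, manifest):
        return b"".join(self.chunks[d] for d in manifest)


data = bytes((i * 31 + i // 7) % 256 for i in range(4000))
store = ChunkStore()
v1 = store.put_file(data)
unique_after_first = len(store.chunks)
v2 = store.put_file(data)  # same content: every chunk is a dedup hit

print(store.get_file(v1) == data, len(store.chunks) == unique_after_first)  # True True
```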
verify() / verify_determinism()
verify(): Rewalks the entire BLAKE2b hash chain and confirms that no op has been modified, reordered, or deleted. Returns True if the chain is intact. Run this after loading from disk to confirm persistence was not corrupted.
verify_determinism(): Replays the log from scratch into fresh state and compares the result with the current materialized state. Returns True if they match, which shows that state is a pure function of the log.
Properties
| Property | Type | Description |
|---|---|---|
| db.seq | int | Current sequence number (number of ops − 1) |
| db.head | str | Current chain head: BLAKE2b hex digest of the entire history |
Lifecycle
flush() forces an fsync without closing. close() flushes and closes the AOF file handle. Always call close() (or use the context manager) before exiting in durable mode.
Fluent Builder — q()
An alternative to raw NQL strings when you want to compose queries programmatically.
db.q(collection) returns a Query builder. Chain methods and call .run() to execute.
| Method | NQL equivalent |
|---|---|
| .where(field, op, value) | WHERE field op value |
| .as_of(seq) | AS OF seq |
| .search(text) | SEARCH "text" |
| .order_by(field, "DESC") | ORDER BY field DESC |
| .traverse(rel) | TRAVERSE rel |
| .limit(n) | LIMIT n |
| .run() | (executes and returns list[dict]) |
results = (
db.q("users")
.where("status", "=", "active")
.where("age", ">=", 25)
.order_by("age", "DESC")
.limit(10)
.run()
)
NQL Reference
NQL (the NEDB Query Language) is a small, readable query syntax. There is one grammar and one parser; the query() method and the q() builder both compile to the same plan.
Full grammar
FROM <collection>
[ AS OF <seq> ]
[ WHERE <field> <op> <value> ( AND <field> <op> <value> )* ]
[ SEARCH "<text>" ]
[ ORDER BY <field> [ ASC | DESC ] ]
[ TRAVERSE <relation> ]
[ LIMIT <n> ]
op ∈ = != < <= > >=
String values use double quotes. Numbers are unquoted.
FROM
Required. Specifies the collection to query.
db.query('FROM users')            # all users
db.query('FROM orders LIMIT 20')  # first 20 orders
WHERE
Filter rows. Multiple conditions are combined with AND. All six comparison operators are supported. Equality filters use indexed lookups when an eq index exists.
db.query('FROM users WHERE status = "active"')
db.query('FROM users WHERE age >= 25 AND status = "active"')
db.query('FROM orders WHERE total > 100 AND status != "cancelled"')
Index hint: equality conditions on indexed fields (eq index) skip the collection scan entirely. Create an eq index on any field you filter by frequently.
SEARCH
Full-text search across all fields that have a search index. All terms in the query string must appear (AND semantics). Without an index, NEDB falls back to a linear scan, which is still correct, just slower.
db.create_index("users", "bio", "search")
db.query('FROM users SEARCH "rust"')
db.query('FROM users SEARCH "systems engineer"')  # both terms must match
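The AND semantics of SEARCH can be pictured with a toy inverted index: each term maps to the set of documents containing it, and a multi-term query intersects those sets. The tokenizer and the `SearchIndex` class below are assumptions for illustration, not NEDB's actual index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return set()
        # AND semantics: a doc must appear in every term's posting set.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = SearchIndex()
index.add("alice", "systems engineer")
index.add("bob", "database architect and systems thinker")

print(sorted(index.search("systems")))           # ['alice', 'bob']
print(sorted(index.search("systems engineer")))  # ['alice']
```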
ORDER BY
Sort results by a field. Default direction is ASC. An ordered index makes sorting faster but is not required.
db.query('FROM users ORDER BY age ASC')
db.query('FROM users ORDER BY created_at DESC LIMIT 10')
db.query('FROM orders WHERE status = "paid" ORDER BY total DESC')
TRAVERSE
Follow a relation from the result set. First filters/sorts rows in the FROM collection, then follows the named edge to the target collection. Returns documents from the target collection.
# Who does alice follow?
db.query('FROM users WHERE _id = "alice" TRAVERSE follows')

# All work orders for active projects
db.query('FROM projects WHERE status = "active" TRAVERSE contains')
AS OF
Time-travel read. Execute the query against the database as it existed at the given sequence number. Works with all other clauses.
snap = db.seq
db.put("users", "alice", {"age": 32})

# alice is 31 at the snapshot, 32 at HEAD
db.query(f'FROM users AS OF {snap} WHERE _id = "alice"')
db.query('FROM users AS OF 0')  # empty: nothing existed at seq 0
LIMIT
Truncate the result set. Applied after all other clauses (WHERE, SEARCH, ORDER BY, TRAVERSE).
db.query('FROM users ORDER BY created_at DESC LIMIT 5')
db.query('FROM users WHERE status = "active" LIMIT 100')
Durable Mode
Pass a directory path to NEDB() to make the database durable. Every op is appended to disk immediately; NEDB uses the same model as Redis AOF persistence.
# Session 1: write data
db = NEDB("./mydata")
db.create_index("users", "status", "eq")
db.put("users", "alice", {"name": "Alice", "status": "active"})
db.close()  # flush + fsync

# Session 2: reopen (replays log.aof, rebuilds state)
db = NEDB("./mydata")
assert db.verify()  # chain intact across the restart
assert db.get("users", "alice")["name"] == "Alice"
Always call db.close() (or use the context manager) before exiting. Unflushed writes are in an OS buffer that may not be visible after an unclean shutdown. close() calls fsync() so the AOF is fully durable.
What gets written to disk
Two files are created in the data directory:
| File | Contents |
|---|---|
| log.aof | One JSON line per op, in append order. Contains the full Op including seq, client, nonce, op type, payload, timestamp, prev_hash, and hash. Never rewritten, only appended. |
| meta.json | The index configuration: list of [coll, field, kind] tuples. Updated on create_index(). Loaded first on open so indexes are rebuilt during log replay. |
On open, NEDB:
- Loads meta.json and registers the index configuration.
- Reads log.aof line by line and folds each Op into the MVCC store, relations, and indexes.
- Restores the nonce counters and idempotency map from the ops themselves.
- Opens log.aof in append mode, so all new ops go to the end.
The hash chain is preserved verbatim (hashes are not recomputed on load), so verify() and the head commitment survive restarts exactly.
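The open sequence above amounts to folding a JSON-lines file into in-memory state. The following is a hedged sketch of that idea, not the engine's code: the op field names (`type`, `key`, `doc`, `idem`) are assumptions, and indexes, relations, and hashes are omitted.

```python
import json
import os
import tempfile


def append_op(path, op):
    """Append one op as a JSON line and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(path):
    """Rebuild state, nonce counters, and the idem map from the log alone."""
    state, last_nonce, idem_seen = {}, {}, {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            op = json.loads(line)
            if op["type"] == "put":
                state[op["key"]] = op["doc"]
            elif op["type"] == "delete":
                state.pop(op["key"], None)
            last_nonce[op["client"]] = op["nonce"]
            if op.get("idem"):
                idem_seen[op["idem"]] = op["seq"]
    return state, last_nonce, idem_seen


with tempfile.TemporaryDirectory() as data_dir:
    aof = os.path.join(data_dir, "log.aof")
    append_op(aof, {"seq": 0, "client": "local", "nonce": 1, "type": "put",
                    "key": "users:alice", "doc": {"name": "Alice"}})
    append_op(aof, {"seq": 1, "client": "api", "nonce": 7, "type": "put",
                    "key": "orders:o1", "doc": {"total": 42}, "idem": "charge-o1"})
    append_op(aof, {"seq": 2, "client": "local", "nonce": 2, "type": "delete",
                    "key": "users:alice"})
    state, last_nonce, idem_seen = replay(aof)

print(state)       # {'orders:o1': {'total': 42}}
print(last_nonce)  # {'local': 2, 'api': 7}
```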
nedbd — Server Daemon
Run NEDB as a long-lived process and connect clients over HTTP — the way you'd run Redis or Postgres. Each named database is a durable NEDB(path) held open in memory.
nedbd
# nedbd 0.3.1 — http://127.0.0.1:7070 data=./nedb-data auth=off
Environment variables
| Variable | Default | Description |
|---|---|---|
| NEDBD_HOST | 127.0.0.1 | Bind address |
| NEDBD_PORT | 7070 | Bind port |
| NEDBD_DATA | ./nedb-data | Root directory for database files. Each database is a subdirectory. |
| NEDBD_TOKEN | unset | Optional bearer token. If set, every /v1/* request must include Authorization: Bearer <token>. |
NEDBD_PORT=9000 NEDBD_DATA=/var/db/nedb NEDBD_TOKEN=my-secret nedbd
HTTP Routes
All request/response bodies are JSON. Authentication (if NEDBD_TOKEN is set) uses Authorization: Bearer <token> on every /v1/* request.
| Route | Description |
|---|---|
| GET /health | Ping. Returns {"ok": true, "version": "0.3.1", "databases": [...]}. No auth required. |
| GET /v1/databases | List all databases with summary (name, seq, head, rows, collections). |
| POST /v1/databases | Create a database. Body: {"name": "shop", "init": {...}}. init is optional; see the init payload below. |
| GET /v1/databases/:name | Full detail: collections, indexes, seq, head, integrity check, recent log. |
| DELETE /v1/databases/:name | Drop the database and delete its files. Irreversible. |
| POST /v1/databases/:name/query | Run NQL. Body: {"nql": "FROM users LIMIT 10"}. Returns {"rows": [...], "count": N, "seq": N, "head": "..."} |
| POST /v1/databases/:name/put | Insert or replace a row. Body: {"coll": "users", "id": "u1", "doc": {...}, "client"?, "nonce"?, "idem"?} |
| POST /v1/databases/:name/index | Add an index. Body: {"coll": "users", "field": "status", "kind": "eq"} |
| POST /v1/databases/:name/link | Create a graph edge. Body: {"frm": "users:u1", "rel": "follows", "to": "users:u2"} |
| DELETE /v1/databases/:name/rows/:coll/:id | Delete a row by collection and id. Appended to the log. |
| GET /v1/databases/:name/verify | Re-walk the hash chain. Returns {"ok": true, "seq": N, "head": "..."} |
| GET /v1/databases/:name/log?limit=N | Recent log entries (newest first). Default limit: 50. |
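For clients that prefer the standard library, the routes above can be wrapped in a small helper. This is a sketch under assumptions: `build_request` and `call` are hypothetical names, and only the paths, JSON bodies, and the Authorization header format come from the table. Building the request is kept separate from sending it so the request can be inspected without a running server.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:7070"  # default host and port from the table of variables


def build_request(base, path, body=None, token=None, method=None):
    """Build a JSON request; adds the bearer header only when a token is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # With method=None, urllib picks POST when data is present and GET otherwise.
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)


def call(request):
    """Send the request and decode the JSON response (needs a running nedbd)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


req = build_request(BASE, "/v1/databases/myapp/query",
                    body={"nql": "FROM users LIMIT 10"}, token="my-secret")
print(req.get_method(), req.full_url)
# rows = call(req)["rows"]   # uncomment when a server is listening
```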
The init payload
When creating a database you can seed it in one call:
{
"name": "shop",
"init": {
"indexes": [
["users", "status", "eq"],
["orders", "total", "ordered"]
],
"seed": {
"users": [{"id": "u1", "name": "Ada", "status": "active"}]
},
"links": [
["users:u1", "placed", "orders:o1"]
]
}
}
curl Examples
curl -X POST http://localhost:7070/v1/databases \
  -H 'Content-Type: application/json' \
  -d '{"name":"myapp"}'

curl -X POST http://localhost:7070/v1/databases/myapp/query \
  -H 'Content-Type: application/json' \
  -d '{"nql":"FROM users WHERE status = \"active\" ORDER BY age DESC LIMIT 5"}'

curl -X POST http://localhost:7070/v1/databases/myapp/put \
  -H 'Content-Type: application/json' \
  -d '{"coll":"users","id":"u1","doc":{"name":"Ada","status":"active","age":31}}'

curl http://localhost:7070/v1/databases/myapp/verify
# {"ok": true, "seq": 1, "head": "a3f..."}
curl http://localhost:7070/v1/databases \
-H 'Authorization: Bearer my-secret'
Node / JavaScript
Connect to a running nedbd instance over HTTP. No native addon is required; plain fetch is enough.
const BASE = 'http://127.0.0.1:7070'

async function query(db, nql) {
  const res = await fetch(`${BASE}/v1/databases/${db}/query`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ nql })
  })
  return (await res.json()).rows
}

async function put(db, coll, id, doc) {
  const res = await fetch(`${BASE}/v1/databases/${db}/put`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ coll, id, doc })
  })
  return res.json()
}

// Usage
const users = await query('myapp', 'FROM users WHERE status = "active"')
await put('myapp', 'users', 'u2', { name: 'Bo', status: 'active', age: 28 })
Python requests
import requests

BASE = "http://127.0.0.1:7070"

rows = requests.post(
    f"{BASE}/v1/databases/myapp/query",
    json={"nql": 'FROM users WHERE status = "active" ORDER BY age DESC'}
).json()["rows"]

requests.post(
    f"{BASE}/v1/databases/myapp/put",
    json={"coll": "users", "id": "u3", "doc": {"name": "Cy", "status": "active"}}
)
SQL Compatibility
Translate standard SQL to NQL and NEDB operations, with no MySQL or MariaDB code involved. SQL is a familiar entry point; the NEDB engine executes everything natively.
from nedb import NEDB
from nedb.sql import sql_exec, sql_to_nql

db = NEDB("./data")
snap = db.seq  # sequence to time-travel back to in the AS OF example

# SELECT → NQL query
sql_exec(db, "SELECT * FROM users WHERE status = 'active' ORDER BY age DESC LIMIT 10")
sql_exec(db, "SELECT * FROM users WHERE bio LIKE '%rust%'")  # LIKE → SEARCH
sql_exec(db, f"SELECT * FROM orders AS OF {snap}")           # NEDB extension: time-travel

# INSERT → db.put()
sql_exec(db, "INSERT INTO users (id, name, age, status) VALUES ('u1', 'Ada', 31, 'active')")

# UPDATE → fetch + merge + db.put() (all other fields preserved)
sql_exec(db, "UPDATE users SET age = 32 WHERE id = 'u1'")

# DELETE → db.delete()
sql_exec(db, "DELETE FROM users WHERE id = 'u1'")

# See what NQL a SELECT compiles to:
sql_to_nql("SELECT * FROM users WHERE status = 'active' ORDER BY age DESC LIMIT 5")
# → 'FROM users WHERE status = "active" ORDER BY age DESC LIMIT 5'
Supported SQL
| Statement | Mapped to | Notes |
|---|---|---|
| SELECT * | NQL FROM … WHERE … ORDER BY … LIMIT | All columns returned (no projection yet) |
| WHERE field = 'v' | WHERE field = "v" | All six ops: = != < <= > >= |
| WHERE field LIKE '%x%' | SEARCH "x" | Best-effort; requires a search index |
| AS OF n | AS OF n | NEDB extension: time-travel in SQL |
| INSERT INTO t (id, …) VALUES (…) | db.put() | An id or _id column is required |
| UPDATE t SET f=v WHERE id = x | fetch + merge + db.put() | Only id-targeted updates; all other fields preserved |
| DELETE FROM t WHERE id = x | db.delete() | Only id-targeted deletes |
OR, JOIN, subqueries, and aggregates raise SQLUnsupportedError with a clear message. They are on the roadmap.
Redis Compatibility
Map Redis commands to NEDB primitives, with no hiredis or Redis server code involved. Redis keys become NEDB collections; the engine handles persistence, integrity, and time-travel automatically.
from nedb import NEDB
from nedb.redis_compat import RedisCompat

r = RedisCompat(NEDB("./data"))

# Strings
r.execute("SET", "k", "hello")         # → "OK"
r.execute("GET", "k")                  # → "hello"
r.execute("INCR", "counter")           # → 1
r.execute("MSET", "a", "1", "b", "2")  # → "OK"

# Hashes (Redis key names with colons work: "user:1")
r.execute("HSET", "user:1", "name", "Ada", "age", "31")
r.execute("HGET", "user:1", "name")    # → "Ada"
r.execute("HGETALL", "user:1")         # → {"name": "Ada", "age": "31"}

# Sets
r.execute("SADD", "tags", "python", "rust")
r.execute("SMEMBERS", "tags")           # → {"python", "rust"}
r.execute("SISMEMBER", "tags", "rust")  # → 1

# Lists
r.execute("RPUSH", "q", "a", "b", "c")
r.execute("LRANGE", "q", 0, -1)  # → ["a", "b", "c"]
r.execute("LPOP", "q")           # → "a"

# Unsupported: clear error + roadmap link
r.execute("EXPIRE", "k", 60)     # → RedisUnsupportedError
Command coverage
| Group | Commands |
|---|---|
| Strings | SET GET DEL EXISTS INCR INCRBY DECR DECRBY MSET MGET SETNX GETDEL APPEND STRLEN TYPE RENAME KEYS DBSIZE FLUSHDB |
| Hashes | HSET HMSET HSETNX HGET HMGET HGETALL HDEL HEXISTS HKEYS HVALS HLEN HINCRBY |
| Sets | SADD SMEMBERS SISMEMBER SREM SCARD SUNION SINTER SDIFF |
| Lists | LPUSH RPUSH LRANGE LLEN LINDEX LSET LPOP RPOP |
| Unsupported (roadmap) | EXPIRE TTL PEXPIRE PTTL PERSIST · SUBSCRIBE PUBLISH · MULTI EXEC DISCARD WATCH |
Key encoding: Redis key names that contain characters invalid in NQL collection names (e.g. user:1) are stored using a safe hex encoding (user__3a__1). This is transparent: you always use the original key name in commands.
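The user:1 → user__3a__1 mapping suggests an escape of the form `__<hex code point>__`. The functions below are a guess at such a scheme for illustration only: which characters count as valid, and how names that already contain a literal `__3a__` are handled, are assumptions and are not covered here.

```python
import re

_SAFE = re.compile(r"[A-Za-z0-9_]")      # assumed set of characters left as-is
_ESCAPED = re.compile(r"__([0-9a-f]+)__")


def encode_key(key):
    """Replace every unsafe character with __<hex code point>__."""
    return "".join(c if _SAFE.fullmatch(c) else f"__{ord(c):02x}__" for c in key)


def decode_key(name):
    """Reverse encode_key for names produced by it."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), name)


print(encode_key("user:1"))       # user__3a__1
print(decode_key("user__3a__1"))  # user:1
```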
wrap_redis() — Redis Layer-2
Already running on Redis? Wrap your connection in one line and add NEDB's full feature set alongside your existing app, with zero migration and zero impact on your existing keys.
Two surfaces coexist on the same connection:
- Surface 1: every Redis command (r.set, r.hset, r.get, …) passes through to Redis unchanged. Alice's app code doesn't change.
- Surface 2: r.nedb.* exposes the full NEDB API: NQL queries, time-travel, causal provenance, hash chain verification.
import redis, json
from nedb import wrap_redis

# ONE LINE: Alice's app code doesn't change
r = wrap_redis(redis.Redis("localhost", 6379), db_name="rideshare")

# Surface 1: existing Redis commands, unchanged
r.set("driver:d1", json.dumps({"name": "Bob", "status": "active"}))
r.get("driver:d1")  # → b'{"name": "Bob", ...}'
r.hset("trip:t1", mapping={"status": "requested"})

# Surface 2: new NEDB features on the same connection
r.nedb.put("driver", "d1", {"name": "Bob", "status": "active"})
r.nedb.query('FROM driver WHERE status = "active"')
r.nedb.verify()  # → True
Isolation guarantee
NEDB never writes to Alice's namespace. It owns only keys with the prefix nedb:{db_name}: (note the trailing colon):
| Key | Type | Purpose |
|---|---|---|
| nedb:{db_name}:oplog | Redis Stream | Append-only op log (in-process mode) |
| nedb:{db_name}:snapshot | Redis Hash | Checkpoint |
| nedb:{db_name}:meta | Redis Hash | Index configuration |
wrap_redis() signature
wrap_redis(
r, # redis.Redis (or compatible) connection
db_name: str = "default", # logical name; NEDB uses nedb:{db_name}:*
nedbd_url: str = None, # route r.nedb.* to a nedbd server (see below)
nedbd_token: str = None, # bearer token for nedbd auth (optional)
) → WrappedRedis
Local testing — no Redis server needed: use fakeredis as a drop-in. See examples/fakeredis_demo.py in the repo.
Backfill — import existing Redis data
One-time SCAN of Alice's existing Redis keys into NEDB's hash chain. After backfill, all historical data is queryable via NQL, time-travelable, and verified.
Three-step migration
# Step 1 (register): map Redis key globs to NEDB collections (chainable)
(r.nedb
    .register("driver:*", collection="driver", value_parser=json.loads)
    .register("trip:*", collection="trip", value_type="hash")
)

# Step 2 (backfill): import all existing Redis data once
imported = r.nedb.backfill()  # → int (number of keys imported)

# Step 3 (shadow): all future surface-1 writes auto-chain
r.nedb.shadow_writes = True
register() — collection mapping
| Param | Type | Description |
|---|---|---|
| pattern | str | Redis key glob, e.g. "driver:*" |
| collection | str | NEDB collection name |
| id_extractor | callable | fn(key) → id. Default: key.rsplit(":", 1)[-1] |
| value_parser | callable | fn(raw) → dict. Default: JSON decode, fallback to {"_v": raw} |
| value_type | str | "string" (default) · "hash" · "json" |
Returns self for chaining.
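The defaults in the table can be read as the following standalone sketch. It is an approximation for illustration: the `Registry` class is hypothetical, Python's `fnmatch` stands in for Redis glob matching (the two are similar but not identical), and wrapping non-dict JSON values in {"_v": raw} is an assumption about the fallback.

```python
import fnmatch
import json


def default_id(key):
    """Default id extractor from the table: text after the last colon."""
    return key.rsplit(":", 1)[-1]


def default_parser(raw):
    """JSON decode; fall back to {"_v": raw} when that does not yield a dict."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return {"_v": raw}
    return value if isinstance(value, dict) else {"_v": raw}


class Registry:
    def __init__(self):
        self.rules = []

    def register(self, pattern, collection,
                 id_extractor=default_id, value_parser=default_parser):
        self.rules.append((pattern, collection, id_extractor, value_parser))
        return self  # returning self is what makes calls chainable

    def route(self, key, raw):
        """Map a Redis key and raw value to (collection, id, doc), or None."""
        for pattern, collection, id_extractor, value_parser in self.rules:
            if fnmatch.fnmatchcase(key, pattern):
                return collection, id_extractor(key), value_parser(raw)
        return None


registry = Registry().register("driver:*", "driver").register("trip:*", "trip")

print(registry.route("driver:d1", '{"name": "Bob"}'))  # ('driver', 'd1', {'name': 'Bob'})
print(registry.route("trip:t1", "not json"))           # ('trip', 't1', {'_v': 'not json'})
print(registry.route("zone:z1", "{}"))                 # None
```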
backfill() — one-pass SCAN import
# Backfill all registered patterns
imported = r.nedb.backfill()

# Or backfill a single pattern directly (no prior register needed)
imported = r.nedb.backfill("zone:*", "zone", value_parser=json.loads)
Every imported record gets _source: "backfill" and _evidence: "backfill" in the hash chain entry. Returns total keys imported (int).
Write Shadowing
Set r.nedb.shadow_writes = True and every surface-1 write is silently mirrored into NEDB's hash chain. Alice's app writes to Redis normally; NEDB captures the full write history without any code changes.
r.nedb.shadow_writes = True

# Alice's code, unchanged
r.set("driver:d1", json.dumps({"name": "Bob", "status": "active"}))
r.hset("trip:t1", mapping={"status": "en_route", "driver_id": "d1"})
# → both writes auto-chained into NEDB

r.nedb.get("driver", "d1")
# → {"name": "Bob", "status": "active", "_source": "shadow"}

# HSET merges with existing NEDB doc
r.nedb.get("trip", "t1")
# → {"status": "en_route", "driver_id": "d1", "rider_id": "u1", ...}

# Time-travel through shadowed writes
snap = r.nedb.seq
r.set("driver:d1", json.dumps({"name": "Bob", "status": "offline"}))
r.nedb.get_as_of("driver", "d1", snap)  # → {"status": "active", ...}

# Disable at any time
r.nedb.shadow_writes = False
Shadowed commands
The following write commands are intercepted when shadow_writes=True: set setnx setex psetex getset mset msetnx hset hmset hsetnx hincrby hincrbyfloat hdel lpush rpush lset ltrim lpop rpop sadd srem zadd zincrby zrem del unlink rename append incr incrby decr decrby setrange.
Keys that match a registered pattern get a full NEDB put() (NQL-queryable, time-travelable). Unmatched keys get a raw chain entry (tamper-evidence only).
Shadow failures never break the Redis surface call: any error raised on the NEDB side is silently swallowed.
wrap_redis() + nedbd Mode
Pass nedbd_url= to route all r.nedb.* calls to a running nedbd server instead of the in-process engine. nedbd handles its own durable AOF persistence on disk; the Redis Stream backend is bypassed entirely.
import json
import redis
from nedb import wrap_redis

r = wrap_redis(
    redis.Redis("localhost", 6379),
    db_name="rideshare",
    nedbd_url="http://localhost:8421",  # ← nedbd server
    nedbd_token="my-secret",            # ← optional bearer token
)

# Surface 1: Redis, unchanged
r.set("driver:d1", "...")

# Surface 2: forwarded to nedbd over HTTP/JSON
r.nedb.put("driver", "d1", {"name": "Bob"})
r.nedb.query('FROM driver WHERE status = "active"')
r.nedb.verify()  # → True

# Backfill + shadow_writes work identically in nedbd mode
r.nedb.register("driver:*", "driver", value_parser=json.loads)
r.nedb.backfill()
r.nedb.shadow_writes = True
Mode comparison
| Mode | How | Persistence | Best for |
|---|---|---|---|
| In-process | wrap_redis(r, db_name=…) | Redis Stream oplog | Lightweight; single-process apps |
| nedbd | wrap_redis(r, …, nedbd_url=…) | Durable AOF on disk | Multi-process; production deployments |
NedBdProxy is the internal HTTP client that translates r.nedb.* calls to nedbd's /v1/databases/{name}/* API. The surface API is identical in both modes, so switching is a one-line change to wrap_redis().
Causal provenance in nedbd mode
As of v1.2.1 the nedbd PUT endpoint accepts caused_by, evidence, confidence, valid_from, and valid_to from the request body and passes them through to the engine:
r.nedb.put("trip", "t1", {"rider": "u1", "driver": "d1"},
           caused_by=[r.nedb.seq - 1], evidence="inference", confidence=0.97)
r.nedb.query('FROM trip WHERE _id = "t1" TRACE caused_by')  # works in nedbd mode
Auto-Indexing
Wrap a NEDB instance with AutoIndexDB and indexes are created automatically based on observed query patterns, so no manual create_index() calls are required.
from nedb import NEDB, AutoIndexDB

db = AutoIndexDB(NEDB("./data"), threshold=5, verbose=True)

# Query as normal; field usage is tallied automatically
db.query('FROM users WHERE status = "active"')  # 1/5
db.query('FROM users WHERE status = "active"')  # 2/5
# ... 3 more ...
db.query('FROM users WHERE status = "active"')
# [autoindex] created eq index on users.status (threshold=5)

# Check what's been created and what's close
db.analyze()
# {"tallies": {"users.status (eq)": 5}, "indexes_created": ["users.status (eq)"], "threshold": 5}
db.suggest()
# ["users.age (ordered) — 3/5 queries"]  ← not yet at threshold

# All other NEDB methods work unchanged; AutoIndexDB is a transparent proxy
db.put("users", "u1", {"name": "Ada", "status": "active"})
db.get("users", "u1")
db.verify()
| Parameter | Type | Description |
|---|---|---|
| db | NEDB | Any NEDB instance (embedded or durable) |
| threshold | int | Query count before an index is auto-created. Default: 5 |
| verbose | bool | Print a message when an index is created. Default: False |
How it works
Every query() call is intercepted. The NQL string is parsed to extract WHERE field names and ORDER BY fields. Each (collection, field, kind) combination is tallied. Once the count reaches threshold, create_index() is called automatically. Equality conditions (=, !=) create an eq index; range comparisons (<, >, <=, >=) and ORDER BY create an ordered index.
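The tallying logic described above can be approximated as follows. This is a simplified sketch, not AutoIndexDB's parser: the regular expressions are assumptions that handle only simple queries (a quoted value containing an operator or keyword would confuse them), and `AutoIndexer` records which index it would create instead of calling create_index().

```python
import re
from collections import Counter

_FROM = re.compile(r"\s*FROM\s+(\w+)")
_CLAUSE_END = r"(?=\bSEARCH\b|\bORDER\s+BY\b|\bGROUP\s+BY\b|\bTRAVERSE\b|\bLIMIT\b|$)"
_WHERE = re.compile(r"\bWHERE\b(.*?)" + _CLAUSE_END, re.S)
_COND = re.compile(r"(\w+)\s*(!=|<=|>=|=|<|>)")
_ORDER = re.compile(r"\bORDER\s+BY\s+(\w+)")


def fields_used(nql):
    """Return the (collection, field, kind) specs a query would benefit from."""
    match = _FROM.match(nql)
    if not match:
        return []
    coll = match.group(1)
    specs = []
    where = _WHERE.search(nql)
    if where:
        for field, op in _COND.findall(where.group(1)):
            specs.append((coll, field, "eq" if op in ("=", "!=") else "ordered"))
    order = _ORDER.search(nql)
    if order:
        specs.append((coll, order.group(1), "ordered"))
    return list(dict.fromkeys(specs))  # de-duplicate within one query, keep order


class AutoIndexer:
    def __init__(self, threshold=5):
        self.threshold = threshold
        self.tallies = Counter()
        self.created = []

    def observe(self, nql):
        for spec in fields_used(nql):
            self.tallies[spec] += 1
            if self.tallies[spec] >= self.threshold and spec not in self.created:
                self.created.append(spec)  # a real wrapper would create the index here


auto = AutoIndexer(threshold=5)
for _ in range(5):
    auto.observe('FROM users WHERE status = "active"')
auto.observe('FROM users WHERE age >= 25 ORDER BY age DESC')

print(auto.created)  # [('users', 'status', 'eq')]
```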
Snapshot Checkpoints
Capture the full database state to disk, anchored in the hash chain, so future starts are O(delta) instead of O(total). The chain never breaks: the checkpoint is a real op in the AOF.
from nedb import NEDB

db = NEDB("./data")
# ... write 100 K rows ...
db.checkpoint()  # anchor state in the chain → writes snapshot.json
db.close()

# Next open: loads snapshot then replays only delta ops
db2 = NEDB("./data")
assert db2.verify()  # chain from genesis → checkpoint op → delta: intact
nedbd auto-checkpoints on SIGTERM / SIGINT. Stopping the daemon with Ctrl+C or kill writes a checkpoint for every open database before shutdown. The next start loads from those snapshots, so restart time is proportional to the writes since the last checkpoint, not to total history.
You can also trigger a checkpoint over HTTP at any time:
curl -X POST localhost:7070/v1/databases/mydb/checkpoint
# {"ok": true, "head": "abc...", "seq": 1042}
What the snapshot contains
- MVCC store (every key's HEAD value and write seq)
- Relations (graph edges with added/removed seqs)
- Index configuration (eq / ordered / search specs)
- BlobStore chunks and file manifests (both tiers, compressed)
- Nonce and idempotency tables (replay protection survives restart)
Pre-checkpoint time-travel: AS OF queries for seqs before the snapshot require the full AOF. Keep the log file for archival use if you need indefinite time-travel; otherwise snapshots are self-contained for all operations after the checkpoint seq.
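Why a snapshot makes startup O(delta) can be shown with a small model: loading the snapshot and replaying only the ops after its seq yields the same state as replaying everything. The op shape and the `replay` helper are assumptions for this sketch; the real snapshot also carries relations, indexes, blobs, and nonce tables.

```python
def apply(state, op):
    """Fold one op into a key/value state."""
    if op["type"] == "put":
        state[op["key"]] = op["doc"]
    elif op["type"] == "delete":
        state.pop(op["key"], None)


def replay(ops, state=None, after_seq=-1):
    """Replay ops with seq > after_seq on top of an optional starting state."""
    state = dict(state or {})
    for op in ops:
        if op["seq"] > after_seq:
            apply(state, op)
    return state


ops = [
    {"seq": 0, "type": "put", "key": "users:alice", "doc": {"age": 31}},
    {"seq": 1, "type": "put", "key": "users:bob", "doc": {"age": 40}},
    {"seq": 2, "type": "delete", "key": "users:bob"},
    {"seq": 3, "type": "put", "key": "users:alice", "doc": {"age": 32}},
    {"seq": 4, "type": "put", "key": "users:carol", "doc": {"age": 27}},
]

# Checkpoint taken after seq 2: the snapshot stores the state and its seq.
snapshot = {"seq": 2, "state": replay(ops[:3])}

full = replay(ops)                                                # O(total)
fast = replay(ops, snapshot["state"], after_seq=snapshot["seq"])  # O(delta)
print(fast == full)  # True
```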
TTL / Key Expiry
Set a time-to-live on any document. Expiry is lazy (checked on every read) and append-only (a delete op is written to the log when a key expires).
# Put with TTL
db.put("cache", "session", {"token": "xyz"}, ttl_s=3600)   # expires in 1 hour

# Set / update TTL on an existing doc
db.expire("cache", "session", ttl_s=300)   # update to 5 minutes; False if not found

# Bulk sweep (call periodically for background maintenance)
n = db.sweep()   # → number of expired docs deleted
| Method | Description |
|---|---|
| put(..., ttl_s=N) | Store _expires_at = now + N in the doc. Lazy expiry on every subsequent get(). |
| expire(coll, id, ttl_s) | Set or update TTL on an existing document. Returns False if the document doesn't exist. |
| sweep() | Scan and delete all expired documents now. Returns the count deleted. |
Time-travel ignores expiry. get(..., as_of=seq) always returns what was true at that seq, even if the document has since expired. Expiry only fires on HEAD reads.
The Redis adapter's EXPIRE, TTL, and PTTL commands now map to db.expire().
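The lazy, append-only expiry model can be illustrated with a small in-memory store. This is a sketch, not NEDB's implementation: `ToyTTLStore` and its injectable `clock` are hypothetical, and the only details taken from the documentation are the `_expires_at` field, the delete op written on expiry, and the return values of `expire()` and `sweep()`.

```python
import time


class ToyTTLStore:
    def __init__(self, clock=time.time):
        self.clock = clock   # injectable so the example needs no real waiting
        self.docs = {}
        self.log = []        # append-only record of what happened

    def put(self, key, doc, ttl_s=None):
        doc = dict(doc)
        if ttl_s is not None:
            doc["_expires_at"] = self.clock() + ttl_s
        self.docs[key] = doc
        self.log.append(("put", key))

    def _expired(self, doc):
        expires_at = doc.get("_expires_at")
        return expires_at is not None and expires_at <= self.clock()

    def get(self, key):
        doc = self.docs.get(key)
        if doc is None:
            return None
        if self._expired(doc):
            # Lazy expiry: the read notices the deadline and logs a delete.
            del self.docs[key]
            self.log.append(("delete", key))
            return None
        return doc

    def expire(self, key, ttl_s):
        if self.get(key) is None:
            return False
        self.docs[key]["_expires_at"] = self.clock() + ttl_s
        self.log.append(("expire", key))
        return True

    def sweep(self):
        expired = [k for k, d in self.docs.items() if self._expired(d)]
        for key in expired:
            del self.docs[key]
            self.log.append(("delete", key))
        return len(expired)


now = [1000.0]
store = ToyTTLStore(clock=lambda: now[0])
store.put("session", {"token": "xyz"}, ttl_s=3600)
store.put("flash", {"msg": "hi"}, ttl_s=10)

now[0] += 60                                      # one minute later
flash_after_minute = store.get("flash")           # past its deadline
session_after_minute = store.get("session")       # still live
updated = store.expire("session", ttl_s=300)      # shorten to 5 minutes
missing = store.expire("nope", ttl_s=300)

now[0] += 301
swept = store.sweep()
```

Because nothing runs in the background, a key that is never read again stays in memory until `sweep()` is called, which is why periodic sweeping is suggested for maintenance.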
GROUP BY Aggregations
Aggregate query results by a field. Compatible with WHERE, SEARCH, and LIMIT — filtering happens before grouping.
# Count rows per group
db.query("FROM orders GROUP BY status COUNT")
# → [{"status": "paid", "count": 42}, {"status": "pending", "count": 7}]

# Sum a numeric field
db.query("FROM sales GROUP BY region SUM revenue")
# → [{"region": "north", "count": 3, "sum_revenue": 15000}, ...]

# Average
db.query("FROM scores GROUP BY grade AVG score")

# Min / Max
db.query("FROM items GROUP BY category MIN price")
db.query("FROM items GROUP BY category MAX price")

# With WHERE (filter before grouping)
db.query('FROM orders WHERE region = "EU" GROUP BY status COUNT')
Aggregate functions
| Syntax | Output field | Description |
|---|---|---|
| GROUP BY f COUNT | count | Number of rows in the group |
| GROUP BY f SUM field | sum_field | Sum of field across the group |
| GROUP BY f AVG field | avg_field | Average of field |
| GROUP BY f MIN field | min_field | Minimum value of field |
| GROUP BY f MAX field | max_field | Maximum value of field |
Every group result always includes count alongside the requested aggregate. Non-numeric values are skipped for SUM / AVG / MIN / MAX.
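The output shape can be reproduced over plain dictionaries, which may help when checking results by hand. The function below is a sketch, not NEDB's query engine: `group_by` is a hypothetical helper, and two behaviours are assumptions rather than documented facts, namely that `count` includes rows whose aggregate field is non-numeric and that a group with no numeric values reports `None`.

```python
from collections import defaultdict


def is_numeric(value):
    # bool is a subclass of int in Python; excluding it here is an assumption.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def group_by(rows, key, agg="COUNT", field=None):
    """Group rows by `key` and attach `count` plus the requested aggregate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(key)].append(row)

    reducers = {
        "SUM": sum,
        "AVG": lambda xs: sum(xs) / len(xs),
        "MIN": min,
        "MAX": max,
    }
    results = []
    for value, members in groups.items():
        result = {key: value, "count": len(members)}
        if agg != "COUNT":
            numbers = [r[field] for r in members if is_numeric(r.get(field))]
            result[f"{agg.lower()}_{field}"] = reducers[agg](numbers) if numbers else None
        results.append(result)
    return results


sales = [
    {"region": "north", "revenue": 5000},
    {"region": "north", "revenue": 7000},
    {"region": "north", "revenue": 3000},
    {"region": "south", "revenue": 2000},
    {"region": "south", "revenue": "n/a"},   # non-numeric: skipped by SUM
]
totals = group_by(sales, "region", "SUM", "revenue")
counts = group_by(sales, "region")
```

Filtering before grouping, as NQL does with WHERE, corresponds to passing an already filtered list of rows to the function.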
Encryption at Rest
AES-256-GCM at-rest encryption with a double-envelope key structure. Toggle-able — zero overhead when disabled. Encrypts the AOF, snapshot.json, and BlobStore chunks.
The double envelope
A random per-database DEK encrypts the data, and that DEK is in turn wrapped by an external TMK supplied programmatically, through the NEDB_TMK environment variable, or from a key file. Only the wrapped DEK is written to disk (key.enc).
Enable encryption
db = NEDB("./data", tmk=bytes.fromhex("a3f1...")) # 32-byte hex key
# .env
NEDB_TMK=a3f1...   # 64-char hex string (any length accepted, normalised via HKDF)

# or a key file
NEDB_TMK_FILE=/run/secrets/nedb-tmk
NEDB_TMK=a3f1... nedbd
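The note that a key of any length is normalised via HKDF can be illustrated with the standard library alone. The sketch below implements HKDF with SHA-256 as defined in RFC 5869; how NEDB actually calls HKDF is not documented here, so the hash choice, the empty salt, the `info` label, and the hex-or-text parsing in `normalise_tmk` are all assumptions made for the example.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, length=32, salt=b"", info=b""):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()          # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                    # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def normalise_tmk(raw):
    """Turn an arbitrary-length key string into exactly 32 bytes (hypothetical helper)."""
    try:
        ikm = bytes.fromhex(raw)      # treat as hex when it parses...
    except ValueError:
        ikm = raw.encode()            # ...otherwise as UTF-8 text (assumption)
    return hkdf_sha256(ikm, length=32, info=b"nedb-tmk-example")


short_key = normalise_tmk("a3f1")
long_key = normalise_tmk("a3f1" * 40)
phrase_key = normalise_tmk("not hex at all")
```

Whatever the input length, the derived value is 32 bytes, which is the key size AES-256-GCM requires.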
Key rotation
Re-wrap the DEK under a new TMK without re-encrypting any data. The old TMK is immediately rejected after rotation.
db = NEDB("./data", tmk=old_key)
db.rewrap_key(old_tmk=old_key, new_tmk=new_key)
db.close()
# Database now only opens with new_key
What is encrypted
| File | Coverage |
|---|---|
| log.aof | Every op line encrypted individually: {"enc":1,"ct":"<b64>"} |
| snapshot.json | Entire file encrypted as a single envelope |
| key.enc | Wrapped DEK (never the plaintext DEK) |
| BlobStore chunks | Each chunk encrypted before base64 storage in the snapshot |
The hash chain is unaffected. verify(), AS OF, and Merkle proofs operate on decrypted log entries — encryption is a transparent layer below the log. The chain from genesis through every checkpoint op to the current head remains continuously verifiable.
Bundled since v0.5.5: cryptography is a required dependency and is installed with every pip install nedb-engine, so no separate install is needed.
RESP2 Wire Protocol
nedbd speaks the Redis Serialization Protocol (RESP2). redis-cli, redis-benchmark, and Redis client libraries in any language connect to nedbd natively, with no Redis installation required.
Enable RESP2
# env var: NEDBD_RESP2_PORT (0 = disabled, default)
NEDBD_RESP2_PORT=6379 nedbd

# Boot log:
# nedbd 0.6.0 — http://127.0.0.1:7070 data=./nedb-data auth=off
# resp2 — redis:// 127.0.0.1:6379 (RESP2 wire protocol)
Connect with redis-cli
redis-cli -p 6379 PING                            # PONG
redis-cli -p 6379 SELECT salonbooking             # OK (opens that database)
redis-cli -p 6379 SET key "hello"                 # OK
redis-cli -p 6379 HSET "user:1" name Ada age 31   # :2
redis-cli -p 6379 SADD tags python rust           # :2
redis-cli -p 6379 SMEMBERS tags                   # {"python", "rust"}
NQL pass-through via EVAL
# EVAL runs any NQL query; rows are returned as JSON strings
redis-cli -p 6379 EVAL "FROM users WHERE status = \"active\" LIMIT 5" 0
# 1) "{\"_id\":\"u1\",\"name\":\"Ada\",\"status\":\"active\"}"
# 2) "{\"_id\":\"u2\",\"name\":\"Bo\",\"status\":\"active\"}"
SELECT maps to NEDB database names
SELECT <name> opens the named NEDB database (creates it if it doesn't exist). This replaces Redis's integer 0-15 DBs with NEDB's named databases — use the database name you deployed in nedbd.
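On the wire, a client sends each command as a RESP2 array of bulk strings, so a SELECT with a database name is framed exactly like any other command. The encoder and decoder below are a minimal sketch of the RESP2 framing rules as published by Redis; `encode_command` and `decode_reply` are hypothetical helper names and say nothing about how nedbd parses requests internally.

```python
def encode_command(*args):
    """Encode a command as a RESP2 array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def decode_reply(buf, pos=0):
    """Decode one RESP2 value starting at `pos`; return (value, next_pos)."""
    end = buf.index(b"\r\n", pos)
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    nxt = end + 2
    if kind == b"+":                      # simple string, e.g. +OK
        return line.decode(), nxt
    if kind == b"-":                      # error, e.g. -ERR ...
        raise RuntimeError(line.decode())
    if kind == b":":                      # integer, e.g. :2
        return int(line), nxt
    if kind == b"$":                      # bulk string, $-1 means null
        size = int(line)
        if size == -1:
            return None, nxt
        return buf[nxt:nxt + size], nxt + size + 2
    if kind == b"*":                      # array of nested values
        count = int(line)
        if count == -1:
            return None, nxt
        items = []
        for _ in range(count):
            item, nxt = decode_reply(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError("unknown RESP2 type byte: %r" % kind)


select_frame = encode_command("SELECT", "salonbooking")
ok_reply, _ = decode_reply(b"+OK\r\n")
members, _ = decode_reply(b"*2\r\n$6\r\npython\r\n$4\r\nrust\r\n")
```

Writing `select_frame` to a socket connected to the RESP2 port and decoding the bytes that come back is, in outline, what a Redis client library does for every call.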
Command coverage
| Group | Commands |
|---|---|
| Strings | SET GET DEL EXISTS INCR INCRBY DECR DECRBY MSET MGET SETNX GETDEL APPEND STRLEN TYPE RENAME KEYS DBSIZE FLUSHDB |
| Hashes | HSET HMSET HSETNX HGET HMGET HGETALL HDEL HEXISTS HKEYS HVALS HLEN HINCRBY |
| Sets | SADD SMEMBERS SISMEMBER SREM SCARD SUNION SINTER SDIFF |
| Lists | LPUSH RPUSH LRANGE LLEN LINDEX LSET LPOP RPOP |
| Server | PING SELECT COMMAND QUIT DBSIZE KEYS TYPE FLUSHDB |
| NQL | EVAL "<nql>" 0 — run any NQL query |
| Unsupported (roadmap) | EXPIRE TTL PTTL SUBSCRIBE PUBLISH MULTI EXEC — clear -ERR with roadmap note |
Changelog
wrap_redis() + nedbd server mode
wrap_redis() now accepts nedbd_url= and nedbd_token= to route all r.nedb.* calls to a running nedbd HTTP server instead of the in-process engine. New NedBdProxy class translates put/get/query/create_index/link/delete/verify/checkpoint to nedbd's /v1/databases/{name}/* HTTP API. Backfill and write shadowing work identically in nedbd mode. Also: nedbd PUT endpoint now passes caused_by, evidence, confidence, valid_from, valid_to through to the engine (previously only client, nonce, idem were forwarded). 29/29 nedbd integration tests; 74/74 in-process tests.
wrap_redis() backfill + write shadowing
Three-step Redis migration: register() maps key globs to NEDB collections, backfill() scans existing Redis keys via SCAN and imports them into the hash chain in one pass (evidence tagged "backfill"), shadow_writes = True auto-chains all future surface-1 writes. New: CollectionMapping with custom id_extractor and value_parser; chainable register(); direct backfill(pattern, collection) without prior register. 30 new tests; 74/74 total. Also: updated fakeredis_demo.py (29/29 checks) and README wrap_redis section.
wrap_redis() — NEDB as a Redis layer-2
Wrap any existing redis.Redis connection with NEDB in one line: r = wrap_redis(redis.Redis(...), db_name="..."). Two surfaces coexist: Surface 1 — every Redis command passes through unchanged; Surface 2 — r.nedb.* gives time-travel, NQL, causal provenance, and hash-chain verification. NEDB persists via Redis Streams (nedb:{db_name}:oplog). Isolation guarantee: NEDB never writes outside its nedb:{db_name}: prefix. 44 tests; fakeredis demo.
License → GPL-3.0-or-later; npm homepage fix
Updated license in package.json and pyproject.toml from Apache-2.0 to GPL-3.0-or-later. Added explicit "homepage" to package.json to prevent npm from appending #readme.
Rust NQL integer/float comparison fix
serde_json::json!(3.0f64) serialises as Float(3.0) but a document field parsed from JSON arrives as PosInt(3). Fixed by comparing via as_f64() in the Rust nql.rs cmp() function so integer literals and float literals compare equal when numerically equal.
Rust AS OF index bypass fix
The eq index was being used even for AS OF queries; since the index reflects HEAD only, AS OF queries returned stale current-version results. Fixed with an as_of.is_none() guard so the eq index is only used for HEAD reads.
napi-rs Node.js binding rewrite
Complete rewrite of rust/crates/nedb-node/src/lib.rs to match the current NEDB core API. Fixes API drift from the original stub. All napi-rs bindings now match put/get/query/createIndex/link/verify/head/seq/getAsOf.
RESP2 wire protocol — nedbd is a drop-in Redis replacement
Enable with NEDBD_RESP2_PORT=6379 nedbd. redis-cli, redis-benchmark, and every Redis client library connects natively. SELECT <name> opens a named NEDB database. EVAL "<nql>" 0 runs any NQL query. All major command groups supported (strings, hashes, sets, lists); EXPIRE/TTL/SUBSCRIBE/MULTI return clear -ERR with roadmap note. Also: auto backfill-encrypt existing plaintext AOF on first encrypted open (v0.5.6), eager database open at startup so backfill shows in boot log (v0.5.7), cryptography bundled as required dep (v0.5.5).
AES-256-GCM encryption at rest — double envelope TMK
Toggle-able at-rest encryption for the AOF, snapshots, and BlobStore chunks. Double-envelope: a random per-database DEK is wrapped by an external TMK (programmatic, NEDB_TMK env, or key file). The GCM tag detects any tampering on read. Key rotation via db.rewrap_key() re-wraps the DEK without touching data. The hash chain, verify(), and AS OF work unchanged through encryption. Requires pip install cryptography. 38 new tests; 127/127 total.
BlobStore (Cascade files) persisted in checkpoints
Files stored via put_file() are now fully preserved across checkpoint() + restart. The snapshot serialises both BlobStore tiers: compressed chunk bytes (encrypted if DEK set), file manifests, Merkle roots, and dedup stats. get_file(), file_root(), file_proof(), and compression_stats() all work identically after a snapshot-assisted restart.
nedbd auto-checkpoint on SIGTERM / SIGINT
The daemon checkpoints every open database before shutdown (SIGTERM or SIGINT) so the next startup loads from the snapshot with zero delta ops to replay. Signal handler uses a daemon thread to call httpd.shutdown() — avoids deadlock with serve_forever(). Also adds POST /v1/databases/:name/checkpoint for on-demand checkpoints.
Snapshot checkpoints, TTL/expiry, GROUP BY aggregations
Snapshots: db.checkpoint() captures state in a snapshot.json anchored in the hash chain (the checkpoint is a real log op). Future opens are O(delta). TTL: db.put(..., ttl_s=N), db.expire(), db.sweep(). Lazy expiry on read; AS OF reads skip expiry checks. Redis EXPIRE now functional. GROUP BY: FROM t GROUP BY f COUNT|SUM|AVG|MIN|MAX field in NQL. 30 new tests; 89/89 total.
Structured benchmark suite
bench/benchmarks.py: measures GET/PUT/time-travel, indexed vs unindexed query, adapter overhead vs raw NQL, in-memory vs AOF write cost, optional Redis TCP and nedbd HTTP comparisons. Results written to bench/RESULTS.md. README perf table updated with real measured numbers.
SQL adapter, Redis compatibility, auto-indexing
Three new pure-Python adapters, zero external dependencies. SQL: sql_exec(db, sql) translates SELECT/INSERT/UPDATE/DELETE to NQL and NEDB primitives — MariaDB users can write what they know and immediately get time-travel and hash-chain integrity. Redis: RedisCompat(db).execute(cmd, *args) maps SET/GET/HSET/HGETALL/SADD/SMEMBERS/LPUSH/LRANGE and 30+ more commands to NEDB. EXPIRE/TTL/SUBSCRIBE/MULTI return RedisUnsupportedError with a roadmap note. Auto-indexing: AutoIndexDB(db, threshold=5) tallies query field usage and creates indexes automatically. 48 new tests; 59/59 total.
Universal wheel — installs everywhere
Switched build backend to hatchling. Publishes a py3-none-any wheel + sdist that installs on any platform and Python version without a toolchain. Fixes the silent upgrade failure on Intel Mac caused by arm64-only wheels in 0.2.0/0.3.0. The nedbd command is always present after install.
nedbd — server daemon
Run NEDB as a long-lived HTTP/JSON server. pip install nedb-engine now ships the nedbd console command. Each named database is a durable NEDB(path) held open in memory. Multi-database, optional bearer token auth, CORS, ThreadingHTTPServer. Verified: create, seed, query, traverse, verify, kill + restart → data intact.
Durable AOF persistence
NEDB(path) opens a durable database. Every op is appended to log.aof and fsync'd immediately. Index config is snapshotted to meta.json. Database reloads by replaying the log on open — verify(), AS OF time-travel, relations, and the head commitment all survive restarts. NEDB() with no path remains in-memory (fully backward-compatible). Added: flush(), close(), context-manager support.
Stable pure-Python baseline
Universal py3-none-any wheel + sdist. Last release in the 0.1.x line before the persistence work. All features complete: log, MVCC, relations, indexes (eq/ordered/search), NQL, Cascade compression, Merkle proofs, fluent builder. 10/10 invariant tests pass.
Native Rust wheels
First native wheel release — platform wheels for macOS arm64, Linux x86_64, Windows x64 via maturin + PyO3. The optional nedb._native accelerator is loaded lazily; the package works identically without it.
Initial release
Pure-Python reference engine. Full feature set: append-only hash-chained log, MVCC time-travel, replay protection, idempotency, first-class relations, three index types, NQL query language, git-style Cascade-compressed file store, Merkle proofs.