llmgrep: The Query Layer That Grew With the Stack
In January 2026 I needed a way to search a Magellan database from the command line. Not an IDE plugin, not a language server – just a CLI tool that takes a query and returns structured JSON. Four and a half months later, that tool has 33 published versions, 9 search modes, 12 subcommands, and has evolved from a simple SQL wrapper into the primary query interface for the entire code intelligence stack.
The repo is at github.com/oldnordic/llmgrep. 17.5K lines of Rust, 348 commits, 218 tests. This is the development timeline.
llmgrep is intentionally read-only: it does not own the index, it does not mutate source code, and it does not replace Magellan. It turns indexed code facts into structured queries that humans and agents can consume.
The starting point (January 24)
The first version did one thing: run a SQL query against a Magellan SQLite database and print matching symbols. Name search, regex support, JSON output. That’s it.
The initial scope was deliberately narrow. Magellan indexes code into a relational schema (tables for symbols, references, AST nodes, files). llmgrep was a read-only lens on that data. No indexing, no editing, no background processes.
The first version published to crates.io was v0.1.0 on January 24, 2026. The first stable release, v1.0.0, came the next day with error handling, path validation, and 277 tests.
Magellan integration layers (January 31 – February 4)
The next two weeks added progressively deeper Magellan integration.
v1.1.0 (January 31) added metrics-based filtering (fan-in, fan-out, cyclomatic complexity), FQN filtering, and symbol ID lookups. A bug in the same release – the JOIN condition compared a SHA hash string to an integer row ID – broke metrics in search results. Fixed the same day in v1.1.1.
v1.2.0 (February 1) added AST filtering: filter results by node kind, nesting depth, and include enriched AST context (parent kind, children count, decision points).
v1.4.0 (February 3) was the first major architecture change. Instead of only querying the database directly, llmgrep could now shell out to the magellan CLI for graph algorithms: reachable-from, dead-code detection, cycle detection, forward/backward slicing. The algorithm results came back as SymbolSet JSON files, which llmgrep loaded and used as SQL filters. For large sets (>1000 items), it created temporary tables instead of IN-clauses.
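The switch between IN-clauses and temporary tables can be sketched as a small planning function. This is an illustration under assumptions, not llmgrep's actual code: the `symbol_set` table name, the column name passed in, and the `SetFilter` type are all hypothetical. Only the 1000-item threshold comes from the text above.

```rust
/// How a set of symbol IDs gets applied to a query as a filter.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum SetFilter {
    /// Small set: one `IN (?, ?, ...)` fragment, one placeholder per ID.
    InClause { fragment: String },
    /// Large set: statements that stage the IDs in a temp table,
    /// plus a fragment that filters against that table.
    TempTable { setup: Vec<String>, fragment: String },
}

/// Threshold from the text: sets above 1000 items use a temp table.
pub const IN_CLAUSE_LIMIT: usize = 1000;

/// Decide how to filter `column` by a set of `id_count` IDs.
/// Values are always bound as parameters, never interpolated.
pub fn plan_set_filter(column: &str, id_count: usize) -> SetFilter {
    if id_count <= IN_CLAUSE_LIMIT {
        let placeholders = vec!["?"; id_count].join(", ");
        SetFilter::InClause {
            fragment: format!("{column} IN ({placeholders})"),
        }
    } else {
        SetFilter::TempTable {
            setup: vec![
                // `symbol_set` is an assumed name, not Magellan's schema.
                "CREATE TEMP TABLE IF NOT EXISTS symbol_set (id TEXT PRIMARY KEY)".to_string(),
                // Executed once per ID with the ID bound to `?`.
                "INSERT OR IGNORE INTO symbol_set (id) VALUES (?)".to_string(),
            ],
            fragment: format!("{column} IN (SELECT id FROM symbol_set)"),
        }
    }
}
```

The temp-table path keeps the statement text constant regardless of set size, which sidesteps SQLite's limit on the number of bound parameters in a single statement.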
v2.1.0 (February 4) promoted AST and find-AST to first-class subcommands, added condense and paths filtering, and added Windows support via feature flags.
At this point llmgrep had gone from “SQL wrapper” to “query orchestrator” – composing its own SQL queries with external algorithm results.
The dual-backend experiment (February 10–16)
v3.0.0 (February 10) was the biggest single release. Magellan had added a native binary backend (custom B+Tree format, no SQLite dependency), and llmgrep gained a full abstraction layer to support both backends simultaneously.
New commands exclusive to the native backend:
- complete – FQN autocomplete via KV prefix scan
- lookup – O(1) exact symbol lookup by fully qualified name
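To make the prefix-scan idea concrete, here is a minimal sketch using a standard-library BTreeMap as a stand-in for the ordered key-value store. The native backend's real API is not shown in this post, so the function name, signature, and value type here are assumptions.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Return up to `limit` keys that start with `prefix`, in sorted order.
/// Keys are ordered byte-wise, so every key sharing a prefix sits in one
/// contiguous run: seek to the first key >= prefix, then read until the
/// prefix stops matching.
pub fn complete<'a>(
    index: &'a BTreeMap<String, u64>,
    prefix: &str,
    limit: usize,
) -> Vec<&'a str> {
    index
        .range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        .take_while(|(key, _)| key.starts_with(prefix))
        .take(limit)
        .map(|(key, _)| key.as_str())
        .collect()
}
```

With keys such as `crate::parse::parse_expr` and `crate::parse::parse_stmt`, calling `complete(&index, "crate::parse::", 10)` returns both in sorted order without touching keys outside that run.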
The backend detection was automatic: check file extension and header bytes, route to the right implementation. Feature flags controlled what was compiled in.
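A rough sketch of header-based routing follows. The SQLite magic string is the documented 16-byte header of every SQLite 3 file. The native format's magic bytes and file extension are not given in this post, so `NATIVE_MAGIC` and `NATIVE_EXTENSION` below are placeholders, and the fallback order is an assumption.

```rust
/// Backends a database file could belong to.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Sqlite,
    Native,
    Unknown,
}

/// The 16-byte header string at the start of every SQLite 3 database.
const SQLITE_MAGIC: &[u8] = b"SQLite format 3\0";
/// HYPOTHETICAL magic bytes for the native format (real value unknown).
const NATIVE_MAGIC: &[u8] = b"MGLN";
/// HYPOTHETICAL file extension for the native format (real value unknown).
const NATIVE_EXTENSION: &str = "native";

/// Pick a backend from the file extension and the first bytes of the file.
/// Header bytes are treated as authoritative; the extension is only a
/// tiebreaker for empty files, which carry no header yet.
pub fn detect_backend(extension: Option<&str>, header: &[u8]) -> Backend {
    if header.starts_with(SQLITE_MAGIC) {
        Backend::Sqlite
    } else if header.starts_with(NATIVE_MAGIC) {
        Backend::Native
    } else if header.is_empty() {
        match extension {
            // A zero-byte file is a valid empty SQLite database.
            Some("db") => Backend::Sqlite,
            Some(ext) if ext == NATIVE_EXTENSION => Backend::Native,
            _ => Backend::Unknown,
        }
    } else {
        Backend::Unknown
    }
}
```

Trusting the header over the extension means a renamed file still routes correctly, and an unrecognized header fails early instead of being handed to the wrong reader.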
The native backend went through two iterations (v2, then v3) inside llmgrep. Both had persistence bugs – KV data lost across process restarts (fixed in v3.0.1), then node data lost on reopen (fixed in v3.0.8 via sqlitegraph 2.0.5).
In the end, the native backends were removed. For the codebase sizes in my stack, SQLite handles the query workload. The dual-backend abstraction added complexity without measurable benefit. v3.5.1 (May 26) deleted 1,332 lines of geometric backend code and simplified the crate back to SQLite-only.
This was the right call. The abstraction was clean, but maintaining two parallel code paths for a read-only query tool wasn’t worth it.
Security hardening (April 23)
v3.1.4 fixed real vulnerabilities:
- SQL injection: AST context retrieval used string-interpolated SQL. Replaced all interpolation with rusqlite parameterized queries. User-provided --ast-kind values could have been exploited before this fix.
- Panic vectors: Five .unwrap() and .expect() calls in production paths that could crash on malformed databases. Replaced with proper error propagation.
- Watch mode performance: Nested O(n*m) loops for delta computation replaced with HashSet lookups – O(n+m).
- Negative i64 to u64 cast: Corrupted data could produce huge values instead of errors. Fixed to detect and reject.
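Two of these fixes are easy to show in miniature. The sketch below is not the code from v3.1.4; the function names and the error type are assumptions. It shows a set-based delta in O(n+m) and a checked conversion that rejects negative values instead of wrapping them.

```rust
use std::collections::HashSet;
use std::convert::TryFrom;

/// Symbols added and removed between two snapshots.
/// Building both sets is O(n + m); each membership test is O(1) on
/// average, replacing a nested loop that compared every pair.
pub fn delta<'a>(old: &'a [String], new: &'a [String]) -> (Vec<&'a str>, Vec<&'a str>) {
    let old_set: HashSet<&str> = old.iter().map(String::as_str).collect();
    let new_set: HashSet<&str> = new.iter().map(String::as_str).collect();
    let added = new
        .iter()
        .map(String::as_str)
        .filter(|s| !old_set.contains(s))
        .collect();
    let removed = old
        .iter()
        .map(String::as_str)
        .filter(|s| !new_set.contains(s))
        .collect();
    (added, removed)
}

/// Convert a signed database integer to u64, rejecting negatives.
/// A plain `raw as u64` would turn -1 into 18446744073709551615.
pub fn to_u64(raw: i64) -> Result<u64, String> {
    u64::try_from(raw).map_err(|_| format!("negative value {raw} in database column"))
}
```

Both follow the same principle as the rest of the release: malformed input becomes an error value the caller can handle, not a crash or a silently wrong number.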
This was the release where llmgrep stopped being “my tool” and started being “a tool I publish.”
From symbol search to knowledge queries (May 6–10)
Three releases in five days expanded what llmgrep could find:
v3.2.0 (May 6) switched search to FTS5 OR semantics. Multi-word queries like "Mutex RwLock" now match symbols containing either word instead of requiring both in exact order. A query like "test print" went from 0 results to 744.
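One way to get OR semantics is to rewrite the user's input before handing it to FTS5's MATCH operator. The helper below is a sketch and not necessarily how llmgrep builds its queries. It quotes each token as an FTS5 string (doubling any embedded double quotes) and joins the tokens with the OR operator.

```rust
/// Turn whitespace-separated input into an FTS5 query that matches any
/// of the words. Each token is wrapped in double quotes so punctuation
/// inside it is not parsed as FTS5 syntax; embedded quotes are doubled.
pub fn fts5_or_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|token| format!("\"{}\"", token.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

For the input `Mutex RwLock` this yields `"Mutex" OR "RwLock"`. An empty input yields an empty string, which FTS5 rejects, so a caller would need to handle that case before running the query.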
v3.3.0 (May 6) added --mode implements for type-trait relationships. Queries like llmgrep search --query "Debug" --mode implements return every type that implements a trait, or every trait a type implements.
v3.3.1 (May 10) added --mode docs and --mode facts. These query Magellan’s source_documents and candidate_facts tables – wiki pages, specs, messages, and knowledge triples extracted from the codebase. llmgrep was no longer just a code search tool; it was a knowledge graph query interface.
Intelligence features (May 20 – June 8)
explore (May 20, unreleased) takes a natural-language intent string (--intent "error handling"), tokenizes it, searches via FTS5 and LIKE, ranks by name match + fan-in, and clusters results by file/module. No embeddings required.
evolve (v3.5.0, May 26) scores symbols by fan_in * cyclomatic_complexity to identify high-impact refactoring candidates. Supports --dry-run, --min-score, and writes candidates to the database.
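The scoring rule is simple enough to write out. In this sketch the struct and function names are assumptions; only the formula (fan_in * cyclomatic_complexity) and the minimum-score cutoff come from the description above.

```rust
/// Per-symbol metrics as they might be read from the index.
/// (Hypothetical struct, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub struct SymbolMetrics {
    pub name: String,
    pub fan_in: u64,
    pub cyclomatic_complexity: u64,
}

/// Score = fan-in times cyclomatic complexity. A symbol ranks high only
/// when it is both widely used and internally complex.
pub fn score(m: &SymbolMetrics) -> u64 {
    m.fan_in.saturating_mul(m.cyclomatic_complexity)
}

/// Symbols scoring at least `min_score`, highest first, with ties
/// broken by name so the output is deterministic.
pub fn candidates(symbols: &[SymbolMetrics], min_score: u64) -> Vec<(&str, u64)> {
    let mut out: Vec<(&str, u64)> = symbols
        .iter()
        .map(|m| (m.name.as_str(), score(m)))
        .filter(|(_, s)| *s >= min_score)
        .collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    out
}
```

Because the score is a product, a symbol with fan-in 30 and complexity 4 (score 120) outranks one with fan-in 12 and complexity 9 (score 108), while a simple, rarely called helper drops below almost any cutoff.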
stats (v3.5.0, May 26) generates code health summaries: symbol counts by kind, dead code detection, top hotspots by composite score, coverage gap analysis.
navigate (v3.6.0, May 28) uses Magellan’s SymbolNavigator library API directly (no shell-out) for depth-aware graph traversal. Walk callers or callees up to configurable depth.
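Depth-aware traversal amounts to a breadth-first walk with a depth cap. The sketch below uses a plain adjacency map as a stand-in; it is not Magellan's SymbolNavigator API, and the `walk` function and its signature are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk from `start`, following edges up to `max_depth`
/// hops. Returns each reached symbol with the depth at which it was
/// first seen. `edges` maps a symbol to its callees (or callers, if
/// the map is built in the reverse direction).
pub fn walk<'a>(
    edges: &'a HashMap<String, Vec<String>>,
    start: &'a str,
    max_depth: usize,
) -> Vec<(&'a str, usize)> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    let mut out = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth cap reached: do not expand this node
        }
        for next in edges.get(node).into_iter().flatten() {
            // `insert` returns false for already-visited symbols, which
            // also keeps cycles in the call graph from looping forever.
            if seen.insert(next.as_str()) {
                out.push((next.as_str(), depth + 1));
                queue.push_back((next.as_str(), depth + 1));
            }
        }
    }
    out
}
```

Walking callers versus callees is then only a question of which direction the adjacency map was built in.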
forge module (v3.7.0, May 29) is a high-level library API for programmatic access. Six convenience functions (search_symbols, search_symbols_regex, search_symbols_by_language, search_references, search_calls, lookup_symbol) that external agents and tools can call without going through the CLI.
--mode semantic (v3.8.0, June 8) uses HNSW vector similarity for natural-language code search. llmgrep only searches embeddings – Magellan owns embedding generation via magellan embed.
What I’d do differently
The native backend detour was unnecessary. Six weeks of development (v3.0.0 through v3.1.6) supporting dual backends, debugging persistence bugs, maintaining feature flags. For the query patterns in my stack, SQLite is fast enough. I should have validated that earlier.
Shell-out integration should have been library calls from the start. v1.4.0 shelled out to the magellan CLI for graph algorithms. v3.6.0 replaced that with direct library API calls. The subprocess overhead and version coupling weren’t worth the initial simplicity.
The module split happened too late. main.rs hit 3,436 lines before v3.5.0 split it into modular dispatch. Should have happened at ~1,000 lines.
What worked
The read-only constraint. llmgrep never writes to Magellan’s index data; the one exception is evolve, which writes its candidates to a separate table. This means zero risk of corrupting the index. Any tool in the stack can query the same database concurrently without coordination.
Structured error codes. The SPL-E1xx series (CLI errors), SPL-E2xx (backend errors) make errors parseable. Agents can distinguish “symbol not found” from “database too old” from “feature not compiled in.”
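From the consumer's side, a stable prefix makes classification a few lines of code. The sketch below assumes codes are exactly `SPL-E` followed by three digits, inferred from the E1xx/E2xx pattern. The specific codes in the usage note are made up, and the `ErrorClass` enum is hypothetical.

```rust
/// Coarse error category derived from the code's leading digit.
#[derive(Debug, PartialEq)]
pub enum ErrorClass {
    /// SPL-E1xx: CLI errors.
    Cli,
    /// SPL-E2xx: backend errors.
    Backend,
    /// Well-formed code in a series this sketch does not know about.
    Other,
}

/// Classify a code such as "SPL-E1xx". Returns None when the string
/// does not have the assumed `SPL-E` + three-digit shape.
pub fn classify(code: &str) -> Option<ErrorClass> {
    let digits = code.strip_prefix("SPL-E")?;
    if digits.len() != 3 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(match digits.as_bytes()[0] {
        b'1' => ErrorClass::Cli,
        b'2' => ErrorClass::Backend,
        _ => ErrorClass::Other,
    })
}
```

An agent could then branch on `classify("SPL-E204")` (a made-up code) returning `Some(ErrorClass::Backend)`, for example by retrying against a rebuilt index, without parsing the human-readable message.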
FTS5 for text search. Once I switched to OR semantics, symbol search became genuinely useful. No embeddings needed for name-based queries.
The forge module. Being able to call search_symbols("parse", db) from Rust code without parsing CLI output is the difference between “scriptable” and “programmable.”
By the numbers
| Metric | Value |
|---|---|
| Lines of Rust | 17,584 |
| Published versions | 33 |
| Search modes | 9 |
| Subcommands | 12 |
| Tests | 218 |
| Commits | 348 |
| Crates.io downloads | 550 |
| Development span | January – June 2026 |
| Current version | 3.8.0 |
Downloads are modest – this is a niche tool for a specific stack. The 550 total downloads across 33 versions reflect its actual audience: me, my agents, and a handful of people who found it via the Magellan ecosystem.
The tool in context
llmgrep sits between Magellan (the indexer) and the tools that consume code intelligence:
Source files → Magellan (index) → .db file → llmgrep (query) → JSON output
                                                ↑
                                             forge module → agents, tools
It doesn’t index code, generate embeddings, edit files, or analyze control flow. It answers questions about code that has already been indexed. The 9 search modes cover symbols, references, call graphs, type-trait implementations, knowledge documents, knowledge facts, semantic similarity, labels, and AST structure. The 12 subcommands add autocomplete, lookup, graph navigation, code health stats, refactoring candidate scoring, and intent-based exploration.
The code is at github.com/oldnordic/llmgrep. The crate is on crates.io. GPL-3.0-only.