Lorebook Formatting Best Practices
by fp32
Token bloat has gotten too high!
Your lorebooks are too big. Everyone has gone insane thinking that since Gemini supports 1M context, we need to jam that context full ASAP. With the death of many free providers, token use is becoming costly and your users need you to use your tokens wisely.
I've seen character entries pushing 4,000 tokens and full lorebooks breaking 30,000 on average. You may say "oh fp, it's fine," but with lorebook plugins left and right, your context is sitting at 20k before the first message. Even if the model (Gemini) can handle large context, lorebook and prompt entries arrive as one huge block with little structure. With services like Lorebary and many different authors, you may also be mixing XML, natural language, and JSON formatting in the same context.
For a very long time format didn't matter: the last real research happened in 2024, and almost every model had its own quirks.
However, it's 2026, and I'm here to influence you to move to a better place.
This guide covers how to format your lorebooks for the best balance of retrieval accuracy and token efficiency, with benchmarks, examples, and prompts you can use to convert your existing entries.
The Problem: Natural Language Lorebooks
When you ask an LLM to generate a lorebook, it spits out a big block of natural language Markdown: bullet points, prose paragraphs, headers everywhere. This feels readable to you, but it's poorly structured for the model to actually use during roleplay. Models don't know what a lorebook is or how it is used in roleplay. They know how to get information to a human (you) quickly, and that means natural language.
Benchmarks from Improving Agents put natural language retrieval accuracy at roughly 49.6%. That means it's a coin flip whether the model actually picks up on the detail you carefully prompted Claude to write into your lorebook.
Independently verified on HN.
XML is no fix either: it scores better than natural language (56.0%), but it's the most token-heavy format in the ranking below and still less accurate than the alternative we're about to look at. If you're wrapping your lorebooks in XML tags, you're paying more tokens for worse results.
Here's the full ranking from the Improving Agents study (1,000 records, 1,000 questions, GPT-4.1 nano):
| Format | Accuracy | Tokens |
|---|---|---|
| Markdown KV (MOST ACCURATE OPTION) | 60.7% | 52,104 |
| XML | 56.0% | 76,114 |
| INI | 55.7% | 48,100 |
| YAML | 54.7% | 55,395 |
| HTML | 53.6% | 75,204 |
| JSON | 52.3% | 66,396 |
| Markdown Table (TOKEN FRIENDLY OPTION) | 51.9% | 25,140 |
| **Natural Language (WE ARE HERE)** | 49.6% | 43,411 |
| JSONL | 45.0% | 54,407 |
| CSV | 44.3% | 19,524 |
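To make the tradeoff in the ranking easier to eyeball, here is a small Python sketch that compares each format against natural language. The numbers are copied from the ranking above; the `RESULTS` table and the `compare` helper are names made up for this illustration, and the figures only describe that one benchmark run (GPT-4.1 nano), not your model.

```python
# Accuracy (%) and token counts copied from the ranking above.
RESULTS = {
    "Markdown KV": (60.7, 52_104),
    "XML": (56.0, 76_114),
    "INI": (55.7, 48_100),
    "YAML": (54.7, 55_395),
    "HTML": (53.6, 75_204),
    "JSON": (52.3, 66_396),
    "Markdown Table": (51.9, 25_140),
    "Natural Language": (49.6, 43_411),
    "JSONL": (45.0, 54_407),
    "CSV": (44.3, 19_524),
}


def compare(fmt, baseline="Natural Language"):
    """Return (accuracy change in points, token change in percent) vs. the baseline."""
    acc, tokens = RESULTS[fmt]
    base_acc, base_tokens = RESULTS[baseline]
    points = round(acc - base_acc, 1)
    token_pct = round(100 * (tokens - base_tokens) / base_tokens, 1)
    return points, token_pct


# Print every format relative to natural language.
for name in RESULTS:
    points, token_pct = compare(name)
    print(f"{name:<18} {points:+5.1f} pts  {token_pct:+6.1f}% tokens")
```

Read this way, Markdown KV buys about 11 points of accuracy for roughly 20% more tokens than natural language, while a Markdown table is a couple of points more accurate for roughly 42% fewer tokens.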
On Input Formats
Whatever you pick has to be consistent across ALL your entries. You cannot mix two formats: mixing is where retrieval accuracy drops.
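If you want a quick way to spot mixing before you publish, something like the Python sketch below can flag it. This is a rough heuristic written for this guide, not a real parser: `guess_format` and `mixed_formats` are hypothetical helpers, and the detection rules are assumptions that will misjudge unusual entries.

```python
import json
import re

FENCE = "`" * 3  # a literal Markdown code fence


def guess_format(entry: str) -> str:
    """Very rough guess at the format of one lorebook entry (heuristic only)."""
    text = entry.strip()
    # JSON: starts like an object/array and actually parses.
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
    # XML: at least one matching <tag>...</tag> pair.
    if re.search(r"<(\w+)[^>]*>.*</\1>", text, re.DOTALL):
        return "xml"
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Markdown table: a line that starts and ends with a pipe.
    if any(re.match(r"^\s*\|.*\|\s*$", ln) for ln in lines):
        return "markdown_table"
    # Markdown KV: a code fence plus at least one "key: value" line.
    if FENCE in text and any(re.match(r"^\s*[\w ]+:\s*\S", ln) for ln in lines):
        return "markdown_kv"
    return "natural_language"


def mixed_formats(entries: dict[str, str]) -> dict[str, list[str]]:
    """Group entry names by guessed format; more than one group means mixing."""
    groups: dict[str, list[str]] = {}
    for name, body in entries.items():
        groups.setdefault(guess_format(body), []).append(name)
    return groups


# Made-up entries for illustration.
entries = {
    "Alice": f"## Alice\n{FENCE}\nname: Alice\nrole: guard\n{FENCE}",
    "Bob": "<character><name>Bob</name></character>",
}
groups = mixed_formats(entries)
if len(groups) > 1:
    print("Mixed formats found:", groups)
```

If `mixed_formats` returns more than one group, pick one format and convert the stragglers.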
Best Solution: Markdown KV
Markdown KV is a format where each record gets a ## header, and its attributes are listed as key: value pairs inside a code block. This is the exact format that scored 60.7% retrieval accuracy — the highest of all 11 formats tested.
It looks like this:
That's it. A heading, a code fence, key-value pairs inside.
Why does this work so well? The ## header gives the model a clear anchor to locate a record. The code block tells the model "this is structured data, not prose." The key: value pairs are unambiguous — no table borders, no pipe characters, no parsing needed.
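As a rough illustration of the heading + code fence + key: value layout described above, here is a minimal Python sketch that renders one record. The `to_markdown_kv` helper and the sample character are invented for this example; the benchmark only defines the output shape, not any particular tooling.

```python
FENCE = "`" * 3  # a literal Markdown code fence


def to_markdown_kv(name: str, attributes: dict[str, str]) -> str:
    """Render one record as a ## heading followed by fenced key: value lines."""
    lines = [f"## {name}", FENCE]
    lines += [f"{key}: {value}" for key, value in attributes.items()]
    lines.append(FENCE)
    return "\n".join(lines)


# Hypothetical character, made up for illustration.
print(to_markdown_kv("Mira Vale", {
    "role": "innkeeper",
    "personality": "warm, stubborn",
    "likes": "quiet mornings",
    "dislikes": "unpaid tabs",
}))
```

Each attribute lands on its own line inside the fence, so adding or removing a detail is a one-line change.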
Why does this matter for your lorebooks? Most of you are using free-tier or budget models. An independent replication on HN across 30 models confirmed that format choice matters most on weaker models — frontier models like GPT-5 hit near 100% accuracy regardless of format. But on the cheaper models most roleplay platforms use, the format you choose is the difference between your character details actually showing up in the RP or getting ignored.
Example: Complex Characters
This is where Markdown KV shines brightest like a diamond. Most Lorebook bloat comes from Character entries, and they map perfectly to this format: one record, many attributes.
Here's a character entry in natural language from DDM (499 tokens):
Converted to Markdown KV:
Ioverths (a renowned bot creator on a site) usually has well-structured lorebook entries. However, with libraries like Lorebary mixing many content types and many lorebook entries, it's hard to keep consistency across many NPCs. Making sure each character has similar keys, even in notes (weapon, likes, dislikes), means the model is more likely to retrieve AND remember these values.
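One way to check that your NPCs share the same keys is a small script like the Python sketch below. It assumes you already have each character as a dictionary of attributes; `missing_keys` and the two sample NPCs are made up for this example.

```python
def missing_keys(entries: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """For each entry, list the keys that other entries have but it lacks."""
    # Collect every key seen across all entries, in first-seen order.
    all_keys: list[str] = []
    for attrs in entries.values():
        for key in attrs:
            if key not in all_keys:
                all_keys.append(key)
    # Report only the entries that are missing something.
    report: dict[str, list[str]] = {}
    for name, attrs in entries.items():
        gaps = [key for key in all_keys if key not in attrs]
        if gaps:
            report[name] = gaps
    return report


# Hypothetical NPCs, made up for illustration.
npcs = {
    "Brakk": {"weapon": "axe", "likes": "ale"},
    "Sera": {"likes": "maps", "dislikes": "crowds"},
}
for name, gaps in missing_keys(npcs).items():
    print(f"{name} is missing: {', '.join(gaps)}")
```

Fill the gaps with a real value or an explicit "none" so every character exposes the same set of keys.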
Example: List-Type Lorebook (Archetypes, Dictionaries)
For lorebooks that are lists of many short entries (school archetypes, spell lists, location dictionaries), you have two good options. I pulled this from the front page of Sophia's Lorebary.
Accurate Option: Markdown KV (benchmarked at 60.7% accuracy)
Each entry gets its own header and code block:
This is the format that was actually benchmarked. The tradeoff is that each entry's header + code fences add overhead, so for long lists of short entries, this costs more tokens than a Markdown table.
Token Friendly Option: Markdown Table (benchmarked at 51.9% accuracy, but much fewer tokens):
The table version uses roughly 30-40% fewer tokens for list-style data, but retrieval accuracy is about 9 points lower. For simple lookup entries where each item only has 1-2 attributes, the token savings may be worth it.
For anything with 3+ attributes per entry, use Markdown KV.
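The rule of thumb above (1-2 attributes per entry: table, 3+ attributes: Markdown KV) can be sketched in Python as follows. `pick_format` and `to_markdown_table` are hypothetical helpers written for this guide, and the threshold is just the guideline from the text, not something the benchmark measured.

```python
def pick_format(entries: dict[str, dict[str, str]]) -> str:
    """Apply the rule of thumb: 3+ attributes on any entry means Markdown KV."""
    most = max(len(attrs) for attrs in entries.values())
    return "markdown_kv" if most >= 3 else "markdown_table"


def to_markdown_table(entries: dict[str, dict[str, str]], name_column: str = "Name") -> str:
    """Render short entries as one Markdown table, one row per entry.

    Note: values containing a pipe character would break the table;
    this sketch does not escape them.
    """
    # Collect column names in first-seen order.
    columns: list[str] = []
    for attrs in entries.values():
        for key in attrs:
            if key not in columns:
                columns.append(key)
    header = [name_column, *columns]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for name, attrs in entries.items():
        cells = [name, *(attrs.get(col, "") for col in columns)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical archetype list, made up for illustration.
archetypes = {
    "Jock": {"trait": "competitive"},
    "Bookworm": {"trait": "quiet"},
}
if pick_format(archetypes) == "markdown_table":
    print(to_markdown_table(archetypes))
```

The header and divider rows are paid once for the whole list, which is where the table's token savings come from on long lists of short entries.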
What About TOON?
I actually started writing this to convince people to switch over to TOON, but found it's probably not the best fit for what we need. It's also hard to generate accurately with an LLM, and I think it's too new for a lot of people.
You may have seen TOON (Token-Oriented Object Notation) making the rounds. It's a compact format designed to minimize tokens for LLM input, and it does deliver real token savings (30–60% vs JSON) on flat, uniform data.
However, for lorebooks specifically, TOON has some problems:
Independent benchmarks from Improving Agents found that TOON ranked last in accuracy (43.1%) on nested data, behind JSON, Markdown, and YAML. The official TOON benchmarks show better numbers, but those are run by the TOON team on data structures that play to TOON's strengths.
Character sheets are nested data. A character with appearance, personality, relationships, history, and notes is exactly the kind of mixed structure where TOON struggles.
LLMs can't generate it well. An academic paper (arxiv:2601.12014) found that models show lower structural correctness when generating TOON because they lack native training on the format. So you can't even reliably ask an LLM to convert your lorebooks to TOON.
TOON is a genuinely interesting format with a real use case: if you have a lorebook entry that's essentially a flat dictionary or lookup table (like a list of items, spells, or locations with one attribute each), TOON can compress that efficiently. But for character sheets, world lore, and relationship maps, Markdown KV is the better choice.
Quick Reference
| Format | Accuracy | Token Cost | Best For |
|---|---|---|---|
| Markdown KV | 60.7% | Moderate | NPCs, Characters, Settings |
| Markdown Table | 51.9% | Low | Simple lookup lists, dictionaries |
| Natural Language | 49.6% | Varies | Nothing, stop using this |
| XML | 56.0% | Very High | Doubly Stop Using This |
| TOON | 43-74% (varies wildly) | Low | Flat datasets, only use this in case of emergency |
Conversion Prompt
Paste this into any LLM along with your lorebook entry to convert it to Markdown KV (MOST ACCURATE OPTION):
Paste this into any LLM along with your lorebook entry to convert it to Markdown Table (TOKEN FRIENDLY OPTION):
TL;DR
Stop writing lorebooks in natural language or XML. The model can't reliably retrieve information from them.
Use Markdown KV (## heading + code block with key: value pairs) for character sheets and complex entries. This is the format benchmarked at 60.7% retrieval accuracy.
Use Markdown tables for simple lookup lists where token savings matter more than the accuracy difference.
PLS STOP USING XML IT IS BLOATED.
Sources: Improving Agents — Table Format Benchmark (Sep 2025), Improving Agents — Nested Data Formats (Oct 2025), Improving Agents — TOON Benchmarks (Oct 2025), HN Independent Replication (Oct 2025), Masciari et al. — arxiv:2601.12014 (Jan 2026)
Page information
Last updated: 07 Feb 2026 04:59 UTC, originally from here
Maintained by: Corpses (Thanks FP <3)
Notes: Added colors, four links and a period (.) *Insert peace out GIF ✌*
This guide is community-maintained and may evolve over time.
