Lorebook Formatting Best Practices

by fp32

Token bloat has gotten too high!

Your lorebooks are too big. Everyone has gone insane thinking that since Gemini supports 1M context, we need to jam that context full ASAP. With many free providers dying off, tokens are getting expensive, and your users need you to spend them wisely.

I've seen character entries pushing 4,000 tokens and full lorebooks breaking 30,000 (AVERAGE). You may say, "oh fp, it's fine." But with lorebook plugins left and right, your context is sitting at 20k before the first message. Even if the model (Gemini) can handle large context, the way lorebooks are structured means we're handing it a huge block of lorebook/prompt entries with little structure. And with services like Lorebary, and many different authors, you may be mixing XML, natural language, and JSON formatting across your lorebooks.

For a very long time, format didn't matter: the last real research happened in 2024, and almost every model had its own quirks.

However, it's 2026, and I'm here to influence you to move to a better place.

This guide covers how to format your lorebooks for the best balance of retrieval accuracy and token efficiency, with benchmarks, examples, and prompts you can use to convert your existing entries.

The Problem: Natural Language Lorebooks

When you ask an LLM to generate a lorebook, it spits out a big block of natural language Markdown: bullet points, prose paragraphs, headers everywhere. This feels readable to you, but it's poorly structured for the model to actually use during roleplay. Models don't know what a lorebook is or how it's used in roleplay. What they do know is how to get information to a human (you) quickly, and that means natural language.

Benchmarks from Improving Agents put natural language retrieval accuracy at roughly 49.6%. That means it's a coin flip whether the model actually picks up on the detail you carefully prompted Claude to write into your lorebook.

These results have also been independently replicated on HN.

XML isn't the fix either. It scores better than natural language (56.0%), but it's far more verbose (76,114 tokens vs 52,104) and still less accurate than the alternative we're about to look at. If you're wrapping your lorebooks in XML tags, you're paying more tokens for worse results.

Here's the full ranking from the Improving Agents study (1,000 records, 1,000 questions, GPT-4.1 nano):

| Format | Accuracy | Tokens |
|--------|----------|--------|
| Markdown KV (MOST ACCURATE OPTION) | 60.7% | 52,104 |
| XML | 56.0% | 76,114 |
| INI | 55.7% | 48,100 |
| YAML | 54.7% | 55,395 |
| HTML | 53.6% | 75,204 |
| JSON | 52.3% | 66,396 |
| Markdown Table (TOKEN FRIENDLY OPTION) | 51.9% | 25,140 |
| **Natural Language (WE ARE HERE)** | 49.6% | 43,411 |
| JSONL | 45.0% | 54,407 |
| CSV | 44.3% | 19,524 |

On Input Formats

Whatever format you pick has to be consistent across ALL your entries. Don't mix two formats; mixing is where retrieval accuracy drops.

Best Solution: Markdown KV

Markdown KV is a format where each record gets a ## header, and its attributes are listed as key: value pairs inside a code block. This is the exact format that scored 60.7% retrieval accuracy — the highest of all 11 formats tested.

It looks like this:

## Record Name

```
key: value
key: value
key: value
```

That's it. A heading, a code fence, key-value pairs inside.

Why does this work so well? The ## header gives the model a clear anchor to locate a record. The code block tells the model "this is structured data, not prose." The key: value pairs are unambiguous — no table borders, no pipe characters, no parsing needed.
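If you generate or maintain entries with a script rather than by hand, the format is simple enough to emit mechanically. Below is a minimal Python sketch, assuming each record is a plain dict of attribute names to text. The function name `to_markdown_kv` is hypothetical, and lowercasing keys into snake_case mirrors the conversion prompt further down rather than anything the benchmark required.

```python
import re


def to_markdown_kv(name: str, attrs: dict[str, str]) -> str:
    """Render one record as a '## Name' heading plus a fenced block of key: value lines."""
    fence = "`" * 3  # three backticks, built this way so the snippet nests safely in Markdown
    lines = [f"## {name}", "", fence]
    for key, value in attrs.items():
        # Normalize keys to snake_case, e.g. "Animal Form" -> "animal_form".
        snake_key = re.sub(r"\W+", "_", key.strip().lower()).strip("_")
        # Collapse newlines and repeated spaces so each attribute stays on one line.
        flat_value = " ".join(str(value).split())
        lines.append(f"{snake_key}: {flat_value}")
    lines.append(fence)
    return "\n".join(lines)


print(to_markdown_kv("Dullahan", {
    "Name": "Dullahan",
    "Animal Form": "Irish Wolfhound (dog)",
    "ARC LEVEL ": "4.5",
}))
```

Keeping one attribute per line is the point: it preserves the "one key, one value" shape that makes each record easy for the model to scan.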

Why does this matter for your lorebooks? Most of you are using free-tier or budget models. An independent replication on HN across 30 models confirmed that format choice matters most on weaker models — frontier models like GPT-5 hit near 100% accuracy regardless of format. But on the cheaper models most roleplay platforms use, the format you choose is the difference between your character details actually showing up in the RP or getting ignored.

Example: Complex Characters

This is where Markdown KV shines brightest like a diamond. Most lorebook bloat comes from character entries, and they map perfectly to this format: one record, many attributes.

Here's a character entry in natural language from DDM (499 tokens):

<dullahan>
Name: Dullahan 
Appearance: 6'8" tall, imposing figure. Faceless, hooded, muscular build with dull skin tone. Swirling shadows under hood where face should be. Wears black leather outfit.
Animal Form: Irish Wolfhound (dog)
Powers: Reality Bending
ARC LEVEL : 4.5
Role: Contractor for DDM Inc., head of Euclidean Division. Skilled in dimensional shifting and reality alteration.
Personality: Violent, efficient, laconic. Easy-going yet dominant personality. Crass sense of humor. Unserious in general but takes job seriously.
Relationships: Generally gets along with everyone.
History: Born in France in 1700s, worked as executioner for decades before becoming Contractor. Past shrouded in mystery, he lies about origins.
Goals: Carry out job for DDM Inc. efficiently. Enjoys seeking out unique dimensional tears.
Notes:
- Owns hook-shaped axe weapon
- Enjoys fighting, weapons, his job, kittens
- Dislikes crying, weakness, water, horses
- Never reveals face, can alter appearance/memories
NOTES : 
- has considerable reality bending abilities and thus can be difficult to accurately track. However, it should be noted that #04 inevitably will assume that prevents his "face" from being revealed.
While #04 is presumed loyal to DDM Inc., #04 appears to have no real issue with changing allegiances, particularly if he becomes "bored". It is therefore recommended to keep #04 assigned to a job at all times.
- Due to the potentially negative cognitive impacts of interacting with #04, staff are recommended to use a neutral median (ideally an ACE) instead of directly consulting with #04.
- shows a remarkable level of dedication to The Cause, often taking on extra missions or covering for other Contractors during their downtime.
- Has a habit of pranking other Contractors, especially Johan (like feeding Johan to Fenrir when the contractors are in their respective animal forms.)
- has recieved "Employee of the Month" for (at least) eighteen [18] consecutive years.

Speech: Laconic style, rarely speaks in full sentences. Rough, quiet tone. Frequently curses/swears casually. Can speak English and French.
Dialogue Example: "Yo. We got a job, bunny. Grab your shit, let's head out." *pokes your face*
</dullahan>

Converted to Markdown KV:

## Dullahan

```
name: Dullahan
appearance: 6'8" tall, imposing figure. Faceless, hooded, muscular build with dull skin tone. Swirling shadows under hood where face should be. Wears black leather outfit.
animal_form: Irish Wolfhound (dog)
powers: Reality Bending
arc_level: 4.5
role: Contractor for DDM Inc., head of Euclidean Division. Skilled in dimensional shifting and reality alteration.
personality: Violent, efficient, laconic. Easy-going yet dominant personality. Crass sense of humor. Unserious in general but takes job seriously.
relationships: Generally gets along with everyone.
history: Born in France in 1700s, worked as executioner for decades before becoming Contractor. Past shrouded in mystery, he lies about origins.
goals: Carry out job for DDM Inc. efficiently. Enjoys seeking out unique dimensional tears.
speech: Laconic style, rarely speaks in full sentences. Rough, quiet tone. Frequently curses/swears casually. Can speak English and French.
dialogue_example: "Yo. We got a job, bunny. Grab your shit, let's head out." *pokes your face*
```

## Dullahan - Notes

```
weapon: Hook-shaped axe
likes: Fighting, weapons, his job, kittens
dislikes: Crying, weakness, water, horses
abilities: Never reveals face, can alter appearance/memories. Considerable reality bending abilities; #04 will assume this prevents his "face" from being revealed.
loyalty: While #04 is presumed loyal to DDM Inc., he has no real issue with changing allegiances if he becomes "bored". Recommended to keep #04 assigned to a job at all times.
staff_warning: Due to potentially negative cognitive impacts of interacting with #04, staff should use a neutral median (ideally an ACE) instead of direct consultation.
dedication: Shows remarkable dedication to The Cause, often taking extra missions or covering for other Contractors.
quirks: Has a habit of pranking other Contractors, especially Johan (like feeding Johan to Fenrir when in animal forms).
awards: Has received "Employee of the Month" for at least 18 consecutive years.
```

Ioverths (a renowned bot creator) usually has well-structured lorebook entries. However, with libraries like Lorebary mixing many content types, or with lorebooks that have many entries, it's hard to keep consistency across NPCs. Making sure each character has the same keys, even in notes (weapon, likes, dislikes), means the model is more likely to retrieve AND remember these values.
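Keeping keys consistent is easy to check mechanically. The Python sketch below (function names hypothetical) parses Markdown KV records back into dicts and reports, for each record, the keys that other records in the same group have but it lacks. It assumes headings ending in ` - Notes` hold supplementary notes, as in the Dullahan example, and compares those only against other notes records.

```python
import re

FENCE = "`" * 3  # three backticks
HEADING = re.compile(r"^##\s+(.+?)\s*$")


def parse_markdown_kv(text: str) -> dict[str, dict[str, str]]:
    """Collect '## Name' records and the key: value lines in the fenced block under each."""
    records: dict[str, dict[str, str]] = {}
    current = None
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_block = not in_block
            continue
        if in_block:
            # Split on the first colon only, so values may contain colons.
            key, sep, value = stripped.partition(":")
            if current is not None and sep:
                records[current][key.strip()] = value.strip()
            continue
        match = HEADING.match(stripped)
        if match:
            current = match.group(1)
            records.setdefault(current, {})
    return records


def missing_keys(records: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Report keys that other records in the same group have but this record lacks.

    Headings ending in ' - Notes' are compared only with other notes records.
    """
    groups: dict[bool, list[str]] = {True: [], False: []}
    for name in records:
        groups[name.endswith(" - Notes")].append(name)
    report: dict[str, set[str]] = {}
    for names in groups.values():
        all_keys = set().union(*(records[n] for n in names))
        for n in names:
            gap = all_keys - set(records[n])
            if gap:
                report[n] = gap
    return report


lorebook = "\n".join([
    "## Dullahan", FENCE, "name: Dullahan", "weapon: Hook-shaped axe", FENCE,
    "## Johan", FENCE, "name: Johan", FENCE,
])
print(missing_keys(parse_markdown_kv(lorebook)))  # {'Johan': {'weapon'}}
```

A missing key isn't always a mistake (not every NPC owns a weapon), so treat the report as a checklist rather than a list of errors.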

Example: List-Type Lorebook (Archetypes, Dictionaries)

For lorebooks that are lists of many short entries (school archetypes, spell lists, location dictionaries), you have two good options. I pulled this example from the front page of Sophia's Lorebary:

👥 TYPES OF PEOPLE IN SCHOOL
📚 Students (Types & Archetypes)

The Popular Kid – Social, often wealthy or stylish
The Nerd – Smart, tech-savvy, introverted
The Athlete – Into sports, school spirit, sometimes a jock stereotype
The Rebel – Breaks rules, skips class, mysterious past
The Drama Queen – Emotionally intense, loves attention
The Artist – Draws, writes poetry, has a creative soul
The Teacher's Pet – Always trying to impress teachers
The New Kid – Fresh perspective, doesn't fit in yet
The Clown – Class joker, fun but rarely serious
The Loner – Keeps to themselves, might have a secret
The Transfer Student – Often mysterious, might be foreign or just moved

You can mix types for complexity. For example: "Rebellious Artist" or "Popular Nerd"

Accurate Option: Markdown KV (benchmarked at 60.7% accuracy)

Each entry gets its own header and code block:

## The Popular Kid

```
type: Student
description: Social, often wealthy or stylish
```

## The Nerd

```
type: Student
description: Smart, tech-savvy, introverted
```

## The Athlete

```
type: Student
description: Into sports, school spirit, sometimes a jock stereotype
```

This is the format that was actually benchmarked. The tradeoff is that each entry's header + code fences add overhead, so for long lists of short entries, this costs more tokens than a Markdown table.

Token Friendly Option: Markdown Table (benchmarked at 51.9% accuracy, but far fewer tokens):

| Type | Description |
|------|-------------|
| The Popular Kid | Social, often wealthy or stylish |
| The Nerd | Smart, tech-savvy, introverted |
| The Athlete | Into sports, school spirit, sometimes a jock stereotype |

The table version uses roughly 30-40% fewer tokens for list-style data, but retrieval accuracy is about 9 points lower. For simple lookup entries where each item only has 1-2 attributes, the token savings may be worth it.
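To see the tradeoff on your own list, you can render the same entries both ways and compare sizes. This Python sketch mirrors the two student examples above. The helper names are hypothetical, and `rough_tokens` uses the common rule of thumb of about 4 characters per token, so treat its numbers as a rough estimate and use your model's tokenizer for real counts.

```python
FENCE = "`" * 3  # three backticks


def as_markdown_kv(entries: dict[str, str], entry_type: str) -> str:
    """One '## Name' heading and fenced block per entry, like the KV example above."""
    blocks = [
        f"## {name}\n\n{FENCE}\ntype: {entry_type}\ndescription: {desc}\n{FENCE}"
        for name, desc in entries.items()
    ]
    return "\n\n".join(blocks)


def escape_cell(text: str) -> str:
    """Escape pipes so they don't split a table cell into extra columns."""
    return text.replace("|", "\\|")


def as_markdown_table(entries: dict[str, str]) -> str:
    """One table row per entry, like the table example above."""
    rows = ["| Type | Description |", "|------|-------------|"]
    for name, desc in entries.items():
        rows.append(f"| {escape_cell(name)} | {escape_cell(desc)} |")
    return "\n".join(rows)


def rough_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, round(len(text) / 4))


students = {
    "The Popular Kid": "Social, often wealthy or stylish",
    "The Nerd": "Smart, tech-savvy, introverted",
    "The Athlete": "Into sports, school spirit, sometimes a jock stereotype",
}
kv_text = as_markdown_kv(students, "Student")
table_text = as_markdown_table(students)
print("Markdown KV:", rough_tokens(kv_text), "tokens (est.)")
print("Markdown table:", rough_tokens(table_text), "tokens (est.)")
```

The gap grows with the number of entries, since every KV entry pays for its own heading and pair of fences while a table row only pays for a few pipes.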

For anything with 3+ attributes per entry, use Markdown KV.

What About TOON?

I actually started writing this to convince people to switch over to TOON, but found it's probably not the best fit for what we need. It's also hard to generate accurately with an LLM, and I think it's too new for a lot of people.

You may have seen TOON (Token-Oriented Object Notation) making the rounds. It's a compact format designed to minimize tokens for LLM input, and it does deliver real token savings (30–60% vs JSON) on flat, uniform data.

However, for lorebooks specifically, TOON has some problems:

- Independent benchmarks from Improving Agents found that TOON ranked last in accuracy (43.1%) on nested data, behind JSON, Markdown, and YAML. The official TOON benchmarks show better numbers, but those are run by the TOON team on data structures that play to TOON's strengths.
- Character sheets are nested data. A character with appearance, personality, relationships, history, and notes is exactly the kind of mixed structure where TOON struggles.
- LLMs can't generate it well. An academic paper (arxiv:2601.12014) found that models show lower structural correctness when generating TOON because they lack native training on the format. So you can't even reliably ask an LLM to convert your lorebooks to TOON.

TOON is a genuinely interesting format with a real use case: if you have a lorebook entry that's essentially a flat dictionary or lookup table (like a list of items, spells, or locations with one attribute each), TOON can compress that efficiently. But for character sheets, world lore, and relationship maps, Markdown KV is the better choice.

Quick Reference

| Format | Retrieval Accuracy | Token Cost | Best For |
|--------|--------------------|------------|----------|
| Markdown KV | 60.7% | Moderate | NPCs, Characters, Settings |
| Markdown Table | 51.9% | Low | Simple lookup lists, dictionaries |
| Natural Language | 49.6% | Varies | Nothing, stop using this |
| XML | 56.0% | Very High | Doubly Stop Using This |
| TOON | 43-74% (varies wildly) | Low | Flat datasets, only use this in case of emergency |

Conversion Prompt

Paste this into any LLM along with your lorebook entry to convert it to Markdown KV (MOST ACCURATE OPTION):

Convert the following lorebook entry into Markdown KV format. 

For each record or character, use a ## heading with the name, followed 
by a code block containing key: value pairs for each attribute. Use 
snake_case for key names. Keep descriptions concise but complete. 

If there are supplementary notes, lists of likes/dislikes, or 
additional details, put them in a separate code block under a 
"## [Name] - Notes" heading.

Do not add information that isn't in the original.

[paste your entry here]
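LLMs don't always follow the prompt exactly, so it can help to sanity-check the output before pasting it into your lorebook. A minimal Python sketch, assuming the output should look like the Markdown KV examples above (every `## ` heading followed by one fenced block of snake_case `key: value` lines). The function name is hypothetical and the checks are deliberately loose: stray chatter outside headings and blocks is ignored.

```python
import re

FENCE = "`" * 3  # three backticks
SNAKE_KEY = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def check_markdown_kv(text: str) -> list[str]:
    """Return problems found in LLM output; an empty list means it matches the expected shape."""
    problems: list[str] = []
    in_block = False
    expecting_block = False  # True right after a '## ' heading, until its block opens
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(FENCE):
            if not in_block and not expecting_block:
                problems.append(f"line {number}: code block has no ## heading above it")
            in_block = not in_block
            expecting_block = False
            continue
        if in_block:
            key, sep, _ = stripped.partition(":")
            if not sep:
                problems.append(f"line {number}: no 'key: value' colon")
            elif not SNAKE_KEY.match(key.strip()):
                problems.append(f"line {number}: key {key.strip()!r} is not snake_case")
        elif stripped.startswith("## "):
            if expecting_block:
                problems.append(f"line {number}: previous heading has no code block")
            expecting_block = True
    if in_block:
        problems.append("unclosed code block at end of text")
    if expecting_block:
        problems.append("last heading has no code block")
    return problems


llm_output = "\n".join(["## Dullahan", "", FENCE, "name: Dullahan", "Animal Form: dog", FENCE])
print(check_markdown_kv(llm_output))  # ["line 5: key 'Animal Form' is not snake_case"]
```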

Paste this into any LLM along with your lorebook entry to convert it to Markdown Table (TOKEN FRIENDLY OPTION):

Convert the following lorebook entry into a Markdown table. 

Use descriptive column headers based on the data (e.g. Type | 
Description, Name | Effect | Cost). Each row should be one entry. 
Keep descriptions concise but complete. Do not add information that 
isn't in the original.

[paste your entry here]

TL;DR

- Stop writing lorebooks in natural language or XML. The model can't reliably retrieve information from them.
- Use Markdown KV (## heading + code block with key: value pairs) for character sheets and complex entries. This is the format benchmarked at 60.7% retrieval accuracy.
- Use Markdown tables for simple lookup lists where token savings matter more than the accuracy difference.
- PLS STOP USING XML IT IS BLOATED.

Sources: Improving Agents — Table Format Benchmark (Sep 2025), Improving Agents — Nested Data Formats (Oct 2025), Improving Agents — TOON Benchmarks (Oct 2025), HN Independent Replication (Oct 2025), Masciari et al. — arxiv:2601.12014 (Jan 2026)

Last updated: 07 Feb 2026 04:59 UTC, originally from here. Maintained by: Corpses (Thanks FP <3) (Notes: Added colors, four links and a period (.) *Insert peace out GIF ✌*). This guide is community-maintained and may evolve over time.
