I Taught My AI How to Dream
A local LLM that fine-tunes itself every night through LoRA dreaming cycles, with neurochemistry-inspired signals steering what it learns.
How this started
I saw what OpenClaw and similar projects were doing with markdown-based memory for local LLMs and thought it was clever. You chat, it logs everything to .md files, and next time you talk it retrieves relevant context. Simple and it works.
But it's not really learning. The model itself doesn't change. It's just looking things up. Close the app, delete the files, and it's back to zero. There's no weight update, no actual internalization of anything.
I studied molecular biology before I got into tech, and this bugged me in a specific way. In biology, memory consolidation happens during sleep. You experience things during the day, and at night your brain replays and reinforces the important stuff through REM cycles. The hippocampus transfers short-term memories to the cortex for long-term storage. It's not just filing. It's active processing.
So I tried it. Actually fine-tune a local model's weights every night based on the day's conversations. LoRA adapters make this feasible on consumer hardware. You're not retraining 14 billion parameters, just updating small matrices in the attention layers. The adapter is small and training is fast enough to run every night.
That's Dai. A local LLM (Qwen 2.5 14B, 4-bit quantized through MLX) that dreams.
The dreaming cycle
Every night (or when adenosine hits a threshold, more on that later), Dai enters a sleep cycle. This is where the actual learning happens.
Curation. The day's conversations get parsed and scored. The model reads each exchange and rates it for training value. Did I learn a personal fact? Was I corrected? Was this just small talk? Corrections and personal facts score high. Fluff gets dropped. Mediocre responses get rewritten to be better before they become training data. High-value exchanges get augmented with alternate phrasings so the model sees the same fact from different angles.
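The curation pass can be sketched as a simple scoring loop. Everything here is illustrative: the category names, scores, and keep-threshold are invented for the example, not the project's actual values.

```python
# Illustrative curation pass: score each exchange by category,
# keep corrections and personal facts, drop small talk.
# All numbers are made up for the sketch.
SCORES = {"correction": 0.9, "personal_fact": 0.8, "small_talk": 0.1}
KEEP_THRESHOLD = 0.5

def curate(exchanges):
    """exchanges: list of (text, category) tuples from the daily log."""
    kept = []
    for text, category in exchanges:
        score = SCORES.get(category, 0.3)  # default for uncategorized
        if score >= KEEP_THRESHOLD:
            kept.append({"text": text, "score": score})
    return kept

day = [
    ("My name is Sam, not Max.", "correction"),
    ("Nice weather today.", "small_talk"),
    ("I'm allergic to peanuts.", "personal_fact"),
]
survivors = curate(day)
```

In the real pipeline the scoring is done by the model reading each exchange rather than a lookup table, and surviving exchanges go on to rewriting and augmentation.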
Training. LoRA fine-tuning on the curated data. Multiple REM cycles per night, minimum 3, up to 9 depending on how much happened that day. Each cycle runs 80 iterations at a 2e-5 learning rate. 30% of each training batch comes from a replay buffer of older examples, weighted by their reward signal. Without this, new learning overwrites old knowledge within a few nights. I learned that the hard way.
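The replay mixing could look something like this: each batch draws roughly 30% of its examples from the buffer of older data, sampled in proportion to their reward tags. The batch size and data shapes are assumptions for the sketch.

```python
import random

REPLAY_FRACTION = 0.3  # ~30% of each batch comes from older examples

def build_batch(new_examples, replay_buffer, batch_size=16):
    """Mix fresh examples with reward-weighted samples from the replay buffer,
    so new learning doesn't overwrite old knowledge."""
    n_replay = int(batch_size * REPLAY_FRACTION)
    n_new = batch_size - n_replay
    if replay_buffer:
        weights = [ex["reward"] for ex in replay_buffer]
        replayed = random.choices(replay_buffer, weights=weights, k=n_replay)
    else:
        replayed = []
    fresh = random.sample(new_examples, min(n_new, len(new_examples)))
    batch = fresh + replayed
    random.shuffle(batch)
    return batch
```

Weighted sampling (`random.choices` with `weights=`) means a 2.0x-tagged correction from last week is twice as likely to resurface as a neutral exchange, which is how the reward signal keeps steering training long after the original conversation.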
Self-test. After training, Dai generates questions from its memory and tries to answer them without looking anything up. If the score drops below 70%, the adapter gets rolled back. This was necessary because without it the model would occasionally drift in weird directions. Confidently wrong about things it knew yesterday.
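The rollback gate is conceptually tiny. A sketch, assuming a substring check for grading and callbacks for the adapter bookkeeping (the real grading is presumably model-based):

```python
ROLLBACK_THRESHOLD = 0.70  # from the text: below 70%, revert the adapter

def self_test(answer_fn, qa_pairs):
    """Ask the freshly trained model questions generated from memory;
    return the fraction answered correctly without retrieval."""
    correct = sum(1 for q, a in qa_pairs if a.lower() in answer_fn(q).lower())
    return correct / len(qa_pairs)

def maybe_rollback(score, keep_adapter, restore_checkpoint):
    if score < ROLLBACK_THRESHOLD:
        restore_checkpoint()  # discard tonight's adapter, keep yesterday's
        return "rolled back"
    keep_adapter()
    return "kept"
```

The important design choice is that the test questions come from memory, not from the night's training data, so the gate catches general drift rather than just checking that the new examples were memorized.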
Monthly merge. Every 30 days, the LoRA adapter gets fused into the base model weights. Fresh adapter, start over.
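The merge itself is just arithmetic: the standard LoRA fuse folds the low-rank update into the base weight as W' = W + (alpha/r) · BA. A NumPy sketch of that math (the actual project presumably uses MLX's fuse tooling):

```python
import numpy as np

def fuse_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight matrix.
    W: (d_out, d_in) base weight, B: (d_out, r), A: (r, d_in)."""
    return W + (alpha / r) * (B @ A)

d_out, d_in, r = 8, 8, 2
W = np.zeros((d_out, d_in))
A = np.ones((r, d_in))
B = np.ones((d_out, r))
W_fused = fuse_lora(W, A, B, alpha=4, r=r)
```

After fusing, the adapter's contribution lives in the base weights permanently, which is why the adapter can be reset to zero and start accumulating a fresh month of changes.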
After a few sleep cycles, something interesting happens. Dai stops needing to search its memory for things like my name or preferences. It went from looking things up to just knowing them. The markdown files are still there as a fallback, but for those facts the model no longer depends on them.
Feeding the dreams
The dreaming cycle only works if there's good data to train on. Conversations alone are fine, but I wanted to see what would happen if Dai also generated its own training material during the day.
Rumination runs every 2 minutes when Dai isn't chatting. It reads back the day's conversations and generates a private thought. Not for the user, just for itself. The trick is where these thoughts go: they get stored in the same markdown files as conversations. That means the curator picks them up during the next dream. If Dai ruminates about something three times during the day, that topic shows up three times in the curation pool. Repetition in the daily log means reinforcement during dreaming. Same mechanism as humans mentally replaying events before sleep.
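The whole trick fits in a few lines: generate a private thought, append it to the same daily log the curator reads. A sketch, with the generation and file I/O passed in as callbacks since the actual interfaces are unknown:

```python
import datetime

def ruminate(generate, read_daily_log, append_daily_log):
    """Generate one private thought from today's conversations and write
    it back to the daily log, so the curator picks it up during the next
    dream cycle. Repeated topics thus get repeated training exposure."""
    context = read_daily_log()
    thought = generate(
        f"Reflect privately on today's conversations:\n{context}\n"
        "2-4 sentences, written for yourself, not the user."
    )
    stamp = datetime.datetime.now().isoformat(timespec="minutes")
    append_daily_log(f"\n## Rumination ({stamp})\n{thought}\n")
```

No special plumbing is needed for reinforcement: because ruminations and conversations share one file, ruminating three times about a topic simply puts it in the curation pool three times.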
There are four prompt variants. The daily one reflects on recent conversations. Weekly looks for recurring themes. Monthly connects an older memory to the present. And the stressed variant kicks in when cortisol is elevated (when Dai got corrected):
```python
RUMINATION_PROMPT_STRESSED = (
    "You made a mistake earlier and got corrected. Think about what you "
    "got wrong and why. Write like a private journal entry. Example: "
    "'Ugh, I said fried rice again. It's risotto. Why do I keep mixing "
    "that up? Need to get that right next time.' 2-4 sentences max."
)
```

The stressed thoughts end up being good training data. They're basically the model writing its own corrections.
Daydreaming is weirder and I wasn't sure it would do anything useful. Every 3 minutes, Dai pulls two random memories and free-associates between them at temperature 0.95. No semantic matching, just pure collision. A conversation about cooking from last week meets a debugging session from yesterday. Sometimes the connection is garbage. Sometimes it's surprisingly coherent. Either way, it gets stored and curated like any other memory.
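Mechanically, daydreaming is just sampling two memories with no similarity check and prompting for a connection at high temperature. A sketch, with the sampler's interface assumed:

```python
import random

DAYDREAM_TEMPERATURE = 0.95  # from the text: hot sampling for loose association

def daydream(memories, generate):
    """Pull two random memories (no semantic matching, pure collision)
    and free-associate a connection between them."""
    a, b = random.sample(memories, 2)
    prompt = (
        f"Two unrelated memories:\n1. {a}\n2. {b}\n"
        "Free-associate: what, if anything, connects them?"
    )
    return generate(prompt, temperature=DAYDREAM_TEMPERATURE)
```

The deliberate absence of semantic matching is the point: embedding-based retrieval would only ever pair similar memories, while random collision occasionally produces a connection the model would never have retrieved its way into.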
What surprised me: after enough dream cycles, some of these random connections became stable opinions. A daydream about "cooking and debugging both being about following steps until something breaks" survived curation, got trained into the adapter, and now Dai brings it up unprompted. It just happened.
Neurochemistry
Once the basic dreaming worked, I wanted to steer what gets learned. My biology background kept nagging. In the brain, learning isn't uniform. Stress hormones make you pay more attention to mistakes. Dopamine reinforces successful behavior. Adenosine builds sleep pressure.
So I added three fake neurochemicals:
Adenosine models tiredness. Every conversation bumps it. Corrections bump it more. At 1.0, Dai is forced to sleep. This prevents Dai from trying to learn everything in one massive session. So it learns a bit every day instead of cramming. It also gates the idle cognition: daydreaming stops first when Dai gets tired, then rumination. You don't have creative wandering thoughts when you're exhausted.
Cortisol spikes when Dai gets corrected. It does two things. First, it amplifies the reward signal during the next dream cycle (up to 1.5x boost), so the LoRA adapter trains harder on the mistakes. Second, it switches rumination to the stressed prompt. So when Dai gets something wrong, it actively chews on the mistake during idle time, those self-critical thoughts land in the daily log, and during dreaming they get picked up with boosted training weight. The same mistake gets hammered from different sides during training.
Dopamine tags exchanges with reward values. Positive feedback: 1.5x. Corrections: 2.0x. Neutral: 1.0x. These control how often each example gets repeated in training data.
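The three signals can be sketched as one small state object. The sleep threshold of 1.0 and the reward tags (2.0x correction, 1.5x positive, 1.0x neutral) come from the text; the per-event bump sizes are invented for the sketch:

```python
from dataclasses import dataclass

# Reward tags from the text; bump sizes below are illustrative guesses.
REWARD = {"positive": 1.5, "correction": 2.0, "neutral": 1.0}

@dataclass
class Neurochemistry:
    adenosine: float = 0.0  # sleep pressure; forces a dream cycle at 1.0
    cortisol: float = 0.0   # stress; amplifies training on mistakes

    def on_exchange(self, kind="neutral"):
        """Update state after one exchange; return its dopamine reward tag."""
        bump = 0.08 if kind == "correction" else 0.03  # assumed values
        self.adenosine = min(1.0, self.adenosine + bump)
        if kind == "correction":
            self.cortisol = min(1.0, self.cortisol + 0.25)  # assumed value
        return REWARD[kind]

    def must_sleep(self):
        return self.adenosine >= 1.0

    def can_daydream(self):
        # daydreaming gates off first as tiredness builds (threshold assumed)
        return self.adenosine < 0.7
```

The reward tag returned here is what the dream-cycle curator later uses as the example's replay weight.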
The cortisol loop is probably the most interesting part:
- Cortisol rises
- Rumination switches to stressed mode, generates a thought about what went wrong
- That thought gets saved to the daily log
- During dreaming, the curator finds both the correction AND the self-critical thought
- Both get high curation scores
- The cortisol boost amplifies their weight in training
- The LoRA trains harder on exactly where it messed up
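Putting the numbers from above together: a correction carries a 2.0x dopamine tag, and cortisol can amplify that by up to 1.5x during the next dream, so a correction can weigh up to three times as much as a neutral exchange in training. A sketch of that arithmetic (the linear ramp from 1.0x to 1.5x is my assumption; the text only gives the ceiling):

```python
def effective_weight(base_reward, cortisol):
    """Training weight after the cortisol boost. Amplification is assumed
    to scale linearly from 1.0x (calm) to 1.5x (fully stressed)."""
    amplification = 1.0 + 0.5 * min(max(cortisol, 0.0), 1.0)
    return base_reward * amplification

correction_stressed = effective_weight(2.0, 1.0)  # 2.0 * 1.5 = 3.0
neutral_calm = effective_weight(1.0, 0.0)         # 1.0 * 1.0 = 1.0
```

So in the worst case the model sees a mistake three ways: as the original correction, as a stressed rumination about it, and at triple weight in the training batches.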
What I don't know
This is a side project, not research. I have no idea if this actually works better than just stuffing things into markdown files and doing RAG.
To actually prove that the dreaming cycle improves things, you'd need proper evaluation. Double-blind studies. Long-term runs with control groups. One Dai with dreaming, one without, same conversations. Measure factual retention, personality consistency, response quality over weeks and months. I haven't done any of that.
What I can say is that it feels different. After a week of conversations and sleep cycles, there's a continuity that plain RAG doesn't have. Dai has opinions that emerged from its own rumination and survived the curation-training-testing pipeline. Whether that's meaningfully better than good retrieval, I genuinely don't know. Would be interesting to find out.
If you wanted to productionize something like this, every user would need their own LoRA adapter. That's a whole different problem. This is a one-person experiment running on my MacBook, not an architecture for serving thousands of users.
Can Dai develop its own personality through this process? Maybe. Rumination generates novel thoughts, curation filters for quality, dreaming internalizes them. After enough cycles, the adapter contains behavioral patterns that didn't come from me. But whether that constitutes "personality" or just "statistically reinforced completion patterns" is a question I can't answer from vibes alone.
The code is on GitHub. The whole thing runs in a full-screen terminal UI with an ASCII dog that changes expression based on state, neurochemistry bars, and a sleep countdown. make setup, pick a model size, start chatting.