Dima Kramskoy — Senior Cloud Architect at DoiT International · 20+ years software engineering · 10 AWS certifications · AWS Community Builder 2026 · Alumnus of Juval Löwy's Architect Master Class (2022)
The Problem: Every Conversation Starts from Zero
Here's the anti-pattern that drives me crazy: you have a 45-minute deep conversation with an AI assistant about your architecture decisions, trade-offs, constraints. You close the tab. Next day, you open a new chat. It knows nothing. You're back to onboarding.
Now multiply that across your entire knowledge surface: Slack threads with context that'll never be searchable again, email chains where decisions got buried in reply #14, docs that exist in three versions across two drives, and tribal knowledge that lives exclusively in people's heads.
Nobody would onboard their replacement with nothing but their Slack history. Yet that's effectively what we do with our AI tools: we hand them fragments and expect coherence.
The core issue isn't intelligence. GPT-4, Claude, Gemini — they're all brilliant. The issue is amnesia. Every session is a cold start. Your context is scattered across a dozen tools, and none of them talk to each other through a unified model of you and your work.
I've been thinking about this for months. Then Karpathy posted a tweet that crystallized the solution.
Karpathy's Inspiration: RAG Retrieves, a Wiki Compounds
In April 2026, Andrej Karpathy posted about "the LLM Wiki" on X — 17 million views and counting. The core insight hit me like a design pattern clicking into place:
"RAG retrieves. A wiki compounds."
His framework is elegant. Three layers:
- Raw Sources — articles, papers, tweets, conversations, notes
- Wiki — distilled, structured, interconnected knowledge pages
- Schema — the ontology that governs how knowledge gets organized
The analogy he uses is perfect: "Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase."
Matt Paige wrote an excellent breakdown of the concept. The key shift: instead of dumping raw documents into a vector database and hoping retrieval finds the right chunk, you build a living, structured knowledge base that the LLM maintains and enriches over time.
RAG is a lookup. A wiki is a system. One delivers the same flat value on every query. The other compounds.
But here's what nagged me about Karpathy's version: it's manual. You're the operator. You prompt, you review, you paste, you organize. The LLM assists, but you drive. I wanted something different — an assistant that does this for me, not a system I maintain myself.
Why Amazon Quick Desktop
When I evaluated tools for this implementation, I had a checklist derived directly from Karpathy's architecture:
- ✅ Long-term memory that persists across conversations
- ✅ Knowledge graph (auto-extracts entities/relationships from Slack, email, calendar)
- ✅ Semantic search over local files
- ✅ File read/write access to my filesystem
- ✅ Background tasks (research while I work on other things)
- ✅ Connected tools (Gmail, Slack, Google Calendar, MCP servers)
- ✅ Action layer (can draft emails, create docs, book meetings)
Amazon Quick Desktop already had what Karpathy describes — but built-in and connected to real work tools. It's not a chat window. It's a runtime.
Quick comparison:
| Capability | Claude Desktop | Glean | Amazon Quick |
|---|---|---|---|
| Long-term memory | ❌ (per-project only) | ❌ | ✅ Cross-conversation |
| Knowledge graph | ❌ | Partial (enterprise) | ✅ Auto-extracted |
| File system access | ✅ (MCP) | ❌ | ✅ Native |
| Connected tools | Limited MCP | Enterprise SSO | ✅ Slack/Gmail/Cal/etc. |
| Action layer | File writes only | Search only | ✅ Full (drafts/slides/scheduling) |
| Background tasks | ❌ | ❌ | ✅ Parallel agents |
The key decision: I wanted an assistant that compounds knowledge FOR me, not a system I maintain myself. The maintenance tax of a manual wiki kills adoption. I've seen it a dozen times.
What I Built (in 1 Week)
Folder Structure
~/SecondBrain/
├── raw/                  # Unprocessed inputs
│   ├── articles/
│   ├── transcripts/
│   └── captures/
├── wiki/                 # Distilled knowledge
│   ├── concepts/         # Mental models, frameworks
│   │   ├── second-brain-architecture.md
│   │   ├── rag-vs-wiki-pattern.md
│   │   └── approval-workflow-pattern.md
│   ├── entities/         # People, orgs, tools
│   │   ├── doit-international.md
│   │   └── amazon-quick.md
│   ├── projects/         # Active work
│   │   ├── genai-skill-share-talk.md
│   │   └── voice-capture-pipeline.md
│   ├── sources/          # Indexed references
│   │   └── karpathy-llm-wiki.md
│   └── log/              # Daily output tracking
│       ├── 2026-05-19.md
│       ├── 2026-05-20.md
│       └── ...
├── SCHEMA.md             # The ontology
└── mkdocs.yml            # Auto-served documentation
SCHEMA.md (Excerpt)
# SecondBrain Schema v1.0
## Page Types
- **concept**: A mental model, pattern, or framework. Must include: definition, when-to-use, anti-patterns, related concepts.
- **entity**: A person, org, or tool. Must include: role/purpose, relationship to my work, last interaction date.
- **project**: Active or completed work. Must include: status, stakeholders, decisions log, next actions.
- **source**: An ingested article/talk/paper. Must include: URL, key insights (max 5), connection to existing concepts.
- **log**: Daily entry. Auto-generated. Tracks: pages created, pages updated, questions answered, actions taken.
## Naming Convention
kebab-case. No dates in filenames (use frontmatter).
## Cross-Reference Rules
Every new page MUST link to ≥1 existing page. Orphans are a smell.
The Stack
- MkDocs + Material theme — auto-serves the wiki locally on localhost:8000
- launchd plist — starts MkDocs on login (macOS). Zero friction to browse.
- Amazon Quick — reads/writes the wiki, proposes updates, answers questions against it
- Semantic indexing — Quick indexes ~/SecondBrain/ and searches it contextually
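The naming and cross-reference rules from SCHEMA.md are simple enough to check mechanically. Here is a minimal sketch of such a check, not something Quick ships with. It assumes pages link to each other with relative Markdown links like `[text](../concepts/page.md)`, and `lint_page` is a hypothetical helper name.

```python
import re
from pathlib import Path

# kebab-case: lowercase letters/digits separated by single hyphens, ending in .md
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.md$")
# Captures the target of a relative Markdown link such as [text](../concepts/foo.md)
MD_LINK = re.compile(r"\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def lint_page(page: Path) -> list:
    """Return human-readable schema violations for a single wiki page."""
    problems = []
    if not KEBAB.match(page.name):
        problems.append(f"{page.name}: filename is not kebab-case")

    text = page.read_text(encoding="utf-8")
    # Resolve each link relative to the page's own folder.
    targets = [(page.parent / t).resolve() for t in MD_LINK.findall(text)]
    # Only links to *other* pages that actually exist on disk count.
    existing = [t for t in targets if t.is_file() and t != page.resolve()]
    if not existing:
        problems.append(f"{page.name}: links to no existing page")
    return problems
```

Running `lint_page` on a proposed page before approving it catches a misnamed file or a page that links to nothing, before either one lands on disk.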
The Approval Workflow
This is critical. Nothing writes without my OK. The flow:
- I say "ingest this article" or share a link
- Quick reads it, proposes a new wiki page (or updates to existing ones)
- I see the diff — new content highlighted, cross-references shown
- I approve, modify, or reject
- Approved content writes to disk, MkDocs auto-refreshes
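The propose, review, write loop above can be sketched in a few lines. This is an illustration of the pattern using Python's standard `difflib`, not Quick's internal mechanism. `propose_diff` and `apply_if_approved` are hypothetical names.

```python
import difflib
from pathlib import Path


def propose_diff(target: Path, proposed: str) -> str:
    """Build a unified diff between the page on disk and the proposed text."""
    # A page that does not exist yet is treated as empty, so everything shows as added.
    current = target.read_text(encoding="utf-8") if target.exists() else ""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{target.name}",
        tofile=f"b/{target.name}",
    )
    return "".join(diff)


def apply_if_approved(target: Path, proposed: str, approved: bool) -> bool:
    """Write the proposed text only when the reviewer said yes."""
    if not approved:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(proposed, encoding="utf-8")
    return True
```

A caller would print the result of `propose_diff`, ask for a yes or no, and pass the answer to `apply_if_approved`. Nothing touches the disk before that decision, which is the property the workflow depends on.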
15+ wiki pages in the first week. Not from grinding — from conversations I was already having.
A Day in the Life
Morning:
"Good morning"
Quick responds with: priority emails (flagged or from key people), Slack threads that need my response, today's calendar with prep notes for meetings. Not a firehose — a briefing.
During the day:
"Ingest this: [link to architecture blog post]"
Quick reads it, identifies 3 key concepts, proposes a new sources/ page and updates to two existing concepts/ pages. I skim the diff, approve, done. 90 seconds.
"Draft a blog post about the Second Brain implementation"
It pulls from my wiki pages, knows my voice (from memory of past writing), structures it with my preferred format. I edit, not author from scratch.
"Block 2 hours for deep work on the voice capture pipeline tomorrow"
Checks my calendar, finds a slot, books it, adds prep notes from the projects/voice-capture-pipeline.md page.
Background:
While I'm in meetings, background tasks research topics I queued earlier. When I come back: "I found 3 relevant papers on knowledge graph maintenance. Want me to ingest them?"
The compound effect is real. By day 5, it was answering questions by synthesizing across multiple wiki pages I'd forgotten I approved.
The Voice Capture Extension (PoC)
Best ideas come when I'm walking, not at my desk. So I built a pipeline:
Architecture
iPhone Shortcuts (or Plaud NotePin wearable) → API Gateway (REST) → Lambda (upload handler) → S3 (audio bucket) → S3 Event → Lambda (transcription trigger) → Amazon Transcribe → Lambda (post-processing) → Amazon Quick Knowledge Base → Wiki page proposed
How It Works
- I tap a Shortcut on my iPhone (or the NotePin records ambient audio)
- Audio uploads to S3 via API Gateway + Lambda
- S3 event triggers transcription via Amazon Transcribe
- Transcription is cleaned, chunked, and pushed to Quick's knowledge base
- Next time I open Quick: "You had a voice capture about [topic]. Want me to create a wiki page?"
Total AWS cost: fractions of a cent per capture. The Lambda functions are trivial — 50 lines each. The value is in the loop closing: thought → capture → structured knowledge → actionable.
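For a sense of how small the transcription trigger is, here is a rough sketch of what such a Lambda could look like. It is not the actual PoC code. The `TRANSCRIPT_BUCKET` environment variable, the default bucket name, the `en-US` language code, and the optional `transcribe` parameter (there so the handler can be exercised without AWS) are all assumptions.

```python
import os
import re
import urllib.parse

# Hypothetical output bucket; in a real deployment this would be set on the function.
OUTPUT_BUCKET = os.environ.get("TRANSCRIPT_BUCKET", "example-transcripts-bucket")


def job_name_for(key: str) -> str:
    """Transcribe job names allow only letters, digits, '.', '_' and '-' (max 200 chars)."""
    return re.sub(r"[^0-9A-Za-z._-]", "-", key)[:200]


def handler(event, context, transcribe=None):
    """Start one transcription job per audio object in the S3 event."""
    if transcribe is None:
        import boto3  # imported lazily; boto3 is provided by the Lambda Python runtime
        transcribe = boto3.client("transcribe")

    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        name = job_name_for(key)
        transcribe.start_transcription_job(
            TranscriptionJobName=name,
            Media={"MediaFileUri": f"s3://{bucket}/{key}"},
            LanguageCode="en-US",  # assumption: English-only captures
            OutputBucketName=OUTPUT_BUCKET,
        )
        started.append(name)
    return {"started": started}
```

Job names must be unique within an account and Region, so a real handler would likely add a timestamp or the object's version ID to the name. Writing transcripts to a separate bucket also keeps the output from re-triggering the same function.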
Honest Review: 7/10
What's Working
- Structure — The schema enforces consistency. Pages are findable and useful.
- Compounding — Week 2 answers are noticeably better than Week 1. It knows things.
- Connected tools — Slack context enriches wiki pages. Calendar awareness enables real scheduling.
- Action layer — It doesn't just know things; it does things. Drafts, slides, bookings.
- Semantic search — "What did I decide about X?" actually works across the wiki.
What Needs Work
- Cross-referencing — Not fully automatic yet. Some pages remain under-linked.
- Health checks — Haven't implemented scheduled audits for stale/orphan pages.
- Scheduled ingestion — No cron for "check these 5 RSS feeds daily." Manual trigger still.
- Contradiction detection — Untested. What happens when new info conflicts with existing wiki pages?
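The missing health check is the easiest gap to close. A minimal sketch of an audit for orphan and stale pages follows. It assumes relative Markdown links between pages and uses file modification time as a stand-in for "last touched". `audit` and the 90-day default are hypothetical choices, not part of the current setup.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Captures the target of a relative Markdown link such as [text](../concepts/foo.md)
MD_LINK = re.compile(r"\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def audit(wiki_root: Path, stale_days: int = 90, now: Optional[float] = None):
    """Return (orphans, stale) as sorted lists of paths relative to wiki_root."""
    now = time.time() if now is None else now
    root = wiki_root.resolve()
    pages = {p.resolve() for p in root.rglob("*.md")}

    # Collect every page that at least one *other* page links to.
    linked = set()
    for page in pages:
        for target in MD_LINK.findall(page.read_text(encoding="utf-8")):
            resolved = (page.parent / target).resolve()
            if resolved != page:
                linked.add(resolved)

    # Orphans have no inbound links; stale pages were last modified before the cutoff.
    orphans = sorted(p.relative_to(root).as_posix() for p in pages - linked)
    cutoff = now - stale_days * 86400
    stale = sorted(
        p.relative_to(root).as_posix() for p in pages if p.stat().st_mtime < cutoff
    )
    return orphans, stale
```

Daily `log/` entries will usually show up as orphans because nothing links to them, so a real audit would probably skip that folder. Scheduling the script with the same launchd mechanism that starts MkDocs would turn it into the recurring check the list above asks for.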
What We Did BETTER Than Karpathy's Original Vision
| Karpathy's Version | My Implementation |
|---|---|
| Manual prompting | Approval workflow (assistant proposes) |
| Read-only files | Action layer (produces output) |
| Local Obsidian only | Connected to Slack, Gmail, Calendar |
| No entity awareness | Auto-extracted knowledge graph |
| No voice input | Voice capture pipeline |
| No background work | Parallel background agents |
| Single-user IDE | MkDocs served + shareable |
His vision is the blueprint. But it's read-only — files you query. Mine produces: drafts emails, creates slides, books meetings, writes blog posts. The wiki isn't just a reference; it's a source of action.
How You Can Start (5 Steps)
You don't need my full setup to get value. Here's the gradient:
The 5 Steps
- Connect your tools — Slack, Gmail, Calendar, a local folder. This is 10 minutes in Settings.
- Start talking — Memory compounds from Day 1. Every conversation teaches it about you.
- Say "remember this" after important conversations — Explicit memory anchors.
- Create ~/SecondBrain/ + SCHEMA.md — If you want to go deeper, give it structure.
- Ask it a question that spans everything — "What were my key decisions last week?" You're hooked.
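Creating the folder and schema by hand takes a minute, but a script makes it repeatable. The sketch below mirrors the folder structure shown earlier. The stub SCHEMA.md contents and the `scaffold` function name are assumptions for illustration.

```python
from pathlib import Path

# Folders from the structure described in this post.
SUBDIRS = [
    "raw/articles", "raw/transcripts", "raw/captures",
    "wiki/concepts", "wiki/entities", "wiki/projects",
    "wiki/sources", "wiki/log",
]

# Placeholder headings only; fill in your own rules.
SCHEMA_STUB = (
    "# SecondBrain Schema v1.0\n\n"
    "## Page Types\n\n"
    "## Naming Convention\n\n"
    "## Cross-Reference Rules\n"
)


def scaffold(root: Path) -> list:
    """Create missing folders and a stub SCHEMA.md; return what was created."""
    created = []
    for sub in SUBDIRS:
        folder = root / sub
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    schema = root / "SCHEMA.md"
    # Never overwrite an existing schema: it is the one file you edit by hand.
    if not schema.exists():
        schema.write_text(SCHEMA_STUB, encoding="utf-8")
        created.append(schema)
    return created
```

Calling `scaffold(Path.home() / "SecondBrain")` sets everything up, and running it a second time changes nothing.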
Three Levels of Commitment
| Level | Effort | What You Get |
|---|---|---|
| 🟢 Lazy | Just talk normally | Memory + knowledge graph compound silently |
| 🟡 Medium | Wiki folder + SCHEMA.md | Structured, searchable, cross-referenced knowledge |
| 🔴 Full | MkDocs + voice pipeline + background agents | Complete second brain with action layer |
Start at 🟢. Seriously. The compounding happens whether you build infrastructure or not. The wiki structure just makes it visible and auditable.
Conclusion
Your AI shouldn't start from scratch every conversation.
The tools exist. The pattern is proven. Karpathy showed the architecture; Amazon Quick provides the runtime. The gap between "AI assistant" and "second brain" is just structure + persistence + connected tools.
Compounding > Retrieving. Every conversation, every ingested article, every approved wiki page makes the next interaction smarter. That's not retrieval — that's growth.
Start small. It compounds. That's the point.
One last thought — and this comes from Jocko Willink, not from AI research: Extreme Ownership applied to knowledge. If information is in your world — a Slack thread, a half-remembered conference talk, an idea on a morning walk — and you don't capture it, you don't own it. It owns you by being unavailable when you need it most.
Capture it. Structure it. Let it compound.
This post was drafted with the help of my Second Brain — pulling from wiki pages I'd built over the prior week. The irony isn't lost on me. That's the whole point.
Presented at DoiT's GenAI Community Skill Share, May 22, 2026. Thanks to the ~27 peers who asked sharp questions and pushed the thinking further.