Version: 0.1 (Draft)
Date: March 2026
Status: Living document — will evolve with specs
We mimic the experience of a private 1:1 lesson with the best human tutor in the world.
Everything flows from this. The system should feel like a patient, expert tutor sitting next to a student at their desk — watching their work, listening to their questions, speaking when helpful, staying quiet when the student needs to think.
Not a chatbot. Not an app. A tutor.
| Real life | FreakingGenius |
|---|---|
| Paper notebook | Edge (tablet) |
| Tutor sitting next to you | Tutor (phone, computer, or browser) |
| Tutor watching your paper | Tutor receiving stroke stream |
| Tutor pointing at something | Annotations rendered on Edge |
| Tutor's knowledge of you | Brain (metacognition map) |
| Tutor's curriculum | Curriculum (exercises, skill graph) |
Paper doesn't talk. Paper doesn't have opinions. Paper receives your writing and shows what the tutor points at. The Edge is paper.
The tutor watches, listens, thinks, and speaks. The Tutor process does all of this — running on any device with a microphone and speaker (phone, laptop, or browser tab), positioned near the student.
┌─────────────────────────────────────────┐
│ STUDENT'S DESK │
│ │
│ ┌──────────┐ ┌───────────┐ │
│ │ TUTOR │ │ TABLET │ │
│ │ DEVICE │ │ (Edge) │ │
│ │ │ │ │ │
│ │ 👂 🗣️ │ │ ✏️ 📄 │ │
│ └──────────┘ └───────────┘ │
│ │
│ 🧑🎓 Student │
└─────────────────────────────────────────┘
Tutor device can be: phone, laptop, desktop, or browser tab
┌────────────────────────────────────────────────────────────────────────┐
│ CLOUD │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Brain │ │ Curriculum │ │ DataPlane │ │ControlPlane │ │
│ │ │ │ │ │ │ │ │ │
│ │ Student │ │ Exercises │ │ Events │ │ Auth │ │
│ │ models │ │ Skill graph │ │ Storage │ │ Config │ │
│ │ Skill maps │ │ Templates │ │ Analytics │ │ Billing │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └─────────────┘ │
│ │ │ │ │
└─────────┼────────────────┼────────────────┼───────────────────────────┘
│ │ │
└────────┬───────┴────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────┐
│ TUTOR DEVICE (Phone / Computer / Browser) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TUTOR │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Voice │ │ Vision │ │ Persona │ │ Session │ │ │
│ │ │ │ │ │ │ │ │ Manager │ │ │
│ │ │ STT/TTS │ │ Stroke │ │ Tone │ │ │ │ │
│ │ │ │ │ → Math │ │ Style │ │ Timing │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Bridge Protocol │
│ ▼ │
└────────────────────────────────────┼───────────────────────────────────┘
│
┌────────────┴────────────┐
│ BRIDGE │
│ (protocol + library) │
│ │
│ Semantic → Primitives │
│ Capability adaptation │
└────────────┬────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────┐
│ STUDENT'S TABLET │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ EDGE │ │
│ │ │ │
│ │ • Render exercises │ │
│ │ • Capture strokes │ │
│ │ • Display annotations │ │
│ │ • Report device state │ │
│ │ │ │
│ │ (That's it. It's paper.) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
The Edge is paper. Silent, passive, receives writing.
| Does | Does NOT |
|---|---|
| Render exercises and content | Make pedagogical decisions |
| Capture and stream strokes | Parse math |
| Display annotations from Tutor | Generate audio |
| Report device state (battery, connection) | Store student history |
| Handle offline queueing | Know anything about the student |
Planned Edge platforms:
Bridge translates Tutor's semantic intent into platform-specific rendering.
The Bridge is NOT a hosted service. It is:
This avoids a network hop for visual translation. Tutor sends semantic commands ("highlight step 3 in warning color"). Bridge library on Edge translates to native rendering.
Why Bridge exists:
The Tutor is the teacher in the room. Watches, listens, speaks, decides.
Deployment options:
tutor.freakinggenius.com — works on any device with mic/speaker| Subcomponent | Responsibility |
|---|---|
| Voice | STT (listen), TTS (speak), turn-taking |
| Vision | Stroke stream → structured math |
| Persona | Tone, personality, communication style |
| Session Manager | Timing, pacing, lesson lifecycle |
| Behavior Engine | When to intervene, what to say |
The Tutor consults Brain (student model) and Curriculum (what to teach) to make decisions. It's the orchestrator.
Strokes → mathematical meaning
Vision receives the stroke stream from Edge and parses it into structured math. This is where we understand "the student wrote 3x + 5 = 14" and "they just subtracted 5 from both sides."
Options:
The Brain remembers everything about each student.
Brain persists across sessions. It answers questions like:
The knowledge base. Exercises, skill graphs, difficulty curves.
Curriculum answers questions like:
The backend, split into:
| Service | Responsibility |
|---|---|
| DataPlane | Event ingestion, student records, analytics warehouse |
| ControlPlane | Auth, config, feature flags, billing, business rules |
| AdminPortal | Internal dashboards, content management |
| ParentPortal | Parent-facing dashboard (progress, summaries) |
| TeacherPortal | Teacher-facing views (class progress, alerts) |
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Student │ │ Tutor │ │ Tablet │ │ Cloud │
│ │ │ Device │ │ (Edge) │ │ │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
│ Opens Tutor app │ │ │
│───────────────────►│ │ │
│ │ │ │
│ │ Fetch student model, session state │
│ │───────────────────────────────────────►│
│ │◄───────────────────────────────────────│
│ │ │ │
│ Opens Edge app │ │ │
│────────────────────────────────────────►│ │
│ │ │ │
│ │ Auto-pair (previously linked) │
│ │◄──────────────────►│ │
│ │ │ │
│ │ HELLO + capabilities │
│ │◄───────────────────│ │
│ │ │ │
│ │ SESSION_CONFIG │ │
│ │───────────────────►│ │
│ │ │ │
│ "Hey! Ready to │ │ │
│ pick up where │ │ │
│ we left off?" │ │ │
│◄───────────────────│ │ │
│ │ │ │
│ │ RENDER_EXERCISE │ │
│ │───────────────────►│ │
│ │ │ Display exercise │
│ │ │───────────────────►│
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Student │ │ Tutor │ │ Tablet │
│ │ │ Device │ │ (Edge) │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ Writes "3x" │ │
│────────────────────────────────────────►│
│ │ │
│ │ Stroke batch (200ms)
│ │◄───────────────────│
│ │ │
│ │ Vision: "3x" (accumulating)
│ │ │
│ Writes "+ 5 = 14" │ │
│────────────────────────────────────────►│
│ │ │
│ │ Stroke batch │
│ │◄───────────────────│
│ │ │
│ │ Vision: "3x + 5 = 14" (equation)
│ │ Behavior: wait, let them work
│ │ │
│ Pauses (thinking) │ │
│ │ 30s no input... │
│ │ │
│ "What should I │ │
│ do first?" │ │
│───────────────────►│ │
│ │ │
│ │ STT: "What should I do first?"
│ │ Behavior: student asked, respond
│ │ │
│ "Try to get x │ │
│ alone. What's │ │
│ stopping that?" │ │
│◄───────────────────│ │
│ │ │
│ Writes "3x = 9" │ │
│────────────────────────────────────────►│
│ │ │
│ │ Stroke batch │
│ │◄───────────────────│
│ │ │
│ │ Vision: "3x = 9" ✓ correct step
│ │ Behavior: they got it, stay quiet
│ │ │
│ Writes "x = 3" │ │
│────────────────────────────────────────►│
│ │ │
│ │ Vision: "x = 3" ✓ correct!
│ │ Behavior: praise!
│ │ │
│ "Nice! You │ │
│ isolated x │ │
│ perfectly." │ │
│◄───────────────────│ │
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Student │ │ Tutor │ │ Tablet │
│ │ │ Device │ │ (Edge) │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ Writes "3x = 14 + 5" (wrong!) │
│────────────────────────────────────────►│
│ │ │
│ │ Stroke batch │
│ │◄───────────────────│
│ │ │
│ │ Vision: "3x = 14 + 5"
│ │ Behavior: error detected, intervene
│ │ │
│ "Hmm, hold on. │ │
│ Check the sign │ │
│ when you moved │ │
│ the 5." │ │
│◄───────────────────│ │
│ │ │
│ │ ANNOTATE: highlight the "+"
│ │───────────────────►│
│ │ │ Draws highlight
│ │ │
│ "Oh! It should │ │
│ be minus!" │ │
│───────────────────►│ │
│ │ │
│ "Exactly. When │ │
│ you move to the │ │
│ other side, it │ │
│ flips." │ │
│◄───────────────────│ │
Observe fast. React deliberately.
A real tutor doesn't interrupt every stroke. They watch continuously but speak selectively.
| Event | Target Response | Rationale |
|---|---|---|
| Student speaks | <1 second | Conversation — silence feels like ignoring |
| Student says "I'm done" / "check this" | <2 seconds | They're waiting |
| Student finishes a step | 2-5 seconds | Tutor "thinking" feels natural |
| Student makes error mid-work | 1-3 seconds | Fast enough to catch, not creepy |
| Student stuck (30s+ no input) | 5-10 seconds | Gentle nudge |
The transport is real-time. The response is deliberate.
Edge ──────► Bridge ──────► Tutor
stream stream │
(fast) (fast) │
▼
┌──────────┐
│ Perceive │ ← accumulate, parse, wait
└────┬─────┘
│
▼
┌──────────┐
│ Decide │ ← is now the right moment?
└────┬─────┘
│
▼
┌──────────┐
│ Speak │ ← generate response
└──────────┘
| Component | Target | Notes |
|---|---|---|
| Stroke batching | 200ms | 5 updates/sec, feels real-time |
| Voice STT | <500ms | Conversation feel |
| Voice TTS (first byte) | <300ms | No dead air |
| Math parsing | 500ms cycles | Re-parse accumulated strokes |
| Annotation render | <500ms | Feels responsive |
We will have multiple Edge platforms:
Each has different:
The abstraction isolates these differences.
At session start, Edge announces what it can do:
{
"device": "reMarkable2",
"capabilities": {
"color": false,
"audio": false,
"camera": false,
"refreshRate": "slow",
"stylus": { "pressure": true, "tilt": true },
"screenSize": { "width": 1404, "height": 1872 },
"offlineStorage": "2GB"
}
}
Bridge adapts:
Full spec to follow, but the contract is simple:
Edge receives:
RENDER_EXERCISE — show this exerciseRENDER_ANNOTATION — highlight/mark somethingCLEAR_ANNOTATIONS — remove annotationsNAVIGATE — go to next exerciseEdge sends:
STROKE_DATA — stream of pointsGESTURE — undo, erase, next, etc.HEARTBEAT — battery, connectivityThat's it. Edge is paper.
| Decision | Choice | Rationale |
|---|---|---|
| Audio lives on Tutor device, not tablet | Tutor device = teacher, Tablet = paper | Matches real tutor metaphor. reMarkable has no audio. Clean separation. |
| Tutor is web-first (browser) | Browser for v1, native apps later | No App Store delays. Works on any device with mic. PWA for install. |
| Bridge is library, not service | Embed in Edge app | Avoids network hop for visual translation. Latency matters. |
| Stroke batching at 200ms | Batch, don't stream per-point | Feels real-time. Reduces message volume 10x. Survives bad WiFi. |
| Tutor/tablet persistent pairing | Pair once, auto-connect after | Less friction per session. They're a unit. |
| Parent not present in lesson | Summaries after, not real-time | Real tutors work with the student. Parent is sponsor, not participant. |
| BOOX first, reMarkable second | BOOX is open Android | Faster to market. reMarkable requires hack. Abstraction allows both. |
| Vision may run on-device | Hybrid: fast local, refined cloud | <500ms for error detection requires on-device. Complex parsing can be cloud. |
| Question | Options | Notes |
|---|---|---|
| Where does Vision run? | On-device / Cloud / Hybrid | Latency vs. accuracy tradeoff. Need to prototype. |
| Stroke format details | Custom binary? Protobuf? JSON? | Binary is smaller, JSON is debuggable. |
| Math notation format | LaTeX? MathML? Custom AST? | Need to support multi-step work, not just final answer. |
| Offline capability | How much can work without connection? | Edge can queue strokes. Can Tutor work offline with cached Brain/Curriculum? |
| Browser audio constraints | Web Audio API limitations? | Test STT/TTS in browser vs. native. |
| Question | Options | Notes |
|---|---|---|
| What does Tutor screen show during lesson? | Avatar / Waveform / Nothing / Stats | Tutor needs a "presence" but shouldn't distract. |
| How much struggle before Tutor helps? | Configurable? Fixed? | Too fast = no learning. Too slow = frustration. |
| Multiple personas? | One Tutor voice or selectable? | Personalization vs. complexity. |
| Lesson length | Fixed (45m)? Flexible? | Kids need structure, but rigid times don't fit all families. |
| Question | Options | Notes |
|---|---|---|
| What if no second device available? | Tablet-only mode? Tutor runs on tablet? | Fallback for single-device families. |
| Tablet ownership model | Sell? Rent? BNPL? | Per bootstrap strategy — multiple options. |
| Multi-child family | Shared tablet? Separate profiles? | One tablet, multiple kids = cost savings for family. |
| Spec | Purpose | Status |
|---|---|---|
| Architecture Overview | This document | ✅ Draft |
| Stroke Format | Shared data format for ink | 🔜 Next |
| Visual Primitives | Device-agnostic drawing commands | 🔜 Next |
| Edge Contract | What any Edge must implement | 🔜 Next |
| Spec | Purpose | Status |
|---|---|---|
| Bridge Protocol | Tutor ↔ Edge communication | Pending |
| Vision API | Stroke → math recognition | Pending |
| Tutor Behavior | When to speak, what to say | Pending |
| Voice Integration | STT/TTS pipeline | Pending |
| Spec | Purpose | Status |
|---|---|---|
| Curriculum Schema | Exercise + skill graph format | Pending |
| Brain Data Model | Metacognition map structure | Pending |
| Difficulty Progression | How learning adapts | Pending |
| Spec | Purpose | Status |
|---|---|---|
| Session Flow | Lesson lifecycle | Pending |
| Device Pairing | Tutor device ↔ tablet connection | Pending |
| Cloud Services | Storage, auth, dashboards | Pending |
| Term | Definition |
|---|---|
| Edge | The tablet app. Captures strokes, renders exercises. "Paper." |
| Tutor | The web/native app running on any device with mic/speaker. Listens, speaks, watches, decides. "The teacher." |
| Tutor device | Whatever runs the Tutor app: phone, laptop, browser tab, desktop. |
| Bridge | Protocol + library translating Tutor intent → Edge rendering. |
| Brain | Cloud service storing student model, skill levels, history. |
| Curriculum | Cloud service storing exercises, skill graph, content. |
| Vision | Component parsing strokes into structured math. |
| Stroke | A single pen-down to pen-up sequence of points. |
| Annotation | Visual overlay from Tutor (highlights, marks, hints). |
| Document | Status |
|---|---|
| Stroke Format Spec | Pending |
| Visual Primitives Spec | Pending |
| Edge Contract Spec | Pending |
| ... | ... |
This is a living document. As specs are written, this overview will be updated to reflect decisions and link to detailed specs.