FreakingGenius Architecture Overview

Version: 0.1 (Draft)
Date: March 2026
Status: Living document — will evolve with specs

1. First Principle

We mimic the experience of a private 1:1 lesson with the best human tutor in the world.

Everything flows from this. The system should feel like a patient, expert tutor sitting next to a student at their desk — watching their work, listening to their questions, speaking when helpful, staying quiet when the student needs to think.

Not a chatbot. Not an app. A tutor.

2. The Metaphor

Real life	FreakingGenius
Paper notebook	Edge (tablet)
Tutor sitting next to you	Tutor (phone, computer, or browser)
Tutor watching your paper	Tutor receiving stroke stream
Tutor pointing at something	Annotations rendered on Edge
Tutor's knowledge of you	Brain (metacognition map)
Tutor's curriculum	Curriculum (exercises, skill graph)

Paper doesn't talk. Paper doesn't have opinions. Paper receives your writing and shows what the tutor points at. The Edge is paper.

The tutor watches, listens, thinks, and speaks. The Tutor process does all of this — running on any device with a microphone and speaker (phone, laptop, or browser tab), positioned near the student.

3. Physical Setup

┌─────────────────────────────────────────┐
│             STUDENT'S DESK              │
│                                         │
│    ┌──────────┐       ┌───────────┐    │
│    │  TUTOR   │       │  TABLET   │    │
│    │ DEVICE   │       │  (Edge)   │    │
│    │          │       │           │    │
│    │  👂 🗣️   │       │   ✏️ 📄   │    │
│    └──────────┘       └───────────┘    │
│                                         │
│              🧑‍🎓 Student               │
└─────────────────────────────────────────┘

Tutor device can be: phone, laptop, desktop, or browser tab

Tablet (Edge): Silent. Receives writing. Displays exercises and annotations.
Tutor device: Listens. Speaks. Watches what student writes. Decides when to intervene.
Student: Writes on tablet. Talks to Tutor device. Feels like they're working with a tutor.
Parent: Not present. Receives summaries after lessons.

4. System Components

4.1 Component Map

┌────────────────────────────────────────────────────────────────────────┐
│                              CLOUD                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Brain    │  │ Curriculum  │  │  DataPlane  │  │ControlPlane │  │
│  │             │  │             │  │             │  │             │  │
│  │ Student     │  │ Exercises   │  │ Events      │  │ Auth        │  │
│  │ models      │  │ Skill graph │  │ Storage     │  │ Config      │  │
│  │ Skill maps  │  │ Templates   │  │ Analytics   │  │ Billing     │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └─────────────┘  │
│         │                │                │                           │
└─────────┼────────────────┼────────────────┼───────────────────────────┘
          │                │                │
          └────────┬───────┴────────────────┘
                   │
                   ▼
┌────────────────────────────────────────────────────────────────────────┐
│                      TUTOR DEVICE (Phone / Computer / Browser)         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                           TUTOR                                  │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │  │
│  │  │  Voice  │  │ Vision  │  │ Persona │  │ Session │            │  │
│  │  │         │  │         │  │         │  │ Manager │            │  │
│  │  │ STT/TTS │  │ Stroke  │  │ Tone    │  │         │            │  │
│  │  │         │  │ → Math  │  │ Style   │  │ Timing  │            │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘            │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                    │                                   │
│                                    │ Bridge Protocol                   │
│                                    ▼                                   │
└────────────────────────────────────┼───────────────────────────────────┘
                                     │
                        ┌────────────┴────────────┐
                        │         BRIDGE          │
                        │   (protocol + library)  │
                        │                         │
                        │  Semantic → Primitives  │
                        │  Capability adaptation  │
                        └────────────┬────────────┘
                                     │
                                     ▼
┌────────────────────────────────────────────────────────────────────────┐
│                         STUDENT'S TABLET                               │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                            EDGE                                  │  │
│  │                                                                  │  │
│  │   • Render exercises                                            │  │
│  │   • Capture strokes                                             │  │
│  │   • Display annotations                                         │  │
│  │   • Report device state                                         │  │
│  │                                                                  │  │
│  │   (That's it. It's paper.)                                      │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘

4.2 Component Responsibilities

Edge (Tablet App)

The Edge is paper. Silent, passive, receives writing.

Does	Does NOT
Render exercises and content	Make pedagogical decisions
Capture and stream strokes	Parse math
Display annotations from Tutor	Generate audio
Report device state (battery, connection)	Store student history
Handle offline queueing	Know anything about the student

Planned Edge platforms:

BOOX (Android e-ink tablets) — v1
reMarkable 2 — v1.1 (pending hack success)
iPad — future
Web browser — future (reduced capability)

Bridge (Protocol + Library)

Bridge translates Tutor's semantic intent into platform-specific rendering.

The Bridge is NOT a hosted service. It is:

A protocol defining messages between Tutor and Edge
A client library embedded in each Edge app

This avoids a network hop for visual translation. Tutor sends semantic commands ("highlight step 3 in warning color"). Bridge library on Edge translates to native rendering.

Why Bridge exists:

Tutor shouldn't know about e-ink refresh rates or screen sizes
Edge shouldn't know about pedagogical concepts
Abstraction keeps Tutor independent of any one Edge platform

Tutor (Web App / Native App)

The Tutor is the teacher in the room. Watches, listens, speaks, decides.

Deployment options:

Browser (recommended for v1): tutor.freakinggenius.com — works on any device with mic/speaker
PWA: Installable web app with offline support
Native phone app: iOS/Android for best audio experience
Native desktop app: For families preferring computer

Subcomponent	Responsibility
Voice	STT (listen), TTS (speak), turn-taking
Vision	Stroke stream → structured math
Persona	Tone, personality, communication style
Session Manager	Timing, pacing, lesson lifecycle
Behavior Engine	When to intervene, what to say

The Tutor consults Brain (student model) and Curriculum (what to teach) to make decisions. It's the orchestrator.

Vision (Inside Tutor)

Strokes → mathematical meaning

Vision receives the stroke stream from Edge and parses it into structured math. This is where we understand "the student wrote 3x + 5 = 14" and "they just subtracted 5 from both sides."

Options:

Run on-device (Tutor device) for low latency
Run server-side for better accuracy
Hybrid: fast local pass, server refinement

Brain (Cloud)

The Brain remembers everything about each student.

Metacognition map: What concepts does this student understand?
Skill levels: Where are they on each skill?
Learning path: What should they learn next?
History: What exercises have they done? Where did they struggle?
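
To make the list above concrete, here is a minimal sketch of a per-student record. The class names, fields, and the 50% accuracy threshold are all assumptions for illustration; the real structure belongs to the Brain Data Model spec, which is still pending. It shows how session history could back an observation such as a student being stuck on one skill for several sessions in a row.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """One lesson's outcome for a single skill (hypothetical shape)."""
    skill: str
    attempts: int
    correct: int


@dataclass
class StudentModel:
    """Minimal stand-in for what Brain might persist per student."""
    student_id: str
    history: list = field(default_factory=list)  # SessionRecord items, oldest first

    def stuck_sessions(self, skill, threshold=0.5):
        """Count the most recent consecutive sessions on `skill` whose
        accuracy fell below `threshold`. Sessions on other skills are
        skipped; a session with no attempts ends the streak."""
        streak = 0
        for record in reversed(self.history):
            if record.skill != skill:
                continue
            if record.attempts and record.correct / record.attempts < threshold:
                streak += 1
            else:
                break
        return streak


model = StudentModel("student-1", [
    SessionRecord("fraction_division", attempts=5, correct=1),
    SessionRecord("decimals", attempts=4, correct=4),
    SessionRecord("fraction_division", attempts=6, correct=2),
    SessionRecord("fraction_division", attempts=4, correct=1),
])
print(model.stuck_sessions("fraction_division"))
```

A streak counter like this is only one possible signal. Learning-style preferences or error patterns would need richer data than attempts and correct counts.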

Brain persists across sessions. It surfaces observations like:

"This student has been stuck on fraction division for 3 sessions"
"They learn better from visual examples than verbal explanations"
"They tend to rush and make sign errors"

Curriculum (Cloud)

The knowledge base. Exercises, skill graphs, difficulty curves.

Skill graph: Prerequisites, relationships between concepts
Exercise bank: Problems, templates, parametric generators
Content assets: The "Head First" style visuals, quirky explanations
Difficulty curves: How hard is each exercise? What determines progression?
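
As a rough illustration of the skill graph, the sketch below stores direct prerequisites in a dictionary and walks them breadth-first to collect every direct and indirect prerequisite of a skill. The skill names and the `all_prerequisites` helper are hypothetical; the actual format is left to the Curriculum Schema spec.

```python
from collections import deque

# Hypothetical skill graph: each skill maps to its direct prerequisites.
# Skill names are illustrative only, not the real curriculum.
SKILL_GRAPH = {
    "quadratic_equations": ["linear_equations", "factoring"],
    "linear_equations": ["integer_arithmetic"],
    "factoring": ["multiplication"],
    "multiplication": ["integer_arithmetic"],
    "integer_arithmetic": [],
}


def all_prerequisites(skill, graph):
    """Return every direct and indirect prerequisite of `skill`, sorted by name."""
    seen = set()
    queue = deque(graph.get(skill, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(graph.get(current, []))
    return sorted(seen)


print(all_prerequisites("quadratic_equations", SKILL_GRAPH))
```

The `seen` set also keeps the walk from looping if the graph ever contains a cycle by mistake.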

Curriculum answers requests like:

"Give me a fraction multiplication problem at difficulty 3"
"What are the prerequisites for learning quadratic equations?"
"What visual metaphor do we use for negative numbers?"

Cloud (Infrastructure)

The backend, split into:

Service	Responsibility
DataPlane	Event ingestion, student records, analytics warehouse
ControlPlane	Auth, config, feature flags, billing, business rules
AdminPortal	Internal dashboards, content management
ParentPortal	Parent-facing dashboard (progress, summaries)
TeacherPortal	Teacher-facing views (class progress, alerts)

5. Key Flows

5.1 Lesson Start

┌─────────┐          ┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │          │  Cloud  │
│         │          │ Device  │          │ (Edge)  │          │         │
└────┬────┘          └────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │                    │
     │  Opens Tutor app   │                    │                    │
     │───────────────────►│                    │                    │
     │                    │                    │                    │
     │                    │  Fetch student model, session state     │
     │                    │───────────────────────────────────────►│
     │                    │◄───────────────────────────────────────│
     │                    │                    │                    │
     │  Opens Edge app    │                    │                    │
     │────────────────────────────────────────►│                    │
     │                    │                    │                    │
     │                    │  Auto-pair (previously linked)         │
     │                    │◄──────────────────►│                    │
     │                    │                    │                    │
     │                    │  HELLO + capabilities                  │
     │                    │◄───────────────────│                    │
     │                    │                    │                    │
     │                    │  SESSION_CONFIG    │                    │
     │                    │───────────────────►│                    │
     │                    │                    │                    │
     │  "Hey! Ready to    │                    │                    │
     │   pick up where    │                    │                    │
     │   we left off?"    │                    │                    │
     │◄───────────────────│                    │                    │
     │                    │                    │                    │
     │                    │  RENDER_EXERCISE   │                    │
     │                    │───────────────────►│                    │
     │                    │                    │  Display exercise  │
     │                    │                    │                    │

5.2 Working on a Problem (Core Loop)

┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │
│         │          │ Device  │          │ (Edge)  │
└────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │
     │  Writes "3x"       │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch (200ms)
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x" (accumulating)
     │                    │                    │
     │  Writes "+ 5 = 14" │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x + 5 = 14" (equation)
     │                    │  Behavior: wait, let them work
     │                    │                    │
     │  Pauses (thinking) │                    │
     │                    │  30s no input...   │
     │                    │                    │
     │  "What should I    │                    │
     │   do first?"       │                    │
     │───────────────────►│                    │
     │                    │                    │
     │                    │  STT: "What should I do first?"
     │                    │  Behavior: student asked, respond
     │                    │                    │
     │  "Try to get x     │                    │
     │   alone. What's    │                    │
     │   stopping that?"  │                    │
     │◄───────────────────│                    │
     │                    │                    │
     │  Writes "3x = 9"   │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x = 9" ✓ correct step
     │                    │  Behavior: they got it, stay quiet
     │                    │                    │
     │  Writes "x = 3"    │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Vision: "x = 3" ✓ correct!
     │                    │  Behavior: praise!
     │                    │                    │
     │  "Nice! You        │                    │
     │   isolated x       │                    │
     │   perfectly."      │                    │
     │◄───────────────────│                    │

5.3 Catching an Error

┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │
│         │          │ Device  │          │ (Edge)  │
└────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │
     │  Writes "3x = 14 + 5"  (wrong!)         │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x = 14 + 5"
     │                    │  Behavior: error detected, intervene
     │                    │                    │
     │  "Hmm, hold on.    │                    │
     │   Check the sign   │                    │
     │   when you moved   │                    │
     │   the 5."          │                    │
     │◄───────────────────│                    │
     │                    │                    │
     │                    │  RENDER_ANNOTATION: highlight the "+"
     │                    │───────────────────►│
     │                    │                    │  Draws highlight
     │                    │                    │
     │  "Oh! It should    │                    │
     │   be minus!"       │                    │
     │───────────────────►│                    │
     │                    │                    │
     │  "Exactly. When    │                    │
     │   you move to the  │                    │
     │   other side, it   │                    │
     │   flips."          │                    │
     │◄───────────────────│                    │
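
One simple way to detect the error in the flow above is to check that each new line keeps the same solution as the line before it. The sketch below assumes Vision has already turned the strokes into a structured linear equation; the `LinearEquation` shape and the function names are hypothetical. It only flags that a step is wrong. Locating the faulty sign would need a finer comparison.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class LinearEquation(NamedTuple):
    """a*x + b = c*x + d, a hypothetical structured form Vision might emit."""
    a: int
    b: int
    c: int
    d: int


def solve(eq: LinearEquation) -> Optional[Fraction]:
    """Return the unique solution for x, or None if there is not exactly one."""
    slope = eq.a - eq.c
    if slope == 0:
        return None
    return Fraction(eq.d - eq.b, slope)


def is_valid_step(previous: LinearEquation, current: LinearEquation) -> bool:
    """A step is valid if it keeps the same solution as the line before it."""
    before, after = solve(previous), solve(current)
    return before is not None and before == after


problem = LinearEquation(3, 5, 0, 14)    # 3x + 5 = 14
good_step = LinearEquation(3, 0, 0, 9)   # 3x = 9
bad_step = LinearEquation(3, 0, 0, 19)   # 3x = 14 + 5, already simplified

print(is_valid_step(problem, good_step), is_valid_step(problem, bad_step))
```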

6. Latency Philosophy

Observe fast. React deliberately.

A real tutor doesn't interrupt every stroke. They watch continuously but speak selectively.

6.1 Tiered Responsiveness

Event	Target Response	Rationale
Student speaks	<1 second	Conversation — silence feels like ignoring
Student says "I'm done" / "check this"	<2 seconds	They're waiting
Student finishes a step	2-5 seconds	Tutor "thinking" feels natural
Student makes error mid-work	1-3 seconds	Fast enough to catch, not creepy
Student stuck (30s+ no input)	5-10 seconds	Gentle nudge
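
The table above could be encoded as data that the Behavior Engine consults before speaking. This is a minimal sketch: the enum names, the two helper functions, and the treatment of the window boundaries are assumptions, and only the numbers come from the table.

```python
from enum import Enum


class StudentEvent(Enum):
    SPEAKS = "speaks"
    REQUESTS_CHECK = "requests_check"
    FINISHES_STEP = "finishes_step"
    MAKES_ERROR = "makes_error"
    STUCK = "stuck"


# (earliest_seconds, latest_seconds) per event, taken from the table above.
RESPONSE_WINDOWS = {
    StudentEvent.SPEAKS: (0.0, 1.0),
    StudentEvent.REQUESTS_CHECK: (0.0, 2.0),
    StudentEvent.FINISHES_STEP: (2.0, 5.0),
    StudentEvent.MAKES_ERROR: (1.0, 3.0),
    StudentEvent.STUCK: (5.0, 10.0),
}


def may_respond(event, seconds_since_event):
    """True once the minimum delay for this event has passed."""
    earliest, _ = RESPONSE_WINDOWS[event]
    return seconds_since_event >= earliest


def is_overdue(event, seconds_since_event):
    """True if the Tutor has waited longer than the target window allows."""
    _, latest = RESPONSE_WINDOWS[event]
    return seconds_since_event > latest


print(may_respond(StudentEvent.FINISHES_STEP, 1.0))
print(is_overdue(StudentEvent.SPEAKS, 1.5))
```

Keeping the windows in one table makes them easy to tune during prototyping without touching the decision logic.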

6.2 Transport vs. Response

The transport is real-time. The response is deliberate.

Edge ──────► Bridge ──────► Tutor
     stream        stream       │
    (fast)        (fast)        │
                                ▼
                          ┌──────────┐
                          │ Perceive │ ← accumulate, parse, wait
                          └────┬─────┘
                               │
                               ▼
                          ┌──────────┐
                          │  Decide  │ ← is now the right moment?
                          └────┬─────┘
                               │
                               ▼
                          ┌──────────┐
                          │  Speak   │ ← generate response
                          └──────────┘

6.3 Protocol Targets

Component	Target	Notes
Stroke batching	200ms	5 updates/sec, feels real-time
Voice STT	<500ms	Conversation feel
Voice TTS (first byte)	<300ms	No dead air
Math parsing	500ms cycles	Re-parse accumulated strokes
Annotation render	<500ms	Feels responsive
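
To illustrate the 200ms stroke batching target, the sketch below groups timestamped points into fixed-width time windows. The `(timestamp_ms, x, y)` point shape, the `batch_points` name, and the 100 Hz sample rate in the example are assumptions; the real layout is left to the Stroke Format spec. Points are assumed to arrive in timestamp order.

```python
BATCH_WINDOW_MS = 200  # batching target from the table above


def batch_points(points, window_ms=BATCH_WINDOW_MS):
    """Group (timestamp_ms, x, y) points into consecutive fixed-width windows.

    Windows are aligned to the first point's timestamp. Empty windows
    produce no batch, so a pause in writing sends nothing.
    """
    batches = []
    current = []
    window_start = None
    for point in points:
        timestamp = point[0]
        if window_start is None:
            window_start = timestamp
        # Advance to the window that contains this point.
        while timestamp >= window_start + window_ms:
            if current:
                batches.append(current)
                current = []
            window_start += window_ms
        current.append(point)
    if current:
        batches.append(current)
    return batches


# 60 points sampled every 10 ms (a hypothetical 100 Hz stylus) span 0..590 ms.
samples = [(t, t * 0.1, 50.0) for t in range(0, 600, 10)]
batches = batch_points(samples)
print(len(samples), "points ->", len(batches), "messages")
```

The saving in message count depends on the stylus sample rate, so the ratio seen here applies only to this made-up input.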

7. Edge Abstraction

7.1 Why Abstraction Matters

We will have multiple Edge platforms:

BOOX (Android e-ink) — now
reMarkable 2 — soon
iPad — later
Phone-only (no tablet) — per bootstrap strategy
Web — maybe

Each has different:

Screen sizes and aspect ratios
Refresh capabilities (e-ink vs. LCD)
Stylus APIs
Audio capabilities (some have none)
Color support

The abstraction isolates these differences.

7.2 Capability Negotiation

At session start, Edge announces what it can do:

{
  "device": "reMarkable2",
  "capabilities": {
    "color": false,
    "audio": false,
    "camera": false,
    "refreshRate": "slow",
    "stylus": { "pressure": true, "tilt": true },
    "screenSize": { "width": 1404, "height": 1872 },
    "offlineStorage": "2GB"
  }
}

Bridge adapts:

No color → convert color hints to patterns/icons
Slow refresh → batch visual updates, avoid animations
No audio → audio stays on Tutor device (always the case anyway)
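
The adaptation rules above can be sketched as a single function that takes a semantic command and the announced capabilities. The command fields, the pattern names, and the shape of the returned render hint are hypothetical; only the `color` and `refreshRate` capability keys come from the example above.

```python
def adapt_annotation(command, capabilities):
    """Translate a semantic annotation command into a device-appropriate hint.

    `command` and the returned dict use hypothetical field names.
    """
    rendered = {"target": command["target"], "animate": True}

    if capabilities.get("color", False):
        rendered["style"] = {"color": command["color"]}
    else:
        # No color: fall back to a pattern or icon that carries the same meaning.
        patterns = {"warning": "dashed-underline", "success": "check-icon"}
        rendered["style"] = {
            "pattern": patterns.get(command["color"], "solid-underline")
        }

    if capabilities.get("refreshRate") == "slow":
        # Slow refresh (e-ink): no animations, and mark the update as batchable.
        rendered["animate"] = False
        rendered["batch"] = True

    return rendered


remarkable = {"color": False, "audio": False, "refreshRate": "slow"}
command = {"target": "step-3", "color": "warning"}
print(adapt_annotation(command, remarkable))
```

Because the function reads only the capabilities dictionary, Tutor can send the same command to any Edge and never needs to know which device is on the other end.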

7.3 Edge Contract (Preview)

Full spec to follow, but the contract is simple:

Edge receives:

RENDER_EXERCISE — show this exercise
RENDER_ANNOTATION — highlight/mark something
CLEAR_ANNOTATIONS — remove annotations
NAVIGATE — go to next exercise

Edge sends:

STROKE_DATA — stream of points
GESTURE — undo, erase, next, etc.
HEARTBEAT — battery, connectivity

That's it. Edge is paper.
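
As a sketch of how small the contract is, the message types listed above fit in two enums. The JSON envelope with `type` and `payload` fields, and the helper names, are assumptions made for readability; the wire format is still an open question (see 9.1).

```python
import json
from enum import Enum


class ToEdge(Enum):
    """Messages Edge receives."""
    RENDER_EXERCISE = "RENDER_EXERCISE"
    RENDER_ANNOTATION = "RENDER_ANNOTATION"
    CLEAR_ANNOTATIONS = "CLEAR_ANNOTATIONS"
    NAVIGATE = "NAVIGATE"


class FromEdge(Enum):
    """Messages Edge sends."""
    STROKE_DATA = "STROKE_DATA"
    GESTURE = "GESTURE"
    HEARTBEAT = "HEARTBEAT"


def encode(message_type, payload):
    """Serialize a message as JSON. The envelope fields are an assumption."""
    return json.dumps({"type": message_type.value, "payload": payload})


def decode_from_edge(raw):
    """Parse a message sent by Edge; raises ValueError for unknown types."""
    data = json.loads(raw)
    return FromEdge(data["type"]), data.get("payload", {})


raw = encode(FromEdge.HEARTBEAT, {"battery": 0.82, "connected": True})
kind, payload = decode_from_edge(raw)
print(kind.name, payload)
```

Splitting the two directions into separate enums means a message sent the wrong way fails at decode time instead of being silently accepted.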

8. Key Decisions

Decision	Choice	Rationale
Audio lives on Tutor device, not tablet	Tutor device = teacher, Tablet = paper	Matches real tutor metaphor. reMarkable has no audio. Clean separation.
Tutor is web-first (browser)	Browser for v1, native apps later	No App Store delays. Works on any device with mic. PWA for install.
Bridge is library, not service	Embed in Edge app	Avoids network hop for visual translation. Latency matters.
Stroke batching at 200ms	Batch, don't stream per-point	Feels real-time. Reduces message volume roughly 10x compared with per-point streaming. Survives bad WiFi.
Tutor/tablet persistent pairing	Pair once, auto-connect after	Less friction per session. They're a unit.
Parent not present in lesson	Summaries after, not real-time	Real tutors work with the student. Parent is sponsor, not participant.
BOOX first, reMarkable second	BOOX is open Android	Faster to market. reMarkable requires hack. Abstraction allows both.
Vision may run on-device	Hybrid: fast local, refined cloud (tentative)	Detecting errors within a 500ms parse cycle requires on-device parsing. Complex parsing can run in the cloud. The final split is still open (see 9.1).

9. Open Questions

9.1 Technical

Question	Options	Notes
Where does Vision run?	On-device / Cloud / Hybrid	Latency vs. accuracy tradeoff. Need to prototype.
Stroke format details	Custom binary? Protobuf? JSON?	Binary is smaller, JSON is debuggable.
Math notation format	LaTeX? MathML? Custom AST?	Need to support multi-step work, not just final answer.
Offline capability	How much can work without connection?	Edge can queue strokes. Can Tutor work offline with cached Brain/Curriculum?
Browser audio constraints	Web Audio API limitations?	Test STT/TTS in browser vs. native.

9.2 Product

Question	Options	Notes
What does Tutor screen show during lesson?	Avatar / Waveform / Nothing / Stats	Tutor needs a "presence" but shouldn't distract.
How much struggle before Tutor helps?	Configurable? Fixed?	Too fast = no learning. Too slow = frustration.
Multiple personas?	One Tutor voice or selectable?	Personalization vs. complexity.
Lesson length	Fixed (45m)? Flexible?	Kids need structure, but rigid times don't fit all families.

9.3 Business

Question	Options	Notes
What if no second device available?	Tablet-only mode? Tutor runs on tablet?	Fallback for single-device families.
Tablet ownership model	Sell? Rent? BNPL?	Per bootstrap strategy — multiple options.
Multi-child family	Shared tablet? Separate profiles?	One tablet, multiple kids = cost savings for family.

10. Spec Roadmap

Phase 1: Foundation

Spec	Purpose	Status
Architecture Overview	This document	✅ Draft
Stroke Format	Shared data format for ink	🔜 Next
Visual Primitives	Device-agnostic drawing commands	🔜 Next
Edge Contract	What any Edge must implement	🔜 Next

Phase 2: Core Loop

Spec	Purpose	Status
Bridge Protocol	Tutor ↔ Edge communication	Pending
Vision API	Stroke → math recognition	Pending
Tutor Behavior	When to speak, what to say	Pending
Voice Integration	STT/TTS pipeline	Pending

Phase 3: Intelligence

Spec	Purpose	Status
Curriculum Schema	Exercise + skill graph format	Pending
Brain Data Model	Metacognition map structure	Pending
Difficulty Progression	How learning adapts	Pending

Phase 4: Platform

Spec	Purpose	Status
Session Flow	Lesson lifecycle	Pending
Device Pairing	Tutor device ↔ tablet connection	Pending
Cloud Services	Storage, auth, dashboards	Pending

Appendix A: Glossary

Term	Definition
Edge	The tablet app. Captures strokes, renders exercises. "Paper."
Tutor	The web/native app running on any device with mic/speaker. Listens, speaks, watches, decides. "The teacher."
Tutor device	Whatever runs the Tutor app: phone, laptop, browser tab, desktop.
Bridge	Protocol + library translating Tutor intent → Edge rendering.
Brain	Cloud service storing student model, skill levels, history.
Curriculum	Cloud service storing exercises, skill graph, content.
Vision	Component parsing strokes into structured math.
Stroke	A single pen-down to pen-up sequence of points.
Annotation	Visual overlay from Tutor (highlights, marks, hints).

Appendix B: Related Documents

Document	Status
Stroke Format Spec	Pending
Visual Primitives Spec	Pending
Edge Contract Spec	Pending
...	...

This is a living document. As specs are written, this overview will be updated to reflect decisions and link to detailed specs.