
FreakingGenius Architecture Overview

Version: 0.1 (Draft)
Date: March 2026
Status: Living document — will evolve with specs


1. First Principle

We mimic the experience of a private 1:1 lesson with the best human tutor in the world.

Everything flows from this. The system should feel like a patient, expert tutor sitting next to a student at their desk — watching their work, listening to their questions, speaking when helpful, staying quiet when the student needs to think.

Not a chatbot. Not an app. A tutor.


2. The Metaphor

Real life | FreakingGenius
Paper notebook | Edge (tablet)
Tutor sitting next to you | Tutor (phone, computer, or browser)
Tutor watching your paper | Tutor receiving the stroke stream
Tutor pointing at something | Annotations rendered on Edge
Tutor's knowledge of you | Brain (metacognition map)
Tutor's curriculum | Curriculum (exercises, skill graph)

Paper doesn't talk. Paper doesn't have opinions. Paper receives your writing and shows what the tutor points at. The Edge is paper.

The tutor watches, listens, thinks, and speaks. The Tutor process does all of this — running on any device with a microphone and speaker (phone, laptop, or browser tab), positioned near the student.


3. Physical Setup

┌─────────────────────────────────────────┐
│             STUDENT'S DESK              │
│                                         │
│    ┌──────────┐       ┌───────────┐    │
│    │  TUTOR   │       │  TABLET   │    │
│    │ DEVICE   │       │  (Edge)   │    │
│    │          │       │           │    │
│    │  👂 🗣️   │       │   ✏️ 📄   │    │
│    └──────────┘       └───────────┘    │
│                                         │
│              🧑‍🎓 Student               │
└─────────────────────────────────────────┘

Tutor device can be: phone, laptop, desktop, or browser tab

4. System Components

4.1 Component Map

┌────────────────────────────────────────────────────────────────────────┐
│                              CLOUD                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Brain    │  │ Curriculum  │  │  DataPlane  │  │ControlPlane │  │
│  │             │  │             │  │             │  │             │  │
│  │ Student     │  │ Exercises   │  │ Events      │  │ Auth        │  │
│  │ models      │  │ Skill graph │  │ Storage     │  │ Config      │  │
│  │ Skill maps  │  │ Templates   │  │ Analytics   │  │ Billing     │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └─────────────┘  │
│         │                │                │                           │
└─────────┼────────────────┼────────────────┼───────────────────────────┘
          │                │                │
          └────────┬───────┴────────────────┘
                   │
                   ▼
┌────────────────────────────────────────────────────────────────────────┐
│                      TUTOR DEVICE (Phone / Computer / Browser)         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                           TUTOR                                  │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │  │
│  │  │  Voice  │  │ Vision  │  │ Persona │  │ Session │            │  │
│  │  │         │  │         │  │         │  │ Manager │            │  │
│  │  │ STT/TTS │  │ Stroke  │  │ Tone    │  │         │            │  │
│  │  │         │  │ → Math  │  │ Style   │  │ Timing  │            │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘            │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                    │                                   │
│                                    │ Bridge Protocol                   │
│                                    ▼                                   │
└────────────────────────────────────┼───────────────────────────────────┘
                                     │
                        ┌────────────┴────────────┐
                        │         BRIDGE          │
                        │   (protocol + library)  │
                        │                         │
                        │  Semantic → Primitives  │
                        │  Capability adaptation  │
                        └────────────┬────────────┘
                                     │
                                     ▼
┌────────────────────────────────────────────────────────────────────────┐
│                         STUDENT'S TABLET                               │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                            EDGE                                  │  │
│  │                                                                  │  │
│  │   • Render exercises                                            │  │
│  │   • Capture strokes                                             │  │
│  │   • Display annotations                                         │  │
│  │   • Report device state                                         │  │
│  │                                                                  │  │
│  │   (That's it. It's paper.)                                      │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘

4.2 Component Responsibilities

Edge (Tablet App)

The Edge is paper. Silent, passive, receives writing.

Does | Does NOT
Render exercises and content | Make pedagogical decisions
Capture and stream strokes | Parse math
Display annotations from Tutor | Generate audio
Report device state (battery, connection) | Store student history
Handle offline queueing | Know anything about the student

Planned Edge platforms:

Bridge (Protocol + Library)

Bridge translates Tutor's semantic intent into platform-specific rendering.

The Bridge is NOT a hosted service. It is:

  1. A protocol defining messages between Tutor and Edge
  2. A client library embedded in each Edge app

This avoids a network hop for visual translation. The Tutor sends semantic commands (for example, "highlight step 3 in warning color"), and the Bridge library on the Edge translates them into native rendering.
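A minimal sketch of that translation step, assuming a hypothetical `HighlightStep` command and a two-primitive drawing vocabulary (the Bridge Protocol and Visual Primitives specs are still pending, so none of these names are settled). It shows the division of labor: the Tutor says what to emphasize, and the Bridge library on the device decides how, falling back to an outline on a monochrome screen. The `PALETTE` values, the dashed-outline fallback, and the `locateStep` callback are illustrative choices, not decisions from this document.

```typescript
// Hypothetical types; the real shapes will come from the Bridge Protocol
// and Visual Primitives specs.
type SemanticColor = "warning" | "success" | "info";

interface HighlightStep {
  kind: "highlight";
  step: number; // index of the student's work step to emphasize
  color: SemanticColor;
}

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

type Primitive =
  | { op: "fillRect"; box: BoundingBox; rgba: string }
  | { op: "strokeRect"; box: BoundingBox; lineWidth: number; dashed: boolean };

interface DeviceCaps {
  color: boolean;
}

// Device-side palette; a monochrome device never uses it.
const PALETTE: Record<SemanticColor, string> = {
  warning: "rgba(255, 170, 0, 0.35)",
  success: "rgba(0, 170, 80, 0.35)",
  info: "rgba(0, 120, 255, 0.35)",
};

// Translate one semantic command into native drawing primitives.
// `locateStep` stands in for however the Edge maps a step index to the
// on-screen region of that ink (hypothetical).
function translate(
  cmd: HighlightStep,
  caps: DeviceCaps,
  locateStep: (step: number) => BoundingBox | undefined,
): Primitive[] {
  const box = locateStep(cmd.step);
  if (box === undefined) return []; // nothing on the page to point at
  if (caps.color) {
    return [{ op: "fillRect", box, rgba: PALETTE[cmd.color] }];
  }
  // No color (e.g. reMarkable): encode "warning" as a thick dashed outline,
  // everything else as a thin solid outline.
  return [
    {
      op: "strokeRect",
      box,
      lineWidth: cmd.color === "warning" ? 4 : 2,
      dashed: cmd.color === "warning",
    },
  ];
}
```

The same command produces a translucent fill on a color tablet and a dashed box on e-ink, so the Tutor never needs to know which device is on the desk.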

Why Bridge exists:

Tutor (Web App / Native App)

The Tutor is the teacher in the room. Watches, listens, speaks, decides.

Deployment options:

Subcomponent | Responsibility
Voice | STT (listen), TTS (speak), turn-taking
Vision | Stroke stream → structured math
Persona | Tone, personality, communication style
Session Manager | Timing, pacing, lesson lifecycle
Behavior Engine | When to intervene, what to say

The Tutor consults Brain (student model) and Curriculum (what to teach) to make decisions. It's the orchestrator.
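One way to picture that consultation is next-exercise selection: Curriculum supplies the skill graph and exercises, Brain supplies per-skill mastery, and the Tutor combines them. A minimal sketch, assuming mastery is a score in [0, 1] and a hypothetical 0.8 "mastered" threshold; the `Skill`, `Exercise`, and `nextExercise` names are illustrative, and the real policy belongs to the pending Curriculum Schema, Brain Data Model, and Difficulty Progression specs.

```typescript
// Hypothetical shapes; the Curriculum Schema and Brain Data Model specs
// are still pending.
interface Skill {
  id: string;
  prerequisites: string[]; // edges in the Curriculum skill graph
}

interface Exercise {
  id: string;
  skillId: string;
}

// Brain's view of the student: mastery per skill in [0, 1].
type MasteryMap = Map<string, number>;

const MASTERED = 0.8; // illustrative threshold, not a product decision

// Pick an exercise for the weakest skill that is "ready": not yet mastered,
// with every prerequisite mastered. Returns undefined if nothing qualifies.
function nextExercise(
  skills: Skill[],
  exercises: Exercise[],
  mastery: MasteryMap,
): Exercise | undefined {
  const level = (id: string) => mastery.get(id) ?? 0; // unseen skill = 0
  const ready = skills
    .filter((s) => level(s.id) < MASTERED)
    .filter((s) => s.prerequisites.every((p) => level(p) >= MASTERED))
    .sort((a, b) => level(a.id) - level(b.id)); // weakest first
  for (const skill of ready) {
    const ex = exercises.find((e) => e.skillId === skill.id);
    if (ex !== undefined) return ex;
  }
  return undefined;
}
```

Keeping the selection on the Tutor matches the component split: Brain and Curriculum answer questions, and only the Tutor decides.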

Vision (Inside Tutor)

Strokes → mathematical meaning

Vision receives the stroke stream from Edge and parses it into structured math. This is where we understand "the student wrote 3x + 5 = 14" and "they just subtracted 5 from both sides."

Options:

Brain (Cloud)

The Brain remembers everything about each student.

Brain persists across sessions. It answers questions like:

Curriculum (Cloud)

The knowledge base. Exercises, skill graphs, difficulty curves.

Curriculum answers questions like:

Cloud (Infrastructure)

The backend, split into:

Service | Responsibility
DataPlane | Event ingestion, student records, analytics warehouse
ControlPlane | Auth, config, feature flags, billing, business rules
AdminPortal | Internal dashboards, content management
ParentPortal | Parent-facing dashboard (progress, summaries)
TeacherPortal | Teacher-facing views (class progress, alerts)

5. Key Flows

5.1 Lesson Start

┌─────────┐          ┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │          │  Cloud  │
│         │          │ Device  │          │ (Edge)  │          │         │
└────┬────┘          └────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │                    │
     │  Opens Tutor app   │                    │                    │
     │───────────────────►│                    │                    │
     │                    │                    │                    │
     │                    │  Fetch student model, session state     │
     │                    │───────────────────────────────────────►│
     │                    │◄───────────────────────────────────────│
     │                    │                    │                    │
     │  Opens Edge app    │                    │                    │
     │────────────────────────────────────────►│                    │
     │                    │                    │                    │
     │                    │  Auto-pair (previously linked)         │
     │                    │◄──────────────────►│                    │
     │                    │                    │                    │
     │                    │  HELLO + capabilities                  │
     │                    │◄───────────────────│                    │
     │                    │                    │                    │
     │                    │  SESSION_CONFIG    │                    │
     │                    │───────────────────►│                    │
     │                    │                    │                    │
     │  "Hey! Ready to    │                    │                    │
     │   pick up where    │                    │                    │
     │   we left off?"    │                    │                    │
     │◄───────────────────│                    │                    │
     │                    │                    │                    │
     │                    │  RENDER_EXERCISE   │                    │
     │                    │───────────────────►│                    │
     │                    │                    │  Display exercise  │
     │                    │                    │───────────────────►│
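The ordering in this flow (student model loaded, then HELLO, then SESSION_CONFIG, then the greeting, then RENDER_EXERCISE) can be enforced on the Tutor with a small state machine. A minimal sketch; only the three message names come from the diagram, while the state names, event names, and the `step` function are hypothetical.

```typescript
// Tutor-side view of the lesson-start sequence. State and event names are
// hypothetical; HELLO, SESSION_CONFIG and RENDER_EXERCISE come from the flow.
type StartState =
  | "fetchingStudent" // waiting on Cloud for student model + session state
  | "awaitingEdge" // student model loaded, no HELLO yet
  | "configured" // HELLO received, SESSION_CONFIG sent
  | "inLesson"; // greeting spoken, first RENDER_EXERCISE sent

type StartEvent =
  | { type: "studentLoaded" }
  | { type: "hello"; device: string } // the Edge's HELLO + capabilities
  | { type: "greeted" }; // Voice finished "Ready to pick up where we left off?"

interface Transition {
  next: StartState;
  send: string[]; // messages the Tutor emits to the Edge on this transition
}

function step(state: StartState, event: StartEvent): Transition {
  switch (state) {
    case "fetchingStudent":
      if (event.type === "studentLoaded") return { next: "awaitingEdge", send: [] };
      break;
    case "awaitingEdge":
      // Answer the Edge's HELLO with the session configuration.
      if (event.type === "hello") return { next: "configured", send: ["SESSION_CONFIG"] };
      break;
    case "configured":
      // Speak first, then put the exercise on paper.
      if (event.type === "greeted") return { next: "inLesson", send: ["RENDER_EXERCISE"] };
      break;
    case "inLesson":
      break;
  }
  // Out-of-order events leave the state unchanged and send nothing.
  return { next: state, send: [] };
}
```

This sketch simply ignores a HELLO that arrives before the student model loads; a real implementation would likely buffer it, since the student may open the Edge app first.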

5.2 Working on a Problem (Core Loop)

┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │
│         │          │ Device  │          │ (Edge)  │
└────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │
     │  Writes "3x"       │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch (200ms)
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x" (accumulating)
     │                    │                    │
     │  Writes "+ 5 = 14" │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x + 5 = 14" (equation)
     │                    │  Behavior: wait, let them work
     │                    │                    │
     │  Pauses (thinking) │                    │
     │                    │  30s no input...   │
     │                    │                    │
     │  "What should I    │                    │
     │   do first?"       │                    │
     │───────────────────►│                    │
     │                    │                    │
     │                    │  STT: "What should I do first?"
     │                    │  Behavior: student asked, respond
     │                    │                    │
     │  "Try to get x     │                    │
     │   alone. What's    │                    │
     │   stopping that?"  │                    │
     │◄───────────────────│                    │
     │                    │                    │
     │  Writes "3x = 9"   │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x = 9" ✓ correct step
     │                    │  Behavior: they got it, stay quiet
     │                    │                    │
     │  Writes "x = 3"    │                    │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Vision: "x = 3" ✓ correct!
     │                    │  Behavior: praise!
     │                    │                    │
     │  "Nice! You        │                    │
     │   isolated x       │                    │
     │   perfectly."      │                    │
     │◄───────────────────│                    │

5.3 Catching an Error

┌─────────┐          ┌─────────┐          ┌─────────┐
│ Student │          │  Tutor  │          │ Tablet  │
│         │          │ Device  │          │ (Edge)  │
└────┬────┘          └────┬────┘          └────┬────┘
     │                    │                    │
     │  Writes "3x = 14 + 5"  (wrong!)         │
     │────────────────────────────────────────►│
     │                    │                    │
     │                    │  Stroke batch      │
     │                    │◄───────────────────│
     │                    │                    │
     │                    │  Vision: "3x = 14 + 5"
     │                    │  Behavior: error detected, intervene
     │                    │                    │
     │  "Hmm, hold on.    │                    │
     │   Check the sign   │                    │
     │   when you moved   │                    │
     │   the 5."          │                    │
     │◄───────────────────│                    │
     │                    │                    │
     │                    │  ANNOTATE: highlight the "+" 
     │                    │───────────────────►│
     │                    │                    │  Draws highlight
     │                    │                    │
     │  "Oh! It should    │                    │
     │   be minus!"       │                    │
     │───────────────────►│                    │
     │                    │                    │
     │  "Exactly. When    │                    │
     │   you move to the  │                    │
     │   other side, it   │                    │
     │   flips."          │                    │
     │◄───────────────────│                    │
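The "✓ correct step" and "error detected" judgments in 5.2 and 5.3 hinge on one question: does the new line keep the same solution as the previous one? For single-variable linear equations that check is small enough to sketch. This assumes Vision has already turned strokes into plain strings such as "3x = 14 + 5" (the real math notation format is still an open question) and handles only terms like `3x`, `x`, `-2x`, and numeric constants; anything else comes back as "unknown" rather than a guess. `parseSide`, `solve`, and `checkStep` are hypothetical names.

```typescript
// Linear expression coef*x + constant.
interface Linear {
  coef: number;
  constant: number;
}

// Parse one side of an equation, e.g. "3x + 5" or "14 + 5".
function parseSide(expr: string): Linear {
  let s = expr.replace(/\s+/g, "");
  if (s === "") throw new Error("empty side");
  if (s[0] !== "+" && s[0] !== "-") s = "+" + s;
  const terms = s.match(/[+-][^+-]+/g) ?? []; // "+3x+5" -> ["+3x", "+5"]
  if (terms.join("") !== s) throw new Error(`cannot parse "${expr}"`);
  const out: Linear = { coef: 0, constant: 0 };
  for (const term of terms) {
    const sign = term[0] === "-" ? -1 : 1;
    const body = term.slice(1);
    if (body.endsWith("x")) {
      const digits = body.slice(0, -1);
      const c = digits === "" ? 1 : Number(digits); // bare "x" means 1x
      if (Number.isNaN(c)) throw new Error(`bad term "${term}"`);
      out.coef += sign * c;
    } else {
      const c = Number(body);
      if (Number.isNaN(c)) throw new Error(`bad term "${term}"`);
      out.constant += sign * c;
    }
  }
  return out;
}

// Solve a*x + b = c*x + d for x; null when there is no unique solution.
function solve(equation: string): number | null {
  const sides = equation.split("=");
  if (sides.length !== 2) throw new Error(`expected one "=" in "${equation}"`);
  const left = parseSide(sides[0] ?? "");
  const right = parseSide(sides[1] ?? "");
  const a = left.coef - right.coef;
  const b = right.constant - left.constant;
  return a === 0 ? null : b / a;
}

type StepVerdict = "correct" | "error" | "unknown";

// A step is correct when it preserves the solution of the previous line.
// "unknown" covers unparseable input and equations without a unique solution.
function checkStep(previous: string, next: string): StepVerdict {
  try {
    const before = solve(previous);
    const after = solve(next);
    if (before === null || after === null) return "unknown";
    return Math.abs(before - after) < 1e-9 ? "correct" : "error";
  } catch {
    return "unknown";
  }
}
```

For the example above, "3x + 5 = 14" solves to x = 3 while "3x = 14 + 5" solves to x = 19/3, so the step is flagged. The check says only that a step is wrong, not where; pointing at the "+" as the ANNOTATE step does needs the structured parse.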

6. Latency Philosophy

Observe fast. React deliberately.

A real tutor doesn't interrupt every stroke. They watch continuously but speak selectively.

6.1 Tiered Responsiveness

Event | Target response | Rationale
Student speaks | <1 second | Conversation: silence feels like being ignored
Student says "I'm done" / "check this" | <2 seconds | They're waiting
Student finishes a step | 2-5 seconds | A short "thinking" pause feels natural
Student makes an error mid-work | 1-3 seconds | Fast enough to catch, not creepy
Student stuck (30s+ no input) | 5-10 seconds | Gentle nudge
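A minimal sketch of how these targets could be applied, reading each range as a window: the lower bound is a deliberate minimum delay and the upper bound a target. That interpretation, the event names, and the `scheduleResponse` / `isStuck` helpers are assumptions; the numbers are copied from the table.

```typescript
// Event kinds from the tiered-responsiveness table; names are hypothetical.
type TutorEvent =
  | "studentSpoke"
  | "studentRequestedCheck" // "I'm done" / "check this"
  | "stepFinished"
  | "errorMidWork"
  | "studentStuck"; // 30s+ with no input

interface ResponseWindow {
  minMs: number; // never respond earlier than this
  maxMs: number; // target upper bound
}

// A lower bound of 0 means "as fast as possible".
const RESPONSE_WINDOWS: Record<TutorEvent, ResponseWindow> = {
  studentSpoke: { minMs: 0, maxMs: 1_000 },
  studentRequestedCheck: { minMs: 0, maxMs: 2_000 },
  stepFinished: { minMs: 2_000, maxMs: 5_000 },
  errorMidWork: { minMs: 1_000, maxMs: 3_000 },
  studentStuck: { minMs: 5_000, maxMs: 10_000 },
};

const STUCK_AFTER_MS = 30_000;

// Speak no earlier than the window opens, and as soon as the response is
// ready after that. `late` flags a missed target.
function scheduleResponse(
  event: TutorEvent,
  eventAtMs: number,
  responseReadyAtMs: number,
): { speakAtMs: number; late: boolean } {
  const w = RESPONSE_WINDOWS[event];
  const speakAtMs = Math.max(eventAtMs + w.minMs, responseReadyAtMs);
  return { speakAtMs, late: speakAtMs > eventAtMs + w.maxMs };
}

// A student counts as stuck after 30s with no strokes or speech.
function isStuck(lastInputAtMs: number, nowMs: number): boolean {
  return nowMs - lastInputAtMs >= STUCK_AFTER_MS;
}
```

A praise line ready 400 ms after a finished step is still held until the 2-second mark, which is the "react deliberately" half of the philosophy.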

6.2 Transport vs. Response

The transport is real-time. The response is deliberate.

Edge ──────► Bridge ──────► Tutor
     stream        stream       │
    (fast)        (fast)        │
                                ▼
                          ┌──────────┐
                          │ Perceive │ ← accumulate, parse, wait
                          └────┬─────┘
                               │
                               ▼
                          ┌──────────┐
                          │  Decide  │ ← is now the right moment?
                          └────┬─────┘
                               │
                               ▼
                          ┌──────────┐
                          │  Speak   │ ← generate response
                          └──────────┘
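The Decide box can be read as a priority-ordered rule list. The sketch below restates the "Behavior:" annotations from the flows in Section 5 as code. The `Perception` fields, the rule order (for example, answering a question before pointing out an error), and the reuse of the 30-second "stuck" threshold are illustrative assumptions; the real logic is the subject of the pending Tutor Behavior spec.

```typescript
// What Perceive hands to Decide. Field names are hypothetical.
interface Perception {
  studentAsked: boolean; // STT produced a question or request
  stepVerdict: "correct" | "error" | "none"; // Vision's read of the latest step
  solved: boolean; // latest line is the final answer
  idleMs: number; // time since last stroke or speech
}

type Action = "respond" | "intervene" | "praise" | "nudge" | "wait";

// Rule order encodes priority; each rule mirrors a "Behavior:" line in 5.2/5.3.
function decide(p: Perception): Action {
  if (p.studentAsked) return "respond"; // "student asked, respond"
  if (p.stepVerdict === "error") return "intervene"; // "error detected, intervene"
  if (p.solved) return "praise"; // "praise!"
  if (p.idleMs >= 30_000) return "nudge"; // stuck: 30s+ no input
  return "wait"; // "let them work" / "they got it, stay quiet"
}
```

Note that "wait" is the default: the Tutor has to find a reason to speak, not a reason to stay quiet.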

6.3 Protocol Targets

Component | Target | Notes
Stroke batching | 200ms | 5 updates/sec, feels real-time
Voice STT | <500ms | Conversation feel
Voice TTS (first byte) | <300ms | No dead air
Math parsing | 500ms cycles | Re-parse accumulated strokes
Annotation render | <500ms | Feels responsive
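A minimal sketch of 200 ms batching on the Edge (1000 ms / 200 ms = 5 batches per second, matching the table). It assumes a hypothetical `Point` shape and a sequence number on each batch; the Stroke Format spec is still pending. Time is passed in explicitly so the batcher can be driven by any timer or frame loop and tested without real clocks.

```typescript
// A single sampled pen position. Field names are hypothetical.
interface Point {
  x: number;
  y: number;
  t: number; // ms timestamp
  pressure?: number; // 0..1 when the stylus reports it
}

interface StrokeBatch {
  seq: number; // lets the Tutor detect gaps or reorder after a reconnect
  points: Point[];
}

// Buffers points and emits at most one batch per interval (200ms by default).
class StrokeBatcher {
  private buffer: Point[] = [];
  private lastFlushMs: number;
  private seq = 0;
  private readonly send: (batch: StrokeBatch) => void;
  private readonly intervalMs: number;

  constructor(send: (batch: StrokeBatch) => void, startMs: number, intervalMs = 200) {
    this.send = send;
    this.lastFlushMs = startMs;
    this.intervalMs = intervalMs;
  }

  add(point: Point): void {
    this.buffer.push(point);
  }

  // Call from the device's timer or frame loop.
  tick(nowMs: number): void {
    if (nowMs - this.lastFlushMs < this.intervalMs) return;
    this.lastFlushMs = nowMs;
    if (this.buffer.length === 0) return; // nothing new: send nothing
    this.send({ seq: this.seq++, points: this.buffer });
    this.buffer = [];
  }
}
```

Sending nothing when the pen is idle keeps traffic proportional to writing, and the sequence number gives the offline queue a natural ordering key.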

7. Edge Abstraction

7.1 Why Abstraction Matters

We will have multiple Edge platforms:

Each has different:

The abstraction isolates these differences.

7.2 Capability Negotiation

At session start, Edge announces what it can do:

{
  "device": "reMarkable2",
  "capabilities": {
    "color": false,
    "audio": false,
    "camera": false,
    "refreshRate": "slow",
    "stylus": { "pressure": true, "tilt": true },
    "screenSize": { "width": 1404, "height": 1872 },
    "offlineStorage": "2GB"
  }
}
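Because every later rendering decision branches on this message, it is worth validating on arrival. A minimal sketch of a typed parser for the HELLO payload, assuming every field in the example above is required (the Edge Contract spec may make some optional); `parseHello` and its error messages are hypothetical.

```typescript
// Mirrors the example payload above.
interface Capabilities {
  color: boolean;
  audio: boolean;
  camera: boolean;
  refreshRate: string; // e.g. "slow"; allowed values not yet specified
  stylus: { pressure: boolean; tilt: boolean };
  screenSize: { width: number; height: number };
  offlineStorage: string; // e.g. "2GB"
}

interface Hello {
  device: string;
  capabilities: Capabilities;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parse and check a HELLO payload; throws a readable error on the first
// missing or mistyped field.
function parseHello(json: string): Hello {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error("HELLO: not an object");
  const device = raw["device"];
  const caps = raw["capabilities"];
  if (typeof device !== "string") throw new Error("HELLO: device must be a string");
  if (!isRecord(caps)) throw new Error("HELLO: capabilities must be an object");

  const bool = (v: unknown, name: string): boolean => {
    if (typeof v !== "boolean") throw new Error(`HELLO: ${name} must be boolean`);
    return v;
  };
  const num = (v: unknown, name: string): number => {
    if (typeof v !== "number" || !Number.isFinite(v)) throw new Error(`HELLO: ${name} must be a number`);
    return v;
  };
  const str = (v: unknown, name: string): string => {
    if (typeof v !== "string") throw new Error(`HELLO: ${name} must be a string`);
    return v;
  };
  const stylusRaw = caps["stylus"];
  const screenRaw = caps["screenSize"];
  const stylus: Record<string, unknown> = isRecord(stylusRaw) ? stylusRaw : {};
  const screen: Record<string, unknown> = isRecord(screenRaw) ? screenRaw : {};

  return {
    device,
    capabilities: {
      color: bool(caps["color"], "color"),
      audio: bool(caps["audio"], "audio"),
      camera: bool(caps["camera"], "camera"),
      refreshRate: str(caps["refreshRate"], "refreshRate"),
      stylus: {
        pressure: bool(stylus["pressure"], "stylus.pressure"),
        tilt: bool(stylus["tilt"], "stylus.tilt"),
      },
      screenSize: {
        width: num(screen["width"], "screenSize.width"),
        height: num(screen["height"], "screenSize.height"),
      },
      offlineStorage: str(caps["offlineStorage"], "offlineStorage"),
    },
  };
}
```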

Bridge adapts:

7.3 Edge Contract (Preview)

Full spec to follow, but the contract is simple:

Edge receives:

Edge sends:

That's it. Edge is paper.
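The flows in Section 5 and the responsibilities in 4.2 already name most of the traffic that crosses this contract. A sketch of it as TypeScript discriminated unions, assuming a JSON-style message with a `type` tag (the wire format is still an open question in 9.1). HELLO, SESSION_CONFIG, RENDER_EXERCISE, and ANNOTATE appear in the flows; STROKE_BATCH, DEVICE_STATE, and every payload field are hypothetical placeholders.

```typescript
// Tutor -> Edge.
type TutorToEdge =
  | { type: "SESSION_CONFIG"; sessionId: string }
  | { type: "RENDER_EXERCISE"; exerciseId: string; content: string }
  | { type: "ANNOTATE"; target: { step: number }; style: "highlight" | "mark" | "hint" };

// Edge -> Tutor.
type EdgeToTutor =
  | { type: "HELLO"; device: string; capabilities: Record<string, unknown> }
  | { type: "STROKE_BATCH"; seq: number; points: { x: number; y: number; t: number }[] }
  | { type: "DEVICE_STATE"; batteryPct: number; online: boolean };

// Edge-side dispatcher: the Edge only renders what it is told to render.
// The `never` assignment makes the compiler flag any unhandled message type.
function handleOnEdge(msg: TutorToEdge, log: string[]): void {
  switch (msg.type) {
    case "SESSION_CONFIG":
      log.push(`session ${msg.sessionId}`);
      return;
    case "RENDER_EXERCISE":
      log.push(`render ${msg.exerciseId}`);
      return;
    case "ANNOTATE":
      log.push(`${msg.style} step ${msg.target.step}`);
      return;
    default: {
      const unhandled: never = msg;
      throw new Error(`unknown message ${JSON.stringify(unhandled)}`);
    }
  }
}

// Tutor-side summary of what arrived from the Edge.
function describeFromEdge(msg: EdgeToTutor): string {
  switch (msg.type) {
    case "HELLO":
      return `hello from ${msg.device}`;
    case "STROKE_BATCH":
      return `batch ${msg.seq} (${msg.points.length} points)`;
    case "DEVICE_STATE":
      return `battery ${msg.batteryPct}%, ${msg.online ? "online" : "offline"}`;
  }
}
```

The asymmetry is the point: nothing in `TutorToEdge` asks the Edge to decide anything, and nothing in `EdgeToTutor` carries an opinion.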


8. Key Decisions

Decision | Choice | Rationale
Audio lives on Tutor device, not tablet | Tutor device = teacher, tablet = paper | Matches real tutor metaphor. reMarkable has no audio. Clean separation.
Tutor is web-first (browser) | Browser for v1, native apps later | No App Store delays. Works on any device with a mic. PWA for install.
Bridge is a library, not a service | Embedded in Edge app | Avoids network hop for visual translation. Latency matters.
Stroke batching at 200ms | Batch, don't stream per-point | Feels real-time. Reduces message volume 10x. Survives bad WiFi.
Tutor/tablet persistent pairing | Pair once, auto-connect after | Less friction per session. They're a unit.
Parent not present in lesson | Summaries after, not real-time | Real tutors work with the student. Parent is sponsor, not participant.
BOOX first, reMarkable second | BOOX is open Android | Faster to market. reMarkable requires a hack. Abstraction allows both.
Vision may run on-device | Hybrid: fast local, refined cloud | Error detection under 500ms requires on-device parsing. Complex parsing can run in the cloud.
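The hybrid choice in the last row implies a merge rule: a slow cloud refinement must never overwrite a newer local reading of the page. A minimal sketch, assuming each parse is tagged with the last stroke batch it covers (a hypothetical `upToSeq` field) and that, for the same ink, the cloud result is preferred; `HybridVision` and `ParseResult` are illustrative names.

```typescript
// One Vision interpretation of the ink up to a given stroke batch.
interface ParseResult {
  text: string; // e.g. "3x + 5 = 14"; real notation format is an open question
  source: "local" | "cloud";
  upToSeq: number; // last stroke batch included in this parse
}

// Keeps the best current reading of the page.
class HybridVision {
  private current: ParseResult | undefined;

  // Returns true if the result replaced the current reading.
  accept(result: ParseResult): boolean {
    const cur = this.current;
    if (cur !== undefined) {
      // A late cloud answer about older ink must not overwrite a newer local parse.
      if (result.upToSeq < cur.upToSeq) return false;
      // For the same ink, a cloud refinement beats a local guess, not vice versa.
      if (result.upToSeq === cur.upToSeq && cur.source === "cloud" && result.source === "local") {
        return false;
      }
    }
    this.current = result;
    return true;
  }

  reading(): string | undefined {
    return this.current?.text;
  }
}
```

Behavior can then act on the local reading immediately for fast error detection and re-check when a refinement replaces it.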

9. Open Questions

9.1 Technical

Question | Options | Notes
Where does Vision run? | On-device / Cloud / Hybrid | Latency vs. accuracy tradeoff. Need to prototype.
Stroke format details | Custom binary? Protobuf? JSON? | Binary is smaller, JSON is debuggable.
Math notation format | LaTeX? MathML? Custom AST? | Must support multi-step work, not just the final answer.
Offline capability | How much can work without a connection? | Edge can queue strokes. Can Tutor work offline with cached Brain/Curriculum?
Browser audio constraints | Web Audio API limitations? | Test STT/TTS in browser vs. native.

9.2 Product

Question | Options | Notes
What does the Tutor screen show during a lesson? | Avatar / Waveform / Nothing / Stats | Tutor needs a "presence" but shouldn't distract.
How much struggle before Tutor helps? | Configurable? Fixed? | Too fast = no learning. Too slow = frustration.
Multiple personas? | One Tutor voice or selectable? | Personalization vs. complexity.
Lesson length | Fixed (45 min)? Flexible? | Kids need structure, but rigid times don't fit all families.

9.3 Business

Question | Options | Notes
What if no second device is available? | Tablet-only mode? Tutor runs on tablet? | Fallback for single-device families.
Tablet ownership model | Sell? Rent? BNPL? | Per bootstrap strategy: multiple options.
Multi-child family | Shared tablet? Separate profiles? | One tablet, multiple kids = cost savings for family.

10. Spec Roadmap

Phase 1: Foundation

Spec | Purpose | Status
Architecture Overview | This document | ✅ Draft
Stroke Format | Shared data format for ink | 🔜 Next
Visual Primitives | Device-agnostic drawing commands | 🔜 Next
Edge Contract | What any Edge must implement | 🔜 Next

Phase 2: Core Loop

Spec | Purpose | Status
Bridge Protocol | Tutor ↔ Edge communication | Pending
Vision API | Stroke → math recognition | Pending
Tutor Behavior | When to speak, what to say | Pending
Voice Integration | STT/TTS pipeline | Pending

Phase 3: Intelligence

Spec | Purpose | Status
Curriculum Schema | Exercise + skill graph format | Pending
Brain Data Model | Metacognition map structure | Pending
Difficulty Progression | How learning adapts | Pending

Phase 4: Platform

Spec | Purpose | Status
Session Flow | Lesson lifecycle | Pending
Device Pairing | Tutor device ↔ tablet connection | Pending
Cloud Services | Storage, auth, dashboards | Pending

Appendix A: Glossary

Term | Definition
Edge | The tablet app. Captures strokes, renders exercises. "Paper."
Tutor | The web/native app running on any device with a mic and speaker. Listens, speaks, watches, decides. "The teacher."
Tutor device | Whatever runs the Tutor app: phone, laptop, desktop, or browser tab.
Bridge | Protocol + library translating Tutor intent → Edge rendering.
Brain | Cloud service storing student model, skill levels, history.
Curriculum | Cloud service storing exercises, skill graph, content.
Vision | Component parsing strokes into structured math.
Stroke | A single pen-down to pen-up sequence of points.
Annotation | Visual overlay from Tutor (highlights, marks, hints).

Appendix B: Related Documents

Document | Status
Stroke Format Spec | Pending
Visual Primitives Spec | Pending
Edge Contract Spec | Pending
... | ...

This is a living document. As specs are written, this overview will be updated to reflect decisions and link to detailed specs.