Version: 0.1 (Draft)
Date: March 2026
Status: Proposal
Depends on: Stroke Format Spec
Used by: Tutor
Vision is the component that transforms raw strokes into structured mathematical understanding. It answers: "What did the student write, and what does it mean?"
Vision is NOT:
| Goal | Implication |
|---|---|
| Mathematical understanding | Parse structure (equations, steps), not just symbols |
| Work-in-progress parsing | Understand incomplete work as the student writes |
| Error localization | Identify where in the work an error occurred |
| Low latency | <500ms for incremental updates |
| Confidence scoring | Know when interpretation is uncertain |
Vision handles:
Vision does NOT handle (yet):
┌─────────────────────────────────────────────────────────────────┐
│ VISION PIPELINE │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Stroke │ │ Symbol │ │ Math │ │ Work │ │
│ │ Groups │───►│ Recog │───►│ Parser │───►│ Analyzer│ │
│ │ │ │ │ │ │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ "These strokes "That's a "It's the "They're │
│ belong together" 3, x, +" equation isolating x" │
│ 3x + 5 = 14" │
└─────────────────────────────────────────────────────────────────┘
| Stage | Input | Output |
|---|---|---|
| Stroke Grouping | Raw strokes | Logical groups (symbols, expressions) |
| Symbol Recognition | Stroke groups | Characters with confidence |
| Math Parsing | Symbols + positions | Structured math (AST) |
| Work Analysis | Math AST over time | Steps, intent, errors |
| Option | Latency | Accuracy | Cost |
|---|---|---|---|
| On-device (Tutor device) | ~100ms | Medium | Free |
| Cloud API | ~300ms | High | Per-call |
| Hybrid | ~150ms | High | Reduced |
Recommended: Hybrid — fast on-device for incremental updates, cloud for full analysis.
Note: On-device performance varies by platform. Native phone apps can use optimized ML frameworks (CoreML, TensorFlow Lite). Browser-based Tutor may need to rely more on cloud processing.
A recognized character or mathematical symbol.
{
"id": "sym_001",
"value": "3",
"type": "DIGIT",
"confidence": 0.95,
"strokeIds": ["str_001", "str_002"],
"bounds": {"x": 0.12, "y": 0.35, "w": 0.03, "h": 0.05},
"alternatives": [
{"value": "8", "confidence": 0.03},
{"value": "5", "confidence": 0.02}
]
}
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier |
| `value` | string | Recognized value |
| `type` | enum | Symbol category (see 3.2) |
| `confidence` | float | 0.0 - 1.0 |
| `strokeIds` | array | Source strokes |
| `bounds` | object | Bounding box |
| `alternatives` | array | Other possible interpretations |
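As an illustration of how a consumer might load this object, the following is a minimal Python sketch. The `Symbol` class, its `from_dict` constructor, and `ranked_candidates` are hypothetical names chosen for this example; only the JSON field names come from the spec.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Symbol:
    """Hypothetical in-memory form of the Symbol JSON object."""
    id: str
    value: str
    type: str                      # a SymbolType name
    confidence: float
    stroke_ids: List[str]
    bounds: Dict[str, float]
    alternatives: List[Tuple[str, float]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "Symbol":
        confidence = float(raw["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        return cls(
            id=raw["id"],
            value=raw["value"],
            type=raw["type"],
            confidence=confidence,
            stroke_ids=list(raw["strokeIds"]),
            bounds=dict(raw["bounds"]),
            alternatives=[(a["value"], float(a["confidence"]))
                          for a in raw.get("alternatives", [])],
        )

    def ranked_candidates(self) -> List[Tuple[str, float]]:
        """Primary value plus alternatives, most confident first."""
        everything = [(self.value, self.confidence)] + self.alternatives
        return sorted(everything, key=lambda c: c[1], reverse=True)


raw = json.loads("""
{"id": "sym_001", "value": "3", "type": "DIGIT", "confidence": 0.95,
 "strokeIds": ["str_001", "str_002"],
 "bounds": {"x": 0.12, "y": 0.35, "w": 0.03, "h": 0.05},
 "alternatives": [{"value": "8", "confidence": 0.03},
                  {"value": "5", "confidence": 0.02}]}
""")
symbol = Symbol.from_dict(raw)
print(symbol.ranked_candidates())
```

Keeping the alternatives next to the primary value makes it straightforward to re-rank a symbol later when context changes.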
enum SymbolType {
// Numerals
DIGIT, // 0-9
DECIMAL_POINT, // .
// Variables
VARIABLE, // x, y, n, etc.
// Operators
PLUS, // +
MINUS, // - (as operator)
MULTIPLY, // × or *
DIVIDE, // ÷ or /
EQUALS, // =
NOT_EQUALS, // ≠
LESS_THAN, // <
GREATER_THAN, // >
// Grouping
PAREN_OPEN, // (
PAREN_CLOSE, // )
BRACKET_OPEN, // [
BRACKET_CLOSE, // ]
// Special
FRACTION_BAR, // horizontal line in fraction
EXPONENT, // superscript indicator
SQRT, // √
NEGATIVE_SIGN, // - (as sign, not operator)
// Other
UNKNOWN, // Unrecognized
SCRATCH, // Crossed out / scribble
}
A mathematical expression parsed from symbols.
{
"id": "expr_001",
"latex": "3x + 5",
"tree": {
"type": "add",
"left": {
"type": "multiply",
"left": {"type": "number", "value": 3},
"right": {"type": "variable", "name": "x"}
},
"right": {"type": "number", "value": 5}
},
"symbols": ["sym_001", "sym_002", "sym_003", "sym_004"],
"bounds": {"x": 0.10, "y": 0.34, "w": 0.15, "h": 0.06},
"confidence": 0.92
}
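To show how the `tree` field relates to the `latex` field, here is a small Python sketch that walks the tree. It handles only the four node types that appear in this example (`add`, `multiply`, `number`, `variable`); the function names `to_latex` and `evaluate` are assumptions made for illustration, not part of the Vision API.

```python
def _factor(node):
    # Parenthesize sums when they appear inside a product.
    text = to_latex(node)
    return f"({text})" if node["type"] == "add" else text


def to_latex(node):
    """Render an expression tree as a LaTeX-style string."""
    kind = node["type"]
    if kind == "number":
        return str(node["value"])
    if kind == "variable":
        return node["name"]
    if kind == "add":
        return f"{to_latex(node['left'])} + {to_latex(node['right'])}"
    if kind == "multiply":
        left, right = _factor(node["left"]), _factor(node["right"])
        # Juxtapose a factor and a variable ("3x"); otherwise write the dot.
        if node["right"]["type"] == "variable":
            return left + right
        return f"{left} \\cdot {right}"
    raise ValueError(f"unsupported node type: {kind}")


def evaluate(node, env):
    """Evaluate the tree for the variable values given in env."""
    kind = node["type"]
    if kind == "number":
        return node["value"]
    if kind == "variable":
        return env[node["name"]]
    if kind == "add":
        return evaluate(node["left"], env) + evaluate(node["right"], env)
    if kind == "multiply":
        return evaluate(node["left"], env) * evaluate(node["right"], env)
    raise ValueError(f"unsupported node type: {kind}")


tree = {
    "type": "add",
    "left": {
        "type": "multiply",
        "left": {"type": "number", "value": 3},
        "right": {"type": "variable", "name": "x"},
    },
    "right": {"type": "number", "value": 5},
}
print(to_latex(tree))            # 3x + 5
print(evaluate(tree, {"x": 3}))  # 14
```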
An equation (expression = expression).
{
"id": "eq_001",
"type": "equation",
"left": { /* expression */ },
"right": { /* expression */ },
"latex": "3x + 5 = 14",
"confidence": 0.91
}
A single step in the student's work.
{
"id": "step_001",
"index": 0,
"content": { /* equation or expression */ },
"operation": null,
"bounds": {"x": 0.05, "y": 0.30, "w": 0.40, "h": 0.08},
"timestamp": 1711670500000
}
For subsequent steps:
{
"id": "step_002",
"index": 1,
"content": { /* equation: 3x = 9 */ },
"operation": {
"type": "subtract_both_sides",
"operand": {"type": "number", "value": 5}
},
"valid": true,
"bounds": {"x": 0.05, "y": 0.40, "w": 0.35, "h": 0.08},
"timestamp": 1711670510000
}
Complete analysis of the student's work.
{
"exerciseId": "ex_4521",
"problem": { /* original equation */ },
"steps": [ /* array of WorkStep */ ],
"answer": {
"expression": {"type": "number", "value": 3},
"latex": "x = 3",
"zoneId": "answer_zone"
},
"status": "complete",
"errors": [],
"confidence": 0.89
}
Called on each stroke batch (every 200ms during active writing).
Request:
{
"method": "vision.update",
"params": {
"sessionId": "sess_xyz",
"exerciseId": "ex_4521",
"strokes": [ /* new strokes only */ ],
"context": {
"problem": "3x + 5 = 14",
"expectedAnswer": "x = 3",
"previousParsed": { /* last Work object */ }
}
}
}
Response:
{
"result": {
"symbols": [ /* newly recognized symbols */ ],
"expressions": [ /* updated expressions */ ],
"work": { /* full Work object */ },
"changes": [
{"type": "symbol_added", "symbolId": "sym_042"},
{"type": "step_updated", "stepId": "step_002"}
],
"parseTime": 145
}
}
Called when a complete analysis is needed (e.g., on "submit").
Request:
{
"method": "vision.parse",
"params": {
"sessionId": "sess_xyz",
"exerciseId": "ex_4521",
"allStrokes": [ /* all strokes */ ],
"context": {
"problem": "3x + 5 = 14",
"expectedAnswer": "x = 3"
}
}
}
Response:
{
"result": {
"work": { /* complete Work object */ },
"evaluation": {
"answerCorrect": true,
"workValid": true,
"stepsAnalysis": [
{"stepIndex": 0, "valid": true, "operation": "given"},
{"stepIndex": 1, "valid": true, "operation": "subtract_both_sides"},
{"stepIndex": 2, "valid": true, "operation": "divide_both_sides"}
]
},
"confidence": 0.93,
"parseTime": 320
}
}
Called periodically or on suspected error.
Request:
{
"method": "vision.checkErrors",
"params": {
"sessionId": "sess_xyz",
"work": { /* current Work object */ }
}
}
Response:
{
"result": {
"errors": [
{
"type": "sign_error",
"stepIndex": 1,
"location": {"symbolIds": ["sym_025"]},
"expected": "+",
"actual": "-",
"message": "Sign should flip when moving to other side",
"confidence": 0.87
}
]
}
}
When confidence is low, Vision can request clarification.
Response with clarification needed:
{
"result": {
"work": { /* partial */ },
"clarificationNeeded": [
{
"symbolId": "sym_023",
"bounds": {"x": 0.35, "y": 0.42, "w": 0.03, "h": 0.05},
"candidates": [
{"value": "6", "confidence": 0.45},
{"value": "0", "confidence": 0.40},
{"value": "9", "confidence": 0.10}
],
"contextHint": "Expecting a single digit here"
}
]
}
}
Tutor can then ask: "Is that a 6 or a 0?"
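One way Tutor could turn a `clarificationNeeded` entry into that question is sketched below in Python. The function name, the `min_confidence` cutoff of 0.25, and the article heuristic are all assumptions for illustration; the spec does not define how the question is phrased.

```python
def _article(value):
    # Rough English heuristic: "an 8", "an x", but "a 6", "a 0".
    return "an" if value[:1].lower() in set("8aefhilmnorsx") else "a"


def clarification_question(item, min_confidence=0.25):
    """Build a question from the plausible candidates, or None if only one remains."""
    ranked = sorted(item["candidates"],
                    key=lambda c: c["confidence"], reverse=True)
    plausible = [c["value"] for c in ranked
                 if c["confidence"] >= min_confidence]
    if len(plausible) < 2:
        return None  # nothing to disambiguate
    options = [f"{_article(v)} {v}" for v in plausible]
    if len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"Is that {listed}?"


item = {
    "symbolId": "sym_023",
    "candidates": [
        {"value": "6", "confidence": 0.45},
        {"value": "0", "confidence": 0.40},
        {"value": "9", "confidence": 0.10},
    ],
}
print(clarification_question(item))  # Is that a 6 or a 0?
```

Dropping low-confidence candidates keeps the question short; with the sample data, the "9" at 0.10 falls below the assumed cutoff.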
Strokes are grouped based on:
| Factor | Weight | Description |
|---|---|---|
| Temporal proximity | High | Strokes within 500ms likely same symbol |
| Spatial proximity | High | Strokes that overlap or touch |
| Structural patterns | Medium | Known multi-stroke symbols (=, +, ×) |
| Context | Low | What makes sense mathematically |
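The two high-weight factors can be sketched as a greedy grouping pass. This Python example is a simplification under stated assumptions: the `Stroke` fields (`start_ms`, `end_ms`, and a normalized bounding box) are hypothetical since the stroke format is defined in a separate spec, the spatial `margin` of 0.02 is an invented value, and structural patterns and context are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stroke:
    # Hypothetical shape; the real fields come from the Stroke Format Spec.
    id: str
    start_ms: int
    end_ms: int
    x: float
    y: float
    w: float
    h: float


def _near(a: Stroke, b: Stroke, margin: float) -> bool:
    """True if b overlaps a's bounding box expanded by margin on every side."""
    return (a.x - margin <= b.x + b.w and b.x <= a.x + a.w + margin
            and a.y - margin <= b.y + b.h and b.y <= a.y + a.h + margin)


def group_strokes(strokes: List[Stroke], max_gap_ms: int = 500,
                  margin: float = 0.02) -> List[List[Stroke]]:
    """Attach each stroke to the latest group if it is close in time and space."""
    groups: List[List[Stroke]] = []
    for s in sorted(strokes, key=lambda st: st.start_ms):
        if groups:
            last = groups[-1]
            close_in_time = s.start_ms - last[-1].end_ms <= max_gap_ms
            close_in_space = any(_near(p, s, margin) for p in last)
            if close_in_time and close_in_space:
                last.append(s)
                continue
        groups.append([s])
    return groups


strokes = [
    Stroke("s1", 0, 300, 0.10, 0.340, 0.03, 0.050),    # a "3"
    Stroke("s2", 400, 550, 0.30, 0.350, 0.03, 0.002),  # top bar of "="
    Stroke("s3", 650, 800, 0.30, 0.365, 0.03, 0.002),  # bottom bar of "="
]
print([[s.id for s in g] for g in group_strokes(strokes)])
```

The margin is what lets the two bars of "=" join even though they never touch; a production grouper would likely weigh the factors rather than require both.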
Multi-stage recognition:
Example: A loop shape could be 0, O, or o. Context (math expression) strongly suggests 0.
Position determines meaning:
| Position | Interpretation |
|---|---|
| Superscript | Exponent |
| Subscript | Index (x₁) |
| Inline | Normal symbol |
| Above/below line | Fraction |
| Small leading | Negative sign vs. minus |
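A rough sketch of the first three rows follows, in Python. It compares a symbol's bounding box with that of the preceding base symbol. The thresholds (`size_ratio`, `band`) are invented for illustration, and the code assumes that `y` grows downward, as the step bounds elsewhere in this spec suggest. Fractions and the negative-sign case are not covered.

```python
def relative_position(base, sym, size_ratio=0.8, band=0.25):
    """Classify sym as superscript, subscript, or inline relative to base.

    Both arguments are bounds objects with x, y, w, h; y grows downward.
    """
    base_top = base["y"]
    base_bottom = base["y"] + base["h"]
    center = sym["y"] + sym["h"] / 2
    smaller = sym["h"] <= size_ratio * base["h"]
    if smaller and center < base_top + band * base["h"]:
        return "superscript"
    if smaller and center > base_bottom - band * base["h"]:
        return "subscript"
    return "inline"


base = {"x": 0.10, "y": 0.40, "w": 0.03, "h": 0.05}
raised = {"x": 0.13, "y": 0.38, "w": 0.015, "h": 0.025}
lowered = {"x": 0.13, "y": 0.44, "w": 0.015, "h": 0.025}
level = {"x": 0.14, "y": 0.40, "w": 0.03, "h": 0.05}

print(relative_position(base, raised))   # superscript
print(relative_position(base, lowered))  # subscript
print(relative_position(base, level))    # inline
```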
| Pair | Disambiguation |
|---|---|
| 1, l, I | Context: numbers vs. variables |
| 0, O, o | Math context → 0 |
| ×, x | Operator position → ×, variable position → x |
| -, − | Position: between terms → operator, before term → sign |
| 2, z | Typical handwriting differences |
| 5, S | Context: digit expected vs. variable |
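The two position-based rows (×/x and the dash) lend themselves to a small lookup on neighboring symbol types. The Python sketch below is one possible reading of those rules, not the specified algorithm; the `OPERAND_END` and `OPERAND_START` sets and both function names are assumptions, and cases such as `3xy` would need more context than a single neighbor on each side.

```python
# SymbolType names after which an operand is complete, and with which one can begin.
OPERAND_END = {"DIGIT", "VARIABLE", "PAREN_CLOSE", "BRACKET_CLOSE"}
OPERAND_START = {"DIGIT", "VARIABLE", "PAREN_OPEN", "BRACKET_OPEN", "SQRT"}


def classify_dash(prev_type):
    """Between terms the dash is an operator; before a term it is a sign."""
    return "MINUS" if prev_type in OPERAND_END else "NEGATIVE_SIGN"


def classify_cross(prev_type, next_type):
    """A cross shape flanked by operands is read as multiplication."""
    if prev_type in OPERAND_END and next_type in OPERAND_START:
        return "MULTIPLY"
    return "VARIABLE"


print(classify_dash(None))               # NEGATIVE_SIGN (start of expression)
print(classify_dash("DIGIT"))            # MINUS
print(classify_cross("DIGIT", "PLUS"))   # VARIABLE, as in "3x + 5"
print(classify_cross("DIGIT", "DIGIT"))  # MULTIPLY, as in "3 × 4"
```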
Steps are detected by:
Vision infers what operation was performed:
| Pattern | Detected Operation |
|---|---|
| Same terms, one moved across = | add/subtract_both_sides |
| All terms multiplied/divided by same | multiply/divide_both_sides |
| Expression simplified | simplify |
| Terms combined | combine_like_terms |
| Distribution applied | distribute |
| Factoring applied | factor |
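For single-variable linear equations, the first two patterns can be detected by comparing coefficients. The sketch below is a simplified Python illustration under stated assumptions: an equation `a*x + b = c*x + d` is represented by a hypothetical 4-tuple, only constant terms are moved across the equals sign, and the remaining patterns (simplify, combine, distribute, factor) are out of scope.

```python
from fractions import Fraction


def equation(a, b, c, d):
    """Represent a*x + b = c*x + d with exact fractions."""
    return tuple(Fraction(v) for v in (a, b, c, d))


def infer_operation(prev, curr):
    """Return (operation, operand) linking prev to curr, or None if unrecognized."""
    a, b, c, d = prev
    a2, b2, c2, d2 = curr
    # Same constant added to (or subtracted from) both sides.
    if (a2, c2) == (a, c) and b2 - b == d2 - d and b2 != b:
        delta = b2 - b
        if delta > 0:
            return ("add_both_sides", delta)
        return ("subtract_both_sides", -delta)
    # Every term scaled by the same nonzero factor.
    ratios = {new / old for old, new in zip(prev, curr) if old != 0}
    zeros_kept = all(new == 0 for old, new in zip(prev, curr) if old == 0)
    if zeros_kept and len(ratios) == 1:
        k = ratios.pop()
        if k != 0 and k != 1:
            if abs(k) > 1:
                return ("multiply_both_sides", k)
            return ("divide_both_sides", 1 / k)
    return None


step0 = equation(3, 5, 0, 14)  # 3x + 5 = 14
step1 = equation(3, 0, 0, 9)   # 3x = 9
step2 = equation(1, 0, 0, 3)   # x = 3
print(infer_operation(step0, step1))
print(infer_operation(step1, step2))
```

A `None` result does not mean the step is wrong, only that it matches neither pattern; an arithmetic slip such as `3x = 8` also falls through to `None`.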
For each step, Vision checks:
{
"stepIndex": 2,
"valid": false,
"error": {
"type": "arithmetic_error",
"detail": "14 - 5 = 9, not 8",
"expected": "9",
"actual": "8"
}
}
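A result of this shape could be produced by recomputing the expected value once the operation is known. The following Python sketch covers only the subtraction case from the example; `check_subtraction_step` and its parameters are hypothetical names, and the output mirrors the JSON above.

```python
def check_subtraction_step(prev_rhs, operand, written_rhs, step_index):
    """Compare the written right-hand side with prev_rhs - operand."""
    expected = prev_rhs - operand
    if written_rhs == expected:
        return {"stepIndex": step_index, "valid": True}
    return {
        "stepIndex": step_index,
        "valid": False,
        "error": {
            "type": "arithmetic_error",
            "detail": f"{prev_rhs} - {operand} = {expected}, not {written_rhs}",
            "expected": str(expected),
            "actual": str(written_rhs),
        },
    }


# Previous step: 3x + 5 = 14; the student subtracts 5 and writes 3x = 8.
result = check_subtraction_step(prev_rhs=14, operand=5, written_rhs=8, step_index=2)
print(result["error"]["detail"])  # 14 - 5 = 9, not 8
```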
| Level | Range | Meaning | Action |
|---|---|---|---|
| High | 0.85+ | Very confident | Proceed normally |
| Medium | 0.60-0.85 | Somewhat confident | Proceed with caution |
| Low | 0.40-0.60 | Uncertain | May need clarification |
| Very Low | <0.40 | Guessing | Request clarification |
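The table can be read as a simple threshold lookup. In the Python sketch below, the boundary values are treated as belonging to the higher level (so 0.85 is High and 0.60 is Medium); that tie-breaking choice is an assumption, since the table does not say which side a boundary falls on.

```python
def confidence_level(confidence):
    """Map a confidence score to (level, action) using the table's thresholds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.85:
        return ("High", "Proceed normally")
    if confidence >= 0.60:
        return ("Medium", "Proceed with caution")
    if confidence >= 0.40:
        return ("Low", "May need clarification")
    return ("Very Low", "Request clarification")


for score in (0.95, 0.72, 0.45, 0.10):
    print(score, confidence_level(score))
```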
Overall confidence is a weighted combination of symbol, parse, and context confidence:
overall = (symbol_conf^0.4) × (parse_conf^0.3) × (context_conf^0.3)
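Because the exponents sum to 1.0, this is a weighted geometric mean: equal inputs give the same value back, and any single zero drives the result to zero. A direct Python transcription is shown below; it assumes each input is already a single aggregated score in [0, 1], since how per-symbol confidences are combined into `symbol_conf` is not specified here.

```python
import math


def overall_confidence(symbol_conf, parse_conf, context_conf):
    """Weighted geometric mean with weights 0.4, 0.3, and 0.3."""
    for value in (symbol_conf, parse_conf, context_conf):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"confidence out of range: {value}")
    return (symbol_conf ** 0.4) * (parse_conf ** 0.3) * (context_conf ** 0.3)


# Equal inputs come back unchanged (up to floating-point rounding).
print(math.isclose(overall_confidence(0.8, 0.8, 0.8), 0.8))
# One weak component pulls the result below the arithmetic average.
print(overall_confidence(0.95, 0.90, 0.40) < (0.95 + 0.90 + 0.40) / 3)
```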
{
"confidence": 0.72,
"confidenceFlags": {
"lowSymbolConfidence": ["sym_023", "sym_025"],
"ambiguousParse": false,
"unexpectedContent": false
}
}
enum ErrorType {
// Arithmetic
ARITHMETIC_ERROR, // Wrong calculation
// Algebraic
SIGN_ERROR, // Wrong sign when moving terms
OPERATION_ERROR, // Wrong operation applied
MISSING_TERM, // Dropped a term
EXTRA_TERM, // Added a term
DISTRIBUTION_ERROR, // Incorrect distribution
COMBINING_ERROR, // Wrong combination of like terms
// Structural
INCOMPLETE_STEP, // Step not finished
SKIPPED_STEP, // Jumped too far
WRONG_VARIABLE, // Solving for wrong variable
// Format
NOTATION_ERROR, // Math written incorrectly
AMBIGUOUS, // Can't determine intent
}
Errors include location info for highlighting:
{
"type": "SIGN_ERROR",
"location": {
"stepIndex": 1,
"symbolIds": ["sym_025"],
"bounds": {"x": 0.30, "y": 0.41, "w": 0.02, "h": 0.04}
},
"context": {
"operation": "subtract_both_sides",
"expected": "3x = 14 - 5",
"actual": "3x = 14 + 5"
}
}
Tutor Vision
│ │
│ Stroke batch received │
│ │
│ vision.update(strokes) ──────────►│
│ │ Process strokes
│ │ Update Work model
│ ◄────────────── Work + changes ───│
│ │
│ Evaluate: should I speak? │
│ • Error detected? │
│ • Step completed? │
│ • Student stuck? │
│ │
Vision output triggers Tutor decisions:
| Vision Output | Tutor Consideration |
|---|---|
| error detected | Intervene now or let student self-correct? |
| step completed | Praise? Move on? |
| confidence low | Ask for clarification? |
| answer in answer zone | Check correctness? |
| No change for 30s | Student stuck? Offer hint? |
Tutor can ask Vision specific questions:
{"method": "vision.isAnswerCorrect", "params": {"work": {...}}}
{"method": "vision.getNextHint", "params": {"work": {...}, "errorType": "sign_error"}}
{"method": "vision.compareToExpected", "params": {"work": {...}, "expected": "x = 3"}}
| Operation | Target | Max |
|---|---|---|
| Incremental update | 150ms | 300ms |
| Full parse | 300ms | 500ms |
| Error check | 100ms | 200ms |
Vision maintains a cache per session:
Vision processes in cycles, not per-stroke:
Strokes arrive: ─●──●●─●──●●●──●─●──────────●●──
Process cycles: ──────X────────X────────────X──
150ms 150ms (idle, process on activity)
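The cycle behavior in the diagram can be approximated with a small buffer that releases strokes once a cycle has elapsed and does nothing while idle. This Python sketch is an assumption about how such batching might work: the `CycleBatcher` class is hypothetical, and the clock is passed in explicitly so the logic can be exercised without real timers.

```python
class CycleBatcher:
    """Collect strokes and release them in cycles instead of one at a time."""

    def __init__(self, interval_ms=150):
        self.interval_ms = interval_ms
        self._pending = []
        self._cycle_start_ms = 0

    def add(self, stroke, now_ms):
        # The first stroke after an idle period starts a new cycle.
        if not self._pending:
            self._cycle_start_ms = now_ms
        self._pending.append(stroke)

    def poll(self, now_ms):
        """Return the batch when the cycle is due, else None (including when idle)."""
        if not self._pending:
            return None
        if now_ms - self._cycle_start_ms < self.interval_ms:
            return None
        batch, self._pending = self._pending, []
        return batch


batcher = CycleBatcher()
batcher.add("str_001", now_ms=0)
batcher.add("str_002", now_ms=60)
print(batcher.poll(now_ms=100))   # None: cycle not yet due
print(batcher.poll(now_ms=150))   # ['str_001', 'str_002']
print(batcher.poll(now_ms=2000))  # None: idle, nothing to process
```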
| Capability | Description | Timeline |
|---|---|---|
| Geometry notation | Angles, parallel marks, congruence | v1.5 |
| Graph interpretation | Identify plotted points, lines | v2.0 |
| Word problem parsing | Extract math from text | v2.0 |
| Multi-language | Non-Latin numerals, RTL | v2.0 |
| Capability | Challenge |
|---|---|
| Intent prediction | What is student trying to do? |
| Learning style detection | Visual vs. procedural approach? |
| Misconception identification | What conceptual error underlies this? |
3x + 5 = 14
Strokes received:
str_001: curves forming "3"
str_002: crossed lines forming "x"
str_003: crossed lines forming "+"
str_004: curves forming "5"
str_005: two horizontal lines forming "="
str_006: vertical line (part of "1")
str_007: curves forming "4"
Symbol recognition:
[
{"id": "sym_001", "value": "3", "type": "DIGIT", "confidence": 0.97},
{"id": "sym_002", "value": "x", "type": "VARIABLE", "confidence": 0.94},
{"id": "sym_003", "value": "+", "type": "PLUS", "confidence": 0.96},
{"id": "sym_004", "value": "5", "type": "DIGIT", "confidence": 0.93},
{"id": "sym_005", "value": "=", "type": "EQUALS", "confidence": 0.98},
{"id": "sym_006", "value": "1", "type": "DIGIT", "confidence": 0.91},
{"id": "sym_007", "value": "4", "type": "DIGIT", "confidence": 0.95}
]
Expression parsing:
{
"type": "equation",
"left": {
"type": "add",
"left": {"type": "multiply", "left": {"type": "number", "value": 3}, "right": {"type": "variable", "name": "x"}},
"right": {"type": "number", "value": 5}
},
"right": {"type": "number", "value": 14},
"latex": "3x + 5 = 14"
}
Student writes: 3x = 14 + 5 (should be 14 - 5)
{
"type": "SIGN_ERROR",
"stepIndex": 1,
"location": {"symbolIds": ["sym_015"]},
"expected": "-",
"actual": "+",
"message": "When moving +5 to the other side, it becomes -5"
}
Student writes: 3x = 8 (should be 9)
{
"type": "ARITHMETIC_ERROR",
"stepIndex": 1,
"location": {"symbolIds": ["sym_020"]},
"calculation": "14 - 5",
"expected": "9",
"actual": "8"
}
Student writes: 3x = 14 (dropped the 5)
{
"type": "MISSING_TERM",
"stepIndex": 1,
"missingFrom": "left_side",
"missing": {"type": "number", "value": 5},
"message": "The +5 term needs to be accounted for"
}
Next spec: Tutor Behavior (when to speak, what to say)