Summary
Empowering the next generation of learners through multimodal Socratic-style guidance.
The Vision
Socratic Sight is a tutoring platform for K-5 students. It provides a supportive, interactive environment where children learn to solve problems, not just find answers.
The Problem
Much of today's educational technology focuses on providing answers, prioritizing how quickly a student reaches the result over the process of discovery. This answer-first approach is not conducive to learning. Existing voice tutors also suffer from high latency, which makes natural dialogue with young children nearly impossible.
Key Differentiators
- Pedagogical Integrity: Built around a "No-Answer Constitution", enforced at multiple levels, that leads students through inquiry instead of revealing answers.
- Low-Latency Interaction: Leverages WebRTC and OpenAI’s Realtime API so students can interrupt the AI, just as they would a human tutor.
- Visual Context Awareness: The AI doesn't just hear the student; it also evaluates visual cues such as pencil strokes.
Growth Strategy
Socratic Sight starts as a parent-managed tool for home learning and aims to scale into a B2B SaaS platform for schools and textbook publishers. Future expansion includes gesture recognition and AR integration for hands-on subjects such as chemistry and robotics, turning any physical desk into a smart laboratory.
High-Level Tech Design
System Architecture
The system is built as a highly concurrent, asynchronous event loop designed to handle audio and visual streams with minimal overhead.
| Layer | Technology Stack |
|---|---|
| Frontend | React (TypeScript), WebRTC |
| Backend | FastAPI, Uvicorn |
| AI Engine | OpenAI Realtime API |
| Data & Storage | PostgreSQL, Alembic (migrations), Pydantic (data validation) |
| Tooling | uv, ruff, eslint, prettier |
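The document does not say how the audio and visual streams share the event loop. One common pattern is to funnel both into a single asynchronous queue so that one consumer handles events in arrival order. The TypeScript sketch below assumes that design; StreamEvent, AsyncQueue, and runEventLoop are hypothetical names, not part of the project.

```typescript
// Hypothetical event shapes for the two input streams.
type StreamEvent =
  | { kind: "audio"; timestampMs: number; bytes: number }
  | { kind: "visual"; timestampMs: number; strokeCount: number };

// Minimal async queue: producers push without blocking; a consumer awaits next().
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T) => void> = [];

  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item); // hand the item directly to a consumer that is already waiting
    } else {
      this.items.push(item); // otherwise buffer it
    }
  }

  next(): Promise<T> {
    if (this.items.length > 0) {
      return Promise.resolve(this.items.shift() as T);
    }
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }
}

// Single consumer: handles `limit` events one at a time, in arrival order.
async function runEventLoop(
  queue: AsyncQueue<StreamEvent>,
  limit: number,
  handle: (event: StreamEvent) => void,
): Promise<void> {
  for (let i = 0; i < limit; i++) {
    handle(await queue.next());
  }
}

// Usage: audio frames and pencil-stroke events share one loop.
const queue = new AsyncQueue<StreamEvent>();
const seen: string[] = [];
const done = runEventLoop(queue, 3, (event) => seen.push(event.kind));
queue.push({ kind: "audio", timestampMs: 0, bytes: 960 });
queue.push({ kind: "visual", timestampMs: 5, strokeCount: 1 });
queue.push({ kind: "audio", timestampMs: 20, bytes: 960 });
```

A single consumer keeps ordering simple; a production loop would likely also need backpressure and cancellation, which this sketch omits.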
Key Components
Core Realtime Client (WebRTC + Realtime API Protocol)
Provides a thin transport layer for real-time audio, video, and data exchange with the model. This keeps higher-level tutoring behavior independent of transport concerns.
- WebRTC connection lifecycle (ICE, tracks, data channels)
- Session setup/teardown with the Realtime API
- Low-level message framing and delivery guarantees
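The framing responsibilities above can be illustrated with a thin wrapper around a data channel. This sketch assumes events are JSON objects with a type field and an optional event_id, in the style of Realtime API client events such as session.update; check the current API reference for exact schemas. DataChannelLike and RealtimeTransport are hypothetical, and a browser RTCDataChannel should satisfy the interface. It covers only framing and buffering before the channel opens, relying on the data channel's default reliable, ordered mode for the rest.

```typescript
// The only parts of a data channel this sketch relies on.
interface DataChannelLike {
  readyState: "connecting" | "open" | "closing" | "closed";
  send(data: string): void;
}

// A client event: a `type` plus arbitrary payload fields (Realtime API style).
interface ClientEvent {
  type: string;
  event_id?: string;
  [key: string]: unknown;
}

// Hypothetical thin transport: frames events as JSON, assigns an event_id when
// the caller did not supply one, and buffers frames until the channel is open.
class RealtimeTransport {
  private channel: DataChannelLike;
  private pending: string[] = [];
  private counter = 0;

  constructor(channel: DataChannelLike) {
    this.channel = channel;
  }

  send(event: ClientEvent): string {
    const eventId = event.event_id ?? `evt_${++this.counter}`;
    // Always enqueue first so frames leave in the order they were sent.
    this.pending.push(JSON.stringify({ ...event, event_id: eventId }));
    this.flush(); // delivers immediately if the channel is already open
    return eventId;
  }

  // Also call from the channel's "open" handler to deliver buffered frames.
  flush(): void {
    while (this.pending.length > 0 && this.channel.readyState === "open") {
      this.channel.send(this.pending.shift() as string);
    }
  }
}

// Usage with a fake channel standing in for RTCDataChannel.
const sent: string[] = [];
const channel: DataChannelLike = { readyState: "connecting", send: (data) => { sent.push(data); } };
const transport = new RealtimeTransport(channel);
transport.send({ type: "session.update", session: { instructions: "Guide with questions; never state the answer." } });
const sentBeforeOpen = sent.length; // 0: the frame is buffered while connecting
channel.readyState = "open"; // in a browser, this happens when the "open" event fires
transport.flush();
```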
Pedagogical Policy/Constitution Engine
Defines client-side guardrails that enforce the pedagogical constraints on how the system is allowed to help. Rules are implemented as plugins so they are easy to extend and customize.
- Rules for answer disclosure vs. guidance
- Hint escalation limits
- Tone, style, and age-appropriateness constraints
- Safety and educational guardrails
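The plugin model described above might look like the following sketch. PolicyPlugin, evaluate, and the two example plugins are hypothetical. The regular-expression check is a deliberately naive stand-in for whatever disclosure detection the real engine uses; it would miss paraphrased answers such as "twelve".

```typescript
// Hypothetical context every policy plugin receives.
interface PolicyContext {
  expectedAnswer: string; // known to the tutor, never to be spoken
  hintLevel: number; // hints already given on this problem
  maxHintLevel: number; // escalation ceiling
}

// A tutor utterance proposed by the orchestrator, checked before delivery.
interface Candidate {
  kind: "question" | "hint" | "confirm";
  text: string;
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// One rule per plugin; extending the constitution means registering another plugin.
interface PolicyPlugin {
  name: string;
  check(candidate: Candidate, ctx: PolicyContext): Verdict;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive disclosure rule: block any non-confirmation that states the answer verbatim.
const noAnswerDisclosure: PolicyPlugin = {
  name: "no-answer",
  check(candidate: Candidate, ctx: PolicyContext): Verdict {
    if (candidate.kind === "confirm") return { allowed: true }; // echoing the student's own correct answer is fine
    const pattern = new RegExp(`\\b${escapeRegExp(ctx.expectedAnswer)}\\b`);
    return pattern.test(candidate.text)
      ? { allowed: false, reason: "reveals the answer" }
      : { allowed: true };
  },
};

// Hint escalation limit: no further hints once the ceiling is reached.
const hintEscalationLimit: PolicyPlugin = {
  name: "hint-limit",
  check(candidate: Candidate, ctx: PolicyContext): Verdict {
    if (candidate.kind === "hint" && ctx.hintLevel >= ctx.maxHintLevel) {
      return { allowed: false, reason: "hint limit reached" };
    }
    return { allowed: true };
  },
};

// Runs every plugin in order; the first rejection wins.
function evaluate(plugins: PolicyPlugin[], candidate: Candidate, ctx: PolicyContext): Verdict {
  for (const plugin of plugins) {
    const verdict = plugin.check(candidate, ctx);
    if (!verdict.allowed) return { allowed: false, reason: `${plugin.name}: ${verdict.reason}` };
  }
  return { allowed: true };
}

// Usage
const ctx: PolicyContext = { expectedAnswer: "12", hintLevel: 3, maxHintLevel: 3 };
const plugins: PolicyPlugin[] = [noAnswerDisclosure, hintEscalationLimit];
const leaked = evaluate(plugins, { kind: "hint", text: "The answer is 12." }, ctx);
// leaked is { allowed: false, reason: "no-answer: reveals the answer" }
```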
Tutor Orchestrator
Acts as the control loop that determines when tutoring should activate and what kind of intervention should occur next.
- Tutor mode activation and deactivation
- Coordination between ProblemContext and policy
- Selection of the next tutoring action (ask a question, give a hint, confirm an answer, stay silent)
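The orchestrator's decision step can be sketched as a pure function from state to action. The input fields, the threshold of two consecutive errors before a hint, and the action names are assumptions chosen for illustration. In the full system, the chosen action would still pass through the policy engine before anything reaches the student.

```typescript
// The four interventions listed above.
type TutorAction = "ask_question" | "give_hint" | "confirm_answer" | "stay_silent";

// Hypothetical snapshot of what the orchestrator reads from ProblemContext.
interface OrchestratorInput {
  tutorModeActive: boolean;
  studentIsWorking: boolean; // e.g., recent pencil strokes or speech
  lastAttemptCorrect: boolean | null; // null: no new attempt since the last decision
  consecutiveErrors: number;
  hintLevel: number;
  maxHintLevel: number;
}

// One pass of the control loop. Thresholds here are illustrative assumptions.
function selectNextAction(s: OrchestratorInput): TutorAction {
  if (!s.tutorModeActive) return "stay_silent";
  if (s.lastAttemptCorrect === true) return "confirm_answer";
  // Don't interrupt a student who is still working on a step.
  if (s.studentIsWorking && s.lastAttemptCorrect === null) return "stay_silent";
  // Escalate to a hint only after repeated errors and while hints remain.
  if (s.consecutiveErrors >= 2 && s.hintLevel < s.maxHintLevel) return "give_hint";
  // Default Socratic move: ask a question that probes the student's reasoning.
  return "ask_question";
}
```

Keeping the decision pure makes it straightforward to unit-test and to log each decision alongside the state that produced it.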
Problem Context
Maintains a representation of the student’s learning state for the current problem.
- Problem identity and domain (subject, difficulty, steps)
- Student progress, attempts, and errors
- Hint history and escalation level
- Suspected misconceptions or blockers
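One possible shape for this state is sketched below. The field names, recordAttempt, and consecutiveErrors are hypothetical. Updates return a new object rather than mutating the old one, which makes each state transition easy to diff and log.

```typescript
// Hypothetical record of one student attempt.
interface Attempt {
  answer: string;
  correct: boolean;
  atMs: number; // milliseconds since the problem started
}

// Hypothetical per-problem learning state, mirroring the bullets above.
interface ProblemContext {
  problemId: string;
  subject: string;
  difficulty: number;
  steps: string[];
  currentStep: number; // index into steps; equals steps.length when finished
  attempts: Attempt[];
  hintsGiven: string[]; // hint history, oldest first
  hintLevel: number; // current escalation level
  misconceptions: string[]; // suspected misconceptions or blockers
}

// Returns a new context with the attempt recorded; a correct attempt advances one step.
function recordAttempt(ctx: ProblemContext, attempt: Attempt): ProblemContext {
  const currentStep = attempt.correct
    ? Math.min(ctx.currentStep + 1, ctx.steps.length)
    : ctx.currentStep;
  return { ...ctx, attempts: [...ctx.attempts, attempt], currentStep };
}

// Number of incorrect attempts since the last correct one.
function consecutiveErrors(ctx: ProblemContext): number {
  const newestFirst = [...ctx.attempts].reverse();
  const lastCorrect = newestFirst.findIndex((a) => a.correct);
  return lastCorrect === -1 ? newestFirst.length : lastCorrect;
}

// Usage: two wrong answers on the first step.
const start: ProblemContext = {
  problemId: "p1", subject: "math", difficulty: 1,
  steps: ["make equal groups", "count the total"], currentStep: 0,
  attempts: [], hintsGiven: [], hintLevel: 0, misconceptions: [],
};
const afterTwoMisses = recordAttempt(
  recordAttempt(start, { answer: "10", correct: false, atMs: 4000 }),
  { answer: "11", correct: false, atMs: 9000 },
);
```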
Observability
Captures structured signals about system behavior and learning outcomes for debugging, evaluation, and accountability.
- Key events (attempts, hints given, policy blocks)
- State transitions in ProblemContext
- Tutor decisions and policy outcomes
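A minimal structured logger for these signals could emit one JSON line per event. The event names and the EventLog class are hypothetical; a real deployment would route the sink to whatever log or analytics backend the project uses.

```typescript
// Hypothetical structured events mirroring the bullets above.
type TelemetryEvent =
  | { type: "attempt"; problemId: string; correct: boolean }
  | { type: "hint_given"; problemId: string; level: number }
  | { type: "policy_block"; problemId: string; plugin: string; reason: string }
  | { type: "state_transition"; problemId: string; from: string; to: string }
  | { type: "tutor_decision"; problemId: string; action: string };

type Sink = (line: string) => void;

// Emits one JSON object per event, with a sequence number and timestamp,
// so logs can be filtered, ordered, and aggregated later.
class EventLog {
  private sink: Sink;
  private seq = 0;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  emit(event: TelemetryEvent): void {
    this.seq += 1;
    this.sink(JSON.stringify({ seq: this.seq, ts: new Date().toISOString(), ...event }));
  }
}

// Usage: capture lines in memory instead of a real backend.
const lines: string[] = [];
const log = new EventLog((line) => lines.push(line));
log.emit({ type: "policy_block", problemId: "p1", plugin: "no-answer", reason: "reveals the answer" });
log.emit({ type: "tutor_decision", problemId: "p1", action: "ask_question" });
```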