Build2026

NoteV

Every AI note-taker can hear. Ours can see — a multimodal AI assistant for Meta Ray-Ban smart glasses that captures and structures your lectures.

Smart Glasses · AI · EdTech · iOS · Swift

Overview

NoteV is an AI classroom assistant designed for Meta Ray-Ban smart glasses that captures audio and visual content during lectures, then generates structured multimodal notes. The system works equally well with just an iPhone camera — no smart glasses required.

Three-Layer Output

  1. Polished Timeline — AI-refined transcript with embedded images, highlighted bookmarks, and persistent section headers.
  2. AI Notes — Structured notes segmented by slide transitions, complete with timestamps and a navigation table of contents.
  3. Action Items — Extracted tasks with categories, priority levels, and due dates; supports batch export to iOS Reminders and Calendar.
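The action-item layer above can be sketched as a small data model. This is a hypothetical illustration, not the app's actual code: the type names (`ActionItem`, `Priority`) are assumptions, and the raw values follow the CalDAV priority scale that iOS Reminders uses (1 = high, 5 = medium, 9 = low, 0 = none), which an EventKit export would target.

```swift
import Foundation

// Hypothetical sketch of the Action Items data model.
// Type names are illustrative; only the category/priority/due-date
// fields and the Reminders export come from the project description.
struct ActionItem {
    enum Priority: Int {
        case none = 0
        case high = 1, medium = 5, low = 9  // CalDAV scale used by EKReminder
    }
    let title: String
    let category: String
    let priority: Priority
    let dueDate: Date?
}

// The integer an EventKit export would assign to EKReminder.priority.
func reminderPriority(for item: ActionItem) -> Int {
    item.priority.rawValue
}

let item = ActionItem(title: "Read Chapter 4",
                      category: "Reading",
                      priority: .high,
                      dueDate: nil)
print(reminderPriority(for: item))  // 1
```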

Smart Recording

  • Dual Capture Sources — Meta Ray-Ban glasses via DAT SDK or iPhone rear camera with automatic fallback.
  • Real-Time Speech Processing — Deepgram WebSocket streaming with persistent connections and graceful disconnect handling.
  • Intelligent Bookmarking — Automatically identifies key moments using a 4-tier keyword taxonomy with confidence metrics.
  • Visual Intelligence — frames sampled every 5 seconds, with SSIM change detection and perceptual hashing for slide deduplication.
  • Slide Content Analysis — LLM vision extracts slide information (titles, bullet points, equations, diagrams).
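The perceptual-hash half of the deduplication above can be sketched with a 64-bit difference hash (dHash): compare each pixel to its right-hand neighbor in a downsampled grayscale grid, then treat two frames as the same slide when the Hamming distance between their hashes is small. This is a minimal illustration, not the app's FramePipeline; the downsampling step and the SSIM check are omitted, and the 5-bit threshold is an assumption.

```swift
import Foundation

// Hypothetical dHash sketch. Input is a 9x8 grayscale grid assumed to be
// downsampled from a captured frame (downsampling not shown).
func dHash(_ gray: [[UInt8]]) -> UInt64 {
    precondition(gray.count == 8 && gray.allSatisfy { $0.count == 9 })
    var hash: UInt64 = 0
    for row in 0..<8 {
        for col in 0..<8 {
            hash <<= 1
            // One bit per gradient direction between horizontal neighbors.
            if gray[row][col] < gray[row][col + 1] { hash |= 1 }
        }
    }
    return hash
}

// Hamming distance between two hashes; small distance = likely same slide.
func hammingDistance(_ a: UInt64, _ b: UInt64) -> Int {
    (a ^ b).nonzeroBitCount
}

// Assumed threshold: frames within 5 differing bits count as duplicates.
func isDuplicateSlide(_ a: UInt64, _ b: UInt64, threshold: Int = 5) -> Bool {
    hammingDistance(a, b) <= threshold
}
```

dHash is robust to re-encoding and small exposure changes, which is why it suits deduplicating repeated captures of the same projected slide.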

AI Chat

  • Unified conversational interface for note queries, course setup, settings, and reminder creation.
  • Voice-powered input via Deepgram transcription.
  • Interactive action cards for user confirmation on course additions, setting changes, and reminder generation.
  • Full access to session transcripts, generated notes, slide images, and task lists.

Architecture

The system is organized into four layers:

  • Capture Layer — CaptureProvider protocol supporting the Meta Ray-Ban DAT SDK v0.4.0 and AVCaptureSession for iPhone fallback.
  • Processing Layer — AudioPipeline (Deepgram nova-3 primary, Apple Speech fallback), FramePipeline (SSIM + perceptual-hash deduplication), SmartBookmarkDetector, and SessionRecorder orchestration.
  • Generation Layer — TranscriptPolisher, SlideAnalyzer (LLM vision), NoteGenerator (multimodal LLM), and TodoExtractor.
  • Presentation Layer — Three output views (Timeline, Notes, Actions), AI Chat interface, and course management dashboard.
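The capture layer's glasses-to-iPhone fallback can be sketched with a protocol and a first-available selection pass. Only the name CaptureProvider and the fallback behavior come from the write-up; the concrete types, error case, and selection function here are hypothetical, and the real AVCaptureSession / DAT SDK setup is stubbed out.

```swift
import Foundation

// Hypothetical provider abstraction for the capture layer.
protocol CaptureProvider {
    var name: String { get }
    func start() throws
}

enum CaptureError: Error { case deviceUnavailable }

struct GlassesProvider: CaptureProvider {
    let name = "Meta Ray-Ban (DAT SDK)"
    let connected: Bool
    func start() throws {
        // Real code would open a DAT SDK session here.
        guard connected else { throw CaptureError.deviceUnavailable }
    }
}

struct PhoneCameraProvider: CaptureProvider {
    let name = "iPhone rear camera"
    func start() throws {
        // Real code would configure an AVCaptureSession here.
    }
}

// Try providers in priority order; return the first that starts.
func selectProvider(candidates: [CaptureProvider]) -> CaptureProvider? {
    for provider in candidates {
        if (try? provider.start()) != nil { return provider }
    }
    return nil
}
```

Because callers depend only on the protocol, the rest of the pipeline is unaware of whether frames come from the glasses or the phone.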

Tech Stack

  • Platform — iOS 17+, Swift 6, SwiftUI
  • Smart Glasses — Meta Ray-Ban Gen-2 (DAT SDK v0.4.0)
  • Speech-to-Text — Deepgram nova-3 WebSocket (primary); Apple Speech (fallback)
  • LLM — OpenAI GPT-4o, Anthropic Claude, Google Gemini (user-configurable)
  • Native APIs — EventKit, PDFKit, Speech, AVFoundation

My Role

Everything — system architecture, multimodal capture pipeline, AI note generation, chat interface, and course management UX. Built entirely through Claude Code.