CP194 Capstone — Oral Defense
A $40 Dual-Channel sEMG System for
Silent Speech Classification
ESP32 + 2× AD8232 | 6 Commands | 2 Studies
Advisor: Prof. Patrick Watson • Second Reader: Prof. Shekhar
Subvocal speech recognition — classifying words a person "says" silently, with no audible output — requires reading facial and laryngeal muscle signals (sEMG). Current systems cost $1,000+ and require custom fabrication.
"Can we replicate MIT AlterEgo with cheaper, off-the-shelf parts?"
"Can a $40 system classify 6 silent commands above chance?"
More honest. More measurable. Doesn't assume the gap can be fully closed.
Random guessing 1 of 6 commands: 1 ÷ 6 = 16.7%. If we beat this at p < 0.001, the hardware is reading something real — not random noise.
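The chance baseline and the p < 0.001 claim can be checked with an exact one-sided binomial test. A sketch, assuming independent trials; the 466/900 example count is illustrative of the 51.8% result, not a figure from the papers:

```python
from math import comb

def binom_p_value(k, n, p=1/6):
    """Exact one-sided tail P(X >= k) for X ~ Binomial(n, p):
    the chance of getting k or more of n trials right by guessing 1 of 6."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# 6 commands -> chance = 1/6 ~ 16.7%.
# e.g. 466 correct of 900 trials (51.8%) is vanishingly unlikely by chance:
assert binom_p_value(466, 900) < 0.001
```

Any accuracy well above 16.7% over hundreds of trials drives this tail probability far below 0.001, which is what licenses "reading something real."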
$1,250+ • 7 electrodes • 24-bit ADC
Custom fabrication required; not reproducible.
$40 • 2 electrodes • 12-bit ADC
Off-the-shelf AD8232 cardiac sensors + ESP32.
Apple acquires Q.ai — $2 billion
Whispered speech + facial muscle detection. AirPods + Vision Pro. Jan 29, 2026.
Merge Labs (Sam Altman + OpenAI) — $250M seed
Goal: "a natural, nonverbal way for anyone to interact with AI." Jan 15, 2026.
Both companies are building what SOMACH demonstrates at $40. The market is no longer speculative.
Four phases of increasing signal complexity.
Tried EEG for silent speech. Hit the acquisition noise floor. Contributed to the NETO paper. Pivoted to sEMG.
Spring 2025 • San Francisco
Phone-as-Xbox-Kinect. Walk/jump/punch → Hollow Knight controller. Android + real-time pipeline.
Sep 2025 • Taipei
Pixel Watch HAR. CNN-LSTM. 94% binary walking accuracy.
Oct–Nov 2025 • Taipei
AD8232 muscle sensing. 18-model benchmark. RF: 74.3% accuracy.
Nov–Dec 2025 • Taipei
Subvocalization. 6-class silent speech. 2 studies, 3 papers.
Dec 2025–Mar 2026 • Hyderabad
Each phase built skills for the next: Android dev → ML pipelines → Hardware integration → Study design
Originally designed as a 3-channel system. The third AD8232 failed mid-project, so the mastoid processes behind both ears were repurposed as a shared ground reference, leaving 2 active channels.
CH 1
AD8232 #1
Mentalis/Chin
MCU
ESP32
250 Hz ADC
USB Serial
Python
pyserial
CH 2
AD8232 #2
Throat/Under-chin
Y-splitter for shared ground reference. 3.5mm electrode cables. Ag/AgCl gel electrodes.
Key insight: The AD8232's bandpass (0.5–40 Hz) matches the range MIT AlterEgo and OpenBCI use for sEMG signal processing — likely coincidental, but it turns out to be sufficient to capture the onset burst that carries the discriminative signal. No hardware modification needed.
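A minimal capture sketch for the ESP32 → pyserial link above, assuming the firmware streams ASCII `ch1,ch2` pairs at 250 Hz over USB serial; the port name, baud rate, and line format here are assumptions, not the project's exact protocol:

```python
import csv

def parse_sample(line):
    """Validate one serial line of the assumed 'ch1,ch2' form; None if garbled."""
    parts = line.strip().split(",")
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        return int(parts[0]), int(parts[1])
    return None

def record(port="/dev/ttyUSB0", seconds=2.0, rate=250, out="trial.csv"):
    """Log one trial of 2-channel samples to a CSV file."""
    import serial  # pyserial; imported lazily so parsing stays testable offline
    n = int(seconds * rate)
    with serial.Serial(port, 115200, timeout=1) as ser, \
         open(out, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["ch1", "ch2"])
        while n > 0:
            sample = parse_sample(ser.readline().decode(errors="ignore"))
            if sample is not None:  # drop garbled or partial lines
                w.writerow(sample)
                n -= 1
```

Dropping malformed lines at parse time is what feeds the NaN/zero validation step downstream a clean stream.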
Capture
250 Hz, 2-ch serial
Validate
NaN/zero check, trim
Normalize
Z-score per session
Train
5-fold stratified CV
Evaluate
Accuracy ± SE, confusion
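The normalize/train/evaluate steps above can be sketched with NumPy; the array shapes and helper names are mine, not the repo's:

```python
import numpy as np

def zscore_per_session(X):
    """Normalize step: scale trials (trials x time x channels) by the
    session's own mean and std, removing inter-session baseline drift."""
    mu = X.mean(axis=(0, 1), keepdims=True)
    sd = X.std(axis=(0, 1), keepdims=True) + 1e-8
    return (X - mu) / sd

def stratified_folds(labels, k=5, seed=0):
    """Train step: index splits that spread each class evenly across k folds."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(labels):
        for j, i in enumerate(rng.permutation(np.flatnonzero(labels == c))):
            folds[j % k].append(i)
    return [np.sort(np.array(f)) for f in folds]

def mean_and_se(fold_accs):
    """Evaluate step: accuracy +/- standard error across folds."""
    a = np.asarray(fold_accs, dtype=float)
    return a.mean(), a.std(ddof=1) / np.sqrt(len(a))
```

Stratification matters here because with 6 classes and small per-class counts, a plain random split can leave a fold starved of one command.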
Chin (mentalis) + Under-chin (mylohyoid)
4-phase progression: Overt → Mouthing → Exaggerated closed-mouth → Covert subvocalization. Model trained on overt, evaluated on covert.
900 CSVs • 6 classes • CN V (trigeminal)
Chin (mentalis) + Throat (laryngeal)
5-phase Speech Intensity Curriculum: highest energy → lowest. Inspired by the NETO (EEG-to-text) paper's decreasing-difficulty design, not Bengio-style curriculum learning. Descending motor intensity.
1,500 CSVs • 6 classes • CN X (vagus)
Figure — Signal Amplitude Across Curriculum Phases
The 12-bit ADC noise floor (~10 µV) is reached at covert speech — explaining the resolution ceiling.
51.8%
± 2.8% (5-fold CV)
48.9%
± 3.1% (5-fold CV)
Both studies: statistically significant above chance (p < 0.001)
Proves a $40 system can capture discriminative sEMG features during silent speech.
Figure — Study A vs Study B: Electrode Configuration Comparison
Left: single-session accuracy. Center: per-class F1. Right: cross-study transfer (near chance).
Feature-importance analysis: 100% of the discriminative weight concentrates in the first ~20 timesteps (~80 ms). An onset-masking experiment confirmed it: zeroing the first 80 ms collapses accuracy to chance.
62.0%
Baseline
17.6%
Onset Masked
16.7%
Random Chance
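The masking check is simple to reproduce: zero the first 80 ms of every trial (20 samples at 250 Hz) before evaluation. A sketch; the (trials × time × channels) layout is my assumption:

```python
import numpy as np

RATE_HZ = 250

def mask_onset(X, ms=80, rate=RATE_HZ):
    """Zero out the first `ms` milliseconds of each trial
    (trials x time x channels). If accuracy then falls to ~1/6,
    the classifier was leaning entirely on the onset burst."""
    cut = int(round(ms / 1000 * rate))  # 80 ms -> 20 samples at 250 Hz
    Xm = X.copy()                       # leave the original trials intact
    Xm[:, :cut, :] = 0.0
    return Xm
```

Running the same trained model on `mask_onset(X_test)` versus `X_test` is the whole ablation.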
The 12-bit ADC (4,096 levels) can detect the initial motor command spike — the moment the brain fires the signal to the muscles. But it lacks the resolution to read sustained articulation patterns. MIT's 24-bit ADC (16.7M levels) sees both onset and articulation. Our system does onset classification, not continuous silent speech recognition.
Figure — Study A: Training vs. Test Accuracy (Generalization Gap)
Train reaches ~99%. Test plateaus at 51.8%. The gap is the onset-only signal: memorized in-session, fails to generalize.
~50%
Both Study A and Study B
25–31%
Near chance — transfer fails
Study A: CN V (Trigeminal Nerve)
Chin + under-chin → mentalis + mylohyoid muscles. Jaw elevation, lip protrusion.
Study B: CN X (Vagus Nerve)
Chin + throat → mentalis + laryngeal muscles. Vocal fold tension, glottal closure.
Conclusion: Different cranial nerves produce fundamentally different signal patterns. Electrode placement is not interchangeable — it must be standardized for any cross-session or cross-subject generalization.
Figure — Study A: Held-Out Confusion Matrix (5-Fold CV, n=900)
UP and SILENCE are strong. DOWN/LEFT/RIGHT bleed into each other — same onset pattern, different articulation below our noise floor.
From 49.7% to 93.5% via software alone.
Figure — Confidence Gating: Accuracy vs. Coverage Trade-off
θ=0.60 sweet spot: 64.1% accuracy on 62.1% of samples. Higher threshold = more accurate, fewer answers.
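Confidence gating itself is a few lines: answer only when the top-class probability clears a threshold θ. A sketch; the 0.60 operating point is from the figure above, but the helper and its signature are mine:

```python
import numpy as np

def gated_accuracy(probs, labels, theta=0.60):
    """Trade coverage for accuracy: keep only predictions whose top-class
    probability is >= theta. Returns (accuracy on kept, fraction kept)."""
    keep = probs.max(axis=1) >= theta
    coverage = float(keep.mean())
    if not keep.any():
        return float("nan"), 0.0  # gate rejected everything
    preds = probs.argmax(axis=1)
    acc = float((preds[keep] == np.asarray(labels)[keep]).mean())
    return acc, coverage
```

Sweeping θ over the validation set and plotting `(coverage, accuracy)` pairs reproduces the trade-off curve in the figure.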
arXiv Papers
P1: Curriculum, P2: Electrode, P3: EMG Benchmark
CSV Data Files
87 MB • Open dataset
Phase Repos
GitHub • MIT License
Blog Posts
Full journey documentation
Python Scripts
E2E pipeline
Arduino Sketches
ESP32 firmware
Instructables Pages
Reproducibility guide
Websites Deployed
somach.vercel.app + Kaggle dataset
Terminal UI Tool
Custom recorder + real-time display built for the study
live_demo.py — Watch the model classify silent speech in real time.
250
Hz Input
2
sEMG Channels
6
Commands
UP • DOWN • LEFT • RIGHT • SILENCE • NOISE
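A sketch of the demo loop: buffer a one-second window from the 250 Hz, 2-channel stream, classify it, and emit a command only when the confidence gate clears. The `model` callable and the non-overlapping window policy are placeholders, not `live_demo.py` itself:

```python
import collections
import numpy as np

COMMANDS = ["UP", "DOWN", "LEFT", "RIGHT", "SILENCE", "NOISE"]

def run_demo(samples, model, window=250, theta=0.60):
    """Consume (ch1, ch2) samples; each full 1 s window is classified, and a
    command is emitted only when top-class confidence clears the gate."""
    buf = collections.deque(maxlen=window)
    emitted = []
    for s in samples:
        buf.append(s)
        if len(buf) == window:
            probs = model(np.asarray(buf, dtype=float))  # (window, 2) -> (6,)
            if probs.max() >= theta:
                emitted.append(COMMANDS[int(probs.argmax())])
            buf.clear()  # non-overlapping windows keep the sketch simple
    return emitted
```

A real-time version would read `samples` from the serial port and overlap windows for lower latency; the classify-then-gate structure is the same.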
RESEARCH ROADMAP
24-bit ADC (ADS1299) to capture sustained articulation, not just onset. Bridge the onset→articulation gap — the same upgrade that separates MIT from this work.
Current data: single subject (n=1). Expand to 5–10 subjects with standardized electrode placement protocol. Required before any real-world generalization claim.
4,033 CSVs → Kaggle/HuggingFace. Enable community replication and set a benchmark for low-cost sEMG silent speech — a field with no open standard yet.
COMPANY ROADMAP
One Python package abstracting 8+ EEG/EMG headsets into a single API. Developers build once; it runs on SOMACH, OpenBCI, Muse, and future form factors.
Manufacture and distribute 100 SOMACH units for independent user studies. Open-source hardware BOM. Prove cross-subject generalization at scale.
Apple and Merge Labs are shipping. The 12–18 month window before consumer hardware arrives is the window to establish an open-source developer ecosystem that won't disappear when AirPods do this natively.
ORAL DEFENSE — MARCH 10, 2026 — UNANIMOUS PASS
"Congratulations Carl, this was an easy pass for us. Hardware projects are difficult, and yours worked. Your documentation throughout has been exceptional. This was one of my favorite projects in a long time to advise."
— Prof. Patrick Watson, CS/ML (Advisor)
"Getting hardware to work under less than ideal circumstances — low budget — is amazing. You could make a strong case for why you should be part of the MIT team. I would call yours an ideal capstone trajectory."
— Prof. Shekhar, Signal Processing (Second Reader)
"Carl did this specific thing, which is a very difficult engineering challenge, and here's how he got it to work."
— Prof. Patrick Watson
Minerva University Class of 2026
kho@uni.minerva.edu
Thank you.