Audio Pipeline
The audio pipeline handles everything from playing phrases through speakers to capturing notes from the microphone and converting them to scored note sequences.
Pipeline Overview
graph LR
subgraph Playback
Transport["Tone.js Transport"]
SF["smplr SoundFont"]
Metro["Metronome"]
Backing["Backing Track"]
end
subgraph Capture
Mic["getUserMedia"]
Source["MediaStreamSource"]
Analyser["AnalyserNode (fftSize=4096)"]
Recorder["MediaRecorder"]
end
subgraph Detection
Pitch["PitchDetector (Pitchy)"]
Onset["OnsetDetector (AudioWorklet)"]
end
subgraph Segmentation
Seg["NoteSegmenter"]
Quant["Quantizer"]
Bleed["BleedFilter"]
end
Transport --> SF
Transport --> Metro
Transport --> Backing
SF --> Speakers["Speakers"]
Metro --> Speakers
Backing --> Speakers
Mic --> Source
Source --> Analyser
Source --> Recorder
Analyser --> Pitch
Source --> Onset
Pitch --> Seg
Onset --> Seg
Seg --> Quant
Quant --> Bleed
Bleed --> DetectedNotes["DetectedNote[] kept + filtered"]
1. Audio Context (src/lib/audio/audio-context.ts)
A singleton AudioContext shared between Tone.js and smplr. Must be initialized from a user gesture (click/tap) due to browser autoplay policies.
initAudio()— CallsTone.start(), returns the rawAudioContext. Idempotent.getAudioContext()— Returns the context (throws if not initialized).isAudioInitialized()— Boolean check.
Tone.js wraps the AudioContext in a standardized-audio-context. The raw context is passed to smplr so both libraries schedule on the same timeline.
2. Playback (src/lib/audio/playback.ts)
Plays phrases through SoundFont instrument samples using the Tone.js Transport for scheduling.
Flow:
loadInstrument(instrumentId)— Creates asmplr.Soundfontinstance using the MusyngKite soundfont (richer wind samples than FluidR3_GM) withloadLoopData: truefor natural sustained notes. Cached via smplr'sCacheStorage. Previous instruments are disconnected on switch.playPhrase(phrase, options, keepMetronome)— Converts phrase notes to Tone.jsPartevents using PPQ (Pulses Per Quarter) ticks for exact timing. Schedules metronome if enabled. Applies swing feel from settings.- When
keepMetronome=true(used during call-and-response), the Transport keeps running after the phrase ends so the metronome continues during recording. stopPlayback()— Stops Transport, disposes Part, stops all ringing notes.
Note conversion: Phrase note offsets are fractions of a whole note (e.g., [1, 4] = quarter note). These are converted to quarter-note beats (* 4), then to Tone.js ticks (* PPQ), then scheduled as "${ticks}i" time strings.
Jazz Expression
The playback engine adds several layers of expression for authentic jazz sound:
- Warmth filter: A low-pass
BiquadFilterNodeat 4500 Hz (sax) / 6000 Hz (trumpet) with Q 0.7, inserted via smplr'saddInsert. Rolls off harsh digital highs. - Vibrato: A Web Audio
OscillatorNodeLFO at 4.8 Hz modulating the filter's detune at 12 cents depth (sax) / 6 cents (trumpet). Simulates jaw/airstream vibrato. - Breath-scoop detune: Per-note
detunevalues — the first note of a phrase gets -15 cents, lower notes get -8 cents, higher notes stay on pitch. Simulates a saxophonist's initial attack. - Humanized velocity: Random +/-8 velocity deviation per note, avoiding robotic dynamics.
- Humanized timing: Micro-timing jitter of ~+/-15ms per note for organic feel.
Swing Feel
The swing setting from PlaybackOptions (0.5 = straight, 0.67 = triplet, 0.75 = heavy) is mapped to Tone.js transport.swing (0–0.5) with swingSubdivision set to '8n', giving authentic jazz eighth-note swing.
3. Capture (src/lib/audio/capture.ts)
Sets up microphone input with processing-optimized constraints:
audio: {
echoCancellation: false, // Don't filter the instrument signal
noiseSuppression: false, // Preserve harmonics
autoGainControl: false // Consistent levels
}
The MediaStreamSource connects to an AnalyserNode (fftSize=4096) but is not connected to the audio destination — this prevents feedback loops.
The AnalyserNode provides time-domain float data to the pitch detector. It also serves as the input level meter via getInputLevel(), which computes RMS and scales it to 0-1.
4. Pitch Detection (src/lib/audio/pitch-detector.ts)
Uses the Pitchy library which implements the McLeod Pitch Method — an autocorrelation-based algorithm well-suited for monophonic instruments.
Detection loop:
- Runs at ~60fps via
requestAnimationFrame - Reads
Float32Arraytime-domain data from theAnalyserNode - Calls
PitchDetector.findPitch(buffer, sampleRate)→[frequency, clarity] - Filters by: clarity >= 0.80, frequency 80–1200 Hz
- Converts frequency to MIDI via
12 * log2(freq / 440) + 69 - Quantizes to nearest MIDI note + cents deviation
- Accumulates readings in an array with timestamps relative to recording start
Each PitchReading contains:
midiFloat— Fractional MIDI numbermidi— Nearest integer MIDIcents— Deviation from nearest note (-50 to +50)clarity— Detection confidence (0-1)time— Seconds from recording startfrequency— Raw Hz
5. Onset Detection (src/lib/audio/onset-detector.ts + onset-worklet.ts)
An AudioWorklet processor running on the audio thread for low-latency onset detection.
Algorithm (energy-based with HFC):
- Compute High-Frequency Content (HFC):
sum(|sample[i]| * (i + 1)) / N— weights later samples in each 128-sample frame to emphasize transients. - Maintain an Exponential Moving Average (EMA) of HFC with smoothing factor 0.85.
- If
HFC / EMA > 3.0(threshold) and at least 60ms since last onset → fire onset event. - Silence detection: skip frames with energy below 0.001 to avoid noise triggers.
- Let EMA settle for 5 frames before detecting.
Onset timestamps are posted to the main thread via MessagePort and collected relative to recordingStartTime.
6. Note Segmentation (src/lib/audio/note-segmenter.ts)
Combines pitch readings and onset timestamps into DetectedNote[]:
- Use onset boundaries to divide pitch readings into segments.
- For each segment:
- Take the median MIDI note of all readings (robust to outliers)
- Take the median cents deviation of readings matching the median MIDI
- Compute average clarity of matching readings
- Filter out segments shorter than
minNoteDuration(default 50ms). - If no onsets were detected, treat all readings as one note.
7. Metronome (src/lib/audio/metronome.ts)
A synthesized jazz metronome using Tone.js synths:
- Kick drum (beat 1):
MembraneSynthtuned to C1 for a short membrane thump marking the downbeat. - Ride cymbal (all beats): White noise through an 8kHz highpass filter.
- Hi-hat chick (beats 2 and 4): Pink noise through a 6kHz highpass filter, very short envelope.
- Uses
Tone.Sequencefor pattern scheduling. - Can run for a finite number of bars (during playback) or loop indefinitely (during recording).
Recording Flow in Practice
The practice page (src/routes/practice/+page.svelte) orchestrates the full recording flow:
- Play: Load instrument if needed, play phrase via
playPhrase()withkeepMetronome=true. - Await input: After phrase ends, pitch detection runs. The first detected pitch triggers recording.
- Record: Pitch detector collects readings. Silence timeout (2s) or max duration triggers finish.
- Segment:
extractOnsetsFromReadings()detects onsets from pitch data (gap > 100ms or MIDI change). Gap-based onsets are back-dated by 50ms to compensate for pitch detector re-lock delay after silence (see Scoring Algorithm docs).segmentNotes()producesDetectedNote[]. - Score:
scoreAttempt()produces aScorewith per-note results.
The practice page prefers the AudioWorklet onset detector when available. It creates the worklet on first mic capture (ensureMicCapture()) and calls reset(recordingStartTime) before each recording pass to synchronize timestamps with the pitch detector. If the AudioWorklet is unavailable (e.g., unsupported browser), it falls back to extractOnsetsFromReadings() which derives onsets from pitch data gaps and MIDI changes.
8. Backing Track (src/lib/audio/backing-track.ts + backing-track-schedule.ts + backing-styles.ts)
Generates and schedules a three-part jazz rhythm section (upright bass, piano or organ comping, drum kit) synchronized to phrase harmony on the Tone.js Transport.
- Bass: Smolken pizzicato double-bass sample library; walking lines built per
HarmonicSegmentusing chord tones and passing tones. - Comping:
smplr.SplendidGrandPiano(Salamander Grand) or drawbar organ (GM SoundFont, MusyngKite). Voicings are built viavoicings.ts— shell voicings + voice-leading to minimize hand motion between chords. - Drums: Sampled kick, ride, hi-hat (Virtuosity Drums, CC0) loaded through
sample-maps.ts. Patterns are defined inbacking-styles.tsand include swing, straight, bossa, and ballad feels. - Schedule:
buildSchedule()returns aBackingTrackSchedule— a time-indexed structure withnotes[](bass/comp note events withstartSeconds,endSeconds,midi[]) and anactiveMidiAt(time)lookup. The schedule is consumed by the bleed filter. - Diagnostics: A
BackingTrackLog(per-beat dump of bass/comp/drum/melody activity) is produced for the/diagnosticspanel.
Because echoCancellation is disabled on the mic (to preserve instrument pitch accuracy), the backing track is audible in the user's recording. Two mitigations:
- The backing track gain is ducked during the
awaitingInputphase so bleed cannot falsely triggerbeginRecording(). - The bleed filter runs between segmentation and scoring (see below).
9. Bleed Filter (src/lib/audio/bleed-filter.ts)
Reference-aware filter that rejects detected notes which probably came from the backing track bleeding into the mic rather than from the user's instrument. It uses clarity (autocorrelation confidence) as a proxy for signal strength — direct instrument input typically produces clarity ≥ 0.92; speaker bleed at typical distances lands in 0.80–0.88.
Per detected note, given the BackingTrackSchedule:
- If no backing-track pitch is active at the note's transport time (octave-folded pitch-class match) → keep.
- If
clarity >= 0.92→ keep (even if the pitch matches — the user is playing the same note). - If
clarity < 0.88→ filter (weak signal on a matching pitch = bleed). - Borderline clarity (0.88–0.92) on a matching pitch → check onset coincidence: if a backing note begins within 50 ms of the detected onset → filter; otherwise → keep.
The filter returns { kept, filtered }. score-pipeline.ts passes both lists to scoreAttempt() so the unfiltered and filtered scores can be compared in diagnostics.
10. Quantizer (src/lib/audio/quantizer.ts)
Snaps detected onsets to the Transport's subdivision grid (quarter, eighth, triplet, sixteenth). Used by the lick-practice continuous flow so the notation overlay stays on a clean rhythmic grid even when the user's timing drifts.
11. Recording & Replay (src/lib/audio/recorder.ts + replay.ts + persistence/audio-store.ts)
recorder.ts wraps MediaRecorder to persist raw microphone audio per attempt. persistence/audio-store.ts stores blobs in IndexedDB (bounded, FIFO). replay.ts reads a stored blob and feeds it through the full detection + scoring pipeline again, which drives the /diagnostics rescore panel.
Recording Flow in Practice
The practice page (src/routes/practice/+page.svelte) orchestrates the full recording flow:
- Play: Load instrument if needed, play phrase via
playPhrase()withkeepMetronome=true. - Await input: After phrase ends, pitch detection runs and backing-track gain is ducked. The first detected pitch triggers recording.
- Record: Pitch detector collects readings. Silence timeout (2s) or max duration triggers finish.
- Segment:
extractOnsetsFromReadings()detects onsets from pitch data (gap > 100ms or MIDI change). Gap-based onsets are back-dated by 50ms to compensate for pitch detector re-lock delay after silence (see Scoring Algorithm docs).segmentNotes()producesDetectedNote[]. - Bleed filter:
filterBleed()splits the notes intokept/filtered. - Score:
runScorePipeline()runsscoreAttempt()on both the unfiltered and filtered sets.
The record and lick-practice flows use the AudioWorklet onset detector directly (it fires onset events on the audio thread), skipping the gap-inference path.