Algorithm Details

In-depth mathematical descriptions of Mankunku's core algorithms.

Dynamic Time Warping (DTW)

Source: src/lib/scoring/alignment.ts

DTW finds the optimal alignment between a sequence of expected notes and a sequence of detected notes, handling timing variations, missed notes, and extra notes.

Formulation

Given:

Expected notes E = [e_0, e_1, ..., e_{N-1}]
Detected notes D = [d_0, d_1, ..., d_{M-1}]

Build a cost matrix dp[i][j] of size (N+1) x (M+1):

dp[0][0] = 0
dp[i][0] = dp[i-1][0] + SKIP_COST          (skip all detected)
dp[0][j] = dp[0][j-1] + SKIP_COST          (skip all expected)

dp[i][j] = min(
  dp[i-1][j-1] + matchCost(e_{i-1}, d_{j-1}),   // match
  dp[i-1][j]   + SKIP_COST,                       // skip expected (missed)
  dp[i][j-1]   + SKIP_COST                        // skip detected (extra)
)

Where SKIP_COST = 2.0 and:

matchCost(e, d) = pitchDistance(e, d) + rhythmDistance(e, d)

pitchDistance(e, d) = {
  0.0,                   if e.pitch == d.midi
  min(1.0, |diff| * 0.5) otherwise
}

rhythmDistance(e, d) = min(1.0, |e.onset - d.onset| / beatDuration)

Backtracking

Starting from dp[N][M], trace back to dp[0][0] by checking which of the three options (match, skip expected, skip detected) produced each cell's value. This produces an AlignmentPair[].

Complexity

Time: O(N * M). Space: O(N * M). For typical phrase sizes (4–16 notes), this is negligible.

Latency Correction

Source: src/lib/scoring/scorer.ts

Human latency (reaction time + audio detection delay) creates a constant offset between expected and detected onsets. Rather than penalizing this as rhythmic inaccuracy, the scorer absorbs it.

Algorithm

For each matched pair (expectedIndex, detectedIndex), compute: offset = detected.onset - expected.onset
Take the median of all offsets (robust to outliers from misaligned pairs)
Subtract this median from all detected onsets

The median typically absorbs 100–300ms of constant delay without affecting relative timing accuracy between notes.

Why Median?

The mean is sensitive to outliers — a single badly aligned pair could skew the correction. The median ignores up to 50% outliers, making it robust when some pairs are poorly matched by DTW.

McLeod Pitch Method

Source: src/lib/audio/pitch-detector.ts (via Pitchy)

The McLeod Pitch Method is an autocorrelation-based algorithm optimized for monophonic pitch detection.

Key Properties

Autocorrelation-based — Measures the similarity of a signal with time-shifted copies of itself. Peaks in the autocorrelation correspond to the fundamental period.
Normalized Square Difference Function (NSDF) — Instead of raw autocorrelation, McLeod uses NSDF which normalizes by the signal energy, making peaks comparable across different amplitudes.
Peak picking — The algorithm finds peaks in the NSDF and selects the one that best balances clarity (peak height) and frequency (peak position).
Parabolic interpolation — Refines the peak position for sub-sample accuracy, yielding fractional MIDI values.

Parameters in Mankunku

Parameter	Value	Rationale
Buffer size	4096 samples	At 48kHz, gives ~85ms windows. Sufficient for frequencies down to ~80Hz.
Clarity threshold	0.80	Only accept readings where the signal is clearly periodic.
Min frequency	80 Hz	Below the lowest note of supported instruments.
Max frequency	1200 Hz	Above the highest fundamental of supported instruments.
Update rate	~60fps	`requestAnimationFrame` loop.

MIDI Conversion

midiFloat = 12 * log2(frequency / 440) + 69
midi = round(midiFloat)
cents = round((midiFloat - midi) * 100)

Onset Detection (HFC)

Source: src/lib/audio/onset-worklet.ts

An energy-based onset detector running on the audio thread via AudioWorklet.

High-Frequency Content (HFC)

For each 128-sample frame:

HFC = sum(|sample[i]| * (i + 1)) / N

The weighting by (i + 1) emphasizes later samples in each frame, which correspond to higher frequencies. Transients (note attacks) have more high-frequency content than sustained notes.

Detection Logic

EMA_new = alpha * EMA_old + (1 - alpha) * HFC     // alpha = 0.85
ratio = HFC / EMA

if ratio > threshold AND time - lastOnset > cooldown:
  fire onset event
  lastOnset = currentTime

Parameter	Value
Alpha (smoothing)	0.85
Threshold	3.0
Cooldown	60ms
Silence floor	0.001 energy
Settle frames	5

Why HFC over Spectral Flux?

HFC is computationally simpler (no FFT required) and works well for percussive onsets typical of wind instruments. Spectral flux requires computing the magnitude spectrum of each frame and comparing to the previous frame — more accurate for subtle onsets but more expensive.

Note Segmentation

Source: src/lib/audio/note-segmenter.ts

Median-Based Pitch Assignment

Within each onset-bounded segment, the pitch is determined by the median MIDI note of all pitch readings, not the mean or mode.

Why median:

Robustness to outliers — A brief pitch glitch (e.g., octave error) doesn't affect the result
No distribution assumption — Unlike the mean, the median doesn't assume symmetric error
Stable for modal data — For clean segments, the median equals the true pitch

The cents deviation is also the median, but only from readings matching the median MIDI (filtering out octave errors before computing intonation).

Adaptive Difficulty

Source: src/lib/difficulty/adaptive.ts

State Machine

                  avg >= 85%
    ┌─────────────────────────────┐
    │                             ▼
  HOLD ◄───── 50% <= avg < 85% ──── ADVANCE
    │                             │
    │         avg < 50%           │
    └────────────────────────────►┘
                RETREAT

Transitions require at least 10 attempts since the last change per dimension, preventing oscillation.

Independent Axis Adjustment

Pitch and rhythm complexity are adjusted independently — each has its own 25-score accuracy window and its own 10-attempt cooldown counter:

Pitch: advances when pitch accuracy window average ≥ 85%, retreats when < 50%
Rhythm: advances when rhythm accuracy window average ≥ 85%, retreats when < 50%
currentLevel = Math.round((pitchComplexity + rhythmComplexity) / 2)

This allows a player who is strong in rhythm but weak in pitch to progress their rhythm complexity while pitch stays at a comfortable level.

Welcome to Mankunku

#Algorithm Details

#Dynamic Time Warping (DTW)

#Formulation

#Backtracking

#Complexity

#Latency Correction

#Algorithm

#Why Median?

#McLeod Pitch Method

#Key Properties

#Parameters in Mankunku

#MIDI Conversion

#Onset Detection (HFC)

#High-Frequency Content (HFC)

#Detection Logic

#Why HFC over Spectral Flux?

#Note Segmentation

#Median-Based Pitch Assignment

#Adaptive Difficulty

#State Machine

#Independent Axis Adjustment