Voice activity detection (VAD) segments speech into utterances. While you speak, partial transcripts update about every 1 s. When you pause (or on Stop & finalize), the completed utterance is run through the denoiser and then transcribed to produce the final transcript.
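The flow described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the `Segmenter` class, the `denoise` placeholder, the energy threshold, the 100 ms frame size, and the 500 ms pause length are all assumptions made for the example. Only the ~1 s partial interval comes from the description. Each frame is reduced to a single energy value so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FRAME_MS = 100           # assumed frame length
PARTIAL_EVERY_MS = 1000  # partial refresh interval (~1 s, per the description)
PAUSE_MS = 500           # assumed silence needed to close an utterance
ENERGY_THRESHOLD = 0.1   # assumed speech/silence cutoff


def denoise(frames: List[float]) -> List[float]:
    """Hypothetical placeholder: a real denoiser would clean the audio here."""
    return list(frames)


@dataclass
class Segmenter:
    """Toy energy-based VAD that records partial and final events."""
    events: List[Tuple[str, int]] = field(default_factory=list)
    _speech: List[float] = field(default_factory=list)
    _silence_ms: int = 0
    _since_partial_ms: int = 0

    def push(self, energy: float) -> None:
        """Feed the energy of one frame (a stand-in for real audio)."""
        if energy >= ENERGY_THRESHOLD:
            # Speech frame: extend the current utterance.
            self._speech.append(energy)
            self._silence_ms = 0
            self._since_partial_ms += FRAME_MS
            if self._since_partial_ms >= PARTIAL_EVERY_MS:
                # Partials use the raw audio collected so far.
                self.events.append(("partial", len(self._speech)))
                self._since_partial_ms = 0
        elif self._speech:
            # Silence during an utterance: count toward the pause.
            self._silence_ms += FRAME_MS
            if self._silence_ms >= PAUSE_MS:
                self.finalize()

    def finalize(self) -> None:
        """Close the utterance (pause detected, or Stop & finalize)."""
        if not self._speech:
            return
        cleaned = denoise(self._speech)  # the final is denoised first
        self.events.append(("final", len(cleaned)))  # then transcribed
        self._speech = []
        self._silence_ms = 0
        self._since_partial_ms = 0


if __name__ == "__main__":
    seg = Segmenter()
    # 2.5 s of speech, a 0.5 s pause, then 0.5 s of speech.
    for energy in [0.5] * 25 + [0.0] * 5 + [0.5] * 5:
        seg.push(energy)
    seg.finalize()  # stands in for Stop & finalize
    print(seg.events)
```

In this sketch each event carries the number of speech frames in the utterance so far. The 2.5 s burst produces two partials and is closed by the pause, while the trailing 0.5 s burst is too short for a partial and is closed only by the explicit `finalize()` call.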