whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp synced 2026-03-17 04:10:38 +01:00


Daniel Bevenius 705db0f728 whisper : fix VAD processing for skipped audio segments (#3230)	2025-06-13 17:35:52 +02:00

This commit addresses an issue with token timestamps when audio segments are skipped, in `whisper_exp_compute_token_level_timestamps` related to the VAD processing and the energy levels.

The motivation for this is that the token timestamps exceed the energy array bounds due to segment timing misalignment:

```console
                  (skipped introduction)
                    ↓
Audio segment:     [2600ms → 5600ms]  (3 seconds of actual audio)
Energy array:      [0 → 480652]       (samples for 3 seconds)
Token timestamps:  [3266ms → 3408ms]  (absolute timestamps)
```

So both `s0` and `t1` get clamped to the maximum sample index (480652), which causes the start/end timestamps to be the same for all the tokens after a certain point.

This is addressed by using segment-relative timestamps in the `timestamp_to_sample` and `sample_to_timestamp`.
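The clamping problem described in the commit message can be illustrated with a minimal sketch. This is not the code from `whisper.cpp`: the helper names (`ms_to_sample_absolute`, `ms_to_sample_relative`, `sample_to_ms_absolute`), the millisecond units, and the 16 kHz sample rate are assumptions made for illustration only. The idea is that the energy array covers only the kept segment, so a timestamp has to be made relative to the segment start before it is converted to a sample index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed sample rate for this sketch (16 kHz).
constexpr int64_t kSampleRate = 16000;

// Naive conversion: treats the absolute timestamp as an offset into the
// energy array, even though the array only covers the kept segment.
int ms_to_sample_absolute(int64_t t_ms, int n_samples) {
    assert(n_samples > 0);
    const int64_t idx = t_ms * kSampleRate / 1000;
    return (int) std::max<int64_t>(0, std::min<int64_t>(n_samples - 1, idx));
}

// Segment-relative conversion: subtract the segment start first, then
// clamp to the bounds of the energy array.
int ms_to_sample_relative(int64_t t_ms, int64_t segment_start_ms, int n_samples) {
    assert(n_samples > 0);
    const int64_t idx = (t_ms - segment_start_ms) * kSampleRate / 1000;
    return (int) std::max<int64_t>(0, std::min<int64_t>(n_samples - 1, idx));
}

// Inverse mapping: sample index within the segment back to an absolute
// timestamp in milliseconds.
int64_t sample_to_ms_absolute(int i_sample, int64_t segment_start_ms) {
    return segment_start_ms + (1000ll * i_sample) / kSampleRate;
}
```

With a hypothetical 3-second segment starting at 2600 ms (48000 samples at the assumed rate), the naive helper maps both 3266 ms and 3408 ms to the last index of the array, whereas the segment-relative helper yields two distinct indices that convert back to the original timestamps.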
coreml	coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979)	2025-04-01 18:01:23 +02:00
openvino	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	whisper : add support for backends with multiple ggml_backend_buffer_type (#2863)	2025-03-26 16:54:02 +02:00
whisper-arch.h	vad : add initial Voice Activity Detection (VAD) support (#3065)	2025-05-12 16:10:11 +02:00
whisper.cpp	whisper : fix VAD processing for skipped audio segments (#3230)	2025-06-13 17:35:52 +02:00