whisper.cpp/src
Daniel Bevenius 705db0f728
whisper : fix VAD processing for skipped audio segments (#3230)
This commit fixes incorrect token timestamps computed in
`whisper_exp_compute_token_level_timestamps` when audio segments are
skipped. The issue is related to VAD processing and the energy levels.

The motivation for this is that the token timestamps exceed the energy
array bounds due to segment timing misalignment:
```console
                  (skipped introduction)
                    ↓
Audio segment:     [2600ms → 5600ms]  (3 seconds of actual audio)
Energy array:      [0 → 480652]       (samples for 3 seconds)
Token timestamps:  [3266ms → 3408ms]  (absolute timestamps)
```
So both `s0` and `s1` get clamped to the maximum sample index (480652),
which causes the start and end timestamps to be identical for all tokens
after a certain point.

This is addressed by using segment-relative timestamps in the
`timestamp_to_sample` and `sample_to_timestamp` functions.
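
The idea of segment-relative conversion can be sketched as follows. This is
an illustration only, not the code from this commit: the helper names
`to_segment_sample` and `to_absolute_ms`, the millisecond time unit, and the
48000-sample buffer (3 seconds at 16 kHz) in the usage note are assumptions
made for the example.
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Whisper operates on 16 kHz audio.
constexpr int64_t kSampleRate = 16000;

// Hypothetical helper: map an absolute timestamp (in ms) to an index into a
// buffer that only covers the segment starting at segment_t0_ms.
inline int to_segment_sample(int64_t t_ms, int64_t segment_t0_ms, int n_samples) {
    assert(n_samples > 0);
    const int64_t rel_ms = t_ms - segment_t0_ms;         // segment-relative time
    const int64_t sample = rel_ms * kSampleRate / 1000;  // ms -> sample index
    // Clamp to the valid range of the segment buffer.
    return (int) std::max<int64_t>(0, std::min<int64_t>(n_samples - 1, sample));
}

// Hypothetical inverse: map a segment-local sample index back to an
// absolute timestamp (in ms).
inline int64_t to_absolute_ms(int sample, int64_t segment_t0_ms) {
    return segment_t0_ms + (int64_t) sample * 1000 / kSampleRate;
}
```
With a segment starting at 2600 ms, the token timestamps 3266 ms and 3408 ms
map to two distinct in-range indices. With an offset of 0 (absolute
timestamps), both are clamped to the last index of the buffer, which is the
failure described above.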
2025-06-13 17:35:52 +02:00
..
coreml coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979) 2025-04-01 18:01:23 +02:00
openvino whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt whisper : add support for backends with multiple ggml_backend_buffer_type (#2863) 2025-03-26 16:54:02 +02:00
whisper-arch.h vad : add initial Voice Activity Detection (VAD) support (#3065) 2025-05-12 16:10:11 +02:00
whisper.cpp whisper : fix VAD processing for skipped audio segments (#3230) 2025-06-13 17:35:52 +02:00