whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp synced 2026-04-09 00:15:39 +02:00

Author	SHA1	Message	Date
Yshtola	f53dc74843	whisper : Fix UTF-8 character boundary issue in segment wrapping (max_len) (#3592 ) The current implementation in `whisper_wrap_segment()` uses `strlen()` to count bytes, not UTF-8 characters. When splitting segments at `max_len`, this can break multi-byte UTF-8 characters, resulting in invalid sequences displayed as `�` (U+FFFD replacement character).	2026-01-16 14:16:05 +02:00
Peter A.	a96310871a	examples : fix executable example targets (#3600 ) * cmake: - added `whisper-` prefix to unprefixed targets: `quantize`, `lsp`, `vad-speech-segments` - added `install(TARGETS ${TARGET} RUNTIME)` where it was missing Signed-off-by: Peter A. <ink.splatters@pm.me> * .github/workflows/build.yml: quantize -> whisper-quantize Signed-off-by: Peter A. <ink.splatters@pm.me> --------- Signed-off-by: Peter A. <ink.splatters@pm.me>	2026-01-13 08:08:18 +01:00
Russ	3e79e73eee	build: link whisper target against Threads::Threads for FreeBSD support (#3568 )	2025-12-17 11:13:38 +02:00
Georgi Gerganov	72714d169c	whisper : adjust to ggml changes (#0 )	2025-12-12 17:54:58 +02:00
Joseph Sellers	a88b93f85f	vad : fix buffer overflow in sample reduction loop (#3558 ) The buffer size calculation loop (line ~6661) uses `n_samples - 1` as the upper bound for segment_end_samples, but the copy loop (line 6696) uses `n_samples`. This inconsistency allows the copy loop to compute segment_length values up to 1 sample larger per segment than what was allocated, causing heap corruption. Symptom: `malloc(): corrupted top size` or `malloc(): invalid size (unsorted)` crashes after VAD completes sample reduction. Fix: Use consistent bounds (`n_samples - 1`) in both loops. Fixes #3403	2025-12-06 12:28:32 +01:00
Oleg Orlov	999a7e0cbf	whisper : enable IGPU (#3492 ) Co-authored-by: Oleg Orlov <vk.orelsokolov@yandex.by>	2025-11-01 13:38:28 +01:00
Ruben Ortlam	c3b5c4d934	whisper : Support using devices of type iGPU (#3469 )	2025-10-11 17:55:16 +03:00
Andreas Lubbe	85871a9469	whisper : add support for --carry-initial-prompt (#3395 ) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-10 19:51:15 +03:00
Georgi Gerganov	d3a29d7b88	minor : fix code style (#3463 )	2025-10-10 11:33:01 +03:00
Silviu Caragea	85d1d3d3dc	vad : free vad_segments in whisper_vad (#3463 ) This commit fixes multiple issues: * memory leak because vad_segments is never released * avoid segmentation fault when whisper_vad_segments_from_samples returns nullptr. * avoid potential segmentation fault when the app fails to allocate memory for filtered samples and the vad context is released but also gets released within the state itself when whisper_free_state is called	2025-10-10 06:20:21 +02:00
Georgi Gerganov	98930fded1	whisper : clean-up headers	2025-10-09 10:48:52 +03:00
Daniel Bevenius	c8223a8548	vad : fix memory leaks in VAD implementation (#3453 ) * vad : fix memory leak by storing ggml_context in vad context struct This commit addresses a memory leak issue in the voice activity detection (VAD) where the ggml_context is not stored within the vad context structure. The motivation for this change is that this is causing the context memory to stay allocated and the tensors still point to that memory but this memory is never freed. * vad : free memory allocated for VAD hparams This commit frees the model hyperparameters allocated for the VAD context in the `whisper_vad_free` function. Specifically, it deletes the `encoder_in_channels`, `encoder_out_channels`, and `kernel_sizes` arrays allocated with `new[]` in the `whisper_vad_init` function. The motivation for this is to prevent memory leaks when the VAD. * vad: free ggml buffer in whisper_vad_free This commit frees the ggml buffer in the whisper_vad_free function to prevent memory leaks. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3452 * Revert "vad : fix memory leak by storing ggml_context in vad context struct" This reverts commit `aeafca437e`. * whisper : free ggml context in whisper_vad_init_context This commit frees the ggml_context after initializing the VAD context in the whisper_vad_init_context function. The motivation for this is to prevent memory leaks.	2025-10-06 14:57:44 +02:00
Georgi Gerganov	0b3587acdd	whisper : enable flash attention by default (#3441 )	2025-09-30 15:47:20 +03:00
Georgi Gerganov	b4909a6c78	whisper : remove ggml_mul_mat padding (#3436 )	2025-09-29 16:42:08 +03:00
Dw9	5527454cdb	whisper : fixed crash in GPU device selection on multi-GPU systems (#3372 )	2025-08-12 13:58:52 +03:00
Georgi Gerganov	f7502dca87	whisper : reset conv scheduler when CoreML is used (#3350 ) ggml-ci	2025-07-30 21:54:58 +03:00
Charles Xu	032697b9a8	whisper: validate get_rows support for cpu extra buffer (#3323 )	2025-07-14 15:13:44 +03:00
Daniel Bevenius	32cf4e2aba	whisper : add version function (#3289 ) * whisper : add version function This commit adds a version function to the whisper API. The motivation for this is that it might be convenient to have a way to programmatically check the version. Example usage: ```c++ printf("Using whisper version: %s\n", whisper_version()); ``` Will output: ```console Using whisper version: 1.7.6 ``` * examples : add version to android example CMakeLists.txt	2025-06-26 18:09:42 +02:00
glaszig	0083335ba0	coreml : backport CoreML features to macos < 14 (#3255 )	2025-06-24 09:24:27 +02:00
Daniel Bevenius	1591558ccc	whisper : clear result_all if vad_samples is empty (#3262 ) This commit clears the results_all vector when no VAD segments are found. The motivation for this is that this would normally be done by `whisper_full_with_state` but when no VAD segments are detected the current implementation does not call that function and hence the vector does not get reset. This can lead to issues in applications like the server example where it will incorrectly process the old results. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3250	2025-06-18 11:30:29 +02:00
Daniel Bevenius	705db0f728	whisper : fix VAD processing for skipped audio segments (#3230 ) This commit addresses an issue with token timestamps when audio segments are skipped, in `whisper_exp_compute_token_level_timestamps` related to the VAD processing and the energy levels. The motivation for this is that the token timestamps exceed the energy array bounds due to segment timing misalignment: ```console (skipped introduction) ↓ Audio segment: [2600ms → 5600ms] (3 seconds of actual audio) Energy array: [0 → 480652] (samples for 3 seconds) Token timestamps: [3266ms → 3408ms] (absolute timestamps) ``` So both `s0` and `t1` get clamped to the maximum sample index (480652) which causes the start/end timestamps to be the same for all the tokens after a certain point. This is addressed by using segment-relative timestamps in the `timestamp_to_sample` and `sample_to_timestamp`.	2025-06-13 17:35:52 +02:00
Daniel Bevenius	98dfe8dc26	vad : revisit timestamp alignment/mapping (#3173 ) * vad : revisit timestamp alignment/mapping This commit improves the timestamp alignment by introducing a mapping table, adding intermediate reference points for longer segments, and binary search for lookups. The motivation for this change is to address issues with the current solution where zero-length segments are possible, and also to improve the precision of the VAD timestamps. Refs: https://github.com/ggml-org/whisper.cpp/issues/3162 * vad : use uint64_t for time mapping This commit changes the type of the `processed_time` and `original_time` fields in the `vad_time_mapping` struct from `double` to `uint64_t`. The motivation for this change is to improve precision and avoid floating-point inaccuracies and also be consistent with other parts of the code base that use `uint64_t` for time representation. This is a part of a refactoring where I'm also going to change the vad_segment_info struct to use `uint64_t` for the start and end times. This is the reason for the not so pleasant conversion and casts in the code at the moment. * vad : change vad_segment_info and whisper_vad_segment to use uint64_t * vad : use int64_t instead of uint64_t for timestamps To be consistent with other timestamps in the codebase. * vad : add centisecond conversion functions * vad : extract vad processing from whisper_full_with_state This commit extracts the VAD processing from the `whisper_full_with_state` function into the `whisper_full` and `whisper_full_parallel` functions. The motivation for this is that I did not take into account that when `whisper_full_parallel` is called with `n_processors > 1`, then the vad processing would not be applied correctly. Instead the VAD processing should be done prior to processing in the case of `whisper_full_parallel`. * vad : remove filtered_n_samples from whisper_vad The commit removes the parameter `filtered_n_samples` from the `whisper_vad` function signature and its usage, as it is no longer needed since filtered samples is now a vector (previously it was a float). The motivation for this is to simplify the usage of this function. * vad : remove vad_mapping_table_initialized flag * vad : fix leaning (none) of pointer/references	2025-05-30 06:28:46 +02:00
Daniel Bevenius	73a8c5fb94	whisper : remove whisper_load_backends function (#3196 ) * whisper : remove whisper_load_backends function This commit removes the `whisper_load_backends` function, which was used to load all GGML backends. The motivation for this change is to push the responsibility of loading backends to user applications to give them more control over which backends to load and when. See the references below for more context. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3182 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801778733 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801928990 * ruby : add check for rwc is NULL This commit adds a check to ensure that the `rwc` pointer is not NULL before attempting to mark its members in the garbage collector. The motivation for this is an attempt to see if this fixes the CI build as I'm not able to reproduce the issue locally. Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15299612277/job/43036694928?pr=3196	2025-05-29 08:03:17 +02:00
Daniel Bevenius	bd1cb0c8e3	whisper : remove redundant assignments (#3178 ) This commit removes some redundant assignments in the function `whisper_exp_compute_token_level_timestamps`. The motivation for this is that tokens[j] and token are references to the same object and this can be a little confusing when reading the code.	2025-05-21 13:23:20 +02:00
Daniel Bevenius	d1f114da61	vad : return early if no vad segments are detected (#3158 ) This commit adds a check to `whisper_full_with_state` and if no VAD segments are detected, the function will return early. The motivation for this is that if no VAD segments are detected, the function will not have any samples to process which can happen if an audio sample does not contain any speech. I did not test this previously and only discovered this when updating the stream example.	2025-05-16 08:50:53 +02:00
Daniel Bevenius	bae5d074c7	vad : store VAD context in whisper_state (#3156 ) * vad : store VAD context in whisper_state This commit stores the VAD context in the whisper_state structure, allowing for better management and reuse of the VAD context across multiple calls to the whisper_vad function. The motivation for this change is that when updating the stream example I noticed that the VAD context was being re-initialized every time the whisper_vad function was called. This involved loading the VAD model which is expensive and unnecessary if the context can be reused. Storing this in the whisper_state seems to follow a pattern similar to how whisper_coreml_context and whisper_openvino_context are stored. * vad : free vad_context in whisper_free_state	2025-05-16 07:53:26 +02:00
Georgi Gerganov	a14c89aefa	whisper : update to ggml-backend changes (#0 ) ggml-ci	2025-05-13 13:59:21 +03:00
Daniel Bevenius	e41bc5c61a	vad : add initial Voice Activity Detection (VAD) support (#3065 ) * vad : add initial Voice Activity Detection (VAD) support This commit adds support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-12 16:10:11 +02:00
Daniel Bevenius	e39ba750cd	whisper : remove dummy commit comment [no ci] (#3143 ) This commit removes a dummy comment that was added by commit `589b408` ("ci : dummy commit to trigger CI").	2025-05-12 14:40:17 +02:00
Daniel Bevenius	09846f4e12	whisper: remove MSVC warnings pragmas (#3090 ) * ggml : remove MSVC warnings pragmas This commit removes the MSVC-specific pragmas as these are now handled in CMakeLists.txt. * whisper : remove MSVC warning pragmas This commit removes the MSVC-specific pragmas. These are now handled in the CMakeLists.txt file.	2025-05-05 13:09:35 +02:00
Daniel Bevenius	2e30e6df59	whisper : fix grammar advance stack warning (#3087 ) This commit addresses a warning that is present for Release builds: ```console [ 30%] Building CXX object src/CMakeFiles/whisper.dir/whisper.cpp.o In file included from /usr/include/c++/13/bits/stl_tree.h:63, from /usr/include/c++/13/map:62, from /home/danbev/work/ai/whisper.cpp/src/whisper-arch.h:5, from /home/danbev/work/ai/whisper.cpp/src/whisper.cpp:2: In static member function ‘static void std::__copy_move<false, false, std::random_access_iterator_tag>::__assign_one(_Tp, _Up) [with _Tp = const whisper_grammar_element; _Up = const whisper_grammar_element const]’, inlined from ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp, _Tp, _Up) [with _Tp = const whisper_grammar_element const; _Up = const whisper_grammar_element; bool _IsMove = false]’ at /usr/include/c++/13/bits/stl_algobase.h:440:20, inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element const; _OI = const whisper_grammar_element]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30, inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element const; _OI = const whisper_grammar_element*]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42, ... ``` This warning is caused by the fact that the `stack` vector is empty when it is passed to `new_stacks.push_back(stack);`. The suggested fix is to use `new_stacks.emplace_back();` instead of `new_stacks.push_back(stack);`.	2025-04-28 19:11:38 +02:00
Georgi Gerganov	549db9376f	whisper : reduce delta_min from 1000ms to 100ms (#3028 ) ggml-ci	2025-04-11 06:23:02 +02:00
Fujimoto Seiji	e6234cd435	whisper : fix "bench-all outputs an invalid result on larger models" (#3002 ) The benchmark script 'scripts/bench-all.sh' assumes that the 11th field of the output line is a timestamp. This assumption does not hold when the target model takes a bit longer to process. Fix this issue by introducing an explicit whitespace to the output lines of `whisper_print_timings()`. Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>	2025-04-04 18:36:19 +03:00
Georgi Gerganov	2b6d0d2200	rename : ggerganov -> ggml-org (#3005 )	2025-04-04 16:11:52 +03:00
Daniel Bevenius	11688b262f	coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979 ) * coreml: fix Whisper to CoreML conversion by disabling SDPA This commit disables the use of PyTorch's `scaled_dot_product_attention` in the Whisper model to avoid compatibility issues during CoreML conversion. The issue occurs because coremltools requires PyTorch 2.5.0, but the Whisper implementation may expect behavior from newer PyTorch versions. By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to use its fallback manual attention implementation, which works correctly with PyTorch 2.5.0 during the tracing process. Refs: https://github.com/ggerganov/whisper.cpp/issues/2783 * coreml: fix audio shape in whisper decoder conversion This commit fixes the audio shape in the whisper decoder conversion script. The motivation for this is that the audio shape was incorrect and was causing the conversion to fail. * coreml : set -e in generate-coreml-interface.sh The commit sets the -e flag in the generate-coreml-interface.sh script to make sure the script fails if any command fails. * coreml : update generated encoder/decoder interfaces This commit updates the generated encoder/decoder interfaces for the whisper model which is the result of running the generate-coreml-interface.sh script.	2025-04-01 18:01:23 +02:00
Daniel Bevenius	f92bd59951	whisper : remove unnecessary GGML_UNUSED macro (#2960 )	2025-03-30 05:56:10 +02:00
Dan Johansson	21d890d534	whisper : add support for backends with multiple ggml_backend_buffer_type (#2863 ) * whisper : add support for ggml_backend_buffer_type Signed-off-by: Dan Johansson <dan.johansson@arm.com> * fix compile error when building on Ubuntu Signed-off-by: Dan Johansson <dan.johansson@arm.com> * remove copyright header from include file Signed-off-by: Dan Johansson <dan.johansson@arm.com> --------- Signed-off-by: Dan Johansson <dan.johansson@arm.com>	2025-03-26 16:54:02 +02:00
Daniel Bevenius	cf5ddb8c21	whisper : initialize decoder's rng with unique seed (#2932 ) This change initializes each decoder's random number generator with a unique seed. The motivation for this is that currently all decoders are initialized with the same seed value, 0. The result of this is that for the same state (logits, probs, and logprobs) they will produce the same output.	2025-03-24 09:36:07 +01:00
Daniel Bevenius	be9de81171	whisper : add check for CPU backend initialization (#2918 ) This commit adds a check for the CPU backend initialization in the whisper library. If the initialization fails, an exception is thrown. The motivation for this change is to make the library more robust and handle the case when the CPU backend initialization fails. Resolves: https://github.com/ggerganov/whisper.cpp/issues/2917	2025-03-21 09:53:26 +01:00
Daniel Bevenius	215990abde	whisper : fix compiler warnings in whisper.cpp (#2895 ) This commit fixes compiler warnings in whisper.cpp by changing the type of the loop index variable from int64_t to size_t. Currently the following warnings are generated by the compiler: ```console /whisper.cpp/src/whisper.cpp:209:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare] 209 \| for (int64_t i = 0; i < nels; ++i) { \| ~ ^ ~~~~ /whisper.cpp/src/whisper.cpp:219:27: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare] 219 \| for (int64_t i = 0; i < nels; ++i) { \| ~ ^ ~~~~ ```	2025-03-18 13:38:41 +01:00
Daniel Bevenius	740bf7f6a1	whisper : enable compiler warnings for src (#2891 ) * whisper : enable compiler warnings for src This commit enables compiler warnings for the src directory. Currently when the WHISPER_ALL_WARNINGS flag is set to ON it only enables warnings in ggml, by setting GGML_ALL_WARNINGS to ON. This commit adds the same compiler flags for whisper's src directory. The motivation for this is to catch potential bugs and issues early on in the development process. * squash! whisper : enable compiler warnings for src Remove GF_C_FLAGS and GF_CXX_FLAGS from add_compile_options.	2025-03-18 05:19:18 +01:00
Diego Devesa	339a1cba5d	whisper : support GGML_BACKEND_DL (#2843 ) * whisper : support GGML_BACKEND_DL * fix DTW crash * whisper.objc : fix build - add ggml-cpp.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-27 13:35:07 +01:00
Thomas Fitzsimmons	47e14c0529	whisper : restore big endian support (#2816 ) * whisper : fix BYTESWAP whitespace * whisper : make byteswap useable with C++17 * cmake : define WHISPER_BIG_ENDIAN for big-endian targets * ci : fix (again) arm64 build fails * docker : attempt fixing arm64 build on ci * qemu v7.0.0-28 [imported from https://github.com/ggml-org/llama.cpp/commit/818a340ea8be55b3706e1772527cb8738e90a8c7 (#11895)] --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-25 11:38:13 +02:00
Georgi Gerganov	589b40810a	ci : dummy commit to trigger CI	2025-02-03 16:32:48 +02:00
Georgi Gerganov	eb68324c86	whisper : fix gpu device selection (#2728 )	2025-01-13 13:11:37 +02:00
Sandro Hanea	2ab2eb5110	whisper : add whisper_full_get_segment_no_speech_prob_from_state (#2716 )	2025-01-09 16:21:07 +02:00
Sacha Arbonel	4183517076	server : add no-speech threshold parameter and functionality (#2654 )	2024-12-21 17:00:08 +02:00
Georgi Gerganov	f4668169a0	whisper : rename suppress_non_speech_tokens to suppress_nst (#2653 )	2024-12-21 12:54:35 +02:00
Karthick	f897eb7670	whisper : support no_speech_thold (#2625 ) * Implement no_speech_thold no_speech_thold functionality is on par with OpenAI's whisper * Addressed review comments	2024-12-17 19:15:47 +02:00
Karthick	2f2841bfce	whisper : add single-timestamp logic (#2629 ) * Fix hallucinations during silence When the predicted tokens end with a single timestamp the entire 30-second segment should be considered as done, to avoid hallucinations for the remaining part of the segment. This behaviour is on par with OpenAI's whisper. Refer to logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py * Accept review comments related to formatting. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-17 19:07:08 +02:00
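
The most recent entry in the table (`f53dc74843`) describes segments being cut in the middle of a multi-byte UTF-8 character because lengths were counted in bytes. The sketch below illustrates one common way to avoid that kind of split: step the cut position back until it no longer lands on a continuation byte. The helper names `utf8_safe_prefix_len` and `utf8_safe_truncate` are hypothetical and are not taken from whisper.cpp; the actual fix in `whisper_wrap_segment()` may work differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of whisper.cpp): returns the largest prefix
// length <= max_bytes that does not end inside a multi-byte UTF-8 sequence.
// Assumes `text` is valid UTF-8.
static size_t utf8_safe_prefix_len(const std::string & text, size_t max_bytes) {
    if (max_bytes >= text.size()) {
        return text.size();
    }
    size_t cut = max_bytes;
    // Continuation bytes have the bit pattern 10xxxxxx. If the byte at the
    // cut position is one, the cut would split a character, so step back
    // until the byte at `cut` starts a new character.
    while (cut > 0 && (static_cast<unsigned char>(text[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return cut;
}

// Hypothetical wrapper: truncate without producing an invalid sequence.
static std::string utf8_safe_truncate(const std::string & text, size_t max_bytes) {
    return text.substr(0, utf8_safe_prefix_len(text, max_bytes));
}
```

For the 6-byte string `"h\xC3\xA9llo"` ("héllo"), a limit of 2 bytes yields `"h"` rather than `"h"` followed by a dangling `0xC3` byte, which is what a plain byte-count cut would produce.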

85 Commits
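
Commit `98dfe8dc26` above mentions a mapping table between processed (VAD-filtered) and original timestamps, looked up with binary search. The following is a minimal sketch of that idea only. The field names `processed_time` and `original_time` come from the commit message; the struct name `time_mapping`, the function `map_to_original`, and the linear interpolation between neighbouring entries are assumptions made for illustration and may not match the real implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative struct: field names follow the commit message, the rest is assumed.
struct time_mapping {
    int64_t processed_time; // time in the VAD-filtered audio
    int64_t original_time;  // corresponding time in the original audio
};

// Hypothetical lookup. `table` must be sorted by strictly increasing
// processed_time. Values outside the table are clamped to the first/last entry.
static int64_t map_to_original(const std::vector<time_mapping> & table, int64_t processed) {
    if (table.empty()) {
        return processed; // nothing to map against
    }
    if (processed <= table.front().processed_time) {
        return table.front().original_time;
    }
    if (processed >= table.back().processed_time) {
        return table.back().original_time;
    }

    // Binary search for the first entry whose processed_time is > processed.
    auto upper = std::upper_bound(table.begin(), table.end(), processed,
        [](int64_t t, const time_mapping & m) { return t < m.processed_time; });
    auto lower = upper - 1;

    // Assumed behaviour: integer linear interpolation between the two entries.
    const int64_t dp = upper->processed_time - lower->processed_time;
    const int64_t dorig = upper->original_time - lower->original_time;
    return lower->original_time + (processed - lower->processed_time) * dorig / dp;
}
```

Using integer timestamps throughout mirrors the commit's stated move away from `double`, so the lookup involves no floating-point rounding.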