whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp synced 2026-04-12 01:45:30 +02:00


Daniel Bevenius ecb8f3c2b4 examples : add stereo to mono conversion in read_audio_data (#3266)	2025-06-18 17:41:43 +02:00

This commit adds a conversion from stereo to mono in the `read_audio_data` function of `common-whisper.cpp`.

The motivation for this change is that, prior to commit `7d3da68f79` ("examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that read the stereo int16 data into `pcm16` (448512 samples), converted it to mono (224256 samples), and also converted it to stereo in `pcmf32s`. The mono conversion step appears to have been missed when the code was rewritten to use miniaudio, which caused problems when transcribing stereo audio files.

For example, with the audio sample from the linked issue, the current output is:

```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```

With the change in this commit, the output is:

```console
[00:00:00.000 --> 00:00:01.500] (speaker 1) sonnerie de téléphone
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3092
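The stereo-to-mono step described in this commit can be illustrated with a small sketch. This is not the code from `common-whisper.cpp`: it assumes the decoder has already produced interleaved 32-bit float samples (left, right, left, right, ...), and `stereo_to_mono` is a hypothetical helper name. It splits the input into one buffer per channel (the role `pcmf32s` plays in the commit message) and builds a mono buffer from the average of the two channels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): splits interleaved stereo
// float samples (L R L R ...) into two per-channel buffers and a mono
// downmix where each mono sample is the average of the left and right samples.
// Returns false, leaving the outputs untouched, if the input does not
// contain a whole number of stereo frames.
bool stereo_to_mono(const std::vector<float> & interleaved,
                    std::vector<float> & mono,
                    std::vector<std::vector<float>> & stereo) {
    if (interleaved.size() % 2 != 0) {
        return false;
    }

    const size_t n_frames = interleaved.size() / 2;
    mono.resize(n_frames);
    stereo.assign(2, std::vector<float>(n_frames));

    for (size_t i = 0; i < n_frames; ++i) {
        const float left  = interleaved[2*i + 0];
        const float right = interleaved[2*i + 1];

        stereo[0][i] = left;               // channel 0 (left)
        stereo[1][i] = right;              // channel 1 (right)
        mono[i]      = 0.5f * (left + right);
    }
    return true;
}
```

Averaging rather than summing keeps the mono signal within the same [-1, 1] range as the input channels, so no separate clipping step is needed. The mono buffer has half as many samples as the interleaved input, matching the 448512 to 224256 ratio mentioned in the commit message.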
addon.node	node : add language detection support (#3190)	2025-06-02 14:58:05 +02:00
bench	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
bench.wasm	examples : add HEAPU8 to all of the exported runtime methods (#3134)	2025-05-10 06:44:13 +02:00
cli	cli : fix short name conflict for vad options [no ci] (#3247)	2025-06-13 10:25:25 +02:00
command	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
command.wasm	wasm : add note about worker.js file generation [no ci] (#3133)	2025-05-09 15:42:45 +02:00
deprecation-warning	examples : add WHISPER_SDL2 check to deprecation executables (#2911)	2025-03-20 18:36:02 +01:00
lsp	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
python	readme : remove invalid flag from Python example (#2396)	2024-08-30 14:00:38 +03:00
quantize	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
server	examples : set the C++ standard to C++17 for server (#3261)	2025-06-17 11:29:48 +02:00
stream	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
stream.wasm	wasm : add note about worker.js file generation [no ci] (#3133)	2025-05-09 15:42:45 +02:00
sycl	sycl: fix example build (#2570)	2024-11-18 14:57:23 +02:00
talk-llama	talk-llama : sync llama.cpp	2025-06-18 12:40:34 +03:00
vad-speech-segments	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
wchess	whisper : remove whisper_load_backends function (#3196)	2025-05-29 08:03:17 +02:00
whisper.android	android : fix builds (#0)	2025-06-10 12:40:33 +03:00
whisper.android.java	android : fix builds (#0)	2025-06-10 12:40:33 +03:00
whisper.nvim	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
whisper.objc	docs : update README.md for whisper.objc app (#2569)	2025-05-13 06:03:50 +02:00
whisper.swiftui	examples : clarify Core ML encoder model usage [no ci] (#2987)	2025-04-02 08:32:14 +02:00
whisper.wasm	wasm : add note about worker.js file generation [no ci] (#3133)	2025-05-09 15:42:45 +02:00
CMakeLists.txt	examples : add VAD speech segments example (#3147)	2025-05-13 12:31:00 +02:00
coi-serviceworker.js	ci : add github pages workflow for wasm examples (#2969)	2025-03-31 11:34:40 +02:00
common-ggml.cpp	common : remove old types	2024-12-18 12:52:16 +02:00
common-ggml.h	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
common-sdl.cpp	common : more general m_audio_len update logic (#2855)	2025-03-07 10:10:03 +02:00
common-sdl.h	sdl : fix audio callback (#1523)	2023-11-20 13:16:38 +02:00
common-whisper.cpp	examples : add stereo to mono conversion in read_audio_data (#3266)	2025-06-18 17:41:43 +02:00
common-whisper.h	common : separate whisper sources (#2846)	2025-02-27 12:50:32 +02:00
common.cpp	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
common.h	examples : add --print-confidence option to cli (#3150)	2025-05-14 19:21:48 +02:00
ffmpeg-transcode.cpp	examples : fix deprecated FFmpeg functions (#3073)	2025-04-28 06:16:50 +02:00
generate-karaoke.sh	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)	2025-02-27 09:06:54 +02:00
grammar-parser.cpp	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
grammar-parser.h	whisper : add grammar-based sampling (#1229)	2023-11-13 10:51:34 +02:00
helpers.js	js : remove un-needed request header from fetchRemote (#2119)	2024-05-13 15:13:19 +03:00
json.hpp	examples : clean up common code (#1871)	2024-02-19 10:50:15 +02:00
livestream.sh	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
miniaudio.h	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)	2025-02-27 09:06:54 +02:00
server.py	examples : update server.py to match github pages app [no ci] (#3004)	2025-04-04 10:23:53 +02:00
stb_vorbis.c	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)	2025-02-27 09:06:54 +02:00
twitch.sh	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
yt-wsp.sh	examples : update usage/help in yt-wsp.sh (#3251)	2025-06-16 12:21:16 +02:00