Mirror of https://github.com/ggerganov/whisper.cpp, synced 2026-04-08 07:55:37 +02:00
The current implementation in `whisper_wrap_segment()` uses `strlen()` to count bytes, not UTF-8 characters. When splitting segments at `max_len`, this can break multi-byte UTF-8 characters, resulting in invalid sequences displayed as `�` (U+FFFD replacement character).

| Name |
|---|
| .. |
| coreml |
| openvino |
| CMakeLists.txt |
| whisper-arch.h |
| whisper.cpp |
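
The byte-versus-character problem described in the commit message above can be illustrated with a minimal sketch. The helper names `utf8_is_continuation`, `utf8_safe_cut`, and `utf8_char_count` are hypothetical and are not taken from whisper.cpp; the sketch assumes the input is valid, NUL-terminated UTF-8 and only shows one possible way to avoid cutting inside a multi-byte sequence, not necessarily how `whisper_wrap_segment()` handles it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool utf8_is_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Hypothetical helper: returns the largest byte offset <= max_bytes at which
// the text can be cut without splitting a multi-byte UTF-8 character.
// Assumes `text` is valid, NUL-terminated UTF-8.
static size_t utf8_safe_cut(const char * text, size_t max_bytes) {
    assert(text != nullptr);
    const size_t len = std::strlen(text); // length in bytes, not characters
    if (max_bytes >= len) {
        return len; // nothing to cut
    }
    size_t cut = max_bytes;
    // If the byte at the cut position continues a character, step back
    // until the cut lands on the first byte of that character.
    while (cut > 0 && utf8_is_continuation((unsigned char) text[cut])) {
        --cut;
    }
    return cut;
}

// Hypothetical helper: counts UTF-8 code points by skipping continuation bytes.
static size_t utf8_char_count(const char * text) {
    assert(text != nullptr);
    size_t n = 0;
    for (const char * p = text; *p != '\0'; ++p) {
        if (!utf8_is_continuation((unsigned char) *p)) {
            ++n;
        }
    }
    return n;
}
```

For the string `"h\xC3\xA9llo"` ("héllo"), `strlen()` reports 6 bytes while the text has 5 characters. A naive cut at byte 2 would separate `0xC3` from `0xA9` and leave an invalid sequence; `utf8_safe_cut` moves the cut back to byte 1 instead.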