mirror of
https://github.com/PABannier/bark.cpp
synced 2026-03-04 14:10:54 +01:00
155 lines
5.3 KiB
Markdown
# bark.cpp

[Actions status](https://github.com/PABannier/bark.cpp/actions)
[License: MIT](https://opensource.org/licenses/MIT)

[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)

Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.

## Description

With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.

- [x] Plain C/C++ implementation without dependencies
- [x] AVX, AVX2 and AVX512 for x86 architectures
- [x] CPU and GPU compatible backends
- [x] Mixed F16 / F32 precision
- [x] 4-bit, 5-bit and 8-bit integer quantization
- [x] Metal and CUDA backends

**Models supported**

- [x] [Bark Small](https://huggingface.co/suno/bark-small)
- [x] [Bark Large](https://huggingface.co/suno/bark)

**Models we want to implement! Please open a PR :)**

- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))

Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))

---

Here is a typical run using `bark.cpp`:

```text
./main -p "This is an audio generated by bark.cpp"

    __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: 17%

bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms

Generating coarse tokens: 100%

bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms

Generating fine tokens: 100%

bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms
```
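
The timings in this log can be related to the real-time goal stated in the description. The Python sketch below parses a few lines copied from the run above and computes a real-time factor, i.e. processing time divided by the duration of the generated audio. The 24 kHz sample rate is an assumption (Bark's usual output rate; it is not printed in the log), and `real_time_factor` and `stage_shares` are hypothetical helpers, not part of `bark.cpp`. Under that assumption, this particular run takes roughly 3.2 times longer than the audio it produces.

```python
import re

# Lines copied from the run shown above.
LOG = """
bark_print_statistics: total time = 633.54 ms
bark_print_statistics: total time = 3274.00 ms
bark_print_statistics: total time = 4772.92 ms
write_wav_on_disk: Number of frames written = 65600.
main: eval time = 8806.57 ms
"""

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz output, not stated in the log


def real_time_factor(log: str, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return eval time divided by audio duration (values below 1 mean real time)."""
    frames = int(re.search(r"frames written = (\d+)", log).group(1))
    eval_ms = float(re.search(r"eval time\s*=\s*([\d.]+) ms", log).group(1))
    audio_seconds = frames / sample_rate_hz
    return (eval_ms / 1000.0) / audio_seconds


def stage_shares(log: str) -> list[float]:
    """Return each stage's share of the summed per-stage total times."""
    totals = [float(x) for x in re.findall(r"total time\s*=\s*([\d.]+) ms", log)]
    return [t / sum(totals) for t in totals]


if __name__ == "__main__":
    print(f"real-time factor: {real_time_factor(LOG):.2f}")
    for name, share in zip(("semantic", "coarse", "fine"), stage_shares(LOG)):
        print(f"{name:>8}: {share:.0%} of generation time")
```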

Here is a video of Bark running on the iPhone:

https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8

## Usage

Here are the steps to use `bark.cpp`.

### Get the code

```bash
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
```

### Build

`bark.cpp` is built with `CMake`:

```bash
mkdir build
cd build
# To enable NVIDIA GPU support, use the following command instead of the plain `cmake ..`:
# cmake -DGGML_CUBLAS=ON ..
cmake ..
cmake --build . --config Release
```

### Prepare data & Run

```bash
# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
```
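
To synthesize several prompts in a row, the inference command above can be wrapped in a small script. The following is a minimal sketch in Python, assuming the binary and model paths shown above and using only the `-m`, `-p` and `-t` flags from that command. `build_command` and `synthesize` are hypothetical helper names, not part of `bark.cpp`.

```python
import shlex
import subprocess

# Paths taken from the commands above; adjust them to your checkout.
MAIN_BINARY = "./build/examples/main/main"
MODEL_PATH = "./models/bark-small/ggml_weights.bin"


def build_command(model_path: str, prompt: str, threads: int = 4) -> list[str]:
    """Assemble the argument list for one inference run."""
    return [MAIN_BINARY, "-m", model_path, "-p", prompt, "-t", str(threads)]


def synthesize(model_path: str, prompt: str, threads: int = 4) -> None:
    """Run the CLI once; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(model_path, prompt, threads), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without a build.
    cmd = build_command(MODEL_PATH, "this is an audio generated by bark.cpp")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains quotes or other special characters.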

### (Optional) Quantize weights

Weights can be quantized with one of the following strategies: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.

To preserve audio quality, the codec model is not quantized. This costs little, since the bulk of the computation is in the forward pass of the GPT models.

```bash
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
```
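
To give an intuition for what a 4-bit strategy such as `q4_0` does, here is a simplified Python sketch of block-wise quantization. It is an illustration under stated assumptions, not the ggml implementation: it assumes blocks of 32 weights that share one scale stored as a 16-bit float, with each weight stored as a 4-bit integer offset by 8. The function names are hypothetical.

```python
BLOCK_SIZE = 32   # assumption: weights are quantized in blocks of 32
SCALE_BITS = 16   # assumption: one 16-bit float scale per block
QUANT_BITS = 4


def quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Map a block of floats to one scale and 4-bit integers in [0, 15]."""
    # The value with the largest magnitude is mapped to the integer 0 (i.e. -8).
    extreme = max(values, key=abs)
    scale = extreme / -8 if extreme != 0 else 0.0
    inverse = 1.0 / scale if scale != 0 else 0.0
    quants = [min(15, int(v * inverse + 8.5)) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: list[int]) -> list[float]:
    """Recover approximate floats from the scale and the 4-bit integers."""
    return [(q - 8) * scale for q in quants]


def bits_per_weight() -> float:
    """Storage cost per weight, including the shared scale."""
    return (BLOCK_SIZE * QUANT_BITS + SCALE_BITS) / BLOCK_SIZE


if __name__ == "__main__":
    block = [((i * 5) % 32 - 16) / 16 for i in range(BLOCK_SIZE)]
    scale, quants = quantize_block(block)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale}, worst error={worst}, bits/weight={bits_per_weight()}")
```

Under these assumptions each weight costs 4.5 bits instead of 16 for F16, at the price of a rounding error bounded by the block's scale.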

### Seminal papers

- Bark
    - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
- Encodec
    - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
- GPT-3
    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

### Contributing

`bark.cpp` is a continuous endeavour that relies on community effort to last and evolve. Your contribution is welcome and highly valuable. It can be a:

- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it in the issue section.
- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.

### Coding guidelines

- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures