ggml-cann
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-cuda
Make Q8_0 KV cache work with mla=2,fa on CUDA ( #264 )
2025-03-18 15:40:47 +01:00
ggml-sycl
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
iqk
Convert models to row-interleaved quants using the quantize tool ( #272 )
2025-03-21 07:23:36 +01:00
kompute @ 4565194ed7
Merge mainline llama.cpp ( #3 )
2024-07-27 07:55:01 +02:00
kompute-shaders
Merge mainline llama.cpp ( #3 )
2024-07-27 07:55:01 +02:00
llamafile
Merge mainline llama.cpp ( #3 )
2024-07-27 07:55:01 +02:00
vulkan-shaders
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
CMakeLists.txt
Compile time option to use bf16 for quants without MMQ kernels ( #261 )
2025-03-18 07:37:10 +01:00
ggml-aarch64.c
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-aarch64.h
Merge mainline llama.cpp ( #3 )
2024-07-27 07:55:01 +02:00
ggml-alloc.c
Give the user the option to override where model weights are stored ( #232 )
2025-02-25 17:55:58 +02:00
ggml-backend-impl.h
Merge mainline llama.cpp ( #3 )
2024-07-27 07:55:01 +02:00
ggml-backend.c
FlashMLA-2 (CPU): faster and smaller compute buffer size ( #253 )
2025-03-13 12:07:43 +02:00
ggml-blas.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-cann.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-common.h
Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications ( #194 )
2025-02-09 09:14:52 +02:00
ggml-cuda.cu
Prevent FlashMLA-1 from running on CUDA ( #268 )
2025-03-19 13:03:59 +01:00
ggml-impl.h
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-kompute.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-metal.m
Faster MoE inference ( #112 )
2024-10-31 12:05:27 +01:00
ggml-metal.metal
Faster MoE inference ( #112 )
2024-10-31 12:05:27 +01:00
ggml-quants.c
Flash MLA (CPU only) ( #240 )
2025-03-03 15:17:51 +02:00
ggml-quants.h
IQ1_M_R4: better 1.75 bpw quants ( #187 )
2025-02-06 14:08:52 +02:00
ggml-rpc.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-sycl.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml-vulkan.cpp
Merge mainline - Aug 12 2024 ( #17 )
2024-08-12 15:14:32 +02:00
ggml.c
Convert models to row-interleaved quants using the quantize tool ( #272 )
2025-03-21 07:23:36 +01:00