| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | Enable and clean up compiler warnings in src (#824) | 2025-10-11 16:01:13 +03:00 |
| llama-arch.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-arch.h | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-build-context.cpp | Slightly faster TG for split mode "graph" (#1057) | 2025-12-12 07:54:37 +01:00 |
| llama-build-context.h | Hadamard transforms for K-cache - CPU only (#1033) | 2025-12-04 06:51:11 +01:00 |
| llama-context.h | POC: CUDA tensor parallel (MoE models) (#1022) | 2025-12-01 19:25:40 +01:00 |
| llama-cparams.h | Hadamard transforms for K-cache - CPU only (#1033) | 2025-12-04 06:51:11 +01:00 |
| llama-grammar.cpp | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-grammar.h | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-hparams.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-hparams.h | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-impl.h | POC: CUDA tensor parallel (MoE models) (#1022) | 2025-12-01 19:25:40 +01:00 |
| llama-load-tensors.cpp | Be able to set a max. number of GPUs to be used in split mode graph (#1051) | 2025-12-11 07:22:53 +01:00 |
| llama-mmap.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-mmap.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-model-loader.cpp | CUDA: set compute parameters via command line arguments (#910) | 2025-11-07 07:11:23 +02:00 |
| llama-model-loader.h | Merge Q, K, V (#878) | 2025-10-30 10:49:48 +02:00 |
| llama-model.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-model.h | Be able to set a max. number of GPUs to be used in split mode graph (#1051) | 2025-12-11 07:22:53 +01:00 |
| llama-quantize.cpp | Fix requatizing from row-interleaved quants (#992) | 2025-11-20 11:50:09 +01:00 |
| llama-sampling.cpp | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-sampling.h | add dry sampler (#513) | 2025-06-19 10:24:53 +03:00 |
| llama-vocab.cpp | Update mtmd to improve accuracy of M-RoPE (#993) | 2025-11-29 07:27:15 +01:00 |
| llama-vocab.h | Update mtmd to improve accuracy of M-RoPE (#993) | 2025-11-29 07:27:15 +01:00 |
| llama.cpp | Do not use split mode graph scheduling if there are tensor overrides (#1060) | 2025-12-12 14:48:38 +01:00 |
| unicode-data.cpp | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode-data.h | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| unicode.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |