llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-04-18 21:26:07 +02:00

Michael Wand 84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)	2026-04-01 12:04:58 +02:00

* Introduced NVFP4 generic MMQ kernel
* Added extra FP8 guard, hope to solve ci HIP failure
* Rename tiles and use HIP_FP8_AVAILABLE
* Removed remaning FP8 straggler and added const int
* Const
* Removed DECL_MMQ_CASE artifact
* Removed newline
* Removed space after else
* Changed HIP FP8 NVFP4 conversion gate
* Added new line to bottom of mmq.cu 270
* Removed extra spaces
* Removed single space in front of else on line 814
* Added NVFP4 to generate cu script so HIP can see it, further tightened logic
* Include generated mmq-instance-nvfp4.cu
* Added NVFP4 mmq to HIP Check ignore list
* Update ggml/src/ggml-cuda/mmq.cuh
  Changed to Q3_K tile to read MMQ_MMA_TILE_X_K_NVFP4
* Update ggml/src/ggml-cuda/mmq.cuh
  Changed to Q3_K tile to read MMQ_MMA_TILE_X_K_NVFP4 in tile assert
* Update ggml/src/ggml-cuda/mmq.cuh
  Added function name ending for end if
* Added function names to closing endif

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
apple	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
hip	ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)	2026-04-01 12:04:58 +02:00
jinja	ci : switch from pyright to ty (#20826)	2026-03-21 08:54:34 +01:00
snapdragon	snapdragon: add missing features to WoS scripts to achieve parity with ADB scripts (#20884)	2026-03-25 09:43:12 -07:00
bench-models.sh	benches : update models + numbers (#19359)	2026-02-05 14:34:07 +02:00
build-info.sh	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
check-requirements.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
compare-commits.sh	scripts: add sqlite3 check for compare-commits.sh (#15633)	2025-08-28 19:23:22 +08:00
compare-llama-bench.py	ci : switch from pyright to ty (#20826)	2026-03-21 08:54:34 +01:00
compare-logprobs.py	scripts: update corpus of compare-logprobs (#19326)	2026-02-25 12:57:34 +01:00
create_ops_docs.py	Docs: add instructions for adding backends (#14889)	2025-07-27 09:36:43 +08:00
debug-test.sh	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
fetch_server_test_models.py	server: Add cached_tokens info to oaicompat responses (#19361)	2026-03-19 19:09:33 +01:00
gen-authors.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
gen-unicode-data.py	ci : bump ty to 0.0.26 (#21156)	2026-03-30 09:29:15 +02:00
get_chat_template.py	scripts: corrected encoding when getting chat template (#11866) (#11907)	2025-02-18 10:30:16 +01:00
get-flags.mk	build : pass all warning flags to nvcc via -Xcompiler (#5570)	2024-02-18 16:21:52 -05:00
get-hellaswag.sh	scripts : update get-hellaswag.sh and get-winogrande.sh (#20542)	2026-03-14 11:21:50 +01:00
get-pg.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
get-wikitext-2.sh	scripts : improve get-wikitext-2.sh (#19952)	2026-03-02 15:40:49 +01:00
get-winogrande.sh	scripts : update get-hellaswag.sh and get-winogrande.sh (#20542)	2026-03-14 11:21:50 +01:00
git-bisect-run.sh	llama: end-to-end tests (#19802)	2026-03-08 12:30:21 +01:00
git-bisect.sh	llama: end-to-end tests (#19802)	2026-03-08 12:30:21 +01:00
hf.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
install-oneapi.bat	support SYCL backend windows build (#5208)	2024-01-31 08:08:07 +05:30
pr2wt.sh	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
serve-static.js	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
server-bench.py	ci : switch from pyright to ty (#20826)	2026-03-21 08:54:34 +01:00
server-test-model.py	Autoparser - complete refactoring of parser architecture (#18675)	2026-03-06 21:01:00 +01:00
sync_vendor.py	vendor : update cpp-httplib to 0.40.0 (#21100)	2026-03-28 08:59:44 +01:00
sync-ggml-am.sh	scripts : update sync scripts	2025-08-18 22:06:44 +03:00
sync-ggml.last	sync : ggml	2026-03-31 14:00:41 +03:00
sync-ggml.sh	scripts : update sync scripts	2025-08-18 22:06:44 +03:00
tool_bench.py	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
tool_bench.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
verify-checksum-models.py	convert.py : add python logging instead of print() (#6511)	2024-05-03 22:36:41 +03:00
xxd.cmake	llama : move end-user examples to tools directory (#13249)	2025-05-02 20:27:13 +02:00