Latest commit:

* Add support for quantizing already quantized models
* Threaded dequantizing and f16 to f32 conversion
* Clean up thread blocks with spares calculation a bit
* Use std::runtime_error exceptions.
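The "thread blocks with spares calculation" in these bullets is a standard work split: n elements over nthread workers, a base block of n / nthread each, with the n % nthread leftover ("spare") elements handed out one apiece to the first workers. The sketch below illustrates that pattern under stated assumptions: it is not the real quantize.cpp source, `convert_f16_to_f32` and `f16_to_f32_stub` are hypothetical names, and the stub does not actually decode IEEE-754 half floats (ggml ships its own conversion routines).

```cpp
// Hypothetical sketch, not the actual quantize.cpp code: split an
// f16 -> f32 conversion over `nthread` workers so that every element
// is converted exactly once.
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>

// Placeholder only: NOT a real IEEE-754 half-precision decoder.
static float f16_to_f32_stub(uint16_t h) {
    return static_cast<float>(h);
}

static void convert_f16_to_f32(const uint16_t * src, float * dst,
                               std::size_t n, unsigned nthread) {
    if (nthread == 0) {
        // Error path uses std::runtime_error, as in the last bullet.
        throw std::runtime_error("convert_f16_to_f32: need at least one thread");
    }
    const std::size_t block  = n / nthread; // base elements per thread
    std::size_t       spares = n % nthread; // leftover elements to distribute

    std::vector<std::thread> workers;
    std::size_t begin = 0;
    for (unsigned t = 0; t < nthread; ++t) {
        // The first `spares` threads each take one extra element.
        const std::size_t count = block + (spares > 0 ? 1 : 0);
        if (spares > 0) {
            --spares;
        }
        const std::size_t first = begin;
        begin += count;
        workers.emplace_back([src, dst, first, count] {
            for (std::size_t i = first; i < first + count; ++i) {
                dst[i] = f16_to_f32_stub(src[i]);
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}

int main() {
    std::vector<uint16_t> src(10);
    for (std::size_t i = 0; i < src.size(); ++i) {
        src[i] = static_cast<uint16_t>(i);
    }
    std::vector<float> dst(src.size());
    // 10 elements over 3 threads: blocks of 4, 3, 3 (the one spare absorbed).
    convert_f16_to_f32(src.data(), dst.data(), src.size(), 3);
    return 0;
}
```

Handing the remainder out one element per thread keeps the per-thread block sizes within one element of each other, which is one plausible reading of the commit's "spares" wording.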
Files in this directory:

- CMakeLists.txt
- quantize.cpp
- README.md
# quantize
TODO
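The README body is still a stub upstream. For orientation only, and as an assumption rather than documented usage: the headline feature of this commit, requantizing an already quantized model, is gated behind the tool's `--allow-requantize` flag, so an invocation would look roughly like `./quantize --allow-requantize ggml-model-q8_0.bin ggml-model-q4_0.bin q4_0`. The model paths and the q8_0 to q4_0 choice here are illustrative placeholders, not examples from the source.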