Commit Graph

18 Commits

slaren
f3c1e6aeaa update tests and examples 2024-11-04 19:42:09 +02:00
Georgi Gerganov
c73d836bbf examples : adapt to new ggml backend interfaces
ggml-ci
2024-10-03 22:12:49 +03:00
Georgi Gerganov
6b30c17879 metal : add perf-metal tool + fix build 2024-10-01 18:08:31 +03:00
Georgi Gerganov
336c10a4c3 examples : adapt to ggml.h changes (#0)
ggml-ci
2024-09-20 22:03:57 +03:00
Salvatore Mesoraca
2438d62cb9
tests : fix memory leaks (#936)
Running the tests under the sanitizers is annoying because of
all the uninteresting reports about memory leaked by the tests
themselves.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-08-27 09:25:12 +03:00
slaren
e3b3846976
fix uses of GGML_USE_CUBLAS in tests and examples (#879)
* fix uses of GGML_USE_CUBLAS in tests and examples

* fix ci/run.sh

ggml-ci
2024-07-02 19:11:52 +02:00
Georgi Gerganov
5378ea0d3c
ggml : reorganize source code + improve CMake (#865)
* scripts : update sync [no ci]

* ggml : move headers one up [no ci]

* files : reorganize + update CMake

ggml-ci

* cmake : build normal ggml library

ggml-ci

* cmake : link math library to test + remove ci for code cov

ggml-ci

* files : move public headers to include

ggml-ci
2024-06-26 19:33:53 +03:00
slaren
7652115c79 update examples and tests 2024-03-14 18:46:58 +02:00
slaren
5070f078a6
ggml-alloc : v3 (#727)
* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci
2024-02-11 14:37:58 +02:00
Georgi Gerganov
3c32701600 tests : fix im2col usage 2024-02-10 09:45:40 +02:00
Georgi Gerganov
aea446526b examples : adapt to metal API 2024-01-14 00:09:26 +02:00
Georgi Gerganov
845d01bab3
sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec) (#662)
* sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec)

ggml-ci

* ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203)

* llama : fix platforms without mmap (#4578)

* llama : fix platforms without mmap

* win32 : limit prefetch size to the file size

* fix win32 error clobber, unnecessary std::string in std::runtime_error

* ggml-alloc : fix ggml_tallocr_is_own

* whisper : minor

* ggml : cuda jetson + arm quants warnings

ggml-ci

---------

Co-authored-by: Herman Semenov <GermanAizek@yandex.ru>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-22 17:53:50 +02:00
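The sync above pulls in `ggml_row_size`, which reports the bytes needed to store one row of a (possibly quantized) tensor type. A minimal Python sketch of the idea, using assumed Q4_0-style block parameters (32 elements packed into an 18-byte block: a 2-byte fp16 scale plus 16 bytes of 4-bit quants; the real values live in ggml's type traits):

```python
# Hedged sketch of a ggml_row_size-style calculation for a quantized type.
# BLOCK_SIZE and TYPE_SIZE are illustrative assumptions, not ggml constants.

BLOCK_SIZE = 32   # elements per quantization block (assumed, Q4_0-style)
TYPE_SIZE = 18    # bytes per block: 2-byte fp16 scale + 16 bytes of 4-bit quants

def row_size(n_elements: int) -> int:
    """Bytes needed for one row of n_elements quantized values."""
    assert n_elements % BLOCK_SIZE == 0, "row length must be a multiple of the block size"
    return (n_elements // BLOCK_SIZE) * TYPE_SIZE

print(row_size(4096))  # 4096 / 32 * 18 = 2304 bytes
```

So a 4096-wide row costs 2304 bytes instead of 16384 for f32 — this is the number callers need when laying out buffers by hand.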
Steward Garcia
5bf85a5221
ggml: new gpu kernels + extends ggml_leaky_relu + ggml_pad (#621)
* add new cuda kernels and new op ggml_pad

* add ggml_tanh cuda kernel

* remove old broadcast impl

* restore some changes

* cuda: optimized im2col + group_norm kernels

* extend ggml_leaky -> ggml_leaky_relu

* fix some code issues

* cuda: concat supports 4 dims

* cuda: fix ggml_acc + add backends ops test

* restore ggml_pad + add backend op test

* metal : implement GGML_OP_ACC

* ggml : fix bug in ggml_upscale

* metal : add ggml_upscale

* metal : add ggml_tanh

* metal : add ggml_gelu_quick

* ggml : make ggml_pad more general purpose

* metal : add ggml_pad

* ggml_leaky_relu as regular op + fix indentation

* cuda: ggml_acc accepts all op_params

* pass negative_slope as a proper param

* metal : add ggml_leaky_relu

* metal : add ggml_group_norm

* cuda : minor

* ggml : add GGML_OP_LEAKY_RELU to ggml_compute_backward

* metal : soft max, tanh, supports_op fixes

* test-backend-ops : add sentinels between tensors to detect overflows

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-13 09:08:48 -05:00
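The commit above promotes `ggml_leaky` to a regular op with a configurable `negative_slope` and makes `ggml_pad` more general purpose. A hedged, scalar Python sketch of the element-wise math these kernels compute (function names and the default slope are illustrative, not ggml's C API):

```python
# Illustrative per-element semantics of two of the ops added above.

def leaky_relu(xs, negative_slope=0.1):
    # f(x) = x for x > 0, negative_slope * x otherwise
    return [x if x > 0 else negative_slope * x for x in xs]

def pad_1d(xs, p0):
    # zero-pad the end of a row by p0 elements (1-D slice of what ggml_pad does)
    return xs + [0.0] * p0

print(leaky_relu([-2.0, 3.0]))  # [-0.2, 3.0]
print(pad_1d([1.0, 2.0], 2))    # [1.0, 2.0, 0.0, 0.0]
```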
slaren
703825ffab
ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k (#625)
* ggml : support broadcasting in dim 0 in add and mul

* add cuda add/mul broadcast impl
add configurable eps to cuda norm

* add metal impl
ggml-ci

* deduplicate code in cuda impl

* try to optimize cuda impl

* ggml : support broadcasting in ggml_div

* test-backend-ops : allow filtering by op and backend

* ggml-cuda : add ggml_div impl

* ggml : add ggml_mul_mat_id, ggml_sort, ggml_top_k (CPU only)

* fix ggml_div threads

* fix ggml_div with accelerate

* ggml_sort -> ggml_argsort

* whatever

* actually fix accelerate div

* disable opencl ci

* ci : disable ctest error check temporarily until we fix backend ops test

* cmake : propagate GGML_USE_xxx compile flags with ggml target

* whisper : utilize new ggml_add broadcast for dim 0

* cmake : addendum to ee666ae9

* ggml_backend_graph_copy : fix leak

* ggml_cuda : add ggml_sum_rows impl

* metal : add ggml_div

* metal : add ggml_sum_rows

* ggml_cuda : add ggml_argsort impl

* move kernel

* metal : add ggml_argsort

* mul_mat_id : fix missing init task

* cuda/metal: fix argsort synchronization

* metal : add ggml_mul_mat_id

* ggml-cuda : add mul_mat_id for f16 + tensor cores

* test-backend-ops : add tests for quants mat mul

* ggml : fix q5_0 and q5_1 hist stats

* test-backend-ops : use smaller matrices to avoid automatic offloading, add mat-vec tests

* metal : fix alibi to match the CPU behavior

* metal : check dimensions in supports_op

* test-backend-ops : reduce error threshold for mat muls

* ggml-cuda : simplify dequantize funs, add supports_op by type for mul_mat_id

* ggml-cuda : support quantized types in mul_mat_id with cublas

* ggml-cuda : add fallback over CPU for mul_mat_id

* test-backend-ops : increase mul mat error threshold

* cleanup
ggml-ci

* test-backend-ops : fix usage

* cleanup

* ci : re-enable tests

* metal : fix compile warnings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-05 13:56:07 +01:00
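Among the ops this commit adds, `ggml_argsort` and `ggml_top_k` were first landed CPU-only. A hedged sketch of their per-row semantics in plain Python (building top-k on top of argsort is one natural construction; whether ggml does exactly this internally is not implied):

```python
# Illustrative per-row semantics of argsort and top_k.

def argsort(xs, ascending=True):
    # indices that would sort xs
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=not ascending)

def top_k(xs, k):
    # indices of the k largest values, via a descending argsort
    return argsort(xs, ascending=False)[:k]

print(argsort([3.0, 1.0, 2.0]))   # [1, 2, 0]
print(top_k([3.0, 1.0, 2.0], 2))  # [0, 2]
```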
slaren
38f46afdf2
ggml-backend update: buffer types, backend registry, graph compare, tests (#620)
* ggml-backend update

* update metal backend

* show metal logs with ggml-backend

* move buffer types to functions

* cuda: add per-device backends

* cuda: add host buffer type

* fix metal build

* ggml_backend_alloc_ctx_tensors : ignore allocated tensors

* ggml_backend_compare_graph_backend fixes

* ci : try to fix metal build

* metal : first print device info, then build kernels

* ci : disable GGML_METAL on Github Actions

* test-backend-ops initial impl (unary and get_rows)

* more op tests

* cleanup

* print test params, add more tests cases for add and mul

* add tests for im2col

* better f16 init

* metal : add basic impl of supports_op

* add test for ggml_concat

* update im2col test params, show callstack with GGML_ASSERT on CUDA failures

* add more rope tests

* add more rope and mul_mat test cases

* add more get_rows test cases
ggml-ci

* add more norm and rms_norm test cases with different eps

* ci : fix metal resource path

ggml-ci

* tests : silence warning

* add ggml_backend_tensor_alloc and ggml_backend_view_init for initializing tensors without ggml-alloc

* add mul_mat test cases without dims 3 and 4
ggml-ci

* check for nans and infs
ggml-ci

* add diag_mask_inf test cases without dims 3 and 4
ggml-ci

* fix cuda leak during backend registration

* fix msvc issues

* remove backend_sched debug output by default

* gpt-2 : increase graph size

ggml-ci

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-30 19:03:03 +01:00
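The `test-backend-ops` tool introduced here runs each op on a backend and compares the result against a CPU reference, checking for NaNs/infs and applying a per-op error threshold. A hedged sketch of that comparison loop; normalized MSE is a plausible metric and the threshold is illustrative, not the tool's actual constant:

```python
import math

def normalized_mse(ref, out):
    # error relative to the magnitude of the reference output
    num = sum((r - o) ** 2 for r, o in zip(ref, out))
    den = sum(r * r for r in ref)
    return num / den if den > 0 else float("inf")

def outputs_match(ref, out, max_err=1e-7):
    # reject NaN/inf outright ("check for nans and infs"), then threshold the error
    if any(math.isnan(o) or math.isinf(o) for o in out):
        return False
    return normalized_mse(ref, out) <= max_err

print(outputs_match([1.0, 2.0], [1.0, 2.0]))          # True
print(outputs_match([1.0, 2.0], [float("nan"), 2.0])) # False
```

Lowering `max_err` for mat muls and raising it again when quantized types are involved mirrors the threshold tweaks visible in the commit messages above.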
slaren
aa1d26e6f3
update examples and tests to use ggml_allocr_new_measure_from_backend (#608)
* update examples and tests to use ggml_allocr_new_measure_from_backend

* update comments
2023-11-13 16:19:49 +01:00
Georgi Gerganov
537e06c953
sync : whisper.cpp (whisper full GPU, fix warnings) (#606)
* sync : whisper.cpp (whisper full GPU, fix warnings)

ggml-ci

* ci : enable CUDA / Metal

ggml-ci

* cuda : fallback to CPU for mul mat ne03 != ne13 (fix SAM + CUDA)

ggml-ci
2023-11-12 16:35:03 +02:00
Steward Garcia
ba779f117e
ggml : replace conv 1D - 2D stage_0 and stage_1 with im2col and mul_mat (#564)
* added conv2d stage 0 - 1 cuda kernels

* add im2col + refactor conv1d and conv2d

* fix params invalid index

* add conv1d and conv2d unit tests

* resolve wrong values and fix mul_mat validation

* improve tests + reduce code duplication

* add cuda kernels

* more test data

* fix ggml_op_count to 70

* add temp test - gemm != mul_mat

* tests : fix test-mul-mat matrix multiplication

* test-mul-mat match gemm == ggml_mul_mat with conv2d op

* replaced gemm by ggml_mul_mat

* ggml_mul_mat cpu backend supports fp16 src1

* ggml_mul_mat cuda backend fp16 fixed

* remove unnecessary ggml_cont and deprecated conv1d/conv2d functions

* some fixes

* explain conv1d reshapes

* ggml : fix tests on Arm + do not use BLAS for F16 data

* tests : fix FP16 handling on Arm

* ggml : avoid ggml_cont and ggml_transpose in ggml_conv_xd

* ci : switch back to release

* cuda : fix wrong pointer usage

* ggml : add metal support for im2col and f16xf16 mul mat

* ggml : im2col opts

* Update src/ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-12 15:34:04 +02:00
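The core idea of this commit — replacing the dedicated conv stage_0/stage_1 kernels with im2col followed by `ggml_mul_mat` — is that unfolding the input into kernel-sized windows turns a convolution into a plain matrix multiplication. A minimal 1-D, single-channel Python sketch of that transformation (real ggml handles channels, batches, dilation, and padding on top of this):

```python
# Hedged 1-D sketch: conv as im2col + matmul (dot product per window).

def im2col_1d(xs, kernel_size, stride=1):
    # unfold the input into overlapping kernel-sized windows (the "columns")
    n_out = (len(xs) - kernel_size) // stride + 1
    return [xs[i * stride : i * stride + kernel_size] for i in range(n_out)]

def conv1d(xs, kernel):
    # the convolution is now just a dot product of the kernel with each window,
    # i.e. a matrix multiplication of the kernel against the im2col matrix
    cols = im2col_1d(xs, len(kernel))
    return [sum(w * v for w, v in zip(kernel, window)) for window in cols]

print(im2col_1d([1.0, 2.0, 3.0, 4.0], 2))        # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, -1.0])) # [-1.0, -1.0, -1.0]
```

Routing everything through `ggml_mul_mat` is what lets the conv ops reuse the existing fp16 matmul paths on CUDA and Metal instead of maintaining separate convolution kernels.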