Commit Graph

18 Commits

slaren
f3c1e6aeaa update tests and examples 2024-11-04 19:42:09 +02:00
Georgi Gerganov
c73d836bbf examples : adapt to new ggml backend interfaces
ggml-ci
2024-10-03 22:12:49 +03:00
Georgi Gerganov
6b30c17879 metal : add perf-metal tool + fix build 2024-10-01 18:08:31 +03:00
Georgi Gerganov
336c10a4c3 examples : adapt to ggml.h changes (#0)
ggml-ci
2024-09-20 22:03:57 +03:00
Salvatore Mesoraca
2438d62cb9
tests : fix memory leaks (#936)
Running the tests under the sanitizers is annoying because of
all the uninteresting reports about memory leaked by the tests
themselves.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-08-27 09:25:12 +03:00
slaren
e3b3846976
fix uses of GGML_USE_CUBLAS in tests and examples (#879)
* fix uses of GGML_USE_CUBLAS in tests and examples

* fix ci/run.sh

ggml-ci
2024-07-02 19:11:52 +02:00
Georgi Gerganov
5378ea0d3c
ggml : reorganize source code + improve CMake (#865)
* scripts : update sync [no ci]

* ggml : move headers one up [no ci]

* files : reorganize + update CMake

ggml-ci

* cmake : build normal ggml library

ggml-ci

* cmake : link math library to test + remove ci for code cov

ggml-ci

* files : move public headers to include

ggml-ci
2024-06-26 19:33:53 +03:00
slaren
7652115c79 update examples and tests 2024-03-14 18:46:58 +02:00
slaren
5070f078a6
ggml-alloc : v3 (#727)
* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci
2024-02-11 14:37:58 +02:00
Georgi Gerganov
3c32701600 tests : fix im2col usage 2024-02-10 09:45:40 +02:00
Georgi Gerganov
aea446526b examples : adapt to metal API 2024-01-14 00:09:26 +02:00
Georgi Gerganov
845d01bab3
sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec) (#662)
* sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec)

ggml-ci

* ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203)

* llama : fix platforms without mmap (#4578)

* llama : fix platforms without mmap

* win32 : limit prefetch size to the file size

* fix win32 error clobber, unnecessary std::string in std::runtime_error

* ggml-alloc : fix ggml_tallocr_is_own

* whisper : minor

* ggml : cuda jetson + arm quants warnings

ggml-ci

---------

Co-authored-by: Herman Semenov <GermanAizek@yandex.ru>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-22 17:53:50 +02:00
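The sync above pulls in `ggml_row_size`, which reports the bytes needed to store one row of a (possibly quantized) tensor type. A minimal Python sketch of the idea, using assumed Q4_0-style block parameters (32 elements packed into an 18-byte block: a 2-byte fp16 scale plus 16 bytes of 4-bit quants; the real values live in ggml's type traits):

```python
# Hedged sketch of a ggml_row_size-style calculation for a quantized type.
# BLOCK_SIZE and TYPE_SIZE are illustrative assumptions, not ggml constants.

BLOCK_SIZE = 32   # elements per quantization block (assumed, Q4_0-style)
TYPE_SIZE = 18    # bytes per block: 2-byte fp16 scale + 16 bytes of 4-bit quants

def row_size(n_elements: int) -> int:
    """Bytes needed for one row of n_elements quantized values."""
    assert n_elements % BLOCK_SIZE == 0, "row length must be a multiple of the block size"
    return (n_elements // BLOCK_SIZE) * TYPE_SIZE

print(row_size(4096))  # 4096 / 32 * 18 = 2304 bytes
```

So a 4096-wide row costs 2304 bytes instead of 16384 for f32 — this is the number callers need when laying out buffers by hand.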
Steward Garcia
5bf85a5221
ggml: new gpu kernels + extends ggml_leaky_relu + ggml_pad (#621)
* add new cuda kernels and new op ggml_pad

* add ggml_tanh cuda kernel

* remove old broadcast impl

* restore some changes

* cuda: optimized im2col + group_norm kernels

* extend ggml_leaky -> ggml_leaky_relu

* fix some code issues

* cuda: concat supports 4 dims

* cuda: fix ggml_acc + add backends ops test

* restore ggml_pad + add backend op test

* metal : implement GGML_OP_ACC

* ggml : fix bug in ggml_upscale

* metal : add ggml_upscale

* metal : add ggml_tanh

* metal : add ggml_gelu_quick

* ggml : make ggml_pad more general purpose

* metal : add ggml_pad

* ggml_leaky_relu as regular op + fix indentation

* cuda: ggml_acc accepts all op_params

* pass negative_slope as a proper param

* metal : add ggml_leaky_relu

* metal : add ggml_group_norm

* cuda : minor

* ggml : add GGML_OP_LEAKY_RELU to ggml_compute_backward

* metal : soft max, tanh, supports_op fixes

* test-backend-ops : add sentinels between tensors to detect overflows

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-13 09:08:48 -05:00
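The commit above promotes `ggml_leaky` to a regular op with a configurable `negative_slope` and makes `ggml_pad` more general purpose. A hedged, scalar Python sketch of the element-wise math these kernels compute (function names and the default slope are illustrative, not ggml's C API):

```python
# Illustrative per-element semantics of two of the ops added above.

def leaky_relu(xs, negative_slope=0.1):
    # f(x) = x for x > 0, negative_slope * x otherwise
    return [x if x > 0 else negative_slope * x for x in xs]

def pad_1d(xs, p0):
    # zero-pad the end of a row by p0 elements (1-D slice of what ggml_pad does)
    return xs + [0.0] * p0

print(leaky_relu([-2.0, 3.0]))  # [-0.2, 3.0]
print(pad_1d([1.0, 2.0], 2))    # [1.0, 2.0, 0.0, 0.0]
```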
slaren
703825ffab
ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k (#625)
* ggml : support broadcasting in dim 0 in add and mul

* add cuda add/mul broadcast impl
add configurable eps to cuda norm

* add metal impl
ggml-ci

* deduplicate code in cuda impl

* try to optimize cuda impl

* ggml : support broadcasting in ggml_div

* test-backend-ops : allow filtering by op and backend

* ggml-cuda : add ggml_div impl

* ggml : add ggml_mul_mat_id, ggml_sort, ggml_top_k (CPU only)

* fix ggml_div threads

* fix ggml_div with accelerate

* ggml_sort -> ggml_argsort

* whatever

* actually fix accelerate div

* disable opencl ci

* ci : disable ctest error check temporarily until we fix backend ops test

* cmake : propagate GGML_USE_xxx compile flags with ggml target

* whisper : utilize new ggml_add broadcast for dim 0

* cmake : addendum to ee666ae9

* ggml_backend_graph_copy : fix leak

* ggml_cuda : add ggml_sum_rows impl

* metal : add ggml_div

* metal : add ggml_sum_rows

* ggml_cuda : add ggml_argsort impl

* move kernel

* metal : add ggml_argsort

* mul_mat_id : fix missing init task

* cuda/metal: fix argsort synchronization

* metal : add ggml_mul_mat_id

* ggml-cuda : add mul_mat_id for f16 + tensor cores

* test-backend-ops : add tests for quants mat mul

* ggml : fix q5_0 and q5_1 hist stats

* test-backend-ops : use smaller matrices to avoid automatic offloading, add mat-vec tests

* metal : fix alibi to match the CPU behavior

* metal : check dimensions in supports_op

* test-backend-ops : reduce error threshold for mat muls

* ggml-cuda : simplify dequantize funs, add supports_op by type for mul_mat_id

* ggml-cuda : support quantized types in mul_mat_id with cublas

* ggml-cuda : add fallback over CPU for mul_mat_id

* test-backend-ops : increase mul mat error threshold

* cleanup
ggml-ci

* test-backend-ops : fix usage

* cleanup

* ci : re-enable tests

* metal : fix compile warnings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-05 13:56:07 +01:00
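Among the ops this commit adds, `ggml_argsort` and `ggml_top_k` were first landed CPU-only. A hedged sketch of their per-row semantics in plain Python (building top-k on top of argsort is one natural construction; whether ggml does exactly this internally is not implied):

```python
# Illustrative per-row semantics of argsort and top_k.

def argsort(xs, ascending=True):
    # indices that would sort xs
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=not ascending)

def top_k(xs, k):
    # indices of the k largest values, via a descending argsort
    return argsort(xs, ascending=False)[:k]

print(argsort([3.0, 1.0, 2.0]))   # [1, 2, 0]
print(top_k([3.0, 1.0, 2.0], 2))  # [0, 2]
```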
slaren
38f46afdf2
ggml-backend update: buffer types, backend registry, graph compare, tests (#620)
* ggml-backend update

* update metal backend

* show metal logs with ggml-backend

* move buffer types to functions

* cuda: add per-device backends

* cuda: add host buffer type

* fix metal build

* ggml_backend_alloc_ctx_tensors : ignore allocated tensors

* ggml_backend_compare_graph_backend fixes

* ci : try to fix metal build

* metal : first print device info, then build kernels

* ci : disable GGML_METAL on Github Actions

* test-backend-ops initial impl (unary and get_rows)

* more op tests

* cleanup

* print test params, add more tests cases for add and mul

* add tests for im2col

* better f16 init

* metal : add basic impl of supports_op

* add test for ggml_concat

* update im2col test params, show callstack with GGML_ASSERT on CUDA failures

* add more rope tests

* add more rope and mul_mat test cases

* add more get_rows test cases
ggml-ci

* add more norm and rms_norm test cases with different eps

* ci : fix metal resource path

ggml-ci

* tests : silence warning

* add ggml_backend_tensor_alloc and ggml_backend_view_init for initializing tensors without ggml-alloc

* add mul_mat test cases without dims 3 and 4
ggml-ci

* check for nans and infs
ggml-ci

* add diag_mask_inf test cases without dims 3 and 4
ggml-ci

* fix cuda leak during backend registration

* fix msvc issues

* remove backend_sched debug output by default

* gpt-2 : increase graph size

ggml-ci

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-30 19:03:03 +01:00
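The `test-backend-ops` tool introduced here runs each op on a backend and compares the result against a CPU reference, checking for NaNs/infs and applying a per-op error threshold. A hedged sketch of that comparison loop; normalized MSE is a plausible metric and the threshold is illustrative, not the tool's actual constant:

```python
import math

def normalized_mse(ref, out):
    # error relative to the magnitude of the reference output
    num = sum((r - o) ** 2 for r, o in zip(ref, out))
    den = sum(r * r for r in ref)
    return num / den if den > 0 else float("inf")

def outputs_match(ref, out, max_err=1e-7):
    # reject NaN/inf outright ("check for nans and infs"), then threshold the error
    if any(math.isnan(o) or math.isinf(o) for o in out):
        return False
    return normalized_mse(ref, out) <= max_err

print(outputs_match([1.0, 2.0], [1.0, 2.0]))          # True
print(outputs_match([1.0, 2.0], [float("nan"), 2.0])) # False
```

Lowering `max_err` for mat muls and raising it again when quantized types are involved mirrors the threshold tweaks visible in the commit messages above.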
slaren
aa1d26e6f3
update examples and tests to use ggml_allocr_new_measure_from_backend (#608)
* update examples and tests to use ggml_allocr_new_measure_from_backend

* update comments
2023-11-13 16:19:49 +01:00
Georgi Gerganov
537e06c953
sync : whisper.cpp (whisper full GPU, fix warnings) (#606)
* sync : whisper.cpp (whisper full GPU, fix warnings)

ggml-ci

* ci : enable CUDA / Metal

ggml-ci

* cuda : fallback to CPU for mul mat ne03 != ne13 (fix SAM + CUDA)

ggml-ci
2023-11-12 16:35:03 +02:00
Steward Garcia
ba779f117e
ggml : replace conv 1D - 2D stage_0 and stage_1 with im2col and mul_mat (#564)
* added conv2d stage 0 - 1 cuda kernels

* add im2col + refactor conv1d and conv2d

* fix params invalid index

* add conv1d and conv2d unit tests

* resolve wrong values and fix mul_mat validation

* improve tests + reduce code duplication

* add cuda kernels

* more test data

* fix ggml_op_count to 70

* add temp test - gemm != mul_mat

* tests : fix test-mul-mat matrix multiplication

* test-mul-mat match gemm == ggml_mul_mat with conv2d op

* replaced gemm by ggml_mul_mat

* ggml_mul_mat cpu backend supports fp16 src1

* ggml_mul_mat cuda backend fp16 fixed

* remove unnecessary ggml_cont and deprecated conv1d/conv2d functions

* some fixes

* explain conv1d reshapes

* ggml : fix tests on Arm + do not use BLAS for F16 data

* tests : fix FP16 handling on Arm

* ggml : avoid ggml_cont and ggml_transpose in ggml_conv_xd

* ci : switch back to release

* cuda : fix wrong pointer usage

* ggml : add metal support for im2col and f16xf16 mul mat

* ggml : im2col opts

* Update src/ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-12 15:34:04 +02:00
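The core idea of this commit — replacing the dedicated conv stage_0/stage_1 kernels with im2col followed by `ggml_mul_mat` — is that unfolding the input into kernel-sized windows turns a convolution into a plain matrix multiplication. A minimal 1-D, single-channel Python sketch of that transformation (real ggml handles channels, batches, dilation, and padding on top of this):

```python
# Hedged 1-D sketch: conv as im2col + matmul (dot product per window).

def im2col_1d(xs, kernel_size, stride=1):
    # unfold the input into overlapping kernel-sized windows (the "columns")
    n_out = (len(xs) - kernel_size) // stride + 1
    return [xs[i * stride : i * stride + kernel_size] for i in range(n_out)]

def conv1d(xs, kernel):
    # the convolution is now just a dot product of the kernel with each window,
    # i.e. a matrix multiplication of the kernel against the im2col matrix
    cols = im2col_1d(xs, len(kernel))
    return [sum(w * v for w, v in zip(kernel, window)) for window in cols]

print(im2col_1d([1.0, 2.0, 3.0, 4.0], 2))        # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, -1.0])) # [-1.0, -1.0, -1.0]
```

Routing everything through `ggml_mul_mat` is what lets the conv ops reuse the existing fp16 matmul paths on CUDA and Metal instead of maintaining separate convolution kernels.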