✨ #67 - Feature Request: Eliminate/reduce unnecessary copies
| Author | ikawrakow |
|---|---|
| State | ✅ Open |
| Created | 2024-09-28 |
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
PR #66 does this for Phi-3(.5)-mini, with a non-negligible performance gain on GPUs. Other architectures that could potentially benefit from the same optimization are Falcon, DBRX, Starcoder, Bert, Bloom, MPT, Qwen, Phi-2, GPT-2, Codeshell, OpenLM, GPT-NeoX, and ChatGLM.
Motivation
Improve performance
Possible Implementation
See #66