Local LLM Server with NPU Acceleration
Overview
What is Lemonade?
Lemonade is a Local LLM Server designed to serve, benchmark, and deploy large language models (LLMs) with NPU acceleration.
How to use Lemonade?
To use Lemonade, follow the installation instructions provided in the README and utilize the Lemonade CLI to mix-and-match LLMs and run experiments.
Key features of Lemonade?
- 🌐 Lemonade Server: Integrates with local LLMs using the standard Open AI API.
- 🐍 Lemonade Python API: High-Level and Low-Level APIs for easy integration and custom experiments.
- 🖥️ Lemonade CLI: Tools for prompting, measuring accuracy, benchmarking, and profiling memory usage of LLMs.
Use cases of Lemonade?
- Serving LLMs on various hardware platforms (CPU, GPU, NPU).
- Benchmarking LLM performance for research and development.
- Custom experiments with different LLM frameworks.
FAQ from Lemonade?
- Can Lemonade be used on different operating systems?
Yes! Lemonade supports both Windows and Linux.
- Is Lemonade open-source?
Yes! Lemonade is licensed under the Apache 2.0 License.
- How can I contribute to Lemonade?
You can contribute by following the guidelines in the contribution guide.