# llama-delphi

**Repository Path**: unknowall/llama-delphi

## Basic Information

- **Project Name**: llama-delphi
- **Description**: llama-delphi
- **Primary Language**: Pascal
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-13
- **Last Updated**: 2025-04-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: AI, pascal, llama

## README

# llama-delphi

Based on the official Embarcadero demo: https://github.com/Embarcadero/llama-cpp-delphi

Supports Chinese LLaMA and Qwen large language models, so that ERP, CRM, industrial-control, logistics, and similar server-side applications can natively integrate an LLM and serve it directly. (A sketch of the underlying C-API binding technique appears after the sample output below.)

Build requirement: Delphi 12 Version 29.0.55362.2017 (Delphi 12.3)

Memory usage when loading a model:

| Model | Context length | Memory usage |
|-------|----------------|--------------|
| DeepSeek-R1-Distill-Llama-8B-Q2_K.gguf | 4096 | 1350 MB |
| llama-2-7b-chat.Q3_K_S.gguf | 4096 | 4182 MB |

Sample output from the llmaconsole.dpr example, running on an AMD Ryzen 3550H (the prompt "你是?" asks "Who are you?"; the model answers in Chinese, introducing itself as DeepSeek-R1-Lite from DeepSeek):

```
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU_Mapped model buffer size = 3024.38 MiB
...................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f32): 512.00 MiB, V (f32): 512.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 296.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
>>你是?
你好,我是由中国的深度求索(DeepSeek)公司独立开发的智能助手DeepSeek-R1-Lite。我们不止一千个模型,有着更好的性能和更贴心的服务。有什么我可以帮助你的吗?
llama_perf_context_print: load time = 7561.90 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 26 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 57 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 22010.24 ms / 83 tokens
```
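The "KV self size" lines in the log follow from simple arithmetic, which helps when sizing server deployments. Below is a minimal sketch, assuming the Llama-3.1-8B geometry that DeepSeek-R1-Distill-Llama-8B inherits (32 layers, 8 KV heads, head dimension 128; these figures come from the model architecture, not from this repo) and the f32 cache the log reports:

```pascal
program KvCacheSize;

{$APPTYPE CONSOLE}

// Reproduces the "KV self size = 1024.00 MiB" line from the log above.
const
  NCtx     = 4096; // n_ctx from the log
  NLayers  = 32;   // transformer layers in an 8B Llama-3.1-class model
  NHeadKv  = 8;    // grouped-query KV heads
  HeadDim  = 128;  // dimension per attention head
  BytesF32 = 4;    // the log shows an f32 cache ("K (f32)", "V (f32)")

var
  PerTensorMiB: Double;

begin
  // K and V each hold n_ctx * n_layers * (n_head_kv * head_dim) values.
  PerTensorMiB := NCtx * NLayers * (NHeadKv * HeadDim) * BytesF32 / (1024 * 1024);
  Writeln('K buffer: ', PerTensorMiB:0:2, ' MiB');     // 512.00
  Writeln('V buffer: ', PerTensorMiB:0:2, ' MiB');     // 512.00
  Writeln('KV total: ', 2 * PerTensorMiB:0:2, ' MiB'); // 1024.00, as logged
end.
```

Halving BytesF32 to 2 (an f16 cache) halves the buffer, and doubling n_ctx doubles it, which is why context length dominates memory use beyond the model weights themselves.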
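Bindings like llama-cpp-delphi ultimately call the llama.cpp C API through a loaded shared library. The sketch below illustrates that technique with three small entry points (llama_backend_init, llama_print_system_info, llama_backend_free) whose signatures have been stable across llama.cpp releases; it is not this repo's wrapper API, and the DLL name and path are assumptions. See llmaconsole.dpr for real model loading and generation.

```pascal
program LlamaSysInfo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Winapi.Windows;

var
  Lib: HMODULE;
  BackendInit, BackendFree: procedure; cdecl;
  PrintSystemInfo: function: PAnsiChar; cdecl;

begin
  // The path is an assumption; point it at the llama.cpp DLL your build produces.
  Lib := LoadLibrary('llama.dll');
  if Lib = 0 then
    raise Exception.Create('llama.dll not found');
  try
    // Resolve exports by name; verify them against the llama.h you built.
    @BackendInit     := GetProcAddress(Lib, 'llama_backend_init');
    @BackendFree     := GetProcAddress(Lib, 'llama_backend_free');
    @PrintSystemInfo := GetProcAddress(Lib, 'llama_print_system_info');
    if not (Assigned(BackendInit) and Assigned(BackendFree) and
            Assigned(PrintSystemInfo)) then
      raise Exception.Create('missing llama.cpp exports');

    BackendInit;                                    // one-time runtime init
    Writeln(string(AnsiString(PrintSystemInfo()))); // CPU features, BLAS, etc.
    BackendFree;
  finally
    FreeLibrary(Lib);
  end;
end.
```

Model loading and token generation go through the same mechanism with richer parameter structs; rather than redeclaring those by hand, use the typed imports that llama-cpp-delphi already provides.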