# llama-delphi

**Repository Path**: unknowall/llama-delphi

## Basic Information

- **Project Name**: llama-delphi
- **Description**: llama-delphi
- **Primary Language**: Pascal
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-13
- **Last Updated**: 2025-04-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: AI, pascal, llama

## README

# llama-delphi

Based on the official Embarcadero demo: https://github.com/Embarcadero/llama-cpp-delphi

Supports Chinese LLaMA and Qwen large language models, so that ERP, CRM, industrial-control, logistics, and similar server-side applications can natively integrate an LLM and serve it directly. (A sketch of the underlying C-API binding technique appears after the sample output below.)

Build requirement: Delphi 12 Version 29.0.55362.2017 (Delphi 12.3)

Memory usage when loading a model:

| Model | Context length | Memory usage |
|-------|----------------|--------------|
| DeepSeek-R1-Distill-Llama-8B-Q2_K.gguf | 4096 | 1350 MB |
| llama-2-7b-chat.Q3_K_S.gguf | 4096 | 4182 MB |

Sample output from the llmaconsole.dpr example, running on an AMD Ryzen 3550H (the prompt "你是?" asks "Who are you?"; the model answers in Chinese, introducing itself as DeepSeek-R1-Lite from DeepSeek):

```
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU_Mapped model buffer size = 3024.38 MiB
...................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f32): 512.00 MiB, V (f32): 512.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 296.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
>>你是?
你好,我是由中国的深度求索(DeepSeek)公司独立开发的智能助手DeepSeek-R1-Lite。我们不止一千个模型,有着更好的性能和更贴心的服务。有什么我可以帮助你的吗?
llama_perf_context_print: load time = 7561.90 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 26 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 57 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 22010.24 ms / 83 tokens
```
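The "KV self size" lines in the log follow from simple arithmetic, which helps when sizing server deployments. Below is a minimal sketch, assuming the Llama-3.1-8B geometry that DeepSeek-R1-Distill-Llama-8B inherits (32 layers, 8 KV heads, head dimension 128; these figures come from the model architecture, not from this repo) and the f32 cache the log reports:

```pascal
program KvCacheSize;

{$APPTYPE CONSOLE}

// Reproduces the "KV self size = 1024.00 MiB" line from the log above.
const
  NCtx     = 4096; // n_ctx from the log
  NLayers  = 32;   // transformer layers in an 8B Llama-3.1-class model
  NHeadKv  = 8;    // grouped-query KV heads
  HeadDim  = 128;  // dimension per attention head
  BytesF32 = 4;    // the log shows an f32 cache ("K (f32)", "V (f32)")

var
  PerTensorMiB: Double;

begin
  // K and V each hold n_ctx * n_layers * (n_head_kv * head_dim) values.
  PerTensorMiB := NCtx * NLayers * (NHeadKv * HeadDim) * BytesF32 / (1024 * 1024);
  Writeln('K buffer: ', PerTensorMiB:0:2, ' MiB');     // 512.00
  Writeln('V buffer: ', PerTensorMiB:0:2, ' MiB');     // 512.00
  Writeln('KV total: ', 2 * PerTensorMiB:0:2, ' MiB'); // 1024.00, as logged
end.
```

Halving BytesF32 to 2 (an f16 cache) halves the buffer, and doubling n_ctx doubles it, which is why context length dominates memory use beyond the model weights themselves.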
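Bindings like llama-cpp-delphi ultimately call the llama.cpp C API through a loaded shared library. The sketch below illustrates that technique with three small entry points (llama_backend_init, llama_print_system_info, llama_backend_free) whose signatures have been stable across llama.cpp releases; it is not this repo's wrapper API, and the DLL name and path are assumptions. See llmaconsole.dpr for real model loading and generation.

```pascal
program LlamaSysInfo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Winapi.Windows;

var
  Lib: HMODULE;
  BackendInit, BackendFree: procedure; cdecl;
  PrintSystemInfo: function: PAnsiChar; cdecl;

begin
  // The path is an assumption; point it at the llama.cpp DLL your build produces.
  Lib := LoadLibrary('llama.dll');
  if Lib = 0 then
    raise Exception.Create('llama.dll not found');
  try
    // Resolve exports by name; verify them against the llama.h you built.
    @BackendInit     := GetProcAddress(Lib, 'llama_backend_init');
    @BackendFree     := GetProcAddress(Lib, 'llama_backend_free');
    @PrintSystemInfo := GetProcAddress(Lib, 'llama_print_system_info');
    if not (Assigned(BackendInit) and Assigned(BackendFree) and
            Assigned(PrintSystemInfo)) then
      raise Exception.Create('missing llama.cpp exports');

    BackendInit;                                    // one-time runtime init
    Writeln(string(AnsiString(PrintSystemInfo()))); // CPU features, BLAS, etc.
    BackendFree;
  finally
    FreeLibrary(Lib);
  end;
end.
```

Model loading and token generation go through the same mechanism with richer parameter structs; rather than redeclaring those by hand, use the typed imports that llama-cpp-delphi already provides.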