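1. About LocalAI
LocalAI is a REST API for local inference that is compatible with the OpenAI API specification.
It lets you run LLMs (and more) locally on consumer-grade hardware, and it supports multiple model families compatible with the ggml format. It runs on both CPU and GPU hardware.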
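Because LocalAI follows the OpenAI API specification, a plain HTTP client is enough to call it. Below is a minimal standard-library Python sketch. It assumes LocalAI listens on http://localhost:8080 (the address used later in this post) and that the model was registered as qwen1.5-32b-chat. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable at this address, as in the curl examples in this post.
BASE_URL = "http://localhost:8080"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style POST /v1/chat/completions request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send a non-streaming request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

For example, `chat("qwen1.5-32b-chat", "Tourist attractions in Beijing")` sends the same kind of question as the curl test in section 2, but without streaming.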
https://www.bilibili.com/video/BV141421o7Lh/
[LocalAI] (6): Deploying the GPU version of LocalAI on AutoDL with an RTX 4090, successfully running the qwen-1.5-32b model with 18 GB of VRAM used at 84 t/s
2. About the Qwen1.5-32B model
Deployment project repository:
https://gitee.com/fly-llm/localai-run-llm
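```bash
# The file is large, so download it first (into LocalAI's models directory), then register the model.
# -O saves it under the file name the config expects (model: qwen1_5-32b-chat-q4_0.gguf);
# without it, wget would name the file after the URL's query string.
wget -O qwen1_5-32b-chat-q4_0.gguf "https://modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
    "url": "https://gitee.com/fly-llm/localai-run-llm/raw/master/model-gallery/qwen1.5-32b.yaml",
    "name": "qwen1.5-32b-chat"
  }'
```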
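The file is large and is downloaded separately, so it is worth checking it against the sha256 listed in the `files:` entry of the configuration file in section 3 before you register it. Below is a minimal standard-library sketch. The function names and the 1 MiB chunk size are illustrative choices, not part of LocalAI.

```python
import hashlib
from pathlib import Path

# sha256 published in the model's gallery YAML (the files: entry).
EXPECTED_SHA256 = "0688760683b9ca390070d62d06bdba06593d200cf07456478e4baeb66655c64b"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte GGUF never has to fit in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify("qwen1_5-32b-chat-q4_0.gguf")` should return `True` for a complete, uncorrupted download.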
Test the API:
```bash
curl -X 'POST' 'http://0.0.0.0:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen1.5-32b-chat",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Tourist attractions in Beijing" }
    ]
  }'
```
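With `"stream": true`, an OpenAI-compatible server returns server-sent events. Each event is a `data: {...}` line whose `choices[].delta.content` holds the next text fragment, and the stream normally ends with `data: [DONE]`. Below is a minimal Python sketch that reassembles the reply from those lines. It assumes LocalAI follows this OpenAI streaming format, and the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines.

    Event lines look like 'data: {...json...}'. Blank and non-data lines are
    skipped, and iteration stops at 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk may carry only the role; content can be missing or null.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With `urllib`, you can pass `(b.decode("utf-8") for b in resp)` for a streaming response `resp` and join the fragments to get the full answer.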
3. Configuration file
```yaml
# https://github.com/mudler/LocalAI/issues/1110
# Model name.
# The model name is used to identify the model in the API calls.
name: "qwen-1.5-32b"
description: |
  qwen-1.5-32b
license: "Apache 2.0"
urls:
- https://github.com/QwenLM/Qwen1.5
- https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GGUF/summary
config_file: |
  backend: llama
  parameters:
    model: qwen1_5-32b-chat-q4_0.gguf
    top_k: 80
    temperature: 1
    top_p: 0.7
  context_size: 1024
  template:
    completion: qwen-1.5-completion
    chat: qwen-1.5-chat
    chat_message: qwen-1.5-chat-message
files:
- filename: "qwen1_5-32b-chat-q4_0.gguf"
  sha256: "0688760683b9ca390070d62d06bdba06593d200cf07456478e4baeb66655c64b"
  uri: "https://www.modelscope.cn/api/v1/models/qwen/Qwen1.5-32B-Chat-GGUF/repo?Revision=master&FilePath=qwen1_5-32b-chat-q4_0.gguf"
prompt_templates:
- name: "qwen-1.5-completion"
  content: |
    {{.Input}}
- name: "qwen-1.5-chat"
  content: |
    {{.Input}}
    <|im_start|>assistant
- name: "qwen-1.5-chat-message"
  content: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
```
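The prompt templates work together. The chat message template formats each message, LocalAI joins the formatted messages into `.Input`, and the chat template then opens the assistant's turn. The Python sketch below imitates that flow outside LocalAI so you can see the resulting prompt. It assumes the ChatML markers `<|im_start|>`/`<|im_end|>` used by Qwen1.5 chat models and assumes messages are joined with a newline. It only illustrates the flow and is not LocalAI's Go template engine. The function names are hypothetical.

```python
from typing import Dict, List

# Assumption: Qwen1.5 chat models use ChatML-style turn markers.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_message(role: str, content: str) -> str:
    """Mirror the chat message template: role header, content, end marker."""
    # The template only emits a role name for assistant/system/user.
    name = role if role in ("assistant", "system", "user") else ""
    return f"{IM_START}{name}\n{content}\n{IM_END}"


def render_chat(messages: List[Dict[str, str]]) -> str:
    """Mirror the chat template: the joined messages (.Input) plus an open assistant turn."""
    joined = "\n".join(render_message(m["role"], m.get("content", "")) for m in messages)
    return f"{joined}\n{IM_START}assistant\n"
```

The model then generates text after the final `<|im_start|>assistant` line. This is why the chat template ends with that header rather than with a complete message.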
Once the model is configured successfully, you can start using it.
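Before you send chat requests, it can help to confirm that the model is registered under the name passed to `/models/apply`. As an OpenAI-compatible server, LocalAI lists its models at `GET /v1/models`. Below is a small standard-library sketch. The base URL is assumed from the commands above, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8080"  # assumption: same address as the curl examples


def model_ids(listing: Dict[str, Any]) -> List[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response body."""
    return [entry["id"] for entry in listing.get("data", [])]


def is_registered(name: str, base_url: str = BASE_URL) -> bool:
    """Fetch the server's model list and check whether `name` is present."""
    with urllib.request.urlopen(base_url + "/v1/models") as resp:
        return name in model_ids(json.load(resp))
```

Usage: `is_registered("qwen1.5-32b-chat")` should return `True` once the apply step has finished.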
Of the 24 GB of VRAM, about 18 GB is used, and the speed is good (about 84 tokens/s, as noted in the video title).
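You can sanity-check the 18 GB figure with back-of-envelope arithmetic. In ggml's q4_0 format, each block of 32 weights is stored as 16 bytes of 4-bit values plus one 2-byte fp16 scale. That is 18 bytes per 32 weights, or 4.5 bits per weight. The sketch below assumes the nominal 32 billion parameters implied by the model name and assumes every weight is stored as q4_0. Real GGUF files mix in some other tensor types, and VRAM use also includes the KV cache and runtime buffers, so treat the result as a rough estimate.

```python
# q4_0 layout in ggml: 32 weights per block, stored as 4-bit values (16 bytes)
# plus one fp16 scale (2 bytes), i.e. 18 bytes per block = 4.5 bits per weight.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 2 + Q4_0_BLOCK_WEIGHTS // 2


def q4_0_weight_bytes(n_params: int) -> int:
    """Approximate bytes needed to store n_params weights as q4_0."""
    n_blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division: partial blocks still cost a full block
    return n_blocks * Q4_0_BLOCK_BYTES


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Generation time at a steady decoding speed."""
    return tokens / tokens_per_second
```

`q4_0_weight_bytes(32_000_000_000)` gives 18,000,000,000 bytes, about 18 GB, which is in line with the observed usage. At 84 tokens/s, `seconds_for(500, 84)` gives roughly 6 seconds for a 500-token answer.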
4. Model page
https://www.modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GGUF/summary