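Most common position in a pipeline	After a ChatPromptBuilder
Mandatory init variables	"model": The name of the model to use for chat completion. This depends on the inference provider used by the Llama Stack server.
Mandatory run variables	"messages": A list of `ChatMessage` objects representing the chat
Output variables	"replies": A list of alternative replies from the model to the input chat
API reference	Llama Stack
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/llama_stack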

Overview

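Llama Stack provides building blocks and a unified API to simplify the development of AI applications across different environments.

The LlamaStackChatGenerator gives you access to any LLM exposed by an inference provider hosted on a Llama Stack server. It abstracts away the details of the underlying provider, so you can reuse the same client code regardless of the inference backend. For the list of supported providers and configuration options, see the Llama Stack documentation.

This component uses the same ChatMessage format as the other Haystack chat generators for structured input and output. For more information, see the ChatMessage documentation.

It is also fully compatible with Haystack Tools and Toolsets, which enables function calling with models that support it.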
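
As a rough illustration of the function-calling support mentioned above, the sketch below defines a tool function and its JSON schema, then wraps them in a Haystack `Tool`. The `get_weather` function, its canned reply, and the `build_weather_tool` helper are hypothetical names made up for this example. The `Tool` import happens lazily inside the helper so the plain Python parts can be tried without Haystack installed.

```python
def get_weather(city: str) -> str:
    """Hypothetical tool function; returns a canned answer for illustration."""
    return f"The weather in {city} is sunny."


# JSON schema describing the arguments the model may pass to get_weather.
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {"city": {"type": "string", "description": "City name"}},
    "required": ["city"],
}


def build_weather_tool():
    """Wrap get_weather in a Haystack Tool (hypothetical helper)."""
    # Imported lazily so the rest of the sketch runs without Haystack.
    from haystack.tools import Tool

    return Tool(
        name="get_weather",
        description="Get the current weather for a city.",
        parameters=WEATHER_PARAMETERS,
        function=get_weather,
    )
```

Assuming the generator accepts a `tools` argument like the other Haystack chat generators, the result could be passed as `tools=[build_weather_tool()]` when creating the LlamaStackChatGenerator. Whether the model actually issues a tool call depends on the model and the inference provider.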

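Initialization

To use this integration, you need:

A running instance of a Llama Stack server (local or remote)
A valid model name supported by the inference provider you chose

Then initialize the LlamaStackChatGenerator by specifying the model name or ID. The value depends on the inference provider running on your server.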

Examples

For Ollama: model="ollama/llama3.2:3b"
For vLLM: model="meta-llama/Llama-3.2-3B"

Note: Switching the inference provider only requires updating the model name.
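
Because only the model name changes between providers, one way to keep client code provider-agnostic is a small lookup table. The sketch below is a minimal illustration: the model names come from the examples above, while the provider keys and the `model_for` and `build_generator` helpers are hypothetical names introduced here. The generator import is done lazily so the lookup can be tried without the package installed.

```python
# Model identifiers taken from the examples above; the provider keys are
# hypothetical labels chosen for this sketch.
MODEL_BY_PROVIDER = {
    "ollama": "ollama/llama3.2:3b",
    "vllm": "meta-llama/Llama-3.2-3B",
}


def model_for(provider: str) -> str:
    """Return the model name to pass as `model` for the given provider label."""
    try:
        return MODEL_BY_PROVIDER[provider]
    except KeyError:
        raise ValueError(f"No model configured for provider {provider!r}") from None


def build_generator(provider: str):
    """Create a LlamaStackChatGenerator for the given provider label."""
    # Imported lazily so the lookup helpers work without the package installed.
    from haystack_integrations.components.generators.llama_stack import (
        LlamaStackChatGenerator,
    )

    return LlamaStackChatGenerator(model=model_for(provider))
```

With this in place, moving from Ollama to vLLM would amount to calling `build_generator("vllm")` instead of `build_generator("ollama")`, provided the server is configured with that provider.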

Streaming

This generator supports streaming the tokens from the LLM directly into the output. To do so, pass a function to the streaming_callback init parameter.
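
The callback can be any callable that takes a single streamed chunk. As a minimal sketch, the hypothetical `ChunkCollector` class below prints each chunk as it arrives and also keeps the accumulated text. It only reads the `content` attribute of the chunk, which is the text field of Haystack's `StreamingChunk`. The stand-in chunks at the bottom are made up so the example runs without a server.

```python
from types import SimpleNamespace


class ChunkCollector:
    """Callable that prints each streamed chunk and keeps the full text."""

    def __init__(self):
        self.parts = []

    def __call__(self, chunk):
        # Haystack passes a StreamingChunk; only its `content` attribute is read.
        text = chunk.content or ""
        self.parts.append(text)
        print(text, end="", flush=True)

    @property
    def text(self):
        """All content received so far, joined into one string."""
        return "".join(self.parts)


# Stand-in chunks (hypothetical) so the sketch runs without a server.
collector = ChunkCollector()
for piece in ["Agentic ", "pipelines ", "plan and act."]:
    collector(SimpleNamespace(content=piece))
print()
```

To use it with the generator, pass the instance as `streaming_callback=collector` at initialization; after `run` returns, `collector.text` holds everything that was streamed.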

Usage

To start using this integration, install the package with:

pip install llama-stack-haystack

On its own

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

# The model name depends on the inference provider running on the server.
client = LlamaStackChatGenerator(model="ollama/llama3.2:3b")
response = client.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
# "replies" holds the list of ChatMessage objects returned by the model.
print(response["replies"])

With streaming

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator
from haystack.components.generators.utils import print_streaming_chunk

# print_streaming_chunk prints each chunk as soon as it arrives.
client = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",
    streaming_callback=print_streaming_chunk,
)
response = client.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"])

In a pipeline

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

prompt_builder = ChatPromptBuilder()
llm = LlamaStackChatGenerator(model="ollama/llama3.2:3b")

# Feed the rendered chat messages from the builder into the generator.
pipe = Pipeline()
pipe.add_component("builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("builder.prompt", "llm.messages")

# {{city}} is a template variable filled in at run time.
messages = [
    ChatMessage.from_system("Give brief answers."),
    ChatMessage.from_user("Tell me about {{city}}")
]

response = pipe.run(
    data={"builder": {"template": messages,
                      "template_variables": {"city": "Berlin"}}}
)
print(response)