Most common position in a pipeline	After a `ChatPromptBuilder`
Mandatory init variables	"api_type": The type of Hugging Face API to use. "api_params": A dictionary with one of the following keys: - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. OR - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`. "token": The Hugging Face API token. Can be set with the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
Mandatory run variables	"messages": A list of `ChatMessage` objects representing the chat history
Output variables	"replies": A list of the LLM's replies to the input chat
API reference	Generators
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/hugging_face_api.py

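Overview

`HuggingFaceAPIChatGenerator` can be used to generate chat completions through various Hugging Face APIs.

The main input to this component is a list of `ChatMessage` objects. `ChatMessage` is a data class that contains a message, a role (who generated the message, such as `user`, `assistant`, `system`, or `function`), and optional metadata. For more information, see our `ChatMessage` documentation.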

📘
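This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use Hugging Face APIs for simple text generation (such as translation or summarization tasks), or you don't want to use the `ChatMessage` object, use `HuggingFaceAPIGenerator` instead.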

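By default, this component reads the token from the `HF_API_TOKEN` (or `HF_TOKEN`) environment variable. Alternatively, you can pass a Hugging Face API token at initialization with the `token` parameter; see the code examples below.
The token is required:

If you use the Serverless Inference API, or
If you use Inference Endpoints.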
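
The environment-variable lookup described above can be approximated outside Haystack with a minimal sketch. `resolve_hf_token` is a hypothetical helper, not part of Haystack, and checking `HF_API_TOKEN` before `HF_TOKEN` is an assumption about precedence. It is only meant to show how you might fail early when neither variable is set.

```python
import os
from typing import Mapping, Optional

# Environment variables named on this page; the lookup order is an assumption.
TOKEN_ENV_VARS = ("HF_API_TOKEN", "HF_TOKEN")


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty token found in the environment, or None.

    Hypothetical helper for illustration; Haystack resolves the token itself.
    """
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Calling `resolve_hf_token()` before building the generator lets a script print a clear message instead of failing on the first API request.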

Streaming

This Generator supports streaming the tokens from the LLM directly to the output. To do so, pass a function to the `streaming_callback` init parameter.
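
As a rough illustration of such a callback, the sketch below prints each chunk as it arrives and keeps the pieces so the full reply can be reassembled afterwards. It assumes each streamed chunk exposes its text as a `content` attribute. `ChunkCollector` and `FakeChunk` are hypothetical names used only here, and `FakeChunk` merely stands in for real chunks so the example runs without a network call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeChunk:
    """Stand-in for a streamed chunk; only the `content` attribute is assumed."""
    content: str


@dataclass
class ChunkCollector:
    """Callable that echoes streamed text and remembers it."""
    parts: List[str] = field(default_factory=list)

    def __call__(self, chunk) -> None:
        # Print without a newline so the tokens read as one continuous reply.
        print(chunk.content, end="", flush=True)
        self.parts.append(chunk.content)

    @property
    def text(self) -> str:
        """The full text received so far."""
        return "".join(self.parts)


collector = ChunkCollector()
for piece in ("Natural ", "Language ", "Processing"):
    collector(FakeChunk(piece))  # simulates what the generator would do per chunk
print()

# With a real generator, the same object would be passed at init time:
# HuggingFaceAPIChatGenerator(..., streaming_callback=collector)
```

A plain function works just as well. The callable object is used here only so the streamed text stays available after the run.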

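Usage

On its own

Using the Serverless Inference API (Inference Providers) - free tier available

This API lets you quickly try many models hosted on the Hugging Face Hub, offloading inference to Hugging Face servers. It is rate-limited and not meant for production.

To use this API, you need a free Hugging Face token.
The generator expects the `model` in `api_params`. Specifying a `provider` is also recommended for better performance and reliability.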

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("You are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api" # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)

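Using paid Inference Endpoints

In this case, Hugging Face deploys a private instance of the model, and you typically pay per hour.

To learn how to spin up an Inference Endpoint, see the Hugging Face documentation.

In this case as well, you need to provide your Hugging Face token.
The generator expects the `url` of your endpoint in `api_params`.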

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("You are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)

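Using the Serverless Inference API (Inference Providers) with text+image input

You can also use this component with multimodal models that support both text and image input.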

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)

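Using self-hosted Text Generation Inference (TGI)

Hugging Face Text Generation Inference is a toolkit for efficiently deploying and serving LLMs.

While it powers the most recent versions of the Serverless Inference API and Inference Endpoints, it can also be run locally through Docker.

For example, you can run a TGI container as follows: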

model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model

For more information, refer to the official TGI repository.

The generator expects the `url` of your TGI instance in `api_params`.

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("You are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
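
The standalone setups above differ mainly in the `api_type` value and in which key `api_params` has to carry. A minimal sketch of a pre-flight check for that pairing follows. `check_api_params` and `REQUIRED_KEY` are hypothetical names, not part of Haystack; the mapping simply restates the requirements listed on this page, and the local URL is an assumed address for a TGI container.

```python
from typing import Any, Dict

# Which key `api_params` must carry for each API type, as listed on this page.
REQUIRED_KEY = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}


def check_api_params(api_type: str, api_params: Dict[str, Any]) -> None:
    """Raise ValueError if `api_params` lacks the key that `api_type` needs.

    Hypothetical pre-flight check for illustration; not part of Haystack.
    """
    if api_type not in REQUIRED_KEY:
        raise ValueError(f"Unknown api_type: {api_type!r}")
    key = REQUIRED_KEY[api_type]
    if not api_params.get(key):
        raise ValueError(f"api_type {api_type!r} requires api_params[{key!r}]")


# Both calls pass silently because the required key is present.
check_api_params("serverless_inference_api", {"model": "Qwen/Qwen2.5-7B-Instruct"})
check_api_params("text_generation_inference", {"url": "http://localhost:8080"})
```

Running a check like this before constructing the generator can surface a missing `model` or `url` with a clearer message than a failed request would.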

In a pipeline

from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
llm = HuggingFaceAPIChatGenerator(api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
                                  api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                             "provider": "together"},
                                  token=Secret.from_env_var("HF_API_TOKEN"))
                                        
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
result = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}})

print(result)
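
The printed result is a dictionary keyed by component name. For the pipeline above, the generator's output sits under `"llm"` and holds the `replies` list described in the table at the top of this page. A minimal sketch of pulling out the first reply's text follows. `first_reply_text` and `FakeReply` are hypothetical names, and the sketch assumes each reply exposes its content as a `text` attribute, which is worth verifying against your installed Haystack version.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeReply:
    """Stand-in for a reply object; only a `text` attribute is assumed."""
    text: str


def first_reply_text(result: Dict[str, Any], component: str = "llm") -> Optional[str]:
    """Return the text of the first reply from `component`, or None if absent."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0].text if replies else None


# Simulated pipeline output with the same nesting as the result printed above.
fake_result = {"llm": {"replies": [FakeReply("Berlin ist die Hauptstadt Deutschlands.")]}}
print(first_reply_text(fake_result))
```

With a real run, you would pass the `result` returned by `pipe.run(...)` instead of `fake_result`.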

Additional References

🧑‍🍳 Cookbook: Chat and RAG with Google Gemma