Most common position in a pipeline	After a `PromptBuilder`
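Mandatory init variables	"api_type": The type of Hugging Face API to use. "api_params": A dictionary with one of the following keys: - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. OR - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`. "token": The Hugging Face API token. Can be set with the `HF_API_TOKEN` or `HF_TOKEN` environment variable.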
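Mandatory run variables	"prompt": A string containing the prompt for the LLM
Output variables	"replies": A list of strings with all the replies generated by the LLM. "meta": A list of dictionaries with the metadata associated with each reply, such as token count, finish reason, and so on.
API reference	Generators
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/hugging_face_api.py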
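The table's rules for which `api_params` key each `api_type` needs can be summarized as a small lookup. This is a hypothetical helper written for illustration only (`check_api_params` and `REQUIRED_KEY` are not part of Haystack, which performs its own validation); the lowercase `api_type` strings match the ones used in the examples on this page.

```python
# Hypothetical helper, NOT part of Haystack: it only mirrors the documented
# api_type -> required api_params key rules from the table above.
REQUIRED_KEY = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}


def check_api_params(api_type: str, api_params: dict) -> str:
    """Return the key that api_type requires, or raise ValueError if it is missing."""
    key = REQUIRED_KEY.get(api_type.lower())
    if key is None:
        raise ValueError(f"Unknown api_type: {api_type!r}")
    if key not in api_params:
        raise ValueError(f"api_type={api_type!r} requires api_params[{key!r}]")
    return key


print(check_api_params("text_generation_inference", {"url": "http://localhost:8080"}))
```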

Overview

HuggingFaceAPIGenerator can be used to generate text using different Hugging Face APIs.

🚧
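Important
As of July 2025, the Hugging Face Inference API no longer offers generative models through the `text_generation` endpoint. Generative models are now available only through providers that support the `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the HuggingFaceAPIChatGenerator component instead: it supports the `chat_completion` endpoint and works with the free Serverless Inference API.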

📘
This component is designed for text generation, not for chat. If you want to use these LLMs for chat, use HuggingFaceAPIChatGenerator instead.

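By default, the component reads the Hugging Face API token from the `HF_API_TOKEN` environment variable. Alternatively, you can pass the token at initialization with the `token` parameter; see the code examples below.
The token is required when you use Inference Endpoints.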
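The lookup order described here (an explicit `token` first, otherwise the environment) can be sketched in plain Python. `resolve_hf_token` is a hypothetical helper for illustration; it assumes, following the table above, that `HF_API_TOKEN` is checked before `HF_TOKEN`. In Haystack itself the token is wrapped in a `Secret` and resolved by the component, so you would not write this yourself.

```python
import os
from typing import Optional

# Hypothetical sketch of the documented lookup order; assumed order:
# explicit token, then HF_API_TOKEN, then HF_TOKEN.
ENV_VARS = ("HF_API_TOKEN", "HF_TOKEN")


def resolve_hf_token(explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed token, then the environment variables in order."""
    if explicit:
        return explicit
    for name in ENV_VARS:
        value = os.environ.get(name)
        if value:
            return value
    # No token found: acceptable for a local TGI server, not for Inference Endpoints.
    return None
```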

Streaming

This Generator supports streaming the tokens from the LLM directly into the output. To do so, pass a function to the `streaming_callback` init parameter.
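A streaming callback is just a function that takes one chunk. Haystack passes a `StreamingChunk` whose new text is in `.content`; to keep this sketch runnable without Haystack installed, `FakeChunk` below is a hypothetical stand-in with the same two fields, and the token pieces are made-up placeholders rather than real model output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for Haystack's StreamingChunk (text in .content)."""
    content: str
    meta: dict = field(default_factory=dict)


collected: List[str] = []


def on_chunk(chunk) -> None:
    """Streaming callback: print each piece as it arrives and keep a copy."""
    collected.append(chunk.content)
    print(chunk.content, end="", flush=True)


# With Haystack you would pass the function at init time, for example:
# HuggingFaceAPIGenerator(api_type=..., api_params=..., streaming_callback=on_chunk)
for piece in ["Natural", " Language", " Processing"]:
    on_chunk(FakeChunk(piece))
print()
```

Collecting the pieces as well as printing them is a design choice: it lets you keep the full text around without waiting for the final `replies` output.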

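Usage

On its own

Using a paid Inference Endpoint

In this case, Hugging Face deploys a private instance of the model, and you typically pay per hour.

To learn how to spin up an Inference Endpoint, see the Hugging Face documentation.

In this case, you also need to provide your Hugging Face token.
The generator expects the `url` of your endpoint in `api_params`.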

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
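`run()` returns a dictionary with the `replies` and `meta` lists described in the table above. The sketch below pairs each reply with its metadata; `summarize_result` is a hypothetical helper, the `example` dictionary is a placeholder shaped like the documented output (not a real response), and the `finish_reason` key is an assumption, since the exact metadata keys depend on the backend.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> List[str]:
    """Pair each reply with its metadata (assumes both lists are aligned by index)."""
    lines = []
    for reply, meta in zip(result["replies"], result["meta"]):
        # "finish_reason" is an assumed key; fall back if the backend omits it.
        finish = meta.get("finish_reason", "unknown")
        lines.append(f"[{finish}] {reply.strip()}")
    return lines


# Placeholder with the documented shape; not real model output.
example = {"replies": ["<generated text>"], "meta": [{"finish_reason": "length"}]}
for line in summarize_result(example):
    print(line)
```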

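Using self-hosted Text Generation Inference (TGI)

Hugging Face Text Generation Inference is a toolkit for efficiently deploying and serving LLMs.

While it powers the latest versions of the Serverless Inference API and Inference Endpoints, you can also easily run it locally with Docker.

For example, you can run a TGI container as follows: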

model=mistralai/Mistral-7B-v0.1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model

For more information, see the official TGI repository.
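Before pointing the generator at the container, you may want to query it directly. This sketch assumes TGI's `POST /generate` route, which accepts `{"inputs": ..., "parameters": {...}}` and returns `generated_text`; check the TGI documentation for your version. It only builds the request with the standard library, so it runs even when no server is up; `build_generate_request` and `TGI_URL` are hypothetical names.

```python
import json
import urllib.request

# Assumes the container from the docker command above (-p 8080:80).
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt: str, max_new_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{TGI_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("What's Natural Language Processing?")
print(req.get_method(), req.full_url)
# Once the container is running, you could send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
```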

The generator expects the `url` of your TGI instance in `api_params`.

from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

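Using the free Serverless Inference API (not recommended)

🚧
This example might not work, because the Hugging Face Inference API no longer offers models that support the `text_generation` endpoint. Use HuggingFaceAPIChatGenerator to generate text through the `chat_completion` endpoint instead.

Formerly known as the (free) Hugging Face Inference API, this API lets you quickly try out many models hosted on the Hugging Face Hub, offloading inference to Hugging Face servers. It is rate-limited and not meant for production.

To use this API, you need a free Hugging Face token.
The generator expects the `model` in `api_params`.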

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

In a pipeline

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
from haystack.utils import Secret

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}?
"""

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res=pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)
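To see roughly what the `llm` component receives, the sketch below approximates the Jinja2 template above in plain Python. It is not how `PromptBuilder` renders (exact whitespace will differ), and `render_prompt` is a hypothetical name; the documents are the two written to the store above.

```python
from typing import List


def render_prompt(documents: List[str], query: str) -> str:
    """Plain-Python approximation of the pipeline's Jinja2 template."""
    context = "\n".join(f"    {doc}" for doc in documents)
    return (
        "Given the following information, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}?"  # the template appends "?" after the query
    )


docs = ["Paris is the capital of France", "Rome is the capital of Italy"]
print(render_prompt(docs, "What is the capital of France?"))
```

Because `query` already ends with a question mark, the template's trailing `?` produces a doubled `??`. It is harmless, but you can drop the `?` from the template if you prefer.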

Additional references

🧑‍🍳 Cookbook
