Most common position in a pipeline	After a ChatPromptBuilder
Mandatory init variables	"api_key": A Meta Llama API key. Can be set with the `LLAMA_API_KEY` environment variable or passed to the `init()` method.
Mandatory run variables	"messages": A list of ChatMessage objects
Output variables	"replies": A list of ChatMessage objects
API reference	Meta Llama API
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama

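Overview

`MetaLlamaChatGenerator` lets you use multiple Meta Llama models by making chat completion calls to the Meta Llama API. The default model is `Llama-4-Scout-17B-16E-Instruct-FP8`.

The currently available models are:

Model ID	Input context length	Output context length	Input modalities	Output modalities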
`Llama-4-Scout-17B-16E-Instruct-FP8`	128k	4028	文本、图像	文本
`Llama-4-Maverick-17B-128E-Instruct-FP8`	128k	4028	文本、图像	文本
`Llama-3.3-70B-Instruct`	128k	4028	文本	文本
`Llama-3.3-8B-Instruct`	128k	4028	文本	文本

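This component uses the same ChatMessage format as other Haystack chat generators for structured input and output. For more information, see the ChatMessage documentation.

It is also fully compatible with Haystack Tools and Toolsets, which enable function calling with supported models.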
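As a rough illustration of what function calling involves, the sketch below pairs a plain Python function with a JSON Schema describing its arguments, then runs it with arguments supplied as a JSON string, which is how a model typically returns them. The names `get_weather`, `WEATHER_PARAMETERS`, and `call_tool` are hypothetical and exist only for this example. It uses no Haystack imports, so it runs without an API key.

```python
import json
from typing import Any, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool body: returns a canned forecast instead of calling a real service."""
    return f"The weather in {city} is sunny."


# JSON Schema describing the arguments the model is allowed to supply (hypothetical example).
WEATHER_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "Name of the city to look up"},
    },
    "required": ["city"],
}


def call_tool(arguments_json: str) -> str:
    """Parse model-style JSON arguments, check the required keys, and run the function."""
    arguments = json.loads(arguments_json)
    missing = [key for key in WEATHER_PARAMETERS["required"] if key not in arguments]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return get_weather(**arguments)


print(call_tool('{"city": "Paris"}'))
```

In Haystack, a function and schema like these are usually wrapped in a `Tool` (with `name`, `description`, `parameters`, and `function`) and handed to the generator. Passing them through a `tools` argument of `MetaLlamaChatGenerator` is an assumption here, so check the API reference for the exact parameter.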

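Initialization

To use this integration, you must have a Meta Llama API key. You can provide it with the `LLAMA_API_KEY` environment variable or by using a Secret.

Then, install the `meta-llama-haystack` integration: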

pip install meta-llama-haystack
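Before creating the generator, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library. The helper name `llama_api_key_is_set` is hypothetical and not part of the integration.

```python
import os


def llama_api_key_is_set(env_var: str = "LLAMA_API_KEY") -> bool:
    """Return True when the environment variable exists and is not blank."""
    return bool(os.environ.get(env_var, "").strip())


if llama_api_key_is_set():
    print("LLAMA_API_KEY found.")
else:
    print("LLAMA_API_KEY is not set; export it before creating the generator.")
```

A check like this reports a missing key before any request is sent.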

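Streaming

`MetaLlamaChatGenerator` supports streaming responses from the LLM, so tokens are emitted as soon as they are generated. To enable streaming, pass a callable to the `streaming_callback` parameter during initialization.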
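A callback can do more than print. The sketch below shows a callable that prints each piece as it arrives and also keeps the pieces so the full text is available afterwards. The class name `CollectingCallback` is hypothetical, and the code assumes each chunk exposes its text as `chunk.content`. Stand-in chunk objects are used so that the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, List


class CollectingCallback:
    """Print each streamed piece immediately and remember it for later use."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Assumption: the chunk carries its text in a `content` attribute.
        text = getattr(chunk, "content", "") or ""
        self.parts.append(text)
        print(text, end="", flush=True)

    @property
    def text(self) -> str:
        """Everything received so far, joined into one string."""
        return "".join(self.parts)


# Stand-in chunks that mimic a stream; a real run would receive them from the generator.
callback = CollectingCallback()
for piece in ("Agentic ", "pipelines ", "loop."):
    callback(SimpleNamespace(content=piece))
print()
```

An instance of such a class could then be passed as `streaming_callback` in place of a lambda.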

Usage

On its own

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator()
response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"][0].text)

With streaming and model routing

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)

# check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])

In a pipeline

# To run this example, you will need to set a `LLAMA_API_KEY` environment variable.

from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)

# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]

# Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})

retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")

# Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)