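Most common position in a pipeline	After a ChatPromptBuilder
Mandatory run variables	"messages": A list of `ChatMessage` objects representing the chat history
Output variables	"replies": A list of the LLM's alternative replies
API reference	Ollama
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama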

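Overview

Ollama is a project focused on running LLMs locally. By default, it uses the quantized GGUF format internally. This means you can run LLMs on standard machines (even without a GPU) without handling complex installation steps.

OllamaChatGenerator supports models running on Ollama, such as llama2 and mixtral. The full list of supported models is available in the Ollama model library.

OllamaChatGenerator needs a model name and a url to work. By default, it uses the "orca-mini" model and the "http://localhost:11434" url.

OllamaChatGenerator works with ChatMessage objects. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, function), and optional metadata. See the Usage section for an example.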

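Streaming

You can stream output as it is generated. Pass a callback function to streaming_callback. Use the built-in print_streaming_chunk to print text tokens and tool events (tool calls and tool results).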

from haystack.components.generators.utils import print_streaming_chunk

# Configure any `Generator` or `ChatGenerator` with a streaming callback
component = SomeGeneratorOrChatGenerator(streaming_callback=print_streaming_chunk)

# If this is a `ChatGenerator`, pass a list of messages:
# from haystack.dataclasses import ChatMessage
# component.run([ChatMessage.from_user("Your question here")])

# If this is a (non-chat) `Generator`, pass a prompt:
# component.run({"prompt": "Your prompt here"})

📘
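Streaming works only with a single response. If a provider supports multiple candidates, set n=1.

See our Streaming Support docs to learn how StreamingChunk works and how to write a custom callback.

Prefer print_streaming_chunk by default. Write a custom callback only if you need a specific transport (for example, SSE/WebSocket) or custom UI formatting.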
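
If you do need a custom callback, the following is a minimal sketch of one that collects streamed text instead of printing it. It assumes only that each chunk exposes its text in a `content` attribute, as `StreamingChunk` does. `ChunkCollector` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real chunks so that the sketch runs without an Ollama server.

```python
from types import SimpleNamespace


class ChunkCollector:
    """Callable that accumulates streamed text; pass an instance as streaming_callback."""

    def __init__(self):
        self.parts = []

    def __call__(self, chunk):
        # Assumption: the chunk carries its text in `content` (empty for non-text events).
        if chunk.content:
            self.parts.append(chunk.content)

    @property
    def text(self):
        # Join everything received so far into a single string.
        return "".join(self.parts)


# Stand-in chunks to exercise the callback without a running model.
collector = ChunkCollector()
for piece in ("Hello", ", ", "world"):
    collector(SimpleNamespace(content=piece))

print(collector.text)
```

In real use, you would pass `collector` as the `streaming_callback` argument and read `collector.text` after the run, or forward each chunk to your own transport inside `__call__`.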

Usage

You need a running instance of Ollama. Installation instructions are in the Ollama GitHub repository.
A quick way to run Ollama is with Docker:

docker run -d -p 11434:11434 --name ollama ollama/ollama:latest

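You need to download or pull the desired LLM. The model library is available on the Ollama website.
If you are using Docker, you can, for example, pull the Zephyr model: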

docker exec ollama ollama pull zephyr

If you have already installed Ollama on your system, you can run:

ollama pull zephyr
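
To confirm that the pull succeeded, you can ask the Ollama server which models it has locally. The sketch below uses Ollama's `/api/tags` endpoint and only the Python standard library. The helper names `model_names` and `list_local_models` are hypothetical, the base URL assumes the default port from the Docker command above, and the `sample` payload is illustrative rather than real server output.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url="http://localhost:11434", timeout=5.0):
    """Query a running Ollama server for its locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(json.load(response))


# Illustrative payload in the shape of the /api/tags response (not real output).
sample = {"models": [{"name": "zephyr:latest"}]}
print(model_names(sample))
```

With a server running, `list_local_models()` should include the model you pulled. It raises an error from `urllib` if the server is not reachable.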

👍
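Choose a specific version of a model
You can also specify a tag to choose a specific (quantized) version of your model. The available tags are shown in the model card in the Ollama model library. This is an example for Zephyr.
In this case, simply run: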
# ollama pull model:tag
ollama pull zephyr:7b-alpha-q3_K_S
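
The same `model:tag` string is what you would pass as the model name when creating the generator. As a small illustration of the naming convention, here is a sketch that splits such a reference into its two parts. `split_model_ref` is a hypothetical helper, and the fallback to `latest` when no tag is given is an assumption based on how Ollama resolves untagged names.

```python
def split_model_ref(ref):
    """Split 'name:tag' into (name, tag); assume 'latest' when no tag is present."""
    name, sep, tag = ref.rpartition(":")
    # No colon at all, or the last colon belongs to a registry host:port prefix.
    if not sep or "/" in tag:
        return ref, "latest"
    return name, tag


print(split_model_ref("zephyr:7b-alpha-q3_K_S"))
print(split_model_ref("zephyr"))
```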

You also need to install the ollama-haystack package:

pip install ollama-haystack

On its own

from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

generator = OllamaChatGenerator(model="zephyr",
                            url = "http://localhost:11434",
                            generation_kwargs={
                              "num_predict": 100,
                              "temperature": 0.9,
                              })

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]

print(generator.run(messages=messages))
>> {
    "replies": [
        ChatMessage(
            _role=<ChatRole.ASSISTANT: 'assistant'>,
            _content=[
                TextContent(
                    text=(
                        "Natural Language Processing (NLP) is a subfield of "
                        "Artificial Intelligence that deals with understanding, "
                        "interpreting, and generating human language in a meaningful "
                        "way. It enables tasks such as language translation, sentiment "
                        "analysis, and text summarization."
                    )
                )
            ],
            _name=None,
            _meta={
                "model": "zephyr",...
            }
        )
    ]
}
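
In practice you usually want the reply text rather than the full printed structure. The following is a minimal sketch that pulls the text out of a result dictionary shaped like the one above. It assumes each reply exposes a `text` attribute, as `ChatMessage` does. `reply_texts` is a hypothetical helper, and `fake_result` uses a stand-in object so the sketch runs without a model.

```python
from types import SimpleNamespace


def reply_texts(result):
    """Return the text of every reply in a chat generator result dict."""
    return [reply.text for reply in result["replies"]]


# Stand-in for the dict returned by generator.run(messages=messages).
fake_result = {
    "replies": [SimpleNamespace(text="Natural Language Processing (NLP) is a subfield of AI.")]
}
print(reply_texts(fake_result)[0])
```

For a pipeline result, the same helper would apply to the generator's entry, for example `reply_texts(result["llm"])` when the component is named "llm".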

In a Pipeline

from haystack.components.builders import ChatPromptBuilder
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
generator = OllamaChatGenerator(model="zephyr",
                            url = "http://localhost:11434",
                            generation_kwargs={
                              "temperature": 0.9,
                              })

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in Spanish even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
print(pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "template": messages}}))

>> {
    "llm": {
        "replies": [
            ChatMessage(
                _role=<ChatRole.ASSISTANT: 'assistant'>,
                _content=[
                    TextContent(
                        text=(
                            "Berlín es la capital y la mayor ciudad de Alemania. "
                            "Está ubicada en el estado federado de Berlín, y tiene más..."
                        )
                    )
                ],
                _name=None,
                _meta={
                    "model": "zephyr",...
                }
            )
        ]
    }
}
