Module haystack_integrations.components.generators.ollama.generator

OllamaGenerator

Provides an interface for generating text using an LLM running on Ollama.

Usage example

from haystack_integrations.components.generators.ollama import OllamaGenerator

generator = OllamaGenerator(model="zephyr",
                            url="http://localhost:11434",
                            generation_kwargs={
                                "num_predict": 100,
                                "temperature": 0.9,
                            })

print(generator.run("Who is the best American actor?"))

OllamaGenerator.__init__

def __init__(model: str = "orca-mini",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             system_prompt: Optional[str] = None,
             template: Optional[str] = None,
             raw: bool = False,
             timeout: int = 120,
             keep_alive: Optional[Union[float, str]] = None,
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None)

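Parameters:

model: The name of the model to use. The model should be available in the running Ollama instance.
url: The URL of a running Ollama instance.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in the Ollama docs.
system_prompt: Optional system message (overrides what is defined in the Ollama Modelfile).
template: The full prompt template (overrides what is defined in the Ollama Modelfile).
raw: If True, no formatting is applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your API request.
timeout: The number of seconds before a timeout error is thrown from the Ollama API.
streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts a StreamingChunk as an argument.
keep_alive: Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (5 minutes) is used. The value can be set to:
- a duration string (such as "10m" or "24h")
- a number of seconds (such as 3600)
- any negative number, which keeps the model loaded in memory (for example -1 or "-1m")
- "0", which unloads the model immediately after generating a response.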
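
The keep_alive rules listed above can be read as a small decision table. The helper below is only a sketch of that reading: the name describe_keep_alive is hypothetical, it does not call Ollama, and it is not code from the integration.

```python
from typing import Optional, Union


def describe_keep_alive(value: Optional[Union[float, str]]) -> str:
    """Hypothetical helper: explain a keep_alive value in words.

    It mirrors the rules in the parameter description and never contacts Ollama.
    """
    if value is None:
        return "use Ollama's default (5 minutes)"
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("-"):
            return "keep the model loaded"
        if text == "0":
            return "unload right after the response"
        return f"keep the model loaded for {text}"
    # Numeric values are read as a number of seconds.
    if value < 0:
        return "keep the model loaded"
    if value == 0:
        return "unload right after the response"
    return f"keep the model loaded for {value} seconds"


for example in (None, "10m", 3600, -1, "-1m", "0"):
    print(repr(example), "->", describe_keep_alive(example))
```

In practice the value is passed through unchanged, for example keep_alive="10m" in the constructor, and Ollama applies the rule on its side.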

OllamaGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

A dictionary with the serialized data.

OllamaGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OllamaGenerator"

Deserializes the component from a dictionary.

Parameters:

data: The dictionary to deserialize from.

Returns:

The deserialized component.
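
A round trip through to_dict and from_dict should give back an equivalent component. The sketch below illustrates the idea with a hypothetical stand-in class, SketchGenerator, so that it runs without an Ollama server. The dictionary layout (a type string plus init_parameters) is an assumption based on the usual Haystack convention, not something this page specifies.

```python
from typing import Any, Dict, Optional


class SketchGenerator:
    """Hypothetical stand-in for a component; not the real OllamaGenerator."""

    def __init__(self, model: str = "orca-mini", timeout: int = 120,
                 generation_kwargs: Optional[Dict[str, Any]] = None):
        self.model = model
        self.timeout = timeout
        self.generation_kwargs = generation_kwargs or {}

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: a type string plus the constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                "model": self.model,
                "timeout": self.timeout,
                "generation_kwargs": self.generation_kwargs,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchGenerator":
        # Rebuild the component from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchGenerator(model="zephyr", generation_kwargs={"temperature": 0.9})
restored = SketchGenerator.from_dict(original.to_dict())
print(restored.to_dict() == original.to_dict())
```

With the real component the same pattern would apply: call to_dict on an instance and hand the result to OllamaGenerator.from_dict.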

OllamaGenerator.run

@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(
    prompt: str,
    generation_kwargs: Optional[Dict[str, Any]] = None,
    *,
    streaming_callback: Optional[Callable[[StreamingChunk], None]] = None
) -> Dict[str, List[Any]]

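Runs an Ollama model on the given prompt.

Parameters:

prompt: The prompt to generate a response for.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in the Ollama docs.
streaming_callback: A callback function that is called when a new token is received from the stream.

Returns:

A dictionary with the following keys:

replies: The responses from the model
meta: The metadata collected during the run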
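
A streaming callback is an ordinary function that receives one chunk at a time. The sketch below shows a callback that collects chunk text. SketchChunk and fake_stream are hypothetical stand-ins that let the example run without Ollama, and the content attribute is assumed to match the text field of StreamingChunk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SketchChunk:
    """Hypothetical stand-in for StreamingChunk; only `content` is modeled."""
    content: str


collected: List[str] = []


def collect_tokens(chunk: SketchChunk) -> None:
    # Called once per chunk; store the text so it can be joined later.
    collected.append(chunk.content)


def fake_stream(callback: Callable[[SketchChunk], None]) -> None:
    # Simulates a generator pushing chunks into the callback.
    for piece in ("Hello", ", ", "world"):
        callback(SketchChunk(content=piece))


fake_stream(collect_tokens)
print("".join(collected))
```

A function shaped like collect_tokens could then be passed as streaming_callback, either to the constructor or to run.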

Module haystack_integrations.components.generators.ollama.chat.chat_generator

OllamaChatGenerator

Haystack Chat Generator for models served with Ollama (https://ollama.ac.cn).

Supports streaming, tool calls, reasoning, and structured outputs.

Usage example

from haystack_integrations.components.generators.ollama.chat import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

llm = OllamaChatGenerator(model="qwen3:0.6b")
result = llm.run(messages=[ChatMessage.from_user("What is the capital of France?")])
print(result)

OllamaChatGenerator.__init__

def __init__(model: str = "qwen3:0.6b",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             keep_alive: Optional[Union[float, str]] = None,
             streaming_callback: Optional[Callable[[StreamingChunk],
                                                   None]] = None,
             tools: Optional[Union[List[Tool], Toolset]] = None,
             response_format: Optional[Union[None, Literal["json"],
                                             JsonSchemaValue]] = None,
             think: Union[bool, Literal["low", "medium", "high"]] = False)

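Parameters:

model: The name of the model to use. The model must already be present (pulled) in the running Ollama instance.
url: The base URL of the Ollama server (default "http://localhost:11434").
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in the Ollama docs.
timeout: The number of seconds before a timeout error is thrown from the Ollama API.
think: If True, the model will "think" before producing a response. Only thinking models support this feature. Some models (such as gpt-oss) support different levels of thinking: "low", "medium", "high". The intermediate "thinking" output can be found by inspecting the reasoning property of the returned ChatMessage.
keep_alive: Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (5 minutes) is used. The value can be set to:
- a duration string (such as "10m" or "24h")
- a number of seconds (such as 3600)
- any negative number, which keeps the model loaded in memory (for example -1 or "-1m")
- "0", which unloads the model immediately after generating a response.
streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts a StreamingChunk as an argument.
tools: A list of haystack.tools.Tool objects or a haystack.tools.Toolset. Duplicate tool names raise a ValueError. Not all models support tools. For a list of models compatible with tools, see the models page.
response_format: The format for structured model outputs. The value can be:
- None: No specific structure or format is applied to the response. The response is returned as-is.
- "json": The response is formatted as a JSON object.
- JSON Schema: The response is formatted as a JSON object that adheres to the specified JSON Schema. (Requires Ollama ≥ 0.1.34.)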

OllamaChatGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

A dictionary with the serialized data.

OllamaChatGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OllamaChatGenerator"

Deserializes the component from a dictionary.

Parameters:

data: The dictionary to deserialize from.

Returns:

The deserialized component.

OllamaChatGenerator.run

@component.output_types(replies=List[ChatMessage])
def run(
    messages: List[ChatMessage],
    generation_kwargs: Optional[Dict[str, Any]] = None,
    tools: Optional[Union[List[Tool], Toolset]] = None,
    *,
    streaming_callback: Optional[StreamingCallbackT] = None
) -> Dict[str, List[ChatMessage]]

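Runs an Ollama model on the given chat history.

Parameters:

messages: A list of ChatMessage instances representing the input messages.
generation_kwargs: Per-call overrides for Ollama inference options, such as temperature, top_p, and others. These are merged on top of the instance-level generation_kwargs. See the Ollama docs for the available arguments.
tools: A list of Tool objects or a Toolset instance for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization.
streaming_callback: A callable that receives StreamingChunk objects. Providing a callback (here or in the constructor) switches the component to streaming mode.

Returns:

A dictionary with the following keys:

replies: A list of ChatMessages containing the model's response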
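
The merge of per-call generation_kwargs over the instance-level ones can be pictured as a dictionary update in which the per-call values win. The helper below is a minimal sketch of that behavior as described in the parameter text. The name merge_generation_kwargs is hypothetical, and this is not the integration's actual code.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(instance_level: Optional[Dict[str, Any]],
                            per_call: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: per-call values win over instance-level ones."""
    # Build a new dictionary so neither input is modified.
    return {**(instance_level or {}), **(per_call or {})}


instance_kwargs = {"temperature": 0.9, "num_predict": 100}
call_kwargs = {"temperature": 0.2}
merged = merge_generation_kwargs(instance_kwargs, call_kwargs)
print(merged)
```

Options that the call does not mention, such as num_predict here, keep their instance-level value.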

OllamaChatGenerator.run_async

@component.output_types(replies=List[ChatMessage])
async def run_async(
    messages: List[ChatMessage],
    generation_kwargs: Optional[Dict[str, Any]] = None,
    tools: Optional[Union[List[Tool], Toolset]] = None,
    *,
    streaming_callback: Optional[StreamingCallbackT] = None
) -> Dict[str, List[ChatMessage]]

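Async version of run. Runs an Ollama model on the given chat history.

Parameters:

messages: A list of ChatMessage instances representing the input messages.
generation_kwargs: Per-call overrides for Ollama inference options. These are merged on top of the instance-level generation_kwargs.
tools: A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization.
streaming_callback: A callable that receives StreamingChunk objects. Providing a callback switches the component to streaming mode.

Returns:

A dictionary with the following keys:

replies: A list of ChatMessages containing the model's response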

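Module haystack_integrations.components.embedders.ollama.document_embedder

OllamaDocumentEmbedder

Computes the embeddings of a list of Documents and stores the obtained vectors in the embedding field of each Document. It uses embedding models compatible with the Ollama library.

Usage example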

from haystack import Document
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder

doc = Document(content="What do llamas say once you have thanked them? No probllama!")
document_embedder = OllamaDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

OllamaDocumentEmbedder.__init__

def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             keep_alive: Optional[Union[float, str]] = None,
             prefix: str = "",
             suffix: str = "",
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             batch_size: int = 32)

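Parameters:

model: The name of the model to use. The model should be available in the running Ollama instance.
url: The URL of a running Ollama instance.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in the Ollama docs.
timeout: The number of seconds before a timeout error is thrown from the Ollama API.
keep_alive: Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (5 minutes) is used. The value can be set to:
- a duration string (such as "10m" or "24h")
- a number of seconds (such as 3600)
- any negative number, which keeps the model loaded in memory (for example -1 or "-1m")
- "0", which unloads the model immediately after generating a response.
prefix: A string to add at the beginning of each text.
suffix: A string to add at the end of each text.
progress_bar: If True, shows a progress bar when running.
meta_fields_to_embed: A list of metadata fields to embed along with the document text.
embedding_separator: The separator used to concatenate the metadata fields to the document text.
batch_size: The number of documents to process at once.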
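
The prefix, suffix, meta_fields_to_embed, embedding_separator, and batch_size parameters together decide which strings are sent for embedding and how many go in each request. The sketch below illustrates one plausible reading. The helper names are hypothetical, and placing the metadata before the document text is an assumption rather than something this page states.

```python
from typing import Dict, Iterator, List


def prepare_text(content: str, meta: Dict[str, str], meta_fields: List[str],
                 separator: str = "\n", prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: join selected metadata with the document text."""
    # Assumed order: metadata values first, then the document content.
    parts = [meta[field] for field in meta_fields if meta.get(field)]
    parts.append(content)
    return prefix + separator.join(parts) + suffix


def batches(items: List[str], batch_size: int) -> Iterator[List[str]]:
    # Yield consecutive slices of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


text = prepare_text("No probllama!", {"title": "Llama jokes"}, ["title"],
                    prefix="passage: ")
chunks = list(batches(["a", "b", "c", "d", "e"], batch_size=2))
print(repr(text), chunks)
```

Metadata fields that are missing or empty are skipped in this sketch, so a document without a title is embedded from its content alone.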

OllamaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(
    documents: List[Document],
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[Document], Dict[str, Any]]]

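Runs an Ollama model to compute embeddings of the provided documents.

Parameters:

documents: The documents to convert to embeddings.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the Ollama docs.

Returns:

A dictionary with the following keys:

documents: The documents with embedding information attached
meta: The metadata collected during the embedding process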

OllamaDocumentEmbedder.run_async

@component.output_types(documents=List[Document], meta=Dict[str, Any])
async def run_async(
    documents: List[Document],
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[Document], Dict[str, Any]]]

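Asynchronously runs an Ollama model to compute embeddings of the provided documents.

Parameters:

documents: The documents to convert to embeddings.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the Ollama docs.

Returns:

A dictionary with the following keys:

documents: The documents with embedding information attached
meta: The metadata collected during the embedding process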

Module haystack_integrations.components.embedders.ollama.text_embedder

OllamaTextEmbedder

Computes the embedding of a string and returns the resulting vector. It uses embedding models compatible with the Ollama library.

Usage example

from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")
print(result['embedding'])

OllamaTextEmbedder.__init__

def __init__(model: str = "nomic-embed-text",
             url: str = "http://localhost:11434",
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: int = 120,
             keep_alive: Optional[Union[float, str]] = None)

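Parameters:

model: The name of the model to use. The model should be available in the running Ollama instance.
url: The URL of a running Ollama instance.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the available arguments in the Ollama docs.
timeout: The number of seconds before a timeout error is thrown from the Ollama API.
keep_alive: Controls how long the model stays loaded in memory after the request. If not set, Ollama's default (5 minutes) is used. The value can be set to:
- a duration string (such as "10m" or "24h")
- a number of seconds (such as 3600)
- any negative number, which keeps the model loaded in memory (for example -1 or "-1m")
- "0", which unloads the model immediately after generating a response.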

OllamaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(
    text: str,
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[float], Dict[str, Any]]]

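Runs an Ollama model to compute the embedding of the provided text.

Parameters:

text: The text to convert to an embedding.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the Ollama docs.

Returns:

A dictionary with the following keys:

embedding: The computed embedding
meta: The metadata collected during the embedding process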

OllamaTextEmbedder.run_async

@component.output_types(embedding=List[float], meta=Dict[str, Any])
async def run_async(
    text: str,
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[float], Dict[str, Any]]]

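Asynchronously runs an Ollama model to compute the embedding of the provided text.

Parameters:

text: The text to convert to an embedding.
generation_kwargs: Optional arguments to pass to the Ollama generation endpoint, such as temperature, top_p, and others. See the Ollama docs.

Returns:

A dictionary with the following keys:

embedding: The computed embedding
meta: The metadata collected during the embedding process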
