OllamaTextEmbedder

This component computes the embeddings of a string using embedding models compatible with the Ollama library.


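Most common position in a pipeline	Before an embedding Retriever in a query/RAG pipeline
Mandatory run variables	"text": a string
Output variables	"embedding": a list of floats (the vector); "meta": a dictionary of metadata strings
API reference	Ollama
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama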

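OllamaDocumentEmbedder computes the embeddings of a list of documents and stores the resulting vectors in the embedding field of each document. It uses embedding models compatible with the Ollama library.

The vectors computed by these components are what make embedding retrieval over a collection of documents possible. At retrieval time, the vector representing the query (the one OllamaTextEmbedder produces) is compared with the vectors of the documents to find the most similar or relevant documents.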
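The comparison step can be illustrated with a minimal, library-free sketch. It assumes cosine similarity as the comparison metric and uses made-up 3-dimensional vectors; the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and are not part of Haystack or Ollama. Real embeddings have far more dimensions, and a Retriever handles this ranking for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, document_vectors):
    """Return (document_index, score) pairs, most similar first."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration only.
query_vector = [1.0, 0.0, 1.0]
document_vectors = [
    [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partially aligned with the query
]

ranking = rank_by_similarity(query_vector, document_vectors)
print([index for index, _ in ranking])
```

With these toy values the first document ranks highest and the orthogonal one ranks last.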

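Overview

OllamaTextEmbedder should be used to embed a string. To embed a list of documents, use OllamaDocumentEmbedder.

The component uses http://localhost:11434 as the default URL, because most available setups (Mac, Linux, Docker) default to port 11434.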

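Compatible models

Unless specified otherwise when initializing this component, the default embedding model is "nomic-embed-text". See the Ollama library for the other pre-built models available. To load your own custom model, follow the instructions from Ollama.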

Installation

To start using this integration with Haystack, install the package with:

pip install ollama-haystack

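Make sure that you have a running Ollama model (either through a Docker container or locally hosted). No other configuration is necessary, as Ollama has the embedding API built in.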
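To confirm that the Ollama server is answering before wiring it into Haystack, a minimal standard-library sketch like the one below may help. It assumes Ollama's REST endpoint `/api/embed` (which accepts `model` and `input` and returns an `embeddings` list) and the default URL on port 11434; the helper names `build_embed_request` and `fetch_embedding` are hypothetical and are not part of the integration.

```python
import json
import urllib.request

# Assumption: a local Ollama server on the default port.
DEFAULT_URL = "http://localhost:11434"


def build_embed_request(text, model="nomic-embed-text", base_url=DEFAULT_URL):
    """Build (but do not send) a POST request for Ollama's embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, model="nomic-embed-text", base_url=DEFAULT_URL):
    """Send the request and return the first embedding vector.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_embed_request(text, model=model, base_url=base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["embeddings"][0]
```

Calling `fetch_embedding("hello")` against a running server should return a list of floats; a connection error usually means the server is not running or is listening on a different port.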

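Embedding metadata

Most of the embedding metadata contains information about the model name and type. You can pass optional arguments, such as temperature, top_p, and others, to the Ollama generation endpoint.

The name of the model used is automatically appended as part of the metadata. An example of the metadata structure when using the nomic-embed-text model looks like this: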

{'meta': {'model': 'nomic-embed-text'}}
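As a minimal sketch of how downstream code might read this output, the snippet below inspects a result dictionary shaped as described above, with an "embedding" list and a "meta" dictionary. The sample values are invented for illustration (real vectors are much longer), and `summarize_embedding_result` is a hypothetical helper, not part of the integration.

```python
def summarize_embedding_result(result):
    """Return the model name and vector size from an embedder result dict."""
    embedding = result["embedding"]
    meta = result.get("meta", {})
    if not all(isinstance(value, float) for value in embedding):
        raise TypeError("expected 'embedding' to be a list of floats")
    return {"model": meta.get("model"), "dimensions": len(embedding)}


# Invented sample that mimics the documented output shape.
sample_result = {
    "embedding": [0.12, -0.03, 0.57],
    "meta": {"model": "nomic-embed-text"},
}

summary = summarize_embedding_result(sample_result)
print(summary)
```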

Usage

On its own

from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

embedder = OllamaTextEmbedder()

result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")

print(result['embedding'])

In a pipeline

from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

document_embedder = OllamaDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OllamaTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])
