MongoDBAtlasEmbeddingRetriever
This is an embedding Retriever compatible with the MongoDB Atlas Document Store.
| Most common position in a pipeline | 1. After a Text Embedder and before a PromptBuilder in a RAG pipeline 2. The last component in a semantic search pipeline 3. After a Text Embedder and before an ExtractiveReader in an extractive QA pipeline |
| Mandatory init variables | "document_store": An instance of MongoDBAtlasDocumentStore |
| Mandatory run variables | "query_embedding": A list of floats |
| Output variables | "documents": A list of documents |
| API reference | MongoDB Atlas |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mongodb_atlas |
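The MongoDBAtlasEmbeddingRetriever is an embedding-based Retriever compatible with the MongoDBAtlasDocumentStore. It compares the embeddings of the query and the documents and, based on the result, retrieves the documents most relevant to the query from the Document Store.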
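To make the idea of "comparing embeddings" concrete, here is a minimal pure-Python sketch of ranking documents by similarity to a query embedding. It is an illustration only and not how the integration is implemented: the real comparison runs inside MongoDB Atlas vector search and depends on how the index is configured. Cosine similarity, the toy three-dimensional vectors, and the helper names `cosine_similarity` and `rank_by_embedding` are all assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_embedding(query_embedding, docs, top_k=2):
    """Return the contents of the top_k docs most similar to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, doc["embedding"]), doc["content"])
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in scored[:top_k]]

# Toy data: hypothetical 3-dimensional embeddings, not produced by a real model.
toy_docs = [
    {"content": "Paris", "embedding": [1.0, 0.0, 0.0]},
    {"content": "Berlin", "embedding": [0.0, 1.0, 0.0]},
    {"content": "Rome", "embedding": [0.7, 0.7, 0.0]},
]

top = rank_by_embedding([0.9, 0.1, 0.0], toy_docs, top_k=2)
print(top)
```

In practice the Retriever delegates this ranking to the Document Store, so you only supply the query embedding.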
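Parameters
When using the MongoDBAtlasEmbeddingRetriever in your NLP system, make sure the embeddings of both the query and the documents are available. You can do this by adding a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline.
In addition to query_embedding, the MongoDBAtlasEmbeddingRetriever accepts other optional parameters, including top_k (the maximum number of documents to retrieve) and filters (to narrow down the search space).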
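The optional parameters can be passed at run time together with the query embedding. The sketch below wraps that call in a small helper; `retrieve` is a hypothetical function written for this example, and the `meta.language` field is an assumed metadata field that may not exist on your documents. The filter dictionary follows the general Haystack 2.x filter syntax (`field`, `operator`, `value`).

```python
from typing import Any, Dict, List, Optional

# Assumed metadata field for illustration; use a field that exists on your documents.
language_filter: Dict[str, Any] = {
    "field": "meta.language",
    "operator": "==",
    "value": "en",
}

def retrieve(
    retriever: Any,
    query_embedding: List[float],
    top_k: int = 3,
    filters: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Run the Retriever and return the list stored under its "documents" output."""
    result = retriever.run(
        query_embedding=query_embedding,
        top_k=top_k,
        filters=filters,
    )
    return result["documents"]
```

With a configured MongoDBAtlasEmbeddingRetriever, a call might look like `retrieve(retriever, [0.1] * 384, top_k=3, filters=language_filter)`, where the embedding length has to match the dimension of your vector search index.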
Usage
Installation
To start using MongoDB Atlas with Haystack, install the package with the following command:
pip install mongodb-atlas-haystack
On its own
The Retriever needs an instance of MongoDBAtlasDocumentStore and indexed documents to run.
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever
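# Assumes the MONGO_CONNECTION_STRING environment variable is set. Depending on the
# integration version, the constructor may also require arguments such as
# database_name, collection_name, and vector_search_index; see the API reference.
document_store = MongoDBAtlasDocumentStore()
retriever = MongoDBAtlasEmbeddingRetriever(document_store=document_store)

# Example run with a placeholder 384-dimensional query embedding
retriever.run(query_embedding=[0.1] * 384)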
In a Pipeline
from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever
# Create some example documents
documents = [
    Document(content="My name is Jean and I live in Paris."),
    Document(content="My name is Mark and I live in Berlin."),
    Document(content="My name is Giorgio and I live in Rome."),
]
# Connect to the MongoDB Atlas document store (assumes MONGO_CONNECTION_STRING is set).
document_store = MongoDBAtlasDocumentStore()
# Define some more components
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
query_embedder = SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2")
# Pipeline that ingests document for retrieval
ingestion_pipe = Pipeline()
ingestion_pipe.add_component(instance=doc_embedder, name="doc_embedder")
ingestion_pipe.add_component(instance=doc_writer, name="doc_writer")
ingestion_pipe.connect("doc_embedder.documents", "doc_writer.documents")
ingestion_pipe.run({"doc_embedder": {"documents": documents}})
# Build a RAG pipeline with a Retriever that fetches the documents relevant to
# the query and an OpenAIGenerator that queries an LLM with a custom prompt.
# OpenAIGenerator reads its API key from the OPENAI_API_KEY environment variable.
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
\nQuestion: {{question}}
\nAnswer:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component(instance=query_embedder, name="query_embedder")
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store), name="retriever")
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.connect("query_embedder", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
# Ask a question on the data you just added.
question = "Where does Mark live?"
result = rag_pipeline.run(
    {
        "query_embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
)
# The generated answers are returned under the "replies" output of the "llm" component.
print(result["llm"]["replies"])
