AzureAISearchBM25Retriever
A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store.
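| Most common position in a pipeline | 1. Before a PromptBuilder in a RAG pipeline 2. The last component in a semantic search pipeline 3. Before an ExtractiveReader in an extractive QA pipeline |
| Mandatory init variables | "document_store": An instance of AzureAISearchDocumentStore |
| Mandatory run variables | "query": A string |
| Output variables | "documents": A list of documents (matching the query) |
| API reference | Azure AI Search |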
| GitHub 链接 | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search |
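Overview
The AzureAISearchBM25Retriever is a keyword-based Retriever designed to fetch documents that match a query from an AzureAISearchDocumentStore. It uses the BM25 algorithm, which computes a weighted word overlap between the query and each document to determine their similarity. The Retriever accepts a textual query, but you can also provide a combination of terms with boolean operators. Some examples of valid queries are "pool", "pool spa", and "pool spa +airport".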
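To make "weighted word overlap" more concrete, here is a minimal sketch of a basic BM25 scoring function in plain Python. It is a toy illustration, not the code Azure AI Search runs: the whitespace tokenizer, the function name bm25_scores, and the k1 and b values are assumptions chosen for the example, and boolean operators such as "+airport" are not handled.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "hotel with pool and spa",
    "budget hotel near the airport",
    "city apartment with garden",
]
scores = bm25_scores("pool spa", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the first document shares terms with the query, so it is the only one with a non-zero score. In a real index, the service handles tokenization, analyzers, and scoring for you.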
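In addition to the query, the AzureAISearchBM25Retriever accepts other optional parameters, including top_k (the maximum number of documents to retrieve) and filters (to narrow down the search space).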
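As a rough sketch of how these optional parameters fit together, the helper below assembles the keyword arguments for a run() call. The helper build_run_kwargs and the field name "meta.category" are hypothetical. The filter uses Haystack's general field/operator/value dictionary syntax, and the field would need to exist as a filterable field in your index.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    query: str,
    top_k: Optional[int] = None,
    category: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for retriever.run()."""
    kwargs: Dict[str, Any] = {"query": query}
    if top_k is not None:
        kwargs["top_k"] = top_k  # maximum number of documents to return
    if category is not None:
        # Haystack-style filter: a single comparison on one field.
        # "meta.category" is a hypothetical field name used for illustration.
        kwargs["filters"] = {"field": "meta.category", "operator": "==", "value": category}
    return kwargs


run_kwargs = build_run_kwargs("pool spa", top_k=3, category="hotels")
print(run_kwargs)
# With a configured retriever: result = retriever.run(**run_kwargs)
```

Leaving out top_k or filters produces a call with the query alone, which falls back to whatever defaults the Retriever was initialized with.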
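If your search index includes a semantic configuration, you can enable semantic ranking and apply it to the Retriever's results. For more details, see the Azure AI documentation.
If you want a combination of BM25 and vector retrieval, use the AzureAISearchHybridRetriever, which uses both vector search and BM25 search to match documents to the query.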
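To give an intuition for how keyword and vector results can be combined, the sketch below implements Reciprocal Rank Fusion (RRF), the technique Azure's documentation describes for merging hybrid search results. This is a toy illustration under stated assumptions, not the service's implementation: the function name, the document IDs, and the constant k=60 are chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector results
fused_ranking = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused_ranking)
```

Documents that appear in both lists (doc_b and doc_a here) rise above documents found by only one method, which is the main benefit of a hybrid setup.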
Usage
Installation
This integration requires an active Azure subscription with a deployed Azure AI Search service.
To start using Azure AI Search with Haystack, install the package with:
pip install azure-ai-search-haystack
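The examples on this page create the Document Store with only an index name, so the connection details must come from elsewhere. Here is a minimal pre-flight check, assuming the store reads its endpoint and API key from the environment variables AZURE_AI_SEARCH_ENDPOINT and AZURE_AI_SEARCH_API_KEY. These names are not stated on this page, so verify them in the integration's API reference. The helper missing_settings is hypothetical.

```python
import os


def missing_settings(names, environ=os.environ):
    """Return the names from `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


# Assumed variable names; confirm them against the integration's API reference.
expected = ["AZURE_AI_SEARCH_ENDPOINT", "AZURE_AI_SEARCH_API_KEY"]
missing = missing_settings(expected)
if missing:
    print("Set these environment variables before creating the store:", ", ".join(missing))
```

Running a check like this before building the Document Store can surface a configuration problem earlier than a failed request to the service would.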
On its own
This Retriever needs an AzureAISearchDocumentStore and indexed documents to run.
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore
document_store = AzureAISearchDocumentStore(index_name="haystack_docs")
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]
document_store.write_documents(documents=documents)
retriever = AzureAISearchBM25Retriever(document_store=document_store)
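# The result is a dictionary with the matching Documents under the "documents" key
result = retriever.run(query="How many languages are spoken around the world today?")
print(result["documents"])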
In a RAG pipeline
The example below shows how to use the AzureAISearchBM25Retriever in a RAG pipeline. Set your OPENAI_API_KEY as an environment variable, then run the following code:
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore
from haystack import Document
from haystack import Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.types import DuplicatePolicy
import os
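# Fail early if the key is missing; OpenAIGenerator reads OPENAI_API_KEY from the environment by default
api_key = os.environ['OPENAI_API_KEY']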
# Create a RAG query pipeline
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
\nQuestion: {{question}}
\nAnswer:
"""
document_store = AzureAISearchDocumentStore(index_name="haystack-docs")
# Add Documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]
# policy param is optional, as AzureAISearchDocumentStore has a default policy of DuplicatePolicy.OVERWRITE
document_store.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)
retriever = AzureAISearchBM25Retriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")
question = "Tell me something about languages?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    }
)
print(result['answer_builder']['answers'][0])
