
AzureAISearchBM25Retriever

A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store.

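Most common position in a pipeline:
  1. Before a PromptBuilder in a RAG pipeline
  2. As the last component in a semantic search pipeline
  3. Before an ExtractiveReader in an extractive QA pipeline
Mandatory init variables: "document_store": an instance of AzureAISearchDocumentStore
Mandatory run variables: "query": a string
Output variables: "documents": a list of documents (matching the query)
API reference: Azure AI Search
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search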

Overview

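AzureAISearchBM25Retriever is a keyword-based Retriever that fetches documents matching a query from an AzureAISearchDocumentStore. It uses the BM25 algorithm, which scores the similarity between a query and a document by the weighted overlap of their terms. The Retriever accepts a text query, but you can also combine terms with Boolean operators. Examples of valid queries are "pool", "pool spa", and "pool spa +airport".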
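To make "weighted term overlap" concrete, here is a minimal, self-contained sketch of classic Okapi BM25 scoring in plain Python. It is an illustration of the formula, not Azure AI Search's implementation: the tokenizer is a naive lowercase whitespace split, Boolean operators such as `+` are not handled, and `bm25_scores` plus the sample texts are hypothetical. The defaults k1 = 1.2 and b = 0.75 are the usual textbook values, which Azure AI Search also documents as its BM25 defaults.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    # Naive tokenization: lowercase and split on whitespace (real analyzers do much more)
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms get a larger inverse document frequency weight
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    "hotel with a pool and spa",
    "hotel near the airport",
    "pool hall downtown",
]
print(bm25_scores("pool spa", docs))
```

The document containing both "pool" and "spa" scores highest, the one containing only "pool" scores lower, and the one with neither term scores zero.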

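In addition to `query`, AzureAISearchBM25Retriever accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` (to narrow down the search space).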
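The sketch below illustrates, locally and in plain Python, what these two parameters mean. `filters` follows Haystack's filter dictionary syntax: comparison nodes with `field`/`operator`/`value`, and logical nodes with `operator`/`conditions`. `top_k` caps how many of the remaining, highest-scoring documents come back. The helper names and sample data are hypothetical, only a subset of operators is handled, and this does not reproduce how the Azure service itself evaluates filters.

```python
import operator
from typing import Any, Optional

# A subset of Haystack's comparison operators mapped onto Python functions
_COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(filters: dict[str, Any], meta: dict[str, Any]) -> bool:
    """Evaluate a simplified Haystack-style filter dict against document metadata."""
    op = filters["operator"]
    # Logical nodes combine the results of their nested conditions
    if op == "AND":
        return all(matches(cond, meta) for cond in filters["conditions"])
    if op == "OR":
        return any(matches(cond, meta) for cond in filters["conditions"])
    # Comparison node: "meta.year" refers to the "year" metadata field
    value = meta.get(filters["field"].removeprefix("meta."))
    if value is None:
        # Simplification for this sketch: a missing field never matches
        return False
    return _COMPARISONS[op](value, filters["value"])


def apply_filters_and_top_k(
    scored_docs: list[dict[str, Any]],
    filters: Optional[dict[str, Any]] = None,
    top_k: int = 10,
) -> list[dict[str, Any]]:
    """Keep documents that pass the filters, then return the top_k highest-scoring ones."""
    kept = [doc for doc in scored_docs if filters is None or matches(filters, doc["meta"])]
    return sorted(kept, key=lambda doc: doc["score"], reverse=True)[:top_k]


candidates = [
    {"content": "Pool and spa hotel", "score": 2.1, "meta": {"city": "San Diego", "year": 2023}},
    {"content": "Airport hotel with pool", "score": 1.7, "meta": {"city": "San Diego", "year": 2019}},
    {"content": "Spa resort", "score": 1.2, "meta": {"city": "Maldives", "year": 2024}},
]
recent = {"field": "meta.year", "operator": ">=", "value": 2020}
print(apply_filters_and_top_k(candidates, filters=recent, top_k=1))
```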

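If your search index includes a semantic configuration, you can enable semantic ranking and apply it to the Retriever's results. For more details, see the Azure AI documentation.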

If you want to combine BM25 with vector retrieval, use the AzureAISearchHybridRetriever, which matches documents to queries using both vector search and BM25 search.
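Azure AI Search merges the keyword and vector result lists of a hybrid query with Reciprocal Rank Fusion (RRF). Each document receives the sum of 1 / (k + rank) over every list it appears in. The plain-Python sketch below shows the idea. It uses the commonly cited constant k = 60 (an assumption here), the document IDs are made up, and it is not the service's implementation.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list ordered by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks start at 1; a document missing from a list gets nothing from that list
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword (BM25) results
vector_ranking = ["doc_c", "doc_a", "doc_d"]  # hypothetical vector search results
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

Documents that rank well in both lists, like `doc_a` and `doc_c`, rise above documents that appear in only one list.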

Usage

Installation

This integration requires an active Azure subscription with a deployed Azure AI Search service.

To start using Azure AI Search with Haystack, install the package with:

pip install azure-ai-search-haystack
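The examples on this page create `AzureAISearchDocumentStore` without passing any credentials, so the store has to find the service endpoint and key elsewhere. The minimal pre-flight check below assumes the store reads them by default from the environment variables `AZURE_AI_SEARCH_ENDPOINT` and `AZURE_AI_SEARCH_API_KEY`. Verify these names against the API reference for your version. The helper `missing_settings` is hypothetical and uses only the standard library.

```python
import os
from typing import Mapping, Optional

# Assumption: the environment variables AzureAISearchDocumentStore reads by default
AZURE_SEARCH_VARS = ("AZURE_AI_SEARCH_ENDPOINT", "AZURE_AI_SEARCH_API_KEY")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of the expected Azure AI Search variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_SEARCH_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Set these before creating AzureAISearchDocumentStore:", ", ".join(missing))
```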

On its own

This Retriever needs an AzureAISearchDocumentStore with indexed documents to run.

from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

# Azure AI Search index names may contain only lowercase letters, digits, and dashes
document_store = AzureAISearchDocumentStore(index_name="haystack-docs")
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]
document_store.write_documents(documents=documents)

retriever = AzureAISearchBM25Retriever(document_store=document_store)
result = retriever.run(query="How many languages are spoken around the world today?")
print(result["documents"])
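Azure AI Search rejects index names that break its naming rules, so checking a name locally can save a failed round trip. The sketch below assumes the rules as Microsoft documents them: only lowercase letters, digits, and dashes; no leading or trailing dash; at most 128 characters. The service may enforce further constraints, so treat this check as possibly looser than the real one. `is_valid_index_name` is a hypothetical helper.

```python
import re

# Assumed rules: lowercase letters, digits, dashes; no leading/trailing dash; max 128 chars
_INDEX_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,126}[a-z0-9])?")


def is_valid_index_name(name: str) -> bool:
    """Return True if the name satisfies the assumed Azure AI Search index naming rules."""
    return _INDEX_NAME.fullmatch(name) is not None


print(is_valid_index_name("haystack-docs"), is_valid_index_name("haystack_docs"))
```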

In a RAG pipeline

The example below shows how to use the AzureAISearchBM25Retriever in a RAG pipeline. Set your OPENAI_API_KEY as an environment variable, then run the following code:

from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

from haystack import Document
from haystack import Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.types import DuplicatePolicy

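import os

# OpenAIGenerator reads OPENAI_API_KEY from the environment; this lookup fails early if it is missing
api_key = os.environ["OPENAI_API_KEY"]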

# Create a RAG query pipeline
prompt_template = """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
    """

document_store = AzureAISearchDocumentStore(index_name="haystack-docs")

# Add Documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]

# policy param is optional, as AzureAISearchDocumentStore has a default policy of DuplicatePolicy.OVERWRITE
document_store.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)

retriever = AzureAISearchBM25Retriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")

question = "Tell me something about languages?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    }
)
print(result['answer_builder']['answers'][0])
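To see roughly what the LLM receives, the following sketch reproduces the prompt template's `{% for doc in documents %}` loop with plain string handling instead of Jinja2. It is an approximation: whitespace and indentation differ from what PromptBuilder actually renders, and `render_prompt` is a hypothetical helper.

```python
def render_prompt(documents: list[str], question: str) -> str:
    """Approximate the RAG prompt template with plain strings (whitespace differs from Jinja2)."""
    lines = ["Given these documents, answer the question.", "Documents:"]
    # Mirrors the template loop: one line per retrieved document's content
    lines.extend(documents)
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


retrieved = ["There are over 7,000 languages spoken around the world today."]
print(render_prompt(retrieved, "Tell me something about languages?"))
```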