AzureAISearchDocumentStore - Haystack Documentation


API Reference	Azure AI Search
GitHub Link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search

Azure AI Search is an enterprise-ready search and retrieval system for building RAG-based applications on Azure, with native LLM integrations.

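AzureAISearchDocumentStore supports semantic reranking as well as metadata and content filtering. The Document Store is useful for a variety of tasks, such as generating knowledge-base insights (catalog or document search), information discovery (data exploration), RAG, and automation.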
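Filters are written in Haystack's standard filter syntax: dictionaries with `field`, `operator`, and `value` keys, combined with logical operators such as `AND`. The sketch below assumes the index declares `author` and `year` as metadata fields; `build_filter` is a hypothetical helper, and the field names and values are illustrative only.

```python
from typing import Any, Dict


def build_filter(author: str, min_year: int) -> Dict[str, Any]:
    """Hypothetical helper: match documents by `author` published in or after `min_year`."""
    return {
        "operator": "AND",
        "conditions": [
            # Metadata fields are addressed with the "meta." prefix in Haystack filters.
            {"field": "meta.author", "operator": "==", "value": author},
            {"field": "meta.year", "operator": ">=", "value": min_year},
        ],
    }


filters = build_filter("Jane Doe", 2020)
print(filters)
```

The resulting dictionary can be passed as `filters=` to a Retriever's `run()` method or to `document_store.filter_documents()`.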

Initialization

This integration requires an active Azure subscription with a deployed Azure AI Search service.

Once you have a subscription, install the azure-ai-search-haystack integration:

pip install azure-ai-search-haystack

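To use AzureAISearchDocumentStore, provide your search service endpoint as AZURE_AI_SEARCH_ENDPOINT and, for authentication, an API key as AZURE_AI_SEARCH_API_KEY. If no API key is provided, DefaultAzureCredential is used instead; it tries the credential sources available in your environment, such as environment variables, a managed identity, or an Azure CLI login.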
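The credential choice described above can be summarized as a small pre-flight check. `describe_auth_mode` is a hypothetical helper that only mirrors the documented behavior; the integration reads these variables and selects the credential on its own, so a check like this is optional.

```python
import os
from typing import Mapping, Optional


def describe_auth_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: report which authentication path the settings imply."""
    env = os.environ if env is None else env
    if not env.get("AZURE_AI_SEARCH_ENDPOINT"):
        # The endpoint is required no matter how you authenticate.
        raise ValueError("AZURE_AI_SEARCH_ENDPOINT is not set")
    if env.get("AZURE_AI_SEARCH_API_KEY"):
        return "api_key"
    # Without an API key, DefaultAzureCredential is tried instead.
    return "default_azure_credential"


print(describe_auth_mode({"AZURE_AI_SEARCH_ENDPOINT": "https://example.search.windows.net"}))
```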

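During initialization, the Document Store retrieves the existing search index with the given index_name, or creates a new index if none exists. Note one limitation of AzureAISearchDocumentStore: the fields of an Azure search index cannot be modified through the API after the index is created. Any fields beyond the default ones must therefore be supplied as metadata_fields when the Document Store is initialized. If needed, however, you can modify the fields in the Azure AI portal without deleting the index.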
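Because fields cannot be added through the API later, it can help to check incoming documents against the declared fields before the index is first created. A minimal sketch: `metadata_fields` is passed as a mapping of field names to Python types, which is assumed to match the constructor parameter of your installed version, while `undeclared_meta_keys` is a hypothetical helper. The store is created inside `main()`, which only runs when an endpoint is configured, so the helper can be used without the integration installed.

```python
import os
from typing import Any, Dict, Iterable, List, Set

# Extra metadata fields must be declared when the index is first created.
METADATA_FIELDS: Dict[str, type] = {"author": str, "year": int}


def undeclared_meta_keys(metas: Iterable[Dict[str, Any]], declared: Dict[str, type]) -> Set[str]:
    """Hypothetical helper: meta keys that the index schema would not know about."""
    found: Set[str] = set()
    for meta in metas:
        found.update(meta.keys())
    return found - set(declared)


def main() -> None:
    # Imported here so the helper above works without the integration installed.
    from haystack import Document
    from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

    docs: List[Document] = [
        Document(content="Azure AI Search overview", meta={"author": "Jane Doe", "year": 2024}),
    ]
    missing = undeclared_meta_keys((d.meta for d in docs), METADATA_FIELDS)
    if missing:
        raise ValueError(f"Declare these fields before creating the index: {sorted(missing)}")

    store = AzureAISearchDocumentStore(index_name="haystack-docs", metadata_fields=METADATA_FIELDS)
    store.write_documents(docs)


if __name__ == "__main__" and os.environ.get("AZURE_AI_SEARCH_ENDPOINT"):
    main()
```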

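Before running the following example, we recommend setting the AZURE_AI_SEARCH_API_KEY and AZURE_AI_SEARCH_ENDPOINT environment variables to supply your authentication data.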

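from haystack import Document
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

# Reads AZURE_AI_SEARCH_ENDPOINT (and AZURE_AI_SEARCH_API_KEY, if set) from the environment.
# Connects to the "haystack-docs" index, creating it if it does not exist yet.
document_store = AzureAISearchDocumentStore(index_name="haystack-docs")
document_store.write_documents([
    Document(content="This is the first document."),
    Document(content="This is the second document."),
])
# May print 0 if run immediately after writing, because of indexing latency.
print(document_store.count_documents())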

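Latency Considerations
Because of indexing latency in Azure AI Search, the document count returned in the example above may be zero if you run it immediately after writing. To get accurate results, allow for this delay when retrieving documents from the search index.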
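One way to allow for this delay is to poll until the expected number of documents becomes visible. `wait_for_count` is a hypothetical helper, and the timeout and interval values are illustrative assumptions rather than recommendations from Azure.

```python
import time
from typing import Callable


def wait_for_count(count_fn: Callable[[], int], expected: int,
                   timeout: float = 30.0, interval: float = 1.0) -> int:
    """Hypothetical helper: poll count_fn until it reports at least `expected` documents."""
    deadline = time.monotonic() + timeout
    while True:
        count = count_fn()
        if count >= expected:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} of {expected} documents visible after {timeout}s")
        time.sleep(interval)
```

With a live store, you could call `wait_for_count(document_store.count_documents, expected=2)` right after `write_documents` and before reading the count or querying.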

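You can enable semantic reranking in AzureAISearchDocumentStore by providing a SemanticSearch configuration in index_creation_kwargs during initialization, and then invoking it from a Retriever. For more information, see the Azure AI tutorial on this feature.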
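A sketch of this setup, under stated assumptions: the `SemanticSearch` models come from the `azure-search-documents` SDK; extra keyword arguments to the store (here `semantic_search`) are assumed to be used as index_creation_kwargs; and the BM25 Retriever is assumed to forward `query_type` and `semantic_configuration_name` to Azure's search call. The index name, configuration name, and query are illustrative, and your search service must have semantic ranking enabled. Check the API reference for your installed version.

```python
import os

SEMANTIC_CONFIG_NAME = "my-semantic-config"  # illustrative name


def build_semantic_search(config_name: str = SEMANTIC_CONFIG_NAME):
    """Build an Azure SemanticSearch config that ranks on the `content` field."""
    # Requires the azure-search-documents SDK.
    from azure.search.documents.indexes.models import (
        SemanticConfiguration,
        SemanticField,
        SemanticPrioritizedFields,
        SemanticSearch,
    )

    config = SemanticConfiguration(
        name=config_name,
        prioritized_fields=SemanticPrioritizedFields(
            content_fields=[SemanticField(field_name="content")]
        ),
    )
    return SemanticSearch(configurations=[config])


def main() -> None:
    from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
    from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

    # Assumption: `semantic_search` is passed through as an index creation kwarg.
    store = AzureAISearchDocumentStore(
        index_name="haystack-semantic-docs",
        semantic_search=build_semantic_search(),
    )
    # Assumption: extra kwargs are forwarded to Azure's search call.
    retriever = AzureAISearchBM25Retriever(
        document_store=store,
        query_type="semantic",
        semantic_configuration_name=SEMANTIC_CONFIG_NAME,
    )
    print(retriever.run(query="What is Azure AI Search?")["documents"])


if __name__ == "__main__" and os.environ.get("AZURE_AI_SEARCH_ENDPOINT"):
    main()
```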

Supported Retrievers

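The Haystack Azure AI Search integration includes three Retriever components. Each one uses the Azure AI Search API, so you can choose whichever best fits your pipeline.

AzureAISearchEmbeddingRetriever: takes the embedding of a single query as input and returns a list of matching documents. The query must be embedded beforehand, which you can do with an Embedder component.
AzureAISearchBM25Retriever: a keyword-based Retriever that fetches documents matching a query from the Azure AI Search index.
AzureAISearchHybridRetriever: combines embedding-based retrieval with keyword search to find matching documents in the search index, which often yields more relevant results than either method alone.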
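A minimal sketch of calling all three Retrievers against one store, to show which inputs each expects. It assumes the index already holds documents with embeddings; `search_all` is a hypothetical helper, and the query vector is a placeholder whose length is assumed to match the store's default embedding dimension of 768. In a real pipeline, the query embedding would come from an Embedder component.

```python
import os
from typing import Dict, List


def search_all(query: str, query_embedding: List[float], top_k: int = 5) -> Dict[str, list]:
    """Hypothetical helper: run the BM25, embedding, and hybrid Retrievers side by side."""
    from haystack_integrations.components.retrievers.azure_ai_search import (
        AzureAISearchBM25Retriever,
        AzureAISearchEmbeddingRetriever,
        AzureAISearchHybridRetriever,
    )
    from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

    store = AzureAISearchDocumentStore(index_name="haystack-docs")

    # Keyword search: takes only the query text.
    bm25 = AzureAISearchBM25Retriever(document_store=store, top_k=top_k)
    # Vector search: takes only a precomputed query embedding.
    dense = AzureAISearchEmbeddingRetriever(document_store=store, top_k=top_k)
    # Hybrid search: takes both the text and the embedding.
    hybrid = AzureAISearchHybridRetriever(document_store=store, top_k=top_k)

    return {
        "bm25": bm25.run(query=query)["documents"],
        "embedding": dense.run(query_embedding=query_embedding)["documents"],
        "hybrid": hybrid.run(query=query, query_embedding=query_embedding)["documents"],
    }


if __name__ == "__main__" and os.environ.get("AZURE_AI_SEARCH_ENDPOINT"):
    # Placeholder vector; replace it with a real query embedding.
    results = search_all("What is Azure AI Search?", [0.1] * 768)
    for name, docs in results.items():
        print(name, [d.content for d in docs])
```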
