Module haystack_integrations.components.retrievers.azure_ai_search.embedding_retriever

AzureAISearchEmbeddingRetriever

Retrieves documents from the AzureAISearchDocumentStore using a vector similarity metric. It must be connected to an AzureAISearchDocumentStore to run.
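To make "vector similarity" concrete, here is a minimal pure-Python sketch that ranks stored embeddings against a query embedding by cosine similarity and keeps the best top_k. It is a brute-force illustration only: the helper names and the toy index are hypothetical, and the real service searches its own index rather than scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k_by_similarity(
    query_embedding: Sequence[float],
    indexed: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every stored embedding and return the top_k best (id, score) pairs."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in indexed.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional embeddings keyed by document id.
index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
results = top_k_by_similarity([1.0, 0.1], index, top_k=2)
```

Here `results` lists "doc-a" first and "doc-c" second, because the query points almost along the first axis.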

AzureAISearchEmbeddingRetriever.__init__

def __init__(*,
             document_store: AzureAISearchDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10,
             filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE,
             **kwargs: Any)

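Creates the AzureAISearchEmbeddingRetriever component.

Arguments:

document_store: An instance of AzureAISearchDocumentStore to use with the Retriever.
filters: Filters applied when fetching documents from the Document Store.
top_k: Maximum number of documents to return.
filter_policy: Policy that determines how filters are applied.
kwargs: Additional keyword arguments passed to the Azure AI Search endpoint. Some of the supported parameters:
- query_type: A string indicating the type of query to perform. Possible values are "simple", "full", and "semantic".
- semantic_configuration_name: The name of the semantic configuration to use when processing semantic queries.
For more information on parameters, see the official Azure AI Search documentation.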
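The following sketch shows one way the extra keyword arguments might be assembled and handed to the constructor. The helper names and the configuration name "my-semantic-config" are hypothetical. The import is deferred into the function so the snippet can be loaded without the integration installed, and actually building the retriever requires a configured AzureAISearchDocumentStore.

```python
from typing import Any, Dict


def semantic_query_kwargs(configuration_name: str) -> Dict[str, Any]:
    """Extra keyword arguments that request a semantic query."""
    return {
        "query_type": "semantic",
        "semantic_configuration_name": configuration_name,
    }


def build_retriever(document_store: Any, **search_kwargs: Any) -> Any:
    """Create the retriever; document_store must be an AzureAISearchDocumentStore."""
    # Deferred import: only needed when a retriever is actually built.
    from haystack_integrations.components.retrievers.azure_ai_search.embedding_retriever import (
        AzureAISearchEmbeddingRetriever,
    )

    return AzureAISearchEmbeddingRetriever(
        document_store=document_store,
        top_k=5,
        **search_kwargs,
    )


# "my-semantic-config" is a placeholder for a configuration defined on your index.
extra = semantic_query_kwargs("my-semantic-config")
```

With a store in hand, `build_retriever(store, **extra)` would forward both parameters through `**kwargs`.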

AzureAISearchEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureAISearchEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureAISearchEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.
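The round trip between to_dict and from_dict can be pictured with a small stand-in class. This sketch assumes the common Haystack convention of a dictionary holding a `type` path and an `init_parameters` mapping. This reference does not list the exact keys the real component emits, and `TinyRetriever` is hypothetical.

```python
from typing import Any, Dict, Optional


class TinyRetriever:
    """Hypothetical stand-in used only to illustrate the round trip."""

    def __init__(self, *, top_k: int = 10, filters: Optional[Dict[str, Any]] = None):
        self.top_k = top_k
        self.filters = filters or {}

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: fully qualified class path plus constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__qualname__}",
            "init_parameters": {"top_k": self.top_k, "filters": self.filters},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TinyRetriever":
        # Rebuild the instance from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = TinyRetriever(
    top_k=3,
    filters={"field": "meta.Pages", "operator": ">", "value": 10},
)
restored = TinyRetriever.from_dict(original.to_dict())
```

The restored object carries the same constructor arguments as the original, which is the property that lets serialized pipelines be reloaded.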

AzureAISearchEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None) -> Dict[str, List[Document]]

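Retrieves documents from the AzureAISearchDocumentStore.

Arguments:

query_embedding: A list of floats representing the query embedding.
filters: Filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy chosen when the Retriever was initialized. See the __init__ method docstring for more details.
top_k: Maximum number of documents to retrieve.

Returns:

A dictionary with the following keys:

documents: A list of documents retrieved from the AzureAISearchDocumentStore.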
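How the filters given at initialization interact with those passed to run depends on filter_policy. The sketch below shows one plausible reading of the two policies: under REPLACE (the default in the signature above) runtime filters take over when provided, while under MERGE both sets are combined. The `Policy` enum and `effective_filters` helper are hypothetical stand-ins, and the AND-combination used for MERGE is an assumption rather than the library's confirmed behavior.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


class Policy(Enum):
    """Hypothetical stand-in for FilterPolicy."""

    REPLACE = "replace"
    MERGE = "merge"


def effective_filters(
    policy: Policy,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Decide which filters a single run() call would end up using."""
    if not runtime_filters:
        # Nothing passed at run time: fall back to the init-time filters.
        return init_filters
    if policy is Policy.MERGE and init_filters:
        # Assumed merge semantics: both filter sets must match.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or MERGE with no init-time filters: runtime filters win.
    return runtime_filters


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.Pages", "operator": ">", "value": 10}

replaced = effective_filters(Policy.REPLACE, init, runtime)
merged = effective_filters(Policy.MERGE, init, runtime)
```

In this sketch `replaced` is just the runtime filter, whereas `merged` wraps both filters in a single AND condition.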

Module haystack_integrations.document_stores.azure_ai_search.document_store

AzureAISearchDocumentStore

AzureAISearchDocumentStore.__init__

def __init__(*,
             api_key: Secret = Secret.from_env_var("AZURE_AI_SEARCH_API_KEY",
                                                   strict=False),
             azure_endpoint: Secret = Secret.from_env_var(
                 "AZURE_AI_SEARCH_ENDPOINT", strict=True),
             index_name: str = "default",
             embedding_dimension: int = 768,
             metadata_fields: Optional[Dict[str, Union[SearchField,
                                                       type]]] = None,
             vector_search_configuration: Optional[VectorSearch] = None,
             include_search_metadata: bool = False,
             **index_creation_kwargs: Any)

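A document store that uses Azure AI Search as the backend.

Arguments:

azure_endpoint: The URL endpoint of an Azure AI Search service.
api_key: The API key to use for authentication.
index_name: Name of the index in Azure AI Search. The index is created if it does not exist.
embedding_dimension: Dimension of the embeddings.
metadata_fields: A dictionary mapping metadata field names to their corresponding field definitions. Each field can be defined as either:
- a SearchField object, to specify a detailed field configuration such as type, searchability, and filterability
- a Python type (str, bool, int, float, or datetime), to create a simple filterable field

These fields are added automatically when the search index is created. Example:

metadata_fields={
    "Title": SearchField(
        name="Title",
        type="Edm.String",
        searchable=True,
        filterable=True
    ),
    "Pages": int
}

vector_search_configuration: Configuration options related to vector search. The default configuration uses the HNSW algorithm with cosine similarity to handle vector searches.
include_search_metadata: Whether to include Azure AI Search metadata fields in the returned documents. When set to True, the meta field of the returned documents contains @search.score, @search.reranker_score, @search.highlights, @search.captions, and other fields returned by Azure AI Search.
index_creation_kwargs: Optional keyword arguments passed to the SearchIndex class during index creation. Some of the supported parameters:
- semantic_search: Defines the semantic configuration of the search index. This parameter is required to enable semantic search capabilities in the index.
- similarity: The type of similarity algorithm used when scoring and ranking the documents that match a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on an existing index.

For more information on parameters, see the official Azure AI Search documentation.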
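The two ways of declaring an entry in metadata_fields can be illustrated with a small, self-contained sketch. It assumes a mapping from Python types to Azure AI Search EDM type names. The mapping table, the helper name, and the use of plain dictionaries in place of SearchField objects are all assumptions made for illustration, not the integration's actual code.

```python
from datetime import datetime
from typing import Any, Dict

# Assumed mapping from Python types to EDM type names.
PYTHON_TO_EDM = {
    str: "Edm.String",
    bool: "Edm.Boolean",
    int: "Edm.Int32",
    float: "Edm.Double",
    datetime: "Edm.DateTimeOffset",
}


def describe_metadata_fields(metadata_fields: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Expand bare Python types into simple filterable field descriptions."""
    described: Dict[str, Dict[str, Any]] = {}
    for name, definition in metadata_fields.items():
        if isinstance(definition, type):
            if definition not in PYTHON_TO_EDM:
                raise ValueError(f"unsupported type for field {name!r}: {definition!r}")
            # A bare type becomes a simple filterable field.
            described[name] = {"type": PYTHON_TO_EDM[definition], "filterable": True}
        else:
            # Anything else is treated as a full field definition and kept as-is.
            described[name] = dict(definition)
    return described


fields = describe_metadata_fields(
    {
        # A plain dict stands in for a SearchField object here.
        "Title": {"type": "Edm.String", "searchable": True, "filterable": True},
        "Pages": int,
    }
)
```

In this sketch "Pages" expands to a filterable Edm.Int32 field, while "Title" keeps the explicit configuration it was given.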

AzureAISearchDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureAISearchDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureAISearchDocumentStore"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

AzureAISearchDocumentStore.count_documents

def count_documents() -> int

Returns how many documents are present in the search index.

Returns:

The number of documents in the search index.

AzureAISearchDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

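Writes the provided documents to the search index.

Arguments:

documents: Documents to write to the index.
policy: Policy that determines how duplicates are handled.

Raises:

ValueError: If the documents are not of type Document.
TypeError: If a document ID is not a string.

Returns:

The number of documents added to the index.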
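The two error conditions listed above can be pictured as a validation pass that runs before anything is sent to the index. The sketch below uses a hypothetical `Doc` dataclass in place of Haystack's Document and a hypothetical `validate_documents` helper. It mirrors only the documented exceptions and return value, not the store's real implementation.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""

    id: Any
    content: str


def validate_documents(documents: List[Any]) -> int:
    """Check the inputs and return how many documents would be written."""
    for doc in documents:
        if not isinstance(doc, Doc):
            raise ValueError("every item in 'documents' must be a Doc")
        if not isinstance(doc.id, str):
            raise TypeError(f"document id {doc.id!r} is not a string")
    return len(documents)


written = validate_documents([Doc(id="1", content="first"), Doc(id="2", content="second")])
```

Passing a plain string instead of a `Doc` raises ValueError, and a `Doc` whose id is an integer raises TypeError.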

AzureAISearchDocumentStore.delete_documents

def delete_documents(document_ids: List[str]) -> None

Deletes all documents with a matching document_ids from the search index.

Arguments:

document_ids: IDs of the documents to delete.

AzureAISearchDocumentStore.search_documents

def search_documents(search_text: str = "*",
                     top_k: int = 10) -> List[Document]

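Returns all documents that match the provided search_text.

If search_text is None, all documents are returned.

Arguments:

search_text: The text to search for in the Document list.
top_k: Maximum number of documents to return.

Returns:

A list of Documents that match the given search_text.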

AzureAISearchDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

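Returns the documents that match the provided filters.

Filters should be given as a dictionary and support filtering by metadata. For details on filters, see the Metadata Filtering documentation.

Arguments:

filters: The filters to apply to the document list.

Returns:

A list of Documents that match the given filters.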
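Azure AI Search expresses filters in OData syntax, so a filter dictionary has to be translated at some point. The sketch below shows how a Haystack-style filter dictionary could map to an OData expression. It assumes the `operator`/`conditions`/`field`/`value` layout and assumes that a leading `meta.` prefix is dropped from field names. The function names are hypothetical, and this is not the integration's own converter.

```python
from typing import Any, Dict

COMPARISON = {"==": "eq", "!=": "ne", ">": "gt", ">=": "ge", "<": "lt", "<=": "le"}
LOGICAL = {"AND": "and", "OR": "or"}


def odata_literal(value: Any) -> str:
    """Render a Python value as an OData literal."""
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Single quotes inside OData strings are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"
    raise ValueError(f"unsupported filter value: {value!r}")


def to_odata(filters: Dict[str, Any]) -> str:
    """Recursively translate a filter dictionary into an OData expression."""
    if "conditions" in filters:
        joiner = f" {LOGICAL[filters['operator']]} "
        return joiner.join(f"({to_odata(cond)})" for cond in filters["conditions"])
    field = filters["field"]
    if field.startswith("meta."):
        # Assumption: metadata is stored as top-level index fields.
        field = field[len("meta."):]
    return f"{field} {COMPARISON[filters['operator']]} {odata_literal(filters['value'])}"


expression = to_odata(
    {
        "operator": "AND",
        "conditions": [
            {"field": "meta.Pages", "operator": ">=", "value": 10},
            {"field": "meta.Title", "operator": "==", "value": "Intro"},
        ],
    }
)
```

For the filter above, `expression` is `(Pages ge 10) and (Title eq 'Intro')`.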
