SentenceTransformersDiversityRanker

This is a diversity Ranker based on Sentence Transformers.


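Most common position in a pipeline	In a query pipeline, after a component that returns a list of documents, such as a Retriever
Mandatory init variables	"token": The Hugging Face API token. Can be set with the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
Mandatory run variables	"documents": A list of documents; "query": A query string
Output variables	"documents": A list of documents
API reference	Rankers
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/sentence_transformers_diversity.py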

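Overview

The `SentenceTransformersDiversityRanker` orders documents so as to maximize the overall diversity of the returned list. It ranks the documents using their similarity to the query, and it embeds both the query and the documents with a pre-trained Sentence Transformers model.

The default model for this Ranker is `sentence-transformers/all-MiniLM-L6-v2`.

You can optionally set the `top_k` parameter, which specifies the maximum number of documents to return. If you don't set this parameter, the component returns all the documents it receives.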
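To make the idea of a diversity-maximizing order more concrete, here is a minimal pure-Python sketch of one possible greedy approach. It is an illustration under stated assumptions, not the component's actual implementation: the function names `cosine` and `greedy_diversity_order`, the toy 2-D "embeddings", and the selection rule (start with the document closest to the query, then repeatedly pick the document least similar on average to those already chosen) are all chosen for this example. The `top_k` argument of the sketch only mimics the cut-off behavior described above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_diversity_order(query_vec, doc_vecs, top_k=None):
    """Return document indices in a greedy, diversity-oriented order.

    Hypothetical helper for illustration only; the real component
    may use a different strategy.
    """
    if not doc_vecs:
        return []

    remaining = list(range(len(doc_vecs)))

    # Step 1: start with the document most similar to the query.
    first = max(remaining, key=lambda i: cosine(query_vec, doc_vecs[i]))
    selected = [first]
    remaining.remove(first)

    # Step 2: repeatedly add the document that is, on average,
    # least similar to the documents selected so far.
    while remaining:
        def avg_similarity(i):
            total = sum(cosine(doc_vecs[i], doc_vecs[j]) for j in selected)
            return total / len(selected)

        next_doc = min(remaining, key=avg_similarity)
        selected.append(next_doc)
        remaining.remove(next_doc)

    # Step 3: optionally keep only the first top_k documents.
    return selected if top_k is None else selected[:top_k]


# Made-up 2-D vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [
    [0.9, 0.1],  # index 0: close to the query
    [0.8, 0.2],  # index 1: near-duplicate of index 0
    [0.1, 0.9],  # index 2: very different from the other two
]

print(greedy_diversity_order(query, docs))           # [0, 2, 1]
print(greedy_diversity_order(query, docs, top_k=2))  # [0, 2]
```

In this toy run the near-duplicate document (index 1) is pushed to the end, so a `top_k` of 2 keeps the two documents that differ most from each other while still leading with the one closest to the query.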

Find the full list of optional initialization parameters in our API reference.

Usage

On its own

from haystack import Document
from haystack.components.rankers import SentenceTransformersDiversityRanker

ranker = SentenceTransformersDiversityRanker(model="sentence-transformers/all-MiniLM-L6-v2", similarity="cosine")
# Load the embedding model before the first call to run()
ranker.warm_up()

docs = [
    Document(content="Regular Exercise"),
    Document(content="Balanced Nutrition"),
    Document(content="Positive Mindset"),
    Document(content="Eating Well"),
    Document(content="Doing physical activities"),
    Document(content="Thinking positively"),
]

query = "How can I maintain physical fitness?"
output = ranker.run(query=query, documents=docs)
docs = output["documents"]

# The same documents, reordered by the Ranker
print(docs)

In a pipeline

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import SentenceTransformersDiversityRanker

docs = [Document(content="The iconic Eiffel Tower is a symbol of Paris"),
        Document(content="Visit Luxembourg Gardens for a haven of tranquility in Paris"),
        Document(content="The Pont Alexandre III bridge in Paris is famous for its Beaux-Arts style")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
# Use the default model and settings
ranker = SentenceTransformersDiversityRanker()

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Most famous iconic sight in Paris"
document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                   "ranker": {"query": query, "top_k": 2}})
