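Most common position in a pipeline	In a query pipeline, after a component that returns a list of documents, such as a Retriever
Mandatory run variables	"documents": A list of documents
Output variables	"documents": A list of documents
API reference	Rankers
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/lost_in_the_middle.py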

Overview

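The LostInTheMiddleRanker reorders documents according to the "Lost in the Middle" order described in the research paper "Lost in the Middle: How Language Models Use Long Contexts". It lays out passages in the LLM context so that the most relevant ones sit at the beginning or the end of the input, and the least relevant ones sit in the middle. This reordering helps when you send very long contexts to an LLM, because current models pay more attention to the start and end of a long input context.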

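Unlike other Rankers, LostInTheMiddleRanker assumes that the input documents are already sorted by relevance, and it does not take a query as input. It is typically the last component before the prompt for an LLM is built, where it prepares the LLM's input context.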
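The ordering described above can be sketched in a few lines of plain Python. This is an illustration only, not the component's actual implementation: the helper name `lost_in_the_middle_order` is hypothetical, and the sketch assumes the input list is already sorted from most to least relevant. Items at odd ranks (1st, 3rd, 5th, ...) fill the front of the result, and items at even ranks (2nd, 4th, ...) fill the back in reverse, so the least relevant items end up in the middle.

```python
from typing import List, TypeVar

T = TypeVar("T")


def lost_in_the_middle_order(items: List[T]) -> List[T]:
    """Hypothetical helper: reorder a relevance-sorted list so that the
    most relevant items sit at both ends and the least relevant ones
    sit in the middle."""
    front = items[0::2]  # ranks 1, 3, 5, ... stay in order at the front
    back = items[1::2]   # ranks 2, 4, 6, ... go to the back, reversed
    return front + back[::-1]


# Ranks 1..5, where 1 is the most relevant document
reordered = lost_in_the_middle_order([1, 2, 3, 4, 5])
print(reordered)  # [1, 3, 5, 4, 2]
```

With five ranked items, the top two land at the two ends of the list and rank 5 lands in the center.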

Parameters

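If you specify word_count_threshold when running the component, the Ranker includes documents up to the point where adding the next document would exceed the given threshold. The document that crosses the threshold is still included in the resulting list, but all documents after it are discarded.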

You can also specify the top_k parameter to set the maximum number of documents to return.
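To make the two parameters concrete, here is a rough sketch of the selection step alone, before any reordering. It is not the library's code. The function name `select_documents` is hypothetical, and the sketch makes three assumptions: words are counted by splitting on whitespace, `top_k` is applied before the word count, and a running total exactly equal to the threshold does not count as exceeding it.

```python
from typing import List, Optional


def select_documents(
    texts: List[str],
    top_k: Optional[int] = None,
    word_count_threshold: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: keep at most top_k texts, then stop after the
    first text that pushes the running word count over the threshold."""
    candidates = texts[:top_k] if top_k else list(texts)
    if not word_count_threshold:
        return candidates

    kept: List[str] = []
    word_count = 0
    for text in candidates:
        kept.append(text)  # the text that crosses the threshold is kept
        word_count += len(text.split())  # assumption: whitespace word count
        if word_count > word_count_threshold:
            break  # everything after the crossing text is dropped
    return kept


texts = ["a b c", "d e", "f g h i", "j"]
print(select_documents(texts, word_count_threshold=4))  # ['a b c', 'd e']
print(select_documents(texts, top_k=1))                 # ['a b c']
```

In the first call, the running count goes from 3 to 5 words at the second text, which crosses the threshold of 4, so that text is kept and the remaining two are dropped.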

Usage

On its own

from haystack import Document
from haystack.components.rankers import LostInTheMiddleRanker

ranker = LostInTheMiddleRanker()

# The ranker assumes the documents are already sorted by relevance,
# most relevant first.
docs = [
    Document(content="Paris"),
    Document(content="Berlin"),
    Document(content="Madrid"),
]
result = ranker.run(documents=docs)

for doc in result["documents"]:
    print(doc.content)

In a pipeline

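Note that this example requires an OpenAI key to run. By default, OpenAIChatGenerator reads the key from the OPENAI_API_KEY environment variable.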

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import LostInTheMiddleRanker
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

# Define prompt template
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{query}}\nAnswer:"
    )
]

# Define documents
docs = [
    Document(content="Paris is in France..."),
    Document(content="Berlin is in Germany..."),
    Document(content="Lyon is in France...")
]

document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = LostInTheMiddleRanker(word_count_threshold=1024)
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"query", "documents"})
generator = OpenAIChatGenerator()

p = Pipeline()
p.add_component(instance=retriever, name="retriever")
p.add_component(instance=ranker, name="ranker")
p.add_component(instance=prompt_builder, name="prompt_builder")
p.add_component(instance=generator, name="llm")

p.connect("retriever.documents", "ranker.documents")
p.connect("ranker.documents", "prompt_builder.documents")
p.connect("prompt_builder.messages", "llm.messages")

p.run({
    "retriever": {"query": "What cities are in France?", "top_k": 3},
    "prompt_builder": {"query": "What cities are in France?"}
})