Components

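Components are the building blocks of a pipeline. They perform tasks such as preprocessing, retrieving, or summarizing text, and they can route queries through different branches of the pipeline. This page summarizes all component types available in Haystack.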

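Components are connected to each other in pipelines, and they work like building blocks that you can easily swap for one another. A component can take selected outputs of other components as its input. You can also provide inputs to a component when you call pipeline.run().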
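To make this wiring concrete, here is a minimal plain-Python sketch. The functions retriever_run and ranker_run are hypothetical stand-ins, not Haystack components. Each one returns a dict keyed by output name. The ranker receives the retriever's selected "documents" output, while query and top_k are passed in directly by the caller, which is roughly what you do through pipeline.run().

```python
from typing import List, Optional


def retriever_run(query: str) -> dict:
    # Hypothetical retriever: keep corpus entries that share a word with the query.
    corpus = ["Pizza is Italian.", "Sushi is Japanese.", "I love pizza!"]
    words = query.lower().split()
    hits = [text for text in corpus if any(w in text.lower() for w in words)]
    return {"documents": hits}


def ranker_run(query: str, documents: List[str], top_k: Optional[int] = None) -> dict:
    # Hypothetical ranker: score each document by how many query words it contains.
    words = query.lower().split()
    ranked = sorted(documents, key=lambda d: sum(w in d.lower() for w in words), reverse=True)
    return {"documents": ranked[:top_k] if top_k is not None else ranked}


query = "love pizza"
retrieved = retriever_run(query=query)
# The retriever's "documents" output feeds the ranker's "documents" input;
# "query" and "top_k" are supplied by the caller.
result = ranker_run(query=query, documents=retrieved["documents"], top_k=1)
print(result["documents"])  # ['I love pizza!']
```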

Standalone or in Pipelines

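You can integrate components into a pipeline to perform specific tasks, but you can also use some of them on their own, outside a pipeline. For example, you can run DocumentWriter standalone to write documents into a document store. To see how to use a component and whether it can be used outside a pipeline, check the "Usage" section on the component's documentation page.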

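Every component has a run() method. When you connect components in a pipeline and run the pipeline by calling Pipeline.run(), the pipeline calls the run() method of each component in turn.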
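The sketch below is a hypothetical toy model of this behavior, not Haystack's Pipeline class. ToyPipeline, Upper, and Exclaim are made-up names. As a simplification, it runs components in the order they were added. Each component receives the inputs given at call time plus any connected upstream outputs, and the pipeline then calls that component's own run() method.

```python
from typing import Any, Dict, List, Tuple


class Upper:
    def run(self, text: str) -> dict:
        return {"text": text.upper()}


class Exclaim:
    def run(self, text: str) -> dict:
        return {"text": text + "!"}


class ToyPipeline:
    """Hypothetical minimal pipeline: runs components in insertion order."""

    def __init__(self) -> None:
        self.components: Dict[str, Any] = {}
        self.edges: List[Tuple[str, str, str, str]] = []  # (sender, output, receiver, input)
        self.call_log: List[str] = []

    def add_component(self, name: str, component: Any) -> None:
        self.components[name] = component

    def connect(self, sender: str, receiver: str) -> None:
        # Connections are written as "component_name.socket_name".
        s_name, s_out = sender.split(".")
        r_name, r_in = receiver.split(".")
        self.edges.append((s_name, s_out, r_name, r_in))

    def run(self, data: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        outputs: Dict[str, Dict[str, Any]] = {}
        for name, component in self.components.items():
            kwargs = dict(data.get(name, {}))  # inputs passed directly at call time
            for s_name, s_out, r_name, r_in in self.edges:
                if r_name == name:
                    kwargs[r_in] = outputs[s_name][s_out]  # inputs from upstream outputs
            self.call_log.append(name)
            outputs[name] = component.run(**kwargs)  # call the component's own run()
        return outputs


pipe = ToyPipeline()
pipe.add_component("upper", Upper())
pipe.add_component("exclaim", Exclaim())
pipe.connect("upper.text", "exclaim.text")
result = pipe.run({"upper": {"text": "hello"}})
print(result["exclaim"]["text"])  # HELLO!
```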

Input and Output

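To connect components in a pipeline, you need to know the names of the inputs and outputs they accept. The output of one component must be compatible with the input accepted by the next component. For example, to connect a Retriever and a Ranker in a pipeline, you must know that the Retriever outputs documents and that the Ranker accepts documents as input.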
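One way to picture this compatibility requirement is a small check. It compares a sender's declared output names and types with the parameters of the receiver's run() method. This is a plain-Python sketch that uses the standard inspect module. ToyRetriever, ToyRanker, the output_types attribute, and can_connect() are all hypothetical, and how a real pipeline performs this validation is an assumption here.

```python
import inspect
from typing import List


class ToyRetriever:
    # Hypothetical declaration of this component's output names and types.
    output_types = {"documents": List[str]}

    def run(self, query: str) -> dict:
        return {"documents": [query]}


class ToyRanker:
    def run(self, query: str, documents: List[str], top_k: int = 3) -> dict:
        return {"documents": documents[:top_k]}


def can_connect(sender, output_name: str, receiver, input_name: str) -> bool:
    """Return True if the sender's output and the receiver's input match by name and type."""
    if output_name not in sender.output_types:
        return False
    params = inspect.signature(receiver.run).parameters  # bound method: "self" is excluded
    if input_name not in params:
        return False
    return params[input_name].annotation == sender.output_types[output_name]


retriever, ranker = ToyRetriever(), ToyRanker()
print(can_connect(retriever, "documents", ranker, "documents"))  # True
print(can_connect(retriever, "documents", ranker, "query"))      # False: str vs List[str]
print(can_connect(retriever, "docs", ranker, "documents"))       # False: no such output
```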

The mandatory inputs and outputs are listed at the top of each component's documentation page, so you can check them quickly.

You can also find them in the code of the component's run() method. Here is the run() method of TransformerSimilarityRanker:

@component.output_types(documents=List[Document])  # "documents" is the output name you need when connecting components in a pipeline
def run(self, query: str, documents: List[Document], top_k: Optional[int] = None):  # "query" and "documents" are the mandatory inputs; top_k is an optional parameter
    """
    Returns a list of Documents ranked by their similarity to the given query.

    :param query: Query string.
    :param documents: List of Documents.
    :param top_k: The maximum number of Documents you want the Ranker to return.
    :return: List of Documents sorted by their similarity to the query with the most similar Documents appearing first.
    """

Warming Up Components

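Components that use heavy resources, such as LLMs or embedding models, also have a warm_up() method. When you run such a component on its own, you must call warm_up() after initializing it but before running it, like this: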

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder() # First, initialize the component
doc_embedder.warm_up() # Then, warm it up to load the model

result = doc_embedder.run([doc]) # And finally, run it
print(result['documents'][0].embedding)

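If you use a component that has a warm_up() method in a pipeline, you don't need to do anything extra. The pipeline takes care of warming it up before running it.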

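The warm_up() method is a good way to keep the __init__() method lightweight and validation fast. (In a pipeline, validation happens after the components are connected but before they are warmed up and run.)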
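The pattern can be sketched with a hypothetical ToyEmbedder, which is not a Haystack class. Its __init__() only stores configuration, the expensive "model loading" is deferred to warm_up(), and run() refuses to work until the component has been warmed up. The dictionary that stands in for a model, the toy embedding, and the error message are assumptions made for illustration.

```python
from typing import List, Optional


class ToyEmbedder:
    """Hypothetical component: cheap __init__(), expensive setup deferred to warm_up()."""

    def __init__(self, dim: int = 3) -> None:
        # Only store configuration here, so construction and validation stay fast.
        self.dim = dim
        self.model: Optional[dict] = None

    def warm_up(self) -> None:
        # Idempotent: load the "model" only if it is not loaded yet.
        if self.model is None:
            # Stand-in for loading a large model from disk or the network.
            self.model = {"weights": [1.0] * self.dim}

    def run(self, texts: List[str]) -> dict:
        if self.model is None:
            raise RuntimeError("Call warm_up() before run().")
        weights = self.model["weights"]
        # Toy "embedding": each text becomes its length times the weight vector.
        return {"embeddings": [[len(t) * w for w in weights] for t in texts]}


embedder = ToyEmbedder()
try:
    embedder.run(["hi"])
except RuntimeError as err:
    error_message = str(err)
    print(error_message)

embedder.warm_up()
embedder.warm_up()  # a second call does nothing
result = embedder.run(["hi"])
print(result["embeddings"])  # [[2.0, 2.0, 2.0]]
```

Because warm_up() checks whether the model is already loaded, calling it more than once is harmless. This idempotence is a design choice of the sketch.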
