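Most common position in a pipeline	Before a `ChatPromptBuilder` in a query pipeline
Mandatory run variables	"documents": A list of documents to process. Each document's metadata must contain at least a 'file_path_meta_field' key. PDF documents also need a 'page_number' key that specifies which page to convert.
Output variables	"image_contents": A list of `ImageContent` objects
API reference	Image Converters
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/document_to_image.py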

Overview

`DocumentToImageContent` processes a list of documents that contain image or PDF file paths and converts them into `ImageContent` objects.

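For images, it reads and encodes the file directly.
For PDFs, it extracts the specified page (given by `page_number` in the metadata) and converts it to an image.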
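
The metadata requirements above can be checked before conversion. The following is a minimal sketch, not part of the component: `missing_meta_keys` is a hypothetical helper, plain dictionaries stand in for `Document.meta`, and detecting PDFs by the `.pdf` extension is an assumption made for illustration.

```python
from pathlib import PurePath


def missing_meta_keys(meta: dict, file_path_meta_field: str = "file_path") -> list:
    """Return the metadata keys a document still needs before conversion.

    Hypothetical pre-check; the component performs its own validation.
    """
    path = meta.get(file_path_meta_field)
    if path is None:
        # Without a file path there is nothing to convert.
        return [file_path_meta_field]
    # Assumption: a PDF is recognized by its file extension.
    is_pdf = PurePath(str(path)).suffix.lower() == ".pdf"
    if is_pdf and "page_number" not in meta:
        return ["page_number"]
    return []


print(missing_meta_keys({"file_path": "mountain.jpg"}))  # []
print(missing_meta_keys({"file_path": "report.pdf"}))    # ['page_number']
```

Running a check like this up front gives a clear message about which document is incomplete before any file is read.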

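By default, it looks for the file path in the `file_path` metadata field. You can change this with the `file_path_meta_field` parameter. The `root_path` parameter lets you specify a common base directory for resolving file paths.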
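
To illustrate how `file_path_meta_field` and `root_path` interact, here is a small sketch of the path lookup. `resolve_file_path` is a hypothetical helper, and joining the base directory with the metadata value is an assumption about the resolution step, not the component's actual code.

```python
from pathlib import PurePosixPath


def resolve_file_path(meta: dict, file_path_meta_field: str = "file_path", root_path: str = "") -> PurePosixPath:
    """Join the base directory with the path stored in the document metadata.

    Hypothetical helper; PurePosixPath keeps the result identical on every OS.
    """
    return PurePosixPath(root_path) / meta[file_path_meta_field]


# Default field name with a shared base directory.
print(resolve_file_path({"file_path": "mountain.jpg"}, root_path="/data/documents"))
# /data/documents/mountain.jpg

# Custom metadata field without a base directory.
print(resolve_file_path({"image_path": "cat.jpg"}, file_path_meta_field="image_path"))
# cat.jpg
```

Storing relative paths in the metadata and supplying `root_path` once keeps documents portable across machines with different directory layouts.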

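This component is typically used in a query pipeline, right before a `ChatPromptBuilder`, when you want to add images to the user prompt.

If `size` is provided, the images are resized while preserving their aspect ratio. This reduces file size, memory usage, and processing time, which helps when working with models that have resolution limits or when sending images to remote services.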
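
As a rough illustration of aspect-ratio-preserving resizing, the sketch below computes the largest dimensions that fit inside a `size` box. `fit_within` is a hypothetical function; the rounding and the choice never to upscale are assumptions, and the component's actual behavior may differ.

```python
def fit_within(width: int, height: int, size: tuple) -> tuple:
    """Scale (width, height) to fit inside size=(max_width, max_height).

    Hypothetical sketch: keeps the aspect ratio and never upscales.
    """
    max_width, max_height = size
    # Use the tighter of the two constraints; cap at 1.0 to avoid upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1600, 1200, (800, 600)))  # (800, 600)
print(fit_within(4000, 1000, (800, 600)))  # (800, 200)
print(fit_within(400, 300, (800, 600)))    # (400, 300)
```

The second call shows why the result can be smaller than the requested box: a wide image is limited by its width, so its height ends up well below 600.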

Usage

On its own

from haystack import Document
from haystack.components.converters.image.document_to_image import DocumentToImageContent

converter = DocumentToImageContent(
    file_path_meta_field="file_path",
    root_path="/data/documents",
    detail="high",
    size=(800, 600)
)

documents = [
    Document(content="Photo of a mountain", meta={"file_path": "mountain.jpg"}),
    Document(content="First page of a report", meta={"file_path": "report.pdf", "page_number": 1})
]

result = converter.run(documents)
image_contents = result["image_contents"]
print(image_contents)

# [
#     ImageContent(
#         base64_image="/9j/4A...", mime_type="image/jpeg", detail="high",
#         meta={"file_path": "mountain.jpg"}
#     ),
#     ImageContent(
#         base64_image="/9j/4A...", mime_type="image/jpeg", detail="high",
#         meta={"file_path": "report.pdf", "page_number": 1}
#     )
# ]

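In a pipeline

You can use `DocumentToImageContent` in multimodal indexing pipelines before passing the images to an Embedder or a captioning model. The example below uses it in a query pipeline, ahead of a `ChatPromptBuilder`.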

from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.converters.image.document_to_image import DocumentToImageContent

# Query pipeline
pipeline = Pipeline()
pipeline.add_component("image_converter", DocumentToImageContent(detail="auto"))
pipeline.add_component(
    "chat_prompt_builder",
    ChatPromptBuilder(
        required_variables=["question"],
        template="""{% message role="system" %}
You are a friendly assistant that answers questions based on provided images.
{% endmessage %}

{%- message role="user" -%}
Only provide an answer to the question using the images provided.

Question: {{ question }}
Answer:

{%- for img in image_contents -%}
  {{ img | templatize_part }}
{%- endfor -%}
{%- endmessage -%}
""",
    )
)
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))

pipeline.connect("image_converter", "chat_prompt_builder.image_contents")
pipeline.connect("chat_prompt_builder", "llm")

documents = [
    Document(content="Cat image", meta={"file_path": "cat.jpg"}),
    Document(content="Doc intro", meta={"file_path": "paper.pdf", "page_number": 1}),
]

result = pipeline.run(
    data={
        "image_converter": {"documents": documents},
        "chat_prompt_builder": {"question": "What color is the cat?"}
    }
)
print(result)

# {
# "llm": {
#     "replies": [
#         ChatMessage(
#             _role=<ChatRole.ASSISTANT: 'assistant'>,
#             _content=[TextContent(text="The cat is orange with some black.")],
#             _name=None,
#             _meta={
#                 "model": "gpt-4o-mini-2024-07-18",
#                 "index": 0,
#                 "finish_reason": "stop",
#                 "usage": {...},
#             },
#         )
#     ]
# }
# }

Additional References

🧑‍🍳 Cookbook: Introduction to Multimodality