
JinaReaderConnector

Use Jina AI's Reader API with Haystack.

Most common position in a pipeline: as the first component, passing the resulting documents to downstream components
Mandatory init variables: "mode": the Reader's operation mode (read, search, or ground)
"api_key": the Jina API key. Can be set with the JINA_API_KEY environment variable.
Mandatory run variables: "query": the query string
Output variables: "documents": a list of documents
API reference: Jina
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina

Overview

JinaReaderConnector interacts with Jina AI's Reader API to process a query and output documents.

When initializing the component, you select one of the following operation modes:

  • read: process a URL and extract the text content.
  • search: search the web and return text content from the most relevant pages.
  • ground: perform fact-checking using a grounding engine.

For more information about these modes, see the Jina Reader documentation.

You can also control the response format of the Jina Reader API through the component's json_response parameter:

  • True (default) requests a JSON response, yielding documents enriched with structured metadata.
  • False requests the raw response, yielding a single document with minimal metadata.

Authorization

By default, the component uses the Jina API key set through the JINA_API_KEY environment variable. Otherwise, you can pass the key at initialization with api_key, as follows:

reader = JinaReaderConnector(mode="read", api_key=Secret.from_token("<your-api-key>"))

To get your API key, visit Jina AI's website.

Installation

To get started with this integration in Haystack, install the package with the following command:

pip install jina-haystack

Usage

On its own

Read mode

from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="read")
query = "https://example.com"
result = reader.run(query=query)

print(result)
# {'documents': [Document(id=fa3e51e4ca91828086dca4f359b6e1ea2881e358f83b41b53c84616cb0b2f7cf,
# content: 'This domain is for use in illustrative examples in documents. You may use this domain in literature ...',
# meta: {'title': 'Example Domain', 'description': '', 'url': 'https://example.com/', 'usage': {'tokens': 42}})]}

Search mode

from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="search")
query = "UEFA Champions League 2024"
result = reader.run(query=query)

print(result)
# {'documents': [Document(id=6a71abf9955594232037321a476d39a835c0cb7bc575d886ee0087c973c95940,
# content: '2024/25 UEFA Champions League: Matches, draw, final, key dates | UEFA Champions League | UEFA.com...',
# meta: {'title': '2024/25 UEFA Champions League: Matches, draw, final, key dates',
# 'description': 'What are the match dates? Where is the 2025 final? How will the competition work?',
# 'url': 'https://www.uefa.com/uefachampionsleague/news/...',
# 'usage': {'tokens': 5581}}), ...]}

Ground mode

from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="ground")
query = "ChatGPT was launched in 2017"
result = reader.run(query=query)

print(result)
# {'documents': [Document(id=f0c964dbc1ebb2d6584c8032b657150b9aa6e421f714cc1b9f8093a159127f0c,
# content: 'The statement that ChatGPT was launched in 2017 is incorrect. Multiple references confirm that ChatG...',
# meta: {'factuality': 0, 'result': False, 'references': [
# {'url': 'https://en.wikipedia.org/wiki/ChatGPT',
# 'keyQuote': 'ChatGPT is a generative artificial intelligence (AI) chatbot developed by OpenAI and launched in 2022.',
# 'isSupportive': False}, ...],
# 'usage': {'tokens': 10188}})]}
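The fact-checking verdict lives in the document's metadata. As a sketch, using a plain dictionary that mimics the result printed above (a real run returns Document objects whose meta attribute carries the same keys):

```python
# Mimics the ground-mode result shown above; with a live API key it would
# come from JinaReaderConnector(mode="ground").run(query=...).
result = {
    "documents": [
        {
            "content": "The statement that ChatGPT was launched in 2017 is incorrect.",
            "meta": {
                "factuality": 0,
                "result": False,
                "references": [
                    {"url": "https://en.wikipedia.org/wiki/ChatGPT", "isSupportive": False},
                ],
            },
        }
    ]
}

verdict = result["documents"][0]["meta"]
print(verdict["result"])       # False: the statement is not supported
for ref in verdict["references"]:
    print(ref["url"], ref["isSupportive"])
```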

In a pipeline

Query pipeline with search mode

In the pipeline example below, JinaReaderConnector first searches for relevant documents; these documents, together with the user query, are then fed into a prompt template, and a response is finally generated based on the retrieved context.

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_integrations.components.connectors.jina import JinaReaderConnector
from haystack.dataclasses import ChatMessage

reader_connector = JinaReaderConnector(mode="search")

prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the information below:\n"
        "{% for document in documents %}{{ document.content }}{% endfor %}\n"
        "Answer question: {{ query }}.\nAnswer:"
    )
]

prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"query", "documents"})
llm = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_token("<your-api-key>"))

pipe = Pipeline()
pipe.add_component("reader_connector", reader_connector)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("reader_connector.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.messages", "llm.messages")

query = "What is the most famous landmark in Berlin?"

result = pipe.run(data={"reader_connector": {"query": query}, "prompt_builder": {"query": query}})
print(result)

# {'llm': {'replies': ['The most famous landmark in Berlin is the **Brandenburg Gate**. It is considered the symbol of the city and represents reunification.'], 'meta': [{'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 27, 'prompt_tokens': 4479, 'total_tokens': 4506, 'completion_tokens_details': CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), 'prompt_tokens_details': PromptTokensDetails(audio_tokens=0, cached_tokens=0)}}]}}

The same component in search mode can also be used in an indexing pipeline.