JinaReaderConnector
Use Jina AI's Reader API with Haystack.
| Most common position in a pipeline | As the first component in a pipeline, passing the resulting documents downstream |
| Mandatory init variables | "mode": the Reader's operation mode (read, search, or ground); "api_key": the Jina API key. Can be set with the JINA_API_KEY environment variable. |
| Mandatory run variables | "query": the query string |
| Output variables | "documents": a list of documents |
| API reference | Jina |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina |
Overview
JinaReaderConnector interacts with Jina AI's Reader API to process a query and output documents.
When initializing the component, you need to choose one of the following operation modes:
- read: processes a URL and extracts the text content.
- search: searches the web and returns the text content of the most relevant pages.
- ground: performs fact-checking using a grounding engine.
For more information on these modes, see the Jina Reader documentation.
You can also control the response format of the Jina Reader API through the component's json_response parameter:
- True (default) requests a JSON response, resulting in documents enriched with structured metadata.
- False requests the raw response, resulting in a single document with minimal metadata.
Authorization
By default, the component reads the Jina API key from the JINA_API_KEY environment variable. Alternatively, you can pass the API key at initialization with the api_key parameter, as follows:
from haystack.utils import Secret
from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="read", api_key=Secret.from_token("<your-api-key>"))
To get your API key, head to Jina AI's website.
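If you prefer the environment-variable approach, in a POSIX shell it looks like this (the key value is a placeholder):

```shell
export JINA_API_KEY="<your-api-key>"
```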
Installation
To start using this integration with Haystack, install the package with:
pip install jina-haystack
Usage
On its own
Read mode
from haystack_integrations.components.connectors.jina import JinaReaderConnector
reader = JinaReaderConnector(mode="read")
query = "https://example.com"
result = reader.run(query=query)
print(result)
# {'documents': [Document(id=fa3e51e4ca91828086dca4f359b6e1ea2881e358f83b41b53c84616cb0b2f7cf,
# content: 'This domain is for use in illustrative examples in documents. You may use this domain in literature ...',
# meta: {'title': 'Example Domain', 'description': '', 'url': 'https://example.com/', 'usage': {'tokens': 42}})]}
Search mode
from haystack_integrations.components.connectors.jina import JinaReaderConnector
reader = JinaReaderConnector(mode="search")
query = "UEFA Champions League 2024"
result = reader.run(query=query)
print(result)
# {'documents': [Document(id=6a71abf9955594232037321a476d39a835c0cb7bc575d886ee0087c973c95940,
# content: '2024/25 UEFA Champions League: Matches, draw, final, key dates | UEFA Champions League | UEFA.com...',
# meta: {'title': '2024/25 UEFA Champions League: Matches, draw, final, key dates',
# 'description': 'What are the match dates? Where is the 2025 final? How will the competition work?',
# 'url': 'https://www.uefa.com/uefachampionsleague/news/...',
# 'usage': {'tokens': 5581}}), ...]}
Ground mode
from haystack_integrations.components.connectors.jina import JinaReaderConnector
reader = JinaReaderConnector(mode="ground")
query = "ChatGPT was launched in 2017"
result = reader.run(query=query)
print(result)
# {'documents': [Document(id=f0c964dbc1ebb2d6584c8032b657150b9aa6e421f714cc1b9f8093a159127f0c,
# content: 'The statement that ChatGPT was launched in 2017 is incorrect. Multiple references confirm that ChatG...',
# meta: {'factuality': 0, 'result': False, 'references': [
# {'url': 'https://en.wikipedia.org/wiki/ChatGPT',
# 'keyQuote': 'ChatGPT is a generative artificial intelligence (AI) chatbot developed by OpenAI and launched in 2022.',
# 'isSupportive': False}, ...],
# 'usage': {'tokens': 10188}})]}
In a pipeline
Query pipeline using search mode
In the following pipeline example, JinaReaderConnector first searches for relevant documents. These documents are then fed into a prompt template together with the user query, and a response is generated based on the retrieved context.
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_integrations.components.connectors.jina import JinaReaderConnector
from haystack.dataclasses import ChatMessage
reader_connector = JinaReaderConnector(mode="search")
prompt_template = [
ChatMessage.from_system("You are a helpful assistant."),
ChatMessage.from_user(
"Given the information below:\n"
"{% for document in documents %}{{ document.content }}{% endfor %}\n"
"Answer question: {{ query }}.\nAnswer:"
)
]
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"query", "documents"})
llm = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_token("<your-api-key>"))
pipe = Pipeline()
pipe.add_component("reader_connector", reader_connector)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("reader_connector.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.messages", "llm.messages")
query = "What is the most famous landmark in Berlin?"
result = pipe.run(data={"reader_connector": {"query": query}, "prompt_builder": {"query": query}})
print(result)
# {'llm': {'replies': ['The most famous landmark in Berlin is the **Brandenburg Gate**. It is considered the symbol of the city and represents reunification.'], 'meta': [{'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 27, 'prompt_tokens': 4479, 'total_tokens': 4506, 'completion_tokens_details': CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), 'prompt_tokens_details': PromptTokensDetails(audio_tokens=0, cached_tokens=0)}}]}}
The same component in search mode can also be used in an indexing pipeline.
