
GitHubRepoViewerTool

A Tool that allows Agents and ToolInvokers to navigate GitHub repositories and fetch content from them.

Overview

GitHubRepoViewerTool wraps the GitHubRepoViewer component and exposes it through a tool interface, so it can be used in agent workflows and tool-based pipelines.

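The tool behaves differently depending on the type of path it is given:

  • For a directory: returns a list of documents, one per item (files and subdirectories).
  • For a file: returns a single document containing the file content.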

Each document includes rich metadata such as path, type, size, and URL.
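As a rough illustration of how this metadata might be consumed, the sketch below groups the entries of a directory listing by their `type` field. It uses `SimpleNamespace` stand-ins instead of real Haystack `Document` objects so that it runs without network access. The helper name `group_by_type` is hypothetical, the third entry is made-up sample data, and the `"file"` type value is an assumption (the sample output on this page only shows `"dir"`).

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(documents):
    """Group listing entries by their meta['type'] value (hypothetical helper)."""
    groups = defaultdict(list)
    for doc in documents:
        # Fall back to "unknown" if an entry carries no type information.
        groups[doc.meta.get("type", "unknown")].append(doc.meta["path"])
    return dict(groups)


# Stand-ins mimicking the shape of the documents returned for a directory.
listing = [
    SimpleNamespace(content="agents", meta={"path": "haystack/components/agents", "type": "dir", "size": 0}),
    SimpleNamespace(content="audio", meta={"path": "haystack/components/audio", "type": "dir", "size": 0}),
    # Illustrative entry only: the "file" type value and the size are assumptions.
    SimpleNamespace(content="example.py", meta={"path": "haystack/components/example.py", "type": "file", "size": 120}),
]

grouped = group_by_type(listing)
print(grouped)
```

The same function should work on real results as long as each document exposes a `meta` dictionary with `path` and `type` keys, as in the sample output shown on this page.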

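Parameters

  • name is optional and defaults to "repo_viewer". Specifies the name of the tool.
  • description is optional and gives the LLM context about what the tool does.
  • github_token is optional but recommended for private repositories or to avoid rate limiting.
  • repo is optional and sets a default repository in owner/repo format.
  • branch is optional and defaults to "main". Sets the default branch to use.
  • raise_on_failure is optional and defaults to True. If False, errors are returned as documents instead of raising exceptions.
  • max_file_size is optional and defaults to 1,000,000 bytes (1 MB). Sets the maximum size of a file to fetch.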

Usage

Install the GitHub integration to use GitHubRepoViewerTool:

pip install github-haystack

Repo placeholder

To run the following code snippets, replace owner/repo with the name of your own GitHub repository.

On its own

Basic usage to view repository contents:

from haystack_integrations.tools.github import GitHubRepoViewerTool

tool = GitHubRepoViewerTool()
result = tool.invoke(
    repo="deepset-ai/haystack",
    path="haystack/components",
    branch="main"
)

print(result)
{'documents': [Document(id=..., content: 'agents', meta: {'path': 'haystack/components/agents', 'type': 'dir', 'size': 0, 'url': 'https://github.com/deepset-ai/haystack/tree/main/haystack/components/agents'}), Document(id=..., content: 'audio', meta: {'path': 'haystack/components/audio', 'type': 'dir', 'size': 0, 'url': 'https://github.com/deepset-ai/haystack/tree/main/haystack/components/audio'}),...]}
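The raw dictionary can be hard to scan. As a minimal sketch, the hypothetical helper `render_listing` below turns a result of this shape into one line per entry, with directories first. It works on `SimpleNamespace` stand-ins that copy two entries from the output above, so it runs without calling GitHub.

```python
from types import SimpleNamespace


def render_listing(result):
    """Return one formatted line per entry, directories first (hypothetical helper)."""
    docs = sorted(
        result["documents"],
        # False sorts before True, so "dir" entries come first, then by path.
        key=lambda d: (d.meta["type"] != "dir", d.meta["path"]),
    )
    return [f"{d.meta['type']:<4} {d.meta['size']:>8}  {d.meta['path']}" for d in docs]


# Stand-in for the `result` dictionary printed above (deliberately unsorted).
result = {
    "documents": [
        SimpleNamespace(content="audio", meta={"path": "haystack/components/audio", "type": "dir", "size": 0}),
        SimpleNamespace(content="agents", meta={"path": "haystack/components/agents", "type": "dir", "size": 0}),
    ]
}

lines = render_listing(result)
for line in lines:
    print(line)
```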

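With an Agent

You can use GitHubRepoViewerTool with the Agent component. The Agent automatically calls the tool whenever it needs to explore the repository structure or read files.

Note that the code example sets the Agent's state_schema parameter so that GitHubRepoViewerTool can write documents to the state.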

from typing import List

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, Document
from haystack.components.agents import Agent
from haystack_integrations.tools.github import GitHubRepoViewerTool

repo_tool = GitHubRepoViewerTool(name="github_repo_viewer")

agent = Agent(
    chat_generator=OpenAIChatGenerator(),
    tools=[repo_tool],
    exit_conditions=["text"],
    state_schema={"documents": {"type": List[Document]}},
)

agent.warm_up()
response = agent.run(messages=[
    ChatMessage.from_user("Can you analyze the structure of the deepset-ai/haystack repository and tell me about the main components?")
])

print(response["last_message"].text)
The `deepset-ai/haystack` repository has a structured layout that includes several important components. Here's an overview of its main parts:

1. **Directories**:
   - **`.github`**: Contains GitHub-specific configuration files and workflows.
   - **`docker`**: Likely includes Docker-related files for containerization of the Haystack application.
   - **`docs`**: Contains documentation for the Haystack project. This could include guides, API documentation, and other related resources.
   - **`e2e`**: This likely stands for "end-to-end", possibly containing tests or examples related to end-to-end functionality of the Haystack framework.
   - **`examples`**: Includes example scripts or notebooks demonstrating how to use Haystack.
   - **`haystack`**: This is likely the core source code of the Haystack framework itself, containing the main functionality and classes.
   - **`proposals`**: A directory that may contain proposals for new features or changes to the Haystack project.
   - **`releasenotes`**: Contains notes about various releases, including changes and improvements.
   - **`test`**: This directory likely contains unit tests and other testing utilities to ensure code quality and functionality.

2. **Files**:
   - **`.gitignore`**: Specifies files and directories that should be ignored by Git.
   - **`.pre-commit-config.yaml`**: Configuration file for pre-commit hooks to automate code quality checks.
   - **`CITATION.cff`**: Might include information on how to cite the repository in academic work.
   - **`code_of_conduct.txt`**: Contains the code of conduct for contributors and users of the repository.
   - **`CONTRIBUTING.md`**: Guidelines for contributing to the repository.
   - **`LICENSE`**: The license under which the project is distributed.
   - **`VERSION.txt`**: Contains versioning information for the project.
   - **`README.md`**: A markdown file that usually provides an overview of the project, installation instructions, and usage examples.
   - **`SECURITY.md`**: Contains information about the security policy of the repository.

This structure indicates a well-organized repository that follows common conventions in open-source projects, with a focus on documentation, contribution guidelines, and testing. The core functionalities are likely housed in the `haystack` directory, with additional resources provided in the other directories.