RemoteWhisperTranscriber

Use RemoteWhisperTranscriber to transcribe audio files with OpenAI's Whisper model.


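Most common position in a pipeline	As the first component in an indexing pipeline
Mandatory init variables	"api_key": The OpenAI API key. Can be set with the `OPENAI_API_KEY` environment variable.
Mandatory run variables	"sources": A list of paths or binary streams to transcribe
Output variables	"documents": A list of documents
API reference	Audio
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/audio/whisper_remote.py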

Overview

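RemoteWhisperTranscriber works with OpenAI-compatible clients and isn't limited to OpenAI as a provider. For example, Groq offers a drop-in alternative that you can use as well. You can set the API key in one of two ways:

Through the api_key init parameter, where the key is resolved using the Secret API.
By setting the OPENAI_API_KEY environment variable, from which the component reads the key.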

from haystack.components.audio import RemoteWhisperTranscriber

# With no arguments, the API key is read from the OPENAI_API_KEY environment variable
transcriber = RemoteWhisperTranscriber()

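The component also relies on the following parameters:

model specifies the Whisper model (defaults to "whisper-1").
api_base_url specifies the base URL of the API and defaults to OpenAI's endpoint, "https://api.openai.com/v1". If you use a Whisper provider other than OpenAI, set this parameter according to the provider's documentation.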

See our API documentation for other optional parameters.

See the Whisper API documentation and the official Whisper GitHub repository for the supported audio formats and languages.
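Because unsupported files only fail once they reach the API, a quick local check can save a round trip. The sketch below uses a hypothetical helper, check_audio_file, and assumes the file types and the 25 MB upload limit listed in OpenAI's Whisper API docs at the time of writing; other providers may differ:

```python
from __future__ import annotations

from pathlib import Path

# Assumed limits from OpenAI's public Whisper API docs; verify for your provider.
SUPPORTED_EXTENSIONS = {".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".ogg", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB, counted in binary megabytes (an assumption)


def check_audio_file(path: str | Path) -> list[str]:
    """Hypothetical helper: list problems that would likely make an upload fail."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file does not exist")
    elif path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append(f"file larger than {MAX_UPLOAD_BYTES} bytes")
    return problems
```

An empty list means the file passed both checks; anything else can be logged or used to skip the file before calling the transcriber.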

Usage

On its own

Here's an example of how to use RemoteWhisperTranscriber to transcribe a local file:

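import requests
from haystack.components.audio import RemoteWhisperTranscriber

# Download a sample speech recording and save it locally
response = requests.get("https://ia903102.us.archive.org/19/items/100-Best--Speeches/EK_19690725_64kb.mp3")
response.raise_for_status()
with open("kennedy_speech.mp3", "wb") as file:
    file.write(response.content)

# Reads the API key from the OPENAI_API_KEY environment variable
transcriber = RemoteWhisperTranscriber()
transcription = transcriber.run(sources=["./kennedy_speech.mp3"])

# Each source produces one Document containing its transcript
print(transcription["documents"][0].content)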

In a pipeline

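The pipeline below fetches an audio file from the specified URL and transcribes it. It first uses LinkContentFetcher to fetch the audio file, then uses RemoteWhisperTranscriber to transcribe the audio into text, and finally prints the transcribed text.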

from haystack import Pipeline
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.components.fetchers import LinkContentFetcher

pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("transcriber", RemoteWhisperTranscriber())

# Feed the fetched audio streams into the transcriber's sources input
pipe.connect("fetcher.streams", "transcriber.sources")
result = pipe.run(
    data={"fetcher": {"urls": ["https://ia903102.us.archive.org/19/items/100-Best--Speeches/EK_19690725_64kb.mp3"]}}
)
print(result["transcriber"]["documents"][0].content)

Additional references

🧑‍🍳 Cookbook: Multilingual RAG from a podcast with Whisper, Qdrant, and Mistral
