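Most common position in a pipeline	Before a `DocumentWriter` in an indexing pipeline
Mandatory init variables	"model": The embedding model to use. "aws_access_key_id": The AWS access key ID. Can be set with the `AWS_ACCESS_KEY_ID` environment variable. "aws_secret_access_key": The AWS secret access key. Can be set with the `AWS_SECRET_ACCESS_KEY` environment variable. "aws_region_name": The AWS region name. Can be set with the `AWS_DEFAULT_REGION` environment variable.
Mandatory run variables	"documents": A list of documents to compute embeddings for
Output variables	"documents": A list of documents (enriched with embeddings)
API reference	Amazon Bedrock
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/amazon_bedrock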

Overview

Amazon Bedrock is a fully managed service that makes language models from leading AI startups and Amazon available to you through a unified API.

Supported models: amazon.titan-embed-text-v1, cohere.embed-english-v3, cohere.embed-multilingual-v3, and amazon.titan-embed-text-v2:0.

📘
Batch inference
Note that only Cohere models support batch inference, that is, computing embeddings for multiple documents in a single request.
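To make the batch inference note concrete, here is a hypothetical helper (`count_bedrock_requests` is not part of Haystack) that estimates how many Bedrock requests a list of documents turns into. It assumes Cohere models receive documents in chunks of `batch_size` (32 here, an assumed default; check the API reference for the component's actual value), while all other models receive one document per request.

```python
import math


def count_bedrock_requests(model: str, num_documents: int, batch_size: int = 32) -> int:
    """Estimate the number of embedding requests for a list of documents.

    Hypothetical helper. Assumption: Cohere models embed up to `batch_size`
    documents per request; other models (e.g. Amazon Titan) embed one per request.
    """
    if num_documents <= 0:
        return 0
    if model.startswith("cohere."):
        # Batch inference: documents are grouped into chunks of batch_size
        return math.ceil(num_documents / batch_size)
    # No batch inference: one request per document
    return num_documents


print(count_bedrock_requests("cohere.embed-english-v3", 100))       # 4
print(count_bedrock_requests("amazon.titan-embed-text-v2:0", 100))  # 100
```

With 100 documents, the Cohere model needs 4 requests (3 full batches of 32 plus one of 4), while a Titan model needs 100.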

Use this component to embed a list of documents. To embed a single string, use AmazonBedrockTextEmbedder.

Authentication

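AmazonBedrockDocumentEmbedder uses AWS for authentication. You can either pass the credentials directly to the component as parameters, or use the AWS CLI to authenticate through your IAM identity. For more information on setting up an IAM identity-based policy, see the official documentation.
To initialize AmazonBedrockDocumentEmbedder and authenticate by providing credentials, pass model together with aws_access_key_id, aws_secret_access_key, and aws_region_name. All other parameters are optional; you can find them in our API reference.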
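The fallback behavior described above (an explicitly passed value wins, otherwise the matching environment variable is used) can be sketched in plain Python. This is a hypothetical, simplified illustration: `resolve_aws_credentials` is not a Haystack function, and the real component wraps these values in Haystack `Secret` objects rather than plain strings.

```python
import os
from typing import Dict, Optional

# Init parameter -> environment variable it falls back to
AWS_ENV_VARS = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_region_name": "AWS_DEFAULT_REGION",
}


def resolve_aws_credentials(**explicit: Optional[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: explicit values first, then environment variables."""
    resolved = {}
    for param, env_var in AWS_ENV_VARS.items():
        value = explicit.get(param)
        resolved[param] = value if value is not None else os.environ.get(env_var)
    return resolved


os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # just an example
print(resolve_aws_credentials()["aws_region_name"])                            # us-east-1
print(resolve_aws_credentials(aws_region_name="eu-west-1")["aws_region_name"])  # eu-west-1
```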

Model-specific parameters

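Although Haystack provides a unified interface, each model offered by Bedrock can accept its own specific parameters. You can pass these parameters at initialization.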

For example, Cohere models support input_type and truncate, as described in the Bedrock documentation.

from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockDocumentEmbedder

embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3",
                                         input_type="search_document",
                                         truncate="LEFT")
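To see where such model-specific parameters end up, here is a simplified sketch of the JSON request body a Bedrock embedding call might carry. It is not the integration's internal code: `build_request_body` is hypothetical, and the field names (`texts` for Cohere, `inputText` for Titan) reflect our reading of the Bedrock model documentation.

```python
import json
from typing import List


def build_request_body(model: str, texts: List[str], **model_kwargs) -> str:
    """Hypothetical helper: build a JSON body for a Bedrock embedding request."""
    if model.startswith("cohere."):
        # Cohere accepts a list of texts plus model-specific fields such as input_type
        body = {"texts": texts, **model_kwargs}
    elif model.startswith("amazon.titan-embed-text"):
        # Titan embeds a single text per request
        if len(texts) != 1:
            raise ValueError("Titan models embed one text per request")
        body = {"inputText": texts[0], **model_kwargs}
    else:
        raise ValueError(f"Unsupported model: {model}")
    return json.dumps(body)


print(build_request_body("cohere.embed-english-v3", ["I love pizza!"], input_type="search_document"))
# {"texts": ["I love pizza!"], "input_type": "search_document"}
```

In this sketch, keyword arguments are merged into the body unchanged; which keys and values each model accepts is defined by the Bedrock documentation, not by Haystack.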

Embedding metadata

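Text documents often come with a set of metadata. If these metadata are distinctive and semantically meaningful, you can embed them together with the document text to improve retrieval.

The Document Embedder makes this easy: list the metadata fields to embed in meta_fields_to_embed.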

from haystack import Document
from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page number": 18})

# Embed the "title" metadata field together with the document content
embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3",
                                         meta_fields_to_embed=["title"])

docs_w_embeddings = embedder.run(documents=[doc])["documents"]
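Conceptually, the embedder turns each document into one string before sending it to Bedrock. The hypothetical helper below sketches that step, assuming the selected metadata values are placed before the content and joined with a separator (`"\n"` here; the component exposes an `embedding_separator` parameter for this, see the API reference). Fields that are missing or `None` are skipped.

```python
from typing import Any, Dict, List, Optional


def prepare_text_to_embed(content: Optional[str], meta: Dict[str, Any],
                          meta_fields_to_embed: List[str], separator: str = "\n") -> str:
    """Hypothetical helper: combine selected metadata values with the content."""
    # Keep requested fields that exist and are not None, in the requested order
    meta_values = [str(meta[key]) for key in meta_fields_to_embed
                   if key in meta and meta[key] is not None]
    # Metadata values come first, followed by the document content
    return separator.join([*meta_values, content or ""])


text = prepare_text_to_embed("some text", {"title": "relevant title", "page number": 18}, ["title"])
print(repr(text))  # 'relevant title\nsome text'
```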

Usage

Installation

To use AmazonBedrockDocumentEmbedder, first install the amazon-bedrock-haystack package:

pip install amazon-bedrock-haystack

On its own

Basic usage

import os
from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockDocumentEmbedder
from haystack.dataclasses import Document

os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1" # just an example

doc = Document(content="I love pizza!")

embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3",
                                         input_type="search_document")

result = embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
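If you rely on environment variables as in the example above, a small check can fail fast before any call to Bedrock. This is a hypothetical convenience helper, not part of Haystack; it only inspects the three variables named on this page and does not cover other AWS authentication methods such as CLI profiles.

```python
import os
from typing import List, Mapping

REQUIRED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: return required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_ENV_VARS if not environ.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
```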

In a pipeline

In a RAG pipeline

from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.amazon_bedrock import (
    AmazonBedrockDocumentEmbedder,
    AmazonBedrockTextEmbedder,
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", AmazonBedrockTextEmbedder(model="cohere.embed-english-v3"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
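The query pipeline relies on InMemoryEmbeddingRetriever ranking the stored documents by cosine similarity (configured with `embedding_similarity_function="cosine"`). The following pure-Python sketch illustrates that ranking idea only; it is not the retriever's implementation, and the 3-dimensional vectors are made-up toy values standing in for real Bedrock embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding: List[float], doc_embeddings: Dict[str, List[float]],
                   top_k: int = 1) -> List[Tuple[float, str]]:
    # Score every document against the query and keep the top_k highest scores
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in doc_embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors, made up for illustration
docs = {
    "My name is Wolfgang and I live in Berlin": [0.9, 0.1, 0.0],
    "I saw a black horse running": [0.0, 0.2, 0.9],
    "Germany has many big cities": [0.6, 0.5, 0.1],
}
query = [0.8, 0.2, 0.0]
print(rank_by_cosine(query, docs)[0][1])  # My name is Wolfgang and I live in Berlin
```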

Additional references

🧑‍🍳 Cookbook: PDF-based question answering with Amazon Bedrock and Haystack
