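Most common position in a pipeline	Before an embedding Retriever in a query/RAG pipeline
Mandatory init variables	"model": The embedding model to use. "aws_access_key_id": AWS access key ID. Can be set with the `AWS_ACCESS_KEY_ID` environment variable. "aws_secret_access_key": AWS secret access key. Can be set with the `AWS_SECRET_ACCESS_KEY` environment variable. "aws_region_name": AWS region name. Can be set with the `AWS_DEFAULT_REGION` environment variable.
Mandatory run variables	"text": A string
Output variables	"embedding": A list of floats (a vector)
API reference	Amazon Bedrock
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/amazon_bedrock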

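Overview

Amazon Bedrock is a fully managed service that makes language models from leading AI startups and from Amazon available through a unified API.

Supported models are amazon.titan-embed-text-v1, cohere.embed-english-v3, and cohere.embed-multilingual-v3.

Use AmazonBedrockTextEmbedder to embed a simple string (such as a query) into a vector. Use AmazonBedrockDocumentEmbedder to enrich documents with a computed embedding, also known as a vector.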

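Authentication

AmazonBedrockTextEmbedder uses AWS for authentication. You can either pass credentials directly to the component as parameters or use the AWS CLI to authenticate through IAM. For more information on setting up an IAM identity-based policy, see the official documentation.
To initialize AmazonBedrockTextEmbedder and authenticate by providing credentials, pass the model name together with aws_access_key_id, aws_secret_access_key, and aws_region_name. All other parameters are optional; you can look them up in our API reference.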
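
Because the three credential settings can also come from environment variables, it can help to check that they are present before constructing the embedder. The following is a minimal sketch using only the standard library. The variable names are the ones listed in the summary table; `missing_aws_env_vars` is a hypothetical helper written for illustration and is not part of Haystack or the integration. If you authenticate through the AWS CLI and IAM instead, these variables may legitimately be unset.

```python
import os

# Environment variable names as listed in the component's summary table.
REQUIRED_AWS_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_env_vars(environ=None):
    """Return the required AWS variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of Haystack.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing AWS settings:", ", ".join(missing))
    else:
        print("All AWS settings found in the environment.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in a unit test.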

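Model-specific parameters

Although Haystack provides a unified interface, each model offered by Bedrock can accept its own specific parameters. You can pass these parameters at initialization.

For example, Cohere models support input_type and truncate, as described in the Bedrock documentation.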

from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockTextEmbedder

embedder = AmazonBedrockTextEmbedder(model="cohere.embed-english-v3",
                                     input_type="search_query",
                                     truncate="LEFT")

Usage

Installation

To use AmazonBedrockTextEmbedder, first install the amazon-bedrock-haystack package:

pip install amazon-bedrock-haystack

On its own

Basic usage

import os
from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockTextEmbedder

os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1" # just an example

text_to_embed = "I love pizza!"

text_embedder = AmazonBedrockTextEmbedder(model="cohere.embed-english-v3",
                                          input_type="search_query")

print(text_embedder.run(text_to_embed))
# {'embedding': [-0.453125, 1.2236328, 2.0058594, 0.67871094...]}

In a pipeline

In a RAG pipeline

from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.amazon_bedrock import (
    AmazonBedrockDocumentEmbedder,
    AmazonBedrockTextEmbedder,
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

document_embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3")
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", AmazonBedrockTextEmbedder(model="cohere.embed-english-v3"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
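
The document store above is created with embedding_similarity_function="cosine", so the retriever ranks documents by how closely their embedding points in the same direction as the query embedding. The sketch below illustrates that ranking idea with plain Python and made-up three-dimensional vectors. It is an assumption-laden illustration of cosine similarity in general, not the code InMemoryDocumentStore actually runs; `cosine_similarity`, `query_vec`, and `doc_vecs` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made-up values).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity ignores vector length, which is why "doc_a" ranks first even though its values are twice as large as the query's.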

Additional references

🧑‍🍳 Cookbook: PDF-Based Question Answering with Amazon Bedrock and Haystack