使用 Milvus 和 LangChain 构建检索增强生成（RAG）

本指南展示了如何使用 LangChain 和 Milvus 构建检索增强生成（RAG）系统。

RAG 系统将检索系统与生成模型相结合，根据给定的提示生成新文本。该系统首先使用 Milvus 从语料库中检索相关文档，然后使用生成模型基于检索到的文档生成新文本。

LangChain 是一个用于开发由大语言模型（LLM）驱动的应用程序的框架。Milvus 是世界上最先进的开源向量数据库，专为支持嵌入相似性搜索和 AI 应用而构建。

先决条件

在运行此笔记本之前，请确保您已安装以下依赖项：

pip install --upgrade --quiet  langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-openai bs4

如果您使用 Google Colab，为了启用刚安装的依赖项，您可能需要重启运行时（点击屏幕顶部的"Runtime"菜单，并从下拉菜单中选择"Restart session"）。

我们将使用 OpenAI 的模型。您应该准备 api key OPENAI_API_KEY 作为环境变量。

import os

os.environ["OPENAI_API_KEY"] = "sk-***********"

准备数据

我们使用 Langchain WebBaseLoader 从网络源加载文档，并使用 RecursiveCharacterTextSplitter 将它们拆分为块。

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a WebBaseLoader instance to load documents from web sources
loader = WebBaseLoader(
    web_paths=(
        "https://lilianweng.github.io/posts/2023-06-23-agent/",
        "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    ),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
# Load documents from web sources using the loader
documents = loader.load()
# Initialize a RecursiveCharacterTextSplitter for splitting text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Split the documents into chunks using the text_splitter
docs = text_splitter.split_documents(documents)

# Let's take a look at the first document
docs[1]

输出：

Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to "think step by step" to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model's thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into "Problem PDDL", then (2) requests a classical planner to generate a PDDL plan based on an existing "Domain PDDL", and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})

如我们所见，文档已经被拆分成块。数据的内容与 AI 智能体相关。

使用 Milvus 向量存储构建 RAG 链

我们将使用文档初始化一个 Milvus 向量存储，它将文档加载到 Milvus 向量存储中，并在底层构建索引。

from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embeddings,
    connection_args={
        "uri": "./milvus_demo.db",
    },
    drop_old=True,  # Drop the old Milvus collection if it exists
)

对于 connection_args：

将 uri 设置为本地文件，例如 ./milvus.db，是最方便的方法，因为它会自动利用 Milvus Lite 将所有数据存储在此文件中。
如果您有大规模数据，可以在 docker 或 kubernetes 上设置性能更高的 Milvus 服务器。在此设置中，请使用服务器 uri，例如 http://localhost:19530，作为您的 uri。
如果您想使用 Zilliz Cloud，Milvus 的完全托管云服务，请调整 uri 和 token，它们对应于 Zilliz Cloud 中的公共端点和 API 密钥。

使用测试查询问题在 Milvus 向量存储中搜索文档。让我们看看排名第一的文档。

query = "What is self-reflection of an AI Agent?"
vectorstore.similarity_search(query, k=1)

输出：

[Document(page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555859})]

from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Initialize the OpenAI language model for response generation
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible.

A:"""

# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = vectorstore.as_retriever()


# Define a function to format the retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

使用 LCEL（LangChain 表达式语言）构建 RAG 链。

# Define the RAG (Retrieval-Augmented Generation) chain for AI response generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# rag_chain.get_graph().print_ascii()

# Invoke the RAG chain with a specific question and retrieve the response
res = rag_chain.invoke(query)
res

"AI 智能体的自我反思涉及随着时间的推移将记忆综合成更高层次的推理，以指导智能体的未来行为。它作为一种机制来创建过去事件的更高层次摘要。自我反思的一种方法是用最近的 100 个观察来提示语言模型，并要求它基于这些观察生成 3 个最突出的高层次问题。这个过程帮助 AI 智能体在当前时刻和随着时间的推移优化可信度。"

恭喜！您已经构建了一个由 Milvus 和 LangChain 驱动的基本 RAG 链。

元数据过滤

我们可以使用 Milvus 标量过滤规则基于元数据过滤文档。我们已经从两个不同的源加载了文档，我们可以通过元数据 source 过滤文档。

vectorstore.similarity_search(
    "What is CoT?",
    k=1,
    expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
)

# The same as:
# vectorstore.as_retriever(search_kwargs=dict(
#     k=1,
#     expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
# )).invoke("What is CoT?")

输出：

[Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to "think step by step" to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model's thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into "Problem PDDL", then (2) requests a classical planner to generate a PDDL plan based on an existing "Domain PDDL", and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555858})]

如果我们想要在不重建链的情况下动态更改搜索参数，我们可以配置运行时链内部。让我们定义一个具有这种动态配置的新检索器，并使用它构建一个新的 RAG 链。

from langchain_core.runnables import ConfigurableField

# Define a new retriever with a configurable field for search_kwargs
retriever2 = vectorstore.as_retriever().configurable_fields(
    search_kwargs=ConfigurableField(
        id="retriever_search_kwargs",
    )
)

# Invoke the retriever with a specific search_kwargs which filter the documents by source
retriever2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
            k=1,
        )
    }
).invoke(query)

输出：

[Document(page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555859})]

# Define a new RAG chain with this dynamically configurable retriever
rag_chain2 = (
    {"context": retriever2 | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

让我们尝试使用不同的过滤条件测试这个动态可配置的 RAG 链。

# Invoke this RAG chain with a specific question and config
rag_chain2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
        )
    }
).invoke(query)

当我们将搜索条件更改为按第二个源过滤文档时，由于该博客源的内容与查询问题无关，我们得到了一个没有相关信息的答案。

rag_chain2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'",
        )
    }
).invoke(query)

"抱歉，根据提供的上下文，没有关于 AI 智能体自我反思的特定信息或统计数据。"

本教程专注于 Milvus LangChain 集成的基本用法和简单的 RAG 方法。有关更高级的 RAG 技术，请参考高级 RAG 训练营。

先决条件​

准备数据​

使用 Milvus 向量存储构建 RAG 链​

元数据过滤​

先决条件

准备数据

使用 Milvus 向量存储构建 RAG 链

元数据过滤