Integrate with LangChain (integrate_with_langchain)

Question answering over documents with Milvus and LangChain

This guide demonstrates how to build an LLM-driven question-answering application with Milvus and LangChain.


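The code snippets on this page require pymilvus and langchain. The guide also uses OpenAI's embedding API to embed documents into the vector store, so you need openai and tiktoken as well. If any of these libraries are missing on your machine, run the following command to install them.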

! python -m pip install --upgrade pymilvus langchain openai tiktoken
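Before running the rest of the guide, you may want to confirm that the packages are importable. The following is a minimal sketch that uses only the Python standard library. The helper name missing_packages is hypothetical and is not part of any of these libraries.

```python
# Minimal sketch: check that the packages installed above can be found.
# `missing_packages` is a hypothetical helper, not part of any library.
import importlib.util
from typing import List

REQUIRED_PACKAGES = ["pymilvus", "langchain", "openai", "tiktoken"]


def missing_packages(names: List[str]) -> List[str]:
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```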



from os import environ

MILVUS_HOST = "localhost"
MILVUS_PORT = "19530"
OPENAI_API_KEY = "sk-******" # example: "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

## Set up environment variables
environ["OPENAI_API_KEY"] = OPENAI_API_KEY
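Hard-coding the key as above is convenient in a notebook, but the key can easily leak. A common alternative is to export OPENAI_API_KEY in your shell beforehand and read it back from the environment. The OpenAI client and LangChain's OpenAI wrappers read this same variable. The sketch below makes that assumption, and the helper require_env is hypothetical.

```python
# Hypothetical helper: read a required setting from the environment instead of
# hard-coding it. Assumes OPENAI_API_KEY was exported before starting Python.
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name` from `env`, or raise a clear error if it is unset or empty."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In the setup code you would then write OPENAI_API_KEY = require_env("OPENAI_API_KEY") instead of pasting the key.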



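  • Prepare the documents the LLM should consult when answering.

  • Set up an embedding model to convert the documents into vector embeddings.

  • Set up a vector store to save the vector embeddings.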

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Milvus
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import CharacterTextSplitter

# Use the WebBaseLoader to load specified web pages into documents
loader = WebBaseLoader([
    "https://milvus.io/docs/overview.md",
])
docs = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
docs = text_splitter.split_documents(docs)


Created a chunk of size 1745, which is longer than the specified 1024
Created a chunk of size 1278, which is longer than the specified 1024
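The two warnings above result from how CharacterTextSplitter works. It cuts text only at a separator (by default a blank line, "\n\n") and then merges the resulting pieces into chunks of at most chunk_size characters. A piece that is already longer than chunk_size cannot be cut further, so it becomes an oversized chunk. The function below is a simplified re-implementation written for illustration. It is not LangChain's actual code, it ignores chunk_overlap, and split_and_merge is a hypothetical name.

```python
# Simplified illustration of separator-based splitting (not LangChain's code).
from typing import List


def split_and_merge(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0  # length of separator.join(current)
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    # A single piece longer than chunk_size cannot be split at a separator.
    for chunk in chunks:
        if len(chunk) > chunk_size:
            print(f"Created a chunk of size {len(chunk)}, "
                  f"which is longer than the specified {chunk_size}")
    return chunks
```

If oversized chunks are a problem, LangChain also provides RecursiveCharacterTextSplitter. When a piece is still too long, that splitter falls back to smaller separators such as single newlines and spaces.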


# Set up an embedding model to convert document chunks into vector embeddings.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# Set up a vector store used to save the vector embeddings. Here we use Milvus as the vector store.
vector_store = Milvus.from_documents(
    docs,
    embedding=embeddings,
    connection_args={"host": MILVUS_HOST, "port": MILVUS_PORT}
)
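Conceptually, Milvus.from_documents embeds the text of each chunk with the embedding model. It then inserts the vectors, together with the chunk text and metadata, into a Milvus collection. The sketch below mimics that step in memory. toy_embed is a hypothetical stand-in for OpenAIEmbeddings that hashes words into a small fixed-size vector and normalizes it. By contrast, text-embedding-ada-002 uses a neural model to produce 1536-dimensional vectors.

```python
# In-memory illustration of "embed each chunk, then store vector + text + metadata".
# toy_embed is a hypothetical stand-in for a real embedding model.
import hashlib
import math
from typing import Dict, List

DIM = 16  # toy dimension; text-embedding-ada-002 vectors have 1536 dimensions


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    vec = [0.0] * dim
    for word in text.lower().split():
        # md5 gives a stable bucket per word (Python's hash() is salted per run).
        bucket = hashlib.md5(word.encode("utf-8")).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_store(chunks: List[Dict]) -> List[Dict]:
    """Each chunk is {"text": ..., "metadata": ...}; returns records with a vector."""
    return [
        {"vector": toy_embed(c["text"]), "text": c["text"], "metadata": c["metadata"]}
        for c in chunks
    ]
```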


# Retrieve the document chunks that are most similar to the query
query = "What is milvus?"
docs = vector_store.similarity_search(query)


[Document(page_content='Milvus workflow.', metadata={'source': 'https://milvus.io/docs/overview.md', 'title': 'Introduction Milvus documentation', 'description': 'Milvus is an open-source vector database designed specifically for AI application development, embeddings similarity search, and MLOps v2.2.x.', 'language': 'en'}), Document(page_content="Installat...rved.", metadata={'source': 'https://milvus.io/docs/overview.md', 'title': 'Introduction Milvus documentation', 'description': 'Milvus is an open-source vector database designed specifically for AI application development, embeddings similarity search, and MLOps v2.2.x.', 'language': 'en'}), Document(page_content='Introduction ... Milvus is able to analyze the correlation between two vectors by calculating their similarity distance. If the two embedding vectors are very similar, it means that the original data sources are similar as well.', metadata={'source': 'https://milvus.io/docs/overview.md', 'title': 'Introduction Milvus documentation', 'description': 'Milvus is an open-source vector database designed specifically for AI application development, embeddings similarity search, and MLOps v2.2.x.', 'language': 'en'}), Document(page_content="Key concepts...search algorithms are used to accelerate the searching process. If the two embedding vectors are very similar, it means that the original data sources are similar as well.Why Milvus?", metadata={'source': 'https://milvus.io/docs/overview.md', 'title': 'Introduction Milvus documentation', 'description': 'Milvus is an open-source vector database designed specifically for AI application development, embeddings similarity search, and MLOps v2.2.x.', 'language': 'en'})]
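similarity_search embeds the query with the same model and returns the stored chunks whose vectors lie closest to it. By default, LangChain returns four results (parameter k), which matches the four documents above. The sketch below shows the idea as a brute-force cosine-similarity ranking over in-memory records. Milvus instead searches with the index and distance metric configured for the collection. cosine and top_k are hypothetical helpers.

```python
# Brute-force top-k retrieval by cosine similarity (illustration only).
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: Sequence[float], records: List[Dict], k: int = 4) -> List[Dict]:
    """Return up to k records ranked by cosine similarity to query_vec."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:k]
```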



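Note that LangChain provides four chain types for question answering with sources: stuff, map_reduce, refine, and map_rerank. In short, the stuff chain puts all of the input documents into a single prompt, so it only suits small documents. Most LLMs limit the number of tokens a prompt can contain, so the other three chain types are recommended instead. They split the input documents into smaller parts and feed them to the LLM in different ways. For details, see Index-related chains in the LangChain documentation.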
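To make the difference between stuff and map_reduce concrete, the toy sketch below uses a stub LLM (any callable from prompt text to answer text) and a crude whitespace token count. Both are hypothetical; real chains use prompt templates and the model's own tokenizer. stuff fails once the combined prompt exceeds the token limit. map_reduce instead asks the question against each document separately (map), then combines the partial answers (reduce) and also returns them. This is similar in spirit to return_intermediate_steps=True.

```python
# Toy comparison of "stuff" and "map_reduce" with a stub LLM (illustration only).
from typing import Callable, List, Tuple

LLM = Callable[[str], str]


def count_tokens(text: str) -> int:
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())


def stuff(docs: List[str], question: str, llm: LLM, max_tokens: int) -> str:
    # Put every document into a single prompt.
    prompt = "\n\n".join(docs) + "\n\nQuestion: " + question
    if count_tokens(prompt) > max_tokens:
        raise ValueError("prompt exceeds the model's token limit")
    return llm(prompt)


def map_reduce(docs: List[str], question: str, llm: LLM,
               max_tokens: int) -> Tuple[str, List[str]]:
    # Map: ask the question against each document on its own.
    partial_answers = [stuff([doc], question, llm, max_tokens) for doc in docs]
    # Reduce: combine the partial answers into one final answer.
    final_answer = stuff(partial_answers, question, llm, max_tokens)
    return final_answer, partial_answers
```

Note that the reduce step can itself overflow when there are many long partial answers. The map step also costs one LLM call per chunk.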


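from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

# Build a question-answering-with-sources chain. map_reduce answers against each
# chunk separately and then combines the partial answers;
# return_intermediate_steps=True also returns the per-chunk answers.
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)
query = "What is Milvus?"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)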


{'intermediate_steps': [' No relevant text.',
  ' What is Milvus vector database?',
  'What is Milvus? Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models. As a database specifically designed to handle queries over input vectors, it is capable of indexing vectors on a trillion scale. Unlike existing relational databases which mainly deal with structured data following a pre-defined pattern, Milvus is designed from the bottom-up to handle embedding vectors converted from unstructured data.',
  ' Milvus is a vector database and similarity search platform that enables users to quickly and accurately search for semantically similar vectors in an unstructured data repository. It uses modern embedding techniques to convert unstructured data to embedding vectors, and approximate nearest neighbor (ANN) search algorithms to accelerate the searching process.'],
 'output_text': ' Milvus is a vector database and similarity search platform that enables users to quickly and accurately search for semantically similar vectors in an unstructured data repository. It uses modern embedding techniques to convert unstructured data to embedding vectors, and approximate nearest neighbor (ANN) search algorithms to accelerate the searching process.SOURCES: https://milvus.io/docs/overview.md'}
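In the output above, the chain appends the sources to output_text after a "SOURCES:" marker. A hypothetical helper such as split_sources can separate the answer from the source list. It assumes that multiple sources are comma-separated, which depends on the chain's prompt and is not guaranteed.

```python
# Hypothetical helper: separate the answer text from the trailing "SOURCES:" list.
from typing import List, Tuple


def split_sources(output_text: str, marker: str = "SOURCES:") -> Tuple[str, List[str]]:
    # rpartition uses the last occurrence of the marker.
    head, sep, tail = output_text.rpartition(marker)
    if not sep:
        return output_text.strip(), []
    sources = [s.strip() for s in tail.split(",") if s.strip()]
    return head.strip(), sources
```

For example, answer, sources = split_sources(result["output_text"]), where result is the dictionary returned by the chain call.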