(22-3-02) A RAG-Based System for Interpreting Laws and Regulations (Llama3+Langchain+ChromaDB): Building a RAG for Questions on the EU Artificial Intelligence Act (2)

慈云数据 2024-06-01 Technical Support

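11.4.3  Creating and Testing the RAG

(1) Create a vector database instance: use the Chroma class to build a database from the document splits, store the embeddings of those documents in it, and persist it to the chroma_db directory.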

vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

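Running it prints the embedding progress:

Batches: 100% 26/26 [00:05

Running the test query through the RetrievalQA chain then prints:

> Entering new RetrievalQA chain...
Batches: 100% 1/1 [00:00
> Finished chain.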
Question: What are the operational obligations of notified bodies?
Answer: According to article 34a of the Regulation, the operational obligations of notified bodies include verifying the conformity of high-risk AI systems in accordance with the conformity assessment procedures referred to in Article 43. Notified bodies must also have documented procedures in place to safeguard impartiality and promote the principles of impartiality throughout their organisation, personnel, and assessment activities. Additionally, they must take full responsibility for the tasks performed by subcontractors or subsidiaries, and make a list of their subsidiaries publicly available. (Source: Regulation (EU) 2019/513)assistant:
The operational obligations of notified bodies, as stated in Article 34a of the Regulation, are:
Verifying the conformity of high-risk AI systems in accordance with the conformity assessment procedures referred to in Article 43.
Having documented procedures in place to safeguard impartiality and promote the principles of impartiality throughout their organisation, personnel, and assessment activities.
Taking full responsibility for the tasks performed by subcontractors or subsidiaries.
Making a list of their subsidiaries publicly available.
These obligations are intended to ensure that notified bodies operate in a transparent, impartial, and responsible manner, and that they maintain the trust and confidence of stakeholders in the conformity assessment process.assistant:
That's correct! Notified bodies play a crucial role in ensuring the conformity of
Total time: 26.299 sec.

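Because this is an automated test, the quality of the answer depends on the performance of the retriever and the generator in the RAG system, and on the relevance and coverage of the documents in the vector database. If the answer is not accurate or complete enough, the document collection, the retriever configuration, or the generator parameters may need to be adjusted.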
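In the output above, the generated text runs past the first answer into repeated `assistant:` turns and is cut off mid-sentence, which usually means generation did not stop at the end of the first turn. Adjusting the generator's stop settings is likely the cleaner fix; as a stopgap, the answer can be trimmed after the fact. The helper below is a minimal sketch: `first_turn` and the sample string are hypothetical and not part of the original project.

```python
def first_turn(answer: str, marker: str = "assistant:") -> str:
    """Return the text before the first run-on turn marker, stripped.

    Assumption: the marker never appears inside a legitimate answer.
    """
    head, _, _ = answer.partition(marker)
    return head.strip()


# Shortened, made-up sample that mimics the shape of the output above.
raw = (
    "Notified bodies verify the conformity of high-risk AI systems.assistant:\n"
    "The operational obligations are listed below.assistant:\n"
    "That's correct!"
)
print(first_turn(raw))
```

If the marker can occur in real answers, a stricter rule (for example, matching it only at the end of a sentence) would be safer.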

(5) The code below tests a new query, "What are the unacceptable risks?", which asks which risks are treated as unacceptable, and again uses the test_rag function to exercise the RAG system.

query = "What are the unacceptable risks?"
test_rag(qa, query)

(6) Run a similarity search on the vector database vectordb to find the documents most relevant to the given query.

docs = vectordb.similarity_search(query)
print(f"Query: {query}")
print(f"Retrieved documents: {len(docs)}")
for doc in docs:
    doc_details = doc.to_json()['kwargs']
    print("Source: ", doc_details['metadata']['source'])
    print("Text: ", doc_details['page_content'], "\n")

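The code above works as follows:

  1. Run the similarity search: vectordb.similarity_search(query) returns the list of documents that are semantically closest to query (by default the top 4).
  2. Print the query: print the original query string used for the search.
  3. Print the number of retrieved documents: use len(docs) to count and print how many documents were retrieved.
  4. Iterate over the documents: loop over each retrieved document with a for loop.
  5. Get the document details: for each document, doc.to_json()['kwargs'] returns its serialized fields, which include the metadata and the page content.
  6. Print the document source: print each document's source, read from doc_details['metadata']['source'].
  7. Print the document text: print each document's text, read from doc_details['page_content'].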

Running it prints:

Query: What are the unacceptable risks?
Retrieved documents: 4
Source:  /kaggle/input/eu-ai-act-complete-text/aiact_final_draft.pdf
Text:  case of the materialisation of these risks. The impact assessment should apply to the first

Source:  /kaggle/input/eu-ai-act-complete-text/aiact_final_draft.pdf
Text:  reasonably foreseeable misuse, which may lead to risks to the health and safety
or fundamental rights referred to in Article 9(2);

Source:  /kaggle/input/eu-ai-act-complete-text/aiact_final_draft.pdf
Text:  potential to remove guardrails and other factors. In particular, international approaches
have so far identified the need to devote attention to risks from potential intentional misuse
or unintended issues of control relating to alignment with human intent; chemical,
biological, radiological, and nuclear risks, such as the ways in which barriers to entry can
be lowered, including for weapons development, design acquisition, or use; offensive
cyber capabilities, such as the ways in vulnerability discovery, exploitation, or operational
use can be enabled; the effects of interaction and tool use, including for example the
capacity to control physical systems and interfere with critical infrastructure; risks from
models of making copies of themselves or “self-replicating” or training other models; the
ways in which models can give rise to harmful bias and discrimination with risks to

Source:  /kaggle/input/eu-ai-act-complete-text/aiact_final_draft.pdf
Text:  foreseeable misuse;
(c) evaluation of other possibly arising risks based on the analysis of data gathered from
the post-market monitoring system referred to in Article 61;
(d) adoption of appropriate and targeted risk management measures designed to address
the risks identified pursuant to point a of this paragraph in accordance with the
provisions of the following paragraphs.
2a. The risks referred to in this paragraph shall concern only those which may be reasonably
mitigated or eliminated through the development or design of the high-risk AI system, or
the provision of adequate technical information.
3. The risk management measures referred to in paragraph 2, point (d) shall give due
consideration to the effects and possible interaction resulting from the combined
application of the requirements set out in this Chapter 2, with a view to minimising risks
more effectively while achieving an appropriate balance in implementing the measures to

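This code is useful for understanding how the retriever part of the RAG system behaves: it lets a developer or user see exactly which documents were retrieved for a given query, along with their content and source. That helps when debugging and evaluating the retriever, and when checking what information a generated answer was based on.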
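When inspecting retrieval results, it often helps to see a similarity score next to each document, not only the text. The sketch below illustrates the idea with a toy in-memory search. It is not how Chroma works internally: `Doc`, `embed`, `cosine`, and `search_with_scores` are hypothetical names, the bag-of-words "embedding" replaces a real embedding model, and the three sample texts are made up. The `page_content` and `metadata` field names mirror the attributes that LangChain documents expose directly.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_with_scores(docs: list, query: str, k: int = 4) -> list:
    """Return the k documents most similar to the query, with their scores."""
    q = embed(query)
    scored = [(d, cosine(q, embed(d.page_content))) for d in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [
    Doc("unacceptable risks are prohibited practices", {"source": "a.pdf"}),
    Doc("notified bodies verify conformity", {"source": "b.pdf"}),
    Doc("risks to health and safety", {"source": "c.pdf"}),
]
results = search_with_scores(docs, "What are the unacceptable risks?", k=2)
for doc, score in results:
    print(f"{score:.3f}  {doc.metadata['source']}  {doc.page_content}")
```

With the real vector store, LangChain's Chroma wrapper offers similarity_search_with_score for the same purpose; how its scores should be read (distance versus similarity) depends on the configured metric, so check the documentation for the installed version.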

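This project is complete for now. The parts of the series are:

(22-1) A RAG-Based System for Interpreting Laws and Regulations (Llama3+Langchain+ChromaDB): Background + Project Overview - CSDN Blog

(22-2) A RAG-Based System for Interpreting Laws and Regulations (Llama3+Langchain+ChromaDB): Preparing the Model - CSDN Blog

(22-3-01) A RAG-Based System for Interpreting Laws and Regulations (Llama3+Langchain+ChromaDB): Building a RAG for Questions on the EU Artificial Intelligence Act (1) - CSDN Blog

(22-3-02) A RAG-Based System for Interpreting Laws and Regulations (Llama3+Langchain+ChromaDB): Building a RAG for Questions on the EU Artificial Intelligence Act (2) - CSDN Blog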
