【Python自然语言处理+tkinter图形化界面】实现智能医疗客服问答机器人实战（附源码、数据集、演示超详细）-慈云数据

需要源碼和數據集請點贊關注收藏後評論區留言私信~~~

一、問答智能客服簡介

QA問答是Question-and-Answer的縮寫，根據用戶提出的問題檢索答案，并用用戶可以理解的自然語言回答用戶，問答型客服注重一問一答處理，側重知識的推理。

從應用領域視角，可将問答系統分爲限定域問答系統和開放域問答系統。

根據支持問答系統産生答案的文檔庫、知識庫，以及實現的技術分類，可分爲自然語言的數據庫問答系統、對話式問答系統、閱讀理解系統、基于常用問題集的問答系統、基于知識庫的問答系統等。

智能問答客服功能架構

典型的問答系統包含問題輸入問題理解信息檢索信息抽取答案排序答案生成和結果輸出等，首先由用戶提出問題，檢索操作通過在知識庫中查詢得到相關信息，并依據特定規則從提取到的信息中抽取相應的候選答案特征向量，最後篩選候選答案結果輸出給用戶

智能問答客服框架

1：問題處理問題處理流程識别問題中包含的信息，判斷問題的主題信息和主題範疇歸屬，比如是屬于一般類問題還是屬于特定主題類問題，然後提取與主題相關的關鍵信息，比如人物信息、地點信息和時間信息等。

2 ：問題映射根據用戶咨詢的問題，進行問題映射消除歧義。通過字符串相似度匹配和同義詞表等解決映射問題，根據需要執行拆分和合并操作。

3 ：查詢構建通過對輸入問題進行處理，将問題轉化爲計算機可以理解的查詢語言，然後查詢知識圖譜或者數據庫，通過檢索獲得相應備選答案。

4 ：知識推理根據問題屬性進行推理，問題基本屬性如果屬于知識圖譜或者數據庫中的已知定義信息，則可以從知識圖譜或者數據庫中查找，直接返回答案。如果問題屬性是未定義類問題，則需要通過機器算法推理生成答案。

5：消岐排序根據知識圖譜中查詢返回的一個或者多個備選答案，結合問題屬性進行消歧處理和優先級排序，輸出最佳答案。

二、智能醫療客服問答實戰

定制性智能客服程序一般需要實現選擇語料庫，去除噪聲信息後根據算法對預料進行訓練，最後提供人機接口問答對話，基于互聯網獲得的醫學語料庫，并通過餘弦相似度基本原理，設計并開發以下問答型智能醫療客服應用程序

項目結構如下

效果展示

下面是csv文件中定義的一些病例

預先定義好的歡迎語句

運行chatrobot文件彈出以下窗口輸出問題後點擊提交咨詢即可

對于語料庫中沒有的問題會自動推斷給出答案（通常不太準确）

三、代碼

部分代碼如下全部代碼和數據集請點贊關注收藏後評論區留言私信

Python

# -*- coding:utf-8 -*-
from fuzzywuzzy import fuzz
import sys
import jieba
import csv
import pickle
print(sys.getdefaultencoding())
import logging
from fuzzywuzzy import fuzz
import math
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from scipy.sparse import lil_matrix
from sklearn.naive_bayes import MultinomialNB
import warnings
from tkinter import *
import time
import difflib
from collections import Counter
import numpy as np
filename = 'label.csv'
def tokenization(filename):
    corpus = []
    label = []
    question = []
    answer = []
    with open(filename, 'r', encoding="utf-8") as f:
        data_corpus = csv.reader(f)
        next(data_corpus)
        for words in data_corpus:
            word = jieba.cut(words[1])
            tmp = ''
            for x in word:
                tmp += x
            corpus.append(tmp)
            question.append(words[1])
            label.append(words[0])
            answer.append(words[2])
    
    with open('corpus.h5','wb') as f:
        pickle.dump(corpus,f)
    with open('label.h5','wb') as f:
        pickle.dump(label,f)
    with open('question.h5', 'wb') as f:
        pickle.dump(question, f)
    with open('answer.h5', 'wb') as f:
        pickle.dump(answer, f)
    return corpus,label,question,answer
def train_model():
    with open('corpus.h5','rb') as f_corpus:
        corpus = pickle.load(f_corpus)
    with open('label.h5','rb') as f_label:
        label = pickle.load(f_label,encoding='bytes')
    vectorizer = CountVectorizer(min_df=1)
    transformer = TfidfTransformer()
    tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))
    words_frequency = vectorizer.fit_transform(corpus)
    word = vectorizer.get_feature_names()
    saved = tfidf_calculate(vectorizer.vocabulary_,sparse.csc_matrix(words_frequency),len(corpus))
    model = MultinomialNB()
    model.fit(tfidf,label)
    with open('model.h5','wb') as f_model:
        pickle.dump(model,f_model)
    with open('idf.h5','wb') as f_idf:
        pickle.dump(saved,f_idf)
    return model,tfidf,label
    
    
    
    
class tfidf_calculate(object):
    def __init__(self,feature_index,frequency,docs):
        self.feature_index = feature_index
        self.frequency = frequency
        self.docs = docs
        self.len = len(feature_index)
    def key_count(self,input_words):
        keys = jieba.cut(input_words)
        count = {}
        for key in keys:
            num = count.get(key, 0)
            count[key] = num + 1
        return count
    def getTfidf(self,input_words):
        count = self.key_count(input_words)
        result = lil_matrix((1, self.len))
        frequency = sparse.csc_matrix(self.frequency)
        for x in count:
            word = self.feature_index.get(x)
            if word != None and word>=0:
                word_frequency = frequency.getcol(word)
                feature_docs = word_frequency.sum()
                tfidf = count.get(x) * (math.log((self.docs+1) / (feature_docs+1))+1)
                result[0, word] = tfidf
        return result    
if __name__=="__main__":
    tokenization(filename)
    train_model()

創作不易覺得有幫助請點贊關注收藏~~~

【Python自然語言處理+tkinter圖形化界面】實現智能醫療客服問答機器人實戰（附源碼、數據集、演示超詳細）

一、問答智能客服簡介

智能問答客服功能架構

智能問答客服框架

二、智能醫療客服問答實戰

效果展示

三、代碼

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

Mongodb聚合操作中的$unset

私域引流寶PHP源碼以及搭建教程

php redis分布式鎖

linux内存緩存占用過高分析和優化

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

Mongodb聚合操作中的$unset

私域引流寶PHP源碼以及搭建教程

php redis分布式鎖

linux内存緩存占用過高分析和優化

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

一、問答智能客服簡介

智能問答客服功能架構

智能問答客服框架

二、智能醫療客服問答實戰

效果展示

三、代碼

猜你喜歡

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

Mongodb聚合操作中的$unset

私域引流寶PHP源碼 以及搭建教程

php redis分布式鎖

linux内存緩存占用過高分析和優化

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

Mongodb聚合操作中的$unset

私域引流寶PHP源碼 以及搭建教程

php redis分布式鎖

linux内存緩存占用過高分析和優化

stm32編寫Modbus步驟

如何保證數據庫和緩存的一緻性

私域引流寶PHP源碼以及搭建教程

私域引流寶PHP源碼以及搭建教程