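I'm using Python 3.11. While working through a hands-on machine learning project, I found that the Boston housing dataset built into sklearn can no longer be loaded the traditional way.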

The original code is as follows:
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the Boston housing dataset
boston = load_boston()

# Prepare the data
X = boston.data
y = boston.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model instance
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Compute the mean squared error
mse = np.mean((y_pred - y_test) ** 2)
print("Mean squared error:", mse)

# Visualize the results
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('actual_price')
plt.ylabel('predict_price')
plt.title('linear')
plt.show()
Running this code produces the following error:

Traceback (most recent call last):
  File "\line_regression.py", line 5, in <module>
    from sklearn.datasets import load_boston
  File "sklearn\datasets\__init__.py", line 157, in __getattr__
    raise ImportError(msg)
ImportError:
`load_boston` has been removed from scikit-learn since version 1.2.
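The error means that scikit-learn 1.2 and later no longer provide `load_boston`, so the dataset cannot be loaded the traditional way on newer versions.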
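To make the version cutoff concrete, here is a minimal sketch that decides from a version string whether `load_boston` should still exist. The helper name `load_boston_available` is hypothetical (not part of scikit-learn), and the check relies only on the "removed since version 1.2" statement in the error message above.

```python
import re


def load_boston_available(sklearn_version: str) -> bool:
    """Return True if the version predates 1.2, the release named in the error."""
    # Only the leading "major.minor" part matters for this check.
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 2)


print(load_boston_available("1.1.3"))
print(load_boston_available("1.2.0"))
```

In a real script, the installed version string can be read from `sklearn.__version__` and passed to this helper before deciding which loading path to use.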
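I modified the part that loads the Boston housing dataset. The modified code is as follows:
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the Boston housing dataset from its original source
data_url = "http://lib.stat.cmu.edu/datasets/boston"
# A raw string avoids an invalid-escape warning for "\s" on newer Python versions
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two rows in the file: join them into 13 feature columns
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
# The third value on every second row is the target (house price)
target = raw_df.values[1::2, 2]

# Prepare the data
X = data
y = target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model instance
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Compute the mean squared error
mse = np.mean((y_pred - y_test) ** 2)
print("Mean squared error:", mse)

# Visualize the results
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('actual_price')
plt.ylabel('predict_price')
plt.title('linear')
plt.show()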
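The slicing in the loading step is easier to follow on a toy example. The sketch below assumes, as the `[::2]` and `[1::2, :2]` indexing implies, that each record in the raw file spans two rows: 11 values on the first row and 3 on the second, the last of which is the target. Plain Python lists stand in for `raw_df.values`, and all numbers are made up for illustration.

```python
import math

NAN = math.nan  # stands in for the empty cells pandas fills in on the short rows

# Two made-up records in the two-row layout described above.
raw_rows = [
    [float(i) for i in range(11)],        # record 0: features 0..10
    [11.0, 12.0, 24.0] + [NAN] * 8,       # record 0: features 11..12, then target
    [float(i) + 100 for i in range(11)],  # record 1: features 0..10
    [111.0, 112.0, 21.6] + [NAN] * 8,     # record 1: features 11..12, then target
]

even_rows = raw_rows[::2]   # plays the role of raw_df.values[::2, :]
odd_rows = raw_rows[1::2]   # plays the role of raw_df.values[1::2, :]

# Join 11 + 2 columns into one 13-feature row per record (the np.hstack step).
data = [first + second[:2] for first, second in zip(even_rows, odd_rows)]
# The third value on each second row is the target.
target = [second[2] for second in odd_rows]

print(len(data), len(data[0]))
print(target)
```

Each joined row ends up with 13 feature values, and `target` holds one value per record, which is the shape the model-fitting code expects from `X` and `y`.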