Can't store non-English names correctly in a MySQL table

Problem description

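I'm trying to store some fields scraped from a web page in a MySQL table. The script I wrote parses the data and stores it in the table. However, because the username is not in English, the table stores the name as ????????? ????????? instead of Αθανάσιος Σουλιώτης.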

The script I tried:

import mysql.connector
import requests
from bs4 import BeautifulSoup

link = 'https://stackoverflow.com/questions/67941060/web-scraper-update'

mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  passwd = "",
  database="mydatabase",
  charset='utf8',
  use_unicode=True
)

mycursor = mydb.cursor()

mycursor.execute("DROP TABLE if exists webdata")
mycursor.execute("CREATE TABLE if not exists webdata (title VARCHAR(255), username VARCHAR(255), reputation VARCHAR(255))")

response = requests.get(link)
soup = BeautifulSoup(response.text,"lxml")
post_title = soup.select_one("h1[itemprop='name'] > a").get_text(strip=True)
username = soup.select_one(".user-details > a").get_text(strip=True)
reputation = soup.select_one("span.reputation-score").get_text(strip=True)

print((post_title,username,reputation))

mycursor.execute(
    "INSERT INTO webdata (title,username,reputation) VALUES (%s,%s,%s)",
    (post_title,username,reputation)
)

mydb.commit()
mydb.close()

This is how the output is printed in the console:

('Web scraper update', 'Αθανάσιος Σουλιώτης', '13')

The database stores it as:

'Web scraper update', '????????? ?????????', '13'
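A quick way to see where the question marks come from, assuming the table was created with a latin1 default character set (the server default in MySQL 5.x; MySQL 8.0 defaults to utf8mb4): latin1 has no Greek letters, and characters that a column's character set cannot represent can end up stored as `?`. The sketch below only imitates that substitution in Python with `errors="replace"`; it does not talk to MySQL.

```python
# Imitates, in pure Python, converting text into a character set that cannot
# represent it: each unmappable character is replaced by '?'.
# Assumption: the webdata table uses latin1, which has no Greek letters.
name = "Αθανάσιος Σουλιώτης"

stored = name.encode("latin-1", errors="replace").decode("latin-1")
print(stored)
```

Every Greek letter becomes `?` while the space survives, which matches the stored value above.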

How can I store non-English names correctly in the MySQL table?
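Which character set the column needs depends on the characters being stored. The sketch below is a hypothetical helper, `fits_charset`, that checks this in pure Python under stated approximations: MySQL's `latin1` is treated as Python's `cp1252` codec (MySQL's latin1 is cp1252-based), MySQL's `utf8` (an alias for `utf8mb3`) is limited to code points up to U+FFFF, and `utf8mb4` accepts any valid Unicode text.

```python
# Hypothetical helper: can every character of `text` be stored in a MySQL
# column using `charset`? Approximations (assumptions, not MySQL's own logic):
#   latin1          -> Python's cp1252 codec
#   utf8 / utf8mb3  -> valid Unicode, code points up to U+FFFF (max 3 UTF-8 bytes)
#   utf8mb4         -> any valid Unicode text (up to 4 UTF-8 bytes)
def fits_charset(text: str, charset: str) -> bool:
    if charset == "latin1":
        codec = "cp1252"
    elif charset in ("utf8", "utf8mb3", "utf8mb4"):
        codec = "utf-8"
    else:
        raise ValueError(f"unsupported charset: {charset}")
    try:
        text.encode(codec)  # fails for unmappable characters or lone surrogates
    except UnicodeEncodeError:
        return False
    if charset in ("utf8", "utf8mb3"):
        # 3-byte UTF-8 only covers the Basic Multilingual Plane
        return all(ord(ch) <= 0xFFFF for ch in text)
    return True


name = "Αθανάσιος Σουλιώτης"
for cs in ("latin1", "utf8", "utf8mb4"):
    print(cs, fits_charset(name, cs))
```

For the name in the question this reports False for latin1 and True for utf8 and utf8mb4; a character outside the Basic Multilingual Plane, such as the emoji U+1F600, would fit only utf8mb4.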

Tags: python, mysql, python-3.x, web-scraping

Solution


Please read through the changes and try again.

I've marked the 3 new lines with numbered comments.

import mysql.connector
import requests
from bs4 import BeautifulSoup

link = 'https://stackoverflow.com/questions/67941060/web-scraper-update'

mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  passwd = "",
  database="mydatabase",
  charset='utf8',
  use_unicode=True
)

mycursor = mydb.cursor()

# 1) add the line below: makes utf8 the database default, so the table created next inherits it
mycursor.execute("ALTER DATABASE `%s` CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'" % 'mydatabase')

mycursor.execute("DROP TABLE if exists webdata")
mycursor.execute("CREATE TABLE if not exists webdata (title VARCHAR(255), username VARCHAR(255), reputation VARCHAR(255))")

mycursor.execute('SET CHARACTER SET utf8;')             # <--- 2) add this line
mycursor.execute('SET character_set_connection=utf8;')  # <--- 3) add this line

response = requests.get(link)
soup = BeautifulSoup(response.text,"lxml")
# ... the rest of the script (parsing, INSERT, commit, close) stays as in the question

I have tested it.

Please check it before you commit.
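Instead of changing the database default with ALTER DATABASE, the character set can also be declared on the table itself, so the table no longer depends on whatever the database default happens to be. The sketch below uses a hypothetical helper, `create_table_sql`, and assumes `utf8mb4` with `utf8mb4_unicode_ci`. MySQL's `utf8` used in the answer is enough for Greek; `utf8mb4` additionally covers characters outside the Basic Multilingual Plane, such as emoji. Table and column names are interpolated directly into the SQL, so they must be trusted constants, never scraped input.

```python
# Hypothetical helper: builds a CREATE TABLE statement that declares the
# character set and collation as table options, instead of relying on the
# database default. Identifiers are assumed to be trusted constants.
def create_table_sql(table, columns, charset="utf8mb4",
                     collation="utf8mb4_unicode_ci"):
    cols = ", ".join(f"`{name}` {sqltype}" for name, sqltype in columns)
    return (f"CREATE TABLE IF NOT EXISTS `{table}` ({cols}) "
            f"CHARACTER SET {charset} COLLATE {collation}")


sql = create_table_sql("webdata", [
    ("title", "VARCHAR(255)"),
    ("username", "VARCHAR(255)"),
    ("reputation", "VARCHAR(255)"),
])
print(sql)
```

The resulting string can be passed to `mycursor.execute()` in place of the original `CREATE TABLE` statement; connecting with `charset='utf8mb4'` keeps the connection encoding consistent with the column.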

