Cannot commit data to the database due to unknown characters in Python

Problem description

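I am scraping several websites and storing the data in my database. Sometimes I get a character-mapping error, which I think is caused by non-ASCII characters. Because I scrape many websites with text in different languages, I have not been able to solve the problem in a general and effective way.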

Error example

Message: 'commit exception GRANTS.GOV'
Arguments: (UnicodeEncodeError('charmap', 'The Embassy of the United States in Nur-Sultan and the Consulate General of the United States in Almaty announces an open competition for past participants (“alumni”) of U.S. government-funded and U.S. government-sponsored exchange programs to submit applications to the 2021 Alumni Engagement Innovation Fund (AEIF) 2021.\xa0\xa0We seek proposals from teams of at least two alumni that meet all program eligibility requirements below. Exchange alumni interested in participating in AEIF 2021 should submit proposals to KazakhstanAlumni@state.gov\xa0by March 31, 2021, 18:00 Nur-Sultan time.\xa0\nAEIF provides alumni of U.S. sponsored and facilitated exchange programs with funding to expand on skills gained during their exchange experience to design and implement innovative solutions to global challenges facing their community. Since its inception in 2011, AEIF has funded nearly 500 alumni-led projects around the world through a competitive global competition.\n\nThis year, the U.S. Mission to Kazakhstan will accept proposals managed by teams of at least two (2) alumni that support the following theme:\n\u25cf\xa0\xa0\xa0\xa0\xa0\xa0Mental health awareness, promotion of mental wellbeing and resiliency.\nGoals. Projects may support one or more of the following goals:\nGoal 1: Increase in public understanding of mental health issues,\xa0its signs and strategies for providing timely help;\nGoal 2: Increase in public understanding of resources, methods, and tools that promote mental health and resiliency, especially among at-risk audiences; American best practices to promote mental health.\nGoal 3: Combatting stigma around mental health issues and dispelling common myths.\n\nFor full package of required forms please Related Documents section.', 1098, 1099, 'character maps to <undefined>'),)
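The 'charmap' codec name and the message 'character maps to <undefined>' are what Python reports when text is encoded with a single-byte code page that lacks one of its characters. The sketch below reproduces that on a short, hypothetical sample (not the full scraped text), assuming the code page involved is cp1252; the helper name `first_unencodable` is made up for illustration.

```python
# Hypothetical sample containing the same kinds of characters as the scraped
# description: no-break space (U+00A0) and black circle (U+25CF).
sample = "AEIF 2021.\xa0\xa0Theme:\n\u25cf\xa0Mental health awareness"


def first_unencodable(text, encoding):
    """Return (index, character) of the first character `encoding` cannot
    represent, or None if the whole text can be encoded."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start]
    return None


# ascii() keeps the output printable on any console encoding.
print(ascii(first_unencodable(sample, "cp1252")))  # (19, '\u25cf')
print(first_unencodable(sample, "utf-8"))          # None
```

In this sample the no-break spaces encode without complaint under cp1252; only the bullet character fails. UTF-8 can represent every character, so the same text encodes cleanly there.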

My code:

    title = '..............'
    description = '......'
    op = Op(
        website='',
        op_link='',
        title=title,              # may be a long text coming from websites
        description=description,  # may be a long text coming from websites
        organization_id=org_id,
        close_date='',
        checksum=singleData['checksum'],
        published_date='',
        language_id=lang_id,
        is_open=1)
    try:
        session.add(op)
        session.commit()
        session.flush()
        ....
        ....

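Please note: it needs to work on Linux; my database (MySQL) runs on a Linux system. The problem is mainly with the title and description, which can be in many languages and of any length. How can I encode them properly so that no errors occur when committing to the database?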

Thanks

Tags: python, mysql, web-scraping

Solution
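The question does not show how the database connection is created, so the following is only a sketch under stated assumptions: that `session` is a SQLAlchemy session, that the driver is PyMySQL, and that the error comes from the driver encoding text with a single-byte connection charset such as latin1. Under those assumptions, one commonly used approach is to request `utf8mb4` in the connection URL rather than encoding the title and description by hand. The helper `with_utf8mb4`, the credentials, the host, and the database name are all placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_utf8mb4(db_url):
    """Return db_url with its `charset` query parameter set to utf8mb4."""
    parts = urlsplit(db_url)
    query = dict(parse_qsl(parts.query))
    query["charset"] = "utf8mb4"  # overrides any existing charset setting
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder credentials, host, and database name.
url = with_utf8mb4("mysql+pymysql://user:password@localhost/mydb")
print(url)  # mysql+pymysql://user:password@localhost/mydb?charset=utf8mb4

# With SQLAlchemy installed, the engine would then be built from this URL
# (assumption: the project uses create_engine to set up its session):
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
```

The connection charset covers only the client side. The columns that hold the title and description would also need a character set able to store the text, such as utf8mb4; otherwise MySQL may still reject or alter characters on insert.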

