Why can to_datetime handle positive time zones (UTC+) but not negative time zones (UTC-)?

Problem description

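I've been using Pandas to convert a .CSV file into a format that can be read by another system. I'm almost done, but I can't get it to work with negative time zones (UTC-1, UTC-2, etc.).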

Here is the code I'm using. It isn't the tidiest, but it does the job for UTC+ time zones. Can you see why it might not handle UTC- time zones correctly?

import pandas as pd
from datetime import datetime
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding = 'utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding = 'utf-8')

df = pd.read_csv('CONCACAF_First_Round.csv', index_col=False)
df['starttime'] = df['starttime'].str.replace(r'\s+', '', regex=True)
df.insert(loc=2, column='season', value='2020')
df.insert(loc=6, column='awayscore', value='')
df.insert(loc=8, column='round_a', value='1')
df['venue'] = df['venue'].str.split(',').str[0]
df[['homescore', 'awayscore']] = df['homescore'].str.split('–',expand=True)
df['awayscore'] = df['awayscore'].str.split(' ').str[0]
df['starttime'] = df['starttime'].str.replace('UTC', ' UTC')
df['datepicker'] = df['datepicker'] + (' ') + df['starttime']
del df['starttime']
df[['datepicker', 'time', 'UTC']] = df.datepicker.str.split(" ", expand=True)
df['datepicker'] = df['datepicker'] + ' ' + df['time']
del df['time']
df['datepicker'] = df['datepicker'] + ' ' + df['UTC']
df['datepicker'] = df['datepicker'].str.replace('±', '+')
df['datepicker'] = df['datepicker'].str.replace('UTC', '')
del df['UTC']
df['datepicker'] = pd.to_datetime(df['datepicker'], utc=True)
df.insert(loc=1, column='starttime', value='')
df['starttime'] = df['datepicker'].dt.strftime('%H:%M:%S')
df['datepicker'] = df['datepicker'].dt.strftime('%Y-%m-%d')
print(df.head(10))

This is what it returns for negative time zones:

dateutil.parser._parser.ParserError: Unknown string format: 2015-03-25 19:30 −4

And this is what it returns when it works correctly for a UTC+ time zone:

datepicker starttime season  hometeam awayteam homescore awayscore              venue round_a
0  2019-09-04  13:00:00   2020  Ethiopia  Lesotho         0         0  Bahir Dar Stadium       1

Data: CONCACAF_First_Round.csv

# copy the data to the clipboard and read with
df = pd.read_clipboard(sep=',')

datepicker,starttime,hometeam,awayteam,homescore,venue
2015-03-25,19:30 UTC−4,Bahamas,Bermuda,0–5,"Thomas Robinson Stadium, Nassau"
2015-03-29,15:00 UTC−3,Bermuda,Bahamas,3–0,"Bermuda National Stadium, Devonshire"
2015-03-26,19:00 UTC−4,British Virgin Islands,Dominica,2–3,"Windsor Park, Roseau (Dominica)[note 2]"
2015-03-29,17:00 UTC−4,Dominica,British Virgin Islands,0–0,"Windsor Park, Roseau"
2015-03-22,19:00 UTC−4,Barbados,U.S. Virgin Islands,0–1,"Barbados National Stadium, Bridgetown"
2015-03-26,15:30 UTC−4,U.S. Virgin Islands,Barbados,0–4,"Addelita Cancryn Junior High School Ground, Charlotte Amalie"
2015-03-23,20:00 UTC−4,Saint Kitts and Nevis,Turks and Caicos Islands,6–2,"Warner Park, Basseterre"
2015-03-26,19:00 UTC−4,Turks and Caicos Islands,Saint Kitts and Nevis,2–6,"TCIFA National Academy, Providenciales"
2015-03-23,18:00 UTC−6,Nicaragua,Anguilla,5–0,"Nicaragua National Football Stadium, Managua"
2015-03-29,17:00 UTC−4,Anguilla,Nicaragua,0–3,"Ronald Webster Park, The Valley"
2015-03-25,20:00 UTC−6,Belize,Cayman Islands,0–0,"FFB Stadium, Belmopan"
2015-03-29,19:00 UTC−5,Cayman Islands,Belize,1–1,"Truman Bodden Sports Complex, George Town"
2015-03-27,20:00 UTC−4,Curaçao,Montserrat,2–1,"Ergilio Hato Stadium, Willemstad"
2015-03-31,19:00 UTC−4,Montserrat,Curaçao,2–2,"Blakes Estate Stadium, St. John's"

Tags: python, pandas, datetime, timezone

Solution


Your "minus signs" are not the minus sign the parser expects. For example, in your error message:

Unknown string format: 2015-03-25 19:30 −4

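If you copy the second-to-last character (the one that looks like a minus sign) into Python 3 and inspect it, you will see:

>>> '−'.encode('utf-8')
b'\xe2\x88\x92'
>>> hex(ord('−'))
'0x2212'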

That is U+2212 MINUS SIGN, a Unicode character that is distinct from the "-" on your keyboard (U+002D HYPHEN-MINUS).
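To find look-alike characters like this without copying them by hand, a small helper can list every non-ASCII character in a string together with its position, code point and Unicode name. This is a minimal sketch using only the standard library unicodedata module; the function name non_ascii_chars is hypothetical, not part of pandas or Python.

```python
import unicodedata


def non_ascii_chars(text):
    """Hypothetical helper: return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127  # anything outside 7-bit ASCII
    ]


# The string from the error message, with the minus sign written as an escape
# so the character is unambiguous in source code.
print(non_ascii_chars("2015-03-25 19:30 \u22124"))
```

Applied to a column, for example df['starttime'].map(non_ascii_chars), it would flag the U+2212 in every negative offset. It would also report the dash in the score column (e.g. 0–5), which your code already splits on deliberately.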

You need to replace those Unicode minus signs with the ordinary ASCII "-" before calling to_datetime. After that, it should work.
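Here is a minimal sketch of that replacement. It assumes pandas and uses two rows of the question's data, already combined into the "date time offset" form that the question's code passes to to_datetime. In the question's script, the same str.replace call would go next to the existing str.replace('±', '+') line, before pd.to_datetime runs.

```python
import pandas as pd

# Two rows from CONCACAF_First_Round.csv in the shape the question builds
# before calling to_datetime. "\u2212" is the Unicode MINUS SIGN from the CSV.
datepicker = pd.Series(["2015-03-25 19:30 \u22124", "2015-03-29 15:00 \u22123"])

# Swap U+2212 for the ASCII hyphen-minus; regex=False makes this a literal replacement.
datepicker = datepicker.str.replace("\u2212", "-", regex=False)

# With an ASCII sign, the trailing "-4" is read as a UTC offset of -4 hours,
# and utc=True converts every value to UTC.
parsed = pd.to_datetime(datepicker, utc=True)
print(parsed.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

19:30 at UTC-4 is 23:30 UTC, and 15:00 at UTC-3 is 18:00 UTC, so those are the times the parsed values should show. Newer pandas versions may emit a UserWarning saying the format could not be inferred and each element is parsed individually. That warning does not change the result.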

There was a python-ideas discussion about this in 2013, "Unicode minus sign in numeric conversions", in which one participant said:

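As for supporting non-ASCII plus and minus signs, I'm keen in principle but lukewarm in practice. I think it would be a nice option, and if someone works out which characters should be accepted, I would support adding it as a new feature. But I don't think the lack of support for non-ASCII numeric signs is a bug.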

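That thread went nowhere in the end, because everyone flogged their hobby horses to death several times over, with a side discussion about whether Thai digits should work with float(). If someone takes this on, it would probably be better to focus narrowly on supporting U+2212 and a few Unicode "plus" signs.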
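Until something like that exists, you can normalize the signs yourself before converting. The sketch below is one possible approach, not anything the thread settled on. It uses str.maketrans and str.translate from the standard library, and the set of characters treated as signs is an assumption. This is exactly the "which characters" question the quote leaves open. The names SIGN_MAP and to_float are hypothetical.

```python
# Hypothetical sign table: which code points count as plus/minus is an assumption.
SIGN_MAP = str.maketrans({
    "\u2212": "-",  # MINUS SIGN (the character in the question's CSV)
    "\uFE63": "-",  # SMALL HYPHEN-MINUS
    "\uFF0D": "-",  # FULLWIDTH HYPHEN-MINUS
    "\uFE62": "+",  # SMALL PLUS SIGN
    "\uFF0B": "+",  # FULLWIDTH PLUS SIGN
})


def to_float(text: str) -> float:
    """Hypothetical wrapper: float() that also accepts the signs in SIGN_MAP."""
    return float(text.translate(SIGN_MAP))


print(to_float("\u22124"))   # plain float("\u22124") raises ValueError
print(to_float("\uFF0B2.5"))
```

The same table can be applied to a pandas column with Series.str.translate(SIGN_MAP) before calling to_datetime, which covers more look-alike signs than a single str.replace.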

