首页 > 解决方案 > 从行中提取每日值并使用日期创建新行

问题描述

我有一个从 csv 文件字节字符串编译的天气数据 DataFrame。每行包含一个元素的一个月数据,即 1949 年 6 月的最高温度位于年份 = 1949、月份 = 06、元素 = TMAX 的行中。6 月 1 日的温度值为 value1,6 月 2 日的温度值为 value2,依此类推。

我已将 df 的片段复制到剪贴板并粘贴在下面,您应该可以复制并使用它pd.read_clipboard(sep=',')来重新创建此 df。

我需要做的是提取每个月的每一天的值(在这个阶段我不太担心 mflag、sflag、qflag),附加值引用的元素并制作一个看起来像这样的新数据框:

日期 TMAX 聚氯乙烯
1949-06-01 210 24
1949-06-02 189 36
,ID,year,month,element,value1,mflag1,qflag1,sflag1,value2,mflag2,qflag2,sflag2,value3,mflag3,qflag3,sflag3,value4,mflag4,qflag4,sflag4,value5,mflag5,qflag5,sflag5,value6,mflag6,qflag6,sflag6,value7,mflag7,qflag7,sflag7,value8,mflag8,qflag8,sflag8,value9,mflag9,qflag9,sflag9,value10,mflag10,qflag10,sflag10,value11,mflag11,qflag11,sflag11,value12,mflag12,qflag12,sflag12,value13,mflag13,qflag13,sflag13,value14,mflag14,qflag14,sflag14,value15,mflag15,qflag15,sflag15,value16,mflag16,qflag16,sflag16,value17,mflag17,qflag17,sflag17,value18,mflag18,qflag18,sflag18,value19,mflag19,qflag19,sflag19,value20,mflag20,qflag20,sflag20,value21,mflag21,qflag21,sflag21,value22,mflag22,qflag22,sflag22,value23,mflag23,qflag23,sflag23,value24,mflag24,qflag24,sflag24,value25,mflag25,qflag25,sflag25,value26,mflag26,qflag26,sflag26,value27,mflag27,qflag27,sflag27,value28,mflag28,qflag28,sflag28,value29,mflag29,qflag29,sflag29,value30,mflag30,qflag30,sflag30,value31,mflag31,qflag31,sflag31
14,40223,1945,12,TAVG,  244,H, ,S,  228,H, ,S,  239,H, ,S,  231,H, ,S,  195,H, ,S,  216,H, ,S,  232,H, ,S,  230,H, ,S,  253,H, ,S,  260,H, ,S,  251,H, ,S,  260,H, ,S,  256,H, ,S,  257,H, ,S,  252,H, ,S,  256,H, ,S,  253,H, ,S,  241,H, ,S,  209,H, ,S,  226,H, ,S,  229,H, ,S,  239,H, ,S,  239,H, ,S,  234,H, ,S,  233,H, ,S,  232,H, ,S,  234,H, ,S,  231,H, ,S,  232,H, ,S,  232,H, ,S,  241,H, ,S
15,40223,1946,01,TAVG,  249,H, ,S,  259,H, ,S,  256,H, ,S,  271,H, ,S,  277,H, ,S,  264,H, ,S,  231,H, ,S,  236,H, ,S,  235,H, ,S,  249,H, ,S,  257,H, ,S,  256,H, ,S,  250,H, ,S,  253,H, ,S,  261,H, ,S,  262,H, ,S,  257,H, ,S,  264,H, ,S,  264,H, ,S,  264,H, ,S,  272,H, ,S,  274,H, ,S,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , ,-9999, , , 
16,40223,1949,06,TMAX,-9999, , , ,-9999, , , ,  210, , ,a,  146, , ,a,  189, , ,a,  216, , ,a,  211, , ,a,  210, , ,a,  214, , ,a,  174, , ,a,  225, , ,a,  187, , ,a,  168, , ,a,  207, , ,a,  218, , ,a,  205, , ,a,  174, , ,a,  192, , ,a,  186, , ,a,  199, , ,a,  176, , ,a,  141, , ,a,  173, , ,a,  190, , ,a,  189, , ,a,  197, , ,a,  207, , ,a,  216, , ,a,  202, , ,a,  204, , ,a,-9999, , , 
17,40223,1949,06,TMIN,-9999, , , ,-9999, , , ,  136, , ,a,   60, , ,a,  113, , ,a,   84, , ,a,   91, , ,a,   88, , ,a,  126, , ,a,  131, , ,a,  129, , ,a,   84, , ,a,   81, , ,a,  113, , ,a,  110, , ,a,  133, , ,a,  136, , ,a,  108, , ,a,  111, , ,a,  122, , ,a,  107, , ,a,   81, , ,a,   59, , ,a,   57, , ,a,   44, , ,a,   42, , ,a,   66, , ,a,   77, , ,a,   60, , ,a,   68, , ,a,-9999, , , 
18,40223,1949,06,PRCP,  277, , ,a,  239, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    3, , ,a,   53, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    8, , ,a,    3, , ,a,    3, , ,a,   71, , ,a,  107, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,    0, , ,a,-9999, , , 
19,40223,1949,06,TAVG,  156,H, ,S,  174,H, ,S,  136,H, ,S,  129,H, ,S,  146,H, ,S,  153,H, ,S,  146,H, ,S,  165,H, ,S,  164,H, ,S,  154,H, ,S,  156,H, ,S,  126,H, ,S,  144,H, ,S,  153,H, ,S,  169,H, ,S,  167,H, ,S,  151,H, ,S,  154,H, ,S,  153,H, ,S,  150,H, ,S,  132,H, ,S,  102,H, ,S,  106,H, ,S,  109,H, ,S,  113,H, ,S,  121,H, ,S,  133,H, ,S,  132,H, ,S,  132,H, ,S,  129,H, ,S,-9999, , , 

提前致谢

标签: pythonpandas

解决方案


可能有一种更直接的方法来做你想做的事,但是这段代码可以工作,并且可以一步一步地遵循。

请注意,我将数据保存在一个名为的文件中weather_data.csv- 使用剪贴板一段时间后不太好用。:)

import pandas as pd

# df = pd.read_clipboard(sep=',')
# df.to_csv('weather_data.csv')

df = pd.read_csv('weather_data.csv')

keep_columns = [col for col in df.columns[2:] if not 'flag' in col]
df = df[keep_columns]

df_melt = pd.melt(df, id_vars=['year', 'month', 'element'], value_vars=keep_columns[3:])

df_melt['day'] = df_melt['variable'].str[5:].astype(int)

df_melt = df_melt.query('value!=-9999')

df_melt['date'] = pd.to_datetime(df_melt[['year', 'month', 'day']])
                              
df_final = df_melt[['date', 'element', 'value']].pivot(index='date', columns='element')

print(df_final)

样本输出:

          value
element     PRCP   TAVG   TMAX  TMIN
date
1945-12-01   NaN  244.0    NaN   NaN
1945-12-02   NaN  228.0    NaN   NaN
1945-12-03   NaN  239.0    NaN   NaN
1945-12-04   NaN  231.0    NaN   NaN
1945-12-05   NaN  195.0    NaN   NaN
...          ...    ...    ...   ...
1949-06-26   0.0  121.0  197.0  42.0
1949-06-27   0.0  133.0  207.0  66.0
1949-06-28   0.0  132.0  216.0  77.0
1949-06-29   0.0  132.0  202.0  60.0
1949-06-30   0.0  129.0  204.0  68.0

推荐阅读