python - pandas behaves strangely when using dataframe.shift()
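Problem description
I am reading in some data that looks like this:
In this dataset, column 16 is null in many rows. I need to shift the values in those rows to the right, so that the values beginning with "*" (for example, column 16 row 4, column 13 row 5, etc.) move into the columns to their right. (Eventually I will do this in a loop so that those values end up in column 16.)
The data to the left of those values must shift as well. For example, when the data in {column 7 row 16} moves to {column 8, row 16}, the data in {column 2 row 16} should move to {column 3 row 16}.
However, I do not want the data in column 1 (zero-indexed column 0) to move, because I will use it as the index of my data.
So my expected output looks like this:
I am using the code below to achieve this: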
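import StringIO  # Python 2 module; on Python 3, use io.StringIO instead
import pandas as pd  # the code below refers to pandas as pd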
# Store the csv string in a variable and turn that into a dataframe
# This string here is the same as the data in the image above.
gps_string = """2010-01-12 18:00:00,$GPGGA,180439,7249.2150,N,11754.4238,W,2.0,10,0.9,-8.1,M,-12.4,M,,*57,,,
2010-01-12 17:30:00,$GPGGA,173439,7249.2160,N,11754.4233,W,2.0,11,0.8,-4.5,M,-12.4,M,,*5B,,,
2010-01-12 17:00:00,$GPGGA,170439,7249.2152,N,11754.4235,W,2.0,11,0.8,-3.1,M,-12.4,M,,*5C,,,
2010-01-12 16:30:00,N,11754.4210,W,2,9.0,1.1,-13.1,M,-12.4,M,,*6C,,,,,,
2010-01-12 16:00:00,N,11754.4229,W,2,10.0,0.9,-2.9,M,-12.4,M,,*53,,,,,,
2010-01-12 15:30:00,N,11754.4269,W,2,9.0,0.8,-4.3,M,-12.4,M,,*54,,,,,,
2010-01-12 15:00:00,N,11754.4267,W,2,10.0,0.8,-1.6,M,-12.4,M,,*56,,,,,,
2010-01-12 14:30:00,$GPGGA,143439,7249.2152,N,11754.4253,W,2.0,11,0.7,-4.3,M,-12.4,M,,*56,,,
2010-01-12 14:00:00,N,11754.4245,W,2,10.0,0.9,-7.0,M,-12.4,M,,*50,,,,,,
2010-01-12 13:30:00,$GPGGA,133439,7249.2134,N,11754.4243,W,2.0,11,0.7,-10.7,M,-12.4,M,,*61,,,
2010-01-12 13:00:00,N,11754.4245,W,2,10.0,0.8,-5.5,M,-12.4,M,,*56,,,,,,
2010-01-12 12:30:00,N,11754.4226,W,2,10.0,0.9,-7.1,M,-12.4,M,,*59,,,,,,
2010-01-12 12:00:00,N,11754.4238,W,2,10.0,0.8,-6.5,M,-12.4,M,,*51,,,,,,
2010-01-12 11:30:00,N,11754.4227,W,2,10.0,0.8,0.1,M,-12.4,M,,*73,,,,,,
2010-01-12 11:00:00,-7.4,M,-12.4,M,,*5F,,,,,,,,,,,,
2010-01-12 10:30:00,N,11754.4271,W,2,8.0,1.1,-8.4,M,-12.4,M,,*5A,,,,,,
"""
# Read the csv string into a dataframe, with no headers
# Make the first column with timestamp values the index column.
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None,
                     index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index
# Shift the rows here.
gps_df.loc[rows_to_shift] = gps_df.loc[rows_to_shift].shift(periods=1, axis=1)
gps_df.to_csv("f.csv") # Creates a file after shift to see the output
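When I run the code, I get the following output file.
From this I can see that, for some reason, the shift function created a column of null(s) at column 5, and it also moved the data that was originally in column 10 to column 15. Any idea why this happens?
Could this be a bug in the dataframe.shift() function, or am I doing something wrong here?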
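Solution
This is a bug in pandas (the original answer linked to further details).
It appears that, when shifting along axis=1, a value in an object-dtype column is moved to the next column that also has object dtype, rather than to the column immediately to its right.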
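To check which columns are involved, one can look at the dtypes that read_csv inferred. The following is a minimal sketch on a made-up two-row sample (the sample string and the variable names are illustrative assumptions, not part of the original data). It separates the numeric columns from the text columns, which is the mix of dtypes under which the behaviour described above shows up.

```python
import io

import pandas as pd

# Hypothetical two-row sample: text and numeric columns interleaved.
sample = "t0,N,1.5,W,2.0\nt1,M,,E,3.0\n"
df = pd.read_csv(io.StringIO(sample), header=None, index_col=0)

# Split the columns by whether pandas inferred a numeric dtype for them.
numeric_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

print(df.dtypes)
print("numeric columns:", numeric_cols)
print("text columns:", text_cols)
```

If the two lists interleave, as they do here, a shift that moves values only between columns of the same dtype will skip over the columns in between.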
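To work around this, I selected the indexes to shift, converted all the data in the dataframe to strings, performed the shift, wrote the data back out as a csv string, and then re-created the dataframe from that string to recover the previous data types.
Here is the code I used to work around the problem: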
import pandas as pd
import StringIO  # Python 2 module; on Python 3, use io.StringIO instead
gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","",""
"""
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index # get the indexes to shift
gps_df_all_strings = gps_df.astype(str) # Convert all the data to be of type str (string)
# Shift the data
gps_df_all_strings.loc[rows_to_shift] = gps_df_all_strings.loc[rows_to_shift].shift(periods=1, axis=1)
s = gps_df_all_strings.to_csv(header=None) # Put shifted csv data into a string after shifting.
new_gps_df = pd.read_csv(StringIO.StringIO(s), header=None, index_col=0) # re read csv data.
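As an alternative sketch, and not the approach used in the code above, the selected rows can be shifted through a NumPy object array, so that column dtypes play no part in where a value lands and DataFrame.shift is not called at all. The helper name shift_rows_right and the tiny sample are hypothetical, and the code assumes Python 3, where io.StringIO replaces the StringIO module.

```python
import io

import numpy as np
import pandas as pd


def shift_rows_right(df, row_labels, periods=1):
    """Return a copy of df with the given rows moved right by `periods` columns."""
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    out = df.astype(object)  # a single dtype, so values can move to any column
    values = out.loc[row_labels].to_numpy()
    shifted = np.empty_like(values)
    shifted[:, :periods] = np.nan                # vacated cells on the left
    shifted[:, periods:] = values[:, :-periods]  # everything else moves right
    out.loc[row_labels] = shifted
    return out.infer_objects()  # let pandas re-infer dtypes where it can


# Hypothetical sample: row t1 is missing its first field, so its last column is empty.
sample = "t0,$G,1.0,N,*57\nt1,2.0,N,*6C,\n"
df = pd.read_csv(io.StringIO(sample), header=None, index_col=0)
rows = df[df[4].isnull()].index  # rows whose last column is null
result = shift_rows_right(df, rows)
print(result)
```

The index column is untouched because it is not part of the shifted values. Columns that end up holding both text and numbers stay object-typed, which is also what a csv round trip would give.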