How to iterate over the rows of each column in a data frame

Problem description

My current code runs and produces a graph if there is only 1 sensor, i.e. if col2 and col3 are deleted from the example data provided below, leaving one column.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(baseline)
baseline_num = pd.DataFrame(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points



dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

Essentially, I want this loop, and only this loop, to run over every column instead of generating just 1 graph:

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

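I have tried solutions from other questions, but none of them seem to solve this. So far I have tried converting to lists and tuples several times and then indexing them like this: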

baseline_num[i,column] += d[i - x,column]

baseline_num[i][column] += d[i - x][column]

while using an iterating loop

for column in columns

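However, no matter how I arrange the solution, there is always some key error about expecting integer or slice indices, among other errors. See the pictures for the expected/possible output for one column of the real data. The input parameters vary (the sensitivity value and off_require differ from case to case). One of the solutions that did not work was the looping method from this link: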

https://www.geeksforgeeks.org/iterating-over-rows-and-columns-in-pandas-dataframe/

I also tried building a loop with iteritems as the outer loop. That did not work either.

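Below are links to possible graph outputs for various sensitivity values and windows on my actual data set, with only one column. (That is, I manually deleted the other columns and plotted just one with the current program.)

Sensitivity 1000, window 8

Sensitivity 800, window 5

Sensitivity 1500, window 5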

If I have left out anything that would help solve this, please let me know so I can correct it right away.

See this image for my original df.head.

Tags: python, pandas, dataframe, numpy, data-science

Solution


Have you tried this?

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():
        print(df[colname][rownum])

The first loop iterates over all columns, and the second loop iterates over all rows of that column.
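As a side note, `iteritems` was removed in pandas 2.0, so on a recent version the same nested loop could be written with `items()`. A minimal sketch, assuming a small made-up frame (the name `demo` and its values are illustrative only, not your data):

```python
import pandas as pd

# Hypothetical two-column frame, used only to illustrate the loop order.
demo = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0]})

visited = []
for colname, colitems in demo.items():        # outer loop: one Series per column
    for rownum, rowitem in colitems.items():  # inner loop: (index label, value) pairs
        visited.append((colname, rownum, rowitem))

print(visited)
```

Every row of col1 is visited before the first row of col2, so any per-column state (such as a counter) would need to be reset at the top of the outer loop.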

Edit:

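From our conversation below, I think your baseline and df data frames do not have the same column names, because of the way you create them and the way you access their elements.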

My suggestion is to create the baseline data frame as a copy of your df data frame and edit the information in it from there.
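One way this could look, as a sketch: build `baseline` and `baseline_num` with the same index and columns as `df`, so that `baseline[colname]` resolves for every column the loop visits. The zero fill and the use of `.loc` for assignment are my assumptions, not something taken from your code, and the data here is made up.

```python
import pandas as pd

# Small stand-in for the sensor data (values are made up).
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# Same shape and labels as df, filled with zeros (assumed starting value).
baseline_num = pd.DataFrame(0.0, index=df.index, columns=df.columns)
# An independent copy: edits to baseline do not touch df.
baseline = df.copy()

# Label-based assignment with .loc avoids chained indexing.
baseline_num.loc[1, "col2"] += df.loc[1, "col2"]
baseline.loc[0, "col1"] = 0.0

print(list(baseline.columns) == list(df.columns))
print(baseline_num)
```

Because `df.copy()` returns a separate object, writing into `baseline` leaves the original readings in `df` unchanged.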

Edit:

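I have managed to get your code working for 1 loop, but then I hit an index error. I am not sure what your optimization step does, but I think it is what causes the error. Take a look.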

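It is this part: baseline_num[colname][rownum - 1]. My guess is that in the second loop, because you compute rownum (0) - 1, you get index -1. You need to change it so that rownum is 1 on the first pass, or something similar. I am not sure what you are trying to do there.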
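To illustrate the lookup problem in isolation: with the default RangeIndex, `series[-1]` is treated as a label lookup, and no row is labelled -1, hence the KeyError. A hedged sketch of a guard follows. The helper name `window_sum` is hypothetical, and skipping rows without enough history is just one possible choice, since I do not know what you want to happen at the start of the data.

```python
import pandas as pd

def window_sum(series, rownum, width):
    """Sum of the `width` values ending at row `rownum`, or None if
    there are not yet `width` rows of history (hypothetical helper)."""
    start = rownum - width + 1
    if start < 0:
        return None  # not enough rows yet; avoids negative labels
    # iloc is position based; the slice end is exclusive, hence + 1.
    return series.iloc[start:rownum + 1].sum()

col = pd.Series([10.0, 20.0, 30.0, 40.0])

try:
    col[-1]  # label lookup on a RangeIndex: there is no label -1
except KeyError as err:
    print("KeyError:", err)

print(window_sum(col, 1, 3))  # too little history
print(window_sum(col, 3, 3))  # 20 + 30 + 40
```

The same kind of check (`rownum - off_require >= 0`, for example) could be placed in front of the lines that read `rownum - 1` and `rownum - off_require`.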

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(df)
baseline_num = pd.DataFrame(df)
#print(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    #print(colname)
    for rownum,rowitem in colitems.iteritems():
        #print(rownum)
        #display(baseline[colname][rownum])
        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points

        print(baseline[colname][rownum])


dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

My output looks like this:

-324.0
-238.0
-314.0
-276.0
-264.0
-306.0
-371.0
-806.0
638.0
-412.0

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    354                 try:
--> 355                     return self._range.index(new_key)
    356                 except ValueError as err:

ValueError: -1 is not in range


The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    355                     return self._range.index(new_key)
    356                 except ValueError as err:
--> 357                     raise KeyError(key) from err
    358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: -1

