How do I flag anomalies in a DataFrame (row-wise)?

Problem description

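Python newbie here. I want to flag the sporadic numbers that are obviously off from the rest of their row. Put simply, flag the numbers that do not seem to belong in each row. Numbers in the hundreds and in the hundred-thousands are considered "off from the rest".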

import pandas as pd

# Initialise the data as a dict of lists.
data = {
    'A': ['R1', 'R2', 'R3', 'R4', 'R5'],
    'B': [12005, 18190, 1021, 13301, 31119],
    'C': [11021, 19112, 19021, 15, 24509],
    'D': [10022, 19910, 19113, 449999, 25519],
    'E': [14029, 29100, 39022, 24509, 412271],
    'F': [52119, 32991, 52883, 69359, 57835],
    'G': [41218, 52991, 1021, 69152, 79355],
    'H': [43211, 7672991, 56881, 211, 77342],
    'J': [31211, 42901, 53818, 62158, 69325],
}

# Create the DataFrame.
df = pd.DataFrame(data)

# Print summary statistics.
print(df.describe())

I am trying to do something like this:

[image in the original post]

# I need help with step 1.

# My code/pseudocode:

# Step 1: identify the values in each row that don't belong to the group.
# (Not written yet. `flags` below is a placeholder name for the boolean
# DataFrame, same shape as df, that this step should produce.)

# Step 2: flag the identified values and export to Excel.

# Map every True in `flags` to a yellow background (DataFrame.map in pandas 2.1+).
style_df = flags.applymap(lambda x: "background-color: yellow" if x else "")

with pd.ExcelWriter("flagged_data.xlsx", engine="openpyxl") as writer:
    df.style.apply(lambda x: style_df, axis=None).to_excel(writer, index=False)


Tags: python, pandas, dataframe, export-to-excel

Solution


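I used two conditions here: one checks for values less than 1000 and the other for values greater than 99999. Based on these conditions, the code highlights the outliers in red.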
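Before writing any workbook, it can help to see which cells those two conditions would pick out. The following is only a sketch in plain pandas, assuming the DataFrame from the question; the names `values` and `mask` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same data as in the question.
data = {
    'A': ['R1', 'R2', 'R3', 'R4', 'R5'],
    'B': [12005, 18190, 1021, 13301, 31119],
    'C': [11021, 19112, 19021, 15, 24509],
    'D': [10022, 19910, 19113, 449999, 25519],
    'E': [14029, 29100, 39022, 24509, 412271],
    'F': [52119, 32991, 52883, 69359, 57835],
    'G': [41218, 52991, 1021, 69152, 79355],
    'H': [43211, 7672991, 56881, 211, 77342],
    'J': [31211, 42901, 53818, 62158, 69325],
}
df = pd.DataFrame(data)

# Only the numeric columns take part in the check ('A' holds row labels).
values = df.drop(columns='A')

# True where a value is below 1000 or above 99999 (the answer's two criteria).
mask = (values < 1000) | (values > 99999)

# List the flagged cells as (row label, column, value).
flagged = [
    (df.at[row, 'A'], col, int(values.at[row, col]))
    for row in values.index
    for col in values.columns
    if mask.at[row, col]
]
for item in flagged:
    print(item)
```

With this data the mask marks five cells, for example 15 in row R4 and 7672991 in row R2. Note that 1021 is not marked, because it is above the lower cut-off.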

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add a format. Light red fill with dark red text.
format1 = workbook.add_format({'bg_color': '#FFC7CE',
                               'font_color': '#9C0006'})

first_row = 1
first_col = 2
last_row  = len(df)
last_col  = 9

worksheet.conditional_format(first_row, first_col, last_row, last_col,
                                        {'type':     'cell',
                                        'criteria': '<',
                                        'value':    1000,
                                        'format':   format1})

worksheet.conditional_format(first_row, first_col, last_row, last_col,
                                        {'type':     'cell',
                                        'criteria': '>',
                                        'value':    99999,
                                        'format':   format1})

# Close the Pandas Excel writer, which also writes the Excel file to disk.
# (writer.save() was removed in pandas 2.0.)
writer.close()
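The conditional formatting above uses fixed cut-offs for the whole sheet. Since the question asks for a row-wise check, here is one possible sketch of step 1 that compares each value with the median of its own row instead. This is a different technique from the answer's, not the answer's method. The helper names `flag_row_outliers`, `highlight_styles` and `export_flagged`, and the `factor=5.0` tolerance, are assumptions made for illustration; the sketch also assumes all values are positive.

```python
import numpy as np
import pandas as pd

data = {
    'A': ['R1', 'R2', 'R3', 'R4', 'R5'],
    'B': [12005, 18190, 1021, 13301, 31119],
    'C': [11021, 19112, 19021, 15, 24509],
    'D': [10022, 19910, 19113, 449999, 25519],
    'E': [14029, 29100, 39022, 24509, 412271],
    'F': [52119, 32991, 52883, 69359, 57835],
    'G': [41218, 52991, 1021, 69152, 79355],
    'H': [43211, 7672991, 56881, 211, 77342],
    'J': [31211, 42901, 53818, 62158, 69325],
}
df = pd.DataFrame(data)


def flag_row_outliers(frame, value_cols, factor=5.0):
    """Flag values more than `factor` times above or below their row median.

    `factor` is an arbitrary tolerance chosen for this example; tune it
    for real data. Assumes strictly positive values.
    """
    values = frame[value_cols].astype(float)
    row_median = values.median(axis=1)
    ratio = values.div(row_median, axis=0)
    return (ratio > factor) | (ratio < 1.0 / factor)


def highlight_styles(frame, mask, css="background-color: yellow"):
    """Build a CSS-string DataFrame shaped like `frame` for Styler.apply."""
    # Columns missing from the mask (here 'A') are never highlighted.
    full_mask = mask.reindex(columns=frame.columns, fill_value=False)
    return pd.DataFrame(
        np.where(full_mask.to_numpy(), css, ""),
        index=frame.index,
        columns=frame.columns,
    )


def export_flagged(frame, styles, path):
    """Step 2: write `frame` to Excel with the flagged cells filled in."""
    styled = frame.style.apply(lambda _: styles, axis=None)
    styled.to_excel(path, engine="openpyxl", index=False)


value_cols = [c for c in df.columns if c != 'A']
mask = flag_row_outliers(df, value_cols)
styles = highlight_styles(df, mask)
print(mask)
```

Calling `export_flagged(df, styles, "flagged_data.xlsx")` would then produce the highlighted workbook (this needs openpyxl and jinja2 installed). With the sample data and `factor=5.0`, this check flags the same five cells as the fixed cut-offs plus the two 1021 values in row R3, so the two approaches do not agree exactly; which one is right depends on what counts as "off" for your data.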
