python - Pandas: counting matches across multiple columns
Problem description
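I have a dataframe whose columns run from A to Z. The values are 0, 1, or NA. I need to iteratively compare column A with N, A with O, and so on up to Z, then loop back and compare B with N, B with O, and so on, then start again with C. For each pair, I only need the number of rows in which 1 appears in both columns. How can I do this?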
Solution
Set operations are easier to express in SQL, so the example below uses pandasql to perform the comparison you asked for:
import string

import pandas as pd
import pandasql as ps

# Uppercase letters A-Z in alphabetical order, used as the column names
alphabet_string = string.ascii_uppercase

# To approximate your data, use one row each of 0, 1, and None (~null)
data = []
data.append([0] * len(alphabet_string))
data.append([1] * len(alphabet_string))
data.append([None] * len(alphabet_string))

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=list(alphabet_string))

# Left-hand columns: the letters A to N
a_to_n = [letter for letter in alphabet_string if letter < "O"]
print(a_to_n)

# Right-hand columns: the letters N to Z
n_to_z = [letter for letter in alphabet_string if letter > "M"]
print(n_to_z)

# Compare every left/right pair in a nested loop over the two lists
for ll in a_to_n:
    for rl in n_to_z:
        # Count the rows where both columns equal 1
        cnt = ps.sqldf(f"select count(*) cnt from df where {ll} = 1 and {rl} = 1")["cnt"].iloc[0]
        print(f"Comparing {ll} to {rl}, there were {cnt} rows where the values matched.")
The end of the output looks like this:
Comparing N to U, there were 1 rows where the values matched.
Comparing N to V, there were 1 rows where the values matched.
Comparing N to W, there were 1 rows where the values matched.
Comparing N to X, there were 1 rows where the values matched.
Comparing N to Y, there were 1 rows where the values matched.
Comparing N to Z, there were 1 rows where the values matched.
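If you would rather not depend on pandasql, the same counts can probably be obtained in plain pandas. The sketch below is an alternative to the answer's approach, not part of it: it turns the frame into 0/1 indicators of "value equals 1" and takes a dot product, so each cell of the result is the number of rows where both columns are 1. The names `left`, `right`, `ones`, and `counts` are illustrative, and the A to N and N to Z split simply mirrors the lists used above.

```python
import string

import pandas as pd

letters = list(string.ascii_uppercase)

# Same toy data as above: one row each of 0, 1, and None
df = pd.DataFrame([[0] * 26, [1] * 26, [None] * 26], columns=letters)

left = letters[:14]    # A..N (assumed left-hand columns)
right = letters[13:]   # N..Z (assumed right-hand columns)

# 1 where the cell equals 1, otherwise 0 (NA compares as not equal)
ones = df.eq(1).astype(int)

# counts.loc[l, r] = number of rows where column l and column r are both 1
counts = ones[left].T.dot(ones[right])

print(counts.loc["A", "N"])
```

A single pair can then be read with `counts.loc["B", "O"]`, and `counts.stack()` gives one row per column pair if a long format is more convenient.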