首页 > 解决方案 > 使用 2 列数据创建 4 个新的 csv 列

问题描述

我有一个 4 列的 csv 并将其放入 pandas

username   likes  username2   likes2

Mill       2      John        10
John       3      Clyde       2 
Tye        7      Ellie       4
Clyde      8      Mill        9
Ellie      2      Ken         2

我正在尝试创建四个新列:

  1. username3包含名称在username和中的所有用户的列表username2

  2. like3这些用户的likeslikes2数字相加。

  3. username4在任一列表中只出现一次的用户列表。

  4. likes4那些用户不变的喜欢。

期望的输出:

username   likes  username2   likes2  username3  likes3  username4  likes4

Mill       2      John        10      Mill       11      Tye        7
John       3      Clyde       2       John       13      Ken        2
Tye        7      Ellie       4       Clyde      10
Clyde      8      Mill        9       Ellie      6
Ellie      2      Ken         2

标签: pythonpandascsv

解决方案


一种方法是:

  • 把你的用户名和点赞以及用户名2和点赞2变成{username, likes}和{username2, likes2}的2个字典
  • 添加匹配键的值,不匹配的键不会添加任何内容。把它变成一个名为usernamesandlikes.
  • usernameandlikesintersection包含所有用户名并汇总了剩余 4 列的喜欢,因此您需要做的就是使用username3 的symmetric_difference列表和username4 的列表将它们过滤到各自的列中

执行此操作的代码如下:

import pandas as pd
import numpy as np
import collections, functools, operator 

df = pd.read_csv("your_file.csv")

usernameandlikes = pd.Series(dict(functools.reduce(operator.add, map(collections.Counter, [dict(zip(df["username"], df["likes"])),  dict(zip(df["username2"], df["likes2"]))])))).reset_index() 
usernameandlikes.columns = ["users", "likes"]

username3_likes3 = usernameandlikes.loc[usernameandlikes['users'].isin(list(set(df["username"]).intersection(set(df["username2"]))))].reset_index(drop=True)

username3_likes4 = usernameandlikes.loc[usernameandlikes['users'].isin(list(set(df["username"]).symmetric_difference(set(df["username2"]))))].reset_index(drop=True)

dfoutput = pd.concat([df, username3_likes3, username3_likes4], axis=1)
dfoutput.columns = ["username", "likes", "username2", "likes2", "username3", "likes3", "username4", "likes4"]

  username  likes username2  likes2 username3  likes3 username4  likes4
0     Mill      2      John      10      Mill    11.0       Tye     7.0
1     John      3     Clyde       2      John    13.0       Ken     2.0
2      Tye      7     Ellie       4     Clyde    10.0       NaN     NaN
3    Clyde      8      Mill       9     Ellie     6.0       NaN     NaN
4    Ellie      2       Ken       2       NaN     NaN       NaN     NaN

推荐阅读