Split a list and keep a unique identifier for the two values - python

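Problem description

I am trying to work out the best way to map these values to course names. The scraped results are not the same on every run. Each time a 'Carts' element appears in the list, it means a new course has been scraped. After splitting, I want to tie the course name to these labels and values so that I can later join them to another DataFrame. What is the best way to do this?

Original lists/mapping: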

['Carts\nYes - $18', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes', 'Carts\nYes', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes']

['Course_1', 'Course_1', 'Course_1', 'Course_1', 'Course_2', 'Course_2', 'Course_2', 'Course_2']
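The rule described above, that every 'Carts' entry marks the start of a new course, can be sketched as follows. This is only an illustration: the function name `label_courses` and its `marker` parameter are hypothetical, the `Course_N` naming simply follows the example lists, and the sketch assumes 'Carts' is always the first entry scraped for each course.

```python
rentals_list = ['Carts\nYes - $18', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes',
                'Carts\nYes', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes']


def label_courses(rentals, marker='Carts'):
    """Return one course name per entry (hypothetical helper).

    Assumption: a new course starts whenever the label equals `marker`.
    """
    courses = []
    course_no = 0
    for entry in rentals:
        label = entry.split('\n')[0]  # text before the newline is the label
        if label == marker:
            course_no += 1            # a 'Carts' entry opens the next course
        courses.append(f'Course_{course_no}')
    return courses


course_names = label_courses(rentals_list)
print(course_names)
```

If a course is ever scraped without a 'Carts' entry, its values would be attributed to the previous course, so the marker assumption is worth checking against the real data.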

Code:

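rentals_list = ['Carts\nYes - $18', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes', 'Carts\nYes', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes']
temp_rentals_cleansed = []  # list of [label, value] pairs
rentals_cleansed = []       # flattened list: label, value, label, value, ...

def rental_cleanser():
    for r in rentals_list:
        split = r.split('\n')
        print(split)
        temp_rentals_cleansed.append(split)

    # Flatten the list of pairs
    for sublist in temp_rentals_cleansed:
        for item in sublist:
            rentals_cleansed.append(item)

    rental_label = rentals_cleansed[::2]   # labels sit at even positions
    rental_value = rentals_cleansed[1::2]  # values sit at odd positions
    return rental_label, rental_value

rental_label, rental_value = rental_cleanser()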

Output:

['Carts', 'Yes - $18', 'Clubs', 'Yes', 'GPS', 'No', 'Pull-carts', 'Yes', 'Carts', 'Yes', 'Clubs', 'Yes', 'GPS', 'No', 'Pull-carts', 'Yes'] #<- rentals_cleansed

['Carts', 'Clubs', 'GPS', 'Pull-carts', 'Carts', 'Clubs', 'GPS', 'Pull-carts'] #rental_label
['Yes - $18', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes'] #rental_value

Edit

The solution below works. When I save the DataFrame to CSV, I get the following output. Given that solution, is there a way to include the course in the DataFrame?

Carts,Clubs,GPS,Pull-carts
Yes - $18,Yes,No,Yes
Yes,Yes,No,Yes
Yes,Yes,,Yes
Yes,Yes,,Yes
Yes,Yes,No,Yes

Tags: python, pandas, dictionary, split

Solution


import pandas as pd

# initial data
cx = ['Course_1', 'Course_1', 'Course_1', 'Course_1', 'Course_2', 'Course_2', 'Course_2', 'Course_2']
ax = ['Carts\nYes - $18', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes', 'Carts\nYes', 'Clubs\nYes', 'GPS\nNo', 'Pull-carts\nYes']

dicCs = {} # dictionary of courses, each course will have a dictionary of values

zz = zip(cx, ax) # pair up the two lists as tuples -> ('Course_1', 'Carts\nYes - $18'), ('Course_1', 'Clubs\nYes'), ...
for p in zz: # each (course, rental) tuple
    if p[0] not in dicCs: dicCs[p[0]] = {} # create entry for this course if it does not exist
    dicCs[p[0]][p[1].split('\n')[0]] = p[1].split('\n')[1]  # add dictionary entry to this course: 'Clubs\nYes' -> 'Clubs'='Yes'

print("\n",dicCs)  # show final dictionary (actually dictionary of dictionaries)
   
df = pd.DataFrame() # empty dataframe
for c in dicCs:  # each course (Course_1, Course_2)
    for k in dicCs[c]:  # each key in course (Carts, Clubs,...)
        df.at[c,k] = dicCs[c][k] # set value in dataframe using course and key as coordinates

df.index.name = "Course" # set first (index) column name
print("\n",df)  # final dataframe

Output

               Carts Clubs GPS Pull-carts
Course
Course_1  Yes - $18   Yes  No        Yes
Course_2        Yes   Yes  No        Yes
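On the follow-up about the course missing from the CSV: in the DataFrame above the course is the index, so a likely cause (an assumption, since the save call is not shown) is that the file was written with `index=False`. The sketch below rebuilds the same DataFrame by hand and shows two options: writing the index to the CSV, or turning it into a regular column with `reset_index()` before a later join. The `other` DataFrame and its `City` column are hypothetical, and an in-memory `io.StringIO` buffer stands in for a real file path.

```python
import io

import pandas as pd

# Same shape as the DataFrame built above: courses form the index.
df = pd.DataFrame(
    {'Carts': ['Yes - $18', 'Yes'], 'Clubs': ['Yes', 'Yes'],
     'GPS': ['No', 'No'], 'Pull-carts': ['Yes', 'Yes']},
    index=pd.Index(['Course_1', 'Course_2'], name='Course'),
)

# Option 1: write the index as the first CSV column (index=True is the default).
buffer = io.StringIO()  # stands in for a file path such as 'rentals.csv'
df.to_csv(buffer, index=True)
csv_text = buffer.getvalue()

# Option 2: make the course an ordinary column, e.g. before merging.
df_flat = df.reset_index()

# Hypothetical second DataFrame, only to illustrate the later join.
other = pd.DataFrame({'Course': ['Course_1', 'Course_2'], 'City': ['A', 'B']})
merged = df_flat.merge(other, on='Course', how='left')
print(merged)
```

Either option keeps one row per course; `reset_index()` is the more convenient one when the course name is needed as a merge key.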
