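python - How to iterate until all entries are in a given column?
Problem description
I am trying to apply a while statement to my code so that it keeps running until every element of the lists below (in the Check column) is also in the Source column.
So far, my code is: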
while set_condition:  # to set the condition
    newCol = pd.Series(list(set(df['Check']) - set(df['Source'])))  # elements that are not currently included in the Source column
    newList1 = newCol.apply(lambda x: my_function(x))  # this function should generate the lists in Check -> this explains why I need a while statement
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used here; dict('Source'=...) is not valid Python
    df = pd.concat([df, pd.DataFrame({'Source': newCol, 'Check': newList1})], ignore_index=True)  # append the results as new rows
    df = df.explode('Check')
Here is an example of the process and of how my_function works. Suppose my initial dataset is:
Source Check
mouse [dog, horse, cat]
horse [mouse, elephant]
tiger []
elephant [horse, bird]
After exploding the Check column and appending the results to Source, I will have:
Source Check
mouse [dog, horse, cat]
horse [mouse, elephant]
tiger []
elephant [horse, bird]
dog [] # this will be filled in after applying the function
cat [] # this will be filled in after applying the function
bird [] # this will be filled in after applying the function
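Before the function is applied, every element of the lists should be added to the Source column. When I apply the function, it fills in the lists for the new elements; so, for example, I could have: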
Source Check
mouse [dog, horse, cat]
horse [mouse, elephant]
tiger []
elephant [horse, bird]
dog [mouse, fish] # they are filled in
cat [mouse]
bird [elephant, penguin]
fish [dog]
Since fish and penguin are not in Source, I need to run the code again to get the expected output (all elements of the lists are already in the Source column):
Source Check
mouse [dog, horse, cat]
horse [mouse, elephant]
tiger []
elephant [horse, bird]
dog [mouse, fish]
cat [mouse]
bird [elephant, penguin]
fish [dog]
penguin [bird]
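Because both dog and bird are already in Source, I do not need to apply the function again: every list is filled with elements that already exist in the Source column. The code can stop running.
What I want is for the loop to stop once all elements of the lists are in the Source column and the function has been applied to fill in every list.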
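The stopping condition described above can be written as a small check. This is only a sketch: `missing_sources` is a hypothetical helper name that does not appear in the original code, and the data is the intermediate table exactly as printed in the question (where fish already has a row but penguin does not).

```python
import pandas as pd


def missing_sources(df):
    """Return the Check elements that do not yet have a row in Source."""
    # Series.explode turns each list element into its own row;
    # empty lists become NaN, which dropna() removes.
    exploded = df["Check"].explode().dropna()
    return set(exploded) - set(df["Source"])


# Intermediate table from the question.
df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant", "dog", "cat", "bird", "fish"],
    "Check": [
        ["dog", "horse", "cat"],
        ["mouse", "elephant"],
        [],
        ["horse", "bird"],
        ["mouse", "fish"],
        ["mouse"],
        ["elephant", "penguin"],
        ["dog"],
    ],
})

print(missing_sources(df))  # {'penguin'}
```

The loop is finished exactly when this set is empty.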
Thanks for all the help.
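Solution
Repeating the loop until there are no more rows to add to the DataFrame is the same as saying that all elements of df['Check'] are in df['Source']. You have to compute that difference on every iteration anyway, so why not use it to break out of the loop?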
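df = df.explode('Check', ignore_index=True)  # lists are unhashable, so explode before building sets
while True:  # loop forever!
    diff = set(df['Check'].dropna()) - set(df['Source'])  # explode turns [] into NaN, so drop those
    if len(diff) == 0:
        break  # done!
    newCol = pd.Series(list(diff))
    newList1 = newCol.apply(lambda x: my_function(x))
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, pd.DataFrame({'Source': newCol, 'Check': newList1})], ignore_index=True)
    df = df.explode('Check', ignore_index=True)  # NOTE: I will use this to my advantage in the next suggested solution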
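To see the loop terminate on the data from the question, here is a runnable sketch. The `LOOKUP` dict and this `my_function` are stand-ins invented for illustration (the real `my_function` is not shown in the question); they simply return the lists from the question's example. `pd.concat` is used because `DataFrame.append` no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical stand-in for the asker's my_function: a fixed lookup table
# holding the lists from the question's example.
LOOKUP = {
    "dog": ["mouse", "fish"],
    "cat": ["mouse"],
    "bird": ["elephant", "penguin"],
    "fish": ["dog"],
    "penguin": ["bird"],
}


def my_function(name):
    return LOOKUP.get(name, [])


df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant"],
    "Check": [["dog", "horse", "cat"], ["mouse", "elephant"], [], ["horse", "bird"]],
})

df = df.explode("Check", ignore_index=True)  # one row per (Source, Check) pair
rounds = 0
while True:
    # Empty lists explode to NaN, so drop them before comparing.
    diff = set(df["Check"].dropna()) - set(df["Source"])
    if not diff:
        break  # every Check element already has a Source row
    new_col = pd.Series(sorted(diff))       # sorted only to make the run reproducible
    new_lists = new_col.apply(my_function)  # a Series of lists
    new_rows = pd.DataFrame({"Source": new_col, "Check": new_lists})
    df = pd.concat([df, new_rows], ignore_index=True)
    df = df.explode("Check", ignore_index=True)
    rounds += 1

print(rounds)                       # 2
print(sorted(df["Source"].unique()))
```

With this toy data the first round adds dog, cat and bird, the second adds fish and penguin, and the third check finds nothing missing.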
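Because repeatedly appending to a DataFrame is expensive (each append copies the whole frame), you may want to build the columns first and then construct the DataFrame once, outside the loop. df['Check'] ends up exploded anyway, so start by exploding and build on those lists: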
df = df.explode('Check')
check = df['Check'].tolist()  # plain list; append to this as we iterate
source = df['Source'].tolist()  # plain list; append to this as we iterate
unique_source = set(source)
diff = set(df['Check'].dropna()) - unique_source  # iterate until this is empty (explode turns [] into NaN)
while len(diff) > 0:
    new_source = list(diff)  # fix one order so sources and checks stay aligned
    new_check = [my_function(x) for x in new_source]  # a list of lists
    check.extend(new_check)  # add the lists as-is, but explode later
    source.extend(new_source)  # keep track of the new sources for the DataFrame...
    unique_source.update(new_source)  # and keep track of the unique sources for efficiency
    flat_check = set(x for sublist in new_check for x in sublist)
    diff = flat_check - unique_source  # we only have to check the new elements!
df = pd.DataFrame({"Check": check, "Source": source}).explode("Check")  # build the entire DataFrame at once
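The list-building approach amounts to a breadth-first expansion in which `diff` is the frontier of names not yet looked up. The sketch below wraps it in a function and counts the calls to confirm that each new source is looked up only once. The names `expand`, `fn` and `LOOKUP` are hypothetical, and the lookup table stands in for the real `my_function`.

```python
import pandas as pd

# Hypothetical stand-in data for my_function, taken from the question's example.
LOOKUP = {
    "dog": ["mouse", "fish"],
    "cat": ["mouse"],
    "bird": ["elephant", "penguin"],
    "fish": ["dog"],
    "penguin": ["bird"],
}


def expand(df, fn):
    """Grow Source/Check as plain lists and build the DataFrame once at the end."""
    exploded = df.explode("Check")
    source = exploded["Source"].tolist()
    check = exploded["Check"].tolist()
    seen = set(source)
    frontier = set(exploded["Check"].dropna()) - seen  # names still to look up
    calls = 0
    while frontier:
        batch = sorted(frontier)           # fixed order keeps both lists aligned
        results = [fn(x) for x in batch]   # a list of lists
        calls += len(batch)
        source.extend(batch)
        check.extend(results)
        seen.update(batch)
        # Only the newly returned names can be missing from `seen`.
        frontier = {y for sub in results for y in sub} - seen
    out = pd.DataFrame({"Source": source, "Check": check})
    return out.explode("Check", ignore_index=True), calls


df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant"],
    "Check": [["dog", "horse", "cat"], ["mouse", "elephant"], [], ["horse", "bird"]],
})
out, calls = expand(df, lambda name: LOOKUP.get(name, []))
print(calls)  # 5
```

Keeping a running `seen` set means the full columns are never rescanned inside the loop, which is where this version saves work compared with rebuilding both sets on every iteration.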
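There are many ways to use this structure to get the DataFrame layout you want. For example, if you do not want df['Check'] to be exploded, just keep the original df from the start of this example and append the new data to it: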
new_df = df.explode('Check')
unique_source = set(new_df['Source'])
diff = set(new_df['Check'].dropna()) - unique_source  # explode turns [] into NaN, so drop those
source = []  # append to empty lists
check = []  # append to empty lists
while len(diff) > 0:
    # ... (same loop body as in the previous example)
df = pd.concat([df, pd.DataFrame({"Check": check, "Source": source})], ignore_index=True)  # keep the unexploded columns (pd.append does not exist)
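For completeness, here is one way the unexploded variant might look with the loop body filled in, run on the question's data. As before, `LOOKUP` and this `my_function` are assumptions standing in for the real function, and `result` is a name chosen here for illustration. The output has one row per source with its list intact, which matches the expected table in the question.

```python
import pandas as pd

# Hypothetical stand-in for my_function, using the lists from the question.
LOOKUP = {
    "dog": ["mouse", "fish"],
    "cat": ["mouse"],
    "bird": ["elephant", "penguin"],
    "fish": ["dog"],
    "penguin": ["bird"],
}


def my_function(name):
    return LOOKUP.get(name, [])


df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant"],
    "Check": [["dog", "horse", "cat"], ["mouse", "elephant"], [], ["horse", "bird"]],
})

# The exploded copy is used only to find which names are missing.
new_df = df.explode("Check")
unique_source = set(new_df["Source"])
diff = set(new_df["Check"].dropna()) - unique_source

source, check = [], []  # new rows only
while diff:
    batch = sorted(diff)
    results = [my_function(x) for x in batch]
    source.extend(batch)
    check.extend(results)
    unique_source.update(batch)
    diff = {y for sub in results for y in sub} - unique_source

# The original rows keep their lists; the new rows carry lists as well.
new_rows = pd.DataFrame({"Source": source, "Check": check})
result = pd.concat([df, new_rows], ignore_index=True)
print(result)
```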