pandas: How to search by a list of values and return in the same order?

Question

Forgive me if this is a dupe; I've searched all morning and only found pieces of the puzzle, and couldn't quite fit them all together.

My Quest:

I have a simple DataFrame from which I want to extract a view using a search list, with the rows returned in the same order as that list. Example:

import numpy as np
import pandas as pd
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan  # to mock some NaN data in my real one (pd.np is gone in pandas 2.0)

Resulting df:

  OrderNo       Name Useless   Description
0   10000  Product 0  Junk 0  Short Desc 0
1   10001  Product 1  Junk 1  Short Desc 1
2   10002  Product 2     NaN  Short Desc 2
3   10003  Product 3     NaN  Short Desc 3
4   10004  Product 4     NaN  Short Desc 4
5   10005  Product 5     NaN  Short Desc 5
6   10006  Product 6     NaN  Short Desc 6
7   10007  Product 7  Junk 7  Short Desc 7
8   10008  Product 8  Junk 8  Short Desc 8
9   10009  Product 9  Junk 9  Short Desc 9

Now I want to search by a list of the OrderNos like so:

searches = ['10005','10009','10003','10000']

I'm trying to get to a view like this:

  OrderNo       Name Useless   Description
5   10005  Product 5     NaN  Short Desc 5
9   10009  Product 9  Junk 9  Short Desc 9
3   10003  Product 3     NaN  Short Desc 3
0   10000  Product 0  Junk 0  Short Desc 0

So I can finally transpose the view into this (notice I dropped the Useless column):

                        0             1             2             3
OrderNo             10005         10009         10003         10000
Name            Product 5     Product 9     Product 3     Product 0
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0

What I've tried:

This great question/answer helped me search by the values in searches, but the returned view is not in my order:

found = df.loc[df['OrderNo'].isin(searches)]

  OrderNo       Name Useless   Description
0   10000  Product 0  Junk 0  Short Desc 0
3   10003  Product 3     NaN  Short Desc 3
5   10005  Product 5     NaN  Short Desc 5
9   10009  Product 9  Junk 9  Short Desc 9

I tried adding a column 'my_sort' to found so I could reorder it based on the list:

found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found.sort_values(by='my_sort', inplace=True)
# For now assume index will always be matched and ValueError will be handled.
# This detail is not critical

While this kinda works, pandas throws SettingWithCopyWarning all over the place, telling me to use .loc[row_indexer,col_indexer] = ... instead. I tried that too, and it still throws the same warning. In fact, it seems anything I try to assign to found throws the same warning, so I suspected the problem came from the search itself. I ended up wrapping it in a new DataFrame so I no longer see the warning:

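found = pd.DataFrame(df.loc[df['OrderNo'].isin(searches)])
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found = found.sort_values(by='my_sort')
columns = ['OrderNo', 'Name', 'Description']  # columns to keep; drops Useless and my_sort
found = found[columns].T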

While this works, I can't help but feel it is convoluted and not very efficient, since I had to introduce a new column just to sort and then drop it again. I looked into a few relevant functions like reindex, or a combination of where and dropna (which doesn't work because there are other NaN values in my real data), but none of them seem to get me to my goal.
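For reference, here is a sketch of sorting without a helper column, assuming pandas >= 1.1 (the version in which sort_values gained the key= argument): map each OrderNo to its position in searches and sort on that. Because nothing is assigned to the filtered frame, SettingWithCopyWarning does not come up; if you do need to add columns to a filtered frame, taking an explicit .copy() of it is the usual way to avoid the warning. The name rank is only an illustrative helper, not part of the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan

searches = ['10005', '10009', '10003', '10000']

# Position of each OrderNo in the search list (illustrative helper).
rank = {order_no: pos for pos, order_no in enumerate(searches)}

found = (
    df.loc[df['OrderNo'].isin(searches)]
    # key= receives the OrderNo column; rows are sorted by the positions it maps to.
    .sort_values('OrderNo', key=lambda s: s.map(rank))
)

# Keep the wanted columns, renumber the rows 0..n-1, then transpose.
result = found[['OrderNo', 'Name', 'Description']].reset_index(drop=True).T
print(result)
```

The filter still goes through isin, so OrderNo values that are not in the frame are simply absent from the result rather than raising an error.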

Is there a better way to approach this?

Tags: python, pandas

Solution


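set_index + loc + T

You can take advantage of pandas indexing: make OrderNo the index, then pass your list to .loc, which returns the rows in the order of the list: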

df = df.set_index('OrderNo')

searches = ['10005','10009','10003','10000']

df_search = df.loc[searches]

print(df_search)

          Description       Name Useless
OrderNo                                 
10005    Short Desc 5  Product 5     NaN
10009    Short Desc 9  Product 9  Junk 9
10003    Short Desc 3  Product 3     NaN
10000    Short Desc 0  Product 0  Junk 0
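One caveat: in pandas 1.0 and later, .loc with a list raises KeyError if any label in the list is missing from the index. If some searched OrderNo values might not exist, the sketch below shows two options: reindex, which keeps the requested order and inserts an all-NaN row for each missing label (it requires the index to be unique), or filtering the list before calling .loc. The value '99999' is a made-up OrderNo used only to trigger the missing case.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question, indexed by OrderNo.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan
df = df.set_index('OrderNo')

# '99999' is a hypothetical OrderNo that does not exist in df.
searches = ['10005', '10009', '99999', '10000']

# Option 1: keep every requested label; missing ones become all-NaN rows.
with_missing = df.reindex(searches)

# Option 2: skip labels that are not in the index, keeping the list order.
present = [s for s in searches if s in df.index]
only_found = df.loc[present]

print(with_missing)
print(only_found)
```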

res = df_search.T

print(res)

OrderNo             10005         10009         10003         10000
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
Name            Product 5     Product 9     Product 3     Product 0
Useless               NaN        Junk 9           NaN        Junk 0

If you need numbered column labels:

print(df_search.reset_index().T)

                        0             1             2             3
OrderNo             10005         10009         10003         10000
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
Name            Product 5     Product 9     Product 3     Product 0
Useless               NaN        Junk 9           NaN        Junk 0
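Putting these steps together and also dropping the Useless column, as the question asked, gives the exact layout shown in the question. This is a sketch that assumes OrderNo values are unique and that every searched value is present in the frame:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan

searches = ['10005', '10009', '10003', '10000']

res = (
    df.set_index('OrderNo')
    .loc[searches]              # rows in the order of `searches`
    .drop(columns='Useless')    # the column the question discards
    .reset_index()              # OrderNo back to a column, rows numbered 0..3
    .T                          # one column per searched order
)
print(res)
```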
