python - pandas: How to search by a list of values and return in the same order?
Problem description
Forgive me if this is a dupe; I've searched all morning and only found pieces of the puzzle, and couldn't quite fit them all together.
My Quest:
I have a simple DataFrame from which I want to extract a view using a search list, searches, keeping the rows in the same order as that list. Example:
import numpy as np
import pandas as pd

data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'), ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan  # to mock some NaN data, as in my real frame
Resulting df:
OrderNo Name Useless Description
0 10000 Product 0 Junk 0 Short Desc 0
1 10001 Product 1 Junk 1 Short Desc 1
2 10002 Product 2 NaN Short Desc 2
3 10003 Product 3 NaN Short Desc 3
4 10004 Product 4 NaN Short Desc 4
5 10005 Product 5 NaN Short Desc 5
6 10006 Product 6 NaN Short Desc 6
7 10007 Product 7 Junk 7 Short Desc 7
8 10008 Product 8 Junk 8 Short Desc 8
9 10009 Product 9 Junk 9 Short Desc 9
Now I want to search by a list of the OrderNos, like so:
searches = ['10005','10009','10003','10000']
I'm trying to get to a view like this:
OrderNo Name Useless Description
5 10005 Product 5 NaN Short Desc 5
9 10009 Product 9 Junk 9 Short Desc 9
3 10003 Product 3 NaN Short Desc 3
0 10000 Product 0 Junk 0 Short Desc 0
so that I can finally transpose it into this (notice that I dropped the Useless column):
0 1 2 3
OrderNo 10005 10009 10003 10000
Name Product 5 Product 9 Product 3 Product 0
Description Short Desc 5 Short Desc 9 Short Desc 3 Short Desc 0
What I've tried:
This great question/answer helped me search by searches, but the returned view is not in my order:
found = df.loc[df['OrderNo'].isin(searches)]
OrderNo Name Useless Description
0 10000 Product 0 Junk 0 Short Desc 0
3 10003 Product 3 NaN Short Desc 3
5 10005 Product 5 NaN Short Desc 5
9 10009 Product 9 Junk 9 Short Desc 9
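The reason for this order: isin only builds a boolean mask aligned with df's rows, and boolean indexing keeps the selected rows in the frame's existing order, so the order of searches never comes into play. A minimal demonstration on a cut-down version of the frame above (only the OrderNo and Name columns are rebuilt here):

```python
import pandas as pd

# Cut-down version of the question's frame
df = pd.DataFrame({
    "OrderNo": [f"1000{i}" for i in range(10)],
    "Name": [f"Product {i}" for i in range(10)],
})
searches = ["10005", "10009", "10003", "10000"]

# isin returns a boolean Series aligned with df's rows, not with searches
mask = df["OrderNo"].isin(searches)
found = df.loc[mask]

# The selected rows keep df's original order: index 0, 3, 5, 9
print(found.index.tolist())
```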
I tried adding a column ['my_sort'] to found so that I can reorder based on the list:
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found.sort_values(by='my_sort', inplace=True)
# For now assume index will always be matched and ValueError will be handled.
# This detail is not critical
While this kinda works, pandas throws SettingWithCopyWarning all over the place, telling me to use .loc[row_indexer,col_indexer] = ... instead. I tried that too and it still throws the same warning. In fact, it seems anything I try to assign under found throws the same warning, so I suspected the problem came from the search. I ended up wrapping it in a new DataFrame so that I no longer see the warning:
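found = pd.DataFrame(df.loc[df['OrderNo'].isin(searches)])
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found = found.sort_values(by='my_sort')
columns = ['OrderNo', 'Name', 'Description']  # the columns I want to keep
found = found[columns].T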
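The warning comes from pandas remembering that found was taken from df, so it cannot be sure whether an assignment to found is meant to reach df as well. The usual way to state the intent is an explicit .copy(); a sketch of the same helper-column approach written that way, with the question's frame rebuilt so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (np.nan instead of the removed pd.np alias)
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(("OrderNo", "Name", "Useless", "Description"),
                        ("1000", "Product ", "Junk ", "Short Desc "))}
df = pd.DataFrame(data)
df.loc[2:6, "Useless"] = np.nan
searches = ["10005", "10009", "10003", "10000"]

# .copy() makes `found` an independent frame, so adding a column to it
# is not an ambiguous write back into `df`
found = df.loc[df["OrderNo"].isin(searches)].copy()
# Position of each OrderNo in `searches` (assumes every value is found)
found["my_sort"] = found["OrderNo"].map(searches.index)
found = found.sort_values("my_sort").drop(columns="my_sort")

print(found["OrderNo"].tolist())
```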
While this works, I can't help but feel it is convoluted and not very efficient, since I had to introduce a new column just to sort by it and then drop it again. I looked into a few relevant functions like reindex, or a combination of where and dropna (which doesn't work because there are other nan objects in my real data), but none of them seem to get me to my goal.
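If you would rather keep the boolean filter, DataFrame.sort_values accepts a key callable (pandas 1.1 and later) that is applied to the sort column before sorting, so the position lookup can happen inside the sort with no helper column to add and drop. A sketch on a cut-down frame, assuming the values in searches are unique:

```python
import pandas as pd

df = pd.DataFrame({
    "OrderNo": [f"1000{i}" for i in range(10)],
    "Name": [f"Product {i}" for i in range(10)],
})
searches = ["10005", "10009", "10003", "10000"]

# Map each OrderNo to its position in `searches`
position = {order_no: i for i, order_no in enumerate(searches)}

found = (
    df.loc[df["OrderNo"].isin(searches)]
      # key receives the OrderNo column as a Series and returns the positions
      .sort_values("OrderNo", key=lambda s: s.map(position))
)
print(found["OrderNo"].tolist())
```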
Is there a better way to approach this?
Solution
set_index + loc + T
You can take advantage of pandas indexing:
df = df.set_index('OrderNo')
searches = ['10005','10009','10003','10000']
df_search = df.loc[searches]
print(df_search)
Description Name Useless
OrderNo
10005 Short Desc 5 Product 5 NaN
10009 Short Desc 9 Product 9 Junk 9
10003 Short Desc 3 Product 3 NaN
10000 Short Desc 0 Product 0 Junk 0
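One caveat, since the question assumes every search value is present: when given a list of labels, df.loc[...] raises a KeyError if any label is missing from the index (pandas 1.0 and later). DataFrame.reindex also returns the rows in the requested order, but inserts an all-NaN row for a missing label instead of raising. A sketch on a cut-down frame, where '99999' is a made-up order number used only to trigger the missing-label case:

```python
import pandas as pd

df = pd.DataFrame({
    "OrderNo": [f"1000{i}" for i in range(10)],
    "Name": [f"Product {i}" for i in range(10)],
}).set_index("OrderNo")

searches = ["10005", "99999", "10000"]  # '99999' is a hypothetical missing label

try:
    df.loc[searches]
    loc_raised = False
except KeyError:
    loc_raised = True  # loc refuses lists containing missing labels

# reindex keeps the order of `searches` and fills missing labels with NaN
df_search = df.reindex(searches)
print(df_search)

# Alternatively, drop the labels that are not in the index before selecting
present = [s for s in searches if s in df.index]
print(df.loc[present].index.tolist())
```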
res = df_search.T
print(res)
OrderNo 10005 10009 10003 10000
Description Short Desc 5 Short Desc 9 Short Desc 3 Short Desc 0
Name Product 5 Product 9 Product 3 Product 0
Useless NaN Junk 9 NaN Junk 0
If you need numbered column labels:
print(df_search.reset_index().T)
0 1 2 3
OrderNo 10005 10009 10003 10000
Description Short Desc 5 Short Desc 9 Short Desc 3 Short Desc 0
Name Product 5 Product 9 Product 3 Product 0
Useless NaN Junk 9 NaN Junk 0
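To get exactly the view from the question, with the Useless column dropped and the columns numbered 0 to 3, the column selection can go between reset_index and T. A self-contained sketch that rebuilds the question's frame:

```python
import numpy as np
import pandas as pd

data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(("OrderNo", "Name", "Useless", "Description"),
                        ("1000", "Product ", "Junk ", "Short Desc "))}
df = pd.DataFrame(data)
df.loc[2:6, "Useless"] = np.nan

searches = ["10005", "10009", "10003", "10000"]

res = (
    df.set_index("OrderNo")
      .loc[searches]                                 # rows in the order of `searches`
      .reset_index()                                 # OrderNo back to a column, rows numbered 0..3
      .loc[:, ["OrderNo", "Name", "Description"]]    # drop the Useless column
      .T                                             # one column per search hit
)
print(res)
```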