首页 > 解决方案 > 在 Pandas 中使用 unstack

问题描述

应用 unstack 时出现异常,并想了解它。

对于一个可重现的例子:

(to load the data: pd.DataFrame(json.loads(titanic)))

titanic
'{"home.dest":{"0":"St Louis, MO","1":"Montreal, PQ \\/ Chesterville, ON","2":"Montreal, PQ \\/ Chesterville, ON","3":"Montreal, PQ \\/ Chesterville, ON","4":"Montreal, PQ \\/ Chesterville, ON","5":"New York, NY","6":"Hudson, NY","7":"Belfast, NI","8":"Bayside, Queens, NY","9":"Montevideo, Uruguay","10":"New York, NY","11":"New York, NY","12":"Paris, France","13":null,"14":"Hessle, Yorks","15":"New York, NY","16":"Montreal, PQ","17":"Montreal, PQ","18":null,"19":"Winnipeg, MN"},"pclass":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":1,"7":1,"8":1,"9":1,"10":1,"11":1,"12":1,"13":1,"14":1,"15":1,"16":1,"17":1,"18":1,"19":1},"survived":{"0":1,"1":1,"2":0,"3":0,"4":0,"5":1,"6":1,"7":0,"8":1,"9":0,"10":0,"11":1,"12":1,"13":1,"14":1,"15":0,"16":0,"17":1,"18":1,"19":0},"name":{"0":"Allen, Miss. Elisabeth Walton","1":"Allison, Master. Hudson Trevor","2":"Allison, Miss. Helen Loraine","3":"Allison, Mr. Hudson Joshua Creighton","4":"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)","5":"Anderson, Mr. Harry","6":"Andrews, Miss. Kornelia Theodosia","7":"Andrews, Mr. Thomas Jr","8":"Appleton, Mrs. Edward Dale (Charlotte Lamson)","9":"Artagaveytia, Mr. Ramon","10":"Astor, Col. John Jacob","11":"Astor, Mrs. John Jacob (Madeleine Talmadge Force)","12":"Aubart, Mme. Leontine Pauline","13":"Barber, Miss. Ellen \\"Nellie\\"","14":"Barkworth, Mr. Algernon Henry Wilson","15":"Baumann, Mr. John D","16":"Baxter, Mr. Quigg Edmond","17":"Baxter, Mrs. James (Helene DeLaudeniere Chaput)","18":"Bazzani, Miss. Albina","19":"Beattie, Mr. Thomson"},"sex":{"0":"female","1":"male","2":"female","3":"male","4":"female","5":"male","6":"female","7":"male","8":"female","9":"male","10":"male","11":"female","12":"female","13":"female","14":"male","15":"male","16":"male","17":"female","18":"female","19":"male"},"age":{"0":29.0,"1":0.92,"2":2.0,"3":30.0,"4":25.0,"5":48.0,"6":63.0,"7":39.0,"8":53.0,"9":71.0,"10":47.0,"11":18.0,"12":24.0,"13":26.0,"14":80.0,"15":null,"16":24.0,"17":50.0,"18":32.0,"19":36.0},"sibsp":{"0":0,"1":1,"2":1,"3":1,"4":1,"5":0,"6":1,"7":0,"8":2,"9":0,"10":1,"11":1,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":0,"19":0},"parch":{"0":0,"1":2,"2":2,"3":2,"4":2,"5":0,"6":0,"7":0,"8":0,"9":0,"10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":1,"17":1,"18":0,"19":0},"ticket":{"0":"24160","1":"113781","2":"113781","3":"113781","4":"113781","5":"19952","6":"13502","7":"112050","8":"11769","9":"PC 17609","10":"PC 17757","11":"PC 17757","12":"PC 17477","13":"19877","14":"27042","15":"PC 17318","16":"PC 17558","17":"PC 17558","18":"11813","19":"13050"},"fare":{"0":211.3375,"1":151.55,"2":151.55,"3":151.55,"4":151.55,"5":26.55,"6":77.9583,"7":0.0,"8":51.4792,"9":49.5042,"10":227.525,"11":227.525,"12":69.3,"13":78.85,"14":30.0,"15":25.925,"16":247.5208,"17":247.5208,"18":76.2917,"19":75.2417},"cabin":{"0":"B5","1":"C22 C26","2":"C22 C26","3":"C22 C26","4":"C22 C26","5":"E12","6":"D7","7":"A36","8":"C101","9":null,"10":"C62 C64","11":"C62 C64","12":"B35","13":null,"14":"A23","15":null,"16":"B58 B60","17":"B58 B60","18":"D15","19":"C6"},"embarked":{"0":"S","1":"S","2":"S","3":"S","4":"S","5":"S","6":"S","7":"S","8":"S","9":"C","10":"C","11":"C","12":"C","13":"S","14":"S","15":"S","16":"C","17":"C","18":"C","19":"C"},"boat":{"0":"2","1":"11","2":null,"3":null,"4":null,"5":"3","6":"10","7":null,"8":"D","9":null,"10":null,"11":"4","12":"9","13":"6","14":"B","15":null,"16":null,"17":"6","18":"8","19":"A"},"body":{"0":null,"1":null,"2":null,"3":135.0,"4":null,"5":null,"6":null,"7":null,"8":null,"9":22.0,"10":124.0,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null,"18":null,"19":null}}'

我使用以下命令创建了一个多索引:

titanic = titanic.set_index(['name', 'home.dest'])

然后我想解开。

titanic.unstack(level = 'home.dest')

我收到以下异常消息:

ValueError: Index contains duplicate entries, cannot reshape

标签: python-3.xpandasstackreshape

解决方案


错误是说您在其中构建 MultiIndex 的列的选择不是唯一的,因此由于存在歧义,因此在取消堆叠时存在问题。

解决此问题的一种方法是通过添加计数器来保证唯一性。

counts = titanic.gropuby(['name', 'home.dest']).cumcount().rename('Counter')
titanic = titanic.set_index(['name', 'home.dest', counts])

然后你unstack会工作

titanic.unstack(level = 'home.dest')

但我建议也许

titanic.unstack(['home.dest', 'Counter'])

否则,您必须与groupby

titanic.groupby(['name', 'home.dest']).first().unstack()

推荐阅读