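python - Using Pandas to do statistical analysis of a large dataset from column data
Problem description
In my dataset I have 48,000 villages, each with 10 to 12 crops, and the sown area of each crop in each village. I want to find out which crops occupy the major share of the area in which villages: within a given village, what percentage of all its crop area do crop 1 ... crop n each account for? In other words, I want the crop proportions. If village A has crop-1 and crop-2, what percentage of A is crop-1 and what percentage is crop-2?
With that I can then rank villages for a particular crop, and after that see which crops are sown over large areas in which villages. Sample of the data: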
      District    Taluka       Village Name     Crop  Area in hec
0   Ahmednagar  Pathardi          Alhanwadi   Bajara        370.0
1   Ahmednagar  Pathardi             Adgaon   Bajara        302.0
2   Ahmednagar  Pathardi       Ambika Nagar   Bajara         40.0
3   Ahmednagar  Pathardi         Bharajwadi   Bajara         90.0
4   Ahmednagar  Pathardi           Bhalgaon   Bajara        254.0
5   Ahmednagar  Pathardi  Bhawarwadi (N.V.)   Bajara         35.0
6   Ahmednagar  Pathardi           Badewadi   Bajara         17.0
7   Ahmednagar  Pathardi              Akola   Bajara        175.0
8   Ahmednagar  Pathardi          Auranjpur   Bajara         35.0
9   Ahmednagar  Pathardi          Agaskhand   Bajara        100.0
10  Ahmednagar  Pathardi          Alhanwadi   Cotton        150.0
11  Ahmednagar  Pathardi             Adgaon   Cotton        310.0
12  Ahmednagar  Pathardi       Ambika Nagar   Cotton        131.0
13  Ahmednagar  Pathardi         Bharajwadi   Cotton        161.0
14  Ahmednagar  Pathardi           Bhalgaon   Cotton        562.0
15  Ahmednagar  Pathardi  Bhawarwadi (N.V.)   Cotton        211.0
16  Ahmednagar  Pathardi           Badewadi   Cotton        104.0
17  Ahmednagar  Pathardi              Akola   Cotton        550.0
18  Ahmednagar  Pathardi          Auranjpur   Cotton          0.0
19  Ahmednagar  Pathardi          Agaskhand   Cotton          0.0
20  Ahmednagar  Pathardi          Alhanwadi  Soybean         26.0
21  Ahmednagar  Pathardi             Adgaon  Soybean         52.0
22  Ahmednagar  Pathardi       Ambika Nagar  Soybean         72.0
23  Ahmednagar  Pathardi         Bharajwadi  Soybean         88.0
24  Ahmednagar  Pathardi           Bhalgaon  Soybean         90.0
25  Ahmednagar  Pathardi  Bhawarwadi (N.V.)  Soybean         93.0
26  Ahmednagar  Pathardi           Badewadi  Soybean        100.0
27  Ahmednagar  Pathardi              Akola  Soybean         10.0
28  Ahmednagar  Pathardi          Auranjpur  Soybean         45.0
29  Ahmednagar  Pathardi          Agaskhand  Soybean         20.0
30  Ahmednagar  Pathardi          Alhanwadi    Maize         10.0
31  Ahmednagar  Pathardi             Adgaon    Maize          1.5
32  Ahmednagar  Pathardi       Ambika Nagar    Maize          3.0
33  Ahmednagar  Pathardi         Bharajwadi    Maize          5.0
34  Ahmednagar  Pathardi           Bhalgaon    Maize         12.0
35  Ahmednagar  Pathardi  Bhawarwadi (N.V.)    Maize         51.0
36  Ahmednagar  Pathardi           Badewadi    Maize          5.0
37  Ahmednagar  Pathardi              Akola    Maize         25.0
38  Ahmednagar  Pathardi          Auranjpur    Maize          5.0
39  Ahmednagar  Pathardi          Agaskhand    Maize         10.0
import pandas as pd
import numpy as np
D=pd.read_excel("/media/desktop/Sample-2.xlsx","Sheet1")
village=D["Village Name"].unique()
crop=D["Crop"].unique()
q1=[]
# collect the rows for every (village, crop) pair
for i in village:
    for j in crop:
        a=D["Village Name"]==i
        b=D["Crop"]==j
        D1=D[a&b]
        q1.append(D1)
# keep only the pairs that actually have rows
q2=[]
for i in q1:
    if not i.empty:
        q2.append(i)
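Now we have the sown area (in hectares) of each crop in each village. Next we need to compute, for village A, the percentage of crop 1, crop 2, ..., crop n.
Formula: for village A, crop 1 % = 100 × (area of crop 1) / (total area of all crops in village A). Crop 2 % and the others are found the same way.
The same applies to every village.
Any suggestions?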
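A minimal sketch of the formula above, assuming the sheet loads into a DataFrame with the column names shown in the sample (`Village Name`, `Crop`, `Area in hec`). Instead of looping over every village/crop pair (which rescans the whole frame 48,000 × 12 times), `groupby(...).transform('sum')` attaches each village's total area to all of its rows in one pass, so the percentage becomes a single vectorised division. The small inline frame is a hypothetical stand-in for `Sample-2.xlsx`.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: two villages taken from the sample
df = pd.DataFrame({
    "Village Name": ["Adgaon"] * 4 + ["Akola"] * 4,
    "Crop": ["Bajara", "Cotton", "Soybean", "Maize"] * 2,
    "Area in hec": [302.0, 310.0, 52.0, 1.5, 175.0, 550.0, 10.0, 25.0],
})

# Total sown area of each village, repeated on every row of that village
village_total = df.groupby("Village Name")["Area in hec"].transform("sum")

# Share of each crop within its own village: crop area / all crops in village * 100
df["village_perc"] = df["Area in hec"].div(village_total).mul(100)

# Optional wide view: one row per village, one column per crop
wide = df.pivot_table(index="Village Name", columns="Crop",
                      values="village_perc", fill_value=0)
print(wide.round(2))
```

With the real file you would replace the inline frame with the `pd.read_excel(...)` call from the question; the per-village percentages then add up to 100 for every village.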
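Solution
First, the top crop (largest sown area) in each village, where `df` is the DataFrame read from the Excel file (`D` in the question):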
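# sort villages A-Z and, within each village, crops by area (largest first)
df1 = df.sort_values(['Village Name','Area in hec'], ascending=[True, False])
# the first row of each village is now its largest crop
df2 = df1.drop_duplicates('Village Name')
print(df2)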
      District    Taluka       Village Name     Crop  Area in hec
11  Ahmednagar  Pathardi             Adgaon   Cotton        310.0
9   Ahmednagar  Pathardi          Agaskhand   Bajara        100.0
17  Ahmednagar  Pathardi              Akola   Cotton        550.0
0   Ahmednagar  Pathardi          Alhanwadi   Bajara        370.0
12  Ahmednagar  Pathardi       Ambika Nagar   Cotton        131.0
28  Ahmednagar  Pathardi          Auranjpur  Soybean         45.0
16  Ahmednagar  Pathardi           Badewadi   Cotton        104.0
14  Ahmednagar  Pathardi           Bhalgaon   Cotton        562.0
13  Ahmednagar  Pathardi         Bharajwadi   Cotton        161.0
15  Ahmednagar  Pathardi  Bhawarwadi (N.V.)   Cotton        211.0
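An alternative to sort plus `drop_duplicates`, sketched under the same column-name assumption: `groupby('Village Name')['Area in hec'].idxmax()` returns the index label of the largest-area row in each village, and `.loc` selects those rows. If two crops tie for the largest area, the first one in row order is kept. The frame below is a hypothetical subset of the sample.

```python
import pandas as pd

# Hypothetical subset of the sample data
df = pd.DataFrame({
    "Village Name": ["Adgaon", "Adgaon", "Auranjpur", "Auranjpur", "Auranjpur"],
    "Crop": ["Bajara", "Cotton", "Bajara", "Cotton", "Soybean"],
    "Area in hec": [302.0, 310.0, 35.0, 0.0, 45.0],
})

# Index label of the largest-area row within each village
top_idx = df.groupby("Village Name")["Area in hec"].idxmax()

# Select those rows: one row per village holding its dominant crop
top_crop = df.loc[top_idx]
print(top_crop)
```

This avoids sorting the whole 48,000-village frame when only the top crop is needed.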
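And the area percentage for each crop. Grouping by `Crop` divides each row by that crop's total area over all villages, so `perc` is the village's share of that crop (handy for ranking villages per crop). For the within-village share from the question's formula, group by `Village Name` instead.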
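# total area of each crop across all villages, aligned to every row
s = df1.groupby("Crop")['Area in hec'].transform('sum')
# each row's area as a percentage of that crop's total
df1['perc'] = df1['Area in hec'].div(s).mul(100)
print(df1.head(10))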
      District    Taluka       Village Name     Crop  Area in hec       perc
11  Ahmednagar  Pathardi             Adgaon   Cotton        310.0  14.226709
1   Ahmednagar  Pathardi             Adgaon   Bajara        302.0  21.297602
21  Ahmednagar  Pathardi             Adgaon  Soybean         52.0   8.724832
31  Ahmednagar  Pathardi             Adgaon    Maize          1.5   1.176471
9   Ahmednagar  Pathardi          Agaskhand   Bajara        100.0   7.052186
29  Ahmednagar  Pathardi          Agaskhand  Soybean         20.0   3.355705
39  Ahmednagar  Pathardi          Agaskhand    Maize         10.0   7.843137
19  Ahmednagar  Pathardi          Agaskhand   Cotton          0.0   0.000000
17  Ahmednagar  Pathardi              Akola   Cotton        550.0  25.240936
7   Ahmednagar  Pathardi              Akola   Bajara        175.0  12.341326
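For the ranking step mentioned in the question, one option (a sketch, not part of the answer above) is `groupby('Crop')[...].rank(ascending=False, method='min')`, which numbers villages 1, 2, ... within each crop by sown area; filtering on one crop then gives that crop's ranking. The column name `rank_in_crop` is hypothetical, chosen here for illustration, and the inline frame is a hypothetical subset of the sample.

```python
import pandas as pd

# Hypothetical subset of the sample: Cotton and Maize in three villages
df = pd.DataFrame({
    "Village Name": ["Adgaon", "Akola", "Bhalgaon"] * 2,
    "Crop": ["Cotton"] * 3 + ["Maize"] * 3,
    "Area in hec": [310.0, 550.0, 562.0, 1.5, 25.0, 12.0],
})

# Rank villages within each crop: 1 = largest sown area (ties share the best rank)
df["rank_in_crop"] = (
    df.groupby("Crop")["Area in hec"]
      .rank(ascending=False, method="min")
      .astype(int)
)

# Ranking for one crop, largest area first
cotton = df[df["Crop"] == "Cotton"].sort_values("rank_in_crop")
print(cotton[["Village Name", "Area in hec", "rank_in_crop"]])
```

Ranking on a percentage column (either the per-crop `perc` above or a within-village share) works the same way; only the column passed to `rank` changes.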