首页 > 解决方案 > Sorting algorithm based on two conditions

问题描述

Problem

I am working on a Python problem, where a car dealer wants to accumulate a list of vehicles where the total mileage across the chosen vehicles is a maximum (constraint 1, I don't know why he would want the highest mileage but it is what it is) and he has to remain under a certain budget (constraint 2, $300000).

Question

I know how to sort data based on 1 condition, but sorting it based on 2 values is trickier than I thought. What is the best way to achieve my problem? Please see my attempt below.

Small sample of Data

--------------------------------------------------
| Licence | Manufacturer | Price         | Mileage
--------------------------------------------------
|   1     | Audi         |     42000     | 8000
--------------------------------------------------
|   2     | Mercedes     |     33000     | 15000
--------------------------------------------------
|   3     | Lexus        |     38000     | 10000
--------------------------------------------------
|   4     | BMW          |     25000     | 20000
--------------------------------------------------
|   5     | Mercedes     |     55000     | 33000
--------------------------------------------------

My Attempt

I first thought of assigning a sort of weight between the mileage and price, because a car could have high mileage but its price could also be very high, so I thought sorting based just on mileage is incorrect. For example, suppose I have three cars, A, B and C. Car A has 10000 miles and costs $20000. Car B has 20000 miles but costs $40000. In this case, choosing either wouldn't make a difference. But suppose car C has 25000 miles, but it costs $80000. The algorithm should first consider Car A and B before considering C, even though C has the most mileage, it is not 'worth' the price.

So I created a new column that is the ratio between the mileage and price and sorted this list using that ratio as the key and then reversed it to get the ratios starting from the highest value. I then looped through this list, appending the cars to a new list if the total amount didn't exceed the budget.

cost = 0;

with open(fileName, 'r') as inputFile:
    list1 = csv.reader(inputFile, delimiter=' ')
    list2 = [(row[0], row[1], row[2], row[3],  float(row[3])/float(row[2])) for l in list1]
    list2.sort(key = lambda x: x[4])
    list2.reverse()


cars2Buy = []
for l in list2:
    if (cost + int(row[2])) <= 300000:
       cost += int(row[2])
       cars2Buy.append((row[0], row[1], row[2], row[3]))

    else: break

However, I also tried a different dataset and sorted based just on the mileage, for example:

 list2.sort(key = lambda x: x[3]), 

instead of

list2.sort(key = lambda x: x[4])

and surprisingly in that particular dataset sorting based just on mileage gave me a list of cars that had more mileage than my 'weighting' algorithm and was still under budget! This must mean my way of solving this problem is flawed, but I can't see why. I would greatly appreciate any suggestions! Thanks.

标签: python

解决方案


您所描述的问题在我看来是一个背包问题:您有一组物品(汽车清单),每个物品都有一个价值(里程)和重量(价格),背包(汽车选择)有限重量方面的容量(总预算)。您希望选择能够最大限度地提高汽车价值的汽车,同时将总重量保持在容量以下。

您应该知道这是一个难题(NP-hard),并且根据您的数据大小,找到最佳解决方案可能需要很长时间。所以你经常不得不重复一个近似的解决方案。

您描述的算法(按价值/重量比排序并选择顶部物品直到背包装满)是贪心算法,它给出了一个不能保证是最优的近似解决方案。因此,我假设在您的情况下,贪心算法没有找到最佳解决方案(而可以通过按值选择顶级项目来找到更好的解决方案)。

发生这种情况的一个简单情况如下:假设您有 10K 的预算和 2 辆汽车的清单。一个里程9K,价格10K,另外一个里程和价格都等于2K。第二辆车具有更好的里程/价格比率(1 而不是 0,9),但如果您只选择里程数最多的汽车,您会有更好的解决方案(在这种情况下,它显然是最佳解决方案)。

更新

要找到为您提供最佳解决方案的实现,您应该在“背包求解器 python”或类似的东西周围搜索。你应该找到这样的东西(使用 Google 的 OR-Tools)或那样的东西(使用 PuLP 或其他库)。


推荐阅读