Python: locating specific words without duplicates

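Problem description

I have a problem. I am trying to find device names in a string. All the device names I am looking for are stored in a list. There is one very important thing I want:

The problem I am running into now is:

I have two devices (Fan, Fan Light). When I give the command "Turn on Fan Light", both devices are found, but I only want Fan Light to be found. I tried checking all the found devices and keeping only the one with the longest name, like this: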

# Create 2 dummy devices
device1 = {
    "name": "fan"
}

device2 = {
    "name": "fan light"
}


# Add devices to list
devices = []
devices.append(device1)
devices.append(device2)


# Given command
command = "Turn on fan light"
    
    
foundDevices = []

# Search devices in sentence
for device in devices:

    # Splits a device name if it has multiple words
    deviceSplit = device["name"].split()
    numOfSubNames = len(deviceSplit)

    # Checks for every sub-name if it is found in the string
    i = 0
    for subName in deviceSplit:
        if subName in command:
            i += 1

    # Checks if all names were located in string
    if i == numOfSubNames:
        foundDevices.append(device["name"])

# Checks if multiple devices have been found
if len(foundDevices) >= 2:
    largestNameLength = 0

    # Checks which device has the largest name
    for device in foundDevices:
        if (len(device) > largestNameLength):
            largestName = device
            largestNameLength = len(device)


    # Clears list and only add longest one
    foundDevices.clear()
    foundDevices.append(largestName)


print(foundDevices)

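But this breaks when I say, for example, "Turn on fan light and fan", because that command really does contain more than one device. How can I scan for devices the way I want?

Tags: python

Solution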


A regular expression search is a quick way to do what you need, using a pattern built from the different device names.

import re

def find_with_regex(command, pattern):
    # sorted() gives a stable result order; a bare set has no guaranteed ordering
    return sorted(set(re.findall(pattern, command, re.IGNORECASE)))
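
One detail worth illustrating: because re.IGNORECASE is used, re.findall returns each match in the casing it has in the command, so "Fan Light" and "fan light" would end up as two separate entries. The following is a minimal sketch, assuming lower-cased names are acceptable as the canonical form; find_normalized is a hypothetical helper and not part of the original answer.

```python
import re

def find_normalized(command, pattern):
    # Find every match regardless of case, then lower-case each one
    # so that "Fan Light" and "fan light" collapse into a single entry.
    matches = re.findall(pattern, command, re.IGNORECASE)
    # Sorting the de-duplicated names gives a deterministic result order.
    return sorted({m.lower() for m in matches})

print(find_normalized("Turn on Fan Light and fan light", "fan light|fan"))
# ['fan light']
```

Lower-casing also makes the returned names usable as keys into a dictionary built from lower-case device names.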

I would also suggest building a reverse dictionary of the shape device: name; it may help to quickly look up the code name for a given device.

devices = [{'name': 'fan light'}, {'name': 'fan'}]

# build a quick-reference dict with device > name structure
transformed = {dev: name for x in devices for name, dev in x.items()}
# this also collapses duplicated devices into a single key;
# note that a repeated key is silently overwritten, no error is raised

# print(transformed)
# {'fan light': 'name', 'fan': 'name'}
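
A dict comprehension keeps only the last value for a repeated key and does not raise, so duplicated device names have to be detected explicitly if they matter. A minimal sketch, assuming each device is a dict with a 'name' key as above; sample_devices and find_duplicate_names are hypothetical names introduced only for this illustration.

```python
from collections import Counter

# Illustrative data only: 'fan' is deliberately listed twice.
sample_devices = [{'name': 'fan light'}, {'name': 'fan'}, {'name': 'fan'}]

def find_duplicate_names(devices):
    # Count how often each device name occurs in the list.
    counts = Counter(dev['name'] for dev in devices)
    # Report the names that occur more than once, in sorted order.
    return sorted(name for name, n in counts.items() if n > 1)

print(find_duplicate_names(sample_devices))
# ['fan']
```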

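Special thanks to buddemat for pointing out that the device names must appear in a specific order for this solution to work: regex alternation tries the alternatives from left to right, so 'fan light' has to come before 'fan'. This is handled by reversed(sorted(...)) on the pattern-building line of the next code block.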

Testing the function

test_cases = [
    'Turn on fan light',
    'Turn on fan light and fan',
    'Turn on fan and fan light',
    'Turn on fan and fan',
]

pattern = '|'.join(reversed(sorted(transformed)))
for command in test_cases:
    matches = find_with_regex(command, pattern)
    print(matches)

Output

['fan light']
['fan', 'fan light']
['fan', 'fan light']
['fan']
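
The ordering requirement mentioned above can be checked directly, and the same idea can be hardened a little. The sketch below is not from the original answer: build_pattern is a hypothetical helper that sorts names longest first, escapes them with re.escape, and wraps the alternation in \b word boundaries so that "fan" does not match inside a longer word.

```python
import re

def build_pattern(names):
    # Longest names first, so "fan light" is tried before "fan".
    ordered = sorted(names, key=len, reverse=True)
    # re.escape protects against names that contain regex metacharacters;
    # \b ... \b restricts matches to whole words.
    return r'\b(?:' + '|'.join(re.escape(n) for n in ordered) + r')\b'

names = ['fan', 'fan light']

# Shorter alternative listed first: the longer name is never reached.
print(re.findall('fan|fan light', 'Turn on fan light'))
# ['fan']

print(re.findall(build_pattern(names), 'Turn on fan light'))
# ['fan light']

# The word boundary keeps "fantastic" from counting as a "fan".
print(re.findall(build_pattern(names), 'A fantastic fan'))
# ['fan']
```

Sorting by length rather than alphabetically is a design choice here: it states the intent (prefer the longest name) directly, whereas reverse alphabetical order relies on a prefix always sorting before the longer name that contains it.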
