How do I fix my LINQ to correctly find a string from a list of strings?

Problem description

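For hours I have been struggling with a LINQ query that should find, for each address in a list, the matching city from a list of city objects.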

I have a list of CityModel objects, where:

public class CityModel
{
    public string City { get; set; }
    public char CountryChar { get; set; }
}

and a list of AddressModel objects:

public class AddressModel
{
    public string Address { get; set; }
    public char CountryChar { get; set; }
}

In both classes, CountryChar is the first letter of the country that the City or Address belongs to. All strings and chars were passed through ToLower(), so they are all lowercase.

Example CityModel objects:

cities.Add(new CityModel()
{
    City = "singapore",
    CountryChar = 's'
}); // Singapore in Singapore
cities.Add(new CityModel()
{
    City = "anthony",
    CountryChar = 'u'
}); // Anthony in the United States

Two example AddressModel objects:

addressesM.Add(new AddressModel()
{
    Address = "#20-06, gateway east, 152, beach road, singapore 189721",
    CountryChar = 's'
});
addressesM.Add(new AddressModel()
{
    Address = "01-01, 8, anthony road, singapore 229957",
    CountryChar = 's'
}); // note: "anthony"

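The idea of my LINQ is to check, for each AddressModel object, whether any city is a substring of its Address property. If so, it then verifies that the CountryChar of the AddressModel matches the CountryChar of the CityModel.

My LINQ: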

foreach (AddressModel address in addressesM)
{
    string city = "xxx";
    i++;

    Console.WriteLine(i + " z " + addresses.Count());

    CityModel tocompare = cities.Where(collectionOfCities => address.Address.IndexOf(collectionOfCities.City) >= 0 &&
        (address.Address[address.Address.IndexOf(collectionOfCities.City) - 1] == ' ' ||
        address.Address[address.Address.IndexOf(collectionOfCities.City) - 1] == ',') &&
        (address.Address[address.Address.IndexOf(collectionOfCities.City) + collectionOfCities.City.Length] == ' ' ||
        address.Address[address.Address.IndexOf(collectionOfCities.City) + collectionOfCities.City.Length] == ',') &&
        collectionOfCities.CountryChar == address.CountryChar).FirstOrDefault();

    if (tocompare != null)
    {
        TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;

        tocompare.City = textInfo.ToTitleCase(tocompare.City);

        city = tocompare.City;
    }

    output.Add(city);
}

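For the first AddressModel, my LINQ works fine. The problem appears with the second AddressModel, which contains the word "anthony" while there is also a city called Anthony. In that case, after checking the remaining LINQ conditions for "anthony", it adds the "xxx" string to my output and moves on to the next AddressModel in the list.

How can I make the program go on to test the other cities in the list after "anthony" fails?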
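A likely cause, judging only from the code as posted: when the first address matches, `tocompare.City = textInfo.ToTitleCase(tocompare.City)` overwrites the `City` of the shared `CityModel` inside `cities` with "Singapore". For the second address the query then searches the lowercase text for "Singapore", which the case-sensitive `IndexOf` never finds, so only "anthony" remains, and it fails the country check. Separately, `IndexOf(...) - 1` and `IndexOf(...) + City.Length` throw `IndexOutOfRangeException` when a city sits at the very start or end of an address. Below is a minimal sketch that avoids both problems. It assumes value tuples in place of `CityModel`/`AddressModel` (to stay self-contained), treats a city as found when it occurs between word boundaries (`\b`) anywhere in the address, and uses a hypothetical helper name, `FindCity`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Sample data from the question, as (City, CountryChar) / (Address, CountryChar) tuples.
var cities = new List<(string City, char CountryChar)>
{
    ("singapore", 's'),
    ("anthony", 'u'),
};
var addresses = new List<(string Address, char CountryChar)>
{
    ("#20-06, gateway east, 152, beach road, singapore 189721", 's'),
    ("01-01, 8, anthony road, singapore 229957", 's'),
};

TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
var output = new List<string>();

foreach (var address in addresses)
{
    string match = FindCity(address.Address, address.CountryChar);

    // Title-case a copy for the output; the entries in `cities` stay lowercase.
    output.Add(match == null ? "xxx" : textInfo.ToTitleCase(match));
}

// Hypothetical helper: the first city of the right country whose name occurs
// as a whole word anywhere in the address (Regex.IsMatch scans all positions,
// so there is no out-of-range indexing at the start or end of the string).
string FindCity(string address, char country) =>
    cities
        .Where(c => c.CountryChar == country)
        .Select(c => c.City)
        .FirstOrDefault(city =>
            Regex.IsMatch(address, @"\b" + Regex.Escape(city) + @"\b"));

Console.WriteLine(string.Join(", ", output));
```

Filtering by country before the text search is only a small design choice: it means "anthony" is never even considered for an address whose CountryChar is 's'.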

Edit:

Some addresses may have postcodes that contain digits and uppercase letters, for example:

Floors 1-3, Building 7, No. 1690 Cailun Road, Pudong New Area, Shanghai, China, Postcode: 201203

1st Floor, 6, quai Antoine-1er Le Ruscino, 98012 Monte Carlo, CEDEX, Monaco.

1, Dole Drive, Westlake Village CA 91362-7300, USA.

Gratsos Building, 15, Eleftheriou Venizelou Street, 105 64 Athens, Greece.

Some city names may be more than one word, for example:

Osprey

Panama City

La Verne

Tags: c#, list, linq

Solution


First, let's organize cities; assuming each (City, CountryChar) combination is unique, we can build a dictionary:

List<CityModel> cities = ...

Dictionary<(string city, char country), CityModel> citiesDict = cities
  .ToDictionary(item => (item.City, item.CountryChar), 
                item => item);
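One caveat about the dictionary: `ToDictionary` throws an `ArgumentException` when two items produce the same key, so a repeated (City, CountryChar) pair would break it. If that uniqueness assumption might not hold, a sketch like the following groups on the key first and keeps the first city of each group (an arbitrary choice). Value tuples stand in for `CityModel` so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for List<CityModel>; note the duplicated ("anthony", 'u').
var cities = new List<(string City, char CountryChar)>
{
    ("singapore", 's'),
    ("anthony", 'u'),
    ("anthony", 'u'),
};

// Group on the composite key and keep the first item of each group,
// so duplicates no longer make ToDictionary throw.
var citiesDict = cities
    .GroupBy(item => (item.City, item.CountryChar))
    .ToDictionary(group => group.Key, group => group.First());

// Lookups use the same (name, country letter) pair; ("anthony", 's') is not a key.
bool found = citiesDict.TryGetValue(("anthony", 's'), out var city);
Console.WriteLine($"{citiesDict.Count} keys, anthony/s found: {found}");
```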

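Then we have to come up with a way to extract city names (false positives are possible). The last word (the last run of a..z letters) is probably a good choice; let's use a regular expression for it:

// for both sample addresses, returns "singapore"
IEnumerable<string> CityNames(string address) {
  string name = Regex.Match(
     address,
     @"\b[a-z]+\b",
     RegexOptions.RightToLeft | RegexOptions.IgnoreCase).Value;

  if (!string.IsNullOrEmpty(name))
    yield return name;
}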
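A quick check of the last-word extractor on the two sample addresses plus the Athens address from the question's edit. `RegexOptions.RightToLeft` makes `Regex.Match` return the rightmost run of letters, so postcodes such as 189721 are skipped. The third line also shows a limitation: when the address ends with a country name, the "last word" is the country ("Greece"), not the city, so Athens would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's last-word extractor, unchanged apart from layout.
IEnumerable<string> CityNames(string address)
{
    string name = Regex.Match(
        address,
        @"\b[a-z]+\b",
        RegexOptions.RightToLeft | RegexOptions.IgnoreCase).Value;

    if (!string.IsNullOrEmpty(name))
        yield return name;
}

var lastWords = new List<string>();
foreach (string address in new[]
{
    "#20-06, gateway east, 152, beach road, singapore 189721",
    "01-01, 8, anthony road, singapore 229957",
    "Gratsos Building, 15, Eleftheriou Venizelou Street, 105 64 Athens, Greece.",
})
{
    lastWords.AddRange(CityNames(address));
}

Console.WriteLine(string.Join(", ", lastWords)); // singapore, singapore, Greece
```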

Or a more lenient implementation (any word counts; for the first address it returns "gateway", "east", "beach", "road", "singapore"):

IEnumerable<string> CityNames(string address) {
  return Regex
    .Matches(address, @"\b[a-z]+\b", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(match => match.Value);
}
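Unlike the last-word version, the lenient extractor also yields "anthony" for the second sample address. That is harmless in this answer, because the later dictionary lookup keys on (name, CountryChar) and ("anthony", 's') is not a key. A quick check of what it extracts:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Lenient extractor from the answer: every run of letters is a candidate city.
IEnumerable<string> CityNames(string address) =>
    Regex
        .Matches(address, @"\b[a-z]+\b", RegexOptions.IgnoreCase)
        .Cast<Match>()
        .Select(match => match.Value);

// "01-01", "8" and "229957" contain no letters, so they are not candidates.
List<string> candidates = CityNames("01-01, 8, anthony road, singapore 229957").ToList();
Console.WriteLine(string.Join(", ", candidates)); // anthony, road, singapore
```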

Then we can build the final LINQ with the help of SelectMany:

List<AddressModel> addresses = ...

var result = addresses
  .SelectMany(item => CityNames(item.Address) // all possible cities from the address
     .Select(possibleCity => new { // actual city for each possible city
        address = item,
        city    = citiesDict.TryGetValue((possibleCity, item.CountryChar),
                                          out var actualCity)
          ? actualCity // the real city (if found), say, "singapore"
          : null       // null if there is no such city, say, "road"
      }))
  .Where(item => item.city != null); // real cities only
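For an end-to-end check, here is the same pipeline run on the two sample addresses from the question. It is a self-contained sketch with two assumptions: value tuples replace `CityModel`/`AddressModel`, so the conditional returns `actualCity.City` (a string) rather than `actualCity` (a value tuple cannot be `null`), and the lenient extractor is used. "anthony" is extracted from the second address but dropped, because ("anthony", 's') is not a key; both addresses resolve to "singapore". An address with no recognized city simply produces no row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Tuples stand in for CityModel / AddressModel from the question.
var cities = new List<(string City, char CountryChar)>
{
    ("singapore", 's'),
    ("anthony", 'u'),
};
var addresses = new List<(string Address, char CountryChar)>
{
    ("#20-06, gateway east, 152, beach road, singapore 189721", 's'),
    ("01-01, 8, anthony road, singapore 229957", 's'),
};

var citiesDict = cities.ToDictionary(item => (item.City, item.CountryChar), item => item);

// Lenient extractor: every run of letters is a candidate city name.
IEnumerable<string> CityNames(string address) =>
    Regex.Matches(address, @"\b[a-z]+\b", RegexOptions.IgnoreCase)
        .Cast<Match>()
        .Select(match => match.Value);

var result = addresses
    .SelectMany(item => CityNames(item.Address)
        .Select(possibleCity => new
        {
            address = item,
            city = citiesDict.TryGetValue((possibleCity, item.CountryChar), out var actualCity)
                ? actualCity.City // real city, e.g. "singapore"
                : null            // not a city in this country, e.g. "road" or "anthony"
        }))
    .Where(item => item.city != null)
    .ToList();

foreach (var item in result)
    Console.WriteLine($"{item.address.Address} -> {item.city}");
```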

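Edit: the main difficulty here is extracting potential city names (in the general case, this is natural language processing...). If you can guarantee that the address parts (street, city, country, etc.) are separated by commas, we can try Split: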

  IEnumerable<string> CityNames(string address) {
    return address
      .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
      .Select(item => Regex.Replace(item.Trim(), @"\s+", " ").ToLower())
      .Where(item => !string.IsNullOrEmpty(item));
  }
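Running the comma-split extractor on the Monaco address from the question shows what it produces. Because whole comma-separated parts are compared, a multi-word name such as "panama city" or "la verne" can match a dictionary key as long as it forms a part on its own; a postcode glued to the city ("98012 monte carlo") still prevents a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Comma-split extractor from the answer: each trimmed, whitespace-normalized,
// lowercased part of the address is a candidate.
IEnumerable<string> CityNames(string address) =>
    address
        .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(item => Regex.Replace(item.Trim(), @"\s+", " ").ToLower())
        .Where(item => !string.IsNullOrEmpty(item));

var parts = CityNames("1st Floor, 6, quai Antoine-1er Le Ruscino, 98012 Monte Carlo, CEDEX, Monaco").ToList();
Console.WriteLine(string.Join(" | ", parts));
```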

Now, for "1st Floor, 6, quai Antoine-1er Le Ruscino, 98012 Monte Carlo, CEDEX, Monaco" we get "1st floor", "6", "quai antoine-1er le ruscino", "98012 monte carlo", "cedex", "monaco". Note that 98012 is attached to Monte Carlo. If you want to strip the digits and get "st floor", "quai antoine-er le ruscino", "monte carlo", "cedex", "monaco":

  IEnumerable<string> CityNames(string address) {
    return address
      .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
      .Select(item => Regex.Replace(item, "[0-9]+", ""))
      .Select(item => Regex.Replace(item.Trim(), @"\s+", " ").ToLower())
      .Where(item => !string.IsNullOrEmpty(item));
  }
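To see the digit-stripping variant feed a lookup, here is a sketch with a small, hypothetical city table keyed like the answer's dictionary (lowercase name, first letter of the country). "monte carlo" and "athens" are now found. One limitation remains for addresses where the city shares its comma-separated part with a state code: "Westlake Village CA 91362-7300" becomes "westlake village ca -", which would not match a "westlake village" key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Digit-stripping extractor from the answer.
IEnumerable<string> CityNames(string address) =>
    address
        .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(item => Regex.Replace(item, "[0-9]+", ""))
        .Select(item => Regex.Replace(item.Trim(), @"\s+", " ").ToLower())
        .Where(item => !string.IsNullOrEmpty(item));

// Hypothetical city table: key = (lowercase name, first letter of country).
var citiesDict = new Dictionary<(string city, char country), string>
{
    [("monte carlo", 'm')] = "monte carlo",
    [("athens", 'g')] = "athens",
};

// Hypothetical helper: first extracted part that is a known city of that country.
string FindCity(string address, char country) =>
    CityNames(address).FirstOrDefault(name => citiesDict.ContainsKey((name, country)));

Console.WriteLine(FindCity("1st Floor, 6, quai Antoine-1er Le Ruscino, 98012 Monte Carlo, CEDEX, Monaco", 'm'));
Console.WriteLine(FindCity("Gratsos Building, 15, Eleftheriou Venizelou Street, 105 64 Athens, Greece.", 'g'));
```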
