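c# - Better LINQ parsing of large XML data
Problem description
I have an application that takes in many XML files and performs lookups to create CSV files. I have noticed that the data is not always 100% complete, i.e. a result or two goes missing, so I suspect the way I process the data is incorrect or simply poor. I would really appreciate some help from the gurus here.
Small XML sample: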
<?xml version="1.0" encoding="utf-8"?>
<lookupdb xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:sample:lookupdb:0.1">
  <References>
    <Reference id="3cb7ceb0-43c7-4c67-a7fb-fffb32fc71c4">
      <Vehicle>Beach_Buggy_01</Vehicle>
      <Engineers>
        <Engineer>Joe Bloggs</Engineer>
      </Engineers>
      <IsActive>true</IsActive>
      <Owner>Bill Bloggs</Owner>
      <Serviced>True</Serviced>
      <OwnerName>Bill</OwnerName>
      <CostID>ABCDEF123456</CostID>
      <FuelType>Petrol</FuelType>
      <Phone>1234567890</Phone>
      <Address>Some Address</Address>
    </Reference>
    <Reference id="d1053bd3-a1cb-4fb4-a7d5-ffee3e10ffdb">
      <Vehicle>Transit</Vehicle>
      <Engineers>
        <Engineer>Joe Bloggs2</Engineer>
      </Engineers>
      <IsActive>true</IsActive>
      <Owner>Andy Bloggs</Owner>
      <Serviced>True</Serviced>
      <OwnerName>Andy</OwnerName>
      <CostID>9345089</CostID>
      <FuelType>Petrol</FuelType>
      <Phone>1234567890</Phone>
      <Address>Some Address4</Address>
    </Reference>
    <Reference id="30f8cfe8-40fd-4c99-9c7d-5ab98f8e5620">
      <Vehicle>Ford Fiesta</Vehicle>
      <Engineers>
        <Engineer>Steve Bloggs</Engineer>
      </Engineers>
      <IsActive>true</IsActive>
      <Owner>Sarah H</Owner>
      <Serviced>True</Serviced>
      <OwnerName>Bill</OwnerName>
      <CostID>834hsdfgs</CostID>
      <FuelType>Petrol</FuelType>
      <Phone>1234567890</Phone>
      <Address>Some Address3</Address>
    </Reference>
  </References>
  <Sessions>
    <RentalSession id="cc5d9960-3a80-4fd9-b7d6-0963198567c3">
      <VehicleRefId>3cb7ceb0-43c7-4c67-a7fb-fffb32fc71c4</VehicleRefId>
      <RentalPeriod startDate="2018-10-02T07:46:34Z" endDate="2018-10-02T08:27:36Z" />
      <HiringInfo HireId="2e428f42-f8f1-4603-9570-fed1fa78e470" customerId="1929936734" customerRefId="6da73407-f443-491d-9cad-c4fed9bfb71f" />
      <Notes>Vehicle Broke Down Recovery ordered</Notes>
      <VehicleGroup>ATV</VehicleGroup>
    </RentalSession>
    <RentalSession id="829221a2-196e-403a-bdcb-9759959cfa70">
      <VehicleRefId>3cb7ceb0-43c7-4c67-a7fb-fffb32fc71c4</VehicleRefId>
      <RentalPeriod startDate="2018-10-03T07:46:34Z" endDate="2018-10-04T08:27:36Z" />
      <HiringInfo HireId="4fb2cd21-9f48-44de-ae72-01ce4eeccdf9" customerId="2929936735" customerRefId="0a2d3d8b-ab06-4cd1-9ec5-aea4ac3f6da3" />
      <Notes>Returned on Time no Damage</Notes>
      <VehicleGroup>ATV</VehicleGroup>
    </RentalSession>
    <RentalSession id="68a6b485-d30a-439a-8081-8c09f724d23b">
      <VehicleRefId>d1053bd3-a1cb-4fb4-a7d5-ffee3e10ffdb</VehicleRefId>
      <RentalPeriod startDate="2018-10-05T07:46:34Z" endDate="2018-10-05T08:27:36Z" />
      <HiringInfo HireId="c4022764-7fc2-4415-97bf-57d616e3b8bd" customerId="3929936736" customerRefId="cb260bfc-34c1-4ac5-befa-17f69b2406bb" />
      <Notes>Scratch to Door Charges applied</Notes>
      <VehicleGroup>VANS</VehicleGroup>
    </RentalSession>
    <RentalSession id="c4083f9a-65ee-4693-8488-e299271064b1">
      <VehicleRefId>30f8cfe8-40fd-4c99-9c7d-5ab98f8e5620</VehicleRefId>
      <RentalPeriod startDate="2018-10-09T07:46:34Z" endDate="2018-10-09T08:27:36Z" />
      <HiringInfo HireId="cb260bfc-34c1-4ac5-befa-17f69b2406bb" customerId="4929936737" customerRefId="c4022764-7fc2-4415-97bf-57d616e3b8bd" />
      <Notes>Generally a rubbish vehicle</Notes>
      <VehicleGroup>Small Cars</VehicleGroup>
    </RentalSession>
  </Sessions>
</lookupdb>
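The owner name is the program's main lookup, along with the required engineer, because the VehicleRefId in a session matches the Reference id. Most of the data comes from the rental sessions. From some local testing, I found that fetching the session data first seems to work better, but I am not entirely sure about this approach. Here is the code I think needs looking at:
1: Getting the rental data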
var result = xDoc.Descendants().Descendants(ns + "RentalSession")
    .Where(x => x.Element(ns + "VehicleRefId").Value != null)
    .Select(x => new
    {
        _VehicleRefId = GetResultValue(true, x, "VehicleRefId", "VehicleRefId", "Vehicle Reference ID"),
        _RentalSessionId = GetResultValue(false, x, "RentalSession", "id", "Session ID"),
        _startDate = GetResultValue(false, x, "RentalPeriod", "startDate", "Start date"),
        _endDate = GetResultValue(false, x, "RentalPeriod", "endDate", "End date"),
        _VehicleGroup = GetResultValue(true, x, "VehicleGroup", "VehicleGroup", "Vehicle Group"),
        _Notes = GetResultValue(true, x, "Notes", "Notes", "Event Notes")
    }).ToList().Distinct();
2: The method used in the rental data query:
private string GetResultValue(bool isNode, XElement atrr_value, string nodeName, string xattr_Name, string value_text)
{
    string retValue = "";
    try
    {
        switch(isNode)
        {
            case true:
                retValue = !string.IsNullOrEmpty((string)atrr_value.Element(ns + nodeName).Value)
                    ? (string)atrr_value.Element(ns + nodeName).Value
                    : $"No {value_text} Found.";
                break;
            default:
                if(nodeName == "RentalSession")
                {
                    retValue = !string.IsNullOrEmpty((string)atrr_value.Attribute(xattr_Name).Value)
                        ? (string)atrr_value.Attribute(xattr_Name).Value
                        : $"No {value_text} Found.";
                }
                else
                {
                    retValue = !string.IsNullOrEmpty((string)atrr_value.Element(ns + nodeName).Attribute(xattr_Name).Value)
                        ? (string)atrr_value.Element(ns + nodeName).Attribute(xattr_Name).Value
                        : $"No {value_text} Found.";
                }
                break;
        }
    }
    catch(Exception rex)
    {
        retValue = "null";
    }
    return retValue;
}
3: Getting the Owner and Engineer data:
foreach(var itemData in result)
{
    try
    {
        var references = xDoc.Descendants().Descendants(ns + "Reference")
            .Where(
                a => a.Attribute("id").Value == itemData._VehicleRefId
            )
            .Select(a => new
            {
                _OwnerName = a.Element(ns + "OwnerName").Value,
                _Engineer = a.Elements(ns + "Engineers").Descendants(ns + "Engineer").Select(e => e.Value).Single()
            }).FirstOrDefault();
        ... Further parsing
    catch (Exception xEx)
    {
        //some error handling stuff
    }
}
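Any help understanding where I am going wrong, and how to simplify this part of the code, would be much appreciated.
Thanks in advance.
Edit: the XML above shows only a fragment of the data. There will be multiple references and sessions, and some sessions will match the same reference.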
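Because several sessions can point at the same reference, one option is to index the references by id once and then look each session up, instead of rescanning the document for every session as step 3 does. Note also that `Single()` in step 3 throws when a reference has no Engineer or more than one, and the surrounding catch swallows that, which may be one source of rows going missing. The following is only a sketch under assumptions: `ReferenceLookup`, `IndexReferences`, and `JoinSessions` are hypothetical names, the ids in `Main` are shortened stand-ins for the GUIDs, multiple engineers are joined with "; ", and no CSV escaping is attempted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReferenceLookup
{
    // Default namespace declared on the sample's root element.
    static readonly XNamespace ns = "urn:sample:lookupdb:0.1";

    // Builds id -> (owner name, engineers) in a single pass over the document.
    public static Dictionary<string, (string Owner, string Engineers)> IndexReferences(XDocument doc)
    {
        return doc.Descendants(ns + "Reference")
            .Where(r => r.Attribute("id") != null)
            .GroupBy(r => (string)r.Attribute("id"))   // tolerate duplicate ids: first one wins
            .ToDictionary(
                g => g.Key,
                g => (
                    Owner: (string)g.First().Element(ns + "OwnerName") ?? "",
                    Engineers: string.Join("; ",
                        g.First().Descendants(ns + "Engineer").Select(e => e.Value))));
    }

    // Produces one "sessionId,owner,engineers" line per session.
    // Sessions whose reference is unknown are kept, with empty owner and engineer fields.
    public static List<string> JoinSessions(XDocument doc)
    {
        var refs = IndexReferences(doc);
        var lines = new List<string>();
        foreach (var s in doc.Descendants(ns + "RentalSession"))
        {
            var refId = (string)s.Element(ns + "VehicleRefId") ?? "";
            var found = refs.TryGetValue(refId, out var r);
            lines.Add(string.Join(",",
                (string)s.Attribute("id"),
                found ? r.Owner : "",
                found ? r.Engineers : ""));
        }
        return lines;
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<lookupdb xmlns='urn:sample:lookupdb:0.1'>" +
            "<References><Reference id='r1'>" +
            "<Engineers><Engineer>Joe Bloggs</Engineer></Engineers>" +
            "<OwnerName>Bill</OwnerName></Reference></References>" +
            "<Sessions>" +
            "<RentalSession id='s1'><VehicleRefId>r1</VehicleRefId></RentalSession>" +
            "<RentalSession id='s2'><VehicleRefId>r1</VehicleRefId></RentalSession>" +
            "<RentalSession id='s3'><VehicleRefId>missing</VehicleRefId></RentalSession>" +
            "</Sessions></lookupdb>");

        foreach (var line in JoinSessions(doc))
            Console.WriteLine(line);
        // s1,Bill,Joe Bloggs
        // s2,Bill,Joe Bloggs
        // s3,,
    }
}
```

Keeping unmatched sessions in the output, rather than dropping them inside a catch block, makes it easier to see which rows lack a reference.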
Solution
Don't use the Value property, which throws when the element is missing (null). Use a cast instead, as in the code below.
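var result = xDoc.Descendants().Descendants(ns + "RentalSession")
    // The cast returns null for a missing element; .Value would throw instead.
    .Where(x => (string)x.Element(ns + "VehicleRefId") != null)
    .Select(x => new
    {
        _VehicleRefId = (string)x.Element(ns + "VehicleRefId"),
        // id is an attribute of RentalSession itself.
        _RentalSessionId = (string)x.Attribute("id"),
        // startDate and endDate are attributes of RentalPeriod.
        _startDate = (DateTime?)x.Element(ns + "RentalPeriod")?.Attribute("startDate"),
        _endDate = (DateTime?)x.Element(ns + "RentalPeriod")?.Attribute("endDate"),
        _VehicleGroup = (string)x.Element(ns + "VehicleGroup"),
        _Notes = (string)x.Element(ns + "Notes")
    }).ToList().Distinct();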
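To make the difference between the cast and `.Value` concrete, here is a small self-contained sketch. It assumes the namespace from the sample document; `CastVersusValue` and `TextOrDefault` are hypothetical names, and the placeholder text mirrors the "No ... Found." strings from the question. The session in `Main` deliberately has no Notes and no RentalPeriod element.

```csharp
using System;
using System.Xml.Linq;

public static class CastVersusValue
{
    // Default namespace declared on the sample's root element.
    static readonly XNamespace ns = "urn:sample:lookupdb:0.1";

    // Hypothetical helper: the child element's text, or a placeholder
    // when the element is missing or empty. No try/catch is needed.
    public static string TextOrDefault(XElement parent, string name, string label)
    {
        var text = (string)parent.Element(ns + name);   // null when the element is absent
        return string.IsNullOrEmpty(text) ? $"No {label} Found." : text;
    }

    public static void Main()
    {
        var session = XElement.Parse(
            "<RentalSession xmlns='urn:sample:lookupdb:0.1' id='cc5d9960'>" +
            "<VehicleGroup>ATV</VehicleGroup>" +
            "</RentalSession>");

        Console.WriteLine(TextOrDefault(session, "VehicleGroup", "Vehicle Group")); // ATV
        Console.WriteLine(TextOrDefault(session, "Notes", "Event Notes"));          // No Event Notes Found.

        // Casting a null XAttribute to a nullable type yields null rather than throwing.
        DateTime? start = (DateTime?)session.Element(ns + "RentalPeriod")?.Attribute("startDate");
        Console.WriteLine(start.HasValue);                                          // False

        try
        {
            // Element(...) returns null here, so reading .Value throws.
            _ = session.Element(ns + "Notes").Value;
        }
        catch (NullReferenceException)
        {
            Console.WriteLine(".Value threw on the missing element");
        }
    }
}
```

With casts, a missing element becomes an ordinary null that the query can test for, so a blanket catch that silently turns every failure into the string "null" is no longer needed.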