Adding leading zeros back to an ID variable

Question

I am working with the following ID data and am trying to restore it to its correct form.

The first 20 observations of the "incorrect ID" look like this:

 [1] 11820096867 11820053047 13410057602 13410015341 14257205715 28382012393 13410001306 11820000771 11820000784 11820000884 11820011030
[12] 15230002545 13410015602 17336011108 11820000769 11820096867 11820053030 13410050602 11820053030 14257205715

The data can be split into 4 parts: S, G, V and I.

I want to add the leading zeros back and split the data into 4 columns.

S = 2 digits long
G = 1 digit long
V = 5 digits
I = 5 digits

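Working with these "incorrect IDs", the observation 11820000771, for example, would be split so that the last 5 digits (minus leading zeros) = I, the next 5 digits (minus leading zeros) = V, and so on.

Example 1:

11820000771 would be: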

I = 0771
V = 82000
G = 1
S = 1

Example 2:

14257205715 would be:

I = 5715
V = 25720
G = 4
S = 1

Example 3:

13410015602 would be:

I = 15602
V = 4100
G = 3
S = 1

Example 4:

10943900008 would be:

I = 0008
V = 94390
G = 0
S = 1

The documentation states that "leading zeros are not shown", so they have been dropped from the "incorrect ID" data.

In a second, "correct" data frame, this is what S, G, V and I look like:

        S G     V     I
 [91,]  0 1 18200 97341
 [92,]  0 1 71990 15340
 [93,]  0 1 18200 87418
 [94,]  6 1 18200 38602
 [95,] 27 1 34100  1640
 [96,]  0 1 19699 30069
 [97,]  0 2 84694 59574
 [98,]  0 1 71990  1640
 [99,]  0 1 18200   771
[100,]  0 1 18200  1640

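So:

The first goal is to split the "incorrect ID" into the correct S, G, V and I, similar to the above.

The second goal is to create a new ID key that looks like this: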

 [1] "00-01-73360-50661" "00-01-87692-30040" "00-01-34100-57509" "00-01-18200-53047" "00-03-70310-30703" "00-01-82000-72385"
 [7] "00-01-68213-09410" "00-01-18200-00771" "00-01-34100-50340" "00-03-73360-97341"

where S, G, V and I are combined and separated by a "-", with the leading zeros added back to the data.
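The two goals above can be sketched roughly as follows. This is only a sketch under stated assumptions, not a confirmed solution: it assumes that V and I always keep their full 5 digits in the "incorrect ID", that G is always a single digit, and that whatever remains on the left (0 to 2 digits) is S. The helper names `split_id` and `make_key` are made up for illustration.

```r
# Sketch: split zero-stripped numeric IDs from the right.
# Assumed layout after padding to 13 digits: SS G VVVVV IIIII.
split_id <- function(id) {
  # "%013.0f" avoids scientific notation and left-pads with zeros to 13 digits
  padded <- sprintf("%013.0f", id)
  data.frame(
    S = substr(padded, 1, 2),
    G = substr(padded, 3, 3),
    V = substr(padded, 4, 8),
    I = substr(padded, 9, 13),
    stringsAsFactors = FALSE
  )
}

# Rebuild the dashed key, writing G with two digits as in the target format
make_key <- function(parts) {
  sprintf("%s-%02d-%s-%s", parts$S, as.integer(parts$G), parts$V, parts$I)
}

ids_sample <- c(11820000771, 11820053047, 613598510062)
keys <- make_key(split_id(ids_sample))
print(keys)
```

Keeping the segments as character strings preserves the zeros; converting them with `as.integer()` would give the zero-stripped columns shown in the "correct" data frame instead.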

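Overview:

I am trying to add leading zeros back to the segments of an ID variable, which is split into 4 parts, each with a maximum length. If a segment began with a 0, that zero was removed. If it begins with a digit greater than 0, no leading zero is added to the ID.

I hope this is clear. If any part is not, please let me know and I will clarify.

Data: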

 ID <- c(11820096867, 11820053047, 13410057602, 13410015341, 14257205715, 
    28382012393, 13410001306, 11820000771, 11820000784, 11820000884, 
    11820011030, 15230002545, 13410015602, 17336011108, 11820000769, 
    11820096867, 11820053030, 13410050602, 11820053030, 14257205715, 
    11820011168, 27336097343, 13410015509, 12556924173, 13410001222, 
    18769227102, 18769210012, 13410048574, 13410057602, 28066095605, 
    17199030030, 11820011047, 13410057509, 13410017256, 13410050306, 
    18200072518, 13410001306, 11820053168, 11820053168, 11820096867, 
    11820043047, 18200072385, 11820043218, 13410029602, 13410030341, 
    17199030030, 17199000048, 18066095615, 15230002540, 13410015341, 
    17199030030, 13410057306, 11820011168, 13410059505, 17336011214, 
    11820096867, 11820000884, 13410003602, 31820000042, 13410015341, 
    11820000891, 13410000355, 11820096867, 13410031306, 17289010016, 
    11820053218, 11820053030, 11820000016, 11820011030, 17336011214, 
    13410015340, 2710000106005, 11820061030, 17089701331, 23410017306, 
    11820000016, 27199077005, 13410003256, 13410057341, 17199030030, 
    15230000435, 11820053218, 13410015341, 18769241103, 15230000434, 
    11820043218, 11820000842, 13410057340, 11820011047, 13410001340, 
    33410000354, 12210000170, 11820041218, 27336097343, 13410046874, 
    13410015340, 31820000697, 13410015306, 13410000007, 613598510062, 
    15230000022, 618516510505, 11820053218, 13410001602, 15146051460, 
    15230000022, 17031000024, 11820000884, 14182700012, 11820000784, 
    2710000106005, 18769233103, 17199010074, 17199030030, 18200072385, 
    11820011168, 11820000769, 16821309117, 11820053168, 13410050505, 
    11820043218, 11820053030, 13410017509, 17231163001, 15230002540, 
    33410000354, 18769210014, 15230002545, 27031030701, 15230000002, 
    18769240020, 12210000170, 23410017306, 13410050340, 17199000048, 
    15230000434, 11820096867, 15230002903, 13410057340, 28066095605, 
    11820079047, 17199000048, 11820011030, 17199000048, 27336097343, 
    13410057341, 13410000555, 13410050574, 18769230050, 11820096867, 
    11820000884, 18769210014, 21820086167, 11820053168, 11820041218, 
    13410015306, 715643501208, 11820002990, 613598512001, 16821309117, 
    13410000355, 33410000354, 13410057602, 11820000126, 17089701331, 
    11820027168, 17336035201, 27336097343, 13410057340, 11820000769, 
    11820053218, 11820011168, 16206705142, 11820000884, 11820053168, 
    11820011168, 18066095615, 15230000017, 11820003982, 11820043218, 
    17199030030, 11820000466, 27336097343, 11820096867, 11820011030, 
    15230002966, 611969902000, 11820011030, 17289010011, 711820053025, 
    23410017306, 11820096867, 12210000170, 13410057341, 18382072553, 
    15230000434, 13410057306, 13410048574, 12556971416, 618516510505, 
    13410014574, 13410017340, 27336082341, 13410001306, 18200072385, 
    13410015341, 11820079047, 15230000435, 17336035201, 13410015341, 
    13410051574, 17289010011, 11820096867, 13410050574, 13410001306, 
    15230000434, 21820000801, 13410001602, 17089701331, 23410017306, 
    13410050306, 11820053030, 11820000771, 11820000016, 11820000884, 
    18200072385, 15230002903, 17143945712, 11820004989, 16206705155, 
    11820011030, 13410050602, 16821309117, 18769233103, 11820011030, 
    13410003602, 17199030069, 23410017306, 17336013661, 15230002540, 
    13410050340, 15230002903, 18769283102, 13410057602, 17336011108, 
    27336097343, 17199070002, 13410057306, 15230000966, 13410072805, 
    11820000693, 17336035301, 21820000115, 15230000536, 31820000042, 
    13410057340, 17143932012, 11820053047, 13410017256, 13410001222, 
    18769241103, 17199030030, 13410015340, 10948700007, 11820086031, 
    11820043218, 13410031306, 13410057602, 17199030030, 11820003982, 
    11820011168, 17336011214, 16206705155, 11820053030, 13410057340, 
    15230002545, 613598510062, 13410057340, 2710000106005, 13410057306, 
    11820004990, 18200072518, 17336013343, 18066095615, 11820053218, 
    13410048574, 13410015306, 11820096867, 13410015340, 18469400001, 
    13410048574, 11820053218, 13410001340, 11820053168, 18769233103, 
    13410050306, 13410010602, 15230002545, 18066095615, 11820000106, 
    11820002992, 11820000693, 17199000048, 13410057306, 11820000771, 
    13410015341, 17031000009, 13410078574, 27336097343, 21820000647, 
    13410015341, 13410057256, 31820000697, 15230000017, 13410030341, 
    13410000175, 16821309117, 11820000771, 21820086167, 613598510062, 
    13410048505, 13410001306, 13410007306, 13410001505, 11820079047, 
    18806705542, 37336097341, 12210007500, 13410072805, 18066095615, 
    11820011047, 13410078574, 31820000697, 18417341130, 16206705155, 
    11820053168, 13410015341, 13410057306, 13410017256, 18382023473, 
    15230000435, 613598512001, 14182700712, 13410057340, 13410057509, 
    11820053168, 11820011218, 15230000434, 15230002966, 13410001602, 
    17199000027, 13410057306, 13410050340, 13410057341, 15230000434, 
    13410057602, 11820053047, 15146051460, 27199077008, 13410057340, 
    13410001306, 23410000005, 11820053218, 11820003982, 23410068505, 
    11820000833, 17031037037, 11820000466, 16206705155, 11820043218, 
    11820011030, 27336082341, 11820003982, 23410017306, 11820043218, 
    17336013302, 13410057341, 17336035201, 17199030005, 11820000884, 
    18200072385, 13410017505, 11820096418, 15230000540, 11820015168, 
    715643501201, 16821302112, 613598512001, 11820053168, 11820053047, 
    13410010505, 13410000554, 21820086167, 15230000416, 13410001340, 
    11820053030, 13410001340, 11820096867, 23410003505, 11820053218, 
    23410000005, 18200072385, 15230002545, 23410000005, 11820096867, 
    11820001991, 21820086167, 13410001602, 13410015341, 13410057602, 
    13410000355, 13410007306, 13410057602, 18066095615, 18382012368, 
    12210001640, 15230000434, 13410057340, 13410015256, 28382012393, 
    13410050306, 11820053047, 11820000891, 13410000559, 11820000466, 
    18015761194, 11820096418, 11820000891, 11820096418, 17199030030, 
    13410057509, 18769241103, 11820096867, 16821309117, 16821309117, 
    11820079047, 27336097343, 2710000106744, 11820000784, 11820000884, 
    18066095675, 11820096418, 13410015341, 11820053168, 11820053168, 
    11820096867, 11820004990, 613598510062, 15230000434, 2710000106005, 
    15230000434, 11820053047, 613598512001, 31820000042, 11820096379, 
    15230000435, 11820011030, 11820053030, 12210001640, 13410003306, 
    18200072385, 18417340130, 11820053168, 13410072805, 11820053218, 
    11820015168, 13410001509, 13410031306, 17089701325, 17199048004, 
    11820096867, 13410001509, 18549811113, 18066095937, 17336011341, 
    11820011025, 11820011030, 11820096418, 18066095935, 11820015168, 
    18200072385, 13410007341, 17336011348, 13410007306, 13410057602, 
    13410001341, 18769241102, 13410057340, 13410001602, 17199036400, 
    17289000016, 11820096867, 16821302117, 13410057306, 13410057306, 
    11820000833, 14182700712, 11820011030, 11820011030, 15230000440
    )

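Edit 2:

As pointed out in the comments, the alternative is to remove the leading zeros from the following data.

This data is the "correct" data in the correct format. What I want to do now is remove the leading zeros from each part of the following data. So 00-01-18200-00987 would be split into 4 columns as before, with the leading zeros removed.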

S = 0
G = 1
V = 18200
I = 0987
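A minimal sketch of that split, assuming every ID has exactly four "-"-separated segments. Note that converting to integer strips all leading zeros (so I becomes 987 here, as in the "correct" data frame printed earlier), which is slightly different from the "0987" written above. The variable names are illustrative only.

```r
# Sketch: split dashed IDs into four integer columns.
ids_example <- c("00-01-18200-00987", "06-01-19699-02000")

# strsplit() returns one character vector of 4 segments per ID;
# rbind them into a character matrix with one row per ID
parts_chr <- do.call(rbind, strsplit(ids_example, "-", fixed = TRUE))

# as.integer() drops every leading zero; rebuild the matrix shape afterwards
parts <- matrix(as.integer(parts_chr), ncol = 4,
                dimnames = list(NULL, c("S", "G", "V", "I")))
print(parts)
```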

Data:

IDs <- c("00-01-41827-00712", "00-01-52300-01540", "00-01-18200-00987", 
"00-01-83820-07131", "00-01-34100-01222", "00-01-34100-50602", 
"00-01-52300-00536", "00-01-42572-05715", "00-01-34100-25574", 
"00-01-73360-73149", "00-01-34100-51574", "00-01-34100-07602", 
"00-01-89961-00420", "00-01-71990-90029", "00-01-34100-31341", 
"00-02-34100-30602", "00-01-34100-17536", "00-01-34100-57602", 
"00-01-18200-11047", "00-01-34100-00880", "00-01-34100-07602", 
"07-01-67084-27455", "00-01-34100-07340", "00-01-80660-95615", 
"00-01-34100-50222", "00-01-34100-15509", "00-01-72311-63009", 
"00-01-18200-54028", "06-01-19699-02000", "00-01-73360-35201", 
"06-01-85165-10504", "06-01-34986-10003", "00-03-70310-30703", 
"00-01-18200-53168", "00-01-18200-01991", "00-01-89961-10120", 
"00-01-82000-72385", "00-01-18200-00784", "00-01-71990-30030", 
"00-01-72890-00011", "00-01-34100-00622", "00-01-18200-15168", 
"00-01-52300-00440", "00-01-34100-00355", "00-01-71990-00048", 
"00-01-34100-77435", "00-01-80157-11125", "00-01-52300-01301", 
"06-01-85165-10505", "00-01-87692-83102", "00-01-34100-50505", 
"00-01-34100-00355", "00-01-52300-00440", "00-01-34100-50340", 
"00-01-73360-13343", "00-01-80660-95301", "00-01-34100-14505", 
"00-01-34100-59574", "00-01-34100-07306", "00-01-18200-53168", 
"00-01-34100-15256", "27-01-00001-06502", "00-01-71990-77828", 
"00-01-18200-43218", "00-01-73360-13343", "00-01-72311-63001", 
"00-01-18200-00987", "00-01-18200-79047", "00-01-18200-00466", 
"00-01-82000-72385", "00-01-34100-57602", "00-02-34100-25505", 
"00-01-34100-01341", "00-03-73360-97341", "00-01-18200-00987", 
"00-01-34100-00488", "00-01-18200-15168", "00-01-34100-01306", 
"00-02-18200-29031", "00-01-34100-48602", "00-01-85498-73837", 
"00-02-34100-62509", "00-01-34100-00009", "00-02-34100-17306", 
"00-01-18200-00106", "00-01-41827-00712", "00-01-71990-70002", 
"00-01-82488-12700", "00-01-72890-00030", "00-01-18200-00956", 
"00-01-84173-32130", "00-01-52300-00536", "00-01-80660-95625", 
"00-01-22100-00157", "00-01-34100-03306", "00-01-18200-00639", 
"00-01-18200-15047", "00-01-85498-73837", "00-01-22100-00170", 
"00-01-52300-02540", "00-01-52300-02540", "00-01-34100-68574", 
"00-01-34100-03509", "00-01-18200-00978", "00-01-71990-10006", 
"00-01-52300-02540", "00-01-18200-01991", "00-03-34100-00354", 
"00-01-18200-03982", "07-01-18200-53025", "00-01-18200-03982", 
"00-01-72890-00016", "00-01-34100-15509", "00-01-84173-10545", 
"00-01-34100-03340", "00-01-71990-48004", "00-01-34100-62340", 
"00-01-71990-77828", "00-01-34100-00904", "00-01-71990-00047", 
"00-01-87692-10012", "00-01-34100-07341", "00-01-18200-79047", 
"00-01-85725-00005", "00-01-52300-00540", "00-01-71990-30030", 
"00-01-34100-50574", "00-02-73360-82341", "00-01-34100-57306", 
"00-01-72311-63011", "00-01-73360-35201", "00-01-34100-50574", 
"00-01-71990-10033", "00-01-71990-00048", "00-01-34100-57536", 
"00-01-70897-01331", "00-01-52300-00434", "00-01-71990-48016", 
"00-01-34100-31602", "00-01-18200-00834", "00-01-34100-31306", 
"00-01-18200-11168", "00-01-34100-00252", "00-02-72890-00012", 
"00-01-52300-00022", "00-02-34100-17306", "00-01-52300-00017", 
"00-01-82488-12356", "00-01-18200-04989", "00-01-34100-01222", 
"00-03-34100-00354", "00-01-34100-14505", "00-01-18200-00933", 
"00-01-52300-00416", "00-02-18200-29031", "00-01-18200-00865", 
"00-01-82488-12910", "00-01-80660-95625", "00-01-41827-00076", 
"00-01-18200-27168", "00-01-34100-53505", "00-01-34100-01340", 
"00-01-18200-02989", "00-01-34100-62505", "00-01-73360-50202", 
"00-01-34100-01256", "00-01-71250-40205", "00-01-34100-15340", 
"00-02-18200-29031", "00-01-72311-63012", "00-03-18200-00697", 
"00-02-18200-00166", "00-01-34100-00491", "00-01-52300-02966", 
"00-01-22100-00171", "00-01-34100-14574", "00-01-49483-18000", 
"00-01-71990-09511", "00-01-34100-50222", "00-02-71250-00019", 
"00-01-34100-03509", "00-01-18200-53168", "00-01-34100-57306", 
"00-01-34100-17505", "00-02-34100-17306", "00-01-87000-50882", 
"00-01-34100-50574", "00-01-83820-12360", "00-01-34100-10505", 
"00-01-71990-70002", "00-03-70897-01123", "00-01-18200-00833", 
"00-01-34100-57256", "00-01-34100-62340", "07-01-19256-00058", 
"00-01-71250-40205", "00-01-09487-00007", "00-01-18200-00833", 
"00-01-83820-23473", "00-01-34100-00355", "00-01-34100-01256", 
"00-01-71439-34806", "00-01-34100-51306", "00-01-34100-50306", 
"06-01-33745-13000", "00-01-34100-00904", "00-01-18200-03982", 
"00-01-18200-00769", "00-01-52300-00966", "00-01-52300-00022", 
"00-01-52300-00540", "00-01-71990-10074", "00-02-18200-00801", 
"00-01-71990-30030", "00-01-18200-96867", "00-02-18200-87418", 
"00-01-34100-15222", "00-01-34100-15340", "00-01-87692-40020", 
"00-01-18200-00126", "00-01-71439-34806", "00-01-34100-15256", 
"00-02-18200-00701", "00-02-73360-82301", "00-01-68213-03112", 
"00-01-73360-80301", "00-01-34100-46805", "00-01-18200-11025", 
"00-01-34100-53505", "00-02-18200-00647", "00-01-18200-00974", 
"00-01-62067-05172", "00-01-71990-30069", "00-01-34100-01528", 
"00-02-83820-12393", "00-02-18200-87418", "00-01-34100-01509", 
"00-01-34100-57602", "00-01-34100-15509", "00-01-34100-03509", 
"00-01-34100-01602", "00-01-34100-50222", "00-01-34100-67505", 
"00-01-84173-37133", "00-02-34100-25505", "00-01-18200-00834", 
"00-01-71990-00028", "00-01-34100-03602", "00-01-22100-00171", 
"00-01-18200-00106", "00-01-83741-10012", "00-01-73360-11348", 
"00-01-80660-95935", "00-01-18200-86418", "00-01-22100-01640", 
"00-01-84173-32130", "00-01-71990-48016", "00-01-62067-05172", 
"00-01-18200-00891", "00-01-52300-00022", "00-01-34100-62340", 
"00-01-34100-50306", "00-01-34100-17256", "00-01-34100-57306", 
"00-01-62067-05172", "00-01-85725-11508", "00-03-18200-00697", 
"00-01-34100-01505", "00-01-18200-00466", "00-01-34100-00271", 
"00-01-18200-43218", "00-01-70897-01331", "00-01-18200-00974", 
"00-03-34100-00304", "00-02-34100-00005", "00-01-80157-11016", 
"00-01-34100-57256", "00-01-34100-17505", "06-01-13008-71310", 
"00-01-34100-57306", "00-01-34100-00559", "00-01-52300-02540", 
"00-01-82054-80441", "00-01-71990-10033", "00-02-73360-82341", 
"00-01-83820-12360", "00-02-18200-00166", "00-01-18200-00834", 
"00-01-62067-05172", "00-01-52300-02903", "00-02-34100-17306", 
"00-01-80660-95937", "00-01-52300-00536", "00-01-34100-77435", 
"00-01-70310-37037", "00-01-73360-35201", "00-01-34100-57306", 
"00-01-18200-61047", "00-01-62067-38072", "00-01-34100-50574", 
"07-01-19256-00054", "00-01-34100-62505", "00-02-83741-00006", 
"00-03-70897-01123", "00-01-34100-57341", "00-01-34100-25574", 
"00-01-34100-00554", "00-03-18200-00042", "06-01-35985-00016", 
"00-01-34100-15340", "00-01-18200-04990", "00-01-73360-50661", 
"00-01-52300-00022", "00-01-34100-50340", "00-02-18200-00801", 
"00-01-18200-00769", "00-03-34100-00354", "00-01-49483-11200", 
"00-01-73360-35301", "00-01-34100-50602", "07-02-39165-00125", 
"00-01-71990-10074", "00-01-70897-01331", "00-01-71439-22033", 
"00-02-82488-00006", "00-01-18200-00670", "06-01-35985-00016", 
"00-01-71990-48016", "00-01-22100-07500", "00-01-34100-17602", 
"00-01-73360-11214", "00-01-34100-10602", "00-01-18200-11168", 
"00-01-34100-31306", "00-01-18200-00468", "00-02-82488-00006", 
"00-01-87692-10012", "00-02-82488-00006", "00-01-18200-79047", 
"00-01-87692-30040", "00-01-34100-01509", "00-02-83741-00006", 
"27-01-00001-06505", "06-01-85165-10505", "00-01-18200-86418", 
"00-01-18200-53168", "00-01-34100-67602", "00-01-80660-95625", 
"00-01-71990-00048", "00-01-62067-05155", "00-01-71990-48004", 
"00-01-18200-61047", "00-01-18200-00313", "00-02-83820-12393", 
"00-01-71990-77828", "00-01-18200-00126", "00-01-71990-30030", 
"00-01-34100-01602", "00-01-82488-12345", "00-01-71670-04064", 
"00-01-34100-03306", "00-01-18200-00964", "00-01-34100-50505", 
"00-01-18200-00974", "06-01-85165-10707", "00-02-18200-29031", 
"00-01-68213-03112", "00-01-34100-10505", "00-01-18200-04989", 
"00-01-34100-17505", "00-01-72890-00020", "00-01-72311-63011", 
"00-01-34100-01222", "00-01-84173-32130", "07-01-60890-95602", 
"00-01-70897-18331", "00-01-72890-00020", "00-01-87692-27102", 
"06-01-35985-12001", "00-01-73360-35301", "00-01-70897-01331", 
"00-01-18200-04990", "00-01-18200-00769", "00-01-18200-04997", 
"00-01-70897-01125", "00-01-18200-41218", "00-01-18200-92867", 
"00-04-34100-00152", "00-01-18200-53218", "00-01-34100-10505", 
"00-01-84694-00001", "00-01-34100-62340", "00-01-52300-00435", 
"00-01-34100-25602", "00-01-34100-62340", "00-01-62067-05155", 
"00-01-34100-50505", "00-01-18200-79047", "00-01-34100-00555", 
"00-01-18200-00466", "07-01-18200-53025", "00-01-71990-00007", 
"00-01-34100-07341", "00-01-89961-00120", "06-01-19699-00006", 
"00-02-18200-86167", "00-01-71439-22033", "00-01-09487-00007", 
"00-01-72311-63009", "00-01-73360-11214", "00-01-42572-05715", 
"00-01-34100-50340", "00-01-34100-31341", "00-02-22100-02500", 
"00-02-80660-95785", "00-01-71990-70002", "07-01-98373-12603", 
"00-01-18200-00865", "00-01-71990-00027", "00-01-85498-73837", 
"00-02-71250-00019", "00-01-80660-95615", "00-02-70310-30701", 
"00-01-85498-12345", "00-01-18200-86031", "00-01-87692-33103", 
"00-01-62067-05155", "00-01-18200-53218", "00-01-87000-50901", 
"00-01-71990-48016", "00-01-73360-11214", "00-01-34100-00579", 
"00-01-34100-62340", "00-01-87692-10012", "00-01-34100-62340", 
"00-01-70310-00012", "00-01-18200-00016", "00-01-80157-61147", 
"00-01-18200-04997", "00-01-18200-00784", "00-01-71439-45712", 
"00-01-18200-00833", "00-01-71990-77603", "00-01-34100-15340", 
"00-01-71990-30030", "00-01-18200-61047", "00-01-34100-30306", 
"00-01-34100-15505", "00-03-18200-00697", "00-04-25569-19231", 
"00-01-18200-04997", "00-01-34100-15602", "00-01-71990-47712", 
"00-01-22100-01640", "00-01-34100-15256", "06-01-85165-10502", 
"00-01-71990-30005", "00-02-18200-29031", "00-02-71250-00019", 
"06-01-35985-10062", "06-01-19699-00002", "00-01-18200-00468", 
"00-01-34100-17505", "00-02-71990-77005", "00-01-34100-80706", 
"00-02-18200-00801", "00-01-34100-48602", "00-01-34100-00904", 
"00-01-73360-50202", "00-01-34100-30306", "00-01-89961-00120", 
"00-01-34100-10602", "00-01-34100-03306", "00-02-72890-00012", 
"00-01-62067-05142", "00-01-18200-53168", "00-01-34100-77435", 
"00-01-34100-48574", "00-01-72890-00011", "00-01-83820-07531", 
"00-01-34100-01222", "07-01-18200-53025", "00-01-62067-04955", 
"00-01-18200-79047", "00-03-41827-00046", "00-01-18200-15047", 
"06-01-85165-10106", "00-02-18200-87418", "00-02-18200-29031", 
"00-01-18200-00773", "00-01-82488-13000", "00-01-73360-13343", 
"00-01-62067-38055", "00-01-34100-50222", "00-01-71990-00008", 
"00-01-85498-73837", "00-01-34100-00009", "00-01-71990-90029", 
"00-01-34100-00009", "00-01-34100-01509")

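Edit 3:

Using the data from Edit 2, I have the following examples.

00-01-34100-01509 is one of the IDs in the Edit 2 data. It should collapse to 1341001509.

Example 2:

00-01-62067-05155 should collapse to 16206705155

Example 3: 00-01-82488-12356 should collapse to 18248812356

Example 4: 06-01-19699-00002 should collapse to 611969900002

Example 5: 00-01-09439-00008 should collapse to 10943900008

Example 6: 00-01-09439-00008 should collapse to 10943900008

The common theme here is that only the first leading zeros are removed, that is, the leading zeros in S and G.

So what I do now is use gsub to remove the "-" from the IDs data, which gives me data that looks like this (taking Example 6): 00010943900008. I then remove the leading zeros from there, so the data becomes 10943900008. This is much simpler than what I had in mind before.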
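The two steps just described can be sketched as below, using Examples 2 to 5 as input. The function name `collapse_simple` is made up for illustration. One thing the sketch makes visible: when S is not "00" (Example 4), stripping only the zeros at the very start leaves the zero of G in place, so the result differs from the expected 611969900002.

```r
# Sketch: remove the dashes, then strip zeros from the start of the string.
collapse_simple <- function(x) {
  no_dash <- gsub("-", "", x, fixed = TRUE)
  sub("^0+", "", no_dash)
}

examples <- c("00-01-62067-05155", "00-01-82488-12356",
              "06-01-19699-00002", "00-01-09439-00008")
collapsed <- collapse_simple(examples)
print(collapsed)
# The third element keeps the "0" of G ("01"), because that zero
# is no longer at the start of the string once S = "06" precedes it.
```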

Edit 4:

When I run my version

I get the following console output:

> df_panel$COLUPC <- gsub("-","",df_panel$UPC)
> df_panel$COLUPC <- sub("^[0]+", "", df_panel$COLUPC) 
> beer_PANEL_GR$COLUPCmatch <- beer_PANEL_GR$COLUPC %in% df_panel$COLUPC
> sum(beer_PANEL_GR$COLUPCmatch == FALSE) 
[1] 896
> sum(beer_PANEL_GR$COLUPCmatch == TRUE) 
[1] 19119
> 
> beer_PANEL_GR$COLUPC <- as.character(beer_PANEL_GR$COLUPC)
> df <- full_join(df_panel, beer_PANEL_GR, by = "COLUPC") #Joining with UPC causes us to lose a lot of observations
> dim(df)
[1] 5293488      40

When I run your version, I get the following console output:

> # remove 0s at the beginning of the string, or preceded by "-"
> df_panel$COLUPC <- gsub("(?<=^|-)0","", df_panel$UPC, perl = TRUE)
>   
>   # remove dashes
> df_panel$COLUPC <- gsub("-", "", df_panel$COLUPC)
>   # remove leading zeros
> df_panel$COLUPC <- gsub("^0+", "", df_panel$COLUPC)
> 
> beer_PANEL_GR$COLUPCmatch <- beer_PANEL_GR$COLUPC %in% df_panel$COLUPC
> sum(beer_PANEL_GR$COLUPCmatch == FALSE) 
[1] 7382
> sum(beer_PANEL_GR$COLUPCmatch == TRUE) 
[1] 12633
> 
> df2 <- full_join(df_panel, beer_PANEL_GR, by = "COLUPC") 
> dim(df2)
[1] 3564132      40

Tags: r

Answer


To address your edit, how about:

library(dplyr)

# remove 0s at the beginning of the string, or preceded by "-"
gsub("(?<=^|-)0","", IDs, perl = TRUE) %>% 

  # remove dashes
  gsub("-", "", .) %>% 
  # remove leading zeros
  gsub("^0+", "", .)

[1] "1418270712"  "1523001540"  "1182000987"  "1838207131"  "1341001222" 
[6] "13410050602"
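Note that the pattern above removes one zero after every "-", so it also shortens V and I: "00-01-41827-00712" becomes the 10-digit "1418270712" in the output. If the intent is what Edit 3 describes, where only the leading zeros of S and G are dropped, a variant along these lines may be closer. It is only a sketch, it assumes every ID has exactly four "-"-separated segments, and `collapse_sg` is a made-up name.

```r
# Sketch: strip leading zeros from S and G only; keep V and I as they are.
collapse_sg <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(parts, function(p) {
    s <- sub("^0+", "", p[1])   # "00" -> "", "06" -> "6", "27" -> "27"
    g <- sub("^0+", "", p[2])   # "01" -> "1"
    paste0(s, g, p[3], p[4])    # V and I keep all 5 digits
  }, character(1))
}

edit3_examples <- c("00-01-62067-05155", "06-01-19699-00002",
                    "00-01-09439-00008")
collapsed_sg <- collapse_sg(edit3_examples)
print(collapsed_sg)
```

These results agree with Examples 2, 4 and 5 in Edit 3. Whether they agree with the real data in `beer_PANEL_GR$COLUPC` would still need to be checked with `%in%`, as in Edit 4.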
