Regex (RE2, Google Sheets): multiple values in a multiline cell

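Question

I'm stuck on how to read and reformat these values from a multiline cell via ARRAYFORMULA.

I'm using regex because the preceding lines may vary.


Formulas only please, no custom code.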


The first column looks like this:

```
[config]
name = the_name
texture = blah.dds
cost = 1000

[effect0]
value = 1000
type = ATTR_A

[effect1]
value = 8
type = ATTR_B

[feature0]
name = feature_blah

[components]
0 = comp_one,1

[resources]
res_one = 1
res_five = 1
res_four = 1
```

<br/>
To be useful elsewhere, at minimum each [tag] set ([effect\d], [feature\d], etc.) needs to be in its own column. For example, the 'effects' column would look like:

ATTR_A:1000,ATTR_B:8


and so on.



Desired output can also be seen in the included spreadsheet


<br/>
<b>Here is the example spreadsheet:</b>

https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing


**Current REGEXREPLACE**

Kinda works: it finds each 'type' and 'value' fine, but I can't figure out how to extract just those from the rest. I tried capturing (and non-capturing) groups before and after, but that didn't work.

=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.]*[effect\d][\n.]*(.*)\n(.*)","1:$1 2:$2"))


**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**

A different approach entirely. It also kinda works, but it is longer and leaves me having to parse the values out of the resulting string, which is where I got stuck again. The idea was to use this to simplify the input, then REGEXREPLACE as above. I keep getting stuck removing the content around the final matches, though; if that can be done, the approach above is fine too.

// First run a substitution

=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))

// Then this variant (gave up on a single-line 'effect\d', so split it out to try to get it working)

=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.*)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.*)$")&";; ")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.*)$")&";;"),""))

// Then run a REGEXREPLACE on the result of the above

=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.*);type = (.*);;","1:$1 2:$2"))


**--EDIT--**

Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of each matching 'type' into those extra columns (see spreadsheet).

All good if you can't though; I just realized I would need that too for lookups.

**--END OF EDIT--**

<br/>
I've tried dozens of things, discarding each in turn. I had a quick look in the version history to grab two promising attempts and shared them in separate sheets.

One of these also used SUBSTITUTE to simplify the input column; I'm happy with a solution using either the RAW or the SUBSTITUTE results.


<br/>
**Potentially Useful links:**

https://github.com/google/re2/wiki/Syntax



<br/>

<b>Just some more words:</b>

I have also looked at dozens of Stack Overflow and Google support pages, and tried both REGEXEXTRACT and REGEXREPLACE. Both look promising but are missing that final tweak, and I have already tried dozens of tweaks on each.


Any help would be great, and will hopefully help others in future, since examples with spreadsheets are useful and every new REGEX seems to be a new adventure ;)

<br/>
P.S. If anyone can think of a better title for the OP, please say so in a comment or your answer :)

Tags: regex, google-sheets, google-sheets-formula, array-formulas, re2

Answer



Paste in B3:

=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
 IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
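As far as I can tell, the B3 formula builds a `header:value` string for every non-empty cell in C:E and joins them per row; the QUERY trick concatenates with spaces, which SUBSTITUTE then turns into `, `. A minimal sketch of that joining logic, in Python only so it can be checked outside Sheets. The function name `join_effects` and the sample headers and row are hypothetical.

```python
def join_effects(headers, row):
    """Join 'header:value' pairs for the non-empty cells of one row."""
    # Skip empty cells, mirroring IF(C3:E<>"", C2:E2&":"&C3:E, ).
    pairs = [f"{h}:{v}" for h, v in zip(headers, row) if v != ""]
    # The formula joins with spaces and then replaces each space with ", ";
    # joining directly with ", " gives the same result for values without spaces.
    return ", ".join(pairs)


if __name__ == "__main__":
    print(join_effects(["ATTR_A", "ATTR_B", "ATTR_C"], ["1000", "8", ""]))
```

One design point worth noting: because the formula replaces every space with `, `, a value that itself contains a space would be split by the same substitution. The sketch above does not have that limitation.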

Paste in C3:

=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))

Paste in D3:

=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))

Paste in E3:

=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
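The C3:E3 formulas rely on each `value` line sitting directly above its `type` line: `(\d+)` captures the digits that end the line immediately before `type = ` followed by the header text from row 2. A minimal sketch of that matching logic, in Python only so it can be tested outside Sheets. The function name `value_for_type` and the `CELL` sample (reconstructed from the question) are hypothetical, and Python's `re` is assumed to behave like RE2 for this simple pattern.

```python
import re

# Hypothetical sample cell, reconstructed from the question's example.
CELL = (
    "[config]\nname = the_name\ntexture = blah.dds\ncost = 1000\n\n"
    "[effect0]\nvalue = 1000\ntype = ATTR_A\n\n"
    "[effect1]\nvalue = 8\ntype = ATTR_B\n\n"
    "[feature0]\nname = feature_blah"
)


def value_for_type(cell, attr):
    """Return the digits on the line directly above 'type = <attr>', or ''."""
    # re.escape is an extra precaution here; the Sheets formula concatenates
    # the header cell into the pattern without escaping it.
    match = re.search(r"(\d+)\ntype = " + re.escape(attr), cell)
    # IFNA(...) in the formula plays the role of this empty-string fallback.
    return match.group(1) if match else ""


if __name__ == "__main__":
    for attr in ("ATTR_A", "ATTR_B", "ATTR_C"):
        print(attr, repr(value_for_type(CELL, attr)))
```

Note that the pattern only looks at the trailing digits of the preceding line, so it does not depend on the `[effect\d]` section number.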

Paste in F3:

=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))

Paste in G3:

=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))

Paste in H3:

=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
 REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
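The H3 formula works in four steps: flatten newlines to `, `, capture everything after `[resources], `, cut at the next `[` so a following section is dropped, and strip the trailing `, , ` left by the blank line. A rough Python sketch of those same steps, assuming the cell layout shown in the question; the function name `resources` is hypothetical.

```python
import re


def resources(cell):
    """Return the [resources] entries as one comma-separated string, or ''."""
    # Step 1: flatten the cell, as REGEXREPLACE(A3:A, "\n", ", ") does.
    flat = cell.replace("\n", ", ")
    # Step 2: capture everything after the section header.
    match = re.search(r"\[resources\], (.*)", flat)
    if not match:
        return ""  # corresponds to IFNA(...) in the formula
    # Step 3: keep only the text before the next "[", like INDEX(SPLIT(..., "["),,1).
    first = match.group(1).split("[")[0]
    # Step 4: drop the ", , " that a blank line before the next section leaves behind.
    return re.sub(r", , $", "", first)


if __name__ == "__main__":
    last = "[components]\n0 = comp_one,1\n\n[resources]\nres_one = 1\nres_five = 1"
    middle = "[resources]\nres_one = 1\n\n[other]\nx = 1"
    print(resources(last))
    print(resources(middle))
```

Cutting at `[` assumes resource names and values never contain a `[` themselves.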

Spreadsheet demo

