Find the start and end index of each unique character in a string in Python

Problem description

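I have a string with repeated characters. My task is to find the start index and the end index of each unique character in that string. Below is my code.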

import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
    mo = re.search(item, x)
    flag = item
    m = mo.start()
    n = mo.end()
    print(flag, m, n)

Output:

a 0 1
b 3 4
c 7 8

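The end indices of the characters are wrong here. I understand why this happens, but how can I pass the character to be matched into the regex search function dynamically? For example, if I hard-code the character in the search call, I get the desired output: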

import re

x = 'aabbbbccc'
mo = re.search("[b]+", x)
flag = "b"  # the character is hard-coded here
m = mo.start()
n = mo.end()
print(flag, m, n)

Output:

b 2 6

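The code above gives the correct result, but here I cannot pass the character to match dynamically. It would be very helpful if someone could tell me how to do this; any hints are welcome too. Thanks in advance.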

Tags: regex, python-3.x, pyspark

Solution


String literal formatting to the rescue:

import re

x = "aaabbbbcc"
xs = set(x)
for item in sorted(xs):  # sorted() only makes the output order predictable
    # for patterns, better use raw strings, and format the letter into them
    mo = re.search(fr"{item}+", x)  # fr and rf both work :) it's a raw formatted literal
    flag = item
    m = mo.start()
    n = mo.end()
    print(flag, m, n)  # mo.end() is exclusive: use n - 1 for the index of the last character

Output:

a 0 3   # you do see that the upper limit is off by 1?
b 3 7   # see above for fix
c 7 9
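In the loop above, the f-string drops the character into the pattern unescaped, so a character that is also regex syntax (for example `.` or `*`) would change what the pattern means: `.+` matches everything. A minimal sketch, assuming you want inclusive end indices and only the first run of each character, escapes the character with `re.escape` and subtracts 1 from `end()`. The function name `first_run_spans` is hypothetical.

```python
import re


def first_run_spans(s):
    """Return {char: (start, end)} for the first run of each unique character.

    `end` is inclusive, i.e. mo.end() - 1.
    """
    spans = {}
    for ch in sorted(set(s)):  # sorted for a deterministic output order
        # re.escape keeps characters such as "." or "*" literal in the pattern
        mo = re.search(f"{re.escape(ch)}+", s)
        spans[ch] = (mo.start(), mo.end() - 1)
    return spans


print(first_run_spans("aaabbbbcc"))  # {'a': (0, 2), 'b': (3, 6), 'c': (7, 8)}
print(first_run_spans("..**"))       # {'*': (2, 3), '.': (0, 1)}
```

This still reports only the first run of each character, so for a string like `"aabaa"` the entry for `a` is `(0, 1)`.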

Your pattern does not need the [] around the letter; you only match one character anyway.
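If a character can occur in more than one separate run (for example `"aabaa"`), searching once per unique character only finds its first run. One option, offered as a sketch rather than as the answer's method, is a single pass with `re.finditer` and the backreference pattern `(.)\1*`, which matches one character followed by any number of repeats of that same character. The function name `all_runs` is hypothetical.

```python
import re


def all_runs(s):
    """Return (char, start, end) for every run of identical characters.

    `end` is inclusive. The pattern captures one character, then any number
    of repeats of that same character; re.DOTALL lets "." match newlines too.
    """
    return [
        (m.group(1), m.start(), m.end() - 1)
        for m in re.finditer(r"(.)\1*", s, re.DOTALL)
    ]


for ch, start, end in all_runs("aaabbbbcc"):
    print(ch, start, end)
# a 0 2
# b 3 6
# c 7 8

print(all_runs("aabaa"))  # [('a', 0, 1), ('b', 2, 2), ('a', 3, 4)]
```

Because the character is captured by the pattern itself, nothing has to be formatted into the regex, so no escaping is needed.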


Without regex¹

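x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder, counting indices from 1
for idx, ch in enumerate(x[1:], 1):
    if ch != last_ch:
        # a new run starts here: report the run that just ended (inclusive end)
        print(last_ch, start_idx, idx - 1)
        last_ch = ch
        start_idx = idx
# report the final run (this also works for a one-character string)
print(last_ch, start_idx, len(x) - 1)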

Output:

a 0 2   # not off by 1
b 3 6
c 7 8
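The manual loop can also be written with `itertools.groupby` from the standard library, which groups consecutive equal items. This is an alternative sketch, not the answer's code, and `runs_groupby` is a hypothetical name. Keeping a running index gives inclusive end indices, and an empty string simply yields no runs.

```python
from itertools import groupby


def runs_groupby(s):
    """Return (char, start, end) for each run of consecutive equal characters.

    `end` is inclusive, like the loop-based output above.
    """
    runs = []
    start = 0
    for ch, group in groupby(s):
        length = sum(1 for _ in group)  # number of characters in this run
        runs.append((ch, start, start + length - 1))
        start += length
    return runs


for ch, start, end in runs_groupby("aaabbbbcc"):
    print(ch, start, end)
# a 0 2
# b 3 6
# c 7 8
```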

¹ Regex: now you have two problems...

