python - Repeated vowels and consonants in words in pandas
Problem description
I have the following dataset:
import pandas as pd
a_df = pd.DataFrame({'id':[1,2,3,4,5],'text':['This was fuuuuun','aaaawesome','Hiiigh altitude','Oops','See you']})
a_df
   id              text
0   1  This was fuuuuun
1   2        aaaawesome
2   3   Hiiigh altitude
3   4              Oops
4   5           See you
Some words are misspelled. One rule to apply is that, if I see the same vowel or consonant three or more times in a row, I can be fairly sure the word is misspelled, so I replace that repetition with ''.
So I have tried this:
a_df['corrected_text'] = a_df['text'].str.replace(r'([a-zA-Z])\\3+','')
But there is no change. My logic was to try to capture letters that were repeated, but I must be doing something wrong. Please, any help will be greatly appreciated.
Solution
You can use
a_df['text'] = a_df['text'].str.replace(r'([a-zA-Z])\1{2,}', r'\1', regex=True)
Details:
([a-zA-Z]) - capturing group with ID 1
\1{2,} - two or more occurrences of the Group 1 value (so, three or more identical letters together with the previous pattern)
\1 is a backreference to the Group 1 value, both in the pattern and in the replacement. Make sure to use it in a raw string literal, else you would have to double the backslashes.
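To see how the pattern behaves on the sample strings, here is a minimal sketch using Python's built-in re module rather than pandas, so it runs without any third-party install. The helper name collapse_repeats and its replacement parameter are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer: one ASCII letter captured as Group 1,
# followed by two or more further copies of that exact letter.
REPEATED_LETTER = re.compile(r'([a-zA-Z])\1{2,}')

def collapse_repeats(text, replacement=r'\1'):
    """Replace every run of 3+ identical letters (hypothetical helper).

    With the default replacement the run is collapsed to a single letter;
    pass '' to drop the whole run, as the question originally asked.
    """
    return REPEATED_LETTER.sub(replacement, text)

texts = ['This was fuuuuun', 'aaaawesome', 'Hiiigh altitude', 'Oops', 'See you']
for t in texts:
    # Show the original, the collapsed form, and the fully removed form.
    print(repr(t), '->', repr(collapse_repeats(t)), '/', repr(collapse_repeats(t, '')))
```

The match is case-sensitive, so 'Oops' is left alone ('O' and 'o' are different characters), and 'See you' is untouched because 'ee' is only two letters. Replacing with '' instead of \1 turns 'fuuuuun' into 'fn', which is why the answer keeps one copy of the letter.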