python - How do I put the result of a regex-matching function into a pandas DataFrame?
Problem description
I have this function:
def find_regex(regex, text, opzione2=None, opzione3=None):
    lista = []
    for x in text:
        matches_prima = re.findall(regex, x)
        matches_prima2 = []
        matches_prima3 = []
        if opzione2 is not None:
            matches_prima2 = re.findall(opzione2, x)
        if opzione3 is not None:
            matches_prima3 = re.findall(opzione3, x)
        lunghezza = len(matches_prima) + len(matches_prima2) + len(matches_prima3)
        lista.append(lunghezza)
    print(sum(lista))
Suppose my text is "I love cats" and the regex is "cat"; the result should be 1.
Then I have:
def pandas():
    regex1 = re.compile(r"cat")
    text1 = "I love cats"
    find_regex(regex1, text1)  # it returns 1
    df = pd.DataFrame([find_regex(regex1, text1)])
    print(df)
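It doesn't work. What is the correct way to do this?
Solution
I am inferring a lot here, but I think this at least leaves you with working code that you can build on.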
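As far as I can tell, two things get in the way in the original function: it prints the total instead of returning it, so pd.DataFrame([...]) receives None, and looping over a string walks it one character at a time, so a multi-character pattern such as "cat" never matches. Here is a minimal sketch of the difference. The names count_printing and count_returning are made up for illustration and are not from the question.

```python
import re


def count_printing(regex, text):
    # Mirrors the question's approach: loop over the string, print, return nothing.
    total = 0
    for ch in text:  # iterating over a str yields single characters
        total += len(re.findall(regex, ch))
    print(total)


def count_returning(regex, text):
    # Search the whole string once and hand the count back to the caller.
    return len(re.findall(regex, text))


printed = count_printing(r"cat", "I love cats")    # no return value, so this is None
returned = count_returning(r"cat", "I love cats")  # an int that can go into a DataFrame
```

Only the second form gives you a value that can be stored in a column.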
Text used as input (the contents of data.txt):
Text
Be cxt careful with that cxt.
Be cat careful with that cat.
Stop waiting for the cat to just happen.
He found the cat covered with cats.
The dogmatic people who cat insist are so cathartic!
There should have been a cat and a dog.
Three people with six decades of cat experience.
import re

import pandas as pd


def find_regex(regex, text, opzione2=None, opzione3=None):
    # Count matches of the main pattern, then add matches of any optional patterns.
    matches_prima = re.findall(regex, text)
    lunghezza = len(matches_prima)
    if opzione2:
        matches_prima2 = re.findall(opzione2, text)
        lunghezza += len(matches_prima2)
    if opzione3:
        matches_prima3 = re.findall(opzione3, text)
        lunghezza += len(matches_prima3)
    return lunghezza


# data.txt holds the input shown above; its first line ("Text") is the column header.
df = pd.read_csv("data.txt")
print(df)

regex1 = r"cat"
regex2 = r"dog"
regex3 = r"people"

# Apply the counter to every row of the Text column.
df["CntRegex[1]"] = df["Text"].map(lambda x: find_regex(regex1, x))
df["CntRegex[1&2]"] = df["Text"].map(lambda x: find_regex(regex1, x, regex2))
df["CntRegex[1&2&3]"] = df["Text"].map(lambda x: find_regex(regex1, x, regex2, regex3))

with pd.option_context("display.max_colwidth", 25, "display.max_columns", None):
    print(df)
                       Text  CntRegex[1]  CntRegex[1&2]  CntRegex[1&2&3]
0  Be cxt careful with t...            0              0                0
1  Be cat careful with t...            2              2                2
2  Stop waiting for the ...            1              1                1
3  He found the cat cove...            2              2                2
4  The dogmatic people w...            2              3                4
5  There should have bee...            1              2                2
6  Three people with six...            1              1                2
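If all you need is the per-row count, pandas' built-in Series.str.count may be enough on its own, since it counts regex matches in each string. This is an alternative sketch rather than what the code above does; the two-row frame and the column names cat and cat_or_dog are made up for illustration.

```python
import pandas as pd

# Two of the sample sentences, built inline so no data.txt is needed.
df = pd.DataFrame({"Text": [
    "Be cat careful with that cat.",
    "There should have been a cat and a dog.",
]})

# Series.str.count counts the regex matches in each row's string.
df["cat"] = df["Text"].str.count(r"cat")

# Adding the per-pattern counts plays the role of the optional patterns in find_regex.
df["cat_or_dog"] = df["Text"].str.count(r"cat") + df["Text"].str.count(r"dog")

print(df)
```

This avoids the lambda and the helper function, at the cost of spelling out one str.count call per pattern.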