How do I put the result of a regex-matching function into a pandas DataFrame?

Problem description

I have this function:

def find_regex(regex, text, opzione2= None, opzione3 = None):
    lista = []
    for x in text:
        matches_prima = re.findall(regex, x)
        matches_prima2 = []
        matches_prima3 = []
        if opzione2 is not None:
            matches_prima2 = re.findall(opzione2, x)
            if opzione3 is not None:
                matches_prima3 = re.findall(opzione3, x)
        lunghezza = len(matches_prima) + len(matches_prima2) + len(matches_prima3)
        lista.append(lunghezza)

    print(sum(lista))

Suppose my text is "I love cats" and the regex is "cat"; the result should be 1.

I then have:

def pandas():
  regex1 = re.compile(r"cat")
  text1 = "I love cats"
  find_regex(regex1, text1) #it returns 1
  df = pd.DataFrame([find_regex(regex1, text1)])
  print(df)

It doesn't work. What is the correct way to do this?

Tags: python, python-3.x, regex, pandas, dataframe

Solution


I am inferring a lot here, but I think this at least gets you to working code that you can build on.
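
For context, here is a minimal sketch of why the attempt in the question likely fails. The function there only prints its total and has no return statement, so the caller receives None, and looping over a plain string visits one character at a time, so a multi-character pattern such as "cat" can never match. The names count_and_print and count_and_return are hypothetical and used only for illustration.

```python
import re

def count_and_print(regex, text):
    # Mirrors the shape of the question's function: loop over `text`, then print.
    counts = []
    for x in text:  # when `text` is a str, x is a single character
        counts.append(len(re.findall(regex, x)))
    print(sum(counts))  # no return statement, so the caller receives None

def count_and_return(regex, text):
    # Search the whole string at once and hand the count back to the caller.
    return len(re.findall(regex, text))

printed_only = count_and_print(r"cat", "I love cats")
returned = count_and_return(r"cat", "I love cats")
```

Here printed_only is None, which is what ends up in the DataFrame when the printing version is wrapped in pd.DataFrame([...]), while returned holds the count that can be stored in a column.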

Text used as input (saved as data.txt):

Text
Be cxt careful with that cxt.
Be cat careful with that cat.
Stop waiting for the cat to just happen.
He found the cat covered with cats.
The dogmatic people who cat insist are so cathartic!
There should have been a cat and a dog.
Three people with six decades of cat experience.

Code:

import pandas as pd
import re

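def find_regex(regex, text, opzione2=None, opzione3=None):
    # Count the matches of `regex` in a single string and return the total.
    matches_prima = re.findall(regex, text)
    lunghezza = len(matches_prima)
    if opzione2:
        # Add the matches of the optional second pattern.
        matches_prima2 = re.findall(opzione2, text)
        lunghezza += len(matches_prima2)
        if opzione3:
            # The third pattern is only counted when a second one is given.
            matches_prima3 = re.findall(opzione3, text)
            lunghezza += len(matches_prima3)
    return lunghezza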

df = pd.read_csv("data.txt")
print(df)

regex1 = r"cat"
regex2 = r"dog"
regex3 = r"people"

df["CntRegex[1]"] = df["Text"].map(lambda x: find_regex(regex1, x))
df["CntRegex[1&2]"] = df["Text"].map(lambda x: find_regex(regex1, x, regex2))
df["CntRegex[1&2&3]"] = df["Text"].map(lambda x: find_regex(regex1, x, regex2, regex3))

with pd.option_context('display.max_colwidth', 25, "display.max_columns", None):
    print(df)

Output:

                       Text  CntRegex[1]  CntRegex[1&2]  CntRegex[1&2&3]
0  Be cxt careful with t...            0              0                0
1  Be cat careful with t...            2              2                2
2  Stop waiting for the ...            1              1                1
3  He found the cat cove...            2              2                2
4  The dogmatic people w...            2              3                4
5  There should have bee...            1              2                2
6  Three people with six...            1              1                2
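
As an alternative to mapping a Python function over the column, the same counts can probably be obtained with the vectorized Series.str.count method. This is a swapped-in technique rather than the approach used above, and the sketch builds the sample sentences in memory instead of reading data.txt so that it is self-contained. The patterns list and per_pattern name are illustrative assumptions.

```python
import pandas as pd

# Same sample sentences as the input above, built in memory for this sketch.
df = pd.DataFrame({"Text": [
    "Be cxt careful with that cxt.",
    "Be cat careful with that cat.",
    "Stop waiting for the cat to just happen.",
    "He found the cat covered with cats.",
    "The dogmatic people who cat insist are so cathartic!",
    "There should have been a cat and a dog.",
    "Three people with six decades of cat experience.",
]})

patterns = [r"cat", r"dog", r"people"]

# Series.str.count returns the number of regex matches in each row.
per_pattern = [df["Text"].str.count(p) for p in patterns]

# Cumulative totals, matching the three columns computed with find_regex.
df["CntRegex[1]"] = per_pattern[0]
df["CntRegex[1&2]"] = per_pattern[0] + per_pattern[1]
df["CntRegex[1&2&3]"] = per_pattern[0] + per_pattern[1] + per_pattern[2]

print(df)
```

Counting each pattern separately and summing keeps the result identical to adding up the individual re.findall lengths, which a single combined alternation pattern would not guarantee if the patterns could overlap.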
