How do I compare two string columns and replace the casing of words in one column with the casing from the other?

Problem description

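I have two columns, Sentences and Updates. For every word in the Updates column that sits at the end of a URL, I want to find the corresponding word in Sentences and replace it with the casing used there.

I don't know how to do this comparison; any help is appreciated. The actual data has 43k rows with different URLs.

Sample code: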

import pandas as pd

dict1 = {'Updates': ['The new abc.com/Line','Its a abc.com/bright and abc.com/Sunny Day','abc.com/smartphone have taken our the abc.com/WORLD','abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
     'Sentences': ['The new line','Its a bright and sunny day','Smartphone have taken our the World','GLOBAL Warming is reaching its Peak ']
        }

df = pd.DataFrame(dict1)

Current output:

Sentences           Updates
The new line            The new abc.com/Line

Its a bright and sunny day          Its a abc.com/bright and abc.com/Sunny Day

Smartphone have taken our the World         abc.com/smartphone have taken our the abc.com/WORLD

GLOBAL Warming is reaching its Peak             abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak

Expected output:

Sentences           Updates
The new line            The new abc.com/line

Its a bright and sunny day          Its a abc.com/bright and abc.com/sunny day

Smartphone have taken our the World         abc.com/Smartphone have taken our the abc.com/World

GLOBAL Warming is reaching its Peak             abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak

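Tags: python, regex, pandas, pattern-matching

Solution

Use the re module: take the word after the "/" in each URL, find the same word in the sentence with a case-insensitive search, and substitute the sentence's spelling back into the update.

Code: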

import re

dict1 = {
    'Sentences': [
        'The new line',
        'Its a bright and sunny day',
        'Smartphone have taken our the World',
        'GLOBAL Warming is reaching its Peak '
    ],
    'Updates': [
        'The new abc.com/Line',
        'Its a abc.com/bright and abc.com/Sunny Day',
        'abc.com/smartphone have taken our the abc.com/WORLD',
        'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'
    ]
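}
for sentence, update in zip(dict1['Sentences'], dict1['Updates']):
    # Collect the word after the last "/" in every token that contains a URL.
    urls = [x.split("/")[-1] for x in update.split() if "/" in x]
    for url in urls:
        # Find the same word in the sentence, ignoring case.
        # re.escape keeps regex metacharacters in the word from being interpreted.
        match = re.search(re.escape(url), sentence, re.IGNORECASE)
        if match is None:
            continue  # word does not occur in the sentence: leave it unchanged
        # A function replacement inserts the matched text literally.
        update = re.sub(re.escape(url), lambda _: match.group(), update, flags=re.IGNORECASE)

    print(f"{sentence}\t{update}")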

Output:

The new line    The new abc.com/line
Its a bright and sunny day  Its a abc.com/bright and abc.com/sunny Day
Smartphone have taken our the World abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak     abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak
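
Since the question works with a DataFrame, the same idea can be written as a helper applied row by row. This is only a sketch under a few assumptions: the helper name `match_case` and the pattern `URL_WORD` are made up for illustration, URLs look like the sample data (a single word after the "/"), and words are compared as whole words rather than as substrings, which differs slightly from the loop above.

```python
import re
import pandas as pd

# Hypothetical pattern: the word that directly follows a "/" in a URL token.
URL_WORD = re.compile(r"(?<=/)\w+")

def match_case(update: str, sentence: str) -> str:
    """Give each URL word in `update` the casing it has in `sentence`."""
    # Map lowercase word -> spelling as first seen in the sentence.
    casing = {}
    for word in re.findall(r"\w+", sentence):
        casing.setdefault(word.lower(), word)
    # URL words that do not occur in the sentence are left unchanged.
    return URL_WORD.sub(lambda m: casing.get(m.group().lower(), m.group()), update)

df = pd.DataFrame({
    'Updates': [
        'The new abc.com/Line',
        'Its a abc.com/bright and abc.com/Sunny Day',
        'abc.com/smartphone have taken our the abc.com/WORLD',
        'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak',
    ],
    'Sentences': [
        'The new line',
        'Its a bright and sunny day',
        'Smartphone have taken our the World',
        'GLOBAL Warming is reaching its Peak ',
    ],
})

# Pair the two columns row by row and overwrite Updates with the fixed text.
df['Updates'] = [match_case(u, s) for u, s in zip(df['Updates'], df['Sentences'])]
print(df['Updates'].tolist())
```

Only the word after the "/" is touched, so plain words such as "Day" keep their original casing, as in the output above. Building the lookup dictionary once per row avoids running a separate regex search for every URL, which may matter with 43k rows.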
