How do I print a timestamp whenever the values of two columns match?

Problem description

I have two DataFrames named dataset1 and dataset2 (shown below). The "pattern" and "SAX" columns contain string values.

dataset1=
       pattern   tstamps
0    glngsyu     1610460
1    zicobgm     1610466
2    eerptow        .
3    cqbsynt        .
4    zvmqben        .
..       ...
475  rfikekw
476  bnbzvqx
477  rsuhgax
478  ckhloio
479  lbzujtw

480 rows × 1 columns

dataset2 =
    SAX     timestamp
0   hssrlcu 16015
1   ktyuymp 16016
2   xncqmfr 16017
3   aanlmna 16018
4   urvahvo 16019
... ... ...
263455  jeivqzo 279470
263456  bzasxgw 279471
263457  jspqnqv 279472
263458  sxwfchj 279473
263459  gxqnhfr 279474

263460 rows × 2 columns

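Is there a method or function that prints the "tstamps" value from dataset1 whenever a value in dataset1's "pattern" column matches a value in dataset2's "SAX" column?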

PS: Here is a code snippet that can be used to generate dataset1 and dataset2:

import pandas as pd
import numpy as np

def sax_generator(num):
    return [''.join(chr(x) for x in np.random.randint(97, 97+26, size=4)) for _ in range(num)]

dataset1 = pd.DataFrame({'pattern': sax_generator(480), 'tstamps': range(480)})
dataset2 = pd.DataFrame({'sax': sax_generator(263460), 'timestamp': range(263460)})

Tags: python, string, data-science

Solution


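You can use Series.isin, which returns a boolean mask that is True for every row of dataset1 whose "pattern" value appears anywhere in dataset2's "sax" column: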

import pandas as pd

dataset1 = pd.DataFrame([['value1', 1234], ['value2', 12345], ['value3', 12346],
                         ['value4', 12347], ['value5', 12348], ['value6', 12349]],
                        columns=['pattern', 'tstamps'])

dataset2 = pd.DataFrame([['value10', 1234], ['value2', 12345], ['value30', 12346],
                         ['value4', 12347], ['value50', 12347], ['value6', 12347], ],
                        columns=['sax', 'timestamp'])

timestamps = dataset1[dataset1['pattern'].isin(dataset2['sax'])]['tstamps']
print(timestamps)

# Result (type: pandas.Series), do timestamps.tolist() to get python list
1    12345
3    12347
5    12349
Name: tstamps, dtype: int64
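
If you want to print the matching timestamps one by one rather than as a Series, the same isin mask can be wrapped in a small helper. This is only a sketch: the function name matching_timestamps and the tiny sample frames are made up for illustration, and it assumes the column names 'pattern', 'tstamps' and 'sax' from the snippets above.

```python
import pandas as pd


def matching_timestamps(dataset1: pd.DataFrame, dataset2: pd.DataFrame) -> list:
    """Return dataset1's 'tstamps' for rows whose 'pattern' occurs in dataset2's 'sax'.

    Hypothetical helper for illustration; column names are assumed from the question.
    """
    # Boolean mask: True where the pattern appears anywhere in dataset2['sax'].
    mask = dataset1['pattern'].isin(dataset2['sax'])
    # Keep only the matching rows' timestamps and convert them to a plain list.
    return dataset1.loc[mask, 'tstamps'].tolist()


# Small hand-made frames (illustrative data, not the real datasets).
dataset1 = pd.DataFrame({'pattern': ['abcd', 'efgh', 'ijkl'],
                         'tstamps': [10, 20, 30]})
dataset2 = pd.DataFrame({'sax': ['zzzz', 'efgh', 'ijkl', 'efgh'],
                         'timestamp': [1, 2, 3, 4]})

# Print one timestamp per matching row of dataset1.
for ts in matching_timestamps(dataset1, dataset2):
    print(ts)
```

Note that isin only tests membership, so each row of dataset1 is reported at most once, even when its pattern occurs several times in dataset2 (as 'efgh' does here).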
