import pandas as pd
df = pd.DataFrame({'eventType': ['A', 'A', 'B', 'B', 'B', 'C', 'C'], 'eventId': ['1', '1', '2', '2', '3', '4', '4'], 'date': ['2021-01-01', '2021-01-02', '2021-01-05', '2021-01-10', '2021-01-15', '2021-01-20', '2021-01-25']})
df['date'] = pd.to_datetime(df['date'])  # convert the date strings to datetime values
df['diff'] = df.groupby(['eventType', 'eventId'])['date'].diff()  # date difference from the previous row in the same group
print(df)
  eventType eventId       date   diff
0         A       1 2021-01-01    NaT
1         A       1 2021-01-02 1 days
2         B       2 2021-01-05    NaT
3         B       2 2021-01-10 5 days
4         B       3 2021-01-15    NaT
5         C       4 2021-01-20    NaT
6         C       4 2021-01-25 5 days
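This code groups the DataFrame by eventType and eventId and, within each group, computes the difference between each row's date and the date of the row before it. The first row of every group has no predecessor, so its diff is NaT. This is done with pandas' groupby and diff methods.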
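Because diff works on row order within each group, the result is only meaningful if the rows are in chronological order. Here is a minimal sketch of one way to guard against unsorted input and to get the difference as a plain number of days. The `events` frame and the `diff_days` column name are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
events = pd.DataFrame({
    'eventType': ['A', 'B', 'A', 'B'],
    'eventId': ['1', '2', '1', '2'],
    'date': ['2021-01-02', '2021-01-10', '2021-01-01', '2021-01-05'],
})
events['date'] = pd.to_datetime(events['date'])

# Sort so that diff() compares each row with the chronologically
# previous row of the same (eventType, eventId) group.
events = events.sort_values(['eventType', 'eventId', 'date']).reset_index(drop=True)

# diff() yields Timedelta values; .dt.days turns them into numbers,
# with NaN for the first row of each group.
events['diff_days'] = events.groupby(['eventType', 'eventId'])['date'].diff().dt.days
print(events)
```

If the first row of each group should show 0 instead of a missing value, appending `.fillna(0)` to the last expression is one option.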