- Load the two DataFrames and inspect their column names and first few rows:
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
print('df1 columns:', df1.columns)
print('df2 columns:', df2.columns)
print('df1 head:\n', df1.head())
print('df2 head:\n', df2.head())
- Decide which columns to compare on and list their names:
compare_cols = ['From', 'To', 'Subject']
- Use merge and np.where to compare the email data in the two DataFrames:
import numpy as np
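# indicator=True adds a '_merge' column with 'both', 'left_only', or 'right_only'
merged = df1.merge(df2, on=compare_cols, how='outer', suffixes=('_df1', '_df2'), indicator=True)
# Counting NaNs per row misfires when the source data already has missing values, so test '_merge' instead
merged['comparison'] = np.where(merged['_merge'] == 'both', 'match', 'mismatch')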
- Inspect the comparison results:
print('Comparison results:\n', merged[['From', 'To', 'Subject', 'comparison']])
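
To try the steps above without the CSV files, here is a minimal self-contained sketch. The sample addresses and subjects are made up for illustration, and it assumes a row counts as a match only when the same From/To/Subject combination appears in both DataFrames. It relies on the `indicator=True` option of `merge`, which records where each row came from.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for file1.csv and file2.csv
df1 = pd.DataFrame({
    'From': ['a@example.com', 'b@example.com', 'c@example.com'],
    'To': ['x@example.com', 'y@example.com', 'z@example.com'],
    'Subject': ['Hello', 'Report', 'Invoice'],
})
df2 = pd.DataFrame({
    'From': ['a@example.com', 'b@example.com', 'd@example.com'],
    'To': ['x@example.com', 'y@example.com', 'w@example.com'],
    'Subject': ['Hello', 'Report (v2)', 'Meeting'],
})

compare_cols = ['From', 'To', 'Subject']

# Outer merge keeps every key combination from either side;
# '_merge' says whether it was found in both, only df1, or only df2
merged = df1.merge(df2, on=compare_cols, how='outer', indicator=True)
merged['comparison'] = np.where(merged['_merge'] == 'both', 'match', 'mismatch')

# Split the mismatches by the side they came from
only_in_df1 = merged[merged['_merge'] == 'left_only']
only_in_df2 = merged[merged['_merge'] == 'right_only']

print(merged[compare_cols + ['comparison']])
print('Only in df1:', len(only_in_df1))
print('Only in df2:', len(only_in_df2))
```

Splitting on `_merge` also shows which file each unmatched email came from, which a single match/mismatch label does not. Note that if a key combination is duplicated within one file, the merge produces one output row per pairing, so deduplicating first may be worthwhile.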