filter one dataframe by another
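import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'], 'l': ['b', 'a']})

# Match on every column of df2 (these must also exist in df1)
keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
# Keep the rows of df1 whose key combination does not appear in df2
df1[~i1.isin(i2)]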
Here is what the above code does:
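1. Set the index of df1 and df2 to the columns listed in keys (all columns of df2, here 'c' and 'l').
2. Take the resulting (Multi)Index of each frame as i1 and i2, so every row is identified by its key tuple, e.g. ('A', 'b').
3. Build a boolean mask with i1.isin(i2), which is True for rows of df1 whose key tuple also occurs in df2, then invert it with ~ so that only the rows of df1 absent from df2 are kept. For the data above, that leaves rows 0, 2 and 4 and keeps df1's original index.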
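The same "anti-join" can also be written with a left merge and indicator=True. The sketch below wraps both approaches in hypothetical helper functions (anti_join_isin and anti_join_merge are illustrative names, not pandas API) and checks that they agree. It assumes, as the original code does, that every column of the second frame is a key column present in the first.

```python
import pandas as pd


def anti_join_isin(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` whose key tuple is absent from `right`.

    Assumes all columns of `right` are the join keys (as in the original code).
    """
    keys = list(right.columns)
    left_idx = left.set_index(keys).index
    right_idx = right.set_index(keys).index
    return left[~left_idx.isin(right_idx)]


def anti_join_merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the same filter via merge(..., indicator=True)."""
    keys = list(right.columns)
    # Dropping duplicate keys guarantees one output row per row of `left`
    merged = left.merge(right.drop_duplicates(), on=keys, how="left", indicator=True)
    # A left merge preserves left's row order, so the original index can be restored
    merged.index = left.index
    # '_merge' is 'left_only' for rows that found no match in `right`
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'], 'l': ['b', 'a']})

result = anti_join_isin(df1, df2)
print(result.index.tolist())  # [0, 2, 4]
pd.testing.assert_frame_equal(result, anti_join_merge(df1, df2))
```

The isin version avoids building a joined frame, while the merge version makes the matching explicit in the _merge column and is easy to extend when the key columns are only a subset of the second frame's columns.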