impute data by using groupby and transform

import pandas as pd
from datetime import datetime

def generate_data():
    ...

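# Method 1: select 'value' first, so the lambda runs on one column per group.
# Note that the timer also covers the call to generate_data().
t = datetime.now()
df = generate_data()
df['value'] = df.groupby(['category', 'name'])['value']\
    .transform(lambda x: x.fillna(x.mean()))
print(datetime.now()-t)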

# 0:00:00.016012

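# Method 2: transform every non-grouping column, then keep only 'value'.
# Note that the timer also covers the call to generate_data().
t = datetime.now()
df = generate_data()
df["value"] = df.groupby(['category', 'name'])\
    .transform(lambda x: x.fillna(x.mean()))['value']
print(datetime.now()-t)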

# 0:00:00.030022

Here is what the above code is doing:
1. Grouping by category and name
2. Filling the NaN values in each group with the mean of that group
3. Selecting the value column (the first snippet does this before the transform, the second after it)
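
As a small worked illustration of these steps, here is a sketch on a hypothetical six-row frame. The data is made up for illustration; it is not the output of generate_data(), whose body is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (an assumption, not the output of generate_data())
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b"],
    "name":     ["x", "x", "x", "y", "y", "y"],
    "value":    [1.0, np.nan, 3.0, 10.0, 20.0, np.nan],
})

# Group, select the column, then fill each group's NaNs with that group's mean
filled = df.groupby(["category", "name"])["value"].transform(
    lambda x: x.fillna(x.mean())
)
print(filled.tolist())  # [1.0, 2.0, 3.0, 10.0, 20.0, 15.0]
```

The missing value in group (a, x) becomes 2.0, the mean of 1.0 and 3.0, and the one in group (b, y) becomes 15.0, the mean of 10.0 and 20.0.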

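The first method is faster because it selects the value column before calling transform, so the lambda runs on a single column per group. The second method runs the lambda over every non-grouping column and only then keeps value, throwing away the rest of the work. In the runs above that is roughly 0.016 s against 0.030 s.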
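
To check that the two orderings give the same result and to compare them on your own machine, something like the following sketch may help. The helper make_frame is a hypothetical stand-in for generate_data(), and its row count, number of extra columns, and share of missing values are assumptions. It uses timeit on a frame built beforehand, so data generation is not included in the measurement.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows=10_000, n_extra=5, seed=0):
    """Hypothetical stand-in for generate_data(); the real one is not shown."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "category": rng.choice(["a", "b", "c"], size=n_rows),
        "name": rng.choice(["x", "y", "z"], size=n_rows),
        "value": rng.normal(size=n_rows),
    })
    # Blank out roughly 10% of the values so there is something to impute
    df.loc[rng.random(n_rows) < 0.1, "value"] = np.nan
    # Extra numeric columns that the second method also has to process
    for i in range(n_extra):
        df[f"extra_{i}"] = rng.normal(size=n_rows)
    return df


def select_then_transform(df):
    # Method 1: the lambda only ever sees the 'value' column
    return df.groupby(["category", "name"])["value"].transform(
        lambda x: x.fillna(x.mean())
    )


def transform_then_select(df):
    # Method 2: the lambda sees every non-grouping column; 'value' is picked afterwards
    return df.groupby(["category", "name"]).transform(
        lambda x: x.fillna(x.mean())
    )["value"]


df = make_frame()
a = select_then_transform(df)
b = transform_then_select(df)
pd.testing.assert_series_equal(a, b)  # same imputed column either way

t1 = timeit.timeit(lambda: select_then_transform(df), number=5)
t2 = timeit.timeit(lambda: transform_then_select(df), number=5)
print(f"select then transform: {t1:.3f}s, transform then select: {t2:.3f}s")
```

The absolute numbers depend on the pandas version, the number of groups, and how many other columns the frame has, so treat the output as indicative only.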
