#------------------------------------------------------------------------------
# accept a dataframe, remove outliers, return cleaned data in a new dataframe
# see http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
#------------------------------------------------------------------------------
def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3 - q1  # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    df_out = df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)]
    return df_out

**Here is what the above code is doing:**

1. Calculate the first quartile (Q1) and third quartile (Q3) of the variable.

2. Calculate the interquartile range (IQR) by subtracting Q1 from Q3 (IQR = Q3 - Q1).

3. Calculate the lower fence by subtracting 1.5 times IQR from the first quartile. Any values less than this are considered outliers.

4. Calculate the upper fence by adding 1.5 times IQR to the third quartile. Any values greater than this are considered outliers.

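5. Finally, keep only the rows whose values lie strictly between the lower and upper fences, and return them as a new dataframe. All other rows are treated as outliers and dropped. Because the comparison is strict, a value exactly equal to a fence is dropped as well.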
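The steps above can be tried on a small example. This is a minimal sketch: the function is repeated so the block runs on its own, and the sample dataframe and its column name `value` are made up for illustration. The quartiles in the comments assume pandas' default linear interpolation in `quantile`.

```python
import pandas as pd


def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3 - q1  # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    # Keep only rows strictly inside the fences.
    return df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)]


# Hypothetical sample data: nine ordinary readings and one extreme value.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 10, 12, 13, 100]})

# For this data: Q1 = 11, Q3 = 12.75, IQR = 1.75,
# so the fences are 8.375 and 15.375 and only the value 100 falls outside.
cleaned = remove_outlier(df, "value")

print(len(df), len(cleaned))  # prints "10 9"
```

The input dataframe is left unchanged, and the returned rows keep their original index labels, so call `reset_index(drop=True)` on the result if a contiguous index is needed.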