Handling Outliers, is it difficult?!
Before handling outliers, let us first see what outliers are and how they impact performance.
What are Outliers?
An outlier is an abnormal observation that lies at an unusually large distance from the other values in a random sample from a population.
How do Outliers impact Performance?
Outliers affect the variance and standard deviation of a data distribution. When a distribution contains extreme outliers, it is skewed in the direction of those outliers, which makes the data difficult to analyze and, in turn, makes predictions less reliable.
How to identify the Outliers?
As stated, outliers are the points that stand apart from the normal data points, so one way to detect them is to visualize the data with techniques such as a scatter plot or a box plot.
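A box plot makes this concrete. The following is a minimal sketch, assuming matplotlib is installed; the sample values are made up for illustration. Note that matplotlib draws points beyond 1.5 times the interquartile range from the box as individual "fliers", which is the box plot's own default rule and not one of the thresholds discussed in this post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample: 20 typical readings plus one extreme value.
values = [48, 49, 50, 51, 52] * 4 + [120]

fig, ax = plt.subplots()
result = ax.boxplot(values)  # returns a dict of the plot's line artists
ax.set_title("Box plot of sample values")

# Points drawn separately beyond the whiskers are the suspected outliers.
fliers = [float(y) for y in result["fliers"][0].get_ydata()]
print(fliers)
```

Calling fig.savefig("boxplot.png") afterwards would write the plot to disk; in a notebook, the backend line can be dropped so the figure is shown inline.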
If the data can't be visualized, outliers can still be detected with some mathematical operations:
- Percentiles.
- Three standard deviations from the mean.
- Z-score.
1. Detecting Outliers using Percentile:
Set the criteria for what counts as an outlier by choosing upper and lower percentiles, and compute the corresponding upper and lower bounds. Then, with the help of Pandas, select the rows of the DataFrame whose values fall between the two bounds; these rows are the inliers.
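The steps above can be sketched as follows. This is only an illustration: the column name "height", the sample values, and the choice of the 5th and 95th percentiles are all assumptions, and the right cut-offs depend on your data.

```python
import pandas as pd

# Hypothetical sample data; the column name "height" is an assumption.
df = pd.DataFrame({"height": [150, 152, 155, 158, 160, 162, 165, 168, 170, 250]})

# Assumed criteria: the 5th and 95th percentiles.
lower_bound = df["height"].quantile(0.05)
upper_bound = df["height"].quantile(0.95)

# Rows between the bounds are inliers; everything else is flagged as an outlier.
inliers = df[(df["height"] >= lower_bound) & (df["height"] <= upper_bound)]
outliers = df[(df["height"] < lower_bound) | (df["height"] > upper_bound)]

print(outliers)
```

Keep in mind that a percentile rule flags the most extreme values at both ends whether or not they are truly abnormal, so the flagged rows are worth inspecting before they are dropped.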
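2. By Using Standard Deviation:
The standard deviation can also be used to find outliers. It is most helpful when the data follows a normal distribution (bell-shaped curve), which can be easily observed. In a normal distribution, about 99.7% of the values lie within 3 standard deviations of the mean, so values beyond that range are rare enough to be treated as outliers.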
upper limit = dataframe.mean() + 3 * dataframe.std()
lower limit = dataframe.mean() - 3 * dataframe.std()
Values greater than the upper limit or less than the lower limit are outliers.
Values that lie between the lower limit and the upper limit are inliers.
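Here is a minimal sketch of the same limits in Pandas. The sample values are made up for illustration, and the variable names are my own.

```python
import pandas as pd

# Hypothetical sample: 20 typical readings plus one extreme value.
values = pd.Series([48, 49, 50, 51, 52] * 4 + [120])

mean = values.mean()
std = values.std()  # pandas uses the sample standard deviation (ddof=1) by default

upper_limit = mean + 3 * std
lower_limit = mean - 3 * std

# Values outside the limits are outliers; values inside are inliers.
outliers = values[(values > upper_limit) | (values < lower_limit)]
inliers = values[(values >= lower_limit) & (values <= upper_limit)]

print(outliers.tolist())
```

The sample deliberately has 21 values: in very small samples no single value can lie 3 standard deviations from the mean, because the extreme value itself inflates the standard deviation.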
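3. By using Z-score:
The Z-score of a data point is its distance from the mean, measured in standard deviations. Get the data points that have a Z-score greater than 3 or less than -3; these are the outliers.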
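As a rough sketch, the Z-score can be computed directly in Pandas as (value - mean) / standard deviation. The column names "score" and "z_score" and the sample values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data; the column name "score" is an assumption.
df = pd.DataFrame({"score": [48, 49, 50, 51, 52] * 4 + [120]})

# Z-score: how many standard deviations each value lies from the mean.
df["z_score"] = (df["score"] - df["score"].mean()) / df["score"].std()

# Flag the points whose Z-score is above 3 or below -3.
outliers = df[(df["z_score"] > 3) | (df["z_score"] < -3)]
inliers = df[df["z_score"].abs() <= 3]

print(outliers)
```

This flags the same points as the 3 standard deviation limits in the previous section, since a Z-score of 3 corresponds to a value exactly 3 standard deviations above the mean.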
Here’s my LinkedIn ID; you can ping me with your queries:
www.linkedin.com/in/korrapati-sindhu-49445b18b
Comment and let me know your feedback; suggestions are most welcome!