This article briefly explains what outliers are and how outlier analysis is used.
What is Outlier Analysis?
An outlier is an element of a data set that distinctly stands out from the rest of the data. In other words, outliers are data points that lie outside the overall pattern of the distribution, as shown in the figure below.
The easiest way to detect outliers is to create a graph. Plots such as box plots, scatterplots and histograms can help to reveal them. Alternatively, we can use the mean and standard deviation to list the outliers, flagging values that lie more than a chosen number of standard deviations from the mean. Quartiles and the interquartile range (the distance between the first and third quartiles) can also be used, flagging values that fall far below the first quartile or far above the third.
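The two numeric approaches mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function names, the sample values, the 1.5 multiplier on the interquartile range and the standard deviation threshold are all assumptions chosen for the example, and the right cut-offs depend on the data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a common convention, used here as an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical sample: nine similar readings and one far to the left.
data = [52, 55, 53, 54, 56, 51, 55, 54, 53, 12]

print(iqr_outliers(data))                    # [12]
print(zscore_outliers(data, threshold=2.0))  # [12]
```

Note that the standard deviation method is itself affected by the outlier, which inflates both the mean's distance from the bulk of the data and the standard deviation. In a sample this small, no value can sit three standard deviations from the mean, which is why the example passes a lower threshold of 2.0.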
Here is another illustration of an outlier. If you look at the Histogram below, you will see that one value lies far to the left of all other data. This data point is an outlier.
How Can Outlier Detection Improve Business Analysis?
Outlier data points can represent either a) items that are so far outside the norm that they need not be considered, or b) a unique category or variable that is worth exploring, either to capitalize on a niche or to find an area where an organization can offer a unique focus.
When considering the use of outlier analysis, a business should first think about why it wants to find the outliers and what it will do with that data. That focus will help the business select the right method of analysis, graphing or plotting to reveal the results it needs to see and understand.
It is also important to recognize that the right response to an outlier depends on the dataset. In some cases the results will indicate that outliers should be discounted, while in others the outliers are exactly what the organization should focus on. For example, if an outlier indicates a risk or a mistake, it should be identified and the risk or mistake addressed. If an outlier indicates an exceptional result, such as a person who recovered from a particular disease when most other patients did not survive, the organization will want to perform further analysis to identify the unique aspects that may be responsible for the patient's recovery.
When a business uses outlier analysis, it is important to test the results and to analyze the overall dataset and environment. The presence of outliers may indicate that the dataset is more complex than anticipated and requires a different form of analysis.