Outlier Detection and Explanation for Domain Experts

Publication: Book/anthology/thesis/report › Ph.D. thesis › Research

Standard

Outlier Detection and Explanation for Domain Experts. / Micenková, Barbora.

Department of Computer Science, University of Aarhus, 2015. 100 p.

Harvard

Micenková, B 2015, Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus.

APA

Micenková, B. (2015). Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus.

CBE

Micenková B 2015. Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus. 100 p.

MLA

Micenková, Barbora. Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus, 2015.

Vancouver

Micenková B. Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus, 2015. 100 p.

Author

Micenková, Barbora. / Outlier Detection and Explanation for Domain Experts. Department of Computer Science, University of Aarhus, 2015. 100 p.

Bibtex

@phdthesis{b0de662fedb946fc9cf99076461d95cf,
title = "Outlier Detection and Explanation for Domain Experts",
abstract = "In many data exploratory tasks, extraordinary and rarely occurring patterns called outliers are more interesting than the prevalent ones. For example, they could represent frauds in insurance, intrusions in network and system monitoring, or motion in video surveillance. Decades of research have produced various outlier detection algorithms. It is commonly known that these algorithms are difficult to apply and interpret in practice for a variety of reasons. In this thesis we propose novel algorithms that provide robust performance, support for validation and interpretability for outlier detection in practice and we empirically evaluate them on synthetic and real world data sets. First, we tackle the problem that most algorithms leave the end user without any explanation of how or why the identified outliers deviate. Such knowledge is important for domain experts in order to be able to validate the output of outlier detection algorithms and perhaps then take necessary actions. To this end we develop an algorithm that outputs an outlierness score and an accompanying explanation in the form of relevancy feature weights to each data point. We further present a general explanation technique that given a query point on input, outputs its outlier explanation in the form of the attribute subset where the point is the most separable from the other data. In the second part we address the problem that unsupervised outlier detection algorithms require a lot of user input for model selection which leads to poor overall performance. Furthermore, in many applications some labeled examples of outliers are available but not sufficient enough in number as training data for standard supervised learning methods. As such, this valuable information is typically ignored. We introduce a new paradigm for outlier detection where supervised and unsupervised information are combined to improve the performance while reducing the sensitivity to parameters of individual outlier detection algorithms. We do this by learning a new representation using the outliers from outputs of unsupervised outlier detectors as input to a supervised classifier. The resulting method is robust to parameters and as such it can be easily applied to data by non-experts in data mining. We also consider the case where computational resources at test time are limited and introduce a feature selection technique that respects a computational budget while retaining good predictive performance.",
author = "Barbora Micenkov{\'a}",
year = "2015",
language = "English",
publisher = "Department of Computer Science, University of Aarhus",

}

RIS

TY - BOOK

T1 - Outlier Detection and Explanation for Domain Experts

AU - Micenková, Barbora

PY - 2015

Y1 - 2015

N2 - In many data exploratory tasks, extraordinary and rarely occurring patterns called outliers are more interesting than the prevalent ones. For example, they could represent frauds in insurance, intrusions in network and system monitoring, or motion in video surveillance. Decades of research have produced various outlier detection algorithms. It is commonly known that these algorithms are difficult to apply and interpret in practice for a variety of reasons. In this thesis we propose novel algorithms that provide robust performance, support for validation and interpretability for outlier detection in practice and we empirically evaluate them on synthetic and real world data sets. First, we tackle the problem that most algorithms leave the end user without any explanation of how or why the identified outliers deviate. Such knowledge is important for domain experts in order to be able to validate the output of outlier detection algorithms and perhaps then take necessary actions. To this end we develop an algorithm that outputs an outlierness score and an accompanying explanation in the form of relevancy feature weights to each data point. We further present a general explanation technique that given a query point on input, outputs its outlier explanation in the form of the attribute subset where the point is the most separable from the other data. In the second part we address the problem that unsupervised outlier detection algorithms require a lot of user input for model selection which leads to poor overall performance. Furthermore, in many applications some labeled examples of outliers are available but not sufficient enough in number as training data for standard supervised learning methods. As such, this valuable information is typically ignored. We introduce a new paradigm for outlier detection where supervised and unsupervised information are combined to improve the performance while reducing the sensitivity to parameters of individual outlier detection algorithms. We do this by learning a new representation using the outliers from outputs of unsupervised outlier detectors as input to a supervised classifier. The resulting method is robust to parameters and as such it can be easily applied to data by non-experts in data mining. We also consider the case where computational resources at test time are limited and introduce a feature selection technique that respects a computational budget while retaining good predictive performance.

AB - In many data exploratory tasks, extraordinary and rarely occurring patterns called outliers are more interesting than the prevalent ones. For example, they could represent frauds in insurance, intrusions in network and system monitoring, or motion in video surveillance. Decades of research have produced various outlier detection algorithms. It is commonly known that these algorithms are difficult to apply and interpret in practice for a variety of reasons. In this thesis we propose novel algorithms that provide robust performance, support for validation and interpretability for outlier detection in practice and we empirically evaluate them on synthetic and real world data sets. First, we tackle the problem that most algorithms leave the end user without any explanation of how or why the identified outliers deviate. Such knowledge is important for domain experts in order to be able to validate the output of outlier detection algorithms and perhaps then take necessary actions. To this end we develop an algorithm that outputs an outlierness score and an accompanying explanation in the form of relevancy feature weights to each data point. We further present a general explanation technique that given a query point on input, outputs its outlier explanation in the form of the attribute subset where the point is the most separable from the other data. In the second part we address the problem that unsupervised outlier detection algorithms require a lot of user input for model selection which leads to poor overall performance. Furthermore, in many applications some labeled examples of outliers are available but not sufficient enough in number as training data for standard supervised learning methods. As such, this valuable information is typically ignored. We introduce a new paradigm for outlier detection where supervised and unsupervised information are combined to improve the performance while reducing the sensitivity to parameters of individual outlier detection algorithms. We do this by learning a new representation using the outliers from outputs of unsupervised outlier detectors as input to a supervised classifier. The resulting method is robust to parameters and as such it can be easily applied to data by non-experts in data mining. We also consider the case where computational resources at test time are limited and introduce a feature selection technique that respects a computational budget while retaining good predictive performance.

M3 - Ph.D. thesis

BT - Outlier Detection and Explanation for Domain Experts

PB - Department of Computer Science, University of Aarhus

ER -