Outlier Detection and Explanation for Domain Experts

Publication: Book/anthology/thesis/report, PhD thesis, Research

  • Barbora Micenková, Denmark

In many exploratory data analysis tasks, extraordinary and rarely occurring
patterns, called outliers, are more interesting than the prevalent ones. For
example, they could represent fraud in insurance, intrusions in network and
system monitoring, or motion in video surveillance. Decades of research have
produced various outlier detection algorithms, yet these algorithms are widely
acknowledged to be difficult to apply and interpret in practice, for a variety
of reasons.

In this thesis we propose novel algorithms that provide robust performance,
support for validation, and interpretability for outlier detection in practice,
and we empirically evaluate them on synthetic and real-world data sets.

First, we tackle the problem that most algorithms leave the end user without
any explanation of how or why the identified outliers deviate. Domain experts
need this knowledge to validate the output of outlier detection algorithms
and then, where appropriate, take the necessary actions. To this end we
develop an algorithm that outputs, for each data point, an outlierness score
and an accompanying explanation in the form of feature relevance weights. We
further present a general explanation technique that, given a query point as
input, outputs its outlier explanation in the form of the attribute subset in
which the point is most separable from the rest of the data.
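
To illustrate the idea of explaining an outlier by an attribute subset, the following minimal Python sketch searches small attribute subsets for the one in which a query point looks most isolated. It is not the algorithm from the thesis. The separability measure (the query's k-nearest-neighbour distance divided by the mean k-nearest-neighbour distance among the other points), the exhaustive search, the names `knn_distance`, `separability`, `explain`, `max_size` and `k`, and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def knn_distance(point, others, dims, k):
    """Distance from `point` to its k-th nearest neighbour, using only `dims`."""
    dists = sorted(
        sqrt(sum((point[d] - o[d]) ** 2 for d in dims)) for o in others
    )
    return dists[k - 1]


def separability(query, data, dims, k=2):
    """Assumed measure: the query's kNN distance relative to the mean kNN
    distance of the data points among themselves, in the subspace `dims`."""
    typical = [
        knn_distance(p, data[:i] + data[i + 1:], dims, k)
        for i, p in enumerate(data)
    ]
    mean_typical = sum(typical) / len(typical)
    query_dist = knn_distance(query, data, dims, k)
    if mean_typical == 0:
        return 0.0 if query_dist == 0 else float("inf")
    return query_dist / mean_typical


def explain(query, data, max_size=2, k=2):
    """Exhaustively search subsets of up to `max_size` attributes and return
    the (subset, score) pair with the highest separability."""
    best = None
    for size in range(1, max_size + 1):
        for dims in combinations(range(len(query)), size):
            score = separability(query, data, dims, k)
            if best is None or score > best[1]:
                best = (dims, score)
    return best


# Toy data: attributes 0 and 1 are perfectly correlated, attribute 2 is unrelated.
data = [(i / 10, i / 10, (i * 7 % 11) / 10) for i in range(11)]
# The query is unremarkable in every single attribute but breaks the 0/1 correlation.
query = (0.2, 0.8, 0.5)

dims, score = explain(query, data)
print("most separable attribute subset:", dims)
```

On this toy data the query point lies inside the normal range of each attribute on its own, so only the pair of correlated attributes exposes it. An exhaustive search is only feasible for a handful of attributes; any practical method needs a cheaper search strategy.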

In the second part we address the problem that unsupervised outlier detection
algorithms require a lot of user input for model selection, which leads to
poor overall performance. Furthermore, in many applications some labeled
examples of outliers are available, but too few to serve as training data for
standard supervised learning methods, so this valuable information is
typically ignored. We introduce a new paradigm for outlier detection in which
supervised and unsupervised information are combined to improve performance
while reducing the sensitivity to the parameters of individual outlier
detection algorithms. We do this by learning a new representation, using the
outputs of unsupervised outlier detectors as input to a supervised
classifier. The resulting method is robust to parameter settings and can
therefore be applied easily by non-experts in data mining. We also consider
the case where computational resources at test time are limited and introduce
a feature selection technique that respects a computational budget while
retaining good predictive performance.
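
The combination of unsupervised and supervised information described above can be sketched roughly as follows: several simple unsupervised detectors each score every point, the scores form a new feature representation, and a small classifier is trained on the few labeled points. This is a hedged illustration only. The choice of detectors (k-nearest-neighbour distance for a few values of `k` and distance to the centroid), the hand-written logistic regression, the function names, and the toy data are assumptions for this example, not the detectors or classifier used in the thesis.

```python
from math import exp, sqrt


def knn_score(data, k):
    """Unsupervised detector: distance of each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(
            sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            for j, q in enumerate(data) if j != i
        )
        scores.append(dists[k - 1])
    return scores


def centroid_score(data):
    """Unsupervised detector: distance of each point to the data centroid."""
    dim = len(data[0])
    centre = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    return [sqrt(sum((p[d] - centre[d]) ** 2 for d in range(dim))) for p in data]


def standardise(column):
    """Rescale one detector's scores to zero mean and unit variance."""
    mean = sum(column) / len(column)
    std = sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column]


def score_features(data, ks=(1, 3, 5)):
    """New representation: one feature per unsupervised detector output."""
    columns = [standardise(knn_score(data, k)) for k in ks]
    columns.append(standardise(centroid_score(data)))
    return [list(row) for row in zip(*columns)]


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Estimated probability that feature vector `x` belongs to an outlier."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(features, labels, epochs=500, lr=0.1):
    """Plain stochastic gradient descent for logistic regression."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = predict((weights, bias), x) - y
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Toy data: a 5 x 5 grid of inliers (indices 0-24) plus three outliers (25-27).
data = [(i * 0.25, j * 0.25) for i in range(5) for j in range(5)]
data += [(3.0, 3.0), (-2.0, 2.5), (3.5, -1.5)]

features = score_features(data)  # detectors see all points, no labels needed

# Only some points are labeled: every other inlier and two of the outliers.
labeled = list(range(0, 25, 2)) + [25, 26]
labels = [0] * 13 + [1, 1]
model = train_logistic([features[i] for i in labeled], labels)

print("held-out outlier:", round(predict(model, features[27]), 3))
print("held-out inlier: ", round(predict(model, features[1]), 3))
```

Because the classifier weights the detector outputs itself, no single detector parameter (such as `k`) has to be tuned by hand in this sketch, which mirrors the robustness argument made in the abstract.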

Original language: English
Publisher: Department of Computer Science, University of Aarhus
Number of pages: 100
Status: Published - 2015

Note regarding the thesis

Main Supervisor: Ira Assent

ID: 85339943