About ODDS

Outliers or anomalies are instances that do not conform to the norm of a dataset. Outlier detection is an important data mining problem that has been researched across diverse research areas and application domains, such as intrusion detection, fraud detection, unusual event detection, and disease condition detection.

The exact notion of an outlier differs across application domains, so applying a technique developed for one domain to another is not straightforward. Moreover, labeled data for training and validating outlier detection methods are scarce, and noise in the data often resembles outliers, which makes the two difficult to distinguish. Because of these challenges, outlier detection is not an easy problem to solve. Furthermore, research on outlier detection has been held back by the lack of good benchmark datasets with ground truth. Existing benchmarks are typically either proprietary or highly artificial, and existing real-world outlier/anomaly detection datasets lack ground truth.

In ODDS, we openly provide access to a large collection of outlier detection datasets, with ground truth where available. Our focus is to gather datasets from different domains and present them on a single platform for the research community. To that end, the ODDS library arranges the datasets by type into separate tables.

The ODDS library has been under active development since summer 2016 and continues to grow, both as a result of our research in outlier/anomaly mining and to support the wider research community. Researchers are welcome to share their datasets for inclusion in the ODDS library by emailing [email protected].

Disclaimer: The ODDS library contains datasets collected by DATALab as well as by many other research groups. Readers are encouraged to email the contacts of the corresponding research group with questions about specific datasets.

Contact Info

Please use the contact above or the website's comment section to send us comments, questions, bug reports, broken links, and inquiries about hosting your datasets on our website.

Citation Policy

If you publish material based on datasets obtained from this library, please acknowledge the library by citing it. This will help others obtain the same datasets and replicate your experiments. We suggest the following format for referring to this library:

Shebuti Rayana (2016). ODDS Library [https://shebuti.com/outlier-detection-datasets-odds/]. Stony Brook, NY: Stony Brook University, Department of Computer Science.

Here is a BibTeX citation:

@misc{Rayana:2016,
author = "Shebuti Rayana",
year = "2016",
title = "{ODDS} Library",
url = "https://shebuti.com/outlier-detection-datasets-odds/",
institution = "Stony Brook University, Department of Computer Science" }

A few datasets have additional citation requests. These requests can be found at the bottom of each dataset's web page.