Annthyroid dataset

Home >> Annthyroid dataset

Dataset information

The original thyroid disease (ann-thyroid) dataset from UCI machine learning repository is a classification dataset, which is suited for training ANNs. It has 3772 training instances and 3428 testing instances. It has 15 categorical and 6 real attributes. The problem is to determine whether a patient referred to the clinic is hypothyroid. Therefore three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. For outlier detection, both training and testing instances are used, with only 6 real attributes. The hyperfunction and subnormal classes are treated as outlier class and the other one as inliers class.

Source (citation)

Abe, Naoki, Bianca Zadrozny, and John Langford. “Outlier detection by active learning.” Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.

Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. “Isolation forest.” 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.

K. M. Ting, J. T. S. Chuan, and F. T. Liu. “Mass: A New Ranking Measure for Anomaly Detection.“, IEEE Transactions on Knowledge and Data Engineering, 2009.

Downloads

File: annthyroid.mat

Description: X = Multi-dimensional point data, y = labels (1 = outliers, 0 = inliers)