Intentional Control of Type I Error Over Unconscious Data Distortion: A Neyman–Pearson Approach to Text Classification

Lucy Xia, Richard Zhao, Yanhui Wu & Xin Tong
This article addresses the challenges in classifying textual data obtained from open online platforms, which are vulnerable to distortion. Most existing classification methods minimize the overall classification error and may yield an undesirably large Type I error (relevant textual messages are classified as irrelevant), particularly when available data exhibit an asymmetry between relevant and irrelevant information. Data distortion exacerbates this situation and often leads to fallacious prediction. To deal with inestimable data distortion, we propose...
This data repository is not currently reporting usage information. For information on how your repository can submit usage information, please see our documentation.