Risk Analysis

Risk Analysis Framework


Given a classification task and a trained deep model, risk analysis estimates the risk that the model mislabels a target instance. We have made the following contributions:

  1. We named the task risk analysis. The name is borrowed from financial investment theory, which needs to measure the uncertainty of investment returns. Instead of using a single value to indicate label status, we represent it by a distribution and use uncertainty metrics borrowed from financial investment theory to measure its fluctuation risk;
  2. We introduced the concept of risk features, and proposed a technique based on one-sided decision trees to automatically generate risk features;
  3. We proposed a learnable risk model, and presented its training techniques;
  4. We have applied risk analysis to enable quality control for human-machine collaboration on classification tasks.
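
The idea in item 1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the label status of an instance (the probability that its true label is positive) follows a normal distribution, and that the borrowed uncertainty metric is Value-at-Risk (VaR); the function name `value_at_risk` and its parameters are hypothetical and not taken from the papers listed on this page.

```python
from statistics import NormalDist


def value_at_risk(mean: float, std: float, machine_label: int,
                  confidence: float = 0.9) -> float:
    """Mislabeling risk at the given confidence level (illustrative only).

    `mean` and `std` describe an assumed normal distribution over the
    probability that the instance's true label is positive.
    `machine_label` is the label assigned by the model (1 or 0).
    """
    if std == 0:
        # Degenerate distribution: the risk is a single point value.
        p = mean
        risk = 1 - p if machine_label == 1 else p
        return min(1.0, max(0.0, risk))

    dist = NormalDist(mu=mean, sigma=std)
    if machine_label == 1:
        # Mislabeled when the true label is negative: take the pessimistic
        # (low) quantile of the positive probability.
        risk = 1 - dist.inv_cdf(1 - confidence)
    else:
        # Mislabeled when the true label is positive: take the pessimistic
        # (high) quantile of the positive probability.
        risk = dist.inv_cdf(confidence)
    return min(1.0, max(0.0, risk))


# Two instances with the same expected label status but different spread:
# the one with the wider distribution carries the higher risk.
stable = value_at_risk(mean=0.9, std=0.05, machine_label=1)
volatile = value_at_risk(mean=0.9, std=0.20, machine_label=1)
```

A single point estimate would treat both instances identically; the distribution-based metric separates them because it accounts for fluctuation as well as expectation.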

Our current work focuses on entity resolution; however, the proposed framework and techniques can be generalized to various classification tasks. Risk analysis is by itself an important and interesting research problem. Moreover, it can have a profound impact on the design and implementation of core machine learning operations, e.g. active selection of training instances, model training and model selection. Our work therefore opens a promising research direction.


Adaptive Deep Learning for Network Intrusion Detection by Risk Analysis. Neurocomputing, 2022.
Lijun Zhang, Xingyu Lu, Zhaoqiang Chen, Tianwei Liu, Qun Chen, Zhanhuai Li

With increasing connectedness, network intrusion has become a critical security concern for modern information systems. The state-of-the-art performance of Network Intrusion Detection (NID) has been achieved by deep learning. Unfortunately, NID remains very challenging, and in real scenarios, deep models may still mislabel many network activities. Therefore, there is a need for risk analysis, which aims to determine which activities may be mislabeled and why.
In this paper, we propose a novel solution for interpretable risk analysis in NID that can rank the activities in a task by their mislabeling risk. Built upon the existing LearnRisk framework, it first extracts interpretable risk features and then trains a risk model with a learning-to-rank objective. It constructs risk features based on domain knowledge of network intrusion as well as statistical characteristics of activities. Furthermore, we demonstrate how to leverage risk analysis to improve the prediction accuracy of deep models. Specifically, we present an adaptive training approach for NID that can effectively fine-tune a deep model towards a particular workload by minimizing its misprediction risk. Finally, we empirically evaluate the performance of the proposed solutions on real benchmark data. Our extensive experiments have shown that the proposed risk analysis solution can identify mislabeled activities with considerably higher accuracy than the existing alternatives, and that the proposed adaptive training solution can improve the performance of deep models by considerable margins in both offline and online settings.
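
As a rough illustration of what a learning-to-rank objective for a risk model can look like, the sketch below uses a generic pairwise logistic loss: every mislabeled activity should receive a higher risk score than every correctly labeled one. This is an assumed stand-in, not the exact objective used in the paper, and the function name `pairwise_rank_loss` is hypothetical.

```python
import math


def pairwise_rank_loss(risk_scores, mislabeled):
    """Average pairwise logistic loss over (mislabeled, correct) pairs.

    risk_scores: risk score assigned to each instance by the risk model.
    mislabeled:  True where the classifier's label was actually wrong.
    The loss is small when mislabeled instances outrank correct ones.
    """
    wrong = [s for s, m in zip(risk_scores, mislabeled) if m]
    right = [s for s, m in zip(risk_scores, mislabeled) if not m]
    if not wrong or not right:
        # No pair to compare, so there is nothing to rank.
        return 0.0
    total = sum(math.log1p(math.exp(-(w - r))) for w in wrong for r in right)
    return total / (len(wrong) * len(right))


# A ranking that places the mislabeled instance first incurs a lower loss
# than one that places it last.
good = pairwise_rank_loss([0.9, 0.1], [True, False])
bad = pairwise_rank_loss([0.1, 0.9], [True, False])
```

Because the loss depends only on score differences between pairs, it rewards a correct ordering of activities by risk rather than calibrated absolute scores, which matches the ranking goal described in the abstract.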

Selected Publications

Towards Interpretable and Learnable Risk Analysis for Entity Resolution. International Conference on Management of Data (SIGMOD), 2020.
Zhaoqiang Chen, Qun Chen, Boyi Hou, Tianyi Duan, Zhanhuai Li and Guoliang Li

Machine-learning-based entity resolution has been widely studied. However, some entity pairs may be mislabeled by machine learning models, and existing studies do not address the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis, which aims to rank the labeled pairs based on their risks of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the performance of the proposed approach on real data. Our extensive experiments have shown that the learning risk model can identify the mislabeled pairs with considerably higher accuracy than the existing alternatives.

title={Towards Interpretable and Learnable Risk Analysis for Entity Resolution},
author={Chen, Zhaoqiang and Chen, Qun and Hou, Boyi and Duan, Tianyi and Li, Zhanhuai and Li, Guoliang},
journal={arXiv preprint arXiv:1912.02947},

Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective. International Workshop on Real-Time Business Intelligence and Analytics, 2018.
Zhaoqiang Chen, Qun Chen, Boyi Hou, Murtadha Ahmed, Zhanhuai Li

Purely machine-based solutions usually struggle with challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve humans in the resolution process, most notably through crowdsourcing. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose to select the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it can pick out the mislabeled instances with considerably higher accuracy than the existing alternatives. Given the same human cost budget, it can also achieve better resolution quality than the state-of-the-art approach based on active learning.

title={Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective},
author={Chen, Zhaoqiang and Chen, Qun and Hou, Boyi and Ahmed, Murtadha and Li, Zhanhuai},
booktitle={Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics},

r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
Boyi Hou, Qun Chen, Zhaoqiang Chen, Youcef Nafa, Zhanhuai Li

Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to enforce quality guarantees. To this end, we propose a risk-aware HUman-Machine cOoperation framework for ER, denoted by r-HUMO. Built on the existing HUMO framework, r-HUMO similarly enforces both precision and recall guarantees by partitioning an ER workload between the human and the machine. However, r-HUMO is the first solution that optimizes the process of human workload selection from a risk perspective. It iteratively selects human workload by real-time risk analysis based on the human-labeled results as well as the pre-specified machine metric. In this paper, we first introduce the r-HUMO framework and then present the risk model to prioritize the instances for manual inspection. Finally, we empirically evaluate r-HUMO's performance on real data. Our extensive experiments show that r-HUMO is effective in enforcing quality guarantees, and compared with the state-of-the-art alternatives, it can achieve desired quality control with reduced human cost.

title={r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees},
author={Hou, Boyi and Chen, Qun and Chen, Zhaoqiang and Nafa, Youcef and Li, Zhanhuai},
journal={IEEE Transactions on Knowledge and Data Engineering (TKDE)},

Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework. ICDE 2018.
Zhaoqiang Chen, Qun Chen, Fengfeng Fan, Yanyan Wang, Zhuo Wang, Youcef Nafa, Zhanhuai Li, Hailong Liu, Wei Pan

Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel Human and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline one purely based on the monotonicity assumption of precision, a more aggressive one based on sampling and a hybrid one that can take advantage of the strengths of both previous approaches. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and it performs considerably better than the state-of-the-art alternatives in quality control.
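
The division of an ER workload between the machine and the human can be sketched as follows. The sketch assumes that each pair carries a machine similarity score and that two thresholds split the workload into three zones; the function `partition_workload` and the threshold names `lower` and `upper` are hypothetical, and how the thresholds are chosen (the optimization problem described in the abstract) is not shown.

```python
def partition_workload(similarities, lower, upper):
    """Split pair indices into machine-unmatch, human, and machine-match zones.

    similarities: machine similarity score of each candidate pair.
    lower, upper: assumed thresholds with lower <= upper.
    Pairs below `lower` are auto-labeled unmatch, pairs above `upper` are
    auto-labeled match, and the uncertain middle zone goes to the human.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    machine_unmatch, human, machine_match = [], [], []
    for idx, sim in enumerate(similarities):
        if sim < lower:
            machine_unmatch.append(idx)
        elif sim > upper:
            machine_match.append(idx)
        else:
            human.append(idx)
    return machine_unmatch, human, machine_match


# Widening the middle zone raises human cost; narrowing it shifts more of
# the labeling (and more of the error) onto the machine.
unmatch, manual, match = partition_workload(
    [0.05, 0.40, 0.55, 0.95], lower=0.3, upper=0.7
)
```

Under the monotonicity assumption of precision mentioned in the abstract, pairs with higher similarity are more likely to be true matches, which is what makes a threshold-based split of this kind reasonable.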

author={Z. Chen and Q. Chen and F. Fan and Y. Wang and Z. Wang and Y. Nafa and Z. Li and H. Liu and W. Pan},
booktitle={2018 IEEE 34th International Conference on Data Engineering (ICDE)},
title={Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework},

A Human-and-Machine Cooperative Framework for Entity Resolution with Quality Guarantees. ICDE 2017.
Zhaoqiang Chen, Qun Chen, Zhanhuai Li

For entity resolution, it remains very challenging to find a solution with quality guarantees as measured by both precision and recall. In this demo, we propose a HUman-and-Machine cOoperative framework, denoted by HUMO, for entity resolution. Compared with the existing approaches, HUMO enables a flexible mechanism for quality control that can enforce both precision and recall levels. We also introduce the problem of minimizing human cost given a quality requirement and present corresponding optimization techniques. Finally, we demonstrate that HUMO achieves high-quality results with reasonable return on investment (ROI) in terms of human cost on real datasets.

author = {Chen, Zhaoqiang and Chen, Qun and Li, Zhanhuai},
title = {A Human-and-Machine Cooperative Framework for Entity Resolution with
Quality Guarantees},
booktitle = {33rd {IEEE} International Conference on Data Engineering, {ICDE} 2017,
San Diego, CA, USA, April 19-22, 2017},
pages = {1405--1406},
year = {2017},
crossref = {DBLP:conf/icde/2017},
url = {https://doi.org/10.1109/ICDE.2017.197},
doi = {10.1109/ICDE.2017.197},
timestamp = {Wed, 24 May 2017 11:31:57 +0200},
biburl = {https://dblp.org/rec/bib/conf/icde/ChenCL17},
bibsource = {dblp computer science bibliography, https://dblp.org}