Mining Test Execution History to Support Test Selection, Prioritization, and Guide Bisection
Related paper: Bisecting Commits and Modeling Commit Risk during Testing

Background:
- Software testing is expensive
- Too many tests need to be run
RQ1: What is the most cost-effective BatchSize for the number of culprits discovered during testing?
Introduction to Bisect:

In this case, five test runs are needed, which is more than testing each commit individually.
Conclusion: The higher the CulpritRate, the smaller the most cost-effective BatchSize.
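
A minimal sketch of how batch-and-bisect executions can be counted. It assumes a simple scheme in which a failing batch is split in half and both halves are always re-tested. The function name `count_executions` and the scheme itself are illustrative assumptions, not necessarily the exact procedure used in the paper.

```python
from typing import Sequence


def count_executions(batch: Sequence[bool]) -> int:
    """Count the test executions needed to isolate every culprit in `batch`.

    Each element is True if that commit is a culprit (it breaks the tests).
    Assumed scheme: run the tests on the whole batch; if it fails and holds
    more than one commit, split it in half and repeat on both halves.
    """
    # Testing the current batch always costs one execution. A passing batch,
    # or a batch of a single commit, needs no further splitting.
    if len(batch) == 1 or not any(batch):
        return 1
    mid = len(batch) // 2
    return 1 + count_executions(batch[:mid]) + count_executions(batch[mid:])


print(count_executions([True, False, False, False]))  # 5
print(count_executions([False, False, False, False]))  # 1
```

Under this assumed scheme, a batch of four commits with a single culprit costs five executions, one more than testing the four commits individually, while a clean batch of four costs only one.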
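
The relationship between CulpritRate and BatchSize can be illustrated with a small calculation. This sketch assumes that each commit is a culprit independently with probability `culprit_rate`, and that a failing batch is split in half with both halves re-tested. The function names and the `max_batch_size` search limit are hypothetical choices for illustration, not taken from the paper.

```python
def expected_descendant_runs(batch_size: int, culprit_rate: float) -> float:
    """Expected executions spent on the sub-batches of a batch (assumed model)."""
    if batch_size == 1:
        return 0.0  # a single commit is never split further
    # The two halves are tested only if this batch contains a culprit.
    p_fail = 1.0 - (1.0 - culprit_rate) ** batch_size
    mid = batch_size // 2
    return (
        2.0 * p_fail
        + expected_descendant_runs(mid, culprit_rate)
        + expected_descendant_runs(batch_size - mid, culprit_rate)
    )


def expected_executions(batch_size: int, culprit_rate: float) -> float:
    """Expected executions for one batch: one run plus any bisection runs."""
    return 1.0 + expected_descendant_runs(batch_size, culprit_rate)


def best_batch_size(culprit_rate: float, max_batch_size: int = 64) -> int:
    """Batch size with the fewest expected executions per commit."""
    return min(
        range(1, max_batch_size + 1),
        key=lambda n: expected_executions(n, culprit_rate) / n,
    )


for rate in (0.01, 0.05, 0.3):
    print(rate, best_batch_size(rate))
```

In this simplified model, the best batch size shrinks as the culprit rate grows, and at a culprit rate of 0.3 batching no longer beats testing commits one by one.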

RQ2: What is the most cost-effective BatchSize when some bisections are done as a result of flaky failures?
Conclusion: The higher the FlakeRate, the smaller the most cost-effective BatchSize and the smaller the savings in executions.

The TestTopK idea:

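The key is the prediction model. Prior research on commit-level bug prediction can help here, and test history has been shown to be an effective input for the prediction model.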
A Large-Scale Empirical Study of Just-in-Time Quality Assurance
Commit Guru: Analytics and Risk Prediction of Software Commits
Test Re-prioritization in Continuous Testing Environments
An example of calculating the culprit score

- The probability that a change to file A co-occurs with a Test 1 failure is 3/8 = 0.375
- The normalized probability for file A and Test 1 is 0.375 × (20/30) = 0.25
- Culprit score of commit 101 = 0.375 × (20/30) + 0.25 × (10/30) ≈ 0.33
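
The arithmetic above can be checked with a short sketch. The notes do not say what the factors 20/30 and 10/30 represent, so they are treated here simply as per-file normalization weights. The function name `culprit_score` and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def culprit_score(file_terms: Iterable[Tuple[float, float]]) -> float:
    """Sum of (co-occurrence probability * normalization weight) per file.

    Each tuple is (probability that a change to the file co-occurs with the
    test failure, weight of that file). Both values are taken as given.
    """
    return sum(probability * weight for probability, weight in file_terms)


# Values from the example for commit 101: file A contributes 3/8 with
# weight 20/30, and the second term contributes 0.25 with weight 10/30.
commit_101 = [(3 / 8, 20 / 30), (0.25, 10 / 30)]
print(round(culprit_score(commit_101), 2))  # 0.33
```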
Results:
