Data valuation with Leave-One-Out (LOO) test and Shapley methods

Authors

  • Nathan Martin, Science & Engineering Magnet Program, Manalapan High School

Keywords:

data valuation, leave-one-out test, Shapley methods, internship, data science

Abstract

During my fall internship with WIT, I explored methods for valuing data collected from progressive profiling questions that I designed. These questions were split into demographic (username, email, country), behavioral (actions), and psychographic (values, interests, lifestyle) categories. I was then tasked with designing a database that recorded estimated values to serve as a baseline for future work. After this, I researched data valuation with two methods: Leave-One-Out (LOO) and Shapley. LOO valued data by essentially finding the least common answer: it removed one dataset at a time and recorded the effect each removal had on the distribution. This was effective for valuing multiple-choice answers, but it was slow and did not accurately capture the effect of each removal on the distribution. After writing the LOO program, I worked on valuing data with Shapley, which measures the marginal contribution of each dataset's removal relative to the removal of the others. This yielded more accurate results and eliminated less valuable data at a more successful rate.
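The difference between the two methods can be illustrated with a minimal sketch. This is not the program written during the internship; it is a toy example under stated assumptions. The utility function (the number of distinct answers in a subset), the function names `utility`, `loo_values`, and `shapley_values`, and the sample answers are all hypothetical and chosen only to keep the example small.

```python
from itertools import combinations
from math import factorial


def utility(points):
    """Hypothetical utility: the number of distinct answers in a subset."""
    return len(set(points))


def loo_values(data, v=utility):
    """Leave-One-Out: value of point i is v(all) - v(all without i)."""
    full = v(data)
    return [full - v(data[:i] + data[i + 1:]) for i in range(len(data))]


def shapley_values(data, v=utility):
    """Exact Shapley values: weighted average of each point's marginal
    contribution over every subset of the other points.
    Cost grows exponentially with len(data), so this is for tiny inputs only."""
    n = len(data)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Probability weight of a subset of size k preceding point i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [data[j] for j in subset]
                total += weight * (v(without_i + [data[i]]) - v(without_i))
        values.append(total)
    return values


# Assumed sample: three multiple-choice answers, two of them identical
answers = ["A", "A", "B"]
print("LOO:    ", loo_values(answers))
print("Shapley:", shapley_values(answers))
```

With this assumed utility, LOO gives both "A" answers a value of 0, because removing either one leaves the other in place, and gives the single "B" a value of 1. Shapley instead splits the credit for "A" evenly (about 0.5 each) and still gives "B" a value of 1, so the values add up to the utility of the full set.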

Published

2025-01-27 — Updated on 2025-01-28

How to Cite

Martin, N. (2025). Data valuation with Leave-One-Out (LOO) test and Shapley methods. Journal of Science & Engineering, 1(3), 72. http://34.172.72.90/index.php/jse/article/view/37
