Today’s companies collect immense amounts of personal data and grant broad internal access to it. This exposes the data to external hackers and privacy-transgressing employees, say the authors of “Enhancing Selectivity in Big Data.”

Researchers from Microsoft, Uber, and Columbia University show that, for a wide and important class of workloads, only a fraction of the data is needed to approach state-of-the-art accuracy.

They propose selective data systems, designed to pinpoint the data that is valuable for a company’s current and evolving workloads. These systems limit data exposure by setting aside data that is not truly valuable.