NIST: Explore Data Deidentification With Us!

We invite you to explore deidentification technologies with us by participating in the Collaborative Research Cycle. This technology challenge seeks to advance our understanding of synthetic data generation and other deidentification technologies. As benchmark data, we present the NIST Diverse Community Excerpts, rich demographic data drawn from the American Community Survey. We invite you to submit deidentified instances of these data using any technique. In return, you will receive detailed utility and privacy reports.

Beginning May 15, we plan to make periodic releases of all submitted data, alongside detailed method descriptions and evaluation results, in a machine-readable ‘research acceleration bundle’ that we anticipate will become an invaluable resource for comparing and exploring deidentification techniques.

Please visit the project’s website to see the data, explore the metrology package we use to analyze the deidentified data, and learn more about the program.

Any and all techniques are welcome (even poorly performing ones!). We already have a library of techniques, including some open-source tools, that you’re welcome to try out.

Submit data by May 9, 2023, to have your data included in the first release of our acceleration bundle. We plan to drop additional releases during the summer. Send a blank email to [email protected] to join our listserv for updates and invitations to our biweekly office hours and seminars.