My Ph.D. Research Plan
Differential Privacy (DP) has become a de facto standard for protecting confidentiality when creating publicly available data products, with an ever-growing list of real-world deployments, including the U.S. Census Bureau, Uber, Apple, Facebook, Microsoft, and Google. DP provides a rigorous plausible-deniability guarantee (e.g., an attacker cannot determine whether a target patient has cancer), but it also has an extremely steep learning curve: each of the aforementioned deployments required highly trained DP experts.
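For context, this guarantee is the standard ε-DP definition: a randomized mechanism M satisfies ε-differential privacy if, for every pair of datasets D and D' differing in a single record and every set of outputs S,

\[
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S],
\]

so the presence or absence of any one individual changes the output distribution by at most a factor of e^ε.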
To challenge current methods of differentially private learning, we set out to identify the model-inversion attack rationales that expose the most sensitive data with the fewest prompts. We hypothesize that target data can be protected against in-context-learning (ICL) attacks by fine-tuning a model on correct adversarial rationales. We propose a model that, through a chain-of-thought (CoT) process, learns to generate rationales for attacking a private record in the training set of any DP-protected model. If successful, the attack model can then be used to estimate the likelihood of leaking personally identifiable information and to derive tighter privacy-loss bounds against which the target model can be further improved.
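To make the proposed search concrete, below is a minimal Python sketch of the attack-selection loop under a fixed prompt budget. Everything in it is a hypothetical placeholder rather than the actual method: `generate_cot_rationales`, `exposure`, and the stub target model stand in for the CoT attack model, the exposure metric, and a DP-protected model, respectively.

```python
from typing import Callable, List, Tuple

def generate_cot_rationales(record_hint: str, n: int) -> List[str]:
    """Placeholder for the CoT attack model: emit n candidate
    rationales, each a prompt strategy aimed at the target record."""
    return [f"step-by-step probe #{i} targeting {record_hint}" for i in range(n)]

def exposure(response: str, secret: str) -> float:
    """Toy exposure score: fraction of the secret's tokens echoed back."""
    tokens = secret.split()
    return sum(t in response for t in tokens) / max(len(tokens), 1)

def best_attack(query_model: Callable[[str], str],
                record_hint: str,
                secret: str,
                budget: int = 8) -> Tuple[str, float]:
    """Within a fixed prompt budget, keep the rationale whose response
    exposes the most of the target record."""
    best_rationale, best_score = "", 0.0
    for rationale in generate_cot_rationales(record_hint, budget):
        score = exposure(query_model(rationale), secret)
        if score > best_score:
            best_rationale, best_score = rationale, score
    return best_rationale, best_score

if __name__ == "__main__":
    # Stub DP-protected model that partially leaks the record, for demonstration.
    secret = "patient 1234 has cancer"
    model = lambda prompt: "the records mention patient 1234"
    rationale, score = best_attack(model, "patient 1234", secret)
    print(f"best rationale: {rationale!r}  exposure: {score:.2f}")
```

In this framing, the exposure achieved by the best rationale found within the budget is the quantity the plan would compare against the target model's claimed privacy-loss bound.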
