This paper revisits membership inference attacks (MIAs) not only as privacy threats but also as practical tools for privacy evaluation. We identify substantial disparities in attack results, both across different methods and across repeated runs of the same method, raising concerns about their reliability. We propose a systematic framework that analyzes these disparities through two lenses, coverage and stability, and introduce ensemble strategies that improve both attack effectiveness and evaluation robustness. Our findings highlight the need for multi-perspective MIA evaluation rather than reliance on a single “top-performing” method.
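As a rough illustration of the kind of ensembling and agreement analysis the abstract describes, the sketch below combines binary membership decisions from several hypothetical attacks by majority vote and computes a crude cross-attack agreement statistic. All names, the number of attacks, and the random decisions are illustrative assumptions, not the paper's actual method or data.

```python
import numpy as np

# Hypothetical decisions: rows = attack methods, columns = target samples.
# Each entry is a binary membership call (1 = "member") that some MIA
# might produce; the random values here are purely illustrative.
rng = np.random.default_rng(0)
decisions = rng.integers(0, 2, size=(5, 100))  # 5 attacks, 100 samples

# Ensemble by majority vote: flag a sample as a member only when more
# than half of the attacks agree.
votes = decisions.sum(axis=0)
ensemble_member = votes > decisions.shape[0] / 2

# One crude notion of stability: the fraction of samples on which all
# attacks return the same decision (unanimous agreement rate).
unanimous = (votes == 0) | (votes == decisions.shape[0])
agreement_rate = unanimous.mean()

print(ensemble_member.shape, float(agreement_rate))
```

In practice each row would come from a real attack's per-sample scores (for example, thresholded losses), and richer stability measures could compare decisions across repeated runs rather than across methods.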


This paper was accepted at ACM CCS 2025.