Toxic language can take many forms, from explicit hate speech to more subtle microaggressions. Within this space, models identifying transphobic language have largely focused on overt forms. However, a more pernicious and subtle source of transphobic comments comes in the form of statements made by Trans-exclusionary Radical Feminists (TERFs); these statements often appear seemingly-positive and promote women’s causes and issues, while simultaneously denying the inclusion of transgender women as women. Here, we introduce two models to mitigate this antisocial behavior. The first model identifies TERF users in social media, recognizing that these users are a main source of transphobic material that enters mainstream discussion and whom other users may not desire to engage with in good faith. The second model tackles the harder task of recognizing the masked rhetoric of TERF messages and introduces a new dataset to support this task. Finally, we discuss the ethics of deploying these models to mitigate the harm of this language, arguing for a balanced approach that allows for restorative interactions.