TL;DR: I want marginalized people and communities to be recognized by those who design AI systems, and for those people’s needs to be clearly communicated and understood.
I recently returned from NAACL 2018, where I presented a short paper on language variation and political identity co-authored with Yuval Pinter and my advisor Jacob Eisenstein. Check out the full paper here and the slides here! The TL;DR is that local languages such as Catalan are likely associated with political identity, and that code-switching in political situations may have different constraints than typical frameworks such as audience design would predict.
As part of a class in computational social science, my friend Yuval and I just wrapped up a short replication study relating to politics and language variation. We replicated the main findings of Shoemark et al. (2017) “Aye or naw,” with a twist: instead of focusing on variation between Scots and English, we looked at Spanish versus Catalan. You can check out preliminary results here.
I’ve just returned from NWAV 2017, a conference centered on language variation and change. Although I’m technically focused on social computing, my research looks at language change in online communities, such as what makes certain lexical innovations survive longer than others. In the work I presented at NWAV (poster here), I found that lexical innovations on Reddit are more likely to succeed when they are disseminated more widely across social and linguistic contexts. Here is photographic evidence of my presentation, documented by Emily Sabo!
This summer, I’ve been doing some reading on social science research methods. That topic is obviously very broad, but I wanted to get a better sense of “how to think like a social scientist” rather than “how to think like a sociolinguist.” I’ve been doing sociolinguistics for so long (6 years?!) that I’ve sort of gotten stuck in a bubble about how to study social phenomena, and the books that I’ve read so far have done a nice job of providing the bird’s-eye view of quantitative social science methods that I’ve been missing. I’m hosting the notes here if you’re interested!
As I mentioned last time, my current work is concerned with how linguistic and social context influence the likelihood of a new word’s adoption. In that post, I described semantic context in terms of the popularity of a word’s “nearest neighbors” and how that might play a role in word adoption.
My current research is concerned with the relationship between the social and semantic context of lexical innovations and their likelihood of adoption in the online community Reddit. The innovation “fleek” succeeded thanks to its restricted context, i.e. the phrase “on fleek”, but this may be a rarity: most innovations might instead succeed by being used in a wide variety of contexts. Unlike “fleek,” the intensifier “af” (“as fuck”) seems to occur in a wide range of post-adjective contexts (“cool af”, “dope af”, etc.). In line with work on the adoption of innovations like this and this, there seems to be a nontrivial relationship between the linguistic context of a new word and the likelihood of that word being adopted by a community. But how do we study that relationship quantitatively? It’s not easy to come up with a universal definition of “context” apart from the generic “company that you keep” definition, and this still leaves a lot of room for interpretation.
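To make the “fleek” versus “af” contrast concrete, here is a minimal sketch of one possible way to quantify context diversity: the entropy of the word immediately preceding the target. This is only an illustration, not the measure used in my research; the function names (`preceding_word_counts`, `context_entropy`) and the toy posts are made up for the example, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter


def preceding_word_counts(posts, target):
    """Count the words that appear immediately before `target`.

    Assumes simple lowercased whitespace tokenization (hypothetical choice).
    """
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        for i, token in enumerate(tokens):
            if token == target and i > 0:
                counts[tokens[i - 1]] += 1
    return counts


def context_entropy(counts):
    """Shannon entropy (in bits) of a context count distribution.

    Zero means the word always appears in the same context;
    higher values mean the contexts are more varied.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Toy posts invented for illustration; not real Reddit data.
posts = [
    "eyebrows on fleek",
    "outfit on fleek today",
    "hair on fleek",
    "that movie was cool af",
    "this song is dope af",
    "i am tired af",
]

fleek_contexts = preceding_word_counts(posts, "fleek")
af_contexts = preceding_word_counts(posts, "af")

print("fleek:", dict(fleek_contexts), context_entropy(fleek_contexts))
print("af:", dict(af_contexts), context_entropy(af_contexts))
```

In this toy data, “fleek” always follows “on”, so its context entropy is zero, while “af” follows three different adjectives and scores higher. A single preceding word is of course a very narrow notion of “context”, which is exactly the definitional problem raised above.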
From May 14-18, I attended the International Conference on Web and Social Media (ICWSM) in Montreal to present work on semantic change that I conducted during my summer 2016 internship at the Pacific Northwest National Laboratory. Check out the full paper here and the poster here!