SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

TL;DR: I want marginalized people and communities to be recognized by those who design AI systems, and for those people’s needs to be clearly communicated and understood.

Rock, Rap, or Reggaeton?: Assessing Mexican Immigrants’ Cultural Assimilation Using Facebook Data

Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline

NAACL 2018

I recently returned from NAACL 2018, where I presented a short paper on language variation and political identity co-authored with Yuval Pinter and my advisor Jacob Eisenstein. Check out the full paper here and the slides here! The TL;DR is that local languages such as Catalan are likely associated with political identity, and that code-switching in political situations may have different constraints than typical frameworks such as audience design would predict.

Sí o no, ¿què penses? Catalonian independence and linguistic identity on social media

Replication studies

As part of a class in computational social science, my friend Yuval and I just wrapped up a short replication study relating to politics and language variation. We replicated the main findings of Shoemark et al. (2017) “Aye or naw,” with a twist: instead of focusing on variation between Scots and English, we looked at Spanish versus Catalan. You can check out preliminary results here.

#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation

NWAV 2017

I’ve just returned from NWAV 2017, a conference centered on language variation and change. Although I’m technically focused on social computing, my research looks at language change in online communities, such as what makes certain lexical innovations survive longer than others. In the work I presented at NWAV (poster here), I found that lexical innovations on Reddit are more likely to succeed when they are disseminated more widely across social and linguistic contexts. Here is photographic evidence of my presentation, documented by Emily Sabo!

A Viz of Ice and Fire: Exploring Entertainment Video Using Color and Dialogue

Reading notes

This summer, I’ve been doing some reading on social science research methods. That topic is obviously very broad, but I wanted to get a better sense of “how to think like a social scientist” rather than “how to think like a sociolinguist.” I’ve been doing sociolinguistics for so long (6 years?!) that I’ve gotten stuck in a bubble about how to study social phenomena, and the books that I’ve read so far have done a nice job of providing the bird’s-eye view of quantitative social science methods that I’ve been missing. I’m hosting the notes here if you’re interested!

What does "context" mean? Part two

As I mentioned last time, my current work is concerned with how linguistic and social context influence the likelihood of a new word’s adoption. Last time I talked about semantic context as the popularity of a word’s “nearest neighbors” and how that might play a role in word adoption.

What does "context" mean? Part one

My current research is concerned with the relationship between the social and semantic context of lexical innovations and their likelihood of adoption in the online community Reddit. The innovation “fleek” succeeded thanks to its restricted context, i.e. the phrase “on fleek”, but this may be a rarity: most innovations may instead succeed by being used in a wide variety of contexts. Unlike “fleek,” the intensifier “af” (“as fuck”) seems to occur in a wide range of post-adjective contexts (“cool af”, “dope af”, etc.). In line with prior work on the adoption of innovations like this and this, there seems to be a nontrivial relationship between the linguistic context of a new word and the likelihood of that word being adopted by a community. But how do we study that relationship quantitatively? It’s not easy to come up with a universal definition of “context” apart from the generic “company that you keep” definition, and this still leaves a lot of room for interpretation.

ICWSM 2017

From May 14-18, I attended the International Conference on Web and Social Media (ICWSM) in Montreal to present the work on semantic change that I conducted during my summer 2016 internship at the Pacific Northwest National Laboratory. Check out the full paper here and the poster here!

Measuring, Predicting and Visualizing Short-Term Change in Word Representation and Usage in VKontakte Social Network

Surprisingly Fragile: Assessing and Addressing Prompt Instability in Multimodal Foundation Models

Generalist multimodal AI: a review of architectures, challenges and opportunities

Whose wife is it anyway? Bias in machine translation of same-gender relationships

Democratizing Machine Learning for Interdisciplinary Scholars: Report on Organizing the NLP+CSS Online Tutorial Series

Expressive Interviewing Agents to Support Health-Related Behavior Change: Randomized Controlled Study of COVID-19 Behaviors

Lexical Measurement of Teaching Qualities

Understanding the Role of Questions in Mental Health Support-Seeking Forums

How well do you know your audience? Socially-aware question generation

Surfacing Racial Stereotypes through Identity Portrayal

FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework

Room to Grow: Understanding Personal Characteristics Behind Self Improvement Using Social Media

Tuiteamos o pongamos un tuit? Investigating the social constraints of loanword integration in Spanish social media

GT Thesis Submission

Characterizing Collective Attention via Descriptor Context in Public Discussions of Crisis Events

Queer NLP - stand and be counted

Extending generative models of large scale networks

Now we stronger than ever: African American syntax on Twitter