As part of a class in computational social science, my friend Yuval and I just wrapped up a short replication study relating to politics and language variation. We replicated the main findings of Shoemark et al. (2017) “Aye or naw,” with a twist: instead of focusing on variation between Scots and English, we looked at Spanish versus Catalan. You can check out preliminary results here.
I won’t lie: this project led me through a bunch of mistakes, setbacks and unexpected triumphs. I learned a lot about the importance of replication studies and how to properly execute them. Here’s a short list:
- Think about the big picture first, then drill down to the details. Every social science study should have a big-picture takeaway that speaks to a larger body of work. Why do we care about the original study? What does it add to the bigger conversation about a social phenomenon like code-switching? It’s important to think about why the study was conducted in the first place, particularly if the replication fails. For some replication studies, the “big picture” might be the methods rather than the theory, such as how to adapt machine learning techniques from one domain to another.
- Set your scope appropriately. This is especially important for a class project, which can rapidly grow into a full-blown research study if you’re not careful. It may be impractical to replicate the entire study from scratch, but there is likely a sub-section of the study that is worth unpacking on its own. In our case, even though we didn’t have the bandwidth to repeat the original study’s qualitative vocabulary matching, we were still able to replicate the study’s “big picture.” The process of setting scope will also depend on the availability of resources related to the original study: if the original authors provide all the code and data that you need, then the replication can afford a larger scope than one that starts from scratch!
- Look for the cracks in the original work. No study is perfect, and a lot of research omits or obfuscates details that are irrelevant to the primary methods or findings. For instance, in our case one of the experimental conditions was vague, so we made an assumption about what the original authors probably meant. That doesn’t mean the original work is invalid, and it doesn’t guarantee that your replication will be better. You will have to make some judgment calls just like the original authors did, so be prepared to defend your decisions. This is arguably the biggest motivation for a replication study: original work often leaves out important details, which prevents a proper replication (a major problem in modern social science research).
- Document everything. When you’re in the thick of a study, it’s easy to focus on getting the main result and leave the details for later. Nope! Don’t do that! Having the details documented helps to explain the process later, especially if your collaborators find a hole in your work. The documentation also helps keep you honest and lets you compare your work directly to the original work. If you notice, for instance, that one of your sample sizes is significantly smaller than the original study’s, you should stop the analysis, go back and try to figure out what made the sample so small. Using an integrated system like the GitHub issues list makes it easy to keep track of what’s been done and to set future goals.
- Get feedback early and try to present the results to someone outside of the project. An advisor can help with this, but you can also use classmates or other research colleagues as a sounding board. If you hit a roadblock on an analysis, is there another way to go about it that would produce a similar result? If a chart doesn’t make sense, what’s a possible explanation for the pattern? This will force you to boil down the original study to its most condensed form so that you can set the scene quickly before moving into the real problem.
Those are the points that come to mind now, and I might update the list with more points as I think of them. Hope that was helpful! While frustrating at times, the replication process taught me to be a better scientist, and I would encourage other early-stage researchers to try it out. It wouldn’t hurt to see more replication studies at conferences, either: one of the tracks at CoLing 2018 centers on reproduction studies, so I’m looking forward to seeing what comes from those submissions.