Methods Bites

Blog of the MZES Social Science Data Lab

BERT and Explainable AI

2023-03-28 · 43 min read · tutorials · Andreas Küpfer, Cosima Meyer

Natural language processing (NLP) is a fascinating field. Popular NLP tasks for understanding (written) human language include next-sentence prediction, translation, text classification, and sentiment analysis. Such techniques already permeate our everyday lives: What would the world be without services such as Google Translate, DeepL, or the recently released ChatGPT? While common bag-of-words approaches are often valuable for NLP, Google’s release of BERT in 2018 revolutionized what is possible in the field. This Methods Bites Tutorial introduces the logic of large language models (LLMs) with a special emphasis on BERT. It presents an applied use case from the social sciences, walks readers through explainable artificial intelligence (AI), and shows how explainable AI can help us understand the predictions of our models.

How to write your own R package and publish it on CRAN

R is a great resource for data management, statistics, analysis, and visualization, and it gets better every day. This is largely thanks to the active community that continuously creates and builds extensions for the R world. If you want to contribute to this community, writing a package is one way to do so. That is exactly what we set out to do with our package overviewR. While many great resources exist for learning how to write a package in R, we found it difficult to find a single all-encompassing guide that is also easily accessible to beginners.

LaTeX and Overleaf

LaTeX is a high-quality typesetting system that facilitates the production of well-formatted documents. It has become highly popular in academic settings as an alternative to common word processors (e.g., Word). This Methods Bites Tutorial by our team member Cosima Meyer and Dennis Hammerschmidt walks you through your first steps in LaTeX (using Overleaf) and provides a hands-on guide to writing scientific papers with an easily accessible template.

Efficient Data Management in R

The software environment R is widely used for data analysis and data visualization in the social sciences and beyond. Additionally, it is becoming increasingly popular as a tool for data and file management. Focusing on these latter aspects, this Methods Bites Tutorial by Marcel Neunhoeffer, Oliver Rittmann and our team members Denis Cohen and Cosima Meyer illustrates the workflow and best practices for efficient data management in R.

Advancing Text Mining with R and quanteda

Everyone is talking about text analysis. Is it surprising that this data source is so popular right now? Not really. Most of our datasets rely on (hand-coded) textual information. Extracting, processing, and analyzing this wealth of information is becoming increasingly relevant for a wide variety of research fields. This Methods Bites Tutorial by Cosima Meyer summarizes Cornelius Puschmann’s workshop in the MZES Social Science Data Lab in January 2019 on advancing text mining with R and the package quanteda.