Events
Upcoming
Using YouTube Data for Social Science Research: Studying American Local Politics with Government Meeting Videos
Hybrid event [A5, 6, Room A231 + Zoom]
December 04, 2024, 13:45-15:15
Abstract
Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. I will present LocalView and DistrictView, two projects that build the largest existing datasets of real-time local government public meetings, the central policy-making process in local governments. We have collected hundreds of thousands of videos, with their corresponding textual and audio transcripts, of local government meetings publicly uploaded to YouTube (the world’s largest public video-sharing website) by thousands of governments, such as city councils and school boards, across the United States between 2006 and 2024. I will discuss ongoing projects using LocalView data, and demonstrate how researchers can identify, process, download, clean, and analyze data from YouTube for use in social science research.
Presenter(s)
Tyler Simko is a Postdoctoral Fellow at Princeton University, and an incoming Assistant Professor of Political Science at the University of Michigan. He studies American state & local policymaking, with a particular focus on political geography and non-tabular data sources like text, images, and videos. He regularly partners with federal, state, & local policymakers to improve program design and reduce administrative burdens. He earned a PhD in Government from Harvard University in 2024.
Using Data Donations to Collect Digital Trace Data: Promises and Pitfalls for the Social Sciences
Hybrid event [A5, 6, Room A231 + Zoom]
November 20, 2024, 13:45-15:15
Abstract
Data donations constitute a new method for collecting digital trace data: In line with the GDPR, users can download their data from digital platforms. They can then donate such data to researchers via Data Donation Tools, which process data locally on participants' devices and allow for their informed consent. Since data donations allow for detailed and longitudinal measurements of individual behavior, they can amplify other data sources (e.g., survey data, administrative data). This talk introduces data donations as a method from a social science perspective. It first discusses how this method is implemented in practice, both from the perspective of participants and researchers. Next, it explains technical, legal, and ethical considerations, such as which tools to use or how to handle sensitive information. Lastly, it highlights promises and pitfalls, including potential errors in representation and measurement.
Presenter(s)
Valerie Hase is a postdoctoral researcher at the Department of Media and Communication at LMU Munich. Her research focuses on computational social science, especially text-as-data and digital trace data, cross-platform perspectives, and digital journalism.
From Evidence Gap Maps to Systematic Reviews and Meta-Analyses: Approaches, Techniques, and Tools
Hybrid event [A5, 6, Room A231 + Zoom]
October 30, 2024, 13:45-15:15
Abstract
The ever-increasing number of empirical studies in many social science disciplines makes it difficult for researchers to keep track of the current state of research and contextualize new findings within existing research. Evidence gap maps, systematic reviews, and meta-analyses are useful tools for creating evidence syntheses and can form the basis for evidence-based decisions in policy and science. This input talk will provide a basic introduction to various approaches, techniques, and tools of (quantitative) research synthesis. Additionally, participants will receive criteria for evaluating an evidence synthesis and will be shown application examples.
Presenter(s)
Jessica Daikeler holds a PhD from the University of Mannheim and is a PostDoc at GESIS, where she coordinates the BMBF/EU-funded project KODAQS and focuses on evidence-based methods in survey methods and data quality issues.
Computationally Analyzing Politicians' Body Language Using Pose Estimation
Hybrid event [A5, 6, Room A231 + Zoom]
October 16, 2024, 13:45-15:15
Abstract
Politicians can increase the appeal of their speeches through nonverbal cues such as gestures and vocal emphasis. Understanding the factors that make political speech appealing is central to political science research, yet studying nonverbal cues during political speech is difficult due to their audiovisual nature. In this workshop, you will learn how to analyse politicians' body language in video recordings using pose estimation, a class of computer vision models that locate and trace human body key points, such as hands, elbows, and shoulders, throughout videos. We will keep the technical introduction brief, focusing instead on hands-on implementation of pose estimation and learning how the resulting data can be used to meaningfully quantify gestures in political speech.
Presenter(s)
Oliver Rittmann is a Lorenz-von-Stein Fellow at the MZES. In his research, he uses computational methods to analyze large-scale audio and video corpora of political speeches, exploring the nonverbal communication strategies of politicians.
Robust Causal Inference using Double/Debiased Machine Learning: A Guide for Empirical Research
Hybrid event [A5, 6, Room A231 + Zoom]
September 18, 2024, 13:45-15:15
Abstract
Motivated by their robustness to partially unknown functional forms, supervised machine learning estimators are increasingly leveraged for causal inference. One method that has received much attention is double/debiased machine learning (DML), which allows leveraging generic supervised machine learners for the estimation of common (causal) parameters. In this paper, we review DML and provide practical guidance to empirical researchers. We highlight three points: First, DML allows researchers to focus their attention on defining credible identifying assumptions while weakening assumptions of convenience such as linearity. Second, DML is versatile in that it can accommodate high-dimensional sets of variables (e.g. arising from text data) but also provides a cheap and sensible robustness check when only a few controls or instruments are observed. Third, the use of poorly validated machine learners may yield misleading inferences. Considering a diverse set of nuisance function estimators (including parametric estimators) alleviates this problem. We use several applications to illustrate these main points and derive practically relevant recommendations for empirical researchers.
Presenter(s)
Achim Ahrens is a Senior Researcher at the Public Policy Group, ETH Zurich, and the Immigration Policy Lab (ETH/Stanford). He holds an MSc in Economics from University of Edinburgh and a PhD in Economics from Heriot-Watt University. Achim has worked on empirical projects in a wide range of fields including housing economics, migration, and labor economics. Achim has a strong interest in the intersection of causal inference and machine learning, and in algorithmic fairness.
2023-2024
Towards more life-course-sensitive decompositions of group-inequalities: Two approaches applied to the Gender Pension Gap
Hybrid event [A5, 6, Room A231 + Zoom]
April 24, 2024, 13:45-15:15
Abstract
Life courses are becoming increasingly complex and determine inequalities in outcomes between groups. However, research often decomposes group gaps in life-course-sensitive outcomes based on a selective set of life course summary measures, such as the years spent in (full-time) employment. One example of group-specific outcomes is the Gender Pension Gap. In my dissertation, I developed and applied two innovative combinations of methods: i) sequence analysis with the Kitagawa-Oaxaca-Blinder decomposition, and ii) the Life Course Feature Selection with the Ñopo decomposition. This talk walks through the practical application of both and presents the methodological and substantive contributions of both approaches, which can easily be applied to other life-course-sensitive group inequalities.
Presenter(s)
Carla Rowold is a PhD candidate in Sociology at the University of Oxford and a postdoctoral researcher at the Max-Planck-Institute for Demographic Research. In her dissertation, she explores life-course- and gender-sensitive approaches to assess the link between gender inequalities over the life course and the Gender Pension Gap. Her research interests cover life course sociology and inequalities, particularly among genders, social demography, and work-family and retirement policies.
Materials
Using AI tools for research
Hybrid event [A5, 6, Room A231 + Zoom]
April 17, 2024, 13:45-15:15
Abstract
Ever noticed how phrases like 'delve into', 'revolutionise', and 'embark on' have suddenly become really popular? We can definitely thank AI-generated texts for this linguistic trend. We have all played with AI tools and admired their smartness. The fastest among us have already published papers with ChatGPT or about ChatGPT. But let's face it, working with AI, doing research with AI isn't that straightforward. We can't produce good research products by asking AI to write for us, but we can certainly treat it as a mighty research assistant that will work through the volumes, leaving us to convert the volumes of information into a quality research product.
In my talk, I'm going to walk you through stages of the research process and introduce you to existing AI tools that can assist you in formulating your research question, finding relevant literature, reading selected papers, analysing data, proofreading your writing, getting feedback and presenting your findings. Additionally, tools like ChatGPT can also be used to support tasks that are not directly related to research in larger projects (such as developing visual identity, creating catchy acronyms, and communication strategies). And last but not least, ChatGPT can be a motivational coach to help with writing anxiety and overcoming perfectionism in the first place.
Join me to learn how we can collaborate with AI to become better at the craft of research.
Presenter(s)
Olga Kononykhina is a mid-career PhD researcher at LMU Munich and the Munich Center for Machine Learning and holds degrees in Applied Mathematics and Sociology. She has 15 years of research and consulting experience in measurement systems, indicator frameworks, and data analysis in social and political contexts. She is passionate about improving data literacy and communication through interactive data platforms and storytelling. Her academic specialization focuses on enhancing occupational data quality for machine learning classification and exploring intersections of machine learning, biases, official statistics, AI governance, and privacy.
Materials
Survey mastery: a deep dive into SQP 3.0 to enhance questionnaire development
Hybrid event [A5, 6, Room A231 + Zoom]
March 13, 2024, 13:45-15:15
Abstract
Designing questionnaires is said to be an art. It involves knowledge and experience. To make this a more scientific activity, Saris and colleagues developed a practical hands-on tool called "Survey Quality Predictor" (SQP). SQP is an open-access web-based program that predicts the quality of survey questions for continuous latent variables based on the linguistic and formal characteristics of the survey item (e.g., the properties of the answer scale). The underlying prediction algorithm was derived from a meta-analysis of many multitrait-multimethod (MTMM) experiments with more than 6,000 survey questions in 28 languages and 33 countries. SQP is not intended to replace cognitive pretesting, expert review, or web probing techniques. Instead, it is a complementary tool to help researchers in the development phase of new questionnaires in national and international survey projects. In this workshop, I will show how researchers can use SQP 3.0 to find survey questions for their questionnaires, improve their questions before data collection, and identify discrepancies between the source and translated versions of a survey question.
Presenter(s)
Lydia Repke is a Senior Researcher and the head of the team “Scale Development and Documentation” at the Department of Survey Design & Methodology at GESIS – Leibniz Institute for the Social Sciences. Her research focuses on survey design, data quality, social networks, acculturation, and multiculturalism. She is a member of the Young Academy of the Academy of Sciences and Literature | Mainz.
Materials
Models all the way down
Hybrid event [A5, 6, Room A231 + Zoom]
February 21, 2024, 13:45-15:15
Abstract
The Neyman-Rubin causal model characterizes how, through experimental (or quasi-experimental) manipulation of an intervention, researchers can make data-informed counterfactual claims about what would happen in the absence of that intervention. The Neyman-Rubin causal model is, nevertheless, just that: a model. In this talk, I will present some excerpts from a larger book project in which my collaborators and I describe the connections between the Neyman-Rubin causal model, the basic estimands of randomized control trials targeting respondents’ preferences, and the theoretical object that is traditionally described as a preference. After a brief reminder of the basic structure of the Neyman-Rubin causal model, I will explain how this framework has been applied to preference elicitation experiments. Then, I will proceed to show that, although this gives us well-defined counterfactuals, the corresponding causal quantities do not straightforwardly represent preferences, either at the individual level or in the aggregate. Finally, I will present a model-based alternative for preference elicitation, with a hands-on application using replication data from a survey experiment.
Presenter(s)
Asya Magazinnik is Professor of Social Data Science at the Hertie School. Her research interests include electoral geography, federalism, local politics, and law enforcement. She also works on political methodology, in particular at the intersection of causal inference and formal theory. Her work has appeared in the American Journal of Political Science, the Journal of Politics, and other outlets. Previously, she was an Assistant Professor of Political Science at MIT. She earned a PhD in Political Science from Princeton University in 2020 and holds an MPP from the Harris School at the University of Chicago.
Materials
Power analysis for social science research
Hybrid event [A5, 6, Room A231 + Zoom]
December 06, 2023, 13:45-15:15
Abstract
Power analysis is an essential component of designing experiments. It helps researchers to allocate sufficient resources to data collection, finding the balance between too small and too large N, and is often required for grant proposals. In this workshop, we will first talk about the basics of power calculations and discuss practical considerations. We will then give hands-on examples of analytical power calculations using the software G*Power and the R package pwr, and will show how to calculate power with simulations.
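As a rough illustration of the difference between analytical and simulation-based power calculations, the following minimal Python sketch may help. It is not taken from the workshop materials, which use G*Power and the R package pwr. It assumes a two-group design with a known standard deviation of 1, a two-sided z-test, and hypothetical function names (`simulated_power`, `analytical_power`).

```python
import random
from statistics import NormalDist, mean


def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # SE of the mean difference when sd = 1 (assumed known)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


def analytical_power(effect_size, n_per_group, alpha=0.05):
    """Closed-form power of the same two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size / (2 / n_per_group) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


sim = simulated_power(0.5, 64)
exact = analytical_power(0.5, 64)
print(f"simulated power: {sim:.3f}, analytical power: {exact:.3f}")
```

The two numbers should agree up to simulation noise. The simulation approach becomes the more useful one once the design is too complex for a closed-form formula.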
Presenter(s)
Denis Cohen is a Senior Research Fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focuses on spatial inequalities, party competition, and political behavior. His methodological interests include quantitative approaches to the analysis of clustered data, strategies for causal identification, and Bayesian statistics.
Alexander Wenz is a Research Fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES) at the University of Mannheim. His research examines the quality of novel methods of data collection, with a focus on mobile web surveys, smartphone apps, wearable sensors, and digital behavioral data.
Materials
Finding, accessing, and re-using research data: The University of Mannheim Research Data Center and BERD@NFDI
Hybrid event [A5, 6, Room A231 + Zoom]
November 15, 2023, 13:45-15:15
Abstract
In the age of data-driven research, the University of Mannheim Research Data Center (FDZ) at the Mannheim University Library bundles together services for researchers interested in research data management. In this talk we will introduce the audience to the FDZ’s core services such as guidelines for Data Management Plans or data collection via OCR, provide information about its data resources such as the Aktienführer data archive, and (re-)introduce the newest addition to the FDZ: the German Internet Panel data collection infrastructure. In addition, this talk will provide an overview of the University Library's activities within the BERD@NFDI consortium. The second part of the talk will deal with the NFDI consortium BERD@NFDI, which is coordinated by the University Library. BERD@NFDI focuses on the research data management and analysis of unstructured data in Business, Economics and related fields, such as unstructured text from social media, news, images etc. We will present the services under development and how researchers can profit from BERD@NFDI.
Presenter(s)
Irene Schumm is Head of the Research Data Center Department at the Mannheim University Library.
Ulrich Krieger is Coordinator for the BERD@NFDI consortium based at the Mannheim University Library.
Materials
A beginner's guide to neural networks for social scientists
Hybrid event [A5, 6, Room A231 + Zoom]
October 18, 2023, 13:45-15:15
Abstract
Neural networks are powerful machine learning algorithms that form the basis of many important technologies, including generative AI and computer vision. However, they are not as straightforward to implement as many other machine learning techniques, like random forests or logistic regression. If you are a researcher interested in applying neural networks, this tutorial will provide an easy introduction to what neural networks are, how they work, and how you can implement a simple one yourself.
Presenter(s)
John 'Jack' Collins is a PhD candidate at the Mannheim Centre for European Social Research (MZES). Jack holds a Master's in Data Science and his research focuses on applying machine learning to survey methodology. Before coming to Mannheim University for his PhD, Jack was an IT consultant.
Materials
Estimating and correcting for measurement error using hidden Markov models
Hybrid event [A5, 6, Room A231 + Zoom]
September 20, 2023, 13:45-15:15
Abstract
Hidden Markov models (HMMs) are a group of latent class models that allow for the estimation and correction of measurement error in categorical, longitudinal data. The main advantage of these models is the fact that they do not rely on the availability of an error-free data source that is used as a benchmark to validate error-prone data. Instead, these models make use of the availability of multiple measures of the same indicator over time to extract information about the error from the data itself. In this workshop, I will provide an introduction to HMMs, discuss how they work and how they can be used in practice for measurement error correction. I will also show how standard HMMs can be implemented in R and how more complex specifications can be implemented in specialized software, specifically Latent Gold.
Presenter(s)
Paulina Pankowska is an Assistant Professor at the Sociology Department of Utrecht University. Her research relates primarily to data and methods quality in the social sciences. In 2020 Paulina defended her PhD dissertation titled 'Measurement error: estimation, correction, and analysis of implications', which investigated the feasibility of using hidden Markov models (a latent variable modelling technique) to account and correct for measurement error in survey and administrative data. The project was conducted in collaboration with Statistics Netherlands.
Materials
2022-2023
The GLES Open Science Challenge 2021: A pilot project on the applicability of registered reports in quantitative political science
Hybrid event [A5, 6, Room A231 + Zoom]
May 17, 2023, 13:45-15:15
Abstract
The GLES Open Science Challenge 2021 was a pioneering initiative in quantitative political science. It aimed at increasing the adoption of replicable and transparent research practices. The project combined the rigor of registered reports (a new publication format in which studies are evaluated prior to data collection/access and analysis) with quantitative political science research in the context of the 2021 German federal election. In this presentation, we first elaborate on why more transparent research practices are necessary to guarantee the cumulative progress of scientific knowledge and how registered reports can contribute to increasing the transparency of scientific practice. Next, we present the GLES Open Science Challenge as an example of how registered reports can be applied on the basis of secondary data. Finally, we reflect on (a) special challenges of preregistration and registered reports for research based on secondary data, (b) lessons learned in the course of the GLES OSC, and (c) potential future developments in this area.
Presenter(s)
Hannah Bucher is a PhD student in survey research at the University of Mannheim and a research associate at GESIS – Leibniz-Institute for the Social Sciences at the German Longitudinal Election Study (GLES). Together with Anne Stroppe and Axel Burger, she co-organized and co-edited the GLES Open Science Challenge 2021.
Anne-Kathrin Stroppe is a PhD student in political science and a research associate at GESIS - Leibniz-Institute for the Social Sciences for the German Longitudinal Election Study (GLES). Together with Hannah Bucher and Axel Burger, she co-organized and co-edited the GLES Open Science Challenge 2021.
Axel Burger is a social psychologist with a research focus on political psychology and works as a postdoctoral researcher at GESIS - Leibniz-Institute for the Social Sciences in the team of the German Longitudinal Election Study (GLES). Together with Hannah Bucher and Anne Stroppe, he co-organized and co-edited the GLES Open Science Challenge 2021.
Materials
Multiverse analysis
Hybrid event [A5, 6, Room A231 + Zoom]
April 26, 2023, 13:45-15:15
Abstract
Data analysis involves many decisions, including study design, data preparation, and statistical model selection. However, a single analysis represents only one of many possible outcomes, raising questions about the impact of undocumented and at times arbitrary choices. Multiverse analysis addresses this issue by conducting all, or a large set of, meaningful analyses and presenting the results in summary form to assess the robustness of conclusions to alternative modeling decisions. The approach addresses two fundamental problems in research: the lack of transparency and the dependence of analysis results on data-analytic decisions. We will also discuss how to implement the approach, its advantages over more traditional analysis approaches, as well as limitations and open challenges, including statistical inference and computational requirements.
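The core idea can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the presenters' implementation: the data are simulated, and the two analytic decisions (how much to trim from each tail, and whether to compare means or medians) are hypothetical stand-ins for real data-preparation and modeling choices.

```python
import random
from itertools import product
from statistics import mean, median

rng = random.Random(42)
# Hypothetical data: outcomes for control and treated units (simulated true effect 0.3)
control = [rng.gauss(0.0, 1.0) for _ in range(200)]
treated = [rng.gauss(0.3, 1.0) for _ in range(200)]


def trim(values, share):
    """Drop the given share of observations from each tail."""
    k = int(len(values) * share)
    ordered = sorted(values)
    return ordered[k:len(ordered) - k] if k else ordered


# Each key is one analytic decision; each list holds its defensible options.
decisions = {
    "trim_share": [0.0, 0.01, 0.05],
    "estimator": [mean, median],
}

# Run every combination of decisions (the "multiverse") and store the estimate.
results = []
for trim_share, estimator in product(*decisions.values()):
    estimate = estimator(trim(treated, trim_share)) - estimator(trim(control, trim_share))
    results.append((trim_share, estimator.__name__, estimate))

estimates = [row[2] for row in results]
print(f"{len(results)} specifications, "
      f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

Reporting the full range of estimates, rather than a single specification, is what makes the dependence on analytic choices visible.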
Presenter(s)
Reinhard Schunck is Professor of Sociology at the University of Wuppertal. He works primarily in the field of social stratification and inequality, concentrating on migration and family related processes, and has a focus on quantitative methods.
Nora Huth-Stöckle is a doctoral student and works at the University of Wuppertal. Her research interests comprise intergroup relations, educational inequality, and quantitative methods.
Materials
Transformer-based language models
Hybrid event [A5, 6, Room A231 + Zoom]
March 29, 2023, 13:45-15:15
Abstract
Transformer-based models have recently gained much attention, especially with the release of ChatGPT. Since 2017, deep learning models based on the Transformer architecture have become an important research tool. Their development and application in various fields, including the social sciences, continue to expand. In this talk, we will examine the components that make up these language models and explore how to train state-of-the-art models with HuggingFace for your research. We will also discuss these models' limitations and open challenges, including open-source availability, the growing need for resources, responsibility, and more.
Presenter(s)
Christopher Klamm is an interdisciplinary researcher at the University of Mannheim (Germany) at the Data and Web Science Group working at the intersection of Natural Language Processing and Computational Political Science.
Materials
Social media ads for web survey participant recruitment
Hybrid event [A5, 6, Room A231 + Zoom]
March 08, 2023, 13:45-15:15
Abstract
The growing proportion of the global population active on so-called social media platforms opens up new opportunities for survey researchers. A novel approach uses ads on, e.g., Facebook, Instagram, or Twitter to recruit participants for web surveys. Given the growing number of studies that used these platforms, social media appears to be a good resource for promoting surveys and recruiting participants. Regularly cited benefits include reaching large numbers of respondents in a short time at a low cost. Nonetheless, this approach presents a number of challenges, including under-coverage and self-selection, fraud and fake interviews, and problems with weighting survey data.
This workshop will provide insight into the use of social media platforms for survey recruitment. Using various application examples from the field of social research, the potential but also points of criticism will be highlighted. In addition, a fictitious example is used to systematically guide the audience through the most important steps before, during, and after such participant recruitment, thus providing valuable tips for future applications.
Presenter(s)
Zaza Zindel is a doctoral researcher and research assistant in sociology at Bielefeld University. Her research interests include survey methodology, social media and its potential for empirical social research, and social inequalities in general.
Materials
Getting started with Python: A how-to guide for social scientists (Part II)
Hybrid event [A5, 6, Room A231 + Zoom]
February 22, 2023, 13:45-15:15
Abstract
The merits of Python for social scientists become tangible when working on a concrete use case. In this follow-up event of our Social Science Data Lab workshop series on Python we use Jupyter Notebooks in the Google Colab environment to implement a simple machine learning routine for prediction. To do that, we first take a step-by-step look at the peculiarities of Python such as data wrangling and basic visualization techniques. With that knowledge, we delve into the basics of applied machine learning by implementing the pipeline for both a logistic regression as well as a random forest model using the Python package scikit-learn. We conclude this workshop with a brief outlook on more advanced possibilities with Python to lay the foundation for your own research.
Presenter(s)
Andreas Küpfer is a doctoral researcher at the University of Darmstadt. His interdisciplinary research interests include text as data, applying machine learning technologies, and substantial inference in the fields of political communication and political competition.
Ruben Bach is a postdoctoral researcher at the MZES, University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.
Materials
Getting started with Python: A how-to guide for social scientists (Part I)
Hybrid event [A5, 6, Room A231 + Zoom]
February 15, 2023, 13:45-15:15
Abstract
Unlike with R, getting started with Python can be burdensome at times as there is no one-stop-shop solution like RStudio. Although tons of introductory tutorials for Python are available on the web, navigating and setting up one’s programming environment can be challenging, especially for users with little programming experience. To lower the burden of getting started with Python, this workshop covers the basics of Python, installing and maintaining virtual environments, and the various graphical user interfaces and integrated development environments out there, like Jupyter Notebooks, Google Colab, and Anaconda. We show situations where Python may be beneficial for your research and when you may choose to go with R. Please note that this talk is the first part of a two-day workshop in the Social Science Data Lab. In the second event (February 22, 2023), we will focus our attention on implementing a simple machine learning routine in Python.
Presenter(s)
Ruben Bach is a postdoctoral researcher at the MZES, University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.
Andreas Küpfer is a doctoral researcher at the University of Darmstadt. His interdisciplinary research interests include text as data, applying machine learning technologies, and substantial inference in the fields of political communication and political competition.
Materials
Extracting political data & relations from Wikidata
Hybrid event [A5, 6, Room A231 + Zoom]
December 07, 2022, 13:45-15:15
Abstract
Political research often involves tedious coding of politically relevant data or relations: political biographies, actors' characteristics, or the networks between them. Often, however, this data is already available: Wikidata is a free and open knowledge base that collects historical and contemporary (political) facts as relational data. Still, researchers often hesitate to use these sources due to technical barriers. This introductory talk introduces Wikidata and its potential uses. It then showcases a work-in-progress application that measures the persistence of the legacies of slave-ownership in British politics. Drawing connections between historical slave-owners and historical and present-day MPs allows us to quantify these legacies and their persistence over time. Finally, to enable participants to use Wikidata in their own research, the talk includes a practical part on collecting and using Wikidata with R. To follow the practical applications, please bring a laptop with installations of R, RStudio, and the packages dplyr and tidywikidatar.
Presenter(s)
Theresa Gessler is Junior Professor of Comparative Politics at the European University Viadrina Frankfurt (Oder). Her work centers on conflicts around democracy, immigration, digitalization and patterns of party competition. Next to classical political science methods, her research uses text-as-data, webscraping and various types of digital trace data.
Materials
Agent-based modeling for social scientists
Hybrid event [A5, 6, Room A231 + Zoom]
November 16, 2022, 13:45-15:15
Abstract
Agent-based computer simulations have gained increasing popularity in many scientific disciplines in the last two decades. But what are they? And what is their appeal for the social sciences? How can they be used by social scientists? In this introductory talk, we first give an overview of the definition and origin of agent-based models and their relation to other types of computer simulation. Second, we show their appeal and usage depending on different research goals with examples from our own work in sociology (namely information diffusion and residential segregation) and political science (namely dynamic multiparty competition).
Presenter(s)
Oke Bahnsen is a doctoral researcher and research associate in political science at the University of Mannheim. He studied economics (M.Sc.) as well as mathematics and political science (M.Ed.) in Kiel and Göteborg. His research focuses on coalition politics and electoral behavior. Methodologically, he is interested in using agent-based modeling to study party competition and opinion dynamics, as well as in experimental research conducted both in the laboratory and in large-scale population-based surveys.
Malte Grönemann is a doctoral researcher and lecturer in sociology at the University of Mannheim. He studied sociology, economics, and statistics in Bonn, Cologne, Mannheim and Linköping. His work focuses on complex social systems which he studies using differential equations and agent-based models. Specifically, he currently works on network diffusion and socio-economic residential segregation. Methodically, he is interested in statistics, visualisation as well as data and research quality in the quantitative social sciences.
Materials
Investigating fairness in data-driven allocation of public resources
more
Hybrid event [A5, 6, Room A231 + Zoom]
October 26, 2022, 13:45-15:15
Abstract
Data-driven approaches for the allocation of public resources promise to make fast, reliable, cost-efficient and objective decisions. However, there are also concerns about such approaches. For example, data-driven algorithmic profiling in the context of the allocation of labor market support programs led to public outrage in Austria. Fairness concerns were raised, as gender and citizenship were found to influence allocation decisions. Such approaches thereby bear the risk of disparate treatment. In this workshop, we will provide an introduction to fairness notions in machine learning and discuss the possibilities and limitations of technical approaches. A data-driven profiling system for allocating support to jobseekers will be implemented in Python and provided as executable code snippets. Our aim is to discuss and evaluate fairness metrics in a realistic example. Prior knowledge of Python is not necessary and the libraries used are also available in R.
Presenter(s)
Eva Achterhold is a master's student at the Chair for Statistics and Data Science in Social Sciences and the Humanities at LMU Munich. Her research interests include the study of the socio-cultural impact of AI, especially with regard to discrimination and transparency, and the application of methods to mitigate negative consequences. She is currently working on the topic of fairness in algorithmic decision making in the context of allocating support programs to unemployed individuals.
Materials
Matching methods for causal inference with time-series cross-sectional data
more
Online-only event [Zoom Meeting]
October 05, 2022, 08:30-10:00
Abstract
Matching methods improve the validity of causal inference by reducing model dependence and offering intuitive diagnostics. While they have become a part of the standard tool kit across disciplines, matching methods are rarely used when analyzing time-series cross-sectional data. We fill this methodological gap. In the proposed approach, we first match each treated observation with control observations from other units in the same time period that have an identical treatment history up to the pre-specified number of lags. We use standard matching and weighting methods to further refine this matched set so that the treated and matched control observations have similar covariate values. Assessing the quality of matches is done by examining covariate balance. Finally, we estimate both short-term and long-term average treatment effects using the difference-in-differences estimator, accounting for a time trend. We illustrate the proposed methodology through simulation and empirical studies. An open-source software package is available for implementing the proposed matching methods.
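The first matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the presenters' software package: the function name `matched_sets`, the toy panel, and the data layout (one equal-length list of 0/1 treatment indicators per unit) are assumptions made for illustration, and the sketch covers only the exact matching on treatment history, not the later covariate refinement or the difference-in-differences estimation.

```python
from typing import Dict, List, Tuple


def matched_sets(
    treatment: Dict[str, List[int]], lags: int
) -> Dict[Tuple[str, int], List[str]]:
    """Build a matched set for every treated observation (unit, t).

    A control unit qualifies if it is untreated in period t and its
    treatment history over the previous `lags` periods is identical to
    that of the treated unit. Assumes all units share the same periods.
    """
    sets: Dict[Tuple[str, int], List[str]] = {}
    for unit, path in treatment.items():
        # Only periods with a full history of `lags` periods are usable.
        for t in range(lags, len(path)):
            if path[t] != 1:
                continue  # not a treated observation
            history = path[t - lags:t]
            sets[(unit, t)] = [
                other
                for other, other_path in treatment.items()
                if other != unit
                and other_path[t] == 0
                and other_path[t - lags:t] == history
            ]
    return sets


# Hypothetical toy panel: four units observed over four periods.
panel = {
    "A": [0, 0, 1, 1],
    "B": [0, 0, 0, 1],
    "C": [0, 1, 0, 0],
    "D": [0, 0, 0, 0],
}
result = matched_sets(panel, lags=2)
```

With two lags, unit A's treated observation in period 2 is matched to B and D, which share its untreated history, while C is excluded because it was treated in period 1.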
Presenter(s)
Erik H. Wang is an Assistant Professor in the Department of Political and Social Change (PSC) at the Australian National University. His research interests center on historical political economy, politics of state-building, and bureaucracy as well as statistical methods of causal inference.
Materials
Application programming interfaces for social scientists: A collaborative review
more
Hybrid event [A5, 6, Room A231 + Zoom]
September 21, 2022, 13:45-15:15
Abstract
Application Programming Interfaces, short APIs, are a technology that includes a set of tools allowing users to send and receive data or functionality through a documented interface. Nowadays, not only developers but also social scientists make use of APIs where typical use cases consist of systematically querying data that are made available by the API. On this occasion, we want to introduce the website "APIs for social scientists: A collaborative review" (Bauer, Landesvatter and Behrens, 2022) which is a collection of examples of different APIs alongside social science examples. The roundtable will be structured into two parts. First, the current editors of the collaborative review introduce the review and its chapter in more general terms. Second, together with our panelists who have authored several chapters in the review, we will discuss various questions surrounding APIs. This includes use cases, opportunities as well as limitations that APIs bring for social science research questions.
Presenter(s)
Camille Landesvatter is a PhD Candidate in Sociology at the University of Mannheim and research associate at the MZES. Her research includes generalized trust and social cohesion for which she draws on methods including survey experiments and text classification. Camille co-founded the API review, is author of multiple chapters and current editor.
Paul C. Bauer is a postdoctoral fellow at the MZES and previous postdoctoral fellow at the European University Institute. His current research focuses on social and political trust as well as polarization for which he draws on experimental methods and a focus on causal inference, text data and data visualization. The API review is only one of many projects he uses to teach topics of computational social science. Paul is the founder of the API review, contributed multiple chapters and is current editor.
Lion Behrens is a PhD Candidate in Political Science at the Graduate School of Economic and Social Sciences (GESS) at the University of Mannheim and a research associate at the Chair of Quantitative Methods in the Social Sciences. His research includes topics of electoral fraud and legislative behavior alongside statistical modeling. Lion contributed a chapter on the CrowdTangle API to the API review and is current editor.
Chung-hong Chan is a postdoctoral fellow at the MZES. His research includes the role of media in conflicts and platform interventions with a further focus on text analysis. Hong is author of multiple chapters in the API review (Mediacloud API, Twitter API) and also provided many other contributions and supervision to the overall project such as a chapter on best practices in using APIs.
Marie-Lou Sohnius is a graduate student of political science at the University of Mannheim. This autumn she will start her PhD in Political Science in Oxford where she will work on topics related to elections and public opinion, with a focus on non-citizen voting rights. In the API review she is author of the chapter on the Spotify API.
Domantas Undzėnas is a graduate student at the University of Mannheim. This autumn he will start his PhD in political science at the Graduate School of Economic and Social Sciences (GESS). His research focuses on political behaviour, especially individual-level authoritarianism and social dominance. In the API review he is author of the chapter on the Reddit API.
Lukas Isermann is a PhD Candidate in Political Science at the Graduate School of Economic and Social Sciences (GESS) at the University of Mannheim and a research associate at the MZES. His areas of research include issue competition and political behavior. Lukas contributed multiple chapters to the API review, namely the Google Places API and the Internet Archive Wayback API.
Materials
2021-2022
more
Detecting implicit biases in large language corpora
more
Online-only event [Zoom Meeting]
June 01, 2022, 13:45-15:15
Abstract
In this tutorial, I will show you how the R package sweater can be used to detect biases in word embeddings. The package provides highly optimized functions to calculate the following bias metrics: mean average cosine similarity, relative norm distance, SemAxis, normalized association score, relative negative sentiment bias, embedding coherence test and word embedding association test. Using two publicly available word embeddings trained on media content, I am going to demonstrate how sweater can be used to study implicit gender and racial biases.
Presenter(s)
Chung-hong Chan is a Research Fellow at the Mannheim Center for European Social Research (MZES), University of Mannheim.
Materials
Collection, Management, and Analysis of Twitter Data: Using the Twitter API for Academic Research and BERT
more
Online-only event [Zoom Meeting]
May 04, 2022, 13:45-15:15
Abstract
As a highly relevant platform for political as well as social online interactions, social scientists increasingly analyze Twitter data. As of 01/2021, Twitter renewed its API, which now includes access to the full history of tweets for academic usage. In this talk, I will first present a detailed walkthrough of the data collection process, from applying to access to storing the data. Following a brief discussion of data processing routines, I then introduce an application from my own methodological research that uses textual contents of tweets from German members of parliament. It combines the state-of-the-art NLP method BERT with hierarchical shrinkage estimators to obtain legislator-level salience and position metrics for specific policy domains and sub-domains.
Presenter(s)
Andreas Küpfer is a graduate of the Mannheim Master in Data Science and an incoming doctoral researcher at the University of Darmstadt. His interdisciplinary research interests include text as data, applying machine learning technologies, and substantial inference in the fields of political communication and political competition.
Materials
Using Smartphones for Data Collection
more
Online-only event [Zoom Meeting]
April 06, 2022, 13:45-15:15
Abstract
Smartphones are increasingly used as tools for data collection in the social sciences. While web surveys can be administered on these devices, they also allow researchers to passively collect behavioral data from the operating system and built-in sensors. In this talk, I will provide an overview of the opportunities that smartphones can bring for social science research. Examples will include the use of apps for experience sampling, sensors to measure mobility patterns, and operating system log files to measure social interactions. Furthermore, I will discuss different aspects of data quality and practical issues when implementing data collection via smartphones, including coverage, participation, and measurement error.
Presenter(s)
Alexander Wenz is a postdoctoral researcher at the Professorship for Statistics and Methodology at the School of Social Sciences at the University of Mannheim and an incoming member of the Data and Methods Unit at the MZES. His research focuses on survey methodology and digital inequalities.
Materials
Survey data collection from start to finish: Designing and executing reproducible research with an online access panel
more
Online-only event [Zoom Meeting]
March 02, 2022, 13:45-15:15
Abstract
Fielding an online survey with an access panel or crowdsourcing platform can be a quick, flexible, and relatively low-cost method of collecting data from the general population. However, social scientists who want to conduct their own survey for the first time may not know where to begin in planning their data collection. In this talk, we will walk through the process of planning and conducting a survey with an online access panel step by step. We will use our own recent data collection as an example, a survey experiment in Germany conducted as part of a replication seminar at the University of Mannheim in 2021. In addition to topics like ethical approval, sampling, and choosing a survey provider, we will also discuss how researchers can work reproducibly by pre-registering their designs and sharing data and code.
Presenter(s)
Johanna Gereke is a postdoctoral research fellow at the Mannheim Centre for European Social Research (MZES). Her current research focuses on intergroup relations, migration, discrimination and cooperative behavior in modern societies and draws on a range of experimental and quasi-experimental methods, including original lab-in-the-field, survey and field experiments.
Joshua Hellyer is a doctoral researcher at the Mannheim Centre for European Social Research (MZES). His research focuses on discrimination against ethnic and sexual minorities, particularly in the housing and labor markets.
Materials
Getting the most out of comparative vote switching data: A new framework for studying dynamic multi-party competition
more
Online-only event [Zoom Meeting]
December 08, 2021, 13:45-15:15
Abstract
Large literatures on party competition and voting behavior focus on voter reactions to parties' policy strategies, agency, or legislative performance. While many inquiries make explicit assumptions about the direction and magnitude of voter flows between parties, comparative empirical analyses of vote switching remain rare. In this talk, I present a new approach that overcomes three challenges that have previously impeded the comparative study of dynamic party competition based on voter flows: A newly compiled data set that marries comparative vote switching data with information on party behavior and party systems in over 200 electoral contexts across 36 OECD countries, a novel conceptual framework for studying how party behavior affects voter retention, defection, and attraction in multi-party systems, and a statistical model that renders this framework operable. An applied walkthrough showcases the data set and a newly developed R package for the estimation of the newly developed statistical model, along with functions for the calculation and visualization of substantively meaningful quantities of interest.
Presenter(s)
Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focus lies at the intersection of political preference formation, electoral behavior and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.
Materials
Multilingual Automated Text Analysis for Comparative Social Science Research
more
Online-only event [Zoom Meeting]
November 24, 2021, 13:45-15:15
Abstract
Automated text analysis methods have become popular in computational social science. They appeal as they promise the automated extraction of meaning from large numbers of documents, thus allowing researchers to better understand the contents and, indirectly, the document creators and audiences. While the existing techniques are well established for English-language text, the situation is different when it comes to the study of text in more than one language and in languages other than English. Yet it is precisely these multilingual techniques that are needed for (country) comparative research designs. This workshop will start by motivating the need for comparative social science studies that base their interpretations on text data. The main part will provide guidance and many practical tips to help plan such research designs. In particular, it will cover considerations related to the definition of comparative research goals, the selection of a case comparative text data set, the definition of concepts, and the creation of a human annotated validation baseline. The workshop will then focus on methodological strategies that can be employed to obtain measurements from a multilingual corpus with automated text analysis methods. All steps will be illustrated with an applied example. The workshop materials, including slides and scripts, will be made available on GitHub.
Presenter(s)
Fabienne Lind is a research associate at the Department of Communication at the University of Vienna as a part of the H2020 project OPTED. Her research interests include political communication and quantitative methods with a focus on quantitative text analysis.
Materials
MZES Roundtable "Collection of Micro-level Data"
more
Internal online-only event. Open to MZES members and external MZES fellows only.
November 03, 2021, 13:45-15:15
Abstract
The MZES Roundtable "Collection of Micro-level Data" provides a forum for exchanging experiences and for pooling knowledge across several MZES research projects based on original individual-level data collections. It brings together colleagues with expertise in different types of micro-level data, including survey data, field, survey, and lab experiments, social media data, and web tracking data. We will discuss various challenges and opportunities of original micro-level data collection efforts at different points in the project cycle, including initial planning, data collection, data protection, research ethics, analysis, and archiving. With this roundtable, we aim to capture the existing body of knowledge from ongoing, planned, and completed projects, stimulate greater exchange between projects, and strengthen intra-institutional synergies and networks.
The workshop will start with short input presentations summarizing the panelists’ data collection efforts, followed by a moderated discussion and an open Q&A with the audience. Please note that this workshop is open to MZES members and affiliates only.
Presenter(s)
Ruben Bach is a postdoctoral researcher at the University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.
Jörg Dollmann is a research fellow at the Mannheim Centre for European Social Research (MZES) and the project coordinator of the panel survey CILS4EU-DE.
Jennifer Eck is a social psychologist at the University of Mannheim, School of Social Sciences. Her research interests include social exclusion, assimilation and contrast, as well as self-concept.
Alejandro Ecker is Assistant Professor in Politics and Communication in Ibero-America at the Heidelberg Center for Ibero-American Studies (HCIAS) and the Faculty of Economics and Social Sciences at Heidelberg University. Combining observational data with experimental and machine learning methods, his research focuses on the effects of political institutions on the behavior of multiparty governments, political parties, and individual politicians and their consequences for citizen behavior and voter attitudes.
Johanna Gereke is a postdoctoral research fellow at the Mannheim Centre for European Social Research (MZES). Her current research focuses on intergroup relations, migration, discrimination and cooperative behavior in modern societies and draws on a range of experimental and quasi-experimental methods, including original lab-in-the-field, survey and field experiments.
Whose scale is it anyway?
more
Online-only event [Zoom Meeting]
October 20, 2021, 13:45-15:15
Abstract
Exploratory text scaling methods are widely used and have natural multidimensional extensions, but their results require careful interpretation and may simply not reflect constructs of substantive interest. While confirmatory methods have also been suggested, e.g. Wordscores, these are under-theorized such that it is not entirely clear how and when they can be expected to work, or how they might extend to multiple dimensions. Since these questions are currently open, I will review some existing practical solutions to confirmatory scaling, suggest some methods not yet in widespread use, and consider how they might address an as-yet unappreciated and startling artifact of exploratory methods.
Presenter(s)
Will Lowe is Senior Research Scientist at the Hertie School. His research spans legislative politics, political economy, and public policy. Methodologically he is interested in statistical models of text and in causal inference.
Materials
Networks All the Way Down: Assessing Modeling Choices for Political Conversation
more
Online-only event [Zoom Meeting]
September 22, 2021, 13:45-15:15
Abstract
Political conversations, whether online or in person, are networked along multiple dimensions: people come into contact with each other through social networks, they spread messages and ideas using semantic networks, and conversational interactions themselves form a network of back-and-forth exchange. Each of these networked dimensions can be valuable in understanding the political implications of discourse and for developing appropriate interventions around the spread of misinformation and toxic speech. Yet it is rarely practical or meaningful to consider all of these networks simultaneously. Indeed, most studies focus on a single type of social, semantic, or conversational network and make explicit choices about the content of interest and the types of relationships examined. Research on Twitter, for example, may consider social networks formed by follower relationships, semantic networks formed by hashtag co-occurrence, or conversational networks of replies and interactions. Each of these networks is meaningful in its own right, but only captures a piece of the larger public discourse. This paper therefore examines the network modeling choices researchers must make when studying political conversations. Using diverse corpora including Twitter exchanges, Reddit threads, and U.S. Congressional debates, we present a framework for modeling the social, semantic, and conversational networks of political discourse in a range of contexts. We illustrate what can and cannot be inferred from individual network models, and assess the sensitivity of findings to various modeling choices. Ultimately, this paper presents a roadmap to assist researchers in identifying the network models most appropriate for different research questions related to political discourse.
Presenter(s)
Sarah Shugars is a computational political scientist, studying American political behavior and developing new methods in natural language processing, network analysis, and machine learning.
Materials
2020-2021
more
Detecting Intra-Cluster Spillovers Using a Placebo-Controlled Design
more
Online-only event [Zoom Meeting]
June 16, 2021, 15:30-17:00
Abstract
Questions about the degree to which treatment effects diffuse through social networks are of great policy relevance. For example, philanthropic groups routinely deploy media interventions in developing countries to promote pro-social attitudes. Even though studies have shown that such interventions can change the minds of those who are directly exposed to media content, less is known about the existence of second-hand or spillover effects. If audience members convey the media message to others in their social network, the media campaign’s reach expands, perhaps by a sizable factor. Scholarly interest in such spillover effects has grown markedly in recent years, and experimental designs to detect them have become increasingly sophisticated. In this talk, I present a design-based strategy to assess intra-cluster diffusion of treatment effects in cluster-randomized trials. The key design innovation is a placebo condition that helps reveal the degree to which experimental subjects would have been exposed to treatment had they been assigned to it. I contrast the approach with other design-based ways to identify spillover effects and present results from two large cluster-randomized experiments that implement this strategy. Both studies are set in rural Uganda and assess the effect of video dramatizations on the topics of violence against women, teacher absenteeism and abortion stigma. We find several instances of sizable and highly significant direct effects on the attitudes of audience members, but little evidence that these effects diffused to others in the villages where the videos were aired. A paper that employs the proposed design can be found here.
Presenter(s)
Anna M. Wilke is a Ph.D. Candidate in Political Science at Columbia University and a Predoctoral Fellow at the University of California, Berkeley. Her work focuses on the comparative politics of developing countries, mainly in Sub-Saharan Africa. She has conducted field work in South Africa, Uganda and Ethiopia and employs experimental methods and formal theory in her research.
Materials
Telling Stories with Data: Insights into Data Journalism
more
Online-only event [Zoom Meeting]
May 26, 2021, 13:45-15:15
Abstract
Data journalism is all about using and presenting data in a way that readers will intuitively understand them. In this event, we will talk about examples of data-driven stories in order to demonstrate how journalists tell stories using data, what the obstacles are to a good data-driven story and what scientists and journalists can learn from each other regarding data storytelling.
Presenter(s)
Yannik Buhl was formerly a data journalist at Stuttgarter Zeitung and Stuttgarter Nachrichten, where he wrote primarily on the mobility turnaround and transport policies using intuitive visualisations (example 1, example 2). Prior to joining the Stuttgarter Zeitung and Stuttgarter Nachrichten, he obtained his MA in Political Science with a focus on quantitative methods from the University of Mannheim.
Materials
How to Read Tea Leaves: A hands-on Guide for Semantic Validation of Text Models using Oolong
more
Online-only event [Zoom Meeting]
May 05, 2021, 13:45-15:15
Abstract
The growing supply of unstructured text is a great opportunity, but also a challenge for social science. In many instances we want to classify, scale or compare text for which no prelabeled data is available. In this case, unsupervised learning techniques such as topic models or the use of dictionaries promise the automated analysis of text with little or no human input. But these models are notoriously difficult to evaluate. While the validation of statistical properties of topic models is well established, the substantive meaning of the categories uncovered is often less clear and their interpretation reliant on "intuition" or "eyeballing". Computer science scholars rather call it "reading tea leaves". The story for dictionary-based methods is not better. Researchers usually assume these dictionaries have built-in validity and use them directly in their research. Oolong provides applied users in disciplines such as political science and communication science with a set of tools to objectively judge substantive interpretability. It allows standardized content-based testing of topic models as well as dictionary-based methods with clear numeric indicators of semantic validity. This session is a hands-on guide on how to create and administer your own tests.
Presenter(s)
Marius Sältzer is a doctoral researcher in political science at the University of Mannheim. His research revolves around the dimensions of political conflict, e.g., the questions what issues matter for the public, political parties and their constituencies. To answer these questions, he studies political communication of legislators, parties and other key political actors, with a special emphasis on political elites' use of social media.
Materials
Why to use Git and Git essentials workshop: An argument for adopting Git + GitHub/GitLab for academic research followed by a getting started workshop
more
Online-only event [Zoom Meeting]
April 14, 2021, 13:45-15:15
Abstract
The workshop will have two parts: In a ca. 20-30 minute input talk, I will motivate the usage of Git and GitHub/GitLab in the context of social science research by presenting relevant features and introducing potentially relevant workflows from software development. In the second, more interactive part (ca. 1 hour), we’ll go through the typical pull-commit-push-pull request cycle which will enable participants to work with Git and GitHub, mainly using the RStudio Git interface.
Presenter(s)
Frie Preu is a data scientist, a low-budget data engineer and COO of CorrelAid, a data4good network of over 1500 data scientists. Before, she studied political science and data science at the University of Konstanz and worked in IT consulting.
Materials
Generalized Additive Models: Allowing for some wiggle room in your models
more
Online-only event [Zoom Meeting]
March 17, 2021, 13:45-15:15
Abstract
In this workshop, we'll unpack GAMs as an extension of generalized linear models, learn about the role of splines in these models, and explore the many choices available to define and fit these models. We'll be using data on traffic stops to investigate racially-biased policing in South Carolina as a motivating example, and we'll get a chance to try out the related R code so that you have the basic tools needed to try out GAMs in your own research context.
Presenter(s)
Sara Stoudt is a lecturer in the Statistical & Data Sciences program at Smith College. She received her PhD in statistics from the University of California, Berkeley, where she was also a Berkeley Institute for Data Science Fellow, and her BA in Mathematics with an emphasis on Statistics from Smith College. Her research focuses on ecological applications of statistics and statistics communication.
Materials
Reproducible and Dynamic Documents with RMarkdown
more
Online-only event [Zoom Meeting]
March 03, 2021, 13:45-15:15
Abstract
As demands for computational reproducibility in science are increasing, tools for literate programming are becoming ever more relevant. R Markdown offers a framework to generate reproducible research in various output formats. I present a new package (reproducr) that allows users without any prior knowledge of R Markdown to implement reproducible research practices in their scientific workflows. The reproducr package provides a single Rmd-template that is fully optimized for two different output formats, HTML and PDF. While in the stage of explorative analysis and when focusing on content only, researchers may rely on the 'draft mode' of the template that knits to HTML and allows them to interactively explore their data. When in the stage of research dissemination and when focusing on the presentation of results, in contrast, researchers may rely on the 'manuscript mode' that knits to PDF and allows them to circulate a publication-ready version of their working paper or submit it (blinded) for review.
Presenter(s)
Julia Schulte-Cloos is a Marie Sklodowska-Curie funded LMU Research Fellow at the Geschwister Scholl Institute of Political Science at LMU Munich. Her research lies at the intersection of comparative politics, political sociology and socio-psychology. An advocate of open science, she is a member of the LMU Open Science Center and part of the catalyst network of the Berkeley Initiative for Transparency in the Social Sciences (BITSS).
Materials
Factorial Survey Designs
more
Online-only event
December 09, 2020, 13:45-15:15
Abstract
The factorial survey (vignette analyses) is a method that integrates multi-factorial experimental designs into surveys. Respondents are asked to evaluate fictitious situations, objects or persons. By systematically varying attributes of the descriptions it is possible to determine their influence on respondents' stated attitudes, decisions, or choices. Due to experimental variation of stimuli researchers can estimate the influence of each attribute on respondents' evaluations. As the experiment is embedded in a survey questionnaire, it is possible to reach heterogeneous sample populations. This workshop provides insights into the steps that are necessary to design factorial survey experiments: (1) construction of vignettes and response scales, (2) selection of an experimental design, (3) programming of vignettes for implementation into questionnaires, (4) data management, (5) data analysis techniques. The workshop furthermore discusses (6) methodological issues and best practices and shows similarities and differences to (7) related methods like conjoint analysis and choice experiments.
Presenter(s)
Carsten Sauer is Full Professor of Sociology and Social Stratification in the Department of Political & Social Sciences at Zeppelin University, Friedrichshafen, Germany. His substantive work focuses on social inequality, social stratification, labor markets, empirical justice research, organizations, and health. Methodologically, he is interested in quantitative methods, survey experiments, and longitudinal data analysis.
Materials
Enabling Collaborative and Reproducible Data Science with the Renku Platform
Online-only event
November 18, 2020, 15:30-17:00
Abstract
Communities and funding sources are increasingly demanding reproducibility in scientific work. There are now a variety of tools available to support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we present RENKU: an open-source platform integrating git, Jupyter/RStudio Server, Docker, and analysis workflows linked with a queryable knowledge graph.
Presenter(s)
Christine Choirat is the Chief Innovation Officer of the Swiss Data Science Center and an Adjunct Lecturer on Biostatistics at the Harvard T.H. Chan School of Public Health and at the Harvard Extension School. Her research interests are data science and high-performance computing, reproducible research, and environmental policy and health policy.
Emma Jablonski is a doctoral researcher in the History of Science Program at UC San Diego, CA, USA. Previously, she worked on systems to facilitate computational molecular dynamics research at D. E. Shaw Research and on exoplanet climate modeling in the astrobiology group at NASA GISS, both in New York City. Her research interests include networks and complexity as applied to life in the universe and also to the flow of scientific information through academia and society.
Materials
Generative Adversarial Nets for Social Scientists
Online-only event
November 04, 2020, 13:45-15:15
Abstract
In this talk I introduce Generative Adversarial Networks (GANs) for Social Scientists. GANs are an innovative neural network architecture where two neural networks adversarially learn arbitrary target distributions. A Generator network learns to produce simulated samples that mimic real data. At the same time, a Discriminator network learns to distinguish between real and simulated data. A GAN is successful in producing simulated data if a Discriminator is maximally uncertain about the origins of the data (real or simulated). GANs achieve impressive results in producing synthetic samples from complex data like images (e.g. cats, faces) or audio data (e.g. voices, songs). In this talk, I introduce current applications of GANs and present my work on their use for Social Science research. In particular, I will cover applications to Multiple Imputation, Small Area Estimation and the Generation of fully Synthetic Data. All applications will be accompanied by hands-on code examples.
Presenter(s)
Marcel Neunhoeffer is a PhD Candidate and Research Associate at the chair of Political Science, Quantitative Methods in the Social Sciences, at the University of Mannheim. His research focuses on political methodology, specifically on the application of deep learning algorithms to social science problems. His substantive interests include data privacy, political campaigns, and forecasting elections.
Materials
Management and Analysis of Georeferenced Survey Data
Online-only event
October 21, 2020, 13:45-15:15
Abstract
Geospatial methods have become an emerging field in social science survey research where Geographic Information Systems (GIS) facilitate enriching individual-level survey data with auxiliary geospatial information, such as road traffic noise. This development is due to researchers' increased general interest in the questions that can be answered via these methods, but also because of more and more available data. However, this endeavor remains an issue because applying GIS in social science survey research is challenging, requiring new analytical skills from diverse and foreign disciplines, such as ecology and engineering. Data management issues, including technical procedures, data protection, and access to georeferenced survey data, must be resolved. Lastly, researchers are confronted with how these additional data build upon existing knowledge within the social sciences. In my talk, I give a general overview of the data management challenges of using GIS in social science survey research, including organizational, technical, and legal barriers. I show examples of different GIS methods for enriching survey data, and further demonstrate their analysis in 'real-life' social science applications. Nowadays, we can perform all these steps in the statistical language R. Therefore, since providing public access to georeferenced survey data is a bit tricky, I conclude with a small hands-on tutorial on wrangling geospatial data in R for creating maps.
Presenter(s)
Stefan Jünger is a postdoctoral researcher at GESIS, Leibniz Institute for the Social Sciences, where he provides services in the area of geocoding, georeferencing, and spatial linking. He is also deputy head of the GESIS Secure Data Center. His research focuses on the use, analysis and management of georeferenced data in social science (survey) research.
Materials
2019-2020
Extracting Emotions (and more) from Faces with Face++ and Microsoft Azure
Online-only event
May 12, 2020, 12:00-13:30
Abstract
Images are an increasingly used data source in the social sciences. One application is to extract features from human faces using machine learning algorithms. This workshop will provide a guide on how to use APIs for this task, specifically how to access the services offered by Face++ and the Microsoft Face API. While the talk focuses on extracting emotions from facial expressions, the method can also be used for other variables of interest such as gender or age. The talk starts with a short introduction on why we should care about emotions in social sciences, why APIs are useful for the task of facial expression recognition and where to apply caution with this method. The main part will be a walkthrough, to show (1) how to gain API access credentials, (2) how to call the API from R and (3) how to handle the output.
Presenter(s)
Theresa Küntzler is a PhD candidate at the Graduate School of Decision Sciences at the University of Konstanz. She specializes in information processing and statistical analysis. Her research focuses on the role of emotions in politics, especially in election behaviour.
Materials
Inferential Network Analysis (and Big Data): Challenges and Opportunities
Online-only event
April 21, 2020, 12:00-13:30
Abstract
Why should we take networks seriously? What are the gains and sacrifices of analysing our social world from a network perspective? The talk starts off with this general question and continues with a brief navigation through the methodological world of network analysis. A hands-on guide (in the statistical software R) on how to create, describe, and test patterns in networks follows. The last part of the talk is dedicated to current challenges in network analysis. This encompasses the lack of good and interesting theories in network studies as well as the challenge (yet opportunity) of estimating inferential network models on large datasets.
Presenter(s)
Lisa Lechner is Assistant Professor in Political Science Methodology at the University of Innsbruck. Her research interests are trade policy, tax policy, diffusion, and issue-linkage. Her methodological expertise includes automatic text analysis and network analysis.
Materials
Remote Computing Services: bwCloud, bwHPC, and Beyond
Online-only event
March 31, 2020, 12:00-13:30
Abstract
Demand for computational resources in the social sciences increases steadily, often to the point where local solutions are no longer sufficient. Reasons for this may be long runtimes and large memory requirements, the need for specific hardware, the demand for an optimized software stack, or the desire to parallelize applications over a large number of cores. We will offer an introduction to bwHPC, the federal supercomputing project of the universities in Baden-Württemberg, as well as a comparison to cloud solutions. Specifically, we will focus on the architecture and use of the bwForCluster MLS&WISO, located in Mannheim and Heidelberg. The talk will be aimed at entry-level users, as the system and its logic can be difficult to understand for newcomers, but will leave room for advanced questions.
Presenter(s)
Hendrik Winkhardt is an IT staff member at the University of Mannheim where he works for the bwHPC-S5 project "High Performance Computing in Baden-Württemberg".
Materials
Efficient Data Management in R
Room A-231, A5, 6, 68159 Mannheim
February 18, 2020, 12:00-13:30
Abstract
The software environment R is widely used for data analysis and data visualization in the social sciences and beyond. Additionally, it is becoming increasingly popular as a tool for data and file management. Focusing on the latter aspects, we present workflows and best practices for efficient data management in R. Through applied exercises and walkthroughs, participants will learn about (1) the workflow for organizing and conducting complex analyses in R, (2) creating, editing, and accessing directory hierarchies and their contents, (3) data merging, data management and data manipulation using tidy R and base R, and (4) the basics of programming and debugging.
Presenter(s)
Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and one of the organizers of the MZES Social Science Data Lab. His research focus lies at the intersection of political preference formation, electoral behavior, and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.
Cosima Meyer is a doctoral researcher and lecturer at the University of Mannheim and one of the organizers of the MZES Social Science Data Lab. Motivated by the continuing recurrence of conflicts in the world, her research interest in conflict studies became increasingly focused on post-civil war stability. In her dissertation, she analyzes leadership survival, in particular in post-conflict settings. Using a wide range of quantitative methods, she further explores questions on conflict elections, women's representation as well as autocratic cooperation.
Marcel Neunhoeffer is a PhD Candidate and Research Associate at the chair of Political Science, Quantitative Methods in the Social Sciences, at the University of Mannheim. His research focuses on political methodology, specifically on the application of deep learning algorithms to social science problems. His substantive interests include data privacy, political campaigns, and forecasting elections.
Oliver Rittmann is a PhD Candidate and Research Associate at the chair of Political Science, Quantitative Methods in the Social Sciences, at the University of Mannheim. His research focuses on legislative studies and political representation. His methodological expertise includes statistical modeling, automated text and video analysis, and subnational public opinion estimation.
Materials
Using Web Logs and Smartphone Records for Social Research
Room A-231, A5, 6, 68159 Mannheim
December 10, 2019, 12:00-13:30
Abstract
In this talk, I will demonstrate how web logs (records of individuals' browsing behavior) and records of smartphone use can be used for social research, for example, to study political views and behaviors. First, I will talk about how to obtain such data and how one can extract information about individuals' behavior from web logs. Second, I will present results of my own work (predicting political views and behaviors from web logs) and from other studies that work with similar data (e.g., studies of political polarization and echo chambers in the online world). I will conclude the talk with a short overview of ongoing projects and potentials for future research projects.
Presenter(s)
Ruben Bach is a postdoctoral researcher at the University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.
Materials
Introduction to LaTeX and Overleaf
Room A-231, A5, 6, 68159 Mannheim
November 26, 2019, 12:00-13:30
Abstract
The LaTeX workshop offers an introduction, hands-on practice and a template for scientific articles. The aim is to provide the participants with sufficient knowledge of the general set-up of LaTeX to write (future) papers and to cope with common problems. We cover the LaTeX environment, including packages, structure, and commands. This allows participants to substantially improve their academic workflow. We further provide a template created specifically for this workshop that participants can later use to get started easily with their own projects in LaTeX.
Presenter(s)
Cosima Meyer is a PhD candidate at the Doctoral Center in Social and Behavioral Science of the Graduate School of Economics and Social Sciences, a research associate at the Chair of Political Science IV at the University of Mannheim, and a co-editor of Methods Bites. Her research focuses on conflict studies, particularly post-civil war stability.
Dennis Hammerschmidt is a PhD candidate at the Doctoral Center in Social and Behavioral Science of the Graduate School of Economics and Social Sciences and a research associate at the Chair of Empirical Democracy Research at the University of Mannheim. His research focuses on the alignment structure of states in the international system and the strategic application of foreign aid with a focus on vote-buying in international organizations. His methodological expertise includes general quantitative research, text analysis, and network analysis.
Materials
Causal Graphs
Room A-231, A5, 6, 68159 Mannheim
November 05, 2019, 12:00-13:30
Abstract
This workshop discusses causal graphs as a fundamental modelling framework and highly useful tool for empirical researchers in the social sciences. Questions addressed in interaction with participants include drawing and interpreting a graph, understanding d-separation, the nature of post-treatment bias and other common mistakes in observational studies, the connection of causal graphs to structural models and potential outcomes, and using them to better understand instrumental variable and mediation analysis.
Presenter(s)
Julian Schuessler is a PhD Student at the Graduate School of Decision Sciences at the University of Konstanz, Germany, where he is also affiliated with the Center for Data and Methods. His research focuses on public support for the European Union, political economy, and quantitative methods. His methodological interests include non-parametric causal inference, especially using graphs, and Bayesian statistics.
Materials
Shiny Apps: Development and Deployment
Room A-231, A5, 6, 68159 Mannheim
October 15, 2019, 12:00-13:30
Abstract
Shiny Apps allows developers and researchers to easily build interactive web applications using only the statistical software R. These apps allow R developers to interactively communicate their work to a broader audience in order to facilitate outreach. Since Shiny Apps comes with an extensive backend setup, users do not need extensive web development skills to build and host standalone apps on a homepage. However, for those keen on building beautiful apps, Shiny Apps allows for CSS, HTML and JavaScript extensions. In this workshop, I introduce the Shiny environment and show important features to develop Shiny apps, which can be used either for data presentation, as a communication tool for results or even as an interactive analytical tool. Using the example data sets provided by R, I introduce the distinction between front-end ui.R and back-end server.R required to build Shiny apps. Based upon this, I will introduce important concepts and features to build an interactive app, including control widgets, reactivity and rendering. The participants will be able to build their own Shiny App after this workshop. In the last part of the workshop, I am going to show two ways of deploying Shiny Apps (letting them run on the world wide web), shinyapps.io and Shiny Server.
Presenter(s)
Konstantin Gavras is a Ph.D. candidate at the Graduate School of Economic and Social Sciences in Political Science, research associate at the Chair of Political Psychology at the University of Mannheim and doctoral researcher for the MZES project "Fighting together, moving apart? European common defence and shared security in an age of Brexit and Trump". His research interests comprise the intersection of Social Psychology and Political Behavior, focusing on the behavioral consequences and conditions underlying political attitudes regarding both domestic and foreign policies.
Materials
Randomized Experiments and Randomization Inference
Room A-231, A5, 6, 68159 Mannheim
September 16, 2019, 15:30-17:00
Abstract
Randomization inference is a design-based approach to hypothesis testing, which relies on minimal assumptions and enables the researcher to "analyse as you randomize". Randomization inference considers what would have happened under all possible random assignments (all possible ways of assigning N units to treatment and control). Against the backdrop of all possible random assignments, is the actual experimental result unusual, and how unusual is it? Randomization inference is flexible and allows for the test of different sharp hypotheses, using a variety of test statistics to obtain p-values, which have an intuitive interpretation: the share of random assignments that produce a test statistic as large as or larger than the statistic obtained from the realised experiment. Randomization-inference-based p-values can differ from p-values obtained from conventional tests if samples are small and/or if test statistics are not normally distributed. During the workshop, building on the potential outcomes framework, I will introduce participants to the logic of randomization inference, and discuss applied examples both on the white board and using the ri2 package in R.
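The p-value logic described above can be sketched in a few lines. The example below is a minimal illustration in Python rather than the ri2 package used in the workshop: it assumes complete random assignment of a fixed number of units to treatment, the sharp null hypothesis of no effect for any unit, and the difference in means as the test statistic. The outcome values and function names are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def diff_in_means(outcomes, treated_idx):
    """Difference in mean outcomes between treated and control units."""
    treated = [y for i, y in enumerate(outcomes) if i in treated_idx]
    control = [y for i, y in enumerate(outcomes) if i not in treated_idx]
    return mean(treated) - mean(control)


def ri_p_value(outcomes, treated_idx):
    """One-sided randomization-inference p-value under the sharp null.

    Under the sharp null of no effect, observed outcomes do not depend on
    the assignment, so the statistic can be recomputed for every possible
    assignment of the same number of units to treatment.
    """
    n, m = len(outcomes), len(treated_idx)
    observed = diff_in_means(outcomes, set(treated_idx))
    stats = [diff_in_means(outcomes, set(a)) for a in combinations(range(n), m)]
    # Share of assignments with a statistic at least as large as observed
    # (small tolerance guards against floating-point noise).
    extreme = sum(1 for s in stats if s >= observed - 1e-12)
    return observed, extreme / len(stats)


# Hypothetical data: 8 units, the first 4 were assigned to treatment.
outcomes = [12, 15, 14, 18, 9, 10, 11, 13]
treated = [0, 1, 2, 3]

observed, p = ri_p_value(outcomes, treated)
print("observed difference in means:", observed)
print("one-sided p-value:", p)
```

Full enumeration is only feasible for small experiments; with larger samples, the set of all assignments is typically approximated by a large random sample of assignments.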
Presenter(s)
Florian Foos is an Assistant Professor in Political Behaviour in the Department of Government at the London School of Economics and Political Science (LSE). His research focuses on partisan election campaigns, including electoral mobilization, opinion change and political activism of politicians. His methodological expertise includes the design, conduct, and analysis of randomized field experiments as well as natural and quasi-experiments.
Materials
Introduction to the Potential Outcomes Framework
Room A-231, A5, 6, 68159 Mannheim
September 10, 2019, 12:00-13:30
Abstract
This talk introduces participants to the potential outcomes framework, one of the primary approaches to causality in the social sciences and beyond. The talk covers the basic intuition of counterfactual causality as well as the fundamental problem of causal inference and relates core assumptions of frequently used identification strategies to the potential outcomes framework. A hands-on simulation exercise allows participants to apply the framework to artificial data and to further their understanding of biases in causal quantities of interest when core assumptions are violated.
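A simulation of the kind mentioned above might look like the following sketch. It is not the workshop's exercise: the data-generating process, the constant treatment effect of 2.0, and all names are assumptions made for illustration. Because both potential outcomes are simulated, the true average treatment effect is known, and the naive difference in observed means can be compared under random assignment and under self-selection on a confounder.

```python
import random
from statistics import mean


def simulate(n=10000, seed=1):
    """Simulate (confounder, y0, y1) for n units; the true effect is 2.0."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        ability = rng.gauss(0, 1)        # hypothetical confounder
        y0 = ability + rng.gauss(0, 1)   # potential outcome without treatment
        y1 = y0 + 2.0                    # potential outcome with treatment
        units.append((ability, y0, y1))
    return units


def naive_difference(units, treated_flags):
    """Difference in observed means: only y1 is seen for treated units
    and only y0 for control units (fundamental problem of causal inference)."""
    y_treated = [y1 for (_, _, y1), d in zip(units, treated_flags) if d]
    y_control = [y0 for (_, y0, _), d in zip(units, treated_flags) if not d]
    return mean(y_treated) - mean(y_control)


units = simulate()
true_ate = mean(y1 - y0 for _, y0, y1 in units)

rng = random.Random(2)
random_flags = [rng.random() < 0.5 for _ in units]       # randomized assignment
selected_flags = [ability > 0 for ability, _, _ in units]  # selection on confounder

est_random = naive_difference(units, random_flags)
est_selected = naive_difference(units, selected_flags)

print("true ATE:", round(true_ate, 3))
print("estimate under random assignment:", round(est_random, 3))
print("estimate under self-selection:", round(est_selected, 3))
```

Under random assignment the naive estimate should land close to the true effect, while selection on the confounder violates independence of treatment and potential outcomes and inflates the estimate.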
Presenter(s)
Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focus lies at the intersection of political preference formation, electoral behavior and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.
Materials
2018-2019
Studying Politics on and with Wikipedia
May 06, 2019
Abstract
The online encyclopedia Wikipedia and its sibling, the collaboratively edited knowledge base Wikidata, provide incredibly rich yet largely untapped sources for political research. In this hands-on workshop, I will show how these platforms can inform research on public attention dynamics, policies, political and other events, political elites, and parties, among other things. To that end, I will show how to use R and the packages WikipediR, WikidataR, pageviews, and wikipediatrend to connect with APIs from the Wikimedia Foundation and efficiently access and parse content. Furthermore, I will provide an overview of the legislatoR package, a fully relational individual-level data package that comprises political, sociodemographic, and Wikipedia-related data on elected politicians across the globe.
Presenter(s)
Simon Munzert is a lecturer in Political Data Science at Hertie School of Governance, Berlin. A former member of the MZES Data and Methods Unit, he is the originator of the Social Science Data Lab. His research focuses on public opinion, political representation and the role of new media for political processes.
Materials
Applied Bayesian Statistics using Stan and R
April 17, 2019
Abstract
This 90-minute workshop provides an applied introduction to Stan, a platform for statistical modeling and Bayesian statistical inference. Participants will get an overview of the programming language, the R interface RStan, and the workflow for Bayesian model building, inference, and convergence diagnosis. Applied exercises provide participants with the chance to write and run their own models.
Presenter(s)
Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focus lies at the intersection of political preference formation, electoral behavior and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.
Materials
Collecting and Analyzing Twitter Data Using R
March 27, 2019
Abstract
This 90-minute workshop provides an overview of Twitter data and how to collect and analyse it using R. Participants learn how to access Twitter's API in order to collect data for their own research projects. A number of examples illustrate how to preprocess and analyse the content and meta-information of Tweets.
Presenter(s)
Simon Kühne is a post-doc at Bielefeld University. He holds a BA in Sociology and an MA in Survey Methodology from the University of Duisburg-Essen and a PhD in Sociology from Humboldt University of Berlin. His research focuses on survey methodology, social media and online data, and social inequality.
Materials
Roundtable on Text as Data (Part II)
February 27, 2019
Abstract
- Marius Sältzer: Sentiment Analysis for German Tweets by Election Candidates
- Samuel Müller: Automated Extraction of Reasoning Using Topic Models
- Konstantin Gavras: Inferring Policy Preferences from Strategy Papers on National Security in Europe using Unsupervised Machine Learning Technique
Advancing Text Mining with R and quanteda
January 30, 2019
Abstract
The usefulness of R for text mining and content analysis has greatly increased in recent years, especially following the release of specialized packages such as tm, stringr and tidytext. My interactive presentation will focus on quanteda, which has rapidly become an all-purpose framework for conducting text mining with R due to its high functionality, speed and quality of documentation. I will showcase a number of techniques from corpus compilation and cleaning to the application of dictionaries such as LIWC and Lexicoder Policy Agendas and the application of text scaling models such as Wordscores and Wordfish. I will also show how topic modeling and supervised machine learning for extrapolating content categories can be applied through the topicmodels, STM and RTextTools packages, and point to interfaces with external services such as the Microsoft Cognitive Services and Google Cloud Machine Learning API. My presentation will close with suggestions for improving the robustness and reproducibility of content analyses conducted with R.
Presenter(s)
Cornelius Puschmann is a senior researcher at the Leibniz Institute for Media Research in Hamburg.
Materials
Introduction to R
December 05, 2018
Abstract
This brief introduction to R covers the following topics:
- Algebraic operators and transformation
- Object types and conversions
- Control structures (loops, conditions, etc.)
- Writing simple functions
- Installing, updating, and using packages
- Getting help in R
- Data import and export
- A glimpse of the tidyverse package
- A quick first self-authored package in R
Presenter(s)
Julian Bernauer is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and a co-organizer of the Social Science Data Lab.
Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and a co-organizer of the Social Science Data Lab.
Materials
Topic-centric Sentiment Analysis of UK Parliamentary Debates
November 28, 2018
Abstract
Debate transcripts from the UK House of Commons provide access to a wealth of information concerning the opinions and attitudes of politicians and their parties towards arguably the most important topics facing societies and their citizens, as well as potential insights into the democratic processes that take place within Parliament. In my PhD project, I apply natural language processing and machine learning methods to debate speeches with the aim of determining the attitudes and positions expressed by speakers towards the topics they discuss. In this talk, I will present research on speech-level sentiment analysis and opinion-topic/policy detection in debate motions, as well as ongoing work on compiling a comprehensive review of research from both computer science and social science in this area. I will also discuss the challenges presented and multidisciplinary approaches to the problem, and present ideas for the direction of future investigation.
Presenter(s)
Gavin Abercrombie pursues a PhD in natural language processing at the School of Computer Science, University of Manchester.
Materials
Roundtable on Text as Data (Part I)
November 21, 2018
Abstract
- Dennis Hammerschmidt: "Talk and Action in the United Nations - How Text Analysis can Help to Uncover Vote-Buying in the International Arena"
- Verena Kunz: "Position Blurring as a Response to Competing Principals? Assessing Speech Clarity in the European Parliament"
- Jason Eichorst: "Political Competency Signals in Word Choice"
- Julian Bernauer and Federico Nanni: "Cross-Lingual Topical Scaling of Sparse Political Text using Word Embeddings"
Fast, cheap, but is it still good? An opinionated guide to crowdsourcing platforms in 2018
May 12, 2018
Abstract
In 2008, four prominent Stanford AI researchers published an article "Fast, Cheap - but is it good?" and claimed crowdsourcing can produce very high-quality data for scientific research. A decade has passed and social scientists are picking up the pace to deploy crowdsourcing to collect survey data and conduct content analysis. A new silver bullet is born. In this talk, I will share my experience of using a crowdsourcing platform to conduct a large-scale, multilingual content analysis (a.k.a. crowdcoding). I will briefly go through the promises of those platforms in the literature and then talk about the pitfalls. A realistic conclusion is: it is impossible to obtain data from those platforms that are fast, cheap, and good all at once. As in real life, it is only possible to take at most two out of the three. Sometimes you take none of them.
Presenter(s)
Chung-hong Chan is a Research Associate at the Mannheim Centre for European Social Research (MZES), University of Mannheim.
Materials
2017-2018
Dealing with the complexity of cross-national data: The method of web probing
May 09, 2018
Abstract
There has been a tremendous increase in cross-national data production in social science research in recent decades. Before drawing substantive conclusions based on cross-national survey data, researchers need to verify whether the measures are indeed comparable. An important addition to quantitative measurement invariance tests are qualitative approaches, such as web probing. In the first part, I will discuss why comparability of data should not be assumed but needs to be tested. I will briefly present the different approaches to test for and explain (in)comparability of data, introduce the method of web probing and present studies where web probing could shed light on incomparable data. In the second part of this talk, I will discuss different aspects of the implementation of web probing, such as sample size, nonresponse conversion, the optimal visual design (e.g., textbox size, order of probes) and how to analyze such data.
Results from a Text Scaling Hackathon
April 11, 2018
Abstract
In my talk I'll offer an overview of a shared-task hackathon that took place as part of a research seminar bringing together a variety of experts and young researchers from the fields of political science, natural language processing and computational social science. The task looked at ways to develop novel methods for political text scaling to better quantify political party positions on European integration and Euroscepticism from the transcripts of speeches from three legislative terms of the European Parliament. I will also focus on the potential of hackathons for fostering interdisciplinary collaborations between computer science and the social sciences and the next steps of my research group in this direction.
Materials
This paper summarizes the results of the hackathon. Here is related code for cross-lingual classification and scaling.
Cognitive Pretesting Methods
March 21, 2018
Abstract
This talk highlights the general importance of carrying out cognitive pretests before fielding a questionnaire. This is done by presenting examples of untested as well as pretested and improved questions. With regard to cognitive pretesting methods, we provide an introduction to the traditional cognitive interview (e.g., f2f interviewing) and give an overview of current developments (e.g., combining f2f interviews with eye-tracking, conducting cognitive pretests over the Web). Finally, we discuss the pros and cons of these different cognitive pretesting methods and offer practical advice on how to conduct cognitive pretesting projects.
Quantitative Analysis of Political Text: Tools and Applications
March 14, 2018
Abstract
The workshop introduces concepts and methods for the quantitative analysis of political text (QTA) in R. Speeches delivered by prime ministers during the Euro-Crisis (EUSpeech dataset) serve as an application for the demonstration of text preparation, visualization, scaling, topic models and sentiment analysis. After an introduction of the text corpus and a brief discussion of QTA methods, the participants have the opportunity to carry out some QTA themselves under the instructors' supervision.
Presenter(s)
Denise Traber is a Senior Research Fellow at the University of Lucerne, Switzerland, where she heads an Ambizione research grant project on "The divided people: polarization of political attitudes in Europe" funded by the Swiss National Science Foundation. She has a strong interest in quantitative text analysis, has co-organized the first "Zurich Summer School for Women in Political Methodology" in 2017 and has recently published the article "Estimating Intra-Party Preferences: Comparing Speeches to Votes" in PSRM, jointly with Daniel Schwarz and Ken Benoit.
Materials
Introduction to Social Media's RESTful APIs and data collection with SocialMediaLab
February 21, 2018
Abstract
In this talk, I will demonstrate how to collect data from social media. I will walk through how RESTful APIs work and how to obtain API access rights from Facebook, Twitter and YouTube (optional topic: Sina Weibo). I will also introduce the R package SocialMediaLab, an easy-to-use tool for social media data collection and data transformation.
Materials
Introduction to Structural Equation Modeling
November 29, 2017
Abstract
In our talk, we will introduce participants to the techniques of structural equation modeling (SEM). We will show how a theoretical model represented through measurement models and possibly causal relationships can be applied to empirical data. The talk presents basic models relevant for social scientists: we start with exploratory and confirmatory factor analysis (EFA and CFA) and then move on to path models, latent class models and measurement invariance. We will also show how to use the statistical software Mplus to perform SEM. No previous knowledge of Mplus is required. Workshop participants can download and install Mplus if they want to follow the examples in class. A demo version is available here.
Materials
Social Network Analysis with igraph
November 28, 2017
Abstract
This talk introduces the nuts and bolts of social network analysis and how to do it in R using the package igraph. I will quickly walk through the concept of a graph (social network), common data collection scenarios, and the usual analysis patterns. Getting up close and personal, I will use data scraped from the MZES website as an example to demonstrate how to collect, analyze and visualize the MZES collaboration network. Let's find out who the most important researchers and factions at MZES are, or not.
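One common way to make "most important" concrete is degree centrality: the share of other nodes a node is directly connected to. The following is a minimal sketch in plain Python rather than igraph; the edge list and node labels are purely hypothetical placeholders, not MZES data, and the function name is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical collaboration edges: each pair is one undirected tie.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

def degree_centrality(edges):
    """Return normalized degree centrality for every node in the edge list.

    Nodes without any tie do not appear in an edge list and are ignored.
    Assumes at least one edge, so there are at least two nodes.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    # Divide by the maximum possible number of neighbours (n - 1).
    return {node: d / (n - 1) for node, d in degree.items()}

centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Degree is only one of several centrality notions; betweenness or eigenvector centrality can rank the same nodes differently.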
Materials
Visual Inference for the Social Sciences
October 18, 2017
Abstract
This talk introduces a remedy to the criticism frequently voiced against data visualization and exploration: that it may give rise to an over-interpretation of random patterns. A way to overcome this problem is the realization that "visual discoveries" correspond to the implicit rejection of "null hypotheses". The basic idea of visual inference is that graphical displays can be treated as "test statistics" and compared to a reference distribution of plots under the assumption of the null. Visual inference helps us answer the question "Is what we see really there?" By so doing, it seeks to overcome long-standing reservations against visualization as a merely "informal" approach to data analysis and the fear that beautiful pictures may in fact not correspond to any meaningful patterns of substantive scientific interest. The talk illustrates the application and benefits of this visual method by drawing on examples from the social sciences. A little lab exercise will encourage participants to try out visual inference in practice using the statistical programming language R.
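The comparison against "plots under the null" can be sketched as a lineup: the plot of the observed data is hidden among plots of data simulated under the null hypothesis. The sketch below is in Python rather than the R used in the lab exercise. It assumes a null of no association between x and y, generated by permuting y; the function name make_lineup, the panel count, and the toy data are hypothetical.

```python
import random

def make_lineup(x, y, n_panels=20, seed=0):
    """Hide the observed (x, y) data among null panels.

    Null panels are generated by permuting y, which breaks any
    association with x (the assumed null hypothesis).
    Returns (panels, true_position); panels is a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    true_position = rng.randrange(n_panels)
    panels = []
    for i in range(n_panels):
        if i == true_position:
            panels.append((list(x), list(y)))   # the observed data
        else:
            shuffled = list(y)
            rng.shuffle(shuffled)               # one draw under the null
            panels.append((list(x), shuffled))
    return panels, true_position

# Toy data with a strong linear relationship.
x = list(range(30))
y = [2 * v + 1 for v in x]
panels, true_position = make_lineup(x, y)
```

Each panel would then be drawn as one small plot. If a viewer who has not seen the data picks out the observed panel, that happens with probability 1/20 under the null, which plays the role of a p-value.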
Materials
Using mainly Stata and increasingly R (and knitr)
October 04, 2017
Abstract
Like most of you, probably, I am very familiar with Stata. Throughout my project and dissertation work, however, I came to increasingly incorporate R into my data analysis and even my data preparation. In one instance, I had to run specific models for network analysis that I was not able to run in Stata. I ran the analysis in R but kept doing all of the preceding data preparation in Stata. In another instance, I ran a simulation model in R, which by nature slightly changed its results every time I ran it. As I wanted to avoid a time-consuming copy-and-paste marathon between R and Word, I wrote the manuscript describing this simulation model using knitr, which automatically handed the values, figures, and tables over to a LaTeX processor that produced a nice document. In this talk, I simply describe these developments in my workflow to show how you may gain from incorporating R or knitr in small doses into your Stata workflow.
Materials
2016-2017
Introduction to Unipark
April 26, 2017
Abstract
In this Social Science Data Lab, I will give an introduction to the EFS Survey Software from Unipark (Questback). If you have never worked with the tool, then you will learn how to set up a first questionnaire to collect survey data over the Internet. We will discuss basic principles of participant recruitment, web questionnaire layout, and study design to conduct methodologically sound web surveys. This will also include taking into account the increasing number of respondents who participate in web surveys using their smartphones. For those who have already worked with Unipark, we will have time to discuss more advanced features of the software such as working with quotas, lists, and loops.
Materials
Topic-based and Cross-lingual Scaling of Political Text
March 29, 2017
Abstract
Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models, such as Wordscores and Wordfish, scale texts based on relative word usage; by doing so, they do not take into consideration topical information and cannot be used for cross-lingual analyses. In our talk, we present our efforts toward developing a topic-based and cross-lingual political text scaling approach. First, we introduce our initial work, TopFish, a multi-level computational method that integrates topic detection and political scaling, and show its applicability for temporal aspect analyses of political campaigns (pre-primary elections, primary elections, and general elections). Next, we present a new text scaling approach that leverages semantic representations of text and is suitable for cross-lingual political text scaling. We also propose a simple and straightforward setting for quantitative evaluation of political text scaling.
Materials
Building Infrastructure for Data-Driven Research
March 15, 2017
Abstract
Most methods for data-driven research (including Big Data, Data Science, and Digital Humanities) work primarily on text data or numbers. However, there is also a lot of information which is only available in printed books or newspapers. This information first has to be digitized and then further processed to extract the text or data. The main focus of the talk is optical character recognition (OCR). We will look at the OCR workflow in general, discuss some OCR software, and show how you can use these tools in practice. Building such an infrastructure or performing these initial steps may need a considerable amount of time and resources, or may even be a project in itself. The Mannheim University Library runs several infrastructure projects in this area, which will be briefly mentioned.
Materials
Functional Data Analysis in a Nutshell
February 15, 2017
Abstract
Functional data analysis (FDA) is a field of statistics that deals with the analysis of data that have a functional character. Functional data include curves, images, surfaces and trajectories. In the following, we will focus on curves. Growth curves are an example of one-dimensional functional data observed over time. Other examples are spectrometric measures over wavelength or blood markers measured continuously over time. FDA is applied in diverse fields including biometry, demography, medicine, linguistics and finance. Instead of analyzing single points on the curves, FDA treats the curves as observation units. The talk will approach FDA rather intuitively to give an idea of functional data. The talk covers basic summary statistics, like mean and variance for functional data, and provides an outlook on more complex methods like regression with functional data.
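For curves observed on a common grid, the mean and variance functions can be computed pointwise across curves. The following is a minimal Python sketch under that assumption; the function name and the toy "growth" values are hypothetical, and in practice the curves are usually smoothed or aligned first.

```python
from statistics import mean, variance

def pointwise_summary(curves):
    """Return the mean curve and the variance curve.

    `curves` is a list of equally long lists; curves[i][t] is the value
    of curve i at the t-th point of a shared evaluation grid.
    """
    columns = list(zip(*curves))                    # one column per grid point
    mean_curve = [mean(col) for col in columns]
    var_curve = [variance(col) for col in columns]  # sample variance (n - 1)
    return mean_curve, var_curve

# Three hypothetical growth curves measured at the same four time points.
growth = [
    [50.0, 75.0, 87.0, 96.0],
    [52.0, 77.0, 89.0, 100.0],
    [48.0, 73.0, 85.0, 98.0],
]
mean_curve, var_curve = pointwise_summary(growth)
```

Here each curve is one observation unit, and the summaries are themselves curves over the same grid.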
Materials
Advanced R and Recent Advances in R
January 18, 2017
Abstract
This one-day course sets out to improve your R skills and make you a more efficient programmer. In particular, you will:
- become better at file management with R
- learn all about piping operators
- understand what functional programming means
- get an overview of string processing and regular expressions
- get to know new tools that help you tidy data
- learn how to manipulate data frames efficiently
- be able to routinely split-apply-combine your data
- learn to establish a debugging workflow
Materials
Data Visualization
December 16, 2016
Abstract
Data visualisation is one of the most powerful tools to explore, understand and communicate patterns in quantitative information. At the same time, good data visualisation is a surprisingly difficult task and demands three quite different skills: substantive knowledge, statistical skill, and artistic sense. The course is intended to introduce participants to a) key principles of graphical perception and analytic design, b) useful visualisation techniques for the exploration and presentation of various forms of data and c) new developments of data visualisation for the social sciences, such as visual inference and visualising statistical models.
Materials
Fundamentals in Bayesian Statistics
December 14, 2016
Abstract
Besides the frequentist approach to statistical inference, which was dominant in science in the 20th century, another school exists: Bayesian Statistics. With modern computational techniques, Bayesian data analysis has a proven track record and has established itself as an alternative to frequentist procedures. Sometimes, Bayesian techniques can be applied to complex scientific questions where no frequentist solution exists. This talk gives an introduction to Bayesian statistics. While it is not possible to avoid central mathematical formulas and derivations, I concentrate on concepts, intuitive motivations, and interpretations that underlie the Bayesian view. Critical model assumptions are also discussed. Participants will learn when to mistrust a Bayesian analysis and in which situations it may provide new insights.
Materials
An Open Science Primer for Social Scientists
November 16, 2016
Abstract
"Open Science" has become a buzzword in academic circles. However, exactly what it means, why you should care about it, and - most importantly - how it can be put into practice is often not very clear to researchers. In this session of the SSDL, we will provide a brief tour d'horizon of Open Science in which we touch on all of these issues and by which we hope to equip you with a basic understanding of Open Science and a practical tool kit to help you make your research more open to other researchers and the larger interested public. Throughout the presentation, we will focus on giving you an overview of tools and services that can help you open up your research workflow and your publications, all the way from enhancing the reproducibility of your research and making it more collaborative to finding outlets which make the results of your work accessible to everyone. Absolutely no prior experience with open science is required to participate in this talk, which should lead into an open conversation among us as a community about the best practices we can and should follow for a more open social science.
Materials
Statistical Boosting with mboost
October 19, 2016
Abstract
The talk will be about model-based boosting. Originally, boosting is an algorithm from the field of machine learning. It was further developed to fit statistical regression models, like linear models, generalized linear models and quantile regression models. Boosting can be used in high-dimensional data settings and inherently does variable selection. The first part of the talk will give some background information on boosting and explain the basic ideas. The second part will be on the practical use of the R package mboost, which provides a flexible toolbox to boost regression models.
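Why boosting "inherently does variable selection" can be illustrated with componentwise boosting: in every step, each predictor is fitted separately to the current residuals, and only the best-fitting one has its coefficient updated by a small step. The following plain-Python sketch assumes squared-error loss and simple linear base-learners; it is not the mboost implementation or its API, and the function name, step length, number of steps, and toy data are all hypothetical.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2 boosting with one linear base-learner per predictor.

    X: list of rows of numeric predictors; y: list of outcomes.
    Returns (intercept, coefs); coefs refer to mean-centered predictors.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    intercept = sum(y) / n
    coefs = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(cols):
            ss = sum(v * v for v in col)
            if ss == 0:
                continue  # constant predictor carries no information
            # Least-squares fit of the residuals on predictor j alone.
            beta = sum(v * r for v, r in zip(col, resid)) / ss
            rss = sum((r - beta * v) ** 2 for v, r in zip(col, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Update only the selected predictor, shrunk by the step length nu.
        coefs[best_j] += nu * best_beta
        resid = [r - nu * best_beta * v for v, r in zip(cols[best_j], resid)]
    return intercept, coefs

# Toy data: the outcome depends only on the first predictor.
X = [[1, 2], [2, 1], [3, 2], [4, 1], [5, 2], [6, 1]]
y = [3 * row[0] for row in X]
intercept, coefs = componentwise_l2_boost(X, y)
```

In this toy example the second predictor is never selected, so its coefficient stays at exactly zero while the first approaches 3. The number of steps acts as the main tuning parameter and would normally be chosen by resampling rather than fixed in advance.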
Materials
UCSP: Universal Client-Side Paradata
October 05, 2016
Abstract
The talk will cover the collection of online paradata using the universal client-side paradata script (UCSP). The documentation at http://kaczmirek.de/ucsp/ucsp.html shows which data are collected on the fly in the GESIS panel. We will also hear about EvalAnswer, a tool that helps you automatically code nonresponse in open questions. This in turn can be used to trigger conversion attempts in online surveys as well as to assign nonresponse codes to open answers in existing survey data sets.
Three easy-to-learn tools to scrape data from the Web with R
June 15, 2016
Abstract
This workshop shows how to
- use regular expressions to extract data from raw text (or websites)
- use XPath for static webpage scraping
- tap APIs from within R
- scrape data from dynamic webpages (i.e. JavaScript-generated content) using AJAX and Selenium
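The first item, extracting data from raw text with regular expressions, can be sketched as follows. The example is in Python rather than the R used in the workshop; the pattern and variable names are assumptions for illustration, and the sample text reuses two titles and dates from this listing.

```python
import re

raw = """
Quantitative Analysis of Political Text: Tools and Applications
March 14, 2018
Introduction to Structural Equation Modeling
November 29, 2017
"""

# Month name, day, comma, four-digit year, e.g. "March 14, 2018".
DATE = re.compile(r"\b([A-Z][a-z]+) (\d{1,2}), (\d{4})\b")

# Turn every match into a small record with typed fields.
dates = [
    {"month": m.group(1), "day": int(m.group(2)), "year": int(m.group(3))}
    for m in DATE.finditer(raw)
]
```

The capture groups turn free text into structured fields. The pattern does not check that the first group is a real month name, so it would need tightening for messier input.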
Materials