Methods Bites

2025-09-24 | Roundtable | Claudia Schmiedeberg, Pablo Christmann, Stefanie Wolter, Michael Bergmann, Arne Bethmann

Roundtable Discussion: New Developments in Large-Scale Survey Data in Germany

Hybrid event [A5, 6, Room A231 + Zoom]
September 24, 2025, 13:45-15:15

Abstract

The roundtable brings together researchers from leading institutes and survey programs in Germany including the German Longitudinal Environmental Study (GLEN), the Family Research and Demographic Panel (FReDA), the Research Data Center of the Institute for Employment Research (IAB), and the Survey of Health, Ageing and Retirement in Europe (SHARE). The discussion will focus on current innovations, challenges, and opportunities in large-scale survey data from the data producers’ and users’ perspective.

Presenter(s)

Claudia Schmiedeberg is a postdoctoral researcher at the Department of Sociology at LMU Munich. Her research focuses on survey methodology, environmental topics, and couple relationships.

Pablo Christmann is the project coordinator of FReDA – The German Family Demography Panel Study and a postdoctoral researcher at GESIS. His main research interests include political attitudes as well as survey methodology and methods.

Stefanie Wolter is a senior researcher at the Research Data Center of the Federal Employment Agency. She is project head of the Linked Personnel Panel, and responsible for linking enterprise and establishment data. Her research focuses on flexible work and within-firm inequality.

Michael Bergmann is a survey methodologist with a doctorate in social sciences from the University of Mannheim. As part of a joint appointment by htw saar and SBI, he works as head of the Survey Methodology department for the Survey of Health, Ageing and Retirement in Europe (SHARE) and as professor of survey methodology at the Faculty of Social Sciences. His research interests include methods for improving the quality of survey data, the investigation of the effects of different survey modes on data quality in panel studies, and the analysis of interviewer behavior. In terms of content, he is primarily interested in issues of health and care needs in old age.

Arne Bethmann is a survey methodologist and data scientist, Country Team Leader, and Principal Investigator for the German sub-study of SHARE. His research advances data quality and infrastructures with a focus on social inequality, poverty, family dynamics, and health.

Materials

Surveying Diversity: Integrating Queer Perspectives in Survey Research

Hybrid event [A5, 6, Room A231 + Zoom]
April 09, 2025, 13:45-15:15

Abstract

Persistent discrimination against LGBTQI* people, along with recent developments such as rising violence and current backlash in some regions, highlights the urgent need for research on the (changing) living conditions of LGBTQI* people. However, current research on LGBTQI* people is often hindered by data limitations and gaps, and many survey providers and researchers struggle to incorporate queer perspectives into surveys adequately. This talk will give an overview of different methods, opportunities, and challenges of integrating queer perspectives in survey research. Based on practical examples and personal experiences, the talk will provide insights into the measurement of sexual orientation and gender identity, different sampling methods for reaching LGBTQI* people, and questionnaire design beyond heteronormativity. Finally, the talk will highlight the challenges and opportunities of data analysis in light of current research and developments.

Presenter(s)

Lisa de Vries holds a PhD from Bielefeld University and is research associate at the German Institute for Adult Education. Her research focuses on labor market discrimination, sexual and gender diversity, the measurement of sexual orientation and gender/sex, and surveying minority groups.

Materials

Hybrid event [A5, 6, Room A231 + Zoom]
September 18, 2024, 13:45-15:15

Hybrid event [A5, 6, Room A231 + Zoom]
December 06, 2023, 13:45-15:15

Abstract

Power analysis is an essential component of designing experiments. It helps researchers allocate sufficient resources to data collection, find the balance between too small and too large an N, and is often required for grant proposals. In this workshop, we will first talk about the basics of power calculations and discuss practical considerations. We will then give hands-on examples of analytical power calculations using the software G*Power and the R package pwr, and will show how to calculate power with simulations.
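As a rough illustration of the simulation-based approach, the sketch below estimates power for a two-group comparison by repeatedly drawing data under an assumed effect and counting how often a test rejects. It is not the workshop's material: the function name `simulate_power`, the normal data-generating process, and the use of a large-sample z approximation (critical value 1.96) instead of an exact t-test are all assumptions made here for brevity.

```python
import random
import statistics


def simulate_power(n_per_group, effect_size, n_sims=2000, seed=1):
    """Estimate power by simulation (hypothetical helper, not from the workshop).

    Assumes two groups with unit-variance normal outcomes, a standardized mean
    difference of `effect_size`, and a two-sided test at alpha = 0.05 based on
    a large-sample z approximation.
    """
    rng = random.Random(seed)
    critical_value = 1.96  # two-sided 5% cutoff of the standard normal
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        std_error = (
            statistics.variance(control) / n_per_group
            + statistics.variance(treated) / n_per_group
        ) ** 0.5
        if abs(diff / std_error) > critical_value:
            rejections += 1
    # Share of simulated studies that reject the null = estimated power
    return rejections / n_sims


# Compare a few candidate sample sizes for an assumed effect of d = 0.5
for n in (20, 64, 100):
    print(n, simulate_power(n, effect_size=0.5))
```

The same loop can be adapted to more complex designs (clustered data, unequal group sizes) where analytical formulas are harder to come by, which is the usual reason to simulate.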

Presenter(s)

Denis Cohen is a Senior Research Fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focuses on spatial inequalities, party competition, and political behavior. His methodological interests include quantitative approaches to the analysis of clustered data, strategies for causal identification, and Bayesian statistics.

Alexander Wenz is a Research Fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES) at the University of Mannheim. His research examines the quality of novel methods of data collection, with a focus on mobile web surveys, smartphone apps, wearable sensors, and digital behavioral data.

Materials

2023-11-15 | Input Talk | Irene Schumm (University of Mannheim) & Ulrich Krieger (University of Mannheim)

Finding, accessing, and re-using research data: The University of Mannheim Research Data Center and BERD@NFDI

Hybrid event [A5, 6, Room A231 + Zoom]
November 15, 2023, 13:45-15:15

Abstract

In the age of data-driven research, the University of Mannheim Research Data Center (FDZ) at the Mannheim University Library bundles together services for researchers interested in research data management. In this talk we will introduce the audience to the FDZ’s core services such as guidelines for Data Management Plans or the data collection via OCR, provide information about its data resources such as the Aktienführer data archive, and (re-)introduce the newest addition to the FDZ: The German Internet Panel data collection infrastructure. In addition, this talk will provide an overview of the University Library's activities within the BERD@NFDI consortium. The second part of the talk will deal with the NFDI consortium BERD@NFDI, which is coordinated by the University Library. BERD@NFDI focuses on the research data management and analysis of unstructured data in Business, Economics and related fields, such as unstructured text from social media, news, images etc. We will present the services under development and how researchers can profit from BERD@NFDI.

Presenter(s)

Irene Schumm is Head of the Research Data Center Department at the Mannheim University Library.

Ulrich Krieger is Coordinator for the BERD@NFDI consortium based at the Mannheim University Library.

Materials

2023-10-18 | Workshop | John 'Jack' Collins (MZES)

A beginner's guide to neural networks for social scientists

Hybrid event [A5, 6, Room A231 + Zoom]
October 18, 2023, 13:45-15:15

Abstract

Neural networks are powerful machine learning algorithms that form the basis of many important technologies, including generative AI and computer vision. However, they are not as straightforward to implement as many other machine learning techniques, like random forest or logistic regression. If you are a researcher interested in applying neural networks, this tutorial will provide an easy introduction to what neural networks are, how they work, and how you can implement a simple one yourself.

Presenter(s)

John 'Jack' Collins is a PhD candidate at the Mannheim Centre for European Social Research (MZES). Jack holds a Master's in Data Science and his research focuses on applying machine learning to survey methodology. Before coming to Mannheim University for his PhD, Jack was an IT consultant.

Materials

2023-09-20 | Workshop | Paulina Pankowska (Utrecht University)

Estimating and correcting for measurement error using hidden Markov models

Hybrid event [A5, 6, Room A231 + Zoom]
September 20, 2023, 13:45-15:15

Abstract

Hidden Markov models (HMMs) are a group of latent class models that allow for the estimation and correction of measurement error in categorical, longitudinal data. The main advantage of these models is the fact that they do not rely on the availability of an error-free data source that is used as a benchmark to validate error-prone data. Instead, these models make use of the availability of multiple measures of the same indicator over time to extract information about the error from the data itself. In this workshop, I will provide an introduction to HMMs, discuss how they work and how they can be used in practice for measurement error correction. I will also show how standard HMMs can be implemented in R and how more complex specifications can be implemented in specialized software, specifically Latent Gold.
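To make the idea concrete, the following sketch computes the likelihood of an observed categorical sequence under a two-state HMM using the forward algorithm. It is only an illustration under stated assumptions: the states (0 = employed, 1 = not employed), all probabilities, and the function name `forward_likelihood` are hypothetical and not taken from the workshop. The emission matrix plays the role of the measurement error model, since its off-diagonal entries are misclassification probabilities.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Likelihood of an observed sequence under an HMM (forward algorithm).

    initial[s]        : probability of starting in latent state s
    transition[r][s]  : probability of moving from latent state r to s
    emission[s][o]    : probability of observing category o in latent state s
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current latent state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical parameters, chosen only for illustration
initial = [0.9, 0.1]
transition = [[0.95, 0.05], [0.30, 0.70]]
emission = [[0.97, 0.03], [0.10, 0.90]]  # off-diagonals = misclassification

# Observed sequence with a one-wave "blip" that may be measurement error
print(forward_likelihood([0, 1, 0], initial, transition, emission))
```

In practice the parameters are unknown and are estimated by maximizing this likelihood over many respondents, which is what dedicated HMM software does.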

Presenter(s)

Paulina Pankowska is an Assistant Professor at the Sociology Department of Utrecht University. Her research relates primarily to data and methods quality in the social sciences. In 2020 Paulina defended her PhD dissertation titled: 'Measurement error: estimation, correction, and analysis of implications', which investigated the feasibility of using hidden Markov models (a latent variable modelling technique) to account for and correct measurement error in survey and administrative data. The project was conducted in collaboration with Statistics Netherlands.

Materials

2022-2023

2023-05-17 | Input Talk | Hannah Bucher, Anne-Kathrin Stroppe, Axel Burger

The GLES Open Science Challenge 2021: A pilot project on the applicability of registered reports in quantitative political science

Hybrid event [A5, 6, Room A231 + Zoom]
May 17, 2023, 13:45-15:15

Abstract

The GLES Open Science Challenge 2021 was a pioneering initiative in quantitative political science. It aimed at increasing the adoption of replicable and transparent research practices. The project combined the rigor of registered reports (a new publication format in which studies are evaluated prior to data collection/access and analysis) with quantitative political science research in the context of the 2021 German federal election. In this presentation, we first elaborate on why more transparent research practices are necessary to guarantee the cumulative progress of scientific knowledge and how registered reports can contribute to increasing the transparency of scientific practice. Next, we present the GLES Open Science Challenge as an example of how registered reports can be applied to secondary data. Finally, we reflect on (a) special challenges of preregistration and registered reports for research based on secondary data, (b) lessons learned in the course of the GLES OSC, and (c) potential future developments in this area.

Presenter(s)

Hannah Bucher is a PhD student in survey research at the University of Mannheim and a research associate at GESIS – Leibniz-Institute for the Social Sciences at the German Longitudinal Election Study (GLES). Together with Anne Stroppe and Axel Burger, she co-organized and co-edited the GLES Open Science Challenge 2021.

Anne-Kathrin Stroppe is a PhD student in political science and a research associate at GESIS - Leibniz-Institute for the Social Sciences for the German Longitudinal Election Study (GLES). Together with Hannah Bucher and Axel Burger, she co-organized and co-edited the GLES Open Science Challenge 2021.

Axel Burger is a social psychologist with a research focus on political psychology and works as a postdoctoral researcher at GESIS - Leibniz-Institute for the Social Sciences in the team of the German Longitudinal Election Study (GLES). Together with Hannah Bucher and Anne Stroppe, he co-organized and co-edited the GLES Open Science Challenge 2021.

Materials

2023-04-26 | Input Talk | Reinhard Schunck (University of Wuppertal) & Nora Huth-Stöckle (University of Wuppertal)

Multiverse analysis

Hybrid event [A5, 6, Room A231 + Zoom]
April 26, 2023, 13:45-15:15

Abstract

Data analysis involves many decisions, including study design, data preparation, and statistical model selection. However, a single analysis represents only one of many possible outcomes, raising questions about the impact of undocumented and at times arbitrary choices. Multiverse analysis addresses this issue by conducting all, or a large set of, meaningful analyses and presenting the results in summary form to assess the robustness of conclusions to alternative modeling decisions. The approach addresses two fundamental problems in research: the lack of transparency and the dependence of analysis results on data-analytic decisions. We will also discuss how to implement the approach, its advantages over more traditional analysis approaches, as well as limitations and open challenges, including statistical inference and computational requirements.
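A minimal sketch of the mechanics, assuming a toy data set and three made-up analytic decisions (outlier handling, outcome transformation, and summary statistic): every combination of choices is run and the resulting estimates are collected side by side. The data values, decision names, and the helper `run_specification` are invented for illustration and do not come from the talk.

```python
import itertools
import math
import statistics

# Synthetic toy data: (group, outcome); values are invented for illustration
data = [
    (0, 2.1), (0, 2.5), (0, 3.0), (0, 2.8), (0, 9.5),
    (1, 3.2), (1, 3.9), (1, 4.1), (1, 3.5), (1, 12.0),
]

# Each analytic decision maps an option label to the function implementing it
decisions = {
    "outliers": {"keep": lambda y: True, "drop_above_8": lambda y: y <= 8},
    "transform": {"raw": lambda y: y, "log": math.log},
    "summary": {"mean": statistics.fmean, "median": statistics.median},
}


def run_specification(keep_row, transform, summary):
    """Group difference in the summarized outcome for one set of choices."""
    groups = {0: [], 1: []}
    for group, outcome in data:
        if keep_row(outcome):
            groups[group].append(transform(outcome))
    return summary(groups[1]) - summary(groups[0])


# The "multiverse": the Cartesian product of all options
results = []
for combo in itertools.product(*(decisions[name].items() for name in decisions)):
    labels = tuple(label for label, _ in combo)
    functions = [function for _, function in combo]
    results.append((labels, run_specification(*functions)))

for labels, estimate in results:
    print(labels, round(estimate, 3))
```

With 2 x 2 x 2 options this yields eight specifications. Real applications typically summarize hundreds or thousands of estimates graphically, for example in a specification curve.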

Presenter(s)

Reinhard Schunck is Professor of Sociology at the University of Wuppertal. He works primarily in the field of social stratification and inequality, concentrating on migration and family related processes, and has a focus on quantitative methods.

Nora Huth-Stöckle is a doctoral student and works at the University of Wuppertal. Her research interests comprise intergroup relations, educational inequality, and quantitative methods.

Materials

2023-03-29 | Input Talk | Christopher Klamm (University of Mannheim)

Transformer-based language models

Hybrid event [A5, 6, Room A231 + Zoom]
March 29, 2023, 13:45-15:15

Abstract

Transformer-based models have recently gained much attention, especially with the release of ChatGPT. Since 2017, deep learning models based on the Transformer architecture have become an important research tool. Their development and application in various fields, including the social sciences, continue to expand. In this talk, we will examine the components that make up these language models and explore how to train state-of-the-art models with HuggingFace for your research. We will also discuss these models' limitations and open challenges, including open-source availability, the growing need for resources, responsibility, and more.

Presenter(s)

Christopher Klamm is an interdisciplinary researcher at the University of Mannheim (Germany) at the Data and Web Science Group working at the intersection of Natural Language Processing and Computational Political Science.

Materials

2023-03-08 | Input Talk | Zaza Zindel (Bielefeld University)

Social media ads for web survey participant recruitment

Hybrid event [A5, 6, Room A231 + Zoom]
March 08, 2023, 13:45-15:15

Abstract

The growing proportion of the global population active on so-called social media platforms opens up new opportunities for survey researchers. A novel approach uses ads on, e.g., Facebook, Instagram, or Twitter to recruit participants for web surveys. Given the growing number of studies that used these platforms, social media appears to be a good resource for promoting surveys and recruiting participants. Regularly cited benefits include reaching large numbers of respondents in a short time at a low cost. Nonetheless, this approach presents a number of challenges, including under-coverage and self-selection, fraud and fake interviews, and problems with weighting survey data.
This workshop will provide insight into the use of social media platforms for survey recruitment. Using various application examples from the field of social research, the potential but also points of criticism will be highlighted. In addition, a fictitious example is used to systematically guide the audience through the most important steps before, during, and after such participant recruitment, thus providing valuable tips for future applications.

Presenter(s)

Zaza Zindel is a doctoral researcher and research assistant in sociology at Bielefeld University. Her research interests include survey methodology, social media and its potential for empirical social research, and social inequalities in general.

Materials

2023-02-22 | Workshop | Andreas Küpfer (TU Darmstadt) & Ruben Bach (MZES)

Getting started with Python: A how-to guide for social scientists (Part II)

Hybrid event [A5, 6, Room A231 + Zoom]
February 22, 2023, 13:45-15:15

Abstract

The merits of Python for social scientists become tangible when working on a concrete use case. In this follow-up event of our Social Science Data Lab workshop series on Python we use Jupyter Notebooks in the Google Colab environment to implement a simple machine learning routine for prediction. To do that, we first take a step-by-step look at the peculiarities of Python such as data wrangling and basic visualization techniques. With that knowledge, we delve into the basics of applied machine learning by implementing the pipeline for both a logistic regression as well as a random forest model using the Python package scikit-learn. We conclude this workshop with a brief outlook on more advanced possibilities with Python to lay the foundation for your own research.

Presenter(s)

Andreas Küpfer is a doctoral researcher at the University of Darmstadt. His interdisciplinary research interests include text as data, applying machine learning technologies, and substantial inference in the fields of political communication and political competition.

Ruben Bach is a postdoctoral researcher at the MZES, University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.

Materials

2023-02-15 | Workshop | Ruben Bach (MZES) & Andreas Küpfer (TU Darmstadt)

Getting started with Python: A how-to guide for social scientists (Part I)

Hybrid event [A5, 6, Room A231 + Zoom]
February 15, 2023, 13:45-15:15

Abstract

Unlike with R, getting started with Python can be burdensome at times as there is no one-stop shop solution like RStudio. Although tons of introductory tutorials for Python are available on the web, navigating and setting up one’s programming environment can be challenging, especially for users with little programming experience. To lower the burden of getting started with Python, we will talk in this workshop about the basics of Python, installing and maintaining virtual environments, and the various graphical user interfaces and integrated development environments out there like Jupyter Notebooks, Google Colab, and Anaconda. We show situations where Python may be beneficial for your research and when you may choose to go with R. Please note that this talk is the first part of a two-day workshop in the Social Science Data Lab. In the second event (February 22, 2023), we will focus our attention on implementing a simple machine learning routine in Python.

Presenter(s)

Materials

2022-12-07 | Input Talk | Theresa Gessler (European University Viadrina)

Extracting political data & relations from Wikidata

Hybrid event [A5, 6, Room A231 + Zoom]
December 07, 2022, 13:45-15:15

Abstract

Political research often involves tedious coding of politically relevant data or relations: political biographies, actors' characteristics or the networks between them. Often, however, this data is already available: Wikidata is a free and open knowledge base that collects historical and contemporary (political) facts as relational data. Yet researchers often hesitate to use these sources due to technical barriers. This introductory talk introduces Wikidata and its potential uses. It then showcases a work-in-progress application that measures the persistence of the legacies of slave-ownership in British politics. Drawing connections between historical slave-owners and historical and present-day MPs makes it possible to quantify these legacies and their persistence over time. Finally, to enable participants to use Wikidata in their own research, the talk includes a practical part on collecting and using Wikidata with R. To follow the practical applications, please bring a laptop with installations of R, RStudio, and the packages dplyr and tidywikidatar.

Presenter(s)

Theresa Gessler is Junior Professor of Comparative Politics at the European University Viadrina Frankfurt (Oder). Her work centers on conflicts around democracy, immigration, digitalization and patterns of party competition. Next to classical political science methods, her research uses text-as-data, webscraping and various types of digital trace data.

Materials

2022-11-16 | Workshop | Oke Bahnsen (University of Mannheim) & Malte Grönemann (University of Mannheim)

Agent-based modeling for social scientists

Hybrid event [A5, 6, Room A231 + Zoom]
November 16, 2022, 13:45-15:15

Abstract

Agent-based computer simulations have gained increasing popularity in many scientific disciplines in the last two decades. But what are they? And what is their appeal for the social sciences? How can they be used by social scientists? In this introductory talk, we first give an overview of the definition and origin of agent-based models and their relation to other types of computer simulation. Second, we show the appeal and usage depending on different research goals with examples from our own work in sociology (namely information diffusion and residential segregation) and political science (namely dynamic multiparty competition).

Presenter(s)

Oke Bahnsen is a doctoral researcher and research associate in political science at the University of Mannheim. He studied economics (M.Sc.) as well as mathematics and political science (M.Ed.) in Kiel and Göteborg. His research focuses on coalition politics and electoral behavior. Methodologically, he is interested in using agent-based modeling to study party competition and opinion dynamics, as well as in experimental research conducted both in the laboratory and in large-scale population-based surveys.

Malte Grönemann is a doctoral researcher and lecturer in sociology at the University of Mannheim. He studied sociology, economics, and statistics in Bonn, Cologne, Mannheim and Linköping. His work focuses on complex social systems which he studies using differential equations and agent-based models. Specifically, he currently works on network diffusion and socio-economic residential segregation. Methodically, he is interested in statistics, visualisation as well as data and research quality in the quantitative social sciences.

Materials

2022-10-26 | Input Talk | Eva Achterhold (LMU Munich)

Investigating fairness in data-driven allocation of public resources

Hybrid event [A5, 6, Room A231 + Zoom]
October 26, 2022, 13:45-15:15

Abstract

Data-driven approaches for the allocation of public resources promise to make fast, reliable, cost-efficient and objective decisions. However, there are also concerns about such approaches. For example, data-driven algorithmic profiling in the context of the allocation of labor market support programs led to public outrage in Austria. Fairness concerns were raised, as gender and citizenship were found to influence allocation decisions. Such systems therefore bear the risk of disparate treatment. In this workshop, we will provide an introduction to fairness notions in machine learning and discuss the possibilities and limitations of technical approaches. A data-driven profiling system for allocating support to jobseekers will be implemented in Python and provided as executable code snippets. Our aim is to discuss and evaluate fairness metrics in a realistic example. Prior knowledge of Python is not necessary and the libraries used are also available in R.
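As a small illustration of what a fairness metric looks like in code, the sketch below computes two common group-level measures for a binary allocation decision: the difference in selection rates (demographic parity) and the difference in true positive rates (equal opportunity). It is not the workshop's code; the toy decisions, outcomes, group labels, and function names are invented here, and plain Python is used instead of a dedicated fairness library.

```python
def rate_by_group(values, groups):
    """Mean of binary `values` within each group label."""
    rates = {}
    for label in sorted(set(groups)):
        members = [v for v, g in zip(values, groups) if g == label]
        rates[label] = sum(members) / len(members)
    return rates


def demographic_parity_difference(decisions, groups):
    """Largest gap in selection rates between groups."""
    rates = rate_by_group(decisions, groups)
    return max(rates.values()) - min(rates.values())


def equal_opportunity_difference(decisions, outcomes, groups):
    """Largest gap in true positive rates between groups.

    Only cases with a positive true outcome (outcome == 1) are considered.
    """
    positives = [(d, g) for d, o, g in zip(decisions, outcomes, groups) if o == 1]
    rates = rate_by_group([d for d, _ in positives], [g for _, g in positives])
    return max(rates.values()) - min(rates.values())


# Invented toy data: 1 = support allocated / support actually needed
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
outcomes = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(decisions, groups))
print(equal_opportunity_difference(decisions, outcomes, groups))
```

The two metrics can disagree, and in general they cannot all be satisfied at once, which is one reason the choice of fairness notion is a substantive rather than a purely technical decision.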

Presenter(s)

Eva Achterhold is a master's student at the Chair for Statistics and Data Science in Social Sciences and the Humanities at LMU Munich. Her research interests include the study of the socio-cultural impact of AI, especially with regard to discrimination and transparency, and the application of methods to mitigate negative consequences. She is currently working on the topic of fairness in algorithmic decision making in the context of allocating support programs to unemployed individuals.

Materials

2022-10-05 | Input Talk | Erik H. Wang (Australian National University)

Matching methods for causal inference with time-series cross-sectional data

Online-only event [Zoom Meeting]
October 05, 2022, 08:30-10:00

Abstract

Matching methods improve the validity of causal inference by reducing model dependence and offering intuitive diagnostics. While they have become a part of the standard tool kit across disciplines, matching methods are rarely used when analyzing time-series cross-sectional data. We fill this methodological gap. In the proposed approach, we first match each treated observation with control observations from other units in the same time period that have an identical treatment history up to the pre-specified number of lags. We use standard matching and weighting methods to further refine this matched set so that the treated and matched control observations have similar covariate values. Assessing the quality of matches is done by examining covariate balance. Finally, we estimate both short-term and long-term average treatment effects using the difference-in-differences estimator, accounting for a time trend. We illustrate the proposed methodology through simulation and empirical studies. An open-source software package is available for implementing the proposed matching methods.
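The first step described above, building a matched set for each treated observation, can be sketched as follows. This is an illustration only and not the authors' software package: the function name `matched_sets`, the toy treatment paths, and the simplification that every observation with treatment status 1 counts as "treated" are assumptions made here. Refinement by covariates and the difference-in-differences estimator are left out.

```python
def matched_sets(treatment, lags):
    """Build matched sets from binary treatment paths.

    treatment : dict mapping unit -> list of 0/1 treatment indicators by period
                (all lists are assumed to have the same length)
    lags      : number of prior periods whose treatment history must be identical

    Returns a dict mapping (treated_unit, period) -> list of control units that
    are untreated in that period and share the same treatment history over the
    preceding `lags` periods.
    """
    sets = {}
    for unit, path in treatment.items():
        for t in range(lags, len(path)):
            if path[t] != 1:
                continue
            history = path[t - lags:t]
            sets[(unit, t)] = [
                other
                for other, other_path in treatment.items()
                if other != unit
                and other_path[t] == 0
                and other_path[t - lags:t] == history
            ]
    return sets


# Invented toy panel: four units observed over four periods
treatment = {
    "A": [0, 0, 1, 1],
    "B": [0, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "D": [0, 0, 0, 1],
}

for key, controls in matched_sets(treatment, lags=2).items():
    print(key, controls)
```

Treated observations with an empty matched set, such as unit A in the last period of this toy panel, have no comparable controls and would typically be dropped from estimation.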

Presenter(s)

Erik H. Wang is an Assistant Professor in the Department of Political and Social Change (PSC) at the Australian National University. His research interests center on historical political economy, politics of state-building, and bureaucracy as well as statistical methods of causal inference.

Materials

2022-09-21 | Roundtable | Camille Landesvatter, Paul C. Bauer, Lion Behrens, Chung-hong Chan, Marie-Lou Sohnius, Domantas Undzėnas, Lukas Isermann

Application programming interfaces for social scientists: A collaborative review

Hybrid event [A5, 6, Room A231 + Zoom]
September 21, 2022, 13:45-15:15

Abstract

Application Programming Interfaces, or APIs for short, are a technology that includes a set of tools allowing users to send and receive data or functionality through a documented interface. Nowadays, not only developers but also social scientists make use of APIs, where typical use cases consist of systematically querying data that are made available by the API. On this occasion, we want to introduce the website "APIs for social scientists: A collaborative review" (Bauer, Landesvatter and Behrens, 2022), which is a collection of examples of different APIs alongside social science examples. The roundtable will be structured into two parts. First, the current editors of the collaborative review introduce the review and its chapters in more general terms. Second, together with our panelists who have authored several chapters in the review, we will discuss various questions surrounding APIs. This includes use cases, opportunities as well as limitations that APIs bring for social science research questions.

Presenter(s)

Camille Landesvatter is a PhD Candidate in Sociology at the University of Mannheim and research associate at the MZES. Her research includes generalized trust and social cohesion for which she draws on methods including survey experiments and text classification. Camille co-founded the API review, is author of multiple chapters and current editor.

Paul C. Bauer is a postdoctoral fellow at the MZES and previous postdoctoral fellow at the European University Institute. His current research focuses on social and political trust as well as polarization for which he draws on experimental methods and a focus on causal inference, text data and data visualization. The API review is only one of many projects he uses to teach topics of computational social science. Paul is the founder of the API review, contributed multiple chapters and is current editor.

Lion Behrens is a PhD Candidate in Political Science at the Graduate School of Economic and Social Sciences (GESS) at the University of Mannheim and a research associate at the Chair of Quantitative Methods in the Social Sciences. His research includes topics of electoral fraud and legislative behavior alongside statistical modeling. Lion contributed a chapter on the CrowdTangle API to the API review and is current editor.

Online-only event [Zoom Meeting]
March 02, 2022, 13:45-15:15

Abstract

Fielding an online survey with an access panel or crowdsourcing platform can be a quick, flexible, and relatively low-cost method of collecting data from the general population. However, social scientists who want to conduct their own survey for the first time may not know where to begin in planning their data collection. In this talk, we will walk through the process of planning and conducting a survey with an online access panel step by step. We will use our own recent data collection as an example, a survey experiment in Germany conducted as part of a replication seminar at the University of Mannheim in 2021. In addition to topics like ethical approval, sampling, and choosing a survey provider, we will also discuss how researchers can work reproducibly by pre-registering their designs and sharing data and code.

Presenter(s)

Johanna Gereke is a postdoctoral research fellow at the Mannheim Centre for European Social Research (MZES). Her current research focuses on intergroup relations, migration, discrimination and cooperative behavior in modern societies and draws on a range of experimental and quasi-experimental methods, including original lab-in-the-field, survey and field experiments.

Joshua Hellyer is a doctoral researcher at the Mannheim Centre for European Social Research (MZES). His research focuses on discrimination against ethnic and sexual minorities, particularly in the housing and labor markets.

Materials

2021-12-08 | Input Talk | Denis Cohen (MZES)

Getting the most out of comparative vote switching data: A new framework for studying dynamic multi-party competition

Online-only event [Zoom Meeting]
December 08, 2021, 13:45-15:15

Abstract

Large literatures on party competition and voting behavior focus on voter reactions to parties' policy strategies, agency, or legislative performance. While many inquiries make explicit assumptions about the direction and magnitude of voter flows between parties, comparative empirical analyses of vote switching remain rare. In this talk, I present a new approach that overcomes three challenges that have previously impeded the comparative study of dynamic party competition based on voter flows: A newly compiled data set that marries comparative vote switching data with information on party behavior and party systems in over 200 electoral contexts across 36 OECD countries, a novel conceptual framework for studying how party behavior affects voter retention, defection, and attraction in multi-party systems, and a statistical model that renders this framework operable. An applied walkthrough showcases the data set and a newly developed R package for the estimation of the newly developed statistical model, along with functions for the calculation and visualization of substantively meaningful quantities of interest.

Presenter(s)

Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim. His research focus lies at the intersection of political preference formation, electoral behavior and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.

Materials

video recording

2021-11-24 | Input Talk | Fabienne Lind (University of Vienna)

Multilingual Automated Text Analysis for Comparative Social Science Research

Online-only event [Zoom Meeting]
November 24, 2021, 13:45-15:15

Abstract

Automated text analysis methods have become popular in computational social science. They appeal because they promise the automated extraction of meaning from large numbers of documents, allowing researchers to better understand the contents and, indirectly, the document creators and audiences. While the existing techniques are well established for English-language text, the situation is different when it comes to the study of text in more than one language and in languages other than English. Yet it is precisely these multilingual techniques that are needed for (country) comparative research designs. This workshop will start by motivating the need for comparative social science studies that base their interpretations on text data. The main part will provide guidance and many practical tips to help plan such research designs. In particular, it will cover considerations related to the definition of comparative research goals, the selection of a case comparative text data set, the definition of concepts, and the creation of a human annotated validation baseline. The workshop will then focus on methodological strategies that can be employed to obtain measurements from a multilingual corpus with automated text analysis methods. All steps will be illustrated with an applied example. The workshop materials, including slides and scripts, will be made available on GitHub.

Presenter(s)

Fabienne Lind is a research associate at the Department of Communication at the University of Vienna as a part of the H2020 project OPTED. Her research interests include political communication and quantitative methods with a focus on quantitative text analysis.

Materials

2021-11-03 | Roundtable | Ruben Bach, Jörg Dollmann, Jennifer Eck, Alejandro Ecker, Johanna Gereke

MZES Roundtable "Collection of Micro-level Data"

Internal online-only event. Open to MZES members and external MZES fellows only.
November 03, 2021, 13:45-15:15

Abstract

The MZES Roundtable "Collection of Micro-level Data" provides a forum for exchanging experiences and for pooling knowledge across several MZES research projects based on original individual-level data collections. It brings together colleagues with expertise in different types of micro-level data, including survey data, field, survey, and lab experiments, social media data, and web tracking data. We will discuss various challenges and opportunities of original micro-level data collection efforts at different points in the project cycle, including initial planning, data collection, data protection, research ethics, analysis, and archiving. With this roundtable, we aim to capture the existing body of knowledge from ongoing, planned, and completed projects, stimulate greater exchange between projects, and strengthen intra-institutional synergies and networks.

The workshop will start with short input presentations summarizing the panelists’ data collection efforts, followed by a moderated discussion and an open Q&A with the audience. Please note that this workshop is open to MZES members and affiliates only.

Presenter(s)

Ruben Bach is a postdoctoral researcher at the University of Mannheim, focusing on social science quantitative research methods. His interests include topics related to big data in the social sciences, machine learning, causal inference, and survey research.

Jörg Dollmann is a research fellow at the Mannheim Centre for European Social Research (MZES) and the project coordinator of the panel survey CILS4EU-DE.

Jennifer Eck is a social psychologist at the University of Mannheim, School of Social Sciences. Her research interests include social exclusion, assimilation and contrast, as well as self-concept.

Alejandro Ecker is Assistant Professor in Politics and Communication in Ibero-America at the Heidelberg Center for Ibero-American Studies (HCIAS) and the Faculty of Economics and Social Sciences at Heidelberg University. Combining observational data with experimental and machine learning methods, his research focuses on the effects of political institutions on the behavior of multiparty governments, political parties, and individual politicians and their consequences for citizen behavior and voter attitudes.

2021-10-20 | Input Talk | Will Lowe (Hertie School of Governance)

Whose scale is it anyway?

Online-only event [Zoom Meeting]
October 20, 2021, 13:45-15:15

Abstract

Exploratory text scaling methods are widely used and have natural multidimensional extensions, but their results require careful interpretation and may simply not reflect constructs of substantive interest. While confirmatory methods have also been suggested, e.g. Wordscores, these are under-theorized such that it is not entirely clear how and when they can be expected to work, or how they might extend to multiple dimensions. Since these questions are currently open, I will review some existing practical solutions to confirmatory scaling, suggest some methods not yet in widespread use, and consider how they might address an as-yet unappreciated and startling artifact of exploratory methods.

Presenter(s)

Will Lowe is Senior Research Scientist at the Hertie School. His research spans legislative politics, political economy, and public policy. Methodologically he is interested in statistical models of text and in causal inference.

Materials

2021-09-22 | Input Talk | Sarah Shugars (New York University)

Networks All the Way Down: Assessing Modeling Choices for Political Conversation

Online-only event [Zoom Meeting]
September 22, 2021, 13:45-15:15

Online-only event
November 18, 2020, 15:30-17:00

Abstract

Communities and funding sources are increasingly demanding reproducibility in scientific work. There are now a variety of tools available to support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we present RENKU: an open-source platform integrating git, Jupyter/RStudio Server, Docker, and analysis workflows linked with a queryable knowledge graph.

Presenter(s)

Christine Choirat is the Chief Innovation Officer of the Swiss Data Science Center and an Adjunct Lecturer on Biostatistics at the Harvard T.H. Chan School of Public Health and at the Harvard Extension School. Her research interests are data science and high-performance computing, reproducible research, and environmental policy and health policy.

Emma Jablonski is a doctoral researcher in the History of Science Program at UC San Diego, CA, USA. Previously, she worked on systems to facilitate computational molecular dynamics research at D. E. Shaw Research and on exoplanet climate modeling in the astrobiology group at NASA GISS, both in New York City. Her research interests include networks and complexity as applied to life in the universe and also to the flow of scientific information through academia and society.

Materials

2020-11-04 | Input Talk | Marcel Neunhoeffer (University of Mannheim)

Generative Adversarial Nets for Social Scientists

Online-only event
November 04, 2020, 13:45-15:15

Abstract

In this talk I introduce Generative Adversarial Networks (GANs) for Social Scientists. GANs are an innovative neural network architecture where two neural networks adversarially learn arbitrary target distributions. A Generator network learns to produce simulated samples that mimic real data. At the same time, a Discriminator network learns to distinguish between real and simulated data. A GAN is successful in producing simulated data if a Discriminator is maximally uncertain about the origins of the data (real or simulated). GANs achieve impressive results in producing synthetic samples from complex data like images (e.g. cats, faces) or audio data (e.g. voices, songs). In this talk, I introduce current applications of GANs and present my work on their use for Social Science research. In particular, I will cover applications to Multiple Imputation, Small Area Estimation and the Generation of fully Synthetic Data. All applications will be accompanied by hands-on code examples.
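The adversarial game described above is commonly written as a minimax objective. As a sketch in the standard notation of the original GAN formulation (the symbols G, D, p_data, p_z, and p_g are notation introduced here, not taken from the talk):

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr],
\qquad
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{g}(x)}
```

Here D(x) is the Discriminator's estimated probability that x is real, and p_g is the distribution of the Generator's samples G(z). For a fixed Generator the best Discriminator is D*, which equals 1/2 everywhere once p_g matches p_data. This is the "maximally uncertain" Discriminator mentioned in the abstract.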

Presenter(s)

Marcel Neunhoeffer is a PhD Candidate and Research Associate at the chair of Political Science, Quantitative Methods in the Social Sciences, at the University of Mannheim. His research focuses on political methodology, specifically on the application of deep learning algorithms to social science problems. His substantive interests include data privacy, political campaigns, and forecasting elections.

Materials

2020-10-21 | Input Talk | Stefan Jünger (GESIS)

Management and Analysis of Georeferenced Survey Data

Online-only event
October 21, 2020, 13:45-15:15

Room A-231, A5, 6, 68159 Mannheim
February 18, 2020, 12:00-13:30

Abstract

The software environment R is widely used for data analysis and data visualization in the social sciences and beyond. Additionally, it is becoming increasingly popular as a tool for data and file management. Focusing on the latter aspects, we present workflows and best practices for efficient data management in R. Through applied exercises and walkthroughs, participants will learn about (1) the workflow for organizing and conducting complex analyses in R, (2) creating, editing, and accessing directory hierarchies and their contents, (3) data merging, data management and data manipulation using tidy R and base R, and (4) the basics of programming and debugging.

Presenter(s)

Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and one of the organizers of the MZES Social Science Data Lab. His research focus lies at the intersection of political preference formation, electoral behavior, and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.

Cosima Meyer is a doctoral researcher and lecturer at the University of Mannheim and one of the organizers of the MZES Social Science Data Lab. Motivated by the continuing recurrence of conflicts in the world, her research interest in conflict studies became increasingly focused on post-civil war stability. In her dissertation, she analyzes leadership survival - in particular in post-conflict settings. Using a wide range of quantitative methods, she further explores questions on conflict elections, women's representation as well as autocratic cooperation.

Oliver Rittmann is a PhD Candidate and Research Associate at the chair of Political Science, Quantitative Methods in the Social Sciences, at the University of Mannheim. His research focuses on legislative studies and political representation. His methodological expertise includes statistical modeling, automated text and video analysis, and subnational public opinion estimation.

Materials

video recording
blog post

2019-12-10 | Input Talk | Ruben Bach (University of Mannheim)

Using Web Logs and Smartphone Records for Social Research

Room A-231, A5, 6, 68159 Mannheim
December 10, 2019, 12:00-13:30

Abstract

In this talk, I will demonstrate how web logs (records of individuals' browsing behavior) and records of smartphone use can be used for social research, for example, to study political views and behaviors. First, I will discuss how to obtain such data and how one can extract information about individuals' behavior from web logs. Second, I will present results of my own work (predicting political views and behaviors from web logs) and from other studies that work with similar data (e.g., studies of political polarization and echo chambers in the online world). I will conclude the talk with a short overview of ongoing projects and potentials for future research projects.

Presenter(s)

Materials

2019-11-26 | Workshop | Cosima Meyer (University of Mannheim) & Dennis Hammerschmidt (University of Mannheim)

Introduction to LaTeX and Overleaf

Room A-231, A5, 6, 68159 Mannheim
November 26, 2019, 12:00-13:30

Abstract

The LaTeX workshop offers an introduction, hands-on practice, and a template for scientific articles. The aim is to provide the participants with sufficient knowledge of the general set-up of LaTeX to write (future) papers and to cope with common problems. We cover the LaTeX environment, including packages, structure, and commands. This allows participants to substantially improve their academic workflow. We further provide an original template, made specifically for this workshop, that participants can later use to get started easily with their own projects in LaTeX.

Presenter(s)

Cosima Meyer is a PhD candidate at the Doctoral Center in Social and Behavioral Science of the Graduate School of Economics and Social Sciences, a research associate at the Chair of Political Science IV at the University of Mannheim, and a co-editor of Methods Bites. Her research focuses on conflict studies, particularly post-civil war stability.

Dennis Hammerschmidt is a PhD candidate at the Doctoral Center in Social and Behavioral Science of the Graduate School of Economics and Social Sciences and a research associate at the Chair of Empirical Democracy Research at the University of Mannheim. His research focuses on the alignment structure of states in the international system and the strategic application of foreign aid with a focus on vote-buying in international organizations. His methodological expertise includes general quantitative research, text analysis, and network analysis.

Materials

2019-11-05 | Workshop | Julian Schuessler (University of Konstanz)

Causal Graphs

Room A-231, A5, 6, 68159 Mannheim
November 05, 2019, 12:00-13:30

Abstract

This workshop discusses causal graphs as a fundamental modelling framework and highly useful tool for empirical researchers in the social sciences. Questions addressed in interaction with participants include drawing and interpreting a graph, understanding d-separation, the nature of post-treatment bias and other common mistakes in observational studies, the connection of causal graphs to structural models and potential outcomes, and using them to better understand instrumental variable and mediation analysis.

Presenter(s)

Julian Schuessler is a PhD Student at the Graduate School of Decision Sciences at the University of Konstanz, Germany, where he is also affiliated with the Center for Data and Methods. His research focuses on public support for the European Union, political economy, and quantitative methods. His methodological interests include non-parametric causal inference, especially using graphs, and Bayesian statistics.

Materials

2019-10-15 | Input Talk | Konstantin Gavras (MZES)

Shiny Apps: Development and Deployment

Room A-231, A5, 6, 68159 Mannheim
October 15, 2019, 12:00-13:30

Abstract

Shiny Apps allows developers and researchers to easily build interactive web applications using only the statistical software R. These apps allow R developers to interactively communicate their work to a broader audience in order to facilitate outreach. Since Shiny Apps comes with an extensive backend setup, users do not need extensive web development skills to build and host standalone apps on a homepage. However, for those keen on building beautiful apps, Shiny Apps allows for CSS, HTML and JavaScript extensions. In this workshop, I introduce the Shiny environment and show important features for developing Shiny apps, which can be used for data presentation, as a communication tool for results, or even as an interactive analytical tool. Using the example data sets that ship with R, I introduce the distinction between the front-end ui.R and the back-end server.R required to build Shiny apps. Based upon this, I will introduce important concepts and features to build an interactive app, including control widgets, reactivity and rendering. The participants will be able to build their own Shiny App after this workshop. In the last part of the workshop, I am going to show two ways of deploying Shiny Apps (letting them run on the World Wide Web): shinyapps.io and Shiny Server.

Presenter(s)

Konstantin Gavras is a Ph.D. candidate at the Graduate School of Economic and Social Sciences in Political Science, research associate at the Chair of Political Psychology at the University of Mannheim and doctoral researcher for the MZES project "Fighting together, moving apart? European common defence and shared security in an age of Brexit and Trump". His research interests comprise the intersection of Social Psychology and Political Behavior, focusing on the behavioral consequences and conditions underlying political attitudes regarding both domestic and foreign policies.

Materials

2019-09-16 | Input Talk | Florian Foos (LSE)

Randomized Experiments and Randomization Inference

Room A-231, A5, 6, 68159 Mannheim
September 16, 2019, 15:30-17:00

Abstract

Randomization inference is a design-based approach to hypothesis testing, which relies on minimal assumptions and enables the researcher to "analyse as you randomize". Randomization inference considers what would have happened under all possible random assignments (all possible ways of assigning N number of units to treatment and control). Against the backdrop of all possible random assignments, is the actual experimental result unusual, and how unusual is it? Randomization inference is flexible and allows for the test of different sharp hypotheses, using a variety of test-statistics to obtain p-values, which have an intuitive interpretation: the share of random assignments that produce a test statistic as large or larger than the statistic obtained from the realised experiment. Randomization-inference-based p-values can differ from p-values obtained from conventional tests if samples are small and/or if test-statistics are not normally distributed. During the workshop, building on the potential outcomes framework, I will introduce participants to the logic of randomization inference, and discuss applied examples both on the white board and using the ri2 package in R.
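The logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ri2 workflow used in the workshop: the function name `randomization_p_value`, the toy outcomes, and the choice of the difference in means as test statistic are all hypothetical, and the test is one-sided under the sharp null hypothesis of no effect for any unit.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(outcomes, treated_idx):
    """One-sided randomization p-value under the sharp null of no effect.

    outcomes:    observed outcome for every unit
    treated_idx: indices of the units that were actually treated
    """
    n = len(outcomes)
    n_treated = len(treated_idx)

    def diff_in_means(treated):
        treated = set(treated)
        t = [outcomes[i] for i in range(n) if i in treated]
        c = [outcomes[i] for i in range(n) if i not in treated]
        return mean(t) - mean(c)

    observed = diff_in_means(treated_idx)

    # Under the sharp null, outcomes are fixed and only the assignment varies,
    # so we can recompute the statistic for every possible assignment.
    null_stats = [diff_in_means(a) for a in combinations(range(n), n_treated)]

    # p-value: share of assignments with a statistic as large or larger
    # (small tolerance guards against floating-point ties).
    p_value = sum(s >= observed - 1e-12 for s in null_stats) / len(null_stats)
    return observed, p_value


# Hypothetical toy experiment: 6 units, the first 3 were treated.
outcomes = [5, 7, 8, 2, 3, 4]
treated_idx = (0, 1, 2)
observed, p_value = randomization_p_value(outcomes, treated_idx)
print(round(observed, 3), p_value)
```

Full enumeration is only feasible for small experiments because the number of possible assignments grows combinatorially. For larger samples one would typically draw a random sample of assignments instead.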

Presenter(s)

Florian Foos is an Assistant Professor in Political Behaviour in the Department of Government at the London School of Economics and Political Science (LSE). His research focuses on partisan election campaigns, including electoral mobilization, opinion change and political activism of politicians. His methodological expertise includes the design, conduct, and analysis of randomized field experiments as well as natural and quasi-experiments.

Materials

Collecting and Analyzing Twitter Data Using R

March 27, 2019

Abstract

This 90-minute workshop provides an overview of Twitter data and how to collect and analyse it using R. Participants learn how to access Twitter's API in order to collect data for their own research projects. A number of examples illustrate how to preprocess and analyse the content and meta-information of Tweets.

Presenter(s)

Simon Kühne is a post-doc at Bielefeld University. He holds a BA in Sociology and an MA in Survey Methodology from the University of Duisburg-Essen and a PhD in Sociology from Humboldt University of Berlin. His research focuses on survey methodology, social media and online data, and social inequality.

Materials

2019-02-27 | Roundtable | Konstantin Gavras, Samuel Müller, Marius Sältzer

Roundtable on Text as Data (Part II)

February 27, 2019

Abstract

Marius Sältzer: Sentiment Analysis for German Tweets by Election Candidates
Samuel Müller: Automated Extraction of Reasoning Using Topic Models
Konstantin Gavras: Inferring Policy Preferences from Strategy Papers on National Security in Europe using Unsupervised Machine Learning Technique

2019-01-30 | Workshop | Cornelius Puschmann (Leibniz Institute for Media Research Hamburg)

Advancing Text Mining with R and quanteda

January 30, 2019

Abstract

The usefulness of R for text mining and content analysis has greatly increased in recent years, especially following the release of specialized packages such as tm, stringr and tidytext. My interactive presentation will focus on quanteda, which has rapidly become an all-purpose framework for conducting text mining with R due to its high functionality, speed and quality of documentation. I will showcase a number of techniques from corpus compilation and cleaning to the application of dictionaries such as LIWC and Lexicoder Policy Agendas and the application of text scaling models such as Wordscores and Wordfish. I will also show how topic modeling and supervised machine learning for extrapolating content categories can be applied through the topicmodels, STM and RTextTools packages, and point to interfaces with external services such as the Microsoft Cognitive Services and Google Cloud Machine Learning API. My presentation will close with suggestions for improving the robustness and reproducibility of content analyses conducted with R.

Presenter(s)

Cornelius Puschmann is a senior researcher at the Leibniz Institute for Media Research in Hamburg.

Materials

2018-12-05 | Workshop | Julian Bernauer (MZES) & Denis Cohen (MZES)

Introduction to R

December 05, 2018

Abstract

This brief introduction to R covers the following topics:

Algebraic operators and transformation
Object types and conversions
Control structures (loops, conditions, etc.)
Writing simple functions
Installing, updating, and using packages
Getting help in R
Data import and export
A glimpse of the tidyverse package
A quick first self-authored package in R

Presenter(s)

Julian Bernauer is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and a co-organizer of the Social Science Data Lab.

Denis Cohen is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and a co-organizer of the Social Science Data Lab.

Materials

2018-11-28 | Input Talk | Gavin Abercrombie (University of Manchester)

Topic-centric Sentiment Analysis of UK Parliamentary Debates

November 28, 2018

Abstract

Debate transcripts from the UK House of Commons provide access to a wealth of information concerning the opinions and attitudes of politicians and their parties towards arguably the most important topics facing societies and their citizens, as well as potential insights into the democratic processes that take place within Parliament. In my PhD project, I apply natural language processing and machine learning methods to debate speeches with the aim of determining the attitudes and positions expressed by speakers towards the topics they discuss. In this talk, I will present research on speech-level sentiment analysis and opinion-topic/policy detection in debate motions, as well as ongoing work on compiling a comprehensive review of research from both computer science and social science in this area. I will also discuss the challenges presented and multidisciplinary approaches to the problem, and present ideas for the direction of future investigation.

Presenter(s)

Gavin Abercrombie pursues a PhD in natural language processing at the School of Computer Science, University of Manchester.

Materials

2018-11-21 | Roundtable | Julian Bernauer, Jason Eichorst, Dennis Hammerschmidt, Verena Kunz, Federico Nanni

Roundtable on Text as Data (Part I)

November 21, 2018

Abstract

Dennis Hammerschmidt: "Talk and Action in the United Nations - How Text Analysis can Help to Uncover Vote-Buying in the International Arena"
Verena Kunz: "Position Blurring as a Response to Competing Principals? Assessing Speech Clarity in the European Parliament"
Jason Eichorst: "Political Competency Signals in Word Choice"
Julian Bernauer and Federico Nanni: "Cross-Lingual Topical Scaling of Sparse Political Text using Word Embeddings"

2018-05-12 | Input Talk | Chung-hong Chan (MZES)

Fast, cheap, but is it still good? An opinionated guide to crowdsourcing platforms in 2018

May 12, 2018

Abstract

In 2008, four prominent Stanford AI researchers published an article "Fast, Cheap - but is it good?" and claimed crowdsourcing can produce very high-quality data for scientific research. A decade has passed and social scientists are picking up the pace to deploy crowdsourcing to collect survey data and conduct content analysis. A new silver bullet is born. In this talk, I will share my experience of using a crowdsourcing platform to conduct a large-scale, multilingual content analysis (a.k.a. crowdcoding). I will briefly go through the promises of those platforms in the literature and then talk about the pitfalls. A realistic conclusion is: it is impossible to obtain data from those platforms that are fast, cheap, and good all at once. As in real life, it is only possible to take at most two out of the three. Sometimes you take none of them.

This talk highlights the general importance of carrying out cognitive pretests before fielding a questionnaire. This is done by presenting examples of untested as well as pretested and improved questions. With regard to cognitive pretesting methods, we provide an introduction to the traditional cognitive interview (e.g., f2f interviewing) and give an overview of current developments (e.g., combining f2f interviews with eye-tracking, conducting cognitive pretests over the Web). Finally, we discuss the pros and cons of these different cognitive pretesting methods and offer practical advice on how to conduct cognitive pretesting projects.

2018-03-14 | Workshop | Denise Traber (University of Lucerne)

Quantitative Analysis of Political Text: Tools and Applications

March 14, 2018

Abstract

The workshop introduces concepts and methods for the quantitative analysis of political text (QTA) in R. Speeches delivered by prime ministers during the Euro-Crisis (EUSpeech dataset) serve as an application for the demonstration of text preparation, visualization, scaling, topic models and sentiment analysis. After an introduction of the text corpus and a brief discussion of QTA methods, the participants have the opportunity to carry out some QTA themselves under the instructors' supervision.

Presenter(s)

Denise Traber is a Senior Research Fellow at the University of Lucerne, Switzerland, where she heads an Ambizione research grant project on "The divided people: polarization of political attitudes in Europe" funded by the Swiss National Science Foundation. She has a strong interest in quantitative text analysis, has co-organized the first "Zurich Summer School for Women in Political Methodology" in 2017 and has recently published the article "Estimating Intra-Party Preferences: Comparing Speeches to Votes" in PSRM, jointly with Daniel Schwarz and Ken Benoit.

Materials

2018-02-21 | Input Talk | Chung-hong Chan (MZES)

Introduction to Social Media's RESTful APIs and data collection with SocialMediaLab

February 21, 2018

Abstract

In this talk, I will demonstrate how to collect data from social media. I will walk through how RESTful APIs work and how to obtain API access rights from Facebook, Twitter and YouTube (optional topic: Sina Weibo). The R package SocialMediaLab will be introduced, which is an easy-to-use tool for social media data collection and data transformation.

Materials

2017-11-29 | Workshop | Nate Breznau (MZES) & Christiane Grill (MZES)

Introduction to Structural Equation Modeling

November 29, 2017

Abstract

In our talk we will introduce participants to the techniques of structural equation modeling (SEM). We will show how a theoretical model represented through measurement models and possibly causal relationships can be applied to empirical data. The talk presents basic models relevant for social scientists: we start with exploratory and confirmatory factor analysis (EFA and CFA) and then move on to path models, latent class models and measurement invariance. In our talk we will also show how to use the statistical software Mplus to perform SEM. No previous knowledge of Mplus is required. Workshop participants can download and install Mplus if they want to follow the examples in class. A demo version is available here.

Materials

2017-11-28 | Input Talk | Chung-hong Chan (MZES)

Social Network Analysis with igraph

November 28, 2017

Abstract

This talk introduces the nuts and bolts of social network analysis, and how to do it in R using the package igraph. In this talk, I will quickly walk through the concept of a graph (social network), the common scenarios of data collection and the usual analysis patterns. Getting up close and personal, I will use the data scraped from the MZES website as an example to demonstrate how to collect, analyze and visualize the MZES collaboration network. Let's find out who the most important researchers and factions at MZES are, or not.

Materials

2017-10-18 | Workshop | Richard Traunmüller (University of Mannheim)

Visual Inference for the Social Sciences

October 18, 2017

Abstract

This talk introduces a remedy to the criticism frequently voiced against data visualization and exploration: that it may give rise to an over-interpretation of random patterns. A way to overcome this problem is the realization that "visual discoveries" correspond to the implicit rejection of "null hypotheses". The basic idea of visual inference is that graphical displays can be treated as "test statistics" and compared to a reference distribution of plots under the assumption of the null. Visual inference helps us answer the question "Is what we see really there?" By so doing, it seeks to overcome long-standing reservations against visualization as a merely "informal" approach to data analysis and the fear that beautiful pictures may in fact not correspond to any meaningful patterns of substantive scientific interest. The talk illustrates the application and benefits of this visual method by drawing on examples from the social sciences. A little lab exercise will encourage participants to try out visual inference in practice using the statistical programming language R.
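One common way to build such a reference distribution of plots is a "lineup": the real data are hidden among panels generated under the null, and a viewer is asked to pick the odd one out. The sketch below only prepares the data for such a lineup and leaves the plotting to the reader. It assumes a null hypothesis of no association between x and y, simulated by permuting y. The function name `make_lineup` and the toy data are hypothetical, and the workshop itself used R.

```python
import random


def make_lineup(x, y, m=20, seed=1):
    """Hide the real (x, y) data among m - 1 null data sets.

    Null data sets permute y, which breaks any association with x while
    keeping both marginal distributions unchanged.
    Returns the list of panels and the position of the real data.
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(m - 1):
        y_perm = y[:]          # copy so the real data stay untouched
        rng.shuffle(y_perm)
        panels.append((x, y_perm))
    true_pos = rng.randrange(m)            # random slot for the real data
    panels.insert(true_pos, (x, y))
    return panels, true_pos


# Hypothetical toy data with a strong linear pattern.
x = list(range(10))
y = [2 * v + 1 for v in x]
panels, true_pos = make_lineup(x, y)
print(len(panels))
```

If the null were true, a viewer would pick the real panel out of m with probability 1/m, so correctly singling it out of 20 panels corresponds to a p-value of 0.05.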

Materials

2017-10-04 | Workshop | Sebastian Pink (MZES)

Using mainly Stata and increasingly R (and knitr)

October 04, 2017

January 18, 2017

Abstract

This one-day course is set out to improve your R skills and make you a more efficient programmer. In particular, you will:

become better at file management with R
learn all about piping operators
understand what functional programming means
get an overview of string processing and regular expressions
get to know new tools that help you tidy data
learn how to manipulate data frames efficiently
be able to routinely split-apply-combine your data
learn to establish a debugging workflow

Materials

2016-12-16 | Workshop | Richard Traunmüller (University of Mannheim)

Data Visualization

December 16, 2016

Abstract

Data visualisation is one of the most powerful tools to explore, understand and communicate patterns in quantitative information. At the same time, good data visualisation is a surprisingly difficult task and demands three quite different skills: substantive knowledge, statistical skill, and artistic sense. The course is intended to introduce participants to a) key principles of graphical perception and analytic design, b) useful visualisation techniques for the exploration and presentation of various forms of data and c) new developments of data visualisation for the social sciences, such as visual inference and visualising statistical models.

Materials

2016-12-14 | Input Talk | Malte Schierholz (MZES)

Fundamentals in Bayesian Statistics

December 14, 2016

Abstract

Besides the frequentist approach to statistical inference, which was dominant in science in the 20th century, another school exists: Bayesian Statistics. With modern computational techniques, Bayesian data analysis has a proven track-record and established itself as an alternative to frequentist procedures. Sometimes, Bayesian techniques can be applied to complex scientific questions where no frequentist solution exists. This talk gives an introduction to Bayesian statistics. While it is not possible to avoid central mathematical formulas and derivations, I concentrate on concepts, intuitive motivations, and interpretations that underlie the Bayesian view. Critical model assumptions are also discussed. Participants will learn when to mistrust a Bayesian analysis and in which situations it may provide new insights.

Materials

2016-11-16 | Input Talk | Eike M. Rinke (MZES)

An Open Science Primer for Social Scientists

November 16, 2016

Abstract

"Open Science" has become a buzzword in academic circles. However, exactly what it means, why you should care about it, and - most importantly - how it can be put into practice is often not very clear to researchers. In this session of the SSDL, we will provide a brief tour d'horizon of Open Science in which we touch on all of these issues and by which we hope to equip you with a basic understanding of Open Science and a practical tool kit to help you make your research more open to other researchers and the larger interested public. Throughout the presentation, we will focus on giving you an overview of tools and services that can help you open up your research workflow and your publications, all the way from enhancing the reproducibility of your research and making it more collaborative to finding outlets which make the results of your work accessible to everyone. Absolutely no prior experience with open science is required to participate in this talk which should lead into an open conversation among us as a community about the best practices we can and should follow for a more open social science.

Materials

2016-10-19 | Workshop | Sarah Brockhaus (LMU Munich)

Statistical Boosting with mboost

October 19, 2016

Abstract

The talk will be about model-based boosting. Boosting originated as an algorithm in the field of machine learning. It was further developed to fit statistical regression models, such as linear models, generalized linear models, and quantile regression models. Boosting can be used in high-dimensional data settings and inherently performs variable selection. The first part of the talk will give some background information on boosting and explain the basic ideas. The second part will cover the practical use of the R package mboost, which provides a flexible toolbox for boosting regression models.

Materials

2016-10-05 | Input Talk | Lars Kaczmirek (University of Vienna)

UCSP: Universal Client-Side Paradata

October 05, 2016

Abstract

This talk covers the collection of online paradata using the universal client-side paradata script (UCSP). To see which data are collected on the fly in the GESIS Panel, see the documentation at http://kaczmirek.de/ucsp/ucsp.html. We will also hear about EvalAnswer, a tool that helps you automatically code nonresponse in open questions. This in turn can be used to trigger conversion attempts in online surveys as well as to assign nonresponse codes to open answers in existing survey data sets.

2016-06-15 | Workshop | Simon Munzert (MZES)

Three easy-to-learn tools to scrape data from the Web with R

June 15, 2016

Abstract

This workshop shows how to

use regular expressions to extract data from raw text (or websites)
use XPath for static webpage scraping
tap APIs from within R
scrape data from dynamic webpages (i.e. JavaScript-generated content) using AJAX and Selenium

Obviously, these are four tools, not three. However, regular expressions are never easy to learn, so the title is still valid.

Materials