Introduction to gerda

Overview

The gerda package provides functions to access and work with GERDA datasets. The German Election Database (GERDA) provides data on German elections spanning federal elections (since 1953 at the county level, 1980 at the municipal level), state (Landtag) elections, local (Kommunal) elections, mayoral (Bürgermeister) elections, European Parliament elections, and county (Kreistag) elections. All election datasets include turnout and vote shares for all major parties. GERDA also supplies geographically harmonized datasets that account for changes in municipal boundaries and mail-in voting districts.

In addition to election results, the package provides county-level socioeconomic covariates from INKAR, municipality-level data from the German Census 2022, and a party crosswalk that maps GERDA party names to standardized ParlGov attributes.

GERDA was compiled by Vincent Heddesheimer, Florian Sichart, Andreas Wiedemann and Hanno Hilbig. For additional information, see also the GERDA website (www.german-elections.com) and the accompanying publication: doi.org/10.1038/s41597-025-04811-5

This vignette will introduce you to the main functions of the package and demonstrate how to use them.

Available Datasets

To see a list of all available GERDA electoral result datasets, you can use the gerda_data_list() function:

gerda_data_list()
#> municipal_unharm                 Local elections at the municipal level (1990-2020, unharmonized).
#> municipal_harm                   Local elections at the municipal level (1990-2020, harmonized).
#> municipal_harm_25                Local elections at the municipal level, harmonized to 2025 boundaries.
#> state_unharm                     State elections at the municipal level (2006-2019, unharmonized).
#> state_harm                       State elections at the municipal level (2006-2019, harmonized).
#> state_harm_21                    State elections at the municipal level, harmonized to 2021 boundaries.
#> state_harm_23                    State elections at the municipal level, harmonized to 2023 boundaries.
#> state_harm_25                    State elections at the municipal level, harmonized to 2025 boundaries.
#> federal_muni_raw                 Federal elections at the municipal level (1980-2025, raw data).
#> federal_muni_unharm              Federal elections at the municipal level (1980-2025, unharmonized).
#> federal_muni_harm_21             Federal elections at the municipal level (1990-2025, harmonized to 2021 boundaries).
#> federal_muni_harm_25             Federal elections at the municipal level (1990-2025, harmonized to 2025 boundaries).
#> federal_cty_unharm               Federal elections at the county level (1953-2021, unharmonized).
#> federal_cty_harm                 Federal elections at the county level (1990-2021, harmonized).
#> county_elec_unharm               County (Kreistag) elections at the municipal level, unharmonized.
#> county_elec_harm_21              County (Kreistag) elections, harmonized to 2021 boundaries.
#> county_elec_harm_21_cty          County (Kreistag) elections aggregated to county level, harmonized to 2021 boundaries.
#> county_elec_harm_21_muni         County (Kreistag) elections at the municipal level, harmonized to 2021 boundaries.
#> european_muni_unharm             European Parliament elections at the municipal level, unharmonized.
#> european_muni_harm               European Parliament elections at the municipal level, harmonized.
#> mayoral_unharm                   Mayoral election results at the municipal level, unharmonized.
#> mayoral_harm                     Mayoral election results at the municipal level, harmonized.
#> mayoral_candidates               Mayoral candidates (person-level).
#> mayor_panel                      Mayor panel (person-level, one row per mayor-term).
#> mayor_panel_harm                 Mayor panel (person-level, harmonized to current boundaries).
#> mayor_panel_annual               Mayor panel at annual frequency (one row per municipality-year).
#> mayor_panel_annual_harm          Mayor panel at annual frequency, harmonized to current boundaries.
#> ags_crosswalks                   Crosswalks for municipalities (1990-2025).
#> cty_crosswalks                   Crosswalks for counties (1990-2025).
#> ags_1990_to_2023_crosswalk       Municipality crosswalk: 1990 boundaries to 2023 boundaries.
#> ags_1990_to_2025_crosswalk       Municipality crosswalk: 1990 boundaries to 2025 boundaries.
#> crosswalk_ags_2021_to_2023       Municipality crosswalk: AGS 2021 to AGS 2023 (targeted).
#> crosswalk_ags_2021_2022_to_2023  Municipality crosswalk: AGS 2021 and 2022 to AGS 2023 (targeted).
#> crosswalk_ags_2023_to_2025       Municipality crosswalk: AGS 2023 to AGS 2025 (targeted; RDS only).
#> crosswalk_ags_2023_24_to_2025    Municipality crosswalk: AGS 2023 and 2024 to AGS 2025 (targeted; RDS only).
#> crosswalk_ags_2024_to_2025       Municipality crosswalk: AGS 2024 to AGS 2025 (targeted; RDS only).
#> ags_area_pop_emp                 Crosswalk covariates (area, population, employment) for municipalities (1990-2025).
#> ags_area_pop_emp_2023            Crosswalk covariates (area, population, employment) for municipalities, harmonized to 2023 boundaries.
#> cty_area_pop_emp                 Crosswalk covariates (area, population, employment) for counties (1990-2025).

This function displays a formatted table with the names and descriptions of all available datasets. You can use the file_name column from this output to specify which dataset you want to load using the load_gerda_web() function.

Loading Data

The main function for loading GERDA data is load_gerda_web(). This function allows you to load a specific dataset from a web source. Here’s an example of how to use it:

# Load the municipal harmonized dataset
municipal_harm_data <- load_gerda_web("municipal_harm", verbose = TRUE, file_format = "rds")

The load_gerda_web() function takes the following parameters:

file_name: A character string with the name of the dataset to load, e.g. "federal_cty_harm" (as shown in the gerda_data_list() output). The function supports fuzzy matching, so close misspellings will produce a helpful suggestion.
verbose: If set to TRUE, it prints messages about the loading process (default is FALSE).
file_format: Specifies the format of the file to load, either "rds" or "csv" (default is "rds"). Both formats return the same tibble, so this choice only affects download size and speed. A few datasets are flagged as "RDS only" in the gerda_data_list() output and are not available as CSV.
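To give a feel for how a "did you mean" suggestion for a misspelled file_name can work, here is a minimal sketch based on edit distance. The helper suggest_dataset() and its max_distance threshold are hypothetical and written only for this illustration; they are not the code used inside load_gerda_web().

```r
# Hypothetical helper: suggest the closest known dataset name for a typo.
# Illustration only; not the implementation used by load_gerda_web().
suggest_dataset <- function(input, known_names, max_distance = 3) {
  # Levenshtein edit distance between the input and every known name
  distances <- utils::adist(input, known_names)[1, ]
  best <- which.min(distances)
  # Only suggest a name when it is reasonably close to the input
  if (distances[best] <= max_distance) known_names[best] else NA_character_
}

# A few names taken from the gerda_data_list() output
known <- c("municipal_harm", "state_harm", "federal_cty_harm", "federal_cty_unharm")

typo_hit <- suggest_dataset("federal_cty_ham", known)  # one letter missing
no_hit   <- suggest_dataset("zzz", known)              # nothing comparable
```

In this sketch the threshold keeps unrelated inputs from producing a misleading suggestion; the package may use a different matching rule.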

Example Workflow

Here’s an example of a typical workflow using the gerda package:

List available datasets:

gerda_data_list()

Load a dataset (in this case, the federal elections at the county level, harmonized):

federal_cty_harm <- load_gerda_web("federal_cty_harm", verbose = TRUE)

Joining GERDA Datasets

If you are using add_gerda_covariates() or add_gerda_census(), you can skip this section: the helpers detect the level of your data and use the correct join keys automatically. If you are writing a manual left_join() or merging against other sources, the table below shows which identifier and time columns each family carries.

Dataset family	Geographic id	Time column
`municipal_*`, `state_*`, `federal_muni_*`, `european_muni_*`, `mayoral_*`	`ags` (8-digit municipality)	`election_year` (+ `election_date` where available)
`federal_cty_harm`	`county_code` (5-digit county)	`election_year`
`federal_cty_unharm`	`county_code` + `ags` alias (see Deprecations)	`election_year` + `year` alias
`county_elec_*` (with `_cty` suffix)	`county_code` (5-digit county)	`election_year`
`county_elec_*` (without `_cty` suffix)	`ags` (8-digit municipality)	`election_year`
`mayor_panel` / `mayor_panel_harm`	`ags` + `person_id`	`election_date`
`mayor_panel_annual` / `mayor_panel_annual_harm`	`ags` + `person_id`	`year`
`gerda_covariates()` (INKAR, county-level)	`county_code` (5-digit county)	`year` (not `election_year`)
`gerda_census()` (Zensus 2022, municipality-level)	`ags` (8-digit municipality)	time-invariant (2022)
`ags_crosswalks`, `ags_1990_to_2023_crosswalk`, `ags_1990_to_2025_crosswalk`, `crosswalk_ags_*`	Pair of AGS codes at source and target vintages	Vintage is encoded in column names
`cty_crosswalks`	Pair of 5-digit county codes at source and target	Vintage is in column names

Two things to watch for when joining manually:

gerda_covariates() uses year, not election_year. If you merge it directly into federal election data, map the two column names in the join: by = c("county_code" = "county_code", "election_year" = "year").
On municipal-level files, the county column is a county name or partial code, not the 5-digit AGS-based county code. Use substr(ags, 1, 5) to extract the county key when you want to join against county-level data.
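Both points can be illustrated with a small base R sketch on toy data. The data frames and all values below are made up for illustration and only mimic the key columns described in the table; merge() with all.x = TRUE is used here as the base R analogue of a dplyr left_join().

```r
# Toy municipal-level election rows (made-up values, not GERDA data).
# AGS codes are kept as character so leading zeros are preserved.
muni <- data.frame(
  ags = c("01001000", "01002000", "05111000"),
  election_year = c(2017, 2017, 2021),
  turnout = c(0.70, 0.72, 0.75),
  stringsAsFactors = FALSE
)

# Toy county-level covariates keyed by county_code and year
covs <- data.frame(
  county_code = c("01001", "01002", "05111"),
  year = c(2017, 2017, 2021),
  unemployment_rate = c(7.1, 6.4, 6.9),
  stringsAsFactors = FALSE
)

# Derive the 5-digit county key from the 8-digit AGS
muni$county_code <- substr(muni$ags, 1, 5)

# Left join with differing time column names (election_year vs. year)
merged <- merge(
  muni, covs,
  by.x = c("county_code", "election_year"),
  by.y = c("county_code", "year"),
  all.x = TRUE
)
```

The merged result keeps all three municipal rows and carries the county-level unemployment_rate alongside the municipal turnout.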

County-Level Covariates

The gerda package includes county-level socioeconomic and demographic covariates from INKAR (Indikatoren und Karten zur Raum- und Stadtentwicklung). These covariates can be easily merged with GERDA election data to enrich your analyses. INKAR data is available from 1995 to 2022, so covariates can be matched to federal elections from 1998 onwards (earlier elections fall outside the INKAR coverage window).

Quick Start

The easiest way to add covariates to your election data is using the add_gerda_covariates() function:

library(gerda)
library(dplyr)

# Load election data and add covariates
merged <- load_gerda_web("federal_cty_harm") %>%
  add_gerda_covariates()

# Your data now includes 30 county-level covariates!

Under the hood, add_gerda_covariates() merges on county code and election year. It automatically:

Detects whether the input is county-level or municipal-level data (and extracts county codes from municipal AGS codes if needed)
Performs a left join, so all election rows are kept and covariates are added where available
Validates that the required columns (county_code or ags, and election_year) are present
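The three steps can be sketched in a few lines of base R. The function sketch_add_covariates() below is a hypothetical re-implementation written for this illustration, run on made-up toy data; the actual add_gerda_covariates() may differ in its details. It assumes each county-year appears at most once in the covariates.

```r
# Hypothetical re-implementation of the three steps, for illustration only
sketch_add_covariates <- function(elections, covariates) {
  # Validate that the time column is present
  if (!"election_year" %in% names(elections)) {
    stop("`election_year` column is required")
  }
  # Detect the level: use county_code directly, or derive it from ags
  if ("county_code" %in% names(elections)) {
    key <- elections$county_code
  } else if ("ags" %in% names(elections)) {
    key <- substr(elections$ags, 1, 5)
  } else {
    stop("either `county_code` or `ags` is required")
  }
  # Left join via match(): every election row is kept, in its original order
  idx <- match(
    paste(key, elections$election_year),
    paste(covariates$county_code, covariates$year)
  )
  value_cols <- setdiff(names(covariates), c("county_code", "year"))
  extra <- covariates[idx, value_cols, drop = FALSE]
  rownames(extra) <- NULL
  cbind(elections, extra)
}

# Toy inputs (made-up values): 1994 falls outside the covariate window
elections <- data.frame(
  ags = c("01001000", "01001000"),
  election_year = c(1994, 2017),
  stringsAsFactors = FALSE
)
covariates <- data.frame(
  county_code = "01001",
  year = 2017,
  unemployment_rate = 7.1,
  stringsAsFactors = FALSE
)

out <- sketch_add_covariates(elections, covariates)
bad_input <- try(sketch_add_covariates(data.frame(x = 1), covariates), silent = TRUE)
```

Because this is a left join, the 1994 row survives with a missing unemployment_rate, while the 2017 row receives its covariate value.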

Available Covariates

The covariates dataset includes 30 variables across 10 categories (for the full list of variable names, units, and descriptions, see gerda_covariates_codebook()):

Demographics: Age structure, foreign population, gender composition
Economy: GDP per capita, sectoral composition, enterprise structure
Labor Market: Unemployment rates (overall, youth, long-term)
Education: School completion rates, students, apprentices
Income: Purchasing power, low-income households
Healthcare: Physician density, hospital beds, GP density
Childcare: Coverage rates for under-3 and 3-6 age groups
Housing: Building permits, rent levels, living space
Transport: Cars per capita
Public Finances: Municipal debt, tax revenue

Viewing the Codebook

To see detailed information about each covariate, including units and missing data patterns:

library(gerda)
library(dplyr)

# Get the codebook
codebook <- gerda_covariates_codebook()
print(codebook)

# Find variables with good coverage (less than 10% missing)
codebook %>%
  filter(missing_pct < 10) %>%
  select(variable, label, category)

Advanced Usage

For more control, you can access the raw covariates data:

library(gerda)
library(dplyr)

# Get raw covariate data
covs <- gerda_covariates()

# Inspect before merging
summary(covs$unemployment_rate)

# Custom merge: covariates use `year`, elections use `election_year`
elections <- load_gerda_web("federal_cty_harm")
merged <- elections %>%
  left_join(covs, by = c("county_code" = "county_code", "election_year" = "year"))

Data Coverage

Counties: 400 German counties (Kreise)
Time period: 1995-2022 (annual data)
Election coverage: Elections from 1998 onwards have covariate data

Coverage varies by variable: core indicators (demographics, economy, labor market) are available for all 7 federal election years (1998-2021). Newer INKAR indicators (e.g., childcare, some healthcare variables) are available for 2-3 recent elections only. Consult the codebook’s missing_pct column to check per-variable availability before analysis.
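Beyond the codebook, it can help to check missingness by election year in the merged data itself. The sketch below uses a made-up toy data frame; unemployment_rate appears in the covariates data, while childcare_under3 is a hypothetical column name chosen for this example.

```r
# Toy merged data (made-up values): one core and one newer indicator
merged <- data.frame(
  election_year = c(1998, 1998, 2017, 2017, 2021, 2021),
  unemployment_rate = c(10.2, 9.8, 5.1, 4.9, 5.0, 5.3),
  childcare_under3 = c(NA, NA, 28.0, 31.5, 30.2, NA)
)

# Share of missing values per election year, one column per covariate
covariate_cols <- c("unemployment_rate", "childcare_under3")
missing_by_year <- sapply(
  merged[covariate_cols],
  function(x) tapply(is.na(x), merged$election_year, mean)
)
```

The result is a matrix with one row per election year, which makes it easy to spot indicators that only become available in later elections.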

Census 2022 Data

The gerda package includes municipality-level data from the German Census 2022 (Zensus 2022). This cross-sectional snapshot covers approximately 10,800 municipalities and can be merged with any GERDA election dataset.

The main advantage of this covariate data is that it is observed at the municipal level (unlike the county-level INKAR data), which allows for more fine-grained analyses of local election outcomes. However, the census is a single time point (2022), so the merged covariates are time-invariant: each municipality receives the same census values for all election years. Users should therefore not conduct analyses that rely on over-time variation in these covariates.
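The time-invariance is easy to see on toy data. The sketch below joins a made-up two-election panel to a made-up census table on ags alone; the column name population_census22 is an assumption for this example, modeled on the _census22 suffix used in the package.

```r
# Toy election panel (made-up values): two municipalities, two elections
elections <- data.frame(
  ags = c("01001000", "01001000", "01002000", "01002000"),
  election_year = c(2017, 2021, 2017, 2021),
  turnout = c(0.70, 0.71, 0.72, 0.74),
  stringsAsFactors = FALSE
)

# Toy census snapshot: one row per municipality, no time column
census <- data.frame(
  ags = c("01001000", "01002000"),
  population_census22 = c(92000, 247000),
  stringsAsFactors = FALSE
)

# The join key is ags only, so each census row is repeated across years
merged <- merge(elections, census, by = "ags", all.x = TRUE)

# Number of distinct census values per municipality
distinct_per_muni <- tapply(
  merged$population_census22, merged$ags,
  function(x) length(unique(x))
)
```

Every municipality has exactly one distinct census value across its election years, so a model with municipality fixed effects would absorb these covariates entirely.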

Quick Start

library(gerda)

# Add census data to municipal-level elections
muni_merged <- load_gerda_web("federal_muni_harm_21") |>
  add_gerda_census()

# Also works with county-level data (aggregated from municipalities)
county_merged <- load_gerda_web("federal_cty_harm") |>
  add_gerda_census()

Available Indicators

The census data includes 14 indicators across four categories:

Demographics: Population, age structure (under 18, 18-29, 30-49, 50-64, 65+)
Migration: Migration background share, foreign nationals share
Households: Average household size
Housing: Total dwellings, vacancy rate, ownership rate, average rent per m², single-family home share

Since the census is a 2022 snapshot, the same values are attached to all election years (see also the note above).

Viewing the Codebook

# Get the census codebook
census_cb <- gerda_census_codebook()
print(census_cb)

Data Coverage

Most census variables have >95% municipality coverage. avg_household_size_census22 has approximately 12.5% missing values because Destatis suppresses data for small municipalities under its disclosure rules.

Party Crosswalk Function

The party_crosswalk() function provides a mapping between GERDA party names and standardized party information from the ParlGov database. This is particularly useful for linking GERDA data with other political science datasets or for obtaining standardized party characteristics.

Usage

The function takes two main parameters:

party_gerda: A character vector of GERDA party names
destination: The name of the column from the ParlGov view_party table to map to

Available Mapping Options

You can map GERDA party names to various standardized party characteristics, including:

left_right: Left-right position scores
party_name_english: English party names
party_name_short: Short party names
country_name: Country names
And many other ParlGov variables

Example

library(gerda)

# Map GERDA party names to left-right positions
parties <- c("cdu", "spd", "linke_pds", "fdp")
left_right_scores <- party_crosswalk(parties, "left_right")
print(left_right_scores)

# Map to English party names
english_names <- party_crosswalk(parties, "party_name_english")
print(english_names)

This function is especially useful when you want to:

Analyze parties along ideological dimensions
Merge GERDA data with other comparative datasets
Standardize party names across different data sources
Access additional party metadata from ParlGov
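Conceptually, a crosswalk of this kind is a vectorized lookup. The sketch below shows the idea with a hypothetical three-row lookup table and a hypothetical toy_crosswalk() function; neither is ParlGov data or the package's implementation, and the handling of unmatched names in party_crosswalk() itself should be checked in its documentation.

```r
# Hypothetical lookup table with illustrative values (not ParlGov data)
lookup <- data.frame(
  party_gerda = c("cdu", "spd", "fdp"),
  party_name_short = c("CDU", "SPD", "FDP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map each input name to the requested column
toy_crosswalk <- function(party_gerda, destination, table = lookup) {
  if (!destination %in% names(table)) {
    stop("unknown destination column: ", destination)
  }
  # match() returns NA for names that are not in the table
  table[[destination]][match(party_gerda, table$party_gerda)]
}

short_names <- toy_crosswalk(c("spd", "cdu", "not_a_party"), "party_name_short")
```

The output has the same length and order as the input vector, which is what makes this pattern convenient for adding a new column to an existing data frame.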
