The Twitter Parliamentarian Database

Most of the tweet ids from the politicians' tweets were collected between September 2017 and 31 October 2019. For some countries (UK, US and NL), the collection goes back as far as May 2017. In compliance with Twitter's policy, we only store tweet ids, which can be rehydrated into full tweets using existing tools.

We recommend working with the .csv files rather than the SQL tables.

Download

CSV-based dataset
full_member_info.csv 17.8 MB

Information about politicians, their parties, political groups, their electoral systems and their political leanings.

all_tweet_ids.csv 222.2 MB

A list of all tweet ids for tweets made by the politicians. These ids can be used to rehydrate the full tweet and user information from Twitter.

SQL-based dataset
database.sql 3.1 MB

A MySQL database containing politicians, their parties, political groups, their electoral systems and their political leanings.

Database codebook.pdf 130 KB

Information on the relations in the SQL database.

Retweet and mentions networks
Retweets Mentions

Setup

The Twitter Parliamentarian Database can be used through the following two files: full_member_info.csv and all_tweet_ids.csv. We recommend using readily available tools such as Hydrator or Twarc to hydrate the tweet ids in all_tweet_ids.csv into JSON. Each hydrated tweet carries the user id of its author, which can be matched against the uid column in full_member_info.csv to identify the politician who made the tweet.
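The matching step described above can be sketched with the standard library alone, which may help when the hydrated file is too large to load at once. This is a minimal sketch under stated assumptions: the hydrated output is line-delimited JSON (one tweet per line) in the Twitter API v1.1 layout, where the author sits in a nested user object with an id_str field. The function name iter_tweet_authors and the sample tweets are hypothetical and only serve the illustration.

```python
import io
import json


def iter_tweet_authors(lines):
    """Yield (tweet_id, user_id) string pairs from line-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Assumption: v1.1 layout, with the author in a nested "user" object.
        yield tweet["id_str"], tweet["user"]["id_str"]


# Two made-up tweets standing in for a hydrated file.
sample = io.StringIO(
    '{"id_str": "1001", "user": {"id_str": "42"}}\n'
    '\n'
    '{"id_str": "1002", "user": {"id_str": "7"}}\n'
)
pairs = list(iter_tweet_authors(sample))
print(pairs)
```

With a real file, an open file handle can be passed in place of the sample. The user ids are yielded as strings here, so they may need converting to the same type as the uid column before comparing.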

Examples

Python and the Pandas library can be used to wrangle the data once the tweets have been rehydrated:

import pandas as pd
member_info = pd.read_csv('full_member_info.csv', encoding='utf16', engine='python')
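# Assumes the hydrated output is line-delimited JSON (one tweet per line);
# drop lines=True if your file is a single JSON array instead.
tweets = pd.read_json('json_file', lines=True)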

# View the first 10 rows for each file
member_info.head(10)
tweets.head(10)

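# Add the author's user id as a 'uid' column so the two dataframes can be merged.
# The top-level 'id' of a hydrated tweet is the tweet id, not the user id.
# Assuming the Twitter API v1.1 layout, the author is in the nested 'user' object.
tweets['uid'] = tweets['user'].apply(lambda user: user['id'])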

# Please note: full_member_info.csv contains many variables, so we recommend
# selecting only the columns you need before joining the hydrated tweets.
# Always keep the uid column, since it identifies the person who made the tweet.
member_info_filtered = member_info[['country', 'name', 'party', 'uid']].copy()

# Join the member information with the tweets
# (how='left' keeps every politician, including those with no matching tweets)
df = pd.merge(member_info_filtered, tweets, how='left', on='uid')
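
After a join like the one above, it can be worth checking how many rows actually matched. The following is a sketch on invented toy data, not on the real files: the names, ids and the tweet_id column are made up for illustration. It relies on the indicator parameter of pandas' merge, which records where each row came from.

```python
import pandas as pd

# Toy stand-ins for the two dataframes; all values are invented.
member_info_filtered = pd.DataFrame(
    {"uid": [42, 7], "name": ["A. Example", "B. Sample"], "country": ["NL", "UK"]}
)
tweets = pd.DataFrame({"tweet_id": [1001, 1002, 1003], "uid": [42, 42, 99]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
checked = pd.merge(
    member_info_filtered, tweets, how="outer", on="uid", indicator=True
)

# Politicians with no hydrated tweets, and tweets whose author is not in the member file.
members_without_tweets = checked[checked["_merge"] == "left_only"]
tweets_without_member = checked[checked["_merge"] == "right_only"]
print(checked["_merge"].value_counts())
```

A large number of unmatched rows on either side may point to a type mismatch between the two uid columns (for example strings versus integers) rather than to missing data.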