If you use Reddit data for research (or anything else), you’re probably aware that Reddit submission and comment data are available for download (originally via Pushshift). The files are compressed with Zstandard; however, it has not been obvious to me which data fields the .zst files contain. Hence, below are lists of the fields included in the Pushshift files (as far as I’m aware).
Reddit Pushshift Comments fields
all_awardings
associated_award
author
author_created_utc
author_flair_background_color
author_flair_css_class
author_flair_richtext
author_flair_template_id
author_flair_text
author_flair_text_color
author_flair_type
author_fullname
author_patreon_flair
author_premium
awarders
body
can_gild
can_mod_post
collapsed
collapsed_because_crowd_control
collapsed_reason
controversiality
created_utc
distinguished
edited
gilded
gildings
id
is_submitter
link_id
locked
no_follow
parent_id
permalink
quarantined
removal_reason
retrieved_on
score
send_replies
stickied
subreddit
subreddit_id
subreddit_name_prefixed
subreddit_type
total_awards_received
treatment_tags
Reddit Pushshift Submissions fields
all_awardings
allow_live_comments
archived
author
author_flair_background_color
author_flair_css_class
author_flair_template_id
author_flair_text
author_flair_text_color
awarders
can_gild
can_mod_post
category
content_categories
contest_mode
created_utc
discussion_type
distinguished
domain
edited
event_end
event_is_live
event_start
gilded
gildings
hidden
id
is_crosspostable
is_meta
is_original_content
is_reddit_media_domain
is_robot_indexable
is_self
is_video
link_flair_background_color
link_flair_css_class
link_flair_richtext
link_flair_text
link_flair_text_color
link_flair_type
locked
media
media_embed
media_only
no_follow
num_comments
num_crossposts
over_18
parent_whitelist_status
permalink
pinned
pwls
quarantine
removal_reason
removed_by
removed_by_category
retrieved_on
score
secure_media
secure_media_embed
selftext
send_replies
spoiler
stickied
subreddit
subreddit_id
subreddit_name_prefixed
subreddit_subscribers
subreddit_type
suggested_sort
thumbnail
thumbnail_height
thumbnail_width
title
total_awards_received
treatment_tags
url
whitelist_status
wls
P.S. Watchful1 provides useful Python scripts for working with the Zstandard files.
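If you want to check the fields in a particular file yourself, here is a minimal sketch (not one of Watchful1's scripts). It assumes the .zst file holds one JSON object per line and that the third-party `zstandard` package is installed. The helper names `iter_zst_lines` and `count_fields` are my own, and the large `max_window_size` is an assumption about how the dumps were compressed.

```python
import io
import json
from collections import Counter
from itertools import islice


def iter_zst_lines(path):
    """Yield decoded text lines from a Zstandard-compressed file."""
    # Third-party dependency (pip install zstandard), imported lazily so
    # count_fields() below can be used without it.
    import zstandard

    with open(path, "rb") as fh:
        # Assumption: the dumps use a long compression window, so the
        # default window limit may be too small to decompress them.
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        reader = dctx.stream_reader(fh)
        yield from io.TextIOWrapper(reader, encoding="utf-8")


def count_fields(lines, limit=None):
    """Count how often each top-level key appears across JSON lines.

    `limit` caps the number of lines read; None reads everything.
    """
    counts = Counter()
    for line in islice(lines, limit):
        line = line.strip()
        if line:  # skip blank lines
            counts.update(json.loads(line).keys())
    return counts
```

For example, `count_fields(iter_zst_lines("RC_2019-12.zst"), limit=100_000)` would tally the fields seen in the first 100,000 records of a comments file (the path here is only a placeholder). Counting occurrences rather than just collecting names is useful because the available fields may differ between files and even between records.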