I am a Political Scientist
A comparativist who studies political parties and elections
Also interested in research methods and design
I just got here…official start date was August 15th
(Affective) Polarization, using Surveys, Observational Data and Experiments
Political Parties - Using Experiments, Mixed Methods and Text-as-Data/NLP
Machine Learning models, especially as applied to political text and speech.
Now let’s talk about you for a minute
The goals of this course are:
Introduce the foundations of quantitative social science (aka computational social science)
Measurement Theory
Probability Theory
Descriptive and Causal Inference
Link QSS foundations to policy relevant questions
Introduce the R coding language for data analysis
Department is updating graduate methods sequence
In practice, this is the first course in a sequence
We will cover roughly 101 level stats –> OLS Regression. Roughly 2 semesters of (rigorous) UG stats.
Spring 2025: advanced regression models + design based techniques for causal inference (DiD, Experiments, etc)
Currently no ML/AI or Text courses, but if you have interest stop by
Not Quant vs Qual
Quant and qual
Mixed methods research often stronger than pure qual or pure quant
New techniques –> wider range of research questions
Source: United States Bureau of Labor Statistics
Early on, the focus is on getting up and running in R
Measurement Theory -> Probability Theory -> Regression
Will require some math
Calculus (derivatives, integrals, limits) + a little matrix algebra
We will review! Not a bad idea to check out Khan Academy or similar if rusty/new
Grad school –> Learning is your responsibility, be proactive.
This course will be hard but…
Don’t stress, grad school is not about grades!
If you are lost…stop me and ask questions
Office hours: By appointment (online or in-person). Please utilize!
Please read! Tons of online resources for both statistics and coding. If you don’t like the readings, feel free to supplement w/ something else.
Joseph Blitzstein and Jessica Hwang, Introduction to Probability
Hadley Wickham et al, R for Data Science (second edition)
Blackwell (Free) or Gelman et al ($ but more user friendly) for Regression
Lauderdale for Measurement
My plan is, once we have a solid foundation, to add roughly one to two social science articles a week, so you can see the connection to doing good research
What the readings are will, in part, depend on your interests. I have a few in mind, but the goal is to draw connections to topics you care about
20% Attendance and Participation
25% Midterm Exam (Likely Take Home)
25% Problem sets (3 or 4)
30% Final Project (Research Proposal with Analyses)
This course has two main aims
Teaching the fundamentals of statistics for social scientists
Teaching the fundamentals of coding for data analysis using R
We will start in on the statistics side next time, the rest of today will be devoted to getting up and running in R
Open Source and Free (unlike STATA, SPSS, SAS, etc)
Widely used in academic and government research
Specifically developed for statistical analysis (unlike Python)
Has a friendly IDE, RStudio (unlike Python, imo)
Everything we do can be done in Python if you prefer. I use both for my work.
But…
Please no SPSS/STATA/SAS
Numeric - 0, 1, 2, 2.25, 3.14, -100
Integer - 1, 2, 3, 4, 5 (written with an L suffix in R, e.g. 1L; a plain 1 is stored as numeric)
Logical - TRUE/FALSE (or T/F)
Character (string) - “The small brown fox”
Factor - Categorical (“South Carolina”, “Georgia”, “Florida”) or Ordinal (“Bad”, “Ok”, “Good”)
Date - “9/2/23”, “2024-5-25”, “13/01/01” or “September 1, 2024”
Not Exhaustive, but these are the basics
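A minimal sketch of how these types look in R, using class() to check what R thinks each object is. The object names are made up for illustration, and reading “9/2/23” as month/day/year is an assumption (it could just as well be day/month).

```r
# One object per basic type (names are illustrative only)
num <- 2.25                      # numeric
int <- 5L                        # integer (note the L suffix)
lgl <- TRUE                      # logical
chr <- "The small brown fox"     # character (string)

# Categorical factor: no natural order
state <- factor(c("South Carolina", "Georgia", "Florida"))

# Ordinal factor: levels listed from lowest to highest
rating <- factor(c("Ok", "Bad", "Good"),
                 levels = c("Bad", "Ok", "Good"),
                 ordered = TRUE)

# Dates: ISO format parses directly; other formats need a format string
day_iso <- as.Date("2024-05-25")
day_us  <- as.Date("9/2/23", format = "%m/%d/%y")  # assumes month/day/year

class(num)      # "numeric"
class(int)      # "integer"
class(lgl)      # "logical"
class(chr)      # "character"
class(state)    # "factor"
class(rating)   # "ordered" "factor"
class(day_iso)  # "Date"
```

Checking class() early is a cheap way to catch a column that was read in as the wrong type.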
We often need to install and load packages to use package specific functions
Sometimes, packages you want may not be in CRAN. There will usually be package specific installation instructions.
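A rough sketch of the install-then-load pattern. The install lines are commented out so the script does not re-download anything on every run, and "someuser/somepackage" is a hypothetical placeholder, not a real repository. The tools package is used for the runnable part only because it ships with R.

```r
# Install once per machine (from CRAN). Uncomment to run.
# install.packages("tidyverse")

# Packages not on CRAN are often installed from GitHub, for example with the
# remotes package. "someuser/somepackage" is a hypothetical placeholder.
# install.packages("remotes")
# remotes::install_github("someuser/somepackage")

# Load a package at the top of every script that uses it.
# "tools" comes with R, so this line runs without installing anything.
library(tools)

# Now package-specific functions are available by name
ext <- file_ext("survey_data.csv")
ext  # "csv"

# A single function can also be called without loading the whole package
tools::file_ext("analysis.R")
```

Installing happens once; loading with library() happens in every new R session.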
What about a * 5?
Some data types cannot be combined
Error in a + "cat": non-numeric argument to binary operator
And some probably shouldn’t be combined
Note - ALWAYS verify that your output makes sense and that your code is performing the correct operations. Errors are ok - but sometimes you’ll make mistakes that don’t cause errors.
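A small sketch of the three cases above. The value of a is an assumption (the slides' own value is not shown here), and TRUE + 5 is just one possible illustration of a combination that runs without complaint but is probably not what you meant.

```r
a <- 3   # assumed value, for illustration only

# Numeric times numeric works as expected
a * 5    # 15

# Numeric plus character fails. tryCatch() captures the error message
# so the rest of the script keeps running.
msg <- tryCatch(a + "cat", error = function(e) conditionMessage(e))
msg      # "non-numeric argument to binary operator"

# Logical plus numeric does NOT fail: TRUE is silently treated as 1
odd_sum <- TRUE + 5
odd_sum  # 6

# Mixing types inside c() also runs silently: everything becomes character
mixed <- c(1, "2", 3)
class(mixed)  # "character"
```

The last two lines are the dangerous kind: no error, but the result may not mean what you think.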
Matrices are just multiple vectors of the same length combined, with dimension row x column
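A minimal sketch with made-up numbers (not real data): two numeric vectors bound into a matrix, followed by what happens when one of the vectors is character.

```r
# Two vectors of the same length (values are invented for illustration)
turnout <- c(0.61, 0.55, 0.67)
margin  <- c(0.04, 0.12, 0.02)

# cbind() combines vectors as columns
m <- cbind(turnout, margin)
dim(m)            # 3 2  (rows x columns)
m[2, "margin"]    # row 2, column "margin"

# A matrix can only hold ONE data type.
# Adding a character vector converts everything to character.
states <- c("South Carolina", "Georgia", "Florida")
mixed  <- cbind(states, turnout)
class(mixed[1, "turnout"])  # "character", the numbers are now text
```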
Data frames solve the main problem with matrices (every element must share one data type) by letting each column have its own type, and they are much better for data manipulation.
We can view the data frame by either calling it directly from the code, or accessing it from our environment panel in RStudio.
Access specific columns with $
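A short sketch of building, viewing, and indexing a data frame. The column names and values are invented for illustration.

```r
# Each column keeps its own type (values are made up)
df <- data.frame(
  state   = c("South Carolina", "Georgia", "Florida"),
  turnout = c(0.61, 0.55, 0.67)
)

# View it by calling it directly
df

# str() shows the type of every column
str(df)

# Access a specific column with $
df$turnout

# Columns behave like vectors, so functions work on them directly
avg_turnout <- mean(df$turnout)
avg_turnout
```

In RStudio, clicking df in the Environment panel (or running View(df)) opens the same data in a spreadsheet-style viewer.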
Stack Exchange - Active community of R users giving advice. Can also pose questions to the community.
R for Data Science - Free book by the creator of the Tidyverse suite of packages.
GPT - GPT is pretty good at R. Will make mistakes, dangerous if you don’t know what you are doing!
Comment your code!
Real example of some code to create a figure
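As a stand-in illustration of commented code (not the figure code from the slides), here is a minimal base R sketch using the built-in mtcars data. The choice of variables and styling is arbitrary.

```r
# Load a dataset that ships with R
data(mtcars)

# Fit a simple linear regression: fuel efficiency as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of the raw data
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",      # x-axis label
     ylab = "Miles per gallon",       # y-axis label
     main = "Heavier cars get fewer miles per gallon",
     pch  = 19)                       # solid points

# Add the fitted regression line on top of the points
abline(fit, col = "red", lwd = 2)
```

Each comment says why a line is there, which is what you will want when you reopen the script in six months.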