Study design: Simulation-based power analysis for study design
Aims
By the end of the session you will know how to perform:
Simulation of random variables
Power analysis via simulation
Setup
Create a worksheet document
In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder. If it’s missing, complete Exercise 3 from the Week 1 Worksheet.
Create a new Quarto markdown file (.qmd) for this session (e.g. “Lab_7.qmd”) and work in it to complete the exercises and report on your final analysis.
R packages
```r
# Just in case we get errors asking that the package repositories be explicitly set
# when installing new packages:
options(repos = list(CRAN = "http://cran.rstudio.com/"))

# Install and load required packages
# We can use the {pacman} package to easily install and load several packages
# ({pacman} itself may need installing if not yet installed):
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

pacman::p_load(
  tidyverse, sjlabelled, easystats, ggformula, ggeffects,
  marginaleffects, modelsummary, gtsummary,
  MASS  # for the function `mvrnorm` to generate variables with a pre-specified covariance structure
)
# Note: loading MASS after tidyverse masks dplyr::select(); call dplyr::select() explicitly if needed
```
Exercise 1: Basic data simulation
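```r
# Sample 10,000 values from a standard normal distribution (mean = 0, sd = 1)
rand_norms_10000 <- rnorm(n = 10000, mean = 0, sd = 1)

# Plot the sampled values as a histogram
hist(rand_norms_10000, xlab = "Random value (X)", col = "grey",
     main = "", cex.lab = 1.5, cex.axis = 1.5)
```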
Compare the above distribution with a normal distribution that has a standard deviation of 2 instead of 1.
Sample 10,000 new values with rnorm() using sd = 2 instead of sd = 1, and create a new histogram with hist().
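One possible way to make the comparison is sketched below. The seed, the object names (`rand_norms_sd1`, `rand_norms_sd2`) and the side-by-side layout are choices made for this illustration only, not something prescribed by the worksheet.

```r
set.seed(8005)  # arbitrary seed so the draws are reproducible

# Draw 10,000 values from each distribution
rand_norms_sd1 <- rnorm(n = 10000, mean = 0, sd = 1)
rand_norms_sd2 <- rnorm(n = 10000, mean = 0, sd = 2)

# Use a shared x-axis range so the difference in spread is visible
x_range <- range(c(rand_norms_sd1, rand_norms_sd2))

old_par <- par(mfrow = c(1, 2))  # two plots side by side
hist(rand_norms_sd1, xlim = x_range, xlab = "Random value (X)",
     col = "grey", main = "sd = 1")
hist(rand_norms_sd2, xlim = x_range, xlab = "Random value (X)",
     col = "grey", main = "sd = 2")
par(old_par)  # restore the previous plotting layout

# The sample standard deviations should be close to 1 and 2 respectively
sd(rand_norms_sd1)
sd(rand_norms_sd2)
```

Fixing the x-axis range matters here: with the default axes, hist() rescales each plot to its own data, which can hide the fact that the second distribution is twice as wide.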
To see what the distribution of sampled data might look like given a low sample size (e.g., 10), repeat the process of sampling from rnorm(n = 10, mean = 0, sd = 1) multiple times and look at how the shape of the resulting histogram changes from one draw to the next.
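The repeated small-sample draws could be automated with replicate(), as in the sketch below. The number of repetitions (6), the seed and the names `small_samples` and `sample_means` are assumptions made for this example.

```r
set.seed(8005)  # arbitrary seed so the draws are reproducible

# Draw 6 independent samples of size 10; each column holds one sample
small_samples <- replicate(6, rnorm(n = 10, mean = 0, sd = 1))

old_par <- par(mfrow = c(2, 3))  # 2 x 3 grid of histograms
for (i in seq_len(ncol(small_samples))) {
  hist(small_samples[, i], xlab = "Random value (X)",
       col = "grey", main = paste("Sample", i))
}
par(old_par)  # restore the previous plotting layout

# The sample means also vary from draw to draw at n = 10
sample_means <- colMeans(small_samples)
round(sample_means, 2)
```

Rerunning the block without set.seed() gives a fresh set of histograms each time, which makes the sampling variability at n = 10 easy to see.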
Exercise 2: Simple randomised experiment
Work through Example 1 in the Notes, making the following adjustments:
change the seed number for the simulations to 8005
draw 300 samples instead of 100
once you have done the exercise with a normally distributed grade outcome, try another simulation for a Fail/Pass binary outcome and adjust the model functions accordingly
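Since Example 1 from the Notes is not reproduced here, the following is only a rough sketch of how a simulation-based power analysis for this exercise might look, not the Notes' own code. It reads "300 samples" as 300 simulated experiments. All design values (50 participants per arm, a 5-point treatment effect, a grade standard deviation of 10, pass probabilities of 0.60 and 0.75) and the function names `sim_grade` and `sim_pass` are hypothetical and should be replaced with those used in the Notes.

```r
set.seed(8005)

n_sims  <- 300   # number of simulated experiments (one reading of "300 samples")
n_group <- 50    # assumed number of participants per arm
alpha   <- 0.05  # significance threshold

# One simulated experiment with a normally distributed grade outcome;
# returns the p-value of the treatment coefficient from a linear model
sim_grade <- function(n, effect = 5, mean_control = 60, sd_grade = 10) {
  treat <- rep(c(0, 1), each = n)
  grade <- rnorm(2 * n, mean = mean_control + effect * treat, sd = sd_grade)
  fit <- lm(grade ~ treat)
  summary(fit)$coefficients["treat", "Pr(>|t|)"]
}

# Same design with a binary Fail/Pass outcome (0 = Fail, 1 = Pass);
# the model function changes from lm() to a logistic regression via glm()
sim_pass <- function(n, p_control = 0.60, p_treat = 0.75) {
  treat <- rep(c(0, 1), each = n)
  pass <- rbinom(2 * n, size = 1, prob = ifelse(treat == 1, p_treat, p_control))
  fit <- glm(pass ~ treat, family = binomial(link = "logit"))
  summary(fit)$coefficients["treat", "Pr(>|z|)"]
}

# Repeat each experiment n_sims times and collect the p-values
p_grade <- replicate(n_sims, sim_grade(n_group))
p_pass  <- replicate(n_sims, sim_pass(n_group))

# Estimated power = share of simulated experiments with p < alpha
power_grade <- mean(p_grade < alpha)
power_pass  <- mean(p_pass < alpha)
c(grade = power_grade, pass = power_pass)
```

Under these assumed values the binary outcome tends to give lower power than the continuous one at the same sample size, which is one reason to compare the two versions of the simulation.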
Exercise 3: (advanced) Multivariate distributions
Work through Example 2 in the Notes, and as an exercise, try to expand the work in Exercise 2 above by simulating an additional covariate and modelling joint effects on that data.
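As a starting point for this extension, the sketch below simulates two correlated covariates with MASS::mvrnorm() and fits a model that includes them alongside the treatment indicator. The covariate names (`prior`, `hours`), their correlation of 0.4 and every coefficient in the data-generating model are assumptions made for this illustration; they are not taken from Example 2.

```r
set.seed(8005)

n <- 300  # assumed total sample size

# Assumed covariance structure: two standardised covariates correlated at 0.4
Sigma <- matrix(c(1.0, 0.4,
                  0.4, 1.0), nrow = 2, byrow = TRUE)

# MASS::mvrnorm() draws from a multivariate normal with the given covariance;
# calling it with the MASS:: prefix avoids attaching the whole package
covs <- MASS::mvrnorm(n = n, mu = c(0, 0), Sigma = Sigma)

sim_data <- data.frame(
  treat = rbinom(n, size = 1, prob = 0.5),  # random assignment to treatment
  prior = covs[, 1],                        # hypothetical prior attainment
  hours = covs[, 2]                         # hypothetical study hours
)

# Assumed data-generating model for the grade outcome
sim_data$grade <- 60 + 5 * sim_data$treat + 3 * sim_data$prior +
  2 * sim_data$hours + rnorm(n, mean = 0, sd = 10)

# Check that the simulated covariates have roughly the intended correlation
cor(sim_data$prior, sim_data$hours)

# Model the joint effects of treatment and both covariates
fit <- lm(grade ~ treat + prior + hours, data = sim_data)
summary(fit)
```

Wrapping the simulation and model fit in a function and calling it with replicate(), as in the power analysis for Exercise 2, would turn this into a power analysis for the adjusted model.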