Using STATA for the self-controlled case series method

Our tutorial paper explains how to carry out a case series analysis using STATA. The files corresponding to the examples in this paper are given here.

Correction to section 4.3 of the tutorial paper: It is no longer necessary to download the file 'aglm.ado', as the standard STATA command 'xtpoisson' can be used to fit Poisson regression models with absorbing factors. The following command fits a standard case series model:

xi: xtpoisson nevents i.exgr i.agegr, fe i(indiv) offset(loginterval)

All the example do-files below now use 'xtpoisson'.
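
As a minimal sketch of the data layout this command expects, the block below builds a small invented dataset (not one of the tutorial data sets) with one row per individual per interval, creates the offset and fits the model. The variable interval (interval length in days) and all the values are assumptions made for illustration; the irr option is added only to report exponentiated coefficients.

```stata
* Toy data, invented for illustration: one row per individual per interval.
* interval is a hypothetical variable holding the interval length in days.
clear
input indiv exgr agegr interval nevents
1 0 1 100 0
1 1 1  21 1
1 0 2 244 0
2 0 1 150 1
2 1 1  21 0
2 0 2 194 0
3 0 1 200 0
3 1 2  21 0
3 0 2 144 1
4 0 1 180 0
4 0 2  50 0
4 1 2  21 1
4 0 2 114 0
5 0 1 120 0
5 1 1  21 1
5 0 2 224 0
6 0 1 182 1
6 0 2 183 0
7 0 1  90 0
7 1 1  21 0
7 0 2 254 1
end

* The offset is the log of the interval length
generate loginterval = log(interval)

* Fixed-effects Poisson regression with indiv as the absorbed factor;
* irr displays the coefficients as relative incidences
xi: xtpoisson nevents i.exgr i.agegr, fe i(indiv) offset(loginterval) irr
```

With so few cases the estimates are very imprecise; the point is only the shape of the data and the form of the call.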

MMR and meningitis in Oxford example

To run the MMR and meningitis in Oxford example detailed in the tutorial paper, save these two files in your STATA working directory:

'oxford.dta', the data in a STATA data file.

'oxford.do', the commands in a STATA do file.

Then 'do' the do file (type do oxford in the command window, or open in the STATA do file editor and press Ctrl+D or click on 'Do current file'). Details of what the commands in the file are doing are given in the file itself as well as in the tutorial paper.

Files for ITP and MMR examples in the tutorial paper

Save 'itp.dta', the ITP and MMR data, along with the do files into your STATA working directory.

Multiple risk periods

(Also Multiple events)


Unique and non-independent events


Files for intussusception and oral polio vaccine examples in the tutorial paper

Save 'intuss.dta', the intussusception and OPV data, along with the do files into your STATA working directory.

Event dependent exposure




Covariates and Interactions


Repeat exposures



Multiple exposures


Semi-parametric model

The semi-parametric version of the analysis for the MMR and meningitis in Oxford data is in 'oxford_sp.do' and for the ITP and MMR data it is in 'itp_sp.do'.

The semi-parametric version of analysis 1 of the OPV and intussusception data is more computationally intensive; the do file is 'intuss_sp.do'.

These do files require the following variable names: sta for the day before the start of the observation period, end for the end of the observation period, indiv for the individual identifier, and eventday for the day the adverse event occurred. Generate the exposure group cut points, naming them excp1, excp2, excp3 and so on, and put the number of exposure group cut points minus 1 (usually the number of exposure groups) into a local macro named nexgr. As in the parametric models, exposure group factors are generated as a list of increasing ordinals with a zero on the end, so if the exposure groups do not occur one straight after another, they must be corrected using a recode function as described in section 6.6 (repeat exposures) of the tutorial paper.
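
The naming requirements above can be sketched as follows. This is a hedged illustration on invented data: the starting variable names (id, startday, stopday, mmr, event) and the single risk period of days 15 to 35 after vaccination are assumptions, not taken from the do files.

```stata
* Toy data, invented for illustration: ages in days at the first and last
* day of observation, at MMR vaccination and at the event
clear
input id startday stopday mmr event
1 366 730 400 420
2 366 730 500 610
3 366 730 450 380
end

* Rename to the names the semi-parametric do files expect
rename id indiv
rename stopday end
rename event eventday

* sta is the day BEFORE the first day of observation
generate sta = startday - 1

* Cut points for one assumed risk period, days 15 to 35 after MMR:
* the day before the period starts and the last day of the period
generate excp1 = mmr + 14
generate excp2 = mmr + 35

* Number of cut points minus 1 (here one exposure group)
local nexgr = 2 - 1
display "nexgr = `nexgr'"
```

Because a local macro exists only within the do file or interactive session that defines it, nexgr needs to be set in the same do file that uses it.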

Analysis for censored, perturbed or curtailed post-event exposures

These do files were used to fit analysis 4 of the validation study in the paper 'Case series analysis for censored, perturbed or curtailed post-event exposures'. The files use the intussusception and OPV data from the tutorial paper.


'intuss_cens_pseudo.do' contains just the pseudo-likelihood. It can be bootstrapped to obtain confidence intervals; the bootstrap command is shown at the end of 'intuss_censored.do', but it is preceded by a * so that it does not run, because it is very slow.

'aglm.ado', 'xtpoisson' and fitting a GLM with absorbing factors

It has been pointed out to us (many thanks to Therese Stukel of the Institute for Clinical Evaluative Sciences, Canada) that it is possible to fit a log-linear model with absorbing factors using the standard STATA command 'xtpoisson'. The examples above now all use this command rather than 'aglm.ado'. However, we have kept the old information on the aglm ado file here:

To use 'aglm.ado', save the file, which fits a GLM with absorbing factors, either into 'stata8/ado/base/a' or into your STATA working directory. This file was created by amending the ado file that fits a GLM, 'glm.ado', in STATA 8.

For most of the examples, once the data are in the correct format, the case series model can be fitted using:

xi: aglm nevents i.exgr i.agegr, offset(loginterval) family(poisson) irls eform

For those wishing to understand what 'aglm.ado' does, these are the changes made to 'glm.ado' to create 'aglm.ado':

  1. Open 'glm.ado' in a text editor.
  2. On line 2 replace glm with aglm, so that the line reads program aglm, eclass byable(onecall).
  3. First we change the call to regress in the IRLS option to a call to areg. On line 762 replace regress with areg. On line 763 replace [iw=`W' with [aw=`W', delete mse1 and insert absorb(indiv) in its place. Lines 762-3 should now read cap areg `z' `xvars' /* */ [aw=`W'*`wt'/`Wscale'], absorb(indiv) `constant'.
  4. Options for predict differ between regress and areg, so on line 769 replace xb with xbd giving predict double `eta' if `touse', xbd.
  5. The mse1 option, which re-scales the mean squared error to 1, does not exist for areg. To re-scale the variance-covariance matrix accordingly, on line 850 replace e(V) with e(V)/(e(rss)/e(df_r)).
  6. The degrees of freedom from the absorbing factors need to be taken into account. On line 848 add df_a to the end of the line, so that it reads tempname b V df_a. After line 850 add an extra line scalar `df_a' = e(df_a). Add to the end of line 1041 (line 1040 before adding the extra line) -`df_a', so that line 1041 reads local df=`nobs'-`p'-`df_a'.
  7. Save in the relevant directory as type 'all files' with file name 'aglm.ado'.

The absorbing factor must always be called indiv. It would be better to specify the name of the absorbing factor as an option when giving the command, and also to delete the parts of the file that cannot be used.
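
To see what absorbing the individual factor amounts to, the sketch below compares the fixed-effects fit from 'xtpoisson' with an ordinary Poisson regression that includes an explicit indicator for every individual. For Poisson models the two approaches give the same estimates for the exposure and age effects. The data are invented for illustration, the variable interval is an assumption, and the comparison is not taken from the tutorial paper.

```stata
* Toy data, invented for illustration: one row per individual per interval
clear
input indiv exgr agegr interval nevents
1 0 1 100 0
1 1 1  21 1
1 0 2 244 0
2 0 1 150 1
2 1 1  21 0
2 0 2 194 0
3 0 1 200 0
3 1 2  21 0
3 0 2 144 1
4 0 1 180 0
4 0 2  50 0
4 1 2  21 1
4 0 2 114 0
5 0 1 120 0
5 1 1  21 1
5 0 2 224 0
6 0 1 182 1
6 0 2 183 0
7 0 1  90 0
7 1 1  21 0
7 0 2 254 1
end
generate loginterval = log(interval)

* (a) individual effects absorbed by the fixed-effects estimator
xi: xtpoisson nevents i.exgr i.agegr, fe i(indiv) offset(loginterval)
scalar b_cond = _b[_Iexgr_1]

* (b) individual effects estimated as explicit indicator variables
xi: poisson nevents i.exgr i.agegr i.indiv, offset(loginterval)
scalar b_dummy = _b[_Iexgr_1]

* The two log relative incidence estimates should agree
display "xtpoisson, fe:              " b_cond
display "poisson with i.indiv terms: " b_dummy
```

The indicator approach becomes impractical with many cases, since it estimates one extra parameter per individual, which is the reason for absorbing the factor instead.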

The self-controlled case series method / Heather Whitaker / updated September 2005