Chapter 11 Solutions

Chapter 1

Q1-1: Introductory Questions

  1. What is a model?
  2. What is a statistical model?
  3. What is an advanced statistical model?
  4. What is a variable?
  5. What is a random variable?

Answer

  1. Quite broad. The Cambridge dictionary lists the following definitions of model (noun):
  • something that a copy can be based on because it is an extremely good example of its type.
  • a person who wears clothes so that they can be photographed or shown to possible buyers, or a person who is employed to be photographed or painted.
  • a physical object, usually smaller than the real object, that is used to represent something.
  • a simple description of a system or process that can be used in calculations or predictions of what might happen.
  2. A statistical model is a set (also known as a family) of (probability) distributions.

  3. Advanced is clearly a subjective word here…

  • Can a model be advanced? If so, what does that mean?
  • Perhaps it is the modelling that is advanced? Requiring more in-depth, expert knowledge, for example.
  • Let’s look at the Aim of the Course: To provide advanced methodological and practical knowledge in the field of statistical modelling, covering a wide range of modelling techniques which are essential for the professional statistician.
  4. and 5. A variable in statistics is the same as a variable in mathematics more generally: an unspecified element of a given set whose elements are the possible values of the variable. The difference is that in statistics we place a probability distribution on this space of values. A variable with a probability distribution on its space of values is sometimes called a random variable (although the usual mathematical definition is not in these terms). However, I believe this terminology can be redundant and confusing, and would exercise caution in using it, for the following two reasons:
  • Firstly, nothing has happened to the variable. Rather, an extra structure (the probability distribution) has been added to the space of values, and it is this structure that we must focus on for statistics. As a result, just as in mathematics more generally, we can, and will, get away with not distinguishing between a variable and its value, using \(x\), for example, for both. Nevertheless, just as in mathematics more generally, such a distinction is sometimes useful. If it is useful, then a variable will be denoted in upper case: the usual notation \(x\) will be replaced by the notation \(X = x\), both of them indicating that the variable \(X\) has the value \(x\).

  • Secondly, the terminology encourages the notion that there are two types of quantity in the world: “random” quantities, to be represented by “random” variables; and “non-random” quantities, to be represented by “ordinary” variables. With the possible exception of quantum mechanics, which is irrelevant to our purposes, this is false, or, rather, meaningless: try thinking of an empirical test to decide whether a quantity is “random” or “non-random”; there are only quantities, represented by variables. The purpose of probability distributions is to describe our knowledge of the values of these variables.

Q1-2: Types of Variable

Decide whether the following variables are categorical or numerical, and classify further if possible:

  1. gender of the next lamb born at the local farm.
  • Answer: categorical, unranked, binary.
  2. number of times a die needs to be thrown until the first \(6\) is observed.
  • Answer: numerical, discrete.
  3. amount of fluid (in ounces) dispensed by a machine used to fill bottles with lemonade.
  • Answer: numerical, continuous.
  4. thickness of penny coins in millimetres.
  • Answer: numerical, continuous.
  5. assignment grades of a 3H Mathematics course (from A to D).
  • Answer: categorical, ranked.
  6. marital status of some random sample of citizens.
  • Answer: categorical, unranked.

Q1-3: Properties of Probability Distributions

  1. Prove the results for the expectation and covariance structure of the multinomial distribution stated in Section 1.4.3.1.

Answer: Consider first a Multinoulli random variable \(\boldsymbol{W}\), that is, a random variable following a Multinomial distribution with \(n = 1\), parameterised by probability vector \(\boldsymbol{\pi} = (\pi_1,...,\pi_k)\). Note that \(\boldsymbol{W}\) is a vector of \(k-1\) \(0\)’s and a single \(1\), with the \(W_j\) taking value \(1\) with probability \(\pi_j\).

Almost by definition, we have that \({\mathrm E}[\boldsymbol{W}] = \boldsymbol{\pi}\).

Now, for \(j = 1,...,k\), we have \[\begin{eqnarray} {\mathrm{Var}}[W_j] & = & {\mathrm E}[W_j^2] - ({\mathrm E}[W_j])^2 \\ & = & {\mathrm E}[W_j] - ({\mathrm E}[W_j])^2 \\ & = & \pi_j(1 - \pi_j) \end{eqnarray}\] For \(j \neq j'\): \[\begin{eqnarray} {\mathrm{Cov}}[W_j,W_{j'}] & = & {\mathrm E}[W_j W_{j'}] - {\mathrm E}[W_j] {\mathrm E}[W_{j'}] \\ & = & 0 - \pi_j \pi_{j'} \\ & = & - \pi_j \pi_{j'} \end{eqnarray}\] where we have used the fact that if \(j \neq j'\), at least one of \(W_j\) and \(W_{j'}\) is equal to \(0\). It therefore follows that \[\begin{equation} {\mathrm{Var}}[\boldsymbol{W}] = \Sigma = \textrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi} \boldsymbol{\pi}^T \end{equation}\]

Now, \(\boldsymbol{Y} = \sum_{l=1}^n \boldsymbol{Y}_l\), where \(\boldsymbol{Y}_l, l = 1,...,n\) are i.i.d. multinoulli. Therefore \[\begin{eqnarray} {\mathrm E}[\boldsymbol{Y}] & = & {\mathrm E}[\sum_{l=1}^n \boldsymbol{Y}_l] \\ & = & \sum_{l=1}^n {\mathrm E}[ \boldsymbol{Y}_l ] \\ & = & n \boldsymbol{\pi} \end{eqnarray}\] with the third line from the identically distributed nature of \(\boldsymbol{Y}_l\).

Also, \[\begin{eqnarray} {\mathrm{Var}}[\boldsymbol{Y}] & = & {\mathrm{Var}}[\sum_{l=1}^n \boldsymbol{Y}_l] \\ & = & \sum_{l=1}^n {\mathrm{Var}}[ \boldsymbol{Y}_l ] \\ & = & n \Sigma \end{eqnarray}\] with the second line coming from the independence of \(\boldsymbol{Y}_l\) and the third line from the identically distributed nature of \(\boldsymbol{Y}_l\).
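These moments are straightforward to check by simulation. Below is a minimal Python sketch (assuming NumPy is available; the choices of \(n\) and \(\boldsymbol{\pi}\) are arbitrary illustrative values) comparing empirical moments of multinomial draws with \(n \boldsymbol{\pi}\) and \(n \Sigma\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, pi = 50, np.array([0.2, 0.3, 0.5])      # arbitrary illustrative values
Y = rng.multinomial(n, pi, size=200_000)   # draws of Y ~ Multinomial(n, pi)

Sigma = np.diag(pi) - np.outer(pi, pi)     # Multinoulli covariance matrix
print(Y.mean(axis=0))                      # ~ n * pi = [10, 15, 25]
print(np.cov(Y, rowvar=False))             # ~ n * Sigma
print(n * Sigma)
```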

  2. Suppose \(X_1^2,...,X_m^2\) are \(m\) independent variables with chi-squared distributions \(X_i^2 \sim \chi^2(n_i)\). Show that \[\begin{equation} \sum_{i=1}^m X_i^2 \sim \chi^2(\sum_{i=1}^m n_i ) \end{equation}\]

Answer: This follows from noticing that each \(X_i^2\) is a sum of \(n_i\) squared standard normal variables; summing over \(i\) therefore gives a sum of \(\sum_{i=1}^m n_i\) squared standard normal variables. \[\begin{equation} \sum_{i=1}^m X_i^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} Z_{ij}^2 \sim \chi^2(\sum_{i=1}^m n_i ) \end{equation}\]

  3. Suppose \(X^2\) has the distribution \(\chi^2(n)\). Prove that \[\begin{align} {\mathrm E}[X^2] & = n \\ {\mathrm{Var}}[X^2] & = 2n \end{align}\] Hint: Your solution may require you to assume or show that \({\mathrm E}[Z^4] = 3\), where \(Z \sim{\mathcal N}(0,1)\).

Answer: Note that \(X^2 = \sum_{i=1}^n Z_i^2\) with \(Z_i \sim {\mathcal N}(0,1)\). Therefore \[\begin{equation} {\mathrm E}[X^2] = \sum_{i=1}^n {\mathrm E}[Z_i^2] = n \end{equation}\] since \({\mathrm E}[Z_i^2] = {\mathrm{Var}}[Z_i] + ({\mathrm E}[Z_i])^2 = 1\).

Also, \[\begin{equation} {\mathrm{Var}}[X^2] = {\mathrm{Var}}[\sum_{i=1}^n Z_i^2] = \sum_{i=1}^n {\mathrm{Var}}[Z_i^2] = n {\mathrm{Var}}[Z_i^2] \end{equation}\] with \[\begin{equation} {\mathrm{Var}}[Z_i^2] = {\mathrm E}[Z_i^4] - ({\mathrm E}[Z_i^2])^2 = 3 - 1^2 = 2 \end{equation}\]

To show that \({\mathrm E}[Z_i^4] = 3\), we choose to show that \({\mathrm E}[Z^{n+1}] = n {\mathrm E}[Z^{n-1}]\), which yields the required result on taking \(n = 3\): \({\mathrm E}[Z^4] = 3 \, {\mathrm E}[Z^2] = 3\). With \(\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2}\), we have \[\begin{eqnarray} {\mathrm E}[Z^{n+1}] & = & \int_{-\infty}^{\infty} z^{n+1} \phi(z) dz \\ & = & \int_{-\infty}^{\infty} z^{n} z \phi(z) dz \\ & = & - \int_{-\infty}^{\infty} z^{n} \phi'(z) dz \\ & = & -z^n \phi(z) \biggr|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} n z^{n-1} \phi(z) dz \\ & = & 0 + n {\mathrm E}[Z^{n-1}] \end{eqnarray}\] where the third line comes from the fact that \(\phi'(z) = - z \phi(z)\), and the fourth line comes from integration by parts with \(u = z^n\) and \(v' = \phi'(z)\).
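As a numerical sanity check on both results, the sketch below (assuming SciPy is available; the choice \(n = 7\) is arbitrary) evaluates \({\mathrm E}[Z^4]\) by quadrature and compares the chi-squared moments against \(n\) and \(2n\).

```python
import numpy as np
from scipy import integrate, stats

# E[Z^4] for Z ~ N(0,1), by numerical integration of z^4 * phi(z)
EZ4, _ = integrate.quad(lambda z: z**4 * stats.norm.pdf(z), -np.inf, np.inf)
print(EZ4)                                    # ~ 3.0

# Mean and variance of chi^2(n) agree with n and 2n
n = 7                                         # arbitrary degrees of freedom
print(stats.chi2.mean(n), stats.chi2.var(n))  # 7.0, 14.0
```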

Chapter 2

Q2-1: Quickfire Questions

  1. What is the difference between a Poisson, Multinomial and product Multinomial sampling scheme?

Answer: See Sections 2.1.2 and 2.3.

  2. What does an odds of 1.8 mean relative to success probability \(\pi\)?

Answer: An odds of 1.8 means that the odds of success, \(\pi/(1-\pi)\), equals 1.8; that is, the probability of success is 1.8 times the probability of failure.

Q2-2: Sampling Schemes

Write down statements for the expectation and variance of a variable following a product Multinomial sampling scheme as described in Section 2.3.3.

Answer: Let \(\boldsymbol{N}_i = (N_{i1},...,N_{iJ})\) and \(\boldsymbol{\pi}_i^\star = (\pi_{1|i},...,\pi_{J|i})\) for \(i = 1,...,I\). Then \[\begin{eqnarray} {\mathrm E}[\boldsymbol{N}_i] & = & n_{i+} \boldsymbol{\pi}_i^\star \\ {\mathrm{Var}}[\boldsymbol{N}_i] & = & n_{i+} (\textrm{diag}(\boldsymbol{\pi}_i^\star) - \boldsymbol{\pi}_i^\star \boldsymbol{\pi}_i^{\star,T}) \\ {\mathrm{Cov}}[\boldsymbol{N}_i,\boldsymbol{N}_{i'}] & = & 0_M \qquad i \neq i' \end{eqnarray}\] where \(0_M\) is the \(J \times J\) matrix of zeroes.

Q2-3: Fatality of Road Traffic Accidents

Table 10.1 shows fatality results for drivers and passengers in road traffic accidents in Florida in 2015, according to whether the person was wearing a shoulder and lap belt restraint versus not using one. Find and interpret the odds ratio.

Answer: The sample odds ratio is calculated as \[\begin{equation} \frac{n_{11} n_{22}}{n_{12} n_{21}} = \frac{433 \times 554883}{8049 \times 570} = 52.37 \end{equation}\]

Thus, the odds of a road traffic accident being fatal are estimated to be[1] 52.37 times greater if no restraint is used relative to if one is used.
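As a quick check of the arithmetic, in Python (the variable names are my own shorthand for the quoted counts):

```python
# Counts quoted above from Table 10.1
n11, n12 = 433, 8049      # no restraint: fatal, non-fatal
n21, n22 = 570, 554883    # restraint:    fatal, non-fatal

print(n11 * n22 / (n12 * n21))   # sample odds ratio, ~ 52.37
```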

Q2-4: Difference of Proportions or Odds Ratios?

A 20-year study of British male physicians (Doll and Peto 1976) noted that the proportion who died from lung cancer was 0.00140 per year for cigarette smokers and 0.00010 per year for non-smokers. The proportion who died from heart disease was 0.00669 for smokers and 0.00413 for non-smokers.

  1. Describe and compare the association of smoking with lung cancer and with heart disease using the difference of proportions.

  2. Describe and compare the association of smoking with lung cancer and heart disease using the odds ratio.

  3. Which response (lung cancer or heart disease) is more strongly related to cigarette smoking, in terms of increased proportional risk to the individual?

  4. Which response (lung cancer or heart disease) is more strongly related to cigarette smoking, in terms of the reduction in deaths that could occur with an absence of smoking?

Answer:

  1. Difference of proportions:
  • Lung cancer: 0.0014 - 0.0001 = 0.0013;

  • Heart disease: 0.00669 - 0.00413 = 0.00256.

Using the difference of proportions, the data suggests that cigarette smoking has a bigger impact on heart disease.

  2. Odds ratio:
  • Lung cancer (L): The sample odds for smokers (S) is given by \(\omega_{L,S} = 0.0014/0.9986\), and the sample odds for non-smokers (N) is \(\omega_{L,N} =0.0001/0.9999\). The sample odds ratio is therefore \(\omega_{L,S}/\omega_{L,N} = 14.02\).

  • Heart disease (H): The sample odds for smokers (S) is given by \(\omega_{H,S} = 0.00669/0.99331\), and the sample odds for non-smokers (N) is \(\omega_{H,N} =0.00413/0.99587\). The sample odds ratio is therefore \(\omega_{H,S}/\omega_{H,N} = 1.624\).

The odds of dying from lung cancer are estimated to be 14.02 times higher for smokers than for non-smokers, whilst the odds of dying from heart disease are estimated to be 1.624 times higher for smokers than non-smokers. Thus, using the sample odds ratio, the data suggests that cigarette smoking has a bigger impact on lung cancer.

  3. For a measure based on increased proportional risk, we use the sample odds ratios above; lung cancer has an odds ratio of 14.02 compared to heart disease with an odds ratio of 1.624. Therefore, the increased proportional risk to the individual smoker is much higher for lung cancer than for heart disease, relative to the corresponding risks for a non-smoker.

  4. The difference of proportions describes excess deaths due to smoking. That is, if \(N =\) number of smokers in the population, we predict there would be \(0.00130N\) fewer deaths per year from lung cancer if they had never smoked, and \(0.00256N\) fewer deaths per year from heart disease. Thus (based on this study), elimination of cigarette smoking is predicted to have the biggest impact on deaths due to heart disease.
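Both measures for both diseases can be reproduced with a short Python sketch (the dictionary layout is my own; the proportions are those quoted in the question):

```python
# Annual death proportions from Doll and Peto (1976): (smokers, non-smokers)
props = {"lung cancer":   (0.00140, 0.00010),
         "heart disease": (0.00669, 0.00413)}

for disease, (ps, pn) in props.items():
    diff = ps - pn                                  # difference of proportions
    odds_ratio = (ps / (1 - ps)) / (pn / (1 - pn))  # sample odds ratio
    print(f"{disease}: difference = {diff:.5f}, odds ratio = {odds_ratio:.2f}")
# lung cancer:   difference = 0.00130, odds ratio = 14.02
# heart disease: difference = 0.00256, odds ratio = 1.62
```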

Q2-5: Asymptotic Distribution of X^2

  1. Show that Equation (2.6) in Section 2.4.2 holds.

Answer: To verify Equation (2.6), let \(M\) denote the claimed form of \((\Sigma^\star)^{-1}\) and set \[\begin{equation} W = \Sigma^\star M \end{equation}\] Then we can verify that \[\begin{eqnarray} W_{jj} & = & \pi_j(1-\pi_j) \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right) - \sum_{i = 1, i \neq j}^{k-1} \frac{\pi_i \pi_j}{\pi_k} \\ & = & 1 - \pi_j + \frac{\pi_j}{\pi_k} (1 - \pi_j - \sum_{i = 1, i \neq j}^{k-1} \pi_i) \\ & = & 1 - \pi_j + \frac{\pi_j}{\pi_k} \pi_k \\ & = & 1 \\ W_{jl} & = & \pi_j(1-\pi_j) \frac{1}{\pi_k} - \pi_j \pi_l \left( \frac{1}{\pi_l} + \frac{1}{\pi_k} \right) - \frac{1}{\pi_k} \sum_{i = 1, i \neq j,l}^{k-1} \pi_i \pi_j \\ & = & \frac{\pi_j}{\pi_k} ( 1 - \pi_j - \pi_l - \sum_{i = 1, i \neq j,l}^{k-1} \pi_i) - \pi_j \\ & = & \frac{\pi_j}{\pi_k} \pi_k - \pi_j \\ & = & 0 \end{eqnarray}\] for \(j \neq l\), confirming that \(W = I\).

  2. Show that Equation (2.7) in Section 2.4.2 holds.

Answer: To verify Equation (2.7) we have that \[\begin{eqnarray} m(\bar{\boldsymbol{Y}} - \boldsymbol{\pi}^\star)^T (\Sigma^\star)^{-1} (\bar{\boldsymbol{Y}} - \boldsymbol{\pi}^\star) & = & m \frac{1}{\pi_k} \sum_{i,j = 1}^{k-1} (\bar{Y}_i - \pi_i)(\bar{Y}_j - \pi_j) \\ && \,\,\, + \,\, m \sum_{i=1}^{k-1} \frac{1}{\pi_i} (\bar{Y}_i - \pi_i)^2 \\ \end{eqnarray}\] where, since \(\sum_{i=1}^{k} (\bar{Y}_i - \pi_i) = 0\), we have that \[\begin{eqnarray} \sum_{i,j = 1}^{k-1} (\bar{Y}_i - \pi_i)(\bar{Y}_j - \pi_j) & = & \sum_{i = 1}^{k-1} \left( (\bar{Y}_i - \pi_i) \sum_{j = 1}^{k-1} (\bar{Y}_j - \pi_j) \right) \\ & = & - (\bar{Y}_k - \pi_k) \sum_{i = 1}^{k-1} (\bar{Y}_i - \pi_i) \\ & = & (\bar{Y}_k - \pi_k)^2 \end{eqnarray}\] thus verifying the result.

Q2-6: Maximum Likelihood by Lagrange Multipliers

  1. Consider a Multinomial sampling scheme, and suppose that \(X\) and \(Y\) are independent. We need to find the MLE of \(\boldsymbol{\pi}\), but now we have that \[\begin{equation} \pi_{ij} = \pi_{i+} \pi_{+j} \end{equation}\] The log likelihood is \[\begin{eqnarray} l(\boldsymbol{\pi}) & \propto & \sum_{i,j} n_{ij} \log (\pi_{ij}) \\ & = & \sum_{i} n_{i+} \log (\pi_{i+}) + \sum_{j} n_{+j} \log (\pi_{+j}) \end{eqnarray}\] Use the method of Lagrange multipliers to show that \[\begin{eqnarray} \hat{\pi}_{i+} & = & \frac{n_{i+}}{n_{++}} \\ \hat{\pi}_{+j} & = & \frac{n_{+j}}{n_{++}} \end{eqnarray}\]

Answer: The Lagrange function is \[\begin{eqnarray} \mathcal{L}(\boldsymbol{\pi}, \boldsymbol{\lambda}) & = & l(\boldsymbol{\pi}) - \lambda_1 \bigl( \sum_{i} \pi_{i+} - 1 \bigr) - \lambda_2 \bigl( \sum_{j} \pi_{+j} - 1 \bigr) \\ & = & \sum_{i} n_{i+} \log (\pi_{i+}) + \sum_{j} n_{+j} \log (\pi_{+j}) - \lambda_1 \bigl( \sum_{i} \pi_{i+} - 1 \bigr) - \lambda_2 \bigl( \sum_{j} \pi_{+j} - 1 \bigr) \nonumber \end{eqnarray}\]

Local optima \(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}}\) will satisfy: \[\begin{eqnarray} \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \pi_{i+}} & = & 0 \qquad i = 1,...,I \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \pi_{+j}} & = & 0 \qquad j = 1,...,J \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \lambda_1} & = & 0 \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \lambda_2} & = & 0 \end{eqnarray}\] which in this case means: \[\begin{eqnarray} \frac{n_{i+}}{\pi_{i+}} - \lambda_1 & = & 0 \qquad i = 1,...,I \\ \frac{n_{+j}}{\pi_{+j}} - \lambda_2 & = & 0 \qquad j = 1,...,J \\ \sum_i \pi_{i+} & = & 1 \\ \sum_j \pi_{+j} & = & 1 \end{eqnarray}\] and hence \[\begin{eqnarray} n_{i+} & = & \hat{\lambda}_1 \hat{\pi}_{i+} \\ \implies \sum_{i} n_{i+} & = & \hat{\lambda}_1 \sum_{i} \hat{\pi}_{i+} \\ \implies \hat{\lambda}_1 & = & n_{++} \end{eqnarray}\] and similarly that \[\begin{equation} \hat{\lambda}_2 = n_{++} \end{equation}\]

Thus \[\begin{eqnarray} \hat{\pi}_{i+} & = & \frac{n_{i+}}{n_{++}} \\ \hat{\pi}_{+j} & = & \frac{n_{+j}}{n_{++}} \end{eqnarray}\]
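As a sanity check, the closed-form MLEs can be recovered numerically. The sketch below is an illustration only: it uses a hypothetical \(2 \times 3\) table of counts and SciPy's SLSQP optimiser (standing in for the Lagrangian analysis) to maximise the log-likelihood subject to the two sum-to-one constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 2x3 table of counts n_ij (any table would do)
n = np.array([[30.0, 20.0, 10.0],
              [15.0, 15.0, 10.0]])
ni, nj, npp = n.sum(axis=1), n.sum(axis=0), n.sum()

def neg_loglik(theta):
    pi_row, pi_col = theta[:2], theta[2:]
    return -(ni @ np.log(pi_row) + nj @ np.log(pi_col))

cons = [{"type": "eq", "fun": lambda t: t[:2].sum() - 1},   # sum_i pi_{i+} = 1
        {"type": "eq", "fun": lambda t: t[2:].sum() - 1}]   # sum_j pi_{+j} = 1
res = minimize(neg_loglik, x0=np.full(5, 0.4), bounds=[(1e-9, 1)] * 5,
               constraints=cons, method="SLSQP")

print(res.x[:2], ni / npp)   # row MLEs match n_{i+}/n_{++} = [0.6, 0.4]
print(res.x[2:], nj / npp)   # column MLEs match n_{+j}/n_{++} = [0.45, 0.35, 0.2]
```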

  2. Using the method of Lagrange multipliers, show that Equation (2.13) of Section 2.4.4.1.1 holds.

Answer:

\[\begin{eqnarray} l(\boldsymbol{\pi}) & = & \sum_{i,j} n_{ij} \log (\pi_{j|i}) \\ & = & \sum_{i,j} n_{ij} \log (\pi_{+j}) \\ & = & \sum_{j} n_{+j} \log (\pi_{+j}) \end{eqnarray}\] where the second line uses the hypothesis of homogeneity, \(\pi_{j|i} = \pi_{+j}\) for all \(i\).

Since \(\sum_j \pi_{+j} = 1\), the Lagrange function is \[\begin{eqnarray} \mathcal{L}(\boldsymbol{\pi}, \lambda) & = & l(\boldsymbol{\pi}) - \lambda(\sum_j \pi_{+j} - 1) \\ & = & \sum_{j} n_{+j} \log (\pi_{+j}) - \lambda(\sum_j \pi_{+j} - 1) \end{eqnarray}\]

Local optima \(\hat{\boldsymbol{\pi}}, \hat{\lambda}\) will satisfy: \[\begin{eqnarray} \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\lambda})}{\partial \pi_{+j}} & = & 0 \qquad j = 1,...,J \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\lambda})}{\partial \lambda} & = & 0 \end{eqnarray}\] which implies \[\begin{eqnarray} \frac{n_{+j}}{\hat{\pi}_{+j}} - \hat{\lambda} & = & 0 \qquad j = 1,...,J \\ \sum_{j} \hat{\pi}_{+j} - 1 & = & 0 \end{eqnarray}\] and hence \[\begin{eqnarray} \sum_{j=1}^J n_{+j} & = & \hat{\lambda} \\ \implies \hat{\lambda} & = & n_{++} \end{eqnarray}\] and thus \[\begin{equation} \hat{\pi}_{+j} = \frac{n_{+j}}{n_{++}} \end{equation}\]

Q2-7: Second-Order Taylor Expansion

Show that Approximation (2.15) of Section 2.4.5.1 holds.

Answer: Let \(n_{ij} = \hat{E}_{ij} + \delta_{ij}\), and note that a second-order Taylor expansion of \(\log(1+x)\) about \(x = 0\) is given by \(\log(1+x) \approx x - \frac{1}{2} x^2\).

Then: \[\begin{eqnarray*} G^2 & = & 2 \sum_{i,j} n_{ij} \log \bigl( \frac{n_{ij}}{\hat{E}_{ij}} \bigr) \\ & = & 2 \sum_{i,j} (\hat{E}_{ij} + \delta_{ij}) \log \bigl( 1 + \frac{\delta_{ij}}{\hat{E}_{ij}} \bigr) \\ & \approx & 2 \sum_{i,j} (\hat{E}_{ij} + \delta_{ij}) \bigl( \frac{\delta_{ij}}{\hat{E}_{ij}} - \frac{\delta_{ij}^2}{2\hat{E}_{ij}^2} \bigr) \\ & \approx & 2 \sum_{i,j} \bigl( \delta_{ij} + \frac{\delta_{ij}^2}{2\hat{E}_{ij}} \bigr) \end{eqnarray*}\] up to second order. Since \(\delta_{ij} = n_{ij} - \hat{E}_{ij}\) and \(\sum_{i,j} \delta_{ij} = 0\), we have that \[\begin{equation} G^2 \approx \sum_{i,j} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = X^2 \end{equation}\] as required.

Q2-8: Relative Risk

  1. In 1998, a British study reported that “Female smokers were 1.7 times more vulnerable than Male smokers to get lung cancer.” We don’t investigate whether this is true or not, but is 1.7 the odds ratio or the relative risk[2]? Briefly (one sentence maximum) explain your answer.

Answer: I would say that 1.7 is the relative risk, as it seems that the statement is claiming that the probability of one event happening is 1.7 times greater than the probability of the other. This assumes that the definition of vulnerability is with regard to the absolute probability of the event occurring (for one group relative to another) rather than a ratio of the odds of it occurring. Having said this, there is vagueness in the wording - there is potential for it to be interpreted differently. Such vagueness in this one line alone highlights how we need to be careful to be precise in the phrasing of our results, so as not to confuse, or (deliberately…?) mislead people.

  2. A National Cancer Institute study about tamoxifen and breast cancer reported that the women taking the drug were \(45\%\) less likely to experience invasive breast cancer than were women taking a placebo. Find the relative risk for (i) those taking the drug compared with those taking the placebo, and (ii) those taking the placebo compared with those taking the drug.

Answer: i) Defining \(\pi_D\) and \(\pi_P\) to be the probabilities of invasive breast cancer given the drug and the placebo respectively, we have that \[\begin{equation} \pi_D = (1-0.45) \pi_P \implies \pi_D = 0.55 \pi_P \implies \pi_D/\pi_P = 0.55 \end{equation}\]

  ii) From part (i), we have that \[\begin{equation} \pi_P/\pi_D = 1/0.55 = 1.82 \end{equation}\]

Q2-9: The Titanic

For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender (categorised as Female (F) or Male (M)), and survival (categorised as yes (Y) or no (N)) was 11.4 (Dawson 1995).

  1. It is claimed that “The Probability of survival for women was 11.4 times that for men”. i) What is wrong with this interpretation? ii) What should the correct interpretation be? iii) When would the quoted interpretation be approximately correct?

Answer: i) and ii) The probability of survival for women was not 11.4 times that for men. The (sample[3]) odds of survival for women (not accounting for other factors) was 11.4 times greater than for men.

  iii) Let \(\pi_{YF}, \pi_{NF}, \pi_{YM}, \pi_{NM}\) be the probabilities of survival or not for women and men respectively. The quoted interpretation would be approximately correct when the probabilities of success of both events are small, hence \(\pi_{NF} \approx 1\) and \(\pi_{NM} \approx 1\) so that: \[\begin{equation} \frac{\pi_{YF}/\pi_{NF}}{\pi_{YM}/\pi_{NM}} \approx \frac{\pi_{YF}}{\pi_{YM}} \end{equation}\]
  2. The odds of survival for women was 2.9. Find the proportion of each gender who survived.

Answer: We have that \[\begin{eqnarray} && \frac{\pi_{YF}}{\pi_{NF}} = \frac{\pi_{Y|F}}{\pi_{N|F}} = 2.9 \\ & \implies & \pi_{Y|F} = 2.9(1 - \pi_{Y|F}) \\ & \implies & \pi_{Y|F} = \frac{2.9}{3.9} = \frac{29}{39} \end{eqnarray}\]

We also have that: \[\begin{eqnarray} && \frac{\pi_{Y|F}/\pi_{N|F}}{\pi_{Y|M}/\pi_{N|M}} = 11.4 \\ & \implies & \frac{\pi_{Y|M}}{\pi_{N|M}} = \frac{2.9}{11.4} = \frac{29}{114} \\ & \implies & \pi_{Y|M} = \frac{29}{114} (1 - \pi_{Y|M}) \\ & \implies & \frac{143}{114} \pi_{Y|M} = \frac{29}{114} \\ & \implies & \pi_{Y|M} = \frac{29}{143} \end{eqnarray}\]

Q2-10: Test and Reality

For a diagnostic test of a certain disease, let \(\pi_1\) denote the probability that the diagnosis is positive given that a subject has the disease, and let \(\pi_2\) denote the probability that the diagnosis is positive given that a subject does not have the disease. Let \(\tau\) denote the probability that a subject has the disease.

  1. More relevant to a patient who has received a positive diagnosis is the probability that they truly have the disease. Given that a diagnosis is positive, show that the probability that a subject has the disease (called the positive predictive value) is \[\begin{equation} \frac{\pi_1 \tau}{\pi_1 \tau + \pi_2 (1-\tau)} \end{equation}\]

Answer: As defined in the question, let \[\begin{eqnarray} \pi_1 & = & P(\textrm{positive diagnosis given presence of disease}) = P(T^+|D^+) \\ \pi_2 & = & P(\textrm{positive diagnosis given absence of disease}) = P(T^+|D^-) \\ \tau & = & P(\textrm{presence of disease}) = P(D^+) \end{eqnarray}\]

Then, using Bayes theorem, we have that \[\begin{eqnarray} P(D^+|T^+) & = & \frac{P(T^+|D^+) P(D^+)}{P(T^+)} \\ & = & \frac{P(T^+|D^+) P(D^+)}{P(T^+|D^+) P(D^+) + P(T^+|D^-) P(D^-)}\\ & = & \frac{\pi_1 \tau}{\pi_1 \tau + \pi_2 (1-\tau)} \end{eqnarray}\]

  2. Suppose that a diagnostic test for a disease has both sensitivity and specificity equal to 0.95, and that \(\tau = 0.005\). Find the probability that a subject truly has the disease given a positive diagnostic test result.

Answer: Recall that \[\begin{eqnarray} \textrm{Sensitivity:} & \, & P(T^+ | D^+) = 0.95 \\ \textrm{Specificity:} & \, & P(T^- | D^-) = 0.95 \end{eqnarray}\] and we also have that \(\tau = P(D^+) = 0.005\).

Then \[\begin{equation} P(D^+|T^+) = \frac{0.95 \times 0.005}{0.95 \times 0.005 + (1-0.95)(1-0.005)} = 0.087 \end{equation}\]
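This Bayes calculation is easily wrapped in a small helper function; the sketch below (the function name and signature are my own) reproduces the value above.

```python
def positive_predictive_value(sens: float, spec: float, prev: float) -> float:
    """P(D+ | T+) via Bayes' theorem, with pi_1 = sens, pi_2 = 1 - spec, tau = prev."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(positive_predictive_value(sens=0.95, spec=0.95, prev=0.005))   # ~ 0.087
```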

  3. Create a \(2 \times 2\) contingency table of cross-classified probabilities for presence or absence of the disease and positive or negative diagnostic test result.

Answer: See Table 11.1.

Table 11.1: Contingency table of probabilities cross-classifying presence or absence of disease against positive or negative diagnostic test.
Test: Positive Negative Sum
Disease: Presence 0.00475 0.00025 0.005
Disease: Absence 0.04975 0.94525 0.995
Sum 0.05450 0.94550 1.000
  4. Calculate the odds ratio and interpret.

Answer:

\[\begin{equation} r_{12} = \frac{0.00475 \times 0.94525}{0.00025 \times 0.04975} = 361 \end{equation}\]

The odds of a positive test result are 361 times higher for a subject for whom the disease is present than a subject for whom the disease is absent. Equivalently, the odds of presence of the disease are 361 times higher for a subject with a positive test result than a subject with a negative test result.

Q2-11: Happiness and Income

Table 10.3 shows data from a General Social Survey cross-classifying a person’s perceived happiness with their family income.

  1. Perform a \(\chi^2\) test of independence between the two variables.

Answer: Table 11.3 shows the observed data with row and column sum totals.

Table 11.3: Observed counts \(n_{ij}\).
Happiness: Not too Happy Pretty Happy Very Happy Sum
Income: Above Average 21 159 110 290
Income: Average 53 372 221 646
Income: Below Average 94 249 83 426
Sum 168 780 414 1362

Table 11.5 shows the estimated cell values under independence.

Table 11.5: Estimated \(E_{ij}\) values.
Happiness: Not too Happy Pretty Happy Very Happy Sum
Income: Above Average 35.77093 166.0793 88.14978 290
Income: Average 79.68282 369.9559 196.36123 646
Income: Below Average 52.54626 243.9648 129.48899 426
Sum 168.00000 780.0000 414.00000 1362

Table 11.7 shows the \(X_{ij}^2\) values for each cell.

Table 11.7: \(X_{ij}^2\) values.
Happiness: Not too Happy Pretty Happy Very Happy
Income: Above Average 6.099373 0.3017620 5.416147
Income: Average 8.935086 0.0112936 3.091592
Income: Below Average 32.702862 0.1039235 16.690423

We thus get that \[\begin{equation} X^2 = \sum_{i,j} X_{ij}^2 = 73.352... \end{equation}\]

Comparing to a chi-square distribution with 4 degrees of freedom, we have that \[\begin{equation} P(\chi^{2,\star}_4 \geq 73.352) \leq 0.0005 \end{equation}\] hence the test provides strong evidence to reject \(\mathcal{H}_0\) that the two variables are independent.

  2. Calculate and interpret the adjusted residuals for the four corner cells of the table.

Answer: Adjusted standardised residuals are presented in Table 11.9.

Table 11.9: Adjusted standardised residuals.
Happiness: Not too Happy Pretty Happy Very Happy
Income: Above Average -2.973173 -0.9472192 3.144277
Income: Average -4.403194 0.2242210 2.906749
Income: Below Average 7.367666 0.5948871 -5.907023

The top-left and bottom-right cell adjusted residuals provide evidence that fewer people are in those cells in the population than if the variables were independent. The top-right and bottom-left cell adjusted residuals provide evidence that more people are in those cells in the population than if the variables were independent. Although not required for the question, by calculating all residuals, we can see that there is also evidence of more people on average income who are very happy, and fewer people on average income who are not too happy, than if the variables were independent.
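Both the \(X^2\) statistic and the adjusted standardised residuals can be reproduced with a short SciPy/NumPy sketch (assuming both libraries are available), using the observed counts from Table 11.3:

```python
import numpy as np
from scipy.stats import chi2_contingency

n = np.array([[21, 159, 110],
              [53, 372, 221],
              [94, 249,  83]])

X2, p, dof, E = chi2_contingency(n, correction=False)
print(X2, dof, p)                 # ~ 73.35, 4, p well below 0.0005

# Adjusted standardised residuals: (n - E) / sqrt(E (1 - row prop)(1 - col prop))
row = n.sum(axis=1, keepdims=True) / n.sum()
col = n.sum(axis=0, keepdims=True) / n.sum()
print(np.round((n - E) / np.sqrt(E * (1 - row) * (1 - col)), 2))
```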

Q2-12: Tea! (Fisher’s Exact Test of Independence)

Regarding the quote (not repeated here to save space) in the corresponding problem from Fisher (1937):

  1. From the text, we know that there are 4 cups with milk added first and 4 with tea infusion added first. How many distinct orderings, in terms of type, can these 8 cups to be tasted take?

Answer: There is a total of \(\frac{8!}{4!4!} = 70\) distinct orderings of these cups.

  2. Note that the lady also knows that there are four cups of each type, and must group them into two sets of four (those she thinks had milk added first, and those she thinks had tea infusion added first). Given that the lady guesses milk first three times when indeed the milk was added first, cross-classify the lady’s guesses against the truth in a \(2 \times 2\) contingency table.

Answer: Observe that we are in a sampling scenario with fixed row and column sums. Therefore we have all the information we need to display the results of this experiment in a contingency table, as given by Table 11.11.

Table 11.11: Cross-classification of guess versus truth in the tea tasting experiment of Fisher (1937).
Truth: Milk first Tea first Sum
Guess: Milk first 3 1 4
Guess: Tea first 1 3 4
Sum 4 4 8
  3. Fisher presented an exact[4] solution for testing the null hypothesis \[\begin{equation} \mathcal{H}_0: r_{12} = 1 \end{equation}\] against the one-sided alternative \[\begin{equation} \mathcal{H}_1: r_{12} > 1 \end{equation}\] for contingency tables with fixed row and column sums. What hypothesis does \(\mathcal{H}_0\) correspond to in the context of the tea tasting test described above? Write down an expression for \(P(N_{11} = t)\) under \(\mathcal{H}_0\). Thus, perform a (Fisher’s exact) hypothesis test to test the lady’s claim that she can indeed discriminate whether the milk or tea infusion was first added to the cup.

Answer: \(\mathcal{H}_0\) corresponds to the situation where the lady is purely guessing whether milk or tea infusion was added first (with no ability to discriminate based on taste).

Under \(\mathcal{H}_0\), given \(n_{1+}\), \(n_{+1}\) and \(n_{++}\), we have that \(N_{11} \sim \mathcal{H} g (N = n_{++}, M = n_{1+}, q = n_{+1})\)[5] so that: \[\begin{equation} P(N_{11} = t) = \frac{\binom{n_{1+}}{t}\binom{n_{++}-n_{1+}}{n_{+1}-t}}{\binom{n_{++}}{n_{+1}}} \qquad \max(0,n_{+1}+n_{1+}-n_{++}) \leq t \leq \min(n_{+1},n_{1+}) \end{equation}\]

A \(p\)-value is obtained by calculating the sum of the extreme probabilities, where extreme is in the direction of the alternative hypothesis. Let \(t_{obs}\) denote the observed value of \(N_{11}\), then \[\begin{eqnarray} P(N_{11} \geq t_{obs}) & = & P(N_{11} \geq 3) \\ & = & P(N_{11} = 3) + P(N_{11} = 4) \\ & = & \frac{\binom{4}{3}\binom{8-4}{4-3}}{\binom{8}{4}} + \frac{\binom{4}{4}\binom{8-4}{4-4}}{\binom{8}{4}} \\ & = & \frac{17}{70} \end{eqnarray}\]

Hence the test does not provide evidence of the lady’s ability at any standard level of significance.
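SciPy reproduces this test directly; a minimal sketch (assuming scipy.stats is available), both via fisher_exact and via the underlying hypergeometric tail probability:

```python
from scipy.stats import fisher_exact, hypergeom

table = [[3, 1], [1, 3]]                  # Table 11.11 counts
_, p = fisher_exact(table, alternative="greater")
print(p)                                  # 17/70 ~ 0.243

# Equivalently, P(N11 >= 3) for N11 ~ Hg(N=8, M=4, q=4):
print(hypergeom.sf(2, 8, 4, 4))           # sf(2) = P(N11 > 2)
```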

  4. Suppose the lady had correctly classified all eight cups as having either milk or tea infusion added first. Would Fisher’s exact hypothesis test provide evidence of her ability now?

Answer: You may repeat the above steps of part (c), and see that: \[\begin{equation} P(N_{11} \geq 4) = P(N_{11} = 4) = \frac{1}{70} \end{equation}\] hence the test would provide evidence of the lady’s ability at the 5% level of significance, but the smallest attainable \(p\)-value (\(1/70 \approx 0.014\)) means this test could never provide evidence at the \(1\%\) level of significance. For this, we would need more cups in the test, for example, five of each type - feel free to play around with this scenario. How many cups of each type (assuming there is an even split) would be required such that if the lady misclassifies one cup of each type, the hypothesis test still provides evidence for her ability at the \(1\%\) level of significance?

Q2-13: US Presidential Elections

Table 10.5 cross-classifies a sample of votes in the 2008 and 2012 US Presidential elections. Test the null hypothesis that vote in 2008 was independent from vote in 2012 by estimating, and finding a \(95\%\) confidence interval for, the population odds ratio.

Answer:

We wish to test the following hypotheses: \[\begin{eqnarray} \mathcal{H}_0: & \quad& r_{12} = 1 \\ \mathcal{H}_1: &\quad & r_{12} \neq 1 \end{eqnarray}\]

An estimate for this odds ratio is given by \[\begin{equation} \hat{r}_{12} = \frac{802 \times 494}{34 \times 53} = 219.8602 \end{equation}\]

A \((1-\alpha)\) confidence interval for \(\log r_{12}\) is given by \[\begin{eqnarray*} \log \hat{r}_{12} \pm z_{\alpha/2} \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{21}} + \frac{1}{n_{12}} + \frac{1}{n_{22}}} & = & \log 219.8602 \pm 1.96 \sqrt{\frac{1}{802} + \frac{1}{34} + \frac{1}{53} + \frac{1}{494}} \\& = & (4.9480, 5.8380) \end{eqnarray*}\] so that a Wald confidence interval for \(r_{12}\) is given by \[\begin{equation} (e^{4.9480}, e^{5.8380}) = (140.9, 343.1) \end{equation}\]

Since \(r_{0,12} = 1\) lies outside of this interval, the test rejects \(\mathcal{H}_0\) at the \(5\%\) level of significance.
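The estimate and Wald interval are easily checked in Python (a sketch using the counts quoted above):

```python
import numpy as np

n11, n12, n21, n22 = 802, 34, 53, 494
r_hat = n11 * n22 / (n12 * n21)                  # ~ 219.86
se = np.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)      # SE of the log odds ratio
lo, hi = np.exp(np.log(r_hat) + np.array([-1, 1]) * 1.96 * se)
print(r_hat, (lo, hi))                           # ~ (140.9, 343.1)
```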

Q2-14: Job Security and Happiness

Consider the table presented in Figure 10.1 summarising the responses for extent of agreement to the statement “job security is good” (JobSecOK, or Job Security) and general happiness (Happiness) from the 2018 US General Social Surveys. Additional possible responses of don’t know and no answer are omitted here.

  1. Calculate a minimal set of local odds ratios. Interpret and discuss.
Answer: Cell \((1,1)\) corresponds to the relative difference between not at all true and not too true in the odds of not too happy relative to pretty happy. This is estimated by \[\begin{equation} \hat{r}_{11}^L = \frac{15 \times 47}{21 \times 25} = 1.34. \end{equation}\] Calculating the remaining odds ratios yields Table 11.13.
Table 11.13: Local Odds Ratios between successive row categories \(i\) and \(i+1\) and column categories \(j\) and \(j+1\), these being indicated in each case by the row and column names.
Happiness: Not too-Pretty Pretty-Very
JobSecOK: not at all-not too 1.34 2.23
JobSecOK: not too-somewhat 1.73 0.90
JobSecOK: somewhat-very 1.68 1.63

In general, the local odds ratios are greater than 1, suggesting a positive association between consecutive levels of Job Security and Happiness. The exception is cell \((2,2)\), which is slightly less than 1.

  2. Calculate a minimal set of odds ratios, treating job security locally and happiness cumulatively. Interpret and discuss.

Answer: Cell \((1,1)\) corresponds to the relative difference between those who say not too happy and those who say either pretty happy or very happy, in the odds of saying not at all true to not too true. This is estimated by \[\begin{equation} \hat{r}_{11}^{C_Y} = \frac{15/(25+5)}{21/(47+21)} = \frac{15 \times (47+21)}{21 \times (25+5)} = 1.619 \end{equation}\]

Calculating the remaining odds ratios yields Table 11.15.
Table 11.15: Cumulative Odds Ratios between successive row categories \(i\) and \(i+1\), with the cumulative split at \(j\) (this being in the lower combined category) being denoted by the column name.
Happiness: Not too Pretty
JobSecOK: not at all-not too 1.62 2.47
JobSecOK: not too-somewhat 1.68 1.04
JobSecOK: somewhat-very 1.98 1.77

In general, all of the odds ratios are greater than 1, thus suggesting a positive association between consecutive levels of Job Security with the grouped levels of Happiness in each case (regardless of whether Pretty happy is grouped with not too happy or very happy).

  3. Calculate a minimal set of odds ratios, treating job security as a nominal variable (taking very true as the reference category) and happiness as a cumulative variable. Interpret and discuss.

Answer: Cell \((1,1)\) corresponds to the relative difference between those who say not too happy and those who say either pretty happy or very happy, in the odds of saying not at all true to very true. This is estimated by \[\begin{equation} \hat{r}_{11}^{C_Y} = \frac{15/(25+5)}{73/(474+311)} = \frac{15 \times (474+311)}{73 \times (25+5)} = 5.377 \end{equation}\]

Calculating the remaining odds ratios yields Table 11.17.
Table 11.17: Cumulative Odds Ratios between row categories \(i\) and \(I=4\), with the cumulative split at \(j\) (this being in the lower combined category) being denoted by the column name.
Happiness: Not too Pretty
JobSecOK: not at all 5.38 4.55
JobSecOK: not too 3.32 1.84
JobSecOK: somewhat 1.98 1.77

In general, all of the odds ratios are greater than 1, thus suggesting a positive association between the level of job security with the (grouped) level of Happiness.

  4. Calculate a minimal set of global odds ratios, that is, treating both job security and happiness as cumulative variables.

Answer: Cell \((1,1)\) corresponds to the odds ratio obtained by combining pretty happy and very happy into one category for Happiness, and combining not too true, somewhat true and very true into one category for Job Security. This is estimated by \[\begin{equation} \hat{r}_{11}^{G} = \frac{15 \times (47+21+248+100+474+311)}{(21+64+73) \times (25+5)} = 3.80 \end{equation}\]

Calculating the remaining global odds ratios yields Table 11.19.
Table 11.19: Global Odds Ratios, with the cumulative split at \(i,j\) (these themselves being in the corresponding lower combined categories) being denoted by the row and column name.
Happiness: Not too Pretty
JobSecOK: not at all 3.80 3.72
JobSecOK: not too 3.03 1.99
JobSecOK: somewhat 2.41 1.90

In general, all of the odds ratios are greater than 1, thus suggesting a positive association between the grouped levels of job security with the grouped levels of Happiness.

  5. Calculate a \(95\%\) confidence interval for the global odds ratio, with the first two categories of each variable being grouped together into one category in each case.
Answer: We can readily adopt the standard formula for a confidence interval of a log odds ratio for the global odds ratio. We just need to work with the \(2 \times 2\) table of counts induced by any accumulation. In our case, we calculate \[\begin{eqnarray*} 15 + 25 + 21 + 47 & = & 108 \\ 5 + 21 & = & 26 \\ 64 + 248 + 73 + 474 & = & 859 \\ 100 + 311 & = & 411 \end{eqnarray*}\] so that the cumulative contingency table is given by Table 11.21.
Table 11.21: Cumulative contingency table, with the categories grouped according to the row and column names.
Happiness: Not too-Pretty Very
JobSecOK: not at all-not too 108 26
JobSecOK: somewhat-very 859 411

We therefore have that \[\begin{equation} \hat{r}_{22}^G = \frac{108 \times 411}{26 \times 859} = 1.99 \end{equation}\] and the \(95\%\) confidence interval for \(\log r_{22}^G\) is given by \[\begin{equation} \log 1.99 \pm 1.96 \sqrt{\frac{1}{108} + \frac{1}{26} + \frac{1}{859} + \frac{1}{411} } = (0.24, 1.13) \end{equation}\] Thus a confidence interval for \(r_{22}^G\) is given by \[\begin{equation} (e^{0.24}, e^{1.13}) = (1.27, 3.10) \end{equation}\] hence the test rejects the null hypothesis of no association between these combined groups of job security and happiness at the \(5\%\) level of significance.[6]

  6. Perform a linear trend test to assess whether there is any evidence of association between Happiness and Job Security.

Answer: We wish to test: \[\begin{equation} \mathcal{H}_0: \rho_{XY} = 0 \end{equation}\] against \[\begin{equation} \mathcal{H}_1: \rho_{XY} \neq 0 \end{equation}\]

We choose to utilise equally spaced scores set equal to the order of each category, with cross-multiplied \(XY\) scores presented in Table 11.23.
Table 11.23: Cross-multiplied \(XY\) scores.
Happiness: Not very Pretty Very
JobSecOK: not at all 1 2 3
JobSecOK: not too 2 4 6
JobSecOK: somewhat 3 6 9
JobSecOK: very 4 8 12

We calculate the following: \[\begin{eqnarray} \sum_{k=1}^{n_{++}} x_k y_k = \sum_{i,j} x_i y_j n_{ij} & = & 1 \times 15 + 2 \times 25 + 3 \times 5 \nonumber \\ && + \, 2 \times 21 + 4 \times 47 + 6 \times 21 \nonumber \\ && + \, 3 \times 64 + 6 \times 248 + 9 \times 100 \nonumber \\ && + \, 4 \times 73 + 8 \times 474 + 12 \times 311 = 10832 \nonumber \\ \sum_{k=1}^{n_{++}} x_k = \sum_{i=1}^4 x_i n_{i+} & = & 1 \times 45 + 2 \times 89 + 3 \times 412 + 4 \times 858 = 4891 \nonumber \\ \sum_{k=1}^{n_{++}} y_k = \sum_{j=1}^3 y_j n_{+j} & = & 1 \times 173 + 2 \times 794 + 3 \times 437 = 3072 \nonumber \\ \sum_{k=1}^{n_{++}} x_k^2 = \sum_{i=1}^4 x_i^2 n_{i+} & = & 1^2 \times 45 + 2^2 \times 89 + 3^2 \times 412 + 4^2 \times 858 = 17837 \nonumber \\ \sum_{k=1}^{n_{++}} y_k^2 = \sum_{j=1}^3 y_j^2 n_{+j} & = & 1^2 \times 173 + 2^2 \times 794 + 3^2 \times 437 = 7282 \nonumber \end{eqnarray}\]

so that \[\begin{equation} r_{XY} = \frac{1404 \times 10832 - 4891 \times 3072}{\sqrt{1404 \times 17837 - 4891^2}\sqrt{1404 \times 7282 - 3072^2}} = 0.1948... \end{equation}\] and \[\begin{equation} M^2 = (n_{++}-1) r_{XY}^2 = 1403 \times 0.1948...^2 = 53.24 \end{equation}\]

We have that \[\begin{equation} P(\chi^2_1 \geq 53.24) \leq 10^{-12} \end{equation}\] hence the test rejects the null hypothesis of independence for any reasonable level of significance, and thus provides evidence of association between job security and happiness.
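The sums above are tedious by hand; the following NumPy sketch (scores and counts as above) reproduces \(r_{XY}\), \(M^2\) and the \(p\)-value.

```python
import numpy as np
from scipy.stats import chi2

n = np.array([[15,  25,   5],
              [21,  47,  21],
              [64, 248, 100],
              [73, 474, 311]])
x, y = np.array([1, 2, 3, 4]), np.array([1, 2, 3])  # equally spaced scores
N = n.sum()                                         # n_{++} = 1404

sx,  sy  = x @ n.sum(axis=1),    y @ n.sum(axis=0)
sxx, syy = x**2 @ n.sum(axis=1), y**2 @ n.sum(axis=0)
sxy = x @ n @ y                                     # sum_{i,j} x_i y_j n_ij

r = (N * sxy - sx * sy) / np.sqrt((N * sxx - sx**2) * (N * syy - sy**2))
M2 = (N - 1) * r**2
print(r, M2, chi2.sf(M2, df=1))                     # ~ 0.195, 53.24, ~3e-13
```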

Q2-15: A New Treatment?

A study investigates the relationship between a new treatment (Treatment A) and a standard treatment (Treatment B) for reducing symptoms of a specific disease. Patients are classified as either having symptoms reduced (Yes) or not reduced (No). The data from the study are as shown in Figure 10.2.

  1. Calculate the sample odds ratio of symptoms being reduced for patients receiving Treatment A compared to those receiving Treatment B. Interpret.

Answer: The sample odds ratio of symptoms being reduced for patients receiving Treatment A compared to those receiving Treatment B is: \[\begin{equation} \hat{r}_{12} = \frac{n_{11} n_{22}}{n_{12} n_{21}} = \frac{50 \times 40}{20 \times 30} = 3.333... \end{equation}\]

Thus, the odds of symptoms being reduced are estimated to be 3.33 times greater under Treatment A relative to under Treatment B.

  2. Using the data provided, apply a generalised likelihood ratio test to determine whether there is a significant association between treatment type and symptom reduction at the \(5\%\) significance level.

Answer: We have a null hypothesis of: \[\begin{equation} \mathcal{H}_0: r_{12} = 1 \end{equation}\] against a two-sided alternative \[\begin{equation} \mathcal{H}_1: r_{12} \neq 1 \end{equation}\]

The table of expected frequencies is given in Figure 11.1.

Figure 11.1: Contingency Table for Q2-15

We calculate the generalised likelihood ratio test statistic as: \[\begin{eqnarray} G^2 & = & 2 \sum_{i,j} n_{ij} \log \frac{n_{ij}}{\hat{E}_{ij}} \\ & = & 2 \left( 50 \log \frac{50}{40} + 20 \log \frac{20}{30} + 30 \log \frac{30}{40} + 40 \log \frac{40}{30} \right) \\ & = & 11.85 \end{eqnarray}\]

We have that \[\begin{equation} G^2 = 11.85 \geq \chi^{2,\star}_{1,0.05} = 3.84 \end{equation}\] hence there is evidence at the \(5\%\) level of significance to reject the null hypothesis of independence between treatment type and symptom reduction.
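For reference, SciPy computes \(G^2\) directly through the lambda_="log-likelihood" option of chi2_contingency; a minimal sketch:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[50, 20],    # Treatment A: reduced, not reduced
                  [30, 40]])   # Treatment B: reduced, not reduced
G2, p, dof, E = chi2_contingency(table, correction=False,
                                 lambda_="log-likelihood")
print(G2, dof, p)              # ~ 11.85, 1, p ~ 0.0006 < 0.05
```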



References

Dawson, R. J. M. 1995. “The Unusual Episode Data Revisited.” Journal of Statistics Education 3 (3).
Doll, R., and R. Peto. 1976. “Mortality in Relation to Smoking: 20 Years’ Observations on Male British Doctors.” British Medical Journal 2 (6051): 1525–36.
Fisher, R. A. 1937. The Design of Experiments. Edinburgh: Oliver and Boyd.

  1. Crucially, using the data, we estimate…

  2. Relative risk was introduced in Section 2.1.3.2.

  3. In brackets here as in this case we are also talking about the population of interest.

  4. Exact in the sense that the probabilities of any possible outcome can be calculated exactly.

  5. Note that this distribution follows in the context of this scenario since we view the lady as randomly guessing which \(q=n_{1+}=4\) cups had milk added first from the total of \(N=n_{++}=8\) cups, and seeing how many of these are of the type of interest, namely those for which milk really was added first (\(M=n_{+1}=4\)). Since the hypergeometric distribution is symmetric in \(M\) and \(q\), we could equally view it as \(\mathcal{H} g (N = n_{++}, M = n_{1+}, q = n_{+1})\), as used above.

  6. You should repeat this analysis for some of the other global odds ratios.