[의학통계방법론] Ch15. Nested (Hierarchical) Analysis of Variance

05 May 2022 in Study log / Study / 의학통계방법론 on R, Sas, Blog, Jekyll, Jupyter, Data, Python

Nested (Hierarchical) Analysis of Variance
R 프로그램 결과
- 15장
SAS 프로그램 결과
- 15장
  - EXAMPLE 15.1
  - EXAMPLE 15.2

Nested (Hierarchical) Analysis of Variance

예제 데이터파일 Exam 15_1a.sav 다운로드

R 프로그램 결과

R 접기/펼치기 버튼

패키지

설치된 패키지 접기/펼치기 버튼

getwd()

## [1] "C:/Biostat"

library("foreign")
library("dplyr")
library("kableExtra")

15장

15장 연습문제 불러오기

ex15_1a <- read.spss('Exam 15_1a.sav', to.data.frame=T)

EXAMPLE 15.1a

#데이터셋
ex15_1a%>%
  kbl(caption = "Dataset",escape=F) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Dataset
drug	source	subj	resp
1	1	1	102
1	1	2	104
1	2	3	103
1	2	4	104
2	3	5	108
2	3	6	110
2	4	7	109
2	4	8	108
3	5	9	104
3	5	10	106
3	6	11	105
3	6	12	107

해당 데이터는 세 가지 약 중 한 가지를 12명의 여성 각각에게 투여한 후 측정된 혈중 콜레스테롤 농도에 대한 데이터이다.

drug <- as.factor(ex15_1a$drug)
source <- as.factor(ex15_1a$source)

분석을 위해 drug와 source를 factor로 변환해 주었다.

tapply(ex15_1a$resp,drug,sum)

##   1   2   3 
## 413 435 422

tapply(ex15_1a$resp,drug,mean)

##      1      2      3 
## 103.25 108.75 105.50

EXAMPLE 15.1b

콜레스테롤 수치는 drug type과 drug source에 영향을 받으므로, 두 인자를 고려하여 nested ANOVA를 수행하였다.
따라서 가설은 아래와 같이 두 가지다.

첫번째 가설

\[\begin{aligned} H_0 &: There \ is \ no \ difference \ among \ the \ drug \ sources \ in \ affecting \ mean \ blood \ cholesterol \ concentration. \\ H_A &: There \ is \ difference \ among \ the \ drug \ sources \ in \ affecting \ mean \ blood \ cholesterol \ concentration. \end{aligned}\]

두번째 가설

\[\begin{aligned} H_0 &: There \ is \ no \ difference \ in \ mean \ cholesterol \ concentrations \ owing \ to \ the \ three \ drugs. \\ H_A &: There \ is \ difference \ in \ mean \ cholesterol \ concentrations \ owing \ to \ the \ three \ drugs. \end{aligned}\]

fit <- lm(ex15_1a$resp~drug/source)
anova(fit)

## Analysis of Variance Table
## 
## Response: ex15_1a$resp
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## drug         2 61.167  30.583 20.3889 0.00211 **
## drug:source  3  1.500   0.500  0.3333 0.80220   
## Residuals    6  9.000   1.500                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

첫 번째 귀무가설에 대해, p-value=0.8022로 유의수준 5%하에 귀무가설을 기각하지 못하였다.
즉, 혈중 콜레스테롤 수치에 영향을 미치는 drug sources의 차이는 없다고 할 수 있는 근거가 충분하다 할 수 있다.

fit2 <- aov(ex15_1a$resp~drug+Error(source))
summary(fit2)

## 
## Error: source
##           Df Sum Sq Mean Sq F value Pr(>F)   
## drug       2  61.17   30.58   61.17 0.0037 **
## Residuals  3   1.50    0.50                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  6      9     1.5

두 번째 귀무가설에 대해, p-value=0.0037로 유의수준 5%하에 귀무가설을 기각한다.
즉, 세가지 약으로 인한 콜레스테롤 수치 평균은 유의한 차이가 있다고 할 수 있는 근거가 충분하다.

EXAMPLE 15.2

이 예제는 3개의 factor와 5번의 반복이 있는 삼요인설계 모형에서의 nested된 모형이다.
데이터의 뼈대는 12.1을 하고 있지만 반복 된 후의 데이터가 없어서 데이터를 새로 지정해주어야한다.
그래서 rnorm function으로 임의로 데이터를 지정해주었다.
데이터를 넣고 nested된 시간을 보고 가설 검정을 진행하여보겠겠다.

trt<-c(rep('NO',10), rep('YES',10))
gender<-c(rep('female',5),rep('male',5),rep('female',5), rep('male',5))
xx<-c(16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7, 38.1,26.2,32.3,35.8,30.2,34.0,22.8,27.8,25.0,29.3)
trt<-as.factor(trt) ; gender<-as.factor(gender) 
ex1502=data.frame(trt,gender,xx)
a=ex1502[1:5,]; aa=a$xx; h0ft1=rnorm(5,mean(aa),1) ; h0ft2=rnorm(5,mean(aa),1)
b=ex1502[6:10,]; bb=b$xx; h0mt1=rnorm(5,mean(bb),1) ; h0mt2=rnorm(5,mean(bb),1)
c=ex1502[11:15,]; cc=c$xx; hxft1=rnorm(5,mean(cc),1) ; hxft2=rnorm(5,mean(cc),1)
d=ex1502[16:20,]; dd=d$xx; hxmt1=rnorm(5,mean(dd),1) ; hxmt2=rnorm(5,mean(dd),1)
new_xx<-c(aa,h0ft1,h0ft2,bb,h0mt1,h0mt2,cc,hxft1,hxft2,dd,hxmt1,hxmt2)
animal<-c(rep(c(1,2,3,4,5),12))
cell<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
new_trt<-c(rep('NO',30), rep('YES',30))
new_gender<-c(rep('female',15),rep('male',15),rep('female',15), rep('male',15))
ex1502a<-data.frame(new_trt,new_gender,cell,animal,new_xx)
new_gender<-as.factor(new_gender) ; new_trt<-as.factor(new_trt) ; cell<-as.factor(cell); animal <- as.factor(animal)
fit_animal<-lm(new_xx ~(new_gender*new_trt)/animal+(new_gender*new_trt)/cell)
anova(fit_animal)

## Analysis of Variance Table
## 
## Response: new_xx
##                           Df Sum Sq Mean Sq  F value    Pr(>F)    
## new_gender                 1  214.9   214.9  38.8940 2.198e-07 ***
## new_trt                    1 4153.8  4153.8 751.6521 < 2.2e-16 ***
## new_gender:new_trt         1    7.5     7.5   1.3496    0.2522    
## new_gender:new_trt:animal 16  132.7     8.3   1.5008    0.1476    
## Residuals                 40  221.1     5.5                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SAS 프로그램 결과

SAS 접기/펼치기 버튼

15장

EXAMPLE 15.1

LIBNAME ex 'C:\Biostat';
RUN;

PROC IMPORT DBMS=sav
DATAFILE='C:\Biostat\Exam 15_1A.sav'
OUT=ex.ex15_1a REPLACE;
RUN;

title "Example 15.1";
ods graphics off;ods exclude all;ods noresults;
PROC GLM DATA=ex.ex15_1a;
	CLASS drug source;
	MODEL resp = drug source(drug) / solution;
	RANDOM source(drug) / test;
	ods output RandomModelANOVA = RandomModelANOVA OverallANOVA = OverallANOVA;
RUN;
ods graphics on;ods exclude none;ods results;

PROC PRINT DATA=OverallANOVA label;
RUN;
PROC PRINT DATA=RandomModelANOVA label;
    VAR Source error SS DF MS FValue ProbF;
RUN;

SAS 출력

Example 15.1


OBS	Dependent	Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
1	resp	Model	5	62.66666667	12.53333333	8.36	0.0112
2	resp	Error	6	9.00000000	1.50000000	_	_
3	resp	Corrected Total	11	71.66666667	_	_	_

Example 15.1


OBS	Source	Error	Type III SS	DF	Mean Square	F Value	Pr > F
1	drug	source(drug)	61.166667	2	30.583333	61.17	0.0037
2	Error	source(drug)	1.500000	3	0.500000	_	_
3	source(drug)	Error	1.500000	3	0.500000	0.33	0.8022
4	Error: MS(Error)	Error	9.000000	6	1.500000	_	_

EXAMPLE 15.2

PROC IMPORT DATAFILE="C:\git_blog\study\data\의학통계방법론\ch15\ex1502a.csv" DBMS=csv OUT=ex.ex15_2a;
GETNAMES=Yes;
RUN;

PROC GLM DATA=ex.ex15_2a;
CLASS trt sex animal ;
    MODEL y = trt sex sex*trt animal(trt*sex) ;
    random animal(trt*sex) ;
    TEST H=trt E=animal(trt*sex);
    TEST H=sex E=animal(trt*sex);
    TEST H=sex*trt E=animal(trt*sex);
RUN;

SAS 출력

Example 15.1

The GLM Procedure


Class Level Information
Class	Levels	Values
trt	2	NO YES
sex	2	female male
animal	5	1 2 3 4 5


Number of Observations Read	60
Number of Observations Used	60

Example 15.1

The GLM Procedure

Dependent Variable: y


Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	19	4532.722406	238.564337	46.53	<.0001
Error	40	205.083610	5.127090
Corrected Total	59	4737.806016


R-Square	Coeff Var	Root MSE	y Mean
0.956713	10.35280	2.264308	21.87146


Source	DF	Type I SS	Mean Square	F Value	Pr > F
trt	1	4185.338364	4185.338364	816.32	<.0001
sex	1	216.056461	216.056461	42.14	<.0001
trt*sex	1	10.945035	10.945035	2.13	0.1518
animal(trt*sex)	16	120.382546	7.523909	1.47	0.1608


Source	DF	Type III SS	Mean Square	F Value	Pr > F
trt	1	4185.338364	4185.338364	816.32	<.0001
sex	1	216.056461	216.056461	42.14	<.0001
trt*sex	1	10.945035	10.945035	2.13	0.1518
animal(trt*sex)	16	120.382546	7.523909	1.47	0.1608

Example 15.1

The GLM Procedure


Source	Type III Expected Mean Square
trt	Var(Error) + 3 Var(animal(trtsex)) + Q(trt,trtsex)
sex	Var(Error) + 3 Var(animal(trtsex)) + Q(sex,trtsex)
trt*sex	Var(Error) + 3 Var(animal(trtsex)) + Q(trtsex)
animal(trt*sex)	Var(Error) + 3 Var(animal(trt*sex))

Example 15.1

The GLM Procedure

Dependent Variable: y


Tests of Hypotheses Using the Type III MS for animal(trt*sex) as an Error Term
Source	DF	Type III SS	Mean Square	F Value	Pr > F
trt	1	4185.338364	4185.338364	556.27	<.0001
sex	1	216.056461	216.056461	28.72	<.0001
trt*sex	1	10.945035	10.945035	1.45	0.2453

교재: Biostatistical Analysis (5th Edition) by Jerrold H. Zar

**이 글은 22학년도 1학기 의학통계방법론 과제 자료들을 정리한 글 입니다.**

drug	source	subj	resp
1	1	1	102
1	1	2	104
1	2	3	103
1	2	4	104
2	3	5	108
2	3	6	110
2	4	7	109
2	4	8	108
3	5	9	104
3	5	10	106
3	6	11	105
3	6	12	107

drug	source	subj	resp
1	1	1	102
1	1	2	104
1	2	3	103
1	2	4	104
2	3	5	108
2	3	6	110
2	4	7	109
2	4	8	108
3	5	9	104
3	5	10	106
3	6	11	105
3	6	12	107

Nested (Hierarchical) Analysis of Variance

R 프로그램 결과

15장

EXAMPLE 15.1a

EXAMPLE 15.1b

EXAMPLE 15.2

SAS 프로그램 결과

15장

EXAMPLE 15.1

EXAMPLE 15.2

Templates (for web app):

Error

drug	source	subj	resp
1	1	1	102
1	1	2	104
1	2	3	103
1	2	4	104
2	3	5	108
2	3	6	110
2	4	7	109
2	4	8	108
3	5	9	104
3	5	10	106
3	6	11	105
3	6	12	107