[의학통계방법론] Ch15. Nested (Hierarchical) Analysis of Variance

Nested (Hierarchical) Analysis of Variance

예제 데이터파일 Exam 15_1a.sav 다운로드

R 프로그램 결과

R 접기/펼치기 버튼

패키지

설치된 패키지 접기/펼치기 버튼
getwd()
## [1] "C:/Biostat"
library("foreign")
library("dplyr")
library("kableExtra")

15장

15장 연습문제 불러오기

ex15_1a <- read.spss('Exam 15_1a.sav', to.data.frame=T)

EXAMPLE 15.1a

#데이터셋
ex15_1a%>%
  kbl(caption = "Dataset",escape=F) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Dataset
drug source subj resp
1 1 1 102
1 1 2 104
1 2 3 103
1 2 4 104
2 3 5 108
2 3 6 110
2 4 7 109
2 4 8 108
3 5 9 104
3 5 10 106
3 6 11 105
3 6 12 107
  • 해당 데이터는 세 가지 약 중 한 가지를 12명의 여성 각각에게 투여한 후 측정된 혈중 콜레스테롤 농도에 대한 데이터이다.
drug <- as.factor(ex15_1a$drug)
source <- as.factor(ex15_1a$source)
  • 분석을 위해 drug와 source를 factor로 변환해 주었다.
tapply(ex15_1a$resp,drug,sum)
##   1   2   3 
## 413 435 422
tapply(ex15_1a$resp,drug,mean)
##      1      2      3 
## 103.25 108.75 105.50

EXAMPLE 15.1b

  • 콜레스테롤 수치는 drug type과 drug source에 영향을 받으므로, 두 인자를 고려하여 nested ANOVA를 수행하였다.
    따라서 가설은 아래와 같이 두 가지다.

첫번째 가설

\[\begin{aligned} H_0 &: There \ is \ no \ difference \ among \ the \ drug \ sources \ in \ affecting \ mean \ blood \ cholesterol \ concentration. \\ H_A &: There \ is \ difference \ among \ the \ drug \ sources \ in \ affecting \ mean \ blood \ cholesterol \ concentration. \end{aligned}\]

두번째 가설

\[\begin{aligned} H_0 &: There \ is \ no \ difference \ in \ mean \ cholesterol \ concentrations \ owing \ to \ the \ three \ drugs. \\ H_A &: There \ is \ difference \ in \ mean \ cholesterol \ concentrations \ owing \ to \ the \ three \ drugs. \end{aligned}\]
fit <- lm(ex15_1a$resp~drug/source)
anova(fit)
## Analysis of Variance Table
## 
## Response: ex15_1a$resp
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## drug         2 61.167  30.583 20.3889 0.00211 **
## drug:source  3  1.500   0.500  0.3333 0.80220   
## Residuals    6  9.000   1.500                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • 첫 번째 귀무가설에 대해, p-value=0.8022로 유의수준 5%하에 귀무가설을 기각하지 못하였다.
  • 즉, 혈중 콜레스테롤 수치에 영향을 미치는 drug sources의 차이는 없다고 할 수 있는 근거가 충분하다 할 수 있다.
fit2 <- aov(ex15_1a$resp~drug+Error(source))
summary(fit2)
## 
## Error: source
##           Df Sum Sq Mean Sq F value Pr(>F)   
## drug       2  61.17   30.58   61.17 0.0037 **
## Residuals  3   1.50    0.50                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  6      9     1.5
  • 두 번째 귀무가설에 대해, p-value=0.0037로 유의수준 5%하에 귀무가설을 기각한다.
  • 즉, 세가지 약으로 인한 콜레스테롤 수치 평균은 유의한 차이가 있다고 할 수 있는 근거가 충분하다.

EXAMPLE 15.2

  • 이 예제는 3개의 factor와 5번의 반복이 있는 삼요인설계 모형에서의 nested된 모형이다.
  • 데이터의 뼈대는 12.1을 하고 있지만 반복 된 후의 데이터가 없어서 데이터를 새로 지정해주어야한다.
  • 그래서 rnorm function으로 임의로 데이터를 지정해주었다.
  • 데이터를 넣고 nested된 시간을 보고 가설 검정을 진행하여보겠겠다.
trt<-c(rep('NO',10), rep('YES',10))
gender<-c(rep('female',5),rep('male',5),rep('female',5), rep('male',5))
xx<-c(16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7, 38.1,26.2,32.3,35.8,30.2,34.0,22.8,27.8,25.0,29.3)
trt<-as.factor(trt) ; gender<-as.factor(gender) 
ex1502=data.frame(trt,gender,xx)
a=ex1502[1:5,]; aa=a$xx; h0ft1=rnorm(5,mean(aa),1) ; h0ft2=rnorm(5,mean(aa),1)
b=ex1502[6:10,]; bb=b$xx; h0mt1=rnorm(5,mean(bb),1) ; h0mt2=rnorm(5,mean(bb),1)
c=ex1502[11:15,]; cc=c$xx; hxft1=rnorm(5,mean(cc),1) ; hxft2=rnorm(5,mean(cc),1)
d=ex1502[16:20,]; dd=d$xx; hxmt1=rnorm(5,mean(dd),1) ; hxmt2=rnorm(5,mean(dd),1)
new_xx<-c(aa,h0ft1,h0ft2,bb,h0mt1,h0mt2,cc,hxft1,hxft2,dd,hxmt1,hxmt2)
animal<-c(rep(c(1,2,3,4,5),12))
cell<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
new_trt<-c(rep('NO',30), rep('YES',30))
new_gender<-c(rep('female',15),rep('male',15),rep('female',15), rep('male',15))
ex1502a<-data.frame(new_trt,new_gender,cell,animal,new_xx)
new_gender<-as.factor(new_gender) ; new_trt<-as.factor(new_trt) ; cell<-as.factor(cell); animal <- as.factor(animal)
fit_animal<-lm(new_xx ~(new_gender*new_trt)/animal+(new_gender*new_trt)/cell)
anova(fit_animal)
## Analysis of Variance Table
## 
## Response: new_xx
##                           Df Sum Sq Mean Sq  F value    Pr(>F)    
## new_gender                 1  214.9   214.9  38.8940 2.198e-07 ***
## new_trt                    1 4153.8  4153.8 751.6521 < 2.2e-16 ***
## new_gender:new_trt         1    7.5     7.5   1.3496    0.2522    
## new_gender:new_trt:animal 16  132.7     8.3   1.5008    0.1476    
## Residuals                 40  221.1     5.5                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


SAS 프로그램 결과

SAS 접기/펼치기 버튼

15장

EXAMPLE 15.1

LIBNAME ex 'C:\Biostat';
RUN;

PROC IMPORT DBMS=sav
DATAFILE='C:\Biostat\Exam 15_1A.sav'
OUT=ex.ex15_1a REPLACE;
RUN;

title "Example 15.1";
ods graphics off;ods exclude all;ods noresults;
PROC GLM DATA=ex.ex15_1a;
	CLASS drug source;
	MODEL resp = drug source(drug) / solution;
	RANDOM source(drug) / test;
	ods output RandomModelANOVA = RandomModelANOVA OverallANOVA = OverallANOVA;
RUN;
ods graphics on;ods exclude none;ods results;

PROC PRINT DATA=OverallANOVA label;
RUN;
PROC PRINT DATA=RandomModelANOVA label;
    VAR Source error SS DF MS FValue ProbF;
RUN;
SAS 출력

Example 15.1

OBSDependentSourceDFSum of SquaresMean SquareF ValuePr > F
1respModel562.6666666712.533333338.360.0112
2respError69.000000001.50000000__
3respCorrected Total1171.66666667___

Example 15.1

OBSSourceErrorType III SSDFMean SquareF ValuePr > F
1drugsource(drug)61.166667230.58333361.170.0037
2Errorsource(drug)1.50000030.500000__
3source(drug)Error1.50000030.5000000.330.8022
4Error: MS(Error)Error9.00000061.500000__

EXAMPLE 15.2

PROC IMPORT DATAFILE="C:\git_blog\study\data\의학통계방법론\ch15\ex1502a.csv" DBMS=csv OUT=ex.ex15_2a;
GETNAMES=Yes;
RUN;

PROC GLM DATA=ex.ex15_2a;
CLASS trt sex animal ;
    MODEL y = trt sex sex*trt animal(trt*sex) ;
    random animal(trt*sex) ;
    TEST H=trt E=animal(trt*sex);
    TEST H=sex E=animal(trt*sex);
    TEST H=sex*trt E=animal(trt*sex);
RUN;
SAS 출력

Example 15.1

The GLM Procedure

Class Level Information
ClassLevelsValues
trt2NO YES
sex2female male
animal51 2 3 4 5
Number of Observations Read60
Number of Observations Used60

Example 15.1

The GLM Procedure

 

Dependent Variable: y

SourceDFSum of SquaresMean SquareF ValuePr > F
Model194532.722406238.56433746.53<.0001
Error40205.0836105.127090  
Corrected Total594737.806016   
R-SquareCoeff VarRoot MSEy Mean
0.95671310.352802.26430821.87146
SourceDFType I SSMean SquareF ValuePr > F
trt14185.3383644185.338364816.32<.0001
sex1216.056461216.05646142.14<.0001
trt*sex110.94503510.9450352.130.1518
animal(trt*sex)16120.3825467.5239091.470.1608
SourceDFType III SSMean SquareF ValuePr > F
trt14185.3383644185.338364816.32<.0001
sex1216.056461216.05646142.14<.0001
trt*sex110.94503510.9450352.130.1518
animal(trt*sex)16120.3825467.5239091.470.1608

Example 15.1

The GLM Procedure

SourceType III Expected Mean Square
trtVar(Error) + 3 Var(animal(trt*sex)) + Q(trt,trt*sex)
sexVar(Error) + 3 Var(animal(trt*sex)) + Q(sex,trt*sex)
trt*sexVar(Error) + 3 Var(animal(trt*sex)) + Q(trt*sex)
animal(trt*sex)Var(Error) + 3 Var(animal(trt*sex))

Example 15.1

The GLM Procedure

 

Dependent Variable: y

Tests of Hypotheses Using the Type III MS for animal(trt*sex) as an Error Term
SourceDFType III SSMean SquareF ValuePr > F
trt14185.3383644185.338364556.27<.0001
sex1216.056461216.05646128.72<.0001
trt*sex110.94503510.9450351.450.2453



교재: Biostatistical Analysis (5th Edition) by Jerrold H. Zar


**이 글은 22학년도 1학기 의학통계방법론 과제 자료들을 정리한 글 입니다.**