As time passes, society gradually becomes more liberal and people become more open to new ideas. Despite this increasing liberalism, there are still many controversial issues on which opinions vary widely. In the past, many people living in the United States were Christians who read the Bible and prayed. This resulted in a long history of people accepting the use of the Bible and prayer in public schools. As time passed, however, demographics and views began to change, and institutions changed with them. Eventually the Supreme Court ruled that it is unconstitutional for public schools to require the reading of prayers and the Bible. This raises the question of whether public opinion has changed along with society and whether people’s backgrounds affect their opinions.
I am interested in studying the relationship between opinions on banning the Bible or prayers in public schools and the respondent’s race and gender. One of the questions that the General Social Survey asks is the respondent’s opinion on this issue. The data below consist of 1458 observations, each recording whether the respondent approves or disapproves of the ban, their race, and their gender. The data table was generated using the SDA website, which draws on data collected by the General Social Survey in 2014, using the unweighted sample size. These data are categorical because each variable takes a small number of categories and the observations are counted in every combination of those categories.
I first use R to fit all of the possible loglinear models for this data. I use “O” to represent opinion (approve or disapprove), “R” for race (only white and black respondents are included in this study), and “G” for gender (male or female). Below is a summary table of the goodness-of-fit tests for the loglinear models fitted to the data.
Based on the values listed above, two models have a high p-value and could fit the data: the conditional independence model in which opinion and gender are independent given the respondent’s race, with a p-value of 0.305, and the homogeneous association model for the three factors, with a p-value of 0.867. The AIC values of the two models are close, with the homogeneous association model having a slightly lower AIC of 67.41 compared with 67.76 for the conditional independence model. I next test whether the simpler model fits significantly worse than the more complex one. The null hypothesis is that the two models fit equally well (the extra opinion-gender term is zero), and the alternative hypothesis is that they do not. To do this, I subtract the G^2 of the model with more parameters from that of the model with fewer parameters, obtaining 2.38 – 0.028 = 2.352. Likewise, I subtract the df of the model with more parameters from that of the model with fewer parameters, obtaining 2 – 1 = 1. A chi-square statistic of 2.352 with 1 degree of freedom has a p-value of 0.1253, so I fail to reject the null hypothesis; the data do not show that the two models differ in fit. Since the simpler model does not fit significantly worse, I choose the [OR][RG] model (opinion and gender are independent given race) over the [OR][OG][RG] (homogeneous association) model because it is more parsimonious.
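The model comparison above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the R code used for the analysis. The function name `chi2_sf_1df` is hypothetical, and it relies on the fact that, for 1 degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(x/2)). Because it uses the rounded G^2 values quoted in the text, it gives a p-value of about 0.125, in line with the reported 0.1253.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

# Deviances (G^2) and residual df as quoted in the text (rounded values).
g2_reduced, df_reduced = 2.38, 2    # [OR][RG], conditional independence
g2_full, df_full = 0.028, 1         # [OR][OG][RG], homogeneous association

lr_stat = g2_reduced - g2_full      # likelihood-ratio statistic
lr_df = df_reduced - df_full        # difference in degrees of freedom
p_value = chi2_sf_1df(lr_stat)      # valid here because lr_df is 1

print(f"G^2 difference = {lr_stat:.3f} on {lr_df} df, p = {p_value:.4f}")
```

A p-value above 0.05 means the extra opinion-gender term does not significantly improve the fit, which is the basis for preferring the simpler model.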
I assess the goodness of fit with the Hosmer and Lemeshow chi-square test. The test statistic is 4.439 with 8 degrees of freedom and a p-value of 0.8155. Therefore, I cannot reject the null hypothesis that the model fits the data, so the model is probably a good fit.
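The reported p-value can be reproduced from the test statistic and its degrees of freedom. The sketch below is a standard-library Python check rather than the original R output. The function name `chi2_sf_even_df` is hypothetical, and it uses the closed form of the chi-square upper-tail probability that holds only for even degrees of freedom. With the values quoted above it gives roughly 0.8155.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total

# Statistic and degrees of freedom as quoted in the text.
p_value = chi2_sf_even_df(4.439, 8)
print(f"p = {p_value:.4f}")
```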
Having selected the model in which opinion and gender are independent given race, I extracted its two odds ratios: the opinion-race odds ratio, which does not depend on gender, and the race-gender odds ratio, which does not depend on opinion. The odds ratio between O and R is 2.06, with a 95% confidence interval of 1.52 to 2.78. Because this interval excludes 1, it suggests an association between the respondent’s opinion and race for both genders. The odds ratio between G and R is 1.75, with a 95% confidence interval of 1.32 to 2.33. Because this interval also excludes 1, it suggests an association between respondents’ race and gender for both opinions. I found both odds ratios by exponentiating the log odds ratios estimated in the loglinear model and, similarly, found both confidence intervals by exponentiating the log odds ratio plus or minus 1.96 times its standard error, which is also reported in the model output.
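The interval construction described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original R code: the function names are hypothetical, and because the fitted coefficient and its standard error are not given in the text, both are backed out from the reported (rounded) odds ratio and interval for opinion and race. The rebuilt interval therefore agrees with the reported one only up to rounding.

```python
import math

Z_95 = 1.96  # normal critical value used in the text

def wald_ci_for_odds_ratio(log_or, se, z=Z_95):
    """Return (odds ratio, lower, upper) from a log odds ratio and its SE."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

def implied_se(lower, upper, z=Z_95):
    """Back out the SE of the log odds ratio from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

# ASSUMPTION: the model coefficient and SE are not given in the text, so both
# are recovered here from the reported, rounded values for opinion and race.
log_or = math.log(2.06)
se = implied_se(1.52, 2.78)

odds_ratio, lower, upper = wald_ci_for_odds_ratio(log_or, se)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the interval symmetric around the log odds ratio, which is why the bounds are not symmetric around 2.06 after exponentiating.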
After fitting loglinear models and finding the odds ratios, I conclude that there is a positive association between a respondent’s opinion and race for both genders, and a positive association between a respondent’s race and gender for both opinions, although the latter does not seem useful here. I am surprised to find an association only between opinion and race: I had expected both gender and race to affect a respondent’s opinion, but there does not seem to be any association between opinion and gender once race is taken into account. Several other factors, such as ideology or age, may affect a person’s opinion on this issue and could help explain how the data are distributed. For example, older people may be more likely than younger people to take the time to answer the General Social Survey, which could bias the data toward respondents of various backgrounds disapproving rather than approving. If I had included more factors in the data, and possibly more levels for each factor, different results and conclusions might have emerged.