Factors that Could Affect Depression

Introduction

Depression is a disorder that causes persistent doubt and negative emotions that interfere with a person's daily life. It affects many Americans. People often feel depressed after a negative event in their lives, but long-term depression may be caused by other factors. Depression may not be apparent during the first few days, but if the symptoms persist, the person may have the illness. Symptoms include various negative emotions over a long period that affect a patient's physical or mental health, and they can sometimes be assessed with surveys. The SF-36 health survey asks several questions about physical and mental health and assigns a score to each category of the survey. The Beck Depression score measures the severity of depression based on a patient's answers to a survey. Additionally, other factors may be related to whether a patient is diagnosed with depression.

Materials and Methods

I am interested in studying the probability that a patient is diagnosed with depression based on several factors: gender, age, years of education, the physical and mental components of the SF-36 survey, and the patient's Beck depression score. The data were collected from 400 patients at the UC Davis health center. I will model the probability of a depression diagnosis with multiple logistic regression on these factors and then use backward selection to look for a better-fitting model, starting from the following model:
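In symbols, a logistic regression on the six factors listed above takes the general form sketched here. The coefficient names $\beta_0, \dots, \beta_6$, the ordering of the terms, and the notation $\pi_i$ for the probability that patient $i$ is diagnosed are assumptions made for illustration, not notation taken from the original output.

$$
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1\,\text{gender}_i + \beta_2\,\text{age}_i + \beta_3\,\text{education}_i + \beta_4\,\text{PCS}_i + \beta_5\,\text{MCS}_i + \beta_6\,\text{Beck}_i
$$

Each slope is the change in the log-odds of a diagnosis for a one-unit increase in that factor with the others held fixed.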

First, I fit the multiple logistic regression model with all the explanatory variables and obtained the following model, which has an AIC value of 306.18:

Of these parameters, the PCS, age, and intercept parameters are not significant because their p-values are larger than 0.05; the remaining parameters appear significant, with p-values at or below 0.05. Next, I used the stepwise regression function in R to see whether dropping a parameter gives a better model. Among the candidate models that drop one parameter or none, the model with the smallest AIC value (304.78) is the one that drops PCS, the SF-36 score for the patient's physical state. I then refit the regression without PCS and obtained the following model:

In this model, the age parameter has a p-value of 0.12, which is greater than 0.05, so it is not significant; the rest of the parameters all have p-values below 0.05 and are significant. I again ran the stepwise regression function in R to see whether dropping any further parameter would improve the model. This time, the model that drops no parameters had the smallest AIC value of all the candidates. I also conducted the Hosmer-Lemeshow goodness-of-fit test at the 0.05 significance level and obtained a test statistic of 6.8257 with 8 degrees of freedom and a p-value of 0.5556, so we cannot reject the null hypothesis that the model fits the data. Based on this test and the AIC criterion, I chose this model for my data analysis.
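The selection arithmetic above can be checked by hand. The analysis itself was run in R; the Python sketch below only re-derives two numbers from the values reported in the text, using the closed-form upper tail of a chi-square distribution that holds when the degrees of freedom are even. The function name `chi2_sf_even_df` is a helper introduced here for illustration.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Values reported in the text
aic_full = 306.18       # model with all six explanatory variables
aic_reduced = 304.78    # model with PCS dropped
hl_stat, hl_df = 6.8257, 8

aic_drop = aic_full - aic_reduced
hl_p = chi2_sf_even_df(hl_stat, hl_df)

print(f"AIC decrease from dropping PCS: {aic_drop:.2f}")
print(f"Hosmer-Lemeshow p-value: {hl_p:.4f}")
```

The computed p-value agrees with the reported 0.5556 to about three decimal places, so the reported statistic, degrees of freedom, and p-value are mutually consistent.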

Results

The model I chose for my analysis of results is the following:
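In symbolic form, the selected model (the full model with PCS removed) can be sketched as below. The fitted coefficient values are not reproduced here; the symbols $\beta_0, \dots, \beta_5$ and the coding of gender as an indicator for male are assumptions made for illustration.

$$
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1\,\text{male}_i + \beta_2\,\text{age}_i + \beta_3\,\text{education}_i + \beta_4\,\text{MCS}_i + \beta_5\,\text{Beck}_i
$$

Under this form, $e^{\beta_j}$ is the odds ratio for a one-unit increase in the $j$-th factor, which is the quantity interpreted in the results.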

Based on the resulting model, the predicted probability of a depression diagnosis decreases as the patient's MCS, the mental component score of the SF-36 survey, increases. For the test of the null hypothesis that the MCS beta parameter is 0, the p-value is 0.00175, so we reject the null hypothesis. The 95% confidence interval for this parameter is [-0.0764, -0.0176], which excludes 0 and is consistent with a significant effect of MCS. The 95% confidence interval for the odds ratio is [0.93, 0.98], meaning that each additional point on the MCS score multiplies the odds of a depression diagnosis by a factor between 0.93 and 0.98.

The predicted probability of a depression diagnosis increases as the patient's Beck depression score increases. For the test of the null hypothesis that the Beck depression score beta parameter is 0, the p-value is 0.01961, so we reject the null hypothesis and conclude that the Beck depression score has a nonzero effect. The 95% confidence interval for this parameter is [0.0118, 0.1354], which excludes 0 and supports this conclusion. The 95% confidence interval for the odds ratio is [1.01, 1.14], meaning that each additional point on the Beck depression score multiplies the odds of a depression diagnosis by a factor between 1.01 and 1.14.

The model shows that the chance of a depression diagnosis is lower for males. For the test of the null hypothesis that the gender beta parameter is 0, the p-value is 0.03959, so we reject the null hypothesis. The 95% confidence interval for this parameter is [-1.367, -0.033], which excludes 0 and supports the significance of the gender parameter. The 95% confidence interval for the odds ratio is [0.26, 0.98], meaning that the odds of a depression diagnosis for males are 0.26 to 0.98 times the odds for females.
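The link between the coefficient intervals and the odds-ratio intervals quoted above is exponentiation of the endpoints. The Python sketch below applies this to the three reported intervals and, under the assumption that they are Wald intervals of the form estimate ± 1.96 × SE (the text does not say how they were computed), recovers the implied z statistics and two-sided p-values. The helper names are introduced here for illustration only.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# 95% confidence intervals for the coefficients, as reported in the text
coef_ci = {
    "MCS": (-0.0764, -0.0176),
    "Beck": (0.0118, 0.1354),
    "gender (male)": (-1.367, -0.033),
}

def odds_ratio_ci(lower, upper):
    """Exponentiate the endpoints of a log-odds interval."""
    return math.exp(lower), math.exp(upper)

def wald_summary(lower, upper):
    """Recover estimate, standard error, z and two-sided p from a Wald CI.

    Assumes the interval was built as estimate +/- Z_95 * SE.
    """
    estimate = (lower + upper) / 2.0
    se = (upper - lower) / (2.0 * Z_95)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return estimate, se, z, p

for name, (lo, hi) in coef_ci.items():
    or_lo, or_hi = odds_ratio_ci(lo, hi)
    est, se, z, p = wald_summary(lo, hi)
    print(f"{name}: OR CI [{or_lo:.3f}, {or_hi:.3f}], z = {z:.2f}, p = {p:.4f}")
```

The recovered p-values land close to the reported 0.00175, 0.01961, and 0.03959, which suggests the Wald assumption is reasonable here. For gender, exponentiating the reported endpoints gives roughly 0.25 and 0.97 rather than the quoted 0.26 and 0.98; a difference of that size may simply reflect rounding of the reported coefficient interval.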
The model shows that the odds of a depression diagnosis increase as age increases. For the test of the null hypothesis that the beta parameter for age is 0, the p-value is 0.116, so we cannot reject the null hypothesis. The 95% confidence interval for this parameter is [-0.004, 0.0352], which contains 0 and supports this conclusion. Although age is not significant in the model, I will not drop it because the AIC of the model increases when I do. The 95% confidence interval for the odds ratio is [0.996, 1.04], meaning that each additional year of age multiplies the odds of a depression diagnosis by a factor between 0.996 and 1.04; because this interval contains 1, it is not very informative here.

The model shows that as education level increases, the probability of a depression diagnosis increases. The null hypothesis for this test is that the beta parameter for education is 0, and the p-value is 0.00245, so I reject the null hypothesis. The 95% confidence interval for this parameter is [0.065, 0.305], which excludes 0 and is consistent with a significant effect of education. The 95% confidence interval for the odds ratio is [1.07, 1.36], meaning that each additional year of education multiplies the odds of a depression diagnosis by a factor between 1.07 and 1.36.

Below is a plot of the standardized residuals:

In this graph, many of the standardized residuals are negative and close to zero, and a few are unusually large, in the range of 4 to 6. This could indicate a lack of fit in the model, even though the Hosmer-Lemeshow test indicated that the model fits well and the AIC criterion selected it as the best-fitting model among those considered in the backward selection.
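The shape of this residual pattern follows partly from the binary outcome itself. A minimal sketch, assuming ordinary Pearson residuals (the standardized version additionally divides by a leverage term, which is ignored here) and using made-up fitted probabilities rather than values from the fitted model:

```python
import math

def pearson_residual(y, p):
    """Pearson residual for a binary outcome y (0 or 1) with fitted probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return (y - p) / math.sqrt(p * (1.0 - p))

# Hypothetical fitted probabilities, chosen only for illustration
for p in (0.03, 0.10, 0.30):
    r0 = pearson_residual(0, p)  # patient not diagnosed
    r1 = pearson_residual(1, p)  # patient diagnosed
    print(f"p = {p:.2f}: residual if not diagnosed = {r0:.2f}, if diagnosed = {r1:.2f}")
```

When the fitted probability is small, a patient who is not diagnosed gets a small negative residual, while a patient who is diagnosed gets a large positive one. A residual near 5, for example, corresponds to a diagnosed patient whose fitted probability was about 1/26. Large residuals of this kind flag patients the model predicts poorly and are worth inspecting individually.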

Conclusion

From my model and results, I found that the odds of a depression diagnosis are higher for females than for males and increase with the Beck depression score and years of education, while they decrease as the SF-36 MCS score increases. I used backward selection to choose the model and the AIC criterion to conclude that it was the best of the candidates. However, my analysis of the standardized Pearson residuals showed some unusually large values, which indicates that the chosen model still has some lack of fit. Based on these results, I think it may be better to analyze several candidate models to see whether their results are consistent with those I obtained from the model I selected.