Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM)

Authors: J.F. Dipnall, J.A. Pasco, M. Berk, L.J. Williams, S. Dodd, F.N. Jacka, D. Meyer

Source: European Psychiatry, 39, 40-50, Jan 2017

Brief summary of the paper:

Background: Key lifestyle and environmental (“lifestyle-environ”) risk factors are known to operate in depression, but it is unclear how these risk factors cluster. Machine-learning (ML) algorithms can learn, extract and map underlying patterns, and so identify groupings of depressed individuals without imposing prior constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through “Graphing lifestyle-environs using machine-learning methods” (GLUMM).

Methods: Two ML algorithms were implemented: an unsupervised self-organising map (SOM) to create the GLUMM clusters, and a supervised boosted regression algorithm to describe them. Ninety-six “lifestyle-environ” variables were drawn from the National Health and Nutrition Examination Survey (NHANES, 2009–2010). Multivariate logistic regression was used to validate the clusters while controlling for possible sociodemographic confounders.
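To illustrate the unsupervised step, the following is a minimal sketch of how a self-organising map assigns observations to nodes on a small grid. It is not the authors' implementation: the grid size, learning-rate and neighbourhood schedules, the function names `train_som` and `best_matching_unit`, and the toy data are all assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def best_matching_unit(x, weights):
    """Return the grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(x, weights[node]))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}.

    Hypothetical settings: linear decay of learning rate and of the
    Gaussian neighbourhood width over all training steps.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(n_features)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present observations in random order
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            bmu = best_matching_unit(x, weights)
            for node, w in weights.items():
                # Nodes close to the winner on the grid move most.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(n_features):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


# Toy data (made up): four observations with three features scaled to [0, 1].
toy = [[0.1, 0.2, 0.1], [0.15, 0.1, 0.2], [0.9, 0.8, 0.95], [0.85, 0.9, 0.8]]
som = train_som(toy, rows=2, cols=2)
labels = [best_matching_unit(x, som) for x in toy]
```

Each observation ends up labelled with the grid node that best matches it, and neighbouring nodes hold similar weight vectors, which is what lets a SOM lay out clusters in an ordered way. In practice the inputs would be the scaled lifestyle-environ variables rather than toy values.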

Results: The SOM identified two GLUMM cluster solutions, each containing one dominant depressed cluster (GLUMM5-1 and GLUMM7-1). Equal proportions of members in each of these clusters rated as highly depressed (17%). Alcohol consumption and demographics validated the clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Members were more likely to: have problems sleeping; eat unhealthily; have lived ≤ 2 years in their home; live in an old home; perceive themselves as underweight; be exposed to work fumes; have experienced sex at ≤ 14 years; and not perform moderate recreational activities. Both GLUMM5-1 (OR: 7.50, P < 0.001) and GLUMM7-1 (OR: 7.88, P < 0.001) were positively associated with depression, with significant interactions for those married or living with a partner (P = 0.001).
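As a reading aid for the odds ratios above: in a logistic regression, an odds ratio is the exponential of the corresponding model coefficient. The short sketch below converts the two reported odds ratios back to the log-odds scale. The helper names are hypothetical, and the snippet only restates the reported figures; it does not reproduce the study's model or data.

```python
import math


def odds_ratio_to_log_odds(odds_ratio):
    """Logistic-regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def log_odds_to_odds_ratio(coef):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(coef)


# Odds ratios as reported in the Results paragraph.
reported = {"GLUMM5-1": 7.50, "GLUMM7-1": 7.88}
for name, or_value in reported.items():
    coef = odds_ratio_to_log_odds(or_value)
    print(f"{name}: OR = {or_value:.2f}, log-odds coefficient = {coef:.2f}")
# GLUMM5-1: OR = 7.50, log-odds coefficient = 2.01
# GLUMM7-1: OR = 7.88, log-odds coefficient = 2.06
```

An odds ratio of 7.50 therefore means the odds of depression for cluster members are about seven and a half times the odds for non-members, holding the other covariates in the model fixed.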

Conclusion: Using ML-based GLUMM to form ordered depressive clusters from a multitude of lifestyle-environ variables enabled a deeper exploration of the heterogeneous data and a better understanding of the relationships between complex mental health factors.