Structured sparse logistic regression with application to lung cancer prediction using breath volatile biomarkers

Stat Med. 2020 Mar 30;39(7):955-967. doi: 10.1002/sim.8454. Epub 2019 Dec 27.

Abstract

This article is motivated by a study of lung cancer prediction using breath volatile organic compound (VOC) biomarkers, where the challenge is that the predictors include not only high-dimensional time-dependent or functional VOC features but also time-independent clinical variables. We consider high-dimensional logistic regression and propose two penalties, the group spline-penalty and the group smooth-penalty, to handle the group structure of the time-dependent variables in the model. Compared with existing methods such as the group lasso and the group bridge, the new methods are advantageous when the model coefficients are sparse but change smoothly within each group. Our methods are easy to implement because, after certain transformations, they reduce to a group minimax concave penalty (MCP) problem. We show that our fitting algorithm possesses the descent property, which leads to attractive convergence properties. Simulation studies and the lung cancer application demonstrate the accuracy and stability of the proposed approaches.

Keywords: group smooth-penalty; group spline-penalty; high-dimensional data; time-dependent variables; variable selection.
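
The abstract states that the proposed penalties reduce to a group minimax concave penalty problem, but it gives neither the transformations nor the penalty formula. As a rough illustration only, the sketch below assumes the standard unweighted group MCP definition (MCP applied to the Euclidean norm of each coefficient group, with concavity parameter gamma > 1) and its closed-form group firm-thresholding update. The function names, the absence of group-size weights, and the unit step size are assumptions made for this example and are not taken from the article.

```python
import numpy as np


def mcp(t, lam, gamma):
    """MCP value at a scalar t (assumes lam >= 0 and gamma > 1)."""
    t = abs(t)
    if t <= gamma * lam:
        # Concave region: lasso-like slope that tapers off as t grows.
        return lam * t - t ** 2 / (2.0 * gamma)
    # Flat region: large coefficients receive a constant penalty.
    return 0.5 * gamma * lam ** 2


def group_mcp_penalty(beta, groups, lam, gamma):
    """Sum of MCP applied to the Euclidean norm of each group.

    `groups` is a list of index arrays, one per group (hypothetical layout).
    """
    return sum(mcp(np.linalg.norm(beta[idx]), lam, gamma) for idx in groups)


def group_firm_threshold(z, lam, gamma):
    """Minimizer of 0.5 * ||b - z||^2 + MCP(||b||; lam, gamma) over b."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        # The whole group is set to zero (group-level sparsity).
        return np.zeros_like(z)
    if norm <= gamma * lam:
        # Group soft-thresholding, rescaled to reduce shrinkage bias.
        return (gamma / (gamma - 1.0)) * (1.0 - lam / norm) * z
    # Large groups are left unshrunk.
    return z.copy()
```

For instance, with `lam = 1` and `gamma = 3`, a group with norm below 1 is removed entirely, a group with norm between 1 and 3 is shrunk toward zero, and a group with norm above 3 is returned unchanged. A fitting routine for logistic regression would typically apply such an update group by group inside a coordinate-descent or proximal-gradient loop.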

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers
  • Computer Simulation
  • Humans
  • Logistic Models
  • Lung Neoplasms* / diagnosis

Substances

  • Biomarkers