
Potential limitations in COVID-19 machine learning due to data source variability

December 15, 2020


Our recent paper "Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset" has been accepted for publication in J Am Med Inform Assoc (JAMIA, IF 4.112). We study whether the lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is not enough without data quality, and data source variability plays an important role in data quality. We showcase and discuss potential biases arising from data source variability in COVID-19 machine learning. Our results are based on the publicly available nCov2019 dataset, which includes patient-level data from several countries. We aimed to discover and classify severity subgroups using symptoms and comorbidities. We show that cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of the training data with respect to the models' target populations and increase model complexity, at the risk of overfitting.
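As an illustrative sketch only (this is not the method used in the paper), one common way to quantify how much two data sources differ is the Jensen-Shannon distance between their empirical distributions. Here each patient is assumed to be a tuple of binary symptom/comorbidity flags; the function names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def empirical_distribution(records):
    """Relative frequency of each distinct symptom/comorbidity profile (hypothetical helper)."""
    counts = Counter(tuple(r) for r in records)
    total = sum(counts.values())
    return {profile: n / total for profile, n in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (log base 2, so the result lies in [0, 1])
    between two discrete distributions given as {outcome: probability} dicts."""
    support = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint support
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a):
        # Kullback-Leibler divergence KL(a || M); zero-probability terms contribute 0
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    jsd = 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
    # Clamp tiny negative rounding errors before taking the square root
    return sqrt(max(jsd, 0.0))


if __name__ == "__main__":
    # Hypothetical toy records: (fever, cough, hypertension) flags per patient
    source_a = [(1, 1, 0)] * 6 + [(1, 0, 0)] * 4
    source_b = [(0, 1, 1)] * 5 + [(1, 1, 0)] * 5
    d = js_distance(empirical_distribution(source_a), empirical_distribution(source_b))
    print(f"JS distance between sources: {d:.3f}")
```

A distance close to 0 means the two sources describe patients similarly, while a value close to 1 means their profiles barely overlap. In the latter case, a clustering or classification model trained on the pooled data may end up separating patients by source rather than by severity.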


J. Alberto Conejero

