Paper published in JAMIA: Potential limitations in COVID-19 machine learning due to data source variability: a case study in the nCov2019 dataset

The lack of representative COVID-19 data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases arising from data source variability in COVID-19 machine learning. In this work, we used the publicly available nCov2019 dataset, which includes patient-level data from several countries, with the aim of discovering and classifying severity subgroups using symptoms and comorbidities.
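The post does not say how the subgroups were discovered. As an illustration only, the sketch below clusters binary symptom/comorbidity vectors with k-modes (a swapped-in technique, not necessarily the one used in the paper). It then cross-tabulates clusters against the data source, which is the kind of check that would show subgroups lining up with countries. The feature order, the toy records and the source labels "A"/"B" are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of binary features on which two patients differ."""
    return sum(x != y for x, y in zip(a, b))


def init_modes(records, k):
    """Deterministic farthest-point initialisation: start from the first record,
    then repeatedly add the record farthest from all modes chosen so far."""
    modes = [tuple(records[0])]
    while len(modes) < k:
        farthest = max(records, key=lambda r: min(hamming(r, m) for m in modes))
        modes.append(tuple(farthest))
    return modes


def k_modes(records, k, max_iter=20):
    """Minimal k-modes clustering for binary symptom/comorbidity vectors."""
    modes = init_modes(records, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming distance (ties -> lowest index)
        new_labels = [min(range(k), key=lambda j: hamming(r, modes[j])) for r in records]
        # Update step: each mode becomes the per-feature majority value of its members
        new_modes = []
        for j in range(k):
            members = [r for r, lab in zip(records, new_labels) if lab == j]
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new_labels == labels and new_modes == modes:
            break  # converged: neither assignments nor modes changed
        labels, modes = new_labels, new_modes
    return labels, modes


# Hypothetical toy data: features = (fever, cough, hypertension, diabetes)
sources = ["A", "A", "A", "B", "B", "B"]
records = [(1, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0),
           (0, 1, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1)]
labels, modes = k_modes(records, k=2)
table = Counter(zip(sources, labels))  # cross-tabulation: (source, cluster) -> count
print(table)
```

In this toy case every patient from source "A" ends up in one cluster and every patient from "B" in the other. When this happens on real data, the subgroups may reflect differences between data sources as much as differences in clinical severity.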

In our work published in JAMIA, we showed that cases from the two countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations, and it can increase model complexity at the risk of overfitting. We conclude that data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
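One simple way to quantify how differently two sources distribute patient profiles is the Jensen-Shannon distance between their empirical distributions. The sketch below assumes each patient is a binary feature vector and compares sources pairwise. It is not necessarily the metric used in the paper. The function names and the toy data for "country_A"/"country_B"/"country_C" are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pattern_distribution(records):
    """Empirical probability of each binary symptom/comorbidity pattern in one source."""
    counts = Counter(tuple(r) for r in records)
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance with log base 2, so the result lies in [0, 1]."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a):
        # KL(a || m); patterns absent from a contribute nothing
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    # max() guards against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(0.0, 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)))


def pairwise_source_distances(data_by_source):
    """JS distance between every pair of sources (keys sorted for stable output)."""
    dists = {s: pattern_distribution(r) for s, r in data_by_source.items()}
    return {(a, b): js_distance(dists[a], dists[b])
            for a, b in combinations(sorted(dists), 2)}


# Hypothetical toy data: binary flags (fever, cough, hypertension) per patient
toy = {
    "country_A": [(1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0)],
    "country_B": [(1, 1, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0)],
    "country_C": [(1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 0, 0)],
}
for pair, d in pairwise_source_distances(toy).items():
    print(pair, round(d, 3))
# ('country_A', 'country_B') 0.866
# ('country_A', 'country_C') 0.0
# ('country_B', 'country_C') 0.866
```

A source that stays far from all the others is a candidate for closer inspection before its records are pooled into a training set.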
Our analysis tool, developed within BDSLab at UPV, is available at http://covid19sdetool.upv.es/?tab=ncov2019

Grant from the Fundación BBVA call on SARS-CoV-2 and COVID-19

On September 30th, 2020, we received funding from the Fundación BBVA for the project Ciencias de Datos e Inteligencia Artificial contra el COVID-19 (Data Science and Artificial Intelligence against COVID-19), IA4COVID19. It was selected from more than 150 proposals submitted to the call in the category Big Data e Inteligencia Artificial (“Data-IA-COVID-19”). The proposal was led by the data science researcher Nuria Oliver from ELLIS Alicante. The initiative is linked to the Valencian Strategy in Artificial Intelligence.

Our project is a collaborative effort that we, professors from Valencian universities, have carried out voluntarily and altruistically since the beginning of the crisis caused by the pandemic. The research, entitled «Data science against Covid-19», brings together civil society (through a citizen survey), experts from academia and research, and the public administration. Its aim is to provide information so that those responsible for public crisis management can make informed decisions based on scientific evidence obtained from data analysis. In particular, I collaborate on the epidemiological modeling work and co-lead the UPV node with Miguel Rebollo from the VRAIN Institute. The initiative is linked to the Valencian Strategy in Artificial Intelligence through the Commissioner to the Presidency, a post held by the researcher Nuria Oliver.

Grant funding from the CRUE-Santander Fondo Supera COVID-19

Last July we received funding for the project «Data Sciences against Covid-19» (CD4COVID) from the Supera COVID-19 fund. Banco Santander launched the fund in April together with CRUE Spanish Universities and the Spanish National Research Council (CSIC). The fund is endowed with 8.5 million euros to finance programs, projects and support measures, and it aims to minimize the impact of the crisis generated by the pandemic. It focuses on three lines of action: research, social impact projects, and strengthening the technological capacity of Spanish universities.

Our project is a collaborative effort that we, professors from Valencian universities, have carried out voluntarily and altruistically since the beginning of the crisis caused by the pandemic. The research, entitled «Data science against Covid-19», brings together civil society (through a citizen survey), experts from academia and research, and the public administration. Its aim is to provide information so that those responsible for public crisis management can make informed decisions based on scientific evidence obtained from data analysis. In particular, I collaborate on the epidemiological modeling work and co-lead the UPV node with Miguel Rebollo from the VRAIN Institute. The initiative is linked to the Valencian Strategy in Artificial Intelligence through the Commissioner to the Presidency, a post held by the researcher Nuria Oliver.

Research paper on Network Science and Machine Learning

Our paper «Community detection based deep neural network (CD-DNN) architectures: a fully automated framework for Likert scales» has been published in the journal Mathematical Methods in the Applied Sciences. In it, we apply network community detection to build a suitable architecture for an artificial neural network. This allows raw data from psychological questionnaires based on Likert scales to be used efficiently.

Community detection and neural networks.
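The sketch below shows the general idea under several assumptions; it is not the paper's exact framework. It builds an item network by linking Likert items whose absolute Pearson correlation reaches a hypothetical threshold. It then finds communities with asynchronous label propagation, a swapped-in algorithm that may differ from the one used in the paper. Finally, it gives each community its own block of hidden units through a connectivity mask. The function names, the threshold, `units_per_community` and the toy responses are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two item response columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx, syy = sum(a * a for a in dx), sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # constant item: treat as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def item_graph(responses, threshold):
    """Adjacency list: items i and j are linked if |r(i, j)| >= threshold."""
    items = list(zip(*responses))  # rows are respondents, so columns are items
    adj = {i: [] for i in range(len(items))}
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if abs(pearson(items[i], items[j])) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation in fixed node order; ties go to the smallest label."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated item keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, []).append(v)
    return [sorted(g) for _, g in sorted(groups.items())]


def community_mask(communities, n_items, units_per_community):
    """mask[h][i] == 1 means hidden unit h receives input from item i (assumed design:
    multiply a dense first layer's weights elementwise by this mask)."""
    mask = []
    for comm in communities:
        row = [1 if i in comm else 0 for i in range(n_items)]
        mask.extend([row[:] for _ in range(units_per_community)])
    return mask


# Hypothetical responses: 4 respondents (rows) x 4 Likert items (columns), scale 1-5
responses = [(1, 1, 1, 2), (5, 4, 1, 1), (1, 2, 5, 5), (5, 5, 5, 4)]
adj = item_graph(responses, threshold=0.5)
communities = label_propagation(adj)
mask = community_mask(communities, n_items=4, units_per_community=2)
print(communities)  # [[0, 1], [2, 3]]
```

Because the architecture comes from the data, the number and size of the hidden blocks follow the structure of the questionnaire rather than being set by hand. In a real model, the mask would be applied to the weights of the first layer.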