687
(Invited) Machine Learning for DNA/SWCNT Based Molecular Perceptron: Finding Sequences and Training Sensor Arrays

Wednesday, 1 June 2022: 16:00
West Meeting Room 204 (Vancouver Convention Center)
Y. Yang (Lehigh University), Z. A. Yaari (Memorial Sloan Kettering), Z. Lin (NIST), D. A. Heller (Memorial Sloan Kettering Cancer Center), M. Zheng (National Institute of Standards and Technology), and A. Jagota (Lehigh University)
Single-wall carbon nanotube (SWCNT)-based biosensors offer opportunities for building ultrasensitive biosensing systems because of their unique optical properties and their strong sensitivity to changes in the local environment. Consequently, much effort has gone into developing SWCNT-based sensors. However, the usual approach relies on one-to-one molecular recognition, which makes it difficult to detect a wide variety of molecules. In this study, we describe a sensing system, which we call the Molecular Perceptron, that combines an array of weakly specific sensors with a machine learning model. We show how machine learning algorithms, together with the choice of feature representation, can be used both to discover resolving DNA sequences and to predict the presence and concentration of biomarkers. DNA/SWCNT hybrids were used to detect biomarker analytes optically by monitoring changes in the fluorescence spectra of each SWCNT. Machine learning models were trained on the experimental data using three algorithms: support vector machine, random forest, and artificial neural network. We demonstrated this platform on gynecologic cancers, which are often diagnosed at advanced stages and consequently have low survival rates. We investigated the detection of protein biomarkers in uterine lavage samples, which are enriched in certain cancer markers relative to blood. The method enabled the simultaneous detection of multiple biomarkers in patient samples, with F1-scores of ~0.95 in uterine lavage samples from patients with cancer. We also demonstrated that machine learning models trained on relatively small DNA sequence data sets can accurately predict new resolving DNA sequences. This work demonstrates the potential of perception-like systems for developing multiplexed sensors of disease biomarkers without the need for specific molecular recognition elements.
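As a rough illustration of the classification step, the following sketch trains the three model families named in the abstract (support vector machine, random forest, artificial neural network) on a simulated sensor-array data set and scores each with F1. Everything about the data is an assumption: the array size (`N_SENSORS`), the two features per sensor (a relative intensity change and a peak-wavelength shift), and the simulated responses are hypothetical stand-ins, not the authors' measurements. The use of scikit-learn is likewise our choice; the abstract does not name a library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_SENSORS = 10   # hypothetical number of DNA/SWCNT complexes in the array
N_SAMPLES = 300  # hypothetical number of measured samples


def simulate_array(n_samples, n_sensors, seed=0):
    """Synthetic stand-in for sensor-array responses (NOT real data).

    Each sensor contributes two hypothetical features: a relative intensity
    change and a peak-wavelength shift. When the biomarker is present, every
    feature shifts by a small random amount, so the signal is weak and
    spread across the whole array rather than carried by one sensor.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)  # 1 = biomarker present
    signature = rng.normal(0.0, 0.7, size=2 * n_sensors)  # weak, distributed response
    noise = rng.normal(0.0, 1.0, size=(n_samples, 2 * n_sensors))
    X = noise + np.outer(y, signature)
    return X, y


def evaluate_models(seed=0):
    """Fit SVM, random forest and a small neural network; return test F1 scores."""
    X, y = simulate_array(N_SAMPLES, N_SENSORS, seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # SVM and MLP are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = f1_score(y_te, model.predict(X_te))
    return scores


if __name__ == "__main__":
    for name, f1 in evaluate_models().items():
        print(f"{name}: F1 = {f1:.2f}")
```

Because each simulated feature shifts only weakly, the classifiers have to pool many small responses, which is the idea behind an array of weakly specific sensors. Extending the sketch to several biomarkers at once could be done by training one such classifier per biomarker, though that is our assumption rather than the authors' stated design.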
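The sequence-discovery side can be sketched in the same hedged spirit. One common feature representation for short DNA sequences is a bag of overlapping k-mer counts. This example encodes sequences that way with scikit-learn's `CountVectorizer` and fits a linear SVM on a small training set. The sequences, their length of 12 nucleotides, and the labeling rule (`HYPOTHETICAL_MOTIFS`) are invented purely to generate toy data. They are not resolving sequences from this work, and the authors' feature representation and models may differ.

```python
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical rule used ONLY to label toy data; real labels would come from
# experiments determining whether each sequence is resolving.
HYPOTHETICAL_MOTIFS = ("TAT", "ATT", "TTA")


def make_toy_sequences(n, length=12, seed=0):
    """Random DNA sequences, labeled 1 if they contain any hypothetical motif."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]
    labels = [int(any(m in s for m in HYPOTHETICAL_MOTIFS)) for s in seqs]
    return seqs, labels


def kmer_model_accuracy(k=3, n=400, seed=0):
    """Train a linear SVM on k-mer count features; return held-out accuracy."""
    seqs, labels = make_toy_sequences(n, seed=seed)
    seq_tr, seq_te, y_tr, y_te = train_test_split(
        seqs, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Bag of overlapping k-mers: each sequence becomes a vector of k-mer counts.
    # The vocabulary is learned from the training sequences only.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(k, k), lowercase=False)
    X_tr = vectorizer.fit_transform(seq_tr)
    X_te = vectorizer.transform(seq_te)
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


if __name__ == "__main__":
    print(f"held-out accuracy: {kmer_model_accuracy():.2f}")
```

Varying `k` is one simple way to probe how the choice of feature representation affects accuracy on a small data set. In a discovery setting, a trained model of this kind would be used to score untested candidate sequences so that the most promising ones are prioritized for experiments.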