697
Learning DNA/SWCNT Recognition Sequences

Tuesday, 15 May 2018: 10:40
Room 205 (Washington State Convention Center)
Y. Yang (Lehigh University), M. Zheng (National Institute of Standards and Technology), and A. Jagota (Lehigh University)
DNA/single-walled carbon nanotube (SWCNT) hybrids have attracted significant attention for their ability to disperse and sort SWCNTs according to their chirality. Much effort has been expended in recent years to discover the special short DNA sequences, called ‘recognition sequences’, that recognize specific partner SWCNTs in mixtures. Several computational molecular modeling studies have established our understanding of the structural basis for sequence-specific recognition. Although some sequence patterns have emerged from a directed and limited search of the DNA library, our ability to predict recognition sequences remains weak. In this research, we propose a new approach to predicting recognition sequences using machine learning techniques. We use two different ways to represent the DNA sequences: Position Specific Vector and Term Frequency Vector. The Position Specific Vector method preserves the position of each nucleotide in a sequence but applies only to sets of sequences of the same length. The Term Frequency Vector method contains no position information but can be used for sets of sequences of variable length. We have applied these feature extraction methods to experimental data sets, using them to train several classifier algorithms (logistic regression, support vector machine, and multilayer perceptron). We add a further iteration loop by experimentally testing new sets of predicted sequences, then re-evaluating and retraining the models each time. We report a significantly improved ability to predict recognition sequences.
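The two sequence representations described above can be illustrated with a minimal sketch. The abstract does not specify the exact encodings, so the choices here are assumptions: the position-specific vector is taken to be a one-hot encoding of each nucleotide, and the term-frequency vector is taken to be a count of overlapping k-mers with k = 2. The function names `position_specific_vector` and `term_frequency_vector` are hypothetical and are not taken from the authors' work.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def position_specific_vector(seq):
    """One-hot encode each position (assumed encoding).

    The vector length is 4 * len(seq), so all sequences in a data set
    must share one length for their vectors to be comparable.
    """
    vec = []
    for base in seq:
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def term_frequency_vector(seq, k=2):
    """Count overlapping k-mers (assumed definition of a 'term').

    The vector length is 4 ** k regardless of sequence length, so
    sequences of different lengths map to vectors of the same size,
    at the cost of discarding where each k-mer occurs.
    """
    terms = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = {t: 0 for t in terms}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[t] for t in terms]
```

Either vector could then be passed, together with a label indicating whether a sequence was found to be a recognition sequence, to a classifier such as logistic regression; that training step is omitted here because the abstract gives no details of it.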