Program 2013

The BIOT-2013 Symposium will be held on Thursday and Friday December 5 and 6, 2013. A conference dinner for all registered conference attendees (fee included) and guests (additional fee) will be held on the night of December 5, 2013. The conference will finish at 12 noon on Friday December 6, 2013. A boxed lunch will be available.

The sessions will be in room 2258 of the Conference Center (CONF) at Brigham Young University.

Thursday, December 5
8:00 – 8:45 Registration and breakfast
8:45 – 9:00 Welcome address
9:00 – 10:30 Session 1
9:00 Mitochondrial genomic variation associated with higher mitochondrial copy number: The Cache County Study on Memory Health and Aging, Perry Ridge, Brigham Young University
9:30 Predicting protein solvent accessibility with sequence, evolutionary information and context-based features, Ashraf Yaseen, Central State University
10:00 Inferences of population substructure: The Cache County Study, Aaron Sharp, Brigham Young University
10:30 – 11:00 break
11:00 Keynote #1
Uncertainty in biological networks: challenges, solutions and opportunities, Tamer Kahveci – University of Florida
12:00 – 1:00 Lunch
1:00 – 3:00 Session 2
1:00 Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist’s point of view, Rob Smith, Brigham Young University
1:30 Structure prediction of polyglutamine disease proteins: comparison of methods, Jingran Wen, University of Utah
2:00 Identifying microRNA targets in different gene regions, Wenlong Xu
2:30 Modeling of yeast pheromone pathway using Petri nets, Abhishek Majumdar, University of Nebraska-Lincoln
3:00 – 3:30 break
3:30 – 5:00 Session 3
3:30 GeCON: expression pattern based reconstruction of gene co-expression networks, Swarup Roy, North Eastern Hill University
4:00 Inference of radio-responsive gene regulatory networks using the graphical lasso algorithm, Jung Hun Oh, Memorial Sloan-Kettering Cancer Center
4:30 Multistable switches and their role in cellular differentiation networks, Ahmadreza Ghaffarizadeh, Utah State University
5:00 – 6:00 Poster Session
6:30 Dinner
Friday, December 6
8:00 – 8:45 Registration and breakfast
9:00 Keynote #2
Ethnicity estimation at AncestryDNA, Ross Curtis
10:00 – 12:00 Session 4
10:00 Effects of error-correction of heterozygous next-generation sequencing data, M. Stanley Fujimoto, Brigham Young University
10:30 Variant Tool Chest: an improved tool to analyze and manipulate variant call format (VCF) files, Mark T.W. Ebbert, Brigham Young University
11:00 ADaM: augmenting existing approximate fast matching algorithms with efficient and exact range queries, Nathan Clement, University of Texas at Austin
11:30 Analysis of interactions between the epigenome and structural mutability of the genome using Genboree Workbench Tools, Cristian Coarfa, Baylor College of Medicine
12:00 Keynote #3
Towards Precision Medicine: Tute Genomics, a Cloud-based Application for Analysis of Personal Genomes, Reid Robison
1:00 Closing Remarks and Box Lunch


Posters appearing in the proceedings 2012 (Additional late posters will be presented at the conference)
Variant Phasing for Family Data, Ben Coffman, Kathryn Kintaro, Jimmy Hales, Dan Bunker, “Given the variants for a mother, father, and child (.vcf or 23andme format), our program phases all the variants of the child that can be phased with the information given. To be phaseable, a child’s variant must be heterozygous and at least one of the parents must be homozygous for their allele in that position.”
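The phaseability rule stated in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' program: the function name `phase_child` and the representation of a genotype as a pair of allele strings are assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical representation: a genotype is an unordered pair of alleles.
Genotype = Tuple[str, str]


def phase_child(child: Genotype, mother: Genotype,
                father: Genotype) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele), or None if not phaseable.

    Follows the rule in the abstract: the child must be heterozygous and
    at least one parent must be homozygous at this position.
    """
    a, b = child
    if a == b:
        return None  # homozygous child: nothing to phase
    if mother[0] != mother[1] and father[0] != father[1]:
        return None  # both parents heterozygous: rule not satisfied
    # Keep only assignments consistent with Mendelian inheritance.
    candidates = [(m, p) for m, p in ((a, b), (b, a))
                  if m in mother and p in father]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    # Mother is homozygous A/A, so the child's A must be maternal.
    print(phase_child(("A", "G"), ("A", "A"), ("A", "G")))
    # Both parents heterozygous: cannot be phased from the trio alone.
    print(phase_child(("A", "G"), ("A", "G"), ("A", "G")))
```

A genotype combination that violates Mendelian inheritance (for example, a heterozygous child of two parents homozygous for the same allele) also returns None in this sketch.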
Heterozygous Genome Assembly via Binary Classification of Homologous Sequence, Paul Bodily, Cami Ortega, Nozomu Okuda, “Application of traditional genome assemblers to heterozygous genome assembly results in highly-fragmented assemblies, owing to the violation of assumptions during both the contigging and scaffolding phases. Increasing parameter stringency during contigging is an effective solution to obtaining haplotype-specific contigs; however, effective algorithms for scaffolding such contigs are lacking. We propose using machine learning classification of contig pairs to identify homologous sequences in order to reconstruct haplotype-specific scaffolds. A classifier is trained on homologous sequences as found in scaffold graph “bubbles”, which tend to accurately indicate homologous, heterozygous genomic regions. We then use the classifier to find other homologous contigs in order to perform diploid genomic assembly.”
Optimization and Analysis of de novo transcriptome assembly of the halophyte Suaeda fruticosa, Joann Diray-Arce, Mark Clement, Bilquees Gul, Ajmal Khan, Brent Nielsen, Brigham Young University, “Major efforts to improve salt tolerance in agricultural crops have been attempted to address problems with increasing soil salinity due to irrigation. Conventional crops have been used to breed salt tolerance through genetic engineering, in which genes for salt tolerance were introduced directly to plants. Most of these traditional cash crops can only endure relatively low concentrations of salt; therefore, it may be reasonable to study and domesticate a native, salt-tolerant plant. We sequenced a cDNA library on the Illumina platform from shoots and roots of the perennial salt-tolerant plant Suaeda fruticosa. The reads were quality assessed and filtered using the FastX toolkit, Sickle, and Trimmomatic, then digitally normalized to yield a total of 99,577,045 good-quality paired-end reads. We assembled the transcriptome of this species using Oases-Velvet, Trinity, and Trinity-CAP. Our method included BLAST using the generated assembly as both the query and the reference, keeping the longest transcript to reduce the number of replicated transcripts without affecting the percentage of reads mapping back to the assembly. We have compared results and assessed the contigs based on predicted protein coding regions and on the percentage of reads mapped back to the assembly using GSNAP. Our results show the best de novo assembly of the transcriptome of non-conventional crop plants comparing no salt and optimal salt concentration. This project aims to study the adaptation of halophytes to salt tolerance and identify genes that are involved in this response mechanism.”
Genome Selection Mapping, Hayden Smith, Tyler Dawson, Cameron Ortega, “We created a genome-wide dN/dS map across the hg19 genome and used this map to create a gene-selection map of 211 Cache County residents.”
Firefly and Local Search algorithms in short read alignment, Dan Haskin, Kyle Corbitt, “Various algorithms are discussed to more accurately align short reads. Firefly and Local Search algorithms are compared to a basic Shotgun algorithm in accuracy. The future potential of these algorithms in this problem domain is explored.”
Evidence for an APOE ε4-independent effect on CSF Aβ42 levels in the APOE gene region, Ivan Arano Rodriguez, Josue Gonzales Murcia, Mark Wadsworth, John S Kauwe, “Alzheimer’s disease (AD) is the most common neurodegenerative disorder and the only one of the top 10 causes of death in the United States for which effective prevention or treatment methods are not available. Substantial progress has been made in understanding Alzheimer’s disease, as several genetic markers that are associated with AD risk have been identified [1]. Cerebrospinal fluid (CSF) Aβ42 levels are a promising biomarker for AD and have been shown to change more than 15 years before the onset of familial disease. Here we use a new endophenotype-based approach [2] to search for SNPs in the APOE region that show association with CSF Aβ42 levels independently of the APOE ε2 and ε4 isoforms. To do this, we used a large collection of CSF genotype data of 3050 individuals from the ADNI, University of Washington, Washington University and Mayo Clinic. We used the resulting dataset to find SNP associations with CSF Aβ42 levels. First, we trimmed a region with all the SNPs for chromosome 19 using the compiled genotype dataset. Next, we filtered the SNPs for MAF (0.02) and HWE < 1 x 10^-6; a total of 80,525 SNPs passed this filter. We used the 1000 Genomes and dbSNP databases to identify SNPs in a 100 kb window of the APOE region. We used these SNPs to trim 50 kilobases upstream and downstream of the APOE region and identified 169 SNPs.
We performed logistic regression of these SNPs using various adjustments for series, age, gender and the main components of population stratification analyses. Most importantly, we adjusted for the effects of APOE alleles 2 and 4 in addition to the covariates used above. A Bonferroni correction for these SNPs was also used to adjust the p-values. Our results show that SNPs rs769449 (p < 1 x 10^-4) and rs2075650 (p = 0.0001) show association with CSF Aβ42 levels independently of APOE genotype. Ongoing work will confirm these results by using a meta-analysis that will include the p-values obtained as well as the sample and effect sizes. Also, we will run permutation tests with F-statistics analyses to further confirm association of the SNPs with Aβ42. This ongoing project will determine whether we can use an endophenotype approach to report association of SNPs in the APOE region with Aβ42 levels in CSF. We hope that our results will add insight for a genetic method that can predict AD risk using an endophenotype approach.”
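As a worked illustration of the Bonferroni step mentioned in this abstract, the short Python sketch below computes a per-test significance threshold for the 169 SNPs. The family-wise alpha of 0.05 is an assumption (the abstract does not state the level used), and the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05   # assumed family-wise level; not given in the abstract
N_SNPS = 169   # number of SNPs identified in the APOE region

threshold = bonferroni_threshold(ALPHA, N_SNPS)

# p-values as reported in the abstract; rs769449 is given only as an
# upper bound (p < 1e-4), so the bound itself is used here.
reported = {"rs769449": 1e-4, "rs2075650": 1e-4}
passes = {snp: p < threshold for snp, p in reported.items()}

if __name__ == "__main__":
    print(f"threshold = {threshold:.2e}")
    print(passes)
```

Under that assumed alpha the threshold is roughly 3 x 10^-4, so both reported p-values would fall below it.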
Variant Tool Chest, Mark Wadsworth, Mark Ebbert, “Next Generation Sequencing (NGS) is a novel technology that enables us to obtain whole genome sequences in only a couple of weeks. NGS has ushered in an era of unprecedented sequencing and genotyping across virtually every field of the life sciences, and all scientists face a common challenge of finding effective tools to analyze NGS data. One of the most common uses of NGS data is to identify variants that cause disease. A common study design is to sequence the genomes of a group of people affected by disease and a second group of unaffected individuals. The observed variants in the two groups are compared to identify variations that cause disease. This process can produce over 100 gigabytes of information, with several million genetic variants expected for Caucasian individuals. The analysis of these data is challenging because of the large quantity of data and large number of sequence variants. Identified sequence variants are stored in specialized files called variant call format (VCF) files. VCF files have a highly flexible format and can store genotypes for one or multiple individuals (Danecek, P. et al. 2011).
We have developed a novel tool, the Variant Tool Chest (VTC), for the analysis and processing of an unlimited number of VCF files. This tool will fill an important and critical hole in analysis capabilities. Presently there are no tools that can perform set operations (union, intersect, and complement) on VCF files, functionality that is vital to researchers studying the effects of specific variants on phenotypes of interest. We decided to build our algorithms upon existing open source software. By using Java to write VTC, the program will be accessible to people regardless of which operating system they choose to use (Windows, OSX, Linux, etc.). VTC is designed in such a way that anyone can choose to extend its functionality as the needs change in this rapidly developing field. Finally, the Variant Tool Chest will have statistical tools built in to allow researchers to assess the significance of observed sequence variants between individuals displaying different phenotypes. By observing this difference we will be aiding other researchers in the search for disease-causing variants in the genome for a variety of diseases.”
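The set operations named in this abstract (union, intersect, and complement) can be illustrated with a minimal Python sketch. It is not VTC itself, which the abstract says is written in Java; the function `parse_variants`, the choice to key a variant on (chromosome, position, reference, alternate), and the sample records are all assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (CHROM, POS, REF, ALT).
Variant = Tuple[str, int, str, str]


def parse_variants(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from the tab-separated data lines of a VCF."""
    variants: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            variants.add((chrom, int(pos), ref, allele))
    return variants


# Made-up records for two small groups, for illustration only.
affected = parse_variants([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tC\tT,G\t.\tPASS\t.",
])
unaffected = parse_variants([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t200\t.\tC\tT\t.\tPASS\t.",
    "2\t300\t.\tG\tA\t.\tPASS\t.",
])

union = affected | unaffected          # seen in either group
intersection = affected & unaffected   # seen in both groups
complement = affected - unaffected     # seen only in the affected group
```

A real tool would also need to decide how genotypes, sample columns, and differently normalized indels are compared; this sketch ignores all of that.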
A new normalization method for RNA-seq data from the viewpoint of Bayesian, Yongchao Dou, Chi Zhang, “In recent years, next-generation sequencing (NGS) technology has been widely used to measure genome-wide transcriptional profiles. One particular use of NGS is for quantifying gene expression, called RNA-Seq. Previous works show that normalization is an essential step in the analysis of differentially expressed (DE) genes. From a Bayesian viewpoint, we developed a novel normalization method for RNA-seq data analysis.”
Replication of Epistatic Interactions in Large Alzheimer Disease Dataset, Kevin Boehme, Mark Ebbert, John Kauwe, “Alzheimer disease is a devastating disease that affects millions of people. Understanding the interactions involved in this disease plays an essential part in the fight to end it. To better understand the complex biological processes behind the etiology and progression of Alzheimer disease, we attempted to validate reported interactions (Ebbert et al. 2013) between three known Alzheimer genes, but in a much larger dataset. The previous study found statistically significant interactions between the rs3865444 C/C (CD33) genotype and the rs670139 G/G (MS4A4E) genotype (p < .016) and between the rs11136000 C/C (CLU) genotype and the rs670139 G/G (MS4A4E) genotype (p < .003). We performed the replication in the Alzheimer's Disease Genetics Consortium (ADGC) dataset, which comprises around 20,000 individuals.
For both interactions we cleaned the data the same way. We removed individuals not genotyped at the genes of interest. This gave us the following datasets to analyze: CLU-MS4A4E (n=14,692) and CD33-MS4A4E (n=17,222). We then fit a general linear model using APOE, age, gender, and site as covariates. We failed to find statistically significant interactions between the reported genes in these datasets (CLU-MS4A4E (p < 0.15) and CD33-MS4A4E (p < 0.23)). The directions for both interactions were consistent with what was found previously, and the p-value of another genotype did approach significance (CD33 (C/C) – MS4A4E (T/G) (p < .1)). This perhaps suggests a non-additive interaction between these two genotypes. Finding replicable interactions in large datasets is filled with challenges. However, these potential interactions merit further study and analysis.”
Towards a more comprehensive evaluation of LC-MS correspondence algorithms, Rob Smith, Ryan Money, Christine Kendall, Dan Ventura, John Prince, “Mass spectrometry is an established technology for the identification and quantification of unknown substances (such as lipids, proteins, and metabolites) via measurement of the mass-to-charge ratio (m/z) of substances composited in the sample. Because such compositions are inherently complex in nature, approaches have been proposed to differentiate between substances that would otherwise be indistinguishable. Chromatography, for instance, provides a means of delaying the detection of substances within a mass spectrometer based on the physicochemical properties (e.g., hydrophobicity) of each substance in order to spread their detection across time (known as retention time, RT), minimizing the number of isobaric substances.
Liquid chromatography-mass spectrometry (LC-MS) is one of the most widely used chromatographic MS configurations. Despite its ubiquity, serious data processing limitations are still inherent. A central challenge is how to match signals generated from identical source substances across multiple MS samples. This is a critical task in many MS experiments, as the end goal is almost always to identify differences between similar samples, such as blood samples from healthy and diseased individuals. However, differential analysis cannot occur without first establishing a sample-to-sample correspondence. In the last decade or so, over 50 algorithms for LC-MS correspondence have been proposed. However, very few if any have been critically evaluated, thus making it nearly impossible for practitioners to know a priori which of the 50 algorithms is best suited to their experiments. What’s more, in the absence of evaluation, there is a substantial amount of overlap among these methods. For example, about 45 of these algorithms share the same basic approach to the problem, that of warping LC-MS output files such that the difference in retention time of expressed analytes is minimized. The glut of novel MS data processing algorithms is disadvantageous for multiple reasons, including a significant amount of sifting for practitioners before knowing which algorithm is best suited for their application.
In order to facilitate progressive novel algorithms for LC-MS correspondence, we propose an expansive evaluative review of current methods. Our review will expand upon the only known critical review by adding datasets as well as methods that have since been published or were not included. In our poster, we present preliminary results from several correspondence algorithms on several datasets.
Future work will extend to ten algorithms and benefit from the addition of several in silico simulated datasets from a recently published first of its kind LC-MS simulator, allowing for the use of truly quantitative metrics, whereas the current review is limited to qualitative evaluation since ground truth LC-MS data does not exist.”
Computational aspects of evolutionary biology and phylogeny, Ryan Hillary, Aaron Sharp, “In the growing emphasis to understand the physiological systems of the Odonates, a general pipeline of analysis was constructed to lay the foundation for further studies. In order to dig deeper into the evolutionary events of taxa contained within Odonata, a preliminary phylogenetic tree was needed before exploring other systems. With the ultimate goal of studying and understanding the visual systems of Odonates, a phylogenetic tree of solely orthologous genes was needed to lay a framework for the study. Transcriptomic data from about 20 taxa were obtained. QC was completed using PoPoolation, and assembly using Trinity. Since the underlying focus of this study is to understand the visual systems of Odonates, BLAST will be used to search for opsins in our assembled datasets. Beyond this current study, the intent is to eventually build a phylogenetic tree to display the evolutionary trends of the three types of opsins among these taxa, namely: Ultraviolet opsin, Long Wavelength opsin, and Blue opsin. After the opsins have been retrieved from our dataset following the procedure outlined in this study, a tree will eventually be constructed using a newer program called IQ-TREE.”