2008 — 2009 |
Shendure, Jay Ashok |
R21Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.) |
Molecular Tools For Genome Partitioning @ University of Washington
DESCRIPTION (provided by applicant): A new generation of technologies is poised to reduce the cost of DNA sequencing by over two orders of magnitude. However, the routine sequencing of full human genomes will continue to be prohibitively expensive in the context of studies that require even modest sample sizes. At the same time, investigators are frequently interested in identifying germline variation or somatic mutations in a particular subset of the genome. Examples of genomic subsets that are highly relevant in the context of specific studies include: (a) a locus to which a disease phenotype has been mapped (i.e. a contiguous genomic region); and (b) the exons of genes belonging to a specific disease-related pathway (i.e. a large set of short, discontiguous sequences). Such subsets total megabases in length, raising the question of how they can be efficiently isolated without performing hundreds to thousands of PCR reactions per genome. Our ability to take advantage of the power of next-generation sequencing technologies is markedly impaired by the lack of a corresponding targeting method, analogous to PCR, that is matched to the scale at which the new sequencing platforms will routinely operate. To address this critical need, we will explore several novel strategies for "genome partitioning". Our goal is to develop these strategies into broadly available methods that enable the selective and uniform amplification of complex, arbitrary subsets of a mammalian genome in a single reaction. Our specific aims are: (1) to develop an enzymatic method for the uniform amplification of large sets of exon sequences from a human genome; (2) to develop a hybridization-based method for the selective amplification of contiguous megabase-scale regions from a human genome; (3) to integrate these methods with next-generation sequencing technologies, validating their utility by performing targeted variation discovery in a small number of individuals.
PUBLIC HEALTH RELEVANCE: As we enter an era of "personalized medicine", DNA sequencing technology will be increasingly important to public health, contributing towards the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics. Next-generation sequencing technologies have the potential to markedly accelerate genetics research, but are hindered by the lack of equivalently powerful methods to target specific subsets of the human genome. We propose here to develop technologies that meet this critical need.
|
2008 — 2009 |
Eichler, Evan (co-PI) [⬀] Green, Philip P Nickerson, Deborah A [⬀] Shendure, Jay Ashok |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Seattleseq @ University of Washington
DESCRIPTION (provided by applicant): Deep resequencing of human genes has led to the discovery of rare, nonsynonymous sequence variants that are robustly associated with complex human phenotypes. Such studies have historically been rate-limited by the cost of DNA sequencing. Although a new generation of sequencing platforms is reducing costs by over two orders of magnitude, the routine sequencing of complete human genomes continues to be prohibitively expensive. Recently, methods have been developed to enable the efficient capture of specific subsets of the genome. With these methods, the cost of sequencing all of the protein-coding sequences (i.e. ~1% of the human genome split across ~180,000 discontiguous subsequences) may soon be on par with that of dense genotyping arrays. The goals of this proposal are to further the development of these targeting methods, and to integrate them into a scalable resequencing pipeline that relies on second-generation sequencing technology. Specifically, we will: (1) optimize and evaluate candidate strategies for multiplex capture, including array hybridization and gap-fill molecular inversion probes, while extending their application to the full protein-coding genome; (2) integrate optimized capture methods, second-generation sequencing technology, and sequence analysis software into a scalable resequencing pipeline; (3) develop the requisite computational tools for translating raw sequence data generated by new sequencing platforms into quality-tagged, consensus predictions of sequence variants; (4) make our data and methods broadly available, and facilitate the goals of this program through open communication with other investigators and the NIH. PUBLIC HEALTH RELEVANCE: As we enter an era of "personalized medicine", DNA sequencing technology will be increasingly important to public health, contributing towards the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics.
Next-generation sequencing technologies have the potential to markedly accelerate genetics research, but are hindered by the lack of equivalently powerful methods to target specific subsets of the human genome. We propose here to develop technologies that meet this critical need, focusing specifically on the development of a scalable resequencing pipeline that targets the ~1% of the genome that is protein-coding. The principal investigator (PI) proposes to evaluate two different methods for capturing the approximately 1% of the human genome that constitutes the protein-coding genome (PCG).
|
2009 — 2010 |
Bamshad, Michael Joseph Nickerson, Deborah A [⬀] Raskind, Wendy H (co-PI) [⬀] Shendure, Jay Ashok |
RC2Activity Code Description: To support high impact ideas that may lay the foundation for new fields of investigation; accelerate breakthroughs; stimulate early and applied research on cutting-edge technologies; foster new approaches to improve the interactions among multi- and interdisciplinary research teams; or, advance the research enterprise in a way that could stimulate future growth and investments and advance public health and health care delivery. This activity code could support either a specific research question or propose the creation of a unique infrastructure/resource designed to accelerate scientific progress in the future. |
Next Generation Mendelian Genetics @ University of Washington
DESCRIPTION (provided by applicant): This application addresses NHGRI RFA-OD-09-004 for Medical Sequencing Discovery Projects. The ultimate goal of this proposal is to scale a new approach to identify the candidate genes and mutations that underlie rare Mendelian diseases in humans by exome resequencing. For decades, linkage analysis has been the mainstay of human genetics. However, for rare Mendelian diseases where family collection is difficult or pedigrees are small, this approach is less useful. Although the molecular bases of more than 2,600 Mendelian diseases have been determined by linkage mapping or a candidate gene approach, a nearly equal number remain to be solved (OMIM). We have assembled a collection of rare pediatric and adult Mendelian diseases that are representative of this unsolved set. In every instance, the identification of the causal gene remains intractable to either linkage mapping or exhaustive candidate gene analysis. Exome resequencing offers a new way forward for dissecting the underlying causes of rare Mendelian diseases. In our preliminary studies, we show that selective capture of protein coding sequences across the human genome coupled with massively parallel resequencing to define coding variation can accurately identify the gene underlying a monogenic disorder. In this example, comparative analysis of exome variation data from as few as two unrelated individuals affected with the disease reduced the list of candidate genes to less than ten. The candidate list was further reduced to a single gene with exome data from as few as four unrelated cases. Once identified, each candidate gene will be screened for disease-causing variants by conventional methods in a larger set of cases. 
Discovery of the genetic basis of a large collection of rare disorders that have, to date, been unyielding to traditional analysis will substantially expand our understanding of the biology of the human genome, facilitate accurate diagnosis and improved management of these diseases, and provide the information needed for the development of novel therapeutics. If successful, this approach is likely to replace linkage analysis as the dominant paradigm for studying diseases exhibiting Mendelian inheritance patterns and will provide a new path forward for medical genetics. PUBLIC HEALTH RELEVANCE: As we enter an era of personalized medicine, DNA sequencing will be increasingly important to public health, contributing to our understanding of the genetic basis of human disease. The targeted capture and massively parallel sequencing of all protein coding regions in the human genome (the exome) has the potential to markedly accelerate human genetics research as an efficient method for identifying highly penetrant variants at a genome-wide scale. This project will apply and evaluate exome resequencing as a new tool to rapidly identify the causes of dozens of rare genetic diseases in humans.
|
2009 — 2010 |
Green, Philip P Nickerson, Deborah A [⬀] Rieder, Mark J Shendure, Jay Ashok |
RC2Activity Code Description: To support high impact ideas that may lay the foundation for new fields of investigation; accelerate breakthroughs; stimulate early and applied research on cutting-edge technologies; foster new approaches to improve the interactions among multi- and interdisciplinary research teams; or, advance the research enterprise in a way that could stimulate future growth and investments and advance public health and health care delivery. This activity code could support either a specific research question or propose the creation of a unique infrastructure/resource designed to accelerate scientific progress in the future. UC2Activity Code Description: To support high impact ideas through cooperative agreements that may lay the foundation for new fields of investigation; accelerate breakthroughs; stimulate early and applied research on cutting-edge technologies; foster new approaches to improve the interactions among multi- and interdisciplinary research teams; or, advance the research enterprise in a way that could stimulate future growth and investments and advance public health and health care delivery. This activity code could support either a specific research question or propose the creation of a unique infrastructure/resource designed to accelerate scientific progress in the future. This is the cooperative agreement companion to the RC2. |
Northwest Genomics Center @ University of Washington
DESCRIPTION (provided by applicant): Project Abstract: This application addresses the NHLBI-RFA-OD-09-004 for Large-scale DNA Sequencing and Molecular Profiling of Well-phenotyped NHLBI Cohorts. We propose to establish a sequencing center to perform the production-level resequencing of exomes from 10,000 genomic DNA samples derived from well-phenotyped NHLBI cohorts. Second generation methods for targeted capture and DNA sequencing have matured rapidly. Exome sequencing currently has advantages over whole genome sequencing for studies aimed at understanding the contribution of rare variants to heart, lung and blood diseases. These advantages include much lower costs per sample and an increased likelihood of identifying variants of large effect that are amenable to functional interpretation. In our preliminary studies, we developed methods for targeted capture and second generation sequencing of protein-coding sequences at a genome-wide scale, i.e. the exome. We are consistently able to identify coding variants at 96% of targeted bases for 5% of the sequencing effort required for a whole genome. The result is high quality exomes, with a concordance to genotype calls of >99.75% and a false discovery rate for novel variants of <1%. We also show the power of exome sequencing for the direct identification of the causative gene for a monogenic disease. This proof-of-concept serves as a starting point for extending exome sequencing to study extreme and/or complex phenotypes of relevance to the NHLBI mission. Improvements that increase throughput or decrease costs while maintaining high data quality will be integrated into the exome production pipeline. Our recent innovations include a novel algorithm that nearly doubles the usable amount of sequence data that can be extracted from second generation sequencing image sets. 
The production focus of our team will be complemented by experts in high-throughput sequencing and genotyping, technology development (experimental and algorithmic), the statistical analysis of rare variation, population genetics, and copy number variation. Samples will be received from NHLBI cohorts and undergo extensive quality control prior to exome sequencing. Following sequencing, we will deliver a fully annotated set of coding variants for each individual. For the final deliverable, we will develop a custom genotyping chip for up to 50,000 high-impact, nonsynonymous variants to be assayed on a larger set of cohort samples (up to 50,000). We anticipate working closely with cohort investigators and the NHLBI to maximize the scientific value of these data and of this program. PUBLIC HEALTH RELEVANCE: Project Narrative: Well-phenotyped cohorts provide a key resource for studying the contribution of genetic variation to traits related to heart, lung or blood diseases. Applying targeted capture and massively parallel sequencing of all protein coding regions in the human genome (the exome) to well-phenotyped cohorts will help to delineate the contributions of both rare and common protein-altering variants to common diseases for the first time.
|
2011 — 2016 |
Shendure, Jay Ashok |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Massively Parallel Contiguity Mapping @ University of Washington
DESCRIPTION (provided by applicant): Massively parallel technologies have reduced the per-base cost of DNA sequencing by several orders of magnitude. However, limited read lengths and a lack of methods to establish contiguity over even modest distances have prevented these technologies from achieving the high-quality, low-cost de novo assembly of mammalian genomes. Even as revolutionary sequencing technologies further mature, it may continue to be the case that the best technologies in terms of cost-per-base yield reads that are of an insufficient length or quality for the effective de novo assembly of large genomes. To address this critical need, we are exploiting high density, random, in vitro transposition as a novel means of physically shattering genomic DNA in creative ways that facilitate the recovery of contiguity information at different scales. Our project is divided into four aims, the first three of which are respectively directed at the development of massively parallel methods for determining short-range, mid-range, and long-range contiguity. These are: 1) a method for shattering genomic DNA with symmetric tags that post hoc inform the ordering of adjacent fragmentation events in a way that is entirely independent of the primary sequence content; 2) a method for massively parallel, in vitro barcoding of fosmid or BAC-sized subsequences of a genome, thereby facilitating hierarchical assembly; 3) an in situ method for converting stretched DNA molecules into adaptor-flanked libraries, such that reads generated by massively parallel sequencing will remain linearly ordered in terms of the XY coordinates at which they originate. In the fourth aim, we will integrate these methods to demonstrate: 1) the highly cost-effective de novo assembly of the mouse genome with a quality that exceeds that of the original assembly; 2) the highly cost-effective haplotype-resolved resequencing of a human genome.
PUBLIC HEALTH RELEVANCE: As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics. The technologies developed by this project will accelerate progress towards these goals by enabling the affordable sequencing of haplotype-resolved human genomes. These same technologies will also facilitate the high-quality, cost-effective assembly of the genomes of other mammalian species, which inform our understanding of the human genome through evolutionary analysis.
|
2011 — 2013 |
Shendure, Jay Ashok |
R21Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.) |
Ultrasensitive Identification and Precise Quantitation of Low Frequency Somatic M @ University of Washington
DESCRIPTION (provided by applicant): The ultrasensitive detection of clinically relevant somatic alterations in cancer genomes has great potential for impacting patient care, e.g. for early detection, establishing diagnoses, refining prognoses, guiding treatment, and monitoring recurrence. However, current technologies are poorly suited to the robust detection of somatic mutations present at very low frequencies. Massively parallel sequencing represents one path forward, but its sensitivity to detect very rare events is fundamentally constrained by the sequencing error rate. Our goal is to develop a new experimental paradigm that overcomes this limitation. In our approach, each copy of a target sequence that is present in a sample is molecularly tagged during the first cycle of a multiplex capture reaction with a unique barcode sequence. After amplification, target amplicons and their corresponding barcodes are subjected to massively parallel sequencing. During analysis, the barcodes are used to associate sequence reads sharing a common origin. Through oversampling, barcode-associated reads error-correct one another to yield an independent haploid consensus for each progenitor molecule, i.e. "molecular counting". Furthermore, the collapsing of commonly derived reads inherently corrects for any allele-specific bias during amplification, such that estimates of mutant allele frequency can be accompanied by precise confidence bounds. In our first aim, we will develop experimental methods and analytical tools that enable the robust detection of targeted somatic mutations via molecular counting to frequencies as low as 1 mutated copy in a background of 100,000 unmutated copies. In our second aim, we will develop three ultrasensitive, multiplex molecular counting assays that are specifically targeted at panels of clinically relevant cancer mutations or genes, and rigorously evaluate these for reproducibility. 
The availability of robust, cost-effective, generically applicable tools for the ultrasensitive, multiplex detection of rare somatic events will be a transformative step forward for the translation of discoveries in cancer genetics to a clinical setting. PUBLIC HEALTH RELEVANCE: As we enter an era of "personalized medicine", DNA sequencing technology will be increasingly important to public health, contributing towards the unraveling of the genetic basis of human disease, as well as for clinical diagnostics. This proposal aims to develop ultrasensitive methods for detecting cancer-relevant mutations in tumor samples. These technologies have the potential to directly enable the translation of discoveries made in cancer genetics to clinical applications such as the early detection of cancer and the monitoring of patients for cancer recurrence.
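The consensus step at the heart of molecular counting can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the applicants' actual pipeline: reads carrying the same barcode are assumed to derive from one progenitor molecule, so a sequencing error in one read is outvoted by its barcode siblings, and allele frequency is estimated by counting molecules rather than reads. All sequences, barcodes, and family sizes below are invented for illustration.

```python
from collections import Counter, defaultdict

def consensus(reads):
    """Majority vote at each position across reads sharing one barcode."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def molecular_count(tagged_reads, wild_type, mutant):
    """tagged_reads: iterable of (barcode, read) pairs.
    Returns (mutant_molecules, counted_molecules, allele_frequency)."""
    families = defaultdict(list)
    for barcode, read in tagged_reads:
        families[barcode].append(read)
    # One consensus call per progenitor molecule, regardless of family size,
    # which also cancels any allele-specific amplification bias.
    calls = [consensus(reads) for reads in families.values()]
    n_mut = sum(1 for c in calls if c == mutant)
    n_tot = sum(1 for c in calls if c in (wild_type, mutant))
    return n_mut, n_tot, n_mut / n_tot

# A sequencing error in one read of a family is corrected by its siblings:
reads = [
    ("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACTT"),  # error outvoted
    ("GGTA", "ACTT"), ("GGTA", "ACTT"), ("GGTA", "ACTT"),  # true mutant
    ("CCGG", "ACGT"), ("CCGG", "ACGT"),
]
n_mut, n_tot, freq = molecular_count(reads, wild_type="ACGT", mutant="ACTT")
```

A real implementation would additionally require a minimum family size and per-base quality thresholds before accepting a consensus call, which is what pushes detection toward the 1-in-100,000 sensitivity the proposal targets.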
|
2012 — 2015 |
Bamshad, Michael Joseph Nickerson, Deborah A [⬀] Shendure, Jay Ashok |
U54Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Uw Center For Mendelian Genomics @ University of Washington
DESCRIPTION (provided by applicant): Over the past three decades, the genes underlying nearly 3,000 Mendelian disorders have been identified by methods such as linkage analysis and positional cloning. Although the availability of a reference human genome greatly accelerated these efforts, there are thousands of additional suspected Mendelian disorders that remain unsolved. An understanding of the genetic basis of a Mendelian disorder can yield fundamental insights into basic human biology and disease pathophysiology, as well as a molecular basis for diagnosis or carrier status determination. In some instances, biological insights from studying Mendelian disorders can prove highly relevant to our understanding of more common diseases. Recently, we and others have shown that the coupling of targeted capture and next-generation DNA sequencing technology can be used to cost-effectively determine nearly all coding variation in an individual human genome, a process termed exome sequencing. We, and others, have also demonstrated how exome sequencing can be applied to efficiently identify the causal genes for Mendelian disorders that have proven intractable to conventional modes of analysis. To accelerate progress towards a comprehensive understanding of the genetic basis of all Mendelian disorders, we propose to establish the UW Center for Mendelian Genomics. 
Our proposal has four specific aims: (1) To organize samples for all unsolved Mendelian disorders from investigators around the world, either by their submission to our center for sequencing, or by their inclusion on a public sample list that we will develop; (2) To apply our existing production pipeline for exome and genome sequencing to samples corresponding to unsolved Mendelian disorders, and to improve this process through ongoing technology innovation; (3) To determine the genetic basis for as many unsolved Mendelian disorders as possible, through efficient study design and effective, innovative analysis; (4) To take a leadership role in the dissemination of methods and data.
|
2012 — 2014 |
Ahituv, Nadav (co-PI) [⬀] Shendure, Jay Ashok |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Massively Parallel, in Vivo Functional Testing of Regulatory Elements @ University of Washington
DESCRIPTION (provided by applicant): The overall aim of the ENCODE project is to comprehensively identify functional elements in the human genome. Currently applicable high-throughput technologies, such as RNA-Seq, ChIP-Seq, and DNase-Seq, exploit patterns of marks to infer the role of specific sequences, but generally fall short of functionally interrogating and thereby validating these predictions. To address this gap, we propose a novel paradigm for the massively parallel functional testing of candidate regulatory elements. In preliminary work, we have developed a system whereby sequence-based transcribed barcodes enable the extensive multiplexing of classic reporter assays, in vitro or in vivo. Here, we propose to adapt this approach for testing tens of thousands of human regulatory elements in single assays, and furthermore to shift these assays from an episomal to a chromosomal context. Our specific aims are: (1) To develop high-throughput methods to clone, by capture or by synthesis, large numbers of candidate regulatory elements and to link them to transcribed, synthetic barcodes within complex populations of reporter vectors. (2) To test in parallel tens of thousands of candidate regulatory elements nominated by liver ChIP-Seq for in vitro and in vivo activity using HepG2 transfections and the hydrodynamic tail vein assay, with RNA-Seq of the synthetic barcodes serving as a single readout for the differential activity of distinct candidate regulatory elements. (3) To develop a similarly multiplexed lentiviral assay for regulatory element analysis that is chromosomally based and generically applicable to diverse cell and tissue types. We anticipate that these methods can be scaled for the efficient, in vivo functional testing of large numbers of candidate regulatory elements nominated by other technologies.
Furthermore, our approach can easily be adopted by other researchers and used for many related goals, such as testing which regulatory elements work together, dissecting the fine-scale architecture of individual regulatory elements, and evaluating the performance of synthetic regulatory elements. PUBLIC HEALTH RELEVANCE: As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing towards the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics. Regulatory sequences in the human genome, that is, sequences that are functionally important but do not encode proteins, are clearly of fundamental importance but are nonetheless poorly understood. This project will develop novel technologies for the parallel validation of large numbers of candidate regulatory sequences, thereby furthering our understanding of their function.
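Computationally, the barcode readout described above reduces to counting transcribed barcodes in the RNA-Seq data and normalizing against their abundance in the input (DNA) library, then aggregating per candidate element. The sketch below illustrates that reduction only; the element names, barcode assignments, and counts are invented, and real analyses add replicate handling and statistical testing on top of this ratio.

```python
from collections import defaultdict

def element_activity(barcode_to_element, rna_counts, dna_counts):
    """Aggregate RNA and DNA barcode counts per candidate regulatory element
    and return RNA/DNA ratios as a simple proxy for regulatory activity."""
    rna = defaultdict(int)
    dna = defaultdict(int)
    for bc, elem in barcode_to_element.items():
        rna[elem] += rna_counts.get(bc, 0)
        dna[elem] += dna_counts.get(bc, 0)
    # Elements absent from the input library cannot be scored.
    return {e: rna[e] / dna[e] for e in dna if dna[e] > 0}

# Hypothetical example: "enh1" is tagged by two barcodes, "ctrl" by one.
mapping = {"ACGT": "enh1", "TTAG": "enh1", "GGCC": "ctrl"}
rna = {"ACGT": 90, "TTAG": 110, "GGCC": 20}
dna = {"ACGT": 50, "TTAG": 50, "GGCC": 40}
activity = element_activity(mapping, rna, dna)
```

Tagging each element with multiple independent barcodes, as in the example, lets barcode-level noise average out at the element level.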
|
2013 — 2017 |
Shendure, Jay Ashok |
DP1Activity Code Description: To support individuals who have the potential to make extraordinary contributions to medical research. The NIH Director’s Pioneer Award is not renewable. |
Interpreting Genetic Variants of Uncertain Significance @ University of Washington
DESCRIPTION (provided by applicant): The sequencing of individual human genomes may soon be routine in certain clinical contexts - for example, to diagnose suspected Mendelian disorders in pediatric patients, or to guide therapeutic decisions in cancer treatment. However, even as its cost plummets to $1,000 or less, the value of a personal genome will remain highly constrained by the poor interpretability of individual genetic variants. For example, although BRCA1 and BRCA2 are clinically actionable when loss-of-function mutations are present, and although both genes have been sequenced in >50,000 patients over the past decade, the result returned to patients is often still "variant of uncertain significance". This challenge will profoundly deepen as clinical sequencing accelerates and as the list of clinically actionable genes grows. To address this, we propose to develop a novel approach for experimentally measuring the functional consequences of such variants of uncertain significance at an unprecedented scale, as well as innovative computational approaches for estimating the relative pathogenicity of any possible variant in the entire human genome. For clinically relevant genes, we will exploit massively parallel technologies for nucleic acid synthesis and sequencing towards a new paradigm for dissecting function at saturating resolution. The application of this paradigm will yield experimentally grounded predictions for the functional consequences of all possible single residue variants, thereby informing the interpretation of variants newly observed in patients. For the remainder of the human genome, we will develop a framework for integrating a proliferating diversity of coding and non-coding annotations into a single metric. We will then calculate this metric of relative pathogenicity for all possible single nucleotide variants in the human genome.
We anticipate that these methods and the resulting pre-computations of pathogenicity will broadly enable the interpretation of human genome sequences in diverse clinical and research settings.
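Enumerating "all possible single residue variants", as the saturation paradigm requires, is a small combinatorial exercise: each of L positions in a protein can be substituted by any of the 19 other amino acids, giving 19L variants. The sketch below shows the enumeration only (the peptide is a made-up example); the experimental scale of the proposal comes from synthesizing and assaying such libraries in parallel, not from the enumeration itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def single_residue_variants(protein):
    """Yield every single amino-acid substitution of `protein`
    as (position, reference, alternate, mutated_sequence)."""
    for i, ref in enumerate(protein):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield i, ref, alt, protein[:i] + alt + protein[i + 1:]

# A 3-residue peptide has 3 x 19 = 57 possible single-residue variants.
variants = list(single_residue_variants("MKT"))
```

For a typical 500-residue protein this is 9,500 variants, which is why massively parallel synthesis and sequencing are prerequisites for assaying them exhaustively.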
|
2015 — 2017 |
Cooper, Gregory Michael (co-PI) [⬀] Shendure, Jay Ashok |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Integrative Interpretation of the Organismal Consequences of Non-Coding Variation @ University of Washington
DESCRIPTION (provided by applicant): Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation, particularly in non-coding regions. To address this challenge, we recently developed a novel framework, Combined Annotation Dependent Depletion (CADD), for estimating the deleteriousness of any genetic variant. CADD defines an objective, data-rich, and quantitative integration of many genomic annotations into a single measure of variant effect at the organismal level. The goals of this R01 proposal are to further develop the CADD framework, to apply it in the context of ongoing genetic studies of both rare and common human diseases, and to experimentally evaluate its predictions. In Specific Aim 1, we will substantially modify CADD in both straightforward and creative ways, with the goal of dramatically improving CADD's ability to annotate non-coding variants, not only to estimate their organismal effects but also to provide insights into molecular mechanisms. In Specific Aim 2, we will apply CADD to a variety of ongoing whole genome sequencing studies of human disease, especially those in which non-coding variants are either known or suspected to be causal. As part of this effort, we will develop new statistical frameworks that directly incorporate CADD into traditional genome-wide discovery approaches. In Specific Aim 3, we will perform a combination of high-throughput (massively parallel reporter assays), medium-throughput (CRISPR/Cas9), and low-throughput (in vivo mouse transgenics) experimental assays for systematic and targeted assessment of CADD predictions. This proposal includes both computational and experimental innovations, and builds on established collaborative relationships between investigators with complementary strengths.
The completion of our aims will yield novel methods, data, and resources with which to annotate whole genome sequences, broadly enabling the field to more effectively identify and mechanistically understand non-coding genetic variants that are causally relevant to human disease.
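The integration step at the core of such a framework, folding many per-variant annotations into one score, can be caricatured as a trained linear model passed through a logistic function. To be clear, this is a toy sketch of the general idea, not CADD's actual model or features: the annotation names, weights, and bias below are invented, whereas CADD itself learns its weights from contrasts between observed (proxy-neutral) and simulated variants across dozens of annotations.

```python
import math

def combined_score(annotations, weights, bias=0.0):
    """Fold several per-variant annotations into a single logistic score
    in (0, 1); higher values mean predicted more deleterious."""
    z = bias + sum(weights[name] * value for name, value in annotations.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and annotations, for illustration only:
weights = {"conservation": 2.0, "dnase_peak": -0.5, "distance_to_tss_kb": -0.1}
variant = {"conservation": 1.8, "dnase_peak": 1.0, "distance_to_tss_kb": 4.0}
score = combined_score(variant, weights, bias=-1.0)
```

The value of the single-metric design is that a score computed this way can be precomputed for every possible variant genome-wide and ranked, which is what makes it usable inside discovery pipelines.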
|
2015 — 2019 |
Noble, William Stafford (co-PI) [⬀] Shendure, Jay Ashok |
U54Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
University of Washington Center For Nuclear Organization and Function @ University of Washington
DESCRIPTION: A current grand challenge in genomics is to accurately assay, at all relevant scales, the 3D conformation of DNA in vivo, and then to link conformational changes to dynamic processes such as the cell cycle, differentiation and disease. Here we propose to create the University of Washington Center for Nuclear Organization and Function, bringing together an interdisciplinary team of investigators whose diverse areas of expertise - technology development, computational modeling, and mouse and human biology - make them ideally suited to this challenge. Our overall hypothesis is that characterizing and understanding changes in genome architecture over time (the 4D nucleome) will lead to fundamental insights into human biology and disease. We will address this hypothesis through a combination of experimental and computational methods development, coupled with systematic biological validation and application to development- and disease-relevant systems. On the experimental side, we will further optimize our recently developed DNase Hi-C assay, including combinatorial methods for single cells, ultimately aiming to concurrently assay nuclear architecture and gene expression within each of many single cells. On the computational side, we will extend our existing 3D modeling algorithms to account for diploidy, cell-to-cell variability, and the hierarchical nature of genome architecture, and to explicitly model architectural changes over cell cycle and cell differentiation time scales. We will then employ several complementary computational methods to link our 4D nucleome models to existing 1D genomics data sets. The outputs of these new experimental and computational technologies will be subjected to orthogonal validation in several well-understood model systems: human cell lines, in vivo tissues from interspecific F1 hybrid mice, mouse embryonic stem cells (ESCs) and skeletal myoblasts.
We will also test specific predictions of the models in response to targeted (genome editing) or large-scale (chromosome silencing) perturbations. After initial validation and in parallel with further methods development, we will apply our new tools to the analysis of three biological systems: we will characterize the dynamics of nuclear architecture during the directed differentiation of naïve human ESCs into cardiomyocytes and endothelial cells; we will test the hypothesis that cardiomyopathy-inducing mutations in the nuclear scaffolding protein, lamin A, are associated with derangements in cardiomyocyte nuclear architecture; and we will determine the changes in human cardiomyocyte nuclear architecture induced by trisomy 21. The proposed center will produce new experimental protocols for ascertaining 4D nucleome architecture, two new software toolkits for modeling the 4D nucleome and linking features of the nucleome to other types of genomic data, a variety of publicly available, large-scale 4D nucleome data sets in mouse and human systems, and fundamental insights into human biology and disease. Throughout, we will work closely and openly with NOFIC and the 4DN Network to maximize the impact of our center and the overall program.
|
0.958 |
2015 — 2019 |
Shendure, Jay Ashok |
U54Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Project 1: UW-CNOF Mapping Technology Development @ University of Washington
ABSTRACT / PROJECT 1: UW-CNOF MAPPING TECHNOLOGY DEVELOPMENT Over the past decade, advances in technologies for assaying genome architecture have led to remarkable progress in our understanding of the 4D nucleome, i.e. the spatiotemporal organization of eukaryotic genomes within nuclei. Among the powerful experimental tools that have recently emerged, chromosome conformation capture (3C) and its high-throughput derivatives have become the most widely used methods for characterizing genome architecture both locally and globally. However, the current repertoire of 3C-based methods is crucially limited with respect to key parameters such as specificity, resolution and input requirements. Recently, we have made substantial progress in addressing these limitations with DNase Hi-C, a restriction enzyme-free derivative of the Hi-C protocol. Here, we propose to further develop biochemical methods for characterizing the dynamic 4D nucleome that substantially improve upon the state of the art with respect to input requirements (down to single cells), resolution (eliminating restriction enzyme bias), scale (genome-wide or targeted views) and integration (combined measurements with the transcriptome and epigenome), while also improving sensitivity, specificity, simplicity and throughput. In Aim 1, we will continue to optimize genome-wide and targeted DNase Hi-C protocols, including a much simplified, in situ version of DNase Hi-C, to further minimize input requirements and bias while improving resolution. We will also refine these protocols to maximize robustness, scalability and exportability. In Aim 2, we will develop a high-throughput method for routinely measuring genome architecture in large numbers of single cells.
Our proposed approach, based on combinatorial indexing and supported by substantial preliminary data, enables the routine production of DNase Hi-C (nuclear architecture) or ATAC-seq (chromatin accessibility) data from hundreds to thousands of single cells per experiment. In Aim 3, we will integrate DNase Hi-C and other assays for the concurrent measurement of genome architecture, epigenetic state, and the transcriptome, in each of many single cells. We believe that the successful development of such co-assays will profoundly advance our ability to develop integrative models connecting genome form and function. Finally, in Aim 4, we will standardize, benchmark, and export the experimental methods developed by this project, with the goal of maximizing their impact and utility for NOFIC investigators, the 4DN Network, and the broader scientific community.
|
0.958 |
2017 — 2020 |
Seelig, Georg [⬀] Shendure, Jay Ashok |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Predictive Modeling of Alternative Splicing and Polyadenylation From Millions of Random Sequences @ University of Washington
The proportion of the human genome that underlies gene regulation dwarfs the proportion that encodes proteins. However, we remain poorly equipped to identify which genetic variants compromise gene regulatory function in ways that may contribute to risk for both rare and common human diseases. Understanding how non-coding sequences regulate gene expression, and being able to predict the functional consequences of genetic variation for gene regulation, are paramount challenges for the field. Here, we propose to combine synthetic biology, massively parallel functional assays, and machine learning to profoundly advance our understanding of the "regulatory code" of the human genome. While challenging, the task of unravelling complex codes from large amounts of empirical data is not without precedent. For example, over the past decade, computer scientists working in natural language processing have made immense progress, driven in large part by a combination of algorithmic and computational improvements and enormously larger training datasets than were available to previous generations of scientists working in this area. Inspired by the revolutionizing impact of "big data" on traditional problems in machine learning, we propose to model gene regulatory phenomena using training datasets with several orders of magnitude more examples than naturally exist in the human genome. We predict that models learned from massive numbers of synthetic examples will strongly outperform models learned from the small number of natural examples. We will demonstrate our approach by developing comprehensive, quantitative, and predictive models for alternative splicing and alternative polyadenylation, two widespread regulatory mechanisms by which a single gene can code for multiple transcripts and proteins. However, we anticipate that this basic paradigm (specifically, the massively parallel measurement of the functional behavior of extremely large numbers of synthetic sequences, followed by quantitative modeling of sequence-function relationships) can be generalized to advance our understanding of diverse forms of gene regulation.
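On the modeling side of this paradigm, synthetic sequences are typically converted into a numeric representation before being paired with their measured outcomes. The sketch below is a minimal illustration (sequence length, sample count, and random generation are arbitrary choices; no real assay data is involved) of the standard one-hot encoding that sequence-function models consume:

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHABET = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA sequence as a (length, 4) matrix."""
    idx = [ALPHABET.index(base) for base in seq]
    encoding = np.zeros((len(seq), 4))
    encoding[np.arange(len(seq)), idx] = 1.0
    return encoding

# Random sequences are cheap to synthesize at massive scale; in a
# massively parallel assay, each would be paired with a measured
# outcome such as percent spliced in (PSI) or poly(A) site usage.
sequences = ["".join(rng.choice(list(ALPHABET), size=25)) for _ in range(1000)]
X = np.stack([one_hot(s) for s in sequences])
print(X.shape)  # (1000, 25, 4)
```

A model (e.g. a neural network) would then be fit to map each encoded sequence to its measured regulatory outcome; the encoding itself is model-agnostic.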
|
0.958 |
2019 — 2021 |
Platt, Michael L (co-PI) [⬀] Shendure, Jay Ashok Snyder-Mackler, Noah |
U01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Single Cell Transcriptional and Epigenomic Atlas of the Macaque Brain Across the Lifespan @ University of Washington
ABSTRACT / SUMMARY New technologies are enabling molecular profiling of single brain cells at remarkable throughput. However, these new methods have yet to be extensively applied to the brains of model organisms that bridge the evolutionary distance between mouse and human, including the most common nonhuman primate model system, the rhesus macaque. Here we propose to generate an anatomically resolved, single cell atlas of the epigenome (5.5 million cells) and transcriptome (11 million cells) of the rhesus macaque brain. We will apply two methods, recently developed in our labs, that rely on "combinatorial indexing" to cost-effectively profile the epigenomes (sci-ATAC-seq) and transcriptomes (sci-RNA-seq) of large numbers of cells or nuclei. As our first aim, we will generate high-resolution, single cell epigenetic and transcriptional atlases of one male and one female rhesus macaque brain. Specifically, we will profile chromatin accessibility in 750,000 nuclei (sci-ATAC-seq) and transcription in 1,500,000 nuclei (sci-RNA-seq) from each of two macaque brains (for a total of 4.5 million cells). These will be obtained from 25 anatomically dissected brain regions (30,000 sci-ATAC-seq and 60,000 sci-RNA-seq profiles per region per brain). As our second aim, we will extend these atlases to span the primate lifespan. Specifically, we will perform single cell epigenetic and transcriptional profiling of the brains of 50 additional rhesus macaques (25 regions per brain; 3,200 sci-ATAC-seq and 6,400 sci-RNA-seq profiles per individual/region, for a total of 12 million molecularly profiled cells). This large sample size will allow us to characterize natural variation in chromatin accessibility and transcription within each cell type, between individuals and sexes, and across the natural lifespan of rhesus macaques. At 16.5 million cells, our rhesus macaque brain atlas will comprise the largest transcriptional and epigenomic single cell dataset of any primate organ to date.
Our data will be rapidly shared with BICCN and the broader community. We anticipate it will be an essential resource, complementary to other efforts, for identifying the distribution and function of key cell types across the primate brain, allowing for the development of cell type- and region-specific molecular interventions that will help us understand brain function and the etiology, and potentially the treatment, of brain disorders.
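The sampling design above implies the stated totals; a quick arithmetic check, with every number taken directly from the two aims:

```python
# Aim 1: two brains (one male, one female), 25 regions each,
# 30,000 sci-ATAC-seq and 60,000 sci-RNA-seq profiles per region.
brains, regions = 2, 25
aim1_atac = brains * regions * 30_000   # 1,500,000 epigenomes
aim1_rna = brains * regions * 60_000    # 3,000,000 transcriptomes

# Aim 2: 50 additional animals, 25 regions each,
# 3,200 sci-ATAC-seq and 6,400 sci-RNA-seq profiles per region.
animals = 50
aim2_atac = animals * regions * 3_200   # 4,000,000 epigenomes
aim2_rna = animals * regions * 6_400    # 8,000,000 transcriptomes

epigenomes = aim1_atac + aim2_atac      # 5,500,000 (epigenome atlas)
transcriptomes = aim1_rna + aim2_rna    # 11,000,000 (transcriptome atlas)
print(epigenomes + transcriptomes)      # 16500000 cells in total
```

The per-aim subtotals (4.5 million and 12 million cells) and the grand total of 16.5 million cells all agree with the figures quoted in the abstract.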
|
0.958 |
2019 — 2021 |
Shendure, Jay Ashok Trapnell, Bruce Colston |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Versatile, Exponentially Scalable Methods For Single Cell Molecular Profiling @ University of Washington
The field of single cell genomics is exploding. However, the vast majority of studies restrict themselves to quantifying mRNA transcription, typically in a few thousand cells. We have recently pioneered a new class of methods based on the concept of single cell combinatorial indexing ("sci"), wherein several rounds of splitting, molecular indexing, and pooling are used to uniquely label the nucleic acids of cells or nuclei, without requiring the isolation or compartmentalization of each cell. The number of cells that can be uniquely labeled scales exponentially with the number of rounds of indexing, e.g. millions of cells can be profiled with as few as three rounds of indexing. Since 2015, we have developed sci- methods for quantifying chromatin accessibility (sci-ATAC-seq), transcription (sci-RNA-seq), chromatin architecture (sci-Hi-C), and genome sequence (sci-LIANTI), as well as a co-assay of chromatin accessibility and transcription (sci-CAR). Here, we propose to develop a much broader range of single cell methods, all based on the unifying concept of single cell combinatorial indexing. In our first aim, we will develop additional "single channel" sci- assays of various aspects of molecular state. In our second aim, we will develop additional "two channel" sci- assays, e.g. co-assays of RNA and DNA. In our third aim, we will adapt sci- assays to enable large-scale chemical and genetic screens in single cells. In our final aim, we will work to make the methods and associated software widely available to the research community. As a versatile, exponentially scalable platform, we anticipate that single cell combinatorial indexing will deepen and broaden the impact of single cell genomics for diverse goals, including descriptive molecular atlases of organisms, functional studies of genes and regulatory elements, and the modeling of gene regulation.
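The exponential-scaling claim is easy to make concrete. The sketch below assumes, purely for illustration, 96 distinct barcodes per round of split-pool indexing (actual plate formats and barcode counts vary by protocol) and uses a standard Poisson-style approximation for barcode collisions:

```python
import math

# Split-pool combinatorial indexing: the barcode space is the product of
# the per-round barcode counts, so it grows exponentially with rounds.
barcodes_per_round = 96  # illustrative; 384-well formats are also common
for rounds in (1, 2, 3):
    print(rounds, barcodes_per_round ** rounds)
# With 384 barcodes per round, three rounds already give
# 384**3 = 56,623,104 combinations, i.e. room for millions of cells.

# Two cells are confounded only if they land in the same well in every
# round. Loading N cells into B barcode combinations (N << B), the
# fraction of cells sharing a combination with at least one other cell
# is roughly 1 - exp(-N / B).
B = barcodes_per_round ** 3   # 884,736 combinations after three rounds
N = 50_000                    # cells loaded per experiment (illustrative)
collision_fraction = 1.0 - math.exp(-N / B)
```

The approximation also shows why more rounds help: since the collision fraction is roughly N/B for small N/B, quadrupling the barcode space at fixed loading cuts the collision rate by about four-fold.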
|
0.958 |
2021 |
Ahituv, Nadav (co-PI) [⬀] Kircher, Martin (co-PI) [⬀] Shendure, Jay Ashok |
UM1Activity Code Description: To support cooperative agreements involving large-scale research activities with complicated structures that cannot be appropriately categorized into an available single component activity code, e.g. clinical networks, research programs or consortium. The components represent a variety of supporting functions and are not independent of each other. Substantial federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of the award. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
Massively Parallel Characterization of Variants and Elements Impacting Transcriptional Regulation in Dynamic Cellular Systems @ University of Washington
SUMMARY / ABSTRACT A major fraction of heritability for common diseases, as well as for the penetrance and expressivity of rare diseases, partitions to distal regulatory elements in the human genome, overwhelmingly cell type-specific enhancers. However, a rate-limiting challenge for the field has been how to identify the specific variants, elements and regulated genes that mediate these effects on disease liability. Towards the overall goals of the Impact of Genomic Variation on Function (IGVF) Consortium, we propose to test over one million human regulatory elements or variants for their functional effects on transcriptional regulation, as well as to query over 100,000 distal regulatory elements for the gene(s) that they regulate. A first theme of our proposal is the diversity of multiplex technologies that we will employ to these ends, including massively parallel reporter assays (MPRAs), crisprQTL, saturation genome editing, multiplex prime editing and single cell combinatorial indexing, many of which we pioneered. A second theme is a focus on dynamic cellular systems that enable a given library of variants and/or elements to be tested across a broad range of cell types and states within a single experiment; these will include ESC-derived neuronal progenitors, cardiomyocytes, embryoid bodies, gastruloids and organoids, and in select cases, mice. A third theme involves leveraging our experience (e.g. CADD, a widely used, genome-wide catalog of variant effect predictions) to support the overarching goals of IGVF. Specifically, we envision using functional measurements generated by us and others to produce well-calibrated predictions of enhancer activity and variant effects that are continuous along the branching trajectories that comprise human development. Our specific aims are as follows: (1) To perform massively parallel validation and functional characterization of candidate human enhancers in a broad range of cell type contexts. 
(2) To perform massively parallel characterization of human genetic variants with potential roles in human disease. (3) To contribute to a comprehensive variant-element-phenotype catalog while taking a leadership role in synergistic interactions within IGVF, in the dissemination of methods, data and predictions, and in the overarching goals of the consortium.
|
0.958 |
2021 |
Disteche, Christine M. (co-PI) [⬀] Noble, William Stafford [⬀] Shendure, Jay Ashok |
UM1Activity Code Description: To support cooperative agreements involving large-scale research activities with complicated structures that cannot be appropriately categorized into an available single component activity code, e.g. clinical networks, research programs or consortium. The components represent a variety of supporting functions and are not independent of each other. Substantial federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of the award. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
UW 4-Dimensional Genomic Organization of Mammalian Embryogenesis Center @ University of Washington
Project Summary / Abstract A major shortcoming of most efforts to understand the 4D nucleome is that they have mainly focused on in vitro cell lines, rather than on dynamic, in vivo systems. Arguably, the most important in vivo system, which also happens to be the most dynamic, is development itself, wherein the nucleome both shapes and is shaped by the initial emergence of the myriad mammalian cell types. While these in vivo dynamics are presently poorly documented and understood, recently emerged technologies offer a path forward. Here we propose to establish the University of Washington 4-Dimensional Genomic Nuclear Organization of Mammalian Embryogenesis Center (UW 4D GENOME Center), which will address these massive gaps in our understanding by generating systematic datasets on nuclear morphology and associated molecular measurements in mammalian tissues and cell types. These datasets will be generated in the leading model organism for mammalian development, the mouse. Our approach focuses on following nuclear structure, chromatin and gene expression changes at a "whole organism" scale, using a combination of scalable single cell profiling and "visual cell sorting" (VCS) methods, all well-established and mostly developed in our own labs. Our goal is to generate a high-resolution 4DN atlas of mouse embryogenesis for the community. The different types of data will be integrated, including cross-species imputation to connect with human data, as well as models and navigable maps applied to pathways relevant to mammalian development.
|
0.958 |