227 Biotechnology Building
Professor of Population Genetics
Andrew G. Clark is the Jacob Gould Schurman Professor of Population Genetics and Nancy and Peter Meinig Family Investigator. He received a B.S. in Biology and Applied Mathematics from Brown University in 1976, and a Ph.D. in Population Genetics from Stanford University in 1980. He did postdoctoral work at Arizona State University and the University of Aarhus, Denmark, and a sabbatical at the University of California at Davis. Prior to joining the Cornell faculty in 2002, he was a professor in the Department of Biology at Penn State University. Dr. Clark's research focuses on the genetic basis of adaptive variation in natural populations, with emphasis on quantitative modeling of phenotypes as networks of interacting genes. Dr. Clark has been active in genomics research and has been a frequent consultant with Celera Genomics since April 1999. He was elected Fellow of the American Association for the Advancement of Science in 1994, and serves on review panels for the NIH, NSF, and the Max Planck Society. Dr. Clark's research has been supported by the National Institutes of Health, the National Science Foundation, the Alfred P. Sloan Foundation, NATO, and the Marsden Fund. He served as President of the Society for Molecular Biology and Evolution, and is on the Council of the Genetics Society of America. Dr. Clark is a member of the graduate fields of Genetics and Development and of Ecology and Evolutionary Biology. He teaches courses in Human Genetics and Genomics and Advanced Population Genetics. Dr. Clark is also Associate Director of the Cornell Center for Comparative and Population Genomics (3CPG).
Population genetics of insect immunity
Analysis of DNA sequence variation among alleles of Cecropins, Attacins, and other pathogen-defense molecules produced by insects is revealing how insects are able to deal with diverse bacterial pathogens. We are quantifying patterns of DNA sequence variation in most of the known antibacterial genes and have found that, as a class, they exhibit high levels of variability and an unusual pattern of clustered singleton sites. We are developing the theory and analyses to determine the causes of these patterns of variation. Both a model of simple hypervariability and an "arms race" model can be firmly rejected, leaving models that rely on spatial structuring as the best explanation for observed patterns of variation. Many antibacterial genes occur as small gene families, and whereas the Cecropin cluster exhibits fairly rapid birth-and-death of genes, the Attacin cluster has definite intergenic conversion events. Functional aspects are being studied as well, including variation in the transcriptional response to infection, in the dynamics of intra-fly bacterial growth, and in whole-fly survival. Variation in bacterial virulence in Drosophila is also being studied. The primary question is, “How can insects, whose immune system lacks memory, evade bacterial pathogens?”
Evolution of the Y chromosome in Drosophila
Perhaps our most exciting recent success in genomics was to develop a method for identification of genes embedded in heterochromatin. This method has already led to the discovery of eight previously unknown protein-coding genes on the Y chromosome of D. melanogaster (Carvalho et al., 2000, 2001). The basic idea of the method is to apply BLAST searches with any and all known proteins as the query, searching for alignments with the unmapped scaffolds left over from the whole genome shotgun assembly. Celera's original assembly had 631 unmapped scaffolds, and it was thought that these might be parts of genes embedded in heterochromatin, because heterochromatin has such low cloning efficiency. Our method reveals BLAST hits to these unmapped scaffolds, which we then piece together by spanning the gaps with reverse-transcription PCR. Having this wealth of Y-linked sequence is now allowing us to pursue the evolutionary questions regarding the dynamics of nucleotide sequence change on the Y chromosome, and the deeper molecular evolutionary questions about the origins and divergence of these Y-linked genes.
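As a rough illustration of the bookkeeping step only (not the pipeline actually used in this work), the sketch below takes tab-delimited BLAST hits in the common 12-column layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and groups unmapped scaffolds by the protein that hits them. The function name, the scaffold and protein names, and the E-value cutoff are all assumptions made for the example. Scaffolds that match different parts of the same protein are the candidates one would try to join by RT-PCR.

```python
from collections import defaultdict


def candidate_gene_scaffolds(blast_tabular, unmapped, max_evalue=1e-5):
    """Group unmapped scaffolds by the protein query that hits them.

    blast_tabular: text in the 12-column tab-delimited BLAST layout.
    unmapped: set of scaffold names left out of the assembly (hypothetical).
    max_evalue: illustrative cutoff, not a value taken from the original work.
    """
    hits = defaultdict(list)
    for line in blast_tabular.strip().splitlines():
        fields = line.split("\t")
        query, scaffold = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if scaffold in unmapped and evalue <= max_evalue:
            hits[query].append((qstart, qend, scaffold))
    # Sort each protein's hits by position along the protein, so that
    # neighbouring entries suggest which scaffolds to bridge.
    return {query: sorted(h) for query, h in hits.items()}


# Made-up hits for demonstration only.
example_hits = "\n".join([
    "protA\tscaf_12\t85.0\t120\t18\t0\t1\t120\t300\t659\t1e-40\t150",
    "protA\tscaf_7\t80.0\t90\t18\t0\t130\t219\t50\t319\t1e-20\t90",
    "protA\tscaf_3\t30.0\t40\t28\t0\t10\t49\t5\t124\t0.5\t20",
    "protB\tmapped_1\t90.0\t200\t20\t0\t1\t200\t1\t600\t1e-80\t300",
])
example_groups = candidate_gene_scaffolds(
    example_hits, unmapped={"scaf_12", "scaf_7", "scaf_3"}
)
```

In this toy input, the weak hit to scaf_3 and the hit to an already mapped scaffold are dropped, leaving two scaffolds that cover adjacent parts of the same protein.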
Population genetics of sperm displacement
When females are able to store sperm for extended periods, it is possible that sperm from more than one male may be “competing” for use in fertilization. In Drosophila, later-mating males have an advantage, but there is extensive genetic variation in this character. Sperm competition has profound evolutionary consequences, and we are studying the molecular basis for the phenomenon as well as the population genetic aspects of it. We have implicated the Accessory Gland Proteins (Acps) in the “defense” component of sperm displacement, and more recently, in a collaborative study with Willie Swanson, Chip Aquadro and Mariana Wolfner, we began an exhaustive identification of Acps by sequencing EST clones derived from the male accessory gland and testing them for male-specific expression. This survey not only identified a large new set of accessory gland proteins, it also demonstrated that as a class they undergo accelerated amino acid replacement, and the hit rate of the clones suggests we are near exhaustive coverage of the genes. Functional tests of sperm competition provide our means to analyze the male–female chemical communication in gamete use. Quantification of sperm displacement in the field is surprisingly effective when done by scoring microsatellites (due to allele multiplicity), and we have completed three surveys of brood-structured samples (with Larry Harshman and Jørgen Bundgaard). Models for Bayesian inference of sperm competition parameters by Markov chain Monte Carlo are being developed with Beatrix Jones.
Human and Comparative Genomics
My consulting work at Celera Genomics is expanding in an interesting direction. Celera and its parent company, Applera, are launching a project to sequence the complete set of transcribed human genes in 40 individuals and a chimpanzee. These data will provide a rich view of polymorphism, and allow pipelined genome-wide tests for attributes like synonymous vs. nonsynonymous levels of polymorphism and divergence (with mouse), codon usage, linkage disequilibrium, etc. Already we have a complete alignment of human and mouse closest orthologs, and in every human gene, the polymorphism identified in the original five sequenced individuals is being compared to the divergence between human and mouse (similar to the McDonald-Kreitman test). Public data are being developed along similar lines, but the public SNP effort suffers from a very uneven and heterogeneous ascertainment, including different population samples assayed in different ways. A nice feature of the Celera data for analysis of polymorphism is the uniformity of ascertainment, uniformity of laboratory procedures, and uniformity of informatics used to identify polymorphism. Most of these results will be released in the public domain upon publication.
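For readers unfamiliar with the McDonald-Kreitman comparison, the following is a minimal sketch of the standard per-gene calculation, not the Celera pipeline itself. It arranges counts of nonsynonymous and synonymous polymorphisms (pn, ps) and fixed differences (dn, ds) in a 2x2 table, then reports the neutrality index (pn/ps)/(dn/ds) and a two-sided Fisher exact p-value. The function names and the example counts are assumptions chosen for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(pn, ps, dn, ds):
    """Return (neutrality index, Fisher exact p-value) for one gene.

    Assumes ps, dn and ds are nonzero; genes with empty cells would need
    separate handling.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(pn, ps, dn, ds)


# Illustrative counts only, not taken from any data set described here.
ni, p_value = mk_test(pn=2, ps=42, dn=7, ds=17)
```

A neutrality index below 1 indicates an excess of nonsynonymous fixed differences relative to polymorphism, which is the pattern usually read as evidence of adaptive substitution. A value above 1 indicates an excess of nonsynonymous polymorphism.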
Evolution of metabolic regulation
A classic example of a biological network is intermediary metabolism, and given the depth of our knowledge of the genes encoding nodes of this network (the enzymes), it serves as an excellent model system for evolution of an interacting network. Along with the more theoretical aspects, we are examining the mechanisms that maintain genetic variation in lipid and glycogen storage in Drosophila and the expression of a set of enzyme-encoding genes in relevant metabolic pathways. An intriguing recent result obtained from QTL mapping is that, in many cases, there is quite substantial trans-acting variation even in the absence of cis-acting variation. This implies that natural selection for, say, G6PD activity, will result in a response primarily at these sites trans to the structural gene. Simple models can explain nearly half the variance in metabolic rate, and over 40% of the variance in flight performance, based on activities of 12 enzymes in intermediary metabolism. Our recent analysis of doubly P-element tagged lines yielded a surprising amount of epistasis in their effects on metabolic traits. The ultimate goal of this project is to understand how the forces of mutation and selection in these quantitative traits interact to affect standing levels of variation.
Genetic basis of complex disease
There is wide interest in applying human single nucleotide polymorphisms (SNPs) to the problem of identifying genes that underlie complex diseases (defined as those that tend to aggregate in families but not simply segregate). The promise of methods that entail whole-genome screens for association relies heavily on population genetic principles, as well as on the empirical state of patterns of linkage disequilibrium in the human genome. Most of our efforts have focused on a candidate gene approach to cardiovascular disease risk, and the population genetic analysis has entailed careful quantification of factors that impact the extant variation in human populations, including mutation, recombination, demographic factors, migration, and natural selection. This project has recently expanded in a collaboration with Drs. C. Sing, E. Boerwinkle, and J. Hixson to examine 100 kb regions of chromosome 19. This study will include SNP typing in over 4000 individuals with full medical records, and testing of association between SNP haplotypes and phenotypes will be done by several novel approaches we are developing. In addition to these candidate gene approaches, we have examined the assumptions of the Common Disease Common Variant model, and find many features of the model less than convincing. The alternative demands that designs for finding associations consider the likelihood that genes that influence disease risk will have allelic heterogeneity. In addition, the summation over the genome of effects of the vast number of rare alleles is also likely to be greater than negligible.
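Because linkage disequilibrium is central to these association designs, a short sketch of the textbook pairwise measures may help. Given the frequency of the haplotype carrying allele A at one SNP and allele B at another, together with the two allele frequencies, the function below returns D, the normalized D', and r². It is a generic illustration with a hypothetical function name and made-up frequencies, not one of the novel approaches mentioned above.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying both allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    Assumes both sites are polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could reach
    # given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Made-up frequencies: complete association, then independence.
complete = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
independent = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)
```

When the two alleles always travel together, D' and r² both reach 1. When the sites are independent, all three measures are 0.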
Assorted topics in theoretical population genetics
I am interested in the theoretical consequences of genetic systems that influence mating compatibilities, such as the S-locus in gametophytic self-incompatible plants and the a and b loci in the fungus Ustilago. Hamish Spencer, Marc Feldman, and I have been working on models of genomic imprinting, particularly in relation to sex chromosomes and dosage compensation. This collaboration was quiet for a while, but we have renewed interest in the X-linked case. Nuclear–cytoplasmic interactions continue to be an interest, and recently we found stable cycles in models with fitness interactions between maternally transmitted factors (such as mtDNA) and an X-linked gene (Rand et al. 2001). Inference of human demographic history from sample data continues to be a challenge, and further developments of the branching process approach that we developed in Tishkoff et al. (2001) are under way. With Sorin Istrail, I am working on an intriguing graph-theoretic problem we call the haplotype coloring problem, and we anticipate that results will be of great practical value for the study of haplotype variation. Finally, the theoretical consequences of variation in expression of genes embedded in an interacting network raise many deep theoretical issues, and we are beginning to pursue these primarily by developing network simulations.