17 Inheritance of Complex Traits
Complex traits, also known as quantitative traits, are traits that do not behave according to simple Mendelian inheritance laws. More specifically, their inheritance cannot be explained by the genetic segregation of a single gene. Such traits show a continuous range of variation and are influenced by both environmental and genetic factors. Compared to strictly Mendelian traits, complex traits are far more common, and because they can be hugely polygenic, they are studied using statistical techniques such as QTL mapping rather than classical genetics methods. Examples of complex traits include height, circadian rhythms, enzyme kinetics, and many diseases including diabetes and Parkinson’s disease. One major goal of genetic research today is to better understand the molecular mechanisms through which genetic variants act to influence complex traits.
When Mendel’s work on inheritance was rediscovered in 1900, scientists debated whether Mendel’s laws could account for the continuous variation observed for many traits. One group, known as the biometricians, argued that continuous traits such as height were largely heritable but could not be explained by the inheritance of single Mendelian genetic factors. Work by Ronald Fisher in 1918 mostly resolved the debate by demonstrating that the variation in continuous traits could be accounted for if multiple such factors contributed additively to each trait. However, the number of genes involved in such traits remained undetermined; until recently, genetic loci were expected to have moderate effect sizes and each to explain several percent of heritability. After the completion of the Human Genome Project in 2003, it seemed that the sequencing and mapping of many individuals would soon allow for a complete understanding of traits’ genetic architectures. However, variants discovered through genome-wide association studies (GWASs) accounted for only a small percentage of predicted heritability; for example, while height is estimated to be 80-90% heritable, early studies identified variants accounting for only 5% of this heritability. Later research showed that most of this missing heritability could be accounted for by common variants missed by GWASs because their effect sizes fell below significance thresholds; a smaller percentage is accounted for by rare variants with larger effect sizes, although in certain traits such as autism, rare variants play a more dominant role. While many genetic factors involved in complex traits have been identified, determining their specific contributions to phenotypes, and in particular the molecular mechanisms through which they act, remains a major challenge.
17.1 QTL mapping
A quantitative trait locus (QTL) is a region of DNA associated with a particular phenotypic trait that varies in degree and can be attributed to polygenic effects, i.e., the product of two or more genes, and their environment. These QTLs are often found on different chromosomes. The number of QTLs that explain variation in a phenotypic trait indicates the genetic architecture of that trait. For example, it may indicate that plant height is controlled by many genes of small effect, or by a few genes of large effect.
Typically, QTLs underlie continuous traits (those traits which vary continuously, e.g. height) as opposed to discrete traits (traits that have two or several character values, e.g. red hair in humans, a recessive trait, or smooth vs. wrinkled peas used by Mendel in his experiments).
Moreover, a single phenotypic trait is usually determined by many genes. Consequently, many QTLs are associated with a single trait. Another use of QTLs is to identify candidate genes underlying a trait. Once a region of DNA is identified as contributing to a phenotype, it can be sequenced. The DNA sequence of any genes in this region can then be compared to a database of DNA for genes whose function is already known, a task that is fundamental to marker-assisted crop improvement.
Mendelian inheritance was rediscovered at the beginning of the 20th century. As Mendel’s ideas spread, geneticists began to connect Mendel’s rules of inheritance of single factors to Darwinian evolution. For early geneticists, it was not immediately clear that the smooth variation in traits like body size could be caused by the inheritance of single genetic factors. Darwin himself observed that inbred features of fancy pigeons were inherited in accordance with Mendel’s laws (although he did not know about Mendel’s ideas when he made the observation), but it was not obvious that the features selected by fancy pigeon breeders could similarly explain quantitative variation in nature.
An early attempt by William Ernest Castle to unify the laws of Mendelian inheritance with Darwin’s theory of speciation invoked the idea that species become distinct from one another as one species or the other acquires a novel Mendelian factor. Castle’s conclusion was based on the observation that novel traits that could be studied in the lab and that show Mendelian inheritance patterns reflect a large deviation from the wild type, and Castle believed that acquisition of such features is the basis of “discontinuous variation” that characterizes speciation. Darwin discussed the inheritance of similar mutant features but did not invoke them as a requirement of speciation. Instead Darwin used the emergence of such features in breeding populations as evidence that mutation can occur at random within breeding populations, which is a central premise of his model of selection in nature. Later in his career, Castle would refine his model for speciation to allow for small variation to contribute to speciation over time. He also was able to demonstrate this point by selectively breeding laboratory populations of rats to obtain a hooded phenotype over several generations.
Castle’s was perhaps the first attempt made in the scientific literature to direct evolution by artificial selection of a trait with continuous underlying variation. However, the practice had previously been widely employed in the development of agriculture to obtain livestock or plants with favorable features from populations that show quantitative variation in traits like body size or grain yield.
Castle’s work was among the first to attempt to unify the recently rediscovered laws of Mendelian inheritance with Darwin’s theory of evolution. Still, it would be almost thirty years until the theoretical framework for evolution of complex traits would be widely formalized. In an early summary of the theory of evolution of continuous variation, Sewall Wright, a graduate student who trained under Castle, summarized contemporary thinking about the genetic basis of quantitative natural variation: “As genetic studies continued, ever smaller differences were found to mendelize, and any character, sufficiently investigated, turned out to be affected by many factors.” Wright and others formalized population genetics theory that had been worked out over the preceding 30 years explaining how such traits can be inherited and create stably breeding populations with unique characteristics. Quantitative trait genetics today leverages Wright’s observations about the statistical relationship between genotype and phenotype in families and populations to understand how certain genetic features can affect variation in natural and derived populations.
17.2 Quantitative traits
Polygenic inheritance refers to inheritance of a phenotypic characteristic (trait) that is attributable to two or more genes and can be measured quantitatively. Multifactorial inheritance refers to polygenic inheritance that also includes interactions with the environment. Unlike monogenic traits, polygenic traits do not follow patterns of Mendelian inheritance (discrete categories). Instead, their phenotypes typically vary along a continuous gradient depicted by a bell curve.
An example of a polygenic trait is human skin color variation. Several genes factor into determining a person’s natural skin color, so modifying only one of those genes can change skin color slightly or in some cases, such as for SLC24A5, moderately. Many disorders with genetic components are polygenic, including autism, cancer, diabetes and numerous others. Most phenotypic characteristics are the result of the interaction of multiple genes.
Examples of disease processes generally considered to be results of many contributing factors:
Congenital malformation
- Cleft palate
- Congenital dislocation of the hip
- Congenital heart defects
- Neural tube defects
- Pyloric stenosis
- Talipes
Adult onset diseases
- Diabetes mellitus
- Cancer
- Glaucoma
- Hypertension
- Ischaemic heart disease
- Bipolar disorder
- Schizophrenia
- Psoriasis
- Thyroid diseases
- Alzheimer’s disease
Multifactorially inherited diseases are said to constitute the majority of genetic disorders affecting humans that result in hospitalization or special care of some kind.
Traits controlled both by the environment and by genetic factors are called multifactorial. Outside of illness, multifactorial traits usually appear as continuous characteristics in organisms, including humans; examples are height, skin color, and body mass. All of these phenotypes are complicated by a great deal of give-and-take between genes and environmental effects. The continuous distribution of traits such as height and skin color reflects the action of genes that do not manifest typical patterns of dominance and recessiveness. Instead, the contributions of each involved locus are thought to be additive. Writers have distinguished this kind of inheritance as polygenic, or quantitative, inheritance.
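One common way to write down the additive picture described above is the following sketch. The symbols are assumptions introduced here for illustration and do not appear in the original text: $P$ is an individual's phenotype, $\mu$ a baseline value, $x_i$ the number of trait-increasing alleles carried at locus $i$ (0, 1, or 2 in a diploid), $\beta_i$ the effect of each such allele, and $\varepsilon$ the environmental contribution.

```latex
P = \mu + \sum_{i=1}^{n} \beta_i x_i + \varepsilon, \qquad x_i \in \{0, 1, 2\}
```

Under this sketch there is no dominance term: a heterozygote at locus $i$ lies exactly halfway between the two homozygotes, which is what "additive" means here.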
Thus, due to the nature of polygenic traits, inheritance will not follow the same pattern as a simple monohybrid or dihybrid cross. Polygenic inheritance can be explained as Mendelian inheritance at many loci, resulting in a trait that is normally distributed. If $n$ is the number of involved loci, then the coefficients of the binomial expansion of $(a + b)^{2n}$ give the frequency distribution of the possible allele combinations. For sufficiently high values of $n$, this binomial distribution begins to resemble a normal distribution. From this viewpoint, a disease state will become apparent at one of the tails of the distribution, past some threshold value. Disease states of increasing severity are expected the further one goes past the threshold and away from the mean.
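The convergence toward a normal curve can be checked numerically. The sketch below assumes two equally frequent alleles at every locus (so a = b = 1/2 in the expansion of (a + b) raised to the power 2n) and equal, additive effects; the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, exp, pi, sqrt


def allele_count_distribution(n_loci, p=0.5):
    """P(k contributing alleles), k = 0..2n, from the terms of (a + b)^(2n)."""
    m = 2 * n_loci  # a diploid carries two alleles per locus
    return [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def max_gap_to_normal(n_loci, p=0.5):
    """Largest absolute gap between the binomial terms and the matching normal density."""
    m = 2 * n_loci
    mean, var = m * p, m * p * (1 - p)
    probs = allele_count_distribution(n_loci, p)
    return max(abs(pk - normal_pdf(k, mean, var)) for k, pk in enumerate(probs))


if __name__ == "__main__":
    for n in (1, 3, 10, 50):
        print(f"n = {n:2d} loci, max gap to normal = {max_gap_to_normal(n):.5f}")
```

With one locus the distribution is the familiar 1:2:1 ratio; as the number of loci grows, the gap to the normal density shrinks. Unequal allele frequencies or unequal effect sizes would change the details, which is one reason this is only a sketch.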
A mutation resulting in a disease state is often recessive, so both alleles must be mutant in order for the disease to be expressed phenotypically. A disease or syndrome may also be the result of the expression of mutant alleles at more than one locus. When more than one gene is involved, with or without the presence of environmental triggers, we say that the disease is the result of multifactorial inheritance.
The more genes involved in the cross, the more the distribution of the genotypes will resemble a normal, or Gaussian distribution. This shows that multifactorial inheritance is polygenic, and genetic frequencies can be predicted by way of a polyhybrid Mendelian cross. Phenotypic frequencies are a different matter, especially if they are complicated by environmental factors.
The use of polygenic inheritance as a paradigm for defining multifactorial disease has met with considerable disagreement. Turnpenny (2004) discusses how simple polygenic inheritance cannot explain some diseases, such as the onset of Type I diabetes mellitus, and notes that in such cases not all genes are thought to make an equal contribution.
The assumption of polygenic inheritance is that all involved loci make an equal contribution to the symptoms of the disease. This should result in a normal curve distribution of genotypes. When it does not, the idea of polygenic inheritance cannot be supported for that illness.
17.3 Expression quantitative trait loci
Expression quantitative trait loci (eQTLs) are genomic loci that explain all or a fraction of variation in expression levels of mRNAs.
An expression trait is the amount of an mRNA transcript or a protein; these are usually the product of a single gene with a specific chromosomal location. This distinguishes expression traits from most classical complex traits, which are not the product of the expression of a single gene. As mentioned, chromosomal loci that explain part or all of the variance in expression traits are called eQTLs. eQTLs that map to the approximate location of their gene of origin are referred to as local eQTLs. In contrast, those that map far from the location of their gene of origin, often on different chromosomes, are referred to as distant eQTLs. Often, these two types of eQTLs are referred to as cis and trans, respectively, but these terms are best reserved for instances when the regulatory mechanism (cis vs. trans) of the underlying sequence has been established. The first genome-wide eQTL study was carried out in yeast and published in 2002. The initial wave of eQTL studies employed microarrays to measure genome-wide gene expression; more recent studies have employed massively parallel RNA sequencing. Many expression QTL studies have been performed in plants and animals, including humans, non-human primates, and mice.
Some cis eQTLs are detected in many tissue types, but the majority of trans eQTLs are tissue-dependent (dynamic). eQTLs may act in cis (locally) or in trans (at a distance) on a gene. Because the abundance of a gene transcript is directly modified by polymorphisms in regulatory elements, transcript abundance can be treated as a quantitative trait and mapped with considerable power. Combining whole-genome genetic association studies with the measurement of global gene expression allows the systematic identification of eQTLs. By assaying gene expression and genetic variation simultaneously on a genome-wide basis in a large number of individuals, statistical genetic methods can be used to map the genetic factors that underpin individual differences in the expression levels of many thousands of transcripts. Studies have shown that single nucleotide polymorphisms (SNPs) reproducibly associated with complex disorders, as well as with certain pharmacologic phenotypes, are significantly enriched for eQTLs relative to frequency-matched control SNPs.
Mapping eQTLs is done using standard QTL mapping methods that test the linkage between variation in expression and genetic polymorphisms. The only considerable difference is that eQTL studies can involve a million or more expression microtraits. Standard gene mapping software packages can be used, although it is often faster to use custom code such as QTL Reaper or the web-based eQTL mapping system GeneNetwork. GeneNetwork hosts many large eQTL mapping data sets and provides access to fast algorithms to map single loci and epistatic interactions. As is true in all QTL mapping studies, the final steps in defining DNA variants that cause variation in traits are usually difficult and require a second round of experimentation. This is especially the case for trans eQTLs, which do not benefit from the strong prior probability that relevant variants are in the immediate vicinity of the parent gene. Statistical, graphical, and bioinformatic methods are used to evaluate positional candidate genes and entire systems of interactions.
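A single-marker scan of the kind described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the method used by QTL Reaper or GeneNetwork: it assumes independent markers coded as 0, 1, or 2 allele copies, one expression trait, and a regression-based LOD score, LOD = (n/2) log10(RSS_null / RSS_marker). The function names, the sample sizes, and the position of the causal marker are all hypothetical.

```python
import math
import random


def lod_score(genotypes, expression):
    """Single-marker LOD = (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    if s_gg == 0 or s_ee == 0:
        return 0.0  # monomorphic marker or constant trait carries no signal
    # For simple linear regression, RSS_marker / RSS_null = 1 - r^2.
    r2 = min(s_ge**2 / (s_gg * s_ee), 1 - 1e-12)
    return -(n / 2) * math.log10(1 - r2)


def simulate(n_individuals=200, n_markers=50, causal=17, effect=1.0, seed=1):
    """Toy data: independent 0/1/2 markers, one of which shifts expression."""
    rng = random.Random(seed)
    markers = [
        [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_individuals)]
        for _ in range(n_markers)
    ]
    expression = [effect * g + rng.gauss(0, 1) for g in markers[causal]]
    return markers, expression


def scan(markers, expression):
    """LOD score of one expression trait against every marker."""
    return [lod_score(m, expression) for m in markers]


if __name__ == "__main__":
    markers, expression = simulate()
    lods = scan(markers, expression)
    best = max(range(len(lods)), key=lods.__getitem__)
    print(f"top marker: {best}, LOD = {lods[best]:.1f}")
```

A real eQTL study repeats this scan for every transcript, accounts for linkage between neighboring markers, and corrects for the very large number of tests; whether a hit counts as local or distant then depends on how far the marker lies from the gene encoding the transcript.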
17.4 GWAS
A genome-wide association study (GWAS) is a method similar to QTL mapping used to identify variants associated with complex traits. Association mapping differs from QTL mapping primarily in that GWASs are only performed with random-mating populations; because all the alleles in the population are tested at the same time, multiple alleles at each locus can be compared.
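At its simplest, the per-variant test in a case-control GWAS compares allele counts between cases and controls and then corrects for the number of variants tested. The sketch below uses an allelic chi-square test with one degree of freedom and a Bonferroni correction; both are illustrative choices rather than the method of any particular study, and the SNP names and allele counts are invented for the example.

```python
import math


def allelic_test(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Counts are alleles, not individuals.
    """
    a, b = case_alt, case_total - case_alt  # cases: alt, ref
    c, d = ctrl_alt, ctrl_total - ctrl_alt  # controls: alt, ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def bonferroni_hits(counts, alpha=0.05):
    """Names of SNPs whose p-value passes alpha / (number of tests)."""
    threshold = alpha / len(counts)
    return [name for name, table in counts.items()
            if allelic_test(*table)[1] < threshold]


# Invented allele counts: (case_alt, case_total, ctrl_alt, ctrl_total).
TOY_COUNTS = {
    "snp_large_effect": (600, 2000, 400, 2000),
    "snp_small_effect": (520, 2000, 500, 2000),
    "snp_no_effect": (500, 2000, 500, 2000),
}

if __name__ == "__main__":
    for name, table in TOY_COUNTS.items():
        chi2, p = allelic_test(*table)
        print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3g}")
    print("significant:", bonferroni_hits(TOY_COUNTS))
```

In this toy data the small-effect variant does not pass the corrected threshold even though its allele frequency differs between cases and controls. With on the order of a million tests, the same correction gives the conventional genome-wide threshold of about 5 × 10⁻⁸, which illustrates how common variants of small effect can go undetected.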
Recently, with rapid increases in available genetic data, researchers have begun to better characterize the genetic architecture of complex traits. One surprise has been the observation that most loci identified in GWASs are found in noncoding regions of the genome; therefore, instead of directly altering protein sequences, such variants likely affect gene regulation. To understand the precise effects of these variants, QTL mapping has been employed to examine data from each step of gene regulation; for example, mapping RNA-sequencing data can help determine the effects of variants on mRNA expression levels, which then presumably affect the amounts of protein translated. A comprehensive analysis of QTLs involved in various regulatory steps (promoter activity, transcription rates, mRNA expression levels, translation levels, and protein expression levels) showed that high proportions of QTLs are shared, indicating that regulation behaves as a “sequential ordered cascade” with variants affecting all levels of regulation. Many of these variants act by affecting transcription factor binding and other processes that alter chromatin function, steps that occur before and during RNA transcription.
To determine the functional consequences of these variants, researchers have largely focused on identifying key genes, pathways, and processes that drive complex trait behavior; an inherent assumption has been that the most statistically significant variants have the greatest impact on traits because they act by affecting these key drivers. For example, one study hypothesizes that there exist rate-limiting genes pivotal to the function of gene regulatory networks. Other studies have identified the functional impacts of key genes and mutations on disorders, including autism and schizophrenia. However, a 2017 analysis by Boyle et al. argues that while genes that directly impact complex traits do exist, regulatory networks are so interconnected that any expressed gene affects the functions of these “core” genes; this idea has been termed the “omnigenic” hypothesis. While these “peripheral” genes each have small effects, their combined impact far exceeds the contributions of the core genes themselves. To support the hypothesis that core genes play a smaller than expected role, the authors describe three main observations: the heritability for complex traits is spread broadly, often uniformly, across the genome; genetic effects do not appear to be mediated by cell-type-specific function; and genes in the relevant functional categories contribute only modestly more to heritability than other genes. One alternative to the omnigenic hypothesis is the idea that peripheral genes act not by altering core genes but by altering cellular states, such as the speed of cell division or hormone response.