
Hum Genet DOI 10.1007/s00439-014-1424-6

REVIEW PAPER

Disruption of long-range gene regulation in human genetic disease: a kaleidoscope of general principles, diverse mechanisms and unique phenotypic consequences
Shipra Bhatia · Dirk A. Kleinjan

Received: 16 December 2013 / Accepted: 18 January 2014 / © Springer-Verlag Berlin Heidelberg 2014

Abstract The precise control of gene expression programs is crucial for establishing the diverse gene activity patterns required for the correct development, patterning and differentiation of the myriad cell types within an organism. Non-coding regions of the genome are well established as crucial for gene regulation, a role that depends on a diverse group of sequence fragments, called cis-regulatory elements, residing in these regions. Advances in novel genome-wide techniques have greatly increased the ability to identify potential regulatory elements. In contrast, their functional characterisation and the determination of their diverse modes of action remain a major bottleneck. Greater knowledge of gene expression control is of major importance for human health, as disruption of gene regulation has become recognised as a significant cause of human disease. Appreciation of the role of cis-regulatory polymorphism in natural variation and susceptibility to common disease is also growing. While novel techniques such as genome-wide association studies (GWAS) and next-generation sequencing (NGS) provide the ability to collect large genomic datasets, the challenge for the twenty-first century will be to extract the relevant sequences and to investigate the functional consequences of disease-associated changes. Here, we review how studies of transcriptional control at selected paradigm disease gene loci have revealed general principles of cis-regulatory logic and regulatory genome organisation, yet also demonstrate how the variety of mechanisms can combine to result in unique phenotypic outcomes. Integration of these principles with the emerging wealth of genome-wide data will provide enhanced insight into the workings of our regulatory genome.

S. Bhatia · D. A. Kleinjan (*)
MRC Human Genetics Unit at the MRC IGMM at the University of Edinburgh, Edinburgh EH4 2XU, UK
e-mail: dakleinjan@gmail.com

Introduction

A major aim in the field of medical genetics is to identify the sequence variants that cause genetic diseases. Traditionally, the focus has been on the analysis of genetic aberrations in the coding regions of genes. This focus has been very successful in identifying the causative genes and variants for many Mendelian disorders: more than 3,500 Mendelian disorders have now been linked with exonic mutations (www.omim.org). In the majority of cases, the mutation disrupts the integrity of the gene product, leading to loss of functional activity of the encoded protein, though in rare cases even synonymous mutations may have an effect (Macaya et al. 2009; Sauna and Kimchi-Sarfaty 2011). However, exonic sequences make up only 2% of the human genome. How much of the remaining 98% of our genome is functional remains hotly debated. Collective results from the ENCODE project suggested that over 80% of the genome carries some biological function, effectively scrapping the widely held notion that much of our genome consists of "junk DNA" (Bernstein et al. 2012). The haploid human genome contains approximately 3 billion basepairs, and since heterozygous changes also affect traits and diseases, the theoretical mutational target area of the genome could be as large as 80% of 6 billion basepairs, or about 4.8 billion basepairs. The proportion of this area that would give rise to a detectable phenotype when mutated is presently unclear. Moreover, this interpretation of the ENCODE data has subsequently been strongly rebutted in a number of commentaries (Doolittle 2013; Graur et al. 2013). Other studies have estimated the proportion of the genome that is under selective pressure to be around 10-15% (Ponting and Hardison 2011). In accordance with the latter result, a recent study mapping enhancer-associated histone marks genome-wide by ChIP-seq across 19 different tissues found putative function for about 11% of the mouse genome (Shen et al. 2012). Regardless of the true functional proportion of the genome, it is becoming clear that, while the effects of genetic changes in the non-coding 98% of our genome may be less obvious and many of these changes may go undetected, not all such changes are without consequence.

In this review, we focus on genetic changes that do not directly mutate the protein-coding portion of any gene, but rather interfere with the regulation of gene expression. Disruption of cis-regulatory control can occur in a variety of ways, grouped together under the umbrella of "cis-ruption" mechanisms. Many of the principles have been unveiled through detailed studies of a small number of paradigm loci, and we discuss how these findings contribute to our understanding of the spectrum of cis-ruption disease etiologies. Importantly, these studies have led to the realisation that individual genes should not be viewed in isolation, but should be placed in a broader genomic context that includes the regions spanned by their cis-regulatory sequences and often encompasses adjacent genes.
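The genome-size reasoning in the introduction is simple arithmetic, and writing it out makes the competing estimates directly comparable. The sketch below is a hypothetical helper, not part of the original review: it assumes roughly 3 × 10^9 bp per haploid genome, counts both copies because heterozygous changes matter, and uses only the percentages quoted above.

```python
# Back-of-envelope conversion of "fraction of the genome" estimates into basepairs.
# Assumptions (illustrative): ~3 billion bp per haploid genome; both copies count,
# because heterozygous changes can also affect traits and disease.
HAPLOID_BP = 3_000_000_000
DIPLOID_BP = 2 * HAPLOID_BP


def target_range_bp(low_pct: int, high_pct: int, genome_bp: int = DIPLOID_BP) -> tuple[int, int]:
    """Convert a percentage range of the genome into a basepair range (integer maths)."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    return low_pct * genome_bp // 100, high_pct * genome_bp // 100


# Percentages quoted in the text; the labels are descriptive only.
ESTIMATES = {
    "protein-coding exons (~2%)": (2, 2),
    "under selective constraint (10-15%)": (10, 15),
    "ENCODE 'some biological function' (80%)": (80, 80),
}

if __name__ == "__main__":
    for label, (lo, hi) in ESTIMATES.items():
        lo_bp, hi_bp = target_range_bp(lo, hi)
        print(f"{label}: {lo_bp / 1e9:.2f}-{hi_bp / 1e9:.2f} billion bp")
```

Integer arithmetic is used so that the ENCODE-based figure comes out as exactly 4.8 billion bp, against 0.6-0.9 billion bp for the constraint-based estimate, roughly a five- to eightfold difference in the size of the potential mutational target.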

Long-range enhancers and gene expression programs

It is commonly assumed that disruption of non-coding regions results in phenotypic changes by affecting the expression programs of cis-linked genes in the region, a view that is largely supported by studies of the paradigm loci (see below). These gene expression programs are implemented through the concerted action of a variable set of cis-regulatory elements, including promoters, enhancers, repressors, insulators and various chromatin-organising elements. In addition, a variety of non-coding RNA species exist, at least a subset of which impact on gene expression through largely unknown mechanisms (Mercer et al. 2009). Transcription of genes initiates at promoters, which specify the site of recruitment of the basal transcription machinery. The common view is that for many genes, including most housekeeping and ubiquitously expressed genes, sequences around and including the promoter are sufficient to drive expression, but that for genes with tissue-specific, signal-dependent and pleiotropic functions, activation of expression usually depends on distal cis-regulatory sequences. These sequences can reside at considerable distances from the gene itself and may be found in adjacent intergenic regions both upstream and downstream, as well as in introns of the gene itself or in those of neighbouring genes (Kleinjan and van Heyningen 2005). In rare cases, cis-elements can even lie within exonic sequences (Birnbaum et al. 2012; Woltering and Duboule 2009). Intriguingly, genome-wide studies have found that some widely expressed genes, such as c-Myc, are nevertheless surrounded by arrays of long-range cis-regulatory elements, suggesting that their apparently ubiquitous expression is achieved through the combined action of many separate elements (Shen et al. 2012).

Among the multitude of regulatory elements, enhancers constitute the main class of functional element implicated in cases of cis-regulatory disruption. Enhancers are short sequence elements, usually a few hundred basepairs in length, that contain a dense cluster of transcription factor binding sites (Spitz and Furlong 2012). The term enhancer reflects the functional definition of these elements: the ability to increase the level of transcription from a linked promoter, usually determined with a functional assay that measures increased output of a reporter gene. Over the years, different methods have been developed to identify these elements in the genome, such as DNaseI hypersensitivity mapping, transcription factor profiling (Hallikas et al. 2006; Zinzen et al. 2009) and in vitro and in vivo reporter assays (Table 1). An advance was made with the availability of genomic sequence from evolutionarily diverged species and the realisation that selection on functional sequences was detectable as locally elevated levels of sequence conservation, providing a simple method for their identification by genomic sequence comparison (Elgar et al. 1996; Nobrega and Pennacchio 2004). A further advance came with the finding that the nucleosomes occupying or flanking enhancers often carry specific modifications on their histone tails (Heintzman et al. 2007; Rada-Iglesias et al. 2010).
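The comparative-genomics approach can be sketched in a few lines. Assuming a precomputed pairwise alignment (for example human against mouse; the alignment step itself is not shown), candidate conserved elements are runs of alignment windows whose percent identity reaches a threshold. The 100 bp window and 70% identity cut-off are illustrative assumptions of the kind used by VISTA-style browsers, and the function names are hypothetical.

```python
from typing import Iterable, Iterator


def conserved_windows(row_a: str, row_b: str, window: int = 100,
                      min_identity: float = 0.70) -> Iterator[tuple[int, int, float]]:
    """Yield (start, end, identity) for alignment windows at or above min_identity.

    row_a and row_b are the two rows of a precomputed pairwise alignment (gaps as '-').
    Coordinates are 0-based, end-exclusive alignment columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not 0 < window <= len(row_a):
        return
    # 1 where both rows carry the same base; gap-gap columns never count as matches
    matches = [int(a == b and a != "-") for a, b in zip(row_a.upper(), row_b.upper())]
    count = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start:  # slide the window by one column: add the new column, drop the old one
            count += matches[start + window - 1] - matches[start - 1]
        identity = count / window
        if identity >= min_identity:
            yield start, start + window, identity


def merge_windows(windows: Iterable[tuple[int, int, float]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching windows into candidate conserved elements."""
    merged: list[list[int]] = []
    for start, end, _ in windows:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```

For example, `merge_windows(conserved_windows(a, b))` on two aligned rows returns the column ranges of candidate elements. Raising `min_identity` or lengthening `window` trades sensitivity for specificity; like any conservation-based screen, this will miss evolutionarily diverged or lineage-specific enhancers.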
Large-scale efforts investigating the variety of histone modifications have produced strong associations between some specific modifications and particular regulatory functions of the underlying DNA, sometimes referred to as the "histone code" (Buecker and Wysocka 2012; Turner 2007). Mono- and trimethylation of lysine 4 of histone H3 (H3K4me1/3; H3K4me3 chiefly at promoters, H3K4me1 at enhancers) and acetylation of lysine 27 of histone H3 (H3K27ac) have proved particularly valuable as signature marks for the presence of promoters and enhancers (Heintzman et al. 2007; Rada-Iglesias et al. 2010). Inclusion of the histone variants H2A.Z and H3.3 in local nucleosomes has also been associated with active cis-elements in the human genome (Jin et al. 2009). Regions containing modified nucleosomes can be identified in high-throughput fashion by chromatin immunoprecipitation (ChIP) with modification-specific histone antibodies, coupled with microarrays (ChIP-chip) or next-generation sequencing (ChIP-seq) (Table 1). These methods have led to the identification of more than 1 million putative enhancers in the human genome (Bernstein et al. 2012; Heintzman et al. 2007; Hoffman et al. 2012; Rada-Iglesias et al. 2010), and these elements are the prime candidate regions to be examined for sequence variants associated with genetic disease, predisposition to common disease, and the evolution and variation of traits.

Table 1 Assay methods for the identification and characterization of cis-regulatory elements (CREs)

Application: Identification or prediction of cis-elements

Method: Sequence conservation between evolutionarily diverged species or duplicated gene loci, using a variety of web-based tools, e.g. PIPmaker, VISTA, ECR browser, UCSC, Ensembl (Loots 2008)
Brief description and merits: Evolutionary conservation of a CRE sequence indicates a functional role for the CRE, and is thus a useful way of prioritising elements for further functional studies
Limitations: Does not detect evolutionarily diverged or lineage-specific enhancers. The target gene is difficult to predict, as CREs may function over large distances

Method: Transcription factor profiling, both computationally (using web-based tools, e.g. UCSC, MEME, TRANSFAC, JASPAR) and experimentally (using ChIP-chip, ChIP-seq, yeast one-hybrid and protein binding microarrays) (Karolchik et al. 2014; Newburger and Bulyk 2009; Zinzen et al. 2009)
Brief description and merits: Enrichment of factors such as p300 and CBP is a useful indicator of active CRE presence. Deciphering transcription factor binding profiles also helps discern the possible effects of disease-associated mutations in CREs
Limitations: In vitro methods such as yeast one-hybrid and protein binding microarrays fail to take co-operative binding into account

Method: DNaseI hypersensitivity mapping by DNaseI-seq (John et al. 2013); FAIRE (formaldehyde-assisted isolation of regulatory elements) by FAIRE-seq (Giresi et al. 2007)
Brief description and merits: Generate genome-wide profiles of nucleosome-free regions available for transcription factor binding, indicative of CRE presence

Method: Histone modification profiling (in particular for H3K4me1 and H3K27ac) by ChIP-seq and ChIP-chip (Heintzman et al. 2007; Rada-Iglesias et al. 2010)
Brief description and merits: Generate genome-wide enrichment profiles for active CRE-associated histone modifications, as indicators of the presence of functional CREs
Limitations (DNaseI-seq, FAIRE-seq and histone modification profiling): A large number of cells is required to perform these assays, so they are mostly restricted to in vitro cultured cell lines or larger tissues (containing mixed cell populations). Biological relevance can be unclear due to the tissue-specific nature of CRE function

Application: Functional validation and characterisation

Method: In vitro reporter assays in cultured cell lines transfected with constructs bearing CRE sequences driving reporter expression, e.g. GFP, luciferase
Brief description and merits: Expression levels of the reporter provide a quantifiable read-out of enhancer activity
Limitations: Assays are only performed in cultured cell lines and hence lack the relevant biological context. Thus, in most situations the information derived cannot readily be extrapolated to the human disease context

Method: Massively parallel reporter assays (MPRA) (Melnikov et al. 2012; Patwardhan et al. 2012); CRE-seq (Kwasnieski et al. 2012)
Brief description and merits: High-throughput reporter assays in culture using unique sequence barcodes for identification. Libraries of barcoded reporter genes are transfected into mammalian cells and quantified by RNA sequencing of the barcoded transcripts
Limitations: The method can only be used in cultured cell lines or limited tissues such as retina explants

Method: STARR-seq (Arnold et al. 2013)
Brief description and merits: Screens multiple enhancers in parallel by including the element itself in the reporter transcript
Limitations: Does not account for effects on reporter RNA stability or processing

Method: Reporter assays in zebrafish, performed by generating stable or transient transgenic lines using constructs bearing CRE sequences driving fluorescent reporter genes (Ishibashi et al. 2013)
Brief description and merits: Highly efficient transgenesis methods, low maintenance cost and transparency of embryos enable rapid analysis of the functional potential of CREs and confident prediction of target genes. Some high-throughput screening platforms have been developed, e.g. Automated Reporter Quantification in vivo (ARQiv) (Walker et al. 2012)
Limitations: Reduced conservation of elements between the human genome and the more rapidly evolved zebrafish genome, and the large number of duplicate genes resulting from the additional whole-genome duplication in teleost fish, restrict full exploitation of this powerful system for assaying CRE function

Method: Reporter assays in mice, by generating stable lines or transient transgenic embryos with CRE-bearing constructs driving a lacZ or fluorescent reporter gene (e.g. Kleinjan et al. 2001). Transgenic enhancer trap lines detect local enhancer activity at different genomic integration sites, augmented by local hopping using a transposon-based strategy (Kokubu et al. 2009; Ruf et al. 2011)
Brief description and merits: Assays are robust and performed in vivo. Results can be rapidly extrapolated to the human situation owing to the close evolutionary distance between humans and mice. Whole-genome views have been generated using these assays for human and mouse CREs and are available as the VISTA enhancer browser
Limitations: High-throughput enhancer screens covering the full window of embryonic development will be slow and expensive. The VISTA enhancer browser is useful but only provides information on conserved CREs at one stage of embryonic development

Method: Massively parallel functional dissection (MPFD) using the tail vein assay (Kim and Ahituv 2013)
Brief description and merits: A library of uniquely barcoded CRE variants is injected into the tail vein of mice, and the reporter gene expression driven by each CRE variant is assayed by RNA-seq of RNA extracted from mouse liver
Limitations: The assay only works for CREs active in liver

Application: Mapping interactions between remote CREs and target gene promoters

Method: Chromatin looping studies by a variety of chromosome conformation capture techniques (e.g. 3C, 4C, 5C, Hi-C, ChIA-PET), DamID and 3D FISH (de Wit and de Laat 2012; van Steensel and Dekker 2010). The C-techniques can also be used for element discovery
Brief description and merits: Assays detect interactions between CREs and their target gene promoter in the context of the entire gene locus. This information is missing in reporter assays, where CREs are taken out of the context of their endogenous locus
Limitations: With few exceptions (Amano et al. 2009), these assays have so far only been optimised in cultured cells, and thus lack the potentially important developmental context. Biased towards stable interactions

Table 2 Long-range enhancers involved in genetic disease, predisposition, and trait variability

Enhancer | Type of mutation | Target gene | Associated disease/trait/susceptibility | Distance from promoter | Binding factor/deletion size | Reference

Disease-associated variants SIMO Point mutation PAX6 SHH SHH IRF6 HCFC1 SOX9 DLX5/6 HBS1L/MYB FOXL2 SOST SHOX SOX9 SOX9 SOX9 POU3F4 ATOH7 DLX5/6 HBA genes HBB genes IHH BMP2 SOX9 SHH TPTPS and SD4 Brachydactyly anonychia Brachydactyly type A2 Craniosynostosis, syndactyly -thalassemia 40 kb 80 kb 40 kb 65/80 kb NCRNA 20 kb X-linked deafness type 3 920 kb Pierre Robin Sequence 1380 kb Pierre Robin Sequence 1580 kb 75 kb 8 kb 6.5 kb 5.1 kb 3.3 kb 100 kb Pierre Robin Sequence 1560 kb 36 kb >319 kb Leri-Weill dyschondrosteosis 47.5 kb Van Buchem disease 35 kb 52 kb Blepharophimosis (BPES) 283 kb 7.4 kb Persistent foetal haemoglobin 43 kb/84 kb 3 bp Autism 13/6 kb DLX Pierre Robin Sequence 1441 kb MSX1 Non-syndromic intellectual disability 8 bp YY1 Cleft lip and palate 9.7 kb AP2 Holoprosencephaly 460 kb SIX3 Polydactyly 980 kb Ets family Aniridia 150 kb PAX6 (Bhatia et al. 2013) (Lettice et al. 2012) (Jeong et al. 2008)
ZRS
Point mutation
SBE2
Point mutation
MCS-9.7
Point mutation
(Rahimov et al. 2008) (Huang et al. 2012) (Benko et al. 2009) (Poitras et al. 2010) (Farrell et al. 2011) (D'Haene et al. 2009) (Balemans et al. 2002) (Benito-Sanz et al. 2012) (Benko et al. 2009)
YY1_S2
Point mutation
HCNE-F2
Point mutation
I56i
Point mutation
Indel
Deletion
Deletion
Deletion
Sp2
Deletion
Sp4
Deletion
F1
Deletion
HCNE_81728
Deletion
Deletion
(Ahn et al. 2009; de Kok et al. 1996; Naranjo et al. 2010) (Ghiasvand et al. 2011) (Brown et al. 2009) (Phylipsen et al. 2010) (Kioussis et al. 1983) (Klopocki et al. 2011) (Dathe et al. 2009) (Kurth et al. 2009) (Klopocki et al. 2008; Sun et al. 2008; Wieczorek et al. 2009)
5,115 bp element
Deletion
HS-40/MCS-R2
Deletion
Hearing loss and craniofacial defects -thalassemia
LCR
Deletion
Duplication
Duplication
Duplication
ZRS
Duplication
Disease susceptibility and trait variability rs12913832 SNP variant OCA2 CDKN2B SORT1 MYC Eye colour
21 kb 130 kb 118 kb 335 kb
HLTF, LEF1, and MITF STAT1 C/EBP TCF/LEF1
rs10757278
SNP variant
Coronary artery disease Myocardial infarction Colon cancer
(Eiberg et al. 2008; Sturm et al. 2008; Visser et al. 2012) (Harismendy et al. 2011) (Musunuru et al. 2010) (Ahmadiyeh et al. 2010; Pomerantz et al. 2009; Sotelo et al. 2010; Tuupanen et al. 2009; Wright et al. 2010)
rs12740374
SNP variant
rs6983267
SNP variant

Only a few validated disease-causing point mutations have been reported so far. The list of intergenic deletions is longer and is here limited to examples of short deletions. In contrast, the list of SNP variants with disease or trait association is growing rapidly, and only a small selection of validated variants is shown.

Binding factor/deletion size
TFAP2A
OCT1/GATA
ETV6
TAL1
61 kb
3.2 kb

Gene regulation and human genetic disease

A key strategy in molecular research is to disable a specific component of a genetic system (e.g. mutate a gene) and observe the effect on the organ or organism, in order to deduce the role of that component in the normal situation. The benefit of model organisms and cell culture systems, each with their own advantages, is the ability to make precise disruptions by design. Although such manipulations are not available in human genetics, the study of human genetic malformations nevertheless offers some unique benefits: not only are any findings of direct medical relevance, without the need to extrapolate from a distantly related model species, but the human population also provides a large pool of individuals with unique phenotypes. Every human being is characterised by a unique combination of genetic traits, and patients suffering from a particular medical condition are often very well characterised. Thus, naturally occurring variations and disease-causing mutations in humans are a valuable resource, not only to identify critical genes, but also to gain insights into the regulatory networks that connect these genes into developmental and homeostatic pathways. Investigation of disease pathology has been a crucial driver in the discovery of long-range regulation and of cis-regulatory elements. One of the first studies to hint at a regulatory mechanism of disease was the analysis of Dutch γβ-thalassemia, which revealed a 100 kb genomic deletion that left the globin genes themselves intact (Kioussis et al. 1983; Tuan et al. 1985). The deleted fragment, some distance upstream of the genes, was shown to contain multiple DNase hypersensitive sites (Table 2) that combine to provide a powerful locus control region (LCR) for the globin locus (Grosveld et al. 1987; Tuan et al. 1985).

Although appreciation of the wider involvement of regulatory regions in genetic disease was initially slow to follow, it is now clear that a number of distinct chromosomal and genetic abnormalities can disrupt cis-regulatory control and result in disease, collectively grouped as cis-ruption disease mechanisms (Kleinjan and Coutinho 2009). These range from severe genomic lesions, such as chromosomal translocations, inversions, duplications and deletions, to much more subtle local changes, such as point mutations and small indels (insertions/deletions). The list of disease cases involving the former group of lesions continues to grow apace, and the analysis of apparently balanced chromosomal rearrangements (ABCRs) in particular has been of great value for the discovery of long-range cis-regulatory gene control for many genes (Fantes et al. 1995, 2008; FitzPatrick et al. 2003; Lodder et al. 2009). Awareness that ABCRs can affect genes lying distal to the breakpoints by removing remote regulatory elements has helped to recognise that the gene directly disrupted by a breakpoint is not always the true causative gene, and has led to (re-)evaluation of a wider selection of genes in the affected chromosomal segment as potential candidates (Martin et al. 2011; Ragvin et al. 2010; Zuniga et al. 2004). In contrast, the list of validated cases of the latter group (point mutations and small deletions) is currently still short (Table 2). Nevertheless, because these lesions are much more precise, they can provide more detailed mechanistic insight. Examples of both classes of regulatory lesions are found in the set of paradigm loci that we discuss in more detail below.
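Looking beyond the gene hit by a breakpoint amounts, in its simplest form, to a neighbourhood query around the breakpoint position. The sketch below is hypothetical: the gene names and coordinates are toy values rather than real genome positions, and the 1 Mb default window is an arbitrary assumption (the SOX9 locus discussed below shows that relevant elements can lie even further away).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # genomic start coordinate (inclusive)
    end: int    # genomic end coordinate (inclusive)


def distance_to(gene: Gene, pos: int) -> int:
    """Return 0 if pos lies within the gene body, else the bp gap to the nearest gene end."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def candidate_genes(genes: list[Gene], breakpoint: int,
                    window_bp: int = 1_000_000) -> list[tuple[str, int, bool]]:
    """List (name, distance, directly_disrupted) for genes within window_bp, nearest first."""
    hits = []
    for gene in genes:
        d = distance_to(gene, breakpoint)
        if d <= window_bp:
            hits.append((d, gene.name))
    return [(name, d, d == 0) for d, name in sorted(hits)]


if __name__ == "__main__":
    toy_genes = [Gene("GENE_A", 100_000, 150_000),   # hypothetical coordinates
                 Gene("GENE_B", 200_000, 400_000),
                 Gene("GENE_C", 1_500_000, 1_600_000)]
    for row in candidate_genes(toy_genes, breakpoint=300_000):
        print(row)
```

The output deliberately keeps the directly disrupted gene (distance 0) alongside its neighbours, so that a bystander gene, as in the PAX6/ELP4 case described below, is not mistaken for the only candidate.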

Table 2 continued
(Powers et al. 2013)
(Enattah et al. 2002; Tishkoff et al. 2007) (McLean et al. 2011)
Reference
(Praetorius et al. 2013) (Lubbe et al. 2011) (Smemo et al. 2012)
(Prabhakar et al. 2008)
(Oksenberg et al. 2013)
Distance from promoter
14 kb/22 kb
130 kb
75 kb
Associated disease/trait/susceptibility
Skin pigmentation Colorectal cancer Heart disease
Lactase persistence
Loss of penile spine
Gain of brain growth
DCDC2/KIAA0319 IRF4 BMP4 TBX5
Dyslexia
Target gene
GADD45G
CENTG2/GBX2 Gain of function changes HACNS1
Opposable thumb
LCT
AR
Type of mutation
rs12203592 SNP variant rs4444235 SNP variant Enhancer 9 point mutation Enhancers in trait evolution LCT-13910/-14010/-22018 SNP variants Loss of element
VNTR
Brain SVZ enhancer
Loss of element
Enhancer
READ1
HACNS369/174/HAR31
Fast evolving SNPs
AUTS2
Neurodevelopment/autism
Intronic

PAX6: a paradigm locus for long-range gene regulation

The PAX6 locus (MIM 607108) on human chromosome 11 has long been a paradigm locus for cis-ruption diseases. PAX6 haploinsufficiency is the cause of the congenital eye malformation aniridia (MIM 106210), as shown by a large number of cases with coding region mutations and deletions (Robinson et al. 2008). The clear association of aniridia with PAX6 mutation was crucially informative when aniridia patients with intact coding regions but bearing chromosomal rearrangements mapping downstream of the PAX6 gene were identified (Fantes et al. 1995). These rearrangements included translocations, inversions and large deletions, but in all cases involved one breakpoint in the region downstream of PAX6, within the final intron of an adjacent, ubiquitously expressed gene, ELP4 (MIM 606985), which encodes a subunit of Elongator, a protein complex associated with elongating RNA polymerase II (Fig. 1a) (Kleinjan et al. 2002; Winkler et al. 2002). Although these cases initially suggested the presence of a second aniridia gene in the locus, ELP4 was shown to be simply a bystander gene that is not directly involved in the disease but merely harbours cis-regulatory elements within its introns (Kleinjan et al. 2001, 2002). Several tissue-specific enhancers for PAX6 have now been identified within the ELP4 introns, creating an obligate physical linkage between the two genes that explains their conserved synteny through vertebrate evolution (Kleinjan et al. 2006; McBride et al. 2012; Ravi et al. 2013). Only the duplication of the Pax6 locus in the early teleost fish lineage has temporarily lifted this selective pressure, creating the opportunity to separate the genes and regulatory elements (Kleinjan et al. 2008). Among the multitude of characterised cis-elements in the PAX6 locus are enhancers that together cover most of the PAX6 expression domain. It is noteworthy that not only do most elements drive expression in several tissues, but most expression sites are covered by multiple separate enhancers (Kleinjan et al. 2006; McBride et al. 2012). Nevertheless, the phenotype in various cis-ruption cases of aniridia is indistinguishable from that caused by loss-of-function coding region mutations or gene deletion, irrespective of the distance of the breakpoint from the gene. This may suggest a strong interdependence between the various elements, or indicate that all rearrangements have included at least one dominant distal cis-element. As mutations in PAX6 are sometimes linked to additional phenotypes, e.g. diabetes or epilepsy, it remains to be seen whether disruption of particular PAX6 cis-elements could cause such phenotypes in isolation. Remarkably, all aniridia-associated breakpoints occur in the PAX6 3′ region, suggesting that the hardwiring of PAX6 into eye development networks is strongly biased towards this side of the gene. One potential exception is a report of a patient with a complex phenotype and a large deletion with one breakpoint between 50 and 150 kb upstream of PAX6. Among the disease features, most of which can be explained by genes in the deletion interval, were ptosis and cataracts, which could be due to PAX6 misregulation (Almind et al. 2009). However, no ocular enhancers have so far been identified in the distal upstream region (Bhatia, personal communication). Until recently, all reported aniridia cis-ruption cases involved gross disruption of the PAX6 regulatory domain by chromosomal rearrangements. This raised the question of whether more subtle mutations could also cause the disease. Direct sequencing of a panel of cis-regulatory elements active in various eye tissues revealed a single nucleotide change in a conserved ocular enhancer, SIMO, located 150 kb downstream of PAX6, in a patient with aniridia who had no exonic mutations or chromosomal abnormalities (Bhatia et al. 2013).
Intriguingly, the mutation appeared to disrupt a binding site for PAX6 itself, suggesting that disruption of auto-regulation might have caused the disease (Fig. 1a). In accordance with this hypothesis, transgenic experiments demonstrated that when the mutation is present, expression of a linked reporter gene is initiated normally but fails to be maintained at later stages of development (Bhatia et al. 2013).
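The distinction between initiating and maintaining expression through an autoregulatory binding site can be illustrated with a deliberately simplified toy model. This is not a model from the cited work: the difference equation, the Hill-type feedback term and every parameter value below are arbitrary assumptions chosen only to show the qualitative behaviour. A transient upstream cue switches the gene on; with feedback intact, expression settles at a self-sustaining level after the cue is withdrawn, whereas setting the feedback term to zero (mimicking loss of the auto-regulatory binding site) gives identical initiation followed by decay.

```python
def simulate(feedback: float, steps: int = 60, induction_steps: int = 10,
             induction: float = 1.0, decay: float = 0.2,
             k: float = 1.0, n: int = 2) -> list[float]:
    """Expression levels x_0..x_steps of a toy self-activating gene (arbitrary units).

    x_{t+1} = (1 - decay) * x_t + cue_t + feedback * x_t**n / (k**n + x_t**n),
    where cue_t = induction for the first induction_steps steps and 0 afterwards.
    """
    x = 0.0
    levels = [x]
    for t in range(steps):
        cue = induction if t < induction_steps else 0.0   # transient upstream input
        auto = feedback * x**n / (k**n + x**n)            # Hill-type self-activation
        x = (1.0 - decay) * x + cue + auto
        levels.append(x)
    return levels


if __name__ == "__main__":
    intact = simulate(feedback=0.5)   # binding site present: feedback active
    mutant = simulate(feedback=0.0)   # binding site destroyed: no feedback
    for t in (1, 10, 30, 60):
        print(f"t={t:2d}  intact={intact[t]:.2f}  mutant={mutant[t]:.4f}")
```

With these particular parameters the intact system has a stable "on" state at x = 2 and an unstable threshold at x = 0.5, so any level above the threshold when the cue stops is maintained. This bistability is a property of the toy model, not a measured feature of PAX6 regulation.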

Uncovering endophenotypes at the SOX9 locus

The SOX9 gene (MIM 608160), encoding an HMG-box transcription factor on human chromosome 17, plays a crucial role in several developmental processes, including chondrogenesis, neural crest cell migration and differentiation, and pancreatic and testicular development (Pritchett et al. 2011). SOX9 deficiency causes a polymorphic malformation syndrome, campomelic dysplasia (CD; MIM 114290), characterised by typical congenital bowing of the long bones, severe skeletal dysplasia and, in a majority of cases, defective sex determination. Chromosomal translocations upstream of the gene first pointed to SOX9 as a CD candidate gene, and the identification of coding mutations in the gene confirmed SOX9 haploinsufficiency as the cause of CD and autosomal sex reversal (Foster et al. 1994; Wagner et al. 1994). The gene is embedded within a large gene desert: the gene-free region surrounding SOX9 extends about 2 Mb in the centromeric direction to the next upstream gene, KCNJ2, and about 500 kb to SLC39A11 on the telomeric side (Fig. 1b). As often seen for gene deserts around developmental control genes, many highly conserved sequences are present in the region. Investigation of chromosomal abnormalities in CD patients has revealed a large number of translocation breakpoints in this gene desert, which roughly group in two clusters, located between 30 and 375 kb and between 759 and 932 kb upstream of SOX9 (Leipoldt et al. 2007). Characterisation of affected individuals shows a trend in the severity of campomelia, the bowing of the long bones that gives the disorder its name, with severity depending on the distance of the breakpoint from the coding region of the gene, more distal breakpoints generally being associated with milder campomelia (Bagheri-Fam et al. 2006; Leipoldt et al. 2007; Pfeifer et al. 1999; Pop et al. 2004; Velagaleti et al. 2005; Wirth et al. 1996; Wunderle et al. 1998). Some of the most distal lesions lacked clear limb malformations and are referred to as acampomelic campomelic dysplasia (ACD) (Fonseca et al. 2013; Hill-Harfe et al. 2005; Lecointre et al. 2009). In addition to its role in chondrogenesis, SOX9 also functions in the development of the pancreas, testis, gut, heart, central nervous system and neural crest. As the number of cases with chromosomal rearrangements near SOX9 has grown over recent years, lesions have been uncovered that specifically affect SOX9 function in selected organ systems, leading to more restricted phenotypes (Fig. 2).
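Region-phenotype correlations of this kind are typically derived by intersecting patient lesions: if every patient with a given endophenotype carries a lesion covering the same segment, that shared segment (the smallest region of overlap) is the best candidate location for the responsible cis-regulatory elements. This is the principle by which the 78 kb DSD region described next was defined from overlapping duplications and deletions. The minimal sketch below uses invented intervals, chosen only so that their overlap matches the 517-595 kb region quoted in the text; they are not the published patient CNVs, and the function name is hypothetical.

```python
from typing import Iterable, Optional

Interval = tuple[int, int]  # (start, end) in kb upstream of the gene, end-exclusive


def smallest_region_of_overlap(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Return the segment shared by every interval, or None if there is none."""
    shared: Optional[Interval] = None
    for start, end in intervals:
        if start >= end:
            raise ValueError(f"empty interval: {(start, end)}")
        if shared is None:
            shared = (start, end)
        else:
            shared = (max(shared[0], start), min(shared[1], end))
            if shared[0] >= shared[1]:
                return None  # no single segment is common to all lesions
    return shared


if __name__ == "__main__":
    invented_cnvs = [(400, 700), (517, 800), (450, 595)]  # hypothetical lesions, kb upstream
    start, end = smallest_region_of_overlap(invented_cnvs)
    print(f"shared region: {start}-{end} kb upstream ({end - start} kb)")
```

In practice, breakpoint uncertainty and position effects of the kind discussed below mean that such a region is a starting point for functional tests rather than a definitive map of the responsible elements.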
Among these so-called endophenotypes are disorders of sex development (DSD). Differentiation of the early gonad into a testis rather than an ovary is triggered by activation of SOX9 expression by the testis-determining factor SRY. Insufficient SOX9 expression results in male-to-female sex reversal, which occurs in around three quarters of XY CD patients, while inappropriate expression of SOX9 in XX gonads initiates testicular development and female-to-male sex reversal. Analysis of copy number variations in a number of patients with DSDs revealed duplications (in 46,XX males) and deletions (in 46,XY females) whose minimal overlap defines a 78 kb DSD region located 517-595 kb upstream of SOX9 (Benko et al. 2011; Cox et al. 2011; Huang et al. 1999; Refai et al. 2010). This region is therefore thought to contain critical cis-regulatory elements for gonadal SOX9 expression. However, sex reversal is also seen in some patients with breakpoints beyond this region, and patients with another SOX9 endophenotype, Cooks syndrome, also known as brachydactyly anonychia (BA; MIM 106995) (Kurth et al. 2009), resulting from large duplications of the SOX9 upstream region, exist without symptoms of sex reversal. It was argued that a true position effect mechanism could account for this discrepancy. It is well known that the chromatin status of the genomic environment in which an enhancer finds itself greatly influences the actual functional activity of the element. Gross chromosomal lesions such as translocations, large deletions or duplications can result in silencing of a potentially functional enhancer through relocation into a non-permissive chromatin environment.

Isolated Pierre Robin Sequence (PRS; MIM 261800) represents another SOX9 endophenotype. PRS is a craniofacial disorder characterised by micrognathia (mandibular hypoplasia), glossoptosis, and incomplete midline fusion of the palatal shelves, typically leading to a U-shaped cleft palate. The features are thought to result from a causally linked

13

Hum Genet

Fig.1Schematic representations of a number of genomic regions

that have served as paradigm loci for the role of long-range gene regulation in disease. The complex regulatory landscapes of the loci are depicted in simplied form, only showing their most salient features. a The PAX6 locus on chromosome 11p13 was among the rst Developmental Regulator loci where disruption of long-range gene regulation was shown to cause congenital malformation. In this case, the phenotype, aniridia, seen in patients with cis-regulatory disruptions was similar to the disease phenotype of gene mutation or deletion, and most likely due to the loss of the enhancers of the downstream regulatory region (DRR). A single basepair change in one of these enhancers, SIMO, was shown to affect a binding site for PAX6 itself, suggesting interruption of auto-regulatory maintenance of expression can be a cause of the disease. b The SOX9 gene is anked by large gene deserts on both sides. The 5 gene desert contains a large number of conserved non-coding elements. Analysis of chromosomal lesions has revealed strong correlations between distinct parts of the region and specic endophenotypes such as proximal and distal campomelic dysplasia (CD) regions, a region associated with disorders of sex development (DSD) and a region for non-syndromic Pierre Robin Sequence (PRS) located 1.21.5Mb upstream of SOX9. Large duplications of the distal SOX9 regulatory domain are found in patients with brachydactyly anonychia (BA). A SNP located 1Mb upstream of SOX9 is implicated in risk of prostate cancer. c The SHH gene is regulated by multiple tissue-specic enhancers. Here only a set of forebrain enhancers (SBE1-4) and the single known SHH limb enhancer, the ZRS, are shown. Loss of SHH leads to holoprosencephaly (HPE3). A similar phenotype is caused in a patient with a point mutation in the SBE2 element which disrupts a SIX3 binding site. Mutation of SIX3 is itself a cause of holoprosencephaly (HPE2), an example of convergent aetiology hinting at common pathways. 
A large number of ZRS mutations have been found in patients with preaxial polydactyly II (PPD2) caused by ectopic activation of the element. Duplications of a genomic segment containing the ZRS create the different limb malformations TPTPS and SD4 (Haas type polysyndactyly). An inversion of a large chromosomal segment removes SHH from some of its endogenous enhancers including the ZRS, but juxtaposes the gene with an extraneous limb enhancer with incorrect spatial activity. d The relatively gene-rich -globin locus has highlighted a number of novel cis-ruption mechanisms. The erythroidspecic globin genes are controlled by a set of four upstream enhancers (MCS-R1-4). Deletions of variable length, but always including MCS-R-2, are a cause of -thalassemia. A deletion that removed the nal few exons of the adjacent LUC7L gene caused read-through of its transcript into the HBA1 genic region from an antisense direction, with concomitant epigenetic silencing of the gene. A single basepair change in the HBZ-HBA1 intergenic region (rSNP) was shown to create a new promoter. Redirection of enhancer activity to this promoter interferes with expression levels of the globin genes in some -thalassemia patients. e The multigene HOXD cluster, plus adjacent genes EVX2, LNP and MTX2, is anked by large gene deserts on both sides. Various limb malformations are associated with chromosomal abnormalities in the region, including deletions (horizontal bars), translocations (arrows) and an inversion (arrowhead). A regulatory archipelago consisting of multiple cis-regulatory elements (IV, GCR, Prox) has been shown to interact by looping to form this region into an active conformation hub

Fig.2The disease spectrum associated with genetic abnormalities in the SOX9 locus contains multiple endophenotypes determined by the nature and position of the cis-regulatory disruption. Haploinsufciency of SOX9 causes Campomelic Dysplasia, a syndrome encompassing skeletal dysplasia, sex determination abnormalities and Pierre Robin sequence. A variety of chromosomal abnormalities (translocations, deletions, duplications) upstream of SOX9 have uncovered specic endophenotypes of full-blown CD with correlations between the location of the lesions and the nature of the phenotype. In addition, a separate distinct phenotypic outcome, brachydactyly anonychia (Cooks syndrome) is caused by large duplications of the SOX9 regulatory domain. Increased cancer risk is associated with some SNP variants in the locus

sequence of developmental malformations resulting from a primary deciency in mandibular growth in early facial development. PRS is often a component of Campomelic Dysplasia, but also occurs as a relatively frequent isolated craniofacial anomaly (Amarillo etal. 2013; Fukami etal.

2012; Jamshidi etal. 2004). Detailed genetic analysis of a set of PRS patients carrying distal translocations or deletions led to identication of a locus for isolated PRS at ~1.21.5Mb upstream of SOX9 (Benko etal. 2009). Separate non-overlapping deletions were found in PRS patients in the centromeric region immediately beyond a translocation breakpoint cluster, while intriguingly a further patient carried a 36kb microdeletion 1.5Mb telomeric to SOX9. Sequencing of patient DNA of highly conserved elements within the centromeric deletions revealed a T>C change in a mandibular enhancer that disrupts an MSX1 binding site and affects enhancer activity (Benko etal. 2009). PRS can also be caused by mutation in another gene, SATB2 (MIM 608148), or by disruption of its long-range regulation. Interestingly, enhancers located beyond the gene-distal translocation breakpoints in some patients were shown to bind SOX9 and depend on integrity of the SOX9 binding site for functional activity (Rainger etal. 2014). Although they cannot strictly be classied as endophenotypes, some other disease phenotypes have also been associated with misregulation of SOX9. One of these is the earlier mentioned Cooks syndrome, characterised by

13

Hum Genet

missing middle phalanges, elongated terminal phalanges and absence of the nails (Kurth etal. 2009). Duplications overlapping in a 1.2Mb segment of the upstream SOX9 gene desert were found in affected members of four unrelated families with this phenotype, likely causing misexpression or overexpression of SOX9 at specic developmental time points, resulting in abnormal digit and nail development. The patients had no signs of campomelia, sex reversal or PRS. Upregulation of SOX9 expression has also been correlated with increased risk of prostate cancer (Thomsen etal. 2010; Wang etal. 2008). Association with a 130kb linkage disequilibrium block in the SOX9 gene desert was found in multiple GWAS studies of prostate cancer risk (Zhang etal. 2012). Two putative prostate-specic enhancers were found in this region, one of which, E1, located approximately 1Mb upstream, was shown to loop to the SOX9 promoter by 3C analysis. Two SNPs (rs8072254 and rs1859961) within this element were shown to alter transcription factor binding, with the prostate-cancer associated variants showing a signicantly increased binding afnity for the androgen receptor (AR) and creation of an AP-1 site, respectively (Zhang etal. 2012). The same LD block also showed up in a GWAS study of the spinal deformity Adolescent idiopathic scoliosis (AIS) (Miyake etal. 2013). Taking together the variety of endophenotypes associated with this region, the SOX9 locus provides a clear demonstration that the extended and modular nature of cis-regulatory landscapes of many developmental control genes, while essential for their pleiotropic role in development and homeostasis, also makes them vulnerable to deleterious genomic lesions.
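Both the 78 kb DSD interval and the 1.2 Mb Cooks syndrome segment were defined by intersecting CNVs carried by different patients. The minimal Python sketch below shows that shortest-region-of-overlap step. It assumes every CNV in the set contributes to the same phenotype and that all coordinates lie on one linear axis. The intervals in the example are invented for illustration. They were chosen so their overlap reproduces the 517–595 kb window quoted above, and they are not the published patient breakpoints.

```python
from typing import Optional

# A CNV as (start, end) on one linear axis, e.g. kb upstream of SOX9; start < end.
Interval = tuple[int, int]


def smallest_region_of_overlap(cnvs: list[Interval]) -> Optional[Interval]:
    """Return the segment shared by every CNV, or None if no common segment exists.

    The shared segment starts at the largest start coordinate and ends at the
    smallest end coordinate; if those cross, the CNVs have no region in common.
    """
    if not cnvs:
        return None
    start = max(s for s, _ in cnvs)
    end = min(e for _, e in cnvs)
    return (start, end) if start < end else None


# Hypothetical CNVs (kb upstream of SOX9), NOT real patient data.
example_cnvs = [(480, 610), (517, 700), (450, 595)]
sro = smallest_region_of_overlap(example_cnvs)
print(sro, sro[1] - sro[0])  # (517, 595) 78
```

A real mapping effort must also allow for uncertainty in breakpoint positions. It must also account for lesions acting through position effects beyond the core interval, as discussed for DSD above. The sketch ignores both.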

Limb and brain deformities caused by disruption of Sonic Hedgehog expression

The isolation of endophenotypes is also prominent in cases of regulatory disruption of the SHH gene (MIM 600725). The pleiotropic expression pattern of SHH encompasses the floor plate, notochord, areas of the brain, the epithelial linings of the digestive and respiratory tracts, and the limb bud. A large gene desert upstream of the gene harbours many putative cis-elements (Fig. 1c). These include multiple enhancers for expression in the brain and epithelial linings (Jeong et al. 2006; Sagai et al. 2009), but so far only a single enhancer for limb bud expression (Lettice et al. 2003). Haploinsufficiency of SHH is the predominant cause of holoprosencephaly (HPE3; MIM 142945), a malformation of the brain and craniofacial region (Belloni et al. 1996). Translocation breakpoints up to 275 kb from the gene have also been found in HPE (Roessler et al. 1997). In addition, a rare sequence variant was found to be a causative mutation in a patient with HPE (Jeong et al. 2008). It lies in SBE2, a highly conserved hypothalamus-specific enhancer located 460 kb upstream of SHH. The mutation disrupts a binding site for SIX3, a homeoprotein that has itself been implicated in HPE through mutations and translocations (Wallis et al. 1999). There was clear experimental evidence that the mutation strongly reduced binding of SIX3 to the element. Interestingly, the variant was nevertheless also present in the unaffected parent, suggesting that such mutations are not always fully penetrant (Jeong et al. 2008).

Structural abnormalities of the hands and feet are among the most conspicuous birth defects, and the involvement of cis-ruption mechanisms has been established in a number of human limb malformations. The Sonic Hedgehog protein is a crucial factor in the growth and patterning of the hands and feet. Unlike many other developmental regulators, SHH is not a transcription factor but a signalling molecule with a key role in the hedgehog signalling pathway. Its expression in the limb bud is tightly restricted to the zone of polarising activity (ZPA) in the posterior part of the limb bud. From there, a diffusion gradient sets up graded expression of downstream effectors. In particular, it controls the effective ratio of the activating and repressive forms of GLI3. The cis-regulatory element responsible for SHH expression in the ZPA was identified inside an intron of the adjacent LMBR1 gene (MIM 605522), nearly a megabase from the SHH gene itself (Lettice et al. 2002). As SHH activity in the posterior limb defines the ZPA, the element was named the ZPA regulatory sequence (ZRS). A number of distinct clinical limb malformations are associated with regulatory mutations of the ZRS, collectively referred to as ZRS-associated syndromes (Wieczorek et al. 2009).
These include preaxial polydactyly type II (PPD2; MIM 174500), triphalangeal thumb with polysyndactyly (TPTPS; MIM 174500), Haas type polysyndactyly (SD4; MIM 186200) and Werner mesomelic syndrome (WMS; MIM 188770) (Albuisson et al. 2010; Farooq et al. 2010; Furniss et al. 2008; Gurnett et al. 2007; Maas and Fallon 2005; Niedermaier et al. 2005; Wieczorek et al. 2009). PPD2 is characterised by a triphalangeal thumb, often accompanied by variably sized extra digits. It is associated with a large number of independent point mutations in the ZRS. ZRS mutations have also been discovered in limb mutants of a number of other species, including mice, chickens, dogs and cats (Lettice et al. 2003, 2012; Maas et al. 2011). This further underlines the causative link between point mutations in the ZRS and extra digit formation. The common effect of these basepair changes is ectopic expression of SHH in the anterior limb bud. This creates an extra ZPA and disrupts signalling gradients in the limb bud. The molecular effects of some of the mutations are starting to emerge. Some may disrupt specific transcription factor binding sites (TFBS). However, a number of the mutations were shown to create new binding sites for members of the ETS TF family (Lettice et al. 2012). These new sites add to a number of sites already present in the ZRS. The increased number of ETS sites is thought to help overcome inhibitory activities acting on the ZRS, thus leading to ectopic expression. A single nucleotide substitution at one particular position (nucleotide 404) in the ZRS produces a particularly strong phenotype, WMS. WMS is characterised by preaxial polydactyly with associated dwarfism as a result of tibial hypoplasia (Wieczorek et al. 2009).

TPTPS and SD4 can also arise through a different type of cis-ruption mechanism. Patients with these disorders carry intrachromosomal duplications of the region containing the ZRS but not SHH itself (Klopocki et al. 2008; Sun et al. 2008; Wieczorek et al. 2009). One or two additional ZRS copies in the gene's regulatory domain can easily be envisaged to upregulate SHH expression. However, it is less clear whether they also cause ectopic expression in the anterior limb bud, and how duplicated ZRS activity translates into the patients' phenotypes. The opposite mechanism, deletion of the ZRS region, leads to severe truncation of the limbs and the absence of feet. So far this has only been observed in mice engineered to lack the ZRS element (Lettice et al. 2002, 2003; Sagai et al. 2005). Nevertheless, the outcome resembles acheiropodia (MIM 200500), a rare recessive limb malformation. Surprisingly, acheiropodia is caused by a small deletion not of the ZRS but of an uncharacterised region about 30 kb further upstream (Ianakiev et al. 2001). Loop formation was shown to bridge the megabase distance between the ZRS and the SHH promoter. However, the same looping structure was also found in non-expressing tissues (Amano et al. 2009; Li et al. 2012).
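The proposal that a single ZRS substitution can add an ETS site is easy to make concrete. The Python sketch below counts matches to the ETS core consensus GGA(A/T) on both strands, before and after a point substitution. It is a toy with two stated assumptions. First, the sequence is made up rather than taken from the ZRS. Second, scanning only for the core ignores the flanking bases and the specific ETS family member, both of which influence real binding.

```python
import re

# Core ETS consensus GGA(A/T) on the forward strand and its reverse complement
# (A/T)TCC; zero-width lookaheads let overlapping matches be counted.
ETS_CORE_FORWARD = re.compile(r"(?=GGA[AT])")
ETS_CORE_REVERSE = re.compile(r"(?=[AT]TCC)")


def count_ets_cores(seq: str) -> int:
    """Count ETS core matches on both strands of a DNA sequence."""
    seq = seq.upper()
    return len(ETS_CORE_FORWARD.findall(seq)) + len(ETS_CORE_REVERSE.findall(seq))


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical enhancer fragment and variant (NOT the real ZRS sequence).
reference = "TTAGGACTTGC"
variant = substitute(reference, 6, "A")  # C>A turns GGAC into GGAA
print(reference, count_ets_cores(reference))  # TTAGGACTTGC 0
print(variant, count_ets_cores(variant))      # TTAGGAATTGC 1
```

A proper analysis would score sites with a position weight matrix rather than a 4 bp core. The core scan is used here only because it makes the gain-of-site logic visible.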
Remarkably, the ZRS itself is not required for loop formation. This suggests that other nearby sequences form a permissive locus configuration in which the ZRS functions in a tissue-specific manner (Amano et al. 2009). Finally, chromosomal rearrangements affect the genomic landscapes on both sides of the breakpoint and on both chromosomes involved. They may remove important regulatory regions from a gene's landscape. They can also bring new enhancers into a regulatory domain where their activity is undesired. In most currently known cases, the deleterious effects are due to loss of expression of the gene whose enhancers are translocated away. In some cases, however, the true cause of the disease can be the gain of an inappropriate enhancer by the gene on the other side of the breakpoint. This appeared to be the case in a child with features of a holoprosencephaly spectrum (HPES) disorder and severe upper limb syndactyly with lower limb synpolydactyly (Lettice et al. 2011). The diagnosis of HPES suggested involvement of the SHH gene on chromosome 7. Characterisation of the child's chromosomes indeed revealed an inversion of part of chromosome 7, with breakpoints in 7q22.1 and 7q36.3. The inversion moves SHH away from several of its regulatory elements, including some CNS enhancers, which explains the HPE phenotype. The inversion also separated SHH from its own limb enhancer (ZRS). However, the patient's phenotype did not fit entirely with what is known to result from loss of the ZRS (Lettice et al. 2002; Sagai et al. 2005). In the new genomic surroundings of SHH, the authors identified a novel limb enhancer, highly conserved non-coding element 2 (HCNE2). HCNE2 is located 190 kb 3′ of the breakpoint, within an intron of EMID2 (MIM 608927). The authors showed that it can drive ectopic SHH expression in the limb bud. They suggest that SHH adopted the HCNE2 element while losing its own ZRS enhancer. Together, these changes altered SHH limb expression and ultimately contributed to the limb malformation in this child (Lettice et al. 2011).

The alpha-globin locus, trailblazer for unconventional cis-ruption mechanisms

The study of the globin gene disorders has had a great impact on the elucidation of many aspects of the mechanisms of gene regulation. Collectively referred to as hemoglobinopathies, these blood disorders have a long history of detailed molecular investigation, aided by the relative accessibility of the relevant cell types. The haemoglobin molecule contains two alpha- and two beta-globin subunits. These are produced from the α- and β-globin gene clusters (MIM 141800, MIM 141900) on human chromosomes 16 and 11, respectively. Here, we focus on the α-globin locus (Fig. 1d). Alpha-thalassemia, a consequence of α-globin downregulation, provides a strong argument for the value of human genetics in discovering gene regulation mechanisms, given its large pool of subject material (Higgs 2013). The α-globin gene cluster contains three major globin genes, one embryonic (ζ) and two foetal/adult (α1 and α2). Four conserved regulatory elements are located upstream of the cluster (MCS-R1 to MCS-R4). Three of them are embedded within introns of the ubiquitously expressed gene NPRL3 (MIM 600928) (Kowalczyk et al. 2012). This is reminiscent of the intertwining of genes and cis-elements also seen for PAX6 and SHH. A long series of experiments has unravelled the regulatory events that occur during the progression from hematopoietic precursors to mature red blood cells (Higgs 2013). These have revealed a complex interplay between two processes. On one hand, transcription factors and polymerase are recruited to the cis-elements, followed by chromatin modifications and looping between the enhancers and promoters. On the other, repressive chromatin marks imposed by Polycomb complexes are removed (Garrick et al. 2008; Vernimmen et al. 2011). Along the way, these studies of the precise sequence of events in globin locus activation have unearthed a number of intriguing thalassemia-causing mechanisms that are of interest here.

The first is a spurious silencing mechanism that involves interference with local chromatin structure by an aberrant antisense transcript. This mechanism was discovered through analysis of a Polish α-thalassaemia family. Members of the family were shown to carry an 18 kb deletion of the 3′ end of the globin cluster, encompassing the HBA1 and HBQ1 genes but leaving the upstream control elements and the HBA2 gene intact (Tufarelli et al. 2003). The deletion removed HBA1, one of the two α-globin genes. Nevertheless, the phenotypic severity in affected individuals suggested that expression from the intact HBA2 gene on the deleted chromosome was also affected. The HBA2 gene and the upstream enhancers were fully intact. Even so, the HBA2 CpG island, normally unmethylated in blood cells, was densely methylated on the abnormal chromosome, and expression was silenced. Further analysis showed abnormal antisense RNA transcripts over the HBA2 gene, derived from the truncated neighbouring LUC7L gene (MIM 607782) on the opposite strand. It transpired that, in addition to HBA1, the 18 kb deletion had removed the final three exons of the LUC7L gene, including its polyadenylation signal. This caused RNA polymerase to read through into the HBA2 promoter region (Tufarelli et al. 2003). This case shows that proper termination of transcription can be crucial not only for the transcribed gene itself, but also for its neighbours.
Remarkably, few other examples of this mechanism have since come to light. However, a similar mechanism has been proposed for silencing of the MSH2 gene (MIM 609309) in a case of Lynch syndrome (MIM 120435) (Ligtenberg et al. 2009). A further intriguing disease mechanism disrupts the normal interactions between promoters and enhancers. Here, a newly created promoter misdirects enhancer activity (De Gobbi et al. 2006). This mechanism was uncovered in a group of α-thalassemia patients from Melanesia. Initial DNA analysis had failed to find mutations or deletions in the globin coding regions or regulatory elements. The co-dominant inheritance pattern suggested a gain-of-function defect that would negatively affect expression of the α-globin genes. Patient resequencing identified a large number of potentially causative SNPs. RNA expression profiling across the locus was then performed in normal and patient erythroblasts. This identified a major new peak of transcription in the patient sample, between the regulatory elements and the globin genes. The peak overlay one of the novel variants uniquely linked with the patient phenotype. The peaks at the globin gene promoters were concomitantly reduced. The patient variant was shown to create a new GATA factor binding site. Chromatin immunoprecipitation confirmed binding of GATA-1, a key regulator of erythroid cell differentiation. It also confirmed binding of GATA-1's common partners SCL, E2A, LMO2 and Ldb-1. Thus, this single basepair mutation appears to have created a new (but unproductive) promoter, located between the upstream regulatory elements and their cognate promoters. This new promoter competes for the attention of the enhancers and interferes with the normal long-range regulatory interactions in the locus. The result is insufficient transcription of the α-globin genes (De Gobbi et al. 2006). It is remarkable that a single nucleotide change can create a new functional element that rivals the existing legitimate promoters. Variants creating a new TF binding site are not uncommon, but all presently validated cases occur in already existing cis-elements. The GATA factor recognition sequence is relatively simple. Sequence changes creating spurious binding sites for this factor would therefore be expected to occur frequently throughout the genome. The assumption therefore has to be that the mutation occurred in a "loaded" sequence. Experiments to investigate this hypothesis have so far not been reported.

Competition for enhancer interaction is also revealed by another study of the alpha-globin region. Interaction experiments using 4C (circular chromosome conformation capture) revealed long-range looping of the globin enhancers with the non-globin bystander genes NME4 (MIM 601818) and FAM173A. These genes are located 300 kb and 625 kb away, respectively (Hughes et al. 2013; Lower et al. 2009). This shows that the globin enhancers are not strictly specific for the globin genes, even though they ignore most of the other promoters in the locus.
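The argument that spurious GATA sites should be common rests on simple arithmetic. The Python sketch below makes two assumptions that do not come from the studies cited here. It uses the commonly cited GATA consensus WGATAR (W = A or T, R = A or G). It also models the genome as random sequence with equal base frequencies, although real genomes are neither uniform nor random. Under these assumptions, the chance that a given position matches on one strand is ½ × (¼)⁴ × ½ = 1/1024.

```python
import re
from fractions import Fraction

# IUPAC nucleotide codes used here and the bases each one allows.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "R": "AG", "Y": "CT", "S": "CG", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTWRYSN", "TGCAWYRSN")


def match_probability(motif: str) -> Fraction:
    """Chance that one position of a uniform random sequence matches motif on one strand."""
    p = Fraction(1)
    for code in motif:
        p *= Fraction(len(IUPAC[code]), 4)
    return p


def reverse_complement(motif: str) -> str:
    """Reverse complement of an IUPAC-coded motif."""
    return motif.translate(COMPLEMENT)[::-1]


def expected_spacing_bp(motif: str) -> float:
    """Mean distance between matches on either strand.

    Assumes the motif is not its own reverse complement (true for WGATAR),
    so the two strands contribute independent matches.
    """
    return float(1 / (2 * match_probability(motif)))


def count_sites(seq: str, motif: str) -> int:
    """Count (possibly overlapping) matches to an IUPAC motif on both strands."""
    def to_regex(m: str) -> re.Pattern:
        return re.compile("(?=" + "".join(f"[{IUPAC[c]}]" for c in m) + ")")
    seq = seq.upper()
    return (len(to_regex(motif).findall(seq))
            + len(to_regex(reverse_complement(motif)).findall(seq)))


print(match_probability("WGATAR"))    # 1/1024
print(expected_spacing_bp("WGATAR"))  # 512.0
```

Under this idealised model, one match is expected roughly every 512 bp on either strand. A genome of about 3.1 Gb would therefore carry on the order of six million WGATAR matches by chance. This is why the effect of the new site must depend on something beyond the motif itself.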
Analysis of patient cells bearing deletions of the α-globin genes showed strong upregulation of NME4. Increased NME4 expression is not known to have any phenotypic consequence. Conversely, however, the finding suggests that disruption of the NME4 promoter could alter long-range interactions in the locus. This could have the spurious but deleterious side-effect of influencing α-globin output (Lower et al. 2009).

The HOXD cluster and the deterministic nature of regulatory landscapes

According to their original definition, enhancers are control sequences that can act on a target promoter linked in cis, independent of position or orientation. Similarly, promoters may be responsive to many different input signals from local and remote enhancers. Despite the generality of these principles, the overall architecture of genes and regulatory elements in a genomic locus can strongly shape its regulation and create unique regulatory situations. The HOX genes are a group of transcription factors with crucial roles in patterning of the animal body plan. In nearly all bilaterian animals, the genes are arranged in multigene clusters, and this genomic architecture is an important factor in the control of their expression. The regulatory landscape plays a critical role in determining the coordinated transcriptional output of the genes in the cluster. This role has been studied extensively for the HoxD locus, one of four Hox clusters in mammals (Montavon and Duboule 2013). The HoxD cluster is involved in patterning of the body axis in the lumbosacral region, as well as in limb development and the correct formation and patterning of the digits. The cluster contains nine HoxD genes, and the adjacent Mtx2, Evx2 and lunapark (Lnp) genes (MIM 608555, 142991, 610236) lie on either side of it. Large gene deserts of 600 and 780 kb flank this whole group (Fig. 1e; Spitz et al. 2003). An important characteristic of Hox clusters is collinearity: the spatiotemporal expression of the genes corresponds to their order along the chromosome. A number of features conspire to achieve the collinear activation of the genes. These include local regulatory activities within the cluster and remote enhancers that provide global regulatory signals acting across the whole locus (Montavon and Duboule 2013). In addition, the locus is subject to epigenetic regulation through the activities of Polycomb and Trithorax group proteins. In ES cells, the locus is marked as a bivalent domain, carrying both active and repressive chromatin signatures, which suggests that the genes are silent but poised for activation.
During development, a progressive loss of the repressive mark travels across the locus in a sliding window that coincides with sequential activation of parts of the locus (Soshnikova and Duboule 2009). Global long-range activating signals thus appear to be integrated with epigenetic modification states, which determine how responsive individual genes in the locus are. Together they generate the unique collinear transcriptional output in embryonic time and space (Andrey et al. 2013). Consideration of how such a unique and complex regulatory landscape may have evolved led to a hypothesis. The ancestral role of the cluster was the specification of the main body axis, and parts of the cluster were later (that is, more recently in evolution) co-opted to function in limb development (Spitz et al. 2001). Accordingly, the regulatory controls for the ancestral expression are located within the HoxD cluster. The younger expression domains, by contrast, depend on enhancers at remote positions outside the cluster (Spitz and Duboule 2008). During limb development, an early phase of expression of the 3′ genes of the cluster is essential for patterning of the arm and forearm. This phase depends on enhancer sequences located telomeric to the cluster. A second phase of expression is controlled by sequences lying centromeric to the cluster. In this phase, Hoxd10–13 and the neighbouring genes Evx2 and Lnp (Fig. 1e) are co-expressed in the distal part of the limbs. Their expression profiles are very similar but depend on each gene's position in the cluster (Spitz et al. 2003). This suggested that digit expression of all these genes may be controlled by a shared enhancer or control region. Initially, a region about 200 kb upstream of the locus (Fig. 1e) was identified and shown to control expression in multiple tissues. This region was termed the global control region (GCR). It is proposed to create a widespread regulatory landscape, sharing its enhancing activity over a defined number of genes in a tissue-specific manner (Spitz et al. 2003). Relative proximity to the enhancer region nevertheless plays a role in creating the subtle differences in expression between the HoxD paralogs. Digit activity of the GCR was shown to spread over Hoxd13 to Hoxd10 as well as the Lnp and Evx2 genes, while CNS activity was limited to Lnp and Evx2. Another enhancer element, Prox, located close to the cluster between the Lnp and Evx2 genes, was also shown to be involved. It was suggested that Prox could serve as a tethering element (Gonzalez et al. 2007). Subsequent study of the locus by the 3C technique revealed further cis-elements in the upstream gene desert, in addition to the GCR and Prox. These elements interact with each other and with the HoxD promoters. Because they appear as narrow islands of enhancer-associated histone marks in this gene desert, they were collectively termed a regulatory archipelago (Montavon et al. 2011). Occasionally, regulatory disruptions produce unexpected and counter-intuitive phenotypes, and the HoxD locus provides an example of this.
A number of human limb malformation syndromes involve chromosomal abnormalities near the HOXD cluster. One patient had mesomelic dysplasia (shortening of the forearms and lower legs) with vertebral defects. This patient was found to carry a balanced translocation breakpoint 56 kb telomeric of HOXD1, near MTX2 (Spitz et al. 2002). Other cases include deletions (2q31 microdeletion syndrome; Mitter et al. 2010), inversions (Spitz et al. 2003), translocations (Dlugaszewska et al. 2006) and duplications (Cho et al. 2010; Kantaputra et al. 2010). In the absence of relevant patient tissue, mouse models were engineered to investigate the effects of duplications and deletions on the locus. A duplication model gave an intriguing result, in which a loss rather than a gain of expression was unexpectedly observed. The duplicated genomic segment contained a subset of enhancers. Yet the resulting phenotype was similar to that normally seen with a partial deletion of the gene desert (Montavon et al. 2012). It was shown that the duplication disrupts the topology of the locus, so that the distal-most subset of enhancers can no longer contribute to gene regulation. The study also showed that the various enhancers have specific, non-redundant roles. The elements on the duplicated segment could not compensate for the lost contribution of the most distal elements. A number of human patients with chromosomal duplications and mesomelic dysplasia or syndactyly have been described (Ghoumid et al. 2011; Kantaputra et al. 2010; Mitter et al. 2010). It is unclear whether the same mechanism is responsible in these cases.

Long-range gene regulation and common disease

The recent proliferation of genome-wide association studies (GWAS) and whole exome sequencing projects has greatly advanced the identification of candidate genomic regions. These regions are involved in a variety of traits and in susceptibility to common diseases (Hindorff et al. 2009; Manolio et al. 2009). Identification of SNP variants in disease-associated genomic regions is expected to accelerate further in the near future. This will come through widespread adoption of whole-genome sequencing in multiple patients and normal controls, to extract the overall mutational profile associated with disease risk (Gonzaga-Jauregui et al. 2012). The cis-ruption disease cases in the paradigm loci have underpinned a now well-established point. Causative disease or disease risk variants are found not only in the coding regions of the genome. They also occur in intergenic regions that may regulate the transcriptional output of adjacent genes. Indeed, it has become clear that the majority of GWAS markers lie in non-coding regions of the genome (Maurano et al. 2012). The major challenge for these strategies therefore lies in interpreting the resulting data. In particular, the true pathogenic mutations must be distinguished from the wealth of co-segregating benign variants. This is especially problematic for suspected regulatory variants, where the effect of a mutation cannot easily be predicted from the sequence alone.
A number of important steps are therefore required following identification of a region of association: (1) identification of the actual functional sequences within the disease-associated region; (2) evaluation of the functional activity of those sequences; (3) determination of the effect of the mutation on that activity; (4) evaluation of the effect of the mutation on the overall transcriptional output from the region; and (5) identification of the relevant target gene and its involvement in the disease. The success of the GWAS strategy is evident from the rapidly growing list of DNA variants associated with common traits and diseases (Hindorff et al. 2009). This list is already far too large to discuss every case individually here.

However, owing to the difficulty and effort required for the follow-up steps, only a limited number of studies have so far gone beyond the identification of associated genomic regions to pinpoint the actual causal variants, the regulatory elements in which they reside and the mechanistic basis of their effect. Efforts are being made to optimise strategies for the functional characterisation of these elements and their allelic variants (Bhatia in preparation), and new high-throughput methods such as STARR-seq, the massively parallel reporter assay (MPRA) and the hydrodynamic tail vein assay are being developed (see Table 1) (Arnold et al. 2013; Kheradpour et al. 2013; Kim and Ahituv 2013). Below, we highlight a number of salient examples of trait-associated SNP variants that have been subjected to further validation.
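As an illustration of how reporter-assay readouts of this kind are often summarised, the sketch below computes a per-allele activity score as the median, across barcodes, of log2 normalised RNA counts over normalised DNA counts, and reports the difference between alleles. The barcode counts, the pseudocount and the function names are hypothetical, and published MPRA analyses use more elaborate statistical models; this is only a minimal sketch of the underlying ratio.

```python
import math
from statistics import median

def allele_activity(rna_counts, dna_counts, rna_total, dna_total, pseudo=1.0):
    """Median across barcodes of log2(normalised RNA / normalised DNA).
    Counts are scaled to counts-per-million of their respective library."""
    scores = []
    for rna, dna in zip(rna_counts, dna_counts):
        rna_cpm = (rna + pseudo) / rna_total * 1e6
        dna_cpm = (dna + pseudo) / dna_total * 1e6
        scores.append(math.log2(rna_cpm / dna_cpm))
    return median(scores)

def allelic_skew(ref, alt, rna_total, dna_total):
    """Activity of the alternative allele minus that of the reference allele;
    each allele is given as (rna_counts, dna_counts) for its barcodes."""
    return (allele_activity(*alt, rna_total, dna_total)
            - allele_activity(*ref, rna_total, dna_total))

# Hypothetical barcode counts for one element tested with each allele
ref = ([120, 95, 110], [100, 90, 105])
alt = ([480, 400, 450], [100, 95, 110])
skew = allelic_skew(ref, alt, rna_total=1_000_000, dna_total=1_000_000)
print(f"log2 allelic skew (alt - ref): {skew:.2f}")
```

A positive skew would suggest that the alternative allele drives stronger reporter expression in the assayed cells; because such reporter assays test elements outside their chromosomal context, this does not by itself establish an effect at the endogenous locus.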

A SNP in the c-MYC genomic region associated with cancer risk
Like developmental malformations, cancer is characterised by disruption of the normal processes of transcriptional control. The advent of GWAS in recent years has greatly contributed to the identification of many single nucleotide polymorphisms (SNPs) linked to cancer risk. The 8q24 chromosomal region around the c-MYC proto-oncogene (MIM 190080) is strongly associated with colorectal, prostate, breast, ovarian and bladder cancers. All cancer-associated SNPs in the region fall within the gene-proximal 600kb of a 1.5Mb gene desert upstream of c-MYC. Intriguingly, no SNP associations have so far been found in the even larger gene-free region downstream of the gene. Upregulation of c-MYC activity is a frequent factor in cancer progression. Recent work suggests that MYC acts as a general amplifier of cellular transcription programs by modulating the release of paused RNA polymerase II (Pol II) (Lin et al. 2012). The discovery of distal chromosomal translocations near MYC in some cancers had already shown that misregulation of an intact MYC gene can cause tumourigenesis (Cesarman et al. 1987; Cory et al. 1985; Henglein et al. 1989). Among a number of SNPs associated with colon cancer, follow-up studies have focussed on one particular SNP, rs6983267, which falls within a TCF/LEF1 binding site in a conserved element located 335kb upstream of c-MYC (Pomerantz et al. 2009; Tuupanen et al. 2009; Wright et al. 2010). Increased binding of TCF4 (also known as TCF7L2) to the risk allele was shown to correlate with increased c-MYC expression from the cis-linked allele. Intriguingly, deletion of the enhancer containing rs6983267 in a mouse model only modestly affected Myc expression in the intestinal crypts, but conferred marked resistance to intestinal tumourigenesis (Sur et al. 2012). Looping of the element to the MYC promoter was detected in several studies (Pomerantz et al. 2009; Wright et al. 2010), but the looping frequency did not differ between the low- and high-risk alleles (Wright et al. 2010). This suggests that the elevated cancer risk results from increased enhancer activity rather than from reorganisation of the chromatin loop. Many other putative cis-elements are present in the c-MYC gene desert, and other SNPs in the region are associated with the risk of other cancers, such as prostate and breast cancer (Ahmadiyeh et al. 2010; Sotelo et al. 2010; Wasserman et al. 2010).
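Allele-specific effects such as the one described for rs6983267 are commonly assessed in heterozygous samples by counting reads that carry each allele of a transcribed marker SNP in cis with the variant. The sketch below applies an exact two-sided binomial test against the 50:50 expectation. The read counts are hypothetical, and the test ignores complications such as reference-mapping bias and overdispersion, so it is a minimal illustration rather than a complete analysis.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def allelic_imbalance_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one."""
    n = ref_reads + alt_reads
    observed = binom_pmf(ref_reads, n, p)
    total = sum(binom_pmf(k, n, p) for k in range(n + 1)
                if binom_pmf(k, n, p) <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts from one heterozygous sample: reads from the
# risk-allele haplotype versus the non-risk haplotype
risk_reads, nonrisk_reads = 70, 30
fraction = risk_reads / (risk_reads + nonrisk_reads)
p_value = allelic_imbalance_p(risk_reads, nonrisk_reads)
print(f"fraction from risk haplotype: {fraction:.2f}, two-sided p = {p_value:.2e}")
```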

Human eye colour, OCA2 and interference with loop formation
DNA polymorphisms in regulatory elements can have a number of molecular consequences. One mechanism through which a non-coding variant can exert a phenotypic effect is modulation of chromatin folding interactions. This was nicely demonstrated by the characterisation of a SNP variant affecting the common trait of blue eye colour. Although human eye colour is essentially a quantitative trait involving around 16 different genes, the OCA2 locus on chromosome 15 has been identified as the major contributor to colour variation (Sturm and Larsson 2009; White and Rabago-Smith 2010). One particular SNP in this region, rs12913832, was shown to have a strong statistical association with human pigmentation, including blue versus brown eye colour (Eiberg et al. 2008; Sturm et al. 2008). rs12913832 is located within a highly conserved sequence in intron 86 of the non-pigment-related gene HERC2 (MIM 605837), 21kb upstream of the known pigmentation gene OCA2 (MIM 611409) [oculocutaneous albinism type II (Brilliant 2001)]. The brown eye colour allele of rs12913832 is highly conserved across many animal species, and the SNP allele underlying blue eye colour in humans is thought to derive from a common founder mutation at this position (Eiberg et al. 2008). The element harbouring the SNP was shown to act as an enhancer with allele-dependent activity, and reduced recruitment of the transcription factors HLTF, LEF1 and MITF was observed for the light-pigment allele. Using human melanocyte cell lines derived from a light-pigmented donor homozygous for one allele of rs12913832 and a dark-pigmented donor homozygous for the other, the effect of the variant on the chromatin conformation of the locus could be assessed (Visser et al. 2012). Distal enhancers are thought to interact with their target genes by directly contacting the gene promoters via chromatin loops. Chromosome conformation capture (3C), designed to assess such loops by detecting relative interaction frequencies between distal sites, showed a marked decrease in interaction frequency between the element and the OCA2 promoter in the light-pigmented cell line. This suggests that the rs12913832 variant alters the ability of the cis-element containing the SNP to form long-range looping interactions with the OCA2 promoter, resulting in lower OCA2 expression and affecting melanosomal function (Visser et al. 2012).

SNP variants implicated in common disease
GWAS are a powerful method for detecting associations between traits and genomic regions harbouring variants that predispose to common diseases. One such study found association of a cluster of SNPs located 600kb from the IRS1 gene (MIM 147545) with type 2 diabetes (T2D) (Kilpelainen et al. 2011). Looping studies using ChIA-PET (Chromatin Interaction Analysis with Paired-End Tag sequencing), a modified high-throughput variant of the 3C technique, strengthened the link by showing a direct interaction between the SNP cluster and the IRS1 promoter. This suggests the presence of enhancers in the region and implies that the role of these variants in T2D is mediated by their effect on IRS1 transcription (Li et al. 2012). Another example linking a clinical phenotype to a non-coding DNA variant is the association of the human 1p13 locus with plasma low-density lipoprotein cholesterol (LDL-C) levels and susceptibility to myocardial infarction (MI). A common non-coding polymorphism at this locus, rs12740374, creates a C/EBP (CCAAT/enhancer binding protein) transcription factor binding site on its minor allele and affects the expression of several genes in the region. Although it is not the nearest gene, SORT1 (MIM 602458) showed the largest expression change in the liver. SORT1 is involved in the control of plasma LDL-C and very low-density lipoprotein (VLDL) particle levels by modulating hepatic VLDL secretion, and modulation of this pathway is therefore likely to affect the risk of MI in humans (Musunuru et al. 2010). Dyslexia and language impairment are common learning disabilities with a substantial genetic component, most frequently associated with the DYX2 (susceptibility to dyslexia, type 2; MIM 600202) genomic region on 6p22. A 2.5kb microdeletion in this locus, in an intron of the DCDC2 gene (MIM 605755), removes a putative functional element, READ1, and was found to constitute a risk allele for dyslexia (Powers et al. 2013). The deleted region contains a highly polymorphic short tandem repeat sequence with a binding site for ETV6. As multimerisation of ETV6 molecules is thought to be important for its activity, the number of repeats in the element may change its functional strength. In addition, interaction with other risk haplotypes of the region is involved, and it is still unclear whether the READ1 element regulates DCDC2 itself or the adjacent KIAA0319, both of which are involved in neuronal migration (Powers et al. 2013).
Some variants may not cause disease directly but instead influence disease susceptibility by modulating the cellular response to environmental signals. SNPs in a linkage disequilibrium (LD) region in the previously mentioned gene desert upstream of SOX9 are reported to influence the effect of smoking on lung function (Hancock et al. 2013). Another example concerns DNA variants on chromosome 9p21 associated with coronary artery disease (CAD) and T2D (Harismendy et al. 2011). CAD and T2D risk alleles fall into adjacent but separate LD blocks in a gene desert with a high density of predicted enhancers. The most consistently CAD-associated SNP, rs10757278, lies within a conserved fragment where the risk allele disrupts a STAT1 binding site. STAT1 is involved in the inflammatory response, and its binding to the site in the non-risk haplotype is induced by IFN-γ treatment. This leads to induction of the non-coding transcript ANRIL (MIM 613149) and repression of CDKN2B (MIM 600431) (Yap et al. 2010). Using a 3C approach, the region containing the STAT1 binding site was shown to make long-range contacts with a number of genes and transcripts in the region, and these contacts are remodelled upon IFN-γ treatment. In cells homozygous for the risk allele, however, IFN-γ-induced STAT1 binding does not occur, which likely abrogates the remodelling of the chromatin structure of the locus in response to inflammation (Harismendy et al. 2011).
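GWAS signals such as those discussed above are usually summarised as an allelic odds ratio comparing risk-allele counts in cases and controls. The sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table of allele counts. The counts are invented for illustration and do not come from any of the cited studies.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for the risk allele in cases versus controls, with a Woolf
    (log-scale) confidence interval. All inputs are allele counts."""
    if min(case_risk, case_other, ctrl_risk, ctrl_other) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical allele counts (two alleles per individual)
or_, lo, hi = allelic_odds_ratio(case_risk=1100, case_other=900,
                                 ctrl_risk=1000, ctrl_other=1000)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```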

Conspicuous by absence
Sometimes it can be informative to consider not only the cases and mechanisms we do observe, but also those that have not (yet) been found. While the list of validated mutations in long-range enhancers is steadily growing, mutations in other types of functional element are so far conspicuously absent. Considering the importance of chromosomal conformation and genome topology in gene regulation (de Laat and Duboule 2013; Dekker et al. 2013) and the critical role of CTCF binding sites, insulators and/or loop formation elements (Phillips-Cremins and Corces 2013; Phillips and Corces 2009), it is surprising that few disruptions of such sites have been reported in genetic disease so far. A number of recent studies have shown that the genome is organised into large genomic domains that restrict the effective reach of regulatory activity to separate domains (Dixon et al. 2012; Lieberman-Aiden et al. 2009; Nora et al. 2012; Phillips-Cremins et al. 2013; Wendt et al. 2008). While it is currently poorly understood what characterises the sequences that make up domain boundaries, mutations that disrupt these boundaries would be expected to have a strong effect on the regulated transcription of the genes on either side of the boundary. The effect of such mutations could therefore be comparable to that of chromosomal rearrangements. It may be too early to draw conclusions from the absence of such mutations in the current literature, but it could indicate that domain boundaries are not simply defined by a discrete underlying sequence fragment. Despite our ignorance of the factors necessary for boundary function, the multifunctional zinc-finger protein CTCF is often implicated in insulator and boundary functions (Phillips-Cremins and Corces 2013; Phillips and Corces 2009). In the locus of the tumour suppressor p16INK4a (MIM 600160), a CTCF-dependent boundary element located 2kb upstream of the gene blocks the spread of heterochromatin and keeps the promoter in an active chromatin state. In cancer cells, the boundary element becomes methylated, CTCF binding and boundary function are lost, and repressive heterochromatin concomitantly encroaches over the locus (Witcher and Emerson 2009). The opposite effect has been proposed as a potential causative factor in the still enigmatic mechanism of facioscapulohumeral muscular dystrophy (FSHD). FSHD (MIM 158900), one of the most frequent myopathies, is linked to contraction of a variable array of D4Z4 repeats near the 4q35 telomere, from between 11 and 100 units in the normal population to fewer than 11 in patients, when this occurs on a specific genetic background. De-repression of genes in the region through an epigenetic mechanism is the likely cause of the condition. In the multi-repeat form, the D4Z4 units are methylated, which prevents CTCF binding. Reduction of the repeat number below a threshold (<11) allows CTCF binding and the creation of a lamina-dependent insulator that blocks heterochromatin spreading from the telomere, leading to de-repression of nearby genes (Ottaviani et al. 2009, 2011).
However, several alternative hypotheses exist, including a switch from a Polycomb-mediated repressed state of the region in healthy subjects to Trithorax recruitment to the shortened repeat, coordinated by expression of a non-coding RNA (Cabianca et al. 2012). Other studies focus on the activation in skeletal muscle of an ORF within the final repeat, encoding the germline transcription factor DUX4, which depends on the capture of a polymorphic polyadenylation site in permissive genetic backgrounds (Richards et al. 2011; van der Maarel et al. 2011, 2012).

From genotype to phenotype
Our knowledge of the rules underlying the cis-regulatory logic built into both genomic landscapes and individual enhancers is still very much in its infancy. Studies of cis-elements in model species have shown that the precise details of enhancer architecture can be of great importance.

Most enhancers contain a multitude of TF recognition sites and act as binding platforms for sequence-specific transcription factors. Factors bound at the element can then recruit further transcription factors along with associated coactivator proteins, such as the Mediator complex, chromatin remodellers and histone-modifying enzymes (Spitz and Furlong 2012). Relatively large protein complexes can thus be assembled, forming what are sometimes called enhanceosomes (Bazett-Jones et al. 1994). In a slightly different model, the billboard or information display model, TFs assemble at the enhancer in a more flexible and haphazard manner (Arnosti and Kulkarni 2005). The latter model may better provide robustness to small mutations in the element, while the former fits better with an all-or-nothing effect of mutations on enhancer output. The mechanistic action of enhancers, however, is still poorly understood. Little is known about the degree of cooperative and coordinated action between bound TFs that is required to deliver the appropriate functional output to the transcriptional machinery. The requirements for the correct number, composition, order and spacing of binding sites in an enhancer are also poorly understood. In keeping with the theme of this review, it is likely that some common principles apply but that the optimal arrangement is unique to each enhancer. In particular, what we refer to as the optimal composition for a particular enhancer is not necessarily the one that produces the strongest signal. Transcription factor binding sites are commonly represented by a consensus sequence that reflects the highest-affinity recognition sequence, but most transcription factors can bind to a range of degenerate sequences. Genome-wide ChIP profiling has demonstrated widespread TF binding at non-consensus sites, underscoring the significance of lower-affinity binding sites.
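The distinction between consensus and degenerate binding sites is often quantified with a position weight matrix (PWM), in which each position contributes a log-odds score for each base. The sketch below scores a consensus site and two single-base variants against a small hypothetical matrix; it does not represent the real Prep1, C/EBP or any other motif, and treating a lower score as lower binding affinity is an approximation assumed here rather than a measured relationship.

```python
import math

# Hypothetical 4-position motif: base frequencies per position (not a real TF motif)
FREQS = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # uniform background base frequency

# Convert frequencies to log2-odds weights against the background
PWM = [{b: math.log2(f / BACKGROUND) for b, f in pos.items()} for pos in FREQS]

def pwm_score(site):
    """Sum of per-position log-odds scores; higher means closer to the consensus."""
    if len(site) != len(PWM):
        raise ValueError("site length must match motif length")
    return sum(PWM[i][base] for i, base in enumerate(site.upper()))

for site in ("AGCT", "AGAT", "TGCT"):  # consensus, then two degenerate variants
    print(site, round(pwm_score(site), 2))
```

Under these assumptions, a SNP that lowers the score of a site mimics the weakening of a binding site, whereas a SNP that raises the score of a degenerate sequence would be a candidate for creating a new or stronger site.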
The use of lower-affinity binding sites provides a mechanism for responding to different TF concentrations and combinations (Davidson and Levine 2008). An example is provided by the regulation of Pax6 expression in the lens, which is controlled by multiple enhancers. One of these, the upstream ectodermal enhancer (EE), contains two lower-affinity binding sites for the Prep1 transcription factor. The lower affinity of these sites is important for the correct timing of enhancer output: it prevents precociously high enhancer activity while still generating a low level of Pax6 expression at this stage. Deliberate mutation to higher-affinity binding sites led to inappropriately high expression at an early developmental stage (Rowan et al. 2010). Synergism between the two sites nevertheless ensures that a sufficiently high level of Pax6 transcription is achieved at later stages of lens development. This example highlights how the precise cis-regulatory logic of enhancers can provide delicate calibration of expression output for genes that are spatio-temporally regulated and dosage-sensitive (Rowan et al. 2010). The details of enhancer architecture can therefore be of great importance to its output, and mutation or variation of crucial sites in the element can have severe effects. It can be anticipated that detailed analysis of disease- and trait-associated SNPs and indels will become instrumental in revealing the underlying principles of enhancer design. For the moment, however, the variety of phenotypic outcomes is bewildering and often unpredictable. The detrimental effect, seen in some disease cases, of a mutation as subtle as a single nucleotide change in a remote control element stands in stark contrast to the apparent absence of phenotypic effects upon deletion of some highly conserved enhancers, or even of large intergenic regions, in targeted mouse experiments (Ahituv et al. 2007; Nobrega et al. 2004). These contrasting observations are puzzling, but may indicate that the developmental process is highly robust to fluctuations in expression levels for some genes and conditions but not for others. The hardwiring of gene expression programs into gene regulatory networks (GRNs) connected through a multitude of distal enhancers imparts such robustness on at least three levels:
(1) Robustness at the enhancer level: most enhancers are fragments, typically a few hundred base pairs long, that contain a conglomeration of recognition sites for homotypic and heterotypic binding factors. Binding to the element may occur in an additive or synergistic mode, and the consequences of cooperative binding between TFs may range from robust buffering of the output signal to all-or-none scenarios in which the absence of a key factor leads to complete loss of enhancer activity (Bhatia et al. 2013).
(2) Robustness at the regulatory landscape level: an important observation from enhancer mapping studies at a number of paradigm gene loci (e.g. PAX6, HOXD, SHH) has been that the tissue specificity of enhancers often encompasses multiple tissues and that genes are often controlled by multiple enhancers with overlapping tissue-specific activity (Jeong et al. 2006; Kleinjan et al. 2006; McBride et al. 2012; Montavon and Duboule 2012; Montavon et al. 2011; Uchikawa et al. 2003). These observations are supported by analysis of genome-wide ChIP-seq data of chromatin marks for linked enhancer-promoter units (EPUs). The median enhancer-to-promoter ratio per EPU in the mouse genome was reported to be between 5 and 6, consistent with the idea that multiple enhancers can regulate a single gene (Ernst et al. 2011; Shen et al. 2012). In Drosophila, most enhancers appear to have a more singular tissue specificity, but elements with overlapping activities have also been found. These were named shadow enhancers (Hong et al. 2008), and experiments have shown that they can indeed provide robustness to the expression levels of their target gene under sub-optimal conditions (Frankel et al. 2010; Perry et al. 2010). The Drosophila gene shavenbaby is a master control gene for the formation of hair-like trichomes in flies and contains four separate enhancers with overlapping activity patterns. Deletion of two of the enhancers has no effect under normal circumstances, but at extreme temperatures or on a sensitised genetic background a significant loss of trichomes ensues (Frankel et al. 2010). Similarly, deletion of one of two functionally overlapping enhancers of the snail gene only showed an effect at high temperatures or with reduced levels of its activating factor Dorsal (Perry et al. 2010), though a more recent report suggests that the shadow enhancer may not be entirely dispensable (Dunipace et al. 2011). In line with the latter observation, a recent report showing that deletion of a distal enhancer of ATOH7 (MIM 609875) causes the congenital eye disorder nonsyndromic congenital retinal nonattachment (NCRNA; MIM 221900) also demonstrated that the presence of apparently redundant shadow enhancers does not always protect against malformation, even without any obvious environmental stress (Ghiasvand et al. 2011). In addition to providing robustness, complex arrays of enhancers acting on individual genes provide a mechanism for fine-tuning subtle aspects of gene expression (levels, timing precision) in specific developmental processes (Perry et al. 2011).
(3) Robustness at the level of the developmental process: even though organismal development is an extremely complex process requiring great precision in cell fate determination and morphogenesis, it is remarkably resilient to perturbations caused by environmental influences and by mutations in the genetic components involved.
The hardwiring of genetic programs for developmental processes into gene regulatory networks ensures that fluctuations in the expression levels of individual genes are buffered through a multitude of auto- and cross-regulatory connections (Davidson 2010; Davidson and Levine 2008). The complex connectivity of these networks, which provides, among other features, modularity, redundancy, feedback loops and threshold-based switch-like interactions, ensures consistency in the progression of developmental processes even when some components of the network are disrupted (Garfield et al. 2013).
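The buffering effect of partially redundant (shadow) enhancers can be illustrated with a deliberately simple toy model: each enhancer adds a fixed contribution to expression, environmental stress scales all contributions down, and a normal phenotype requires total expression to reach a threshold. All numbers, the linear additivity and the hard threshold are assumptions chosen for illustration; they are not estimates from the shavenbaby or snail studies.

```python
def expression(enhancers, stress_factor=1.0):
    """Toy model: total output is the sum of enhancer contributions,
    uniformly scaled by an environmental factor (1.0 = optimal conditions)."""
    return stress_factor * sum(enhancers.values())

def phenotype_normal(enhancers, stress_factor=1.0, threshold=1.0):
    """Assumes a hard threshold: normal development if output >= threshold."""
    return expression(enhancers, stress_factor) >= threshold

# Hypothetical contributions of a primary enhancer and two shadow enhancers
full_set = {"primary": 0.8, "shadow_1": 0.5, "shadow_2": 0.5}
shadow_deleted = {k: v for k, v in full_set.items() if k != "shadow_1"}

for label, enh in (("all enhancers", full_set), ("one shadow deleted", shadow_deleted)):
    for stress in (1.0, 0.7):
        print(f"{label:20s} stress={stress}: normal={phenotype_normal(enh, stress)}")
```

In this toy setting the deletion is silent under optimal conditions and only becomes visible under stress; if the remaining contributions summed to less than the threshold even at `stress_factor=1.0`, the deletion would be harmful without any environmental challenge.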

Regulatory variation, SNPs and incomplete penetrance
A major problem associated with NGS analysis is the large number of sequence variants that differ from the reference genome in any genomic region. This applies even to exome sequencing projects, but is greatly exacerbated in whole-genome sequencing (WGS) strategies. To narrow down the list of potential causative variants, one would normally check public databases of common variation (dbSNP, 1000 Genomes, HapMap) and exclude any variants present in those databases. This, however, assumes that any disease-causing variant will be very rare and fully penetrant. While this seems reasonable (though by no means absolute) for exome variants, the assumption is far too restrictive when assessing regulatory variation. Regulatory variants can affect gene function in several ways that are influenced by various modifier alleles, resulting in variable expressivity of the mutant allele and reduced penetrance of its effect. Rare or even common SNPs therefore cannot be discounted too easily, and further evidence will be required for their inclusion or exclusion as causative variants. Dosage effects, differential allelic expression and the influence of modifier alleles all have a role to play (Cooper et al. 2013). A number of clear examples of incomplete penetrance of strongly disease-associated enhancer mutations have been uncovered in recent years. The SIX3 binding site mutation in the SHH enhancer SBE2, causally linked with disease in an HPE patient but not in his father, who also carries it (Jeong et al. 2008), and the disruption of an AP2 binding site in an IRF6 (MIM 607199) enhancer, associated with a markedly increased risk of cleft lip and palate (Rahimov et al. 2008), are but two examples. Variants in enhancer elements can themselves act as strong modifier alleles (Dimas et al. 2008, 2009), as shown by the influence of an enhancer of the RET gene (MIM 164761) on the clinical penetrance of Hirschsprung disease (HSCR; MIM 142623) (Emison et al. 2005). The RET MCS+9.7 enhancer, located in intron 1 of the RET gene, contains a common SNP (rs2435357 C>T) that disrupts a SOX10 binding site, leading to reduced RET transcription and a more than fourfold increase in susceptibility to HSCR in carriers of RET coding sequence mutations (Emison et al. 2010).
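The filtering problem described above can be made concrete with a small sketch. Instead of discarding every variant present in a population database, it retains common non-coding variants that fall in an annotated regulatory region and flags them as needing functional evidence. The variant records, the 1% frequency cut-off and the `in_regulatory_region` annotation are hypothetical; in practice frequencies would come from resources such as dbSNP or 1000 Genomes and regulatory annotations from enhancer maps.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    vid: str
    pop_freq: float             # population allele frequency (0 if absent from databases)
    coding: bool                # True if the variant lies in a coding exon
    in_regulatory_region: bool  # hypothetical annotation, e.g. overlap with a putative enhancer

def triage(variants, rare_cutoff=0.01):
    """Split variants into 'keep' and 'discard'. Rare variants are always kept;
    common non-coding variants are kept if they overlap a regulatory annotation,
    because reduced penetrance means they cannot be excluded on frequency alone."""
    keep, discard = [], []
    for v in variants:
        if v.pop_freq < rare_cutoff:
            keep.append((v.vid, "rare"))
        elif not v.coding and v.in_regulatory_region:
            keep.append((v.vid, "common, regulatory: needs functional evidence"))
        else:
            discard.append(v.vid)
    return keep, discard

# Hypothetical variant list
variants = [
    Variant("var1", 0.0, True, False),
    Variant("var2", 0.25, False, True),
    Variant("var3", 0.30, True, False),
    Variant("var4", 0.15, False, False),
]
keep, discard = triage(variants)
print("keep:", keep)
print("discard:", discard)
```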
Interestingly, while the tyrosine kinase RET is the major HSCR gene, accounting for >80% of all known mutations, SOX10 (MIM 602229) is another of the genes implicated in the disease, highlighting how disease-associated genes are connected through regulatory networks. Although it is no doubt an immensely difficult task, the availability of affected and unaffected individuals carrying a disease-associated regulatory variant may represent an opportunity to tease out other factors or modifier alleles in the regulatory network. The task is made even more difficult by the possibility that some common diseases may result from the combined effect of multiple enhancer variants in the same linkage disequilibrium (LD) block. A recent analysis of GWAS of six common autoimmune disorders indicated that the associations arose from multiple polymorphisms in LD. The variants mapped to clusters of enhancer elements that are active in the same cell type, supporting a multiple cooperative enhancer variant hypothesis for common traits (Corradin et al. 2013).
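Because SNPs in the same LD block are co-inherited, the strength of the association between a lead SNP and its neighbours is usually expressed as D' and r². The sketch below computes D, a signed D' and r² for two biallelic SNPs from phased haplotype counts using the standard formulas; the haplotype counts are hypothetical, and both SNPs are assumed to be polymorphic in the sample.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Linkage disequilibrium between SNP1 (alleles A/a) and SNP2 (alleles B/b)
    from phased haplotype counts. Returns (D, signed D', r_squared)."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at SNP1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at SNP2
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0
    r_squared = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r_squared

# Hypothetical haplotype counts for a lead SNP and a nearby candidate SNP
D, d_prime, r2 = ld_stats(n_AB=450, n_Ab=50, n_aB=30, n_ab=470)
print(f"D = {D:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

Variants with high r² to a lead SNP are statistically difficult to separate from it, which is why functional annotation is needed to choose among them.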

Lessons learned from paradigm loci
The detailed studies of genetic disease cases associated with the paradigm loci described above, as well as of a number of other complex regulatory regions, have provided several instructive lessons. The first, now well accepted, is that genetic disease and disease susceptibility can be caused by disruption of the regulatory as well as the coding parts of genes. The regulatory parts of genes may stretch over megabase-sized distances and incorporate a variety of cis-regulatory elements (Kleinjan and van Heyningen 2005). More important at this stage is the notion that cis-ruption effects can come about in a number of ways. The most obvious is the loss of the transactivating or repressing activity of an enhancer when it is deleted or afflicted by a TF binding site mutation. This is the most likely effect of the mutations in the PAX6 and SOX9 long-range elements, and it can involve both activating and repressing activities. Such a loss of transactivating activity can affect both assembly of the preinitiation complex (PIC) and the switch from the initiating to the elongating form of the Pol II complex. However, while the initial focus for point mutations or indels in enhancers is understandably on the putative loss of TF binding at the mutated site, the formation of a new binding site or the strengthening of a weak site should always be considered (Rowan et al. 2010; Zhang et al. 2012). Another way in which enhancer function can be impaired is through interference with the looping interaction between enhancer and promoter. Direct physical interaction between distal enhancers and gene promoters is considered an important event in transcriptional activation, and high-throughput conformation capture studies are uncovering the widespread prevalence of such interactions.
Disruption of these interactions is a cis-ruption mechanism that seems to be involved in determining blue eye colour, where a variant site in an enhancer affects its interaction frequency with the OCA2 promoter (Visser et al. 2012). The importance of long-range loops in enhancer-driven transcription was shown in an elegant experiment at the beta-globin locus. The transcription factor Gata1 and its associated co-factor Ldb1 were known to be required for loop formation between the LCR and the globin promoters, which is necessary for beta-globin expression. In Gata1 knock-out cells, Ldb1 is no longer found at the globin promoters, although it is still present at the distal LCR. Using artificial zinc fingers to tether the self-association domain of Ldb1 to the promoter restored loop formation between the promoter and the LCR, with concomitant robust induction of globin gene transcription (Deng et al. 2012). Apart from disruption of the appropriate looping interactions through abrogation of the binding capacity of looping factors at the required sites, interference with normal long-range interactions through the formation of inappropriate loops, by the inadvertent creation of novel interaction sites, presents the other side of the coin for this type of disease mechanism, as highlighted in α-thalassemia (De Gobbi et al. 2006). The observation that duplications in a distal part of the HoxD cluster can affect expression through disruption of the lay-out of the locus, rather than through specific loss or gain of the intrinsic activity of the deleted or duplicated fragment itself, shows that interference with the topology of the regulatory landscape of genes is another pathway to disease. Occurrences of the latter mechanism are hard to prove, especially as larger rearrangements nearly always include some putative cis-regulatory elements. However, support for this mechanism is provided by studies of the mouse Fgf8 locus, which contains a number of enhancers interspersed with adjacent, unrelated genes in a relatively gene-dense region. These enhancers appear to act in a concerted manner, but their combined potential activity is filtered and polarised differently at different positions within the locus, such that the promoters of the various target and bystander genes sense only a subset of the total enhancer activity available in the locus (Marinic et al. 2013). Rearrangement of locus structure can thus lead to sensing of inappropriate enhancer signals by both target and non-target promoters. Indeed, this could explain the currently unresolved mechanism of SHFM3 (MIM 246560), a limb malformation characterised by hypoplasia or aplasia of the central digital rays and variable fusion of the remaining digits, which has been linked to submicroscopic duplications of the human FGF8 (MIM 600483) region (de Mollerat et al. 2003; Dimitrov et al. 2009).
Adoption of inappropriate enhancers through disruption of locus structure may also be at play in Liebenberg syndrome (MIM 186550), an intriguing homeotic arm-to-leg transformation associated with structural changes upstream of the PITX1 gene (MIM 602149) (Spielmann et al. 2012). In two separate families with deletions (of 107 and 129kb) starting 269kb upstream of PITX1, a strong limb enhancer is brought into closer proximity to the gene, while in a family with a t(5;18) translocation with a breakpoint 224kb upstream of the gene, two new limb enhancers were relocated into the PITX1 domain. Misexpression of a Pitx1 cDNA driven by one of these enhancers in single-copy transgenic mice led to a forelimb-to-hindlimb transformation, recapitulating the Liebenberg phenotype (Spielmann et al. 2012). The plethora of ways to disrupt gene expression complicates the assessment of the potential causative genes and mechanisms in newly identified potential cis-ruption cases. In the case of chromosomal abnormalities in particular, multiple factors need to be considered, as the rearrangement inescapably changes multiple genomic landscapes in a dramatic way. On each side of each breakpoint, the endogenous genomic region is removed and replaced with a new and unfamiliar sequence. The effects of both loss and gain at each of the ends need to be assessed, and complicated combinations of enhancer losses and gains can be encountered, as shown in the case of SHH limb enhancer adoption (Lettice et al. 2011).
The detailed studies of the described regulatory loci have revealed many principles of transcriptional regulation and of the mechanisms by which disruptions in the regulatory environment of these loci can lead to disease. Knowledge of these principles can now be applied to the enormous amount of data emerging from GWAS and high-throughput patient re-sequencing studies. It has long been recognised that genome variability has a great influence on differential disease risk among individuals. A major difficulty lies in identifying the specific variants that are truly relevant to the disease of interest and in interpreting their effect on the biology of the disease and its phenotypic appearance. Despite the impressive success of GWAS in linking genomic regions to traits and diseases, these latter issues often remain a bottleneck to further progress. As many additional SNPs are often in linkage disequilibrium with the lead SNP, it is by no means certain that the lead SNP is the actual causative variant, leaving a large number of SNPs requiring further assessment. The obvious, but not necessarily valid, assumption would be that the causative variant affects TF binding at a regulatory region, suggesting that SNPs for further analysis can be selected in an informed way by prioritising those that coincide with putative regulatory elements. In support of this approach, some studies have shown a significant overlap between cancer risk variants and regions bearing enhancer hallmarks (Akhtar-Zaidi et al. 2012; Teng et al. 2011).
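At its simplest, this kind of prioritisation is an interval-overlap query: given candidate SNP positions in an LD block and a set of putative enhancer intervals (for example, derived from chromatin marks), report which SNPs fall inside an element. The coordinates and element names below are hypothetical; the intervals are assumed to lie on one chromosome, to be half-open and not to overlap one another. Genome-scale analyses would normally use a dedicated tool such as BEDTools instead.

```python
from bisect import bisect_right

def build_index(intervals):
    """Sort half-open [start, end) intervals by start; assumes no overlaps."""
    intervals = sorted(intervals, key=lambda iv: iv[0])
    starts = [iv[0] for iv in intervals]
    return starts, intervals

def containing_element(pos, index):
    """Return the name of the interval containing pos, or None."""
    starts, intervals = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0:
        start, end, name = intervals[i]
        if start <= pos < end:
            return name
    return None

# Hypothetical putative enhancers (start, end, name) and candidate SNPs in LD
enhancers = [(1_000, 1_800, "enh_A"), (5_200, 6_000, "enh_B"), (9_000, 9_400, "enh_C")]
ld_snps = {"snp1": 950, "snp2": 1_500, "snp3": 5_999, "snp4": 7_000}

index = build_index(enhancers)
hits = {snp: containing_element(pos, index) for snp, pos in ld_snps.items()}
prioritised = {snp: elem for snp, elem in hits.items() if elem is not None}
print(prioritised)
```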
Equally, a GWAS signal may suggest the involvement of a non-coding sequence in an association region containing many genes, without any of these standing out as a likely culprit. Cis-regulatory elements can act over large genomic distances, in some cases up to at least 1.5 Mb. Even for associations with a narrow region of linkage disequilibrium, genes within a wide genomic region therefore need to be considered, and more than one gene may be affected by the variant. One solution to this problem is to combine GWAS association data with expression quantitative trait locus (eQTL) analysis. Transcript abundance can be treated as a quantitative trait that is directly modified by polymorphism in regulatory elements (Cookson et al. 2009). Expression analysis on relevant tissues from subjects and controls can in some cases point to a particular gene in the region. However, as experience from the paradigm loci has shown, multiple genes can be affected by a cis-regulatory variant, sometimes separated by large distances (e.g. globin and NME4; lunapark and HoxD), and the presence of these bystander genes can complicate matters further (Lower et al. 2009; Spitz et al. 2003). Nevertheless, this approach has the advantage that it can link putative regulatory SNPs to their likely candidate genes and reveal previously unknown connections between diseases and genes. The method has been successfully applied in a number of complex disease cases, e.g. ORMDL3 (MIM 610075) and asthma (Moffatt et al. 2007), and PTGER4 (MIM 601586) and Crohn's disease (Libioulle et al. 2007), among others.
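At its simplest, treating transcript abundance as a quantitative trait means regressing a gene's expression level on the genotype at a candidate SNP. This minimal sketch assumes an additive model, in which each copy of the alternate allele shifts expression by a constant amount `b`. Real eQTL studies additionally adjust for covariates, test for significance and correct for the many SNP-gene pairs examined, none of which is shown here. The function name and the data are hypothetical.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Ordinary least-squares fit of expression ~ a + b * dosage.

    dosages:    alternate-allele counts (0, 1 or 2) per individual
    expression: transcript abundance for the same individuals
    Returns (b, a): the per-allele effect and the intercept.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need paired data for at least two individuals")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        # every individual has the same genotype: no effect is estimable
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    b = sxy / sxx
    return b, my - b * mx


# Hypothetical data: six individuals, two of each genotype class.
b, a = eqtl_slope([0, 0, 1, 1, 2, 2], [10.0, 10.0, 12.0, 12.0, 14.0, 14.0])
print(b, a)
```

Running the same fit for every gene in a wide window around the association signal mirrors the point made above: several genes, including distant bystanders, may show an expression effect for a single regulatory variant.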

Cis-regulation and evolution
Genetic disease, complex disease susceptibility and evolutionary changes in traits can be viewed as different wavelengths on the spectrum of cis-regulatory effects (Fig. 3). The notion that cis-regulatory variation is an important factor in evolution has been around for some time, especially because it allows spatio-temporally specific modification of developmental networks without affecting gene function in other tissues (Fig. 4) (King and Wilson 1975; Wray 2007). A genetic variant causing a developmental abnormality may constitute a pathological defect in the lifetime of an individual. Under different circumstances and on the much larger time-scale of evolution, however, the same variant may turn out to be a morphological novelty. For instance, one could imagine that a triphalangeal thumb may one day confer a selective advantage, such that a form of the ZRS currently considered mutant becomes the prevailing, wild-type form. The evolutionary significance of cis-regulatory mutations is reviewed in detail elsewhere (Wittkopp and Kalay 2011; Wray 2007), but is nicely illustrated by a number of human-specific adaptations.

Two studies have looked at conserved elements that are fast-evolving in the human genome and show that many of these can act as developmental enhancers (Capra et al. 2013; Prabhakar et al. 2006). Gain-of-function changes in one such element, HACNS1, located near CENTG2 (MIM 608651) and GBX2 (MIM 601135), have been linked to the appearance of an opposable thumb in primates (Prabhakar et al. 2008). Other human-accelerated conserved non-coding sequences (HACNS) are located near AUTS2, a gene involved in neurodevelopment. The region carries the signature of a selective sweep in modern humans relative to Neanderthals, yet various lesions in the region are linked to autism spectrum disorders (Oksenberg et al. 2013). Another study found 510 sequence fragments that are highly conserved between chimpanzees and other mammals but are absent from the human genome (McLean et al. 2011). All but one of these were located in non-coding regions, and they appeared to be enriched near genes involved in steroid hormone signalling and neural function. Two candidate enhancers with a potential role in human evolutionary divergence were analysed further. Intriguingly, the absence of an enhancer from the human androgen receptor (AR) gene locus correlates with the anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. The loss of a forebrain enhancer near the tumour suppressor gene GADD45G (MIM 604949) may have played some role in enabling the expansion of specific brain regions in humans (McLean et al. 2011). Using differences in DNase hypersensitive sites between equivalent cell types from human, chimpanzee and macaque, another study mapped species-specific gains of active enhancers and showed that these could result from single nucleotide changes in TF binding sites (Shibata et al. 2012).

When conditions provide a strong selective advantage, a variant can spread rapidly through populations. This is exemplified by the gain of lactase persistence, the ability to digest milk into adulthood. This trait is caused by regulatory mutations that have spread rapidly in some human populations over the last 10,000 years, since the dawn of agriculture. Multiple independent variants that drive the continued expression of lactase after weaning are found in regulatory regions upstream of the LCT gene (MIM 603202). These variants map to distinct haplotypes in different human populations, showing that strong selective pressure resulting from animal domestication and adult milk consumption has led to convergent evolution of this trait (Enattah et al. 2002, 2007; Tishkoff et al. 2007). These examples of the loss and gain of traits in the human lineage demonstrate that different types of primary event (gain-of-function mutations in a regulatory element, enhancer deletions) can be a mechanism of evolutionary change, as they are for disease-associated regulatory disruptions.

Fig. 3 Variants or mutations in cis-regulatory elements are at the centre of a spectrum of phenotypic consequences, ranging from genetic disease through disease predisposition to trait variability that can be acted upon by evolutionary forces
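The rapid spread of a favoured regulatory allele, as in the lactase persistence example, can be illustrated with the textbook deterministic model of genic selection. In this model, an allele at frequency p with relative fitness 1 + s changes to p(1 + s)/(1 + ps) in the next generation. This is a simplified sketch under stated assumptions: an infinite, randomly mating population, no dominance, drift or migration, and a constant selection coefficient. The starting frequency and `s` used below are hypothetical illustration values, not estimates for any LCT allele. Under this model the odds p/(1 - p) are multiplied by exactly 1 + s each generation, which gives a closed form to check the iteration against.

```python
import math


def next_freq(p, s):
    """One generation of genic selection: the favoured allele has
    relative fitness 1 + s, the alternative allele fitness 1."""
    return p * (1 + s) / (1 + p * s)


def generations_to_reach(p0, target, s, max_gen=1_000_000):
    """Iterate next_freq until the allele frequency reaches target.
    Returns None if max_gen generations are not enough."""
    if not (0 < p0 < 1 and 0 < target < 1 and s > 0):
        raise ValueError("expects 0 < p0, target < 1 and s > 0")
    p, gen = p0, 0
    while p < target:
        if gen >= max_gen:
            return None
        p = next_freq(p, s)
        gen += 1
    return gen


def closed_form(p0, target, s):
    """Odds p/(1-p) grow by a factor (1 + s) per generation."""
    def odds(p):
        return p / (1 - p)
    return math.ceil(math.log(odds(target) / odds(p0)) / math.log(1 + s))


# Hypothetical values: 1% starting frequency, 5% fitness advantage.
print(generations_to_reach(0.01, 0.5, 0.05))  # 95
print(closed_form(0.01, 0.5, 0.05))           # 95
```

Because the odds grow geometrically, even a modest constant advantage moves a rare allele to high frequency within some tens to hundreds of generations in this idealised model.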

Clinical implications and future perspectives
As demonstrated by the discoveries made in the studies on the paradigm loci, detailed investigation of cis-ruption disease cases can further our understanding of the events that cause the disease. It can also provide novel insight into the underlying cis-regulatory mechanisms, which can be extrapolated to general principles. Moreover, identification of a novel enhancer whose functional disruption is involved in disease in a set of patients can open the door to including that enhancer in future diagnostic and predictive genetic testing. This can be important for clinical insight and may guide the way the disease is treated. For instance, if a regulatory mutation causes the loss of gene expression in one specific tissue only, or interferes with maintenance of expression at post-natal stages, therapeutic intervention can be tailored to these sites. Cis-regulatory variation can also affect drug metabolism by influencing the levels of drug targets, transporters and processing enzymes (Kim et al. 2011). Future advances in regulatory pharmacogenomics may help to distinguish responders from non-responders, prevent adverse effects and optimise therapies for individual patients (Smith et al. 2012). The analysis of cis-regulatory disruptions in genetic and complex disease cases also holds the prospect of elucidating previously undefined regulatory networks. Investigation of disease-associated variants and increased knowledge about transcription factor binding conditions may result in improved resolution of network architectures (Davidson 2010). This should improve understanding of disease mechanisms and lead to new approaches for disease diagnosis and therapy (Hindorff et al. 2009).
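The tissue-restricted loss of expression mentioned above follows from the modular organisation of cis-regulatory elements. Each element drives a subset of the full expression pattern, and overlapping activity can buffer the loss of a single element. The following sketch is a deliberately simplified Boolean model, assuming that an element is either active in a tissue or not and ignoring dosage, timing and interactions between elements. The locus, element names and tissue sets are hypothetical. It computes which tissues lose expression when particular elements are disrupted.

```python
def expression_pattern(elements):
    """Union of the tissues in which any cis-element drives expression."""
    pattern = set()
    for tissues in elements.values():
        pattern |= tissues
    return pattern


def tissues_lost(elements, disrupted):
    """Tissues that lose expression when the named elements are disrupted:
    those covered only by disrupted elements, with no remaining overlap."""
    remaining = {name: t for name, t in elements.items()
                 if name not in disrupted}
    return expression_pattern(elements) - expression_pattern(remaining)


# Hypothetical locus: element names and tissue sets are illustrative only.
locus = {
    "enh_5prime": {"cell type A", "cell type C"},
    "enh_intronic": {"cell type C", "cell type D"},
    "enh_3prime": {"cell type B"},
}
print(sorted(tissues_lost(locus, {"enh_3prime"})))  # ['cell type B']
print(sorted(tissues_lost(locus, {"enh_5prime"})))  # ['cell type A']
```

In this toy locus, disrupting the 5′ element leaves cell type C unaffected because the intronic element also covers it, whereas the 3′ element is the sole source of expression in cell type B.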
A further understanding of the regulatory network circuitry, including not only the cis-regulatory sequences but also their cognate transcription factors, co-factors and chromatin remodelers, will also be of great benefit to the field of cellular reprogramming and regenerative medicine. Currently, the ability to assign a particular genetic condition unequivocally to a change in a regulatory region is often hampered by missing prior knowledge: first of the target gene of the cis-element, and second of a clear link between the target gene and the disease. The cis-ruption cases in the paradigm gene loci benefited from known gene-disease associations through intragenic mutation or deletion of the gene in other patients with the disease. If no such connection is available, the problem becomes both more interesting (discovery of a new gene-disease connection) and more difficult (as a link between the activity of the putative gene product and the disease needs to be established de novo). Because enhancers can affect just a subset of the expression pattern of the gene, regulatory mutations may cause a narrower disease phenotype that forms only part of a wider syndrome. This has been shown for the isolated endophenotypes seen in some patients with changes in the SOX9 regulatory domain. In such cases, particular enhancers can become part of the standard genetic testing repertoire for the endophenotype.

Fig. 4 Variants and mutations in tissue-specific enhancers can affect a limited subset of the full expression pattern of pleiotropic genes and thus reveal specific endophenotypes. Genes with spatio-temporally complex expression patterns are controlled by multiple cis-regulatory elements, spread at variable distances throughout their gene locus. Each individual element drives a subset of the full expression pattern, often with overlapping sites of activity shared between multiple elements. In combination, these elements form the input hubs that integrate various input signals (e.g. through binding of transcription factors TFA-TFD) and produce a gene expression output that provides a signal to the next level of the relevant gene regulatory networks (e.g. binding of the gene product at enhancers for downstream genes). Due to the modular nature of cis-elements, mutations or deletions in their sequences may affect gene activity in a tissue- and stage-specific manner. Functional variants in an individual cis-element would change expression of its target gene only in specific organs or tissues, leaving gene function in other tissues unaffected. In the example, a mutation in the 3′ element, a cell type B-specific enhancer, prevents binding of TFC, causing absence of expression and the disruption of downstream regulatory events

An important step forward in the discovery of the molecular causes of disease in the new age of whole genome sequencing will be the collection and data mining of all coding and non-coding variants in specific groups of patients. This will provide the opportunity to extract shared variants that may reveal links between common enhancers, target genes or biological pathways, thus supporting the role of these elements in a given biological process (McClellan and King 2010). Looking further ahead, it may soon become feasible to start systematically linking particular combinations of multiple variants with specific phenotypes. Further investigation of the three-dimensional organisation of the genome, and of the networks of chromatin interactions between promoters and distal elements in different cell types, will likely continue to paint a picture of gene regulatory interactions that is far more complex than imagined only a few years ago (Dekker et al. 2013; Zhang et al. 2013). 5C (chromosome conformation capture carbon copy) experiments are already revealing that many promoters and distal elements are each engaged in multiple long-range interactions, forming complex interaction networks. How many of these interactions occur simultaneously in the same cell is presently unresolved. Nevertheless, the data indicate that most interactions between promoters and enhancers are not exclusively one-to-one, and suggest that multiple genes and distal elements may come together in larger regulatory hubs (Sanyal et al. 2012). Changes in a single regulatory element could therefore affect all genes in the hub, potentially leading to complex disorders. Recent advances offer efficient genome editing and RNA-guided manipulation of endogenous gene expression at specific loci using the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system (Cong et al. 2013; Jinek et al. 2013; Maeder et al. 2013a; Mali et al. 2013a, b; Perez-Pinera et al. 2013a). Engineered transcription factors based on zinc finger or transcription activator-like (TAL) effector proteins offer a further route (Maeder et al. 2013b; Perez-Pinera et al. 2013b). Together, these tools hold great promise for therapeutic intervention in some regulatory diseases.
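The regulatory-hub idea (Sanyal et al. 2012) can be made concrete by treating promoter-element contacts, such as those detected by 5C, as edges of an undirected graph. This sketch assumes, as the text notes is still unresolved, that contacts linking the same nodes can co-occur. Under that assumption, it lists every promoter in the same connected component as a disrupted element. The result is therefore an upper bound on the genes a single element could influence, not a prediction. The node naming scheme ("enh:", "prom:") and all interactions are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(interactions):
    """Undirected adjacency sets from (node, node) contact pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def genes_in_hub(interactions, element):
    """Promoters reachable from `element` by breadth-first search.
    Nodes named 'prom:<GENE>' are treated as gene promoters."""
    graph = build_graph(interactions)
    seen = {element}
    queue = deque([element])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    prefix = "prom:"
    return sorted(n[len(prefix):] for n in seen if n.startswith(prefix))


# Hypothetical contact map: E1 contacts two promoters, one of which
# also contacts E2; E3 forms a separate one-to-one interaction.
interactions = [
    ("enh:E1", "prom:GENE1"),
    ("enh:E1", "prom:GENE2"),
    ("prom:GENE2", "enh:E2"),
    ("enh:E3", "prom:GENE3"),
]
print(genes_in_hub(interactions, "enh:E1"))  # ['GENE1', 'GENE2']
print(genes_in_hub(interactions, "enh:E3"))  # ['GENE3']
```

Restricting the search to direct neighbours of the element would give the opposite, lower-bound view. Which view is closer to reality depends on whether the interactions co-occur in the same cell.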

Conclusions
The fundamental biological processes that govern organismal development (such as developmental patterning, morphogenesis, organogenesis and cell differentiation) and subsequent homeostasis all depend on the elaborate regulatory networks that establish the complex and often highly specific patterns of gene expression required. Studies on selected gene loci over more than 25 years have provided great insight into the molecular mechanisms that underlie the strictly regulated expression of these genes. In recent years, new high-throughput genome-wide techniques have revealed the widespread prevalence of regulatory information encoded in the genome, showing that thousands of enhancers and other cis-regulatory elements are involved in orchestrating the overall expression landscape in each individual tissue. Disruptions in these regulatory parts of the genome can lead to genetic disease, as highlighted by studies on a number of paradigm gene loci. GWAS and other studies have revealed that a large proportion of sequence variants involved in common diseases and genetic traits reside in the non-coding parts of our genome. By applying the principles of cis-regulatory control learned from the paradigm loci to the information obtained by genome-wide studies, we can now start to build models of our regulatory genome in different contexts. The paradigm loci have shown us many of the general principles of gene regulation. Yet studies of associated diseases have also made clear that very similar types of mutation can have widely differing phenotypic outcomes, making each situation unique and the consequences of regulatory disruptions difficult to predict. Nevertheless, better understanding of the intricate mechanisms that control the expression of our genes will greatly benefit the management of genetic disease and the advent of personalised medicine. The study of cis-ruption mechanisms in human genetic disease will undoubtedly continue to reveal new insights into our regulatory genome.

References
Ahituv N, Zhu Y, Visel A, Holt A, Afzal V, Pennacchio LA, Rubin EM (2007) Deletion of ultraconserved elements yields viable mice. PLoS Biol 5:e234
Ahmadiyeh N, Pomerantz MM, Grisanzio C, Herman P, Jia L, Almendro V, He HH, Brown M, Liu XS, Davis M, Caswell JL, Beckwith CA, Hills A, Macconaill L, Coetzee GA, Regan MM, Freedman ML (2010) 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci USA 107:9742-9746
Ahn KJ, Passero F Jr, Crenshaw EB 3rd (2009) Otic mesenchyme expression of Cre recombinase directed by the inner ear enhancer of the Brn4/Pou3f4 gene. Genesis 47:137-141
Akhtar-Zaidi B, Cowper-Sal-lari R, Corradin O, Saiakhova A, Bartels CF, Balasubramanian D, Myeroff L, Lutterbaugh J, Jarrar A, Kalady MF, Willis J, Moore JH, Tesar PJ, Laframboise T, Markowitz S, Lupien M, Scacheri PC (2012) Epigenomic enhancer profiling defines a signature of colon cancer. Science 336:736-739
Albuisson J, Isidor B, Giraud M, Pichon O, Marsaud T, David A, Le Caignec C, Bezieau S (2010) Identification of two novel mutations in Shh long-range regulator associated with familial preaxial polydactyly. Clin Genet 79:371-377

Almind GJ, Brondum-Nielsen K, Bangsgaard R, Baekgaard P, Gronskov K (2009) 11p Microdeletion including WT1 but not PAX6, presenting with cataract, mental retardation, genital abnormalities and seizures: a case report. Mol Cytogenet 2:6
Amano T, Sagai T, Tanabe H, Mizushina Y, Nakazawa H, Shiroishi T (2009) Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev Cell 16:47-57
Amarillo IE, Dipple KM, Quintero-Rivera F (2013) Familial microdeletion of 17q24.3 upstream of SOX9 is associated with isolated Pierre Robin sequence due to position effect. Am J Med Genet A 161A:1167-1172
Andrey G, Montavon T, Mascrez B, Gonzalez F, Noordermeer D, Leleu M, Trono D, Spitz F, Duboule D (2013) A switch between topological domains underlies HoxD genes collinearity in mouse limbs. Science 340:1234167
Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A (2013) Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339:1074-1077
Arnosti DN, Kulkarni MM (2005) Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J Cell Biochem 94:890-898
Bagheri-Fam S, Barrionuevo F, Dohrmann U, Gunther T, Schule R, Kemler R, Mallo M, Kanzler B, Scherer G (2006) Long-range upstream and downstream enhancers control distinct subsets of the complex spatiotemporal Sox9 expression pattern. Dev Biol 291:382-397
Balemans W, Patel N, Ebeling M, Van Hul E, Wuyts W, Lacza C, Dioszegi M, Dikkers FG, Hildering P, Willems PJ, Verheij JB, Lindpaintner K, Vickery B, Foernzler D, Van Hul W (2002) Identification of a 52 kb deletion downstream of the SOST gene in patients with van Buchem disease. J Med Genet 39:91-97
Bazett-Jones DP, Leblanc B, Herfort M, Moss T (1994) Short-range DNA looping by the Xenopus HMG-box transcription factor, xUBF. Science 264:1134-1137
Belloni E, Muenke M, Roessler E, Traverso G, Siegel-Bartelt J, Frumkin A, Mitchell HF, Donis-Keller H, Helms C, Hing AV, Heng HH, Koop B, Martindale D, Rommens JM, Tsui LC, Scherer SW (1996) Identification of Sonic hedgehog as a candidate gene responsible for holoprosencephaly. Nat Genet 14:353-356
Bhatia S, Bengani H, Fish M, Brown A, Divizia MT, de Marco R, Damante G, Grainger R, van Heyningen V, Kleinjan DA (2013) Disruption of auto-regulatory feedback by a mutation in a remote, ultra-conserved PAX6 enhancer causes aniridia. Am J Hum Genet 93:1126-1134
Benito-Sanz S, Royo JL, Barroso E, Paumard-Hernandez B, Barreda-Bonis AC, Liu P, Gracia R, Lupski JR, Campos-Barros A, Gomez-Skarmeta JL, Heath KE (2012) Identification of the first recurrent PAR1 deletion in Leri-Weill dyschondrosteosis and idiopathic short stature reveals the presence of a novel SHOX enhancer. J Med Genet 49:442-450
Benko S, Fantes JA, Amiel J, Kleinjan DJ, Thomas S, Ramsay J, Jamshidi N, Essa A, Heaney S, Gordon CT, McBride D, Golzio C, Fisher M, Perry P, Abadie V, Ayuso C, Holder-Espinasse M, Kilpatrick N, Lees MM, Picard A, Temple IK, Thomas P, Vazquez MP, Vekemans M, Roest Crollius H, Hastie ND, Munnich A, Etchevers HC, Pelet A, Farlie PG, Fitzpatrick DR, Lyonnet S (2009) Highly conserved non-coding elements on either side of SOX9 associated with Pierre Robin sequence. Nat Genet 41:359-364
Benko S, Gordon CT, Mallet D, Sreenivasan R, Thauvin-Robinet C, Brendehaug A, Thomas S, Bruland O, David M, Nicolino M, Labalme A, Sanlaville D, Callier P, Malan V, Huet F, Molven A, Dijoud F, Munnich A, Faivre L, Amiel J, Harley V, Houge G, Morel Y, Lyonnet S (2011) Disruption of a long distance regulatory region upstream of SOX9 in isolated disorders of sex development. J Med Genet 48:825-830
Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74
Birnbaum RY, Clowney EJ, Agamy O, Kim MJ, Zhao J, Yamanaka T, Pappalardo Z, Clarke SL, Wenger AM, Nguyen L, Gurrieri F, Everman DB, Schwartz CE, Birk OS, Bejerano G, Lomvardas S, Ahituv N (2012) Coding exons function as tissue-specific enhancers of nearby genes. Genome Res 22:1059-1068
Brilliant MH (2001) The mouse p (pink-eyed dilution) and human P genes, oculocutaneous albinism type 2 (OCA2), and melanosomal pH. Pigment Cell Res 14:86-93
Brown KK, Reiss JA, Crow K, Ferguson HL, Kelly C, Fritzsch B, Morton CC (2009) Deletion of an enhancer near DLX5 and DLX6 in a family with hearing loss, craniofacial defects, and an inv(7)(q21.3q35). Hum Genet 127:19-31
Buecker C, Wysocka J (2012) Enhancers as information integration hubs in development: lessons from genomics. Trends Genet 28:276-284
Cabianca DS, Casa V, Bodega B, Xynos A, Ginelli E, Tanaka Y, Gabellini D (2012) A long ncRNA links copy number variation to a polycomb/trithorax epigenetic switch in FSHD muscular dystrophy. Cell 149:819-831
Capra JA, Erwin GD, McKinsey G, Rubenstein JL, Pollard KS (2013) Many human accelerated regions are developmental enhancers. Philos Trans R Soc Lond B Biol Sci 368:20130025
Cesarman E, Dalla-Favera R, Bentley D, Groudine M (1987) Mutations in the first exon are associated with altered transcription of c-myc in Burkitt lymphoma. Science 238:1272-1275
Cho TJ, Kim OH, Choi IH, Nishimura G, Superti-Furga A, Kim KS, Lee YJ, Park WY (2010) A dominant mesomelic dysplasia associated with a 1.0-Mb microduplication of HOXD gene cluster at 2q31.1. J Med Genet 47:638-639
Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, Zhang F (2013) Multiplex genome engineering using CRISPR/Cas systems. Science 339:819-823
Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (2009) Mapping complex disease traits with global gene expression. Nat Rev Genet 10:184-194
Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H (2013) Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum Genet 132:1077-1130
Corradin O, Saiakhova A, Akhtar-Zaidi B, Myeroff L, Willis J, Cowper-Sal-Lari R, Lupien M, Markowitz S, Scacheri PC (2013) Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res 24:1-13
Cory S, Graham M, Webb E, Corcoran L, Adams JM (1985) Variant (6;15) translocations in murine plasmacytomas involve a chromosome 15 locus at least 72 kb from the c-myc oncogene. EMBO J 4:675-681
Cox JJ, Willatt L, Homfray T, Woods CG (2011) A SOX9 duplication and familial 46,XX developmental testicular disorder. N Engl J Med 364:91-93
Dathe K, Kjaer KW, Brehm A, Meinecke P, Nurnberg P, Neto JC, Brunoni D, Tommerup N, Ott CE, Klopocki E, Seemann P, Mundlos S (2009) Duplications involving a conserved regulatory element downstream of BMP2 are associated with brachydactyly type A2. Am J Hum Genet 84:483-492
Davidson EH (2010) Emerging properties of animal gene regulatory networks. Nature 468:911-920

Davidson EH, Levine MS (2008) Properties of developmental gene regulatory networks. Proc Natl Acad Sci USA 105:20063-20066
De Gobbi M, Viprakasit V, Hughes JR, Fisher C, Buckle VJ, Ayyub H, Gibbons RJ, Vernimmen D, Yoshinaga Y, de Jong P, Cheng JF, Rubin EM, Wood WG, Bowden D, Higgs DR (2006) A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter. Science 312:1215-1217
de Kok YJ, Vossenaar ER, Cremers CW, Dahl N, Laporte J, Hu LJ, Lacombe D, Fischel-Ghodsian N, Friedman RA, Parnes LS, Thorpe P, Bitner-Glindzicz M, Pander HJ, Heilbronner H, Graveline J, den Dunnen JT, Brunner HG, Ropers HH, Cremers FP (1996) Identification of a hot spot for microdeletions in patients with X-linked deafness type 3 (DFN3) 900 kb proximal to the DFN3 gene POU3F4. Hum Mol Genet 5:1229-1235
de Laat W, Duboule D (2013) Topology of mammalian developmental enhancers and their regulatory landscapes. Nature 502:499-506
de Mollerat XJ, Gurrieri F, Morgan CT, Sangiorgi E, Everman DB, Gaspari P, Amiel J, Bamshad MJ, Lyle R, Blouin JL, Allanson JE, Le Marec B, Wilson M, Braverman NE, Radhakrishna U, Delozier-Blanchet C, Abbott A, Elghouzzi V, Antonarakis S, Stevenson RE, Munnich A, Neri G, Schwartz CE (2003) A genomic rearrangement resulting in a tandem duplication is associated with split hand-split foot malformation 3 (SHFM3) at 10q24. Hum Mol Genet 12:1959-1971
de Wit E, de Laat W (2012) A decade of 3C technologies: insights into nuclear organization. Genes Dev 26:11-24
Dekker J, Marti-Renom MA, Mirny LA (2013) Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet 14:390-403
Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, Dean A, Blobel GA (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149:1233-1244
D'Haene B, Attanasio C, Beysen D, Dostie J, Lemire E, Bouchard P, Field M, Jones K, Lorenz B, Menten B, Buysse K, Pattyn F, Friedli M, Ucla C, Rossier C, Wyss C, Speleman F, De Paepe A, Dekker J, Antonarakis SE, De Baere E (2009) Disease-causing 7.4 kb cis-regulatory deletion disrupting conserved non-coding sequences and their interaction with the FOXL2 promotor: implications for mutation screening. PLoS Genet 5:e1000522
Dimas AS, Stranger BE, Beazley C, Finn RD, Ingle CE, Forrest MS, Ritchie ME, Deloukas P, Tavare S, Dermitzakis ET (2008) Modifier effects between regulatory and protein-coding variation. PLoS Genet 4:e1000244
Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET, Antonarakis SE (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325:1246-1250
Dimitrov BI, de Ravel T, Van Driessche J, de Die-Smulders C, Toutain A, Vermeesch JR, Fryns JP, Devriendt K, Debeer P (2009) Distal limb deficiencies, micrognathia syndrome, and syndromic forms of split hand foot malformation (SHFM) are caused by chromosome 10q genomic rearrangements. J Med Genet 47:103-111
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485:376-380
Dlugaszewska B, Silahtaroglu A, Menzel C, Kubart S, Cohen M, Mundlos S, Tumer Z, Kjaer K, Friedrich U, Ropers HH, Tommerup N, Neitzel H, Kalscheuer VM (2006) Breakpoints around the HOXD cluster result in various limb malformations. J Med Genet 43:111-118


Doolittle WF (2013) Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA 110:5294-5300
Dunipace L, Ozdemir A, Stathopoulos A (2011) Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression. Development 138:4075-4084
Eiberg H, Troelsen J, Nielsen M, Mikkelsen A, Mengel-From J, Kjaer KW, Hansen L (2008) Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Hum Genet 123:177-187
Elgar G, Sandford R, Aparicio S, Macrae A, Venkatesh B, Brenner S (1996) Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet 12:145-150
Emison ES, McCallion AS, Kashuk CS, Bush RT, Grice E, Lin S, Portnoy ME, Cutler DJ, Green ED, Chakravarti A (2005) A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature 434:857-863
Emison ES, Garcia-Barcelo M, Grice EA, Lantieri F, Amiel J, Burzynski G, Fernandez RM, Hao L, Kashuk C, West K, Miao X, Tam PK, Griseri P, Ceccherini I, Pelet A, Jannot AS, de Pontual L, Henrion-Caude A, Lyonnet S, Verheij JB, Hofstra RM, Antinolo G, Borrego S, McCallion AS, Chakravarti A (2010) Differential contributions of rare and common, coding and noncoding Ret mutations to multifactorial Hirschsprung disease liability. Am J Hum Genet 87:60-74
Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, Jarvela I (2002) Identification of a variant associated with adult-type hypolactasia. Nat Genet 30:233-237
Enattah NS, Trudeau A, Pimenoff V, Maiuri L, Auricchio S, Greco L, Rossi M, Lentze M, Seo JK, Rahgozar S, Khalil I, Alifrangis M, Natah S, Groop L, Shaat N, Kozlov A, Verschubskaya G, Comas D, Bulayeva K, Mehdi SQ, Terwilliger JD, Sahi T, Savilahti E, Perola M, Sajantila A, Jarvela I, Peltonen L (2007) Evidence of still-ongoing convergence evolution of the lactase persistence T-13910 alleles in humans. Am J Hum Genet 81:615-625
Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, Ku M, Durham T, Kellis M, Bernstein BE (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473:43-49
Fantes J, Redeker B, Breen M, Boyle S, Brown J, Fletcher J, Jones S, Bickmore W, Fukushima Y, Mannens M et al (1995) Aniridia-associated cytogenetic rearrangements suggest that a position effect may cause the mutant phenotype. Hum Mol Genet 4:415-422
Fantes JA, Boland E, Ramsay J, Donnai D, Splitt M, Goodship JA, Stewart H, Whiteford M, Gautier P, Harewood L, Holloway S, Sharkey F, Maher E, van Heyningen V, Clayton-Smith J, Fitzpatrick DR, Black GC (2008) FISH mapping of de novo apparently balanced chromosome rearrangements identifies characteristics associated with phenotypic abnormality. Am J Hum Genet 82:916-926
Farooq M, Troelsen JT, Boyd M, Eiberg H, Hansen L, Hussain MS, Rehman S, Azhar A, Ali A, Bakhtiar SM, Tommerup N, Baig SM, Kjaer KW (2010) Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS. Eur J Hum Genet 18:733-736
Farrell JJ, Sherva RM, Chen ZY, Luo HY, Chu BF, Ha SY, Li CK, Lee AC, Li RC, Yuen HL, So JC, Ma ES, Chan LC, Chan V, Sebastiani P, Farrer LA, Baldwin CT, Steinberg MH, Chui DH (2011) A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression. Blood 117:4935-4945
FitzPatrick DR, Carr IM, McLaren L, Leek JP, Wightman P, Williamson K, Gautier P, McGill N, Hayward C, Firth H, Markham AF, Fantes JA, Bonthron DT (2003) Identification of SATB2 as the cleft palate gene on 2q32-q33. Hum Mol Genet 12:2491-2501
Fonseca AC, Bonaldi A, Bertola DR, Kim CA, Otto PA, Vianna-Morgante AM (2013) The clinical impact of chromosomal rearrangements with breakpoints upstream of the SOX9 gene: two novel de novo balanced translocations associated with acampomelic campomelic dysplasia. BMC Med Genet 14:50
Foster JW, Dominguez-Steglich MA, Guioli S, Kwok C, Weller PA, Stevanovic M, Weissenbach J, Mansour S, Young ID, Goodfellow PN et al (1994) Campomelic dysplasia and autosomal sex reversal caused by mutations in an SRY-related gene. Nature 372:525-530
Frankel N, Davis GK, Vargas D, Wang S, Payre F, Stern DL (2010) Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466:490-493
Fukami M, Tsuchiya T, Takada S, Kanbara A, Asahara H, Igarashi A, Kamiyama Y, Nishimura G, Ogata T (2012) Complex genomic rearrangement in the SOX9 5′ region in a patient with Pierre Robin sequence and hypoplastic left scapula. Am J Med Genet A 158A:1529-1534
Furniss D, Lettice LA, Taylor IB, Critchley PS, Giele H, Hill RE, Wilkie AO (2008) A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb. Hum Mol Genet 17:2417-2423
Garfield DA, Runcie DE, Babbitt CC, Haygood R, Nielsen WJ, Wray GA (2013) The impact of gene expression variation on the robustness and evolvability of a developmental gene regulatory network. PLoS Biol 11:e1001696
Garrick D, De Gobbi M, Samara V, Rugless M, Holland M, Ayyub H, Lower K, Sloane-Stanley J, Gray N, Koch C, Dunham I, Higgs DR (2008) The role of the polycomb complex in silencing alpha-globin gene expression in nonerythroid cells. Blood 112:3889-3899
Ghiasvand NM, Rudolph DD, Mashayekhi M, Brzezinski JA 4th, Goldman D, Glaser T (2011) Deletion of a remote enhancer near ATOH7 disrupts retinal neurogenesis, causing NCRNA disease. Nat Neurosci 14:578-586
Ghoumid J, Andrieux J, Sablonniere B, Odent S, Philippe N, Zanlonghi X, Saugier-Veber P, Bardyn T, Manouvrier-Hanu S, Holder-Espinasse M (2011) Duplication at chromosome 2q31.1-q31.2 in a family presenting syndactyly and nystagmus. Eur J Hum Genet 19:1198-1201
Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17:877-885
Gonzaga-Jauregui C, Lupski JR, Gibbs RA (2012) Human genome sequencing in health and disease. Annu Rev Med 63:35-61
Gonzalez F, Duboule D, Spitz F (2007) Transgenic analysis of Hoxd gene regulation during digit development. Dev Biol 306:847-859
Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E (2013) On the immortality of television sets: function in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 5:578-590
Grosveld F, van Assendelft GB, Greaves DR, Kollias G (1987) Position-independent, high-level expression of the human beta-globin gene in transgenic mice. Cell 51:975-985
Gurnett CA, Bowcock AM, Dietz FR, Morcuende JA, Murray JC, Dobbs MB (2007) Two novel point mutations in the long-range SHH enhancer in three families with triphalangeal thumb and preaxial polydactyly. Am J Med Genet A 143:27-32


Hallikas O, Palin K, Sinjushina N, Rautiainen R, Partanen J, Ukkonen E, Taipale J (2006) Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124:47–59

Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, Loth DW, Imboden M, Koch B, McArdle WL, Smith AV, Smolonska J, Sood A, Tang W, Wilk JB, Zhai G, Zhao JH, Aschard H, Burkart KM, Curjuric I, Eijgelsheim M, Elliott P, Gu X, Harris TB, Janson C, Homuth G, Hysi PG, Liu JZ, Loehr LR, Lohman K, Loos RJ, Manning AK, Marciante KD, Obeidat M, Postma DS, Aldrich MC, Brusselle GG, Chen TH, Eiriksdottir G, Franceschini N, Heinrich J, Rotter JI, Wijmenga C, Williams OD, Bentley AR, Hofman A, Laurie CC, Lumley T, Morrison AC, Joubert BR, Rivadeneira F, Couper DJ, Kritchevsky SB, Liu Y, Wjst M, Wain LV, Vonk JM, Uitterlinden AG, Rochat T, Rich SS, Psaty BM, O'Connor GT, North KE, Mirel DB, Meibohm B, Launer LJ, Khaw KT, Hartikainen AL, Hammond CJ, Glaser S, Marchini J, Kraft P, Wareham NJ, Volzke H, Stricker BH, Spector TD, Probst-Hensch NM, Jarvis D, Jarvelin MR, Heckbert SR, Gudnason V, Boezen HM, Barr RG, Cassano PA, Strachan DP, Fornage M, Hall IP, Dupuis J, Tobin MD, London SJ (2013) Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function. PLoS Genet 8:e1003098

Harismendy O, Notani D, Song X, Rahim NG, Tanasa B, Heintzman N, Ren B, Fu XD, Topol EJ, Rosenfeld MG, Frazer KA (2011) 9p21 DNA variants associated with coronary artery disease impair interferon-gamma signalling response. Nature 470:264–268

Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA, Wang W, Weng Z, Green RD, Crawford GE, Ren B (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39:311–318

Henglein B, Synovzik H, Groitl P, Bornkamm GW, Hartl P, Lipp M (1989) Three breakpoints of variant t(2;8) translocations in Burkitt's lymphoma cells fall within a region 140 kilobases distal from c-myc. Mol Cell Biol 9:2105–2113

Higgs DR (2013) The molecular basis of alpha-thalassemia. Cold Spring Harb Perspect Med 3:a011718

Hill-Harfe KL, Kaplan L, Stalker HJ, Zori RT, Pop R, Scherer G, Wallace MR (2005) Fine mapping of chromosome 17 translocation breakpoints ≥900 Kb upstream of SOX9 in acampomelic campomelic dysplasia and a mild, familial skeletal dysplasia. Am J Hum Genet 76:663–671

Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106:9362–9367

Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS (2012) Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41:827–841

Hong JW, Hendrix DA, Levine MS (2008) Shadow enhancers as a source of evolutionary novelty. Science 321:1314

Huang B, Wang S, Ning Y, Lamb AN, Bartley J (1999) Autosomal XX sex reversal caused by duplication of SOX9. Am J Med Genet 87:349–353

Huang L, Jolly LA, Willis-Owen S, Gardner A, Kumar R, Douglas E, Shoubridge C, Wieczorek D, Tzschach A, Cohen M, Hackett A, Field M, Froyen G, Hu H, Haas SA, Ropers HH, Kalscheuer VM, Corbett MA, Gecz J (2012) A noncoding, regulatory mutation implicates HCFC1 in nonsyndromic intellectual disability. Am J Hum Genet 91:694–702

Hughes JR, Lower KM, Dunham I, Taylor S, De Gobbi M, Sloane-Stanley JA, McGowan S, Ragoussis J, Vernimmen D, Gibbons RJ, Higgs DR (2013) High-resolution analysis of cis-acting regulatory networks at the alpha-globin locus. Philos Trans R Soc Lond B Biol Sci 368:20120361

Ianakiev P, van Baren MJ, Daly MJ, Toledo SP, Cavalcanti MG, Neto JC, Silveira EL, Freire-Maia A, Heutink P, Kilpatrick MW, Tsipouras P (2001) Acheiropodia is caused by a genomic deletion in C7orf2, the human orthologue of the Lmbr1 gene. Am J Hum Genet 68:38–45

Ishibashi M, Mechaly AS, Becker TS, Rinkwitz S (2013) Using zebrafish transgenesis to test human genomic sequences for specific enhancer activity. Methods 62:216–225

Jamshidi N, Macciocca I, Dargaville PA, Thomas P, Kilpatrick N, McKinlay Gardner RJ, Farlie PG (2004) Isolated Robin sequence associated with a balanced t(2;17) chromosomal translocation. J Med Genet 41:e1

Jeong Y, El-Jaick K, Roessler E, Muenke M, Epstein DJ (2006) A functional screen for sonic hedgehog regulatory elements across a 1 Mb interval identifies long-range ventral forebrain enhancers. Development 133:761–772

Jeong Y, Leskow FC, El-Jaick K, Roessler E, Muenke M, Yocum A, Dubourg C, Li X, Geng X, Oliver G, Epstein DJ (2008) Regulation of a remote Shh forebrain enhancer by the Six3 homeoprotein. Nat Genet 40:1348–1353

Jin C, Zang C, Wei G, Cui K, Peng W, Zhao K, Felsenfeld G (2009) H3.3/H2A.Z double variant-containing nucleosomes mark nucleosome-free regions of active promoters and other regulatory regions. Nat Genet 41:941–945

Jinek M, East A, Cheng A, Lin S, Ma E, Doudna J (2013) RNA-programmed genome editing in human cells. Elife 2:e00471

John S, Sabo PJ, Canfield TK, Lee K, Vong S, Weaver M, Wang H, Vierstra J, Reynolds AP, Thurman RE, Stamatoyannopoulos JA (2013) Genome-scale mapping of DNase I hypersensitivity. Curr Protoc Mol Biol 27:21.27

Kantaputra PN, Klopocki E, Hennig BP, Praphanphoj V, Le Caignec C, Isidor B, Kwee ML, Shears DJ, Mundlos S (2010) Mesomelic dysplasia Kantaputra type is associated with duplications of the HOXD locus on chromosome 2q. Eur J Hum Genet 18:1310–1314

Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ (2014) The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 42:D764–D770

Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M (2013) Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res 23:800–811

Kilpelainen TO, Zillikens MC, Stancakova A, Finucane FM, Ried JS, Langenberg C, Zhang W, Beckmann JS, Luan J, Vandenput L, Styrkarsdottir U, Zhou Y, Smith AV, Zhao JH, Amin N, Vedantam S, Shin SY, Haritunians T, Fu M, Feitosa MF, Kumari M, Halldorsson BV, Tikkanen E, Mangino M, Hayward C, Song C, Arnold AM, Aulchenko YS, Oostra BA, Campbell H, Cupples LA, Davis KE, Doring A, Eiriksdottir G, Estrada K, Fernandez-Real JM, Garcia M, Gieger C, Glazer NL, Guiducci C, Hofman A, Humphries SE, Isomaa B, Jacobs LC, Jula A, Karasik D, Karlsson MK, Khaw KT, Kim LJ, Kivimaki M, Klopp N, Kuhnel B, Kuusisto J, Liu Y, Ljunggren O, Lorentzon M, Luben RN, McKnight B, Mellstrom D, Mitchell BD, Mooser V, Moreno JM, Mannisto S, O'Connell JR, Pascoe L, Peltonen L, Peral B, Perola M, Psaty BM, Salomaa V, Savage DB, Semple RK, Skaric-Juric T, Sigurdsson G, Song KS, Spector TD, Syvanen AC, Talmud PJ, Thorleifsson G, Thorsteinsdottir U, Uitterlinden AG, van Duijn CM, Vidal-Puig A, Wild SH, Wright AF, Clegg DJ, Schadt E, Wilson JF, Rudan I, Ripatti S, Borecki IB, Shuldiner AR, Ingelsson E, Jansson JO, Kaplan RC, Gudnason V, Harris TB, Groop L, Kiel DP, Rivadeneira F et al (2011) Genetic variation near IRS1 associates with reduced adiposity and an impaired metabolic profile. Nat Genet 43:753–760

Kim MJ, Ahituv N (2013) The hydrodynamic tail vein assay as a tool for the study of liver promoters and enhancers. Methods Mol Biol 1015:279–289

Kim MJ, Skewes-Cox P, Fukushima H, Hesselson S, Yee SW, Ramsey LB, Nguyen L, Eshragh JL, Castro RA, Wen CC, Stryke D, Johns SJ, Ferrin TE, Kwok PY, Relling MV, Giacomini KM, Kroetz DL, Ahituv N (2011) Functional characterization of liver enhancers that regulate drug-associated transporters. Clin Pharmacol Ther 89:571–578

King MC, Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188:107–116

Kioussis D, Vanin E, deLange T, Flavell RA, Grosveld FG (1983) Beta-globin gene inactivation by DNA translocation in gamma beta-thalassaemia. Nature 306:662–666

Kleinjan DJ, Coutinho P (2009) Cis-ruption mechanisms: disruption of cis-regulatory control as a cause of human genetic disease. Brief Funct Genomic Proteomic 8:317–332

Kleinjan DA, van Heyningen V (2005) Long-range control of gene expression: emerging mechanisms and disruption in disease. Am J Hum Genet 76:8–32

Kleinjan DA, Seawright A, Schedl A, Quinlan RA, Danes S, van Heyningen V (2001) Aniridia-associated translocations, DNase hypersensitivity, sequence comparison and transgenic analysis redefine the functional domain of PAX6. Hum Mol Genet 10:2049–2059

Kleinjan DA, Seawright A, Elgar G, van Heyningen V (2002) Characterization of a novel gene adjacent to PAX6, revealing synteny conservation with functional significance. Mamm Genome 13:102–107

Kleinjan DA, Seawright A, Mella S, Carr CB, Tyas DA, Simpson TI, Mason JO, Price DJ, van Heyningen V (2006) Long-range downstream enhancers are essential for Pax6 expression. Dev Biol 299:563–581

Kleinjan DA, Bancewicz RM, Gautier P, Dahm R, Schonthaler HB, Damante G, Seawright A, Hever AM, Yeyati PL, van Heyningen V, Coutinho P (2008) Subfunctionalization of duplicated zebrafish pax6 genes by cis-regulatory divergence. PLoS Genet 4:e29

Klopocki E, Ott CE, Benatar N, Ullmann R, Mundlos S, Lehmann K (2008) A microduplication of the long range SHH limb regulator (ZRS) is associated with triphalangeal thumb-polysyndactyly syndrome. J Med Genet 45:370–375

Klopocki E, Lohan S, Brancati F, Koll R, Brehm A, Seemann P, Dathe K, Stricker S, Hecht J, Bosse K, Betz RC, Garaci FG, Dallapiccola B, Jain M, Muenke M, Ng VC, Chan W, Chan D, Mundlos S (2011) Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am J Hum Genet 88:70–75

Kokubu C, Horie K, Abe K, Ikeda R, Mizuno S, Uno Y, Ogiwara S, Ohtsuka M, Isotani A, Okabe M, Imai K, Takeda J (2009) A transposon-based chromosomal engineering method to survey a large cis-regulatory landscape in mice. Nat Genet 41:946–952

Kowalczyk MS, Hughes JR, Garrick D, Lynch MD, Sharpe JA, Sloane-Stanley JA, McGowan SJ, De Gobbi M, Hosseini M, Vernimmen D, Brown JM, Gray NE, Collavin L, Gibbons RJ, Flint J, Taylor S, Buckle VJ, Milne TA, Wood WG, Higgs DR (2012) Intragenic enhancers act as alternative promoters. Mol Cell 45:447–458

Kurth I, Klopocki E, Stricker S, van Oosterwijk J, Vanek S, Altmann J, Santos HG, van Harssel JJ, de Ravel T, Wilkie AO, Gal A, Mundlos S (2009) Duplications of noncoding elements 5′ of SOX9 are associated with brachydactyly-anonychia. Nat Genet 41:862–863

Kwasnieski JC, Mogno I, Myers CA, Corbo JC, Cohen BA (2012) Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc Natl Acad Sci USA 109:19498–19503

Lecointre C, Pichon O, Hamel A, Heloury Y, Michel-Calemard L, Morel Y, David A, Le Caignec C (2009) Familial acampomelic form of campomelic dysplasia caused by a 960 kb deletion upstream of SOX9. Am J Med Genet A 149A:1183–1189

Leipoldt M, Erdel M, Bien-Willner GA, Smyk M, Theurl M, Yatsenko SA, Lupski JR, Lane AH, Shanske AL, Stankiewicz P, Scherer G (2007) Two novel translocation breakpoints upstream of SOX9 define borders of the proximal and distal breakpoint cluster region in campomelic dysplasia. Clin Genet 71:67–75

Lettice LA, Horikoshi T, Heaney SJ, van Baren MJ, van der Linde HC, Breedveld GJ, Joosse M, Akarsu N, Oostra BA, Endo N, Shibata M, Suzuki M, Takahashi E, Shinka T, Nakahori Y, Ayusawa D, Nakabayashi K, Scherer SW, Heutink P, Hill RE, Noji S (2002) Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc Natl Acad Sci USA 99:7548–7553

Lettice LA, Heaney SJ, Purdie LA, Li L, de Beer P, Oostra BA, Goode D, Elgar G, Hill RE, de Graaff E (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet 12:1725–1735

Lettice LA, Daniels S, Sweeney E, Venkataraman S, Devenney PS, Gautier P, Morrison H, Fantes J, Hill RE, FitzPatrick DR (2011) Enhancer-adoption as a mechanism of human developmental disease. Hum Mutat 32:1492–1499

Lettice LA, Williamson I, Wiltshire JH, Peluso S, Devenney PS, Hill AE, Essa A, Hagman J, Mort R, Grimes G, DeAngelis CL, Hill RE (2012) Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev Cell 22:459–467

Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, Poh HM, Goh Y, Lim J, Zhang J, Sim HS, Peh SQ, Mulawadi FH, Ong CT, Orlov YL, Hong S, Zhang Z, Landt S, Raha D, Euskirchen G, Wei CL, Ge W, Wang H, Davis C, Fisher-Aylor KI, Mortazavi A, Gerstein M, Gingeras T, Wold B, Sun Y, Fullwood MJ, Cheung E, Liu E, Sung WK, Snyder M, Ruan Y (2012) Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148:84–98

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, de Vos M, Dixon A, Demarche B, Gut I, Heath S, Foglio M, Liang L, Laukens D, Mni M, Zelenika D, Van Gossum A, Rutgeerts P, Belaiche J, Lathrop M, Georges M (2007) Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3:e58

Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326:289–293

Ligtenberg MJ, Kuiper RP, Chan TL, Goossens M, Hebeda KM, Voorendt M, Lee TY, Bodmer D, Hoenselaar E, Hendriks-Cornelissen SJ, Tsui WY, Kong CK, Brunner HG, van Kessel AG, Yuen ST, van Krieken JH, Leung SY, Hoogerbrugge N (2009) Heritable somatic methylation and inactivation of MSH2 in families with Lynch syndrome due to deletion of the 3′ exons of TACSTD1. Nat Genet 41:112–117

Lin CY, Loven J, Rahl PB, Paranal RM, Burge CB, Bradner JE, Lee TI, Young RA (2012) Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151:56–67

Lodder EM, Eussen BH, van Hassel DA, Hoogeboom AJ, Poddighe PJ, Coert JH, Oostra BA, de Klein A, de Graaff E (2009) Implication of long-distance regulation of the HOXA cluster in a patient with postaxial polydactyly. Chromosome Res 17:737–744

Loots GG (2008) Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis. Adv Genet 61:269–293

Lower KM, Hughes JR, De Gobbi M, Henderson S, Viprakasit V, Fisher C, Goriely A, Ayyub H, Sloane-Stanley J, Vernimmen D, Langford C, Garrick D, Gibbons RJ, Higgs DR (2009) Adventitious changes in long-range gene expression caused by polymorphic structural variation and promoter competition. Proc Natl Acad Sci USA 106:21771–21776

Lubbe SJ, Pittman AM, Olver B, Lloyd A, Vijayakrishnan J, Naranjo S, Dobbins S, Broderick P, Gomez-Skarmeta JL, Houlston RS (2011) The 14q22.2 colorectal cancer variant rs4444235 shows cis-acting regulation of BMP4. Oncogene 31:3777–3784

Maas SA, Fallon JF (2005) Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev Dyn 232:345–348

Maas SA, Suzuki T, Fallon JF (2011) Identification of spontaneous mutations within the long-range limb-specific Sonic hedgehog enhancer (ZRS) that alter Sonic hedgehog expression in the chicken limb mutants oligozeugodactyly and silkie breed. Dev Dyn 240:1212–1222

Macaya D, Katsanis SH, Hefferon TW, Audlin S, Mendelsohn NJ, Roggenbuck J, Cutting GR (2009) A synonymous mutation in TCOF1 causes Treacher Collins syndrome due to mis-splicing of a constitutive exon. Am J Med Genet A 149A:1624–1627

Maeder ML, Linder SJ, Cascio VM, Fu Y, Ho QH, Joung JK (2013a) CRISPR RNA-guided activation of endogenous human genes. Nat Methods 10:977–979

Maeder ML, Linder SJ, Reyon D, Angstman JF, Fu Y, Sander JD, Joung JK (2013b) Robust, synergistic regulation of human gene expression using TALE activators. Nat Methods 10:243–245

Mali P, Esvelt KM, Church GM (2013a) Cas9 as a versatile tool for engineering biology. Nat Methods 10:957–963

Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM (2013b) RNA-guided human genome engineering via Cas9. Science 339:823–826

Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM (2009) Finding the missing heritability of complex diseases. Nature 461:747–753

Marinic M, Aktas T, Ruf S, Spitz F (2013) An integrated holo-enhancer unit defines tissue and gene specificity of the Fgf8 regulatory landscape. Dev Cell 24:530–542

Martin D, Pantoja C, Fernandez Minan A, Valdes-Quezada C, Molto E, Matesanz F, Bogdanovic O, de la Calle-Mustienes E, Dominguez O, Taher L, Furlan-Magaril M, Alcina A, Canon S, Fedetz M, Blasco MA, Pereira PS, Ovcharenko I, Recillas-Targa F, Montoliu L, Manzanares M, Guigo R, Serrano M, Casares F, Gomez-Skarmeta JL (2011) Genome-wide CTCF distribution in vertebrates defines equivalent sites that aid the identification of disease-associated genes. Nat Struct Mol Biol 18:708–714

Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337:1190–1195

McBride DJ, Buckle A, van Heyningen V, Kleinjan DA (2012) DNaseI hypersensitivity and ultraconservation reveal novel, interdependent long-range enhancers at the complex Pax6 cis-regulatory region. PLoS One 6:e28616

McClellan J, King MC (2010) Genetic heterogeneity in human disease. Cell 141:210–217

McLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, Wenger AM, Bejerano G, Kingsley DM (2011) Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471:216–219

Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG Jr, Kinney JB, Kellis M, Lander ES, Mikkelsen TS (2012) Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 30:271–277

Mercer TR, Dinger ME, Mattick JS (2009) Long non-coding RNAs: insights into functions. Nat Rev Genet 10:155–159

Mitter D, Chiaie BD, Ludecke HJ, Gillessen-Kaesbach G, Bohring A, Kohlhase J, Caliebe A, Siebert R, Roepke A, Ramos-Arroyo MA, Nieva B, Menten B, Loeys B, Mortier G, Wieczorek D (2010) Genotype-phenotype correlation in eight new patients with a deletion encompassing 2q31.1. Am J Med Genet A 152A:1213–1224

Miyake A, Kou I, Takahashi Y, Johnson TA, Ogura Y, Dai J, Qiu X, Takahashi A, Jiang H, Yan H, Kono K, Kawakami N, Uno K, Ito M, Minami S, Yanagida H, Taneichi H, Hosono N, Tsuji T, Suzuki T, Sudo H, Kotani T, Yonezawa I, Kubo M, Tsunoda T, Watanabe K, Chiba K, Toyama Y, Qiu Y, Matsumoto M, Ikegawa S (2013) Identification of a susceptibility locus for severe adolescent idiopathic scoliosis on chromosome 17q24.3. PLoS One 8:e72802

Moffatt MF, Kabesch M, Liang L, Dixon AL, Strachan D, Heath S, Depner M, von Berg A, Bufe A, Rietschel E, Heinzmann A, Simma B, Frischer T, Willis-Owen SA, Wong KC, Illig T, Vogelberg C, Weiland SK, von Mutius E, Abecasis GR, Farrall M, Gut IG, Lathrop GM, Cookson WO (2007) Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 448:470–473

Montavon T, Duboule D (2012) Landscapes and archipelagos: spatial organization of gene regulation in vertebrates. Trends Cell Biol 22:347–354

Montavon T, Duboule D (2013) Chromatin organization and global regulation of Hox gene clusters. Philos Trans R Soc Lond B Biol Sci 368:20120367

Montavon T, Soshnikova N, Mascrez B, Joye E, Thevenet L, Splinter E, de Laat W, Spitz F, Duboule D (2011) A regulatory archipelago controls Hox genes transcription in digits. Cell 147:1132–1145

Montavon T, Thevenet L, Duboule D (2012) Impact of copy number variations (CNVs) on long-range gene regulation at the HoxD locus. Proc Natl Acad Sci USA 109:20204–20211

Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, Li X, Li H, Kuperwasser N, Ruda VM, Pirruccello JP, Muchmore B, Prokunina-Olsson L, Hall JL, Schadt EE, Morales CR, Lund-Katz S, Phillips MC, Wong J, Cantley W, Racie T, Ejebe KG, Orho-Melander M, Melander O, Koteliansky V, Fitzgerald K, Krauss RM, Cowan CA, Kathiresan S, Rader DJ (2010) From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466:714–719

Naranjo S, Voesenek K, de la Calle-Mustienes E, Robert-Moreno A, Kokotas H, Grigoriadou M, Economides J, Van Camp G, Hilgert N, Moreno F, Alsina B, Petersen MB, Kremer H, Gomez-Skarmeta JL (2010) Multiple enhancers located in a 1-Mb region upstream of POU3F4 promote expression during inner ear development and may be required for hearing. Hum Genet 128:411–419

Newburger DE, Bulyk ML (2009) UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res 37:D77–D82

Niedermaier M, Schwabe GC, Fees S, Helmrich A, Brieske N, Seemann P, Hecht J, Seitz V, Stricker S, Leschik G, Schrock E, Selby PB, Mundlos S (2005) An inversion involving the mouse Shh locus results in brachydactyly through dysregulation of Shh expression. J Clin Invest 115:900–909

Nobrega MA, Pennacchio LA (2004) Comparative genomic analysis as a tool for biological discovery. J Physiol 554:31–39

Nobrega MA, Zhu Y, Plajzer-Frick I, Afzal V, Rubin EM (2004) Megabase deletions of gene deserts result in viable mice. Nature 431:988–993

Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J, Gribnau J, Barillot E, Bluthgen N, Dekker J, Heard E (2012) Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485:381–385

Oksenberg N, Stevison L, Wall JD, Ahituv N (2013) Function and regulation of AUTS2, a gene implicated in autism and human evolution. PLoS Genet 9:e1003221

Ottaviani A, Rival-Gervier S, Boussouar A, Foerster AM, Rondier D, Sacconi S, Desnuelle C, Gilson E, Magdinier F (2009) The D4Z4 macrosatellite repeat acts as a CTCF and A-type lamins-dependent insulator in facio-scapulo-humeral dystrophy. PLoS Genet 5:e1000394

Ottaviani A, Schluth-Bolard C, Gilson E, Magdinier F (2011) D4Z4 as a prototype of CTCF and lamins-dependent insulator in human cells. Nucleus 1:30–36

Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, Ahituv N, Pennacchio LA, Shendure J (2012) Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol 30:265–270

Perez-Pinera P, Kocak DD, Vockley CM, Adler AF, Kabadi AM, Polstein LR, Thakore PI, Glass KA, Ousterout DG, Leong KW, Guilak F, Crawford GE, Reddy TE, Gersbach CA (2013a) RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat Methods 10:973–976

Perez-Pinera P, Ousterout DG, Brunger JM, Farin AM, Glass KA, Guilak F, Crawford GE, Hartemink AJ, Gersbach CA (2013b) Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nat Methods 10:239–242

Perry MW, Boettiger AN, Bothma JP, Levine M (2010) Shadow enhancers foster robustness of Drosophila gastrulation. Curr Biol 20:1562–1567

Perry MW, Boettiger AN, Levine M (2011) Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc Natl Acad Sci USA 108:13570–13575

Pfeifer D, Kist R, Dewar K, Devon K, Lander ES, Birren B, Korniszewski L, Back E, Scherer G (1999) Campomelic dysplasia translocation breakpoints are scattered over 1 Mb proximal to SOX9: evidence for an extended control region. Am J Hum Genet 65:111–124

Phillips JE, Corces VG (2009) CTCF: master weaver of the genome. Cell 137:1194–1211

Phillips-Cremins JE, Corces VG (2013) Chromatin insulators: linking genome organization to cellular function. Mol Cell 50:461–474

Phillips-Cremins JE, Sauria ME, Sanyal A, Gerasimova TI, Lajoie BR, Bell JS, Ong CT, Hookway TA, Guo C, Sun Y, Bland MJ, Wagstaff W, Dalton S, McDevitt TC, Sen R, Dekker J, Taylor J, Corces VG (2013) Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153:1281–1295

Phylipsen M, Prior JF, Lim E, Lingam N, Vogelaar IP, Giordano PC, Finlayson J, Harteveld CL (2010) Thalassemia in Western Australia: 11 novel deletions characterized by Multiplex Ligation-dependent Probe Amplification. Blood Cells Mol Dis 44:146–151

Poitras L, Yu M, Lesage-Pelletier C, Macdonald RB, Gagne JP, Hatch G, Kelly I, Hamilton SP, Rubenstein JL, Poirier GG, Ekker M (2010) An SNP in an ultraconserved regulatory element affects Dlx5/Dlx6 regulation in the forebrain. Development 137:3089–3097

Pomerantz MM, Ahmadiyeh N, Jia L, Herman P, Verzi MP, Doddapaneni H, Beckwith CA, Chan JA, Hills A, Davis M, Yao K, Kehoe SM, Lenz HJ, Haiman CA, Yan C, Henderson BE, Frenkel B, Barretina J, Bass A, Tabernero J, Baselga J, Regan MM, Manak JR, Shivdasani R, Coetzee GA, Freedman ML (2009) The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet 41:882–884

Ponting CP, Hardison RC (2011) What fraction of the human genome is functional? Genome Res 21:1769–1776

Pop R, Conz C, Lindenberg KS, Blesson S, Schmalenberger B, Briault S, Pfeifer D, Scherer G (2004) Screening of the 1 Mb SOX9 5′ control region by array CGH identifies a large deletion in a case of campomelic dysplasia with XY sex reversal. J Med Genet 41:e47

Powers NR, Eicher JD, Butter F, Kong Y, Miller LL, Ring SM, Mann M, Gruen JR (2013) Alleles of a polymorphic ETV6 binding site in DCDC2 confer risk of reading and language impairment. Am J Hum Genet 93:19–28

Prabhakar S, Noonan JP, Paabo S, Rubin EM (2006) Accelerated evolution of conserved noncoding sequences in humans. Science 314:786

Prabhakar S, Visel A, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Morrison H, Fitzpatrick DR, Afzal V, Pennacchio LA, Rubin EM, Noonan JP (2008) Human-specific gain of function in a developmental enhancer. Science 321:1346–1350

Praetorius C, Grill C, Stacey SN, Metcalf AM, Gorkin DU, Robinson KC, Van Otterloo E, Kim RS, Bergsteinsdottir K, Ogmundsdottir MH, Magnusdottir E, Mishra PJ, Davis SR, Guo T, Zaidi MR, Helgason AS, Sigurdsson MI, Meltzer PS, Merlino G, Petit V, Larue L, Loftus SK, Adams DR, Sobhiafshar U, Emre NC, Pavan WJ, Cornell R, Smith AG, McCallion AS, Fisher DE, Stefansson K, Sturm RA, Steingrimsson E (2013) A polymorphism in IRF4 affects human pigmentation through a tyrosinase-dependent MITF/TFAP2A pathway. Cell 155:1022–1033

Pritchett J, Athwal V, Roberts N, Hanley NA, Hanley KP (2011) Understanding the role of SOX9 in acquired diseases: lessons from development. Trends Mol Med 17:166–174

Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J (2010) A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470:279–283

Ragvin A, Moro E, Fredman D, Navratilova P, Drivenes O, Engstrom PG, Alonso ME, de la Calle Mustienes E, Gomez Skarmeta JL, Tavares MJ, Casares F, Manzanares M, van Heyningen V, Molven A, Njolstad PR, Argenton F, Lenhard B, Becker TS (2010) Long-range gene regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4, and IRX3. Proc Natl Acad Sci USA 107:775–780

Rahimov F, Marazita ML, Visel A, Cooper ME, Hitchler MJ, Rubini M, Domann FE, Govil M, Christensen K, Bille C, Melbye M, Jugessur A, Lie RT, Wilcox AJ, Fitzpatrick DR, Green ED, Mossey PA, Little J, Steegers-Theunissen RP, Pennacchio LA, Schutte BC, Murray JC (2008) Disruption of an AP-2alpha binding site in an IRF6 enhancer is associated with cleft lip. Nat Genet 40:1341–1347

Rainger JK, Bhatia S, Bengani H, Gautier P, Rainger J, Pearson M, Ansari M, Crow J, Mehendale F, Palinkasova B, Dixon MJ, Thompson PJ, Matarin M, Sisodiya SM, Kleinjan DA, Fitzpatrick DR (2014) Disruption of SATB2 or its long-range cis-regulation by SOX9 causes a syndromic form of Pierre Robin Sequence. Hum Mol Genet [Epub ahead of print]

Ravi V, Bhatia S, Gautier P, Loosli F, Tay BH, Tay A, Murdoch E, Coutinho P, van Heyningen V, Brenner S, Venkatesh B, Kleinjan DA (2013) Sequencing of Pax6 loci from the elephant shark reveals a family of Pax6 genes in vertebrate genomes, forged by ancient duplications and divergences. PLoS Genet 9:e1003177

Refai O, Friedman A, Terry L, Jewett T, Pearlman A, Perle MA, Ostrer H (2010) De novo 12;17 translocation upstream of SOX9 resulting in 46, XX testicular disorder of sex development. Am J Med Genet A 152A:422–426

Richards M, Coppee F, Thomas N, Belayew A, Upadhyaya M (2011) Facioscapulohumeral muscular dystrophy (FSHD): an enigma unravelled? Hum Genet 131:325–340

Robinson DO, Howarth RJ, Williamson KA, van Heyningen V, Beal SJ, Crolla JA (2008) Genetic analysis of chromosome 11p13 and the PAX6 gene in a series of 125 cases referred with aniridia. Am J Med Genet A 146A:558–569

Roessler E, Ward DE, Gaudenz K, Belloni E, Scherer SW, Donnai D, Siegel-Bartelt J, Tsui LC, Muenke M (1997) Cytogenetic rearrangements involving the loss of the Sonic Hedgehog gene at 7q36 cause holoprosencephaly. Hum Genet 100:172–181

Rowan S, Siggers T, Lachke SA, Yue Y, Bulyk ML, Maas RL (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev 24:980–985

Ruf S, Symmons O, Uslu VV, Dolle D, Hot C, Ettwiller L, Spitz F (2011) Large-scale analysis of the regulatory architecture of the mouse genome with a transposon-associated sensor. Nat Genet 43:379–386

Sagai T, Hosoya M, Mizushina Y, Tamura M, Shiroishi T (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132:797–803

Sagai T, Amano T, Tamura M, Mizushina Y, Sumiyama K, Shiroishi T (2009) A cluster of three long-range enhancers directs regional Shh expression in the epithelial linings. Development 136:1665–1674

Sanyal A, Lajoie BR, Jain G, Dekker J (2012) The long-range interaction landscape of gene promoters. Nature 489:109–113

Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691

Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV, Ren B (2012) A map of the cis-regulatory sequences in the mouse genome. Nature 488:116–120

Shibata Y, Sheffield NC, Fedrigo O, Babbitt CC, Wortham M, Tewari AK, London D, Song L, Lee BK, Iyer VR, Parker SC, Margulies EH, Wray GA, Furey TS, Crawford GE (2012) Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet 8:e1002789

Smemo S, Campos LC, Moskowitz IP, Krieger JE, Pereira AC, Nobrega MA (2012) Regulatory variation in a TBX5 enhancer leads to isolated congenital heart disease. Hum Mol Genet 21:3255–3263

Smith RP, Lam ET, Markova S, Yee SW, Ahituv N (2012) Pharmacogene regulatory elements: from discovery to applications. Genome Med 4:45

Soshnikova N, Duboule D (2009) Epigenetic temporal control of mouse Hox genes in vivo. Science 324:1320–1323

Sotelo J, Esposito D, Duhagon MA, Banfield K, Mehalko J, Liao H, Stephens RM, Harris TJ, Munroe DJ, Wu X (2010) Long-range enhancers on 8q24 regulate c-Myc. Proc Natl Acad Sci USA 107:3001–3005

Spielmann M, Brancati F, Krawitz PM, Robinson PN, Ibrahim DM, Franke M, Hecht J, Lohan S, Dathe K, Nardone AM, Ferrari P, Landi A, Wittler L, Timmermann B, Chan D, Mennen U, Klopocki E, Mundlos S (2012) Homeotic arm-to-leg transformation associated with genomic rearrangements at the PITX1 locus. Am J Hum Genet 91:629–635

Spitz F, Duboule D (2008) Global control regions and regulatory landscapes in vertebrate development and evolution. Adv Genet 61:175–205

Spitz F, Furlong EE (2012) Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13:613–626

Spitz F, Gonzalez F, Peichel C, Vogt TF, Duboule D, Zakany J (2001) Large scale transgenic and cluster deletion analysis of the HoxD complex separate an ancestral regulatory module from evolutionary innovations. Genes Dev 15:2209–2214

Spitz F, Montavon T, Monso-Hinard C, Morris M, Ventruto ML, Antonarakis S, Ventruto V, Duboule D (2002) A t(2;8) balanced translocation with breakpoints near the human HOXD complex causes mesomelic dysplasia and vertebral defects. Genomics 79:493–498

Spitz F, Gonzalez F, Duboule D (2003) A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell 113:405–417

Sturm RA, Larsson M (2009) Genetics of human iris colour and patterns. Pigment Cell Melanoma Res 22:544–562

Sturm RA, Duffy DL, Zhao ZZ, Leite FP, Stark MS, Hayward NK, Martin NG, Montgomery GW (2008) A single SNP in an evolutionary conserved region within intron 86 of the HERC2 gene determines human blue-brown eye color. Am J Hum Genet 82:424–431

Sun M, Ma F, Zeng X, Liu Q, Zhao XL, Wu FX, Wu GP, Zhang ZF, Gu B, Zhao YF, Tian SH, Lin B, Kong XY, Zhang XL, Yang W, Lo WH, Zhang X (2008) Triphalangeal thumb-polysyndactyly syndrome and syndactyly type IV are caused by genomic duplications involving the long range, limb-specific SHH enhancer. J Med Genet 45:589–595

Sur IK, Hallikas O, Vaharautio A, Yan J, Turunen M, Enge M, Taipale M, Karhu A, Aaltonen LA, Taipale J (2012) Mice lacking a Myc enhancer that includes human SNP rs6983267 are resistant to intestinal tumors. Science 338:1360–1363

Teng L, Firpi HA, Tan K (2011) Enhancers in embryonic stem cells are enriched for transposable elements and genetic variations associated with cancers. Nucleic Acids Res 39:7371–7379

Thomsen MK, Ambroisine L, Wynn S, Cheah KS, Foster CS, Fisher G, Berney DM, Moller H, Reuter VE, Scardino P, Cuzick J, Ragavan N, Singh PB, Martin FL, Butler CM, Cooper CS, Swain A (2010) SOX9 elevation in the prostate promotes proliferation and cooperates with PTEN loss to drive tumor formation. Cancer Res 70:979–987

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen HM, Hirbo JB, Osman M, Ibrahim M, Omar SA, Lema G, Nyambo TB, Ghori J, Bumpstead S, Pritchard JK, Wray GA, Deloukas P (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39:31–40
Tuan D, Solomon W, Li Q, London IM (1985) The beta-like-globin gene domain in human erythroid cells. Proc Natl Acad Sci USA 82:6384–6388
Tufarelli C, Stanley JA, Garrick D, Sharpe JA, Ayyub H, Wood WG, Higgs DR (2003) Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease. Nat Genet 34:157–165
Turner BM (2007) Defining an epigenetic code. Nat Cell Biol 9:2–6
Tuupanen S, Turunen M, Lehtonen R, Hallikas O, Vanharanta S, Kivioja T, Bjorklund M, Wei G, Yan J, Niittymaki I, Mecklin JP, Jarvinen H, Ristimaki A, Di-Bernardo M, East P, Carvajal-Carmona L, Houlston RS, Tomlinson I, Palin K, Ukkonen E, Karhu A, Taipale J, Aaltonen LA (2009) The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat Genet 41:885–890
Uchikawa M, Ishida Y, Takemoto T, Kamachi Y, Kondoh H (2003) Functional analysis of chicken Sox2 enhancers highlights an array of diverse regulatory elements that are conserved in mammals. Dev Cell 4:509–519
van der Maarel SM, Tawil R, Tapscott SJ (2011) Facioscapulohumeral muscular dystrophy and DUX4: breaking the silence. Trends Mol Med 17:252–258
van der Maarel SM, Miller DG, Tawil R, Filippova GN, Tapscott SJ (2012) Facioscapulohumeral muscular dystrophy: consequences of chromatin relaxation. Curr Opin Neurol 25:614–620
van Steensel B, Dekker J (2010) Genomics tools for unraveling chromosome architecture. Nat Biotechnol 28:1089–1095
Velagaleti GV, Bien-Willner GA, Northup JK, Lockhart LH, Hawkins JC, Jalal SM, Withers M, Lupski JR, Stankiewicz P (2005) Position effects due to chromosome breakpoints that map approximately 900 Kb upstream and approximately 1.3 Mb downstream of SOX9 in two patients with campomelic dysplasia. Am J Hum Genet 76:652–662
Vernimmen D, Lynch MD, De Gobbi M, Garrick D, Sharpe JA, Sloane-Stanley JA, Smith AJ, Higgs DR (2011) Polycomb eviction as a new distant enhancer function. Genes Dev 25:1583–1588
Visser M, Kayser M, Palstra RJ (2012) HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res 22:446–455
Wagner T, Wirth J, Meyer J, Zabel B, Held M, Zimmer J, Pasantes J, Bricarelli FD, Keutel J, Hustert E, Wolf U, Tommerup N, Schempp W, Scherer G (1994) Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY-related gene SOX9. Cell 79:1111–1120
Walker SL, Ariga J, Mathias JR, Coothankandaswamy V, Xie X, Distel M, Koster RW, Parsons MJ, Bhalla KN, Saxena MT, Mumm JS (2012) Automated reporter quantification in vivo: high-throughput screening method for reporter-based assays in zebrafish. PLoS One 7:e29916
Wallis DE, Roessler E, Hehr U, Nanni L, Wiltshire T, Richieri-Costa A, Gillessen-Kaesbach G, Zackai EH, Rommens J, Muenke M (1999) Mutations in the homeodomain of the human SIX3 gene cause holoprosencephaly. Nat Genet 22:196–198
Wang H, Leav I, Ibaragi S, Wegner M, Hu GF, Lu ML, Balk SP, Yuan X (2008) SOX9 is expressed in human fetal prostate epithelium and enhances prostate cancer invasion. Cancer Res 68:1625–1630
Wasserman NF, Aneas I, Nobrega MA (2010) An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res 20:1191–1197
Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S, Nagae G, Ishihara K, Mishiro T, Yahata K, Imamoto F, Aburatani H, Nakao M, Imamoto N, Maeshima K, Shirahige K, Peters JM (2008) Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451:796–801
White D, Rabago-Smith M (2010) Genotype-phenotype associations and human eye color. J Hum Genet 56:5–7
Wieczorek D, Pawlik B, Li Y, Akarsu NA, Caliebe A, May KJ, Schweiger B, Vargas FR, Balci S, Gillessen-Kaesbach G, Wollnik B (2009) A specific mutation in the distant sonic hedgehog (SHH) cis-regulator (ZRS) causes Werner mesomelic syndrome (WMS) while complete ZRS duplications underlie Haas type polysyndactyly and preaxial polydactyly (PPD) with or without triphalangeal thumb. Hum Mutat 31:81–89
Winkler GS, Kristjuhan A, Erdjument-Bromage H, Tempst P, Svejstrup JQ (2002) Elongator is a histone H3 and H4 acetyltransferase important for normal histone acetylation levels in vivo. Proc Natl Acad Sci USA 99:3517–3522
Wirth J, Wagner T, Meyer J, Pfeiffer RA, Tietze HU, Schempp W, Scherer G (1996) Translocation breakpoints in three patients with campomelic dysplasia and autosomal sex reversal map more than 130 kb from SOX9. Hum Genet 97:186–193
Witcher M, Emerson BM (2009) Epigenetic silencing of the p16(INK4a) tumor suppressor is associated with loss of CTCF binding and a chromatin boundary. Mol Cell 34:271–284
Wittkopp PJ, Kalay G (2011) Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet 13:59–69
Woltering JM, Duboule D (2009) Conserved elements within open reading frames of mammalian Hox genes. J Biol 8:17
Wray GA (2007) The evolutionary significance of cis-regulatory mutations. Nat Rev Genet 8:206–216
Wright JB, Brown SJ, Cole MD (2010) Upregulation of c-MYC in cis through a large chromatin loop linked to a cancer risk-associated single-nucleotide polymorphism in colorectal cancer cells. Mol Cell Biol 30:1411–1420
Wunderle VM, Critcher R, Hastie N, Goodfellow PN, Schedl A (1998) Deletion of long-range regulatory elements upstream of SOX9 causes campomelic dysplasia. Proc Natl Acad Sci USA 95:10649–10654
Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S, Gil J, Walsh MJ, Zhou MM (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol Cell 38:662–674
Zhang X, Cowper-Sallari R, Bailey SD, Moore JH, Lupien M (2012) Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome Res 22:1437–1446
Zhang Y, Wong CH, Birnbaum RY, Li G, Favaro R, Ngan CY, Lim J, Tai E, Poh HM, Wong E, Mulawadi FH, Sung WK, Nicolis S, Ahituv N, Ruan Y, Wei CL (2013) Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504:306–310
Zinzen RP, Girardot C, Gagneur J, Braun M, Furlong EE (2009) Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462:65–70
Zuniga A, Michos O, Spitz F, Haramis AP, Panman L, Galli A, Vintersten K, Klasen C, Mansfield W, Kuc S, Duboule D, Dono R, Zeller R (2004) Mouse limb deformity mutations disrupt a global control region within the large regulatory landscape required for Gremlin expression. Genes Dev 18:1553–1564