
BIOINFORMATICS

Sequence analysis

DISCOVERY NOTE

Vol. 00 no. 00 2005, pages 1–7 doi:10.1093/bioinformatics/bti767

The ASCH superfamily: novel domains with a fold related to the PUA domain and a potential role in RNA metabolism
Lakshminarayan M. Iyer, A. Maxwell Burroughs and L. Aravind
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Received on May 4, 2005; revised on November 1, 2005; accepted on November 4, 2005
Advance Access publication . . .
Associate Editor: Chris Stoeckert

ABSTRACT
Several studies show that transcription coactivators are often bi-functional ribonucleoprotein complexes that also regulate pre-mRNA processing and splicing decisions. Using sensitive sequence profile searches and structural comparisons we show that the C-terminal domain of the human coactivator protein ASC-1 defines a novel superfamily, the ASC-1 homology (ASCH) domain. The approximately 110 amino acid long ASCH domains are widely represented in all the three superkingdoms of life and several prokaryotic viruses. We show that the ASCH superfamily adopts a beta-barrel fold similar to the PUA domain superfamily. Using multiple lines of evidence, we suggest that members of the ASCH superfamily are likely to function as RNA-binding domains in contexts related to coactivation, RNA-processing and possibly prokaryotic translation regulation. Structural analysis of ASCH domains reveals the presence of a potential RNA-binding cleft associated with a conserved sequence motif, which is characteristic of this superfamily. Despite their similar structure, the ASCH and PUA domains appear to occupy distinct functional niches, with the former domains typically occurring in a standalone form in polypeptides, and the latter domains showing fusions to a variety of RNA-modifying enzymes.
Contact: aravind@ncbi.nlm.nih.gov
Supplementary information: A complete alignment of all ASCH domains in the NR-database and other domains found fused to the ASCH are provided as text files.


INTRODUCTION
Systematic analyses of the proteins involved in RNA metabolism have suggested that, despite the complexity of this system, the majority of proteins are constructed from a relatively small set of conserved globular domains (for a summary see Anantharaman et al., 2002). The phyletic profiles of these conserved domains, derived from large-scale comparative analyses of genomes from the three superkingdoms of life, show certain interesting features (Anantharaman et al., 2002; Koonin and Mushegian, 1996). Many of the RNA-binding domains, typically those present in ribosomal proteins, translation factors and tRNA- and rRNA-modifying enzymes, are widely represented across the three superkingdoms of life. These appear to be ancient innovations, which were originally utilized in core RNA metabolism processes that are likely to have been already present in the last universal common ancestor (LUCA) of all cellular life forms. In some cases, a subset of these ancient domains also appears to have been secondarily recruited to many of the unique eukaryotic innovations such as splicing, post-transcriptional gene silencing, mRNA capping and polyadenylation, and nonsense-mediated RNA decay (Anantharaman et al., 2002; Clissold and Ponting, 2000). Identification of these ancient RNA-binding domains has helped considerably in uncovering aspects of RNA–protein interactions that hold good across a wide range of biological functional contexts, and in clarifying the roles of uncharacterized conserved proteins from phylogenetically distant organisms (e.g. see Cerutti et al., 2000; Fatica et al., 2004; Ishitani et al., 2002; Korber et al., 1999; Reid et al., 1999).

Given these antecedents, we were interested in the identification of any potentially novel ancient conserved domains that might throw light on poorly understood ribonucleoprotein complexes that have been identified in the cellular transcription apparatus. The activating signal cointegrator 1 or the thyroid hormone receptor interactor protein 4 (ASC-1/TRIP4) is a transcriptional coactivator that is widely conserved in eukaryotes and is part of a potential RNA-interacting protein complex (Jung et al., 2002; Kim et al., 1999). ASC-1 directly interacts with a wide range of unrelated transcription factors such as the serum response factor, NF-κB, AP-1 and nuclear hormone receptors, and has been shown to be part of a protein complex that bridges these specific transcription factors to the basal transcriptional apparatus (Jung et al., 2002). One of the proteins of this coactivator complex is an RNA helicase, while the other one has an RNA-binding KH domain fused to a 2H RNA phosphoesterase (Jung et al., 2002; Mazumder et al., 2002). ASC-1 itself contains a conserved cysteine-rich Zn-chelating domain, which binds transcription factors (Jung et al., 2002), and a conserved C-terminal domain, which has thus far not been characterized.
Using sensitive sequence profile searches and structural comparisons we show that the C-terminal domain of ASC-1 defines a superfamily of domains that is widely distributed across the three superkingdoms of life. We show that this superfamily assumes a protein fold that was originally observed in the RNA-binding PUA domain. Our findings suggest that this unique β-barrel fold, which is encountered both in the new superfamily of domains typified by the C-terminal domain of ASC-1 and in the PUA superfamily, defines an ancient structural theme in RNA–protein interactions.

SYSTEMS AND METHODS


To whom correspondence should be addressed.

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

L.M.Iyer et al.

The non-redundant (NR) database of protein sequences (National Center for Biotechnology Information, NIH, Bethesda) was searched using the BLASTPGP program (Altschul et al., 1997). Iterative sequence profile searches were done using the PSI-BLAST program, with either a single sequence or an alignment used as the query and a profile inclusion expectation (E) value threshold of 0.01, and were iterated until convergence (Altschul et al., 1997). For all searches with compositionally biased proteins, the statistical correction for this bias was employed (Schaffer et al., 2001). Multiple alignments were constructed using the T_Coffee program, followed by manual correction based on the PSI-BLAST results (Notredame et al., 2000). Hidden Markov models (HMMs) were built from alignments using the hmmbuild program, and searches were carried out using the hmmsearch program from the HMMER package (Eddy, 1998). Protein secondary structure was predicted using a multiple alignment as the input for the JPRED and PHD programs (Cuff and Barton, 2000; Cuff et al., 1998; Rost et al., 1994). Preliminary clustering of proteins was done using the BLASTCLUST program with empirically determined length and score threshold cut-off values (for documentation see ftp://ftp.ncbi.nih.gov/blast/documents/README.bcl). Previously known, conserved domains were identified using PSI-BLAST-derived profiles with the RPS-BLAST program (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (Schaffer et al., 1999).

Structure similarity searches were conducted using the DALI program (Holm and Sander, 1995). Structure manipulations and the construction of ribbon and surface diagrams were performed using the PyMOL program (Delano, 2002). Gene neighborhoods were obtained by isolating all conserved genes in the neighborhood of the gene under consideration that showed a separation of less than 70 nt between their termini. Genes fulfilling this criterion were considered likely to form operons. Gene neighborhoods were determined by searching the NCBI PTT tables (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome) with an in-house PERL script.
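The gene-neighborhood criterion described above (adjacent genes whose termini are separated by less than 70 nt) can be illustrated with a minimal Python sketch. This is not the in-house PERL script: the record layout `(name, start, end)`, the function name `group_into_neighborhoods` and the example coordinates are hypothetical, the gap is assumed to be the number of nucleotides between the end of one gene and the start of the next, and strand is not checked because the text does not mention it.

```python
from typing import List, Tuple

# Hypothetical gene record: (name, start, end), 1-based inclusive coordinates.
Gene = Tuple[str, int, int]

MAX_GAP_NT = 70  # threshold from the text: separation of less than 70 nt


def group_into_neighborhoods(genes: List[Gene],
                             max_gap: int = MAX_GAP_NT) -> List[List[str]]:
    """Chain adjacent genes whose termini are separated by fewer than max_gap nt."""
    ordered = sorted(genes, key=lambda g: g[1])
    groups: List[List[str]] = []
    prev_end = None
    for name, start, end in ordered:
        # Nucleotides between the previous gene's end and this gene's start
        # (assumed definition); overlapping genes give a negative gap and
        # are chained as well.
        if prev_end is not None and start - prev_end - 1 < max_gap:
            groups[-1].append(name)
        else:
            groups.append([name])
        prev_end = end if prev_end is None else max(prev_end, end)
    return groups


# Made-up coordinates, for illustration only.
example = [
    ("geneA", 100, 400),
    ("geneB", 450, 900),    # 49 nt after geneA: same neighborhood
    ("geneC", 1200, 1500),  # 299 nt after geneB: starts a new neighborhood
    ("geneD", 1510, 2000),  # 9 nt after geneC: same neighborhood
]
print(group_into_neighborhoods(example))
```

Groups containing more than one gene would correspond to the candidate operons; a production version would presumably also require the genes to lie on the same strand.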

RESULTS AND DISCUSSION

Identification of the ASCH domain


The ASC-1 proteins from animals are relatively large proteins (around 580–650 amino acids), and the only characterized globular domain in them is a unique Zn-chelating domain with 7 cysteines and 1 histidine. This domain was shown to be critical for the interaction of ASC-1 with specific transcription factors and is likely to form a binuclear metal cluster chelating two Zn atoms (Jung et al., 2002). Given that other polypeptides of the ASC-1-containing complex have characteristic RNA-interaction domains, we further investigated the ASC-1 proteins to identify potential links to RNA interaction. Analysis of the human ASC-1 protein with the SEG program revealed that it contains additional globular segments, including a C-terminal globular segment (gi: 6013191, 434–581), which in searches of the NR database with BLASTPGP gave significant hits to the proteins SAP1p60 from the bacterium Streptomyces avermitilis and Mbur03000455 from the archaeon Methanococcoides burtonii (E = 10⁻⁵ and 10⁻³, respectively). This region of similarity did not map to any previously published protein domain and more or less encompassed the entire length of the prokaryotic proteins, suggesting that it might define a novel protein domain. Further iterations of the search retrieved a large number of uncharacterized proteins from vertebrates, prokaryotes and bacteriophages, such as LOC541578 from Homo sapiens (iteration 3; E = 10⁻³), gp69 from the Mycobacteriophage Che9c (iteration 2; E = 10⁻⁶), PF0238 from Pyrococcus furiosus (iteration 3; E = 10⁻⁴) and the TTC18981 protein from Thermus thermophilus (iteration 3; E = 10⁻³), whose crystal structure has been determined (pdb id: 1wk2). All the sequences showed a highly conserved GxKxxxxR motif that they shared with the ASC-1 protein. At convergence, the search also retrieved several proteins with the GxKxxxxR motif with E-values of border-line significance (E > 0.01).

In order to retrieve all possible homologs for a comprehensive analysis, we conducted transitive sequence profile searches seeded with several homologs of the ASC-1 protein that were recovered in the above search. As a result, we recovered several additional significant hits (E < 10⁻²) from diverse species from all three superkingdoms, including proteins whose structures have been determined as part of various structural genomics projects, such as the uncharacterized proteins YqfB (pdb: 1TE7) from Escherichia coli, PF0455 (pdb: 1S04) from P.furiosus, and EF3133 (pdb: 1T62) from Enterococcus faecalis (Fig. 1). Some of these proteins had been classified into separate families of domains of unknown function, DUF437, DUF984 and DUF1530, in the PFAM database (Bateman et al., 2004). The sequence affinities between the proteins recovered in the above searches were also independently corroborated by searches with HMMs derived using a seed alignment of the originally detected set of ASC-1 homologs. Furthermore, comparisons of the predicted secondary structures for different subgroups of these homologous domains with the above-mentioned proteins with X-ray or NMR structures showed complete congruence, indicating that these 103–120 amino acid long domains define a novel monophyletic superfamily (Fig. 1). We refer to this superfamily, containing over 180 distinct representatives in the NR database from viruses and cellular organisms belonging to all three superkingdoms of life, as the ASC-1-homology (ASCH) superfamily.

Structure similarity searches with members of the ASCH superfamily showed that it contains a fold that was previously noted in the PUA domain (Fig. 2) (DALI Z-scores 4.5–6).
For example, DALI searches with the Thermus TTC18981 protein (pdb id: 1wk2) retrieved the PUA domains from pseudouridine synthase (pdb id: 1k8w, Z-score 4.8), ATP sulfurylase (1g8f, Z-score 4.8) and Archaeosine tRNA-guanine transglycosylase (1k8w, Z-score 3.6), in addition to the bona fide ASCH proteins derived from structural genomics projects (pdb ids: 1t62, 1xne, 1zce, 1t5y, 1nxz; Z-scores 4.6–5.8). The PUA domain is an ancient RNA-binding domain, which is fused to the catalytic domains of a variety of RNA-modifying enzymes such as pseudouridine synthetases of the TruB family, the archaeosine transglycosylase, Rossmann fold methylases, YggJ-type SPOUT domain RNA methylases and thiouridine synthases, and also occurs as standalone forms (Anantharaman et al., 2002; Aravind and Koonin, 1999; Forouhar et al., 2003). However, PUA domains were not recovered in any of the sequence profile searches seeded with the ASCH domain or vice versa, suggesting that these two classes of domains form distinct sequence superfamilies, despite sharing a common fold. We propose that the fold be renamed the PUA-ASCH fold to reflect the two distinct superfamilies of the fold.

The ASCH domains contain a conserved core of five strands that form a β-barrel, and a characteristic helix between strand-1 and strand-2 (Fig. 2). Additionally, most versions of the ASCH domain, unlike the majority of PUA domains, contain a long insert between strand-4 and strand-5 that usually forms two or more helical segments (Fig. 2). In terms of sequence conservation, the most characteristic feature of the ASCH superfamily is a GxK motif (where x is any amino acid) that is found in the distinctive turn between the core helix and strand-2 (Figs 1 and 2). Members of the ASCH superfamily also contain a highly conserved polar position, two residues downstream of this GxK motif, which is typically occupied by either glutamate or threonine (Figs 1 and 2).
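As an illustration of the GxKxxxxR signature discussed above, the following minimal Python sketch scans a protein sequence for the pattern with a regular expression. It is only a motif scan and not a substitute for the profile searches used in this study; the function name `find_gxkxxxxr` and the test fragment are illustrative assumptions.

```python
import re
from typing import List, Tuple

# G, any residue, K, any four residues, R. The lookahead lets overlapping
# occurrences be reported as well.
GXKXXXXR = re.compile(r"(?=(G.K.{4}R))")


def find_gxkxxxxr(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each occurrence."""
    # Alignment gap characters are removed before scanning.
    seq = seq.upper().replace("-", "")
    return [(m.start() + 1, m.group(1)) for m in GXKXXXXR.finditer(seq)]


# Short illustrative fragment (assumed input, not a full protein sequence).
fragment = "PWASLLVRGIKRVEGRSWYT"
print(find_gxkxxxxr(fragment))
```

A pattern this short will also match unrelated proteins by chance, so hits are only meaningful in combination with the profile-based and structural evidence described in the text.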

Secondary Structure asc-1_Cele_14530648 ASC-1_Hsap_6013191 CG11710_Dmel_20129087 Tb927.5.3590_Tbru_62176712 LOC541578_Hsap_16306914 OSJNBa0033P04.18_Osat_31193905 AY085136.1_Atha_21537348 SCP1.114_Scoe_21234110 69_BPChe9c_29424900 all7271_Ana_17233287 all7370_Ana_17134569 Mbur03000455_Mbur_46142641 TTC1891_Tthe_46200193(1wk2) TK2233_Tkod_57160492 PH1922_Phor_14591666 Mfl470_Mflo_50365287 orfX_Llac_2327031 MK0523_Mkan_20093961 PAE0017_Pyae_18311656 APE1779_Aper_14601620 rpL22_Upar_13357796 MYPE10090_Mpen_26554457 BMEI1226_Bmel_17983209 Tery02001359_Tery_48894438 LA3196_Lint_24215895 SpyM3_1233_BP315.4_28876347 LJ0305_Ljoh_42518395 plu4361_Plum_37528189 MMP0784_Mmar_45358347 ORF5_BPpsiM2_3249590 ZMO0922_Zmob_56543392 MTH1209_Mthe_2622319 NCU09867.1_Ncra_32420691 MG03322.4_Mgri_38105284 PF0455_Pfur_18976827(1S04) At3g03320_Atha_32189307 TK1837_Tkod_57160096 PH0355_Phor_14590265 lp_3017_Lpla_28379445 Ssui801001492_Ssui_50590499 TK0055_Tkod_57158314 PH0447_Phor_14590361 Ooen02001772_Ooen_48864792 Lmes02000490_Lmes_23023726 BH0816_Bhal_10173430 Exigu03000404_Exsp_53771700 GBAA3599_Bant_47528885 AY084222.1_Atha_21536491 MK0492_Mkan_20093930 EF3133_Efae_29377587(1T62) BT1423_Bthe_29338730 Desu02007233_Dhaf_53682585 lp_0320_Lpla_28377242 Rgel02000913_Rgel_47575656 ECs5377_Ecol_38703857 ECA0252_Ecar_50119212 SC4065_Sent_62182635 SCO4741_Scoe_21223120 BL0764_Blon_23465341 y1855_Ypes_22125748 BlinB01003512_Blin_62422944 VV12528_Vvul_27361979 CAC1000_Cace_15894287 yjfI_Llac_15672915 283 433 366 465 2 10 7 10 1 1 1 60 4 5 1 1 1 1 1 10 189 294 101 3 38 4 1 1 1 1 8 1 639 586 1 105 2 1 1 1 2 3 1 8 1 1 1 105 235 22 4 25 26 20 23 12 20 51 11 10 21 26 29 17

Str-1 Hel-1 Str-2 Str-3 Str-4 <-------variable helical insert------------> Str-5 ...EEEE... HHHHHHHHHH.....EEEEEE...................EEEEE... ....EEEEEEEE..........hhhhhhh....... ...........hhhhhhhhhh .EEEEEEEEE.. KGYVLALAQ- -----PTASFLV--NGLCRFIRWPTDL-------NLK---GPLFITAKAG29AVLGRAFLQECMMLE-------DFLDKYGSKSAI-- ----------------PGGD-3 VLCFSAFEPLVS GGWCLSVHQ- -----PWASLLV--RGIKRVEGRSWYT-------PHR---GRLWIAATAK31CLLGCVDLIDCLSQK-------QFKEQFPDIS---- ------------QESD-----3 VFICKNPQEMVV -RQCLSMHQ- -----PWASLLV--AGIKKHEGRVWYS-------EHR---GRLWIASTSK32SLLGCVHVDSCLPQE-------EYRELYPN------ ------------GESE-----3 VFVCTKPEQLNI NGTCLSLHQ- -----PWAGLLV--AGIKVHEGRVWST-------DYR---GRLWIHAASS29VLLGYVFLMDCMDRE-------RYEENYTPEQ---- ------------RQEE-----3 SFICAVGKTLPF KFGCLSFRQ- -----PYAGFVL--NGIKTVETRWRPL----LSSQRN---CTIAVHIAHR35VIAGLVDIGETLQCP-------EDLTPDEVVELE-- -----------NQAVLTNLK-3 LTVISNPRWLLE RNPCLTMHQ- -----PWASLLV--HGIKRVEGRSWPS-------PLT---GRLWIHAASK35RLLGCVEVVGCVTSQ-------ELASWEHVPQSV-- -----------RLEALTD---3 LCENPQKLVVPF KNPCLTMHQ- -----PWASLLV--HGIKRIEGRSWPS-------PIR---GRLWIHAASK35RLIGCVEVVGCVTSD-------ELQNWDALPQGV-- -----------RLEGQTN---3 LCEKPQKLIIPF SVRGITIKQ- -----PWAACIL--HGDKRVENRPRAW--------TP---GWRLLHAGAE23AVLGVVRITGSHTDT---------GEPCSPWA---- -------------QP------3 HLVLADVHALAL -MRAITVRQ- -----PWAWQII--NQRKNIENRTRNI-----AGKYR---GPAAIHAALK26VILGVVDLVDVHQSA--------PFCCASDWG---- -------------ELLW----3 HLVLANPRPIPI -MKALTVRQ- -----PWAWAII--YANKNIENRVWPI-------HYR---GDILIHAAQK29QIIGVVRVVGCQFSE-----------TGSGWG---- -------------MPQQ----3 HLDNPRTITPIP -MKALTVRQ- -----PWAWAII--YANKDIENRGWAI-------HYR---GDILIHSAKG29EIIGVVRVVGCQFSE-----------TAASWG---- -------------MPQQ----3 HLTNPRAITPIP KIRVLAIRQ- -----PWASLII--RGLKNIEVRSKNT-------YVR---GTIAIYASRS27MIIGTVNLVECKEYE----SDFHFKLDQRRHLNP-- -----------EESYSKNI--3 FLKSPRPIE-PV PKLGLIVRE- -----PYASLIV--DGRKVWEIRRRK-------TRHR---GPLGIVSG--1 RLIGQADLVGVEGPF----SVEELLAHQEKHLAE-- ---------EAFLRAYAKDEP4 VLENAFRYEKPL KKKGLIVRE- 
-----PYATLIV--EGEKVWEIRKSR-------TKIR---GEVLIISN--1 KAIGKAELVDVLGPF----SPEELAEHFDKHRAE-- ---------PEFLKEYSNGKP4 VFKNAEKFEKPK -MKGLIIKQ- -----PYANWIV--EGKKVWEIRKTP-------TKIR---GRVIIISE--1 KAVGSVEIVDVLGPF----TPEELANHENKHLAS-- ---------YDFLKRYANGKK4 VLANPEKYDPPI MDLILSMKK- -----EYFDLIK--TGKKKYEYRFKMPEL---NINDKL--YLYIPKSKA- ---GIYGYIKIRDVK---WMSKDEACTFYSNQ---- ----------FKDKEAAYQI-5 SHRCGCFVLTIS MTPIMSFWP- -----SIYDKIK--NQIKLIEYRRTFP-----NDCKYA--YMYITKPVK- ---AIGGIVYFGKKH---DLDDWKKQYSNNT----- -----------IISDR-----5 QSYRYGMEIIGF ----MSVKP- -----KFANLIL--DGVKNVEVRRWLPGT--ILRERTC--IVYASSPLC- ---AVLGEVTIEEIK---KVAIQSEEDLAEI----- ---------AELAKASEDE--5 EGRDHAYLITLD ----MSIKP- -----KFGEEIL--KGLKKYELRRLVGPL--VEPGDLL--YLYFTKPAG- ---AVIGRFTAGIVF---LVPAGSISRLLGE----- ---------LGDVGIGAED--4 KGAKYAMLIQVK ILYLMSIRP- -----QYARAIM--AGRKKYELRRIHGVP-PIEEGSII--IVYASGNVK- ---SIIGEFQAKRVI---QATPEKIWSIASK----- -----------PGSGVGED--5 RGAKRAIAIEVG ETIMISTSP- -----KNAQVLFD-DLEKNVIFYKTTP----INKVLRV--LVYVTSPTK- ---KVVGEFDLESVE---IGAISSIWRKYSK----- -----------QSVISKKE--5 EGKDKAHALVSK AIIMISTKK- -----EYADNLLNEDLNKNVFFYKVTP----VNPIKRV--LIYTTTGKK- ---GVVGEFDLDKIE---ILAVSTAWKKYGQ----- -----------QSCMSKKD--5 KDSEKAHILISK RDLVISIKP- -----NYSGKIF--DGVKTIELRRRFPL--SIAAGATA--YIYSTSPEM- ---ALVGTIKIENVE---RLQLRLLWKKHGQ----- -----------SASIKKAD--5 SGLEEGFALKLS NTVLLSIKP- -----EYADKIFY-QKTKKVELRRVFP---SLDKDDLV--IVYVSSPKK- ---AVVGYFKVKKII---KKNISYLWDEVEE----- -----------KAGITQEE--5 SGLDLGIGIFFQ RVVFLPIKP- -----EFAHKII--NGEKNIEFRKKFS----SQEVETI--VIYSSSPEK- ---RVIGYATVDSIV---IDTPDSLWKRFYK----- -----------KGGIDKDR--5 NGKETGVGIRIK IEAIISIKP- -----QFVDEIM--KGNKRFEFRKSFL----KSIPDRC--YIYSTKPVG- ---KIVGFFTIKQVL---RDEPEKIWQKTSK----- -----------KSGITKDF--5 SGKNKAVAIEID MKILLSIKP- -----KYVSSIM--NGTKKFEFRRKIF---KRKDVDTV--VVYATKPIG- ---KVVGEFEIKQVI---SDTPKSVWNMTAQ----- -----------YAGLDKLD--5 EGLDEAFAIQIS MKVLLSIKP- -----EYVDRIL--DGSKKFEFRKIAF---KNNQVQSV--VIYATMPIG- ---MIVGEFEIKEII---SNSPSVVWEMTHK----- 
-----------FAGTTKDF--5 EGREKAVAISIG MTILLSIKP- -----KYVKEIL--NGSKKYEFRKSIFK--KYDKNELV--FIYSSYPVK- ---RIVGTFSVGDII---ENCPKILWNEFKN----- -----------VSGIKESE--5 KEKDKGYAIQIN MKVLLSVKP- -----KYVEKIM--SGDKRYEFRKTIW----KKKINEV--YIYSTTPEK- ---KIVASFTYDKVI---KEDPQTLWEFYHE----- -----------ESGLTQGE--5 RGIKKGYAIPIK KEAVISLWP- -----EFAKAIV--SGKKTVEFRRRIP---LPALSARI--WIYATRPVK- ---SVIGFAYLEAIV---QGDVNTLWSRYGR----- -----------EAFLSEQQ--5 EGTEKATAFLLR -MVKGVKHPV ---LAEYAKRIH--DGEKNGFLRV-LPG--RFSPGDKF--VIYESYGQY- -----RGWADIVSIQ---KMPKGKMISEYGL----- -----------RLMITEDE--5 GGRKEMTIIEFK SDVIIVVPA- -----RHLEEIV--AGTRDHEFRPHV----LPAVRA----WFYATEPVN- ---EVKYMATLGEARK--PGEIEGGSGVGNE----- ------------EFNR----- GEPNKFAHKLLQ SDIVVTMHP- -----SKVKEIV--DGIRNHDFRTVK----LPRSVHRL--WIYVARPVC- ---ELRYMATIGEPHE--PGEIDSNTGLGNA----- ------------EFNE----- GKTSARWAYELE MEWEMGLQE- -----EFLELIK--LRKKKIEGRLYDE---KRRQIKP---GDVISFEG-- -GKLKVRVKAIRVYN----SFREMLEKEGLENVLP- -------GVKSIEEGIQVYRR9 YGVVAIEIEPLE VNFELHVQE- -----PYFTQLK--DGLKTVEGRCAVG---DYMRISS---GDFLLFNK-- -CLLLEVQDVHRYT-----SFSEMLKVEGLAKVLP- -------GVESIEEGVQVYRN9 NGVVAIRVAKPA ARWKMGLQE- -----EYLRAIA--EGKKKIEGRLYDE---KRQKIKP---GDEIVFE--- -NKLVCVVKDVRVYS----SFREMLEKEGLENVLP- -------GVKSIEEGVKVYRK9 YGVAAIEVEPVA MKWEMGLQE- -----EYIELIK--AGKKKIEGRLYDE---KRRQIKP---GDIIIFEG-- -GKLKVKVKGIRVYS----SFKEMLEKEGIENVLP- -------GVKSIEEGVKVYRQ9 YGVVAIEIEPIE -MTTMQLIH- -----PQWLLIK--SGLKTIEIRLNDA---KRQALQV---GDIVNFIDLT1 GQQLTTQLIDITRFA----SFESLLSEYTAVQVGSA -------PGTPVTQMVQEMLT9 SGVVALQVRPLI ----MLLAP- -----KPFEMMK--SGQKTIELRLYDE---KRKHIQI---GDRIRFYCTE2 TQTIEVQVLDLHIFD----NFAQLYKELDLLSCGYT -------QSSIRGAKPEDMED9 YGAVGIELRVID KVYRLFVKD- -----EYLNFIR--SGEKRIEVRVAYP---QFRNIRP---GDKIIFNDS- ---IPAVVTEVKKYE----TFRQVLREEPIKKIFP- -------DEPSFERAVKRFHN9 YGVIAIKFKLLG KVYRLYLRD- -----EYLEMVK--SGKKRIEVRVAYP---QLKGMKR---GDKIIFNDE- ---VPAEVIEVKHYE----TFRQVLREEPIDKIFP- -------DEPSFERALRRFHN9 YGVIAIKFKLLG ----MGLNH- -----SQFLLMQ--QGDKSVEIRLNDR---KRSFLKE---GSLITFIDLK1 
DKKIEVIVKKIYKFK----TFCELYKSFTSTEVGSA -------TNDSLEKMVNDTYK9 FGVLAIRIRLIH VIMLMGLNH- -----DQFVLVQ--RGTKTIEIGLYDE---KRAQLKI---GHKILFTDLE1 NNQIMVSVKQLYKFT----TFADLYAQFNGAKVGSN -------STDNIEKIVNDTYE9 YGVLAIEMLLGD MKHAMGLFK- -----VPFESIK--AGRKTVEVRLNDA---KRRQVAV---GDTIEFTKLP2 EETLKVVVTKLRSYD----TFEAMYKEIPFEAFDC- -------EGWTMDEMLDGTYE9 WGALPIYVERM----MGLYE- -----EPFHSIQ--TGKKVIEIRLNDQ---KRQAIKV---DDLIEFKNLS1 GEQLTVRVTKRETFK----DFQSLYEKISLEAIDC- -------IGWSMPELLKSTYA9 FGALALTIEIIE MRYEMGLYN- -----KPFQSIQ--SGKKVYEVRLYDK---KRQLINK---GDEIVFTNLT1 KEMMAVKVTEIKRYE----SFKVMYEQIDKKLMDC- -------ENDSLEEMLESTYK9 WGTVAIGIEVIK VNFELHVQE- -----PYFTQLK--DGLKTVEGRCAVG---DYMRISS---GAFLLF-NKC ---LLLEVQDVHRYT----SFSEMLKVEGLSKVLP- -------GVESIEEGVQVYRN9 NGVVAIRVAKPA RVHHMGLEE- -----EYLNLIK--EGKKTVEGRVKDD---KRARIKP---GDKILF-NRR ---LLVKVIDVREYD----SFEEMLREEGLENVLP- -------NVDSIEEGVEIYRR9 FGVLAIEIEPIM MPDVWMFGDG1SEMGNRLGQLVV--SGRKTATCSSLDIYK-MEEEQLPKA-GQYDIILDGQ --SQPLAIIRTTKVEIMPMNKVSESFAQAEGE---- GDLTLDYWYEEHARFFKEELA9 MLLVCQSFEVVD IKHRIQFGCD ---TDELADKVL--SGEKTATSSLYDYSL-MNQEEIKV--NECASILDSQ --GKEKCIVKIERIEIVDFQDITEEFAVNEGD---- G--CLDNWIKIHTEYYSSLLE9 TKLVCEWFSVVS DYAVCSFGDN PRDAAELAALVL--AGTKRATASLARDYP---ADGLPRV-GDYVVVLDGD --GRPCCLWRTSEIQIKPMREVDAAFAWDEGE---- GDRSREDWLRGHRDYFSAQAR9 IAVVFERFRIVW SYQTRWFGQQ2PAEVTALADAIL--AGTKTATTTPLDSYT-AEQIAIPQV-GDYNVLLNGD --MKPVAILKTVVSELIPFYRISAEHAYHEGD---- GDRTIGDWRKRKTAEFTPTLE9 TPMVSEVFEVVY FYEPFHFHDN NQGALELATLVL--QGTKRGSASLVWTYE-HDNKRPPWP-GSLSIMTDWL --GTPLGIIETRTIEAVPFDDVTAEFAASEGE---- GDLSLRYWRQGHWDYFTRECR9 MLVFCERFDLVF HASFCTFGDS AALADHLATLIA--TGVKTASCGSLAGCI---EDNAFPMIGEYKIVENSR --GEPVCVIRVIGLHLLRFSDVTAELARKEGE---- GDLSLEYWRNEHRRFFQAEGS4 MDVIFEEYALID DALRWAFGDS SELADELLQLVR--EGHKTASCGSYHAFK---SEPSPQV-GDYNIILDGA --GQPSCVIRTRSLTLVRYCDVTAEMAAKEGE---- GDKSLAFWQQAHQEFFEREGS4 MLLVFEEFELVE QAFCWSFGDS PALADELAALVV--AGKKRGTCSSLVSYQ---KEQPPVTPGSYHIVLNGT --GDAVCVIRTLALRLIRFNEMSADLAALEGE---- GDLSMAYWQAAHRAFFEREGN4 MELVYEEFAVLE SLPRAEFGFP 
GPLRDRLVAAVL--DGSKTSTTGLVADYE-HEGEPLPEA-GRREVVVDSR --ERPVAVIEVTEVRVVPLAEVDLAHAVDEGE---- GDTSVAGWRAGHERFWHGAEM14TPVVLERFRIVT DLPKAEFMFP GPERDRLVKLIL--DGVKTATAALMIEYE-EEVEPLPCV-GAHSVLVDSD --ERPVAVLVTTAVDVIPLGKVTDRYAIDEGE---- GDVTAAQWRSAHESFWNSAEY14SLIVFEHFEVEQ DVERWAFGDT EQVADELLALVL--NGTKTATCAALD------DEGVPQA-GDIFVVVNGR --NQPACAVELTEVELKTFDQVDEAHALAEGE---- GDSTLAYWRKTQQRFFEEYDM4 MMLICMKFKVLE APQPWAFGAT PEHADDLLALVL--DGTKTGTASALCDIE-AEGEEVPRV-GEVSIILDGR --DSPRAVIETTAVVTVAFDEVTAEHAHAEGE---- GERTLAAWREIHERFW-RDYS9 MLVVCERFRLVF SFSSDYFCAD EHNANLCAELIL--RGEKRASCSLDYWYS-EKGEPMPVV-GHLQVVTNWD --GIPICIIEMTSVTKQKYSEVTPEFAALEGE---- GDKTLAWWREAHWNFFSKECV9 MLLVLEQFKVVY TYSSWYFGGD KNSANNLADLVM--KGTKRGTTSLYYFYE-LENDPLPKK-GDFSIITDYS --GTAQCVIETTKVTVLPFKEVTEELAFIEGE---- GDKSLDYWKRAHISFFTKELE9 MLVVFEEFKVVY FRWAWAFCNE7PELADKLLDLVL--EGKKSATASAVAEYG-E-DEPFPSVDGKFDILLDGK --GQPRAAITTSKVYVRNFFDVSAEHAFKEGE---- GDQSLDYWRKVHQDFWSDLKV4 MEVLCEEFEVLY

399\Zn cluster+X1+ASC-1\ ASC-1 549| | proper 480| | family 579/ | 132 | 135 | 132 | 113 | 112 | 110 | 110 | 180 / 102\Family 1 103| 98/ 99\Family 2 91| L32->L33->ASC-1 95| 94| 107| 284| L22+ASC-1 390| S3+ASC-1 197| cI HTH+ASC-1 99| 132| 98| 96| 96| 97| 95| 103| 96| 753| 672/ 109\Family 3 212| 109| 109| 113| 111| 109| 110| 110| 121| 113| 109| 113| 212| X2+ASC-1 342/ 150\Family 4 125| 150| 155| 147| 144| 132| 141| IclR HTH ->ASC-1 183| 143| 127| 147| TetR HTH->ASC-1 153| 156| 146|

Fig. 1. Continued

Predicted RNA-binding in ASC-1 coactivators

4
39 57 20 25 23 26 25 22 23 20 3 9 2 1 1 1 1 1 1 17 2 4 3 2 4 23 4 2 3 1 8 3 2 17 1 1 2 4 2 11 137 6 2 2 4 2 2 2 111 7 2 30 2 29 LQSAYQFGSSIDAWQFGAEIDAWAFGVEVDVWAFGDA HYEAWAFGNT KYSAFAFGWP KFEAWSFGNS LEGAYAYGAN PVEAWAFGAT EPESWAFGARYVELGPYLR RVQYLGRHIM KVVQIRKFIL -----MERIN ----MIRHLM -----MKHLE -----MKNLE MTYGISKHLE -----MRHLE GEGKEVKNLK KTVQIRKFIL KPVQIRKFML PNDITFFQRYSCITFFQRSTQITFFEFPTKMTFFSRPTKMTFFSRLTEITFFERSNKITFFTRMNNITFFSRNREITFFGRPNDITFYQRSKIVGVTYPI SFMKAVTFPV MEHVIALHQMQHVIALHQPRSALSIVKDAWALSIVAKHKTLSIVHSTHVLKTDPKTHRLKILPRIHQLKIAPTTHKIKLSHNRHDLKTWPKIHEVKLHATQHLLNCYPAYHQLKIKPTVHKLKILPTIHELKILPTIHTLKIETRTLVLPLKGANLQLAVKGANLQLAVKGANLQLAVKG.......... ....h.h... --DPDKLAQLVL--TGTKTATTSAYDLYE-T-DEPLPKV-GAYDVILDAH --NQPVCVTRTDQVMITPYLDIDATHAYLEGE---- GDRTYAYWRRVHDAFFKQEYQ10AQMVLERFHVVY --EPDQLAQLVL--KGVKTATASAYDLYK-VDNEPLPQK-DSYDVILDSQ --NQAICIIQIIKVSVVPFKEVSDEHAFKEGE---- GDRSLTWWQDSHRKIFTQWFK9 SLIVLEEFRCVY --EPDLLADLVF--KGEKTATASAYDLYV-LEDEPLPQV-GTFDVILDSQ --NQAVCIVEITKVSVKPFNRVSADHAYKEGE---- GNKTLVYWRQVHEDFFRDCLG9 SKVVLEEFRKVY PDHADGLAKLVL--EGKKTATSSSYRNYL-ADGDSLPER-GLHNIILNSY --GEAVAIIVTTDVEIVPYHEVTEEHAYLEGE---- GDLSLHYWQEVHEAFFSRELK9 IPVVCERFELVY EEMADELAELVL--AGKKTATSSNYTLYE-LEQESLPKV-GQHHVLLNGK --GEAVAVIMTTAVTVIPYNQVTEEHAFLEGE---- GDRSLAYWQDVHEPFFTQELE9 IPVLCEQFKVVY ESMQNELAELVK--SGIKTATTSAFELYE-P-NEEIPLI-GEYNIILDGR --GNPVCITQTKVVETVPFNLISQEHAFHEGE---- GDRSYDYWRKVHEDFFNKEFK9 APCICEVFELVF ARMANELGGLVM--DGIKTGTSSLFYWYD-QGGETMPSV-GSHVVLLDGN --EEAMGIIKLIGVTIMPFNEVPETFAYLEGE---- GDRTLEYWRKVHTSFFTNECV9 ALVVCEEFEVVY SQDADILSDLIN--RKIITATTS---VYD-H-NEDVPTV-GMYSIVLNGK --NEPVCVIQNKTVEIMPFKNVSAEHAYLEGE---- GDRSYEYWYKVHKKFFTWECK9 TEVVCETFIKVD QKDANALASLVD--KGIKTATTSAYELYG-K-DEKLPKV-GEWSIILDSN --AKPVCVIKDVCAEIISYNLISQEHAYHEGE---- GDRSYAYWRKVHDEFFTREYK9 APMVCKVFEKIK --EADRLADLVA--RGIKTSTSSAHALYA-VEGEEIPTA-GGYDIILDGQ --GKAVCIIQTTKVYVTPFSQVTEKHAYKEGEFRQG6KEKSLIHWRQVHEELFTIWLA9 MLVVCEEFELVY --FKRKYLEGIL--DGKKRVTVRYGI--------VRPR--FSLVYIVCCD 
---HIYGEAIITKVYYTRLEKVGQDVIEAEGF---- G---------SREELVSELKE8 DTVSVIFFSLVR --IKGVYADKLL--EGRKTTTIRLGI--------VKPR--YREVIIHGHG ---RPLAKAEIVDVHVKKVSELTLEDARRDGF---- R---------SVRELIESLER9 DAVTIIHLRVKQ --LDNKYKSKVI--SGKKVTTIRFGKYE------AKP---GTEVYIVITP -SDTAIAKAKIKGIRTKKVKELTIEDAKLDGF---- S---------DVKELVRELSR8 DEVTIIEFEDVE --FDSEFVERII--NGEKITTVRRGIKS------YPV---GRIVELTANG ---ERFALAKVKKVVVKRVRELTDEDAIRDGF---- K---------SREELISALKR8 EFVTVVHFEVVK --LSQKYVKALL--DGRKRSTIRPGV--------LKV---ADRVYIHSMG ---KIVAIAEVEQVAYKRVSELTDEDAIIDGF---- N---------SRAELISYLKR8 AIVTIVKFRKVE --FKGKYAKALI--EGRKRLTIRKWTN-------LKE---GDEVLVHSGG ---KIIGKAKIKAIKKRHVSEITDEEARLDGF---- R---------NAEELMEEMKK4 GEVYVIHFDFEP --FKGEYKDLII--SGKKVATIRIGKLP------IKK---GQKLYIHSGG ---YVIGEAIVKDVKIIRLKDIDNDMAKKDGF---- E---------SKEELIAKLKE8 EPLTYIEFEFKP --ISGEYRDKLL--RGEKRATIRVGRVP-----GARP---GKVVYIHCGG ---YVYGKVRITNVRTKRVRDLTDEDANLDGF---- E---------NREELLKALRD8 DIVTIIEFEWVE --FDGRYAEDIL--RGKKRATVRLGRKP-----NLKE---GDTVLIHAGG ---YALGKAVIERVESKTVGELTDEDAFLDGF---- S---------SREELIRALKE8 SPAHVIVFRLIE --FDGRYKDDII--SGKKKATIRLGRKV-----NLKP---GEEVLIHAGG ---YVLGKARITRVTTKKVSELTDEDARKDGF---- K---------SREELLEALRE8 SPATIVEFEMLS --LDNKYKSKII--KGEKVTTIRFGKYE------AKP---GSEVYIVITP -SDTAIAKAKIKGIKTKKVKELTNEDARLDGF---- S---------DVKELVRELSR8 DEVTIIEFEDVK --IDSAYKSRIL--RGDKVTTIRYGDYE------AKP---GSEVYLVITP -SDTAVAKVRITKVEKKKVRELTNEDAKLDGF---- S---------DVKELLRELSK8 DEVTIIGFEVVK -----FQDDILA---GRKTITIRDESES-----HFKT---GDVLRVGRFE -DDGYFCTIEVTATS--TVTLDTLTEKHAEQENM-- -------TLTELKKVIAD---5 TQFYVIEFKCL-----LERSILS---GNKTATIRDKSDS-----HYLV---GQMLDACTHE -DNRKMCQIEILSIE--YVTFSELNRAHANAEGL-- -------PFLFMLKWIVRK--5 NDLFFISFRVVT -----LTPLVAS---GQKTITIRDKSES-----HYVP---GTRVEVFTLE -TQRKVCEIDILAVE--PLKFDEINEFHAEQEAI-- -------ELPKLKALIQE---5 DELYVITYQLAK -----FEADILA---GKKTITIRDESEK-----DYQP---GTTVEVSTLE -EGRVFCQLKILSVE--PIAFSELNEFHAEQENM-- -------TLATLKEVIQE---5 EQLYVIQYQRV-----FEADILA---GKKTITIRDESEK-----DYQP---GTTVEVSTLE 
[Figure 1, alignment panel: multiple sequence alignment of ASCH domains with 90% and 70% consensus lines, domain lengths and family brackets (Families 5 to 9). The alignment rows were scrambled during text extraction and are not reproduced here. Operon and architecture annotations recoverable from the right-hand margin: GntR HTH->ASC-1; R3H->ASC-1<-AraC-HTH; RRM->ASC-1.]


Row labels for the Figure 1 alignment (gene name_species_gi): lp_0072_Lpla_28377034, SMU.1636c_Smut_24378004, SP0796_Spne_15900689, OB0788_Oihe_23098243, ABC3755_Bcla_56965515, Ooen02001850_Ooen_48864737, lmo2852_Lmon_16804889, LBA1326_Laci_58337599, Lgas02000241_Lgas_23002250, Ssui801001519_Ssui_50590481, PAE0638_Pyae_18312063, APE0116_Aper_14600461, PH1656_Phor_14591425, AF1523_Aful_11499118, PAE2152_Pyae_18313136, AF1812_Aful_11499400, NEQ158_Nequ_41614954, MK1577_Mkan_20095013, TK0056_Tkod_57158315, PH0448_Phor_3256851, PAB2020_Paby_5457945, TK0687_Tkod_57158946, yqfB_Ecol_26249316(1TE7), ORF16_Mmar_33112505, VC1576_Vcho_9656085, VVA0554_Vvul_37676214, VV20047_Vvul_33112482, SO1922_Sone_24373486, tex_Pmul_15603312, Aple02000450_Aple_32033862, y2529_Ypes_22126411, HI1394_Hinf_16273304, AF2351_Aful_11499928, PH0907_Phor_14590760, TK1629_Tkod_57159888, PH0948_Phor_14590800, Bcepa03005180_Bcep_46312874, ACIAD0497_Asp._50083736, Bd1457_Bbac_42522970, Rgel02003094_Rgel_47572806, YPTB3135_Ypse_51597446, ORFd_Ecol_145149, SMU.194c_Smut_24376571, CC0765_Ccre_16125018, orf91_BPP2_3139126, gbs1152_Saga_25011203, VV10070_Vvul_27359672, lin1741_Linn_16800809, EF2118_Efae_29376627, EF0509_Efae_29375137, ND046_Psp._38638593, c1523_Ecol_26247392, HK022p32_BPHK022_19343381, ORFb42_BPStx1_32128099, Consensus/90%, Consensus/70%.

Fig. 1. Multiple alignment of members of the ASCH superfamily. Proteins are shown with their gene name, species abbreviation and GenBank ID (gi) number separated by underscores. The PDB codes of proteins with X-ray crystal or NMR structures are shown in brackets after the gi number. Columns in the alignment are colored based on the residue conservation profile at 90 and 70% consensus. Sample operons and domain architectures of interest are shown to the right of the alignment. The domains in the architectures are separated by a + symbol, whereas genes in operons are separated by a > symbol, with the > pointing in the 5′ to 3′ direction of the coding sequence. X1 and X2 refer to uncharacterized domains, which were found fused with certain ASCH domains. The Pfam domains of unknown function DUF437, DUF1530 and DUF984 include some of the representatives from families 1, 3 and 4, respectively, as defined by us. The consensus for residue conservation and the coloring scheme are as follows: h, hydrophobic residues (ACFILMVWY), shaded yellow; b, big residues (LIYERFQKMW), shaded gray; s, small residues (AGSVCDN), colored green; p, polar residues (STEDKRNQHC), colored magenta. The lysine residue that is characteristic of the ASCH superfamily is shaded red.
Species abbreviations are as follows: Aful, Archaeoglobus fulgidus; Ana, Nostoc sp.; Aper, Aeropyrum pernix; Aple, Actinobacillus pleuropneumoniae; Asp., Acinetobacter sp.; Atha, Arabidopsis thaliana; BP315.4, Streptococcus pyogenes phage 315.4; BPChe9c, Mycobacteriophage Che9c; BPHK022, Enterobacteria phage HK022; BPP2, Enterobacteria phage P2; BPStx1, Stx1 converting bacteriophage; BPpsiM2, Methanobacterium phage psiM2; Bant, Bacillus anthracis; Bbac, Bdellovibrio bacteriovorus; Bcep, Burkholderia cepacia; Bcla, Bacillus clausii; Bhal, Bacillus halodurans; Blin, Brevibacterium linens; Blon, Bifidobacterium longum; Bmel, Brucella melitensis; Bthe, Bacteroides thetaiotaomicron; Cace, Clostridium acetobutylicum; Ccre, Caulobacter crescentus; Cele, Caenorhabditis elegans; Dhaf, Desulfitobacterium hafniense; Dmel, Drosophila melanogaster; Ecar, Erwinia carotovora; Ecol, Escherichia coli; Efae, Enterococcus faecalis; Exsp, Exiguobacterium sp.; Hinf, Haemophilus influenzae; Hsap, Homo sapiens; Laci, Lactobacillus acidophilus; Lgas, Lactobacillus gasseri; Linn, Listeria innocua; Lint, Leptospira interrogans; Ljoh, Lactobacillus johnsonii; Llac, Lactococcus lactis; Lmes, Leuconostoc mesenteroides; Lmon, Listeria monocytogenes; Lpla, Lactobacillus plantarum; Mbur, Methanococcoides burtonii; Mflo, Mesoplasma florum; Mgri, Magnaporthe grisea; Mjan, Methanocaldococcus jannaschii; Mkan, Methanopyrus kandleri; Mmar, Methanococcus maripaludis; Mmar, Moritella marina; Mpen, Mycoplasma penetrans; Mthe, Methanothermobacter thermautotrophicus; Ncra, Neurospora crassa; Nequ, Nanoarchaeum equitans; Oihe, Oceanobacillus iheyensis; Ooen, Oenococcus oeni; Osat, Oryza sativa; Paby, Pyrococcus abyssi; Pfur, Pyrococcus furiosus; Phor, Pyrococcus horikoshii; Plum, Photorhabdus luminescens; Pmul, Pasteurella multocida; Psp., Pseudomonas sp.; Pyae, Pyrobaculum aerophilum; Rgel, Rubrivivax gelatinosus; Saga, Streptococcus agalactiae; Scoe, Streptomyces coelicolor; Sent, Salmonella enterica; Smut, Streptococcus mutans; Sone, Shewanella oneidensis; Spne, Streptococcus pneumoniae; Ssui, Streptococcus suis; Tbru, Trypanosoma brucei; Tery, Trichodesmium erythraeum; Tkod, Thermococcus kodakaraensis; Tthe, Thermus thermophilus; Upar, Ureaplasma parvum; Vcho, Vibrio cholerae; Vvul, Vibrio vulnificus; Ypes, Yersinia pestis; Ypse, Yersinia pseudotuberculosis; Zmob, Zymomonas mobilis.
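The consensus scheme described in the Figure 1 caption can be illustrated with a minimal Python sketch. The residue classes are those listed in the caption; everything else is an assumption made for illustration and is not taken from the paper: the function names, the toy alignment, the rule that gaps count against conservation, the preference for a single conserved residue over a class, and the tie-breaking that tries smaller classes first.

```python
# Residue classes as listed in the Figure 1 caption.
CLASSES = {
    "h": set("ACFILMVWY"),   # hydrophobic
    "b": set("LIYERFQKMW"),  # big
    "s": set("AGSVCDN"),     # small
    "p": set("STEDKRNQHC"),  # polar
}


def column_consensus(column, threshold):
    """Return a consensus symbol for one alignment column, or '.' if none.

    Assumption: gaps ('-') count against conservation, i.e. fractions are
    taken over all rows of the column.
    """
    n = len(column)
    if n == 0:
        return "."
    # First look for a single residue that reaches the threshold.
    counts = {}
    for residue in column:
        if residue != "-":
            counts[residue] = counts.get(residue, 0) + 1
    for residue, count in counts.items():
        if count / n >= threshold:
            return residue
    # Otherwise look for a residue class, trying smaller classes first
    # (an illustrative tie-breaking rule, not the authors' procedure).
    for symbol, members in sorted(CLASSES.items(), key=lambda kv: len(kv[1])):
        hits = sum(1 for residue in column if residue in members)
        if hits / n >= threshold:
            return symbol
    return "."


def consensus(alignment, threshold):
    """Build a consensus line for equal-length aligned sequences."""
    return "".join(column_consensus(col, threshold) for col in zip(*alignment))


# Toy alignment (hypothetical sequences, not from the paper).
toy = ["GLKT", "GIKS", "GVKD", "GLR-"]
line_70 = consensus(toy, 0.7)
line_90 = consensus(toy, 0.9)
```

In this toy case the third column is reported as a lysine at the 70% level but only as a "big" residue at the 90% level, which mirrors how the two consensus lines in the figure can differ at the same position.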


Fig. 2. Structures of members and domain architectures of the ASCH and PUA superfamilies. Cartoon representations of X-ray and NMR structures of the ASCH and the PUA superfamilies are mapped on a tree showing the inferred higher order relationships between the two superfamilies. The clustering was based on distances derived from pairwise DALI Z-scores. Each structure is labeled with its Protein Data Bank (PDB) identifier. Conserved beta-strands are shown in light blue, while the characteristic conserved α-helix is shown in red. Variable helical inserts located between strand-4 and strand-5 are colored tan. The structures are shown with strand-2 vertical and approximately central to the depiction. S1 and S5, which are the first and the last strands of the PUA-ASCH fold, are labeled. This view places the RNA-binding cleft between the conserved helix and strand-2. Key conserved residues lining this cleft in the ASCH superfamily are shown in ball-and-stick format. Domain architectures of members of the ASCH superfamily and those of a subset of the PUA superfamily are shown as cartoon representations in the top right and top left panels, respectively. Proteins are labeled as in Figure 1.
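The caption states that the tree was clustered from distances derived from pairwise DALI Z-scores, but does not give the conversion or the clustering method. The following Python sketch shows one way such a clustering could be set up. It assumes a simple conversion (distance = highest Z-score minus the pairwise Z-score) and average-linkage (UPGMA) clustering; both choices, the function names, the structure labels and the Z-score values are hypothetical and are not the authors' procedure or data.

```python
from itertools import combinations


def zscores_to_distances(z):
    """Convert pairwise Z-scores to distances.

    Assumption: d = z_max - z, so the most similar pair has distance 0.
    """
    z_max = max(z.values())
    return {pair: z_max - score for pair, score in z.items()}


def upgma(labels, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    sizes = {label: 1 for label in labels}  # current clusters and their sizes
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average distance from the merged cluster.
            d[frozenset((merged, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical Z-scores for four placeholder structures (not real data).
toy_z = {
    ("asch_a", "asch_b"): 12.0,
    ("asch_a", "pua_a"): 5.0,
    ("asch_a", "pua_b"): 4.0,
    ("asch_b", "pua_a"): 5.0,
    ("asch_b", "pua_b"): 4.0,
    ("pua_a", "pua_b"): 10.0,
}
tree = upgma(["asch_a", "asch_b", "pua_a", "pua_b"], zscores_to_distances(toy_z))
```

With these toy values the two placeholder ASCH structures join first, the two placeholder PUA structures join next, and the two groups are linked at the root, which is the kind of two-branch topology a tree relating two superfamilies would show.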

Predicted functions of members of ASCH superfamily


In order to obtain functional insights into members of the ASCH superfamily, we combined evidence from different forms of contextual connections, namely physical interactions, gene fusions and conserved operons. In different Gram-positive bacteria such as Mycoplasma, Ureaplasma and Lactococcus lactis, members of the ASCH superfamily are embedded in or associated with the ribosomal protein operon (Fig. 1). Specifically, in Mycoplasma penetrans the ASCH domain is fused to the ribosomal protein S3, whereas in Ureaplasma parvum it is fused to ribosomal protein L22 (Fig. 2). Other members of the ASCH superfamily are also found tightly linked with genes encoding RNA-binding proteins with RRM (e.g. in Acinetobacter, gene ACIAD0497) or R3H (e.g. Listeria, gene lmo2852) domains, implying that they are cotranscribed and probably cooperate functionally. These associations with ribosomal and RNA-metabolism proteins are consistent with the physical interactions of the vertebrate ASC-1 with proteins involved in RNA processing and the potential requirement for RNA-protein interactions for transcriptional coactivation by the ASC-1 containing complex (Jung et al., 2002).

A study of the available structures of four distinct members of the ASCH superfamily indicates that they contain a prominent cleft, whose scaffold is formed by the conserved helix and the downstream strand-2 (Figs 2 and 3). The above-described conserved residues of the ASCH superfamily, such as the lysine from the GXK motif and other polar residues associated with strand-2, line this cleft and form a positively charged surface (Fig. 3). A similarly positioned cleft has been observed in the structures of the PUA domain found in the archaeosine tRNA-guanine transglycosylase, pseudouridine synthase II TruB and the predicted RNA methylase (Hoang and Ferré-D'Amaré, 2001; Ishitani et al., 2002; Pan et al., 2003), and is likely to form its RNA-binding surface. Taken together, the above observations suggest that the ASCH domains are likely to possess RNA-binding activity.

Over the past few years a number of studies have shown that coactivator complexes are often bi-functional: they not only coactivate transcription mediated by specific transcription factors, such as nuclear hormone receptors, but also participate in pre-mRNA processing and regulation of splicing (Auboeuf et al., 2004; Dowhan et al., 2005; Maniatis and Reed, 2002). Furthermore, a regulatory pseudouridylated RNA termed the steroid receptor coactivator RNA (SRA), together with specific RNA-binding proteins with which it interacts, has been shown to be a part of coactivator complexes that couple nuclear hormone receptors to the basal transcription machinery (Lanz et al., 1999; Shi et al., 2001; Zhao et al., 2004). Given these observations, it is likely that the ASCH domain mediates some of the interactions between RNA and the ASC-1 coactivator complex. Its RNA partner could be either the pre-mRNA generated from the transcription of its target genes or a regulatory RNA like SRA. The association with the ribosomal proteins might indicate that some of the prokaryotic versions are involved in translational regulation.

The prokaryotic and phage ASCH domains, with a few exceptions, occur as standalone versions (Fig. 1), which are encoded by genes in predicted cotranscribed arrays containing a wide variety of other genes. In several of these cases they are found adjacent to a gene encoding a helix-turn-helix protein, which is the transcriptional regulator of the predicted operon (Fig. 1).


Fig. 3. Molecular surfaces of the observed binding cleft in the ASCH superfamily. The X-ray structure of a Family 1 member of the ASCH superfamily (PDB: 1WK2) is depicted in four different ways. In A, B and C the protein is oriented to expose the potential binding cleft, located between the helix and strand-2. In the top left (A), the predicted three-dimensional surface of the protein is shown with the conserved residues lining the binding cleft of family 1 colored in red, while other surfaces are colored in blue. On the top right (B), cartoons indicating secondary structure features are shown against the transparent outline of the predicted molecular surface of the protein, colored in dark blue. Again, the surfaces of the residues lining the putative binding cleft of this family are colored in red (D20, G21, R22, K23, E26, R28, R29), and the most highly conserved residues found along the cleft are rendered as ball and sticks and colored in green (G21, K23, E26). Beta-strands rendered as cartoons in the protein are colored yellow, the conserved helix is colored red, and coil regions are colored gray. In the bottom left (C) and right (D) the front and back views of the predicted molecular surface are shown. Surfaces of residues are colored according to consensus conservation across the entire ASCH superfamily; red denotes positions with at least 90% conservation, while yellow denotes positions with at least 70% conservation. The residue conservation was calculated using the residue grouping indicated in the consensus shown in Figure 1. Panels A, B and C show the molecule in the same orientation, while it is rotated by 180° around the Z-axis in panel D. The scale bar between surfaces A and B represents the approximate width of the core of a nucleotide in single-stranded RNA.

In Brucella an ASCH domain is fused to a cI-like HTH domain within the same polypeptide (Fig. 2). These associations suggest that solo ASCH proteins of prokaryotes functionally cooperate with transcription regulators, probably by binding the transcripts generated from particular operons, and thereby regulate their expression.

Evolutionary diversity of ASCH domains and general conclusions


The ASCH superfamily encompasses considerable diversity and can be subdivided into several families that are unified by specific sequence signatures. The ASC-1 proper family is typified by a unique insert between strand-3 and strand-4. It is present in animals (two paralogous versions, with and without a fusion to the Zn-chelating domain, are seen in vertebrates, typified respectively by human ASC-1 and LOC541578; Fig. 1), plants and trypanosomes among the eukaryotes, and in certain cyanobacteria, actinobacteria and their phages, Burkholderia and the archaeon Methanococcoides. The two copies in the vertebrates appear to have emerged from a relatively recent duplication in the common ancestor of the extant vertebrates with sequenced genomes. Related to the ASC-1 family is family 1, typified by the Thermus protein TTC1891 (termed DUF437 in Pfam), which is present in Thermus, Pyrococcus and Archaeoglobus. Family 2 (typified by the standalone ASCH domain protein ZM00922 from Zymomonas) is predominantly found in bacteria and archaea, with isolated eukaryotic representatives from filamentous fungi such as Neurospora and Magnaporthe (Fig. 1). Likewise, sporadic eukaryotic representatives from plants are seen in the otherwise prokaryotic family typified by the Pyrococcus protein PH0447 (family 3). All the other families of ASCH domains, namely families 4 (DUF984, e.g. EF3133), 5, 6, 7, 8 and 9, are restricted to prokaryotes and their phages. This phyletic pattern of the ASCH superfamily suggests that it diversified in the prokaryotes, followed by multiple lateral transfers to the eukaryotes. The Zn-chelating domain and a predicted globular segment immediately downstream of it (Fig. 2) in ASC-1 are conserved in all eukaryotes, and occur as a standalone unit independent of the ASCH domain in basal eukaryotes like Giardia (Supplementary data). Hence, the transfer of the ASCH domain from prokaryotes that gave rise to eukaryotic ASC-1 appears to have happened after the divergence of the basal eukaryotic lineages like Giardia, followed by a fusion to the above-mentioned standalone unit. This was followed by losses of the ASCH domain in crown group eukaryotes, such as in the fungi. In addition to the emergence of ASC-1, there appear to have been independent sporadic transfers of other prokaryotic ASCH family members to specific lineages of crown group eukaryotes (Fig. 1).

In terms of phyletic patterns, the PUA domains can be confidently traced back to the last universal common ancestor (LUCA) of all cellular life forms. The ancient versions of the PUA domain include those fused to key RNA metabolism enzymes such as the pseudouridine synthase, which are conserved in all three superkingdoms of life (Anantharaman et al., 2002; Hoang and Ferré-D'Amaré, 2001). In the case of the ASCH domain, no single family is conserved across the three superkingdoms of life, making it unclear whether it was present in LUCA. However, its broad phyletic range in the prokaryotes suggests that the ASCH domain emerged very early in the evolution of the prokaryotic superkingdoms. It is, however, not universally represented in all prokaryotic genomes and has been lost in some eukaryotes such as the fungi. This suggests that ASCH domains are likely to belong to the more easily dispensable regulatory apparatus rather than to the core aspects of RNA metabolism.

No ASCH domain occurs as multiple repeats in the same polypeptide, unlike many other RNA-binding domains such as the KH or the RRM domains. This suggests that it is likely to form single isolated contacts with specific features on RNA rather than extended multi-site contacts with long RNA molecules. Furthermore, unlike the structurally similar PUA domains, which typically occur in multi-domain proteins fused to other RNA-modifying or RNA-interacting domains (Anantharaman et al., 2002; Aravind and Koonin, 1999; Forouhar et al., 2003), the ASCH domains typically occur as the sole globular domain in the polypeptide (Figs 1 and 2).
The conserved residues on the surface of the predicted cleft are also distinct in the PUA and ASCH superfamilies, suggesting that they bind very different types of target RNAs. The PUA domain appears to have mainly colonized core functional niches related to rRNA and tRNA modification, while the ASCH domains appear to have been recruited to a distinct set of functional niches, including transcription coactivation and regulation of translation. Thus, the ASCH and PUA domains appear to have emerged from a common RNA-binding precursor and subsequently diversified to perform distinct functional roles, probably as a result of the diversification of their binding clefts.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge the Intramural Research Program of the National Library of Medicine, National Institutes of Health, USA, for funding their research.

Conflict of Interest: none declared.

REFERENCES

Altschul,S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Anantharaman,V. et al. (2002a) Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res., 30, 1427-1464.
Anantharaman,V. et al. (2002b) SPOUT: a class of methyltransferases that includes spoU and trmD RNA methylase superfamilies, and novel superfamilies of predicted prokaryotic RNA methylases. J. Mol. Microbiol. Biotechnol., 4, 71-75.
Aravind,L. and Koonin,E.V. (1999) Novel predicted RNA-binding domains associated with the translation machinery. J. Mol. Evol., 48, 291-302.
Auboeuf,D. et al. (2004) CoAA, a nuclear receptor coactivator protein at the interface of transcriptional coactivation and RNA splicing. Mol. Cell. Biol., 24, 442-453.
Bateman,A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138-D141.
Cerutti,L. et al. (2000) Domains in gene silencing and cell differentiation proteins: the novel PAZ domain and redefinition of the Piwi domain. Trends Biochem. Sci., 25, 481-482.
Clissold,P.M. and Ponting,C.P. (2000) PIN domains in nonsense-mediated mRNA decay and RNAi. Curr. Biol., 10, 888-890.
Cuff,J.A. and Barton,G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40, 502-511.
Cuff,J.A. et al. (1998) JPred: a consensus secondary structure prediction server. Bioinformatics, 14, 892-893.
Delano,W.L. (2002) The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA.
Dowhan,D.H. et al. (2005) Steroid hormone receptor coactivation and alternative RNA splicing by U2AF65-related proteins CAPERalpha and CAPERbeta. Mol. Cell, 17, 429-439.
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755-763.
Fatica,A. et al. (2004) PIN domain of Nob1p is required for D-site cleavage in 20S pre-rRNA. RNA, 10, 1698-1701.
Forouhar,F. et al. (2003) Functional assignment based on structural analysis: crystal structure of the yggJ protein (HI0303) of Haemophilus influenzae reveals an RNA methyltransferase with a deep trefoil knot. Proteins, 53, 329-332.
Hoang,C. and Ferré-D'Amaré,A.R. (2001) Cocrystal structure of a tRNA Psi55 pseudouridine synthase: nucleotide flipping by an RNA-modifying enzyme. Cell, 107, 929-939.
Holm,L. and Sander,C. (1995) Dali: a network tool for protein structure comparison. Trends Biochem. Sci., 20, 478-480.
Ishitani,R. et al. (2002) Crystal structure of archaeosine tRNA-guanine transglycosylase. J. Mol. Biol., 318, 665-677.
Jung,D.J. et al. (2002) Novel transcription coactivator complex containing activating signal cointegrator 1. Mol. Cell. Biol., 22, 5203-5211.
Kim,H.J. et al. (1999) Activating signal cointegrator 1, a novel transcription coactivator of nuclear receptors, and its cytosolic localization under conditions of serum deprivation. Mol. Cell. Biol., 19, 6323-6332.
Koonin,E.V. and Mushegian,A.R. (1996) Complete genome sequences of cellular life forms: glimpses of theoretical evolutionary genomics. Curr. Opin. Genet. Dev., 6, 757-762.
Korber,P. et al. (1999) A new heat shock protein that binds nucleic acids. J. Biol. Chem., 274, 249-256.
Lanz,R.B. et al. (1999) A steroid receptor coactivator, SRA, functions as an RNA and is present in an SRC-1 complex. Cell, 97, 17-27.
Maniatis,T. and Reed,R. (2002) An extensive network of coupling among gene expression machines. Nature, 416, 499-506.
Mazumder,R. et al. (2002) Detection of novel members, structure-function analysis and evolutionary classification of the 2H phosphoesterase superfamily. Nucleic Acids Res., 30, 5229-5243.
Notredame,C. et al. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
Pan,H. et al. (2003) Structure of tRNA pseudouridine synthase TruB and its RNA complex: RNA recognition through a combination of rigid docking and induced fit. Proc. Natl Acad. Sci. USA, 100, 12648-12653.
Reid,R. et al. (1999) Exposition of a family of RNA m(5)C methyltransferases from searching genomic and proteomic sequences. Nucleic Acids Res., 27, 3138-3145.
Rost,B. et al. (1994) PHD: an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci., 10, 53-60.
Schaffer,A.A. et al. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 15, 1000-1011.
Schaffer,A.A. et al. (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res., 29, 2994-3005.
Shi,Y. et al. (2001) Sharp, an inducible cofactor that integrates nuclear receptor repression and activation. Genes Dev., 15, 1140-1151.
Zhao,X. et al. (2004) Regulation of nuclear receptor activity by a pseudouridine synthase through posttranscriptional modification of steroid receptor RNA activator. Mol. Cell, 15, 549-558.
