Dr Aleksandr Sahakyan

Research Area: Bioinformatics & Stats (inc. Modelling and Computational Biology)
Technology Exchange: Bioinformatics, Chromosome mapping, Computational biology, Human genetics, Magnetic resonance imaging & spectroscopy, Protein interaction, SNP typing and Statistical genetics
Scientific Themes: Bioinformatics, Statistics & Computational Biology and Genes, Genetics, Epigenetics & Genomics
Keywords: computational biology, machine learning, data integration, molecular medicine, genome, transcriptome and proteome

Our research targets genomics through the development of highly quantitative methods for describing the structure and dynamics of the (epi)genome, gene regulatory pathways, the macromolecules involved and their interaction networks. We are interested in combining advanced machine learning, computational biology, computational chemistry, and experimental biophysical and sequencing techniques to reach a new level of precision in structural systems biology at both the genome and proteome levels.

Sahakyan AB, Chambers VS, Marsico G, Santner T, Di Antonio M, Balasubramanian S. 2017. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci Rep, 7 (1), pp. 14535.

We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.

Sahakyan AB, Murat P, Mayer C, Balasubramanian S. 2017. G-quadruplex structures within the 3' UTR of LINE-1 elements stimulate retrotransposition. Nat Struct Mol Biol, 24 (3), pp. 243-247.

Long interspersed nuclear elements (LINEs) are ubiquitous transposable elements in higher eukaryotes that have a significant role in shaping genomes, owing to their abundance. Here we report that guanine-rich sequences in the 3' untranslated regions (UTRs) of hominoid-specific LINE-1 elements are coupled with retrotransposon speciation and contribute to retrotransposition through the formation of G-quadruplex (G4) structures. We demonstrate that stabilization of the G4 motif of a human-specific LINE-1 element by small-molecule ligands stimulates retrotransposition.

Sahakyan AB, Balasubramanian S. 2017. Single genome retrieval of context-dependent variability in mutation rates for human germline. BMC Genomics, 18 (1), pp. 81.

BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and produce a Trek-based estimate of the overall mutation rate in humans. CONCLUSIONS: The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.

Kwok CK, Marsico G, Sahakyan AB, Chambers VS, Balasubramanian S. 2016. rG4-seq reveals widespread formation of G-quadruplex structures in the human transcriptome. Nat Methods, 13 (10), pp. 841-844.

We introduce RNA G-quadruplex sequencing (rG4-seq), a transcriptome-wide RNA G-quadruplex (rG4) profiling method that couples rG4-mediated reverse transcriptase stalling with next-generation sequencing. Using rG4-seq on polyadenylated-enriched HeLa RNA, we generated a global in vitro map of thousands of canonical and noncanonical rG4 structures. We characterize rG4 formation relative to cytosine content and alternative RNA structure stability, uncover rG4-dependent differences in RNA folding and show evolutionarily conserved enrichment in transcripts mediating RNA processing and stability.

Sahakyan AB, Balasubramanian S. 2016. Long genes and genes with multiple splice variants are enriched in pathways linked to cancer and other multigenic diseases. BMC Genomics, 17 (1), pp. 225.

BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between the simple properties of the human genes, such as transcript length, number of splice variants, exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. RESULTS: We reveal a substantial enrichment of cancer pathways with long genes and genes that have multiple splice variants. Although the latter two factors are interdependent, we show that the overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey for the pathways enriched with top lengthy genes and with genes that have multiple splice variants reveals, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We outline a correlation between the gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.

Kwok CK, Sahakyan AB, Balasubramanian S. 2016. Structural Analysis using SHALiPE to Reveal RNA G-Quadruplex Formation in Human Precursor MicroRNA. Angew Chem Int Ed Engl, 55 (31), pp. 8958-8961.

RNA G-quadruplex (rG4) structures are of fundamental importance to biology. A novel approach is introduced to detect and structurally map rG4s at single-nucleotide resolution in RNAs. The approach, denoted SHALiPE, couples selective 2'-hydroxyl acylation with lithium ion-based primer extension, and identifies characteristic structural fingerprints for rG4 mapping. We apply SHALiPE to interrogate the human precursor microRNA 149, and reveal the formation of an rG4 structure in this non-coding RNA. Additional analyses support the SHALiPE results and uncover that this rG4 has a parallel topology, is thermally stable, and is conserved in mammals. An in vitro Dicer assay shows that this rG4 inhibits Dicer processing, supporting the potential role of rG4 structures in microRNA maturation and post-transcriptional regulation of mRNAs.

Hardisty RE, Kawasaki F, Sahakyan AB, Balasubramanian S. 2015. Selective Chemical Labeling of Natural T Modifications in DNA. J Am Chem Soc, 137 (29), pp. 9270-9272.

We present a chemical method to selectively tag and enrich thymine modifications, 5-formyluracil (5-fU) and 5-hydroxymethyluracil (5-hmU), found naturally in DNA. Inherent reactivity differences have enabled us to tag 5-fU chemoselectively over its C modification counterpart, 5-formylcytosine (5-fC). We rationalized the enhanced reactivity of 5-fU compared to 5-fC via ab initio quantum mechanical calculations. We exploited this chemical tagging reaction to provide proof of concept for the enrichment of 5-fU containing DNA from a pool that contains 5-fC or no modification. We further demonstrate that 5-hmU can be chemically oxidized to 5-fU, providing a strategy for the enrichment of 5-hmU. These methods will enable the mapping of 5-fU and 5-hmU in genomic DNA, to provide insights into their functional role and dynamics in biology.

Shahkhatuni AA, Shahkhatuni AG, Minasyan NS, Panosyan HA, Sahakyan AB. 2015. Revealing the specific solute–solvent interactions via the measurements of the NMR spin–spin coupling constants. Journal of Molecular Structure, 1083, pp. 175-178.

The solvent induced changes of one-bond spin-spin coupling constants (SSCCs) are investigated for a set of substituted methanes in solvents with various dielectric constants ε. Solute-solvent systems with varying types of ε-dependences for the solute SSCCs are outlined. Aliphatic hydrocarbon solvents and their halogen-substituted derivatives comprise the first subset, where the SSCC is linearly dependent on the solvent reaction field, f(ε) = 2(ε - 1)/(2ε + 1), hence indicating the absence of specific solute-solvent interactions. In such solvents, SSCCs depend only on bulk dielectric properties of the medium, and the magnitudes of the solvent sensitivities of SSCCs are fully determined by the initial values of "pure" SSCCs that correspond to the isolated solute molecules. The solvents involved in the second subset have a relatively chaotic distribution of the SSCC/f(ε) relationship, with possible groupings by their chemical nature. There, the conventional linear SSCC/f(ε) dependence is perturbed by additional interactions, such as hydrogen bonding, specific association processes, lone electron pairs, and conjugation.

Camilloni C, Sahakyan AB, Holliday MJ, Isern NG, Zhang F, Eisenmesser EZ, Vendruscolo M. 2014. Cyclophilin A catalyzes proline isomerization by an electrostatic handle mechanism. Proc Natl Acad Sci U S A, 111 (28), pp. 10203-10208.

Proline isomerization is a ubiquitous process that plays a key role in the folding of proteins and in the regulation of their functions. Different families of enzymes, known as "peptidyl-prolyl isomerases" (PPIases), catalyze this reaction, which involves the interconversion between the cis and trans isomers of the N-terminal amide bond of the amino acid proline. However, complete descriptions of the mechanisms by which these enzymes function have remained elusive. We show here that cyclophilin A, one of the most common PPIases, provides a catalytic environment that acts on the substrate through an electrostatic handle mechanism. In this mechanism, the electrostatic field in the catalytic site turns the electric dipole associated with the carbonyl group of the amino acid preceding the proline in the substrate, thus causing the rotation of the peptide bond between the two residues. We identified this mechanism using a combination of NMR measurements, molecular dynamics simulations, and density functional theory calculations to simultaneously determine the cis-bound and trans-bound conformations of cyclophilin A and its substrate as the enzymatic reaction takes place. We anticipate that this approach will be helpful in elucidating whether the electrostatic handle mechanism that we describe here is common to other PPIases and, more generally, in characterizing other enzymatic processes.

Fu B, Sahakyan AB, Camilloni C, Tartaglia GG, Paci E, Caflisch A, Vendruscolo M, Cavalli A. 2014. ALMOST: an all atom molecular simulation toolkit for protein structure determination. J Comput Chem, 35 (14), pp. 1101-1105.

Almost (all atom molecular simulation toolkit) is an open source computational package for structure determination and analysis of complex molecular systems including proteins, and nucleic acids. Almost has been designed with two primary goals: to provide tools for molecular structure determination using various types of experimental measurements as conformational restraints, and to provide methods for the analysis and assessment of structural and dynamical properties of complex molecular systems. The methods incorporated in Almost include the determination of structural and dynamical features of proteins using distance restraints derived from nuclear Overhauser effect measurements, orientational restraints obtained from residual dipolar couplings and the structural restraints from chemical shifts. Here, we present the first public release of Almost, highlight the key aspects of its computational design and discuss the main features currently implemented. Almost is available for the most common Unix-based operating systems, including Linux and Mac OS X. Almost is distributed free of charge under the GNU Public License, and is available both as a source code and as a binary executable from the project web site at http://www.open-almost.org. Interested users can follow and contribute to the further development of Almost on http://sourceforge.net/projects/almost.

Kannan A, Camilloni C, Sahakyan AB, Cavalli A, Vendruscolo M. 2014. A conformational ensemble derived using NMR methyl chemical shifts reveals a mechanical clamping transition that gates the binding of the HU protein to DNA. J Am Chem Soc, 136 (6), pp. 2204-2207.

Recent improvements in the accuracy of structure-based methods for the prediction of nuclear magnetic resonance chemical shifts have inspired numerous approaches for determining the secondary and tertiary structures of proteins. Such advances also suggest the possibility of using chemical shifts to characterize the conformational fluctuations of these molecules. Here we describe a method of using methyl chemical shifts as restraints in replica-averaged molecular dynamics (MD) simulations, which enables us to determine the conformational ensemble of the HU dimer and characterize the range of motions accessible to its flexible β-arms. Our analysis suggests that the bending action of HU on DNA is mediated by a mechanical clamping mechanism, in which metastable structural intermediates sampled during the hinge motions of the β-arms in the free state are presculpted to bind DNA. These results illustrate that using side-chain chemical shift data in conjunction with MD simulations can provide quantitative information about the free energy landscapes of proteins and yield detailed insights into their functional mechanisms.

Suardíaz R, Sahakyan AB, Vendruscolo M. 2013. A geometrical parametrization of C1'-C5' RNA ribose chemical shifts calculated by density functional theory. J Chem Phys, 139 (3), pp. 034101.

It has been recently shown that NMR chemical shifts can be used to determine the structures of proteins. In order to begin to extend this type of approach to nucleic acids, we present an equation that relates the structural parameters and the 13C chemical shifts of the ribose group. The parameters in the equation were determined by maximizing the agreement between the DFT-derived chemical shifts and those predicted through the equation for a database of ribose structures. Our results indicate that this type of approach represents a promising way of establishing quantitative and computationally efficient analytical relationships between chemical shifts and structural parameters in nucleic acids.

Sahakyan AB, Vendruscolo M. 2013. Analysis of the contributions of ring current and electric field effects to the chemical shifts of RNA bases. J Phys Chem B, 117 (7), pp. 1989-1998.

Ring current and electric field effects can considerably influence NMR chemical shifts in biomolecules. Understanding such effects is particularly important for the development of accurate mappings between chemical shifts and the structures of nucleic acids. In this work, we first analyzed the Pople and the Haigh-Mallion models in terms of their ability to describe nitrogen base conjugated ring effects. We then created a database (DiBaseRNA) of three-dimensional arrangements of RNA base pairs from X-ray structures, calculated the corresponding chemical shifts via a hybrid density functional theory approach and used the results to parametrize the ring current and electric field effects in RNA bases. Next, we studied the coupling of the electric field and ring current effects for different inter-ring arrangements found in RNA bases using linear model fitting, with joint electric field and ring current, as well as only electric field and only ring current approximations. Taken together, our results provide a characterization of the interdependence of ring current and electric field geometric factors, which is shown to be especially important for the chemical shifts of non-hydrogen atoms in RNA bases.

Sahakyan AB. 2012. Computational studies of dielectric permittivity effects on chemical shifts of alanine dipeptide. Chemical Physics Letters, 547, pp. 66-72.

The effect of dielectric permittivity on the chemical shifts of alanine dipeptide is studied using hybrid density functional theory. The dependence is shown to be highly sensitive to conformation, and a reasonable explanation is outlined based on the solvent reaction field model. The danger that the observed shape of the dependence poses for chemical shift evaluations in low dielectric constant environments, such as the protein interior, is emphasized. The nuclear shielding sensitivity towards the dielectric permittivity is examined over different φ/ψ combinations. Comparison with the experimental data from protein backbones suggests an effective dielectric constant of ≈4-5 for the protein interior.

Shahkhatuni AA, Sahakyan AB, Shahkhatuni AG, Mamyan SS, Panosyan HA. 2012. Correlation of 1JCH spin–spin coupling constants and their solvent sensitivities. Chemical Physics Letters, 542, pp. 56-61.

The solvent induced changes of one-bond spin-spin coupling constants (SSCCs) of several substituted methanes are investigated in solvents with different polarities. The correlation between solute SSCC and solvent dielectric constant is used to estimate the solvent effect-free ('pure') values of SSCCs by linear extrapolation to zero in reaction field coordinates. The obtained 'pure' SSCCs are close to the values measured by gas phase NMR spectroscopy or predicted by quantum chemical calculations for isolated molecules. The slopes of the SSCC dependencies, interpreted as the solvent sensitivities of each molecule, are linearly correlated with the 'pure' values of SSCC.
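
The extrapolation described in this abstract can be illustrated with a minimal sketch. It assumes the reaction field function f(ε) = 2(ε - 1)/(2ε + 1) quoted in the 2015 Journal of Molecular Structure abstract above, for which f(1) = 0, so the intercept of a straight-line fit of SSCC against f(ε) plays the role of the 'pure' value and the slope plays the role of the solvent sensitivity. The dielectric constants and coupling values below are synthetic and purely hypothetical, and the function names are illustrative rather than taken from the paper.

```python
def reaction_field(eps):
    """Reaction field function f(eps) = 2(eps - 1) / (2 eps + 1)."""
    return 2.0 * (eps - 1.0) / (2.0 * eps + 1.0)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical solvent dielectric constants (not data from the paper).
eps_values = [2.0, 4.8, 8.9, 20.7, 36.6]

# Synthetic couplings in Hz, generated from an assumed linear law so the
# fit can be checked against known parameters.
assumed_pure = 125.0         # hypothetical solvent-free SSCC
assumed_sensitivity = 4.0    # hypothetical slope in reaction field coordinates
couplings = [assumed_pure + assumed_sensitivity * reaction_field(e)
             for e in eps_values]

# Fit in reaction field coordinates; the intercept at f = 0 is the
# extrapolated 'pure' coupling, the slope is the solvent sensitivity.
fields = [reaction_field(e) for e in eps_values]
sensitivity, pure_sscc = fit_line(fields, couplings)

print(f"extrapolated 'pure' SSCC: {pure_sscc:.3f} Hz")
print(f"solvent sensitivity (slope): {sensitivity:.3f} Hz")
```

With real measurements the points would scatter around the line, and solvents with specific solute-solvent interactions would be expected to deviate from it.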

Sahakyan AB, Cavalli A, Vranken WF, Vendruscolo M. 2012. Protein structure validation using side-chain chemical shifts. J Phys Chem B, 116 (16), pp. 4754-4759.

We present a method of assessing the quality of protein structures based on the use of side-chain NMR chemical shifts. Because these parameters are very accurate reporters of side-chain positions and are highly sensitive to tertiary structure and packing, they are particularly useful for structure validation. To analyze a given structure, we define a quality score, QCS, that compares the chemical shifts calculated from such a structure with the corresponding experimental values in a way that takes account of the errors in the calculations. The results that we report illustrate the advantages in the examination of the quality of protein structures from the perspective of side-chains.

Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M. 2011. Using Side-Chain Aromatic Proton Chemical Shifts for a Quantitative Analysis of Protein Structures. Angewandte Chemie, 123 (41), pp. 9794-9797.

Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M. 2011. Using side-chain aromatic proton chemical shifts for a quantitative analysis of protein structures. Angew Chem Int Ed Engl, 50 (41), pp. 9620-9623.

Predicting chemical shifts: A method for the structure-based prediction of side-chain aromatic 1H chemical shifts of proteins is presented (see picture; blue structures: aromatic side chains, red spheres: aromatic hydrogen atoms). Its ability to differentiate correct structural models from incorrect ones is also demonstrated, together with its use to detect differences caused by cofactor or ligand binding, or by sequence alterations between structures.

Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M. 2011. Structure-based prediction of methyl chemical shifts in proteins. J Biomol NMR, 50 (4), pp. 331-346.

Protein methyl groups have recently been the subject of much attention in NMR spectroscopy because of the opportunities that they provide to obtain information about the structure and dynamics of proteins and protein complexes. With the advent of selective labeling schemes, methyl groups are particularly interesting in the context of chemical shift based protein structure determination, an approach that to date has exploited primarily the mapping between protein structures and backbone chemical shifts. In order to extend the scope of chemical shifts for structure determination, we present here the CH3Shift method of performing structure-based predictions of methyl chemical shifts. The terms considered in the predictions take account of ring current, magnetic anisotropy, electric field, rotameric type, and dihedral angle effects, which are considered in conjunction with polynomial functions of interatomic distances. We show that the CH3Shift method achieves an accuracy in the predictions that ranges from 0.133 to 0.198 ppm for 1H chemical shifts for Ala, Thr, Val, Leu and Ile methyl groups. We illustrate the use of the method by assessing the accuracy of side-chain structures in structural ensembles representing the dynamics of proteins.

Sahakyan AB, Shahkhatuni AG, Shahkhatuni AA, Panosyan HA. 2008. Electric field effects on one-bond indirect spin-spin coupling constants and possible biomolecular perspectives. J Phys Chem A, 112 (16), pp. 3576-3586.

Electric field (EF) induced changes of one-bond indirect spin-spin coupling constants are investigated on a wide range of molecules including peptide models. EFs were both externally applied and internally calculated without external EF application by the hybrid density functional theory method. Reliable agreement with experimental data has been obtained for calculated one-bond J-couplings. The role of the EF sign and direction, internal and induced components, hydrogen bonding, internuclear distance and hyperconjugative interactions on the one-bond J-coupling vs EF interconnection is analyzed. A linear dependence of 1J on EF projection along the bond is obtained, if the bound atoms possess different enough electron densities and an EF determined by the electronic polarization exists along the bond. Accentuating the 1JNH couplings as possible EF sensitive parameters, a systematic study is done in two sets of molecules with a large variation of the native internal EF value. The most EF affected component of the 1JNH coupling constant is the spin-dipole term of Ramsey's formulation; however, in the total J-coupling formation, the EF influence on the Fermi contact term is the most significant. The induced EF projection along the bond is 6.7 times weaker in magnitude than the simulated external uniform field. The absolute EF dependence of the one-bond J-coupling involves only the internal field, which is the sum of the induced field (if the external field exists) and the internuclear field determined by the native polarization. That linear and universal dependence joins the corresponding couplings in a diverse set of molecules under various electrostatic conditions. Many types of the one-bond J-couplings can be potentially measured in biomolecules, and the study of their relation with the electrostatic properties at the corresponding sites opens a new avenue to the full exploitation of the NMR measurable parameters with novel and exciting applications.

Sahakyan AB, Shahkhatuni AG, Shahkhatuni AA, Panosyan HA. 2008. Torsion sensitivity in NMR of aligned molecules: study on various substituted biphenyls. Magn Reson Chem, 46 (2), pp. 144-149.

To estimate the torsion sensitivity of dipolar coupling, biphenylic molecules were chosen as probes due to their relatively simple structure and the surprisingly high ambiguity of the only flexible parameter, the interring torsion angle. Solution structures of 4,4'-dibromobiphenyl and 4,4'-diiodobiphenyl are reported for the first time in two liquid crystals, I52 and ZLI 1695. The comparison of NMR structures of various para-substituted biphenyls (BPs), calculated by the additive potential maximum entropy (APME) approach, shows that the small spread of torsion angle values in the case of different solvents and para-substituents is in good agreement with theoretical expectations from hybrid density functional theory (DFT) methods. Furthermore, the real structural changes of interring torsion and the prevalence of solvent effects over para-halosubstitution can be correctly revealed from these small fluctuations.

Sahakyan AB, Shahkhatuni AA, Shahkhatuni AG, Panosyan HA. 2008. Dielectric permittivity and temperature effects on spin-spin couplings studied on acetonitrile. Magn Reson Chem, 46 (1), pp. 63-68.

Dielectric permittivity (epsilon) and temperature effects on indirect spin-spin coupling constants were studied using acetonitrile as a probe molecule. Experiments were accompanied by hybrid DFT (density functional theory) studies, where the solvent was modeled using the polarization continuum model. Owing to its numerous types of J-couplings, acetonitrile is a very convenient molecule against which various basis sets can be tested or the best basis set can be selected for a given study. The results show reasonable agreement between calculated and experimental values. According to our data, scalar spin-spin coupling constants undergo substantial shifts at lower values of the dielectric constant. Thus J-coupling values are not transferable between measurements made at differing epsilon-conditions, and the assumption of the epsilon-independence of the J-coupling can lead to crucial mistakes in experiments using low-epsilon media. Dielectric permittivity also causes small geometric fluctuations within the molecule, which themselves can affect J-coupling values. Examinations of the results computed with frozen and relaxed geometries show that geometry mediation mostly affects the spin-dipole term of the J-coupling; hence, for accurate evaluation of the latter, frozen geometries are not acceptable. Another interesting fact revealed is the connection between the solvent dielectric properties and the temperature-dependence slopes of J-couplings in corresponding media.

Shahkhatuni AA, Shahkhatuni AG, Panosyan HA, Sahakyan AB, Byeon I-JL, Gronenborn AM. 2007. Assessment of solvent effects: do weak alignment media affect the structure of the solute? Magn Reson Chem, 45 (7), pp. 557-563.

Alignment media used for measuring residual dipolar couplings, such as solutions of filamentous phages, phospholipid mixtures, polyacrylamide gels and various lyotropic liquid crystalline systems were investigated with respect to solvent effects on molecular structure. Structural parameters of the small rigid model compound 13C-acetonitrile were calculated from dipolar couplings and variations from expectation values were used for assessment of solvent effects. Only minor solvent effects were observed for most of the media employed and the measured structural data are in good agreement with microwave data and theoretical predictions.

Sahakyan AB, Chambers VS, Marsico G, Santner T, Di Antonio M, Balasubramanian S. 2017. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci Rep, 7 (1), pp. 14535. | Show Abstract | Read more

We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.

Sahakyan AB, Murat P, Mayer C, Balasubramanian S. 2017. G-quadruplex structures within the 3' UTR of LINE-1 elements stimulate retrotransposition. Nat Struct Mol Biol, 24 (3), pp. 243-247. | Show Abstract | Read more

Long interspersed nuclear elements (LINEs) are ubiquitous transposable elements in higher eukaryotes that have a significant role in shaping genomes, owing to their abundance. Here we report that guanine-rich sequences in the 3' untranslated regions (UTRs) of hominoid-specific LINE-1 elements are coupled with retrotransposon speciation and contribute to retrotransposition through the formation of G-quadruplex (G4) structures. We demonstrate that stabilization of the G4 motif of a human-specific LINE-1 element by small-molecule ligands stimulates retrotransposition.

Sahakyan AB, Balasubramanian S. 2017. Single genome retrieval of context-dependent variability in mutation rates for human germline. BMC Genomics, 18 (1), pp. 81. | Show Abstract | Read more

BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and, produce a Trek-based estimate of the overall mutation rate in human. CONCLUSIONS: The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.

Kwok CK, Marsico G, Sahakyan AB, Chambers VS, Balasubramanian S. 2016. rG4-seq reveals widespread formation of G-quadruplex structures in the human transcriptome. Nat Methods, 13 (10), pp. 841-844.

We introduce RNA G-quadruplex sequencing (rG4-seq), a transcriptome-wide RNA G-quadruplex (rG4) profiling method that couples rG4-mediated reverse transcriptase stalling with next-generation sequencing. Using rG4-seq on polyadenylated-enriched HeLa RNA, we generated a global in vitro map of thousands of canonical and noncanonical rG4 structures. We characterize rG4 formation relative to cytosine content and alternative RNA structure stability, uncover rG4-dependent differences in RNA folding and show evolutionarily conserved enrichment in transcripts mediating RNA processing and stability.

Sahakyan AB, Balasubramanian S. 2016. Long genes and genes with multiple splice variants are enriched in pathways linked to cancer and other multigenic diseases. BMC Genomics, 17 (1), pp. 225.

BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between the simple properties of the human genes, such as transcript length, number of splice variants, exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. RESULTS: We reveal a substantial enrichment of cancer pathways with long genes and genes that have multiple splice variants. Although the latter two factors are interdependent, we show that the overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey for the pathways enriched with top lengthy genes and with genes that have multiple splice variants reveal, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We outline a correlation between the gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.

Kwok CK, Sahakyan AB, Balasubramanian S. 2016. Structural Analysis using SHALiPE to Reveal RNA G-Quadruplex Formation in Human Precursor MicroRNA. Angew Chem Int Ed Engl, 55 (31), pp. 8958-8961.

RNA G-quadruplex (rG4) structures are of fundamental importance to biology. A novel approach is introduced to detect and structurally map rG4s at single-nucleotide resolution in RNAs. The approach, denoted SHALiPE, couples selective 2'-hydroxyl acylation with lithium ion-based primer extension, and identifies characteristic structural fingerprints for rG4 mapping. We apply SHALiPE to interrogate the human precursor microRNA 149, and reveal the formation of an rG4 structure in this non-coding RNA. Additional analyses support the SHALiPE results and uncover that this rG4 has a parallel topology, is thermally stable, and is conserved in mammals. An in vitro Dicer assay shows that this rG4 inhibits Dicer processing, supporting the potential role of rG4 structures in microRNA maturation and post-transcriptional regulation of mRNAs.

Hardisty RE, Kawasaki F, Sahakyan AB, Balasubramanian S. 2015. Selective Chemical Labeling of Natural T Modifications in DNA. J Am Chem Soc, 137 (29), pp. 9270-9272.

We present a chemical method to selectively tag and enrich thymine modifications, 5-formyluracil (5-fU) and 5-hydroxymethyluracil (5-hmU), found naturally in DNA. Inherent reactivity differences have enabled us to tag 5-fU chemoselectively over its C modification counterpart, 5-formylcytosine (5-fC). We rationalized the enhanced reactivity of 5-fU compared to 5-fC via ab initio quantum mechanical calculations. We exploited this chemical tagging reaction to provide proof of concept for the enrichment of 5-fU containing DNA from a pool that contains 5-fC or no modification. We further demonstrate that 5-hmU can be chemically oxidized to 5-fU, providing a strategy for the enrichment of 5-hmU. These methods will enable the mapping of 5-fU and 5-hmU in genomic DNA, to provide insights into their functional role and dynamics in biology.

Camilloni C, Sahakyan AB, Holliday MJ, Isern NG, Zhang F, Eisenmesser EZ, Vendruscolo M. 2014. Cyclophilin A catalyzes proline isomerization by an electrostatic handle mechanism. Proc Natl Acad Sci U S A, 111 (28), pp. 10203-10208.

Proline isomerization is a ubiquitous process that plays a key role in the folding of proteins and in the regulation of their functions. Different families of enzymes, known as "peptidyl-prolyl isomerases" (PPIases), catalyze this reaction, which involves the interconversion between the cis and trans isomers of the N-terminal amide bond of the amino acid proline. However, complete descriptions of the mechanisms by which these enzymes function have remained elusive. We show here that cyclophilin A, one of the most common PPIases, provides a catalytic environment that acts on the substrate through an electrostatic handle mechanism. In this mechanism, the electrostatic field in the catalytic site turns the electric dipole associated with the carbonyl group of the amino acid preceding the proline in the substrate, thus causing the rotation of the peptide bond between the two residues. We identified this mechanism using a combination of NMR measurements, molecular dynamics simulations, and density functional theory calculations to simultaneously determine the cis-bound and trans-bound conformations of cyclophilin A and its substrate as the enzymatic reaction takes place. We anticipate that this approach will be helpful in elucidating whether the electrostatic handle mechanism that we describe here is common to other PPIases and, more generally, in characterizing other enzymatic processes.

Sahakyan AB. 2012. Computational studies of dielectric permittivity effects on chemical shifts of alanine dipeptide. Chemical Physics Letters, 547, pp. 66-72.

Dielectric permittivity effect on chemical shifts of alanine dipeptide is studied using hybrid density functional theory. The dependence is shown to be highly sensitive to conformation, and, a reasonable explanation is outlined based on the solvent reaction field model. The danger of the observed shape of dependence for the chemical shift evaluations at low dielectric constant environment, as in the case of protein interior, is emphasized. The nuclear shielding sensitivity towards the dielectric permittivity is examined over different φ/ψ combinations. Comparison with the experimental data from protein backbone suggests an effective dielectric constant of ≈4-5 for protein interior.

Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M. 2011. Using side-chain aromatic proton chemical shifts for a quantitative analysis of protein structures. Angew Chem Int Ed Engl, 50 (41), pp. 9620-9623.

Predicting chemical shifts: A method for the structure-based prediction of side-chain aromatic ¹H chemical shifts of proteins is presented (see picture; blue structures: aromatic side chains, red spheres: aromatic hydrogen atoms). Its ability to differentiate correct structural models from incorrect ones is also demonstrated, together with its use to detect differences caused by cofactor or ligand binding, or by sequence alterations between structures.

Sahakyan AB, Vranken WF, Cavalli A, Vendruscolo M. 2011. Structure-based prediction of methyl chemical shifts in proteins. J Biomol NMR, 50 (4), pp. 331-346.

Protein methyl groups have recently been the subject of much attention in NMR spectroscopy because of the opportunities that they provide to obtain information about the structure and dynamics of proteins and protein complexes. With the advent of selective labeling schemes, methyl groups are particularly interesting in the context of chemical shift based protein structure determination, an approach that to date has exploited primarily the mapping between protein structures and backbone chemical shifts. In order to extend the scope of chemical shifts for structure determination, we present here the CH3Shift method of performing structure-based predictions of methyl chemical shifts. The terms considered in the predictions take account of ring current, magnetic anisotropy, electric field, rotameric type, and dihedral angle effects, which are considered in conjunction with polynomial functions of interatomic distances. We show that the CH3Shift method achieves an accuracy in the predictions that ranges from 0.133 to 0.198 ppm for ¹H chemical shifts for Ala, Thr, Val, Leu and Ile methyl groups. We illustrate the use of the method by assessing the accuracy of side-chain structures in structural ensembles representing the dynamics of proteins.

Sahakyan AB, Shahkhatuni AG, Shahkhatuni AA, Panosyan HA. 2008. Electric field effects on one-bond indirect spin-spin coupling constants and possible biomolecular perspectives. J Phys Chem A, 112 (16), pp. 3576-3586.

Electric field (EF) induced changes of one-bond indirect spin-spin coupling constants are investigated on a wide range of molecules including peptide models. EFs were both externally applied and internally calculated without external EF application by the hybrid density functional theory method. Reliable agreement with experimental data has been obtained for calculated one-bond J-couplings. The role of the EF sign and direction, internal and induced components, hydrogen bonding, internuclear distance and hyperconjugative interactions on the one-bond J-coupling vs EF interconnection is analyzed. A linear dependence of ¹J on EF projection along the bond is obtained, if the bound atoms possess different enough electron densities and an EF determined by the electronic polarization exists along the bond. Accentuating the ¹JNH couplings as possible EF sensitive parameters, a systematic study is done in two sets of molecules with a large variation of the native internal EF value. The most EF affected component of the ¹JNH coupling constant is the spin-dipole term of Ramsey's formulation; however, in the total J-coupling formation, the EF influence on the Fermi contact term is the most significant. The induced EF projection along the bond is 6.7 times weaker in magnitude than the simulated external uniform field. The absolute EF dependence of the one-bond J-coupling involves only the internal field, which is the sum of the induced field (if the external field exists) and the internuclear field determined by the native polarization. That linear and universal dependence joins the corresponding couplings in a diverse set of molecules under various electrostatic conditions. Many types of the one-bond J-couplings can be potentially measured in biomolecules, and the study of their relation with the electrostatic properties at the corresponding sites opens a new avenue to the full exploitation of the NMR measurable parameters with novel and exciting applications.
