
Open access
Published: 18 October 2022

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

Piotr Klukowski ORCID: orcid.org/0000-0003-1045-3487 1 ,
Roland Riek ORCID: orcid.org/0000-0002-6333-066X 1 &
Peter Güntert ORCID: orcid.org/0000-0002-2911-7574 1 , 2 , 3

Nature Communications volume 13, Article number: 6151 (2022)


Machine learning
Solution-state NMR

Nuclear Magnetic Resonance (NMR) spectroscopy is a major technique in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. We present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.


Introduction

Studying structures of proteins and ligand-protein complexes is one of the most influential endeavors in molecular biology and rational drug design. All key structure determination techniques, X-ray crystallography, electron microscopy, and NMR spectroscopy, have led to remarkable discoveries, but each suffers from its own experimental limitations. NMR can elucidate structures and dynamics of small and medium-size proteins in solution 1 and even in living cells 2 . However, the analysis of NMR spectra and the resonance assignment, which are indispensable for NMR studies, remain time-consuming even for a skilled and experienced spectroscopist. As a consequence, the percentage of NMR protein structures in the Protein Data Bank (PDB) has decreased from a maximum of 14.6% in 2007 to 7.3% in 2021 ( https://www.rcsb.org/stats ). The problem has sparked research towards automating different tasks in NMR structure determination 3 , 4 , including peak picking 5 , 6 , 7 , 8 , 9 , resonance assignment 10 , 11 , 12 , and the identification of distance restraints 13 , 14 . Several of these methods are available as webservers 15 , 16 . These efforts have enabled semi-automatic 17 , 18 , but not yet unsupervised, automation of the entire NMR structure determination process, except for a very small number of favorable proteins 7 , 19 .

The advance of machine learning techniques 20 now offers unprecedented possibilities for reliably replacing decisions of human experts by efficient computational tools. Here, we present a method that achieves this goal for NMR assignment and structure determination. We show for a diverse set of 100 proteins that NMR resonance assignments and protein structures can be determined within hours after completing the NMR measurements. Our method, Artificial Intelligence for NMR Applications, ARTINA (Fig. 1 ), combines machine learning for tasks that are difficult to model otherwise with existing algorithms: evolutionary optimization for resonance assignment with FLYA 12 , chemical shift database searches for torsion angle restraint generation with TALOS-N 21 , ambiguous distance restraints, network-anchoring and constraint combination for NOESY assignment 14 , 22 , and simulated annealing by torsion angle dynamics for structure calculation with CYANA 23 . Machine learning is used in multiple flavors: deep residual neural networks 24 for visual spectrum analysis to identify peak positions (pp-ResNet) and to deconvolve overlapping signals (deconv-ResNet) in 25 different types of spectra (Supplementary Table 1 ), kernel density estimation (KDE) to reconstruct original peak positions in folded spectra, a deep graph neural network 25 , 26 (GNN) for chemical shift estimation within the refinement of chemical shift assignments, and a gradient boosted trees 27 (GBT) model for the selection of structure proposals.

The flowchart presents the interplay between the main components of the automated protein structure determination workflow: Residual Neural Network (ResNet), FLYA automated chemical shift assignment, Graph Neural Network (GNN), Gradient Boosted Trees (GBT), and CYANA structure calculation.

A major challenge in developing ARTINA was the collection and preparation of a large training data set that is required for machine learning, because, in contrast to assignments and structures, NMR spectra are generally not archived in public data repositories. Instead, we were obliged to collect from different sources and standardize complete sets of multidimensional NMR spectra for the assignment and structure determination of 100 proteins.

In the following, we describe the algorithm, the training and test data, and the results of automated structure determination with ARTINA, which are on par with those achieved in weeks or months of labor by human experts.

Benchmark dataset

One of the major obstacles for developing deep learning solutions for protein NMR spectroscopy is the lack of a large-scale standardized benchmark dataset of protein NMR spectra. To date, published manuscripts presenting the most notable methods for computational NMR typically refer to fewer than 50 2D/3D/4D NMR spectra in their experimental sections. Even the well-recognized CASD-NMR competition cannot serve as a major source of training data for deep learning, since only the NOESY spectra of 10 proteins were used in the last round of the event 28 .

To make our study possible, we established a standardized benchmark of 1329 2D/3D/4D NMR spectra, which allows 100 proteins to be recalculated using their original spectral data (Fig. 2 and Supplementary Table 2 ). Each protein record in our dataset contains 5–20 spectra together with manually identified chemical shifts (usually depositions at the Biological Magnetic Resonance Data Bank, BMRB) and the previously determined (“ground truth”) protein structure (PDB record; Supplementary Table 3 ). The benchmark covers protein sizes typically studied by NMR spectroscopy with sequence lengths between 35 and 175 residues (molecular mass 4–20 kDa).

PDB codes (or names, MH04, MDM2, KRAS4B, if PDB code unavailable) of the 100 benchmark proteins are ordered by the number of residues. The histogram shows the number of spectra for backbone assignment, side-chain assignment, and NOE measurement. Spectrum types in each data set are shown by light to dark blue circles indicating the number of individual spectra of the given type. The percentages of benchmark records that contain a given spectrum type are given at the top. Spectrum types present in less than 5% of the data sets have been omitted.

Automated protein structure determination

The accuracy of protein structure determination with ARTINA was evaluated in a 5-fold cross-validation experiment with the aforementioned benchmark dataset. Five instances of pp-ResNet and GBT were trained, each one using data from about 80% of the proteins for training and the remaining ones for testing. Since each protein was present exactly once in the test set, reported quality metrics were obtained directly in the cross-validation experiment, and no averaging between data splits was required. To deploy pp-ResNet and GBT models in our online system, we constructed an ensemble by averaging predictions of all 5 cross-validation models. The other models were trained only once using either generated data (deconv-ResNet, Supplementary Fig. 1 ) or BMRB depositions excluding all benchmark proteins (GNN, KDE).
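The cross-validation and ensembling scheme described above can be sketched as follows. This is a minimal illustration, assuming a shuffled round-robin split by protein and simple arithmetic averaging of model outputs; the function names and the placeholder protein IDs are hypothetical and are not taken from the ARTINA implementation.

```python
import random

def protein_folds(protein_ids, k=5, seed=0):
    """Partition protein IDs into k disjoint folds (round-robin after shuffling)."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]

def train_test_splits(protein_ids, k=5, seed=0):
    """Yield (train, test) lists; every protein lands in exactly one test set."""
    folds = protein_folds(protein_ids, k, seed)
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

def ensemble_average(predictions):
    """Average per-item scores over several models (one score list per model)."""
    n_models = len(predictions)
    return [sum(scores) / n_models for scores in zip(*predictions)]

# Placeholder IDs standing in for the 100 benchmark proteins (assumption).
proteins = [f"P{i:03d}" for i in range(100)]
splits = list(train_test_splits(proteins))
for train, test in splits:
    print(len(train), len(test))
```

Splitting by protein rather than by spectrum keeps all spectra of one protein on the same side of the split, which is what allows each protein to be evaluated exactly once.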

In this experiment, we reproduced 100 structures in a fully automated manner using only NMR spectra and the protein sequences as input. Since ARTINA has no tunable parameters and does not require any manual curation of data, each structure was calculated by a single execution of the ARTINA workflow. All benchmark datasets were analyzed by ARTINA in parallel with execution times of 4–20 h per protein.

All automatically determined structures, overlaid with the corresponding reference structures from the PDB, are visualized in Fig. 3 , Supplementary Fig. 2 , and Supplementary Movie 1 . ARTINA was able to reproduce the reference structures with a median backbone root-mean-square deviation (RMSD) of 1.44 Å between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C′ in the residue ranges determined by CYRANGE 29 (Fig. 4a and Supplementary Table 4 ). ARTINA automatically identified between 459 and 4678 distance restraints (2198 on average over 100 proteins), which corresponds to 4.25–33.20 restraints per residue (Fig. 4b ). This number is mainly influenced by the extent of unstructured regions and the quality of the NOESY spectra. In agreement with earlier findings 30 , it correlates only weakly with the backbone RMSD to reference (linear correlation coefficient −0.38). As a more expressive validation measure for the structures from ARTINA, we computed a predicted RMSD to the PDB reference structure on the basis of the RMSDs between the 10 candidate structure bundles calculated in ARTINA (see “Methods”, Fig. 5 , and Supplementary Table 5 ). The average deviation between actual and predicted RMSDs for the 100 proteins in this study is 0.35 Å, and their linear correlation coefficient is 0.77 (Fig. 5 ). In no case does the true RMSD exceed the predicted one by more than 1 Å.

The structures are aligned with the RMSD to reference range as indicated on the left and hexagonal frames color-coded by their size as indicated above. Structures with no corresponding PDB depositions are marked by an asterisk.

a Backbone RMSD to reference. b Number of distance restraints per residue. c Chemical shift assignment accuracy. Bars represent quantity values for benchmark proteins, identified by PDB codes (or protein names). Proteins are ordered by size, which is indicated by a color-coded circle. Values in the center of each panel are 10th, 50th, and 90th percentiles of values presented in the bar plot. Short/medium/long-range restraints are between residues i and j with | i – j | ≤ 1, 2 ≤ | i – j | ≤ 4, and | i – j | ≥ 5, respectively.
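The short/medium/long-range classification used in this caption, and the restraints-per-residue quantity of panel b, can be written down directly. The sketch below assumes restraints are given as pairs of residue numbers; the example pairs are made up for illustration.

```python
from collections import Counter

def restraint_range(i, j):
    """Classify a restraint between residues i and j by sequence separation."""
    separation = abs(i - j)
    if separation <= 1:
        return "short"
    if separation <= 4:
        return "medium"
    return "long"

def count_by_range(restraints):
    """Count restraints per range class for an iterable of (i, j) pairs."""
    return Counter(restraint_range(i, j) for i, j in restraints)

def restraints_per_residue(n_restraints, n_residues):
    """Number of distance restraints divided by the sequence length."""
    return n_restraints / n_residues

# Illustrative residue pairs, not taken from any benchmark protein.
example = [(10, 10), (10, 11), (10, 13), (10, 14), (10, 15), (3, 40)]
counts = count_by_range(example)
print(dict(counts))
```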

The predicted RMSD to reference (pRMSD) is calculated from the ARTINA results without knowledge of the reference PDB structure (see “Methods”) and, by definition, always in the range of 0–4 Å. For comparability, actual RMSD values to reference are also truncated at 4 Å (protein 2M47 with RMSD 4.47 Å). The dotted lines represent deviations of ±1 Å between the two RMSD quantities.

Additional structure validation scores obtained from ANSURR 31 (Supplementary Table 6 ), RPF 32 (Supplementary Table 7 ), and consensus structure bundles 33 (Supplementary Table 8 ) confirm that overall the ARTINA structures and the corresponding reference PDB structures are of equivalent quality. Energy refinement of the ARTINA structures in explicit water using OPALp 34 (not part of the standard ARTINA workflow) does not significantly alter the agreement with the PDB reference structures (Supplementary Table 9 ). The benchmark data set comprises 78 protein structures determined by the Northeast Structural Genomics Consortium (NESG). On average, ARTINA yielded structures of the same accuracy for NESG targets (median RMSD to reference 1.44 Å) as for proteins from other sources (1.42 Å).

On average, ARTINA correctly assigned 90.39% of the chemical shifts (Fig. 4c ), as compared to the manually prepared assignments, including both “strong” (high-reliability) and “weak” (tentative) FLYA assignments 12 . Backbone chemical shifts were assigned more accurately (96.03%) than side-chain ones (86.50%), which is mainly due to difficulties in assigning lysine/arginine (79.97%) and aromatic (76.87%) side-chains. Further details on the assignment accuracy for individual amino acid types in the protein cores (residues with less than 20% solvent accessibility) are given in Supplementary Table 10 . Assignments for core residues, which are important for the protein structure, are generally more accurate than for the entire protein, in particular for core Ala, Cys, and Asp residues, which show a median assignment accuracy of 100% over the 100 proteins. The lowest accuracies are observed for core His (83.3%), Phe (83.3%), and Arg (87.5%) residues. The three proteins with highest RMSD to reference, 2KCD, 2L82, and 2M47 (see below), show 68.2, 83.8, and 75.7% correct aromatic assignments, respectively, well below the corresponding median of 85.5%. On the other hand, the assignment accuracies for the methyl-containing residues Ala, Ile, Val are above average and reach a median of 100, 97.6, and 98.6%, respectively.

The quality of automated structure determination and chemical shift assignment reflects the performance of deep learning-based visual spectrum analysis, presented qualitatively in Figs. 6–7, Supplementary Fig. 3 , and Supplementary Movies 2–4. In this experiment, our models (pp-ResNet, deconv-ResNet) automatically identified 1,168,739 cross-peaks with high confidence (≥0.50) in the benchmark spectra. All 1329 peak lists, together with automatically determined protein structures and chemical shift lists, are available for download.

A fragment of a ¹⁵N-HSQC spectrum of the protein 1T0Y is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses). a1, a2 Initial peak picking marker position is refined by the deconvolution model. b1, b2 pp-ResNet output is deconvolved into two components. c The deconvolution model supports at most 3 components per initial signal. d Two peak picking markers are merged by the deconvolution model. e Peak picking output deconvolved into three components.

A fragment of the ¹³C-HSQC spectrum of protein 2K0M is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses).

Error analysis

The largest deviations from the PDB reference structure were observed for the proteins 2KCD, 2L82, and 2M47, for which the pRMSD consistently indicated low accuracy (Fig. 5 ). Significant deviations are mainly due to displacements of terminal secondary structure elements (e.g., a tilted α-helix near a chain terminus), or inaccurate loop conformations (e.g., more flexible than in the PDB deposition). We investigated the origin of these discrepancies.

2KCD is a 120-residue (14.4 kDa) protein from Staphylococcus saprophyticus with an α-β roll architecture. Its dataset comprises 19 spectra (8 backbone, 6 side-chain, and 5 NOESY). The ARTINA structure has a backbone RMSD to PDB reference of 3.13 Å, which is caused by the displacement of the C-terminal α-helix (residues 105–109; Supplementary Fig. 4a ). Excluding this 5-residue fragment decreases the RMSD to 2.40 Å (Supplementary Table 11 ). The positioning of this helix appears to be uncertain, since an ARTINA calculation without the 4D CC-NOESY spectrum yields a significantly lower RMSD of 1.77 Å (Supplementary Table 12 ).

2L82 is a de novo designed protein of 162 residues (19.7 kDa) with an αβ 3-layer (αβα) sandwich architecture. Although only 9 spectra (4 backbone, 2 side-chain and 3 NOESY) are available, ARTINA correctly assigned 97.87% backbone and 81.05% side-chain chemical shifts. The primary reason for the high RMSD value of 3.55 Å is again a displacement of the C-terminal α-helix (residues 138–153). The remainder of the protein matches closely the PDB deposition (1.04 Å RMSD, Supplementary Fig. 4b ).

The protein with the highest RMSD to reference (4.72 Å) in our benchmark dataset is 2M47, a 163-residue (18.8 kDa) protein from Corynebacterium glutamicum with an α-β 2-layer sandwich architecture, for which 17 spectra (7 backbone, 7 side-chain, and 3 NOESY) are available. The main sources of discrepancy are two α-helices spanning residues 111–157 near the C-terminus. Nevertheless, the residues contributing to the high RMSD value are distributed more extensively than in 2L82 and 2KCD just discussed. Interestingly, 2 of the 10 structure proposals calculated by ARTINA have an RMSD to reference below 2 Å (1.66 Å and 1.97 Å). In the final structure selection step, our GBT model selected the 4.72 Å RMSD structure as the first choice and the 1.66 Å structure as the second one (Supplementary Fig. 4c ). Such results imply that the automated structure determination of this protein is unstable. Since ARTINA returns the two structures selected by GBT with the highest confidence, the user can, in principle, choose the better structure based on contextual information.

In addition to these three case studies, we performed a quantitative analysis of all regular secondary structure elements and flexible loops present in our 100-protein benchmark in order to assess their impact on the backbone RMSD to reference (Supplementary Table 11 ). All residues in the structurally well-defined regions determined by CYRANGE 29 were assigned to 6 partially overlapping sets: (a) first secondary structure element, (b) last secondary structure element, (c) α-helices, (d) β-sheets, (e) α-helices and β-sheets, and (f) loops. Then, the RMSD to reference was calculated 6 times, each time with one set excluded. In total, for 66 of the 100 proteins the lowest RMSD was obtained if set (f) was excluded from RMSD calculation, and 13% benefited most from removal of the first or last secondary structure element (a or b). Moreover, for 18 out of the 19 proteins with more than 0.5 Å RMSD decrease compared to the RMSD for all well-defined residues, (a), (b), or (f) was the primary source of discrepancy. These results are consistent with our earlier statement that deviations in automatically determined protein structures are mainly caused by terminal secondary structure elements or inaccurate loop conformations.
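The leave-one-set-out analysis described above can be illustrated with a small sketch. It makes a simplifying assumption that is not stated in the text: per-residue deviations come from one fixed superposition and are not recomputed after a set is excluded. The residue sets and deviation values are toy numbers, not benchmark data.

```python
import math

def rmsd(deviations):
    """Root-mean-square of a list of per-residue deviations (in Å)."""
    if not deviations:
        raise ValueError("no residues left after exclusion")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def leave_one_set_out(dev_by_residue, residue_sets):
    """Return {set name: RMSD computed with that residue set excluded}."""
    results = {}
    for name, excluded in residue_sets.items():
        kept = [d for res, d in dev_by_residue.items() if res not in excluded]
        results[name] = rmsd(kept)
    return results

def primary_source(dev_by_residue, residue_sets):
    """Name of the set whose exclusion gives the lowest RMSD."""
    results = leave_one_set_out(dev_by_residue, residue_sets)
    return min(results, key=results.get)

# Toy example: residues 5-6 (last secondary structure element) deviate most.
deviations = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 3.0, 6: 3.0}
sets = {"last_sse": {5, 6}, "loops": {3, 4}}
print(leave_one_set_out(deviations, sets))
print(primary_source(deviations, sets))
```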

Ablation studies

During the experiment, we captured the state of each structure determination at 9 time-points, 3 per structure determination cycle: (a) after the initial FLYA shift assignment, (b) after GNN shift refinement, and (c) after structure calculation (Fig. 1 ). Comparative analysis of these states allowed us to quantify the contribution of different ARTINA components to the structure determination process (Table 1 ).
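The 9 captured states are simply the combination of 3 cycles with 3 stages per cycle. A minimal enumeration, with stage labels paraphrased from the text and a hypothetical tuple layout, looks like this:

```python
from itertools import product

CYCLES = (1, 2, 3)
STAGES = (
    "initial FLYA shift assignment",
    "GNN shift refinement",
    "structure calculation",
)

# One (cycle, stage) tuple per captured state, in chronological order.
snapshots = list(product(CYCLES, STAGES))
for cycle, stage in snapshots:
    print(f"cycle {cycle}: after {stage}")
```

Comparing a quality metric between consecutive entries of such a list is one way to attribute an improvement to a specific component.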

The results show a strong benefit of the refinement cycles, as quantities reported in Table 1 consistently improve from cycle 1 to 3. The majority of benchmark proteins converge to the correct fold after the first cycle (1.56 Å median backbone RMSD to reference), which is further refined to 1.52 Å in cycle 2 and 1.44 Å in cycle 3. Additionally, within each chemical shift refinement cycle, improvements in assignment accuracy resulting from the GNN predictions are observed. This quantity also increases consistently across all refinement cycles, in particular for side-chains. Refinement cycles are particularly advantageous for large and challenging systems, such as 2LF2, 2M7U, or 2B3W, which benefit substantially in cycles 2 and 3 from the presence of the approximate protein fold in the chemical shift assignment step.

Impact of 4D NOESY experiments

As presented in Fig. 2 , 26 out of 100 benchmark datasets contain 4D CC-NOESY spectra, which require long measurement times and were used in the manual structure determination. To quantify their impact, we performed automated structure determinations of these 26 proteins with and without the 4D CC-NOESY spectra (Supplementary Table 12 ).

On average, the presence of 4D CC-NOESY improves the backbone RMSD to reference by 0.15 Å (decrease from 1.88 to 1.73 Å) and has less than 1% impact on chemical shift assignment accuracy. However, the impact is non-uniform. For three proteins, 2KIW, 2L8V, and 2LF2, use of the 4D CC-NOESY decreased the RMSD by more than 1 Å. On the other hand, there is also one protein, 2KCD, for which the RMSD decreased by more than 1 Å by excluding the 4D CC-NOESY.

These results suggest that overall the amount of information stored in 2D/3D experiments is sufficient for ARTINA to reach close to optimal performance, and only modest improvement can be achieved by introducing additional information redundancy from 4D CC-NOESY spectra.

Automated chemical shift assignment

Apart from structure determination, our data analysis pipeline for protein NMR spectroscopy can address an array of problems that are nowadays approached manually or semi-manually. For instance, ARTINA can be stopped after visual spectrum analysis, returning positions and intensities of cross-peaks that can be utilized for any downstream task, not necessarily related to protein structure determination.

Alternatively, a single chemical shift refinement cycle can be performed to get automatically assigned cross-peaks from spectra and sequence. We evaluated this approach with three sets of spectra: (i) Exclusively backbone assignment spectra were used to assign N, Cα, Cβ, C′, and HN shifts. With this input, ARTINA assigned 92.40% (median value) of the backbone shifts correctly. (ii) All through-bond but no NOESY spectra were used to assign the backbone and side-chain shifts. This raised the percentage of correct backbone assignments to 94.20%. (iii) The full data set including NOESY yielded 96.60% correct assignments of the backbone shifts. These three experiments were performed for the 45 benchmark proteins for which CBCANH and CBCAcoNH, as well as either HNCA and HNcoCA or HNCO and HNcaCO experiments, were available. The availability of NOESY spectra had a large impact on the side-chain assignments: 86.00% were correct for the full spectra set (iii), compared to 73.70% in the absence of NOESY spectra (spectra set (ii)). The presence of NOESY spectra consistently improved the chemical shift assignment accuracy of all amino acid types (Supplementary Tables 13 and 14 ). The improvement is particularly strong for aromatic residues (Phe 61.6 to 76.5%, Trp 52.5 to 80%, and Tyr 71.4 to 89.7%), but not limited to this group.

The results obtained with ARTINA differ substantially in several aspects from previous approaches towards automating protein NMR analysis 3 , 4 , 7 , 12 , 17 , 18 , 19 , 35 . First, ARTINA covers the entire workflow from spectra to structures rather than individual steps in it, and there are strictly no manual interventions or protein-specific parameters to be adapted. Second, the quality of the results regarding peak identification, resonance assignments, and structures has been assessed on a large and diverse set of 100 proteins; for the vast majority of these, the results are on par with what can be achieved by human experts. Third, the method provides a two-orders-of-magnitude leap in efficiency by providing assignments and a structure within hours of computation time rather than weeks or months of human work. This reduces the effort for a protein structure determination by NMR essentially to the preparation of the sample and the measurement of the spectra. Its implementation in the https://nmrtist.org webserver (Supplementary Movie 5 ) encapsulates its complexity, eliminates any intermediate data and format conversions by the user, and enables the use of different types of high-performance hardware as appropriate for each of the subtasks. ARTINA is not limited to structure determination but can be used equally well for peak picking and resonance assignment in NMR studies that do not aim at a structure, such as investigations of ligand binding or dynamics.

Although ARTINA has no parameters to be optimized by the user, care should be taken in the preparation of the input data, i.e., the choice, measurement, processing, and specification of the spectra. Spectrum type, axes, and isotope labeling declarations must be correct, and chemical shift referencing must be consistent over the entire set of spectra. Slight variations of corresponding chemical shifts within the tolerances of 0.03 ppm for ¹H and 0.4 ppm for ¹³C/¹⁵N can be accommodated, but larger deviations, resulting, for instance, from the use of multiple samples, pH changes, protein degradation, or inaccurate referencing, can be detrimental. Where appropriate, ARTINA proposes corrections of chemical shift referencing 36 . Furthermore, based on the large training data set, which comprises a large variety of spectral artifacts, ARTINA largely avoids misinterpreting artifacts as signals. However, with decreasing spectral quality, ARTINA, like a human expert, will progressively miss real signals.
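A user preparing input data could run a simple pre-check for shift consistency across spectra before submission. The sketch below is not part of ARTINA; it assumes that "variation" means the spread (maximum minus minimum) of the positions observed for one atom in different spectra, and the atom labels and values are invented for illustration. Only the tolerance values come from the text.

```python
# Tolerances quoted in the text, in ppm per nucleus type.
TOLERANCE_PPM = {"1H": 0.03, "13C": 0.4, "15N": 0.4}

def inconsistent_shifts(observations):
    """Flag atoms whose shift spread across spectra exceeds the tolerance.

    observations maps (atom_label, nucleus) to a list of shift values (ppm),
    one value per spectrum in which the atom was observed.
    Returns {(atom_label, nucleus): spread_in_ppm} for the flagged atoms.
    """
    flagged = {}
    for (atom, nucleus), values in observations.items():
        spread = max(values) - min(values)
        if spread > TOLERANCE_PPM[nucleus]:
            flagged[(atom, nucleus)] = spread
    return flagged

# Invented example: the amide proton agrees, the nitrogen does not.
observed = {
    ("A45.HN", "1H"): [8.21, 8.22, 8.23],
    ("A45.N", "15N"): [120.1, 120.9],
}
print(inconsistent_shifts(observed))
```

A systematic offset affecting all atoms of one spectrum would point to a referencing problem, whereas isolated outliers are more likely to reflect sample differences.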

Regarding protein size and spectrum quality, limitations of ARTINA are similar to those encountered by a trained spectroscopist. Machine-learning-based visual analysis of spectra requires signals to be present and distinguishable in the spectra. ARTINA does not suffer from accidental oversight that may affect human spectra analysis. On the other hand, human experts may exploit contextual information to which the automated system currently has no access because it identifies individual signals by looking at relatively small, local excerpts of spectra.

In this paper, we used all spectra that are available from the earlier manual structure determination. For most of the 100 proteins, the spectra data set has significant redundancy regarding information for the resonance assignment. Our results indicate that one can expect to obtain good assignments and structures also from smaller sets of spectra 37 , with concomitant savings of NMR measurement time. We plan to investigate this in a future study.

The present version of ARTINA can be enhanced in several directions. Besides improving individual models and algorithms, it is conceivable to integrate the so far independently trained collection of machine learning models, plus additional models that replace conventional algorithms, into a coherent system that is trained as a whole. Furthermore, the reliability of machine learning approaches depends strongly on the quantity and quality of training data available. While the collection of the present training data set for ARTINA was cumbersome, from now on it can be expected to expand continuously through the use of the https://nmrtist.org website, both quantitatively and qualitatively with regard to greater variability in terms of protein types, spectral quality, source laboratory, data processing (including non-linear sampling), etc., which can be exploited in retraining the models. ARTINA can also be extended to use additional experimental input data, e.g., known partial assignments, stereospecific assignments, ³J couplings, residual dipolar couplings, paramagnetic data, and H-bonds. Structural information, e.g., from AlphaFold 38 , can be used in combination with reduced sets of NMR spectra for rapid structure-based assignment. Finally, the range of application of ARTINA can be generalized to small molecule-protein complexes relevant for structure-activity relationship studies in drug research, protein-protein complexes, RNA, solid state, and in-cell NMR.

Overall, ARTINA stands for a paradigm change in biomolecular NMR from a time-consuming technique for specialists to a fast method open to researchers in molecular biology and medicinal chemistry. At the same time, in a larger perspective, the appearance of generally highly accurate structure predictions by AlphaFold 38 is revolutionizing structural biology. Nevertheless, there remains space for the experimental methods, for instance, to elucidate various states of proteins under different conditions or in dynamic exchange, or for studying protein-ligand interaction. Regarding ARTINA, one should keep in mind that its applications extend far beyond structure determination. It will accelerate virtually any biological NMR studies that require the analysis of multidimensional NMR spectra and chemical shift assignments. Protein structure determination is just one possible ARTINA application, which is both demanding in terms of the amount and quality of required experimental data and amenable to quantitative evaluation.

Spectrum benchmark collection

To collect the benchmark of NMR spectra (Fig. 2 and Supplementary Table 2 ), we implemented crawler software that systematically scanned the FTP server of the BMRB data bank 39 , identifying data files relevant to our study. Additional datasets were obtained by setting up a website for the deposition of published data ( https://nmrdb.ethz.ch ), from our collaboration network, or from measurements acquired internally in our laboratory. NMR data was collected from these channels either in the form of processed spectra (Sparky 40 , NMRPipe 41 , XEASY 42 , Bruker formats), or in the form of time-domain data accompanied by depositor-supplied NMRPipe processing scripts. No additional spectra processing (e.g., baseline correction) was performed as part of this study.

The most challenging aspects of the benchmark collection process were: (i) scarcity of data, since only a small fraction of all BMRB depositions are accompanied by uploaded spectra (or time-domain data); (ii) lack of standards for NMR data depositions, since each protein data set had to be prepared manually because the original data was stored in different formats (spectra name conventions, axis label standards, spectra data format); and (iii) difficulties in correlating data files deposited in the BMRB FTP site with contextual information about the spectrum and the sample (e.g., sample characteristics, measurement conditions, instrument used). Manually prepared (mostly NOESY) peak lists, which are available from the BMRB for some of the proteins in the benchmark, were not used for this study.

Different approaches to 3D ¹³C-NOESY spectra measurement had to be taken into account: (i) Two separate ¹³C-NOESY spectra for aliphatic and aromatic signals. These were analyzed by ARTINA without any special treatment. We used ALI, ARO tags (Supplementary Movie 5 ) to provide the information that only either aliphatic or aromatic shifts are expected in a given spectrum. (ii) Simultaneous NC-NOESY. These spectra were processed twice to have proper scaling of the ¹³C and ¹⁵N axes in ppm units, and cropped to extract ¹⁵N-NOESY and ¹³C-NOESY spectra. If nitrogen and carbon cross-peak amplitudes have different signs, we used POS, NEG tags to provide the information that only either positive or negative signals should be analyzed. (iii) Aliphatic and aromatic signals in a single ¹³C-NOESY spectrum. These measurements do not require any special treatment, but proper cross-peak unfolding plays a vital role in aromatic signals analysis.
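The effect of the POS and NEG tags can be pictured as a sign filter on cross-peak amplitudes. The sketch below is only an illustration of that idea: the peak representation (a dict with "position" and "amplitude" keys) and the function name are hypothetical, and how ARTINA applies the tags internally is not described in the text.

```python
def filter_by_sign(peaks, tag):
    """Keep only positive (tag 'POS') or negative (tag 'NEG') cross-peaks."""
    if tag == "POS":
        return [p for p in peaks if p["amplitude"] > 0]
    if tag == "NEG":
        return [p for p in peaks if p["amplitude"] < 0]
    raise ValueError(f"unknown tag: {tag!r}")

# Invented peaks from a simultaneous NC-NOESY in which the two nuclei
# give cross-peaks of opposite sign (which nucleus is which is an assumption).
peaks = [
    {"position": (4.31, 56.2, 8.10), "amplitude": 1.8e6},
    {"position": (4.31, 119.7, 8.10), "amplitude": -9.5e5},
    {"position": (1.42, 19.3, 7.65), "amplitude": 2.2e6},
]
positive = filter_by_sign(peaks, "POS")
negative = filter_by_sign(peaks, "NEG")
print(len(positive), len(negative))
```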

Overview of the ARTINA algorithm

ARTINA uses as input only the protein sequence and a set of NMR spectra, which may contain any combination of 25 experiments currently supported by the method (Supplementary Table 1 ). Within 4–20 h of computation time (depending on protein size, number of spectra, and computing hardware load), ARTINA determines: (a) cross-peak positions for each spectrum, (b) chemical shift assignments, (c) distance restraints from NOESY spectra, and (d) the protein structure. The whole process does not require any human involvement, allowing rapid protein NMR assignment and structure determination by non-experts.

The ARTINA workflow starts with visual spectrum analysis (Fig. 1), wherein cross-peak positions are identified in frequency-domain NMR spectra using deep residual neural networks (ResNet) 24. Coordinates of signals in the spectra are passed as input to the FLYA automated assignment algorithm 12, yielding initial chemical shift assignments. In the subsequent chemical shift refinement step, a deep graph neural network (GNN) 25 that was trained on BMRB/PDB depositions brings to the workflow contextual information about thousands of protein structures previously solved by NMR. Its goal is to predict expected values of yet missing chemical shifts, given the shifts that have already been confidently and unambiguously assigned by FLYA. With these GNN predictions as additional input, the cross-peak positions are reassessed in a second FLYA call, which completes the chemical shift refinement cycle (Fig. 1).

In the structure refinement cycle, 10 variants of NOESY peak lists are generated, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis by varying the confidence threshold of a signal selected by ResNet between 0.05 and 0.5. Each set of NOESY peak lists is used in an independent CYANA structure calculation 22, 23, yielding 10 intermediate structure proposals (Fig. 1). The structure proposals are ranked in the intermediate structure selection step based on 96 features with a dedicated gradient boosted tree (GBT) model. The selected best structure proposal is used as contextual information in a consecutive FLYA run, which closes the structure refinement cycle.

After the two initial steps of visual spectrum analysis and initial chemical shift assignment, ARTINA interchangeably executes refinement cycles. The chemical shift refinement cycle provides FLYA with tighter restraints on expected chemical shifts, which helps to assign ambiguous cross-peaks. The structure refinement cycle provides information about possible through-space contacts, allowing identified cross-peaks (especially in NOESY) to be reassigned. The high-level concept behind the interchangeable execution of refinement cycles is to iteratively update the protein structure given fixed chemical shifts, and update chemical shifts given the fixed protein structure. Both refinement cycles are executed three times.

Automated visual analysis of the spectrum

We established two machine learning models for the visual analysis of multidimensional NMR spectra (see downloads in the Code availability section). In their design, we made no assumptions about the downstream task and the 2D/3D/4D experiment type. Therefore, the proposed models can be used as the starting point of our automated structure determination procedure, as well as for any other task that requires cross-peak coordinates.

The automated visual analysis starts by selecting all extrema \(\boldsymbol{x}=\{\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{N}\}\), \(\boldsymbol{x}_{n}\in\mathbb{N}^{D}\), in the NMR spectrum, which is represented as a D-dimensional regular grid storing signal intensities at discrete frequencies. We formulated the peak picking task as an object detection problem, where possible object positions are confined to \(\boldsymbol{x}\). This task was addressed by training a deep residual neural network 24, in the following denoted as peak picking ResNet (pp-ResNet), which learns a mapping \(\boldsymbol{x}_{n}\to[0,1]\) that assigns to each signal extremum a real-valued score, which resembles its probability of being a true signal rather than an artefact.
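The extremum selection step can be illustrated with a minimal sketch for a 2D grid. The text does not specify how extrema are detected, so the 8-connected neighbourhood, the strict comparison, and the name `find_extrema` are assumptions made for illustration only.

```python
from itertools import product


def find_extrema(grid):
    """Return (row, col) indices of strict local maxima and minima of a 2D grid.

    Assumption: a point counts as an extremum if it is strictly greater (or
    strictly smaller) than every existing neighbour in its 8-connected
    neighbourhood. ARTINA's actual criterion is not stated in the text.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    offsets = [o for o in product((-1, 0, 1), repeat=2) if o != (0, 0)]
    extrema = []
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = [
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
            value = grid[r][c]
            is_max = all(value > v for v in neighbours)
            is_min = all(value < v for v in neighbours)
            if is_max or is_min:
                extrema.append((r, c))
    return extrema


# Toy spectrum: one positive signal, one negative signal, low-level noise.
spectrum = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 5.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, -3.0],
]
candidates = find_extrema(spectrum)
```

Note that noise also produces extrema (here in two corners of the grid), which is why every candidate is subsequently scored by a classifier rather than accepted directly.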

Our network architecture is strongly linked to ResNet-18 24. It contains 8 residual blocks, followed by a single fully connected layer with sigmoidal activation. After weight initialization with Glorot Uniform 43, the architecture was trained by optimizing a binary cross-entropy loss using Adam 44 with learning rate \(10^{-4}\) and gradient clipping of 0.5.

To establish an experimental training dataset for pp-ResNet, we normalized the 1329 spectra in our benchmark with respect to resolution (adjusting the number of data grid points per unit chemical shift (ppm) using linear interpolation) and signal amplitude (scaling the spectrum by a constant). Subsequently, 675,423 diverse 2D fragments of size 256 × 32 × 1 were extracted from the normalized spectra and manually annotated, yielding 98,730 positive and 576,693 negative class training examples. During the training process, we additionally augmented this dataset by flipping spectrum fragments along the second dimension (32 pixels), stretching them by 0–30% in the first and second dimensions, and perturbing signal intensities with Gaussian noise addition.

The role of the pp-ResNet is to quickly iterate over signal extrema in the spectrum, filtering out artefacts and selecting approximate cross-peak positions for the downstream task. The relatively small network architecture (8 residual blocks) and input size of 2D 256 × 32 image patches make it possible to analyze large 3D ¹³C-resolved NOESY spectra in less than 5 min on a high-end desktop computer. Simultaneously, the first dimension of the image patch (256 pixels) provides long-range contextual information on the possible presence of signals aligned with the current extremum (e.g., Cα, Cβ cross-peaks in an HNCACB spectrum).

Extrema classified with high confidence as true signals by pp-ResNet undergo subsequent analysis with a second deep residual neural network (deconv-ResNet). Its objective is to perform signal deconvolution, based on a 3D spectrum fragment (64 × 32 × 5 voxels) that is cropped around a signal extremum selected by pp-ResNet. This task is defined as a regression problem, where deconv-ResNet outputs a 3 × 3 matrix storing 3D coordinates of up to 3 deconvolved peak components, relative to the center of the input image. To ensure permutation invariance with respect to the ordering of components in the output coordinate matrix, and to allow for a variable number of 1–3 peak components, the architecture was trained with a Chamfer distance loss 45 .
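To illustrate why a Chamfer-type loss is insensitive to the ordering of peak components, the following sketch computes a symmetric Chamfer distance between two small sets of 3D points. The squared-distance, mean-normalized variant and the function name are assumptions; the exact formulation used to train deconv-ResNet, and how unused rows of the 3 × 3 output matrix are handled, are not given in the text.

```python
import math


def chamfer_distance(predicted, target):
    """Symmetric Chamfer distance between two non-empty sets of 3D points.

    Assumed variant: for each point, take the squared Euclidean distance to
    the nearest point of the other set, average per set, and sum both terms.
    """
    def one_sided(a, b):
        return sum(min(math.dist(p, q) ** 2 for q in b) for p in a) / len(a)

    return one_sided(predicted, target) + one_sided(target, predicted)


# Two deconvolved components, given relative to the centre of the fragment.
target = [(0.0, 0.0, 0.0), (4.0, 1.0, 0.0)]
prediction = [(4.2, 1.0, 0.0), (0.1, 0.0, 0.0)]  # same peaks, other order

loss = chamfer_distance(prediction, target)
loss_swapped = chamfer_distance(list(reversed(prediction)), target)
```

Because each point is matched to its nearest counterpart, `loss` and `loss_swapped` are identical, and the two sets may also contain different numbers of points.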

Since deconv-ResNet deals only with true signals and their local neighborhood, its training dataset can be conveniently generated. We established a spectrum fragment generator, based on rules reflecting the physics of NMR, which produced 110,000 synthetic training examples (Supplementary Fig. 1 ) having variable (a) numbers of components to deconvolve (1–3), (b) signal-to-noise ratio, (c) component shapes (Gaussian, Lorentzian, and mixed), (d) component amplitude ratios, (e) component separation, and (f) component neighborhood type (i.e., NOESY-like signal strips or HSQC-like 2D signal clusters). The deconv-ResNet model was thus trained on fully synthetic data.

Signal unaliasing

To use ResNet predictions in automated chemical shift assignment and structure calculation, detected cross-peak coordinates must be transformed from the spectrum coordinate system to their true resonance frequencies. We addressed the problem of automated signal unfolding with the classical machine learning approach to density estimation.

At first, we generated \(10^{5}\) cross-peaks associated with each experiment type supported by ARTINA (Supplementary Table 1). In this process, we used randomly selected chemical shift lists deposited in the BMRB database, excluding depositions associated with our benchmark proteins. Subsequently, we trained a Kernel Density Estimator (KDE):

\[p_{e}(\boldsymbol{x})=\frac{1}{N_{e}}\sum_{i=1}^{N_{e}}\kappa\left(\boldsymbol{x}-\boldsymbol{x}_{i}^{(e)}\right)\]

which captures the distribution \(p_{e}(\boldsymbol{x})\) of true peaks being present at position \(\boldsymbol{x}\) in spectrum type \(e\), based on \(N_{e}=10^{5}\) cross-peak coordinates \(\boldsymbol{x}_{i}^{(e)}\) generated with BMRB data, with \(\kappa\) being the Gaussian kernel.
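A Gaussian kernel density estimate of this kind can be sketched in a few lines. The isotropic bandwidth, the function name `gaussian_kde`, and the four toy peak positions are assumptions for illustration; the bandwidth and implementation used in ARTINA are not stated in the text.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function p(x) estimating the density of k-dimensional samples.

    Assumption: one isotropic Gaussian kernel with the same bandwidth (ppm)
    in every dimension. A practical model would scale dimensions separately.
    """
    k = len(samples[0])
    norm = len(samples) * (2.0 * math.pi * bandwidth ** 2) ** (k / 2.0)

    def density(x):
        total = 0.0
        for s in samples:
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return density


# Toy (1H, 15N) peak positions in ppm; the real training sets hold 10^5 peaks.
training_peaks = [(8.1, 119.5), (8.3, 121.0), (7.9, 118.2), (8.6, 123.4)]
p_e = gaussian_kde(training_peaks, bandwidth=1.0)

inside = p_e((8.2, 120.0))   # within the populated region
outside = p_e((8.2, 160.0))  # far from any training peak
```

Positions in the populated region of the spectrum type receive a much higher density than positions where no peaks are expected, which is the property the unfolding step relies on.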

Unfolding a k-dimensional spectrum is defined as a discrete optimization problem, solved independently for each cross-peak \(\boldsymbol{x}_{j}^{(e)}\) observed in a spectrum of type \(e\):

\[\boldsymbol{s}^{*}=\underset{\boldsymbol{s}\in\mathbb{Z}^{k}}{\arg\max}\;p_{e}\left(\boldsymbol{x}_{j}^{(e)}+\boldsymbol{s}\circ\boldsymbol{w}\right)\]

where \(\boldsymbol{w}\in\mathbb{R}^{k}\) is a vector storing the spectral widths in each dimension (ppm units), \(\circ\) is element-wise multiplication, \(\boldsymbol{s}\in\mathbb{Z}^{k}\) is a vector indicating how many times the cross-peak is unfolded in each dimension, and \(\boldsymbol{s}^{*}\in\mathbb{Z}^{k}\) is the optimal cross-peak unfolding.

As long as regular and folded signals do not overlap or have different signs in the spectrum, KDE can unfold the peak list regardless of spectrum dimensionality. The spectrum must not be cropped in the folded dimension, i.e., the folding sweep width must equal the width of the spectrum in the corresponding dimension.
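The discrete optimization over unfolding vectors can be sketched as an exhaustive search over a small range of integer shifts. The search bound `max_folds`, the function names, and the single-Gaussian `toy_density` standing in for a trained KDE are all assumptions for illustration.

```python
import math
from itertools import product


def unfold_peak(peak, widths, density, max_folds=2):
    """Return the integer vector s maximizing density(peak + s * widths).

    Assumption: each dimension is searched over -max_folds..max_folds;
    the text only states that s is an integer vector.
    """
    best_shift, best_score = None, -1.0
    for s in product(range(-max_folds, max_folds + 1), repeat=len(peak)):
        candidate = tuple(x + si * w for x, si, w in zip(peak, s, widths))
        score = density(candidate)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


def toy_density(x):
    """Stand-in for a trained KDE: one Gaussian bump at (1.0, 20.0) ppm."""
    centre, sigma = (1.0, 20.0), 5.0
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# A peak observed at 60 ppm in a carbon dimension with 40 ppm spectral width.
observed = (1.0, 60.0)
spectral_widths = (10.0, 40.0)
s_star = unfold_peak(observed, spectral_widths, toy_density)
unfolded = tuple(x + s * w for x, s, w in zip(observed, s_star, spectral_widths))
```

In this toy case the peak is moved by one spectral width in the second dimension and left unchanged in the first.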

All 2D/3D spectra in our benchmark were folded in at most one dimension and satisfy the aforementioned requirements. However, the 4D CC-NOESY spectra satisfy neither, as regular and folded peaks both overlap and have the same signal amplitude sign. This introduces ambiguity in the spectrum unfolding that prevents direct use of the KDE technique. To retrieve original signal positions, 4D CC-NOESY cross-peaks were unfolded to overlap with signals detected in 3D ¹³C-NOESY. In consequence, 4D CC-NOESY unfolding depended on other experiments, and individual 4D cross-peaks were retained only if they were confirmed in a 3D experiment.

Chemical shift assignment

Chemical shift assignment is performed with the existing FLYA algorithm 12 that uses a genetic algorithm combined with local optimization to find an optimal matching between expected and observed peaks. FLYA uses as input the protein sequence, lists of peak positions from the available spectra, chemical shift statistics, either from the BMRB 39 or the GNN described in the next section, and, if available, the structure from the previous refinement cycle. The tolerance for the matching of peak positions and chemical shifts was set to 0.03 ppm for ¹H, and 0.4 ppm for ¹³C/¹⁵N shifts. Each FLYA execution comprises 20 independent runs with identical input data that differ in the random numbers used in the optimization algorithm. Nuclei for which at least 80% of the 20 runs yield, within tolerance, the same chemical shift value are classified as reliably assigned 12 and used as input for the following chemical shift refinement step.
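The 80% reliability criterion can be sketched as follows. Using the median of the 20 runs as the reference value is an assumption made for this illustration, as is the function name; FLYA's own consensus calculation is described in ref. 12 and may differ.

```python
from statistics import median

# Matching tolerances from the text, in ppm, keyed by element.
TOLERANCE_PPM = {"H": 0.03, "C": 0.4, "N": 0.4}


def consensus_shift(run_values, element, min_fraction=0.8):
    """Return (reference shift, is_reliable) for one nucleus.

    Assumption: the reference value is the median over all runs, and a run
    agrees if its value lies within the element-specific tolerance of it.
    """
    reference = median(run_values)
    tolerance = TOLERANCE_PPM[element]
    agreeing = sum(abs(v - reference) <= tolerance for v in run_values)
    return reference, agreeing / len(run_values) >= min_fraction


# Amide proton: 17 of 20 runs agree, so the assignment counts as reliable.
amide_h = consensus_shift([8.25] * 17 + [7.90] * 3, "H")

# Alpha carbon: only 12 of 20 runs agree, so it is passed on as unassigned.
alpha_c = consensus_shift([55.1] * 12 + [58.0] * 8, "C")
```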

Chemical shift refinement

We used a graph data structure to combine FLYA-assigned shifts with information from previously assigned proteins (BMRB records) and possible spatial interactions. Each node corresponds to an atom in the protein sequence, and is represented by a feature vector composed of (a) a one-hot encoded atom type code (e.g., Cα, Hβ), (b) a one-hot encoded amino acid type, (c) the value of the chemical shift assigned by FLYA (only if a confident assignment is available, zero otherwise), (d) atom-specific BMRB shift statistics (mean and standard deviation), and (e) 30 chemical shift values obtained from BMRB database fragments. The latter feature is obtained by searching BMRB records for assigned 2–3-residue fragments that match the local protein sequence and have minimal mean-squared-error (MSE) to shifts confidently assigned by FLYA (non-zero values of feature (c) in the local neighborhood of the atom). The edges of the graph correspond to chemical bonds or skip connections. The latter connect the Cβ atom of a given residue with Cβ atoms 2, 3, and 5 residues apart in the amino acid sequence, and have the purpose of capturing possible through-space influence on the chemical shift that is typically observed in secondary structure elements.
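The skip connections can be enumerated with a short helper. This is a sketch only: the function name is hypothetical, residues are identified by zero-based index, and the treatment of residues without a Cβ atom (glycine) is not described in the text and is therefore ignored here.

```python
def skip_edges(n_residues, offsets=(2, 3, 5)):
    """List (i, j) residue index pairs whose C-beta atoms get a skip edge.

    Each pair is listed once with i < j. Glycine handling is not covered
    by the text and is deliberately left out of this sketch.
    """
    edges = []
    for i in range(n_residues):
        for d in offsets:
            j = i + d
            if j < n_residues:
                edges.append((i, j))
    return edges


# Skip connections for a hypothetical 6-residue peptide.
edges = skip_edges(6)
```

For six residues this yields eight edges, for example between residues 0 and 2, 0 and 3, and 0 and 5.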

The chemical shift refinement task is defined as a node regression problem, where an expected value of the chemical shift is predicted for each atom that lacks a confident FLYA assignment. This task is addressed with a DeepGCN model 25 , 26 that was trained on 28,400 graphs extracted from 2840 referenced BMRB records 39 . Each training example was created by building a fully assigned graph out of a single BMRB record, and dropping chemical shift values (feature (c) above) for randomly chosen atoms that FLYA typically assigns either with low confidence or inaccurately.

Our DeepGCN model is designed specifically for de novo structure determination, as it uses only the protein sequence and partial shift assignments to estimate values of missing chemical shifts. Its predictions are used to guide the FLYA genetic algorithm optimization 12 by reducing its search range for assignments. The precise final chemical shift value is always determined by the position of a signal in the spectrum, rather than the model prediction alone.

Torsion angle restraints

Before each structure calculation step, torsion angle restraints for the ϕ and ψ angles of the polypeptide backbone were obtained from the current backbone chemical shifts using the program TALOS-N 21. Restraints were only generated if TALOS-N classified the prediction as ‘Good’, ‘Strong’, or ‘Generous’. Given a TALOS-N torsion angle prediction of ϕ ± Δϕ, the allowed range of the torsion angle was set to ϕ ± max(Δϕ, 10°) for ‘Good’ and ‘Strong’ predictions, and ϕ ± 1.5 max(Δϕ, 10°) for ‘Generous’ predictions, and likewise for ψ.
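The rule for converting a TALOS-N prediction into an allowed angle range translates directly into code. The function name and the plain-string class labels are assumptions about how the prediction would be represented; parsing of actual TALOS-N output files is not shown.

```python
def torsion_restraint(angle, delta, quality):
    """Return (lower, upper) allowed range in degrees, or None.

    angle and delta are the predicted torsion angle and its uncertainty.
    Any classification other than the three named in the text yields no
    restraint.
    """
    if quality in ("Good", "Strong"):
        half_width = max(delta, 10.0)
    elif quality == "Generous":
        half_width = 1.5 * max(delta, 10.0)
    else:
        return None
    return angle - half_width, angle + half_width


# A narrow prediction is widened to the 10 degree minimum half-width.
strong = torsion_restraint(-60.0, 5.0, "Strong")

# A 'Generous' prediction gets 1.5 times the half-width.
generous = torsion_restraint(-120.0, 20.0, "Generous")
```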

Structure calculation and selection

Given the chemical shift assignments and NOESY cross-peak positions and intensities, the structure is calculated with CYANA 23 using the established method 22 that comprises 7 cycles of NOESY cross-peak assignment and structure calculation, followed by a final structure calculation. In total, 8 × 100 conformers are calculated for a given input data set using 30,000 torsion angle dynamics steps per conformer. The 20 conformers with the lowest final target function value are chosen to represent the solution structure proposal. The entire combined NOESY assignment and structure calculation procedure is executed independently 10 times based on 10 variants of NOESY peak lists, which differ in the number of cross-peaks selected from the output of the visual spectrum analysis. The first set generously includes all signals selected by ResNet with confidence ≥0.05. The other variants of NOESY peak lists follow the same principle with increasingly restrictive confidence thresholds of 0.1, 0.15, …, 0.5.
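The generation of the 10 peak list variants can be sketched as a simple filter over pp-ResNet confidence scores. The dictionary-based peak representation and the function name are hypothetical; only the thresholds 0.05, 0.1, …, 0.5 come from the text.

```python
def noesy_peak_list_variants(peaks, n_variants=10, step=0.05):
    """Map each confidence threshold to the peaks that pass it.

    Thresholds run from `step` to `n_variants * step`; rounding avoids
    floating-point noise in the dictionary keys.
    """
    thresholds = [round(step * (i + 1), 2) for i in range(n_variants)]
    return {t: [p for p in peaks if p["confidence"] >= t] for t in thresholds}


# Hypothetical picked peaks with classifier confidence scores.
picked = [
    {"position": (8.21, 4.35, 120.1), "confidence": 0.04},
    {"position": (8.21, 1.92, 120.1), "confidence": 0.07},
    {"position": (7.95, 4.10, 118.4), "confidence": 0.30},
    {"position": (7.95, 0.88, 118.4), "confidence": 0.90},
]
variants = noesy_peak_list_variants(picked)
```

Each of the resulting lists would then feed one independent structure calculation, so the most permissive list retains weak signals while the strictest list keeps only high-confidence ones.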

The CYANA structure calculations are followed by a structure selection step, wherein the 10 intermediate structure proposals are compared pairwise by a Gradient Boosted Tree (GBT) model that uses 96 features from each structure proposal (including the CYANA target function value 23, number of long-range distance restraints, etc.; for details, see downloads in the Code availability section) to rank the structures by their expected accuracy. The best structure from the ranking is subsequently used as contextual information for the chemical shift refinement cycle (Fig. 1), or returned as the final outcome of ARTINA. The second-best final structure is also returned for comparison.

To train GBT, we collected a set of successful and unsuccessful structure calculations with CYANA. Each training example was a tuple \((s_{i}, r_{i})\), where \(s_{i}\) is the vector of features extracted from the CYANA structure calculation output, and \(r_{i}\) is the RMSD of the output structure to the PDB reference. The GBT was trained to take the features \(s_{i}\) and \(s_{j}\) of two structure calculations with CYANA as input, and to predict a binary order variable \(o_{ij}\), such that \(o_{ij}=1\) if \(r_{i}<r_{j}\), and 0 otherwise. Importantly, the deposited PDB reference structures were not used directly in the GBT model training (they are used only to calculate the RMSDs). Consequently, the GBT model is unaffected by methodology and technicalities related to PDB deposition (e.g., the structure calculation software used to calculate the deposited reference structure).
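The construction of pairwise training examples can be sketched as follows. The two-element feature vectors are hypothetical stand-ins for the 96 features, and the sketch covers only label generation, not the training of the GBT model itself or the way pairwise predictions are turned into a ranking.

```python
from itertools import permutations


def pairwise_examples(calculations):
    """Build ((s_i, s_j), o_ij) pairs from (features, rmsd) tuples.

    o_ij is 1 if calculation i has the lower RMSD to the reference,
    and 0 otherwise. Every ordered pair of distinct calculations is used.
    """
    examples = []
    for (s_i, r_i), (s_j, r_j) in permutations(calculations, 2):
        examples.append(((s_i, s_j), 1 if r_i < r_j else 0))
    return examples


# Hypothetical features (target function value, long-range restraint count)
# paired with the RMSD to the reference in angstroms.
calculations = [((0.8, 120), 1.2), ((3.1, 45), 3.5), ((1.0, 98), 1.9)]
pairs = pairwise_examples(calculations)
```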

Structure accuracy estimate

As an accuracy estimate for the final ARTINA structure, a predicted RMSD to reference (pRMSD) is calculated from the ARTINA results (without knowledge of the reference PDB structure). It aims at reproducing the actual RMSD to reference, which is the RMSD between the mean coordinates of the ARTINA structure bundle and the mean coordinates of the corresponding reference PDB structure bundle for the backbone atoms N, Cα, C’ in the residue ranges as given in Supplementary Table 4. The predicted RMSD is given by pRMSD = (1 – t) × 4 Å, where, in analogy to the GDT_HA value 46, t is the average fraction of the RMSDs ≤ 0.5, 1, 2, 4 Å between the mean coordinates of the best ARTINA candidate structure bundle and the mean coordinates of the structure bundles of the 9 other structure proposals. Since t ∈ [0, 1], the pRMSD is always in the range of 0–4 Å, grouping all “bad” structures with expected RMSD to reference ≥ 4 Å at pRMSD = 4 Å.
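The pRMSD formula can be worked through in a short sketch. The function name and the example RMSD values are hypothetical; the thresholds and the 4 Å scale are taken from the text.

```python
def predicted_rmsd(rmsds_to_other_proposals, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Compute pRMSD = (1 - t) * 4 in angstroms.

    t is the mean, over the four thresholds, of the fraction of pairwise
    RMSDs (best proposal versus each other proposal) at or below it.
    """
    n = len(rmsds_to_other_proposals)
    fractions = [
        sum(r <= threshold for r in rmsds_to_other_proposals) / n
        for threshold in thresholds
    ]
    t = sum(fractions) / len(thresholds)
    return (1.0 - t) * 4.0


# Hypothetical RMSDs between the best proposal and the 9 other proposals.
rmsds = [0.4, 0.8, 1.5, 3.0, 0.6, 0.9, 1.8, 2.5, 5.0]
p_rmsd = predicted_rmsd(rmsds)
```

With these values, 1, 4, 6, and 8 of the 9 RMSDs fall within the four thresholds, so t = 19/36 and the pRMSD is about 1.89 Å. Nine tightly clustered proposals would give 0 Å, and nine widely scattered ones would give the 4 Å cap.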

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Reference structures: PDB Protein Data Bank (https://www.rcsb.org/; accession codes in Fig. 2 and Supplementary Table 3).

Spectra and reference assignments: BMRB Biological Magnetic Resonance Data Bank ( https://bmrb.io/ ; entry IDs in Supplementary Table 3 ).

Peak lists, assignments, and structures: https://nmrtist.org/static/public/publications/artina/ARTINA_results.zip and in the ETH Research Collection under DOI 10.3929/ethz-b-000568621.

Source data for Figs. 2, 4, and 5 are available in Supplementary Tables 2, 4, and 5, respectively.

Code availability

The ARTINA algorithm is available as a webserver at https://nmrtist.org . pp-ResNet, deconv-ResNet, GNN, and GBT are available for download in binary form, together with architecture schemes, example input data, model input description, and source code for reading the model files and making predictions ( https://github.com/PiotrKlukowski/ARTINA , https://nmrtist.org/static/public/publications/artina/models/ {ARTINA_peak_picking.zip, ARTINA_peak_deconvolution.zip, ARTINA_shift_prediction.zip, ARTINA_structure_ranking.zip}). These files provide a full technical specification of the components developed within ARTINA, and allow for their independent use in Python.

Existing software used: Python ( https://www.python.org/ ), CYANA ( https://www.las.jp/ ), TALOS-N ( https://spin.niddk.nih.gov/bax/software/TALOS-N ).

Wüthrich, K. NMR studies of structure and function of biological macromolecules (Nobel Lecture). Angew. Chem. Int. Ed. 42 , 3340–3363 (2003).

Sakakibara, D. et al. Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458 , 102–105 (2009).

Guerry, P. & Herrmann, T. Advances in automated NMR protein structure determination. Q. Rev. Biophys. 44 , 257–309 (2011).

Güntert, P. Automated structure determination from NMR spectra. Eur. Biophys. J. 38 , 129–143 (2009).

Garrett, D. S., Powers, R., Gronenborn, A. M. & Clore, G. M. A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95 , 214–220 (1991).

Koradi, R., Billeter, M., Engeli, M., Güntert, P. & Wüthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135 , 288–297 (1998).

Würz, J. M. & Güntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67 , 63–76 (2017).

Klukowski, P. et al. NMRNet: A deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34 , 2590–2597 (2018).

Li, D. W., Hansen, A. L., Yuan, C. H., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12 , 5229 (2021).

Bartels, C., Güntert, P., Billeter, M. & Wüthrich, K. GARANT—A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18 , 139–149 (1997).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Linge, J. P., O’Donoghue, S. I. & Nilges, M. Automated assignment of ambiguous nuclear overhauser effects with ARIA. Methods Enzymol. 339 , 71–90 (2001).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319 , 209–227 (2002).

Allain, F., Mareuil, F., Ménager, H., Nilges, M. & Bardiaux, B. ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48 , W41–W47 (2020).

Lee, W. et al. I-PINE web server: An integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73 , 213–222 (2019).

Huang, Y. P. J. et al. An integrated platform for automated analysis of protein NMR structures. Methods Enzymol. 394 , 111–141 (2005).

Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR 39 , 31–52 (2007).

López-Méndez, B. & Güntert, P. Automated protein structure determination from NMR spectra. J. Am. Chem. Soc. 128 , 13112–13122 (2006).

Murphy, K. P. Probabilistic Machine Learning: An Introduction (MIT Press, 2022).

Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56 , 227–241 (2013).

Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 453–471 (2015).

Güntert, P., Mumenthaler, C. & Wüthrich, K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273 , 283–298 (1997).

Kaiming, H., Xiangyu, Z., Shaoqing, R. & Jian, S. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016).

Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://arxiv.org/abs/1609.02907 (2016).

Chiang, W. L. et al. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD) 257–266 (2019).

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proc. 32nd Conference on Neural Information Processing Systems (NIPS) (2018).

Rosato, A. et al. The second round of Critical Assessment of Automated Structure Determination of Proteins by NMR: CASD-NMR-2013. J. Biomol. NMR 62 , 413–424 (2015).

Kirchner, D. K. & Güntert, P. Objective identification of residue ranges for the superposition of protein structures. BMC Bioinform. 12 , 170 (2011).

Buchner, L. & Güntert, P. Systematic evaluation of combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62 , 81–95 (2015).

Fowler, N. J., Sljoka, A. & Williamson, M. P. A method for validating the accuracy of NMR protein structures. Nat. Commun. 11 , 6321 (2020).

Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): Structure quality assessment measures based on information retrieval statistics. J. Am. Chem. Soc. 127 , 1665–1674 (2005).

Buchner, L. & Güntert, P. Increased reliability of nuclear magnetic resonance protein structures by consensus structure bundles. Structure 23 , 425–434 (2015).

Koradi, R., Billeter, M. & Güntert, P. Point-centered domain decomposition for parallel molecular dynamics simulation. Comput. Phys. Commun. 124 , 139–147 (2000).

Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24 , 171–189 (2002).

Buchner, L., Schmidt, E. & Güntert, P. Peakmatch: A simple and robust method for peak list matching. J. Biomol. NMR 55 , 267–277 (2013).

Scott, A., López-Méndez, B. & Güntert, P. Fully automated structure determinations of the Fes SH2 domain using different sets of NMR spectra. Magn. Reson. Chem. 44 , S83–S88 (2006).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36 , D402–D408 (2008).

Goddard, T. D. & Kneller, D. G. Sparky 3. (University of California, San Francisco, 2001).

Delaglio, F. et al. NMRPipe—A multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR 6 , 277–293 (1995).

Bartels, C., Xia, T. H., Billeter, M., Güntert, P. & Wüthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 6 , 1–10 (1995).

Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. Proc. Mach. Learn. Res. 9 , 249–256 (2010).

Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).

Davies, E. R. Computer Vision (Academic Press, 2018).

Kryshtafovych, A. et al. New tools and expanded data analysis capabilities at the protein structure prediction center. Proteins 69 , 19–26 (2007).

Acknowledgements

We thank Drs. Frédéric Allain, Fred Damberger, Hideo Iwai, Harindranath Kadavath, Julien Orts, and Dean Strotz for providing unpublished spectra. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 891690 (P.K.), and a Grant-in-Aid for Scientific Research of the Japan Society for the Promotion of Science (P.G., 20K06508).

Author information

Authors and affiliations.

Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland

Piotr Klukowski, Roland Riek & Peter Güntert

Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt am Main, Germany

Peter Güntert

Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397, Tokyo, Japan

Peter Güntert

Contributions

P.K. prepared training and test data sets, designed and trained machine learning models, performed experiments described in the manuscript, and implemented ARTINA within the nmrtist.org web platform. P.K. and P.G. wrote the software. P.K., R.R., and P.G. conceived the project, analyzed the results, and wrote the manuscript.

Corresponding authors

Correspondence to Piotr Klukowski , Roland Riek or Peter Güntert .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Benjamin Bardiaux, Gaetano Montelione, Theresa Ramelot, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Description of Additional Supplementary Files, Supplementary Movies 1–5, Reporting Summary, Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat Commun 13 , 6151 (2022). https://doi.org/10.1038/s41467-022-33879-5

Received : 28 March 2022

Accepted : 30 September 2022

Published : 18 October 2022

DOI : https://doi.org/10.1038/s41467-022-33879-5

Welcome to CARA

This is the official website of CARA ( C omputer A ided R esonance A ssignment). CARA is a software application for the analysis of NMR spectra and computer aided resonance assignment which is particularly suited for biomacromolecules. Dedicated tools for backbone assignment, side chain assignment, and peak integration support the entire process of structure determination. CARA was developed in Professor Kurt Wüthrich's group . Continuing development and support is provided by a group of volunteers .

CARA is free software (see licence terms ). Precompiled native executables are provided on all major platforms for easy installation and operation.

CARA can be downloaded from the CARA Downloads page.

CARA documentation is available from the CARA Documentation page.

Support site for tutorials, templates, scripts, FAQs, etc.
CARA Forum for announcements, support or feature requests
Who is using CARA See our list of users and projects.

home.txt · Last modified: 2016/07/26 23:52 by rkeller

Protein NMR Resonance Assignment

Living reference work entry
First Online: 10 March 2021

Takahisa Ikegami &
Fuyuhiko Inagaki

Biosynthetic labeling ; Main chain assignment: side chain assignment ; Spectroscopic assignment

Overview of Protein Resonance Assignment

Until the introduction of the sequential assignment procedure, developed by Kurt Wüthrich and his coworkers in the 1980s (Wüthrich 1986), most protein assignments were made with reference to the corresponding crystal structures. The establishment of a sequential assignment procedure that does not depend on existing three-dimensional (3D) structures was, therefore, a milestone for protein NMR. Backbone amide proton (1HN) and α proton (1Hα) signals were sequentially assigned based on the distance information between 1HN(i) and 1Hα(i−1) and between 1HN(i) and 1Hα(i), and fragments of connected assignments were aligned on the amino acid sequence of the particular protein. This made NMR independent of X-ray crystallography, and the solution structures of proteins were determined by NMR using the assignment of proton signals...

Bax A, Grzesiek S (1993) Methodological advances in protein NMR. Acc Chem Res 26:131–138. https://doi.org/10.1021/ar00028a001

Cavanagh J, Fairbrother W, Palmer AG, Rance M, Skelton NJ (2007) Protein NMR spectroscopy, 2nd edn. Elsevier, Amsterdam

Driscoll PC, Gronenborn AM, Wingfield PT, Clore GM (1990) Determination of the secondary structure and molecular topology of interleukin-1 beta by use of two- and three-dimensional heteronuclear 15N-1H NMR spectroscopy. Biochemistry 29:4668–4682. https://doi.org/10.1021/bi00471a023

Fesik SW, Eaton HL, Olejniczak ET, Zuiderweg ERP, McIntosh LP, Dahlquist FW (1990) 2D and 3D NMR spectroscopy employing 13C-13C magnetization transfer by isotropic mixing. Spin system identification in large proteins. J Am Chem Soc 112:886–888. https://doi.org/10.1021/ja00158a069

Gorman SD, Sahu D, O'Rourke KF, Boehr DD (2018) Assigning methyl resonances for protein solution-state NMR studies. Methods 148:88–99. https://doi.org/10.1016/j.ymeth.2018.06.010

Ikura M, Kay LE, Bax A (1990) A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29:4659–4667. https://doi.org/10.1021/bi00471a022

Jabar S, Adams LA, Wang Y, Aurelio L, Graham B, Otting G (2017) Chemical tagging with tert -butyl and trimethylsilyl groups for measuring intermolecular nuclear Overhauser effects in a large protein-ligand complex. Chemistry 23:13033–13036. https://doi.org/10.1002/chem.201703531

Kainosho M, Tsuji T (1982) Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry 21:6273–6279. https://doi.org/10.1021/bi00267a036

Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Ono AM, Güntert P (2006) Optimal isotope labelling for NMR protein structure determinations. Nature 440:52–57. https://doi.org/10.1038/nature04525

Kasai T, Ono S, Koshiba S, Yamamoto M, Tanaka T, Ikeda S, Kigawa T (2020) Amino-acid selective isotope labeling enables simultaneous overlapping signal decomposition and information extraction from NMR spectra. J Biomol NMR 74:125–137. https://doi.org/10.1007/s10858-019-00295-9

Kay LE (2001) Nuclear magnetic resonance methods for high molecular weight proteins: a study involving a complex of maltose binding protein and β-cyclodextrin. In: James TL, Dotsch V, Schmitz U (eds) Methods enzymol, vol 339. Academic, New York, pp 174–203. https://doi.org/10.1016/s0076-6879(01)39314-x

Kay LE, Ikura M, Tschudin R, Bax A (1990) Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J Magn Reson 89:496–514. (Reprint, 213:423–441). https://doi.org/10.1016/j.jmr.2011.09.004

Maciejewski MW, Schuyler AD, Gryk MR, Moraru II, Romero PR, Ulrich EL, Eghbalnia HR, Livny M, Delaglio F, Hoch JC (2017) NMRbox: a resource for biomolecular NMR computation. Biophys J 112:1529–1534. https://doi.org/10.1016/j.bpj.2017.03.011

McIntosh LP, Dahlquist FW (1990) Biosynthetic incorporation of 15N and 13C for assignment and interpretation of nuclear magnetic resonance spectra of proteins. Q Rev Biophys 23:1–38

Morita EH, Shimizu M, Ogasawara T, Endo Y, Tanaka R, Kohno T (2004) A novel way of amino acid-specific assignment in 1H-15N HSQC spectra with a wheat germ cell-free protein synthesis system. J Biomol NMR 30:37–45

Oh BH, Westler WM, Darba P, Markley JL (1988) Protein carbon-13 spin systems by a single two-dimensional nuclear magnetic resonance experiment. Science 240:908–911. https://doi.org/10.1126/science.3129784

Pritišanac I, Alderson TR, Güntert P (2020) Automated assignment of methyl NMR spectra from large proteins. Prog Nucl Magn Reson Spectrosc 118–119:54–73. https://doi.org/10.1016/j.pnmrs.2020.04.001

Reif B (2017) Proton-detection in biological MAS solid-state NMR spectroscopy. In: Webb GA (ed) Modern magnetic resonance. Springer, Cham, pp 1–33. https://doi.org/10.1007/978-3-319-28275-6

Rennella E, Huang R, Yu Z, Kay LE (2020) Exploring long-range cooperativity in the 20S proteasome core particle from Thermoplasma acidophilum using methyl-TROSY-based NMR. Proc Natl Acad Sci U S A 117:5298–5309. https://doi.org/10.1073/pnas.1920770117

Shen Y, Bax A (2015) Protein structural information derived from NMR chemical shift with the neural network program TALOS-N. Methods Mol Biol 1260:17–32. https://doi.org/10.1007/978-1-4939-2239-0_2

Stoffregen MC, Schwer MM, Renschler FA, Wiesner S (2012) Methionine scanning as an NMR tool for detecting and analyzing biomolecular interaction surfaces. Structure 20:573–581. https://doi.org/10.1016/j.str.2012.02.012

Torchia DA, Sparks SW, Bax A (1989) Staphylococcal nuclease: sequential assignments and solution structure. Biochemistry 28:5509–5524. https://doi.org/10.1021/bi00439a028

Tugarinov V, Kay LE (2003) Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J Am Chem Soc 125:13868–13878. https://doi.org/10.1021/ja030345s

Tugarinov V, Muhandiram R, Ayed A, Kay LE (2002) Four-dimensional NMR spectroscopy of a 723-residue protein: chemical shift assignments and secondary structure of malate synthase G. J Am Chem Soc 124:10025–10035. https://doi.org/10.1021/ja0205636

Wüthrich K (1986) NMR of proteins and nucleic acids. Wiley, New York

Wüthrich K, Wider G (2003) Transverse relaxation-optimized NMR spectroscopy with biomacromolecular structures in solution. Magn Reson Chem 41:S80–S88. https://doi.org/10.1002/mrc.1280

Author information

Authors and affiliations.

Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan

Takahisa Ikegami

Department of Structural Biology, Hokkaido University, Kita-ku, Sapporo, Japan

Fuyuhiko Inagaki

Corresponding author

Correspondence to Takahisa Ikegami .

Editor information

Editors and affiliations.

University Leicester MRC Centre, Leicester, UK

Gordon Roberts

Dept Biochemistry, University of Oxford, Oxford, UK

Anthony Watts

Section Editor information

No affiliation provided

Mitsu Ikura

About this entry

Cite this entry.

Ikegami, T., Inagaki, F. (2021). Protein NMR Resonance Assignment. In: Roberts, G., Watts, A. (eds) Encyclopedia of Biophysics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35943-9_312-1

DOI : https://doi.org/10.1007/978-3-642-35943-9_312-1

Received : 13 November 2020

Accepted : 16 November 2020

Published : 10 March 2021

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-35943-9

Online ISBN : 978-3-642-35943-9

eBook Packages : Springer Reference Biomedicine and Life Sciences Reference Module Biomedical and Life Sciences

Protein assignment

General introduction.

Chris Spronk, January 2009.

This tutorial is divided into a number of topics (listed below) and is designed to achieve two main goals:

to familiarise users who are new to CcpNmr Analysis with an important part of the program's functionality and core concepts.

to teach the basic steps that are typically performed in the interpretation and assignment of NMR data of a double-labelled protein.

The tutorial is set up such that the user can start from a variety of stages in the tutorial, by loading project files that have been prepared and are available for download.

The data used in this tutorial come from the 95-residue protein NapD. The protein consists of alpha-helices, beta-sheets, turns and a flexible tail, and the NMR data contain both easy and more difficult parts to assign. As such, NapD provides a representative example for other real-life cases. The tutorial data comprise about 1.5 GB of NMR data and associated files, including the following spectra:

15N-HSQC, 600 MHz

CBCAcoNH, 600 MHz

HNCACB, 600 MHz

HNcaHA, 600 MHz

HBHAcbcacoNH, 600 MHz

hCCH-TOCSY, 600 MHz

HcCH-TOCSY, 600 MHz

13C-NOESY-HSQC_aro, aromatic region, 800 MHz

13C-NOESY-HSQC, aliphatic region, 800 MHz

15N-NOESY-HSQC, 800 MHz

The spectra, projects and related data are distributed as six tgz files:

proteinAssignmentTutorialFiles1.tgz

proteinAssignmentTutorialFiles2.tgz

proteinAssignmentTutorialFiles3.tgz

proteinAssignmentTutorialFiles4.tgz

proteinAssignmentTutorialFiles5.tgz

proteinAssignmentTutorialFiles6.tgz
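
Once the six archives have been downloaded, they can be unpacked in one go. The following is a minimal sketch, assuming the files already sit in a local directory; the function name `extract_archives` and both directory arguments are illustrative and not part of the tutorial.

```python
import tarfile
from pathlib import Path

# File names as listed in the tutorial. The download location is not given
# here, so the archives are assumed to be present in `download_dir` already.
ARCHIVES = [f"proteinAssignmentTutorialFiles{i}.tgz" for i in range(1, 7)]


def extract_archives(download_dir, target_dir):
    """Extract every tutorial archive found in download_dir into target_dir.

    Returns the names of the archives that were actually extracted, so that
    a missing download is easy to spot.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in ARCHIVES:
        archive = download_dir / name
        if not archive.is_file():
            continue  # this archive has not been downloaded yet
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
        extracted.append(name)
    return extracted
```

Comparing the returned list with `ARCHIVES` shows at a glance whether any of the six files is still missing.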

Tutorial directory structure

The main data directory contains the sequence of the protein in 'fasta' format, NapD.fasta, and the following subdirectories:

Azara: Contains NMR spectra

RCSB_DEPOSITION: Contains the structures, constraints, chemical shifts, etc. that were deposited at the RCSB.

NapD_....(saved projects)
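
To check the sequence file outside the assignment program, a FASTA file can be read with a few lines of code. This is a minimal sketch of a generic FASTA reader; the function name `read_fasta` is illustrative, and the exact contents of NapD.fasta are not reproduced here.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `header, sequence = read_fasta("NapD.fasta")[0]` followed by `len(sequence)` gives the number of residues in the file, which should be close to the 95 residues mentioned above if the file holds only that construct.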

Saved projects' descriptions

The NapD_.... directories contain the saved intermediate project states:

NapD_1: Project set up; sequence loaded; Molecules defined

NapD_2: Spectra loaded; spectra and windows customised

NapD_3: 15N-HSQC peaks automatically picked; noise peaks removed; manually picked peaks added. A clean starting point for 3D peak picking.

NapD_4: Initialised NH roots and spin systems; identified NH2 groups; auto-arranged peak labels

NapD_5: CBCAcoNH/HNCACB/HNcaHA/HBHAcbcacoNH linked to 15N-HSQC; 3D peaks picked and assigned to root frequencies; non-root resonances created

NapD_6: Some side chains assigned in the hCCH-TOCSY

NapD_7: Short stretch of sequential amino acids assigned

Setting up the tutorial project

Loading the spectra

Customising project settings

Introduction to protein backbone assignment

15N-HSQC peak picking

Assigning root resonances (1)

Assigning root resonances (2)

3D peak picking (1)

3D peak picking (2)

Assigning non-root resonances

Identifying spin system types

Assigning side chain resonances

Sequential assignment (1)

Sequential assignment (2)

Protein NMR

A practical guide, triple resonance backbone assignment.

Standard triple resonance backbone assignment of proteins is based on the CBCANNH and CBCA(CO)NNH spectra. The idea is that the CBCANNH correlates each NH group with the Cα and Cβ chemical shifts of its own residue (strongly) and of the residue preceding (weakly). The CBCA(CO)NNH only correlates the NH group to the preceding Cα and Cβ chemical shifts. The Figure below shows how this can be used to link one NH group to the next into a long chain.
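
The linking step described above can be sketched in code. The example below is a simplified illustration, not the algorithm of any particular program: the `SpinSystem` class, the 0.2 ppm tolerance and all chemical shift values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpinSystem:
    name: str
    ca: float                 # own Cα (strong CBCANNH peak), ppm
    cb: Optional[float]       # own Cβ, None if absent (e.g. glycine)
    ca_prev: float            # preceding Cα (CBCA(CO)NNH peak), ppm
    cb_prev: Optional[float]  # preceding Cβ, None if absent


def shifts_match(a, b, tol):
    """Two shifts match if both are absent or both lie within tol ppm."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def predecessors(target, candidates, tol=0.2):
    """Names of spin systems whose own Cα/Cβ match target's i-1 shifts."""
    return [
        c.name
        for c in candidates
        if c is not target
        and shifts_match(c.ca, target.ca_prev, tol)
        and shifts_match(c.cb, target.cb_prev, tol)
    ]


# Hypothetical shifts, chosen so that the chain reads C -> B -> A.
systems = [
    SpinSystem("A", ca=56.1, cb=30.2, ca_prev=52.4, cb_prev=19.1),
    SpinSystem("B", ca=52.5, cb=19.0, ca_prev=45.3, cb_prev=None),
    SpinSystem("C", ca=45.2, cb=None, ca_prev=61.9, cb_prev=38.4),
]

for s in systems:
    print(s.name, "is preceded by", predecessors(s, systems))
```

With real data several candidates often match within the tolerance, which is why the Cβ shift is used alongside the Cα shift to narrow the choice.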

In practice, using the CBCANNH and CBCA(CO)NNH spectra, this looks as follows (Cαs are shown in dark blue, Cβs in light blue):

[Figure: strips from the CBCANNH and CBCA(CO)NNH spectra]

Alternatively, some software packages (such as CCPNmr Analysis) allow you to superimpose the two spectra, so that your strips look like this:

[Figure: strips with the two spectra superimposed]

The Cα and Cβ chemical shifts adopt values characteristic of the amino acid type. Some of these, such as Alanine, Serine, Threonine and Glycine, are very easy to spot, as their Cβ chemical shifts are very different from those of the other amino acids (and in the case of Glycine there is no Cβ). Valine, Isoleucine and Proline are also likely to stand out by the fact that they have higher than normal Cα chemical shifts. Once a chain of NH groups with their corresponding Cα and Cβ chemical shifts has been built, the identification of some of the amino acid types makes it possible to match this string to the sequence. For example, a string of shifts may have been found that corresponds to xxxSxxAx; if this pattern appears only once in the sequence of the protein in question, then a sequence-specific assignment can be made. You may be surprised to find that I have coloured the Glycine Cα in light blue rather than dark blue. This is to illustrate the fact that Glycine Cα peaks may often have the same sign (i.e. be positive or negative) as the Cβ peaks of the other residues. Whether this really is the case, though, depends on the exact pulse sequence you use.
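
The matching of a partly typed chain to the sequence can be sketched as follows. This is an illustration only: the shift windows in `classify` are rough assumptions for the example (real work should use proper chemical shift statistics), and the sequence is hypothetical.

```python
import re


def classify(ca, cb):
    """Very rough residue-type guess from Cα/Cβ shifts in ppm.

    Returns 'G', 'A', 'S' or 'T' for the easily recognised types and 'x'
    otherwise. The windows are illustrative assumptions, not statistics.
    """
    if cb is None:
        return "G" if ca < 50 else "x"  # no Cβ and a low Cα: glycine
    if cb < 24:
        return "A"                      # unusually low Cβ: alanine
    if cb > 67:
        return "T"                      # very high Cβ: threonine
    if cb >= 61:
        return "S"                      # high Cβ: serine
    return "x"


def find_placements(pattern, sequence):
    """Return 0-based start positions where the pattern fits the sequence.

    'x' in the pattern stands for a residue of unidentified type. A
    lookahead is used so that overlapping placements are all reported.
    """
    regex = re.compile("(?=(" + pattern.replace("x", ".") + "))")
    return [m.start() for m in regex.finditer(sequence)]


# Hypothetical sequence containing the pattern from the text exactly once.
sequence = "MKTSGLAVQEELKR"
print(find_placements("xxxSxxAx", sequence))  # [0]
```

A single placement allows a sequence-specific assignment of the chain; several placements mean that the chain has to be extended, or more residue types identified, before it can be placed.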

In some cases, in particular if your protein is fairly large (>200 residues, say), you may find that the quality of the CBCANNH and CBCA(CO)NNH spectra is not very good. The Cβ resonances may, for example, not be visible above the noise level. In this case it is possible to use the Cα and C' chemical shifts, rather than the Cα and Cβ chemical shifts, to walk from one residue to the next. The HNCA and HN(CO)CA experiments give you the same information as the CBCANNH and CBCA(CO)NNH spectra, except without the Cβ resonances. To complement this, you can then record the HNCO and HN(CA)CO experiments. These link each NH(i) group with the C'(i-1) (HNCO) or with C'(i) and C'(i-1) (HN(CA)CO). The residues are now linked up in the following manner:

[Figure: sequential linking via Cα and C' chemical shifts]

The advantage of using the HNCO and HNCA-based spectra is that they are more sensitive than the CBCANNH-type spectra and thus the spectral quality should improve. The disadvantage is that the Cα and C' chemical shifts provide less information about the amino acid type than the Cβ chemical shift and are less dispersed.
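
The correlations named in this section can be collected in a small lookup table, which makes it easy to see which nuclei a given set of experiments can be used to link. The sketch below encodes only what is stated in the text; the names `CORRELATIONS` and `linking_nuclei` are illustrative.

```python
# Each experiment maps to the carbons it correlates with the NH group.
# Offsets are relative to the residue of the detected NH group:
# 0 = same residue (i), -1 = preceding residue (i-1). "CO" stands for C'.
CORRELATIONS = {
    "CBCANNH": {("CA", 0), ("CB", 0), ("CA", -1), ("CB", -1)},
    "CBCA(CO)NNH": {("CA", -1), ("CB", -1)},
    "HNCA": {("CA", 0), ("CA", -1)},
    "HN(CO)CA": {("CA", -1)},
    "HNCO": {("CO", -1)},
    "HN(CA)CO": {("CO", 0), ("CO", -1)},
}


def linking_nuclei(experiments):
    """Nuclei seen at both i and i-1, i.e. usable for the sequential walk."""
    seen = set()
    for name in experiments:
        seen |= CORRELATIONS[name]
    return sorted(
        {atom for atom, _ in seen if (atom, 0) in seen and (atom, -1) in seen}
    )


print(linking_nuclei(["CBCANNH", "CBCA(CO)NNH"]))
print(linking_nuclei(["HNCA", "HN(CO)CA", "HNCO", "HN(CA)CO"]))
```

The first call reports Cα and Cβ as linking nuclei, the second Cα and C', which mirrors the two strategies described above. An experiment that only shows the i-1 shift, such as HNCO on its own, yields no linking nucleus because the matching own-residue shift is missing.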

Protein NMR
Most books on Protein NMR focus on theoretical aspects and pulse sequences with only little space devoted to resonance assignment and structure calculations. At the same time many software manuals provide detailed information on how to use the software, but assume prior knowledge of the concepts of assignment and structure calculation. This has produced a gap in this area which these webpages ...
Assignment of Protein NMR Spectra Using Heteronuclear NMR—A Tutorial
The assignment of resonances in the complex nuclear magnetic resonance (NMR) spectrum of a protein is the first step in any NMR study of protein structure, function or dynamics. This chapter aims to provide a tutorial on protein NMR resonance assignment. Two approaches to the assignment are commonly used: the triple resonance methodology, which uses a suite of three-dimensional (3D) 13C/15N ...
Assignment Practice
There are several ways in which triple resonance backbone assignment, in particular, can be approached in CCPNmr Analysis using more or less automated methods. Initially a more manual method will be described, as this makes it easier to understand the process of assignment for those who are new to protein NMR assignment.
Tutorials
Learn how to use AnalysisAssign to make manual assignments in solid-state NMR spectra, and assign a small protein using [1,3]-13C and [2]-13C labelled glycerol samples.
Solid-state Assignment Tutorial
This page contains a tutorial for protein assignment using solid-state MAS NMR data and CCPNmr Analysis. It is hoped that it will be of use to those who are new to solid-state assignment and/or CCPNmr Analysis.
Assignment of Protein Nmr Spectra Using Heteronuclear Nmr
This led to the development of a systematic method for the assignment of the 2D NMR spectra of proteins that relied only on information about the amino acid sequence of the protein; this is the sequential assignment method [3-5].
PDF CcpNmr Analysis Version 3 Backbone Assignment Tutorial
The following parts of this tutorial are How-Tos and cover the usage of other Backbone Assignment tools, in particular how to inspect the assignment and edit it.
Biomolecular NMR Wiki
The interpretation and assignment of NMR data of a protein usually starts with assignment of the backbone atoms and linking them with their sequential neighbours. The experiments that we will use at this stage are: These experiments allow us to assign most of the HA, HB, CA, CB and HN resonances of many residues in NapD and group them into spin ...
NMR-Based Methods for Protein Analysis
NMR-Based Methods for Protein Analysis. Nuclear magnetic resonance (NMR) spectroscopy is a well-established method for analyzing protein structure, interaction, and dynamics at atomic resolution and in various sample states including solution state, solid state, and membranous environment. Thanks to rapid NMR methodology development, the past ...
NMRtist
NMRtist is a cloud computing service for the fully automated analysis of protein NMR spectra (e.g. peak picking, chemical shift assignment, structure determination) using deep learning-based approaches. Each project created in NMRtist receives 30 GB of private storage, which can be filled by experimental data and analyzed using the available ...
Assignment
A new NMR assignment programme by Greg Benison. Alongside the usual strip-plots etc. it also incorporates the higher-order spectrum principle, a way of assigning peaks in 3- and higher dimensional spectra that doesn't rely on peaklists.
Rapid protein assignments and structures from raw NMR spectra with the
The analysis of protein NMR spectra is time-consuming and can occupy a human expert for weeks or months. The researchers in this work present a deep learning-based method that delivers signal ...
Protein NMR: Modern Techniques and Biomedical Applications
This book covers new techniques in protein NMR, from basic principles to state-of-the-art research. It covers a spectrum of topics ranging from a "toolbox" for how sequence-specific resonance assignments can be obtained using a suite of 2D and 3D NMR experiments and tips on how overlap problems can be overcome. Further topics include the novel applications of Overhauser dynamic nuclear ...
Assignment Theory
Large proteins give worse NMR spectra, because they tumble more slowly. For this reason the CBCANNH and CBCA(CO)NNH spectra of larger proteins (> 150 residues) are often not of sufficient quality to be able to carry out a full assignment. In this case a good option is the use of HNCA, HN(CO)CA, HNCO and HN(CA)CO spectra.
Protein NMR Resonance Assignment
This made NMR independent of X-ray crystallography, and the solution structures of proteins were determined by NMR using the assignment of proton signals and proton-proton distance information. The limited resolution of two-dimensional (2D) 1H NMR spectra, however, restricted the molecular weights of target proteins to less than 8 ...
Sculpting conducting nanopore size and shape through de novo protein
(D) Long-range NMR NOE contacts mapped to the expected TMB12_3 hydrogen bonds (dashed black lines). Residues with amide assignment are shown in white and green, unassigned residues are shown in ash gray. Residues with β-sheet secondary structure are shown as squares, all others as circles. Bold outlines indicate available methyl assignments.
Automatic Assignment
This automatic backbone assignment programme uses chemical shifts from 3D assignment spectra and secondary structure prediction as its input. It can also assign the backbone using RDC data and a known structure of the protein. If you have installed MARS on your computer, then you can also access it directly via CCPNmr Analysis without having to export and re-import the data. Also available via ...