Introduction: Serum profiling with Matrix-Assisted Laser Desorption/Ionization (MALDI) Time-of-Flight (TOF) mass spectrometry provides a quick and efficient way to assess and monitor the circulating proteome. Changes in the circulating proteome can provide insight for biomarker discovery and personalized medicine, and MALDI-TOF can report changes in the relative abundance of circulating proteins. Changes in the serum profile are observed by monitoring relative peak heights within spectra; however, matching peak m/z values to known protein identifications can be problematic. Heterogeneity due to sequence variation, post-translational modifications, exopeptidase action, and variable modifications of free cysteines produces masses that may not be accounted for in database searches. Protein identification typically relies on bottom-up analysis, but this approach yields only the mass of the protein as predicted from the genome. Although the protein may be identified successfully by database search, its predicted mass might not match any proteoform peak in the MALDI spectrum. In top-down protein analysis, by contrast, the mass spectrometer records the intact mass of the proteoform before fragmentation and identification. This method yields a proteoform identification with both an intact mass and tandem MS product ion assignments; the intact mass is then matched back to peaks in the MALDI spectra to obtain tentative protein identifications for the peaks used in serum profiling.
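As a minimal illustration of why a genome-predicted mass can miss the observed MALDI peak, the Python sketch below applies average-mass shifts for two of the variation types named above: cysteinylation of a free cysteine (+119.14 Da, C3H5NO2S) and exopeptidase removal of a C-terminal Arg residue (-156.19 Da). The 10,000 Da starting mass, the variant names, and the function `proteoform_mass` are hypothetical; real proteoforms may carry other or combined modifications.

```python
# Average-mass shifts (Da) for two proteoform variation types (illustrative subset only).
AVERAGE_DELTAS = {
    "cysteinylation": 119.14,           # disulfide-linked Cys on a free cysteine (+C3H5NO2S)
    "loss of C-terminal Arg": -156.19,  # exopeptidase removal of one Arg residue (-C6H12N4O)
}


def proteoform_mass(predicted_mass, modifications):
    """Average mass of a proteoform: sequence-predicted mass plus each listed shift."""
    return predicted_mass + sum(AVERAGE_DELTAS[m] for m in modifications)


predicted = 10_000.00  # hypothetical sequence-derived average mass (Da)
variants = {
    "unmodified": [],
    "cysteinylated": ["cysteinylation"],
    "des-Arg": ["loss of C-terminal Arg"],
    "cysteinylated + des-Arg": ["cysteinylation", "loss of C-terminal Arg"],
}
for name, mods in variants.items():
    mass = proteoform_mass(predicted, mods)
    print(f"{name:24s} {mass:10.2f} Da (shift {mass - predicted:+8.2f} Da)")
```

Each modified variant differs from the database mass by tens to hundreds of daltons, far more than the 1.5 Da MALDI-TOF RMS error reported in the Results, which is why the intact mass of the proteoform itself is needed to link an identification to the correct peak.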
Methods: MALDI-TOF spectra (m/z 3,000 to 30,000) were generated for all test specimens. To increase the signal-to-noise ratio, 400,000 laser shots were averaged to obtain >300 distinct peaks per spectrum. For protein identification, a subset of samples was analyzed by top-down LC-MS without any prior depletion of high-abundance proteins. Separation was performed on a Waters M-Class UHPLC with a PLRP-S analytical column (1000 Å pore size, 5 µm particle size) and a 25-minute gradient. Top-down analysis used a custom-built 21 T FT-ICR mass spectrometer (National High Magnetic Field Laboratory, Tallahassee, FL) operated at a resolving power of 300,000 for full scans and 150,000 for CID scans (both specified at m/z 400). Instrument settings were optimized for routine analysis of 3–30 kDa proteins, which complements the m/z range of the MALDI spectra used for profiling. The top-down mass spectrometry data were analyzed with TDViewer (National Resource for Translational and Developmental Proteomics). The intact masses of proteins identified at less than a 1% false discovery rate were mapped back to peaks in the MALDI spectra.
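The mapping step can be sketched as a nearest-peak search within a mass tolerance. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the MALDI peaks are singly protonated [M+H]+ ions, that the top-down masses have already been expressed as average masses comparable with MALDI-TOF m/z values, and a hypothetical ±3 Da tolerance. The function `match_to_maldi` and all example masses are illustrative.

```python
from bisect import bisect_left

PROTON_MASS = 1.007276  # Da; mass added by protonation to form [M+H]+


def match_to_maldi(proteoforms, peak_mz, tol_da=3.0):
    """Assign each proteoform to the nearest MALDI peak within tol_da.

    proteoforms: dict of name -> neutral average mass (Da) from top-down analysis
    peak_mz: ascending list of MALDI peak m/z values, assumed to be [M+H]+ ions
    tol_da: hypothetical matching tolerance in Da
    Returns a dict of name -> (matched peak m/z, error in Da) for matched proteoforms.
    """
    matches = {}
    for name, mass in proteoforms.items():
        expected = mass + PROTON_MASS  # expected m/z of the singly protonated ion
        i = bisect_left(peak_mz, expected)
        # The nearest peak is one of the two neighbours of the insertion point.
        neighbours = [peak_mz[j] for j in (i - 1, i) if 0 <= j < len(peak_mz)]
        if not neighbours:
            continue
        best = min(neighbours, key=lambda mz: abs(mz - expected))
        if abs(best - expected) <= tol_da:
            matches[name] = (best, best - expected)
    return matches


# Hypothetical example data.
peaks = [6433.2, 8766.5, 11685.9, 13762.9]
proteoforms = {"proteoform_A": 6431.0, "proteoform_B": 11683.8, "proteoform_C": 20000.0}
for name, (mz, err) in match_to_maldi(proteoforms, peaks).items():
    print(f"{name}: peak m/z {mz:.1f}, error {err:+.2f} Da")
```

Nearest-peak assignment does not enforce one-to-one matching, so two proteoforms falling within the tolerance of the same peak would both be assigned to it; such collisions would likely need manual review.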
Results: Top-down LC-MS analysis of unfractionated serum identified 24 proteins and 57 proteoforms. These identifications, with their intact masses, were mapped back to the MALDI-TOF spectra. Approximately half of the peaks could be tentatively assigned; however, the assigned peaks accounted for approximately 80% of the observed signal magnitude, so many of the high-abundance peaks in the spectra received identifications. The intact protein masses assigned at 21 T had an RMS error of 0.80 ppm, while the proteins assigned in the MALDI-TOF spectra had an RMS error of 1.5 Da. As expected, the LC-MS signal from unfractionated serum was dominated by the most highly abundant proteins, which limited the depth of the proteome that was interrogated. Unfractionated serum was analyzed to closely match the sample preparation in the MALDI-TOF workflow used for profiling. Future experiments are planned that include depletion of the abundant proteins followed by top-down LC-MS on the 21 T FT-ICR mass spectrometer. This will allow more in-depth analysis of the circulating proteome and should increase the number of peaks identified.
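The summary statistics reported above can be computed as in the sketch below: an RMS error over matched masses (in ppm for FT-ICR, in Da for MALDI-TOF) and the fraction of peaks versus the fraction of total signal that is assigned. All masses and peak heights are hypothetical toy values, and the sketch assumes peak height as the measure of signal magnitude.

```python
import math


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def rms(errors):
    """Root-mean-square of a sequence of errors (same units as the input)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def assignment_coverage(intensities, assigned):
    """Return (fraction of peaks assigned, fraction of total signal in assigned peaks)."""
    total = sum(intensities)
    assigned_signal = sum(x for x, a in zip(intensities, assigned) if a)
    return sum(assigned) / len(assigned), assigned_signal / total


# Hypothetical matched masses (observed, theoretical) in Da.
ft_icr_pairs = [(11683.805, 11683.796), (6431.002, 6431.006)]
maldi_errors_da = [1.2, -0.8, 2.1]  # hypothetical MALDI-TOF errors in Da
print(f"FT-ICR RMS error: {rms([ppm_error(o, t) for o, t in ft_icr_pairs]):.2f} ppm")
print(f"MALDI-TOF RMS error: {rms(maldi_errors_da):.2f} Da")

# Hypothetical peak heights; the three largest peaks are the assigned ones.
heights = [400, 300, 100, 80, 70, 50]
assigned = [True, True, True, False, False, False]
frac_peaks, frac_signal = assignment_coverage(heights, assigned)
print(f"{frac_peaks:.0%} of peaks assigned, covering {frac_signal:.0%} of signal")
```

Because serum spectra are dominated by a few intense peaks, assigning a minority of the largest peaks can cover most of the total signal, which is the pattern described in the Results.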
Conclusions and Discussion: Serum profiling of patients diagnosed with non-small cell lung cancer (NSCLC) can be conducted using MALDI-TOF mass spectrometry. This technique provides a powerful and efficient way to monitor the circulating proteome; however, identifying the proteins represented in the spectra is problematic. Here we incorporated protein identification by top-down LC-MS, which has the advantage of providing the intact mass along with the protein identification for this specific cohort. The intact mass provides a means to match each proteoform identification to a peak in the MALDI spectra, giving a tentative identification. The initial approach for top-down analysis of the non-depleted serum res.
Introduction: Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) mass spectrometry (MS) is well suited to high-throughput analysis of biological samples. While identifying the proteins and peptides represented by mass spectral features is feasible, it typically requires large amounts of biological material, is time-consuming, and does not necessarily elucidate all relevant functions or reflect the complex network interactions present in biological systems. Set enrichment analysis approaches (1, 2) allow the study of expression differences that are consistent across pre-specified groups, or sets, of known genes or proteins related to a biological function. Here, we show how a matched protein panel and mass spectral data collected on a reference sample set can be used together in a set enrichment approach to identify associations of mass spectral features with biological functions without the need to identify the protein constituents of the MS features.
Methods: Mass spectral peaks (features) were generated by applying our DeepMALDI® protocol (3) to mass spectrometry data acquired on a MALDI mass spectrometer (SimulTOF 100, SimulTOF Systems, Marlborough, MA, USA) from serum samples of 100 lung cancer patients; the samples were purchased from Oncology Metrics Inc. and Conversant Bio.
Protein sets of interest were defined by querying the Gene Ontology (GO) database (http://www.geneontology.org; Gene Ontology Consortium) for biological functions, using AmiGO (http://amigo.geneontology.org/amigo) and EMBL-EBI QuickGO (https://www.ebi.ac.uk/QuickGO/).
The expression levels of 1,305 proteins were measured in the same 100 serum samples using aptamer-based, multiplexed proteomic technology (the 1.3k SOMAscan® assay; SomaLogic, Boulder, CO).
Spearman correlation was used to assess the association between each MS feature and individual protein expression levels. Enrichment scores (ES) were calculated using an extension of the gene set enrichment analysis (GSEA) approach (2) that separates the sample set into two halves, calculates the standard ES for each half, and averages these values over multiple splits of the sample set. P values were determined from a null distribution generated by random permutation of the phenotype (mass spectral feature) values.
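The split-half enrichment score and permutation p-value described above can be sketched in Python with NumPy. The sketch makes several assumptions not stated in the abstract: the standard weighted GSEA running sum with weight exponent 1, Spearman ranks computed by double argsort (no tie correction), a boolean mask marking which measured proteins belong to a GO-derived set, and a two-sided p-value based on |ES|. All function names, split and permutation counts, and the simulated data are hypothetical.

```python
import numpy as np


def spearman_vs_columns(X, y):
    """Spearman correlation of feature y (n samples) with each column of X (n x m proteins).

    Ranks are computed by double argsort, which assumes continuous data without ties.
    """
    rx = X.argsort(axis=0).argsort(axis=0).astype(float)
    ry = y.argsort().argsort().astype(float)
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    return (rx.T @ ry) / (np.sqrt((rx**2).sum(axis=0)) * np.sqrt((ry**2).sum()))


def enrichment_score(corr, in_set, weight=1.0):
    """Standard weighted GSEA enrichment score for one protein set.

    corr: correlation of each protein with the feature; in_set: boolean membership mask.
    """
    order = np.argsort(-corr, kind="mergesort")  # most positively correlated first
    r, hits = corr[order], in_set[order]
    hit_w = np.abs(r) ** weight * hits
    p_hit = np.cumsum(hit_w) / hit_w.sum()
    p_miss = np.cumsum(~hits) / (len(r) - hits.sum())
    dev = p_hit - p_miss
    return float(dev[np.argmax(np.abs(dev))])  # signed maximum deviation from zero


def split_half_es(X, y, in_set, n_splits, rng):
    """Average of the ES computed separately on two random halves, over n_splits splits."""
    n = len(y)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        halves = (perm[: n // 2], perm[n // 2 :])
        scores.append(np.mean([enrichment_score(spearman_vs_columns(X[h], y[h]), in_set)
                               for h in halves]))
    return float(np.mean(scores))


def permutation_pvalue(X, y, in_set, n_perm=50, n_splits=5, seed=0):
    """Observed split-half ES and a two-sided p-value from permuting feature values."""
    rng = np.random.default_rng(seed)
    observed = split_half_es(X, y, in_set, n_splits, rng)
    null = np.array([split_half_es(X, rng.permutation(y), in_set, n_splits, rng)
                     for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, float(p)


# Hypothetical data: 60 samples, 100 proteins; the first 10 proteins form the set
# and are simulated to co-vary with the mass spectral feature.
rng = np.random.default_rng(1)
n, m = 60, 100
y = rng.normal(size=n)
X = rng.normal(size=(n, m))
X[:, :10] += 1.5 * y[:, None]
in_set = np.zeros(m, dtype=bool)
in_set[:10] = True
es, p = permutation_pvalue(X, y, in_set)
print(f"split-half ES = {es:.2f}, permutation p = {p:.3f}")
```

Permuting the feature values breaks any association between the feature and the proteins while preserving the correlation structure among the proteins themselves, which is the usual rationale for phenotype permutation rather than protein-label permutation in GSEA.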
Results: Many of the mass spectral features were significantly associated with more than one biological process, reflecting the multi-functionality of proteins and overlapping pathways. Several biological processes (e.g., acute phase reaction, acute inflammatory response, complement activation, wound healing, innate immune response, and type 1 IFN signaling) were associated with more than 100 of the 274 mass spectral features. Biological functions such as immune tolerance and suppression, IFN-γ signaling, or type 17 immune response were associated with several tens of features, while others, such as epithelial-mesenchymal transition, response to hypoxia, or B-cell mediated immunity, had just a handful of significant correlations. The unequal numbers of features associated with different biological processes mirror the abundance of host-derived proteins associated with acute and chronic inflammation in the serum proteome. Processes related to cell signaling and tissue organization are represented by proteins present at lower concentrations, which are more difficult to capture using MALDI mass spectrometry.
Conclusions and Discussion: Set enrichment analysis can be a powerful tool to identify associations of mass spectral features with biological functions without requiring the sequencing of the individual peaks.