319 breast-cancer-associated proteins with high inter-laboratory reproducibility. The data discriminated basal versus luminal breast cancer phenotypes and largely correlated with ER levels in 30 cell lines.
To increase the number of proteins reproducibly quantified across samples, in the present study we used a highly multiplexed mode of targeted proteomics, sequential windowed acquisition of all theoretical fragment ion spectra-mass spectrometry (SWATH-MS), a next-generation proteomics approach developed by Gillet and colleagues (Gillet et al., 2012). For the targeted analysis of the acquired data, we built a comprehensive breast-cancer-specific SWATH assay library. We applied the SWATH-MS technique to obtain digital proteome maps (or "proteotypes") for a set of 96 breast tumor lysates (Data S1) and classified them into five proteotype-based subtypes using a conditional reference tree algorithm (Hothorn et al., 2006). The algorithm found three key proteins that are highly effective for group separation; the agreement between our proteotype-based subtypes and the conventional subtypes is 84%. The triple-negative subtype showed the highest degree of heterogeneity of protein expression. In addition to allowing a more refined classification of breast cancer subtypes, the obtained SWATH-MS data allowed us to compare protein and transcript levels of over 2,700 genes. Although the correlation of protein and transcript levels was low for most differentially expressed genes, it was strong for the three classifying proteins. In conclusion, this study describes the application of the SWATH-MS technique to generate large-scale quantitative proteomics profiles of breast cancer tissues for tumor classification. Discrepancies between the conventional tumor subtypes and our proteotype-based subtypes point to patients who could potentially benefit from different treatment strategies.
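To make the agreement figure concrete, the following is a minimal sketch of how the concordance between two subtype assignments could be computed. It assumes each proteotype-based group has already been mapped to the conventional subtype label it corresponds to; the function name and the five toy samples are hypothetical and are not taken from the study data.

```python
def subtype_agreement(conventional, proteotype_based):
    """Fraction of samples whose two subtype labels are identical."""
    if len(conventional) != len(proteotype_based):
        raise ValueError("both label lists must cover the same samples")
    if not conventional:
        raise ValueError("at least one sample is required")
    matches = sum(c == p for c, p in zip(conventional, proteotype_based))
    return matches / len(conventional)


# Hypothetical labels for five samples (illustration only, not study data).
conventional = ["luminal A", "luminal B", "HER2", "triple-negative", "luminal A"]
proteotype_based = ["luminal A", "luminal A", "HER2", "triple-negative", "luminal A"]

agreement = subtype_agreement(conventional, proteotype_based)

# Indices of samples whose proteotype-based subtype differs from the
# conventional one; these are the discrepant cases discussed in the text.
discordant = [
    i for i, (c, p) in enumerate(zip(conventional, proteotype_based)) if c != p
]
```

In this toy example, four of five labels match, and the single discordant sample is the one that would merit a closer look.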
Generation of an Assay Library for Quantifying Breast-Cancer-Associated Proteins by SWATH-MS
To extract quantitative protein information from SWATH-MS datasets acquired from breast cancer patient tissue samples in a targeted manner, we generated an extensive spectral library based on samples of all classical breast cancer subtypes described above and fractionated pools thereof. From this spectral library, we obtained reference spectra for 28,233 proteotypic peptides and their modified variants (false discovery rate [FDR] < 0.01), representing 4,443 proteins (Data S2A). This spectral library was subsequently used to quantify proteins in breast cancer tissue lysates using the SWATH-MS approach. The assay library covers many key proteins involved in cancer-related pathways and molecular functions, such as the cell cycle/p53, transforming growth factor β (TGF-β), JAK-STAT, phosphatidylinositol 3-kinase (PI3K)-AKT, EGFR, and Wnt pathways, as well as adherens junctions, extracellular matrix (ECM)-receptor interactions, and apoptosis (Data S2B). This breast cancer SWATH assay library has been made available through the SWATHAtlas database (http://www.swathatlas.org) as a public resource to support further basic and applied breast cancer research.
Generation of a SWATH-MS Data Matrix Consisting of 2,842 Consistently Quantified Proteins across 96 Patient Samples
We analyzed the proteome of 96 breast cancer tumor tissues by SWATH-MS. Each tumor tissue was previously classified by a pathologist into one of the five conventional breast cancer subtypes (defined by ER, PR, HER2 status, and tumor grade) and according to their lymph node status. In addition to the analysis of individual breast cancer samples, we also analyzed pooled samples of each of the five subtypes. For each subtype, lymph-node-negative and lymph-node-positive samples were pooled separately, generating ten sample pools in total. Using the SWATH assay library described above, we were able to extract quantitative data for 25,278 proteotypic peptides and their modified variants, representing 2,842 proteins across all individual samples. These 2,842 consistently quantified proteins cover the majority of molecular processes known to be involved in breast cancer (Figure S1).
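As an illustration of what a matrix of consistently quantified proteins implies, the sketch below keeps only proteins that have a value in every sample. This is one plausible reading of "consistently quantified"; the exact criterion is not specified in this section, and the protein names, intensities, and function name are hypothetical.

```python
def consistently_quantified(matrix):
    """Keep proteins with a non-missing intensity in every sample.

    `matrix` maps a protein identifier to a list of intensities, one per
    sample, with None marking a missing value (an assumed encoding).
    """
    return {
        protein: values
        for protein, values in matrix.items()
        if all(v is not None for v in values)
    }


# Hypothetical intensities for three proteins across three samples.
matrix = {
    "PROT_A": [10.2, 11.0, 9.8],
    "PROT_B": [5.1, None, 4.9],  # missing in the second sample, so dropped
    "PROT_C": [7.7, 7.9, 8.1],
}

complete = consistently_quantified(matrix)
```

Requiring complete rows avoids imputation in downstream classification at the cost of discarding proteins that are detected in only some samples.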
Comparison of Proteotype-Based Subtypes and Conventional Subtypes of Breast Cancer
Using the proteotypes thus generated for the 96 samples, we first asked to what extent tumor classification based on proteotypes correlated with the conventional subtype classification. We performed unsupervised hierarchical clustering on the proteotypes of the pooled samples. Figure 1A shows that pools of lymph-node-positive and lymph-node-negative samples of each subtype clustered closely together, indicating high reproducibility of our measurements. Moreover, clustering revealed proteotype similarity between the less aggressive luminal A and luminal B subtypes, whereas the more aggressive HER2 and triple-negative subtypes formed a separate cluster. The luminal B HER2+ group was more similar to the cluster with high aggressiveness, in agreement with its worse therapy response (Figure 1A).
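The clustering of pooled samples can be sketched as follows. This is a minimal pure-Python illustration of agglomerative clustering, assuming average linkage and a distance of 1 minus the Pearson correlation; the distance metric and linkage actually used for Figure 1A are not stated in this section, and the pool names and intensity profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    # Pairwise distances between individual pools (assumed metric: 1 - r).
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2), key=lambda pair: cluster_distance(*pair)
        )
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical pooled profiles (four proteins per pool), illustration only.
pools = {
    "lumA_LN-": [1.0, 2.0, 3.0, 4.0],
    "lumA_LN+": [1.1, 2.1, 2.9, 4.2],
    "TN_LN-": [4.0, 3.0, 2.0, 1.0],
    "TN_LN+": [4.2, 2.8, 2.1, 0.9],
}

merges = average_linkage(pools)
```

With these toy profiles, the lymph-node-negative and lymph-node-positive pools of each subtype merge with each other before the two subtypes are joined, which mirrors the kind of pattern described for the pooled samples.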