Early View
ORIGINAL ARTICLE
Open Access

Detailed DNA methylation characterisation of phyllodes tumours identifies a signature of malignancy and distinguishes phyllodes from metaplastic breast carcinoma

Braydon Meyer

Epigenetics Research Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

St. Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia

Equal contributions: Braydon Meyer and Clare Stirzaker.

Clare Stirzaker

Epigenetics Research Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

St. Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia

Sonny Ramkomuth

Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

Kate Harvey

Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

Belinda Chan

Department of Surgery, Chris O'Brien Lifehouse, Camperdown, New South Wales, Australia

Cheok Soon Lee

Department of Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Department of Anatomical Pathology and Molecular Pathology Laboratory, Liverpool Hospital, Liverpool, New South Wales, Australia

Discipline of Pathology, School of Medicine, Western Sydney University, Liverpool, New South Wales, Australia

Rooshdiya Karim

Department of Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia

Niantao Deng

Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

Kelly A Avery-Kiejda

School of Biomedical Sciences and Pharmacy, College of Health, Medicine and Wellbeing, The University of Newcastle, Newcastle, New South Wales, Australia

Discipline of Medical Genetics, School of Biomedical Sciences and Pharmacy, University of Newcastle, Callaghan, New South Wales, Australia

Rodney J Scott

Discipline of Medical Genetics, School of Biomedical Sciences and Pharmacy, University of Newcastle, Callaghan, New South Wales, Australia

Hunter Medical Research Institute, Newcastle, New South Wales, Australia

Sunil Lakhani

UQ Centre for Clinical Research, The University of Queensland, Brisbane, Queensland, Australia

Pathology Queensland, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia

Stephen Fox

Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia

Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia

Elizabeth Robbins

Department of Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Joo-Shik Shin

Department of Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Jane Beith

Psycho-Oncology Co-Operative Group (PoCoG), University of Sydney, Sydney, New South Wales, Australia

Chris O'Brien Lifehouse, Sydney, New South Wales, Australia

University of Sydney, Sydney, New South Wales, Australia

Anthony Gill

Cancer Diagnosis and Pathology Group, Kolling Institute of Medical Research, Royal North Shore Hospital, St Leonards, New South Wales, Australia

NSW Health Pathology, Department of Anatomical Pathology, Royal North Shore Hospital, St Leonards, New South Wales, Australia

Sydney Medical School, University of Sydney, St Leonards, New South Wales, Australia

Loretta Sioson

Cancer Diagnosis and Pathology Group, Kolling Institute of Medical Research, Royal North Shore Hospital, St Leonards, New South Wales, Australia

NSW Health Pathology, Department of Anatomical Pathology, Royal North Shore Hospital, St Leonards, New South Wales, Australia

Sydney Medical School, University of Sydney, St Leonards, New South Wales, Australia

Charles Chan

NSW Health Pathology, Department of Anatomical Pathology, Concord Repatriation General Hospital, Sydney, New South Wales, Australia

Concord Clinical School, Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia

Mrudula Krishnaswamy

NSW Health Pathology, Department of Anatomical Pathology, Concord Repatriation General Hospital, Sydney, New South Wales, Australia

Concord Clinical School, Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia

Caroline Cooper

Anatomical Pathology, Pathology Queensland, Princess Alexandra Hospital, Woolloongabba, Queensland, Australia

Faculty of Medicine, The University of Queensland, St Lucia, Queensland, Australia

Sanjay Warrier

Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia

Sydney Medical Program, The University of Sydney, Sydney, New South Wales, Australia

Department of Breast Surgery, Chris O'Brien Lifehouse, Camperdown, New South Wales, Australia

Cindy Mak

Faculty of Medicine, The University of Queensland, St Lucia, Queensland, Australia

Department of Breast Surgery, Chris O'Brien Lifehouse, Camperdown, New South Wales, Australia

John EJ Rasko

Faculty of Medicine, The University of Queensland, St Lucia, Queensland, Australia

Department of Cell and Molecular Therapies, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia

Gene and Stem Cell Therapy Program, Centenary Institute, Sydney, New South Wales, Australia

Charles G Bailey

Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia

Gene and Stem Cell Therapy Program, Centenary Institute, Sydney, New South Wales, Australia

Cancer and Gene Regulation Laboratory Centenary Institute, The University of Sydney, Camperdown, New South Wales, Australia

Alexander Swarbrick

St. Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia

Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

Susan J Clark

Epigenetics Research Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

St. Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia

Co-senior authors: Susan J. Clark, Sandra O'Toole, Ruth Pidsley.

Corresponding Author

Sandra O'Toole

Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

Department of Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia

Correspondence to: R Pidsley, Epigenetics Research Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia. E-mail: [email protected]; S O'Toole, Tumour Progression Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia.

E-mail: [email protected]

Corresponding Author

Ruth Pidsley

Epigenetics Research Laboratory, Cancer Ecosystems Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

St. Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia

First published: 01 February 2024

No conflicts of interest were declared.

Abstract

Phyllodes tumours (PTs) are rare fibroepithelial lesions of the breast that are classified as benign, borderline, or malignant. As little is known about the molecular underpinnings of PTs, current diagnosis relies on histological examination. However, accurate classification is often difficult, particularly for distinguishing borderline from malignant PTs. Furthermore, PTs can be misdiagnosed as other tumour types with shared histological features, such as fibroadenoma and metaplastic breast cancers. As DNA methylation is a recognised hallmark of many cancers, we hypothesised that DNA methylation could provide novel biomarkers for diagnosis and tumour stratification in PTs, whilst also allowing insight into the molecular aetiology of this otherwise understudied tumour. We generated whole-genome methylation data using the Illumina EPIC microarray in a novel PT cohort (n = 33) and curated methylation microarray data from published datasets including PTs and other potentially histopathologically similar tumours (total n = 817 samples). Analyses revealed that PTs have a unique methylome compared to normal breast tissue and to potentially histopathologically similar tumours (metaplastic breast cancer, fibroadenoma and sarcomas), with PT-specific methylation changes enriched in gene sets involved in KRAS signalling and epithelial-mesenchymal transition. Next, we identified 53 differentially methylated regions (DMRs) (false discovery rate < 0.05) that specifically delineated malignant from non-malignant PTs. The top DMR in both discovery and validation cohorts was hypermethylation at the HSD17B8 CpG island promoter. Matched PT single-cell expression data showed that HSD17B8 had minimal expression in fibroblast (putative tumour) cells. Finally, we created a methylation classifier to distinguish PTs from metaplastic breast cancer samples, where we revealed a likely misdiagnosis for two TCGA metaplastic breast cancer samples. 
In conclusion, DNA methylation alterations are associated with PT histopathology and hold the potential to improve our understanding of PT molecular aetiology, diagnostics, and risk stratification. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Introduction

Phyllodes tumours (PTs) are rare fibroepithelial tumours of the breast. PTs account for ~1% of breast tumours, with malignant PTs accounting for ~10% of PT diagnoses [1]. Fibroepithelial lesions of the breast are a heterogeneous group of tumours that include cellular fibroadenomas (FAs) and PTs. FAs and PTs have a similar clinical presentation and are composed of both stromal and epithelial cells, of which the stromal cells are the neoplastic component. However, their disease course varies, with PTs characterised by more rapid growth and a higher risk of recurrence after surgery. Furthermore, if surgical control is unsuccessful, malignant PTs generally respond poorly to chemotherapy and radiotherapy and may metastasise [2].

The World Health Organisation (WHO) Classification of Breast Tumours, 5th Edition [1], outlines the current criteria used to classify PTs and FAs. In brief, FAs have well-defined margins and show variable but usually mild cellularity and low mitotic activity; PTs show a spectrum from benign to borderline to malignant with increasing cellularity, atypia, and mitotic activity. Malignant PTs have sarcomatous stroma and may also show stromal overgrowth and heterologous differentiation with malignant bone (osteosarcoma) and cartilage (chondrosarcoma) [1]. Notably, there is significant overlap between diagnostic categories, which can pose a clinical challenge.

In recognition of this challenge, the Singapore General Hospital (SGH) Group developed a nomogram that assigns a score based on cellular atypia (mild, moderate, severe), mitotic count per 10 high-power fields, presence or absence of stromal overgrowth, and margin status, the most heavily weighted variable [3]. This SGH score is used as a predictor of recurrence-free survival and has been independently validated in several cohorts [4-6], leading to its inclusion in the WHO classification [1]. While the nomogram provides more information on the likely outcome for a particular patient, it does not correctly identify all patients whose PT recurs, nor does it distinguish PT from histopathologically similar tumours [4-6]. There is therefore a need to develop molecular biomarkers that can provide an accurate diagnosis to inform optimal patient management and minimise both under- and overtreatment.

Several studies have identified PT molecular biomarkers, but with limited translational success [7], and few studies exist to address the clinical need to distinguish PT from histopathologically similar tumours [8]. PTs are difficult to study owing to their relative rarity; in one of the largest genomic studies to date, Tan and colleagues [9] undertook exome sequencing (n = 22 PTs) and performed targeted resequencing of a larger cohort of 100 PTs. They described recurrent loss of function mutations in SETD2 and KMT2D, which are histone methyltransferase enzymes known to play a significant role in epigenetic modification. That these mutations are detected in PTs and only rarely in FAs suggests a potential role for epigenetics in PT tumorigenesis [10]. Disruption to epigenetic mechanisms, such as DNA methylation, is a recognised hallmark of cancer [11] and therefore may be important in the molecular aetiology of PTs. Epigenetic studies of PTs have been limited to date; early studies investigated DNA methylation at a small number of candidate genes such as TWIST1 and RASSF1 [12, 13], and a more recent integrative genomic and epigenomic study by Hench et al [14] highlighted the diagnostic potential of combining DNA methylation and copy number profiling for predicting clinical outcomes.

We hypothesised that DNA methylation could provide novel biomarkers for diagnosis and more accurate tumour stratification in PTs, as well as a better understanding of the biological processes underlying the development and progression of PTs. In this study, we undertook whole-genome methylation profiling using DNA methylation arrays on a unique cohort (n = 33) of fibroepithelial and breast tumours encompassing FAs, metaplastic breast cancer, and benign, borderline, and malignant PTs. We used publicly available data to perform validation [14] and comparisons with histopathologically similar tumours. Our characterisation of the PT methylome reveals that it is unique compared to that of other similar or co-localised cancers including breast cancer and sarcomas. Further, we demonstrate the diagnostic potential of DNA methylation by identifying DNA methylation differences between malignant and non-malignant PTs and create a classifier with the potential to discriminate PTs from metaplastic breast cancer.

Materials and methods

Additional details for all methods are included in Supplementary materials and methods.

Ethics approval and consent to participate

Ethics approval was obtained through the Sydney Local Health District (Royal Prince Alfred Hospital Zone) Human Research Ethics Committee: Protocol No. X15-0388 and 2019/ETH06994 – ‘Retrospective breast tumour bank and database.’

Clinical samples

The Phyllodes (Australian) cohort comprised n = 33 patients with cellular fibroepithelial lesions or metaplastic breast cancer who had a tumour resected between 2007 and 2017. Patients in this cohort had few events (recurrence or death), consistent with the rarity of recurrence in PTs. An expert breast pathologist (S.O.T.) classified cases according to WHO categories [15] as benign (n = 2), borderline (n = 9), and malignant (n = 18) PT, FA (n = 2), and metaplastic breast cancer (n = 2; supplementary material, Table S1). The PT classification was based on the SGH nomogram score [3] calculated without margin status, with thresholds for benign (SGH = 3), borderline (SGH = 14), or malignant (SGH ≥ 24). Margin status was excluded as it does not necessarily reflect the underlying biological nature of a PT and instead may reflect patient and surgeon preferences.
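As a minimal sketch of how the modified SGH cut-offs described above could be applied, the Python snippet below maps a score to a PT category. It assumes that each stated value (3, 14, 24) acts as the lower bound of its category, which is an interpretation and is not spelled out in this section; the function and constant names are hypothetical.

```python
# Illustrative only: the cut-off values come from the text, but treating them
# as lower bounds of each category is an assumption of this sketch.
BORDERLINE_MIN = 14
MALIGNANT_MIN = 24


def classify_modified_sgh(score: float) -> str:
    """Map a modified SGH score (margin status excluded) to a PT category."""
    if score >= MALIGNANT_MIN:
        return "malignant"
    if score >= BORDERLINE_MIN:
        return "borderline"
    return "benign"


def is_malignant(score: float) -> bool:
    """Binary split used for group comparisons: SGH >= 24 versus SGH < 24."""
    return score >= MALIGNANT_MIN
```

The binary helper mirrors the malignant versus non-malignant grouping (benign and borderline combined) used in the differential methylation analysis.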

DNA methylation

Pathological review of the Phyllodes (Australian) cohort tissue identified regions for coring. DNA was extracted from the cores using the Qiagen QIAamp DNA FFPE Tissue Kit or MN Nucleospin DNA kit for formalin-fixed paraffin-embedded tissue (FFPET), following the manufacturer's instructions (Qiagen, Hilden, Germany). The Infinium HD FFPE quality control and DNA restoration kits (Illumina, San Diego, CA, USA) were used to evaluate and repair degraded DNA samples where feasible, as previously described [16]. DNA (250–500 ng) was treated with sodium bisulphite using an EZ-96 DNA methylation kit (Zymo Research, Irvine, CA, USA). DNA methylation was then quantified using the Illumina Infinium Human Methylation EPIC BeadChip (EPIC arrays) according to the manufacturer's standard protocol (Illumina).

Publicly available genome-wide DNA methylation data

EPIC array methylation datasets were downloaded from Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo): GSE179458, a PT, breast carcinoma, and FA cohort (n = 80) [the ‘Phyllodes (Hench et al) cohort’] [14]; GSE184159, our previously published triple-negative breast cancer cohort (n = 32) (the ‘TNBC BRCA cohort’) [17]; and GSE140686, a sarcoma cohort (the ‘Sarcoma cohort’) [18]. From GSE140686 we randomly selected three samples each (where available) of 65 sarcoma subtypes to include the full range of sarcomas in our initial analysis. For a secondary analysis, all samples of 11 sarcoma subtypes were selected for their pathological similarity to PT (total n = 392). Samples from The Cancer Genome Atlas Breast Cancer (TCGA BRCA) cohort Infinium 450K array methylation data were downloaded from the GDC Data Portal (n = 280) [the ‘BRCA (TCGA) cohort’] [19].

The total dataset comprised n = 817 samples (with a cohort and analysis breakdown shown in supplementary material, Table S1). For each analysis, the methylation data underwent quality control, normalisation, and harmonisation according to the genomic content of the array, depending on which cohorts were involved.

Cellular deconvolution of methylation data

To estimate tumour purity and cellular composition, we used the R package EpiDISH [20] (version 2.14.1) with the centEpiFibIC.m reference dataset, and t-tests were used to compare cellular proportions between tumour types.

Sarcoma classifier

To compare the methylation profile of all available PT and FA samples to sarcoma subtypes, we used a web-based classifier created by Koelsche and colleagues [18].

Genome-wide DNA methylation analysis

For initial data visualisation we extracted the 500 most variable probes across the dataset being studied and applied t-distributed stochastic neighbour embedding (t-SNE) using the ‘Rtsne’ function. For each cohort comparison we used the limma package (version 3.54.1) to identify differentially methylated probes (DMPs) at a false discovery rate (FDR) < 0.05 [21]. The package DMRcate (version 2.12.0) was used to identify differentially methylated regions (DMRs), with cut-offs of FDR < 0.05 and an absolute ∆β of ≥10% [22].
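The selection and filtering steps in this paragraph can be illustrated with a short, standard-library Python sketch. It is not a reimplementation of limma or DMRcate: it only shows (i) ranking probes by variance to keep the most variable ones and (ii) applying a Benjamini-Hochberg adjustment followed by the FDR and ∆β cut-offs. The use of Benjamini-Hochberg here is an assumption, and all function names and the toy inputs are hypothetical.

```python
from statistics import pvariance


def top_variable_probes(betas: dict[str, list[float]], n: int = 500) -> list[str]:
    """Return the n probe IDs with the highest variance of beta values across samples."""
    ranked = sorted(betas, key=lambda probe: pvariance(betas[probe]), reverse=True)
    return ranked[:n]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_indices(
    pvalues: list[float],
    delta_betas: list[float],
    fdr: float = 0.05,
    min_abs_delta: float = 0.10,
) -> list[int]:
    """Indices passing both the FDR cut-off and the absolute delta-beta cut-off."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        i
        for i, (q, delta) in enumerate(zip(adjusted, delta_betas))
        if q < fdr and abs(delta) >= min_abs_delta
    ]
```

In this sketch ∆β is expressed as a fraction (0.10 corresponds to 10%); a real analysis would take p values and effect sizes from the fitted models rather than from hand-entered lists.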

Machine learning

To develop a machine learning algorithm to distinguish PTs from breast cancer, we followed the steps outlined in supplementary material, Figure S1, on samples described in supplementary material, Table S1D. We first employed a DMP analysis through the limma package to determine individual CpG sites that were significantly different between PTs and non-metaplastic breast cancer. The training cohort consisted of PTs from the Phyllodes (Australian) cohort (n = 29) and TCGA datasets (n = 2), combined with randomly selected TCGA breast cancer samples (n = 150, supplementary material, Table S1D). We used the resulting 10 probes to create a random forest classifier using the caret package (version 6.0-93) [23]. This model was then tested on three separate cohorts: (i) the Phyllodes (Hench et al) cohort, to test against an independent PT (n = 38) and breast cancer (n = 25) cohort; (ii) the BRCA (TCGA) normal breast tissue samples (n = 50), to ensure control/normal tissue was not being identified as a PT; and (iii) the combined metaplastic breast cancer samples (n = 15) from the BRCA (TCGA) cohort (n = 13) and the Phyllodes (Australian) cohort (n = 2) to determine whether our classifier could successfully distinguish this potentially misdiagnosed cancer.

Results

Patient cohort and publicly available data

We performed a genome-wide DNA methylation analysis of primary breast PT samples. The Phyllodes (Australian) cohort (Figure 1) comprises female patients who had a PT (n = 29) reviewed by a specialist breast pathologist (S.O.T.) and classified according to the WHO criteria and modified SGH score as benign (Figure 1A, n = 2, SGH = 3), borderline (Figure 1B, n = 9, SGH = 14), or malignant (Figure 1C, n = 18, SGH ≥24). These modified SGH score cut-offs were selected to represent the minimum diagnostic criteria for each category. Additional samples were included from patients with FA lesions (n = 2) and metaplastic breast cancer (n = 2). For each patient, DNA was extracted from an FFPE block of the lesion. DNA methylation profiling of the samples was performed using EPIC arrays. To enable comparison with histopathologically similar tumours (Figure 1D–F), we included publicly available methylation datasets in our analysis (supplementary material, Table S1) [14, 17-19].

Figure 1. Histopathology of different grades of PT and potentially misdiagnosed, similar tumours. Representative haematoxylin and eosin staining (at magnification ×400) from patients with (A) benign, (B) borderline, and (C) malignant PTs. The leaf-like structure defining PTs can be seen in (A), with progressively increasing cellularity, atypia, and infiltration observed as grade of tumour increases. Histopathologically similar tumours used as comparisons in this study include (D) fibroadenoma (FA), (E) undifferentiated sarcoma, and (F) metaplastic breast cancer. FAs and metaplastic breast cancer can share histopathological features with benign and malignant PTs, respectively.

DNA methylation profiling delineates phyllodes tumours from histopathologically similar tumours

First, we explored the ability of DNA methylation to distinguish PTs from all other tissue types. Methylation data from n = 403 samples were combined to create the ‘Comprehensive Cohort’ (supplementary material, Table S1A). Visualisation of the top 500 most variable probes in a t-SNE plot shows that samples cluster according to tumour type, with minimal clustering by cohort, indicating minimal batch effects (Figure 2A). PTs cluster together, in a distinct group from the breast carcinoma samples, despite both being primary tumours of the breast. PTs also form a separate group from the sarcoma samples despite both being tumours of mesenchymal origin. FA samples cluster towards the PTs, as expected given their histopathological similarity, but notably form their own subcluster, suggesting distinct FA and PT methylomes.

Figure 2. DNA methylation distinguishes PTs from other co-localised or pathologically similar tumours. (A) t-SNE of the top 500 most variable probes shows clustering by sample type. PTs (purple) cluster separately from normal breast (light blue), breast cancer (blue), sarcoma (green), and FA tissue (red). Minimal batch effects are observed between different cohorts with the same tumour type (e.g. TCGA BRCA and TNBC BRCA). (B) EpiDISH, a cellular deconvolution method using methylation, shows different cellular compositions between tissue types. PTs show the highest fibroblast and lowest epithelial proportions. Differences are shown between PTs and (i) breast cancer and (ii) normal breast tissue. (C) The sarcoma classifier generated by Koelsche and colleagues [18] classifies eight PT samples as sarcomas (≥0.9 classifier score), with the majority classified within the dermatofibrosarcoma protuberans subtype (DFSP). Other subtypes with PT samples above the 0.9 threshold include malignant peripheral nerve sheath-like sarcoma (MPNST-like), undifferentiated sarcoma (USARC), and desmoid-type fibromatosis (DTFM).

To further characterise the differences between tumour types, we employed the methylation-based cellular deconvolution method EpiDISH to estimate epithelial, fibroblast, and immune cell fractions (supplementary material, Table S2). As anticipated given the known stromal neoplastic proliferation in PTs, we observed a greater proportion of fibroblast cells in PTs (mean = 0.70) compared to normal breast tissue samples (mean = 0.28, t-test p < 0.001; Figure 2Bi) and compared to breast carcinoma (which is of known epithelial origin) (mean = 0.23, t-test p < 0.001; Figure 2Bii) across all cohorts (supplementary material, Figure S2). Interestingly, we found that PTs and FAs had a low immune cell proportion (mean = 0.13), which is significantly lower than in other tumour types (supplementary material, Table S2).

Histopathologically malignant PTs can show sarcomatous differentiation including heterologous elements such as chondro- and osteosarcoma [1]. To ascertain whether any PT samples exhibited the methylation profile of a particular sarcoma, we applied the methylation-based sarcoma classifier created by Koelsche and colleagues [18] to all PT and FA samples (n = 86). None of the n = 19 FA samples were classified as sarcoma. Of the total 67 PT samples, eight samples passed the classifier threshold for a sarcoma (threshold ≥0.9, Figure 2C). Interestingly, these eight samples [n = 5, Phyllodes (Australian) cohort, n = 3 Phyllodes (Hench et al) cohort] are across all PT grades and were classified within just four of the 65 sarcoma subtypes: five samples were identified as dermatofibrosarcoma protuberans (DFSP), which was also the most highly ranked subtype across all PT samples (supplementary material, Figure S3), one PT was classified as undifferentiated sarcoma (USARC), one PT as malignant peripheral nerve sheath tumour (MPNST-like), and one PT as desmoid-type fibromatosis (DTFM). Intriguingly, no overlap with chondrosarcoma or osteosarcoma was seen in the malignant PTs from the Phyllodes (Australian) cohort reported as having malignant bone or cartilage heterologous elements by histopathology. A t-SNE plot comparing PTs and specific sarcoma subtypes pathologically related to PTs (informed by pathologist advice and the results of classifier analysis) showed that the majority of PT samples maintained a distinct clustering from sarcomas (supplementary material, Figure S4A). However, the PT sample that had the highest score (0.99) in the sarcoma classifier as a DTFM was a malignant PT sample (GSM5418525) from the Phyllodes (Hench et al) cohort and clearly clustered with the DTFM subtype (supplementary material, Figure S4A). 
Sample 4,488 was noted to have heterologous elements of rhabdoid differentiation according to pathology and was considered to be MPNST-like by the sarcoma classifier and, interestingly, was also found to cluster close to both these sarcoma subtypes in the t-SNE (supplementary material, Figure S4B). While greater sample numbers are needed, this existing classifier could potentially be applied to PT samples to detect misdiagnosed sarcomas.
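The thresholding logic described above (a sample is assigned to a sarcoma subtype only when its top classifier score is ≥0.9) can be sketched as follows. This is not the web-based classifier of Koelsche and colleagues, whose internals are not described here; it only illustrates how per-sample score tables might be turned into calls and tallied. The function names and the example scores are hypothetical.

```python
from collections import Counter
from typing import Optional


def call_subtype(scores: dict[str, float], threshold: float = 0.9) -> Optional[str]:
    """Return the top-scoring subtype if it reaches the threshold, else None."""
    subtype, best = max(scores.items(), key=lambda item: item[1])
    return subtype if best >= threshold else None


def tally_calls(samples: dict[str, dict[str, float]], threshold: float = 0.9) -> Counter:
    """Count how many samples are called for each subtype at the given threshold."""
    calls = (call_subtype(scores, threshold) for scores in samples.values())
    return Counter(call for call in calls if call is not None)
```

For example, a sample with hypothetical scores `{"DFSP": 0.95, "USARC": 0.03}` would be called DFSP, whereas one with `{"DFSP": 0.60, "DTFM": 0.30}` would remain unclassified.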

DNA methylation profiling distinguishes phyllodes tumours from normal breast tissue

To discover methylation changes that define PT pathology and gain insight into PT biology, we compared the methylation profile of PTs against normal breast tissue. For this we analysed two methylation datasets: (i) PTs from the Phyllodes (Australian) cohort (n = 29) and BRCA (TCGA) normal breast tissue samples (n = 30) and (ii) PTs from the Phyllodes (Hench et al) cohort (n = 38) and an independent set of BRCA (TCGA) normal breast tissue samples (n = 30) (supplementary material, Table S1B). A t-SNE plot of the top 500 most variable probes revealed distinct clusters by tissue type, irrespective of PT grade (supplementary material, Figure S5). We next identified 11,366 differentially methylated regions (DMRs) between PT and normal tissue in the Phyllodes (Australian) cohort. Interestingly, these regions showed a strong correlation with methylation in the Phyllodes (Hench et al) cohort (Pearson's r = 0.82, p < 2.2e-16) (supplementary material, Figure S6A). In an independent analysis of the Phyllodes (Hench et al) cohort we identified 9,992 DMRs, of which 6,882 DMRs overlapped between cohorts (supplementary material, Figure S6B, and Table S3). Gene ontology analysis of the common DMRs revealed 223 significant gene signatures (supplementary material, Table S4). Of note, dysregulation of the KRAS signalling pathway was one of the top signatures in an analysis of both hyper- and hypomethylated DMRs. Enriched hypomethylated pathways also include epithelial-mesenchymal transition (EMT) and extracellular structure organisation (supplementary material, Figure S7).

Identification of differential DNA methylation between malignant and non-malignant phyllodes tumours

We performed a genome-wide methylation analysis to identify novel genomic regions associated with PT malignancy by comparing malignant (SGH ≥24) and non-malignant samples (benign and borderline, SGH < 24). Initial visualisation of the top 500 most variable probes of PTs in the combined PT cohorts (n = 62; 5 samples with unknown SGH score removed, supplementary material, Table S1C) and analysis of global methylation showed no obvious difference in methylation between malignant and non-malignant samples (supplementary material, Figures S8A and S9A) or PT grade (supplementary material, Figures S8B and S9B). Next, we applied a DMR analysis to identify methylation differences between malignant (n = 18) and non-malignant samples (n = 11) in the Phyllodes (Australian) cohort, which identified 355 significant DMRs (FDR ≤0.05, absolute ∆β ≥ 10%) (Figure 3A and supplementary material, Table S5). We next compared the differential methylation of the 355 malignant DMRs between the Phyllodes (Australian) cohort and the Phyllodes (Hench et al) cohort, finding a strong positive correlation (Figure 3B, Pearson's r = 0.75, p < 0.0001). Gene set enrichment analysis (GSEA) of genes proximal to hypomethylated DMRs showed an enrichment for oestrogen response and p53 from the hallmark signature and downregulated EMT signalling in breast cancer (Figure 3C). Significant pathways from the GSEA analysis can be found in supplementary material, Table S6 and Figure S10. Furthermore, we sought to validate CNV findings from Hench and colleagues’ study [14] in the Phyllodes (Australian) cohort using the conumee package for calling CNVs from methylation array data. We observed several CNV amplifications in MDM4 and EGFR, with amplifications and deletions in RB1 and minimal CDKN2A/B deletions (supplementary material, Figure S11).

Differential DNA methylation can define the malignancy status of PTs. (A) Heatmap of 355 significant DMRs (FDR ≤ 0.05, Δβ ≥10%) from the Phyllodes (Australian) cohort, which distinguishes malignant (SGH ≥ 24) from non-malignant (SGH < 24) PTs and shows subclusters associated with SGH score (i versus ii). (B) Correlation of malignant minus non-malignant DNA methylation differences (%) in Phyllodes (Australian) DMRs compared against methylation in the same regions in the Phyllodes (Hench et al) cohort [14] (r = 0.75, p < 2.2 × 10⁻¹⁶), with the four most significant DMRs across both cohorts highlighted. (C) GSEA of hypomethylated probes derived from Phyllodes (Australian) DMR analysis.

Independent genome-wide DMR analysis of malignant versus non-malignant PT samples in the Phyllodes (Hench et al) cohort revealed 532 DMRs (supplementary material, Table S7). Of these 532 DMRs, 53 intersected with the 355 DMRs from the Phyllodes (Australian) cohort, and all agreed on the direction of effect (Figure 4A and supplementary material, Table S8 and Figure S12). These 53 validated DMRs were distributed throughout the genome and were all hypermethylated in malignant samples (Figure 4B). The most highly ranked DMR genes in both discovery and validation cohorts were HSD17B8, NADK, NELFA, and GFM1/LXN (Figures 3B and 4C and supplementary material, Figures S13 and S14). Among malignant samples we observed heterogeneity in the methylation levels of our top DMRs (supplementary material, Figure S15) but found no evidence of genomic copy number confounding methylation at these regions (supplementary material, Figure S16).
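The cross-cohort validation step described above (intersecting two DMR sets and checking that the direction of effect agrees) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the `DMR` record, the overlap rule (any shared base on the same chromosome), and all coordinates and Δβ values are assumptions introduced for the example.

```python
from typing import NamedTuple


class DMR(NamedTuple):
    chrom: str
    start: int
    end: int
    delta_beta: float  # malignant minus non-malignant (hypothetical values)


def overlaps(a: DMR, b: DMR) -> bool:
    """True when two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def validated_dmrs(discovery, validation):
    """Pair overlapping DMRs whose delta-beta has the same sign in both cohorts."""
    pairs = []
    for d in discovery:
        for v in validation:
            same_direction = (d.delta_beta > 0) == (v.delta_beta > 0)
            if overlaps(d, v) and same_direction:
                pairs.append((d, v))
    return pairs


# Hypothetical coordinates, for illustration only.
discovery = [
    DMR("chr6", 100, 500, 0.13),
    DMR("chr1", 1000, 1500, 0.12),
    DMR("chr3", 200, 400, -0.11),
]
validation = [
    DMR("chr6", 450, 900, 0.15),    # overlaps, same direction: validated
    DMR("chr1", 2000, 2500, 0.14),  # no overlap
    DMR("chr3", 300, 350, 0.10),    # overlaps, opposite direction
]

validated = validated_dmrs(discovery, validation)
print(len(validated), validated[0][0].chrom)
```

The nested loop is quadratic, which is acceptable for a few hundred regions per cohort; an interval tree or a sorted sweep would be the usual choice for larger sets.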

Validation of differentially methylated regions between malignant and non-malignant PTs. (A) Venn diagram for DMRs from independent analysis of malignant (SGH ≥24) versus non-malignant (SGH <24) PTs conducted within Phyllodes (Australian) and Phyllodes (Hench et al) [14] cohorts. 53 DMRs overlapped between both cohorts (FDR ≤0.05, Δβ ≥10%). (B) Circos heatmap depicting location and methylation level of 53 validated DMRs. (C) Methylation of top ranked DMRs for each sample (defined by highest ranked genes by FDR in each cohort analysis). ***p ≤ 0.001, Student's t-test for region after recalculation of intersected DMR.

Of note, the most significant DMR in both cohorts is an expansive region covering the promoter and extending into the gene body of HSD17B8 [Figure 5A, Phyllodes (Australian) cohort: p = 4.48 × 10⁻³², ∆β = 13.4%, # CpG sites = 33; Phyllodes (Hench et al) cohort: p = 1.66 × 10⁻⁹⁵, ∆β = 14.7%, # CpG sites = 36]. Within the hypermethylated malignant PT group the absolute level of methylation was heterogeneous, which led us to investigate whether methylation in this region was associated with any other patient-specific molecular or clinical variables. We observed no association between EpiDISH-predicted cell type proportion and HSD17B8 methylation (supplementary material, Figure S17A–C). However, we found a significant positive correlation with SGH score, suggesting that HSD17B8 methylation is likely on a continuum, increasing with the degree of atypia (as measured by SGH score, supplementary material, Figure S17D), as well as a near-significant negative association with age (r = −0.36, p = 0.054, supplementary material, Figure S17E).

HSD17B8 – Evidence of promoter DNA hypermethylation with malignant PTs and single-cell expression profiling showing low expression in fibroblast populations. (A) Heatmap and relative location of HSD17B8 gene and DMR showing variable hypermethylation in malignant PTs compared to non-malignant PTs. (B) UMAP of single-cell expression data derived from three malignant PTs (resolution = 0.4). Major clusters are defined by cell type-specific expression signatures, with the major cluster predicted as fibroblasts (tumour) by ‘SingleR’. (C) Depiction of HSD17B8 expression density overlaid on single-cell cell-type prediction UMAP, showing minimal expression within fibroblast cells and higher expression levels in epithelial cells.

Decreased expression of HSD17B8 was previously associated with poor survival outcomes in breast cancer, including in the BRCA (TCGA) cohort [24]. We therefore used the full BRCA (TCGA) dataset to test for an association between HSD17B8 expression and methylation [19]. We observed a significant negative correlation between HSD17B8 methylation and gene expression (Pearson's r = −0.48, p < 0.001, supplementary material, Figure S18), suggesting a potential regulatory role associated with the methylation change. Interestingly, we also observed that, as in the Phyllodes (Australian) cohort, HSD17B8 methylation was highly heterogeneous between BRCA (TCGA) samples, with only a small proportion of samples (9.6%) showing methylation levels above 40% (supplementary material, Figure S18).

Finally, we generated single-cell expression data for three malignant PTs (mean = 3,163 cells per sample), of which two overlapped with our Phyllodes (Australian) cohort methylation dataset (sample nos. 4413 and 4436). For initial characterisation of the single-cell data, we clustered cells by predicted cell type. As expected for a fibroepithelial tumour type, most cells were predicted to be of stromal origin, but we also observed several smaller clusters of immune and epithelial cell types (Figure 5B) at similar levels to those observed from our previous cellular deconvolution analysis (supplementary material, Figure S19). We interrogated the data of all three PTs for expression of HSD17B8 and found minimal expression within the predominant fibroblast (putative tumour cell) population, consistent with the high gene promoter methylation levels observed in malignant samples (Figure 5C). HSD17B8 expression appears to occur specifically within the small proportion of epithelial cells (Figure 5C); however, a larger single-cell cohort including non-malignant PTs is required to determine the cell specificity of HSD17B8 expression and changes with malignancy.

DNA methylation as a biomarker to prevent misdiagnosis of phyllodes tumours as metaplastic breast cancer

Malignant PTs can share histopathological features with types of metaplastic breast cancer (Figure 1) such as spindle cell metaplastic carcinoma and metaplastic carcinoma with heterologous mesenchymal differentiation [15]. As a result, PTs can be misdiagnosed as metaplastic breast cancer, which has a significant therapeutic consequence for patients [25]. In our initial visualisation of DNA methylation data in the Comprehensive Cohort, we observed that metaplastic breast cancers largely clustered with non-metaplastic breast cancers, away from the PTs (Figure 2A). Thus, we hypothesised that the methylation differences between PTs and non-metaplastic breast cancer could be exploited to distinguish PT from metaplastic breast cancer.

To develop a classifier, we curated cohorts of PT, breast cancer, and normal samples (as outlined in supplementary material, Table S1D). Initial visualisation of all samples (Figure 6A) confirmed that PTs largely clustered away from breast cancer and normal breast tissue samples. We then used a training cohort of PT and breast cancer samples to identify differentially methylated CpG positions (DMPs). These CpG sites were used to develop a random forest classifier, which we assessed using three test datasets: (i) an independent PT/breast cancer validation cohort and alternative tissue types in (ii) normal breast tissue and (iii) metaplastic breast cancer, as outlined in supplementary material, Figure S1 and Table S1D and described below.

Comparison of DNA methylation between PTs, breast cancers, and normal breast tissue. (A) t-SNE of 500 most variable probes among all samples and tissue types. This shows phyllodes-specific (purple, n = 69), breast cancer (blue, n = 150), metaplastic breast cancer (orange, n = 15), and normal breast tissue (light blue, n = 50) clusters. Two metaplastic breast cancer samples (TCGA-AC-A2QH and TCGA-AC-A7VC) cluster with PTs. (B) Fourfold plots of the test results from the PT versus breast cancer classifier models. Green quadrants denote a correct prediction of tumour type compared to reference, while red quadrants indicate incorrect predictions. Testing of model on PT and breast cancer samples from Phyllodes (Hench et al) cohort returns 57/63 correct predictions. (C) Testing of classifier on metaplastic breast cancer samples from TCGA and Phyllodes (Australian) cohorts returns 13/15 correct predictions.

Development of a random forest model to distinguish PT from breast cancer

First, we identified CpG sites of differential methylation between PT (n = 31) and non-metaplastic breast cancer (n = 150). We identified 321,344 DMPs and then performed manual feature selection, selecting probes based on delta beta (Δβ ≥50%), independent predictive ability (AUC ≥0.95), and low correlation to one another (pair-wise absolute correlation <0.75), culminating in a final selection of 10 probes (supplementary material, Table S9 and Figure S1). Each probe was then independently validated through the Boruta feature selection package (supplementary material, Figure S20) and used to fit the random forest model.
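The three selection criteria described above (Δβ ≥ 50%, AUC ≥ 0.95, pair-wise absolute correlation < 0.75) can be illustrated with a small Python sketch. The study describes this step as manual feature selection, so the code below is not the authors' procedure: the greedy ordering by absolute Δβ, the direction-agnostic AUC, the function names, the probe identifiers, and the toy beta values are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def auc(group_a, group_b):
    """Probability that a value from group_a exceeds one from group_b (ties = 0.5)."""
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(group_a) * len(group_b))


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_probes(betas, is_pt, min_delta=0.50, min_auc=0.95, max_corr=0.75):
    """Filter probes by effect size and AUC, then drop correlated probes greedily."""
    candidates = []
    for probe, values in betas.items():
        pt = [v for v, flag in zip(values, is_pt) if flag]
        bc = [v for v, flag in zip(values, is_pt) if not flag]
        delta = sum(pt) / len(pt) - sum(bc) / len(bc)
        score = auc(pt, bc)
        score = max(score, 1.0 - score)  # assumed: direction does not matter
        if abs(delta) >= min_delta and score >= min_auc:
            candidates.append((abs(delta), probe))
    selected = []
    # Assumed ordering: largest absolute delta-beta first.
    for _, probe in sorted(candidates, reverse=True):
        if all(abs(pearson(betas[probe], betas[k])) < max_corr for k in selected):
            selected.append(probe)
    return selected


# Hypothetical beta values: four PT samples followed by four breast cancers.
is_pt = [True, True, True, True, False, False, False, False]
betas = {
    "cg_A": [1.00, 0.70, 1.00, 0.70, 0.00, 0.30, 0.00, 0.30],
    "cg_B": [0.98, 0.70, 0.96, 0.72, 0.02, 0.30, 0.04, 0.30],  # near-copy of cg_A
    "cg_C": [0.50, 0.60, 0.50, 0.60, 0.40, 0.50, 0.40, 0.50],  # weak effect
    "cg_D": [0.60, 1.00, 0.60, 1.00, 0.40, 0.00, 0.40, 0.00],
}

selected = select_probes(betas, is_pt)
print(selected)
```

In this toy input, cg_C is removed by the effect-size filter and cg_B by the correlation filter, leaving two probes that each separate the groups while carrying partly independent information.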
  1. Testing of random forest classifier on PT versus non-metaplastic breast cancer

    The random forest model was first tested on the training dataset (PT versus non-metaplastic breast cancer); as expected, we achieved 100% accuracy to discriminate PTs from non-metaplastic breast cancer (supplementary material, Figure S21A). Using the Phyllodes (Hench et al) cohort as our test dataset (n = 63) (supplementary material, Figure S1Di), we achieved an accuracy of 90.5% (Table 1, precision = 80.65%, recall = 100%, F1 = 89.3%), where only 6/63 samples were incorrectly classified (Figure 6B).

  2. Testing of random forest classifier on normal breast tissue

    To determine how tissue types other than those the model was trained on would be categorised, we ran a test cohort of normal breast tissue (n = 50) (supplementary material, Figure S1Dii) through the model and observed a prediction of all samples as breast cancer (supplementary material, Figure S21B). This result indicated that, although not trained on normal breast tissue, our model classified breast cancer and normal breast tissue together in a single group.

  3. Testing of random forest classifier on metaplastic breast cancer

    Prior to classification of the final test cohort of metaplastic breast cancer samples (supplementary material, Figure S1Diii, n = 15), we conducted a blinded, expert clinical re-assessment of all histopathology reports and images for metaplastic breast cancer samples from TCGA. The conclusion from pathological review was that a diagnosis of a malignant PT could not be excluded in four samples from the TCGA cohort (TCGA-AC-A2QJ-01, TCGA-AC-A2QH-01, TCGA-AC-A7VC-01, TCGA-A2-A4S1-01). Intriguingly, we observed two of these four samples clustering with PTs (TCGA-AC-A2QH-01, TCGA-AC-A7VC-01) in our initial visualisation of variable DNA methylation of all samples (Figure 6A). Once we applied the PT/breast cancer classifier to the n = 15 metaplastic breast cancer samples, we found that 13 samples were predicted as breast cancer, and two as PTs (Figure 6C). Those two samples were TCGA-AC-A2QH-01 and TCGA-AC-A7VC-01, which, interestingly, were two of the TCGA samples previously identified as potential PTs by pathologist review and t-SNE clustering (Figure 6A), suggesting a possible misdiagnosis.

Table 1. Statistical results of training and test cohorts of phyllodes tumour versus breast cancer random forest model.
| Cohort | Sensitivity | Specificity | Precision | Recall | F1 | Balanced accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Training set (n = 181) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Test set – phyllodes (Hench et al) cohort [14] (n = 63) | 1.000 | 0.842 | 0.806 | 1.000 | 0.893 | 0.921 |
| Test set – metaplastic (n = 15) | 0.867 | - | 1.000 | 0.867 | 0.929 | - |
| Test set – normal breast tissue (n = 50) | 1.000 | - | 1.000 | 1.000 | 1.000 | - |
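
As a worked check of the Phyllodes (Hench et al) test-set row in Table 1, the reported metrics can be reproduced from a 2 × 2 confusion matrix. The counts used below (25 true positives, 6 false positives, 32 true negatives, 0 false negatives) are not stated in the text; they are back-calculated from the reported values (n = 63, 6 misclassified, recall = 1.000, precision = 0.806) and should be read as an assumption, as should the choice of which tumour type is treated as the positive class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics derived from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)          # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Counts inferred from the reported metrics, not taken from the source.
metrics = binary_metrics(tp=25, fp=6, tn=32, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inferred counts, all six errors fall in one direction, which is why recall reaches 1.000 while precision and specificity are lower.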

Discussion

Currently, diagnosis and grading of PTs primarily rely on histological examination of the tumour, which determines treatment. However, it can be challenging to accurately classify PTs, particularly between borderline and malignant PTs, and to distinguish them from other tumours that share histological features. Furthermore, the molecular underpinnings of the spectra of PTs are not well characterised. While cancer-related genetic alterations occur in most PTs, the mutational profile between tumours is highly variable [26]. Mutations in TERT (59%) and MED12 (53%) are the most common but do not occur in all malignant cases [26]. Therefore, diagnosis or grading of PT via common mutations would not sufficiently capture all cases. In this study, we sought to interrogate the role of DNA methylation in PTs and its diagnostic potential by performing whole-genome methylation profiling in a novel cohort of PT patients.

Incorporating recently curated publicly available PT methylation data into our study, we showed that PTs have a unique methylome in comparison to other related tumour types, including breast cancer and sarcoma. Cellular deconvolution of the methylation data and PT single-cell expression data showed that both PTs and FAs exhibit the cellular composition expected of a soft tissue tumour. In comparisons of PTs to both normal breast tissue and breast cancer, we showed that genes pertaining to EMT, PRC2 targets, and KRAS signalling pathways were enriched for differential methylation. Rare mutations in the KRAS gene have been found in patients with primary and metastasised phyllodes tumours [27], but our methylation data suggested a larger role for KRAS signalling in PT aetiology than previously thought.

Using the methylation-derived sarcoma classifier developed by Koelsche et al [18], eight PT samples were classified as specific sarcoma subtypes, most commonly DFSP, suggesting a unique biology of these PTs. Further work is required to determine whether a sarcoma-like methylation profile is associated with phyllodes patient survival or opens avenues of alternative treatments. For example, methylation could be used to identify patients that may be at risk of transformation into sarcoma [1] or potential candidates for post-operative radiotherapy [28, 29], or chemotherapy if the patient is otherwise inoperable [30].

We identified 53 novel DMRs between malignant and non-malignant PTs. The top DMR in both discovery and validation cohorts was promoter hypermethylation of the HSD17B8 gene (encoding Hydroxysteroid 17-Beta Dehydrogenase 8). Interestingly, HSD17B8 was identified as the only gene with prognostic ability across an impressive collection of breast cancer cohorts, with decreased expression conferring a poor prognosis [24]. Combined with our methylation data, this may mean that reduced HSD17B8 expression is a marker of poor prognosis across cancers localised to the breast, regardless of the tumour's cell type of origin. HSD17B8 has a known function in steroid metabolism, particularly as an oxidative enzyme associated with oestradiol breakdown [31]. Of the other top malignancy DMR genes, NADK (encoding NAD kinase) has been investigated as a cancer therapeutic target [32], and promoter hypermethylation and downregulation of LXN (encoding latexin) were previously associated with increased tumour volume in haematopoietic cancers [33]. While we have found evidence to suggest DNA methylation is associated with a malignant phenotype in PTs, future studies will need to be undertaken on cohorts with long-term survival data to validate these results. Pareja and colleagues propose that there may be two types of PTs evolving through distinct paths, where PTs with MED12 mutations are likened to a more benign tumour that undergoes progressive malignant transformation, compared to the de novo malignant phenotype driven by mutation of more stereotypical drivers of cancer (NF1, EGFR, TP53) [34]. It will be important to address the issue of whether the methylation variability we observed in HSD17B8 might be associated with genotype.

A major clinical challenge for PT is the potential misdiagnosis of malignant PTs as metaplastic breast cancer. This study showed the potential of a DNA methylation classifier to identify differences between PTs and breast cancer; indeed, we found that two metaplastic breast cancer samples from the TCGA BRCA cohort [19] were likely misdiagnosed PTs. DNA methylation-based molecular profiling has shown great utility in many cancers, including sarcoma [18], brain cancer [35], prostate cancer [36], and breast cancer [37], and could be a useful tool to help improve the diagnosis of metaplastic breast cancer and, in future studies, fibroadenoma. However, the low number of FA samples was a limitation of the current study as we were not able to perform methylation analyses to distinguish PT from FA, a common clinical challenge.

If validated, the methylation signature could be rapidly translated using a targeted technique such as methylation bisulphite PCR sequencing [38] or the highly sensitive droplet digital PCR method, which is emerging as a powerful diagnostic technique in clinical laboratories [39]. The sensitivity of these methods means that they can be applied to measure gene methylation signatures in FFPE biopsy samples and could therefore help distinguish PT from FA, sarcoma, and metaplastic breast cancer in a clinical setting.

Overall, our study demonstrates the utility of DNA methylation as a molecular tool to improve diagnostic accuracy among histopathologically similar tumours and stratify patients by risk, with the potential to improve long-term outcomes for patients. The next step for the clinical translation would be a multicentre study for PT biomarker validation, which would offer advantages including increased sample size for improved statistical power and assessment of a range of cellularity and atypia within the PT spectrum (including FA), as well as access to long-term clinical follow-up data.

Acknowledgements

This work was supported by the National Breast Cancer Foundation (NBCF) Investigator Initiated Research Scheme (NBCF IIRS 22-060, NBCF IIRS 18-137, NBCF IIRS 19-084), Leeanne Hodgson (consumer), Sydney Breast Cancer Foundation, National Health and Medical Research Council (NHMRC) Investigator Grant 2010156 to RP, and the Tour de Cure Collaborative Research Grant and NHMRC Fellowship (1063559) and project grant (1128916) to SJC. The contents of the published material are the sole responsibility of the administering institution and individual authors and do not reflect the views of the NHMRC.

    Author contributions statement

    RP, CS, SOT, KH and SJC coordinated the overall study and wrote the manuscript together with BM. KH and KAA-K prepared DNA for EPIC arrays and ran arrays on samples. BM, RP, SR, ND and AS analysed the data. BC, CSL, RK, SL, SF, CC, CM, MK and CC supplied phyllodes samples and clinicopathological data for this study and contributed to the creation of the cohort. JEJR, CGB, AG and LS provided intellectual input into the study design. All authors read and approved the final manuscript.

    Data availability statement

    The Phyllodes (Australian) cohort dataset generated and analysed during the current study is publicly available at NCBI GEO (www.ncbi.nlm.nih.gov/geo) under accession no. GSE231574 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE231574).