Abbreviations:
BH Benjamini-Hochberg correction for multiple testing
ECM Extracellular matrix
GSVA Gene Set Variation Analysis
DDA Data-Dependent Acquisition
DIA Data-Independent Acquisition
EMT Epithelial-mesenchymal transition
NeDRex Network-based Drug Repurposing and exploration

Keywords: common fibrotic signature, drug repurposing, extracellular matrix, fibrosis, publicly available proteomics data

Total number of words: 3944

Abstract:
Fibrosis is characterised by inappropriate wound healing that can occur in multiple organs. It involves the development of excessive amounts of fibrous connective tissue with increased deposition of extracellular matrix (ECM), which can ultimately lead to organ failure [1]. Despite the high burden of fibrosis, treatment options only delay disease progression [2]. We therefore leveraged publicly available proteomics data to investigate whether common fibrotic proteins and pathways could be found across different organs, in order to define potential core changes related to fibrosis. We identified a core set of 18 significantly upregulated proteins in heart and liver fibrosis, pointing towards increased ECM deposition and fibroblast activation. Among the proteins significantly altered in heart and in liver fibrosis, interacting proteins with a shared biological function were concordantly dysregulated: for example, ECM-related proteins were upregulated and proteins related to mitochondrial activity were downregulated. Finally, through drug repurposing, 26 compounds were proposed for further investigation, 20 of which have demonstrated a promising anti-fibrotic effect. This approach can be generalised to other pathologies, improving knowledge of the affected molecular pathways and, on that basis, identifying potential drug candidates.

Fibrosis is characterised by inappropriate wound healing and can manifest in multiple organs (such as liver, heart, kidney and lung) as a result of various tissue injuries [1].
Despite organ-specific differences, increased inflammation and deposition of ECM proteins are observed in all affected organs, eventually leading to organ failure [1]. The most frequently affected organs are the liver (affecting 1 in 4 people worldwide), kidney (1 in 6), heart (1 in 60) and lung (1 in 1,500) [2]. Despite this high burden, clinical treatment only delays disease progression [2], although recent progress has been made in the field of liver fibrosis [3]. There is therefore a high unmet need to better understand fibrotic disease mechanisms and so to drive evidence-based shortlisting of anti-fibrotic drugs [4]. To facilitate investigation of fibrosis at the molecular level, we established a proteomics resource compiling existing relevant datasets retrieved through a systematic search. Using this resource, we investigated whether a common fibrotic protein signature and shared differentially regulated pathways could be identified across organs (heart, liver). Finally, we performed molecularly driven drug repurposing, with further in silico verification of the biological relevance of the identified drug candidates. This approach can be generalised to other diseases, contributing to a better understanding of disease-associated molecular changes and leveraging that knowledge to propose drug candidates.

Publicly available proteomics datasets from ProteomeXchange and its repositories PRIDE and MassIVE [5] were identified, focussing on chronic diseases associated with fibrosis in heart, liver or kidney. More details on the inclusion criteria and the keywords used are provided in Supplementary Materials.
In total, three datasets on heart fibrosis (heart fibrosis dataset 1 (PXD008934) [6], heart fibrosis dataset 2 (PXD012467) [7] and heart fibrosis dataset 3 (PXD054266) [8]), three datasets on liver fibrosis (liver fibrosis dataset 1 (PXD001474) [9], liver fibrosis dataset 2 (PXD027722) [10] and liver fibrosis dataset 3 (MSV000094959) [11]) and two datasets on kidney fibrosis (PXD006339 [12] and PXD040617 [13]) were selected. Briefly, data were re-analysed using Proteome Discoverer v1.4 for Data-Dependent Acquisition (DDA) data or DIA-NN v1.9.1 for Data-Independent Acquisition (DIA) data. For datasets passing quality control, the protein exports are available in Supplementary Table 2. For the heart and liver datasets, samples and proteins with more than 70% missing values were removed, Wilcoxon rank-sum tests were used to identify differentially expressed proteins, and Benjamini-Hochberg (BH) correction was applied for multiple testing. The two kidney datasets were not considered for further analysis because of quality concerns (mainly blood contamination; details in Supplementary Materials). For the remaining datasets, a protein was considered significant if it had a BH-adjusted p-value < 0.05 in at least one dataset, was nominally significant in at least one other dataset examining the same organ, and showed a consistent direction of log₂ fold change across all datasets for that organ. STRING network analysis [14,15] and Gene Set Variation Analysis (GSVA) [16] using MSigDB [17] were then applied to the differentially expressed proteins for pathway mapping. Finally, drug repurposing analysis was performed using Network-based Drug Repurposing and exploration (NeDRex) [4] and Cytoscape version 3.10.3 [18] to shortlist compounds targeting key proteins and pathways in fibrosis, independent of aetiology.
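The per-dataset filtering and testing steps described above can be illustrated with a minimal sketch. It is not the pipeline used for the re-analysis: the function names (`keep_by_missingness`, `rank_sum_p`, `bh_adjust`) are hypothetical, the rank-sum p-value uses a normal approximation with tie correction (the exact test variant is not specified in the text), and missing values are assumed to be dropped before testing.

```python
import math
from statistics import NormalDist


def keep_by_missingness(rows, max_missing=0.70):
    """Keep entries (protein or sample -> list of values, None = missing)
    whose fraction of missing values does not exceed max_missing."""
    return {
        name: vals
        for name, vals in rows.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing
    }


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    with tie correction (assumption: no continuity correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (w - mean) / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy usage with made-up intensities (None marks a missing value).
toy = {"P1": [1.0, None, None, None], "P2": [1.0, 2.0, None, None]}
kept = keep_by_missingness(toy)  # P1 is 75% missing and is dropped
```

In practice, an established implementation such as `scipy.stats.mannwhitneyu` would normally be preferred over a hand-written test; the pure-Python version is used here only to keep the sketch self-contained.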
In heart fibrosis, 124 proteins were shared among at least two out of three datasets (significant after BH correction in at least one dataset, nominally significant in at least one other dataset, with a concordant direction of log₂ fold change across all heart datasets) (Supplementary Table 3). Similarly, four shared proteins were identified in early-versus-mild liver fibrosis (Supplementary Table 4), 135 in mild-versus-severe liver fibrosis (Supplementary Table 5), and 160 in early-versus-severe liver fibrosis (Supplementary Table 6). The low number of shared proteins in early-versus-mild liver fibrosis has been reported earlier [19]. Moreover, all four proteins in the early-versus-mild comparison were significant after BH correction in liver fibrosis dataset 2 (PXD027722) and nominally significant in liver fibrosis dataset 3 (MSV000094959). Eighteen proteins were common to heart fibrosis and liver fibrosis and were concordantly upregulated in fibrosis versus non-/early fibrosis stages. Most of these are ECM-related (n=14, based on MatrisomeDB annotation [20]). They are reported in Supplementary Table 7, which also provides the results of the statistical analysis of each individual dataset, are listed along with their role in heart and liver fibrosis in Table S2 (Supplementary methods), and are shown graphically in Figure 1.
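The selection rule behind these counts (BH-significant in at least one dataset, nominally significant in at least one other, concordant direction across datasets) and the subsequent heart-liver overlap can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name `consensus_proteins`, the nominal threshold of p < 0.05, the handling of proteins quantified in only some datasets (direction is checked only where the protein is present), and all toy values are assumptions.

```python
def consensus_proteins(datasets, alpha=0.05):
    """datasets: list of dicts mapping protein -> (p, p_adj, log2fc).
    Returns protein -> 'up' or 'down' for proteins meeting the rule."""
    proteins = set().union(*(d.keys() for d in datasets))
    selected = {}
    for prot in proteins:
        # Assumption: only datasets in which the protein was quantified count.
        present = [d[prot] for d in datasets if prot in d]
        fold_changes = [fc for _, _, fc in present]
        if all(fc > 0 for fc in fold_changes):
            direction = "up"
        elif all(fc < 0 for fc in fold_changes):
            direction = "down"
        else:
            continue  # discordant direction across datasets
        bh_hits = [i for i, (_, p_adj, _) in enumerate(present) if p_adj < alpha]
        nominal_hits = [i for i, (p, _, _) in enumerate(present) if p < alpha]
        # Require a BH-significant dataset plus a *different* nominal one.
        if any(j != i for i in bh_hits for j in nominal_hits):
            selected[prot] = direction
    return selected


# Toy, made-up statistics: (nominal p, BH-adjusted p, log2 fold change).
heart = [
    {"P1": (0.001, 0.01, 1.2), "P2": (0.001, 0.01, 1.0), "P3": (0.002, 0.03, -0.9)},
    {"P1": (0.03, 0.20, 0.8), "P2": (0.03, 0.20, -0.5), "P3": (0.04, 0.30, -0.4)},
    {"P1": (0.40, 0.60, 0.3)},
]
liver = [
    {"P1": (0.0005, 0.004, 2.0), "P3": (0.01, 0.04, 1.1)},
    {"P1": (0.02, 0.10, 0.6), "P3": (0.20, 0.50, 0.7)},
]

heart_sig = consensus_proteins(heart)
liver_sig = consensus_proteins(liver)

# Proteins concordantly upregulated in both organs.
shared_up = sorted(
    p for p, d in heart_sig.items() if d == "up" and liver_sig.get(p) == "up"
)
```

Because a BH-adjusted p-value is never smaller than the nominal one, the rule effectively requires nominal significance in at least two datasets, one of which also survives multiple-testing correction.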