Welcome to the IKCEST
A computer-guided design tool to increase the efficiency of cellular conversions

Reconstruction of cell-type-specific core GRNs

We propose a computer-guided design tool for TF over-expression-based cellular conversions to overcome the abiding issue of conversion efficiency. For that, we developed IRENE, a computational framework that models the major determinants of conversion efficiency and prioritizes more efficient sets of IFs (Supplementary Fig. 1). IRENE identifies these IFs by integrating transcriptomic and epigenetic profiles along with publicly available TF binding sites and enhancer-promoter interactions to reconstruct cell-type-specific core GRNs. For each TF, active enhancer and promoter regions are established by combining enhancer-promoter interactions from GeneHancer27 with cell-type-specific H3K27ac peaks and identifying H3K4me3 peaks around transcription start sites (TSS), respectively. IRENE filters these regions by overlaying cell-type-specific DNase-seq peaks to determine regulatory binding events within these regions and reconstructs transcriptional regulators from over 224 million TF ChIP-seq peaks. Finally, IRENE identifies a set of 10 identity TFs by computing the TFs with the highest cell-type-specific expression in comparison to 7600 phenotypes using a modified version of Jensen-Shannon-Divergence (JSD)16. In addition, TFs fulfilling the following three conditions are included as co-factors of these identity TFs. First, each co-factor has to be significantly expressed. Second, it has to be regulated by at least one of the identified identity TFs and, third, it has to regulate at least one identity TF. Of note, IRENE does not impose a maximum number of co-factors. Thus, all TFs fulfilling these criteria are included in the network. Finally, the core GRN is composed of all regulatory interactions between identity TFs and their co-factors.

We employed IRENE to reconstruct core GRNs for 72 human cell types, cell lines, and tissues. Every network has up to 51 TFs (on average 18.5 TFs), while every TF in the network has up to 46 regulators (on average 15.0) and 44 active enhancers (on average 6.0). The number of enhancers per gene follows an exponential distribution where the majority of genes have one or two active enhancers, which is consistent with enhancer-promoter interactions obtained from promoter capture Hi-C experiments28 (Fig. 1a). Moreover, unlike co-factors, core TFs are always differentially expressed between the initial and final cell types according to commonly used criteria (fold change > 2). Nevertheless, although co-factors are not necessarily differentially expressed, they are equally likely to be contained in the predicted IF combinations, since their over-expression could be beneficial to overcome the transcriptional and epigenetic barriers (Supplementary Table 1).

Fig. 1: Benchmarking of IRENE.
figure1

a The number of enhancers per gene (blue) across all networks follows an exponential distribution (orange). b Benchmark of reconstructed networks against cell-type-specific TF ChIP-Seq data for 8 cell types/cell lines. True positives (TP, blue) represent the interactions that are present in the reconstructed GRNs and are experimentally validated by cell-type-specific TF ChIP-Seq data. Interactions validated by TF ChIP-Seq data only profiled in cell-types other than the one under consideration are considered false positives (FP, orange). n = 13 (Adipocyte), 69 (ESC), 236 (GM12878), 238 (HeLaS3), 292 (HepG2), 53 (K562), 37 (Keratinocyte) and 178 (MCF7) interactions. ce Most-highly enriched significant gene ontology terms for the reconstructed networks of c adipocytes, d natural killer cells and e mammary epithelial cells. For adipocytes and mammary epithelial cells, the top 5 GO terms are represented. For natural killer cells, only two terms were significantly enriched. Cells corresponding to TFs that are and are not relevant for a particular GO term are colored in red and gray, respectively. f Reconstructed melanocyte subnetwork including all experimentally validated (red border) and predicted IFs (gold) for the conversion of fibroblasts towards melanocytes. Enhancer and promoter regulation (green) is distinguished from enhancer-only regulation (blue). Predicted interactions from position weight matrices using Homer are depicted as black dashed lines. g Recovery of experimentally validated IFs in seven target cell types using IRENE (green), Mogrify (orange), and the method from d’Alessio et al. (blue). The fraction of recovered IFs in multiple combinations of cellular conversions is depicted as box plots. The median is represented by a solid line within the boxes. The lower and upper bounds of boxes are the first and third quartile, respectively. Whiskers extend to 1.5-times the interquartile range or the minimum/maximum value. Dots correspond to outliers. n = 3 (Adipocytes), 4 (Hepatocytes), 10 (iPSC), 3 (Melanocytes), 1 (Myoblasts), 4 (Neuron), and 4 (NSC) combinations of cellular conversion factors. h Enrichment of predicted instructive factors in experimentally validated IF combinations. Predicted IFs are highlighted in green whereas TFs that were replaced by another validated and more efficient IF are highlighted in blue. TFs not predicted by IRENE are colored in black.

Validation of reconstructed GRNs

Before employing IRENE’s reconstructed networks to generate predictions of IFs for efficient cellular conversions, we interrogated their accuracy and cell-type-specificity. For that, we first examined whether the set of selected identity TFs and co-factors is implicated in the functionality of the cell or tissue type. Significantly enriched gene ontology (GO) terms of the network TFs were identified using WebGestalt29 and showed a highly specific enrichment for most cell or tissue types (Supplementary Data 1). For instance, subcutaneous adipocytes were enriched in positive regulation of fat cell differentiation, natural killer cells were enriched in defense response while mammary epithelial cells were enriched in the establishment of the skin barrier (Fig. 1c–e).

Next, we validated the reconstructed interactions among network TFs. In the presence of incomplete ground truth data, we first assessed the number of interactions within promoter regions that are compatible with cell-type-specific TF ChIP-seq data from ChIP-Atlas30 (Supplementary Note 1). Requiring a representative evaluation of at least 10 network TFs resulted in eight examples of different cell types and cell lines. We evaluated a total of 1044 TF ChIP-seq experiments and validated on average 80.98% of interactions whereas 8.84% of interactions were “false positives”, i.e., regulatory binding events only occurring in a cell type other than the target (Fig. 1b). Afterwards, we collected four experimentally validated, manually curated gold-standard networks of embryonic stem cells (ESCs)31, hepatocytes32, HepG2, and MCF7 cells33 to compare them against reconstructed networks from IRENE. Of note, only TFs common to the reconstructed and gold-standard networks were considered in this assessment. In particular, 79% of TFs in the gold-standard networks are, on average, present in the reconstructed networks by IRENE (range: 50–100%) (Supplementary Table 2, Supplementary Data 2). Moreover, we observed that the networks for ESCs, HepG2, and MCF7 cells were in perfect agreement whereas a single interaction was missing in the reconstructed hepatocyte network (Table 1). Moreover, IRENE inferred four new interactions of HNF1A and FOXA2 in the hepatocyte network that have been validated in TF knockdown studies of hepatoma cells34. Thus, 95% of interactions in the gold-standard networks were correctly reconstructed, which highlights IRENE’s accuracy. In addition, we set out to validate the choice of databases underlying IRENE and performed the same assessment using enhancer-gene associations from EnhancerAtlas35 and transcriptional regulatory interactions from GTRD36. Indeed, using the data from EnhancerAtlas and GTRD, we could only validate 52% of interactions in the gold-standard networks, which supports the choice of databases underlying IRENE (Supplementary Table 2, Supplementary Data 2).

Table 1 Benchmarking reconstructed core GRNs against experimentally validated core networks.

Prediction of IFs for inducing cellular conversions

Considering the stochastic nature of cellular conversions, we set out to convert reconstructed GRNs by IRENE into Deterministic Time Markov Chain models (DTMCs) that we can exploit for interrogating the dynamics of the system. For that, Boolean expressions were defined that connect the regulators of a TF and represent their competitive or cooperative action. IRENE characterizes two regulatory events as cooperative if their corresponding ChIP-seq peaks significantly overlap and an experimentally validated protein-protein interaction was reported in iRefIndex37 (see “Methods”). Otherwise, regulatory events are deemed competitive. Using these models, we developed a strategy for identifying combinations of TFs that induce cellular conversions with increased efficiency. In brief, IRENE identifies combinations of TFs whose over-expression at the initial cell type maximizes the agreement at the transcriptional and epigenetic level between the converted and target cells. To achieve this, IRENE assesses the probability that a perturbation activates the complete network of the target cell type and considers the amount of epigenetic restructuring needed to transform the enhancer/promoter landscape of the initial to the target cell type (see “Methods”).

To begin with, we assessed whether IRENE’s strategy to prioritize combinations of TFs is able to recapitulate known IFs. Starting from a collection of 29 human cell conversion experiments for which epigenetic and transcriptomic profiles were available, we first assessed the number of recovered IFs (Fig. 1h). Next, we compared our predictions against two former state-of-the-art approaches, Mogrify20 and d’Alessio et al.16. Indeed, IRENE substantially outperforms Mogrify and d’Alessio et.al, exhibiting median accuracy of 83.3% compared to 50% and 33.3%, respectively (Fig. 1g). Moreover, we observed a remarkable enrichment of predicted TFs for iPSCs, showing on average 95% recovery of known IFs compared to 72.5 and 45% with Mogrify and d’Alessio et al. (Fig. 1g).

Despite the overall increased performance, IRENE’s predictions of melanocytes were vastly inconsistent (17%), which prompted us to investigate this case more closely. Only three of the known IFs are included in the reconstructed melanocyte GRN, namely MITF, SOX10, and PAX3 (Fig. 1f). However, binding site predictions of known motifs from Homer38 in the promoter regions of known IFs confirmed many network TFs as upstream regulators. Importantly, one of the predicted TFs, TFAP2A, displays predicted binding sites within the promoter region of multiple IFs (Fig. 1f). In the presence of a recent study showing that TFAP2A is likely a pioneer factor capable of establishing competence for transcription, it is highly probable that TFAP2A could more efficiently induce melanocyte conversion39.

IRENE prioritizes more efficient combinations of IFs

Given that IRENE resembled a majority of known IFs and at the same time predicted other combinations, we investigated whether IRENE prioritizes combinations yielding higher cellular conversion efficiency. For that, we collected examples of IF combinations inducing the same transition with different efficiency. In order to assess the real contribution of the IFs on conversion efficiency, we required the combinations to be reported in the same study using the same experimental design as well as all IFs to be present in the reconstructed GRNs. As a result, only iPSC conversion fulfilled both of these criteria. In particular, we identified eight pairs of IFs fulfilling our inclusion criteria in which the efficiency was assessed and focused on these transitions.

First, IRENE was employed to reconstruct an iPSC network, which we assessed in terms of its constituent TFs (Fig. 2a). Apparently, except for LIN28A, all known inducers of iPS cells, i.e., NANOG, MYC, POU5F1, SOX2, KLF4, PRDM14, and MYCN, are contained in the network. In addition, the network contains FOXH1, ZNF423, and MTA3, which play diverse roles in the conversion to pluripotent stem cells. For example, FOXH1 significantly enhances iPSC conversion efficiency40 and ZNF423 is implicated in the maintenance of pluripotency and self-renewal8. In contrast, the functional role of MTA3 in the induction and maintenance of PSCs remains to be investigated. However, TP53 is a constituent of the reconstructed GRN, as well. Even though it is known to diminish iPS cell conversion efficiency41,42, TP53 plays an important role in the maintenance of ESCs41,43. Due to the dual role of TP53, we examined whether the diminished efficiency of iPSC conversion is reflected in the dynamics of the network. Indeed, combinations including TP53 yield significantly lower scores compared to combinations not containing it (Supplementary Fig. 2, one-sided Wilcoxon–Mann–Whitney test, p-value <  1.8e−5). Moreover, the network dynamics underpin the essential role of POU5F1 in the induction of pluripotency, showing that perturbations of fibroblasts without POU5F1 are not capable of activating the complete network (Fig. 2c). In addition, IRENE prioritizes PRDM14 over KLF4, which is consistent with previous reports showing that PRDM14 increases the efficiency of iPS cell conversions44 (Fig. 1h).

Fig. 2: Computational assessment of IRENE’s ability to prioritize IFs.
figure2

a Connected component of the reconstructed GRN of induced pluripotent stem cells. Enhancer and promoter regulation (green) is distinguished from enhancer-only regulation (blue). The size of the nodes (gold) is proportional to the number of regulated TFs. b Predicted conversion efficiency for inducing PSCs from hematopoietic stem cells (HSC, red), normal human dermal fibroblasts (NHDF, green), embryonic fibroblasts (blue), neural stem cells (NSC, orange), keratinocytes (purple), and melanocytes (gold). For each conversion, two experimentally validated combinations of IFs were compared. The predicted score of the combinations with the lower experimental efficiency is divided by the predicted score of the combination with the higher experimental efficiency and colored depending on the initial cell type. Each small square in a grid corresponds to 1%. (O = POU5F1, S = SOX2, K = KLF4, M = MYC) c Model simulations of 1000 random perturbations of NHDF cells that do not contain POU5F1 for 200 simulation steps. The color code represents the amount of dissimilarity with yellow representing maximum dissimilarity and red depicting perfect agreement. The similarity is measured as the number of expressed TFs in the iPSC network during the simulation. The iPSC network corresponds to dissimilarity of 0 (red) and cannot be induced without POU5F1 being expressed. d Comparison of EpCAM-positive cells (left) and predicted scores by IRENE (right) when iPSC differentiation towards mammary epithelial cells is induced with 5- (orange, red) or 6 (blue, green) TF combinations. The median is represented by a solid line within the boxes. The lower and upper bounds of boxes are the first and third quartile, respectively. Whiskers extend to 1.5-times interquartile range or the minimum/maximum value. n = 4 (6 TFs) and 5 (5 TFs) independent experiments. e.g., Comparison of scores predicted by IRENE with the percentage of successfully differentiated e mammary epithelial cells (blue) f melanocytes (red) and g NK cells (green).

Supported by the assessment of the iPSC network, we went on to compare the collected dyads of IFs starting from six different initial cell types, i.e., NSCs45, HSCs46, melanocytes47, keratinocytes48,49, newborn and adult fibroblasts50, and ranked them based on IRENE’s score. Strikingly, IRENE resembled each dyad of combinations correctly and assigned higher scores to combinations with higher efficiency (Fig. 2b).

Finally, since the number of predicted TFs per combination is a user-defined parameter of IRENE, we set out to interrogate the redundancy of predicted TFs in combinations of various sizes. In this regard, we focused on the differentiation of iPSCs into NK-cells, scored all combinations of network TFs of size four, five, and six, respectively, and ranked them based on the predicted scores (Supplementary Fig. 3a). As a result, we observed that the median rank of certain TFs, such as JUN and ELK4, is low, which implies that they predominantly occur in high-ranking combinations of all sizes, whereas others, such as ZNF107 and SP140, mostly occur in low-ranking combinations (Supplementary Fig. 3a). Intrigued by this finding, we explored whether the same trend can be observed for high-ranking combinations as a whole, i.e. whether high-ranking combinations of size k are subsumed in high-ranking combinations of size k + 1. However, in contrast to single TFs, the addition of a single factor to high scoring combinations does not always lead to new high scoring combinations, which underscores the highly non-linear dynamics imposed by the cooperative and competitive regulation of TFs (Supplementary Fig. 3b).

Experimental validation of increased conversion efficiency

To demonstrate IRENE’s ability to predict combinations of IFs, we set out to increase differentiation efficiency by first creating stable iPS lines for all experiments via genomic integration to ensure high, stable expression of IFs using the human TFome (Fig. 3a). We selected the three most commonly used types of protocols: (1) a protocol for differentiating a cell type in the origin media type to demonstrate that the TFs on their own are sufficient for differentiation of a cell type, (2) a differentiation protocol using destination media only to demonstrate that IFs are also effective at differentiating in destination type conditions, and (3) a previously published growth-factor based protocol to show that we can improve differentiation with our identified IFs. We selected three target cell types having an immediate application in therapeutic strategies where conversion efficiency constitutes a major impediment.

Fig. 3: Experimental validation of improved efficiency of cell type conversion.

a Stable iPS lines for all differentiation experiments were created prior to differentiation via genomic integration to ensure high, stable expression of IFs using the human TFome. b Protocol for differentiating human mammary epithelial cells (HMEC) from human iPS cells (hiPSCs). c Protocol for differentiating melanocytes from hiPSCs. d Protocol for differentiating NK cells from hiPSCs. (GF: growth factor, DOX: doxycycline). eg Differentiation efficiency of (e). HMECs, f melanocytes and g NK-cells from hiPSCs for various combinations of IFs generated from IRENE. Efficiency is defined as the number of marker positive/double-positive cells divided by the number of plated cells. n = 1 experiment from three pooled biologically independent samples.

For the first, we chose human mammary epithelial cells (HMECs) (Fig. 3b), whose potential in the repopulation of surgically resected mammary tissue has been explored for decades51. To date, this requires dissociation of mammary epithelial cells from one tissue environment and subsequent transplantation into another tissue. An efficient in vitro differentiation protocol of mammary epithelial cells would thus overcome this invasive procedure and provides a graft source that can be generated from virtually any patient cells.

For the second, we chose melanocytes (Fig. 3c), which provide a source of cellular grafts to replace damaged cells in the context of vitiligo, an autoimmune disease characterized by the destruction of melanocytes by immune cells, which results in white, unpigmented areas of the skin. To increase accessibility in the clinics and decrease costs, current approaches rely on the use of non-cultured melanocyte grafts, although transplantation of appropriately cultured melanocytes is more efficacious in the re-pigmentation of the skin52. Thus, our melanocyte differentiation protocol could serve as a way to increase the accessibility of cultured melanocyte grafts for treating vitiligo in order to achieve more favorable therapeutic outcomes.

For the final, we chose NK-cells (Fig. 3d), whose transplantation from allogeneic donors has been found to have a beneficial effect in the treatment of leukemia after chemotherapy53. Although this strategy has been proven useful in achieving a complete remission of the disease in some patients, the transplanted cells were frequently rejected53. In this regard, an efficient NK-cell differentiation protocol can substantially benefit the treatment of leukemia by using patient-derived iPSCs, which are expected to be well tolerated.

First, we thought it was important to demonstrate that selected IFs were causing differentiation directly in starting cell type media. To test this, we calculated combinations of TFs for differentiating a cell type without previously documented conversion protocols (mammary epithelial cells) and over-expressed the TFs in iPSCs cultured in stem cell media (mTeSR) (Fig. 3e, Supplementary Fig. 4a). As a result, we observed a high consistency between the experimental and computational ranking of EPCAM and ERBB2 double-positive cells (Fig. 2e). Each of the tested combinations resulted in at least 78.2% EPCAM-positive cells after 8 days, but not necessarily a mammary subtype. In addition, more over-expressed TFs lead to a significant increase in converted epithelial cells (Fig. 2d; Wilcoxon test p-value: 0.03). One combination, however, ([GRHL3, NFYC, VDR, KLF5, MAX]), appeared to shift the population double-positive for a large percentage of cells (~99%), compared to the number of seeded cells. To corroborate the induction of these cells, we performed RNA-seq experiments of the initial iPSC and converted cell populations. Comparison between the individual samples with iPSCs confirms the elevated expression of a larger set of mammary epithelial marker genes (Supplementary Fig. 5). In addition, a comparison of network TF expression of the converted cells and iPSCs shows that the over-expression of a small number of TFs was sufficient to induce these TFs in almost all combinations, which supports the network architecture reconstructed by IRENE (Supplementary Fig. 6). Despite the induction of marker genes and network TFs, we set out to assess the transcriptional similarity to mammary epithelial cells by deconvoluting the RNA-seq samples of iPSCs, converted cells, and a gold-standard mammary epithelial cell line (Supplementary Fig. 7). For that, we employed CybersortX54, a computational method for detecting the proportion of cell types present in an RNA-seq sample within a single-cell RNA-seq reference dataset. Based on a reference dataset assembled from human breast tumor tissue55 and iPSCs56, we found up to 14% of the converted cells to possess a mammary epithelial cell type whereas the remaining cells are largely possessing an iPSC phenotype. (Supplementary Fig. 7a). Intriguingly, we employed a HMEC line as a positive control and found only 23% of these cells to possess an epithelial transcriptional phenotype, suggesting a closer resemblance of the converted cells to the positive control than expected from the predicted fraction of epithelial cells. However, we speculate that longer differentiation or differentiation in a mammary-epithelial cell-specific media could result in a more holistic differentiation of the population and, thus, a more pronounced increase in the expression of marker genes and network TFs.

Next, we wanted to determine if IFs selected by IRENE could improve differentiation efficiency when placing cells of the starting type into media of the destination cell type as opposed to the starting cell type (Fig. 3f, Supplementary Fig. 4c). For this experiment, we differentiated iPSCs to melanocytes in melanocyte media with and without TF over-expression. We found that while destination media was sufficient to partially differentiate iPSCs to melanocytes, two of four TF combinations were able to considerably increase the efficiency of differentiation by more than 900% of Mel.2-CD26 double-positive cells (medium alone: 0.49%; TFs: 4.7%) (Fig. 2f). Notably, the lowest ranking combination ([RXRG, PAX3, SOX10, E2F7]) resulted in the second-highest efficiency, only superseded by the combination [RXRG, ETS1, TFAP2A, HOXC9, E2F7, MSC] (Fig. 2f). We suspect that this effect is due to the composition of the growth medium and that it can activate RXRG with retinoic acid, if it is expressed. Indeed, retinoid acid, through RXR activation, is a well-known inducer of melanogenesis57. Similar to the case of mammary epithelial cells, RNA-seq confirms the expression of melanocyte marker genes and network TFs, especially for combinations increasing the efficiency (Supplementary Figs. 89). Moreover, deconvolution of the converted cell RNA-seq samples, using a single-cell reference dataset composed of iPSCs56 as well as neonatal and adult skin samples enriched for melanocytes58, shows up to 93% of successfully converted cells that do not possess an iPSC phenotype anymore (Supplementary Fig. 7b).

Finally, we sought to determine if IRENE could produce combinations of IFs that could increase the conversion efficiency of established differentiation protocols. To test this, we performed NK-cell differentiation using an established differentiation protocol59 and measured if the related cellular markers were more prominently differentiated in iPSC lines with over-expressed TFs than a control iPS cell line (Fig. 3g, Supplementary Fig. 4b). Again, we found a high consistency between the experimental and computational ranking of CD56 + NKp46+ double-positive cells (Fig. 2g). In particular, five of eight iPSC lines with combinations of IFs over-expressed after spin-EB differentiation ([JUN, ETS1, FLI1, IRF4, ELK4, ZNF107], [JUN, ETS1, FLI1, IRF4, IRF8, ELK4], [JUN, ETS1, FLI1, IRF4, IRF8], [JUN, ETS1, FLI1, IRF4, MBD4, ELK4] and [JUN, ETS1, FLI1, IRF4, RFX5]) increased the number of CD56 + NKp46 NK-cells by up to 250% compared to the line without IFs, yielding an efficiency of 2.6% with respect to double-positive cells (Fig. 3g). Furthermore, these cell lines expressed a greater percentage of other mature NK-cell markers (Fig. 3g), indicating that not only were more NK-cells produced, but that the cells that were produced were more mature than the iPSC control line. This finding is corroborated by corresponding RNA-sequencing analysis (Supplementary Figs. 1011). Except for one combination ([JUN, ELK4, ETS1, FLI1, IRF4]), all combinations induce the expression of NK-cell marker genes and network TFs. This is consistent with the fact that this combination only results in an efficiency of 0.28%, which is lower than the bona fide NK differentiation protocol alone (Fig. 2g). Moreover, deconvolution of converted cell RNA-seq samples using a single-cell reference dataset composed of peripheral blood mononuclear cells60 and iPSCs56 further underscores the possession of an NK-cell phenotype for most combinations (Supplementary Fig. 7c). In particular, except for one cell line converted with the IF combination [JUN, ETS1, FLI1, IRF4, MBD4], between 16 and 30% of converted cells in each sample are predicted to be NK cells.

Original Text (This is the original text for your reference.)

Reconstruction of cell-type-specific core GRNs

We propose a computer-guided design tool for TF over-expression-based cellular conversions to overcome the abiding issue of conversion efficiency. For that, we developed IRENE, a computational framework that models the major determinants of conversion efficiency and prioritizes more efficient sets of IFs (Supplementary Fig. 1). IRENE identifies these IFs by integrating transcriptomic and epigenetic profiles along with publicly available TF binding sites and enhancer-promoter interactions to reconstruct cell-type-specific core GRNs. For each TF, active enhancer and promoter regions are established by combining enhancer-promoter interactions from GeneHancer27 with cell-type-specific H3K27ac peaks and identifying H3K4me3 peaks around transcription start sites (TSS), respectively. IRENE filters these regions by overlaying cell-type-specific DNase-seq peaks to determine regulatory binding events within these regions and reconstructs transcriptional regulators from over 224 million TF ChIP-seq peaks. Finally, IRENE identifies a set of 10 identity TFs by computing the TFs with the highest cell-type-specific expression in comparison to 7600 phenotypes using a modified version of Jensen-Shannon-Divergence (JSD)16. In addition, TFs fulfilling the following three conditions are included as co-factors of these identity TFs. First, each co-factor has to be significantly expressed. Second, it has to be regulated by at least one of the identified identity TFs and, third, it has to regulate at least one identity TF. Of note, IRENE does not impose a maximum number of co-factors. Thus, all TFs fulfilling these criteria are included in the network. Finally, the core GRN is composed of all regulatory interactions between identity TFs and their co-factors.

We employed IRENE to reconstruct core GRNs for 72 human cell types, cell lines, and tissues. Every network has up to 51 TFs (on average 18.5 TFs), while every TF in the network has up to 46 regulators (on average 15.0) and 44 active enhancers (on average 6.0). The number of enhancers per gene follows an exponential distribution where the majority of genes have one or two active enhancers, which is consistent with enhancer-promoter interactions obtained from promoter capture Hi-C experiments28 (Fig. 1a). Moreover, unlike co-factors, core TFs are always differentially expressed between the initial and final cell types according to commonly used criteria (fold change > 2). Nevertheless, although co-factors are not necessarily differentially expressed, they are equally likely to be contained in the predicted IF combinations, since their over-expression could be beneficial to overcome the transcriptional and epigenetic barriers (Supplementary Table 1).

Fig. 1: Benchmarking of IRENE.
figure1

a The number of enhancers per gene (blue) across all networks follows an exponential distribution (orange). b Benchmark of reconstructed networks against cell-type-specific TF ChIP-Seq data for 8 cell types/cell lines. True positives (TP, blue) represent the interactions that are present in the reconstructed GRNs and are experimentally validated by cell-type-specific TF ChIP-Seq data. Interactions validated by TF ChIP-Seq data only profiled in cell-types other than the one under consideration are considered false positives (FP, orange). n = 13 (Adipocyte), 69 (ESC), 236 (GM12878), 238 (HeLaS3), 292 (HepG2), 53 (K562), 37 (Keratinocyte) and 178 (MCF7) interactions. ce Most-highly enriched significant gene ontology terms for the reconstructed networks of c adipocytes, d natural killer cells and e mammary epithelial cells. For adipocytes and mammary epithelial cells, the top 5 GO terms are represented. For natural killer cells, only two terms were significantly enriched. Cells corresponding to TFs that are and are not relevant for a particular GO term are colored in red and gray, respectively. f Reconstructed melanocyte subnetwork including all experimentally validated (red border) and predicted IFs (gold) for the conversion of fibroblasts towards melanocytes. Enhancer and promoter regulation (green) is distinguished from enhancer-only regulation (blue). Predicted interactions from position weight matrices using Homer are depicted as black dashed lines. g Recovery of experimentally validated IFs in seven target cell types using IRENE (green), Mogrify (orange), and the method from d’Alessio et al. (blue). The fraction of recovered IFs in multiple combinations of cellular conversions is depicted as box plots. The median is represented by a solid line within the boxes. The lower and upper bounds of boxes are the first and third quartile, respectively. Whiskers extend to 1.5-times the interquartile range or the minimum/maximum value. Dots correspond to outliers. n = 3 (Adipocytes), 4 (Hepatocytes), 10 (iPSC), 3 (Melanocytes), 1 (Myoblasts), 4 (Neuron), and 4 (NSC) combinations of cellular conversion factors. h Enrichment of predicted instructive factors in experimentally validated IF combinations. Predicted IFs are highlighted in green whereas TFs that were replaced by another validated and more efficient IF are highlighted in blue. TFs not predicted by IRENE are colored in black.

Validation of reconstructed GRNs

Before employing IRENE’s reconstructed networks to generate predictions of IFs for efficient cellular conversions, we interrogated their accuracy and cell-type-specificity. For that, we first examined whether the set of selected identity TFs and co-factors is implicated in the functionality of the cell or tissue type. Significantly enriched gene ontology (GO) terms of the network TFs were identified using WebGestalt29 and showed a highly specific enrichment for most cell or tissue types (Supplementary Data 1). For instance, subcutaneous adipocytes were enriched in positive regulation of fat cell differentiation, natural killer cells were enriched in defense response while mammary epithelial cells were enriched in the establishment of the skin barrier (Fig. 1c–e).

Next, we validated the reconstructed interactions among network TFs. In the presence of incomplete ground truth data, we first assessed the number of interactions within promoter regions that are compatible with cell-type-specific TF ChIP-seq data from ChIP-Atlas30 (Supplementary Note 1). Requiring a representative evaluation of at least 10 network TFs resulted in eight examples of different cell types and cell lines. We evaluated a total of 1044 TF ChIP-seq experiments and validated on average 80.98% of interactions whereas 8.84% of interactions were “false positives”, i.e., regulatory binding events only occurring in a cell type other than the target (Fig. 1b). Afterwards, we collected four experimentally validated, manually curated gold-standard networks of embryonic stem cells (ESCs)31, hepatocytes32, HepG2, and MCF7 cells33 to compare them against reconstructed networks from IRENE. Of note, only TFs common to the reconstructed and gold-standard networks were considered in this assessment. In particular, 79% of TFs in the gold-standard networks are, on average, present in the reconstructed networks by IRENE (range: 50–100%) (Supplementary Table 2, Supplementary Data 2). Moreover, we observed that the networks for ESCs, HepG2, and MCF7 cells were in perfect agreement whereas a single interaction was missing in the reconstructed hepatocyte network (Table 1). Moreover, IRENE inferred four new interactions of HNF1A and FOXA2 in the hepatocyte network that have been validated in TF knockdown studies of hepatoma cells34. Thus, 95% of interactions in the gold-standard networks were correctly reconstructed, which highlights IRENE’s accuracy. In addition, we set out to validate the choice of databases underlying IRENE and performed the same assessment using enhancer-gene associations from EnhancerAtlas35 and transcriptional regulatory interactions from GTRD36. Indeed, using the data from EnhancerAtlas and GTRD, we could only validate 52% of interactions in the gold-standard networks, which supports the choice of databases underlying IRENE (Supplementary Table 2, Supplementary Data 2).

Table 1 Benchmarking reconstructed core GRNs against experimentally validated core networks.

Prediction of IFs for inducing cellular conversions

Considering the stochastic nature of cellular conversions, we set out to convert reconstructed GRNs by IRENE into Deterministic Time Markov Chain models (DTMCs) that we can exploit for interrogating the dynamics of the system. For that, Boolean expressions were defined that connect the regulators of a TF and represent their competitive or cooperative action. IRENE characterizes two regulatory events as cooperative if their corresponding ChIP-seq peaks significantly overlap and an experimentally validated protein-protein interaction was reported in iRefIndex37 (see “Methods”). Otherwise, regulatory events are deemed competitive. Using these models, we developed a strategy for identifying combinations of TFs that induce cellular conversions with increased efficiency. In brief, IRENE identifies combinations of TFs whose over-expression at the initial cell type maximizes the agreement at the transcriptional and epigenetic level between the converted and target cells. To achieve this, IRENE assesses the probability that a perturbation activates the complete network of the target cell type and considers the amount of epigenetic restructuring needed to transform the enhancer/promoter landscape of the initial to the target cell type (see “Methods”).

To begin with, we assessed whether IRENE’s strategy to prioritize combinations of TFs is able to recapitulate known IFs. Starting from a collection of 29 human cell conversion experiments for which epigenetic and transcriptomic profiles were available, we first assessed the number of recovered IFs (Fig. 1h). Next, we compared our predictions against two former state-of-the-art approaches, Mogrify20 and d’Alessio et al.16. Indeed, IRENE substantially outperforms Mogrify and d’Alessio et.al, exhibiting median accuracy of 83.3% compared to 50% and 33.3%, respectively (Fig. 1g). Moreover, we observed a remarkable enrichment of predicted TFs for iPSCs, showing on average 95% recovery of known IFs compared to 72.5 and 45% with Mogrify and d’Alessio et al. (Fig. 1g).

Despite the overall increased performance, IRENE’s predictions of melanocytes were vastly inconsistent (17%), which prompted us to investigate this case more closely. Only three of the known IFs are included in the reconstructed melanocyte GRN, namely MITF, SOX10, and PAX3 (Fig. 1f). However, binding site predictions of known motifs from Homer38 in the promoter regions of known IFs confirmed many network TFs as upstream regulators. Importantly, one of the predicted TFs, TFAP2A, displays predicted binding sites within the promoter region of multiple IFs (Fig. 1f). In the presence of a recent study showing that TFAP2A is likely a pioneer factor capable of establishing competence for transcription, it is highly probable that TFAP2A could more efficiently induce melanocyte conversion39.

IRENE prioritizes more efficient combinations of IFs

Given that IRENE resembled a majority of known IFs and at the same time predicted other combinations, we investigated whether IRENE prioritizes combinations yielding higher cellular conversion efficiency. For that, we collected examples of IF combinations inducing the same transition with different efficiency. In order to assess the real contribution of the IFs on conversion efficiency, we required the combinations to be reported in the same study using the same experimental design as well as all IFs to be present in the reconstructed GRNs. As a result, only iPSC conversion fulfilled both of these criteria. In particular, we identified eight pairs of IFs fulfilling our inclusion criteria in which the efficiency was assessed and focused on these transitions.

First, IRENE was employed to reconstruct an iPSC network, which we assessed in terms of its constituent TFs (Fig. 2a). Apparently, except for LIN28A, all known inducers of iPS cells, i.e., NANOG, MYC, POU5F1, SOX2, KLF4, PRDM14, and MYCN, are contained in the network. In addition, the network contains FOXH1, ZNF423, and MTA3, which play diverse roles in the conversion to pluripotent stem cells. For example, FOXH1 significantly enhances iPSC conversion efficiency40 and ZNF423 is implicated in the maintenance of pluripotency and self-renewal8. In contrast, the functional role of MTA3 in the induction and maintenance of PSCs remains to be investigated. However, TP53 is a constituent of the reconstructed GRN, as well. Even though it is known to diminish iPS cell conversion efficiency41,42, TP53 plays an important role in the maintenance of ESCs41,43. Due to the dual role of TP53, we examined whether the diminished efficiency of iPSC conversion is reflected in the dynamics of the network. Indeed, combinations including TP53 yield significantly lower scores compared to combinations not containing it (Supplementary Fig. 2, one-sided Wilcoxon–Mann–Whitney test, p-value <  1.8e−5). Moreover, the network dynamics underpin the essential role of POU5F1 in the induction of pluripotency, showing that perturbations of fibroblasts without POU5F1 are not capable of activating the complete network (Fig. 2c). In addition, IRENE prioritizes PRDM14 over KLF4, which is consistent with previous reports showing that PRDM14 increases the efficiency of iPS cell conversions44 (Fig. 1h).

Fig. 2: Computational assessment of IRENE’s ability to prioritize IFs.
figure2

a Connected component of the reconstructed GRN of induced pluripotent stem cells. Enhancer and promoter regulation (green) is distinguished from enhancer-only regulation (blue). The size of the nodes (gold) is proportional to the number of regulated TFs. b Predicted conversion efficiency for inducing PSCs from hematopoietic stem cells (HSC, red), normal human dermal fibroblasts (NHDF, green), embryonic fibroblasts (blue), neural stem cells (NSC, orange), keratinocytes (purple), and melanocytes (gold). For each conversion, two experimentally validated combinations of IFs were compared. The predicted score of the combinations with the lower experimental efficiency is divided by the predicted score of the combination with the higher experimental efficiency and colored depending on the initial cell type. Each small square in a grid corresponds to 1%. (O = POU5F1, S = SOX2, K = KLF4, M = MYC) c Model simulations of 1000 random perturbations of NHDF cells that do not contain POU5F1 for 200 simulation steps. The color code represents the amount of dissimilarity with yellow representing maximum dissimilarity and red depicting perfect agreement. The similarity is measured as the number of expressed TFs in the iPSC network during the simulation. The iPSC network corresponds to dissimilarity of 0 (red) and cannot be induced without POU5F1 being expressed. d Comparison of EpCAM-positive cells (left) and predicted scores by IRENE (right) when iPSC differentiation towards mammary epithelial cells is induced with 5- (orange, red) or 6 (blue, green) TF combinations. The median is represented by a solid line within the boxes. The lower and upper bounds of boxes are the first and third quartile, respectively. Whiskers extend to 1.5-times interquartile range or the minimum/maximum value. n = 4 (6 TFs) and 5 (5 TFs) independent experiments. e.g., Comparison of scores predicted by IRENE with the percentage of successfully differentiated e mammary epithelial cells (blue) f melanocytes (red) and g NK cells (green).

Supported by the assessment of the iPSC network, we went on to compare the collected dyads of IFs starting from six different initial cell types, i.e., NSCs45, HSCs46, melanocytes47, keratinocytes48,49, newborn and adult fibroblasts50, and ranked them based on IRENE’s score. Strikingly, IRENE resembled each dyad of combinations correctly and assigned higher scores to combinations with higher efficiency (Fig. 2b).

Finally, since the number of predicted TFs per combination is a user-defined parameter of IRENE, we set out to interrogate the redundancy of predicted TFs in combinations of various sizes. In this regard, we focused on the differentiation of iPSCs into NK-cells, scored all combinations of network TFs of size four, five, and six, respectively, and ranked them based on the predicted scores (Supplementary Fig. 3a). As a result, we observed that the median rank of certain TFs, such as JUN and ELK4, is low, which implies that they predominantly occur in high-ranking combinations of all sizes, whereas others, such as ZNF107 and SP140, mostly occur in low-ranking combinations (Supplementary Fig. 3a). Intrigued by this finding, we explored whether the same trend can be observed for high-ranking combinations as a whole, i.e. whether high-ranking combinations of size k are subsumed in high-ranking combinations of size k + 1. However, in contrast to single TFs, the addition of a single factor to high scoring combinations does not always lead to new high scoring combinations, which underscores the highly non-linear dynamics imposed by the cooperative and competitive regulation of TFs (Supplementary Fig. 3b).

Experimental validation of increased conversion efficiency

To demonstrate IRENE’s ability to predict combinations of IFs, we set out to increase differentiation efficiency by first creating stable iPS lines for all experiments via genomic integration to ensure high, stable expression of IFs using the human TFome (Fig. 3a). We selected the three most commonly used types of protocols: (1) a protocol for differentiating a cell type in the origin media type to demonstrate that the TFs on their own are sufficient for differentiation of a cell type, (2) a differentiation protocol using destination media only to demonstrate that IFs are also effective at differentiating in destination type conditions, and (3) a previously published growth-factor based protocol to show that we can improve differentiation with our identified IFs. We selected three target cell types having an immediate application in therapeutic strategies where conversion efficiency constitutes a major impediment.

Fig. 3: Experimental validation of improved efficiency of cell type conversion.

a Stable iPS lines for all differentiation experiments were created prior to differentiation via genomic integration to ensure high, stable expression of IFs using the human TFome. b Protocol for differentiating human mammary epithelial cells (HMEC) from human iPS cells (hiPSCs). c Protocol for differentiating melanocytes from hiPSCs. d Protocol for differentiating NK cells from hiPSCs. (GF: growth factor, DOX: doxycycline). eg Differentiation efficiency of (e). HMECs, f melanocytes and g NK-cells from hiPSCs for various combinations of IFs generated from IRENE. Efficiency is defined as the number of marker positive/double-positive cells divided by the number of plated cells. n = 1 experiment from three pooled biologically independent samples.

For the first, we chose human mammary epithelial cells (HMECs) (Fig. 3b), whose potential in the repopulation of surgically resected mammary tissue has been explored for decades51. To date, this requires dissociation of mammary epithelial cells from one tissue environment and subsequent transplantation into another tissue. An efficient in vitro differentiation protocol of mammary epithelial cells would thus overcome this invasive procedure and provides a graft source that can be generated from virtually any patient cells.

For the second, we chose melanocytes (Fig. 3c), which provide a source of cellular grafts to replace damaged cells in the context of vitiligo, an autoimmune disease characterized by the destruction of melanocytes by immune cells, which results in white, unpigmented areas of the skin. To increase accessibility in the clinics and decrease costs, current approaches rely on the use of non-cultured melanocyte grafts, although transplantation of appropriately cultured melanocytes is more efficacious in the re-pigmentation of the skin52. Thus, our melanocyte differentiation protocol could serve as a way to increase the accessibility of cultured melanocyte grafts for treating vitiligo in order to achieve more favorable therapeutic outcomes.

For the final, we chose NK-cells (Fig. 3d), whose transplantation from allogeneic donors has been found to have a beneficial effect in the treatment of leukemia after chemotherapy53. Although this strategy has been proven useful in achieving a complete remission of the disease in some patients, the transplanted cells were frequently rejected53. In this regard, an efficient NK-cell differentiation protocol can substantially benefit the treatment of leukemia by using patient-derived iPSCs, which are expected to be well tolerated.

First, we thought it was important to demonstrate that selected IFs were causing differentiation directly in starting cell type media. To test this, we calculated combinations of TFs for differentiating a cell type without previously documented conversion protocols (mammary epithelial cells) and over-expressed the TFs in iPSCs cultured in stem cell media (mTeSR) (Fig. 3e, Supplementary Fig. 4a). As a result, we observed a high consistency between the experimental and computational ranking of EPCAM and ERBB2 double-positive cells (Fig. 2e). Each of the tested combinations resulted in at least 78.2% EPCAM-positive cells after 8 days, but not necessarily a mammary subtype. In addition, more over-expressed TFs lead to a significant increase in converted epithelial cells (Fig. 2d; Wilcoxon test p-value: 0.03). One combination, however, ([GRHL3, NFYC, VDR, KLF5, MAX]), appeared to shift the population double-positive for a large percentage of cells (~99%), compared to the number of seeded cells. To corroborate the induction of these cells, we performed RNA-seq experiments of the initial iPSC and converted cell populations. Comparison between the individual samples with iPSCs confirms the elevated expression of a larger set of mammary epithelial marker genes (Supplementary Fig. 5). In addition, a comparison of network TF expression of the converted cells and iPSCs shows that the over-expression of a small number of TFs was sufficient to induce these TFs in almost all combinations, which supports the network architecture reconstructed by IRENE (Supplementary Fig. 6). Despite the induction of marker genes and network TFs, we set out to assess the transcriptional similarity to mammary epithelial cells by deconvoluting the RNA-seq samples of iPSCs, converted cells, and a gold-standard mammary epithelial cell line (Supplementary Fig. 7). For that, we employed CybersortX54, a computational method for detecting the proportion of cell types present in an RNA-seq sample within a single-cell RNA-seq reference dataset. Based on a reference dataset assembled from human breast tumor tissue55 and iPSCs56, we found up to 14% of the converted cells to possess a mammary epithelial cell type whereas the remaining cells are largely possessing an iPSC phenotype. (Supplementary Fig. 7a). Intriguingly, we employed a HMEC line as a positive control and found only 23% of these cells to possess an epithelial transcriptional phenotype, suggesting a closer resemblance of the converted cells to the positive control than expected from the predicted fraction of epithelial cells. However, we speculate that longer differentiation or differentiation in a mammary-epithelial cell-specific media could result in a more holistic differentiation of the population and, thus, a more pronounced increase in the expression of marker genes and network TFs.

Next, we wanted to determine if IFs selected by IRENE could improve differentiation efficiency when placing cells of the starting type into media of the destination cell type as opposed to the starting cell type (Fig. 3f, Supplementary Fig. 4c). For this experiment, we differentiated iPSCs to melanocytes in melanocyte media with and without TF over-expression. We found that while destination media was sufficient to partially differentiate iPSCs to melanocytes, two of four TF combinations were able to considerably increase the efficiency of differentiation by more than 900% of Mel.2-CD26 double-positive cells (medium alone: 0.49%; TFs: 4.7%) (Fig. 2f). Notably, the lowest ranking combination ([RXRG, PAX3, SOX10, E2F7]) resulted in the second-highest efficiency, only superseded by the combination [RXRG, ETS1, TFAP2A, HOXC9, E2F7, MSC] (Fig. 2f). We suspect that this effect is due to the composition of the growth medium and that it can activate RXRG with retinoic acid, if it is expressed. Indeed, retinoid acid, through RXR activation, is a well-known inducer of melanogenesis57. Similar to the case of mammary epithelial cells, RNA-seq confirms the expression of melanocyte marker genes and network TFs, especially for combinations increasing the efficiency (Supplementary Figs. 89). Moreover, deconvolution of the converted cell RNA-seq samples, using a single-cell reference dataset composed of iPSCs56 as well as neonatal and adult skin samples enriched for melanocytes58, shows up to 93% of successfully converted cells that do not possess an iPSC phenotype anymore (Supplementary Fig. 7b).

Finally, we sought to determine if IRENE could produce combinations of IFs that could increase the conversion efficiency of established differentiation protocols. To test this, we performed NK-cell differentiation using an established differentiation protocol59 and measured if the related cellular markers were more prominently differentiated in iPSC lines with over-expressed TFs than a control iPS cell line (Fig. 3g, Supplementary Fig. 4b). Again, we found a high consistency between the experimental and computational ranking of CD56 + NKp46+ double-positive cells (Fig. 2g). In particular, five of eight iPSC lines with combinations of IFs over-expressed after spin-EB differentiation ([JUN, ETS1, FLI1, IRF4, ELK4, ZNF107], [JUN, ETS1, FLI1, IRF4, IRF8, ELK4], [JUN, ETS1, FLI1, IRF4, IRF8], [JUN, ETS1, FLI1, IRF4, MBD4, ELK4] and [JUN, ETS1, FLI1, IRF4, RFX5]) increased the number of CD56 + NKp46 NK-cells by up to 250% compared to the line without IFs, yielding an efficiency of 2.6% with respect to double-positive cells (Fig. 3g). Furthermore, these cell lines expressed a greater percentage of other mature NK-cell markers (Fig. 3g), indicating that not only were more NK-cells produced, but that the cells that were produced were more mature than the iPSC control line. This finding is corroborated by corresponding RNA-sequencing analysis (Supplementary Figs. 1011). Except for one combination ([JUN, ELK4, ETS1, FLI1, IRF4]), all combinations induce the expression of NK-cell marker genes and network TFs. This is consistent with the fact that this combination only results in an efficiency of 0.28%, which is lower than the bona fide NK differentiation protocol alone (Fig. 2g). Moreover, deconvolution of converted cell RNA-seq samples using a single-cell reference dataset composed of peripheral blood mononuclear cells60 and iPSCs56 further underscores the possession of an NK-cell phenotype for most combinations (Supplementary Fig. 7c). In particular, except for one cell line converted with the IF combination [JUN, ETS1, FLI1, IRF4, MBD4], between 16 and 30% of converted cells in each sample are predicted to be NK cells.

Comments

    Something to say?

    Log in or Sign up for free

    Disclaimer: The translated content is provided by third-party translation service providers, and IKCEST shall not assume any responsibility for the accuracy and legality of the content.
    Translate engine
    Article's language
    English
    中文
    Pусск
    Français
    Español
    العربية
    Português
    Kikongo
    Dutch
    kiswahili
    هَوُسَ
    IsiZulu
    Action
    Related

    Report

    Select your report category*



    Reason*



    By pressing send, your feedback will be used to improve IKCEST. Your privacy will be protected.

    Submit
    Cancel