4 Toy Network Examples
Here, we will walk through a brief tutorial of a NETISCE run. The files necessary to complete the tutorial are within the
input data folder of both
The results from these Toy examples can be found in the toy_example_results folder of the main github repository.
4.1 Overview
We will use a simple toy network of 6 nodes and 9 edges.
You can find the relevant data files in the
In this example, we have 2 samples, A and B, with three replicates each (A_1,A_2,A_3, etc).
The normalized expression data is housed in
expressions.csv, and contains normalized expression values for the network nodes. Note that the value of F has been initialized to zero. A feature of NETISCE is that it does not require non-zero values for all network nodes, which is useful if the data is unavailable for a node. Such values can either be entered in the
expressions.csv file as 0s, or the node can be excluded entirely from this file, depending on user preference (in this case, the values will automatically be initialized to 0).
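The zero-initialization behaviour described above can be sketched in a few lines of Python. This is only an illustration, not NETISCE's own code: the function name `initialize_node_values`, the node labels A through F, and all numeric values are assumptions made for the example.

```python
def initialize_node_values(network_nodes, measured):
    """Return one value per network node, using 0.0 where no data exists."""
    return {node: float(measured.get(node, 0.0)) for node in network_nodes}


# Hypothetical node labels for the 6-node toy network (assumed, not from the files).
network_nodes = ["A", "B", "C", "D", "E", "F"]

# Placeholder expression values for one replicate; node F is deliberately absent.
measured = {"A": 0.8, "B": 0.3, "C": 0.5, "D": 0.1, "E": 0.9}

state = initialize_node_values(network_nodes, measured)
print(state["F"])
```

Entering F explicitly as 0 in the measured values would give the same initial state, which is why either option can be used.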
The samples.txt file specifies that A is associated with a treatment-sensitive phenotype, while B is associated with a resistant phenotype.
Note that you can use any term to describe the phenotypes. Just be sure to be consistent with the
param.undesired variables within the Nextflow
Lastly, we need to include a list of internal marker nodes. This list is in
internal_marker.txt. For our small network, the internal-marker node is
4.3 NETISCE run configuration
With all your input data files loaded, next we configure the nextflow run in either
NETISCE_hpc (Note: although we recommend running NETISCE on an HPC system, this example is small enough to run locally).
NETISCE.nf. Here, you need to specify the parameters for the Nextflow run on lines 3-19. Please refer to section 2.5 for parameter definitions.
For this example, your parameters should look like:
params.expressions = "$baseDir/input_data/expressions.csv"
params.network = "$baseDir/input_data/network.sif"
params.samples = "$baseDir/input_data/samples.txt"
params.internal_control = "$baseDir/input_data/internal_marker.txt"
// params.mutations = "$baseDir/input_data/mutations.csv"
params.alpha = 0.9
params.undesired = 'resistant'
params.desired = 'sensitive'
params.filter = "strict"
params.kmeans_min_val = 2
params.kmeans_max_val = 10
params.num_nodes = 6 // that have expression data
params.num_states = 100000
params.randseed = 0
Some Notes: make sure to include
$baseDir before pointing to the folder containing your input data. Also, be sure that
params.num_nodes is the number of nodes where there exists data within
expressions.csv. Finally, by adjusting the
params.randseed you can identify alternate FVSes within the network.
4.4 Run NETISCE
In your terminal/command prompt, navigate to the appropriate NETISCE folder (
local). To start your run, enter
./nextflow run NETISCE.nf -resume. While NETISCE is running, your terminal should look like this, where you can see the progress on each step of the pipeline:
The first column contains the location (folder and subfolder) that is running that step of NETISCE within the
4.5 NETISCE Results
Let’s take a look at the results of our NETISCE run, where the goal was to shift the system from the undesired state B, and towards the desired state A. These results can be found in the toy_example_1 subfolder of the toy_example_results folder of the main github repository.
Our internal marker node was node C. In this file we see the steady-state values of node C in the A sample replicates A_1, A_2, and A_3 and the B sample replicates B_1, B_2, and B_3 (as computed by SFA).
experimental_internalmarkers.pdf
The above numbers may be a little challenging to read, so we have included a plot of the values in the
On this histogram, we see bars for each of the samples and their replicates. The A (sensitive) samples are marked by a blue vertical line at their steady state value, while the B (resistant) samples are marked by a red vertical line at their steady state value. The grey bars can be aggregated to show the number of attractors with values for C that are binned together (for example, in the above table we see that the steady state value of C in replicate B_2=0.171 and replicate B_3=0.166. These values are binned together on the histogram to show that two replicates had similar values.)
Here, we see that the values of node C are well separated between the two phenotypes (all of the A values are greater than all of the B values). We will assume that this also aligns with the biological knowledge of the system.
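The separation statement above ("all of the A values are greater than all of the B values") amounts to comparing the smallest A value with the largest B value. A minimal sketch follows; the function name is hypothetical, and only the B_2 and B_3 values are quoted in this tutorial, so every other number is a made-up placeholder.

```python
def phenotypes_well_separated(desired_values, undesired_values):
    """True when every desired-phenotype value exceeds every undesired one."""
    return min(desired_values) > max(undesired_values)


# Steady-state values of the internal marker C.
# B_2 and B_3 come from the text; all other numbers are placeholders.
a_values = {"A_1": 0.60, "A_2": 0.62, "A_3": 0.58}
b_values = {"B_1": 0.18, "B_2": 0.171, "B_3": 0.166}

separated = phenotypes_well_separated(a_values.values(), b_values.values())
print(separated)
```

If the marker were expected to be lower in the desired phenotype, the comparison would simply be reversed.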
After estimating attractors for the experimental and randomly generated initial states, the resultant attractors were clustered using k-means clustering. The elbow and silhouette metrics are used to determine the optimal number k.
The elbow metric found the optimal number of k clusters to be k=5.
The silhouette metric found the optimal number of k clusters to be k=2.
Since the optimal ks identified by the silhouette metric and the elbow metric do not match, NETISCE chooses the smaller k, as long as the phenotypes remain separate (NETISCE checks to make sure this is true).
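The selection rule described above can be written as a short sketch. It is not NETISCE's implementation: `choose_k` and the `phenotypes_separate` callback are hypothetical names, and falling back to the larger k when the smaller one mixes the phenotypes is our assumption, since the text only describes the case where the smaller k works.

```python
def choose_k(elbow_k, silhouette_k, phenotypes_separate):
    """Pick the smaller candidate k that keeps the phenotypes in separate clusters.

    phenotypes_separate is a callable taking k and returning True or False.
    Trying the larger k as a fallback is an assumption made for this sketch.
    """
    for k in sorted({elbow_k, silhouette_k}):
        if phenotypes_separate(k):
            return k
    raise ValueError("no candidate k keeps the phenotypes separate")


# In the toy example the elbow metric gives k=5 and the silhouette metric gives k=2.
# Here we assume the separation check passes for any k.
k = choose_k(5, 2, lambda candidate: True)
print(k)
```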
The kmeans.txt file contains the clustering results for each attractor generated from the experimental data and the randomly generated initial states. The first column contains the sample name, and the second column contains the ID of the cluster to which it is assigned. Since k=2 in this case, one cluster is named “0” and the other cluster is named “1”.
And we see in the
kmeans.txt file, that the A samples are clustered in cluster 0, while the B samples are grouped in cluster 1.
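To make the separation check concrete, here is a small sketch that takes cluster assignments like those in kmeans.txt and confirms that no cluster contains samples from both phenotypes. The function name and the in-memory dictionaries are assumptions for illustration; reading the actual file is omitted because its delimiter is not specified here.

```python
def phenotypes_in_separate_clusters(assignments, sample_phenotype):
    """True when no cluster holds labelled samples from more than one phenotype.

    Samples without a phenotype label (for example, attractors from random
    initial states) are ignored.
    """
    cluster_to_phenotypes = {}
    for sample, cluster in assignments.items():
        phenotype = sample_phenotype.get(sample)
        if phenotype is not None:
            cluster_to_phenotypes.setdefault(cluster, set()).add(phenotype)
    return all(len(found) == 1 for found in cluster_to_phenotypes.values())


# Cluster IDs as described in the text: A replicates in cluster 0, B replicates in cluster 1.
assignments = {"A_1": 0, "A_2": 0, "A_3": 0, "B_1": 1, "B_2": 1, "B_3": 1}
sample_phenotype = {
    "A_1": "sensitive", "A_2": "sensitive", "A_3": "sensitive",
    "B_1": "resistant", "B_2": "resistant", "B_3": "resistant",
}

ok = phenotypes_in_separate_clusters(assignments, sample_phenotype)
print(ok)
```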
FVS_Finding output of FVSes.txt
The FVS_Finding Nextflow script was used to identify all unique FVSes in the toy network.
The first column contains the FVS identifier, where the number is the
randomseed value at which that FVS was identified. Four FVSes were identified in this network, each composed of two nodes (columns 2 and 3).
We selected FVS_0 to be analyzed in this example. Therefore,
params.randseed was set to 0.
The FVS finding algorithm identified nodes B and E as a minimal set of FVS control nodes in the toy network. Since the FVS control node set contained 2 nodes, 9 combinations of perturbations were performed on the control node set.
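The count of 9 is consistent with each of the 2 control nodes taking one of three settings (3^2 = 9). The sketch below enumerates such combinations, assuming the three settings are upregulation, downregulation, and no change. The setting labels and the enumeration order are our assumptions and do not necessarily correspond to the pert_0 through pert_8 IDs that NETISCE assigns.

```python
from itertools import product

# Assumed per-node settings; the labels are illustrative only.
SETTINGS = ("up", "down", "none")


def enumerate_perturbations(control_nodes):
    """Return every combination of settings across the control nodes."""
    return [
        dict(zip(control_nodes, combo))
        for combo in product(SETTINGS, repeat=len(control_nodes))
    ]


perturbations = enumerate_perturbations(["B", "E"])
print(len(perturbations))
for index, perturbation in enumerate(perturbations):
    print(index, perturbation)
```

The same enumeration grows as 3^n with the number of control nodes n, which is one reason a small FVS is convenient for a tutorial.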
This file contains a list of IDs for the perturbations to FVS control nodes that passed criterion 1.
5 out of the 9 perturbations passed the machine learning filtering criterion.
NETISCE calculated perturbations on FVS control nodes when the system was initialized from the three undesired B replicates (B_1, B_2, B_3). The steady-state values of the internal marker node C were extracted from each of these replicate perturbations that passed filtering criterion 1 (i.e., the steady-state value of C in pert_0, pert_3, pert_6, pert_7, and pert_8). Here we show the steady-state values of C under FVS perturbations when the system was initialized with the normalized expression values of B_1 (contained in the file
pert_replicate_1_internal_markers.txt; the values of C for the system under perturbations when initialized from B_2 are found in
pert_replicate_2_internal_markers.txt, the values of C for the system under perturbations when initialized from B_3 are found in
This file contains a table of the perturbations on FVS control nodes that passed both the first and second filtering criteria. It also contains the number of upregulations, the number of downregulations, and the total number of nodes perturbed for each perturbation set.
Here, we see that four perturbations passed both filtering criteria.
Let’s take a quick look at the steady state values for these perturbations, and the attractors generated from the experimental data:
Indeed, we see that the steady-state expression values of node C in the attractors generated by perturbations to the FVS control nodes are all greater than the steady-state expression values of node C in the attractors generated from the sensitive A sample. A successful reprogramming from resistant (B) to sensitive (A) cells has occurred!
4.6 Toy Example with mutations
Let’s say that in our system, gene D exhibits a gain of function mutation in the sensitive phenotype (A samples). If we want to include this in our simulations, we will use the
First, we must add to our
input_data folder a
.csv file containing the mutational profile. Let’s call this file
The gain of function mutation is encoded with 1 (loss-of-function mutations can be encoded with “0”).
Next, we make sure that the parameters in
NETISCE_mutations.nf on lines 3-19 are set correctly for the conditions
For this example, your parameters should look like:
params.expressions = "$baseDir/input_data/expressions.csv"
params.network = "$baseDir/input_data/network.sif"
params.samples = "$baseDir/input_data/samples.txt"
params.internal_control = "$baseDir/input_data/internal_marker.txt"
params.mutations = "$baseDir/input_data/mutations.csv"
params.alpha = 0.9
params.undesired = 'resistant'
params.desired = 'sensitive'
params.filter = "strict"
params.kmeans_min_val = 2
params.kmeans_max_val = 10
params.num_nodes = 4 // that have expression data
params.num_states = 1000
Note, the additional parameter
params.mutations that points to the
As above, to run NETISCE, enter
./nextflow run NETISCE.nf -resume.
By including mutational information, the results of NETISCE have changed. These results can be found in the toy_example_2 subfolder of the toy_example_results folder of the main GitHub repository.
The successful_controlnode_perturbations.txt file contains pert_7 in addition to perturbations pert_3, pert_6, and pert_8.
Let’s take a look at the steady-state expression values of node C in the attractors generated from the successful perturbations and the experimental initial states B_1 when mutational information is included.
Though the values are different in this system with mutations, we still see that the steady-state expression values of node C in the attractors generated by perturbations to the FVS control nodes are all greater than the steady-state expression values of node C in the attractors generated from the sensitive A sample. A successful reprogramming from resistant (B) to sensitive (A) cells has occurred!