2 Installation and Usage

2.1 Download NETISCE

The NETISCE pipelines can be downloaded from our GitHub repository: https://github.com/veraliconaresearchgroup/netisce

We recommend running NETISCE on a high-performance computing (HPC) cluster, as it may generate large files and run long computations. We nevertheless provide two Nextflow pipelines: one designed for HPC clusters (NETISCE_hpc) and one for running NETISCE on a local machine (NETISCE_local).

2.2 Install Nextflow

Nextflow is required to run the NETISCE pipeline. Please follow the instructions at https://www.nextflow.io/ (see ‘Getting Started’, steps 1 and 2) to install Nextflow in the appropriate NETISCE folder (_local or _hpc). Note: if you are on a Windows machine, you will need to install Windows Subsystem for Linux: https://docs.microsoft.com/en-us/windows/wsl/install

2.3 Prerequisites

Please be sure you have the following Python packages installed:

scipy version=1.5.4
pandas version=1.1.5
scikit-learn (imported as sklearn) version=0.24.2
yellowbrick version=1.3.post1

The following R packages, which can be installed through CRAN, are also required:

dplyr
ggplot2
plyr
reshape2
readr
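
The Python package versions listed above can be checked before running the pipeline. The following is a minimal sketch, not part of NETISCE: it assumes Python 3.8 or later (for importlib.metadata), and the helper name check_versions is hypothetical. Note that the sklearn module is distributed under the name scikit-learn, which is the name the lookup needs.

```python
from importlib import metadata

# Versions listed in this section. The distribution "scikit-learn"
# is what provides the "sklearn" import.
REQUIRED = {
    "scipy": "1.5.4",
    "pandas": "1.1.5",
    "scikit-learn": "0.24.2",
    "yellowbrick": "1.3.post1",
}


def check_versions(required, get_version=metadata.version):
    """Return {package: (expected, found)} for each missing or mismatched package.

    `found` is None when the package is not installed.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    problems = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at exactly the listed version. Other versions may also work, but only the listed ones are named in this guide.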

A note about R packages: NETISCE was originally developed and run using R version 3.6.3. It has also been tested and works in R version 4.0.4. In NETISCE_hpc, the R scripts are set to use R/3.6.3, and the relevant R packages are included in the bin folder to alleviate potential package installation issues on an HPC cluster.

2.4 Docker Image

We also provide a Docker container that has all the packages and code required to run NETISCE. It can be downloaded here

After downloading the image, load it with docker image load -i netisce.tar. Then run the container with docker run ubuntu:NETISCE. Note: depending on your system, you may need to run these commands with sudo.

2.5 Parameters and Configuration

Whether you run NETISCE on your local machine or on an HPC cluster, you must specify the following files and parameters within the .nf file:

params.expressions: csv file containing normalized expression data for network nodes in different samples
params.network: network file (sif format)
params.samples: text file specifying the phenotype for each sample in params.expressions file (tab delimited)
params.internal_control: text file containing a list of nodes to be used as internal marker nodes
params.alpha: alpha parameter for signal flow analysis (default=0.9)
params.undesired: string of the undesired phenotype (as labeled in the params.samples file)
params.desired: string of the desired phenotype (as labeled in the params.samples file)
params.filter: filtering parameter for criterion 2 (“strict” or “relaxed”)
params.kmeans_min_val: minimum k-means value for clustering (default=2)
params.kmeans_max_val: maximum k-means value for clustering (default=10)
params.num_nodes: number of nodes in network for which normalized expression data exists (within the params.expressions file)
params.num_states: number of randomly generated initial states (default=100000, or 3^n if 3^n is less than 100000, where n is the number of network nodes)
params.randseed: random seed used to generate one FVS (feedback vertex set) of the network. Note: if you are interested in identifying all FVSes within a network and using this information to select a specific FVS, please see the additional Nextflow script in the FVS_finding folder. Its input and output are described in Section 2.7.

Please see the input_data folder for examples of files to match the formatting.
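
As an illustration of how params.num_nodes and params.num_states relate to the input files, here is a minimal sketch. It is not code from NETISCE, and the function names are hypothetical. It assumes a SIF network whose lines read "source interaction target [target ...]" with node names that contain no spaces, and it takes the node names from the expression file as an already-loaded collection, since the exact layout of that file should be checked against the examples in input_data.

```python
def sif_nodes(lines):
    """Collect node names from SIF lines of the form
    'source interaction target [target ...]' (assumed whitespace-delimited)."""
    nodes = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        nodes.add(fields[0])      # source node, or an isolated node
        nodes.update(fields[2:])  # target nodes; fields[1] is the interaction type
    return nodes


def num_nodes(network_nodes, expression_nodes):
    """Number of network nodes that also have expression data."""
    return len(set(network_nodes) & set(expression_nodes))


def num_states(n_network_nodes, cap=100000):
    """3^n initial states when that is below the cap, otherwise the cap."""
    return min(cap, 3 ** n_network_nodes)


if __name__ == "__main__":
    network = sif_nodes(["A activates B", "B inhibits C D", "E"])
    print(num_nodes(network, ["A", "C", "X"]))  # nodes shared with the expression data
    print(num_states(len(network)))
```

For a network of 10 nodes, 3^10 = 59049 is below 100000, so 59049 states are used. From 11 nodes upward (3^11 = 177147), the default of 100000 applies.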

2.5.1 NETISCE_mutations.nf

If you are interested in including mutational information, please use the NETISCE_mutations.nf pipeline. You must additionally specify params.mutations: a csv file containing the mutational configuration for network nodes (0 for loss of function, 1 for gain of function). Please see the example in input_data for formatting.

2.5.2 nextflow.config

If you are running Nextflow on an HPC cluster, please specify your executor and clusterOptions within the nextflow.config file. Please see https://www.nextflow.io/docs/latest/config.html for more information regarding your executor.

2.6 Running NETISCE

Once you have specified the parameters, run NETISCE using the following command:

./nextflow run NETISCE.nf -resume ##or NETISCE_mutations.nf if including mutational data

We recommend using the -resume flag in case you change a file or parameter within your pipeline. With this flag, Nextflow reuses the cached results of steps that remain unchanged, so those steps are not re-run.

2.7 FVS_Finding Nextflow Script

A user may be interested in identifying all FVSes within a network. By looking at individual FVSes, users can select the FVS that contains nodes of interest. We have included an additional Nextflow script that calculates the FVSes within a network and then provides the unique sets of FVSes identified.

2.7.1 Parameters

params.network = location of network file (this can be directed to your input_data folder)

params.sets = number of FVS searches to perform (i.e., the randomseed value). This should be set reasonably high to discover all FVSes within a network, based on the network size.

2.7.2 Output

FVSes.txt = file containing the unique minimal FVSes within the network. The first column contains the FVS identifier; the number following “FVS_” indicates which random seed value was used to identify that specific FVS. You can use this value as the setting for params.randseed in NETISCE.
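
Selecting a seed from FVSes.txt can be scripted. The sketch below is illustrative only and is not part of NETISCE: the function name is hypothetical, and the layout of the columns after the identifier is an assumption (node names separated by tabs, spaces, or commas, possibly wrapped in brackets or quotes). Please check it against your own FVSes.txt before relying on it.

```python
import re


def seeds_for_nodes(lines, nodes_of_interest):
    """Return the seeds of every FVS that contains all nodes of interest.

    Each line is expected to start with an identifier such as 'FVS_12';
    everything after it is treated as node names (assumed layout).
    """
    wanted = set(nodes_of_interest)
    seeds = []
    for line in lines:
        # Split on whitespace, commas, brackets, parentheses, and quotes.
        tokens = [t for t in re.split(r"[\s,\[\]'\"()]+", line) if t]
        if not tokens:
            continue
        match = re.fullmatch(r"FVS_(\d+)", tokens[0])
        if match is None:
            continue  # header or unrecognized line
        if wanted <= set(tokens[1:]):
            seeds.append(int(match.group(1)))
    return seeds


if __name__ == "__main__":
    with open("FVSes.txt") as handle:
        # "GeneA" is a placeholder node name, not taken from NETISCE.
        print(seeds_for_nodes(handle, ["GeneA"]))
```

Any value returned can then be used as params.randseed so that NETISCE works with an FVS that contains the nodes of interest.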