ORf Interaction Ontology Network (ORION) ¶

ORf Interaction Ontology Network (ORION), a computational framework designed to explore patterns of protein cluster (PC) organization across genomes, contigs, or viral operational taxonomic unit (vOTU) sequences.

Description¶

ORION detects shared genomic features by analyzing the local arrangement of protein clusters (PCs) around each open reading frame (ORF). It maps each ORF position to its corresponding PC, then defines cluster blocks (CBs) using a sliding window of user-specified length. For example, a window size of three includes the focal PC plus its immediate neighbors; larger windows capture more distant PCs. By comparing the relative order of PCs within each window, ORION maintains consistent pattern matching even when ORFs shift due to rearrangements. Increasing the window size tightens the required cluster arrangement, revealing more highly conserved synteny across genomes.

Dependencies¶

orion can be ran either on a machine with a properly set up python environment or by creating a custom Conda environment if you do not wish to change your current setup (recommended).

Installation¶

I recommend using Mamba for faster dependency resolution; see the Mamba installation guide. If you prefer Conda, refer to the Conda installation guide.

#Retrieve repo
git clone https://github.com/zschreib/orion
cd orion

#Create environment (conda may be slow)
mamba env create -f environment.yml

#Activate environment

mamba activate orion 
#OR
conda activate orion

Tool Check. If you recieve any errors here do not proceede until fixed.

mmseqs -h
prodigal -h
#Make sure version matches main branch.
python main.py --version

Running ORION¶

usage: main.py [-h] -i INPUT -p PREFIX [--min-seq-id MIN_SEQ_ID] [-cb CLUSTER_BLOCK] [-min MIN_GENOMES] [-jci JACCARD] [-t THREADS] [-f {graphml,csv}] [--version]

Pipeline: preprocess FASTA and run ORION into one results folder

options:
  -h, --help            show this help message and exit
  -i, --input INPUT     Path to input FASTA file. Nucleotide genomes/contigs.
  -p, --prefix PREFIX   Base name for results directory (creates <prefix>_results)
  --min-seq-id MIN_SEQ_ID
                        MMseqs2 min amino acid sequence identity for clustering. Default = 0.3 or 30%. 
  -cb, --cluster-block CLUSTER_BLOCK
                        Cluster block size for ORION. Default = 3.
  -min, --min-genomes MIN_GENOMES
                        Minimum genomes for conserved cluster-block. Default = 5.
  -jci, --jaccard JACCARD
                        Jaccard threshold for cluster block co-occurrence visual. Default = 0.25.
  -t, --threads THREADS
                        Number of worker threads for ORION. Default = 1
  -f, --format {graphml,csv}
                        Output format for networks. Recommend csv for Cosmograph or graphml for Cytoscape. Default csv.
  --version             Show program version and exit.

Example run: ORION applied to the NCBI test dataset, which contains 48 phage genomes (16 Klebsiella, 16 Staphylococcus, and 16 Salmonella phage).

python main.py -i NCBI_test_input/kleb_sal_staph_48_phage.fasta  -p NCBI_test --min-seq-id 0.3 -cb 3 -min 3 -f csv

Process parameters: clusters amino acid ORFs at 30% identity (--min-seq-id 0.3), generates cluster blocks of size 3 (-cb 3), retains blocks found in =3 genomes (--min 3), saves output to a prefix-named directory (-p), and exports networks in CSV format (-f csv). To see more options use (python main.py -h).

Results¶

NCBI_test_results/
+-- NCBI_test_job_logs.txt
+-- orion_output
|   +-- analysis_data
|   |   +-- block_enrichment_scores.tsv
|   |   +-- block_jaccard_scores.tsv
|   |   +-- genome_enriched_blocks.tsv
|   +-- networks
|       +-- cluster_block_jaccard_metadata.csv
|       +-- cluster_block_jaccard_network.csv
|       +-- genome_to_cb_metadata.csv
|       +-- genome_to_cb_network.csv
+-- preprocess_output
    +-- NCBI_test_cluster_blocks.tsv
    +-- NCBI_test_mmseqs_cluster_membership.tsv
    +-- NCBI_test_orf_metadata.tsv
    +-- NCBI_test_orfs.pep

More details on each output files will be provided soon.

Network Visuals¶

Depending on the chosen format (-f), networks can be visualized in Cytoscape or Cosmograph. Since Cosmograph does not accept GraphML, we will use CSV files:

Open Cosmograph:
Go to Cosmograph.
Upload files:
Drag genome_to_cb_network.csv into Select data file
Drag genome_to_cb_metadata.csv into Select metadata file
Map columns:
Set Source column ? Source
Set Target column ? Target
Launch the network:
Click Launch to render the graph.
Color nodes by syntenome:
In the left panel under Node color by, choose metadata | syntenome.
This will color genomes based on shared cluster block composition (syntenomes) and leave cluster block nodes gray. Highly shared blocks will draw their connected genomes toward the center.

ORION example

Citation¶

If you found this tool useful, please cite:

Schreiber, Z. D. ORION: ORf Interaction Ontology Network. GitHub. Retrieved July 29, 2025, from https://github.com/zschreib/orion/.

Authors¶

Contributors names and contact info.

zschreib@udel.edu

License¶

This project is licensed under the GNU General Public License v3.0 see the LICENSE file for details

Acknowledgments¶

Visuals provided by Cosmograph