CLI Usages | SEGUL

📄️ Introduction

The goals of SEGUL CLI are to be easy to use for beginners and to provide powerful options for experienced users. Some common arguments have short options, and some have default values where a default is possible and safe. This saves you time when typing commands.

📄️ Command Options

SEGUL CLI command is structured this way:

📄️ Alignment Concatenation

SEGUL CLI provides an easy way to concatenate multiple alignments and generate the partition setting simultaneously.
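
To make the idea concrete, here is a minimal Python sketch of concatenation with partition tracking. It is not SEGUL's implementation: the function name `concatenate`, the dictionary layout, and the choice to fill absent taxa with `?` are all assumptions made for illustration.

```python
def concatenate(alignments):
    """Concatenate alignments and record 1-based partition ranges.

    `alignments` maps a locus name to a dict of taxon -> aligned sequence.
    This layout is hypothetical and chosen only for the example.
    """
    # Collect every taxon that appears in at least one alignment.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: taxa absent from a locus are filled with '?'.
            pieces[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


loci = {
    "locus1": {"A": "ACGT", "B": "AC-T"},
    "locus2": {"A": "GGA", "C": "GGC"},
}
matrix, partitions = concatenate(loci)
```

Each partition tuple holds the locus name and its inclusive start and end columns in the concatenated matrix, which is the information a partition file records.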

📄️ Alignment Conversion

SEGUL can convert a single file or multiple files in the same directory.

📄️ Alignment Filtering

In a typical phylogenomic workflow, you may want to filter problematic alignments before running a phylogenetic analysis. This feature provides multiple ways to filter alignments.

📄️ Alignment Partition Conversion

SEGUL CLI can convert single and multiple partition files. It can also extract partitions embedded in NEXUS sequence files.

📄️ Alignment Splitting

SEGUL alignment splitting splits a concatenated alignment into multiple alignments based on an input partition.

📄️ Alignment Summary

SEGUL generates different summary statistics for DNA and amino acid sequences. By default, the data type is set to the DNA sequence. In general, the command is as follows:

📄️ Alignment Trimming

Trim alignments based on the proportion of missing data or the number of parsimony-informative sites. This feature filters sites based on the specified parameters.
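
As an illustration of site-level trimming by missing data, the following Python sketch drops alignment columns whose proportion of missing characters exceeds a threshold. It is a sketch under stated assumptions, not SEGUL's code: the function `trim_sites`, the `max_missing` parameter, and the treatment of `-` and `?` as missing are hypothetical.

```python
def trim_sites(alignment, max_missing, missing="-?"):
    """Keep only columns whose missing-data proportion is <= max_missing.

    `alignment` maps taxon -> aligned sequence; all sequences are assumed
    to have the same length. Names and defaults here are illustrative.
    """
    taxa = list(alignment)
    if not taxa:
        return {}
    # Transpose the alignment so each tuple is one site (column).
    columns = zip(*(alignment[taxon] for taxon in taxa))
    kept = [
        col for col in columns
        if sum(ch in missing for ch in col) / len(col) <= max_missing
    ]
    # Rebuild each sequence from the retained columns.
    return {
        taxon: "".join(col[i] for col in kept)
        for i, taxon in enumerate(taxa)
    }


aln = {"A": "AC-T", "B": "A--T", "C": "AG-T"}
trimmed = trim_sites(aln, max_missing=0.5)
```

In this example the third column consists entirely of gaps, so it is the only site removed at a threshold of 0.5.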

📄️ Unalign Alignments

Unalign alignments to produce unaligned sequence files.

📄️ Multi Alignment Format Conversion

Multiple Alignment Format (MAF) is a text-based format for representing multiple sequence alignments. Unlike the NEXUS or PHYLIP format, which usually contains a single alignment, each MAF file can contain multiple alignments. This format helps store alignments with detailed information about the sequences, such as the sample name, scores, size, strand, and other attributes. However, most phylogenetic software does not support this format. SEGUL aims to bridge this gap by converting MAF files to FASTA or PHYLIP format, including support for interleaved and sequential formats. The output will be in multiple files containing sequences with a matching locus/gene. The filenames will be the locus/gene names.

📄️ Genomic Summary

Since version 0.19.0, SEGUL can calculate summary statistics for raw reads and contiguous sequences.

📄️ Sequence Addition

Add sequences to existing sequence files or alignments. You can add sequences from multiple sources to multiple destinations. The source and destination file formats can differ, but SEGUL requires matching file names to pair each source with its destination. If the destination files are aligned, all the output sequences will be unaligned. We recommend using MAFFT to align the resulting sequence files.

📄️ Sequence Extraction

SEGUL can extract sequences based on the sequence ID in a collection of alignments. You can input the sequence ID in three ways:

📄️ Sequence Filtering

Sequence filtering works at the sequence level, whereas the SEGUL alignment filtering feature works at the alignment level. Alignment filtering removes an entire alignment that does not meet the filtering criteria. Sequence filtering removes only the sequences that do not meet the criteria and retains the alignment as long as at least one sequence is left in it. The feature works on many alignments simultaneously and never overwrites your original datasets; it creates new files with the filtered sequences.
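
The difference between the two levels can be sketched in Python. This is an illustration only, not SEGUL's implementation: the function names, the `min_taxa` and `max_missing` criteria, and the in-memory dictionary layout are assumptions chosen to show the contrast.

```python
def missing_fraction(seq, missing="-?"):
    """Proportion of missing characters in a sequence (1.0 if empty)."""
    if not seq:
        return 1.0
    return sum(ch in missing for ch in seq) / len(seq)


def filter_alignments(alignments, min_taxa):
    """Alignment level: drop whole alignments with too few taxa."""
    return {
        name: dict(aln)
        for name, aln in alignments.items()
        if len(aln) >= min_taxa
    }


def filter_sequences(alignments, max_missing):
    """Sequence level: drop failing sequences and keep the alignment
    as long as at least one sequence remains."""
    result = {}
    for name, aln in alignments.items():
        kept = {
            taxon: seq for taxon, seq in aln.items()
            if missing_fraction(seq) <= max_missing
        }
        if kept:
            result[name] = kept
    return result


data = {
    "locus1": {"A": "ACGT", "B": "A---"},
    "locus2": {"A": "----"},
}
by_alignment = filter_alignments(data, min_taxa=2)
by_sequence = filter_sequences(data, max_missing=0.5)
```

Both functions return new dictionaries and leave `data` untouched, mirroring the guarantee that the original datasets are never overwritten.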

📄️ Sequence ID Extraction

Often, we need to know what the taxa in our dataset are. The most straightforward command would be:

📄️ Sequence ID Mapping

To map the distribution of your samples across your dataset, you only need to pass the --map flag in the finding unique IDs command:

📄️ Sequence Removal

Based on a list of IDs, you can remove sequences from a collection of alignments. This feature is the opposite of the segul extract feature. When you need to remove fewer than half of the sequences, it is faster than segul extract.

📄️ Sequence ID Renaming

SEGUL provides an easy way to rename sequence IDs across all your alignments. To use this function, SEGUL requires a list of the original IDs and the new names that should replace them. The input IDs can be written in a tabulated format as a comma-delimited file (.csv) or a tab-delimited file (.tsv).
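
The renaming idea can be sketched in Python with the standard `csv` module. This is not SEGUL's implementation, and the exact table layout SEGUL expects is not shown here: the sketch assumes two columns (original ID, then new ID) with no header row, and the function names `read_id_map` and `rename_ids` are hypothetical.

```python
import csv
import io


def read_id_map(text, delimiter=","):
    """Parse a two-column table into {original_id: new_id}.

    Assumption: column 1 is the original ID, column 2 the new ID, and
    there is no header row. Use delimiter="\t" for tab-delimited input.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {
        row[0].strip(): row[1].strip()
        for row in reader
        if len(row) >= 2
    }


def rename_ids(alignments, id_map):
    """Return new alignments with IDs replaced; unlisted IDs are kept."""
    return {
        name: {id_map.get(taxon, taxon): seq for taxon, seq in aln.items()}
        for name, aln in alignments.items()
    }


table = "sample_01,Genus_species_A\nsample_02,Genus_species_B\n"
id_map = read_id_map(table)
renamed = rename_ids(
    {"locus1": {"sample_01": "ACGT", "sample_03": "ACGA"}},
    id_map,
)
```

Applying one mapping to every alignment keeps the IDs consistent across the dataset, which is the point of renaming in bulk rather than editing files one at a time.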

📄️ Sequence Translation

To translate a DNA alignment to amino acids:

📄️ Log File

Except for the spinning emoji and the program progress messages, all the terminal output is written to the log file, which is saved in the current working directory. The log file also includes the time and the log status.