
Genomic Summary

Since version 0.19.0, segul can calculate summary statistics for raw reads and contiguous sequences.

Raw reads

Supported file formats:

  • Gzip-compressed FASTQ (fq.gz/fastq.gz)
  • Standard FASTQ (fq/fastq)

Example

segul raw summary -i raw_reads.fq.gz -o raw_reads_summary

The output file will be saved as read-summary.csv.
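
For downstream analysis, the CSV can be loaded with a few lines of Python. This is a minimal sketch: the helper name load_summary is hypothetical, and no column names are assumed, since the rows are keyed by whatever headers the file contains.

```python
import csv


def load_summary(csv_path):
    """Return the summary rows as a list of dicts keyed by the CSV headers."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Assuming the file is written inside the directory passed to -o, a call such as load_summary("raw_reads_summary/read-summary.csv") would return one dictionary per row of the summary.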

segul provides three modes for calculating summary statistics for raw reads:

  • Minimal: read count only
  • Default: generate essential statistics (see below)
  • Complete: generate all the essential statistics, plus summary statistics per position in read for each file

Essential statistics for raw reads:

  • Number of reads
  • Number of bases
  • Mean read length
  • Minimum read length
  • Maximum read length
  • GC count
  • GC content
  • AT count
  • AT content
  • A, C, G, T, N count
  • Low quality base count
  • Mean base quality
  • Min base quality
  • Max base quality
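
To make the statistics above concrete, the following Python sketch computes them for a single FASTQ file. It is an illustration only, not segul's implementation. It assumes Phred+33 quality encoding, four-line FASTQ records, and a low-quality cutoff of 20; the cutoff and the function names are hypothetical, and GC and AT content are taken here as fractions of all bases.

```python
import gzip

# Assumed cutoff for a "low quality" base; segul's actual threshold may differ.
LOW_QUALITY_THRESHOLD = 20
PHRED_OFFSET = 33  # assumes Phred+33 quality encoding


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_reads(lines):
    """Summarize FASTQ records given as an iterable of lines (four per record)."""
    lengths = []
    base_counts = {base: 0 for base in "ACGTN"}
    qual_sum = 0
    qual_min = None
    qual_max = None
    low_quality = 0

    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        quals = [ord(ch) - PHRED_OFFSET for ch in next(it).strip()]

        lengths.append(len(seq))
        for base in seq:
            if base in base_counts:
                base_counts[base] += 1
        if quals:
            qual_sum += sum(quals)
            low_quality += sum(1 for q in quals if q < LOW_QUALITY_THRESHOLD)
            qual_min = min(quals) if qual_min is None else min(qual_min, min(quals))
            qual_max = max(quals) if qual_max is None else max(qual_max, max(quals))

    total = sum(lengths)
    gc = base_counts["G"] + base_counts["C"]
    at = base_counts["A"] + base_counts["T"]
    return {
        "read_count": len(lengths),
        "base_count": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "gc_count": gc,
        "gc_content": gc / total if total else 0.0,
        "at_count": at,
        "at_content": at / total if total else 0.0,
        "base_counts": base_counts,
        "low_quality": low_quality,
        "mean_quality": qual_sum / total if total else 0.0,
        "min_quality": qual_min,
        "max_quality": qual_max,
    }
```

A file can be summarized by passing the open handle directly, for example with open_fastq("raw_reads.fq.gz") as handle: stats = summarize_reads(handle).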

Contiguous sequences

Supported file formats:

  • FASTA (fa/fasta)

Example

segul contig summary -i contigs.fa -o contigs_summary

Available statistics for contiguous sequences:

  • Contig count
  • Base count
  • Nucleotide
  • GC content
  • AT content
  • Minimum contig length
  • Maximum contig length
  • Mean contig length
  • Median contig length
  • N50
  • N75
  • N90
  • Contigs > 750 bp
  • Contigs > 1000 bp
  • Contigs > 1500 bp
  • Cumulative length
  • A, C, G, T, N counts
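
The length-based statistics in this list can be sketched in Python as follows. This is an illustration under stated assumptions, not segul's own code. N50 is computed with the common definition (the contig length at which the running total of contigs, sorted longest first, reaches 50% of all bases), and N75 and N90 follow the same pattern. The three length-threshold counts are interpreted here as contigs strictly longer than the given length, which is an assumption. The function names are hypothetical.

```python
from statistics import mean, median


def nx(lengths, percent):
    """Contig length at which the longest-first running total reaches `percent` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if running * 100 >= total * percent:
            return length
    return 0


def summarize_contigs(lengths):
    """Summarize a non-empty list of contig lengths (in bp)."""
    return {
        "contig_count": len(lengths),
        "base_count": sum(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "n50": nx(lengths, 50),
        "n75": nx(lengths, 75),
        "n90": nx(lengths, 90),
        # Strictly-greater-than comparison is an assumption.
        "over_750": sum(1 for n in lengths if n > 750),
        "over_1000": sum(1 for n in lengths if n > 1000),
        "over_1500": sum(1 for n in lengths if n > 1500),
    }
```

For contig lengths of 2000, 1500, 1000, 800, and 700 bp (6000 bp in total), the two longest contigs together hold 3500 bp, which passes the 3000 bp halfway mark, so N50 is 1500.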