3.2.4 File Manipulation
Various file formats have been introduced alongside the development of DNA/RNA sequencing technologies. Although many biological file formats are used to store and manipulate NGS data, FASTA and FASTQ (FASTA/Q) files are the most commonly encountered in the bioinformatics community. This is due to their flexibility: FASTA/Q files can be read, mapped and indexed by many different software packages, and serve as input for analyses that generate SAM/BAM, GFF/GTF, VCF and other formats. Using a fai index file in conjunction with a FASTA/Q file containing reference sequences enables efficient access to arbitrary regions within those sequences, so that subsequences can be extracted from the indexed reference (Danecek et al., 2021; Quinlan & Hall, 2010).
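To illustrate why an index makes region access efficient, the following is a minimal sketch in Python. It is not the implementation used by Samtools or easyfm. It assumes the standard five-column fai layout (name, length, byte offset of the first base, bases per line, bytes per line) and a FASTA file whose sequence lines within a record all have the same width except possibly the last. The function names `build_fai` and `fetch`, and the demo sequences, are hypothetical and chosen only for this example.

```python
import os
import tempfile


def build_fai(fasta_path):
    """Return {name: (length, offset, linebases, linewidth)} for a FASTA file.

    Assumption: within a record, all sequence lines except possibly the
    last share one width, as the fai layout requires.
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The record name is the first word after '>'.
                name = line[1:].split()[0].decode()
                # The first base starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None and line.strip():
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # Line geometry is taken from the first sequence line.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of record `name` (0-based, end-exclusive)."""
    length, offset, linebases, linewidth = index[name]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""
    # Translate base coordinates into byte positions, accounting for the
    # newline bytes that terminate each sequence line.
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + ((end - 1) // linebases) * linewidth + (end - 1) % linebases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Demo on a throwaway FASTA file containing made-up sequences.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fa")
    with open(path, "w", newline="\n") as out:
        out.write(">chr1 test\nACGTACGTAC\nGTACGTACGT\nACG\n>chr2\nTTTTGGGG\n")
    index = build_fai(path)
    region = fetch(path, index, "chr1", 8, 13)
    print(index)
    print(region)
```

Because the byte position of any base follows from simple arithmetic on the index columns, `fetch` seeks directly to the requested region and reads only the bytes it needs, regardless of the size of the reference file.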
As with the other modules, the web-based Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org, Australia: https://usegalaxy.org.au/) and the command-line tools Samtools and BCFtools (Danecek et al., 2021) and BEDTools (Quinlan & Hall, 2010) offer a range of NGS data file manipulation capabilities, but their use can be challenging for biologists because of limited programming literacy and dependence on an internet connection. To enhance flexibility and convenience, we present easyfm, a free, single GUI for NGS file manipulation (mainly for FASTA files) (Figure 5). Since users can control everything with a simple mouse click on a desktop, the tools available in easyfm offer a convenient way to teach bioinformatics and data analysis, and to analyse results quickly without being hampered by command-line tools and HPC Secure Shell (SSH) connections.
Users can import any FASTA/Q file, index it, and extract an indexed ID with its sequence by double-clicking, by matching a Prefix ID, or by selecting a provided text file (Figure 5A). A FASTQ file can also be converted to a FASTA file, and the orientation of a given FASTA file can be changed via Reverse Complement and Reverse (Figure 5B and 5C). For wider application, easyfm File Manipulation also allows users to manipulate GFF and GTF files (including filtering by ID, feature and strand, and extracting sequence regions) and to consolidate information from them, provided that the corresponding reference genome/transcriptome sequences are present (Figure 5D). To enhance user-friendliness, users can extract a given sequence as a FASTA file with extra flanking regions in both directions by entering the desired flank length as a number. Along with existing tools (Danecek et al., 2021; Quinlan & Hall, 2010), easyfm File Manipulation will provide a stable and modular platform for manipulating sequence data and files to ensure high reproducibility standards in the NGS era.
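The operations described above can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce the easyfm source code. All function names are hypothetical. It assumes DNA sequences using the letters A, C, G, T and N, well-formed four-line FASTQ records, and 0-based, end-exclusive coordinates (GFF/GTF files use 1-based, inclusive coordinates, so a conversion would be needed before calling `extract_with_flanks`).

```python
# Translation table for complementing DNA bases (both cases), N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(seq, start, end, flank=0):
    """Return seq[start:end] widened by `flank` bases on each side.

    Coordinates are 0-based and end-exclusive; the window is clipped
    at the sequence boundaries rather than raising an error.
    """
    lo = max(0, start - flank)
    hi = min(len(seq), end + flank)
    return seq[lo:hi]


def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text.

    Assumption: every record has exactly four lines
    (header, bases, '+' separator, qualities).
    """
    lines = fastq_text.strip().splitlines()
    records = []
    for i in range(0, len(lines), 4):
        header, bases = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        # Keep the read name, drop the separator and quality lines.
        records.append(f">{header[1:]}\n{bases}\n")
    return "".join(records)


print(reverse_complement("ACCGT"))
print(extract_with_flanks("AAACCCGGGTTT", 3, 6, flank=2))
print(fastq_to_fasta("@read1\nACGT\n+\nIIII\n"), end="")
```

Clipping the flanked window at the sequence ends is one possible design choice; a tool could equally report an error or pad with N when the requested flank runs past the reference.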