Issue
I have multiple files named:
Genus_species_strain.fasta
I want to use sed to print out:
Genus
species
strain
I want to use the "printed" words in a command like this (prokka is a tool for genome annotation):
prokka $file --outdir `echo $file | sed s/\.fasta//` --genus `echo $file | sed s/_.*\.fasta//` --species `echo $file | sed <something here>` --strain `echo $file | sed <something here>`
I would appreciate the help. I am very new to all of this and, as you can see above, I only know how to print out Genus.
Below are some additional questions (no need to answer these if it only complicates things further). This is one of my attempts to print species:
sed s/.*_//1 | sed s/_.*\.fasta//
I know the second command isn't correct. I assume it needs to start from the second _, but I don't know how to do that, since the continuation (that is, .fasta) is unique.
When used alone, sed s/.*_//1 returns strain.fasta. How can I make it not skip the first _?
Combining commands (either as you see above, or with ;) doesn't seem to work for me.
Solution
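You can split the file name into fields with read and strip the extension with parameter expansion. IFS lists the delimiter characters (here _ and .); it is not a regular expression:
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"
outdir="${file%.*}"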
Then you can use the variables in the command:
prokka "$file" --outdir "$outdir" --genus "$genus" --species "$species" --strain "$strain"
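Since the question mentions multiple files, the same split can be wrapped in a loop. The following is a minimal sketch, not a tested prokka workflow: the two file names are made-up examples, and the command is only printed (a dry run) so that the extracted values can be checked before anything is executed.

```shell
#!/bin/bash
# Dry run: build the prokka command for each file name and print it
# instead of executing it. The file names are hypothetical examples.
files=('Escherichia_coli_K12.fasta' 'Bacillus_subtilis_168.fasta')

cmds=()
for file in "${files[@]}"; do
    # Split on "_" and "."; the trailing "_" variable absorbs the extension.
    IFS='_.' read -r genus species strain _ <<< "$file"
    outdir="${file%.*}"   # file name without its last extension
    cmd="prokka $file --outdir $outdir --genus $genus --species $species --strain $strain"
    cmds+=("$cmd")
    echo "$cmd"
done
```

Once the printed commands look right, the echo can be replaced by the real call with quoted variables, and the array by a glob such as for file in *.fasta. Note that this split assumes exactly three underscore-separated fields: a strain name that itself contains _ or . would be cut at that character.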
A complete demo script:
#!/bin/bash
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"
echo "${file%.*}" # outdir
echo "$genus"
echo "$species"
echo "$strain"
Output:
Genus_species_strain
Genus
species
strain
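As for the additional sed questions, here is a hedged sketch of how the same fields could be extracted with sed alone. In s/.*_//, the .* is greedy and runs up to the last underscore, which is why only strain.fasta remains; the trailing 1 flag just selects the first match and does not change that. Piping the result into a second sed cannot help, because by then there is no underscore left to match. Using [^_]* instead of .* keeps each match from crossing an underscore. The sed scripts are quoted so that the shell passes the backslashes through unchanged.

```shell
file='Genus_species_strain.fasta'

# Greedy: ".*_" matches everything up to the LAST underscore.
greedy=$(printf '%s\n' "$file" | sed 's/.*_//')

# "[^_]*" cannot cross an underscore, so each field is matched separately.
genus=$(printf '%s\n' "$file"   | sed 's/_.*//')
species=$(printf '%s\n' "$file" | sed 's/^[^_]*_\([^_]*\)_.*/\1/')
strain=$(printf '%s\n' "$file"  | sed 's/^[^_]*_[^_]*_\(.*\)\.fasta$/\1/')

printf '%s\n' "$greedy" "$genus" "$species" "$strain"
```

This starts one sed process per field, so the read-based split above is usually the lighter option inside a loop.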
Answered By - Wiktor Stribiżew
Answer Checked By - Marilyn (WPSolving Volunteer)