Issue
I would like to concatenate the contents of a set of files into a single new file. Each set can be identified by either a shared file name or a shared folder name. I see many examples of doing this when the files are in the same directory, but not when they are spread across subdirectories.
Input
my_project
+-- housecat
| +-- 1234
| | +-- 1234_contigs.fasta
| +-- 1290
| | +-- 1290_contigs.fasta
+-- jaguar
| +-- 1234
| | +-- 1234_contigs.fasta
| +-- 4567
| | +-- 4567_contigs.fasta
| +-- 9876
| | +-- 9876_contigs.fasta
+-- puma
| +-- 0987
| | +-- 0987_contigs.fasta
| +-- 1029
| | +-- 1029_contigs.fasta
| +-- 1234
| | +-- 1234_contigs.fasta
| +-- 4567
| | +-- 4567_contigs.fasta
An example of how one of the output files would be produced by hand:
mkdir -p concats/1234
cat housecat/1234/1234_contigs.fasta jaguar/1234/1234_contigs.fasta puma/1234/1234_contigs.fasta >> concats/1234/1234_concat.fasta
more concats/1234/1234_concat.fasta
minimally reproducible contents of housecat 1234
minimally reproducible contents of jaguar 1234
minimally reproducible contents of puma 1234
I would like this action to be performed for each of these groups of files, even if there is only one such file (e.g. 1029_contigs.fasta and 1290_contigs.fasta). I see that I can copy each of the files into a directory and concatenate from there, but I would like to avoid that. Is this possible, or should I just continue along the path of renaming the files, placing them into the same folder, and combining them there?
DESIRED OUTPUT (not showing contents of all sub directories)
my_project
+-- concats
| +-- 0987/0987_concat.fasta
| +-- 1029/1029_concat.fasta
| +-- 1234/1234_concat.fasta
| +-- 4567/4567_concat.fasta
| +-- 1290/1290_concat.fasta
+-- jaguar
+-- housecat
+-- puma
What I have so far:
FILENAME=$(find . -print | grep -E '[0-9]{3,4}_contigs.fasta') # needed because many non-target files are present. I can move this into the script later; I just do not want to focus too much on it.
for i in $FILENAME; do
    FILE=$(basename "$i" | sed 's/_contigs//g')
    DIR=concats/${FILE%.*}
    ORGANISM=$(echo $i | cut -d/ -f 2)
    mkdir -p -- "$DIR"
    cp $i "${DIR}/${ORGANISM}_${FILE}" # rename the files here
done
for d in concats/*/ ; do
    LOCI=$(echo $d | cut -d/ -f 2)
    echo $d* > ${d}${LOCI}_concat.fasta
done
I was wondering whether, before I run the second loop, I could use a cat-like command to combine these files, or whether I need to move them to the destination and then combine them. Mostly I am just curious whether I can avoid copying the files.
Solution
A solution in plain bash would be:
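cd /path/to/my_project || exit
# Matches organism/id/id_contigs.fasta; globs expand in sorted order
for src in */*/*_contigs.fasta; do
    # Split the path on "/": discard the organism, keep the id directory and file name
    IFS=/ read -r _ dir file <<< "$src"
    mkdir -p "concats/$dir"
    # Append in place, so no intermediate copy is needed; 1234_contigs.fasta becomes 1234_concat.fasta
    cat "$src" >> "concats/$dir/${file/contigs/concat}"
done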
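Two things to watch with the loop above: it appends with >>, so running it a second time duplicates every record, and the glob picks up any second-level directory, not only the numeric ones. Below is a minimal sketch of a rerun-safe variant, not part of the original answer. The function name concat_contigs is hypothetical, and the sketch assumes that anything matching concats/*/*_concat.fasta is generated output that can be deleted, and that the IDs are always 3 or 4 digits as in the question's grep pattern.

```shell
#!/usr/bin/env bash
# Sketch: group every <organism>/<id>/<id>_contigs.fasta by id and concatenate
# each group into concats/<id>/<id>_concat.fasta without copying any file.
# concat_contigs is a hypothetical helper name chosen for this example.
concat_contigs() (
    # The body runs in a subshell, so cd and shopt do not leak to the caller.
    cd "${1:-.}" || exit 1
    shopt -s nullglob   # an unmatched glob expands to nothing

    # Assumption: these files are generated output only. Removing them first
    # keeps a rerun from appending the same records twice.
    rm -f concats/*/*_concat.fasta

    # Globs expand in sorted order, so organisms are appended alphabetically.
    for src in */*/*_contigs.fasta; do
        # Split the path on "/": organism (discarded), id directory, file name.
        IFS=/ read -r _ dir file <<< "$src"
        # Assumption: ids are 3 or 4 digits; skip any other directory.
        [[ $dir =~ ^[0-9]{3,4}$ ]] || continue
        mkdir -p "concats/$dir"
        cat "$src" >> "concats/$dir/${file/contigs/concat}"
    done
)
```

It would be called as concat_contigs /path/to/my_project. An ID that occurs under only one organism (such as 1029 or 1290) still gets its own concats/<id>/ directory containing that single file's contents.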
Answered By - M. Nejat Aydin