Issue
Let's say I have a file with patterns to match in another file:
file_names.txt
pfg022G
pfg022T
pfg068T
pfg130T
pfg181G
pfg181T
pfg424G
pfg424T
I would like to take the names in file_names.txt and apply them with a sed command to example.conf:
example.conf
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022G.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022G",
"base_file_name": "pfg022G.GRCh38DH.target",
"final_gvcf_base_name": "pfg022G.GRCh38DH.target"
},
The sed command would replace pfg022G in example.conf with pfg022T, which is the next item in file_names.txt (sed s/pfg022G/pfg022T/). At this point example.conf should look like this:
{
"ExomeGermlineSingleSample.sample_and_unmapped_bams": {
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022T.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022T",
"base_file_name": "pfg022T.GRCh38DH.target",
"final_gvcf_base_name": "pfg022T.GRCh38DH.target"
},
After 15 minutes, pfg022T should be replaced with pfg068T, and so on until all the items in file_names.txt are exhausted.
Solution
The following crontab would run your script every 15 minutes:
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7)
# | | | | |
# * * * * * command to be executed
*/15 * * * * /path/to/script
where the script reads:
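#!/usr/bin/env sh
# cron does not start in your project directory; replace these with absolute paths
file1="file_names.txt"
file2="example.conf"
# build one s/previous/next/g command per entry, reverse the list, apply it in place
sed -i -e "$(awk '(NR>1){print "s/"p"/"$1"/g"}{p=$1}' "$file1" | tac)" "$file2"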
The trick here is to apply the substitutions in reverse order. The file example.conf always contains exactly one of the strings listed in file_names.txt, so if you apply the substitutions from the last entry to the first, only a single one takes effect.
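The approach relies on example.conf holding exactly one name from file_names.txt. A minimal sketch of how that assumption could be checked before running the substitution is shown here; the scratch directory and the shortened sample files are made up for illustration, and `grep -o` is assumed to be available (it is in GNU and BSD grep):

```shell
#!/usr/bin/env sh
# Work in a throwaway directory so no real files are touched (illustrative setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' pfg022G pfg022T pfg068T > file_names.txt
cat > example.conf <<'EOF'
"sample_name": "pfg022T",
"base_file_name": "pfg022T.GRCh38DH.target",
EOF

# List the distinct names from file_names.txt that occur in example.conf.
current=$(grep -o -F -f file_names.txt example.conf | sort -u)
# Count the non-empty lines of that list.
count=$(printf '%s\n' "$current" | grep -c .)

if [ "$count" -eq 1 ]; then
    printf 'current sample: %s\n' "$current"
else
    printf 'expected exactly one sample name, found %s\n' "$count" >&2
fi
```

If the count is not 1, the reversed substitution list no longer guarantees a single step forward, so it is safer to stop and inspect the file.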
We use awk to build a sed script and tac to reverse it, so that only one substitution matches. Without tac, the awk command produces:
$ awk '(NR>1){print "s/"p"/"$1"/g"}{p=$1}' file_names.txt
s/pfg022G/pfg022T/g
s/pfg022T/pfg068T/g
s/pfg068T/pfg130T/g
s/pfg130T/pfg181G/g
s/pfg181G/pfg181T/g
s/pfg181T/pfg424G/g
s/pfg424G/pfg424T/g
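If we ran sed with the above script as is, we would always end up with pfg424T (the last entry). Assume example.conf currently holds the third entry, pfg068T: sed finds its first match at s/pfg068T/pfg130T/g, and because the result of each command is the pattern of the next one, every later substitution matches as well. When we reverse the order (using tac), the pattern that could match the new value has already been tried, so sed performs only a single substitution.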
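The difference between the two orderings can be seen in a small experiment. This is only a sketch: it uses a shortened name list and a one-line example.conf in a scratch directory (both made up for the demo), and it assumes tac from GNU coreutils is installed:

```shell
#!/usr/bin/env sh
# Work in a throwaway directory so no real files are touched (illustrative setup).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' pfg022G pfg022T pfg068T pfg130T > file_names.txt
printf '"sample_name": "pfg022T",\n' > example.conf

# Build the chained sed script: one s/previous/current/g line per entry.
script=$(awk '(NR>1){print "s/"p"/"$1"/g"}{p=$1}' file_names.txt)

# Forward order: each result feeds the next command, so we end on the last name.
forward=$(sed -e "$script" example.conf)

# Reversed order: only one command can match, so we advance by a single entry.
reversed=$(sed -e "$(printf '%s\n' "$script" | tac)" example.conf)

printf 'forward:  %s\n' "$forward"    # "sample_name": "pfg130T",
printf 'reversed: %s\n' "$reversed"   # "sample_name": "pfg068T",
```

Neither sed call uses -i here, so example.conf stays unchanged and both orderings start from the same input.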
Answered By - kvantour Answer Checked By - Timothy Miller (WPSolving Admin)