Issue
So I have a file with many occurrences of the same string spread across thousands of lines. For simplicity's sake, my demo file reads as follows:
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
I would like to replace each occurrence so that the file reads:
123 Fragment-cnum0001 Energy
alpha-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0002
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0003
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0004
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0005
XXX
sss
I know I could do specific replacements of each line with
sed 's/0001/0002/2' file
But I was hoping that a loop would do the job instead. Though probably very slow, my original thought was:
for k in *.txt; do
    x=0                                  #reset the number of occurrences to zero
    tconf=$(grep -c "cnum0001\n" $k)     #find the total number of occurrences
    while $(( (($x + 0)) <= (($tconf + 0)) )); do   #while the number of occurrences is less than the total number
        x=$(($x + 1))                    #add one to the number of occurrences
        cnc=$(( printf %04d $x ))        #set it so that $cnc includes the necessary number of leading zeroes before $x, so if x=1, cnc=0001.
        cn=${prefixCFGi}-cnum${cnc}
        sed -i 's/Fragment-cnum0001/$cn/$x' $k   #This is the command I need help with. I want it to find the xth occurrence of Fragment-cnum0001 and replace it with $cn
    done   #loop through the txt file until $x=$tconf
done       #loop through all txt files
However, when I tried:
x=2;sed "s/0001/0002/$x" file
the output was exactly the same as the input. In this simple case it should have changed just the second occurrence of 0001 to 0002, but it did not. To me, this means that sed isn't understanding that x=2 and substituting it into the command accordingly.
I am writing this as a part of a much larger zsh script, but I am currently working in the terminal.
Notes that I have added because the answers I was getting were not fully addressing my question:
- I cannot use the line number as a counter (so code that says "do this every 4 lines" will not work). The number of lines between each occurrence is variable, as is the text between them. My actual file has over a hundred lines between each occurrence.
- I need to be able to specify that the found string must be on its own line, as I have occurrences that sit in the middle of larger strings on other lines that I do not want counted or replaced.
I am open to other commands, sed is just the one whose arrangement I am most familiar with.
Solution
While it would certainly be possible to use a bash loop to update the file, the repeated sed -i calls are going to be excessive (ie, the entire file has to be rewritten on each pass through the loop). Better performance is going to come from using a tool (eg, awk, perl, python) that's capable of making the (multiple) changes in a single pass through the file.
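For reference only (this is not part of the original answer), the single-pass idea could be sketched with a perl one-liner; the new prefix "alpha" is hard-coded purely for illustration, and the ^...$ anchors restrict the match to lines containing nothing but the fragment id:

# assumed single-pass, in-place sketch: every line consisting solely of
# "Fragment-cnum0001" is rewritten with an incrementing, zero-padded counter
perl -i -pe 's/^Fragment-cnum0001$/sprintf("alpha-cnum%04d", ++$n)/e' file1.txt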
Setup:
$ cat file1.txt
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
Fragment-cnum0001
XXX
sss
$ cat file2.txt
456 Fragment-cnum0001 Mining
Fragment-cnum0001
XXX
sss
456 Fragment-cnum0001 Mining
Fragment-cnum0001
XXX
sss
456 Fragment-cnum0001 Mining
Fragment-cnum0001
XXX
sss
One awk idea to replace OP's current while loop:
newpfx="alpha"
for k in *.txt
do
printf "\n############## $k\n"
awk -v pfx="Fragment,${newpfx}" ' # define old/new prefix strings
BEGIN { split(pfx,a,",") # a[1]==old prefix / a[2]==new prefix
oldid=a[1] "-cnum0001" # assumes always looking for string ending in "cnum0001"
newid=a[2] "-cnum"
}
$1==oldid { $1 = newid sprintf("%04d", ++sfx) } # if 1st field matches "oldid" then redefine 1st field; assumes no other fields on this line
1 # print current line
' "$k"
done
This generates:
############## file1.txt
123 Fragment-cnum0001 Energy
alpha-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0002
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0003
XXX
sss
############## file2.txt
456 Fragment-cnum0001 Mining
alpha-cnum0001
XXX
sss
456 Fragment-cnum0001 Mining
alpha-cnum0002
XXX
sss
456 Fragment-cnum0001 Mining
alpha-cnum0003
XXX
sss
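OP noted that the fragment id must be matched only when it sits on its own line. The $1==oldid test above fires whenever the id is the first field, so if any lines start with the id but carry trailing text that should be left alone, the awk call inside the loop could compare the whole line instead; a variant sketch (assumed, not part of the original answer):

awk -v pfx="Fragment,${newpfx}" '
BEGIN { split(pfx,a,",")
        oldid=a[1] "-cnum0001"
        newid=a[2] "-cnum"
      }
$0==oldid { $0 = newid sprintf("%04d", ++sfx) }      # replace only when the id is the entire line
1
' "$k"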
If using GNU awk (for -i inplace support) we can directly update the files, eg:
newpfx="alpha"
for k in *.txt
do
awk -i inplace -v pfx="Fragment,${newpfx}" '
BEGIN { split(pfx,a,",")
oldid=a[1] "-cnum0001"
newid=a[2] "-cnum"
}
$1==oldid { $1 = newid sprintf("%04d", ++sfx) }
1
' "$k"
done
This generates:
$ cat file1.txt
123 Fragment-cnum0001 Energy
alpha-cnum0001
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0002
XXX
sss
123 Fragment-cnum0001 Energy
alpha-cnum0003
XXX
sss
$ cat file2.txt
456 Fragment-cnum0001 Mining
alpha-cnum0001
XXX
sss
456 Fragment-cnum0001 Mining
alpha-cnum0002
XXX
sss
456 Fragment-cnum0001 Mining
alpha-cnum0003
XXX
sss
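Since -i inplace overwrites the original files, it may be worth keeping backups while testing. One simple approach (assumed here, not part of the original answer) is to snapshot the files before letting awk rewrite them; gawk's inplace extension also provides a backup-suffix variable (inplace::suffix in recent releases, INPLACE_SUFFIX in older ones), so check the local gawk documentation for the exact name.

# assumed safety net: keep an untouched .bak copy of each file before the in-place edit
for k in *.txt; do cp -- "$k" "$k.bak"; done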
We could go further and pull the for k in *.txt loop into our single awk call, eg:
awk -i inplace -v pfx="Fragment,${newpfx}" 'BEGIN ....' *.txt
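Expanded, that single call might look like the sketch below (assumed, building on the script above); note the added FNR==1 rule, which restarts the counter at the top of each input file so the per-file numbering matches the loop version:

awk -i inplace -v pfx="Fragment,${newpfx}" '
BEGIN     { split(pfx,a,",")
            oldid=a[1] "-cnum0001"
            newid=a[2] "-cnum"
          }
FNR==1    { sfx=0 }                                  # new file: restart the counter
$1==oldid { $1 = newid sprintf("%04d", ++sfx) }
1
' *.txt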
OP will need to decide if this will work in the real script.
OP has mentioned this code is nested within a couple of other loops; if those additional loops consist of making further modifications to these same files, then it may be possible to pull those other loops into the same awk script, which in turn would improve the overall performance of the main script.