Sunday, March 13, 2022

[SOLVED] Can head, sed and regex work together in one bash script?

Labels: bash, head, regex, sed

Issue

I have MyInitialTextFile.txt with these characteristics:

every line starts with this: <p><nsup></nsup> <b>
it is followed by an expression like this: Abc 1:2 or by 2Ab 1:2
always followed by: <sup>
followed by varied text afterwards.

<p><nsup></nsup> <b>Abc 1:2<sup>varied text

<p><nsup></nsup> <b>Abc 1:3<sup>varied text

<p><nsup></nsup> <b>Abc 1:4<sup>varied text

I need to:

Select the first line(s) from MyInitialTextFile.txt if they start the same (in my case, the first two lines), and then transfer them into TransitionalTextFile.txt. For this I used head in bash:

head -n 2 MyInitialTextFile.txt > TransitionalTextFile.txt

There I would manually apply a sequence of two regex find-and-replace operations to them. For regex I used:

Find1: (\n) #that is, find Line Feed (an enter on keyboard)

Replace1: "     " #that is, Replace with 5 empty spaces

Find2: (.*) #that is, select the entire string

Replace2: $1\n #that is, Replace with all selected (the entire string), and add a Line Feed at the end.
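The two find-and-replace passes above can also be done in one non-interactive command. The following is a minimal sketch, not the asker's method: it uses awk to put five spaces between lines and a single line feed at the end. The sample file contents and the function name join_lines are made up for illustration.

```shell
# Hypothetical sample standing in for TransitionalTextFile.txt.
tmp=$(mktemp)
printf '%s\n' 'first line' 'second line' > "$tmp"

# Join all lines of a file with five spaces and end with one line feed.
# join_lines is an illustrative name, not an existing tool.
join_lines() {
    awk 'NR > 1 { printf "     " } { printf "%s", $0 } END { if (NR) print "" }' "$1"
}

joined=$(join_lines "$tmp")
printf '%s\n' "$joined"
rm -f "$tmp"
```

Because the separator is printed before every line except the first, no trailing spaces are left at the end of the joined line.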

Append the content of TransitionalTextFile.txt to a text file named after the expression found in the first line, here Abc 1:2. For this I used:

head -n 1 TransitionalTextFile.txt >> 'Abc 1:2.txt'

This will always be -n 1 because, after the regex step, all the text becomes a single line, even if two lines were selected initially.

Delete from MyInitialTextFile.txt the number of lines that I transferred, which in my case was two lines. For this I used sed in bash:

sed -i '1,2d' MyInitialTextFile.txt
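The 2 in '1,2d' does not have to be typed by hand. The sketch below assumes every line has the form <p><nsup></nsup> <b>KEY<sup>text and treats everything before <sup> as the part that must "start the same". It counts the matching leading lines and deletes exactly that many. The sample data and the function name count_leading are hypothetical.

```shell
# Hypothetical sample data; the line format is an assumption.
work=$(mktemp)
printf '%s\n' \
    '<p><nsup></nsup> <b>Abc 1:2<sup>first text' \
    '<p><nsup></nsup> <b>Abc 1:2<sup>second text' \
    '<p><nsup></nsup> <b>Abc 1:3<sup>third text' > "$work"

# Count the leading lines whose part before <sup> equals that of line 1.
count_leading() {
    awk -F'<sup>' '
        NR == 1    { key = $1 }     # remember the key of the first line
        $1 != key  { exit }         # stop at the first line with another key
                   { n++ }          # still in the leading group
        END        { print n + 0 }  # prints 0 for an empty file
    ' "$1"
}

n=$(count_leading "$work")

# Delete that many lines. Writing to a temporary file and moving it back
# avoids depending on the in-place flag of a particular sed.
if [ "$n" -gt 0 ]; then
    sed "1,${n}d" "$work" > "$work.tmp" && mv "$work.tmp" "$work"
fi

remaining=$(cat "$work")
printf 'deleted %s line(s); left: %s\n' "$n" "$remaining"
rm -f "$work"
```

With GNU sed, the in-place form from the question, sed -i "1,${n}d", should behave the same way; note the double quotes, which let the shell expand the variable.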

The process then continues with the next line: <p><nsup></nsup> <b>Abc 1:3<sup>varied text

I got all four steps above to work manually, but my problem is how to combine them into one script. That is, to select the lines from an initial file and transfer them to another file, using regex to replace the line feed between them and add a line feed at the end, so that the result looks like this:

<p><nsup></nsup> <b>Abc 1:2<sup>varied text     <p><nsup></nsup> <b>Abc 1:2<sup>varied text

Finally, I have to delete these two lines from my initial file. I would appreciate any help bringing these four steps into one script. Thank you.
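For orientation, the four manual steps could be chained in a loop roughly as follows. This is a sketch under assumptions, not a tested solution for the real data: it assumes every line looks like <p><nsup></nsup> <b>KEY<sup>text, that KEY is a valid file name, and it runs on made-up sample lines in a temporary directory.

```shell
# Hypothetical working copy; the real file would be MyInitialTextFile.txt.
workdir=$(mktemp -d)
src="$workdir/MyInitialTextFile.txt"
printf '%s\n' \
    '<p><nsup></nsup> <b>Abc 1:2<sup>first text' \
    '<p><nsup></nsup> <b>Abc 1:2<sup>second text' \
    '<p><nsup></nsup> <b>Abc 1:3<sup>third text' > "$src"

while [ -s "$src" ]; do
    # Step 1: count the leading lines that share the part before <sup>.
    n=$(awk -F'<sup>' 'NR == 1 { key = $1 } $1 != key { exit } { n++ } END { print n + 0 }' "$src")

    # Target file name: the text between <b> and <sup> on the first line.
    name=$(head -n 1 "$src" | sed 's/.*<b>//; s/<sup>.*//')

    # Steps 2 and 3: join those lines with five spaces, end with one
    # line feed, and append the result to "<name>.txt".
    head -n "$n" "$src" |
        awk 'NR > 1 { printf "     " } { printf "%s", $0 } END { print "" }' \
        >> "$workdir/$name.txt"

    # Step 4: remove the processed lines from the initial file.
    sed "1,${n}d" "$src" > "$src.tmp" && mv "$src.tmp" "$src"
done

out_12=$(cat "$workdir/Abc 1:2.txt")
out_13=$(cat "$workdir/Abc 1:3.txt")
remaining=$(cat "$src")
printf '%s\n' "$out_12" "$out_13"
rm -rf "$workdir"
```

The loop ends when the initial file is empty, since each pass removes at least the first line. TransitionalTextFile.txt is not needed here because the pipe carries the selected lines directly into the join step.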

Solution

Like this (taking one for the team :)? Using awk (note: it creates files named like Abc 1:2, or whatever is between <b> and <sup>):

$ awk '
BEGIN {
    FS="<sup>"                 # split at this delimiter
}
{
    if($1==p) {                # if first part equals first part of previous split
        b=b "     " $0         # append to the output buffer
    }
    else {                     # if first part differs, do stuff
        if(NR>1) {             # first line needs not printing
            print b >> t[n]
            # close(t[n])     # uncomment if needed
        }
        n=split($1,t,/<b>/)    # get the changing part
        b=$0                   # reset buffer
    }
    p=$1                       # create previous to compare on next round
}
END {
    print b >> t[n]            # flush the rest of the buffer
}' file

Output of cat Abc\ 1\:2:

<p><nsup></nsup> <b>Abc 1:2<sup>varied text     <p><nsup></nsup> <b>Abc 1:2<sup>varied text

Depending on the awk flavor used, if you start running out of file descriptors, add a close(t[n]) after each print >> statement.
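To illustrate the close() advice in isolation, here is a small sketch on made-up two-column input (not the question's data). It appends to one file per key and closes the file after every write, so only one output descriptor is open at a time. The directory and file names are hypothetical.

```shell
# Hypothetical input: first column is the key, second is the value.
outdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'a 3' |
awk -v dir="$outdir" '{
    out = dir "/" $1 ".txt"   # one output file per key
    print $2 >> out           # append, so reopening later keeps old content
    close(out)                # release the descriptor right away
}'

a=$(cat "$outdir/a.txt")
b=$(cat "$outdir/b.txt")
printf 'a.txt:\n%s\nb.txt:\n%s\n' "$a" "$b"
rm -rf "$outdir"
```

This works with close() because >> reopens the file in append mode; with > the file would be truncated each time it is reopened after a close.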

Answered By - James Brown

Answer Checked By - Mary Flores (WPSolving Volunteer)

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
