Monday, May 16, 2022

[SOLVED] Replace a pattern between lines

May 16, 2022 bash, sed, shell, unix

Issue

I am trying to replace a pattern that spans consecutive lines of a file.

Specifically, I would like to replace ,\n & with , &\n in several large files, which effectively moves the & symbol to the end of the previous line. This is very easy with Ctrl+H in an editor, but I found it difficult with sed.

So, the initial file is in the following form:

      A +,
   &  B -,
   &  C ),
   &  D +,
   &  E (,
   &  F *,
 # &  G -,
   &  H +,
   &  I (,
   &  J +,
      K ?,

The desired output is:

      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *, &
#  &  G -,
      H +, &
      I (, &
      J +,
      K ?,

Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:

sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt

sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt

but they fail if the symbol "#" is present in the file.

Is there a simpler way to replace the matched pattern, for example: sed -i 's/,\n &/, &\n /g' file

Thank you in advance!

Solution

If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as a single string, together with the multi-line mode of the substitute command (m flag). The m flag is not strictly needed, but it simplifies things a bit (with it, . and character classes such as [^#] do not match newlines):

$ sed -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *, &
 # &  G -,
      H +, &
      I (, &
      J +,
      K ?,

This corresponds to your textual specification and to the desired output for the example you show, but it is a bit complicated. Instead of processing lines that end with a newline character, it processes substrings that begin with a newline character (or at the beginning of the file) and end just before the next newline character. Let's call these "chunks".
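To see why -z matters before looking at the pattern itself, here is a minimal sketch on a made-up two-line input (not the file from the question), assuming GNU sed. Without -z, sed processes one line at a time, so \n in the pattern never matches. With -z, the newline is part of the string being searched. Note that & is escaped as \& in the replacement, because a bare & stands for the whole match.

```shell
# Hypothetical two-line input, used only for illustration.
input=$(printf 'a,\n & b')

# Without -z: each line is handled on its own, so ",\n &" never matches
# and the text comes out unchanged.
line_mode=$(printf '%s\n' "$input" | sed 's/,\n &/, \&\n /g')

# With -z (GNU sed): the whole input is one string, so the newline
# between the two lines can be matched and the & moved up.
whole_mode=$(printf '%s\n' "$input" | sed -z 's/,\n &/, \&\n /g')

printf '%s\n---\n%s\n' "$line_mode" "$whole_mode"
```

This simple form needs the exact spacing around & and knows nothing about the # lines, which is why the full command is more involved.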

We search for a sequence of chunks in the form AB*C where:

A is a chunk (possibly the first) not containing #. It is matched by (\`|\n)[^#]*, which means beginning-of-file-or-newline, followed by any number of characters except newline and #, followed by a comma.
B* is any number (including none) of chunks containing #. It is matched by \n.*#.* which means newline, followed by any number of characters except newline, followed by # and any number of characters except newline.
C is a chunk starting with a newline, followed by spaces and &. It is matched by \n[[:blank:]]*& which means newline, followed by any number of blanks and a &.

If we find such an AB*C sequence, we append a space and a & to the end of A, leave B* unchanged, and replace the first & in C with a space. We repeat until no such sequence is found.
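The need for the repetition can be checked with a small sketch that recreates the sample input and runs the substitution once, then with the :a ... ta loop. The file name sample.txt and the shell variable names are assumptions made for this illustration, and GNU sed is required. In a single pass the g flag cannot reuse text it has already consumed: a chunk whose & was just matched as C cannot also serve as A for the next match, so only about every other & is moved.

```shell
# Recreate the sample input from the question (sample.txt is an assumed name).
cat > sample.txt <<'EOF'
      A +,
   &  B -,
   &  C ),
   &  D +,
   &  E (,
   &  F *,
 # &  G -,
   &  H +,
   &  I (,
   &  J +,
      K ?,
EOF

# The substitution from the answer, stored once so both runs share it.
subst='s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm'

# One pass only: matches cannot overlap, so some leading & remain.
one_pass=$(sed -Ez "$subst" sample.txt)

# With the loop: repeat the substitution until it no longer matches.
looped=$(sed -Ez ":a;$subst;ta" sample.txt)

printf '%s\n\n%s\n' "$one_pass" "$looped"
```

After the single pass, lines such as C, E and H still start with &. The looped run picks those up in its second iteration and stops on the third, when nothing matches any more.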

Note: if the commas can be followed by blanks before the newline we must take them into account. If you want to keep them:

$ sed -Ez ':a;s/((\`|\n)[^#]*,[[:blank:]]*)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file

Otherwise, to discard them:

$ sed -Ez ':a;s/((\`|\n)[^#]*,)[[:blank:]]*((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
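The difference between the two variants is easier to see on a tiny input where the comma is followed by two blanks. The sketch below assumes GNU sed; the input and the variable names are made up for illustration. In the first variant the blanks are inside group \1 and so survive before the added " &". In the second they are matched outside any group and disappear.

```shell
# Hypothetical input: the comma on the first line is followed by two blanks.
input=$(printf 'x,  \n & y,')

# Variant 1: trailing blanks are part of \1, so they are kept.
keep='s/((\`|\n)[^#]*,[[:blank:]]*)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm'

# Variant 2: trailing blanks are matched outside the groups, so they are dropped.
drop='s/((\`|\n)[^#]*,)[[:blank:]]*((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm'

kept=$(printf '%s\n' "$input" | sed -Ez ":a;$keep;ta")
dropped=$(printf '%s\n' "$input" | sed -Ez ":a;$drop;ta")

# Bracket each result so the whitespace is visible.
printf '[%s]\n[%s]\n' "$kept" "$dropped"
```

With the first variant the line becomes "x," followed by three blanks and &. With the second it becomes "x, &".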

Answered By - Renaud Pacalet

Answer Checked By - Clifford M. (WPSolving Volunteer)

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
