Issue
I am trying to replace a pattern that spans two lines of a file. Specifically, I would like to replace ,\n & with , &\n in multiple large files. This moves the symbol & to the end of the previous line. It is very easy with Ctrl+H, but I found it difficult with sed.
So, the initial file is in the following form:
A +,
& B -,
& C ),
& D +,
& E (,
& F *,
# & G -,
& H +,
& I (,
& J +,
K ?,
The desired output is:
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# & G -,
H +, &
I (, &
J +,
K ?,
Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:
sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt
sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt
but they fail if the symbol "#" is present in the file.
Is there a simpler way to replace the matched pattern, something like:
sed -i 's/,\n &/, &\n /g' file
Thank you in advance!
Solution
If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as one single string, and the multi-line mode of the substitute command (m flag). The m flag is not absolutely needed but it simplifies things a bit (with it, . and bracket expressions such as [^#] do not match newlines):
$ sed -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# & G -,
H +, &
I (, &
J +,
K ?,
This corresponds to your textual specification and to your desired output for the example you show. But it is a bit complicated. Instead of processing lines that end with a newline character it processes sub-strings that begin with a newline character (or the beginning of the file) and end before the next newline character. Let's name these "chunks".
We search for a sequence of chunks in the form AB*C where:
- A is a chunk (possibly the first) not containing #. It is matched by (\`|\n)[^#]*, which means beginning-of-file-or-newline, followed by any number of characters except newline and #, followed by a comma.
- B* is any number (including none) of chunks containing #. It is matched by \n.*#.* which means newline, followed by any number of characters except newline, followed by # and any number of characters except newline.
- C is a chunk starting with a newline, followed by spaces and &. It is matched by \n[[:blank:]]*& which means newline, followed by any number of blanks and a &.
If we find such an AB*C sequence we add a space and a & at the end of A, we do not change B*, and we replace the first & in C by a space. And we repeat until no such sequence is found.
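To make the AB*C description concrete, here is a minimal sketch that wraps the command above in a shell function and runs it on a four-line sample. The function name move_amp and the sample input are assumptions made for illustration only; the sed script is the one shown above and requires GNU sed.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (needed for -z, the m flag and the
# beginning-of-buffer anchor). move_amp is a hypothetical helper name.
move_amp() {
    # Same script as in the answer: reads stdin, writes stdout.
    sed -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta'
}

# Sample input (hypothetical):
#   A  = "A +,"       chunk without "#", ending with a comma
#   B* = "# & G -,"   one chunk containing "#", left untouched
#   C  = "& H +,"     chunk starting with "&"
# "K ?," does not start with "&", so "H +," gets no trailing "&".
printf 'A +,\n# & G -,\n& H +,\nK ?,\n' | move_amp
```

With this input the first line should become "A +, &", the # line stays as it is, and the third line becomes "  H +," (the & is replaced by a space, so the line now starts with two spaces).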
Note: if the commas can be followed by blanks before the newline we must take them into account. If you want to keep them:
$ sed -Ez ':a;s/((\`|\n)[^#]*,[[:blank:]]*)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
Otherwise, to drop them:
$ sed -Ez ':a;s/((\`|\n)[^#]*,)[[:blank:]]*((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
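Since the question mentions several files and in-place editing, the following sketch shows one way the first command could be applied to many files at once. It assumes GNU sed, whose -i option edits each file in place and treats the files separately. The function name move_amp_inplace and the file names are hypothetical, and the demo works on throwaway copies in a temporary directory; it is a good idea to try it on copies before touching real data.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. move_amp_inplace is a hypothetical helper name.
move_amp_inplace() {
    # -i rewrites every file given as argument, one file at a time.
    sed -i -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' "$@"
}

# Demo on two throwaway files (hypothetical names and contents).
tmpdir=$(mktemp -d)
printf 'A +,\n& B -,\n' > "$tmpdir/one.txt"
printf 'C +,\n# & D -,\n& E +,\n' > "$tmpdir/two.txt"

move_amp_inplace "$tmpdir"/*.txt

# Show the rewritten files, then clean up.
cat "$tmpdir/one.txt" "$tmpdir/two.txt"
rm -rf "$tmpdir"
```

On real data the call would look like move_amp_inplace file1.txt file2.txt. GNU sed also accepts a suffix after -i (for example -i.bak) to keep a backup of each original file.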
Answered By - Renaud Pacalet Answer Checked By - Clifford M. (WPSolving Volunteer)