Issue
I'm cleaning up a few hundred files on a Windows machine, and one of the things I need to do is remove duplicate lines. An example file might look like this:
foo=false
bar=true
baz=false
baz=false
baz=false
While working with sed, I came across a site that showcased a one-liner that removes duplicate lines:
sed "$!N; /^\(.*\)\n\1$/!P; D" textfile.txt
So I went and plugged it into a command window to see if it works and the console window showed the duplicate lines removed. After that I plugged that line into my batch script to run it against my list of files that needed to be edited.
FOR /F %%a IN ('listfile.txt') DO (
sed "$!N; /^\(.*\)\n\1$/!P; D" %%a
)
But when I ran this against my test list of files it removed every line from the file except for one of the duplicate lines.
I'm not familiar enough with sed to know for sure what everything in that line is doing, but my test showed it doing what I wanted. So what gives? Am I missing something about the way sed works in a batch file?
Based on the comments I tried:
gawk "!a[$0]++" textfile.txt
and once again it works on the command line but not in the script. So there is definitely some issue with the way the batch file is running this command but I'm unable to figure out what that is.
Solution
After doing some more testing on the original sed statement, I found that it was getting hung up on the ! characters in the command. So I started digging along that route and found that EnableDelayedExpansion was causing each ! and everything between them to be removed, even within the sed statement.
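A minimal way to see the effect for yourself is to echo the command from a batch file with delayed expansion turned on. This is only a sketch for illustration; it assumes no environment variable happens to match the text between the two ! characters, in which case cmd replaces that whole span with nothing.

```bat
@echo off
rem Sketch: show what cmd does to the sed script when delayed expansion is on.
setlocal EnableDelayedExpansion

rem cmd treats everything from the first ! to the second ! as a variable
rem reference. That "variable" is (presumably) undefined, so the span is removed
rem before sed ever sees it.
echo sed "$!N; /^\(.*\)\n\1$/!P; D"

endlocal
```

Under that assumption, the echoed line should come out as roughly `sed "$P; D"`, a script that prints only the last line of the input. That matches the symptom in the question, where everything except one of the duplicate lines disappeared.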
So my options were to escape the ! or narrow the scope of the EnableDelayedExpansion. Since escaping didn't seem to be working, I just narrowed the scope to right around the specific variable that needed it, and the sed statement worked correctly after that.
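The narrowed scope might look something like the sketch below. It is not my exact script: the `name` variable and the `echo` line are hypothetical stand-ins for whatever part of your script really needs delayed expansion, and it assumes listfile.txt holds one file path per line.

```bat
@echo off
rem Keep delayed expansion OFF by default so ! is an ordinary character.
setlocal DisableDelayedExpansion

rem Assumption: listfile.txt contains one file path per line.
for /F "usebackq delims=" %%a in ("listfile.txt") do (
    rem Delayed expansion is off here, so both ! reach sed untouched.
    sed "$!N; /^\(.*\)\n\1$/!P; D" "%%a"

    rem Hypothetical example of a variable that needs delayed expansion.
    set "name=%%~nxa"

    rem Turn delayed expansion on only around the line that needs it.
    setlocal EnableDelayedExpansion
    echo Processed !name!
    endlocal
)

endlocal
```

Each `setlocal` inside the loop is paired with an `endlocal`, so the sed call on the next iteration runs with delayed expansion off again.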
Answered By - Matthew Green Answer Checked By - David Marino (WPSolving Volunteer)