Sunday, January 28, 2024

[SOLVED] sed tool unable to convert CRLF to LF in Windows PowerShell bash

Issue

I have a file with CRLF line endings, which I need to convert to LF. This sed command can do that: sed -E -i "s/\r\n/\n/" file.txt.

I have sed installed on Windows 11 via MSYS2. If I execute the above sed command in PowerShell, it runs successfully and converts CRLF to LF.

However, if I execute the same sed command inside bash launched from PowerShell, it completes without printing anything to the console, but it does not convert CRLF to LF.

If instead I run a sed command that replaces word characters, it works correctly both in PowerShell and in bash. So I think this has something to do with how bash handles CRLF.

So how can I convert CRLF to LF using sed in bash launched from PowerShell?


Solution

Use the following command in your WSL bash session to convert a file in place from \r\n (CRLF, Windows-format) to \n (LF, Unix-format) newlines (line endings):[1]

sed -i 's/\r$//' file.txt
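A minimal way to try this out, assuming GNU sed and coreutils as found in a typical WSL distribution; demo.txt is a hypothetical scratch file name, not something from the question:

```shell
#!/usr/bin/env bash
set -eu

# Create a scratch file with two CRLF-terminated lines (hypothetical name).
printf 'one\r\ntwo\r\n' > demo.txt

# Strip the trailing CR from every line, in place.
sed -i 's/\r$//' demo.txt

# Dump the bytes: only \n line endings should remain, no \r.
od -c demo.txt
```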

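Alternatively, if you don't mind reading the file as a whole, use GNU sed's -z option, which makes sed split its input on NUL characters rather than newlines (so a typical text file is processed as a single chunk that still contains its \n characters), together with the s command's g flag to replace all matches: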

sed -i -z 's/\r\n/\n/g' file.txt
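A sketch of the -z variant under the same assumptions (GNU sed, hypothetical scratch file demo-z.txt). Because the file contains no NUL bytes, sed sees it as one chunk with its \n characters intact, which is why \r\n can match here at all:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch file with CRLF line endings.
printf 'one\r\ntwo\r\n' > demo-z.txt

# -z: split on NUL instead of \n, so the whole file is one "line";
# the g flag then replaces every \r\n sequence in that chunk.
sed -i -z 's/\r\n/\n/g' demo-z.txt

od -c demo-z.txt
```

Either form gives the same result on an ordinary CRLF file; the -z form just holds the whole file in memory at once.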

Your problem was unrelated to PowerShell; instead, there was a problem with the logic of your sed command:

  • sed reads input line by line by default, and removes the trailing newline from each.

    • Therefore, no line that a sed script operates on contains \n by default, so a substitution (s command) trying to match \r\n is invariably a no-op.
  • Because sed considers only \n (LF) a newline:

    • Those lines originally terminated with \r\n will end in \r (CR) when a sed operation is performed on them.
  • s/\r$// therefore removes that trailing \r from each line, and - on writing the modified lines back to the file - terminates each line with just \n, so that in effect only \n (LF) newlines remain.
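The reasoning above can be checked side by side. This sketch assumes GNU sed on Linux/WSL (not the MSYS build); crlf.txt, noop.txt and fixed.txt are hypothetical scratch files:

```shell
#!/usr/bin/env bash
set -eu

# Scratch input with two CRLF-terminated lines (hypothetical file name).
printf 'one\r\ntwo\r\n' > crlf.txt

# The command from the question: sed hands the script "one\r" and "two\r"
# (the \n has already been stripped), so \r\n never matches.
sed -E 's/\r\n/\n/' crlf.txt > noop.txt

# The fix: remove the \r that is left at the end of each line.
sed 's/\r$//' crlf.txt > fixed.txt

if cmp -s crlf.txt noop.txt; then
  echo "original substitution: output identical to input (no-op)"
fi
if cmp -s fixed.txt <(printf 'one\ntwo\n'); then
  echo "trailing-CR substitution: output has LF-only line endings"
fi
```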


Note:

  • The only reason why sed -E -i "s/\r\n/\n/" file.txt worked for you is that the sed implementation that comes with MSYS has special behavior for Windows:

    • On reading files, it recognizes both \n (LF) and \r\n (CRLF) as newlines.

    • On writing files, it uses only \n (LF).

    • As such, it converts CRLF sequences to LF by default, so the following command, which passes an effectively empty script (a single space), would suffice:

      • sed -i ' ' file.txt

        • Note: sed -i '' file.txt should work, i.e. passing an empty string as the script argument, but it doesn't in PowerShell up to v7.2.x, due to a long-standing bug that accidentally removes empty-string arguments when calling external programs - see GitHub issue #6280 for details.
          The underlying cause of this bug manifests more prominently in the inability to pass arguments with embedded " characters to external programs - see this answer.
    • For the reasons explained above, s/\r\n/\n/ actually has no effect - in any sed implementation. In other words: it was simply the default behavior of the MSYS sed implementation that happened to perform the desired conversion for you.
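To see which behavior the sed on your PATH has, a rough probe like the following may help. It is a sketch, run from bash, that assumes GNU coreutils (mktemp, tr, wc) are available; the MSYS result is inferred from the description in the note above rather than tested here:

```shell
#!/usr/bin/env bash
set -eu

# Scratch file with a single CRLF-terminated line.
probe=$(mktemp)
printf 'x\r\n' > "$probe"

# Run the effectively empty script (a lone space) in place.
sed -i ' ' "$probe"

# Count how many CR bytes survived the round trip.
crs=$(tr -cd '\r' < "$probe" | wc -c)
rm -f "$probe"

if [ "$crs" -eq 0 ]; then
  echo "CR removed: this sed converts CRLF to LF by itself (MSYS-like)"
else
  echo "CR kept: this sed leaves line endings alone (e.g. GNU sed under WSL)"
fi
```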


[1] To perform the opposite conversion, from LF-only to CRLF newlines, use sed -i 's/$/\r/' file.txt, as jdweng notes.
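A quick check of the footnote's command, assuming GNU sed (which understands \r in the replacement text); unix.txt is a hypothetical scratch file:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch file with LF-only line endings.
printf 'one\ntwo\n' > unix.txt

# Append a CR at the end of every line; sed then writes the \n back,
# so each line ends up terminated by \r\n.
sed -i 's/$/\r/' unix.txt

od -c unix.txt
```

Note that running it on a file that already has CRLF endings adds a second CR to each line, so it is only safe on LF-only input.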



Answered By - mklement0
Answer Checked By - Senaida (WPSolving Volunteer)