Issue
I'm quite new to AWK, so apologies for the basic question. I've found many references for removing Windows line-ending characters from files, but none that first match a regular expression and then remove the Windows line-ending characters.
I have a file named infile.txt that contains a line like so:
...
DATAFILE data5v.dat
...
Within a shell script I want to capture the filename argument data5v.dat from this infile.txt and remove any carriage return character, \r, if present. The carriage return may not always be present, so I have to match a word and then remove the \r.
I have tried the following but it is not working how I expect:
FILENAME=$(awk '/DATAFILE/ { print gsub("\r", "", $2) }' $INFILE)
Can I store the string returned from matching my regex /DATAFILE/ in a variable within my AWK statement to subsequently apply gsub?
Solution
File names can contain white space, including \rs, blanks and tabs, so to do this robustly you can't remove all \rs with gsub() and you can't rely on there being any single field, e.g. $2, that contains the whole file name.
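As for why the attempt in the question doesn't print a file name: gsub() modifies its target in place and returns the number of substitutions it made, not the modified string, so print gsub(...) prints a count. A minimal sketch of the difference, using made-up sample input piped in with printf. The second command is only safe when the file name contains no white space, so treat it as an illustration of the return value rather than the robust fix:

```shell
# Sample input is invented for illustration: one DATAFILE line with a CRLF ending.

# Printing gsub()'s return value prints how many replacements were made.
count=$(printf 'DATAFILE data5v.dat\r\n' |
  awk '/DATAFILE/ { print gsub(/\r/, "", $2) }')

# Calling gsub() first and printing the modified field afterwards prints the string.
name=$(printf 'DATAFILE data5v.dat\r\n' |
  awk '/DATAFILE/ { gsub(/\r/, "", $2); print $2 }')

printf 'count=%s name=%s\n' "$count" "$name"
```

This should print count=1 name=data5v.dat.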
If your input fields are tab-separated you need:
awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' file
or this otherwise:
awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' file
The above assumes your file names don't start with spaces and don't contain newlines.
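To get the result into a shell variable, as the question wants, the second command can be wrapped in a command substitution. This is a sketch under a few assumptions of mine: the sample file contents are invented, the variable is named datafile rather than FILENAME (to avoid confusion with awk's built-in FILENAME variable), and exit stops at the first matching line on the assumption that only one DATAFILE line matters:

```shell
# Hypothetical demo input; in a real script INFILE would name your existing file.
INFILE=$(mktemp)
printf 'HEADER x\r\nDATAFILE data5v.dat\r\nFOOTER y\r\n' > "$INFILE"

# Drop the keyword and the white space after it, then drop a trailing CR if present.
# "exit" is an added assumption: stop after the first matching line.
datafile=$(awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/, ""); sub(/\r$/, ""); print; exit }' "$INFILE")

# Brackets make any stray leading or trailing characters visible.
printf '[%s]\n' "$datafile"

rm -f "$INFILE"
```

With the sample input this should print [data5v.dat]. Quoting "$INFILE" keeps the script working if the input path itself contains blanks.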
To test any solution for robustness try:
printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '...' | cat -TEv
and make sure that the output looks like it does below:
$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$
$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$
Note the blank, ^M (CR), and ^I (tab) in the middle of the file name as they should be but no ^M at the end of the line.
If your version of cat doesn't support -T or -E then do whatever you normally do to look for non-printing chars, e.g. od -c or vi the output.
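For example, a sketch of the same check with od -c in place of cat -TEv, reusing the tab-separated test input from above. The column spacing of od output can vary between implementations, so look at the characters rather than the layout: \r and \t should appear between foo and bar, and \n should directly follow the final r with no \r in front of it:

```shell
# Same robustness test as above, but dumping the bytes with od -c.
out=$(printf 'DATAFILE\tfoo \r\tbar\r\n' |
  awk '/DATAFILE/ { sub(/[^\t]+\t/, ""); sub(/\r$/, ""); print }' |
  od -c)

printf '%s\n' "$out"
```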
Answered By - Ed Morton Answer Checked By - Candace Johnson (WPSolving Volunteer)