Issue
So I'm trying to do a simple regex replacement of a date in the metadata of a document using sed from bash. For example, suppose I have an input file test.md containing:
---
title: "I am a file"
date: December 1, 2021
---
Lorem ipsum blah blah blah
I'd like to be able to run a bash script on December 29 and get an output file:
---
title: "I am a file"
date: December 29, 2021
---
Lorem ipsum blah blah blah
So here's my first try:
#!/bin/bash
TODAY=$(date +'%B %d, %Y')
STARTBIT="date: "
FULLDATE="$STARTBIT$TODAY"
REGEX="s/date:\s.*\n/$FULLDATE/"
echo $REGEX # to make sure I'm getting what I think I'm getting
sed -e $REGEX < test.md > output.md
but I get the following output:
s/date:\s.*\n/date: December 29, 2021/
sed: 1: "s/date:\s.*\n/date:
": unescaped newline inside substitute pattern
So this is a bit confusing: the first line is my echoed pattern, and I definitely don't see any newlines in it on the command line. Nor am I quite sure where newlines would supposedly be??
So then I thought, OK, maybe a newline is appended to the end of one of the variables and is somehow made invisible by some bash silliness when I echo it. So, based on a prior SO answer, I went in and stripped newlines from the end of everything just to make sure. Viz.:
#!/bin/bash
TODAY=$(date +'%B %d, %Y')
STARTBIT="date: "
CLEANSTARTBIT=${STARTBIT%%[[:space:]]}
CLEANTODAY=${TODAY%%[[:space:]]}
FULLDATE="$STARTBIT$TODAY"
CLEANFULLDATE=${FULLDATE%%[[:space:]]}
REGEX="s/date:\s.*\n/$CLEANFULLDATE/"
CLEANREGEX=${REGEX%%[[:space:]]}
echo $CLEANREGEX
sed -e $CLEANREGEX < test.md > output.md
and I'm still getting exactly the same output. But now I'm really stumped. There can't possibly be newlines sneaking in here...
Help??
Bonus possible issues:
I'm using the version of sed that shipped with macOS. Heaven only knows what version. Maybe I should try getting my hands on GNU sed??
I don't really know what flavor of regex sed uses, or indeed how sed works at all... I basically just copied the regex over from one I've been using in a Python script forever, for learning purposes and because I'm sick of calling out to Python for this bit of basic text processing that I do all the time. Hah, but I actually know Python regex...
Solution
First problem: you need to double-quote your variable references (e.g. echo "$REGEX" instead of echo $REGEX). Without the double-quotes, the variable's value will be split into "words", and any words that look like filename wildcards will be expanded into a list of matching files. You almost never want either of these things to happen, so you should almost always double-quote variable references. In particular, this command:
sed -e $REGEX < test.md > output.md
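expands to something like:
sed -e s/date:\s.*\n/date: December 29, 2021/
...and "s/date:\s.*\n/date:", "December", "29,", and "2021/" are all treated as completely separate arguments to sed (only the first is taken as the -e script; the rest would be treated as input filenames). The error message is misleading: the newline it complains about is one that macOS's sed adds to the end of each -e script, so the real error is that the first argument is an incomplete sed command.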
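A quick way to see the word splitting for yourself is to hand the expansion to a tiny function that just reports how many arguments it received. This is a minimal sketch; count_args is a throwaway helper made up for the demo, not anything standard:

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

# Same value REGEX ends up with in the question (single quotes keep \s and \n literal).
regex='s/date:\s.*\n/date: December 29, 2021/'

unquoted=$(count_args $regex)   # split on spaces (and glob-expanded if any file matched)
quoted=$(count_args "$regex")   # passed through as a single argument

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"

# Print each word the unquoted expansion produces, one per line, in brackets.
printf '[%s]\n' $regex
```

With no matching files in the current directory, the unquoted version yields four arguments, exactly the four pieces sed was complaining about, while the quoted one yields one.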
(If you happened to have any files matching s/date:\s.*\n/date:, which is unlikely but technically possible, things would get even sillier.)
The second problem is that, as you guessed, your regex is in the wrong syntax dialect. The version of sed that comes with macOS doesn't support the \s shorthand, so use [[:space:]] instead. Also, \n can't match the end of the line in sed: sed strips each line's trailing newline before running the script on it, so there's no newline left to match. Use $ instead (escaping it with a backslash is the safe habit, since it's inside double quotes and you don't want it to trigger some expansion):
REGEX="s/date:[[:space:]].*\$/$FULLDATE/"
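Here is a small sketch of the corrected pattern applied to sample input shaped like test.md. It assumes a hard-coded date instead of $(date ...) so the result is predictable, feeds the input through a here-document instead of reading the file, and uses illustrative lower-case variable names:

```shell
#!/bin/bash
# Fixed replacement text so the output doesn't depend on today's date.
fulldate="date: December 29, 2021"
# \$ becomes a literal $ (end-of-line anchor) inside the double quotes.
regex="s/date:[[:space:]].*\$/$fulldate/"

# Run sed on sample front matter; note "$regex" is quoted.
result=$(sed -e "$regex" <<'EOF'
---
title: "I am a file"
date: December 1, 2021
---
Lorem ipsum blah blah blah
EOF
)

printf '%s\n' "$result"
```

Only the date line should change; the title, the --- delimiters, and the body pass through untouched.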
Technically, you don't need the $ either. Regex matching is greedy, so if .* can match to the end of the line (and it can), it will match to the end of the line.
But it'd be a good idea to add ^ at the beginning of the pattern, to anchor it to the beginning of a line. Otherwise, it'll match "date: " anywhere in a line.
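To see what the anchor buys you, here's a sketch run on made-up input where "date: " also appears in the middle of a line (the "last update:" line is invented for the demo). It also leaves off the trailing $, relying on greedy matching as described above:

```shell
#!/bin/bash
# Two lines: a real date field, and a line that merely contains "date: ".
input='date: December 1, 2021
last update: pending'
new="date: December 29, 2021"

# Without ^, the match can start anywhere in the line (here, inside "update").
unanchored=$(printf '%s\n' "$input" | sed -e "s/date:[[:space:]].*/$new/")
# With ^, only lines that start with "date:" are rewritten.
anchored=$(printf '%s\n' "$input" | sed -e "s/^date:[[:space:]].*/$new/")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored version turns the second line into "last update: December 29, 2021"; the anchored one leaves it alone.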
Third, I'd recommend switching to lower- or mixed-case variable names. The shell and the system already use a bunch of all-caps names with special meanings (PATH, HOME, IFS, and so on), and if you accidentally reuse one of those it can have weird effects.
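Putting the fixes together, the script from the question might look something like this. It's a sketch that assumes the same test.md/output.md file names; it only writes the sample test.md if one doesn't already exist, and it sticks to POSIX bracket expressions, so it should behave the same with macOS sed and GNU sed:

```shell
#!/bin/bash
# Create the sample input from the question, unless a test.md already exists.
[ -e test.md ] || cat > test.md <<'EOF'
---
title: "I am a file"
date: December 1, 2021
---
Lorem ipsum blah blah blah
EOF

# Lower-case names avoid clashing with special all-caps variables.
today=$(date +'%B %d, %Y')
fulldate="date: $today"
# ^ anchors to line start; [[:space:]] replaces \s; greedy .* runs to end of line.
regex="s/^date:[[:space:]].*/$fulldate/"

echo "$regex"   # quoted, so this prints exactly what sed will receive
sed -e "$regex" < test.md > output.md
```

Note that %d zero-pads the day, so on December 1 this writes "December 01, 2021"; if that matters, GNU date's %-d would drop the padding, but that is a GNU extension, so check your date first.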
Final note: use shellcheck.net, which will point out a lot of common scripting mistakes (such as failing to double-quote).
Answered By - Gordon Davisson Answer Checked By - Terry (WPSolving Volunteer)