Issue
Disclaimer: this happens on macOS (Big Sur); more info about the context below.
I have to write (and almost have) a script that replaces image URLs in large text (XML) files with their Base64-encoded values.
The script should behave the same way with single filenames, patterns, or both, e.g.:
./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml
Note: it should properly handle files\ with\ spaces.xml
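As a side illustration (a minimal sketch; the helpers show_args and demo_glob and the file names are hypothetical): the calling shell expands pattern*.xml before the script even starts. Single names, patterns, or both therefore reach the script the same way, as one file name per argument, and a name with spaces stays one argument as long as nothing re-splits it later.

```shell
#!/bin/bash
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
  printf '%s args:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# Hypothetical demo: create sample files in a temporary directory and let
# the shell expand the patterns before show_args is called.
demo_glob() {
  local dir
  dir=$(mktemp -d) || return 1
  touch "$dir/single.xml" "$dir/pattern_123.xml" "$dir/pattern_456.xml" \
        "$dir/files with spaces.xml"
  (
    cd "$dir" || exit 1
    show_args single.xml pattern*.xml   # a plain name and a pattern mixed
    show_args *.xml                     # the spaced name stays one argument
  )
  rm -r "$dir"
}

demo_glob
```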
So I ended up with this script:
#!/bin/bash
#needed for `ls` command
IFS=$'\n'
ls -1 $* | xargs -I % sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' % | xargs -tI % sh -c 'sed -i "" "s@%@`curl -s % | base64`@" $0' "$*"
What it does: it lists all files with ls and pipes the list to xargs, which searches each file for URLs enclosed in tags (hence the > and < in the search expression; I also had to use sed because grep is limited on macOS). The resulting URLs are then piped again to an sh script that runs the sed search and replace, where the replacement is the big Base64 string.
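The first step (URL extraction) can be tried on its own. This is a sketch under a few assumptions: the helper name extract_urls and the sample file are hypothetical, and -E is used because it is accepted by both macOS (BSD) sed and GNU sed.

```shell
#!/bin/bash
# Hypothetical helper: print every http(s) URL that sits directly between
# '>' and '<'. -n suppresses normal output and the p flag prints only lines
# where the substitution matched. Because the leading .* is greedy, only the
# LAST such URL on a given line is printed.
extract_urls() {
  sed -nE 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@"
}

# Hypothetical sample file whose name contains spaces.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '<image>https://www.some.com/public/123456.jpg</image>' \
  '<title>no URL here</title>' > "$tmpdir/file with spaces.xml"

# Quoting the path keeps it a single argument despite the spaces.
extract_urls "$tmpdir/file with spaces.xml"

rm -r "$tmpdir"
```

Only the URL from the first line is printed; the second line has no URL between tags, so sed prints nothing for it.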
This works perfectly fine... but only for fileswithoutspaces.xml
I tried playing with $0 vs. $1, $* vs. $@, with or without double quotes, but to no avail.
I don't understand exactly how variable substitution (is that what it's called? I'm not a native English speaker and, above all, not a script writer at all; just a Java dev all day long...) works between xargs, sh, or even bash when the arguments are filenames.
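The part that usually trips people up is how $*, "$*" and "$@" expand. A small bash sketch, using a hypothetical show_args helper that prints each received argument in brackets:

```shell
#!/bin/bash
# Hypothetical helper: print each received argument on its own line, in
# brackets, so it is easy to see where one argument ends and the next begins.
show_args() {
  printf '[%s]\n' "$@"
}

demo() {
  echo '--- $* (unquoted: split again on IFS, then globbed)'
  show_args $*
  echo '--- "$*" (a single word, joined with the first character of IFS)'
  show_args "$*"
  echo '--- "$@" (one word per original argument)'
  show_args "$@"
}

# Default IFS (space, tab, newline) in this sketch.
demo 'files with spaces.xml' 'plain.xml'
```

Only "$@" keeps 'files with spaces.xml' as one argument; unquoted $* splits it into three words, and "$*" glues everything into a single word.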
The xargs -t option is there to let me see how the substitution works, and that's how I noticed that using a pattern works, but only if I keep the double quotes around the last $*; otherwise only the first file is searched and replaced. The output looks like this:
user@host % ./replace-encode pattern*.xml
sh -c sed -i "" "s@https://www.some.com/public/123456.jpg@`curl -s https://www.some.com/public/123456.jpg | base64`@" $0 pattern_123.xml
pattern_456.xml
Both pattern_123.xml and pattern_456.xml are handled here; with $* instead of "$*" at the end of the command, only pattern_123.xml is handled.
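That observation can be reproduced without sed or curl. In the question's script IFS=$'\n', so "$*" joins the file names with newlines (which is why the -t output shows them on separate lines) and hands that one string to sh as $0. The child sh uses the default IFS (space, tab, newline), so the unquoted $0 is split on newlines and on spaces alike: patterns work, names with spaces break. Without the quotes, each name becomes its own argument instead: the first lands in $0 and the others in $1, $2, and so on, which the sh script never reads. The file names below are hypothetical:

```shell
#!/bin/bash
# Same IFS as the question's script: split on newlines only.
IFS=$'\n'

# Hypothetical file names standing in for what the arguments expand to.
set -- 'pattern_123.xml' 'pattern_456.xml' 'file with spaces.xml'

# "$*" is ONE string with the names separated by newlines (first char of IFS);
# sh receives it as $0. The child sh uses its own default IFS, so the unquoted
# $0 is split on newlines AND spaces: the third name falls apart.
sh -c 'printf "[%s]\n" $0' "$*"
```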
So is there a simple way to "fix" this?
Thank you.
Note: macOS commands have some limitations (I know), but as this script is intended for non-technical users, I can't ask them to install (or have the IT team install on their behalf) alternative tools such as pcregrep or GNU grep (ggrep), as I've seen suggested many times...
Also: I don't intend to switch from xargs to for loops or the like because 1/ I don't have the time, and 2/ I might want to optimize the second step, where some URLs may be duplicated.
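On the duplicate-URL point, one low-effort option that keeps the xargs pipeline (an assumption about what "optimize" would mean here) is to put sort -u between the extraction sed and the xargs step, so each URL is downloaded and encoded only once. The plan_fetches helper and the URLs are hypothetical, and echo stands in for the real curl/base64/sed command:

```shell
#!/bin/bash
# Hypothetical helper: read URLs (one per line) on stdin, drop exact
# duplicates with sort -u, and run one command per unique URL.
# echo is a stand-in for the real download-and-replace step.
plan_fetches() {
  sort -u | xargs -I% echo "would fetch %"
}

printf '%s\n' \
  'https://www.some.com/public/123456.jpg' \
  'https://www.some.com/public/789.jpg' \
  'https://www.some.com/public/123456.jpg' | plan_fetches
```

sort -u also reorders the URLs, which should not matter here since each URL is replaced independently.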
Solution
I finally ended up with this single-line script:
sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$@" | xargs -I% sh -c 'sed -i "" "s@%@`curl -s % | base64`@" "$@"' _ "$@"
It properly supports filenames with or without spaces.
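Why this works: after the script string, sh -c assigns the first operand to $0 (the _ is just a conventional placeholder) and the remaining operands to $1, $2, and so on; inside the single-quoted script, "$@" is expanded by sh and yields exactly one word per file name. A minimal sketch with hypothetical file names:

```shell
#!/bin/bash
# Hypothetical arguments, as the outer script would receive them.
set -- 'file with spaces.xml' 'plain.xml'

# "_" becomes $0 inside sh; the outer "$@" becomes sh's $1, $2, ...
# and the inner "$@" (single-quoted, so expanded by sh, not by bash)
# yields one word per file name.
sh -c 'printf "zero=[%s]\n" "$0"; printf "[%s]\n" "$@"' _ "$@"
```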
Answered By - maxxyme Answer Checked By - Timothy Miller (WPSolving Admin)