Issue
I need to recursively find all files with this HTML:
<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
<meta charset="utf-8">
<meta name="google" value="notranslate">
And replace it with this HTML:
<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
<meta charset="utf-8">
<meta name="google" value="notranslate">
<meta name="format-detection" content="telephone=no">
<meta name="format-detection" content="date=no">
<meta name="format-detection" content="address=no">
<meta name="format-detection" content="email=no">
This is my unsuccessful attempt, a grep command piped to sed:
grep --include="index.html" -PRwzl -e '<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>\n <meta charset="utf-8">\n <meta name="google" value="notranslate">\n' | xargs -i@ sed -i 's/<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>\n <meta charset="utf-8">\n <meta name="google" value="notranslate">\n/<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>\n <meta charset="utf-8">\n <meta name="google" value="notranslate">\n <meta name="google" value="notranslate">\n <meta name="format-detection" content="telephone=no">\n <meta name="format-detection" content="date=no">\n <meta name="format-detection" content="address=no">\n <meta name="format-detection" content="email=no">\n/g' @
The grep command alone works perfectly.
For clarity, here is the same command split across several lines:
grep --include="index.html" \
-PRwzl \
-e '<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
\n <meta charset="utf-8">
\n <meta name="google" value="notranslate">
\n' \
| xargs -i@ sed -i 's/<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
\n <meta charset="utf-8">
\n <meta name="google" value="notranslate">
\n
/<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
\n <meta charset="utf-8">
\n <meta name="google" value="notranslate">
\n <meta name="google" value="notranslate">
\n <meta name="format-detection" content="telephone=no">
\n <meta name="format-detection" content="date=no">
\n <meta name="format-detection" content="address=no">
\n <meta name="format-detection" content="email=no">
\n
/g' @
Solution
Your command is more complex than it needs to be. You can run sed directly on the file, without the grep and xargs in front of it. Typically, an in-place edit of a file with sed looks like:
sed -i 's/TO_FIND/REPLACE/' FILE.txt
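As a side note on why the attempt in the question matched nothing: sed reads its input one line at a time, so a pattern that contains \n never sees the newline it is looking for. The sketch below illustrates this on a two-line sample. The -z option used for comparison is specific to GNU sed, and the variable names and the "added" marker are made up for the demonstration; this is not the approach proposed in this answer.

```shell
#!/bin/bash
# Two-line sample standing in for the top of an HTML file (hypothetical data).
input='<head>
<meta charset="utf-8">'

# Default mode: sed sees "<head>" and "<meta ...>" as separate lines,
# so the \n in the pattern cannot match and the text is left unchanged.
line_mode=$(printf '%s\n' "$input" | sed 's#<head>\n<meta#<head>\n<!-- added -->\n<meta#')

# GNU sed -z: records are NUL-separated, so the whole input is one record
# and the \n in the pattern can match the real newline.
whole_file=$(printf '%s\n' "$input" | sed -z 's#<head>\n<meta#<head>\n<!-- added -->\n<meta#')

echo "default mode, lines containing the marker: $(printf '%s\n' "$line_mode" | grep -c 'added')"
echo "-z mode, lines containing the marker:      $(printf '%s\n' "$whole_file" | grep -c 'added')"
```

Matching a single line, as the script further down does with the closing head tag, avoids the problem entirely and does not depend on -z.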
Another comment: sed is not a great tool for editing HTML. See "RegEx match open tags except XHTML self-contained tags".
That being said, I propose this script to meet your requirement:
#!/bin/bash
#
# Walk every HTML file below the current directory. -print0 together with
# read -d '' keeps file names that contain spaces or newlines intact.
find . -type f -name "*.html" -print0 | while IFS= read -r -d '' file
do
    # Only touch files that carry the expected id and class values.
    if grep -q 'id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"' "$file"
    then
        # Add the content...
        echo "Adding in file $file"
        sed -i 's#</head># <meta name="format-detection" content="telephone=no">\n <meta name="format-detection" content="date=no">\n <meta name="format-detection" content="address=no">\n <meta name="format-detection" content="email=no">\n </head>#' "$file"
    else
        echo "Nothing to do on $file"
    fi
done
- Using find with while and read covers cases where you have HTML files in sub-directories.
- The grep has been highly simplified. If the id and class values are present, that is enough to identify valid files.
- Then in the sed, you can just add the new lines. Your sed replaced lines with these same lines.
- I used # as a separator in sed instead of / to avoid confusion with the HTML code.
- This is based on a file I created myself, since you did not provide a sample. You should provide samples in your questions.
- The order of tags within the <head> section is not relevant, so adding lines just before the closing </head> works.
- Obviously the else section is optional.
<opinion>I find this type of script easier to understand and debug in the future than long one-liners.</opinion>
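One thing the script does not address is being run twice: every run appends the four tags again. A minimal sketch of one possible guard follows, assuming GNU sed (for -i and \n in the replacement). The function name add_format_detection, the demo directory, and the "already patched" check are assumptions made for illustration and are not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the file was patched, 1 if it was skipped.
add_format_detection() {
    local file=$1
    # Skip files that do not carry the expected id and class values.
    if ! grep -q 'id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"' "$file"; then
        return 1
    fi
    # Assumed guard: skip files that already contain a format-detection tag.
    if grep -q 'name="format-detection"' "$file"; then
        return 1
    fi
    sed -i 's#</head>#<meta name="format-detection" content="telephone=no">\n<meta name="format-detection" content="date=no">\n<meta name="format-detection" content="address=no">\n<meta name="format-detection" content="email=no">\n</head>#' "$file"
}

# Demo on a throwaway file whose name contains a space.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '%s\n' \
    '<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>' \
    '<meta charset="utf-8">' \
    '</head>' \
    '</html>' > "$demo_dir/my index.html"

find "$demo_dir" -type f -name '*.html' -print0 | while IFS= read -r -d '' file
do
    add_format_detection "$file" && echo "Adding in file $file"
    # The second call finds the tags already present and leaves the file alone.
    add_format_detection "$file" || echo "Nothing to do on $file"
done
```

Wrapping the check and the edit in a function keeps the find loop short and makes the skip conditions easy to extend.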
Assuming that index.html is:
<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
<meta charset="utf-8">
<meta name="google" value="notranslate">
<title>TITRE</title>
</head>
<body>
<p>PARAGRAPH</p>
</body>
</html>
The result is:
<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
<meta charset="utf-8">
<meta name="google" value="notranslate">
<title>TITRE</title>
<meta name="format-detection" content="telephone=no">
<meta name="format-detection" content="date=no">
<meta name="format-detection" content="address=no">
<meta name="format-detection" content="email=no">
</head>
<body>
<p>PARAGRAPH</p>
</body>
</html>
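Before letting sed -i modify real files, it can help to preview the change. The sketch below recreates the sample index.html from above in a temporary directory, runs the same substitution without -i, and shows the difference with diff. The temporary directory and the variable names are assumptions made for the demonstration, and \n in the replacement again relies on GNU sed.

```shell
#!/bin/bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Recreate the sample file used in this answer.
cat > "$work/index.html" <<'EOF'
<html id="blx-5fb3c619e82a2863d6567c52-000000001" class="blx-5fb3c619e82a2863d6567c52"><head>
<meta charset="utf-8">
<meta name="google" value="notranslate">
<title>TITRE</title>
</head>
<body>
<p>PARAGRAPH</p>
</body>
</html>
EOF

# Same substitution as in the script, but without -i: the file stays untouched
# and the edited text is captured in a variable instead.
preview=$(sed 's#</head># <meta name="format-detection" content="telephone=no">\n <meta name="format-detection" content="date=no">\n <meta name="format-detection" content="address=no">\n <meta name="format-detection" content="email=no">\n </head>#' "$work/index.html")

# diff exits with status 1 when the inputs differ, which is expected here.
printf '%s\n' "$preview" | diff "$work/index.html" - || true

added=$(printf '%s\n' "$preview" | grep -c 'name="format-detection"')
echo "format-detection lines in the preview: $added"
```

Once the diff looks right, adding -i back applies the same edit in place.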
Answered By - Nic3500
Answer Checked By - Marilyn (WPSolving Volunteer)