Issue
I need to check whether any word inside an HTML comment also appears elsewhere on the same line. If it does, the comment should be deleted; otherwise, it should be kept.
At the same time, the script needs to ignore pronouns, adverbs, and articles. I already have a list of them, and it contains more than a hundred words, like this:
"the", "this", "I", "me", "you", "she", "her", "he", "him", "it", "they", "them", "that", "which", etc...
This is an example of one line:
text <!-- They are human # life --> text text <!-- the rights --> text the human text
After running the script:
text text text <!-- the rights --> text the human text
Summary:
- one line can contain several comments, not just one.
- the script needs to ignore the words in my list of pronouns, adverbs, etc.
- words that appear only inside other comments should not count as matches.
- matching is case-insensitive.
- the files have over one thousand lines.
- the comments usually contain the character # (I hope that is not a problem).
Solution
As others have mentioned, you should show some research and explain what you've tried and why it didn't work.
That being said, I found this to be a fun little challenge, so I decided to give it a go.
I assumed there are two files: "file.html", which we want to modify, and "words.txt", which lists the words to ignore separated by newlines (\n). This script should do the trick:
#!/bin/bash
FILE="file.html"
WORDS="words.txt"

# Split unquoted expansions on newlines only, so that each comment, word,
# or line is handled as one item:
IFS=$'\n'

# Find all distinct comments within the file:
comments="$(grep -oP '<!--[^<]+-->' "$FILE" | sort -u)"

for comment in $comments; do
    # Words In Comment. Gets all words in the comment (delimiters excluded).
    wic="$(echo "$comment" | grep -oP '[^\s]+' | grep -v '<' | grep -v '>')"
    words="$(cat "$WORDS")"
    # Filtered Words. It's $wic without any of the words in words.txt.
    # $words is listed twice so none of its entries can survive "uniq -u".
    # The expansions are left unquoted on purpose: one word per argument.
    fw="$(echo $wic $words $words | tr ' ' '\n' | sort | uniq -u)"
    # If any remain
    if [ -n "$fw" ]; then
        for word in $fw; do
            # Gets all lines with both the comment and the word outside the comment
            lines="$(grep -P "$comment.+$word|$word.+$comment" "$FILE")"
            # If it finds any
            if [ -n "$lines" ]; then
                for line in $lines; do
                    # Generate the replacement line
                    replace="$(printf '%s\n' "$line" | sed "s/$comment//g")"
                    # Replace the line with the replacement in the file.
                    # Caveat: $line and $comment are used as regular expressions,
                    # so this misbehaves if they contain "/" or regex metacharacters.
                    sed -i "s/$line/$replace/g" "$FILE"
                done
            fi
        done
    fi
done
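The filtering step (the `fw=` line) is the least obvious part of the script, so here it is in isolation. The ignore list is printed twice, which means none of its words can ever be unique, and `uniq -u` keeps only the lines that occur exactly once. The function name `set_difference` is made up for this example. Unlike the script, this sketch deduplicates the comment words first, because otherwise a word repeated inside the comment would be dropped as well:

```bash
#!/bin/bash
# set_difference A B: print the lines of A that do not occur in B.
# A and B are newline-separated word lists (hypothetical helper).
set_difference() {
    {
        printf '%s\n' "$1" | sort -u    # each word of A exactly once
        printf '%s\n' "$2" "$2"         # each word of B at least twice
    } | sort | uniq -u                  # keep only lines that occur once
}

set_difference 'human
rights
human
the' 'the
this'
# prints:
# human
# rights
```

The comparison is still case-sensitive, so "They" in a comment would not be filtered by "they" in words.txt unless both sides are lowercased first.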
The script is not perfect, but it gets the job done. I tested it on a file with the following contents:
text <!-- foo # --> foo
text <!-- bar # --> foo
text <!-- bar # --> bar
text <!-- bar # --> text <!-- something # --> something bar
text <!-- foo # --> text <!-- bar # --> text foo bar
Using the following words.txt:
foo
And got the expected result:
text <!-- foo # --> foo
text <!-- bar # --> foo
text bar
text text something bar
text <!-- foo # --> text text foo bar
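For reference, here is a rough sketch of a line-by-line variant that also covers the points from the question that the script above does not handle: it compares words case-insensitively, matches whole words only, and ignores words that appear only inside other comments. The function names `normalize`, `has_comment`, and `filter_line` are made up for this example. It assumes ASCII words and comments that start and end on the same line, and it removes one space after each deleted comment. Treat it as a starting point rather than a drop-in replacement:

```bash
#!/bin/bash
# normalize TEXT: lowercase it, turn every non-alphanumeric character
# (including "#") into a space, and pad with spaces so that a
# *" word "* pattern only matches whole words.
normalize() {
    printf ' %s ' "$1" | tr '[:upper:]' '[:lower:]' | tr -c '[:alnum:]' ' '
}

# has_comment TEXT: succeed if TEXT contains "<!--" followed by "-->".
has_comment() {
    case $1 in
        *'<!--'*'-->'*) return 0 ;;
    esac
    return 1
}

# filter_line LINE IGNORE: print LINE without the comments that share a
# non-ignored word with the text outside all comments.
filter_line() {
    local line="$1" ignore outside rest result before body w drop
    local IFS=' '
    ignore=$(normalize "$2")

    # Pass 1: collect the text that lies outside every comment.
    outside=''
    rest="$line"
    while has_comment "$rest"; do
        before=${rest%%'<!--'*}
        outside="$outside$before "
        rest=${rest#*'<!--'}
        rest=${rest#*'-->'}
    done
    outside=$(normalize "$outside$rest")

    # Pass 2: rebuild the line, dropping the comments that match.
    result=''
    rest="$line"
    while has_comment "$rest"; do
        before=${rest%%'<!--'*}
        rest=${rest#*'<!--'}
        body=${rest%%'-->'*}
        rest=${rest#*'-->'}
        result="$result$before"
        drop=0
        for w in $(normalize "$body"); do
            case $ignore in *" $w "*) continue ;; esac          # ignored word
            case $outside in *" $w "*) drop=1; break ;; esac    # found outside
        done
        if [ "$drop" -eq 1 ]; then
            rest=${rest# }    # swallow one space left behind by the comment
        else
            result="$result<!--$body-->"
        fi
    done
    printf '%s\n' "$result$rest"
}

filter_line 'text <!-- They are human # life --> text text <!-- the rights --> text the human text' 'the this I me they'
# prints: text text text <!-- the rights --> text the human text
```

To process a whole file under the same assumptions, something like `ignore=$(cat words.txt); while IFS= read -r line; do filter_line "$line" "$ignore"; done < file.html > file.new.html` should work. It writes to a new file instead of editing in place, so the original stays untouched until the result has been checked.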
Answered By - 3snoW Answer Checked By - Mildred Charles (WPSolving Admin)