Issue
I have a large directory of files, which I need to look through for specific lines, because they need to be updated.
The format I am looking for always starts with <topicref, followed by href="../ and then some text, for example: href="../example.md". After that, it might have scope="peer" and some other lines, and it ends with either > or />.
So far, I've come up with a regex that addresses finding the lines I want:
pcregrep -HnM '<topicref(.*) href="..\/(.*).dita(.*)[^>]*'
However, I'm having trouble filtering out the results that have scope="peer". I tried doing
pcregrep -HnM '<topicref(.*) href="..\/(.*).dita(.*)[^>]*' directory | pcregrep -Mv 'scope="peer"' > file
But this only removes the individual lines that contain scope="peer" from the output of the first pcregrep. The other lines of those elements stay in the results even though they shouldn't be included, and I am also unable to track which files the results are from.
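To make the problem concrete, here is a minimal Python sketch (not pcregrep itself). It assumes the second filter works line by line, as the behaviour described above suggests. The sample text and the helper name drop_peer_lines are made up for illustration.

```python
# Hypothetical output of the first search: one multi-line <topicref> element
# that SHOULD be excluded, because it carries scope="peer".
matched_block = '<topicref href="../cat.dita"\nscope="peer"\nsomething />'


def drop_peer_lines(text):
    """Mimic a line-oriented inverted filter: drop only the lines
    that contain scope="peer" and keep every other line."""
    return [line for line in text.splitlines() if 'scope="peer"' not in line]


leftover = drop_peer_lines(matched_block)

# The element was not removed as a whole; two orphaned lines survive.
for line in leftover:
    print(line)
```

The first and last lines of the element survive the filter, which is why the exclusion has to happen inside the regex that matches the whole element.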
Is it possible to see all the <topicref href="../... > mentions without scope="peer"?
Three examples of lines with scope="peer":
<topicref href="../cat.md" scope="peer"
something />
<topicref href="../cat.md"
something scope="peer"
something />
<topicref href="../cat.md"
scope="peer"
something></topicref><map>
Solution
You can use
pcregrep -HnM '<topicref(?![^>]*\sscope="peer")(?:\s[^>]+)?\shref="\.\./([^"]*)\.dita[^>]*>' file
Details
<topicref - a literal string
(?![^>]*\sscope="peer") - no whitespace + scope="peer" allowed after any zero or more chars other than > immediately to the right of the current position
(?:\s[^>]+)? - an optional whitespace, one or more chars other than >
\shref="\.\./ - whitespace, href="../ string
([^"]*) - Group 1: zero or more chars other than "
\.dita - .dita string (replace with \.md if you need to match .md)
[^>]*> - zero or more chars other than > and then a >.
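The pattern can also be checked outside pcregrep. The following is a minimal sketch using Python's re module, which accepts this pattern unchanged. The sample strings are adapted from the question, with .dita in place of .md so that they fit the pattern as written; the non-peer samples and the names TOPICREF_NO_PEER and find_targets are assumptions made for illustration.

```python
import re

# Same pattern as in the pcregrep command above.
TOPICREF_NO_PEER = re.compile(
    r'<topicref(?![^>]*\sscope="peer")(?:\s[^>]+)?\shref="\.\./([^"]*)\.dita[^>]*>'
)


def find_targets(text):
    """Return the Group 1 values (href target without the ../ prefix and
    the .dita extension) of every matching <topicref ...> in text."""
    return TOPICREF_NO_PEER.findall(text)


# The three scope="peer" shapes from the question: none should match.
peer_samples = [
    '<topicref href="../cat.dita" scope="peer"\nsomething />',
    '<topicref href="../cat.dita"\nsomething scope="peer"\nsomething />',
    '<topicref href="../cat.dita"\nscope="peer"\nsomething></topicref><map>',
]

# Hypothetical elements without scope="peer": these should match.
plain_samples = [
    '<topicref href="../dog.dita"\nsomething />',
    '<topicref navtitle="x" href="../bird.dita">',
]

for sample in peer_samples + plain_samples:
    print(repr(sample), "->", find_targets(sample))
```

Because the lookahead only scans up to the next >, a scope="peer" in a later element does not suppress a match for an earlier one.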
Answered By - Wiktor Stribiżew Answer Checked By - Senaida (WPSolving Volunteer)