Issue
I've seen the Stack Overflow question "How to use regex for multiple line pattern in shell script", but it doesn't do exactly what I want. I'm looking for a terminal-based way of doing an in-place sed (or perl) regex substitution that will automatically change some files for me. (I can probably do it with XML libraries etc., but I would prefer to use the terminal.)
The file I have
Some text
<div class="firstClass secondClass" something="else">
Some random stuff
</div>
Random Text
<div class="thirdClass fifthClass" something="else">
Some random stuff
< is something
< but not /> This
</div>
<div class="fourthClass">
Some random stuff
</div>
Final Text
I tried to make the example arbitrary enough to show a few different use cases. I'm trying to convert it into something like the following:
Some text
<!-- firstClass start -->
Some random stuff
<!-- firstClass end -->
Random Text
<!-- thirdClass start -->
Some random stuff
< is something
< but not /> This
<!-- thirdClass end -->
<!-- fourthClass start -->
Some random stuff
<!-- fourthClass end -->
Final Text
I am trying the following code:
sed -n '/<div class="\([^ "]*\)[^>]*>/,/<\/div>/{s/<div class="\([^ "]*\)[^>]*>/<!-- \1 start -->/;/<\/div>/d;p}' file
but since in the previous stackoverflow question the person didn't want the final line, the answers deleted it, which is not what I want. As can be seen, I want that first text repeated before and after the inside contents.
The regex above properly fixes the first line (it changes the div to a comment), but I can't seem to replicate that below the text. I tried adjusting the regex, but I can't get it to work. It also cuts out the very first line and the last lines, although I'd like to keep them. Any ideas how to do something like this?
(PS: yes, I know we need sed -i for an in-place command, but I want to test it out before I actually run it, for obvious reasons.)
Edit: A little addendum as to the idea of what I'm trying to do. Although the above is HTML, this code is not necessarily exclusively for HTML (hence why I don't want HTML/XML processing). The idea is:
Some random text before my pattern
PATTERN "info ...
random stuffs
END PATTERN
Some random stuff after pattern
I'd like this to be converted to
Some random text before my pattern
NEW PATTERN - info
random stuffs
END NEW PATTERN - info
Some random stuff after pattern
So no HTML necessarily. Just something that takes a pattern above some text and replicates it below. The only condition is that random stuffs will not contain the text END PATTERN, so that's what I want to base it on. random stuffs will 100% never contain the END PATTERN text. There's no nesting involved, nor any edge cases. It's always the same pattern as shown above. The only "edge" case is that the first line, PATTERN "info ..., might have some extra text up until a line break, which I don't care about. That can always be deleted. I only care about the word info (i.e. up until the first space character or first " character).
Solution
For starters, here is a simple take that works in my tests on the particular posted text:
s{<div\s+ class="([^\s"]+) [^>]*> (.*?) </div>}{<!-- $1 start -->$2<!-- $1 end -->}sxg;
The first capture takes the class name up to the first space or double quote, [^>]*> skips the rest of the opening tag, and the second capture lazily takes everything up to the nearest </div>.
The modifiers are: /s, so that . matches a newline as well (normally it doesn't); /x, so that literal spaces in the pattern are ignored, which helps readability; and /g, so that this keeps going through the string, matching and substituting.
I'd recommend a program in a file for this, not a command-line one ("one-liner"), but since that was specifically asked for in the question, here it is:
perl -0777 -wpe'
s{<div\s+ class="([^\s"]+) [^>]*> (.*?) </div>}{<!-- $1 start -->$2<!-- $1 end -->}sxg' file
The -0777 switch makes it read the entire file into the $_ variable, which is the default for many things in Perl (the substitution operator s{}{} in this case). See switches in perlrun.
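In a larger and more structured program you could perhaps have the beginning and end patterns in variables, for
s{$pbeg (.*?) $pend}{...}sxg
where for this case it would be
my $pbeg = qr{<div\s+ class="([^\s"]+) [^>]*>}x;
my $pend = qr{</div>};
Note the /x on the qr{} itself: a compiled pattern keeps its own modifiers, so the /x of the enclosing s{}{} does not apply inside it.
However, this could turn unwieldy if those patterns get complex.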
Answered By - zdim Answer Checked By - Robin (WPSolving Admin)