Issue
Context: I have a CSV file containing a products export from PrestaShop, and I need to remove every occurrence of any Visual Composer shortcode inside it.
I found this regex, "/\[(\/*)?vc_(.*?)\]/", that can help.
Now I'm trying to use it with sed. I built this one-line command, but it is not working at all (sed complains about an unknown option to s).
sed -i -E "s\/\[(\/*)?vc_(.*?)\]//\g" origin.csv
What am I missing?
Edit: The problem is in the Product Description column: e.g.
[vc_row][vc_column width="1/1"][vc_column_text]
ADULT GRAIN FREE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
RICETTA COMPLETA PER CANI ADULTI DI TUTTE LE RAZZE
[/vc_column_text][vc_separator color="grey"][vc_column_text]
65% DI CARNE FRESCA DI POLLO, FRUTTA & VERDURA
The column is full of [vc_row] and similar shortcodes. The desired output would look like this:
ADULT GRAIN FREE
RICETTA COMPLETA PER CANI ADULTI DI TUTTE LE RAZZE
65% DI CARNE FRESCA DI POLLO, FRUTTA & VERDURA
Solution
sed does not support non-greedy matching. The regex dialect supported by sed is rather primitive, and far predates the Perl features which are now supported in many regex implementations.
The simple fix is to switch to Perl:
perl -pi -e 's%\[/?vc_(.*?)\]%%g' origin.csv
Notice the switch to alternate delimiters (% instead of /) to avoid the need to backslash slashes inside the regex. You were also backslash-escaping the delimiter slashes of the s command, which must not be escaped!
It's not impossible to do this in sed, either. Just be more specific about what you want. Non-greedy matching is often a lazy (sic) way to avoid saying what you really mean.
sed -i -E "s%\[/?vc_[^][]*\]%%g" origin.csv
The updated regex says there can be anything except square brackets in the match after vc_, which is presumably what you wanted to say all along. I'm also assuming there can't really be multiple slashes before vc_, and so we simply say /? to indicate one slash max, optional.
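Before running the command with -i on the real export, it may be worth previewing the substitution on a sample line. The sketch below wraps the same sed expression in a helper called `strip_vc` (a name invented for this example) and feeds it text on stdin, so no file is modified.

```shell
# Hypothetical helper: remove [vc_...] and [/vc_...] shortcodes from stdin.
# [^][]* matches any run of characters that are neither ] nor [.
strip_vc() {
  sed -E 's%\[/?vc_[^][]*\]%%g'
}

# Sample line modelled on the Product Description column from the question.
printf '%s\n' '[vc_row][vc_column width="1/1"][vc_column_text]ADULT GRAIN FREE[/vc_column_text]' | strip_vc

# Bracketed text that does not start with vc_ is left alone.
printf '%s\n' '[vc_separator color="grey"]TEXT [gallery]' | strip_vc
```

Note that a line consisting only of shortcodes is not deleted; it is left behind as an empty line, so the cleaned description may contain blank lines where the shortcodes used to be.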
Nothing here is specific to CSV; this should work for any text file (though to be really correct, you would need a more complex regex to cover corner cases like a vc_ code with a comma inside it; but let's just assume you don't have any).
Answered By - tripleee Answer Checked By - David Goodson (WPSolving Volunteer)