Issue
The text file has many lines like the one below. I want to extract the text after /videos/ up to and including .mp4, plus the very last number (both shown in bold), and write each filtered line to a separate file.
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/**S4KWZTyt-32313922.mp4**.m3u8?hdnts=exp=1592315851~acl=*/S4KWZTyt-32313922.mp4.m3u8~hmac=83f4674e6bf2576b070c716a3196cb6a30f35737827ee69c8cf7e0c57a196e51 **1**
For example, suppose the text file contains:
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/JajSfbVN-32313922.mp4.m3u8?hdnts=exp=1592315891~acl=*/JajSfbVN-32313922.mp4.m3u8~hmac=d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/Qs3xZqcv-32313922.mp4.m3u8?hdnts=exp=1592315940~acl=*/Qs3xZqcv-32313922.mp4.m3u8~hmac=c30e2082bf748a6b4d1621c1d33a95319baa61798775e9da8856041951cf5233 20
The output should be
JajSfbVN-32313922.mp4 19
Qs3xZqcv-32313922.mp4 20
Solution
You may try the regex below:
.*\/videos\/(.*?mp4).*?(?<= )(\d+)
Explanation of the above regex:
.* - Matches everything before /videos/.
\/videos\/ - Matches /videos/ literally.
(.*?mp4) - First capturing group; lazily matches everything up to and including the first mp4.
.*? - Lazily matches everything up to the space that precedes the final number.
(?<= ) - Positive lookbehind; requires a space immediately before the digits that follow.
(\d+) - Second capturing group; matches the digits after that space, i.e. the number at the end, as required.
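As a quick check of the two capturing groups, here is a minimal Python sketch that applies the same pattern to the two sample lines from the question. Python's re module is a swap-in for Perl here, and the helper name extract is made up for illustration.

```python
import re

# Same pattern as above; in Python the forward slashes need no escaping.
PATTERN = re.compile(r".*/videos/(.*?mp4).*?(?<= )(\d+)")

# The two sample lines from the question.
SAMPLE = [
    "https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/JajSfbVN-32313922.mp4.m3u8?hdnts=exp=1592315891~acl=*/JajSfbVN-32313922.mp4.m3u8~hmac=d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19",
    "https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/Qs3xZqcv-32313922.mp4.m3u8?hdnts=exp=1592315940~acl=*/Qs3xZqcv-32313922.mp4.m3u8~hmac=c30e2082bf748a6b4d1621c1d33a95319baa61798775e9da8856041951cf5233 20",
]


def extract(line):
    """Return 'name.mp4 number' for a matching line, or None if it does not match."""
    m = PATTERN.match(line)
    return f"{m.group(1)} {m.group(2)}" if m else None


for line in SAMPLE:
    print(extract(line))
# prints:
# JajSfbVN-32313922.mp4 19
# Qs3xZqcv-32313922.mp4 20
```

Group 1 stops at the first mp4 because of the lazy quantifier, and the lookbehind keeps group 2 from matching digits inside the URL, since none of them directly follow a space.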
You can find a demo of the above regex here.
Command-line implementation on Linux:
perl -ne 'print "$1 $2\n" if /.*\/videos\/(.*?mp4).*?(?<= )(\d+)/' regea.txt > out.txt
You can find a sample run of the above command here.
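If Perl is not available, the same filtering could be sketched in Python along these lines. This is only an illustration, not part of the original answer: the function name filter_file is hypothetical, and the file names regea.txt and out.txt are simply taken from the command above.

```python
import re

# Same pattern as in the Perl command; slashes need no escaping in Python.
PATTERN = re.compile(r".*/videos/(.*?mp4).*?(?<= )(\d+)")


def filter_file(src, dst):
    """Write 'name.mp4 number' to dst for each matching line in src.

    Lines that do not match are skipped. Returns the number of lines written.
    """
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            m = PATTERN.match(line)
            if m:
                fout.write(f"{m.group(1)} {m.group(2)}\n")
                count += 1
    return count
```

Calling filter_file("regea.txt", "out.txt") would then produce the same out.txt as the Perl one-liner, assuming one URL per line.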
Answered By - user7571182