Friday, October 7, 2022

[SOLVED] Regex that extracts everything until it finds "/", starting from the end

Issue

I'm writing a bash script in which I use grep with a regular expression to extract an id, which I will then store in a variable.

The goal is to extract all characters, starting from the end of the line, until a / is found, but the characters ' and } should be ignored.

file.txt:

{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}

command:

cat file.txt | grep -oP "[/]+^"

The current command isn't working.

desired output:

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Solution

The regex you gave was: [/]+^

It has a few mistakes:

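  • Your use of ^ at the end seems to imply that you think you can ask the regex engine to search backwards. You can't: ^ anchors the match to the start of the line, so once [/]+ has consumed a slash it can never match;
  • [/] matches only the slash character, which is exactly what you do not want to extract; [^/] matches any character except a slash.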
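To see both mistakes concretely, here is a minimal sketch that runs the original pattern, and a naive end-anchored alternative, against the sample line from the question. It assumes GNU grep built with PCRE support for -P; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P (PCRE) support; the sample line is the one from the question.
sample="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Original pattern: once [/]+ has consumed a slash we are no longer at the
# start of the line, so ^ fails and grep finds no match (exit status 1).
orig_status=0
orig_out=$(printf '%s\n' "$sample" | grep -oP '[/]+^') || orig_status=$?
echo "original pattern: status=$orig_status output='$orig_out'"

# Anchoring a run of non-slash characters to the end of the line ($) does
# select the last path segment, but it keeps the trailing ' and } as well.
tail_part=$(printf '%s\n' "$sample" | grep -oP '[^/]+$')
echo "[^/]+\$ gives: $tail_part"
```

The second result shows why the trailing ' and } need separate handling rather than just "searching from the end".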

Your sample shows what appears to be a malformed JSON object containing a key-value pair, with the key and the value each enclosed in single quotes. JSON requires double quotes, so perhaps it is not JSON at all.

Under a few assumptions, it is possible to extract the part of the input that you seem to want:

  • the file contains a single line; and
  • the key and value are strings surrounded by single quotes; and
  • either:
    • the value is immediately followed by }; or
    • the key ('name' in the sample) cannot contain /
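The either/or in this list can be checked with two hypothetical inputs (they are not from the question), each violating one alternative, run through the two lookaround patterns given further down. This is a minimal sketch assuming GNU grep with -P support; `extract`, `p1`, `p2` and the sample strings are illustrative names, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical inputs, not from the question; assumes GNU grep with -P (PCRE).
p1="(?<=/)[^/]+(?=')"   # needs: the key contains no /
p2="[^/]+(?='})"        # needs: the value is immediately followed by }

slash_in_key="{'path/key': 'projects/data/ID='}"    # breaks p1's assumption
space_before_brace="{'name': 'projects/data/ID=' }" # breaks p2's assumption

# Print every match of pattern $1 in line $2. grep exits 1 when nothing
# matches, so "|| true" stops that from aborting a script run with set -e.
extract() { printf '%s\n' "$2" | grep -oP "$1" || true; }

r1a=$(extract "$p1" "$slash_in_key")        # two matches: "key': " and "ID="
r1b=$(extract "$p2" "$slash_in_key")        # "ID="
r2a=$(extract "$p1" "$space_before_brace")  # "ID="
r2b=$(extract "$p2" "$space_before_brace")  # nothing: no ' directly before }

printf '%s\n---\n' "$r1a" "$r1b" "$r2a" "$r2b"
```

Each pattern still returns the id when only the other pattern's assumption is broken, which is what the either/or expresses.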

You are using grep's -P option (Perl-compatible regular expressions), so lookaround assertions are available. Either of these patterns extracts the id from your sample:

(?<=/)[^/]+(?=')
  • (?<=/): lookbehind asserts that the match is preceded by /
  • [^/]+: one or more non-slash characters (the match)
  • (?='): lookahead asserts that the match is followed by '
[^/]+(?='})
  • [^/]+: one or more non-slash characters (the match)
  • (?='}): lookahead asserts that the match is followed by ' then }
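As a minimal sketch of how the two patterns fit into the script from the question, the commands below capture the id in a shell variable. A temporary file stands in for file.txt so the example is self-contained; it assumes GNU grep built with PCRE support, and the names `id1`, `id2` and `file` are illustrative.

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P (PCRE) support. A temp file stands in for file.txt.
file=$(mktemp)
printf '%s\n' "{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}" > "$file"

# Pattern 1: preceded by /, made of non-slash characters, followed by '
id1=$(grep -oP "(?<=/)[^/]+(?=')" "$file")

# Pattern 2: non-slash characters followed by ' then }
id2=$(grep -oP "[^/]+(?='})" "$file")

echo "$id1"
echo "$id2"
rm -f "$file"
```

Against the real file this is simply `id=$(grep -oP "(?<=/)[^/]+(?=')" file.txt)`: grep reads the file itself, so `cat file.txt |` is not needed, and the pattern is double-quoted because it contains a single quote.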

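Note that the match begins as early in the line as possible and, because + is greedy, is as long as possible. That is why the second pattern needs no lookbehind: [^/] cannot match a slash, so the earliest start for a run of non-slash characters immediately followed by '} is the character right after the last /.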
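Both halves of that note can be observed directly. This sketch, assuming GNU grep with -P support and reusing the question's sample line, drops the lookbehind from the first pattern (so an earlier, unwanted match appears) and makes the + in the second pattern lazy (which changes nothing, because the start position is already fixed by the leftmost rule). The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P (PCRE) support; sample line from the question.
sample="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# Without the lookbehind, the leftmost match starts at "{" and greedy +
# makes it the longest non-slash run from there that is followed by ':
# "{'name': ". grep -o then also prints the later match, the id.
no_lookbehind=$(printf '%s\n' "$sample" | grep -oP "[^/]+(?=')")

# A lazy quantifier does not move the start of the match, so the second
# pattern still returns the full id.
lazy=$(printf '%s\n' "$sample" | grep -oP "[^/]+?(?='})")

printf '%s\n' "$no_lookbehind"
printf '%s\n' "$lazy"
```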



Answered By - jhnc
Answer Checked By - Katrina (WPSolving Volunteer)