Thursday, February 3, 2022

[SOLVED] Extract substring within a given string

Issue

I've read up on and tried to extract a substring from a given string with awk, sed, or grep, but I can't get it working or figure out how to accomplish this.

I have the string below which describes configurations of my VMs:

config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version: 1.19.9

How can I extract a substring, for example "preemptible: true" or "status: RUNNING", knowing that the values can be different for each VM?

Thank you!


Solution

Assumptions:

  • the VM config name/value pairs may not be in the same order
  • config names and values are single strings with no embedded white space
  • each config name is preceded by (at least) one space, and followed immediately by a colon (:)
  • there may be multiple spaces between the colon (:) and the config value; we want to maintain these spaces in the output

One idea using sed and a capture group:

# note: extra spaces placed between 'version:' and '1.19.9'

cfg_string="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version:   1.19.9"

for config in preemptible status version
do
        echo "++++++++++++++ ${config}"
        sed -nE "s/.* (${config}:[ ]*[^ ]*).*/\1/p" <<< "${cfg_string}"
done

sed details:

  • -nE - disable default printing of the input (we'll use /p to explicitly print our capture group); enable extended regex support
  • .* (${config}:[ ]*[^ ]*).* - match any number of characters (.*) + a space ( ) + ${config} + a colon (:) + zero or more spaces ([ ]*) + zero or more non-space characters, ie, the value ([^ ]*) + the rest of the input (.*); the parens mark the start/end of the capture group (only one capture group in this case)
  • \1 - reference capture group #1 (ie, everything inside of the parens)
  • /p - print (the capture group)
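Because [ ]* means zero or more spaces, the same expression also matches a value written directly after the colon. A quick check, using a made-up input string and a hypothetical wrapper name (extract_cfg) around the sed expression above:

```bash
#!/usr/bin/env bash
# extract_cfg (hypothetical wrapper): apply the answer's sed expression
# for config name $1 to the string $2
extract_cfg() {
        sed -nE "s/.* (${1}:[ ]*[^ ]*).*/\1/p" <<< "${2}"
}

# Made-up input: three spaces after 'status:', none after 'version:'
demo="config: status:   RUNNING version:1.19.9"

extract_cfg status  "${demo}"    # [ ]* consumes all three spaces -> status:   RUNNING
extract_cfg version "${demo}"    # [ ]* matches zero spaces       -> version:1.19.9
```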

This generates:

++++++++++++++ preemptible
preemptible: true
++++++++++++++ status
status: RUNNING
++++++++++++++ version
version:   1.19.9                # extra spaces maintained
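Since the question also mentions awk, here is a sketch of the same extraction using awk's match() function, which sets RSTART and RLENGTH to the start and length of the leftmost match. It keeps the spaces after the colon, so it should print the same lines as the sed loop. The helper name extract_cfg_awk is hypothetical.

```bash
#!/usr/bin/env bash
# extract_cfg_awk (hypothetical helper): print "<name>:<spaces><value>" for the
# first occurrence of " <name>:" in the string $2, keeping the spaces
extract_cfg_awk() {
        # match() builds a dynamic regex from the key; RSTART + 1 and
        # RLENGTH - 1 drop the leading space from the printed text
        awk -v key="${1}" '
                match($0, " " key ":[ ]*[^ ]*") { print substr($0, RSTART + 1, RLENGTH - 1) }
        ' <<< "${2}"
}

cfg_string="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version:   1.19.9"

for config in preemptible status version
do
        echo "++++++++++++++ ${config}"
        extract_cfg_awk "${config}" "${cfg_string}"
done
```

One difference from the sed version: match() returns the first occurrence of a name, while sed's greedy leading .* captures the last one. This only matters if a config name appears more than once in the string.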

NOTES:

  • obviously an invalid config name (eg, stat, versions) is going to produce no output
  • the sed results could be captured in a variable for further testing/processing; eg, an empty result means the config name was not found, which addresses the issue of an invalid config name
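A minimal sketch of that idea: capture the sed result in a variable and treat an empty result as an invalid config name. The function name get_cfg and the error message are illustrative assumptions, not part of the original answer.

```bash
#!/usr/bin/env bash
cfg_string="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version:   1.19.9"

# get_cfg (hypothetical helper): print the name/value pair for config name $1
# found in string $2; print an error on stderr and return 1 if it is missing
get_cfg() {
        local result
        result=$(sed -nE "s/.* (${1}:[ ]*[^ ]*).*/\1/p" <<< "${2}")
        if [[ -z "${result}" ]]; then
                echo "ERROR: config name '${1}' not found" >&2
                return 1
        fi
        echo "${result}"
}

for config in status stat versions
do
        if value=$(get_cfg "${config}" "${cfg_string}"); then
                echo "found: ${value}"
        fi
done
```

With this loop only status is reported on stdout; stat and versions produce an error message on stderr instead of silently printing nothing.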


Answered By - markp-fuso
Answer Checked By - Pedro (WPSolving Volunteer)