Friday, October 7, 2022

[SOLVED] Convert a key value pair String into array using Bash Script

Issue

I have a series of JSON strings containing key value pairs that are structured as below. Note some values can be blank or have spaces.

"result":"guid=guid1;forename=Jimmy Harry;surname=Jones;birthdate=20220201"
"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"

I would like to parse the key value pairs into an array using just Bash 3.2 with macOS, so I can iterate through the array for a given key. For example, say the array below was parsed out of the string

[guid=guid1,forename=Jimmy,middlename=,surname=Jones,birthdate=20220201]

I would like to get a given key's value rather than having to loop through each array item and check whether the key equals what I am looking for. In Java you can get values by key, but I am not sure how to do this in Bash 3.

This is what I have tried so far, but it doesn't let me check each key-value pair because the counter doesn't increment. Only a direct reference with an index like "${array[2]}" works.

  array=($(echo "$result" | tr ';' '\n'))
  counter=0
  for item in $array
  do
    if [[ $item == "forename*" ]]
    then
      echo "${item[counter]}"
    fi
    echo $((counter++))
  done

Solution

It sounds like you want to use a BASH associative array, but those were not introduced until version 4 of BASH. Further, using a proper JSON parser might be a better approach, but the format of your JSON strings might still be difficult to manage. If you have control of the upstream JSON generation, changing the output format to something more easily parsed would be ideal.
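For a pure-Bash lookup without associative arrays, a minimal sketch follows. It assumes the "result":"..." wrapper has already been stripped, so the function only sees the key=value list. The function name kv_get is hypothetical and not part of the original answer. It splits the string on semi-colons (after turning stray commas into semi-colons) and compares each key exactly, using only features available in Bash 3.2.

```shell
#!/bin/bash

# kv_get KEY STRING
# Hypothetical helper: print the value stored under KEY in a
# "k1=v1;k2=v2" style string. Returns 1 when KEY is not present.
kv_get() {
    local key="$1" data="$2" pair pairs
    data=${data//,/;}                      # treat stray commas as separators
    IFS=';' read -r -a pairs <<< "$data"   # split into an indexed array
    for pair in "${pairs[@]}"; do
        # ${pair%%=*} is the text before the first "=", i.e. the key
        if [[ ${pair%%=*} == "$key" ]]; then
            printf '%s\n' "${pair#*=}"     # text after the first "="
            return 0
        fi
    done
    return 1
}

result='guid=guid1;forename=Jimmy Harry;surname=Jones;birthdate=20220201'
kv_get forename "$result"
```

The loop still runs internally, but the caller gets a single "value for key" call, and the return status distinguishes a missing key from an empty value.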

All of that said, one option is to parse and 'search' by key using awk. Because the second JSON string in your source data contains middlename=,surname..., this approach first swaps commas for semi-colons: parsing the individual fields as implemented below requires the data string to be semi-colon delimited.

Data file to process:

$ cat t.dat
"result":"guid=guid1;forename=Jimmy Harry;surname=Jones;birthdate=20220201"
"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"

Parse with awk:

search_str="guid"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat

Details:
search_str="guid"; awk -F":" -v sstr="$search_str"
    Set bash variable with string to search for and pass it to awk as awk variable. Split each record on colon.
gsub(",",";",$0);
    Swap all commas for semi-colons.
gsub("\"","",$0);
    Remove double quotes.
split($2,a,";");
    Split the second field into array named a on semi-colon.
for(i in a) if (a[i] ~ sstr)
    Loop through array a and search for the specified search string.
split(a[i],b,"="); print b[2]
    If the search string is found, split the array a item into array b on the equals sign and print the second item in the b array.
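Note that ~ is a regular-expression match against the whole key=value item, so a search string such as "name" would match forename, middlename and surname alike, and would also match text inside a value. A variant that compares the key exactly is sketched below. The wrapper name awk_get is hypothetical, and splitting the record on double quotes instead of colons is a swapped-in choice that assumes every record has the "result":"..." shape shown above.

```shell
#!/bin/bash

# awk_get KEY [FILE...]
# Hypothetical wrapper: print the value of KEY for each input record.
# Reads standard input when no FILE is given.
awk_get() {
    local key="$1"
    shift
    awk -F'"' -v key="$key" '{
        data = $4                        # 4th quote-delimited field holds the pairs
        gsub(",", ";", data)             # normalise stray commas to semi-colons
        n = split(data, pairs, ";")
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")    # position of the first "="
            if (eq && substr(pairs[i], 1, eq - 1) == key)
                print substr(pairs[i], eq + 1)
        }
    }' "$@"
}

printf '%s\n' '"result":"guid=guid1;forename=Jimmy Harry;surname=Jones;birthdate=20220201"' |
    awk_get surname
```

Iterating with a numeric index also keeps the pairs in their original order, which for (i in a) does not guarantee.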

Sample usage with different values for bash variable $search_str to search for various values:

$ search_str="guid"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat
guid1
guid2

$ search_str="surname"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat
Jones
Jones

$ search_str="forename"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat
Jimmy Harry
Sarah-Smith

$ search_str="birthdate"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat
20220201
20220201

$ search_str="middlename"; awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";");
    for(i in a) if (a[i] ~ sstr) {
        split(a[i],b,"="); print b[2]}
}' t.dat

*Note: for middlename, an empty line is output for the second JSON string because it has a middlename key but no value. Nothing is printed for the first string, which has no middlename key.
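If a missing key needs to be told apart from an empty value, one possible approach is to print a placeholder for records that lack the key, so the output always has exactly one line per record. The sketch below is an illustration only: the function name awk_get_or_default and the placeholder text are hypothetical, and it assumes each record has the "result":"..." shape and that a key occurs at most once per record.

```shell
#!/bin/bash

# awk_get_or_default KEY DEFAULT [FILE...]
# Hypothetical wrapper: print the value of KEY for each record,
# or DEFAULT when the record has no such key.
awk_get_or_default() {
    local key="$1" default="$2"
    shift 2
    awk -F'"' -v key="$key" -v dflt="$default" '{
        data = $4                        # pairs sit in the 4th quote-delimited field
        gsub(",", ";", data)
        n = split(data, pairs, ";")
        found = 0
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")
            if (eq && substr(pairs[i], 1, eq - 1) == key) {
                print substr(pairs[i], eq + 1)   # may be an empty line
                found = 1
            }
        }
        if (!found) print dflt           # key absent from this record
    }' "$@"
}

awk_get_or_default middlename '<none>' <<'EOF'
"result":"guid=guid1;forename=Jimmy Harry;surname=Jones;birthdate=20220201"
"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"
EOF
```

With the two sample records, the first line of output is the placeholder and the second is empty.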

Update per OP's comment: here is usage in a BASH script without storing the JSON in a file.

Source JSON string stored in a variable that is passed to the script along with the search string:

$ result='"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"'
$ ./script guid "$result"
guid2

Source JSON string passed directly to script along with search string:

$ ./script guid '"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"'
guid2

Contents of script:

#!/bin/bash

# use first parameter passed to script as 'search_str' 
# to be searched for in JSON
search_str="${1}"

# use second parameter passed to script as 
# the JSON string to be searched
src_json="${2}"

echo "$src_json" | awk -F":" -v sstr="$search_str" '{
    gsub(",",";",$0);
    gsub("\"","",$0);
    split($2,a,";"); 
    for(i in a) if (a[i] ~ sstr) { 
        split(a[i],b,"="); print b[2]}
}'
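When a script needs several keys from the same string, an alternative to calling awk once per key is to parse the string once into two parallel indexed arrays, which can stand in for an associative array on Bash 3.2. This is a sketch under assumptions rather than part of the original answer: the names kv_keys, kv_vals, kv_load and kv_lookup are hypothetical, and the wrapper is stripped with parameter expansion on the assumption that the input always has the "result":"..." shape.

```shell
#!/bin/bash

# Parallel indexed arrays: kv_keys[i] pairs with kv_vals[i].
kv_keys=()
kv_vals=()

# kv_load STRING: fill kv_keys/kv_vals from a "k1=v1;k2=v2" string.
kv_load() {
    local data pair pairs
    data=${1//,/;}                         # normalise stray commas
    kv_keys=()
    kv_vals=()
    IFS=';' read -r -a pairs <<< "$data"
    for pair in "${pairs[@]}"; do
        kv_keys+=("${pair%%=*}")           # text before the first "="
        kv_vals+=("${pair#*=}")            # text after the first "="
    done
}

# kv_lookup KEY: print the value for KEY, or return 1 if it is absent.
kv_lookup() {
    local i
    for ((i = 0; i < ${#kv_keys[@]}; i++)); do
        if [[ ${kv_keys[i]} == "$1" ]]; then
            printf '%s\n' "${kv_vals[i]}"
            return 0
        fi
    done
    return 1
}

src_json='"result":"guid=guid2;forename=Sarah-Smith;middlename=,surname=Jones;birthdate=20220201"'
data=${src_json#*:\"}                      # drop the leading "result":" part
data=${data%\"}                            # drop the trailing double quote
kv_load "$data"
for key in guid forename surname; do
    printf '%s -> %s\n' "$key" "$(kv_lookup "$key")"
done
```

For the sample string this prints guid -> guid2, forename -> Sarah-Smith and surname -> Jones, one per line.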


Answered By - j_b
Answer Checked By - Candace Johnson (WPSolving Volunteer)