Friday, September 2, 2022

[SOLVED] (sed / awk) I need to extract a number in between two strings from a very long, complicated input which includes many special characters

Issue

I need to extract a number in between two strings from a very long, complicated input which includes many special characters, so I don't know how to separate them when using sed or awk to extract just the number from "pk". For the example below, the output should look like this: 19473

My input looks similar to this:

{"pagination":{"next":0,"previous":0,"count":1,"current":1,"total_pages":1,"start_index":1,"end_index":1},"results":[{"pk":19473,"username":"someuser12.999name","name":"someuser12.999name","is_active":true,"last_login":null,"is_superuser":false,"groups":[],"groups_obj":[],"email":"[email protected]","avatar":"https://secure.gravatar.com/avatar/

Solution

Your example is not valid JSON as posted, but you probably just left out some characters at the end.

You can choose to treat the input as a plain string. In that case, look at how to match and capture with a regular expression in awk.
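As a minimal sketch of the plain-string approach (not part of the original answer), the following uses sed and awk on a shortened copy of the input from the question. It assumes that "pk": occurs exactly once in the input and that the digits follow the colon with no whitespace; the variable names input, pk_sed and pk_awk are only for illustration.

```shell
# Shortened sample input from the question (not valid JSON on its own).
input='{"pagination":{"next":0,"previous":0,"count":1,"current":1,"total_pages":1,"start_index":1,"end_index":1},"results":[{"pk":19473,"username":"someuser12.999name","name":"someuser12.999name","is_active":true'

# sed: -n suppresses default output and the p flag prints only when the
# substitution matched. The capture group keeps the digits after "pk":
pk_sed=$(printf '%s\n' "$input" | sed -n 's/.*"pk":\([0-9][0-9]*\).*/\1/p')

# awk: match() sets RSTART and RLENGTH for the matched text.
# Skip the 5 characters of "pk": to keep only the digits.
pk_awk=$(printf '%s\n' "$input" | awk 'match($0, /"pk":[0-9]+/) { print substr($0, RSTART + 5, RLENGTH - 5) }')

echo "$pk_sed"
echo "$pk_awk"
```

Both commands print 19473 for this sample. They depend on the exact text layout, so they would likely break if the input were pretty-printed or contained a second "pk" key.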

In the long run, I think you will be happier if you can use proper JSON extraction. A good tool is jq; the same question has been asked before, with an example.

In your case, this will give the result:

c:\temp> type jsonstring.txt | jq-win32.exe ".results | .[0] | .pk"
19473

Your JSON string, formatted and with the missing closing characters added, is here:

{
  "pagination": {
    "next": 0,
    "previous": 0,
    "count": 1,
    "current": 1,
    "total_pages": 1,
    "start_index": 1,
    "end_index": 1
  },
  "results": [
    {
      "pk": 19473,
      "username": "someuser12.999name",
      "name": "someuser12.999name",
      "is_active": true,
      "last_login": null,
      "is_superuser": false,
      "groups": [],
      "groups_obj": [],
      "email": "[email protected]",
      "avatar": "https://secure.gravatar.com/avatar/"
    }
  ]
}


Answered By - MyICQ
Answer Checked By - Pedro (WPSolving Volunteer)