Issue
I have a gigantic JSON file that was accidentally written without a newline character between the JSON entries, so the whole file is treated as one giant line. I tried a find-and-replace with sed to insert a newline before each entry.
sed 's/{"seq_id"/\n{"seq_id"/g' my_giant_json.json
It doesn't output anything.
However, I know my sed expression is correct, because it works fine when I run it on just a small part of the file.
head -c 1000000 my_giant_json.json | sed 's/{"seq_id"/\n{"seq_id"/g'
I have also tried Python, with this gnarly one-liner:
'\n{"seq_id'.join(open(json_file,'r').readlines()[0].split('{"seq_id')).lstrip()
But this loads the whole file into memory, because readlines() reads everything at once. I don't know how to iterate through a giant single line in chunks and do a find and replace.
Any thoughts?
Solution
Perl will let you change the input record separator ($/) from newline to another character. You could take advantage of this to get some convenient chunking.
perl -pe'BEGIN{$/="}"}s/^({"seq_id")/\n$1/' my_giant_json.json
That sets the input separator to "}". Then it looks for chunks that start with {"seq_id" and prefixes them with a newline.
Note that it puts an unnecessary empty line at the beginning. You could complicate the program to eliminate that, or just delete it manually afterwards.
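Since the question also asked how to do this in Python, here is a minimal sketch of the same chunking idea. It is not part of the Perl answer: the names insert_newlines, MARKER and chunk_size are made up for illustration, and it assumes the marker cannot overlap with itself (true for {"seq_id", because { appears only at its start). It reads a fixed number of characters at a time and holds back a short tail so that a marker split across two reads is still found, and it skips the newline before the very first entry.

```python
MARKER = '{"seq_id"'  # hypothetical name; the text that starts every entry


def insert_newlines(src, dst, marker=MARKER, chunk_size=1 << 20):
    """Copy text from src to dst, putting a newline before each marker.

    Only about chunk_size characters are held in memory at a time.
    Assumes no proper suffix of marker is also a prefix of it.
    """
    keep = len(marker) - 1  # a marker cut off by a read is at most this long
    buf = ""
    at_start = True  # True until enough input is seen to check the first entry
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            dst.write(buf)  # end of input: flush the held-back tail
            break
        buf += chunk
        out = buf.replace(marker, "\n" + marker)
        if at_start and len(buf) >= len(marker):
            if buf.startswith(marker):
                out = out[1:]  # no empty line before the very first entry
            at_start = False
        # Hold back the last `keep` characters; they may begin a marker
        # that is completed by the next read.
        cut = max(len(out) - keep, 0)
        dst.write(out[:cut])
        buf = out[cut:]
```

To use it on the file from the question, open my_giant_json.json in text mode, pass the file object as src, and pass sys.stdout or an output file object as dst.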
Answered By - Bo Borgerson