Issue
I'm trying to replace newline characters in a JSON string, because Postgres' COPY function won't work otherwise.
I've been trying different combinations of sed, such as:
gsed -z 's/\\n/\\\\n/g'
gsed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g'
gsed '/}$/{N; s/}\n{/},\n{/}
But it either replaces all of them or none.
Here's my sample data:
{"corpusid":75399585,"externalids":{"ACL":null,"DBLP":null,"ArXiv":null,"MAG":"2350229292","CorpusId":"75399585","PubMed":null,"DOI":null,"PubMedCentral":null},"url":"https://www.semanticscholar.org/paper/aec9799d6e12f2000deb8f7dadcc7d7e653f7393","title":"Prevention and Therapy of Osteoporosis","authors":[{"authorId":"40937034","name":"Hu Jun-bo"}],"venue":"","publicationvenueid":null,"year":2003,"referencecount":0,"citationcount":0,"influentialcitationcount":0,"isopenaccess":false,"s2fieldsofstudy":[{"category":"Medicine","source":"s2-fos-model"},{"category":"Medicine","source":"external"}],"publicationtypes":null,"publicationdate":null,"journal":null,"updated":"2022-02-11T07:19:53.877Z"}
{"corpusid":83138554,"openaccessinfo":{"externalids":{"MAG":"883546316","ACL":null,"DOI":null,"PubMedCentral":null,"ArXiv":null},"license":null,"url":null,"status":null},"abstract":"Phytosulfokin-alpha (PSK-alpha) ist ein bisulfatiertes Pentapetid, welches als Wachstumsfaktor die Proliferation von Zellen in Zellkultur niedriger Dichte stimuliert. In der vorliegenden Arbeit wurden physiologische und genetische Ansatze verfolgt, um Funktionen von PSK-alpha in Arabidopsis thaliana aufzuklaren. Promotor-GUS Expressionsanalysen der PSK-Praproproteingene und des PSK-Rezeptorgens AtPSKR1 in Wurzeln und Bluten deuteten auf mogliche Funktionen von PSK-alpha in diesen Geweben hin. \nExogen appliziertes PSK-alpha forderte das Wurzelwachstum von Arabidopsiskeimlingen. Das Wurzelwachstum der PSK-Rezeptor T-DNA Insertionsmutante Atpskr1-T war inhibiert. Diese Ergebnisse zeigten, dass PSK-alpha in die Regulation von Wurzelwachstum involviert ist. \nCharakteristisch fur die Atpskr1-T Mutante war eine verringerte Samenbildung im Vergleich zu Wildtyppflanzen. Genetische Analysen zeigten, dass sowohl ein paternaler als auch maternale Effekte und asynchrones Wachstum der inneren Blutenorgane zur Ausbildung des Mutantenphanotyps fuhren.","updated":"2022-02-11T01:27:48.987Z"}
As you can see, second JSON string has \n within the "abstract" field, it needs to be replaced to \\n
I assume for the COPY to work.
But I have to retain newline in between JSON strings, as there's thousands of them, so that COPY reads that file properly.
Solution
I have reused the portion with jq by Cyrus and wrapped it into a sh script, as I had many files I needed to convert.
Please note it also replaces \"
with "
as it was preventing COPY function from properly importing JSON strings.
Here's the result:
#!/bin/bash
input_file=$1
output_file=$2
while read -r line; do
if echo "$line" | jq -e 'has("abstract")' > /dev/null; then
echo "$line" | jq -c 'try .abstract |= (gsub("\\n"; "\\n") | (gsub("\""; """)))' | tee -a $output_file > /dev/null
else
echo "$line" | tee -a $output_file > /dev/null
fi
done < $input_file
tail -n 1 $input_file | jq -c 'try .abstract |= (gsub("\\n"; "\\n") | (gsub("\""; """)))' >> $output_file
Usage:
./script_name.sh original_file.json converted_file.json
Answered By - deb0rian Answer Checked By - Gilberto Lyons (WPSolving Admin)