Friday, October 28, 2022

[SOLVED] Regex for printing pattern from string

Issue

i have a file with below content. i need to separate the content into 2 files o/p1 should have content everything within first braces () and ` removed and only 1&2 columns printed. o/p2 should have location with its value

$ cat dt.txt
CREATE EXTERNAL TABLE `rte.fteff_ft`(
  `dt` date,
  `wk_id` int,
  `yq_id` int(10,00),
  `te_ind` string,
  `yw_dt` date,
  `em_dt` date comment dfdsf sdfsdf)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0007'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://dfdf/data/ffff/ODE/TdddfT/'
TBLPROPERTIES (
  'last_modified_by'='asdas',
  'last_modified_time'='1639551681',
  'numFiles'='1',
  'totalSize'='2848434',
  'transient_lastDdlTime'='1639551681')

i need output from the above in two files.

o/p1: a.txt
  dt date,
  wk_id int,
  yq_id int(10,00),
  te_ind string,
  yw_dt date,
  em_dt date

o/p2: b.txt

LOCATION
  'hdfs://dfdf/data/ffff/ODE/TdddfT/'

Solution

First, use sed to run a couple of commands, to operate on the range of lines between 'CREATE EXTERNAL' and 'ROW DELIMITED FORMAT' where they occur at the start of the line, not including those lines. Then replace grave accent marks with nothing, then keep only the first 2 words.

sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt

EDIT: To remove the commas at the end of the line, add another command of s/,$// . Make sure to anchor the comma to the end of the line else you'll get the comma in the int declaration.

sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g;s/,$//; s/(([^ ]+ ){2}).*/\1/' dt.txt > a.txt

Second, use the -A option to grep to match the word 'LOCATION' on a line by itself plus the following 1 line.

grep -A 1 '^LOCATION$' dt.txt > b.txt


Answered By - Gary_W
Answer Checked By - Mildred Charles (WPSolving Admin)