Issue
I have a very large text file (tab-delimited, first line is header) like this:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD AC 150 2367 0.02 123 I
FA AFQ ASB 123 2473 0.4 630 I
As you can see, the third and fourth data rows contain an extra string in column 3 (A3). How can I delete these strings and shift the remaining cells left, using awk, sed, or any other Linux tool, so that the corrected file looks like this:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
I tried:
awk '{ if ($3 !~ /[0-9]+/) $3 = $4 } 1' file
It overwrites any string in column 3 with the value from column 4, but it does not shift the remaining cells left.
Solution
Using sed
$ sed '1!s/^\([^ ]* \+[^ ]* \+\)[A-Z][^ ]* \+/\1/' input_file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
1!
- Apply the substitution to every line except line 1 (the header).
^\([^ ]* \+[^ ]* \+\)
- The escaped parentheses form a capture group, which stores the text it matches for later reuse. Here the group matches the first two columns, including the spaces that follow each of them.
[A-Z][^ ]* \+
- This part sits outside the capture group, so it is matched but not kept in the replacement. If the third column starts with a capital letter, everything up to and including the spaces that follow it is removed.
\1
- Replace the whole match with the text captured by the parentheses.
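Since the question also asks about awk and describes the file as tab-delimited, the same idea can be sketched in awk. This is only an illustration, not part of the original answer: the function name drop_text_col3 is hypothetical, and it assumes that fields are separated by single tabs, that line 1 is the header, and that a valid third column is always a plain integer.

```shell
# Hypothetical helper (not from the original answer): remove a non-numeric
# third field and shift the remaining fields left.
# Assumptions: tab-delimited input, header on line 1, integer-only column 3.
drop_text_col3() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR > 1 && $3 !~ /^[0-9]+$/ {
           # Rebuild the line from fields 1-2 and 4..NF, skipping field 3.
           line = $1 OFS $2
           for (i = 4; i <= NF; i++) line = line OFS $i
           print line
           next
       }
       { print }' "$1"
}
```

It could be called as drop_text_col3 file > corrected_file. Rebuilding the line by hand avoids relying on how a particular awk handles a decremented NF.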
Answered By - HatLess Answer Checked By - Mary Flores (WPSolving Volunteer)