Issue
My file is in the form:
EMPLOYEE
FIRST NAME: JOHN
LAST NAME: DOE
POSITION: ACCOUNT MANAGER
EMPLOYEE
FIRST NAME: BIG
LAST NAME: BOSS
POSITION: CEO
The real file is a bit more complex than that, but a solution for this sample is enough.
I am trying to convert the values to title case while keeping the alignment and the field names unchanged:
EMPLOYEE
FIRST NAME: John
LAST NAME: Doe
POSITION: Account Manager
EMPLOYEE
FIRST NAME: Big
LAST NAME: Boss
POSITION: CEO
I have used this so far:
sed -E '/^\s{0,}(FIRST NAME|LAST NAME|POSITION)/ { s/((^\s{0,})(FIRST NAME|LAST NAME|POSITION))/\1/; T; s/(\b[A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g; }' employees.list
But it does not leave the field names (FIRST NAME, LAST NAME, POSITION) unchanged, so the output becomes:
EMPLOYEE
First Name: John
Last Name: Doe
Position: Account Manager
EMPLOYEE
First Name: Big
Last Name: Boss
Position: Ceo
(I have not yet handled content like CEO, which should stay uppercase.)
Is this achievable with sed? If so, how?
Solution
{0,}? That is just *.
What is really hard is that you want to apply the "first uppercase, rest lowercase" regex to only part of the string. What I usually do is insert a newline between the two parts, copy the line into the hold space, and then remove the first part from the pattern space. Then I can work on the interesting part, and finally grab the hold space and reshuffle the output.
sed -E '
# print lines containing ": CEO" unchanged and skip the rest of the script
/: CEO/{p;d}
/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)/{
# empty s// reuses last regex
# add a newline between <this>: <and this>
s//\1\n/
# hold current line with the newline
h
# Remove the first part.
# The trailing `\s*` in the reused regex also matches the newline added above.
s///
# capitalize
s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
# append a newline and the hold space to the pattern space
G
# put the <prefix:> part back in front of the capitalized part.
s/([^\n]*)\n([^\n]*).*/\2\1/
}
' employees.list
Outputs:
EMPLOYEE
FIRST NAME: John
LAST NAME: Doe
POSITION: Account Manager
EMPLOYEE
FIRST NAME: Big
LAST NAME: Boss
POSITION: CEO
Overall, consider using a real programming language instead, such as awk or python.
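As a rough illustration of that suggestion, here is a minimal Python sketch of the same transformation. The field names are taken from the question. The names FIELD_RE, KEEP_UPPER, fix_line and fix_text are made up for this example, and KEEP_UPPER is an assumption about which values should stay in capitals (the question only mentions CEO). Unlike the sed script, which skips whole lines containing ": CEO", this sketch keeps individual words uppercase.

```python
import re

# Field names come from the question. Everything before the value
# (indentation, field name, colon, spacing) is captured as the prefix.
FIELD_RE = re.compile(r"^(\s*(?:FIRST NAME|LAST NAME|POSITION):\s*)(.*)$")

# Assumption: words that must stay uppercase. Extend as needed.
KEEP_UPPER = {"CEO"}


def title_word(match):
    """Return the word as-is if it is a known acronym, else capitalize it."""
    word = match.group(0)
    return word if word in KEEP_UPPER else word.capitalize()


def fix_line(line):
    """Title-case the value of a known field; leave other lines untouched."""
    m = FIELD_RE.match(line)
    if m is None:
        return line
    prefix, value = m.groups()
    return prefix + re.sub(r"[A-Za-z]+", title_word, value)


def fix_text(text):
    """Apply fix_line to every line of a multi-line string."""
    return "\n".join(fix_line(line) for line in text.split("\n"))
```

It could be applied to the file from the question with something like fix_text(open("employees.list").read()). Because only the value part is rewritten, the indentation and the field names pass through unchanged.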
Staying with sed, you can also capitalize all words on the line and then re-uppercase the field-name part. Restricting this to the field lines also keeps the EMPLOYEE line untouched:
sed -E '
# print lines containing ": CEO" unchanged and skip the rest of the script
/: CEO/{p;d}
/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/{
s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
s/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/\U\1\E\3/i
}
' employees.list
Answered By - KamilCuk Answer Checked By - Senaida (WPSolving Volunteer)