Friday, October 7, 2022

[SOLVED] Merge 16 lines into one line

Issue

A file contains ASCII text representing hex values, one value per line (each line ends in ", "):

cat temp.txt
    0x6A, 
    0xF2, 
    0x44,
    .....
    0xF8, 
    0x1A,

I am trying to combine every 16 words/lines into one line, like this:

cat hex_result.txt 
0x6A, 0xF2, 0x44, 0xF8, 0x45, 0x41, 0x88, 0xD1, 0x4E, 0x8B, 0xA3, 0xB1, 0x8C, 0xE0, 0x37, 0x2D, 
.... 
0xE2, 0x1C, 0x06, 0x8A, 0x75, 0x2B, 0xBC, 0x3C, 0xC5, 0x08, 0xB7, 0x4E, 0xB0, 0xE4, 0xF8, 0x1A,

Is there a bash command to accomplish this?


Solution

Benchmarking six different methods for merging a specific number of lines.

Basically, there are many commands that can do this:

pr - convert text files for printing

pr -at16 <file

Try:

pr -a -t -16 < <(seq 1 42)
1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
33  34  35  36  37  38  39  40  41  42
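One caveat, assuming GNU coreutils `pr`: with the default 72-character page width, 16 columns leave only about 4 characters each (as the output above shows), so longer fields such as `0x6A,` would be truncated. GNU `pr` documents that `-s[CHAR]` (separate columns by a single character) turns off this truncation unless `-w` is also given. A minimal sketch, with made-up hex values (no leading blanks) standing in for the question's file:

```shell
#!/usr/bin/env bash
# Made-up sample: 42 hex values "0x00," .. "0x29,", one per line, no leading blanks.
hexvals=$(for i in $(seq 0 41); do printf '0x%02X,\n' "$i"; done)

# -a: fill rows across, -t: no header/trailer, -16: sixteen columns,
# -s' ': single-space separator, which in GNU pr also disables column truncation.
pr_result=$(printf '%s\n' "$hexvals" | pr -a -t -s' ' -16)
printf '%s\n' "$pr_result"
```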

xargs - build and execute command lines from standard input

... and executes the command (default is /bin/echo) ...

xargs -n 16 <file

Try:

xargs -n 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
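Applied to data shaped like the question's `temp.txt`, a sketch could look like this. The file contents are generated here as an assumption (the real values are elided in the question), and everything is written into a temporary directory. Because `xargs` splits its input on blanks, the leading indentation and the trailing space after each comma disappear, and `echo` rejoins each group of 16 words with single spaces:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for temp.txt: 42 indented values, each line ending in ", ".
workdir=$(mktemp -d)
for i in $(seq 0 41); do printf '    0x%02X, \n' "$i"; done > "$workdir/temp.txt"

# -n 16: at most 16 words per /bin/echo invocation, hence one output line per 16 input lines.
xargs -n 16 < "$workdir/temp.txt" > "$workdir/hex_result.txt"
cat "$workdir/hex_result.txt"
```

Keep in mind that `xargs` also interprets quotes and backslashes in its input, which is harmless for hex values but can bite with arbitrary text.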

paste - merge lines of files

printf -v pasteargs %*s 16
paste -d\  ${pasteargs// /- } < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
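Here `printf -v pasteargs %*s 16` stores a string of 16 spaces, and `${pasteargs// /- }` turns it into 16 `-` arguments; each `-` makes `paste` read one more line from standard input for the same output row. A slightly more explicit variant, sketched as a hypothetical `paste_n` function, collects the dashes in an array so nothing depends on unquoted word splitting:

```shell
#!/usr/bin/env bash
# paste_n N: hypothetical helper that joins every N stdin lines with single spaces.
paste_n() {
    local n=$1 i
    local -a args=()
    # One "-" per column: paste reads stdin once per "-" for every output row.
    for (( i = 0; i < n; i++ )); do args+=(-); done
    paste -d ' ' "${args[@]}"
}

seq 1 42 | paste_n 16
```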

sed - stream editor for filtering and transforming text

printf -v sedstr 'N;s/\\n/ /;%.0s' {2..16};
sed -e "$sedstr" < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
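The `sedstr` built above is just `N;s/\n/ /;` repeated 15 times: each `N` appends the next input line to the pattern space, and the substitution replaces the embedded newline with a space. A sketch generalizing this to any group size follows (the `sed_n` name is hypothetical). One portability note: POSIX allows `sed` to quit without printing when `N` hits the end of input, while GNU sed prints the pattern space first, so a trailing partial group such as `33 ... 42` may be dropped by non-GNU implementations. The usage line therefore feeds 48 numbers, a multiple of 16:

```shell
#!/usr/bin/env bash
# sed_n N: hypothetical helper that builds "N;s/\n/ /;" (N-1 times) and runs it.
sed_n() {
    local n=$1 script='' i
    for (( i = 1; i < n; i++ )); do
        script+='N;s/\n/ /;'   # append next line, then turn the newline into a space
    done
    sed -e "$script"
}

seq 1 48 | sed_n 16
```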

awk - pattern scanning and processing language

awk 'NR%16{printf "%s ",$0;next;}1;END{if (NR%16) print ""}'  < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
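The awk one-liner prints every line that is not a multiple of 16 with `printf "%s "` (no newline) and lets the `1` pattern print each 16th line normally. A parameterized sketch (the `awk_n` wrapper is hypothetical) passes the group size with `-v`, puts the separator before each field rather than after it, so no trailing space is left, and always terminates a trailing partial group:

```shell
#!/usr/bin/env bash
# awk_n N: hypothetical wrapper joining every N stdin lines with single spaces.
awk_n() {
    awk -v n="$1" '
        {
            # Space before every field except the first of a group.
            printf "%s%s", ((NR - 1) % n ? " " : ""), $0
            if (NR % n == 0) print ""      # close a full group
        }
        END { if (NR % n) print "" }       # close a trailing partial group
    '
}

seq 1 42 | awk_n 16
```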

But you could also use pure bash:

group=()
while read -r line;do
    group+=("$line")
    (( ${#group[@]} > 15 ))&&{
        echo "${group[*]}"
        group=()
    }
done < <(seq 1 42); (( ${#group[@]} )) && echo "${group[*]}"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

or as a function:

lgrp () { 
    local group=() line
    while read -r line; do
        group+=("$line")
        ((${#group[@]}>=$1)) && { 
            echo "${group[*]}"
            group=()
        }
    done
    (( ${#group[@]} )) && echo "${group[*]}"
}

or

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=$1))&&{
          echo "${g[*]}";g=();};done;((${#g[@]}))&&echo "${g[*]}";}

then

lgrp 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
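Still in pure bash, and assuming bash 4.0 or newer, the `mapfile` (alias `readarray`) builtin can read up to N lines per call with `-n`, instead of appending to the array one `read` at a time. This swaps in a different builtin than the `read` loop above, and the `lgrp_mapfile` name is hypothetical. Since `mapfile` returns success even at end of input, the loop also checks that something was read:

```shell
#!/usr/bin/env bash
# lgrp_mapfile N: hypothetical pure-bash variant of lgrp based on mapfile -n.
lgrp_mapfile() {
    local -a group
    # -t strips the trailing newline, -n "$1" reads at most N lines per call
    # (mapfile clears the array first, so it is empty once input is exhausted).
    while mapfile -t -n "$1" group && (( ${#group[@]} )); do
        echo "${group[*]}"   # joins elements with the first char of IFS (a space)
    done
}

lgrp_mapfile 16 < <(seq 1 42)
```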

(Note: all these tests were arbitrarily run over 42 values; don't ask me why! ;-)

Other languages

Of course, you could do the same in any other language, for example Perl:

perl -ne 'chomp;$r.=$_." ";( 15 < ++$cnt) && do {
    printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
  };END{print $r."\n"}' < <(seq 1 42)

The same goes for Python, Ruby, Lisp, C, ...

Comparison of execution time

OK, there are more than three simple ways, so let's do a little benchmark. Here is how I did it:

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=$1))&&{
          echo "${g[*]}";g=();};done;((${#g[@]}))&&echo "${g[*]}";}
export -f lgrp
printf -v sedcmd '%*s' 15
sedcmd=${sedcmd// /N;s/\\n/ /;}
export sedcmd
{ 
    printf "%-12s\n" Method
    printf %7s\\n count real user system count real user system

    for cmd in 'paste -d " " -{,,,}{,,,}' 'pr -at16' \
        'sed -e "$sedcmd"' \
        $'awk \47NR%16{printf "%s ",$0;next;}1;END{print ""}\47'\
        $'perl -ne \47chomp;$r.=$_." ";( 15 < ++$cnt) && do {
           printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
           };END{print $r."\n"}\47' 'lgrp 16' 'xargs -n 16'
    do
        printf %-12s\\n ${cmd%% *}
        for length in 5042 50042; do
            printf %7s\\n $(bash -c "TIMEFORMAT=$'%R %U %S';
                time $cmd < <(seq 1 $length) | wc -l" 2>&1)
        done
    done
} | paste -d $'\t' -{,,,,,,,,}
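The script relies on brace expansion to build the `paste` arguments: `{,,,}` expands to four empty words, so `-{,,,}{,,,}` (expanded by the inner `bash -c`) yields 4 x 4 = 16 `-` arguments, and `-{,,,,,,,,}` yields the 9 columns of the final table (method name plus eight numbers). A quick check:

```shell
#!/usr/bin/env bash
# Brace expansion with empty alternatives repeats the prefix once per alternative.
cols16=( -{,,,}{,,,} )   # 4 * 4 = 16 dashes
cols9=( -{,,,,,,,,} )    # 9 dashes
echo "${#cols16[@]} ${#cols9[@]}"   # 16 9
```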

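(This can be copied and pasted into a terminal.) On my computer, it produces the following; times are in seconds, the first four numeric columns are for 5042 input lines and the last four for 50042: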

Method        count    real    user  system   count    real    user  system
paste           316   0.002   0.002   0.000    3128   0.003   0.003   0.000
pr              316   0.003   0.000   0.003    3128   0.008   0.005   0.002
sed             316   0.005   0.001   0.003    3128   0.018   0.019   0.000
awk             316   0.003   0.001   0.003    3128   0.017   0.017   0.002
perl            316   0.008   0.002   0.004    3128   0.017   0.014   0.004
lgrp            316   0.058   0.042   0.021    3128   0.733   0.568   0.307
xargs           316   0.232   0.178   0.058    3128   2.249   1.730   0.555

Here is the same benchmark on my Raspberry Pi:

Method        count    real    user  system   count    real    user  system
paste           316   0.149   0.032   0.012    3128   0.204   0.014   0.054
pr              316   0.163   0.017   0.038    3128   0.418   0.069   0.096
sed             316   0.275   0.088   0.031    3128   1.586   0.697   0.045
awk             316   0.440   0.146   0.049    3128   2.809   1.305   0.050
perl            316   0.421   0.122   0.040    3128   2.042   0.902   0.067
lgrp            316   7.261   3.159   0.446    3128  71.733  31.223   3.558
xargs           316   9.464   3.038   1.066    3128  93.607  32.035   9.177

Fortunately, all line counts are the same. paste is clearly the quickest, followed by pr. The pure bash function (lgrp) is even faster than xargs (I'm surprised by the poor performance of xargs, although it does start a new /bin/echo process for every group of 16 lines!).
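The count column is easy to check: merging L lines in groups of 16 gives ceil(L / 16) output lines, which in integer shell arithmetic is (L + 15) / 16. For the two input sizes used above this gives the 316 and 3128 seen in both tables (the `expected_lines` name is hypothetical):

```shell
#!/usr/bin/env bash
# expected_lines L: number of output lines when merging L input lines in groups of 16.
expected_lines() {
    echo $(( ($1 + 15) / 16 ))
}

expected_lines 5042    # 316
expected_lines 50042   # 3128
```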



Answered By - F. Hauri - Give Up GitHub
Answer Checked By - Marilyn (WPSolving Volunteer)