Issue
I am working on a school project and came across something that, in theory, shouldn't work.
I need to create two programs, which I will call client and server, where one communicates with the other through Unix signals. I pass a message in my client's argv, break each char into bits, and send them to the server.
The idea is to use bitwise communication (something simple and rudimentary): if the bit is 0, I send SIGUSR1 to the server PID using the kill system call; if it is 1, I send SIGUSR2.
#client send a char to server
int send_sig(int pid, unsigned char b)
{
    int a;

    a = 0;
    while (a < 8)
    {
        if (b & 1)
            kill(pid, SIGUSR2);
        else
            kill(pid, SIGUSR1);
        b = b >> 1;
        a++;
        usleep(1000);
    }
    return (0);
}
The problem appears when I use Unicode characters. argv is always a string (an array of char), so a Unicode character I pass takes anywhere from 1 to 4 bytes. The client still sends those bytes normally; the problem is on my server side, where I receive these bits.
The way I structured my code, the server prints one byte at a time (which is acceptable, since a char in C is one byte). But even when I pass 4-byte Unicode characters and print them one byte at a time, it keeps working (it's like Russian roulette: sometimes it breaks and sometimes it works normally).
# Server receiving the
unsigned char reverse(unsigned char b)
{
    b = (b & 0xF0) >> 4 | (b & 0x0F) << 4;
    b = (b & 0xCC) >> 2 | (b & 0x33) << 2;
    b = (b & 0xAA) >> 1 | (b & 0x55) << 1;
    return (b);
}
void signal_handler(int sig, siginfo_t *p_info, void *ucontext)
{
    static unsigned int a = 0;
    static unsigned int b = 0;

    a <<= 1;
    if (sig == SIGUSR2)
        a++;
    b++;
    if (b == 8)
    {
        b = 0;
        ft_printf("%c\0", reverse(a));
    }
    p_info = p_info;
    ucontext = ucontext;
}
Why does this happen? Shouldn't it simply break and print something wrong?
Speculations:
The way I print to stdout, without a NULL byte, lets the shell and terminal interpret the whole byte sequence without losing the UTF-8 mapping.
The Unicode character fits in a char (but this is impossible, I guess).
You can reproduce this behavior with the following code:
#client.c file
#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
void send_sig(int pid, char b)
{
    int a = 0;

    printf("%c", b);
    while (a < 8)
    {
        if (b & 1)
            kill(pid, SIGUSR2);
        else
            kill(pid, SIGUSR1);
        b >>= 1;
        a++;
        usleep(500);
    }
}
int main(int argc, char *argv[])
{
    char *s = "🤨🤨🤨🤨🤨🤨🤨";

    while (*s++ != '\0')
        send_sig(atoi(argv[1]), *s);
}
#server.c file
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
unsigned char reverse(unsigned char b)
{
    b = (b & 0xF0) >> 4 | (b & 0x0F) << 4;
    b = (b & 0xCC) >> 2 | (b & 0x33) << 2;
    b = (b & 0xAA) >> 1 | (b & 0x55) << 1;
    return (b);
}
void signal_handler(int sig, siginfo_t *p_info, void *ucontext)
{
    static unsigned int a = 0;
    static unsigned int b = 0;

    a <<= 1;
    if (sig == SIGUSR2)
        a++;
    b++;
    if (b == 8)
    {
        b = 0;
        a = reverse(a);
        write(1, &a, 1);
    }
    p_info = p_info;
    ucontext = ucontext;
}
int main(void)
{
    struct sigaction act;

    act.sa_sigaction = signal_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, NULL);
    sigaction(SIGUSR2, &act, NULL);
    printf("The server pid: %d\n", getpid());
    while (1)
        usleep(300);
}
Solution
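Sending Unicode bit by bit can be implemented in two ways: either by sending fixed-width values, 16-bit (UTF-16) or 32-bit (UTF-32), or byte by byte (UTF-8). With UTF-32, a character transmission is always 32 bits long; with UTF-16 it is 16 bits long, except that characters outside the Basic Multilingual Plane (such as the emoji in your example) need two 16-bit units. With the byte-by-byte approach, the first byte determines the number of bytes (and therefore bits) in the transmission. Currently, your server handles only 8 bits at a time and writes each received byte to the output on its own; it never groups the bytes of a multibyte character together. The output still looks right whenever all bytes arrive intact and in order, because the terminal decodes the UTF-8 byte stream itself.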
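To illustrate the fixed-width option, here is a minimal sketch of splitting a 32-bit code point into bits and rebuilding it, least significant bit first like your client does. The names codepoint_to_bits and bits_to_codepoint are hypothetical, and the sketch assumes the client has already decoded its UTF-8 input into code points, which is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: split a code point into 32 bits, least significant bit first.
   Each entry would correspond to one signal (0 -> SIGUSR1, 1 -> SIGUSR2). */
void codepoint_to_bits(uint32_t cp, int bits[32])
{
    int i;

    for (i = 0; i < 32; i++)
        bits[i] = (int)((cp >> i) & 1u);
}

/* Hypothetical helper: rebuild the code point from 32 bits in arrival order. */
uint32_t bits_to_codepoint(const int bits[32])
{
    uint32_t cp = 0;
    int i;

    for (i = 0; i < 32; i++)
        if (bits[i])
            cp |= (uint32_t)1 << i;   /* bit i arrived as a 1 */
    return cp;
}
```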
For the byte-by-byte approach, once your server has the first byte (8 bits), calculate the number of bytes in the transmission as follows:
if (byte < 0x80)
    num_bytes = 1; //single byte, no further read required
else if ((byte & 0xe0) == 0xc0)
    num_bytes = 2; //one more byte to read
else if ((byte & 0xf0) == 0xe0)
    num_bytes = 3; //two more bytes to read
else if ((byte & 0xf8) == 0xf0)
    num_bytes = 4; //three more bytes to read
Then, to form a valid UTF-8 (multibyte) character, read the following bytes (if any) into a char array, e.g. unsigned char utf8_bytes[4];
Of course, to form a valid null-terminated (printable) string, the size of the array has to be 5 and the last byte set to '\0'.
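The length check and the 5-byte buffer can be tried out without any signals. The following is a minimal sketch; utf8_seq_len and utf8_first_char are hypothetical helper names, and returning 0 for an invalid lead byte is an assumption of this sketch, not something the steps above prescribe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in a UTF-8 sequence, judged by its first byte.
   Returns 0 for a byte that cannot start a sequence (e.g. a continuation byte). */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xe0) == 0xc0)
        return 2;
    if ((lead & 0xf0) == 0xe0)
        return 3;
    if ((lead & 0xf8) == 0xf0)
        return 4;
    return 0;
}

/* Hypothetical helper: copy the first character of src into out as a
   null-terminated string. Returns its length in bytes, or 0 if src is
   empty, truncated, or starts with an invalid lead byte. */
int utf8_first_char(const char *src, char out[5])
{
    int len = utf8_seq_len((unsigned char)src[0]);
    int i;

    out[0] = '\0';
    for (i = 0; i < len; i++) {
        if (src[i] == '\0')
            return 0;            /* string ended before the sequence did */
        out[i] = src[i];
    }
    out[len] = '\0';             /* the fifth byte is needed for 4-byte characters */
    return len;
}
```

For example, calling utf8_first_char on a string that starts with the bytes F0 9F A4 A8 (the emoji from the question) fills out with those four bytes plus the terminator.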
Addition
Your client is sending the bit-sequence (byte: 10101010) as follows:
1010101|0 -> SIGUSR1
101010|1 -> SIGUSR2
10101|0 -> SIGUSR1
1010|1 -> SIGUSR2
101|0 -> SIGUSR1
10|1 -> SIGUSR2
1|0 -> SIGUSR1
|1 -> SIGUSR2
So, every time your server receives a SIGUSR2, it has to set the bit at the corresponding position, which can easily be done like this:
if (sig == SIGUSR2)
    byte |= (1 << bit_counter);
++bit_counter;
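To check that this reassembly matches the order in which the client sends bits, the round trip can be simulated in a single process without calling kill. This is only a sketch: signal_for_bit and assemble_byte are hypothetical names, and the signal numbers are used purely as labels here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical helper mirroring the client: which signal it would send
   for bit i of b, counting from the least significant bit. */
int signal_for_bit(unsigned char b, int i)
{
    return ((b >> i) & 1) ? SIGUSR2 : SIGUSR1;
}

/* Hypothetical helper mirroring the server: rebuild the byte from
   8 signals in arrival order by setting bit number bit_counter. */
unsigned char assemble_byte(const int sigs[8])
{
    unsigned char byte = 0;
    int bit_counter;

    for (bit_counter = 0; bit_counter < 8; ++bit_counter)
        if (sigs[bit_counter] == SIGUSR2)
            byte |= (unsigned char)(1u << bit_counter);
    return byte;
}
```

Because each bit is placed directly at its own position, no separate reverse step is needed once the byte is complete.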
The complete server code could look like this:
void signal_handler(int sig, siginfo_t *p_info, void *ucontext)
{
    static unsigned char utf8_bytes[5];  //multibyte storage
    static unsigned char byte = 0;       //bitset
    static int byte_index = 0;           //current position in the mb storage
    static int bit_counter = 0;          //number of bits received
    static int num_bytes = 1;            //remaining bytes of the mb character

    if (sig == SIGUSR2)                  //bit: 1
        byte |= (1 << bit_counter);      //set the according bit in byte
    if (++bit_counter == 8) {            //we received 8 bits -> 1 byte
        if (byte_index == 0) {           //if first byte in sequence
            if (byte < 0x80)
                num_bytes = 1;           //single byte, no further read required
            else if ((byte & 0xe0) == 0xc0)
                num_bytes = 2;           //one more byte to read
            else if ((byte & 0xf0) == 0xe0)
                num_bytes = 3;           //two more bytes to read
            else if ((byte & 0xf8) == 0xf0)
                num_bytes = 4;           //three more bytes to read
            else
                num_bytes = 1;           //invalid lead byte: pass it through on its own
        }
        utf8_bytes[byte_index++] = byte; //store the byte (including the last one)
        //since we completed 1 byte, decrease num_bytes
        if (--num_bytes == 0) {          //and if there are no more bytes to read
            utf8_bytes[byte_index] = '\0'; //make null-terminated string
            //write(1, utf8_bytes, byte_index); //do something useful (write is async-signal-safe, printf is not)
            byte_index = 0;              //reset the byte index
        }
        bit_counter = 0;                 //reset counter
        byte = 0;                        //reset byte (set all bits to zero)
    }
    (void)p_info;                        //unused
    (void)ucontext;                      //unused
}
Answered By - Erdal Küçük Answer Checked By - Gilberto Lyons (WPSolving Admin)