Issue
Just curious if all MOVs could be replaced with PUSH/POPs in a program?
I understand such replacements would be impractical and inefficient.
This godbolt example (https://godbolt.org/z/5eTszsxYx) shows a standard printf call using MOV and another printf with PUSH/POP for comparison. My intuition says it's possible, but there are probably some gotchas along the way?
#include <stdio.h>

char format_string[] asm("format_string") = "%d %d %d %d %d\n";

void MOV_printf() {
    __asm__ (
        "subq $128, %%rsp\n\t"                  // step over the red zone
        "lea format_string(%%rip), %%rdi\n\t"
        "movq $1, %%rsi\n\t"
        "movq $2, %%rdx\n\t"
        "movq $3, %%rcx\n\t"
        "movq $4, %%r8\n\t"
        "movq $5, %%r9\n\t"
        "xorl %%eax, %%eax\n\t"                 // AL = 0: variadic call with no vector-register args
        "call printf\n\t"
        "addq $128, %%rsp\n"                    // undo exactly what was subtracted
        ::: "rax", "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11", "rsp", "memory"
    );
}

void PUSH_POP_printf() {
    __asm__ (
        "subq $128, %%rsp\n\t"                  // step over the red zone
        "lea format_string(%%rip), %%rdi\n\t"
        "pushq $1\n\t"
        "popq %%rsi\n\t"
        "pushq $2\n\t"
        "popq %%rdx\n\t"
        "pushq $3\n\t"
        "popq %%rcx\n\t"
        "pushq $4\n\t"
        "popq %%r8\n\t"
        "pushq $5\n\t"
        "popq %%r9\n\t"
        "xorl %%eax, %%eax\n\t"                 // AL = 0: variadic call with no vector-register args
        "call printf\n\t"
        "addq $128, %%rsp\n"                    // undo exactly what was subtracted
        ::: "rax", "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11", "rsp", "memory"
    );
}

int main() {
    MOV_printf();
    PUSH_POP_printf();
    return 0;
}
Solutions
MOV r64, imm64 --- Replace with 4 pushw and a popq.
MOV AH,DL --- Simulate it using push/pop and a scratch buffer.
Gotchas
A mov is required for these:
- Control Registers (CR0, CR2, CR3, CR4, etc.)
- Debug Registers (DR0, DR1, DR2, etc.)
Solution
Byte stores like mov %al, (%rdi) are not possible with push. Any emulation that loads the containing word or qword, modifies it, and stores it back won't be thread-safe; a non-atomic RMW of the containing word could step on a store to the other byte by another thread. (Can modern x86 hardware not store a single byte to memory? - it can, and so can most ISAs, despite misconceptions.)
If you're willing to accept non-thread-safe emulation, then perhaps partially-overlapping pop m16 operations could construct a word with the value you're looking for in a static buffer, which you can then push m16 / pop m16 to copy over the original byte.
But you won't know whether the byte at (%rdi) is the low or high byte of the 16-bit word that contains it, so you won't know which of -1(%rdi) or 0(%rdi) you can access without possibly segfaulting by going into the next page. Only an aligned 16-bit load / store is guaranteed not to cross any wider boundaries (like a 4k page), and thus can't page-fault if the word contains any bytes you know are valid. (Is it safe to read past the end of a buffer within the same page on x86 and x64?)
push/pop alone can't check the low bit of %rdi and branch accordingly.
(x86-64 makes it impossible to have segment limits, which in 32-bit mode could have been an odd number of bytes, in the general case not assuming a flat memory model. But actually, x86-64 (still?) makes odd segment bases possible for FS and GS, I think, so mov %al, %fs:(%rdi) is even more unknown; even if you could test $1, %dil; jnz, that still wouldn't tell you whether the linear address was odd or even.)
Also, in addition to debug and control registers, x86-64 removed the opcodes for push/pop of segment registers other than FS/GS (https://www.felixcloutier.com/x86/push). So mov ds, eax is not emulatable either.
Answered By - Peter Cordes Answer Checked By - Terry (WPSolving Volunteer)