Issue
In an embedded application I'm running a Yocto based linux distrubution with RT patches. One thread is writing data to a file on a NFS filesystem and another thread is once every second running chronyc tracking with popen.
The problem is that chronyc tracking doesn't return within 100 ms if both threads runs in the same application. If, however, each thread runs in its own process it works as expected. As I have understood the linux CFS (Completely Fair Scheduler) it shouldn't matter if the threads belongs to the same process or not.
I expect that execute both threads in the same process should be the same as execute each thread in its own process.
- I have tried without the RT patches with the same behavior.
- I have tried a 4.x, 5.x and 6.x kernel with the same outcome.
- I have tried to write to CIFS instead of NFS with the same outcome.
- Writing to a sd card it works as expected.
- Writing to sshfs filesystem works as expected.
- I have tested to set RT prio 70 (chrt 70 timeout 0.1 chronyc tracking) on timeout and chronyc tracking without improvement.
Is there anyone that knows what is going on here? Is there some kernel parameters that should be changed? Some limitations on the process?
Any good way to debug this issue?
Code to reproduce the issue:
#include <iostream>
#include <fstream>
#include <thread>
bool done{false};
std::string path;
bool command(const char * cmd)
{
std::array<char, 128> buffer;
std::string result;
auto pipe = popen(cmd, "r");
if (!pipe)
{
std::cout << "Failed to open pipe\n";
return false;
}
while (fgets(buffer.data(), 128, pipe) != nullptr)
result += buffer.data();
return pclose(pipe) == EXIT_SUCCESS;
}
void tracking()
{
while (!done)
{
auto res = command("timeout 0.1 chronyc tracking");
if (!res)
std::cout << "Timeout waiting for chrony tracking\n";
else
std::cout << "Got chrony tracking\n";
std::this_thread::sleep_for(std::chrono::seconds(1));
}
}
void write_file()
{
while (!done)
{
// Open file, do some writing, close file
std::ofstream file;
file.open(path);
for (int i = 0; i < 100000; i++)
{
file << "lmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvbcdefghjklmnopqrstuvabcdefghjklmnopqrstuvlmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvabcdefghjklmnopqrstuvbcdefghjklmnopqrstuvabcdefghjklmnopqrstuv";
}
file.close();
}
}
int main(int argc, char **argv)
{
if (argc != 3)
{
std::cout << argv[0] << " <both/chrony/file> /path/to/file\n";
return 1;
}
std::string cmd = argv[1];
path = argv[2];
if (cmd == "both")
{
std::jthread t1(tracking);
std::jthread t2(write_file);
std::this_thread::sleep_for(std::chrono::minutes(2));
done = true;
}
else if (cmd == "chrony")
{
std::jthread t1(tracking);
std::this_thread::sleep_for(std::chrono::minutes(2));
done = true;
}
else if (cmd == "file")
{
std::jthread t2(write_file);
std::this_thread::sleep_for(std::chrono::minutes(2));
done = true;
}
else
{
return 1;
}
return 0;
}
Update: With perf I have collected some scheduling information with trace.
chronyc in one thread and NFS write in another:
chronyc 4123/4123 [001] 2057.379201: 1 sched:sched_switch: prev_comm=chronyc prev_pid=4123 prev_prio=120 prev_state=R+ ==> next_comm=rcu_preempt next_pid=15 next_prio=98
c0b5783c __schedule ([kernel.kallsyms])
c0b5783c __schedule ([kernel.kallsyms])
c0b57fb8 preempt_schedule_irq ([kernel.kallsyms])
c0100c0c svc_preempt ([kernel.kallsyms])
c04163f0 nfs_page_clear_headlock ([kernel.kallsyms])
c04167e0 __nfs_pageio_add_request ([kernel.kallsyms])
c0417300 nfs_pageio_add_request ([kernel.kallsyms])
c041b664 nfs_page_async_flush ([kernel.kallsyms])
c041b9d0 nfs_writepages_callback ([kernel.kallsyms])
c026b88c write_cache_pages ([kernel.kallsyms])
c041bb28 nfs_writepages ([kernel.kallsyms])
c026de78 do_writepages ([kernel.kallsyms])
c025fd8c filemap_fdatawrite_wbc ([kernel.kallsyms])
c02604bc filemap_write_and_wait_range ([kernel.kallsyms])
c041c484 nfs_wb_all ([kernel.kallsyms])
c040c9b4 nfs_file_flush ([kernel.kallsyms])
c02e08b4 filp_close ([kernel.kallsyms])
c0309014 put_files_struct ([kernel.kallsyms])
c012b2f8 do_exit ([kernel.kallsyms])
c012bb1c do_group_exit ([kernel.kallsyms])
c012bb80 __wake_up_parent ([kernel.kallsyms])
chronyc in one thread and CIFS write in another:
chronyc 9269/9269 [001] 6244.795802: 1 sched:sched_switch: prev_comm=chronyc prev_pid=9269 prev_prio=120 prev_state=R+ ==> next_comm=rcuc/1 next_pid=24 next_prio=98
c0b5783c __schedule ([kernel.kallsyms])
c0b5783c __schedule ([kernel.kallsyms])
c0b57fb8 preempt_schedule_irq ([kernel.kallsyms])
c0100c0c svc_preempt ([kernel.kallsyms])
c07a87e8 macb_start_xmit ([kernel.kallsyms])
c098dc04 dev_hard_start_xmit ([kernel.kallsyms])
c09d43bc sch_direct_xmit ([kernel.kallsyms])
c098e28c __dev_queue_xmit ([kernel.kallsyms])
c0a0ae9c ip_finish_output2 ([kernel.kallsyms])
c0a0c890 __ip_queue_xmit ([kernel.kallsyms])
c0a2da94 __tcp_transmit_skb ([kernel.kallsyms])
c0a2f160 tcp_write_xmit ([kernel.kallsyms])
c0a2ff80 __tcp_push_pending_frames ([kernel.kallsyms])
c0a1b24c tcp_sendmsg_locked ([kernel.kallsyms])
c0a1ba90 tcp_sendmsg ([kernel.kallsyms])
c0964540 sock_sendmsg ([kernel.kallsyms])
c048eb10 smb_send_kvec ([kernel.kallsyms])
c04901f8 __smb_send_rqst ([kernel.kallsyms])
c0490a4c cifs_call_async ([kernel.kallsyms])
c04b85dc smb2_async_writev ([kernel.kallsyms])
c047ef1c cifs_writepages ([kernel.kallsyms])
c026de78 do_writepages ([kernel.kallsyms])
c025fd8c filemap_fdatawrite_wbc ([kernel.kallsyms])
c02604bc filemap_write_and_wait_range ([kernel.kallsyms])
c04814b0 cifs_flush ([kernel.kallsyms])
c02e08b4 filp_close ([kernel.kallsyms])
c0309014 put_files_struct ([kernel.kallsyms])
c012b2f8 do_exit ([kernel.kallsyms])
c012bb1c do_group_exit ([kernel.kallsyms])
c012bb80 __wake_up_parent ([kernel.kallsyms])
For CIFS it is only one of the chronyc tracking that fails. It takes forever to execute but after that one the rest is within 100ms limit set with timeout and no cifs stuff in the trace.
Solution
It turns out that exiting i thread will flush outstanding write requests on other threads.
To get around the problem one could drop CLONE_FILES when creating threads. That means that file descriptors aren't shared between the threads. To my knowledge there is no easy way to do that for std::threads or pthreads but it is possible to implement you own function to create threads without CLONE_FILES set.
For reference: https://www.spinics.net/lists/linux-nfs/msg99933.html https://man7.org/linux/man-pages/man2/clone.2.html
Answered By - Elias Näslund Answer Checked By - Dawn Plyler (WPSolving Volunteer)