Issue
I'm using long-lived grpc streaming for sending notifications from a server to applications. A client subscribes for particular notifications and then the server sends them back to the client. When some server event occurs, clients receive an immediate notification on that event. This rpc will be active while the client is running.
If SIGINT
or SIGTERM
will handled, the server will be shut down like this:
std::unique_ptr<grpc::Server> server(builder.BuildAndStart());
while (!GetSignalToExit()) {
std::this_thread::sleep_for(std::chrono::seconds(1));
}
server->Shutdown();
It works if all active rpc's was finished (I can detect that long-lived grpc streaming has been cancelled on client side and call Finish()
on server side). But it doesn't work if any rpc's still active (client doesn't cancel rpc and I can not call Finish()
on server side while client still subscribed). So I have a deadlock in Shutdown()
:
I0531 14:26:57.547300877 568002 call.cc:1833] grpc_call_start_batch(call=0x626000000170, ops=0x7f6e190fa3e0, nops=1, tag=0x61c000010660, reserved=(nil))
I0531 14:26:57.547415151 568002 call.cc:1322] ops[0]: SEND_MESSAGE ptr=0x61200002d4c0
I0531 14:26:57.547552138 568002 completion_queue.cc:701] cq_end_op_for_next(cq=0x613000000580, tag=0x61c000010660, error=OK, done=0x7f6e1de91ed0, done_arg=0x61300001ff60, storage=0x61300001ffd0)
I0531 14:26:57.547627956 568002 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518018, tv_nsec: 547605372, clock_type: 1 }, reserved=(nil))
I0531 14:26:57.547659786 568007 completion_queue.cc:1083] RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61c000010660 OK
^CI0531 14:26:58.152863773 568005 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 152835249, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.152871623 568006 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 152850642, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.153774912 568003 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 153756833, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.540090834 568004 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 540060203, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.548524228 567997 init.cc:171] grpc_init(void)
I0531 14:26:58.548550608 567997 completion_queue.cc:521] grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0531 14:26:58.548559168 567997 server.cc:1507] grpc_server_shutdown_and_notify(server=0x612000000f40, cq=0x613000000ac0, tag=0x7ffcd7c151a0)
I0531 14:26:58.548791593 567997 completion_queue.cc:1420] grpc_completion_queue_shutdown(cq=0x613000000ac0)
I0531 14:26:58.548799653 567997 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000ac0, deadline=gpr_timespec { tv_sec: 9223372036854775807, tv_nsec: 0, clock_type: 0 }, reserved=(nil))
I0531 14:26:58.648809379 568002 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 648783006, clock_type: 1 }, reserved=(nil))
I0531 14:26:59.253870233 568006 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518020, tv_nsec: 253845351, clock_type: 1 }, reserved=(nil))
...
I'm trying to use Shutdown()
with one second deadline, and I see that all rpc's should be forced cencelled with grpc_server_cancel_all_calls()
, but it does not give any effect. It is still blocked in grpc_completion_queue_next()
.
I0531 14:31:22.912214984 568384 call.cc:1833] grpc_call_start_batch(call=0x626000000170, ops=0x7f450d8f83e0, nops=1, tag=0x61c000010660, reserved=(nil))
I0531 14:31:22.912273535 568384 call.cc:1322] ops[0]: SEND_MESSAGE ptr=0x6120000211c0
I0531 14:31:22.912376873 568384 completion_queue.cc:701] cq_end_op_for_next(cq=0x613000000580, tag=0x61c000010660, error=OK, done=0x7f4513691ed0, done_arg=0x61300001ff60, storage=0x61300001ffd0)
I0531 14:31:22.912437076 568384 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518283, tv_nsec: 912413737, clock_type: 1 }, reserved=(nil))
I0531 14:31:22.912579665 568382 completion_queue.cc:1083] RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61c000010660 OK
^CI0531 14:31:23.520863003 568387 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 520838434, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.521851049 568383 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 521831363, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.521886850 568386 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 521868298, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.546587208 568385 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 546566221, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.913261372 568377 init.cc:171] grpc_init(void)
I0531 14:31:23.913377031 568377 completion_queue.cc:521] grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0531 14:31:23.913415879 568377 server.cc:1507] grpc_server_shutdown_and_notify(server=0x612000000f40, cq=0x613000000ac0, tag=0x7fffbe9f7890)
I0531 14:31:23.914173773 568377 completion_queue.cc:1420] grpc_completion_queue_shutdown(cq=0x613000000ac0)
I0531 14:31:23.914212184 568377 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000ac0, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 913225652, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.013877278 568384 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 13848147, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.622571986 568387 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 622518754, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.622796108 568386 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 622775445, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.623751461 568383 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 623732426, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.647893715 568385 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 647873697, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.913724663 568377 server.cc:1515] grpc_server_cancel_all_calls(server=0x612000000f40)
I0531 14:31:24.914575277 568377 completion_queue.cc:701] cq_end_op_for_next(cq=0x613000000580, tag=0x61a00001ffa0, error=OK, done=0x7f4513691ed0, done_arg=0x613000010360, storage=0x6130000103d0)
I0531 14:31:24.914759989 568387 completion_queue.cc:1083] RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61a00001ffa0 OK
I0531 14:31:24.914849649 568387 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 914838828, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.915312144 568377 completion_queue.cc:701] cq_end_op_for_next(cq=0x613000000ac0, tag=0x7fffbe9f7890, error=OK, done=0x7f45136b4e00, done_arg=0x612000000f40, storage=0x6060000192f0)
I0531 14:31:25.115158526 568384 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518286, tv_nsec: 115134038, clock_type: 1 }, reserved=(nil))
I0531 14:31:25.723760911 568386 completion_queue.cc:970] grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518286, tv_nsec: 723747543, clock_type: 1 }, reserved=(nil))
...
- Why
Shutdown()
with deadline doesn't cancel all active rpc's? - How can I finish active rpc on server side when signal was handled?
I'm using grpc callback-based asynchronous API
protobuf 3.19.6
gRPC 1.48.4
Solution
gRPC logs were confusing me. It was not a deadlock in Shutdown(), it was deadlock on condition variable inside server.
Answered By - Ilya Lukin Answer Checked By - Robin (WPSolving Admin)