[process] Process.Spawn leaves the child in zombie state on posix
Hi, everyone!

This bug was observed on 2022.07.05, using Boost version 1.79.0. Similar behaviour has been reported before on online message boards.

Bug description

When starting a new, detached process with Process.Spawn on a POSIX system, if the parent process outlives the child, the child process remains in zombie state for the rest of the parent process' lifetime. The bug is demonstrated in the CMake project attached to this message.

Analysis

The spawn function injects the call "signal(SIGCHLD, SIG_IGN)" through a functor of type boost::process::detail::posix::sig_init_. This is done in the forked child process, to no avail. It is not done in the parent process, and should not be, because that would be too intrusive.

Possible mitigation

Introducing an intermediate forked process, with SIGCHLD set to SIG_IGN, that serves as the parent of the to-be-spawned process would prevent the spawned process from becoming a zombie without disturbing the parent process' signal handlers. Introducing another class, as an alternative to sig_init_, with the functionality described above would be a reasonable approach. An implementation sketch of the double fork method can be found in the attached project.

System used

OS: Ubuntu 18.04.6 LTS
arch: x86_64
compiler: gcc-7.5.0
libc: libc-2.27
boost: 1.79.0 (built from source)

Best regards,
Benedek Tass
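The underlying POSIX behaviour can be shown without Boost at all. The following is only a minimal sketch, assuming Linux (it reads /proc) and using plain fork() rather than Boost.Process; the helper names process_state and state_of_unreaped_child are made up for this illustration and are not part of any library.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <fstream>
#include <iterator>
#include <string>

// Linux-specific helper (hypothetical name): returns the state letter from
// /proc/<pid>/stat ('Z' means zombie), or '?' if it cannot be read.
char process_state(pid_t pid) {
    std::ifstream file("/proc/" + std::to_string(pid) + "/stat");
    std::string content((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());
    // Layout is "pid (comm) state ..."; comm may contain spaces or ')',
    // so locate the last ')' and read the field that follows it.
    std::string::size_type pos = content.rfind(')');
    if (pos == std::string::npos || pos + 2 >= content.size()) return '?';
    return content[pos + 2];
}

// Forks a child that exits at once and reports the child's state while the
// parent has not yet reaped it.
char state_of_unreaped_child() {
    pid_t pid = fork();
    if (pid < 0) return '?';
    if (pid == 0) _exit(0);  // child: terminate immediately

    // Block until the child has exited, but leave it unreaped (WNOWAIT).
    siginfo_t info{};
    if (waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT) != 0)
        return '?';

    char state = process_state(pid);
    waitpid(pid, nullptr, 0);  // reap, so the demonstration leaves no zombie
    return state;
}
```

Calling state_of_unreaped_child() from a program whose SIGCHLD disposition is the default should report the zombie state letter for as long as the parent has not waited for the child.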
You should probably report bugs on GitHub:
https://github.com/boostorg/process/issues
In the bug report, it is always desirable to post a small compilable code sample that reproduces the issue.
Regarding the proposed fix, it's not clear how introducing an intermediate process would fix the parent not calling waitpid() or equivalent. You'd just get a different zombie process.
I was planning to send an example project; I'm sending it now as an
attachment.
As the demo sketch shows, I would absolutely call waitpid() in the parent
process; the on_success() method of a sig_init_-like class would be a
reasonable place for it. The current implementation doesn't lend itself to
an easy fix for this bug, and I don't have a fully fledged idea of how to
do it.
I don't think this is a solvable problem.
The reason is that you need to set SIGCHLD to SIG_IGN for the whole
application in order to prevent zombies. That doesn't really work as a
general solution, which is why I'll deprecate that function at some point.
You can probably call `waitpid(-1, &status, 0);` from time to time to reap
the zombie processes, or you can put them in a process group. But I couldn't
come up with a satisfying solution, which is why the only way will be to
just hold a handle to the child process.
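The "reap from time to time" idea could look roughly like the sketch below. It is an illustration only: it swaps the blocking call quoted above for a non-blocking one (WNOHANG) so that it can be called periodically without stalling, and the function name reap_terminated_children is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>

// Reaps every child that has already terminated, without blocking.
// Returns the number of children reaped by this call.
int reap_terminated_children() {
    int reaped = 0;
    for (;;) {
        int status = 0;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {  // one terminated child was collected
            ++reaped;
            continue;
        }
        if (pid == -1 && errno == EINTR) continue;  // interrupted: retry
        // pid == 0: children exist but none has exited yet;
        // pid == -1 with ECHILD: there are no children at all.
        break;
    }
    return reaped;
}
```

Note that waitpid(-1, ...) collects any child of the process, so it also consumes the exit status of children that other code may still hold a handle to. That side effect is one reason this approach is hard to offer as a general solution.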
Have you checked out the sketch I provided? It seems to me that it solves
the problem reasonably well.
The following figure may clarify the idea I propose:

PARENT --> fork() ------------------------------------------- waitpid() --->
               \                                               /
                \  signal(SIGCHLD, SIG_IGN)                   /
                CHILD  fork() ---- signal(SIGHUP, SIG_IGN) ---- exit()
                          \  setsid()
                           \
                           GRANDCHILD  execve() ------>

# CHILD: this process takes the burden of ignoring SIGCHLD from the parent
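For readers without the attachment, the figure can be approximated in code as follows. This is not the attached project, only a sketch that follows the order of calls in the figure as closely as it can be read. The name spawn_detached is hypothetical, and execvp() is used instead of execve() so that the program is looked up on PATH.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <csignal>

// Launches argv[0] as a detached grandchild via a double fork.
// Returns true if the intermediate process was created and reaped cleanly.
// It does not report whether the exec in the grandchild succeeded.
bool spawn_detached(char* const argv[]) {
    pid_t child = fork();
    if (child < 0) return false;

    if (child == 0) {
        // CHILD: short-lived intermediate process.
        signal(SIGCHLD, SIG_IGN);  // ignore SIGCHLD here, not in the parent
        pid_t grandchild = fork();
        if (grandchild < 0) _exit(1);
        if (grandchild == 0) {
            // GRANDCHILD: start a new session, then run the program.
            setsid();
            execvp(argv[0], argv);
            _exit(127);  // reached only if exec failed
        }
        signal(SIGHUP, SIG_IGN);
        _exit(0);  // the grandchild is now orphaned and reparented
    }

    // PARENT: reap the intermediate process so that it leaves no zombie.
    int status = 0;
    while (waitpid(child, &status, 0) == -1) {
        if (errno != EINTR) return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent waits only for the intermediate process, which exits almost immediately; the grandchild is reparented and later reaped by whichever process adopts it. One caveat: on Linux an ignored disposition is kept across execve(), so the launched program would start with SIGCHLD ignored, and a fuller version might restore SIG_DFL in the grandchild before the exec.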
Benedek Tass
On 7/7/22 14:21, Klemens Morgenstern via Boost wrote:
I don't think this is a solvable problem. The reason is that you need to set SIGCHLD to SIG_IGN for the whole application in order to prevent zombies. That doesn't really work as a general solution, which is why I'll deprecate that function at some point.
You can probably call `waitpid(-1, &status, 0);` from time to time to reap the zombie processes, or you can put them in a process group. But I couldn't come up with a satisfying solution, which is why the only way will be to just hold a handle to the child process.
Isn't the main purpose of SIGCHLD exactly to prompt a call to waitpid? If anything, you should be recommending either that the user plug a Boost.Process API call into their SIGCHLD handler (one that calls waitpid internally), or that the user be allowed to install Boost.Process' own SIGCHLD handler that does this. Ignoring SIGCHLD does not seem like the right thing to do.
If Boost.Process doesn't allow joining terminated processes as they terminate, this seems like a major design flaw to me. Asking users to call waitpid periodically sounds like a kludge.
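A handler of the kind described here might be sketched as below. This uses plain POSIX only and is not a Boost.Process API; the names g_reaped, on_sigchld and install_sigchld_reaper are invented for the illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <csignal>

// Number of children reaped by the handler so far (illustration only).
volatile sig_atomic_t g_reaped = 0;

// SIGCHLD handler: reaps every child that has terminated. One signal can
// stand for several exits, hence the loop. Only async-signal-safe calls
// are used, and errno is preserved for the interrupted code.
extern "C" void on_sigchld(int) {
    int saved_errno = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
        g_reaped = g_reaped + 1;
    }
    errno = saved_errno;
}

// Installs the handler for the whole process. Returns true on success.
bool install_sigchld_reaper() {
    struct sigaction sa{};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  // ignore stop notifications
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

Like any process-wide SIGCHLD handler, this replaces whatever handler the application had installed, and it reaps children that other code may want to wait for itself. That is the intrusiveness concern raised earlier in the thread.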
participants (3)
- Andrey Semashev
- Benedek Tass
- Klemens Morgenstern