Caché process failures on RHEL V7.2
InterSystems WRC has investigated several cases of process failure that can be attributed to a recent change in Red Hat Linux.
A new feature implemented in RHEL V7.2 (systemd-219-19.el7.x86_64) can cause operating system IPC (inter-process communication) semaphores to be deallocated when a non-system RHEL user logs out. System users, i.e. those with a UID below 1000, are excluded.
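As a rough illustration of the UID rule above, the sketch below lists the accounts in passwd-format text that would count as non-system users. The function name and the first_regular_uid parameter are made up for this example; the 1000 cut-off is taken from the description above and may be configured differently on a given system.

```python
def non_system_users(passwd_text, first_regular_uid=1000):
    """Return (name, uid) pairs whose UID is at or above the cut-off."""
    users = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed entries
        uid = int(fields[2])
        if uid >= first_regular_uid:
            users.append((fields[0], uid))
    return users
```

Feeding it the contents of /etc/passwd gives a quick view of which accounts could trigger semaphore removal when they log out.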
Internally, Caché uses IPC semaphores to control the operation of Caché processes (for example, when trying to wake up a Caché process). It does this through the “semop” system call, so if the operating system unexpectedly removes semaphores that Caché relies on for IPC, processes can fail. If this occurs, the following evidence will be found in cconsole.log:
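The failure mode can be reproduced outside Caché. The following is a minimal, Linux-only sketch that calls the C library through ctypes: it creates a private semaphore set, removes it (standing in for the operating system deallocating it), and then calls semop on the stale identifier. The constants are the Linux values from the system headers, and the function name is made up for this example. It is not how Caché itself is implemented.

```python
import ctypes
import errno
import os

# Linux values from <sys/ipc.h>; assumed here, other platforms may differ.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0


class Sembuf(ctypes.Structure):
    """Mirror of struct sembuf from <sys/sem.h>."""
    _fields_ = [
        ("sem_num", ctypes.c_ushort),
        ("sem_op", ctypes.c_short),
        ("sem_flg", ctypes.c_short),
    ]


# Use the C library already loaded into the Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.semop.argtypes = [ctypes.c_int, ctypes.POINTER(Sembuf), ctypes.c_size_t]


def semop_on_removed_set():
    """Create a semaphore set, remove it, then call semop on the stale ID.

    Returns the errno reported by the failing semop call.
    """
    semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)
    if semid < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # Stand-in for the operating system removing the set behind our back.
    if libc.semctl(semid, 0, IPC_RMID) < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # A "wake up" style increment against the now-missing semaphore set.
    op = Sembuf(sem_num=0, sem_op=1, sem_flg=0)
    rc = libc.semop(semid, ctypes.byref(op), 1)
    if rc == 0:
        raise RuntimeError("semop unexpectedly succeeded")
    return ctypes.get_errno()
```

On Linux the returned value is expected to be errno.EINVAL, which is error number 22.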
“System error while trying to wake-up a process, code = 22”
along with corresponding errors being placed into the Caché SYSLOG, such as the following typical example:
Err  Process  Date/Time              Mod  Line  Routine               Namespace
22   39761    09/29/2016 04:41:27PM  61   359   BF0+1359^Ens.Queue.1  HSBUS
This can eventually cause the Caché instance to hang.
The Red Hat article linked below gives more details about this feature and how it can be disabled:
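To check an instance for this symptom, the console log can be scanned for the message quoted above. The sketch below is one way to do that; the function name is made up for this example, and the only thing it assumes about the log format is the message text itself.

```python
import errno
import re

# Message text as quoted from cconsole.log above.
WAKEUP_ERROR = re.compile(
    r"System error while trying to wake-up a process, code = (\d+)"
)


def find_wakeup_errors(lines):
    """Yield (line_number, code, errno_name) for each matching log line."""
    for number, line in enumerate(lines, start=1):
        match = WAKEUP_ERROR.search(line)
        if match:
            code = int(match.group(1))
            # Translate the numeric code to its symbolic errno name if known.
            yield number, code, errno.errorcode.get(code, "unknown")
```

It accepts any iterable of lines, so an open file handle on cconsole.log can be passed in directly.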
https://access.redhat.com/solutions/2062273
This issue has been fixed in systemd-219-19.el7_2.4, released with RHBA-2016-0199 (https://rhn.redhat.com/errata/RHBA-2016-0199.html).
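The logind option that controls this behaviour is RemoveIPC, set in the [Login] section of /etc/systemd/logind.conf. The sketch below reads the value from the text of that file. It is a simplified parser with a made-up function name: it ignores drop-in directories and does not know the default compiled into the installed systemd build.

```python
def read_remove_ipc(conf_text):
    """Return the last RemoveIPC value set in [Login], or None if unset."""
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if sep and section == "Login" and key.strip() == "RemoveIPC":
            value = val.strip().lower()
    return value
```

A result of None means the file does not set the option, in which case the default of the installed systemd build applies.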
You will be fine even with RemoveIPC=yes in /etc/systemd/logind.conf if the Caché installation uses owner=cachesys and the group allowed to start/stop Caché is cachemgr. You can use any names you like instead of cachesys and cachemgr, as long as 'cachesys' != cacheusr (in terms of UID). It works even if the cachesys UID is > 1000, i.e. when it is considered a non-system user.