By Adrian Bridgett | 2023-10-05
Overview
We all have some legacy systems out there. That can mean lots of SSH sessions.
Occasionally my sessions would hang. One day I not only found an easy way to reproduce the issue, but also had enough time to start investigating. What seemed like a quick fix then turned into a long tale of how to make one tiny change on a modern Mac without committing a security faux pas.
The simple problem
My test case was to SSH into a VM, then run Clush from there against a few hundred servers (limited to 75 simultaneously, I’m not greedy). When I tried this, my entire SSH session would hang. Not even Esc-~ would work.
Turning up the SSH debug level showed me an issue with my Mac’s SSH agent. When my VM wishes to log in to one of those hundred+ servers, it needs an authentication key, and that’s not saved on the VM, so it asks my Mac. It does this by talking to the ssh-agent program on my Mac, which then kindly signs the authentication request with the private key (the key itself never leaves my Mac). This request runs over the existing SSH connection from my Mac to the VM. NB: only my VM is allowed to do this, as it’s a security risk (i.e. those hundred servers do not get access to that key).
The solution was near: running lsof on ssh-agent, I could see that the agent would use 256 file descriptors and then hang. This was because when launchctl started it, the maxfiles limit was set low (as a safety net). I tested my hypothesis by running a separate ssh-agent (which had a higher maxfiles limit as I had started it myself). That worked - now all I had to do was increase that limit for the one that launchctl created.
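A minimal sketch of that descriptor check, under some assumptions: count_open_files is a hypothetical helper name, and it relies on lsof printing one header line followed by one line per open file. The count also includes entries such as cwd and txt that are not numbered descriptors, so treat it as approximate.

```shell
# count_open_files: hypothetical helper, not part of lsof or OpenSSH.
# Reads `lsof -p <pid>` output on stdin and prints the number of open
# files, assuming one header line followed by one line per open file.
count_open_files() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Example (not run here): count files held by the oldest ssh-agent.
#   lsof -p "$(pgrep -o -x ssh-agent)" | count_open_files
```

Watching this number climb towards 256 while the test case runs is one way to confirm the hang lines up with the limit. To repeat the comparison, an agent started by hand inherits the limit of the shell that launched it, so raising it first (for example with ulimit -n 4096, an illustrative value) gives the second agent more headroom.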
This is where my work really started.
The painful problem
On Linux this would typically involve adding a one-line systemd override, reloading systemd and restarting the ssh-agent service. On my Mac things were rather harder.
First let’s see those launchctl defaults:
$ launchctl limit
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 2784 4176
maxfiles 256 unlimited
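If you only care about one row of that table, a small filter can pull it out. This is a sketch: soft_limit is a hypothetical helper, and it assumes the three-column layout shown above (name, soft limit, hard limit).

```shell
# soft_limit: hypothetical helper. Reads `launchctl limit` output on stdin
# (columns: name, soft limit, hard limit) and prints the soft limit for
# the resource named in $1.
soft_limit() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (not run here):
#   launchctl limit | soft_limit maxfiles
```

With the output above this would report 256 for maxfiles, which matches the point at which the agent stopped responding.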
Some posts will say that you can change this:
- edit /etc/launchd.conf
- sudo launchctl limit maxfiles 1024 8000000 WARNING
- more attempts
However, this doesn’t work on the latest version of macOS (at least not without doing dodgy stuff). Regardless, the right approach is to bump it only for ssh-agent.
First look at the current launch configuration at /System/Library/LaunchAgents/com.openssh.ssh-agent.plist:
/System/Library/LaunchAgents/com.openssh.ssh-agent.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.openssh.ssh-agent</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/ssh-agent</string>
        <string>-l</string>
    </array>
    <key>Sockets</key>
    <dict>
        <key>Listeners</key>
        <dict>
            <key>SecureSocketWithKey</key>
            <string>SSH_AUTH_SOCK</string>
        </dict>
    </dict>
    <key>EnableTransactions</key>
    <true/>
</dict>
</plist>
Alternatively you can run launchctl print gui/$(id -u)/com.openssh.ssh-agent. The output seems to be generated from the above file and is much longer:
$ launchctl print gui/$(id -u)/com.openssh.ssh-agent
com.openssh.ssh-agent = {
active count = 1
copy count = 0
one shot = 0
path = /System/Library/LaunchAgents/com.openssh.ssh-agent.plist
state = running
program = /usr/bin/ssh-agent
arguments = {
/usr/bin/ssh-agent
-l
}
inherited environment = {
DISPLAY => /private/tmp/com.apple.launchd.9fCei5mSDS/org.xquartz:0
SSH_AUTH_SOCK => /private/tmp/com.apple.launchd.KvL8blgVGz/Listeners
}
default environment = {
PATH => /usr/bin:/bin:/usr/sbin:/sbin
}
environment = {
MallocSpaceEfficient => 1
XPC_SERVICE_NAME => com.openssh.ssh-agent
}
domain = com.apple.xpc.launchd.user.domain.501.100018.Aqua
asid = 100018
minimum runtime = 10
exit timeout = 5
runs = 1
successive crashes = 0
pid = 88520
immediate reason = ipc (socket)
forks = 0
execs = 1
initialized = 1
trampolined = 1
started suspended = 0
proxy started suspended = 0
last exit code = (never exited)
event triggers = {
}
endpoints = {
}
dynamic endpoints = {
}
pid-local endpoints = {
}
instance-specific endpoints = {
}
event channels = {
}
sockets = {
"Listeners" = {
type = stream
path = /private/tmp/com.apple.launchd.KvL8blgVGz/Listeners
secure key = SSH_AUTH_SOCK
owner uid = 501
group id = 0
sockets = {
25 (bytes to read)
}
active = 1
passive = 1
bonjour = 0
ipv4v6 = 0
receive_packet_info = 0
}
}
instances = {
}
spawn type = daemon (3)
jetsam priority = 18
jetsam memory limit (active, soft) = 15 MB
jetsam memory limit (inactive, soft) = 15 MB
jetsamproperties category = daemon
jetsam thread limit = 32
cpumon = default
job state = running
properties = {
partial import = 0
xpc bundle = 0
keepalive = 0
runatload = 0
low priority i/o = 0
low priority background i/o = 0
dataless file mode = 0
legacy timer behavior = 0
exception handler = 0
supports transactions = 1
supports pressured exit = 0
supports idle hysteresis = 0
enter kdp before kill = 0
wait for debugger = 0
app = 0
system app = 0
creates session = 0
inetd-compatible = 0
inetd listener = 0
abandon process group = 0
event monitor = 0
penalty box = 0
role account = 0
launch only once = 0
system support = 0
inferred program = 1
joins gui session = 0
joins host session = 0
parameterized sandbox = 0
resolve program = 0
abandon coalition = 0
high bits aslr = 0
reslide shared cache = 0
disable resliding = 0
extension = 0
nano allocator = 0
no initgroups = 0
start on fs mount = 0
needs implicit endpoint = 0
is copy = 0
disallow all lookups = 0
system service = 1
protected by submitter = 0
multiple instances = 0
}
Some people try to edit the /System/Library/LaunchAgents/com.openssh.ssh-agent.plist file; however, this is rather difficult with System Integrity Protection (SIP) enabled.
What you can do is to disable the default service and create a similar one with an increased limit.
The solution
First we prepare the new service:
cp /System/Library/LaunchAgents/com.openssh.ssh-agent.plist ~/Library/LaunchAgents/com.openssh.ssh-agent-maxfiles.plist
- Edit it like this:
  - Change the Label to com.openssh.ssh-agent-maxfiles
  - Add a new SoftResourceLimits key as shown in the file below:
~/Library/LaunchAgents/com.openssh.ssh-agent-maxfiles.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.openssh.ssh-agent-maxfiles</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/ssh-agent</string>
        <string>-l</string>
    </array>
    <key>Sockets</key>
    <dict>
        <key>Listeners</key>
        <dict>
            <key>SecureSocketWithKey</key>
            <string>SSH_AUTH_SOCK</string>
        </dict>
    </dict>
    <key>EnableTransactions</key>
    <true/>
    <key>SoftResourceLimits</key>
    <dict>
        <key>NumberOfFiles</key>
        <integer>10240</integer>
    </dict>
</dict>
</plist>
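Before loading the file it is worth a quick sanity check that the limit actually landed in it. The following is a rough sketch: plist_number_of_files is a hypothetical helper, and it assumes the NumberOfFiles key and its integer value sit on consecutive lines as in the file above. It is not a general plist parser.

```shell
# plist_number_of_files: hypothetical helper. Prints the integer that
# follows the NumberOfFiles key in the plist file given as $1.
# Assumes the key and its <integer> value are on consecutive lines.
plist_number_of_files() {
    awk '/<key>NumberOfFiles<\/key>/ {
        if ((getline) > 0) { gsub(/[^0-9]/, ""); print }
        exit
    }' "$1"
}

# Example (not run here):
#   plist_number_of_files ~/Library/LaunchAgents/com.openssh.ssh-agent-maxfiles.plist
#   plutil -lint ~/Library/LaunchAgents/com.openssh.ssh-agent-maxfiles.plist
```

The plutil -lint call in the comment is the macOS tool for checking that a plist is syntactically valid, which catches a missing closing tag before launchctl has a chance to complain.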
Now we stop the old SSH agent:
launchctl disable gui/$(id -u)/com.openssh.ssh-agent
launchctl stop gui/$(id -u)/com.openssh.ssh-agent
launchctl kill SIGTERM gui/$(id -u)/com.openssh.ssh-agent
Check it’s actually dead - you might have to kill it:
ps -ef |grep ssh-agent
killall ssh-agent
Load and start the new SSH agent:
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.openssh.ssh-agent-maxfiles.plist
launchctl start gui/$(id -u)/com.openssh.ssh-agent-maxfiles
You will need to update your SSH_AUTH_SOCK environment variable (logging out and back in is easiest).
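If logging out is inconvenient, the socket path can probably be read back from launchctl instead. This is only a sketch: agent_socket_path is a hypothetical helper, and it assumes the launchctl print layout shown earlier in this post (a "path = .../Listeners" line), which Apple does not promise to keep stable.

```shell
# agent_socket_path: hypothetical helper. Reads `launchctl print` output
# on stdin and prints the first "path = ..." value ending in /Listeners.
# Assumes the output layout shown earlier in this post.
agent_socket_path() {
    awk '$1 == "path" && $2 == "=" && $3 ~ /\/Listeners$/ { print $3; exit }'
}

# Example (not run here):
#   SSH_AUTH_SOCK="$(launchctl print "gui/$(id -u)/com.openssh.ssh-agent-maxfiles" | agent_socket_path)"
#   export SSH_AUTH_SOCK
#   ssh-add -l
```

This only changes the variable in the current shell; other terminals and applications keep the old value until they are restarted.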
If you want to show disabled services:
launchctl print-disabled gui/$(id -u)
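Should you want to go back to the stock agent, the steps above can be reversed along these lines. This is a hedged sketch rather than a tested procedure: restore_default_agent is a hypothetical helper, and it assumes the com.openssh.ssh-agent-maxfiles label used in this post.

```shell
# restore_default_agent: hypothetical helper that reverses the steps above.
# Assumes the label com.openssh.ssh-agent-maxfiles used in this post.
restore_default_agent() {
    domain="gui/$(id -u)"
    # Unload the custom agent; ignore the error if it is not loaded.
    launchctl bootout "$domain/com.openssh.ssh-agent-maxfiles" || true
    # Allow launchd to start the stock agent again.
    launchctl enable "$domain/com.openssh.ssh-agent"
}

# Example (not run here):
#   restore_default_agent
```

Afterwards, logging out and back in is likely the simplest way to get the stock agent and its SSH_AUTH_SOCK back, and the copied plist in ~/Library/LaunchAgents can be removed so it is not loaded at the next login.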
Congratulations, enjoy your new SSH powers!
Summary
The Mac’s increased security and lockdown (immutable filesystems) can make previously simple steps harder. Finding good solutions can take a lot of hunting - especially up-to-date ones.
This blog post was only possible thanks to all the amazing websites out there. In particular: