Fun times with Vulkan Loaders

Or how I stopped crashing and learnt to love to game 😀

I recently figured out what was causing me issues with games running under Proton/Wine/Lutris. I have both the RADV and AMDVLK drivers installed, and many games/loaders do not cope well when the Vulkan loader has multiple choices. This would often result in crashes in games that have Silver/Gold/Platinum ratings on the WineHQ AppDB.

The trick is to have the game runner enforce a single driver. For Steam you can alter the command used to run the game to one of the following (RADV first, AMDVLK second):

VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.i686.json:/usr/share/vulkan/icd.d/radeon_icd.x86_64.json %command%
VK_ICD_FILENAMES=/etc/vulkan/icd.d/amd_icd32.json:/etc/vulkan/icd.d/amd_icd64.json %command%

For Lutris, you can add an environment variable for the runners (Epic Games etc.). Under the game's config dialog, on the System Options tab, you can specify the environment variables from above.
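If you script your launchers, you can build the ICD list instead of hard-coding it. A sketch (the directory and file-name patterns are assumptions – check what your distribution actually installs):

```shell
# Build a VK_ICD_FILENAMES value for one driver by globbing the ICD
# directory. The directory and the "radeon"/"amd" patterns are
# assumptions -- inspect your own /usr/share/vulkan/icd.d first.
build_icd_list() {
    icd_dir=$1     # e.g. /usr/share/vulkan/icd.d
    pattern=$2     # e.g. 'radeon' to select RADV, 'amd' for AMDVLK
    list=""
    for f in "$icd_dir"/*"$pattern"*.json; do
        [ -e "$f" ] || continue        # no match: leave list empty
        list="${list:+$list:}$f"       # join entries with ':'
    done
    printf '%s\n' "$list"
}

# Example Steam launch option using the helper:
# VK_ICD_FILENAMES=$(build_icd_list /usr/share/vulkan/icd.d radeon) %command%
```

For AMDVLK you would pass a pattern like `amd` instead of `radeon`.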

Mounting a folder from the host inside an LXC container


1. Create profile (makes life easier)

Take this script (a slightly modified version from here), save it, and make it executable.

#!/bin/bash
set -eu
_UID=$(id -u)
GID=$(id -g)
# key pair used for logging in to containers (location is arbitrary)
KEY=$HOME/.ssh/id_lxd
PUBKEY=$KEY.pub
# give lxd permission to map your user/group id through
grep root:$_UID:1 /etc/subuid -qs || sudo usermod --add-subuids ${_UID}-${_UID} --add-subgids ${GID}-${GID} root
# set up a separate key to make sure we can log in automatically via ssh
# with $HOME mounted
[ -f $PUBKEY ] || ssh-keygen -f $KEY -N '' -C "key for local lxds"
# create a profile to control this, name it after $USER
lxc profile create $USER &> /dev/null || true
# configure profile
# this will rewrite the whole profile
cat << EOF | lxc profile edit $USER
name: $USER
description: allow home dir mounting for $USER
config:
  # this part maps uid/gid on the host to the same on the container
  raw.idmap: |
    uid $_UID 1000
    gid $GID 1000
  # note: user.user-data is still available
  user.vendor-data: |
    #cloud-config
    users:
      - name: $USER
        groups: sudo
        shell: $SHELL
        sudo: ['ALL=(ALL) NOPASSWD:ALL']
        ssh-authorized-keys:
          - $(cat $PUBKEY)
    packages:
      # ensure the user's shell is installed in the container
      - $(dpkg -S $(readlink -m $SHELL) | cut -d: -f1)
EOF

So – this takes care of mapping your UID/GID into the container, so the files in your mount point are editable by your newly added user.

2. Create your container

lxc launch ubuntu:18.04 <YOUR-CONTAINER> -p $USER -p default

3. Add your shared folder as a device and mount it

lxc config device add YOUR-CONTAINER YOUR-DEVICE-NAME disk source=/home/<user>/<projects-folder> path=/home/<user>/<projects-folder>

Congratulations – you now have a shiny Ubuntu 18.04 container to do your dev work in.

You might want to jump in and install some important stuff:

lxc exec YOUR-CONTAINER -- /bin/bash
# apt install build-essential ....

Hacking a Cisco/Linksys NSS6000

So I was given a Cisco/Linksys NSS6000 to upgrade and root. Luckily, I was also provided with the instructions to root this machine.

Thanks to some hacker types who had already been there and done this, the process was relatively straightforward.

  1. Create User
  2. Insert USB Key
  3. Backup Configuration onto USB Key
  4. Unmount USB Key
  5. Dive into the tarball (which is simply /etc) and:
    1. Change the root password hash in etc/passwd – I just copied my new user's!
    2. Add the following line to etc/cron.d/root
      */5 * * * * /usr/sbin/
  6. Tar the extracted files
  7. Put the tarball back on the USB drive
  8. Mount it in the NAS and Restore from backup
  9. Profit, Right!?

Well, nearly. I had a couple of issues:

Incorrect tarball Permissions

So, my first derp was when I tar’d the etc folder back up and, well… instead of root owning everything, my user did – you get the picture.

What happens is that you get an error like this:

Warning: touch(): Unable to create file /etc/nas/ran_wizard because Permission denied in /www/html/index.php on line 48

And you end up getting into a loop with dialog boxes and never ending redirects to the same page.

The downside of this is that you cannot get to any other pages in the administration interface to even consider doing a factory reset. Luckily, you *can* still POST to it:

curl --data "p=admin&s=maintenance&restore_all=Restore+ALL+Settings+to+Factory+Defaults" http://admin:admin@

This command will reset the device to factory defaults – you should change the IP address and user/pass to what you need them to be. It obviously uses cURL, so you will need that too 🙂

We also tried overwriting the start of the one disk we had in the machine (using dd if=/dev/zero…) to see if that would work – but alas, it did not. We were able to go through the setup wizard again, but ended back at the loop we had before.
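One way to avoid creating a bad tarball in the first place is to let GNU tar rewrite the ownership at create time, so you never need to be root on your workstation. A sketch (the directory contents and tarball name are illustrative):

```shell
# Repack the extracted NAS config tree so every entry in the tarball is
# owned by root, without sudo: GNU tar's --owner/--group rewrite the
# ownership stored in the archive at create time.
cd "$(mktemp -d)"                        # scratch dir for the demo
mkdir -p etc && echo 'demo' > etc/passwd # stand-in for the real /etc tree
tar --owner=root --group=root -czf config-backup.tar.gz etc
tar -tvzf config-backup.tar.gz           # entries now list as root/root
```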

SSH Connection Closed

The second problem I had was SSHing to the box. We discovered that if we used an older version of OpenSSH we could connect to it, but newer versions of OpenSSH would just not connect.

Pro Tip: PuTTY connects fine 🙂

To be able to connect to older Dropbear servers with newer OpenSSH clients, try this in your ~/.ssh/config file:

jason@workstation:~$ cat ~/.ssh/config 
	Ciphers aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour,aes192-cbc,aes256-cbc
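Depending on how new your client is, the legacy key-exchange and host-key algorithms may need re-enabling as well. A fuller per-host stanza might look like this (the host name and address are placeholders – scoping it to one Host keeps the weak algorithms off your other connections):

```
# ~/.ssh/config – legacy settings scoped to the one old box
Host nss6000
    HostName 192.168.1.77
    # '+' appends to the client's defaults instead of replacing them
    Ciphers +aes128-cbc,3des-cbc
    KexAlgorithms +diffie-hellman-group1-sha1
    HostKeyAlgorithms +ssh-rsa
```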

I hope this information saves someone a few hours of frustrations.

rm: Too Many Files

Ever come across a folder you need to delete but there are too many files in it?

Basically, the shell expansion of * attempts to put every file name on the command line – so:

jason@server:~/images/# rm *

turns into

jason@server:~/images/# rm image1.jpg image2.jpg image3.jpg image4.jpg...

and since there is a limit (albeit a rather large one) on the length of a command line, it can be a pain to figure out which files to delete en masse to get rid of the folder.

Fortunately there is some awesome command-line foo that you can do – and here it is:

ls -1 | tr '\n' '\0' | xargs -0 rm

xargs appends as many file names as will fit onto the end of the rm command and runs it as many times as needed to delete all the files. The explanation of this command is:

  1. List all files in the current folder, one per line
  2. Change all newline characters to null characters (a delimiter xargs -0 can split on safely, since null can never appear in a file name – this also means spaces need no escaping)
  3. Finally run the rm command via xargs, which batches the arguments to stay under the command-line length limit

We could simplify this a little if we only wanted to remove JPEG images (-maxdepth 1 keeps it to the current folder; drop it to recurse):

    find . -maxdepth 1 -type f -name '*.jpg' -print0 | xargs -0 rm
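You can watch the batching behaviour with a toy example – xargs' -n flag caps the arguments per invocation, just as the real command-line length limit does:

```shell
# Force at most 3 arguments per invocation of echo.
seq 1 7 | xargs -n3 echo
# echo runs three times, printing "1 2 3", "4 5 6" and "7"
```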

Why is my Quad Core VPS Running Slowly?

Or how a host schedules CPU cycles.

So I learnt an interesting tidbit of information the other day about a VPS that had high load but bugger-all CPU usage. If you see something similar to this in top:

top - 20:30:53 up 8 days,  6:42,  1 user,  load average: 9.37, 10.81, 9.67
Tasks: 135 total,  12 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.8%us,  0.8%sy,  0.0%ni, 67.8%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3790268k total,  3621780k used,   168488k free,   350528k buffers
Swap:  1830908k total,    11464k used,  1819444k free,  2753548k cached

over a long period of time – high load average but the CPUs mostly idle – and you have more than 2 vCPUs in your VPS, consider dropping back to 2 vCPUs.

But Why? Surely 4 CPUs Are Better than 2!

They certainly are, but when the underlying host gets busy, scheduling slots for 2-vCPU guests come up a lot more often than slots for quad-vCPU guests – the hypervisor finds it much easier to line up two free physical cores at the same moment than four. This is all down to the scheduler.

The case for turning it off and on again

So the other day we started having issues with our mail server. The symptom was the mail queue showing hundreds of emails with a message like “SMTP Server rejected at greeting”. Amavis (the mail scanner / coordinator) was rejecting mail and ClamAV was not working properly. We found that simply restarting the Amavis daemon and flushing the mail queue would resolve the problem for a short time before it would happen again.

The postfix mail queue spikes

Before we managed to resolve the problem, it happened over and over again, with more and more frequency as you can see in the above graph.

The resolution? I restarted the server and waved goodbye to the 200+ day uptime. Because the file system had not been fsck’d in such a long time, a check was forced on boot and, lo and behold, there were busted inodes and file system errors. These problems were fixed and since then the mail server has been happily behaving itself. I am also no longer scratching my head as to why load was so high while processing the mail queue and why Amavis was failing!

The IT Crowd – Have you tried turning it off and then on again?

That said, I will not be reaching for the turning it off and then on again approach to resolve all the problems we encounter, as most of them can be fixed quickly if you look through the logs!

Using varnish as a HTTP Router

A Layer 7 Routing Option

So one of the novel uses I have put Varnish Cache to is as an HTTP (Layer 7) router.

Our Setup:

We have a single IP address that forwards port 80 to a virtual machine running Varnish. We also have a whole number of virtual machines that we use for development that need to be accessible from the great wild web. How do we do this?

HTTP Router Setup

The simple solution is to set up multiple backend definitions and switch between them with if statements on the request's Host header. (The backend addresses and the dev host-name patterns below are placeholders for your own; this is Varnish 3 syntax – Varnish 4+ uses req.backend_hint.)

backend int_dev_server_1 {
    .host = "";    // internal IP of dev server 1
    .port = "80";
}

backend int_dev_server_2 {
    .host = "";    // internal IP of dev server 2
    .port = "80";
}

sub vcl_recv {
	// ... your normal config stuff

	if (req.http.host ~ "^dev1\.") {
	    set req.backend = int_dev_server_1;
	} else if (req.http.host ~ "^dev2\.") {
	    set req.backend = int_dev_server_2;
	}
}

When your rescan tool fails to notice disk size increases

So usually when I am doing online capacity expansions of VMware/RAID devices I use a tool called “” and it just works. It detects the size increase on the disk, and then I run resize2fs /dev/sdb or something like that.
Well, I have come across a time when this tool does not work and I need to do things the hard way. Luckily, the hard way is rather simple.

  1. Get the SCSI ID of the device you want to expand. The tool “lsscsi” does this nicely:
    # lsscsi
    [1:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0
    [2:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
    [2:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
  2. Now comes the hard part: you need to tell the kernel to rescan this device.
    echo 1 > /sys/bus/scsi/devices/2:0:1:0/rescan
  3. And now you watch dmesg for the expansion message!
    dmesg | grep change
    [4768364.446120] sdb: detected capacity change from 171798691840 to 268435456000
    [4768364.834677] VFS: busy inodes on changed media or resized disk sdb
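If several disks have been grown at once, step 2 can be looped. A small sketch (the sysfs root is a parameter, so you can point it at a scratch directory for a dry run; the real path is an assumption matching the layout above):

```shell
# Ask the kernel to rescan every SCSI device under a sysfs root.
# Defaults to the real sysfs path; pass another directory to dry-run.
rescan_all_scsi() {
    root=${1:-/sys/bus/scsi/devices}
    for f in "$root"/*/rescan; do
        [ -e "$f" ] || continue   # skip entries without a rescan file
        echo 1 > "$f"             # trigger the rescan
    done
}

# e.g.  rescan_all_scsi           # real run (as root)
```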

MD Raid 5 with 2 dropped disks

So today I was happily going about stuff when there was a brownout. Nothing turned off, so I was happy to continue doing whatever I was doing – until I got these emails:

Subject: Fail event on /dev/md1:server
This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdg.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>


Subject: Fail event on /dev/md1:server

This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdh.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

After the small panic attack that my backup RAID array had gone walkabout, I set about trying to fix it.

Root Cause Analysis

So it turns out that the power brownout had knocked two of my disks offline for a brief second, and when they came back online Ubuntu said “oh hey – new disks!” and promptly gave them new names. This in turn caused my /dev/md1 array to go bonkers.

The Fix

Well, I wanted this fixed quickly – so I turned my Linux server off and then on again, thinking that the disks would renumber themselves back to what they should be and the md1 array would reassemble.

I was half right: I still needed to assemble and run the array after the reboot. Here is the command I used to get things going again:

mdadm --assemble /dev/md1 --force --run

and there we are – an MD array running properly again after a power failure. I am just lucky that there was no writing going on at the time.
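As a postscript: the (F) markers in /proc/mdstat tell you which members an array has dropped, and they can be grepped out. A small sketch (it reads from stdin, so you can also point it at a saved copy of the output above):

```shell
# List md members marked failed -- tokens like "sdh[1](F)" in
# mdstat-format text read from stdin.
failed_members() {
    grep -o '[a-z0-9]*\[[0-9]*\](F)' | cut -d'[' -f1 | sort -u
}

# e.g.  failed_members < /proc/mdstat
```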

Maybe time for a UPS and some ZFS loving?