Category Archives: Linux

Hacking a Cisco/Linksys NSS6000

So I was given a Cisco/Linksys NSS6000 to upgrade and root.  Luckily I was also provided with the instructions to root this machine.

Thanks to some hacker types that had already been and done this, the process was relatively straightforward.

  1. Create User
  2. Insert USB Key
  3. Backup Configuration onto USB Key
  4. Unmount USB Key
  5. Dive into the tarball (which is simply /etc) and:
    1. Change the root password in etc/passwd – I just copied my new user’s password hash!
    2. Add the following line to etc/cron.d/root
      */5 * * * * /usr/sbin/dropbear_start.sh
  6. Tar the extracted files
  7. Put the tarball back on the USB drive
  8. Mount it in the NAS and Restore from backup
  9. Profit, Right!?

Well,  nearly. I had a couple of issues:

Incorrect tarball Permissions

So, my first derp was when I tar’d the etc folder back up as my own user, so instead of root owning everything, my user did… you get the picture.

What happens is that you get an error like this:

Warning: touch(): Unable to create file /etc/nas/ran_wizard because Permission denied in /www/html/index.php on line 48

And you end up getting into a loop with dialog boxes and never ending redirects to the same page.

The downside of this is that you cannot get to any other pages in the administration to even consider doing a factory reset. Luckily you *can* still post to it:

curl --data "p=admin&s=maintenance&restore_all=Restore+ALL+Settings+to+Factory+Defaults" http://admin:admin@192.168.0.2/index.php

This command will reset the device to factory defaults – you should change the IP address and user/pass to what you need it to be. It obviously uses cURL so you will need that too 🙂

We also tried to overwrite the start of the one disk we had in the machine (using dd if=/dev/zero…) to see if that would work – but alas it did not. We were able to go through the setup wizard again, but ended up back in the same loop as before.
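
To avoid the ownership problem in the first place, the tarball can be repacked so that every entry is recorded as owned by root, whoever owns the files on disk. This is only a sketch: it assumes GNU tar and a plain (uncompressed) tar backup, and the working directory and the name config-backup.tar are placeholders, not what the NAS actually produces.

```shell
# Sketch: repack an extracted backup so every entry is owned by root (uid/gid 0).
# The working directory and the name config-backup.tar are placeholders.
work=$(mktemp -d)
mkdir -p "$work/etc/cron.d"
echo '*/5 * * * * /usr/sbin/dropbear_start.sh' > "$work/etc/cron.d/root"

# GNU tar: force owner and group to 0 regardless of who owns the files on disk.
tar --owner=0 --group=0 --numeric-owner -cf "$work/config-backup.tar" -C "$work" etc

# List the archive with numeric ids; every entry should show 0/0.
listing=$(tar --numeric-owner -tvf "$work/config-backup.tar")
echo "$listing"
```

Running tar as root (for example via sudo) when extracting and repacking should have much the same effect.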

SSH Connection Closed

The second problem I had was SSHing to the box. We discovered that an older version of OpenSSH could connect to it, but newer versions of OpenSSH would just close the connection.

Pro Tip: PuTTY connects fine 🙂

To be able to connect to older Dropbear servers with newer OpenSSH clients, try this in your ssh_config file:

jason@workstation:~$ cat ~/.ssh/config 
host 192.168.0.2
	Ciphers aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour,aes192-cbc,aes256-cbc
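
On more recent OpenSSH clients the list above may be rejected outright, since blowfish-cbc, cast128-cbc and arcfour were removed from OpenSSH altogether in later releases. A hedged alternative, assuming the Dropbear on the NAS accepts aes128-cbc or 3des-cbc (not verified on this device), is to append the legacy ciphers to the client defaults rather than replace them:

```
Host 192.168.0.2
	# "+" appends to the client's default cipher list instead of replacing it (OpenSSH 7.0 and later)
	Ciphers +aes128-cbc,3des-cbc
```

If the connection still closes, running ssh -v against the box shows which algorithms each side offered.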

I hope this information saves someone a few hours of frustrations.

rm: Too Many Files

Ever come across a folder you need to delete but there are too many files in it?

Basically the shell expansion of * attempts to put everything on the commandline – so:

jason@server:~/images/# rm *

turns into

jason@server:~/images/# rm image1.jpg image2.jpg image3.jpg image4.jpg...

and there is a limit (albeit a rather large one) on the length of a command. Hit that limit and it becomes a pain to figure out which files to delete en masse to get rid of the folder.

Fortunately there is some awesome commandline foo that you can do – and here it is:

ls -1 | tr '\n' '\0' | xargs -0 rm

xargs will append the file names onto the end of the rm command and run rm as many times as needed to delete all the files. The explanation of this command is:

  1. List all files in the current folder, one per line
  2. Change all newline characters to null characters (better for xargs to split upon)
  3. Finally run the rm command via xargs

Because xargs -0 splits only on null characters, file names containing spaces are passed through intact and need no escaping.

We could simplify this a little if we only wanted to remove JPEG images (note that find also descends into subfolders):

    find . -type f -name '*.jpg' -print0 | xargs -0 rm
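
To see the find and xargs approach behave on awkward file names, here is a small self-contained sketch. The temporary folder and file names are throwaway examples, and -maxdepth 1 is my own addition to stop find descending into subfolders.

```shell
# Sketch: delete matching files without expanding them on the command line.
# The folder and file names here are throwaway examples.
dir=$(mktemp -d)
touch "$dir/image 1.jpg" "$dir/image2.jpg" "$dir/notes.txt"

# -maxdepth 1 keeps find out of subfolders; -print0 / -0 keep odd file names intact.
find "$dir" -maxdepth 1 -type f -name '*.jpg' -print0 | xargs -0 rm --

# Only the non-matching file should be left behind.
remaining=$(ls -1 "$dir")
echo "$remaining"
```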

Why is my Quad Core VPS Running Slowly?

Or how a host schedules CPU cycles.

So I learnt an interesting tidbit of information the other day to do with a VPS and why it had high load and bugger all CPU usage. If you see something similar to this in top:

top - 20:30:53 up 8 days,  6:42,  1 user,  load average: 9.37, 10.81, 9.67
Tasks: 135 total,  12 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.8%us,  0.8%sy,  0.0%ni, 67.8%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3790268k total,  3621780k used,   168488k free,   350528k buffers
Swap:  1830908k total,    11464k used,  1819444k free,  2753548k cached

over a long period of time, and you have more than 2 vCPUs in your VPS – consider dropping back to 2 vCPUs.

But Why? Surely 4 CPUs is Better than 2!

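They certainly are, but when the underlying host gets busy, slots for 2 vCPU guests get scheduled a lot more often than slots for quad vCPU guests. This is all down to the host's scheduler: it typically has to find free physical cores for a guest's vCPUs, and on a busy host two free cores come up far more often than four.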
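
A quick way to put a number on the pattern in the top output is to compare the load average with the vCPU count. This sketch assumes a Linux guest with /proc/loadavg and the coreutils nproc command; the variable names are mine.

```shell
# Sketch: compare the 1-minute load average with the number of vCPUs in this guest.
cpus=$(nproc)
load1=$(cut -d ' ' -f1 /proc/loadavg)

# awk does the floating point division that plain sh arithmetic cannot.
per_cpu=$(awk -v l="$load1" -v c="$cpus" 'BEGIN { printf "%.2f", l / c }')
echo "vCPUs: $cpus  load(1m): $load1  load per vCPU: $per_cpu"
```

A load per vCPU that stays well above 1 while top reports the CPUs as mostly idle is the combination described above.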

The case for turning it off and on again

So the other day we started having issues with our mail server. The symptom was the mail queue showing hundreds of emails with a message like “SMTP Server rejected at greeting”. Amavis (the mail scanner / coordinator) was rejecting mail and ClamAV was not working properly. We found that simply restarting the Amavis daemon and flushing the mail queue would resolve the problem for a short time before it would happen again.

[Graph: the Postfix mail queue spikes]

Before we managed to resolve the problem, it happened over and over again, with more and more frequency as you can see in the above graph.

The resolution? I restarted the server and waved goodbye to the 200+ day uptime. Because the file system had not been fsck’d in such a long time, a check was forced on boot and, lo and behold, there were busted inodes and file system errors. These problems were fixed and since then the mail server has been happily behaving itself. I am also no longer scratching my head as to why load was so high while processing the mail queue and why Amavis was failing!

[Image: The IT Crowd – Have you tried turning it off and then on again?]

That said, I will not be reaching for the turning it off and then on again approach to resolve all the problems we encounter, as most of them can be fixed quickly if you look through the logs!

Using Varnish as an HTTP Router

A Layer 7 Routing Option

So one of the novel uses I have put the Varnish Cache to is as an HTTP (Layer 7) router.

Our Setup:

We have a single IP address that forwards port 80 to a virtual machine. This virtual machine runs Varnish. We have a number of other virtual machines that we use for development, and they need to be accessible from the great wild web. How do we do this?

HTTP Router Setup

The simple solution is to set up multiple backend definitions and do if statements on req.http.host.

backend int_dev_server_1 {
    .host = "10.1.2.1";
    .port = "80";
}

backend int_dev_server_2 {
    .host = "10.1.2.2";
    .port = "80";
}

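sub vcl_recv {
    // ... your normal config stuff

    // Route on the Host header. Dots are escaped so they only match a literal ".".
    // On Varnish 4 and later, req.backend is named req.backend_hint.
    if (req.http.host ~ "^(.*)dev-server-1\.example\.com") {
        set req.backend = int_dev_server_1;
        // pipe hands the connection straight to the backend, bypassing the cache
        return(pipe);
    }
    if (req.http.host ~ "^(.*)dev-server-2\.example\.com") {
        set req.backend = int_dev_server_2;
        return(pipe);
    }
}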

When rescan-scsi-bus.sh fails to notice disk size increases

So usually when I am doing online capacity expansions of VMware/RAID devices I use a tool called “rescan-scsi-bus.sh” and it works. It detects the size increase on the disk and then I run a resize2fs /dev/sdb or something like that.
Well, I have come across a time when this tool did not work and I needed to do things the hard way. Luckily the hard way is rather simple.

  1. Get the SCSI id of the device you want to use. The tool “lsscsi” does this nicely
    # lsscsi
    [1:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0
    [2:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
    [2:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
  2. Now comes the hard part: you need to tell the server to rescan this device, using the SCSI id from step 1.
    echo 1 > /sys/bus/scsi/devices/2\:0\:1\:0/rescan
  3. And now you watch dmesg for the expansion message!
    dmesg | grep change
    [4768364.446120] sdb: detected capacity change from 171798691840 to 268435456000
    [4768364.834677] VFS: busy inodes on changed media or resized disk sdb
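
The steps above can be wrapped in a small helper that takes the SCSI id exactly as lsscsi prints it. This is only a sketch: the function names are hypothetical, and the write itself needs root on a machine where that device exists.

```shell
# Build the sysfs rescan path for a SCSI address in H:C:T:L form, e.g. 2:0:1:0.
scsi_rescan_path() {
    printf '/sys/bus/scsi/devices/%s/rescan' "$1"
}

# Ask the kernel to re-read the device's capacity. Needs root.
scsi_rescan() {
    path=$(scsi_rescan_path "$1")
    if [ -w "$path" ]; then
        echo 1 > "$path"
    else
        echo "cannot write $path (wrong address, or not root?)" >&2
        return 1
    fi
}

# Show the path that would be written to for /dev/sdb from the lsscsi output above.
scsi_rescan_path 2:0:1:0
```

Calling scsi_rescan 2:0:1:0 as root then performs the same echo as step 2.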

MD Raid 5 with 2 dropped disks

So today I was happily going about stuff when there was a brownout. Nothing turned off, so I was happy to continue going about doing whatever I was doing. Until I got these emails:

Subject: Fail event on /dev/md1:server
This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdg.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

and

Subject: Fail event on /dev/md1:server

This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdh.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

After a small panic attack that my backups RAID array had gone walkabout, I went about trying to fix it.

Root Cause Analysis

So it turns out that the brownout had knocked two of my disks offline for a brief second and when they came back online Ubuntu said “oh hey – new disks!” and promptly decided to give them new names. This in turn caused my /dev/md1 array to go bonkers.

The Fix

Well, I wanted this fixed quickly – so I turned my Linux server off and then on again – thinking that the disks would renumber themselves back into what they should be and the md1 array would reassemble.

I was half right: I still needed to get the array to assemble and run after the reboot. Here is the command I used to get things going again.

mdadm --assemble /dev/md1 --force --run

and there we are, making an MD array run properly after power failure. I am just lucky that there was no writing going on at that time.
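
Before forcing an assembly it can help to confirm which arrays are actually degraded. In /proc/mdstat the [3/1] field means three devices are expected and one is active. The sketch below pulls that out with awk; it runs against a sample copied from the emails above so that it works anywhere, and on a real server the same awk program can be pointed at /proc/mdstat instead.

```shell
# Sample text copied from the mdadm emails above; on a live system read /proc/mdstat.
mdstat_sample='md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]'

# Remember the array name, then compare expected vs active devices in the [n/m] field.
degraded=$(printf '%s\n' "$mdstat_sample" | awk '
    /^md[0-9]+ :/ { name = $1 }
    match($0, /\[[0-9]+\/[0-9]+\]/) {
        split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
        if (n[2] + 0 < n[1] + 0) print name
    }')
echo "$degraded"
```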

Maybe time for a UPS and some ZFS loving?

Installing Dropbox on a headless Linux file server

So today I decided it was (finally?) time to get a Dropbox account. The Windows install went fine, but it appears that the Linux version is geared up for a GUI and not command line config. I had the following requirements:

  1. Put the Dropbox folder in a place of my own choosing
  2. Make it work without a GUI

So BEFORE installing Dropbox, create a link to the place where you want the files to be.

jason@server:~$ ln -s /data/documents/files/dropbox/ Dropbox

Then install dropbox.

sudo apt-get install nautilus-dropbox

And the first part was done. Now I needed to link the install to my account, as the status was telling me:

$ dropbox start
$ dropbox status
Waiting to be linked to an account...

So after some more digging, I found that I needed to start the dropboxd app manually and watch the output:

$ /var/lib/dropbox/.dropbox-dist/dropboxd
This client is not linked to any account...
Please visit https://www.dropbox.com/cli_link?host_id=19b...........................a7&cl=en_US to link this machine.

So, I went to that URL in the web browser I was already logged in with, it asked for my password – and then the client authenticated!

Client successfully linked, Welcome Jason!

Missing SOA records and PowerDNS sending ServFail

So today I was trying to resolve an issue with a domain we host. Two of our name servers were sending the correct info, the third was not. In the logs of the third name server I was seeing entries like this:

Mar 1 08:26:22 ns3 pdns[19262]: Not authoritative for <domain> sending servfail to <ip>

It turns out that this is caused by missing SOA records for the domain. In fact in my database there were NO records for this domain. I added the appropriate records in and it now serves!