Adventures with Google Content API and OAuth

So recently I have had the opportunity to play with Google’s Content API and their OAuth API.

I have needed to use it in “offline” mode, as I want to interact with Google when the authorising user is not present (cron jobs and such).

Here are my lessons learnt.

  • You can indeed use OAuth on a website that wants to use it in the background. You just need to persistently store the tokens (especially the refresh token!)
  • The refresh token ONLY appears when the user is asked for permission. It does *not* appear when access is auto-approved. This means that when you generate the authorise URL, you need to specify the “approval_prompt=force” option!
  • Their testing facilities are not that good; trying to sort out a sandbox site is like pulling teeth. Their account signup pages were busted 🙁
  • Their API is pretty good!

Google Content API Class

Below is a simple class stub to interact with Google. The PersistentKeyValueStore class is fairly self-explanatory and you can implement your own (I persist my data in a simple table with the columns “key” and “value”, with “key” being the primary key).

When implementing this class you will need

  • A user to initially interact with a web page
  • Your code to call the Google_Content_Client->doAuthorise() function so the user can interact with the OAuth page.
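
    <?php
    class Google_Content_Client {
    	// Key the token blob is stored under (assumed name; the original value was not shown)
    	const TOKEN_KEY = 'google_content_token';

    	protected $client;
    	protected $options;
    	protected $authToken;

    	public function __construct($options) {
    		$this->client = new GSC_Client($options->merchantId);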
    		$this->options = $options;
    
    		$this->authToken = new GSC_OAuth2Token(
    			$this->options->clientId,
    			$this->options->clientSecret,
    			$this->options->userAgent
    		);
    
    		$token = PersistentKeyValueStore::get(self::TOKEN_KEY);
    		if (!$token) {
    			return false;
    		}
    		$this->authToken->fromBlob($token);
    
    		$this->client->setToken($this->authToken);
    	}
    
    	/**
    	 * handles the user interaction for the authorising
    	 */
    	public function doAuthorise($revoke, $force = false) {
    
    		if ($revoke) {
    			// do we have a refresh token to revoke?
    
    			$bits = explode('|', (string) PersistentKeyValueStore::get(self::TOKEN_KEY));
    			if (!empty($bits[4])) {
    				$this->authToken->revoke();
    			}
    			PersistentKeyValueStore::set(self::TOKEN_KEY,'');
    		} else {
    			$code = isset($_GET['code']) ? $_GET['code'] : '';
    
    			$approvalPrompt = $force ? 'force' : 'auto';
    			$authorizeUrl = $this->authToken->generateAuthorizeUrl($this->options->redirectUri, $approvalPrompt);
    			if ('' == $code) {
    				header("Location: $authorizeUrl");
    				die;
    			} else {
    				$this->authToken->getAccessToken($_GET['code']);
    				$this->client->setToken($this->authToken);
    				PersistentKeyValueStore::set(self::TOKEN_KEY,$this->authToken->toBlob());
    			}
    		}
    
    	}
    
    	/** your functions to wrap Google's **/
    }

Telstra Rally Australia 2006

So we went through the filing cabinet the other day and threw out a whole bunch of REALLY old paperwork. Among the debris were some gems – like these photos from Telstra Rally Australia in 2006.

My hairy and scary phase whilst I was building the online flight booking engine at Best Flights.

This was the last WRC event held in Perth (apparently it did not bring in enough cash for the government's liking). We were racing in the corporate cup, which is basically an excuse for some non-rally entrants to hoon around the Gloucester Park Super Stage in a pair of rally Hyundai Accents. They had their interiors stripped, rally tyres, intake and exhaust mods, and a roll cage. They were beastly (not) machines.

PHP: for vs foreach

So this week I was asked which is quicker in PHP: a for loop or a foreach loop. It turns out that my assertion that they both perform about the same was roughly right, although foreach came out about 1.45x faster in my test. Here are my test results:

jason@server:~/php$ php speedtest.php
Starting test: Test for loop data length=4000000
     Mem: 795.63MiB Used
Peak Mem: 795.63MiB Used
[xxxxxxxxx ]
Results for test Test for loop data length=4000000
Min   : 0.6085901260376
Max   : 0.61222791671753
Mean  : 0.6110053062439
StdDev: 0.00015155474344889

     Mem: 795.63MiB Used
Peak Mem: 795.64MiB Used
Starting test: Test foreach loop data length=4000000
     Mem: 795.63MiB Used
Peak Mem: 795.64MiB Used
[xxxxxxxxx ]
Results for test Test foreach loop data length=4000000
Min   : 0.41838312149048
Max   : 0.42730903625488
Mean  : 0.42159442901611
StdDev: 0.0010704358418783

     Mem: 795.63MiB Used
Peak Mem: 795.64MiB Used
On average test 2 is faster than test 1 by 1.4493x

And the code I used to run the test is:

<?php

$data_length = 4000000;//rand(10000000,20000000);
$data = array();
for ($i=0; $i<$data_length; $i++) {
        $data[] = $i;
}


// run_test() declares $data by reference, so no & is needed (or allowed) at the call site
$avg1 = run_test("Test for loop data length={$data_length}", "test_for", $data);
$avg2 = run_test("Test foreach loop data length={$data_length}", "test_foreach", $data);

print "On average ";
if ($avg1 < $avg2) {
        print "test 1 is faster than test 2 by ";
        printf("%0.4fx\n", $avg2 / $avg1);
} else {
        print "test 2 is faster than test 1 by ";
        printf("%0.4fx\n", $avg1 / $avg2);
}


function test_for(&$data) {
        $max = count($data);
        for($i=0; $i<$max; $i++) {$x=&$data[$i];}
}

function test_foreach(&$data) {
        foreach ($data as &$x) {}
}


function run_test($name, $func, &$data) {
        print "Starting test: $name\n";
        printf("     Mem: %3.2fMiB Used\n", memory_get_usage()/1024/1024);
        printf("Peak Mem: %3.2fMiB Used\n", memory_get_peak_usage()/1024/1024);
        $num_runs = 10;
        $run_times = array();

        for($i=0; $i<$num_runs; $i++) {
                print "[" . str_repeat('x',$i) . str_repeat(' ',$num_runs-$i) . "]\r";
                $run_times[] = time_run($func, $data);
        }

        print "\nResults for test {$name}\n";
        $min = null;
        $max = null;
        $mean = null;
        foreach ($run_times as $run_time) {
                $mean += $run_time;
                $max = (is_null($max) || $run_time > $max) ? $run_time : $max;
                $min = (is_null($min) || $run_time < $min) ? $run_time : $min;
        }
        $mean /= $num_runs;

        // now work out the std deviation aka confidence
        $std_dev_count = 0;
        foreach ($run_times as $run_time) {
                $dev = $run_time - $mean;
                // accumulate the squared deviations (sum, not just the last one)
                $std_dev_count += $dev * $dev;
        }
        $std_dev = sqrt( $std_dev_count / ($num_runs-1));

        print "Min   : {$min}\n";
        print "Max   : {$max}\n";
        print "Mean  : {$mean}\n";
        print "StdDev: {$std_dev}\n\n";

        printf("     Mem: %3.2fMiB Used\n", memory_get_usage()/1024/1024);
        printf("Peak Mem: %3.2fMiB Used\n", memory_get_peak_usage()/1024/1024);

        return $mean;
}

function time_run($func,&$data) {
        $start = microtime(true);
        $func($data);
        $end = microtime(true);
        return $end - $start;
}

rm: Too Many Files

Ever come across a folder you need to delete but there are too many files in it?

Basically the shell expansion of * attempts to put everything on the commandline – so:

jason@server:~/images/# rm *

turns into

jason@server:~/images/# rm image1.jpg image2.jpg image3.jpg image4.jpg...

and there is a limit (albeit a rather large one) on the length of a command. Once you hit it, trying to figure out which files to delete en masse to get rid of the folder can be a pain.

Fortunately there is some awesome commandline foo that you can do – and here it is:

ls -1 | tr '\n' '\0' | xargs -0 rm --

xargs will append all the file names onto the end of the rm command and run as many rm commands as needed to delete all the files. The explanation of this command is:

  1. List all files in the current folder, one per line
  2. Change all newline characters to null characters (better for xargs to split upon, and it means spaces in file names need no escaping)
  3. Finally run the rm command via xargs -0, which treats each null-terminated name as one argument

We could simplify this a little if we only wanted to remove JPEG images:

    find . -type f -name \*.jpg -print0 | xargs -0 rm --
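
If you want to convince yourself before pointing this at a real folder, here is a minimal sketch that runs the find / xargs approach against a throwaway directory. The directory and file names are made up for the demo, and it assumes a find that supports -maxdepth and -print0 and an xargs that supports -0 (the GNU and BSD versions do).

```shell
# Build a throwaway directory with awkward file names (a space, a leading dash).
demo_dir=$(mktemp -d)
touch "$demo_dir/image 1.jpg" "$demo_dir/image2.jpg" "$demo_dir/-rf.jpg" "$demo_dir/notes.txt"

# Delete only the .jpg files. -print0 and -0 pass each name as one intact argument,
# and -- stops rm from treating any name as an option.
find "$demo_dir" -maxdepth 1 -type f -name '*.jpg' -print0 | xargs -0 rm --

# Only the non-jpg file should be left.
remaining=$(ls -A "$demo_dir")
echo "$remaining"   # prints: notes.txt

# Clean up the demo directory.
rm -r "$demo_dir"
```

With GNU or BSD find you can also swap the pipe for find's own -delete action, which avoids building a command line at all.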

Why is my Quad Core VPS Running Slowly?

Or how a host schedules CPU cycles.

So I learnt an interesting tidbit of information the other day to do with a VPS and why it had high load and bugger all CPU usage. If you see something similar to this in top:

top - 20:30:53 up 8 days,  6:42,  1 user,  load average: 9.37, 10.81, 9.67
Tasks: 135 total,  12 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.8%us,  0.8%sy,  0.0%ni, 67.8%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3790268k total,  3621780k used,   168488k free,   350528k buffers
Swap:  1830908k total,    11464k used,  1819444k free,  2753548k cached

over a long period of time, and you have more than 2 CPUs in your VPS – consider dropping back to 2 vCPUs.

But Why? Surely 4 CPUs is Better than 2!

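They certainly are, but when the underlying host gets busy, slots for 2 vCPU guests get scheduled a lot more often than slots for quad vCPU guests. This is all down to the hypervisor's scheduler: if it co-schedules a guest's vCPUs, it has to find four free physical cores at once for a quad vCPU guest but only two for a dual vCPU one, so the bigger guest spends longer waiting in line.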
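
To put a rough number on "high load", one option is to divide the load average by the number of vCPUs. The sketch below is only an illustration: load_per_cpu is a made-up helper name, and the live check assumes a Linux guest with /proc/loadavg and the nproc command available.

```shell
# Hypothetical helper: load average divided by CPU count, to two decimals.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The 1-minute load from the top output above, on an assumed 4 vCPU guest.
load_per_cpu 9.37 4   # prints: 2.34

# Same check against the live system (Linux only: needs /proc/loadavg and nproc).
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 _ < /proc/loadavg
    echo "load per vCPU: $(load_per_cpu "$load1" "$(nproc)")"
fi
```

A value that stays well above 1 while top shows the CPUs mostly idle is the pattern described above: processes are queued, but the guest is not getting CPU time to run them.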

The case for turning it off and on again

So the other day we started having issues with our mail server. The symptom was the mail queue showing hundreds of emails with a message like “SMTP Server rejected at greeting”. Amavis (the mail scanner / coordinator) was rejecting mail and ClamAV was not working properly. We found that simply restarting the Amavis daemon and flushing the mail queue would resolve the problem for a short time before it would happen again.

The postfix mail queue spikes

Before we managed to resolve the problem, it happened over and over again, with more and more frequency as you can see in the above graph.

The resolution? I restarted the server and waved goodbye to the 200+ day uptime. Because the file system had not been fsck’d in such a long time, a check was forced on boot and, lo and behold, there were busted inodes and file system errors. These problems were fixed and since then the mail server has been happily behaving itself. I am also no longer scratching my head as to why load was so high while processing the mail queue and why Amavis was failing!

The IT Crowd – Have you tried turning it off and then on again?

That said, I will not be reaching for the “turn it off and then on again” approach to resolve all the problems we encounter, as most of them can be fixed quickly if you look through the logs!

WordPress – Stop screwing with the timezone!

So my dates were not displaying correctly and it turns out that WordPress is to blame.

After checking the data and finding that it was correct, I was confused as to why a WordPress page was displaying the wrong date for a correct Unix timestamp.

WordPress was screwing with the timezone setting of PHP. This little factoid took a good 30 minutes for my decaffeinated brain to figure out.

So, here is a small function that will help out displaying a date with the correct info:

/**
* Echoes a date with the timezone setting unfuxed
*
* @param string $format    a date() format string
* @param int    $timestamp a Unix timestamp
*/
function echo_date($format,$timestamp) {
	$old_tz = date_default_timezone_get();
	// timezone_string is empty when the site is set to a manual UTC offset
	$wp_tz = get_option('timezone_string');
	if ($wp_tz) {
		date_default_timezone_set($wp_tz);
	}
	echo date($format,$timestamp);

	date_default_timezone_set($old_tz);
}

Using varnish as a HTTP Router

A Layer 7 Routing Option

So one of the novel uses I have put the Varnish Cache to is as an HTTP (Layer 7) router.

Our Setup:

We have a single IP address that forwards port 80 to a virtual machine. This virtual machine runs Varnish. We have a number of virtual machines that we use for development, and they need to be accessible from the great wild web. How do we do this?

HTTP Router Setup

The simple solution is to set up multiple backend definitions and do if statements on req.http.host.

backend int_dev_server_1 {
    .host = "10.1.2.1";
    .port = "80";
}

backend int_dev_server_2 {
    .host = "10.1.2.2";
    .port = "80";
}

sub vcl_recv {
	// ... your normal config stuff

	// dots are escaped so they only match a literal "."
	if (req.http.host ~ "^(.*)dev-server-1\.example\.com") {
	    set req.backend = int_dev_server_1;
	    return(pipe);
	}
	if (req.http.host ~ "^(.*)dev-server-2\.example\.com") {
	    set req.backend = int_dev_server_2;
	    return(pipe);
	}

}
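
Before reloading Varnish, the host patterns can be sanity-checked from a shell. This is only a sketch: route_for_host is a made-up helper that mimics the if statements above using grep -E, with the dots escaped so they match literally. Varnish uses PCRE rather than POSIX extended regex, but for patterns this simple the two should agree.

```shell
# Hypothetical helper: prints the backend name a given Host header would be routed to.
route_for_host() {
    host=$1
    if printf '%s\n' "$host" | grep -Eq '^(.*)dev-server-1\.example\.com'; then
        echo int_dev_server_1
    elif printf '%s\n' "$host" | grep -Eq '^(.*)dev-server-2\.example\.com'; then
        echo int_dev_server_2
    else
        # No rule matched; Varnish would fall through to its default backend.
        echo default
    fi
}

route_for_host "www.dev-server-1.example.com"   # prints: int_dev_server_1
route_for_host "dev-server-2.example.com"       # prints: int_dev_server_2
route_for_host "www.example.com"                # prints: default
```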