Zend Framework Per Module Layout Settings – Follow Up

By Steven Lloyd Watkin, Tuesday 16th February 2010 8:48 pm

As a follow-up to my previous post on per-module layout settings for Zend Framework, I’ve updated the code to require less configuration than before (not that it required more than a few lines in your application configuration!).

Creating URL in Zend Custom View Helper

By Steven Lloyd Watkin, Thursday 28th January 2010 11:01 pm

This may seem simple, but I was banging my head trying to create a URL in a custom view helper in Zend Framework. I have routing set up which gets the module from the sub-domain in use, so I couldn’t use a simple hardcoded URL.

Basically, by grabbing the front controller instance it’s possible to get the router and assemble a URL. assemble() is the same method the url view helper uses. The URL is built from an array of module, controller, action, etc., followed by a second parameter naming the route to use. The code is as follows:

<?php
/**
 * View helper which returns link category URL
 *
 * @author     Lloyd Watkin 
 * @since      25/01/2010
 * @package    ViewHelper
 * @subpackage LinksUrl
 */
class Pro_View_Helper_LinksUrl
    extends Zend_View_Helper_Abstract
{
	/**
	 * Returns link category URL
	 *
	 * @param  Doctrine_Record $category
	 * @param  string          $module
	 * @param  string          $controller
	 * @param  string          $action
	 * @return string Url
	 */
    public function linksUrl($category, $module = 'www',
        $controller = 'links', $action = 'index')
    {
    	$router = Zend_Controller_Front::getInstance()->getRouter();

        return $router->assemble(array(
            'module'     => $module,
            'controller' => $controller,
            'action'     => $action,
            'category'   => "{$category->id}-{$category->name}",
        ), 'www-index');
    }
}

Another way to do this is to instantiate Zend_View_Helper_Url itself and call its url() method (if you want to use the helper itself). This can be done using the following code:

<?php
/**
 * View helper which returns link category URL
 *
 * @author     Lloyd Watkin 
 * @since      25/01/2010
 * @package    ViewHelper
 * @subpackage LinksUrl
 */
class Pro_View_Helper_LinksUrl
    extends Zend_View_Helper_Abstract
{
	/**
	 * Returns link category URL
	 *
	 * @param  Doctrine_Record $category
	 * @param  string          $module
	 * @param  string          $controller
	 * @param  string          $action
	 * @return string Url
	 */
    public function linksUrl($category, $module = 'www',
        $controller = 'links', $action = 'index')
    {
    	$link = new Zend_View_Helper_Url();

        return $link->url(array(
            'module'     => $module,
            'controller' => $controller,
            'action'     => $action,
            'category'   => "{$category->id}-{$category->name}",
        ), 'www-index');
    }
}

Both are almost identical. Not a hard thing to do in the framework, but it can catch you out ;)

Dynamically add pages to Zend_Navigation container at runtime

By Steven Lloyd Watkin, Thursday 7th January 2010 10:50 pm

In a continuation of my last post about Zend_Navigation, Route requests for sitemap.xml to custom controller/action, this post is about dynamically adding pages to a Zend_Navigation container at runtime/script execution.

It’s all well and good specifying your pages in an ini or xml file, but at some point you’re going to have changing pages in your site that you want as part of a menu, sitemap, or to be included in your breadcrumb trail. Therefore what we need to do is add pages to our Zend_Navigation container at runtime. Examples of this would be adding news items, blog posts, page comments, etc.

In this example I’m going to add some news posts to my statically defined ini config. To get my news post page configurations I’ve used a class which returns an array in the following format:

$pagesToAdd = array (
  0 =>
    array (
      'label' => 'Fake news story #5...',
      'module' => 'www',
      'route' => 'www-index',
      'action' => 'view',
      'controller' => 'news',
      'params' => array (
          'id' => '5-Fake-news-story--5' )
    ),
  1 =>
    array ( /* More page details */ ),
 );
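
As a rough illustration of where such an array might come from, here is a minimal sketch in plain PHP. The function name buildNewsNavigationEntries() and the row keys 'id' and 'title' are assumptions made for this example; they are not part of my News class or of Zend Framework.

```php
<?php
/**
 * Hypothetical helper: turns news rows into page config arrays.
 * The row keys ('id', 'title') are assumptions for illustration only.
 */
function buildNewsNavigationEntries(array $rows)
{
    $pages = array();
    foreach ($rows as $row) {
        // Replace anything that is not a letter or digit with a dash
        $slug = preg_replace('/[^A-Za-z0-9]/', '-', $row['title']);

        $pages[] = array(
            'label'      => $row['title'] . '...',
            'module'     => 'www',
            'route'      => 'www-index',
            'action'     => 'view',
            'controller' => 'news',
            'params'     => array('id' => $row['id'] . '-' . $slug),
        );
    }
    return $pages;
}

$pagesToAdd = buildNewsNavigationEntries(array(
    array('id' => 5, 'title' => 'Fake news story #5'),
));
echo $pagesToAdd[0]['params']['id'] . PHP_EOL; // 5-Fake-news-story--5
```

Keeping the array building separate from the navigation container makes it easy to check the page configs on their own before handing them to Zend_Navigation.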

As you’ll notice, the function returns an array of arrays, each of which is a config array for Zend_Navigation_Page_Mvc. Therefore, by looping over the array, new Zend_Navigation pages can be created from each config array. The next thing to do as part of the loop is to add the pages in the correct position (alternatively pages can be added in bulk using the ->addPages() method).

To do this, locate the page you wish to add the sub-pages to and simply add the pages. In this case I have used the following code to find my page:

$container->findOneBy('label', 'Latest News')->addPage($page);

My overall navigation initialisation in the bootstrap therefore looks like this:

    /**
     * used for handling top-level navigation
     *
     * @return Zend_Navigation
     */
    protected function _initNavigation()
    {
        $this->bootstrap('layout');
        $layout = $this->getResource('layout');
        $view = $layout->getView();
        $config = new Zend_Config_Ini(
            APPLICATION_PATH . '/configs/navigation.ini', APPLICATION_ENV);

        $container = new Zend_Navigation($config->default);

        // Find the parent page once, then add the last 25 news reports
        $latestNews = $container->findOneBy('label', 'Latest News');
        $news       = new News();
        foreach ($news->getNavigationEntries() as $pageConfig) {
            $latestNews->addPage(new Zend_Navigation_Page_Mvc($pageConfig));
        }
        $view->navigation($container);

        return $container;
    }

One thing that needs to be added is some form of caching (using Zend_Cache presumably ;)), otherwise this is going to be quite expensive on each page load.
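
To give an idea of what that caching could look like, here is a minimal sketch using a plain file cache as a stand-in for Zend_Cache. The function name cachedNavigationEntries(), the cache file location and the 300 second lifetime are all assumptions for illustration, and the loader closure is a stub for the real call to $news->getNavigationEntries().

```php
<?php
/**
 * Hypothetical file cache, standing in for Zend_Cache in this sketch.
 * Returns the cached value if it is younger than $lifetime seconds,
 * otherwise calls $loader, stores the result and returns it.
 */
function cachedNavigationEntries($cacheFile, $lifetime, $loader)
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $lifetime) {
        $cached = unserialize(file_get_contents($cacheFile));
        if (is_array($cached)) {
            return $cached;
        }
    }
    $pages = call_user_func($loader);
    file_put_contents($cacheFile, serialize($pages), LOCK_EX);
    return $pages;
}

// Usage: the loader is a stub for $news->getNavigationEntries()
$cacheFile = sys_get_temp_dir() . '/navigation-news-' . uniqid() . '.cache';
$calls     = 0;
$loader    = function () use (&$calls) {
    $calls++;
    return array(array('label' => 'Fake news story #5...'));
};

$first  = cachedNavigationEntries($cacheFile, 300, $loader); // runs the loader
$second = cachedNavigationEntries($cacheFile, 300, $loader); // served from the file
unlink($cacheFile);
```

Only the page config arrays are cached here, not the Zend_Navigation container itself, so the container is still rebuilt on each request but the expensive news lookup is skipped.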

Route requests for sitemap.xml to custom controller/action

By Steven Lloyd Watkin, Wednesday 6th January 2010 12:13 am

In order to direct requests for /sitemap.xml to a custom controller and action in your Zend Framework application simply add the following in your application.ini or alternative config file (e.g. I use navigation.ini):

resources.router.routes.sitemap.route                = "sitemap.xml"
resources.router.routes.sitemap.defaults.controller  = index
resources.router.routes.sitemap.defaults.action      = sitemap

To output the sitemap, create an action in the appropriate controller (e.g. my sitemap lives in the index controller, sitemap action):

<?php
class IndexController
    extends Zend_Controller_Action
{
    /**
     * Renders a sitemap based on Zend_Navigation setup
     */
    public function sitemapAction()
    {
    	echo $this->view->navigation()->sitemap();
    	$this->view->layout()->disableLayout();
    	$this->_helper->viewRenderer->setNoRender(true);
    }
}

Sitemaps can quickly and easily be generated using Zend_Navigation. A great quick tutorial is the Zend Casts episode “Dynamically creating a menu, a sitemap and breadcrumbs” (Zend Casts is generally very useful for Zend Framework tutorials).

Zend Framework Per-Module based settings

By Steven Lloyd Watkin, Friday 1st January 2010 10:40 pm

I’ve created a follow-up to this post which requires less configuration; please see Module Based Layout – Zend Framework.

When using Zend Framework with modules, it’s obvious that if you’re running various (sub-)sites off the same application you don’t necessarily want the same layout scripts for each part. I decided to go with the following site structure:

/Application
    /controllers
        ...
    /models
    /modules
        /default
            /controllers
            /layouts
                /scripts
            /views
                /scripts
        /anotherModule
            ...
    /scripts

The problem was setting up the layout scripts on a per-module basis. The answer came through using an Action Helper. Setting up the layouts on a per-module basis involves three steps:

  1. application.ini (or similar configuration setup):
    admin.resources.layout.layoutPath = APPLICATION_PATH "/modules/admin/layouts/scripts"
    default.resources.layout.layoutPath = APPLICATION_PATH "/modules/default/layouts/scripts"
    member.resources.layout.layoutPath = APPLICATION_PATH "/modules/member/layouts/scripts"
    affiliate.resources.layout.layoutPath = APPLICATION_PATH "/modules/affiliate/layouts/scripts"
  2. Create your Action Helper:
    <?php
    /**
     * Sets the layout path on a per-module basis
     *
     * @author Lloyd Watkin <lloyd@evilprofessor.co.uk>
     * @since  2010-01-01
     */
    class Pro_Controller_Action_Helper_SetLayoutPath
        extends Zend_Controller_Action_Helper_Abstract
    {
        /**
         * Sets layout path based on module
         */
        public function preDispatch()
        {
        	$module = $this->getRequest()->getModuleName();
    
    	    if ($bootstrap = $this->getActionController()
    	                       ->getInvokeArg('bootstrap')) {
    
    	        $config = $bootstrap->getOptions();
    
    	        if (isset($config[$module]['resources']['layout']['layoutPath'])) {
    	            $layoutPath =
    	                 $config[$module]['resources']['layout']['layoutPath'];
    	            $this->getActionController()
    	                 ->getHelper('layout')
    	                 ->setLayoutPath($layoutPath);
    	        }
        	}
        }
    }
  3. And lastly, bootstrap the action helper:
    ...
        /**
         * Sets up layout scripts on a per-module basis
         */
        protected function _initLayoutHelper()
    	{
    	    $this->bootstrap('frontController');
    	    Zend_Controller_Action_HelperBroker::addHelper(
    	        new Pro_Controller_Action_Helper_SetLayoutPath());
    	}
    ...
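
The lookup the action helper performs can be tried outside the framework. Below is a minimal sketch in plain PHP; the function name resolveLayoutPath() and the example paths are assumptions for illustration, and the options array is only shaped like the per-module settings from step 1.

```php
<?php
/**
 * Hypothetical stand-alone version of the lookup the helper performs.
 * Returns the layout path configured for $module, or null if none is set.
 */
function resolveLayoutPath(array $options, $module)
{
    if (isset($options[$module]['resources']['layout']['layoutPath'])) {
        return $options[$module]['resources']['layout']['layoutPath'];
    }
    return null;
}

// Example options, shaped like the per-module settings in step 1
$options = array(
    'admin' => array('resources' => array('layout' => array(
        'layoutPath' => '/app/modules/admin/layouts/scripts',
    ))),
);

var_dump(resolveLayoutPath($options, 'admin'));  // the configured path
var_dump(resolveLayoutPath($options, 'member')); // NULL, nothing configured
```

Returning null for an unconfigured module mirrors the helper, which simply leaves the default layout path alone in that case.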

Doctrine: DATETIME default NOW()

By Steven Lloyd Watkin, Wednesday 30th December 2009 6:30 pm

I’ve been struggling with setting up a database schema for a new Zend Framework project. I’m trying to use Doctrine ORM for my database models. I needed to set up the schema so that it allows me to set a default date and time for a `datetime` column, e.g. when adding a new message I get the current timestamp. After much searching and experimenting I found the solution, so I’m sharing it.

In your schema YAML file simply do the following:

Message:
  actAs:
    Timestampable:
      created:
        name: created_at
        type: timestamp
        format: Y-m-d H:i:s
      updated:
        name: last_updated
        type: timestamp
        format: Y-m-d H:i:s
  columns:
    id:
      type: integer
      primary: true
      autoincrement: true
    name: string(255)
    email: string(300)
    message: string(2000)

If on the other hand you don’t want an updated column (`last_updated` above) you can use the following:

Message:
  actAs:
    Timestampable:
      created:
        name: created_at
        type: timestamp
        format: Y-m-d H:i:s
      updated:
        disabled: true
  columns:
    id:
      type: integer
      primary: true
      autoincrement: true
    name: string(255)
    email: string(300)
    message: string(2000)
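
A note on the format option: as far as I can tell it is a PHP date() format string, where `i` means minutes and `m` means the month number, so the two are easy to mix up. The short sketch below (plain PHP, with a fixed example timestamp chosen purely for illustration) shows the difference.

```php
<?php
// Fixed example moment: 30 December 2009, 18:30:00 UTC
$timestamp = gmmktime(18, 30, 0, 12, 30, 2009);

// 'm' is the month number, so the minutes are silently replaced by "12"
$withMonth = gmdate('Y-m-d H:m:s', $timestamp);

// 'i' is minutes, which is what a DATETIME value needs
$withMinutes = gmdate('Y-m-d H:i:s', $timestamp);

echo $withMonth . PHP_EOL;   // 2009-12-30 18:12:00
echo $withMinutes . PHP_EOL; // 2009-12-30 18:30:00
```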

PHP Design Patterns – Observer Pattern

By Steven Lloyd Watkin, Tuesday 29th December 2009 10:02 pm

I’ve been reading Head First Design Patterns recently and have decided to write some of the patterns as PHP examples for my own benefit. The first one that I’ve decided to code up is the Observer Pattern. The formal definition of the Observer Pattern is:

The observer pattern (a subset of the asynchronous publish/subscribe pattern) is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods. It is mainly used to implement distributed event handling systems.

As systems become more loosely coupled, it becomes harder to make sure that when an event happens every system that needs to know about it is informed. Take a blog as an example: after saving a post we may need to update a search engine (e.g. Lucene), update our sitemap and tags, email subscribed users, etc. The observer pattern allows developers to add additional listeners without editing their observable object. By injecting observers (e.g. a search engine update observer, a sitemap generator, etc.) into a subject (e.g. a blog post editing system) we allow it to trigger all the necessary updates without any changes to the subject itself.

Without the Observer pattern the usual trick is to add a line of code to the observable object for each system that needs updating, and to remove that line when it is no longer needed. This does not allow for easily adding and removing observers.

The subject notifies all of its observers through a single method, which in turn calls the update method that each observer provides by implementing a common interface. Observers can add and remove themselves through methods in the observable object.

That’s basically it! I always find an example to be the best method of learning/understanding so here’s my coded up example…

Observer Pattern in PHP

In my example I’ve created a news system (ArticleAggregator) which sends out news headline updates to smaller news feeds. Here the news system takes the place of the Subject, Observable, etc., whereas the news feeds take the role of the Observers or Listeners.

Once initialised, observers can attach and detach themselves from the subject as they see fit. In my example I have created three observers; these scan the headlines sent out by the subject and ‘shout’ the news if it’s appropriate. The three observers are named below along with the terms they scan for when receiving news headlines:

  • Sport Observer: ‘rugby’, ‘football’, ‘tennis’
  • News Observer: ‘politics’, ‘finance’, ‘government’
  • Gossip Observer: ‘celebrity’, ‘music’, ‘fashion’

After initialising the subject I add the news and gossip observers and send out a news update. After this the sport observer is added before more news updates are sent out. Lastly the gossip observer is removed before a final news headline is sent out.

The three different observer classes implement the interface Observer, which gives them a clear interface/method through which they will receive updates. Provided they implement the Observer interface they will be able to attach themselves to the Subject. This also keeps with the programming paradigm of ‘program to interfaces not implementations’. The ArticleAggregator class extends the abstract class Subject, which provides us with the three required public methods:

  1. updateObservers()
  2. addObserver()
  3. removeObserver()

The code can be seen running here, Observer Pattern in PHP Running, and the code can be downloaded from here, Observer Pattern in PHP Code.

Observer Script
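
What follows is a minimal sketch of what the observers file (observers.php) might contain, based only on the description above: an Observer interface with an update method, and three observers that scan headlines for their terms and ‘shout’ on a match. The KeywordObserver base class, the property names and the exact output wording are assumptions made for this sketch, not the downloadable code.

```php
<?php
/**
 * Interface every observer implements (sketch)
 */
interface Observer
{
    public function update($newsHeadline);
}

/**
 * Hypothetical base class: shouts a headline if it contains a watched term
 */
abstract class KeywordObserver implements Observer
{
    protected $_name  = '';
    protected $_terms = array();

    /**
     * @param  string $newsHeadline
     * @return boolean True if the headline matched and was shouted
     */
    public function update($newsHeadline)
    {
        foreach ($this->_terms as $term) {
            // Case-insensitive search for the term anywhere in the headline
            if (stripos($newsHeadline, $term) !== false) {
                echo $this->_name . ' shouts: ' . $newsHeadline . PHP_EOL;
                return true;
            }
        }
        return false;
    }
}

class SportObserver extends KeywordObserver
{
    protected $_name  = 'SportObserver';
    protected $_terms = array('rugby', 'football', 'tennis');
}

class NewsObserver extends KeywordObserver
{
    protected $_name  = 'NewsObserver';
    protected $_terms = array('politics', 'finance', 'government');
}

class GossipObserver extends KeywordObserver
{
    protected $_name  = 'GossipObserver';
    protected $_terms = array('celebrity', 'music', 'fashion');
}
```

Putting the term scanning in one shared base class keeps each concrete observer down to a name and a list of terms, while the subject still only depends on the Observer interface.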

Subject / Observable Script

<?php
/**
 * This file contains the subject
 *
 * @author Lloyd Watkin
 * @since 2009/12/23
 */

abstract class Subject
{
	abstract public function addObserver(Observer $observer);
	abstract public function removeObserver(Observer $observer);
	abstract public function updateObservers( $newsHeadline );
}

/**
 * This is the subject class for the example
 *
 * @author Lloyd Watkin
 * @since 2009/12/23
 */
class ArticleAggregator extends Subject
{
	/**
	 * Holds a list of our observers
	 *
	 * @var array
	 */
	protected $_observerList = array();

	/**
	 * Method to add an observer
	 *
	 * @param Observer $observer
	 * @return void
	 */
	public function addObserver(Observer $observer)
	{
		$this->_observerList[] = $observer;
	}

	/**
	 * Method to remove an observer
	 *
	 * @param Observer $observer
	 * @return boolean
	 */
	public function removeObserver(Observer $observer)
	{
		foreach ($this->_observerList AS $key => $ob) {
			if ($ob === $observer) { // identity check: same instance only
				unset($this->_observerList[$key]);
				return true;
			}
		}
		return false;
	}

	/**
	 * Method to update observers
	 *
	 * @param string $newsHeadline
	 * @return void
	 */
	public function updateObservers( $newsHeadline )
	{
		foreach ($this->_observerList AS $ob) {
			$ob->update( $newsHeadline );
		}
	}

	/**
	 * Add a new news story
	 *
	 * @param string $story
	 * @return void
	 */
	public function addNewsStory( $story )
	{
		if ( empty( $story ) || !is_string( $story) ) {
			throw new InvalidArgumentException('Expected a news story!');
		}
		$this->updateObservers( $story );
	}
}

Controller Script

<?php
/**
 * Observer Design Pattern Example
 *
 * @author Lloyd Watkin
 * @since 2009/12/23
 * @link http://www.evilprofessor.co.uk
 */
include 'observers.php';
include 'subject.php';

if (!empty($_SERVER['HTTP_USER_AGENT'])) {
    echo '<pre>';
}

// What are we doing?
echo 'Observer Pattern Example in PHP' . PHP_EOL;
echo '================================' . PHP_EOL;
// Set up our subject
$subject = new ArticleAggregator();
echo ' - ArticleAggregator created' . PHP_EOL;

// Add some observers
$subject->addObserver( new NewsObserver() );
$subject->addObserver( $gossiper = new GossipObserver() );

echo ' - Added NewsObserver & GossipObserver' .
	 PHP_EOL . PHP_EOL;

// Beep, beep, beep... News Flash!
echo 'NewsFlash: celebrity rugby player loves finance' . PHP_EOL;
echo '================================================' . PHP_EOL;
$subject->addNewsStory('celebrity rugby player loves finance');
echo PHP_EOL;

echo ' - SportObserver has found out and wants to join the group!';
$subject->addObserver( new SportObserver() );
echo PHP_EOL . PHP_EOL;

// Beep, beep, beep... News Flash!
echo 'NewsFlash: government messes up again!' . PHP_EOL;
echo '=======================================' . PHP_EOL;
$subject->addNewsStory('government messes up again!');
echo PHP_EOL;

// Beep, beep, beep... News Flash!
echo 'NewsFlash: fashion and football combine' . PHP_EOL;
echo '=======================================' . PHP_EOL;
$subject->addNewsStory('fashion and football combine');
echo PHP_EOL;

// Beep, beep, beep... News Flash!
echo 'NewsFlash: music and politics, what next?' . PHP_EOL;
echo '==========================================' . PHP_EOL;
$subject->addNewsStory('music and politics, what next?');
echo PHP_EOL;

/**
 * Gossipers grow tired of news very quickly and have decided
 * to stop listening, despite all the interesting news today!
 */
echo ' - GossipObserver is bored and leaves the group!' .
     PHP_EOL . PHP_EOL;
$subject->removeObserver( $gossiper );

// Beep, beep, beep... News Flash - Update to an earlier story!
echo 'NewsUpdate: fashion and football combine says ' .
     'government' . PHP_EOL;
echo '================================================' .
     '=========' . PHP_EOL;
$subject->addNewsStory( 'fashion and football combine ' .
                        'says government' );
echo PHP_EOL;

if (!empty($_SERVER['HTTP_USER_AGENT'])) {
    echo '</pre>';
}

Office Grid Computing using Virtual environments – Part 4

By Steven Lloyd Watkin, Friday 4th December 2009 11:59 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I’ve been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn’t it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I’m going to look at the potential benefits of employing an office grid using virtualised environments.

In part 3 we created our virtual processing machine and set up Windows machines to become idle-time workers.

Running the latest code

Inevitably, after creating your workers, business logic will change, bugs will be found, and faster, more efficient code will be produced, leaving your workers sat around processing data using old smelly code. How then do we ensure that we’re always using the latest and greatest version of our processing scripts?

There are a few simple ways we could do this; the trick, however, is to keep processing overhead and network traffic down while doing so. Let’s start with the simplest of solutions and improve it over a couple of iterations.

The first method would be to simply connect to our job control server (via Samba, FTP, or similar) and pull down the latest version of the code. Not very efficient, but it will do the job. Let’s improve on that somewhat: how about creating an rsync script and using that each time instead? Alternatively, what about putting our latest processing script into Subversion, checking out the code initially and then just updating it on each run (svn update)?

In the end we could end up with a shell script (called by cron every 10 minutes) which looks as simple as this:

#!/bin/sh
if ps ax | grep -v grep | grep php > /dev/null
then
    echo "Job is currently processing, exit"
else
    echo "Job is not running, start now"
    cd /path/to/working/copy
    svn update
    php yourJobProcessingScript.php
fi

Now we can be sure that with each run we’re definitely running the latest code. We’re ensuring this by updating our code base each and every time we perform a run and reducing network traffic by only transferring the file differences across our network.

In my demonstration setup, I did exactly as above. Subversion was installed on my job processing server and I simply pulled the latest code from a ‘worker’ branch using ‘svn update’. I also added a version number tag to my processing script which was sent back to the database along with the results. This way I could see that my code was being updated each time I copied my trunk into the worker branch, i.e. that I was definitely running the latest processing script.

Using the latest data

If your job processing makes use of data sources then at some point these are going to be updated too. Unless you call your data sources on a very infrequent basis, you’re going to flood your network with traffic as soon as your workers start running, bringing everything to a standstill. For my solution I decided that I’d like to move my data sources around with my VMs.

Hold your horses there! What if my data sources are HUGE? Well, this really is a case of how much data we are talking about. It may be more cost effective to install an additional larger hard drive into each machine than to purchase an additional processing server. This is a question of budget and is up to the business to decide. It may be that your data sources are so large that it’s just unfeasible to keep that amount of data in your worker machines. In that case what would you do? Well, we could look at calling a local data server, but this might cause issues with the network. In this case a grid system such as this may become unrealistic to include in your office environment. It may also be that you can look into alternative running strategies, for example only calling your workers between 8pm and 6am each night and/or throttling data source requests.

Moving on, let’s say our data sources amount to 100GB of data. Well yes, that’s quite a bit of data to move around the network on an update. How would we ensure that we have the latest copy of the data in this case? Rsync is a possibility, but personally I think running your latest data source on your job processing server and setting this up as a master in replication (with a nice long bin log) might be the way to go:

[Image: replication]

By setting each of your workers up as a slave to the job control server, updates to your data sources will trickle down nicely to your workers without a huge increase in network activity (that is, unless you perform a huge data update and all your workers kick in at once). This has advantages over rsync in that you wouldn’t get a long pause before each job; as the database updates, the MySQL daemon on your worker will continually update its data while the processing continues.

This is how I set up my demonstration server. To set up replication I followed the guide on the MySQL site (Setting up replication) and within 20 minutes I had my initial worker replicating the job control server’s dataset. For each additional worker the replication settings and process worked each time the VM was copied.

Summary

In this section of the article we have looked at how easy and painless it is to keep your processing code up to date by using rsync or Subversion (SVN) to do the work and reduce network traffic at the same time. We also discussed how to keep your data source information up to date by allowing it to trickle down to each of your workers. Thus we ensure that our office grid system keeps up with business logic and information. There will obviously be countless alternatives for performing these tasks, but these were two simple examples to show how easy a solution is to come by.

Next time

In the final part of this series, aptly named Part 5, we’ll discuss deploying this system. I’ll summarise what has been learned and what I managed to create.

Office Grid Computing using Virtual environments – Part 3

By Steven Lloyd Watkin, Friday 4th December 2009 11:37 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I’ve been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn’t it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I’m going to look at the potential benefits of employing an office grid using virtualised environments.

In part 2 we looked at the jobs a server will run, and how jobs should be configured in order to achieve the greatest amount of processing whilst ensuring that each job is processed without fail.

Setting up your worker – or LiMP server

The next step in the process is to set up your virtual workers. For this I’m going to use an installation of CentOS using VirtualBox. I’m going to install MySQL and PHP on the server, also known as a LiMP (Linux, MySQL, PHP) server (I may have made that name up).

  • Install VirtualBox on your Windows machine
  • Download and install CentOS (current version 5.3) within a newly created virtual machine

There’s no point me going through this as there are probably 1,000s of great tutorials out there (ok, here’s one: Creating and Managing centOS virtual machine under virtualbox). The important point to note, I suppose, is that I called my virtual machine GridMachine.

As far as my choices of virtualisation client and operating system go, there is no big compelling reason for either choice. VirtualBox is something I use on my home machine and is supported by the three major operating systems. I chose CentOS as it’s a good stable OS and I use it on my own web server. I am a great believer in the right tools for the job (although I’m applying a ‘use the quickest and easiest for you’ mentality here), so if operating system X runs your code quicker and more efficiently use that instead :)

Importantly, make sure that your VM uses DHCP, otherwise each new virtual machine would need to be configured separately, which is something we don’t want. By using DHCP we don’t need to configure network settings individually for worker machines; DHCP will hand out IPs for you. Therefore you can copy your virtual machine about the office without worrying about setting each one up (this improves scalability and reduces worker administration).

The process you should aim for is to obtain a new physical machine, install VirtualBox, and then pretty much deploy the virtual image without much else. It might be wise to set up all your workers on a different subnet so that you can at least see how many machines are running. You’ll also need to set up your machines on a long lease or unlimited lease DHCP.

How to run Jobs on the worker

This is an interesting area and there are several valid methods for processing jobs on the worker. Here I’ll just discuss the two most obvious:

  • Perpetually running script: A script, be it a shell script, or a PHP script is executed once on the worker and runs as part of an infinite loop. I’ve discounted this method as one crash of the script and potentially your workers will cease to run without some sort of intervention.
  • Cron based script execution: Every X minutes the cron daemon kicks off a call to your script to get things going. Without some checking this could lead to many many copies of your worker script running.

My decision was to go with cron, which kicks off a shell script every 10 minutes. My shell script performs the following tasks:

  1. Get a process list and grep this for ‘php’. If not found then continue.
  2. Call your job code, in my case this would be something PHP based
  3. Worker script completes its run
  4. Ready to go again on the next appropriate call

My shell script looks something like the following:

#!/bin/sh
if ps ax | grep -v grep | grep php > /dev/null
then
    echo "Job is currently processing, exit"
else
    echo "Job is not running, start now"
    php yourJobProcessingScript.php
fi

Note: the echo statements are almost completely pointless, but may help the next person who comes along and tries to edit the script.

That concludes the set up of the worker virtual machine: quick, simple, and easy to copy to each new piece of hardware that is received. The ‘cleverness’ of the grid system really isn’t in the virtualised OS; it’s all to do with the code created to process jobs, the job configuration, and making sure that the job runs when appropriate (i.e. when the host is idle).
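
Grepping the process list for ‘php’ will also match any unrelated PHP process on the worker. One alternative is to let the job script guard itself with a lock file. Below is a minimal sketch in PHP, assuming a lock file in the system temp directory; the function names acquireWorkerLock() and releaseWorkerLock() and the lock file name are made up for this example.

```php
<?php
/**
 * Hypothetical guard: tries to take an exclusive, non-blocking lock.
 * Returns the open handle on success, or false if the lock is held elsewhere.
 */
function acquireWorkerLock($lockFile)
{
    // 'c' opens for writing and creates the file without truncating it
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle;
}

/**
 * Releases the lock taken by acquireWorkerLock()
 */
function releaseWorkerLock($handle)
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

$lockFile = sys_get_temp_dir() . '/grid-worker.lock';
$lock     = acquireWorkerLock($lockFile);

if ($lock === false) {
    echo 'Job is currently processing, exit' . PHP_EOL;
} else {
    echo 'Job is not running, start now' . PHP_EOL;
    // ... job processing would go here ...
    releaseWorkerLock($lock);
}
```

Because the operating system drops the lock when the process ends, a crashed job should not leave the worker permanently blocked, which is the main attraction over a hand-rolled PID file.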

Setting up Windows to Initialise Workers

The first task is to work out the command required to run the virtual machine from the Windows command line. If you’ve installed VirtualBox in the default location and you’ve named your worker GridMachine then the command required to load up your worker is:

"C:\Program Files\Sun\VirtualBox\VBoxManage.exe" startvm GridMachine

However, to run the virtual machine in a ‘headless’ state we need to use:

"C:\Program Files\Sun\VirtualBox\VBoxHeadless.exe" -startvm GridMachine --vrdp=off

This will start the virtual machine without the GUI and allow it to save state gracefully. The second argument turns off RDP so it doesn’t conflict with Windows RDP, or give you a message about listening on port 3389. The virtual machine name is cAsE sEnSiTiVe!

Next, we’ll need to set Windows up to kick off our worker VM once the machine has been idle. To do this (on Windows XP) you’ll need to go to Start -> All Programs -> Accessories -> System Tools -> Scheduled Tasks as below:

[Screenshot: Scheduled Tasks]

Next click on ‘Add Scheduled Task’ followed by ‘Browse’ to add a custom program. Navigate to your VBoxManage executable and click OK. Schedule your task for any of the options (we’ll change this in a minute) and continue. After skipping the next screen, Windows will ask which user should run this task; I’d suggest either ‘Administrator’ or creating a new privileged user. Remember we don’t want to interfere with the standard staff account on the machine at any point. Click Next and tick ‘show advanced options for this task’.

To the end of the Run textbox add our ‘startvm GridMachine’ string and ensure that ‘run only when logged in’ is left unticked. Visit the Schedule tab next and change the schedule drop-down to the option ‘when idle’, then choose the amount of time you’d like the machine to be idle before moving on to the next tab.

Finally, untick the option which stops the task if it has been running for X amount of time, but do tick the option to stop the task if the machine is no longer idle.

[Screenshot: schedule settings]

That’s it then for the Windows host setup!

Summary

In this part we have set up a virtual machine to act as a worker, as well as the way in which we call and execute our job processing scripts (for myself a PHP script). We then looked at how to set up our copies of Windows to start the virtual machine in headless mode when the computer becomes idle, and save its state when the user resumes using the machine. Hopefully at this point you’re seeing how simple it is to set up such a system and are itching to get some experiments going yourself!

Next time

In Part 4 we’ll be looking at using tools to ensure that you’re running the latest version of the code and data sources so that obtained results are always up-to-date with the latest business information and logic.

Office Grid Computing using Virtual environments – Part 1

By Steven Lloyd Watkin, Friday 4th December 2009 11:23 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I’ve been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn’t it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I’m going to look at the potential benefits of employing an office grid using virtualised environments.

As a PHP developer I’m going to use tools that I use each day, namely Linux, MySQL, PHP, VirtualBox and Subversion (SVN). However, I hope this guide will adapt to other languages and technologies just as well.

The solution I provide will be very loosely based on the type of processing we’d need to achieve. However, this may not be true through the entire article, as I’ll change things for simplicity or to produce more interesting usage scenarios.

These virtualised environments will run on Windows machines since this is what the majority of offices run. The processing that the office machines do should not interfere with staff using those machines, should require no maintenance at the machine, and should be easily deployable to new machines as they become available. Also, new virtual machines should not require any additional configuration, as needing it would greatly reduce the scalability of the grid system and the ease with which it can be extended.

Why Deploy an Office Computing Grid?

Firstly you may be thinking, why not just use a cloud computing resource such as Amazon’s EC2 platform? Well, there could be several reasons, for example:

  • You won’t entrust certain data to a cloud computing environment
  • You can’t put certain data into a cloud computing environment for legal reasons (e.g. data leaving the country, or NHS records)
  • You want to keep your processing units close and have full control over the hardware too
  • You don’t have the project funds to run cloud instances
  • Your office doesn’t have a connection to the internet and therefore it’s not possible to use a cloud resource
  • You don’t like rain, clouds suggest rain, therefore you keep well away

I’m sure the list could continue, but I think that’s enough for now.

Advantages of an Office Computing Grid

Well, let’s do some maths (and in true physics style let’s make some sweeping assumptions). Imagine you have a big beefy processing server running 100 jobs per day. In your office you have 50 machines which are idle 16 hours a day, and each of these machines is 10% as powerful as your beefy processing server. (All results here are rounded to underestimate the performance increase.)

So, 1 machine * 10% power * 2/3 time = 0.067 of the server’s capacity, i.e. 1 desktop processing in idle time could process 6 full jobs per day (6.7 of the server’s 100, rounded down).

If you now scale this up it takes 15 idle desktops to process as many jobs per day as your main processing server does.

So in our pretend office of 50 machines we could increase our processing power from 1 server up to 4 full processing servers, or we could be processing 400 jobs per day instead of 100.
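The arithmetic above can be checked with a few lines of shell. This is only a sketch of the same back-of-envelope estimate: the figures are the assumptions from the text, the function names are made up for illustration, and integer division is used deliberately so that results round down as they do above.

```shell
#!/bin/sh
# Sketch of the estimate above; all figures are the article's assumptions.
SERVER_JOBS_PER_DAY=100   # jobs the main processing server handles per day
DESKTOPS=50               # office machines available
DESKTOP_POWER_PCT=10      # each desktop's power as a % of the server
IDLE_HOURS=16             # hours per day each desktop sits idle

# Full jobs one desktop can process per day in its idle time.
# Integer division rounds down, so 6.67 becomes 6.
jobs_per_desktop() {
    echo $(( SERVER_JOBS_PER_DAY * DESKTOP_POWER_PCT * IDLE_HOURS / (100 * 24) ))
}

# Idle desktops needed to match one server: 1 / (10% * 16/24).
desktops_to_match_server() {
    echo $(( 100 * 24 / (DESKTOP_POWER_PCT * IDLE_HOURS) ))
}

# Jobs per day for the server plus every desktop in the office.
total_jobs_per_day() {
    echo $(( SERVER_JOBS_PER_DAY + DESKTOPS * $(jobs_per_desktop) ))
}
```

Calling jobs_per_desktop, desktops_to_match_server and total_jobs_per_day gives 6, 15 and 400 respectively, matching the figures in the text. Changing the variables at the top lets you plug in numbers for your own office.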

Notice that, for no investment in new hardware, your company has just increased its batch processing capacity 4 times! Potentially you’re going to increase your power usage, but in most office environments I’ve been in, machines are generally left on overnight anyway, so you could see this as a green initiative.

Other advantages are that investment in new (or updated) processing servers can be delayed if your office machines are sufficient, and that as you improve the power of your office machines your office grid becomes more powerful automatically.

Technologies

What do you need? (Or, more correctly, what did I use?)

  • Idle office machines (in my case a spare old Windows XP laptop)
  • VirtualBox (or another virtualisation client software)
  • A virtual machine with PHP and MySQL running on a cut-down OS; I’m calling these my LiMP servers :)
  • Jobs to run
  • Job server (can be another virtual machine somewhere)

Typical Jobs

The types of jobs that this system is designed to run are as follows:

  • System receives a list of data upon which we need to match and return results
  • Matching involves checking/searching several (fairly static) data sources
  • Results from data sources may require further validation, merging, checking of additional data sources in response to results
  • Data is returned with matching records, fully validated and processed
  • Each record within a job is independent of the rest

So basically we’re looking at running jobs which require a mixture of database lookups and some number crunching, a fairly typical scenario in a business environment.

Grid solutions are not only advantageous for processing jobs of this type. Basically, any process which can be split into independent units can be run in parallel. See the Wikipedia article on Grid Computing for examples and more information, but a couple of famous examples are SETI@home and BOINC. There are frameworks for running computing grids, and these are well worth looking into.
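Because each record is independent, a job can be cut into chunks that different workers process separately. As a rough illustration only (this is not the job configuration used later in the series), the sketch below assumes a job arrives as a text file with one record per line and uses the standard split utility. The function name split_job and the chunk_ file prefix are hypothetical.

```shell
#!/bin/sh
# Sketch only: cut a job file into independent chunks, one record per line.
# split_job and the "chunk_" prefix are hypothetical names for illustration.
#
# Usage: split_job INPUT_FILE RECORDS_PER_CHUNK OUTPUT_DIR
split_job() {
    input_file=$1
    records_per_chunk=$2
    output_dir=$3

    mkdir -p "$output_dir" || return 1

    # split -l writes files of at most N lines each:
    # chunk_aa, chunk_ab, chunk_ac, ...
    split -l "$records_per_chunk" "$input_file" "$output_dir/chunk_"
}
```

For example, split_job records.txt 1000 chunks would leave files of up to 1000 records each in the chunks directory, and each file could then be handed to whichever worker is idle.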

What will we achieve?

By the end of these articles I hope to show that deploying an office grid need not be hugely expensive or time consuming. I’m going to discuss:

  • Setting up the job control system, job configuration
  • Creating an appropriate processing virtual machine
  • How to set up the system on a Windows machine
  • Ensuring you are using the latest code and data
  • Deployment and benchmarking
  • Looking ahead

I’ll be building (ok I built, then wrote this) an example application to test the concepts on a local machine using Windows XP and my ‘GridMachine’ virtual machine. My job control server will be my main machine which runs Fedora 11.

This is in no way meant to demonstrate a fully working robust system; it’s meant more as a demonstration and discussion showing that these things can be achieved in a reasonably short space of time and at little cost. Please feel free to send me any comments, corrections, or improvements and I’ll do my best to keep this article updated to match.

Next time

In part 2 I will start by looking at the job control system, and look into how jobs should be configured in order to achieve the greatest amount of processing whilst ensuring that each job is processed without fail.
