Posts tagged: open source

Expose github README.md at a path using express middleware

Sunday 24th March 2013, 10:33 pm

I’ve just published the first version of a new middleware package I’ve written to expose your project’s README.md via express.

To install, simply run:

npm i --save express-middleware-readme.md

Continue reading 'Expose github README.md at a path using express middleware'»

New demo system for XMPP-FTW

Sunday 10th March 2013, 6:39 pm

[Aardvark image, originally seen on http://awesome-wildlife.blogspot.co.uk/2009/12/aardvark.html]

I’ve spent most of the day writing a new demo system for XMPP-FTW and, despite it looking ugly as sin (I am no good with design), I’m quite pleased with how it works, so I thought I’d write up a little piece about it…

Continue reading 'New demo system for XMPP-FTW'»

Talking at London Node User Group (LNUG) – February 2013

Wednesday 27th February 2013, 8:18 pm

At the February London Node User Group (LNUG) I had a chance to speak about one of my new projects, pinitto.me. Pinitto.me is an open source infinite virtual corkboard application that I created over a weekend around Christmas to help with planning days for myself and colleagues at Surevine.

Continue reading 'Talking at London Node User Group (LNUG) – February 2013'»

Update a buddycloud channel when events take place on your github repository

Thursday 27th September 2012, 7:40 pm

As of this evening it is now possible to have GitHub post to a buddycloud channel when a push is made to a repository. This allows you to get (almost) real-time repository change information in your buddycloud channels.

Background

I’ve talked about work I’ve done with buddycloud before, but briefly: buddycloud is an exciting new federated social network built upon open source and open standards. The buddycloud team has recently come back from San Francisco, where they were involved with Mozilla’s WebFWD programme, getting some great mentoring and guidance from luminaries in their fields. I don’t think I really need to introduce GitHub; they are awesome too :)

If you’re not aware of them, GitHub has a set of service hooks that, as a repository owner/admin, you can utilise to push event information (be it commits, pushes, pull requests, branching, etc.) to a third-party service. There’s a whole set of services that you can already push to, from Jenkins CI right through to Yammer, and now buddycloud!

If you have a service that you’d like to push event information to then GitHub make the code available. All you have to do is fork the github-services repository, knock up some Ruby code, and submit a pull request. Once it’s been accepted you can then set up GitHub to push information to your favourite information system each time something happens to your repository.

Continue reading 'Update a buddycloud channel when events take place on your github repository'»

Running your own open federated social network from your home for just $25

Monday 3rd September 2012, 9:00 am

What is the Raspberry Pi?

[Raspberry Pi image from Wikipedia – http://en.wikipedia.org/wiki/Raspberry_Pi]

The Raspberry Pi is a small (credit-card-sized) computer which costs around the £25 mark. It was originally envisioned to help bring proper IT skills back to schools (rather than just how to use the Microsoft Office suite and the like), just as when children of the 70s–90s were growing up (I just caught the tail end of it).

Back then you had the ability not only to see the hardware but to mess around with the software running it without fear of breaking it. I learned many of my computer skills from continually breaking my father’s beloved PCs as a child and then hurriedly fixing them before he found out; I’m sure if I tried I could still even run off some MSCDEX lines :)

Since launch these little devices have been near impossible to get hold of on a short timescale, for they have been gobbled up by the developer community and those who remember playing with computers in the long-distant past. A huge number of projects are coming out using this little board and, more importantly, there are even 8-year-old kids writing their own programs (read: games) using it.

My first board is used to run a media server using xbian, but one of the projects I was really looking forward to was running the software for an open source project I help out on (professionally and personally) and getting my own open, federated social network running from the depths of my basement (more on that below).

For more information please see: Raspberry Pi – About Us

Continue reading 'Running your own open federated social network from your home for just $25'»

Office Grid Computing using Virtual Environments – Part 5

Friday 4th December 2009, 11:03 pm

Introduction

I work in a company where we run many batch jobs, processing millions of records of data each day, and I’ve been thinking recently about all the machines that sit around doing nothing for several hours every day. Wouldn’t it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I’m going to look at the potential benefits of employing an office grid using virtualised environments.

In Part 4 we looked at using tools to ensure that we’re running the latest version of the code and data sources, so that results are always up-to-date with the latest business information and logic.

Pre-Deployment

Before deploying your grid system, if there’s one thing you do and one thing alone, it’s benchmark your current system! No matter what you tell colleagues about how much extra work your system is going to do, unless you have numbers to back this up your guarantees are worth nothing. So:

  • How many records can you process currently? Per day? Per hour?
  • How long does it typically take to turn around a job?
  • How much more capacity do you have?

There are also additional questions:

  • If your processing server (or one of your processing servers) goes down, how will this affect your capabilities? Will you be crippled?
  • What advantages do you hope/expect to get from a grid system?
  • Are your office machines capable of running the jobs?
  • Are your jobs able (or can they be converted) to work in this style of running?

The last major point is to take your time on any major change like this. Update your processing code to work using the new methodology, then benchmark again. Possibly set up your processing server to run a virtual machine; after all, your processing server will just be another worker (a relatively very powerful one). Allow the new process to settle.

Deployment

My suggestion would be to pop into the office one weekend and perform all the installations and setup. Do this just before a fortnight’s holiday and leave some other poor chap to deal with the consequences… maybe not…

Deployment for a system like this needs to be slow. Despite it being relatively simple to set up, this system will affect your entire office infrastructure (well, the digital one). Firstly, roll out to a couple of machines at a time, monitoring network traffic and how the worker hosts perform on a day-to-day basis. You may need to alter your job configuration in response to your findings.

Once the system has settled with a few machines (let’s say 10% of all office machines, i.e. 5), keep monitoring network traffic and host machine performance. Next benchmark again: you should now be processing 33% more jobs than in your first benchmarks. Check this is so, or that you’re at least in the right ballpark. If not, investigate what is going on before moving on. Repeat this cycle until you happily have all office machines running without killing individual machine performance or grinding your network to a standstill.

At all times keep benchmarking, even after all deployments are made. Check how new code updates affect the speed of your system, and check that all workers are reporting in and processing jobs. Slowly (very slowly) increment your job configuration to get the best from your workers and network.
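As a sketch of how you might check that workers are reporting in (the table and column names here are hypothetical, not from my actual demo), a small PHP script on the job control server can list when each worker last completed a job, so any machine that has gone quiet stands out:

    <?php
    // last_seen.php - hypothetical sketch: list when each worker last
    // completed a job so that missing workers stand out.
    $db = new mysqli('localhost', 'control', 'secret', 'jobs');
    if ($db->connect_error) {
        die('Connect failed: ' . $db->connect_error);
    }

    $result = $db->query(
        'SELECT worker_ip, MAX(completed_at) AS last_seen
           FROM job_queue
          WHERE completed_at IS NOT NULL
          GROUP BY worker_ip
          ORDER BY last_seen'
    );

    while ($row = $result->fetch_assoc()) {
        echo $row['worker_ip'], ' last completed a job at ',
             $row['last_seen'], "\n";
    }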

Stop!

What if you want to stop your workers from running at some time? They are all out there running, regenerating, and trying their best to process data like hungry insects. The answer may seem obvious, but it’s worth adding just in case it’s overlooked. Simply edit your processing script with an exit(0) or die() or some other statement to kill your processing job. This is an important reason why we always try to update to the latest processing script before any run!
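As a minimal sketch (assuming, as described in Part 4, that every worker updates to the latest processing script before each run), committing an early exit to the top of the script stops the whole grid on the next cycle:

    <?php
    // process.php - hypothetical top of the worker processing script.
    // Because every worker updates to the latest script before a run,
    // committing this early exit halts the entire grid on the next cycle.
    exit(0); // kill switch: remove this line to resume processing

    // ... normal job processing would follow here ...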

Demonstration System

In order to write this set of short articles I created a very small grid to demonstrate the technologies and methodologies. I read lots of articles and tutorials, and used various tools to set up and monitor what was going on. By no means have I gone out and saturated a whole office with traffic, nor have I had access to a regular staff member’s PC to see how host performance was affected.

My demonstration system was very humble indeed. I used my regular desktop set up as a job control server. On this I had MySQL server installed and set up as a replication master, along with PHP and SVN linked through Apache (for access via the worker VM).

I then created a CentOS worker machine in VirtualBox on a six-year-old Windows XP laptop. I set up the scheduled tasks as specified after copying the VM onto the machine and let it go.

The virtual machine was set up with PHP, Subversion, and MySQL. I checked out a branch named ‘worker’ from my job control server’s repository and made sure it could be updated using ‘svn update’. Next I set up MySQL as a slave and checked that data was replicating from MySQL on the job control server down to the worker VM. After all this I set up the bash script and the cron job.
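As a quick sanity check (sketched here with hypothetical credentials, not my exact setup), you can confirm from PHP that the slave threads are running and see how far behind the master the worker is:

    <?php
    // check_slave.php - hypothetical sketch: confirm the worker VM is
    // replicating from the job control server.
    $db = new mysqli('localhost', 'worker', 'secret', 'jobs');
    if ($db->connect_error) {
        die('Connect failed: ' . $db->connect_error);
    }

    $status = $db->query('SHOW SLAVE STATUS')->fetch_assoc();

    if ($status
        && $status['Slave_IO_Running'] === 'Yes'
        && $status['Slave_SQL_Running'] === 'Yes') {
        echo 'Replication running, ',
             $status['Seconds_Behind_Master'], "s behind master\n";
    } else {
        echo "Replication is NOT running\n";
    }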

My processing script basically went along the lines of this (very simple stuff):

  • Read in the name field
  • Counted the number of similar names in a table from the data source held on the VM
  • Counted the number of names as above but splitting the name by spaces (i.e. forename, middle, surname)
  • Repeated this process 1,000 times
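A minimal sketch of the shape of that script follows (the table and column names are invented for illustration; the real script differed in detail):

    <?php
    // worker.php - hypothetical sketch of the demonstration processing
    // script; table and column names are made up for illustration.
    $db = new mysqli('localhost', 'worker', 'secret', 'jobs');
    if ($db->connect_error) {
        die('Connect failed: ' . $db->connect_error);
    }

    // Count rows in the locally replicated data source whose name
    // is similar to the given name.
    function countSimilar(mysqli $db, string $name): int {
        $stmt = $db->prepare('SELECT COUNT(*) FROM people WHERE name LIKE ?');
        $like = '%' . $name . '%';
        $stmt->bind_param('s', $like);
        $stmt->execute();
        $stmt->bind_result($count);
        $stmt->fetch();
        $stmt->close();
        return $count;
    }

    // Read in the name field for this job
    $row  = $db->query('SELECT name FROM job_queue LIMIT 1')->fetch_assoc();
    $name = $row['name'];

    for ($i = 0; $i < 1000; $i++) {
        countSimilar($db, $name); // the whole name

        // Repeat, splitting the name by spaces (forename, middle, surname)
        foreach (explode(' ', $name) as $part) {
            countSimilar($db, $part);
        }
    }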

Each job took approximately 20 minutes to run. At one point I opened several copies of the worker VM on the Windows laptop and watched the jobs being checked off by each of the worker IP addresses. At this point I also confirmed that replication automatically restarted.

Leaving the laptop to idle resulted in a worker starting to process jobs from the job control server. When resuming laptop usage there was a delay of about 30–60 seconds. This is a fair amount of time, and staff would need to be made aware that their machine may pause for a short while when they return to it; newer machines may not pause for this long. The benefit of the amount of processing performed by these machines during idle periods would more than outweigh staff members having to wait a short period (say 1 minute) on arriving at their machines of a morning (I frequently wait longer than this for a Windows Defender update to take place), provided they were made aware of it (a useful time to grab a morning coffee!).

Overall I feel confident that I have demonstrated the technologies that could be used to create such a system. I have shown that such a system does work on a (very) small scale and, with some more experimenting, could be scaled up to utilise the resources of an office’s machines. If I don’t get to the point of doing this myself, I would be very interested to know/see when someone else does.

Conclusions / Evaluation

The next obvious step would be to get a real-world example and start to deploy a system such as this within an office environment and see what happens. Asking a business to commit to this without a trail-blazing company to prove the technology and its effectiveness may be a little difficult. Grid/distributed computing is very popular in some circles and has some large applications (BOINC, SETI@Home, Folding@Home, etc.). I did not, however, find a smaller-scale, simple system like this in my searches that could be rolled out within an office environment.

I created a basically free system using mostly open source software and tools available in almost any office. The technologies were demonstrated and shown to perform and work as expected. Hopefully I have shown that with not much work, and with a very simple setup, you can deploy an office grid computing system that is powerful, cheap, and scalable all at the same time.

Once a system is up and running there is almost no end to the amount of customisation and improvement you can make. For example, statistics/benchmarking can easily be added, showing the worth of such a system every day. New machines can be added quickly and easily as and when they arrive, with upgrades to existing hardware bolstering your processing power.
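For instance, a worker could time each job and record the result (a hypothetical sketch, with an invented job_stats table):

    <?php
    // stats.php - hypothetical sketch: time a job and record it so the
    // grid's daily throughput can be reported. Table name is invented.
    $db = new mysqli('localhost', 'worker', 'secret', 'jobs');

    $start = microtime(true);
    // ... run the processing job here ...
    $seconds = microtime(true) - $start;

    $stmt = $db->prepare(
        'INSERT INTO job_stats (worker_ip, seconds) VALUES (?, ?)'
    );
    $ip = gethostbyname(gethostname()); // this worker's IP address
    $stmt->bind_param('sd', $ip, $seconds);
    $stmt->execute();
    $stmt->close();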

I hope you’ve enjoyed reading this series of articles and that it’s given you food for thought on running an office grid system. The solution presented here won’t necessarily work in all situations but should be adaptable enough to let you get your data processing done with your own solution.

Please feel free to send me any comments, corrections, or improvements and I’ll do my best to keep this article updated to match.
