Category: Web Programming

Routing sitemap.xml requests to a custom controller/action

By Steven, Wednesday 6th January 2010 12:13 am

To route requests for /sitemap.xml directly to a custom controller and action in your Zend Framework application, simply add the following to your application.ini or an alternative configuration file (such as my navigation.ini):

 resources.router.routes.sitemap.route = "sitemap.xml"
 resources.router.routes.sitemap.defaults.controller = index
 resources.router.routes.sitemap.defaults.action = sitemap

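Then create the action in the appropriate controller (in my case the sitemap action in the index controller). Example code for the output:

 <?php
 class IndexController
     extends Zend_Controller_Action
 {
     /**
      * Renders a sitemap based on the Zend_Navigation setup
      */
     public function sitemapAction()
     {
         // Output the sitemap XML, then switch off the layout and view rendering
         echo $this->view->navigation()->sitemap();
         $this->view->layout()->disableLayout();
         $this->_helper->viewRenderer->setNoRender(true);
     }
 }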

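Sitemaps can be created quickly and easily using Zend_Navigation. A great quick tutorial (the Zend Framework tutorials there are generally very useful) is Zendcasts' "Dynamically creating a menu, a sitemap and breadcrumbs".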

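Per-module layout settings in Zend Framework

By Steven, Friday 1st January 2010 10:40 pm

I have created a follow-up to this post that needs very little configuration; see Module Layouts - Zend Framework.

When using Zend Framework modules it is obvious that, if you are running the various (sub)sites off the same application, you will not necessarily want the same layout script for each section. I decided to go with the following site structure: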

  /application
     /controllers
         ...
     /models
     /modules
         /default
             /controllers
             /layouts
                 /scripts
             /views
                 /scripts
         /anotherModule
             ...
     /scripts

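The problem is how to set layout scripts on a per-module basis. The answer is to use an action helper. Setting layouts on a per-module basis involves three steps:

  1. application.ini (or a similar configuration setup):
      admin.resources.layout.layoutPath = APPLICATION_PATH "/modules/admin/layouts/scripts"
      default.resources.layout.layoutPath = APPLICATION_PATH "/modules/default/layouts/scripts"
      member.resources.layout.layoutPath = APPLICATION_PATH "/modules/member/layouts/scripts"
      affiliate.resources.layout.layoutPath = APPLICATION_PATH "/modules/affiliate/layouts/scripts"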
  2. Create your action helper:
      <?php
      /**
       * Sets the layout path on a per-module basis
       *
       * @author Lloyd Watkins <lloyd@evilprofessor.co.uk>
       * @since  2010-01-01
       */
      class Pro_Controller_Action_Helper_SetLayoutPath
          extends Zend_Controller_Action_Helper_Abstract
      {
          /**
           * Sets the layout path based on the module
           */
          public function preDispatch()
          {
              $module = $this->getRequest()->getModuleName();

              if ($bootstrap = $this->getActionController()
                                    ->getInvokeArg('bootstrap')) {

                  $config = $bootstrap->getOptions();

                  if (isset($config[$module]['resources']['layout']['layoutPath'])) {
                      $layoutPath =
                          $config[$module]['resources']['layout']['layoutPath'];
                      $this->getActionController()
                           ->getHelper('layout')
                           ->setLayoutPath($layoutPath);
                  }
              }
          }
      }
  3. Finally, bootstrap the action helper:
      ...
          /**
           * Sets up layout scripts on a per-module basis
           */
          protected function _initLayoutHelper()
          {
              $this->bootstrap('frontController');
              $layout = Zend_Controller_Action_HelperBroker::addHelper(
                  new Pro_Controller_Action_Helper_SetLayoutPath());
          }
      ...

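Doctrine: DATETIME default NOW()

Wednesday 30th December 2009 6:30 pm

I have been working on setting up a new database schema for a Zend Framework project, trying out the Doctrine ORM for my database models. I needed a schema that lets my `datetime` columns default to the current date and time so that, for example, when a new message is added I get the current timestamp. After much searching and experimenting I found a solution, so I am sharing it.

In the schema YAML file simply do the following: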

Message:
  actAs:
    Timestampable:
      created:
        name: created_at
        type: timestamp
        format: Y-m-d H:i:s
      updated:
        name: last_updated
        type: timestamp
        format: Y-m-d H:i:s
  columns:
    id:
      type: integer
      primary: true
      autoincrement: true
    name: string(255)
    email: string(300)
    message: string(2000)

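If, on the other hand, you don't want an 'updated_at' column, you can use the following:

Message:
  actAs:
    Timestampable:
      created:
        name: created_at
        type: timestamp
        format: Y-m-d H:i:s
      updated:
        disabled: true
  columns:
    id:
      type: integer
      primary: true
      autoincrement: true
    name: string(255)
    email: string(300)
    message: string(2000)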

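PHP Design Patterns - Observer Pattern

By Steven, Tuesday 29th December 2009 10:02 pm

I have been reading Head First Design Patterns recently and have decided to write some PHP examples of the patterns for my own benefit. The first one I have decided to code up is the Observer pattern. The formal definition of the Observer pattern is:

The observer pattern (a subset of the asynchronous publish/subscribe pattern) is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods. It is mainly used to implement distributed event handling systems.

As systems become more loosely coupled, making sure that every part of the system that needs to know about an event is notified when it happens becomes a requirement. For example, after a blog post is saved we may need to update a search engine (such as Lucene), update our sitemap and tags, email subscribed users, and so on. The Observer pattern lets developers add extra listeners without editing the object they observe. By injecting observers (i.e. a search engine update observer, a sitemap generator, etc.) into a subject (i.e. the blog post editing system) we can have it perform all the necessary updates without making any changes to it.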
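As a rough illustration of the idea, here is a minimal sketch using PHP's built-in SplSubject and SplObserver interfaces. The class names BlogPost and RecordingObserver are hypothetical, made up for this example, and the observers only record what they were told rather than touching a real search engine or sitemap.

```php
<?php
// Hypothetical subject: a blog post that notifies its observers when saved.
class BlogPost implements SplSubject
{
    /** @var SplObserver[] keyed by object hash so each observer is held once */
    private $observers = [];

    /** @var string */
    public $title;

    public function __construct(string $title)
    {
        $this->title = $title;
    }

    public function attach(SplObserver $observer): void
    {
        $this->observers[spl_object_hash($observer)] = $observer;
    }

    public function detach(SplObserver $observer): void
    {
        unset($this->observers[spl_object_hash($observer)]);
    }

    public function notify(): void
    {
        foreach ($this->observers as $observer) {
            $observer->update($this);
        }
    }

    public function save(): void
    {
        // Persisting the post is omitted; only the notification matters here.
        $this->notify();
    }
}

// Hypothetical observer: stands in for a search index or sitemap updater.
class RecordingObserver implements SplObserver
{
    /** @var string[] titles of the posts this observer was notified about */
    public $handled = [];

    public function update(SplSubject $subject): void
    {
        $this->handled[] = $subject->title;
    }
}

$post    = new BlogPost('Hello grid');
$search  = new RecordingObserver();
$sitemap = new RecordingObserver();

$post->attach($search);
$post->attach($sitemap);
$post->save();

print_r($search->handled);
```

Adding another listener (say, an email notifier) would only need one more attach() call; BlogPost itself stays unchanged, which is the point of the pattern.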


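Office Grid Computing using Virtual environments – Part 4

By Steven, Friday 4th December 2009 11:59 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I've been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn't it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I'm going to look at the potential benefits of employing an office grid using virtualised environments.

In Part 3 we created our virtual processing machine and set up our Windows machines to become idle-time workers.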

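Running the Latest Code

Inevitably, after you create your workers, business logic will change, bugs will be found, and faster, more efficient code will be written, leaving your workers sitting around processing data with smelly old code. So how do we ensure that we are always using the latest and greatest version of our processing scripts?

There are several very simple ways we can do this; the trick, however, is to achieve it while keeping processing power and network traffic to a minimum. Let's take the simplest solution and slowly improve on it over a couple of iterations.

The first approach is simply to connect to our job control server (via Samba, FTP, or similar) and pull down the latest version of the code. This is not very efficient, but it will do the job. Let's improve on that a little: how about creating an rsync script and using that each time instead? Or how about using Subversion to check out our latest processing scripts initially and then simply updating our code on each run (using svn update)?

Eventually we might end up with a bash script (called by cron every 10 minutes) which looks as simple as this: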

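  #!/bin/sh
  # Only start a new run if no PHP process is already running
  if ps ax | grep -v grep | grep php > /dev/null
  then
      echo "Job is currently processing, exiting"
  else
      echo "Job is not running, starting now"
      cd /path/to/working/copy
      svn update
      php yourJobProcessingScript.php
  fi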

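Now we can be sure that each time we run, we are running the latest code. We ensure that our code base is updated every time we perform a run, and we keep network traffic down by transferring only file differences across our network.

In my demonstration setup I did exactly as above. Subversion was installed on my job processing server and I simply pulled the latest code from a 'worker' branch using 'svn update'. I also added a version number tag to my processing script, which was sent back to the database as part of the returned results. This way I could see my code being updated: each time I copied my trunk to my worker branch I could be sure I was running the latest processing script.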

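Using the Latest Data

If your job processing uses data sources then at some point these are going to be updated. Unless you call your data sources very infrequently, the amount of network traffic will bring everything to a grinding halt once your workers start running. For my solution I decided that I wanted to move the data sources along with my virtual machines.

Hold your horses! What if my data sources are huge? Well, it really depends on how much data we are talking about. It may be more cost effective to install an extra, larger hard drive in each machine than to buy an additional processing server. That is a question of budget and a decision to be made. Perhaps your data sources are so large that it just isn't feasible to keep that amount of data on your worker machines. What do you do in that case? Well, we could look at calling a local data server, but this may cause network problems. In this situation a grid system like this may become impractical to include within your office environment. You could also look at alternative running strategies, for example only invoking your workers between 8pm and 6am each night and/or throttling data source requests.

Let's say our data sources amount to 100GB of data. That is quite a lot of data to move around the network on updates. How do we make sure we have the latest copy of the data in this case? Rsync is one possibility, but personally I think running your latest data sources on the job processing server and setting it up as a replication master (with a nice long bin log) may be the way to go:

[Image: Replication]

By setting up each worker as a slave of the job control server, updates to your data sources will trickle down to your workers without a large increase in network activity (unless, that is, you perform a massive data update and all your workers kick in at once). This has the advantage over rsync that you won't get a long pause before each job while the database updates; the MySQL daemon will keep updating its data while your job continues processing.

This is how I set up my demonstration servers. To set up replication I followed the guide on the MySQL website (Replication) and within 20 minutes I had my initial worker replicating the data set from the job control server. The replication setup and process are carried over to each additional worker whenever the worker VM is copied.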

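Summary

In this section of the article we looked at how easy and painless it is to keep your processing code up to date using rsync or Subversion (SVN) while keeping network traffic down at the same time. We also discussed how to keep the information in your data sources up to date by allowing it to trickle down to each worker. We have therefore covered the areas that keep the business logic and information in our office grid system up to date. There are obviously countless alternatives for performing these tasks, but these are two simple examples to show how easily solutions can be found.

Next time

In the final part of this series, aptly named Part 5, we will discuss deploying this system. I will summarise what I learnt and what I managed to create.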

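Office Grid Computing using Virtual environments – Part 3

By Steven, Friday 4th December 2009 11:37 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I've been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn't it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I'm going to look at the potential benefits of employing an office grid using virtualised environments.

In Part 2 we looked at the server that will hand out jobs and at how jobs should be configured to achieve the maximum amount of processing while ensuring that every job is processed without fail.

Setting up your worker - or LiMP server

The next step in this process is to set up the virtual worker. For this I am going to use CentOS installed using VirtualBox. I am going to install MySQL and PHP on the server, a.k.a. a LiMP (Linux, MySQL, PHP) server (I might have made that name up).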

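  • Install VirtualBox on your Windows machine (following the link)
  • Download and install CentOS within the created virtual machine (current version 5.3)

I won't go into detail here as there are probably thousands of great tutorials out there (OK, here's one: Creating and Managing CentOS virtual machines under VirtualBox). I guess the most important point to note is that I called my virtual machine GridMachine.

As for my choice of virtualisation client and operating system, there is no great compelling reason for either choice. VirtualBox is what I use on my home machines and it is supported by the three main operating systems. I chose CentOS as it is a good, stable operating system that I use on my own web server. I am a great believer in the right tool for the job (although here I applied the "use what is quickest and easiest for you" mentality), so if operating system X runs your code faster and more efficiently, use that instead :)

It is important to make sure your virtual machine uses DHCP, otherwise every new virtual machine would need to be configured individually, which is something we don't want. By using DHCP we do not need to configure network settings on the worker machines; DHCP will hand out the IP addresses for you. You can therefore copy your virtual machine around the office without worrying about setting each one up (which improves scalability and reduces worker administration).

The process you should aim for is one where you receive a new physical machine, install VirtualBox, and then deploy the virtual image with very little else to do. It may be wise to put all the workers on a different subnet so that you can at least see how many machines are running. You will also want to set up long or infinite DHCP leases for your machines.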

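How to run jobs on the worker

This is an interesting area and there are several valid ways to handle jobs on the worker. Here I will discuss only the two most obvious:

  • Permanently running script: a script, be it a shell script or a PHP script, that is executed once on the worker and runs as part of an infinite loop. I discounted this method because if the script crashes, your worker will stop running until someone intervenes.
  • Cron script execution: the cron daemon kicks off every X minutes to invoke your script. Without some checks this could result in many copies of the worker script running.

My decision was to go with cron kicking off a shell script every 10 minutes. My shell script performs the following tasks:

  1. Get the process list and grep for "php". If it is not found then continue.
  2. Call your job code, which in my case is PHP based
  3. The worker script completes its run
  4. Ready to go again on the next appropriate call

My bash script looks like the following: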

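  #!/bin/sh
  # Only start a new run if no PHP process is already running
  if ps ax | grep -v grep | grep php > /dev/null
  then
      echo "Job is currently processing, exiting"
  else
      echo "Job is not running, starting now"
      php yourJobProcessingScript.php
  fi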

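Note: the echoes are almost completely pointless, but may help someone who comes along in future and tries to edit the script.

This concludes setting up the worker virtual machine, which is quick, simple, and easy to copy to each new piece of hardware received. The "clever" part of the grid system really isn't the virtualised operating system; it is all in creating the jobs, configuring the jobs, and ensuring the jobs run at the appropriate times (i.e. when the host machine is idle).

Setting up Windows to start the worker

The first task is to work out the command needed to run the virtual machine from the Windows command line. If you have installed VirtualBox in the default location and named your worker GridMachine, then the command needed to load your worker is: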

  "C:\Program Files\Sun\VirtualBox\VBoxManage.exe" startvm GridMachine

However, to run in a "headless" state we need to use the command:

  "C:\Program Files\Sun\VirtualBox\VBoxHeadless.exe" -startvm GridMachine -vrdp=off

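This starts the virtual machine without a GUI and allows it to save state gracefully. The second parameter turns off RDP so that it does not conflict with Windows RDP or give you a message about listening on port 3389. The virtual machine name is case sensitive.

Next we need to set up Windows to kick off our worker VM once the machine becomes idle. To do this (on Windows XP) go to Start -> All Programs -> Accessories -> System Tools -> Scheduled Tasks, as follows:

[Image: Scheduled Tasks]

Then click "Add Scheduled Task" and browse to add a custom program. Navigate to your VBoxManage script and click OK. Choose any option for the task schedule (we will change it in a minute) and continue. On the next screen Windows will ask who you want to run this task as; I would suggest either "Administrator" or creating a new privileged user. Remember that we do not want to interfere at any point with the account of the person who normally uses the machine. Click Next and then tick the box to show the advanced options for this task.

Add the string "startvm GridMachine" to the end of the Run text box and make sure "Run only if logged on" is left unticked. Go to the Schedule tab and change the schedule drop-down option to "When idle", choose how long you want the machine to be idle for, and then move on to the next tab.

Finally, untick the option that stops the task if it has been running for X amount of time, but tick the option to stop the task if the machine ceases to be idle.

[Image: Schedule]

That's it for the Windows host setup!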

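Summary

In this part we set up a virtual machine as a worker and covered how we invoke and execute our job processing script (a PHP script in my case). From there we looked at how to set up our copy of Windows to start the virtual machine in headless mode when the computer becomes idle and to save its state when the user resumes using the machine. Hopefully at this point you can see how simple it is to set up such a system and are eager to do some experimenting yourself!

Next time

In Part 4 we will look at using tools to ensure that you are running the latest version of the code and data sources, so that the results obtained are always up to date with the latest business information and logic.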

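Office Grid Computing using Virtual environments – Part 1

By Steven, Friday 4th December 2009 11:23 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I've been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn't it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I'm going to look at the potential benefits of employing an office grid using virtualised environments.

As a PHP developer I am going to use the tools I use every day: Linux, MySQL, PHP, VirtualBox, and Subversion (SVN). However, I hope this guide will adapt just as well to other languages and technologies.

The solution presented will be very loosely based on the type of processing we need to carry out, although this may not hold true throughout the articles as I will change things for simplicity or to produce more interesting usage scenarios.

These virtual environments will run on Windows machines, as that is what most offices run. The processing should not interfere with the staff using the office machines, should require no maintenance on the machines, and should be easy to deploy to new machines as they become available. In addition, new virtual machines should not require any additional configuration, as this would greatly reduce the scalability and ease of use of the grid system.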

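Why deploy office grid computing?

First of all, you may be thinking: why not just use cloud computing resources such as Amazon's EC2 platform? There could be several reasons, for example:

  • You will not entrust certain data to a cloud computing environment
  • You cannot put certain data, such as NHS records, into a cloud computing environment for legal reasons (e.g. the data leaving the country)
  • You want to keep your processing units close by and have complete control over the hardware
  • You do not have the project funds to run cloud instances
  • Your office is not connected to the Internet, so it is not possible to use cloud resources
  • You don't like rain; clouds suggest rain, therefore you stay well away

I'm sure the list could go on, but I think that will do for now.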

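Advantages of office grid computing

So, let's do some maths (and, in true physics style, let's make some sweeping assumptions). Imagine you have a big beefy processing server running 100 jobs per day. In your office there are 50 machines sitting idle for 16 hours a day, and each of these machines is 10% as powerful as your beefy processing server. (All results here are rounded so as to underestimate the performance gain.)

So, 1 machine * 10% of the power * 2/3 of the time idle = 0.067 of a server, i.e. 1 desktop machine processes 6 full jobs per day.

If you now scale this up to 15 idle desktop machines, they process as many jobs each day as your main server does.

So, in our pretend office of 50 machines we can increase our processing capacity from 1 server to the equivalent of 4 full processing servers, or we can process 400 jobs per day instead of 100.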
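The arithmetic above can be checked with a few lines of PHP. This is only a sketch under the same sweeping assumptions (100 server jobs per day, 50 desktops, 10% relative power, 16 idle hours); the function name estimateGridJobsPerDay is made up for this example, and each desktop's contribution is rounded down so the gain is underestimated.

```php
<?php
/**
 * Estimate grid throughput in whole jobs per day.
 *
 * @param int $serverJobsPerDay jobs the main server completes per day
 * @param int $machines         number of idle desktop machines
 * @param int $powerPercent     desktop power as a percentage of the server
 * @param int $idleHours        hours per day each desktop sits idle
 */
function estimateGridJobsPerDay(
    int $serverJobsPerDay,
    int $machines,
    int $powerPercent,
    int $idleHours
): array {
    // One desktop: server jobs * (power / 100) * (idle hours / 24), rounded down.
    $perDesktop = intdiv($serverJobsPerDay * $powerPercent * $idleHours, 100 * 24);

    // The main server keeps running, so the desktops add to its output.
    $total = $serverJobsPerDay + $machines * $perDesktop;

    return ['perDesktop' => $perDesktop, 'total' => $total];
}

$estimate = estimateGridJobsPerDay(100, 50, 10, 16);
printf(
    "%d jobs per desktop, %d jobs per day in total\n",
    $estimate['perDesktop'],
    $estimate['total']
);
// prints "6 jobs per desktop, 400 jobs per day in total"
```

Plugging in your own benchmark figures for the four parameters gives a quick first estimate before any machines are touched.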

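Note that your company has just increased its batch processing capacity four-fold with no new hardware investment! Potentially you are going to increase power usage, but in most offices I have been in the machines are generally left on overnight anyway, so you could even see this as a green initiative.

Another advantage is that as you upgrade your office equipment, your office grid automatically becomes more powerful, which means that if your office machines are good enough, investment in new (or upgraded) processing servers can be delayed.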

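Technologies

What do you need? (or more correctly, what did I use):

  • An idle office machine (a spare old Windows XP laptop in my case)
  • VirtualBox (or other virtualisation client software)
  • A virtual machine running a cut-down operating system with PHP and MySQL; I'm calling these LiMP servers :)
  • Jobs to run
  • A job server (this could be another virtual machine somewhere)

Typical jobs

The types of job this system is designed to run are as follows:

  • The system receives data for which we need to match and return a list of results
  • Matching involves checking/searching several fairly static data sources
  • Results from the data sources may need further validation, merging, and checking against other data sources
  • Matched record data is returned, fully validated and processed
  • Each record in a job is independent of the rest

So basically we are looking at running jobs that need a mixture of database queries and some number crunching, a fairly typical scenario in a corporate environment.

Grid solutions are not only useful for handling this type of job. Basically, any process that can be split into independent units can be run in parallel. For examples and more information see this Wikipedia entry: Grid computing, but a couple of famous examples are SETI@home and BOINC. There are frameworks for running computing grids and these are worth investigating.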

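What will we achieve?

By the end of these articles I hope to have shown that deploying an office grid need not be very expensive or time consuming. I am going to discuss:

  • Setting up the job control system and job configuration
  • Creating an appropriate processing virtual machine
  • How to set the system up on Windows machines
  • Ensuring that you are using the latest code and data
  • Deployment and benchmarking
  • Looking to the future

I will build (OK, I built it and then wrote this) an example application to test the concept on a local machine using Windows XP and my "GridMachine" virtual machine. My job control server will be my main machine running Fedora 11.

This is by no means meant to show a robust, fully working system; it is meant more as a demonstration and discussion to show that these things can be achieved in a fairly short space of time and at little cost. Please feel free to send me any comments, corrections, or improvements and I'll do my best to keep this article updated to match.

Next time

In Part 2 I will start looking at the job control system and investigate how jobs should be configured to achieve the maximum amount of processing while ensuring that every job is processed without fail.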

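Office Grid Computing using Virtual environments – Part 2

By Steven, Friday 4th December 2009 11:23 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I've been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn't it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I'm going to look at the potential benefits of employing an office grid using virtualised environments.

In Part 1 I gave an overview of the technologies I will be using and discussed some potential reasons why you would want to create an office grid system.

Job Control

If you are going to run jobs then you will need some way to manage them. Your job control system (on your job server) needs to be really well thought out before you even attempt to run an office grid. So, first of all, what are the tasks of a job control system?

  • Handing out jobs on request from workers
  • Telling workers what type of job to run
  • Keeping track of jobs
  • Ensuring jobs are only run once
  • Providing workers with job data, or at least telling them where to get it

The system also needs to be extensible: a solution that works now for a single scenario may later be extended to run several types of job as the business sees the value in a grid solution. For example, jobs may gain priorities, multiple job types may exist (i.e. several code bases), and eventually you may even run several different workers, each optimised for a job type (although this moves away from the "generic worker" idea). Always try to think ahead when developing a system; a short-term outlook can lead to longer-term frustration and more development time.

Job Server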

We're going to need somewhere to control our jobs from. This should be the only system in your grid that has a fixed resource locator, be that an IP address, host name, URL (using internal DNS), etc. This is because the workers need to know where to look for jobs: the workers find the job control system, not the other way round.

The job server itself doesn't really have a complicated task (in a basic system anyhow), it needs to store a list of jobs, hand out jobs, receive results, and subsequently store them for later retrieval. How these parts (such as 'hand out jobs') are defined can be very basic. Later on we can extend the system to include an administration interface to add, edit, delete, suspend jobs but this is beyond this exercise.

There is no reason whatsoever, then, that your job server could not be a virtual machine running within your main processing server, provided it doesn't drain too many resources from it. The job server does, however, need high availability: if it goes down on a Friday evening you're going to lose a whole weekend of processing, potentially costing you a couple of weeks' worth of processing time (when compared to your main processing server alone). You may want to consider putting your job server in a load-balanced environment for high availability.

Basic Setup

The basic setup for our job server will consist of what I'm calling one of my LiMP servers (that is Linux, MySQL, PHP). The code running on the workers will actually work out what jobs it can run by interacting with the job control system's database. Later on we could create a web service and actually hand out jobs rather than having the workers do the hard work themselves, but for now we'll continue using the KISS principle (Keep it Simple, Stupid!).

So, let's create three MySQL tables to deal with jobs. These will be `jobs`, `job_records`, and `job_results`.

[Image: jobs table] Here I'm using SQL Buddy, a great little alternative to phpMyAdmin, just because it's easier to install on CentOS (for others see: 10 Great alternatives to phpMyAdmin).

This table consists of 5 simple fields:

  • id: Uniquely identify the job
  • name: Could be a client reference, or any number of other identifiers
  • status: You need to know where the job is at, e.g.
    • 0: Not started
    • 1: Picked up
    • 2: Completed
  • started_by: Who's started doing the job? This isn't entirely required but is a nice to have. I'd suggest tracking workers by their IP address on your network
  • started_at: When did the worker start the job? By tracking jobs that have not completed within X amount of time we know we need to pick up the job once again and start processing by another worker. Workers could stop processing/go offline for any number of reasons, power failure, crash, network loss, etc.

It is easy to see how this table could be extended with a few additional fields to allow for statistics tracking: a finish time column to see how long the job took, a counter to see how many workers picked up the job (obviously this needs to tend to 1), job priority, and so on. In more complex job scenarios it would be possible to specify how much memory the worker would need access to (and therefore only use suitable workers), or even what type of worker would be required.

Let's add a few example jobs:

[Image: example jobs]

The next table is again quite simple to understand: these are our job records. They are linked to the main jobs table by a column `jobs_id`. The make-up of this table very much depends on the data that you need to supply to your workers. Let's make a very simple example where we have four columns:

  • id: ID of the record
  • name: Person's name
  • address: Person's address
  • jobs_id: The job ID that this record is linked to

The third and final table consists of a results table, it has much the same make up as our records table, and with the addition of some columns could be part of the records table:

  • job_record_id: Links the result to the job record
  • result: The result data

…and that's all you need for job control (albeit at a very basic level)! In my case each job pointed to another table where my data to process was located, but this could just as easily have been a file, parameters to run simulation code, you name it.

Selecting a job

As stated previously, the workers will do our job management for us for now, so all we really need to do is find a job that needs processing and get its information. How would we do this? Well, we pick our job selection criteria and look for jobs. In SQL I did the following:

  1. Take any jobs that were picked up by our worker but are not marked as complete and reset them (substitute __ME__ with an identifier, the easiest being the IP address):
     UPDATE `jobs` SET `status` = 0 WHERE `status` = 1 AND `started_by` = __ME__;
  2. Using our job selection criteria, select a single job and tell the control system that this worker is dealing with it:
     UPDATE `jobs` SET `status` = 1, `started_by` = __ME__, `started_at` = NOW() WHERE `status` = 0 OR
     (`status` = 1 AND `started_at` < DATE_SUB(NOW(), INTERVAL X HOUR)) ORDER BY `id` ASC LIMIT 1;

     By grabbing jobs that haven't returned results in X amount of time we ensure that all jobs are run in the event of a worker crashing or going AWOL.

  3. Next grab the job's details followed by the records themselves:
     SELECT * FROM `jobs` WHERE `status` = 1 AND `started_by` = __ME__ LIMIT 1;
     SELECT * FROM `job_records` WHERE `jobs_id` = __JOBID__;
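To make the selection rule easier to reason about, here is a minimal sketch of the same criteria in plain PHP, working on an in-memory array instead of the database. The function names isJobAvailable and selectJob, the Unix-timestamp `started_at` values, and the timeout parameter are assumptions made for this illustration; the status values are the ones listed for the jobs table.

```php
<?php
const STATUS_NOT_STARTED = 0;
const STATUS_PICKED_UP   = 1;
const STATUS_COMPLETED   = 2;

/**
 * A job can be taken if it has not been started, or if it was picked up
 * longer ago than the timeout and has still not been completed.
 */
function isJobAvailable(array $job, int $now, int $timeoutSeconds): bool
{
    if ($job['status'] === STATUS_NOT_STARTED) {
        return true;
    }
    if ($job['status'] === STATUS_PICKED_UP) {
        return ($now - $job['started_at']) > $timeoutSeconds;
    }
    return false; // completed jobs are never handed out again
}

/**
 * Return the available job with the lowest id, or null if there is none.
 */
function selectJob(array $jobs, int $now, int $timeoutSeconds): ?array
{
    usort($jobs, function (array $a, array $b): int {
        return $a['id'] <=> $b['id'];
    });
    foreach ($jobs as $job) {
        if (isJobAvailable($job, $now, $timeoutSeconds)) {
            return $job;
        }
    }
    return null;
}

$now  = 1000000;
$hour = 3600;
$jobs = [
    ['id' => 4, 'status' => STATUS_NOT_STARTED, 'started_at' => 0],
    ['id' => 1, 'status' => STATUS_COMPLETED,   'started_at' => $now - 9 * $hour],
    ['id' => 2, 'status' => STATUS_PICKED_UP,   'started_at' => $now - 1 * $hour],
    ['id' => 3, 'status' => STATUS_PICKED_UP,   'started_at' => $now - 5 * $hour],
];

// With a 4 hour timeout, job 3 has gone AWOL and is handed out before job 4.
$selected = selectJob($jobs, $now, 4 * $hour);
echo 'Selected job ', $selected['id'], "\n";
```

Keeping the rule in one small function like this also makes it easier to swap the SQL for a web service or file based system later, since the criteria no longer live only inside a query string.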

Upon completion of the job we insert our result records and mark the job as complete. Remember that jobs can suspend/resume at any time, so allow for some robustness in your script. It might be that the task suspends halfway through updating the job control system, so comparing the number of records in a job with the number of results saved back to the job control system would be a wise move.

In addition, whilst this demonstrates how jobs can be selected and managed from an SQL-query frame you should really be abstracting your job control so that if you decide to switch to using a web service, a file based system, XML , or any other number of systems it will not affect the code above it.

Job Configuration

The next aspect to consider is job size and configuration. By playing with job configuration we can strike an excellent balance between speed, process replication, and reliability. Take a couple of scenarios:

  1. Jobs take 1 day each to run: This means that your workers need 15 days to process each job (remember 10% of the power for 2/3rds of the time). This is clearly not a wise configuration: your job size is way too big! It would take at least double the time to get a job processed should the initial worker go AWOL (time to notice that it hasn't returned a result plus reprocessing time). In an ideal world you'd have at least one full job easily cleared by the end of each long idle period. That way you keep the jobs ticking over, and in the worst case a job would take two days to process should the first worker go missing.
  2. Jobs take 1 minute to run: This means that your workers take about 15 minutes to run each job. Whilst this may initially seem ideal (you gain additional processing during lunch time, coffee breaks, meetings, etc.), this scenario puts strain on other areas of your system and introduces its own problems. Firstly, your setup time becomes a much larger share of each job, so you lose system efficiency. Your network is going to be constantly streaming job information to the various workers, frustrating staff who are doing their day-to-day work. You're also going to put more strain on your job processing server as it has to dish out lots and lots of small pieces of work on a regular basis. Lastly, in this situation, if your job server goes down you're going to create a huge backlog of uncompleted work, whereas bigger jobs could have continued processing, blissfully unaware that the job server was experiencing difficulties.

In reality there will be no one ideal configuration for your grid setup, much depends on the available resources, types of job, job turnaround time requirements, network capability, and so on. However some guidelines would be:

  • Size jobs so that each worker can get through at least 3-4 jobs in a period of 15 hours (the longest likely idle time period)
  • Play with the job size so that setup time becomes fairly insignificant compared to the processing time (bearing in mind the above point).
  • If a job doesn't complete in double the amount of time (maybe less) you expect it to take, assume that it has gone AWOL and start processing it with another worker. This means you may have to wait up to three times the normal length of a job for it to complete (possibly longer if the subsequent job fails). You may want to reduce this time, but be careful not to reduce it too much as you may start duplicating processing tasks on a regular basis.
  • Jobs should be independent of outside requirements as much as possible. The job server, for example, should only be contacted at the start and end of every job.
  • Don't saturate your network, this will have two negative effects, your daytime staff will find using the network frustrating and problems may be experienced with connections timing out a problem that will only get worse as you scale your grid.
  • Ensure jobs can run on your workers. If jobs become too memory intensive or disk space intensive jobs will start aborting and the only thing you'll notice is a drop in number of jobs processed with no real reason why.

Submitting Results of a Job

When submitting the results of a job it is important to check that results have not already been submitted by another worker, especially if the current worker has been dormant for some time.

When results are submitted, ensure that the number of results matches the number of records within the job.

As stated previously, and it cannot be overemphasised, build fault tolerance into job retrieval and results submission. The workers can (and most likely will) go into suspend mode at the most inconvenient of times and this needs to be catered for. Once again, abstracting away your results submission will make future changes to your job control system much easier to deal with.
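The two checks described above can be sketched as a single guard that a worker runs just before writing its results back. This is an illustrative example only: the function name canSubmitResults and the array shapes are assumptions, with `status` and `started_by` mirroring the columns of the jobs table.

```php
<?php
/**
 * Decide whether a worker may submit its results for a job.
 *
 * @param array  $job         current row from the jobs table, re-read just before submitting
 * @param string $workerId    identifier of this worker (e.g. its IP address)
 * @param int    $recordCount number of records that belong to the job
 * @param array  $results     results this worker has produced
 */
function canSubmitResults(array $job, string $workerId, int $recordCount, array $results): bool
{
    // Status 1 means "picked up". Anything else means the job was reset or has
    // already been completed, possibly by another worker while we were suspended.
    if ($job['status'] !== 1 || $job['started_by'] !== $workerId) {
        return false;
    }

    // Every record in the job must have produced exactly one result.
    return count($results) === $recordCount;
}

$job = ['id' => 7, 'status' => 1, 'started_by' => '192.168.0.12'];

var_dump(canSubmitResults($job, '192.168.0.12', 3, ['a', 'b', 'c'])); // bool(true)
var_dump(canSubmitResults($job, '192.168.0.12', 3, ['a', 'b']));      // bool(false)
```

Re-reading the job row immediately before calling the guard matters, because a worker that has just resumed from suspend may be holding a stale copy of it.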

Summary

In this section we have looked at what a job control server needs to do and how to get a very basic system set up. We discussed how to retrieve a job from the control system and how best to configure jobs to get the most out of your office grid system. To finish, a paragraph or two on submitting results back to the job control server was presented.

  • A job control server manages jobs and ensures that all work units are completed
  • By abstracting your job selection and results submission you can change the technology of the control server without much trouble
  • Configure your jobs to ensure that they are run quickly and efficiently without putting too much pressure on your network infrastructure, and without duplicating processing tasks on a regular basis.
  • Ensure that you build fault tolerance and error checking into your routines; workers can suspend and resume at the most inconvenient of times. Remember to check whether results have already been submitted by another worker.

Next time

In Part 3 we'll create our virtual processing machine and set up our Windows machines to become idle-time workers.

Office Grid Computing using Virtual environments – Part 5

By , Friday 4th December 2009 11:03 pm

Introduction

I work in a company where we run many batch jobs processing millions of records of data each day and I've been thinking recently about all the machines that sit around each and every day doing nothing for several hours. Wouldn't it be good if we could use those machines to bolster the processing power of our systems? In this set of articles I'm going to look at the potential benefits of employing an office grid using virtualised environments.

In Part 4 we looked at using tools to ensure that we're running the latest version of the code and data sources so that obtained results are always up-to-date with the latest business information and logic.

Pre-Deployment

Before deploying your grid system, if there's one thing you do, and one thing alone, it's benchmark your current system! No matter what you tell colleagues about how much extra work your system is going to do, unless you have numbers to back this up your guarantees mean nothing. So:

  • How many records can you process currently? Per day? Per hour?
  • How long does it typically take to turn around a job?
  • How much more capacity do you have?

There are also additional questions:

  • If your processing server (or one of your processing servers) goes down how will this affect your capabilities, will you be crippled?
  • What advantages do you hope/expect to get from a grid system?
  • Are your office machines capable of running the jobs?
  • Are your jobs able (or can your jobs be converted) to work in this style of running?

The last major point is to take your time on any major change like this. Update your processing code to work using the new methodology, benchmark again. Possibly set up your processing server to run a virtual machine, after all your processing server will just be another worker (just a very powerful one relatively). Allow the new process to settle.

Deployment

My suggestion would be to pop into the office one weekend and perform all the installations and setup. Do this just before a fortnight's holiday and leave some other poor chap to deal with the consequences… maybe not…

Deployment for a system like this needs to be slow. Despite being relatively simple to set up, this system will affect your entire office infrastructure (well, the digital one). Firstly, roll out to a couple of machines at a time, monitoring network traffic and how the worker hosts perform on a day-to-day basis. You may need to alter your job configuration in response to your findings.

Once the system has settled with a few machines (let's say 10% of all office machines, i.e. 5) keep monitoring network traffic and host machine performance. Next benchmark again; you should now be processing 33% more jobs than in your first benchmarks. Check this is so, or that you're at least in this ballpark. If not, investigate what is going on before moving on. Repeat this cycle until you happily have all office machines running without killing individual machine performance or grinding your network to a standstill.

At all times keep benchmarking, even after all deployments are made. Check how new code updates affect speed of your system, check all workers are reporting in and processing jobs. Slowly (very slowly) increment your job configuration to get the best from your workers and network.

Stop!

What if you want to stop your workers from running at some time? They are all out there running, regenerating, and trying their best to process data like hungry insects. The answer may seem obvious but it's worth adding just in case it's overlooked. Simply edit your processing script to add an exit(0) or die() or some other statement that kills your processing job. This is an important reason why we always try to update to the latest processing script before any run!

Demonstration System

In order to write this set of short articles I created a very small grid to demonstrate the technologies and methodologies. I read lots of articles and tutorials, and used various tools to set up and monitor what was going on. By no means have I gone out and saturated a whole office with traffic, nor have I had access to a regular staff member's PC to see how host performance was affected.

My demonstration system was very humble indeed. I used my regular desktop set up as a job control server. On this I had installed MySQL server (set up as a replication master), PHP, and SVN linked through Apache (for access via the worker VM).

I then created a CentOS worker machine on VirtualBox on a 6 year old Windows XP laptop. I set up scheduled tasks as specified after copying the VM onto the machine and let it go.

The virtual machine was set up with PHP, Subversion, and MySQL. I checked out a branch named 'worker' from my job control server's repository and made sure it could be updated using 'svn update'. Next I set up MySQL as a slave and checked that data was replicating from MySQL on the job control server down to the worker VM. After all this I set up the bash script and the cron job.

My processing script basically went along the lines of this (very simple stuff):

  • Read in the name field
  • Counted the number of similar names in a table from the data source held on the VM
  • Counted the number of names as above but splitting the name by spaces (ie forename, middle, surname)
  • Repeated this process 1,000 times
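A stripped-down sketch of such a processing step might look like the following. It is not the original script: the function names are invented, the data source is a plain array rather than the table held on the VM, "similar" is assumed to mean a case-insensitive substring match, and the 1,000 repetitions used to pad out the run time are left out.

```php
<?php
/**
 * Count the names in the data source that contain $name (case-insensitive).
 */
function countSimilarNames(string $name, array $sourceNames): int
{
    $count = 0;
    foreach ($sourceNames as $candidate) {
        if (stripos($candidate, $name) !== false) {
            $count++;
        }
    }
    return $count;
}

/**
 * Process one job record: count matches for the full name, then for each
 * space-separated part of it (forename, middle name, surname).
 */
function processRecord(array $record, array $sourceNames): array
{
    $parts = [];
    foreach (preg_split('/\s+/', trim($record['name'])) as $part) {
        $parts[$part] = countSimilarNames($part, $sourceNames);
    }

    return [
        'full'  => countSimilarNames($record['name'], $sourceNames),
        'parts' => $parts,
    ];
}

$sourceNames = ['John Smith', 'john smith', 'Jane Smith', 'John Paul Jones'];
$result = processRecord(['id' => 1, 'name' => 'John Smith'], $sourceNames);

print_r($result);
```

For the sample data this reports 2 matches for the full name and 3 each for "John" and "Smith"; in a real worker the array would be replaced by queries against the locally replicated data source.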

Each job took approximately 20 minutes to run. At one point I opened several copies of the worker VM on the windows laptop and watched the jobs be checked off by each of the worker IP addresses. At this point I also confirmed that replication automatically restarted.

Leaving the laptop to idle resulted in a worker starting to process jobs from the job control server. When resuming laptop usage there was a delay of about 30-60 seconds. This is a fair amount of time, and staff would need to be made aware that their machine may pause for a short while when they return to it. Newer machines may not pause for this long. The amount of processing performed by these machines during idle periods would more than outweigh staff members having to wait a short period (say 1 minute) on arriving at their machines of a morning (I frequently wait longer than this for a Windows Defender update to take place), provided they were made aware of this (a useful time to grab a morning coffee!).

Overall I feel confident that I have demonstrated the technologies that could be used to create such a system. I have shown that such a system does work on a (very) small scale and with some more experimenting could be scaled up to utilise the resources of an office's machines. If I don't get to the point of doing this I would be very interested to know/see when someone else does.

Conclusions / Evaluation

The next obvious step would be to actually get a real world example and start to deploy a system such as this within an office environment and see what happens. Asking a business to commit to this without a trailblazing company to prove the technology and its effectiveness may be a little difficult. Grid/distributed computing is very popular in some circles and has some large applications (BOINC, SETI@Home, Folding@Home, etc). I did not, however, find a smaller scale and simple system like this in my searches that could be rolled out within an office environment.

I created a basically free system using mostly open source software and tools available in almost any office. The technologies were demonstrated at a basic level and shown to perform and work as expected. Hopefully I have shown that with not much work and with a very simple setup you can deploy an office grid computing system that is powerful, cheap, and scalable all at the same time.

Once a system is up and running there is almost no end to the amount of customisation and improvements you can make. For example statistics / benchmarking can easily be added showing the worth of such a system every day. New machines can be added quickly and easily as and when they arrive with upgrades to existing hardware bolstering your processing power.

I hope you've enjoyed reading this series of articles and that it has given you food for thought on running an office grid system. The solution presented here won't necessarily work in all situations but should be adaptable to allow you to get your data processing done using your own solution.

Please feel free to send me any comments, corrections, or improvements and I'll do my best to keep this article updated to match.

Zend Framework: Fundamentals – Review

By , Saturday 28th November 2009 10:42 pm

My employer recently paid for a group of us developers to take the Zend Framework: Fundamentals course, here I'll summarise my thoughts and opinions on the course for others. For those looking to save time, here's my summary:

For developers who haven't had time to look at the Zend Framework this course (Zend Framework: Fundamentals) offers a good overall picture of the framework introducing you to the key areas and giving enough information in order to continue. For those who have spent time looking at the framework and have followed one or two tutorials this course does not offer much beyond.

Background

I've been a PHP developer for around 5-6 years, and have started working with the Zend Framework on a component basis over the last 6 months. I've developed and/or been a developer on a couple of small Zend Framework MVC sites. I'll be honest, I haven't had a huge amount of exposure to other frameworks from a coding point of view, but I have spent several hours researching the project websites and evaluating them. The framework and the community surrounding it are quite exciting, and there seem to be huge possibilities in where it's going.

About the Course

The course is delivered over nine two-hour WebEx sessions (with a 10-minute break in the middle). The time is spent going through a set of slides provided by Zend, with discussion at any time. You can use a microphone to talk to the instructor, but to be honest I didn't see anyone use anything more than the chat window. In addition, a VMware Ubuntu machine is provided that has example code and projects set up and a trial version of Zend Studio. Audio from the course leader comes over an integrated VoIP solution, or you can dial in using one of the many worldwide dial-in numbers.

The course material consists of a brief overview of the framework and the MVC pattern before heading into a sample guestbook application. The discussion demonstrated bootstrapping, Zend_Application, Db Tables, database access, forms, filtering, ACL, validation, etc. It basically covers all the topics you'd need to get a basic site up and running, all the while giving you the tools to go and get more advanced in the framework (although this did amount to 'See the website' much of the time).

Time is given to code up some examples, and to develop the 'guestbook' and simple 'wiki' applications. Personally I felt that providing the code for each app and then asking us to develop what was essentially a copy alongside didn't really provide a good learning experience. I would have preferred to develop an application similar, but not identical, to the example application, with the benefit of having a guide to refer to. Alternatively, building the applications from scratch with the demonstrator would possibly have led to more questions about why and how, thus giving a better understanding of the framework. After all, you can look up specifics after the course.

The last lecture consisted of working on the wiki application with help and guidance from the instructor. After the course, feedback was taken. It was emphasised several times through the course that Zend takes feedback very seriously; in fact, apparently our version of the course was quite new. Some of the other developers in the company will be taking the course soon, so it will be interesting to see whether our feedback has been acted upon.

The course style was informal and allowed for feedback and collaboration between attendees and the instructor. The course leader was friendly and approachable (email addresses were shared for questions), and whilst his presentation from the slides was a bit shaky, he seemed fully competent in the framework. He was clearly someone who used the framework on a regular basis rather than someone who had been taught to teach the course. I liked the 'real world' experience in that respect.

Overall Feeling

In some ways I found the course a waste of time; in others it was very handy. Hopefully I'll get my reasons across clearly, and maybe provide some food for thought or useful feedback (knowing me this is unlikely!).

For me this course was aimed at too low a level. Having gone through the quickstart guide, read Rob Allen's Zend Framework in Action, and worked with the framework a little, I didn't really gain much. I would have liked the course to pick up from the end of the quickstart and develop additional skills.

That said, the course title does clearly state “Zend Framework: Fundamentals” and in that respect the course achieves what it sets out to do. Other members of the development team who hadn't spent time looking into the framework finished each session with enthusiasm and asked questions, which was really nice to see.

All was not lost. It was good to spend time confirming the basic details of the framework and to ask a couple of questions in areas where I wasn't 100% sure. It was also time that I got to sit down each day and think about coding using the framework and future projects, something I wouldn't have been able to do otherwise (can you imagine your company agreeing to that? :) ). Last but not least, you also get a nice certificate from Zend to say that you attended the course (albeit by email).

Zend Framework Certification

This was one question that kept coming to mind during the course: would it prepare me for the certification? The quick, easy answer is a resounding no. The course instructor was quite clear on that, with the additional advice that for the certification you should really be using the framework on a day-to-day basis and feel very comfortable and confident in its usage and methodologies.

Summary

Given everything I've written above, I'll summarise everything in two easy bullet points:

  • New to Zend Framework: This course does exactly what you'd expect. It gives you a nice introduction to the framework and a good grounding in the basics from which you can build. The course seems to generate interest and enthusiasm for the framework amongst developers.
  • Used the Zend Framework: While it was nice to shore up some of the very basics, I felt the time, effort, and funds to take the course could have been better spent elsewhere. It would be nice to see Zend create a new higher-level course to take developers to the next level, at least to the standard of certification and beyond. For that I would sign up immediately.











