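Introduction
I work at a company where many of our batch jobs process millions of records every day, and lately I have been thinking about all the machines that sit around doing nothing for several hours each day. Wouldn't it be good if we could use these machines to bolster the processing power of our systems? In this set of articles I am going to look at the potential benefits of running an office grid using virtual environments.
In Part 4 we looked at using tools to make sure we are running the latest version of our code and data sources, so that the results are always up to date with the latest business information and logic.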
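Pre-deployment
If there is one thing, and one thing alone, that you do before deploying a grid system, it is to benchmark your current system! No matter how much extra work you tell colleagues your system is doing, it means nothing unless you have the numbers to back up your claims. So:
- How many records can you currently process? Per day? Per hour?
- How long does it usually take to turn a job around?
- How much spare capacity do you have?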
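To make those numbers concrete, here is a minimal sketch of how the first two questions could be answered from a job log. It is written in Python purely for illustration (my own scripts were PHP), and the `benchmark` function, the tuple layout, and the sample data are all hypothetical; adapt them to whatever your job control server actually records.

```python
from datetime import datetime


def benchmark(jobs):
    """Summarise a list of (started, finished, records) tuples.

    Returns records per hour, an extrapolated records-per-day figure
    (assumes the same rate for 24 hours), and mean turnaround in minutes.
    """
    if not jobs:
        raise ValueError("need at least one finished job to benchmark")
    total_records = sum(records for _, _, records in jobs)
    first_start = min(started for started, _, _ in jobs)
    last_finish = max(finished for _, finished, _ in jobs)
    elapsed_hours = (last_finish - first_start).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("jobs must span a positive amount of time")
    turnaround = [
        (finished - started).total_seconds() / 60 for started, finished, _ in jobs
    ]
    per_hour = total_records / elapsed_hours
    return {
        "records_per_hour": per_hour,
        "records_per_day": per_hour * 24,
        "mean_turnaround_minutes": sum(turnaround) / len(turnaround),
    }


# Hypothetical sample data: three jobs run back to back over one hour.
sample_jobs = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 20), 50_000),
    (datetime(2024, 1, 8, 9, 20), datetime(2024, 1, 8, 9, 50), 70_000),
    (datetime(2024, 1, 8, 9, 50), datetime(2024, 1, 8, 10, 0), 30_000),
]
print(benchmark(sample_jobs))
```

Keep the output of a run like this somewhere safe; it is the baseline that every later benchmark gets compared against.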
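There are other questions too:
- If your processing server (or servers) fails, how will that affect your capacity? Will you be crippled?
- What advantages do you want or hope to gain from a grid system?
- Are your office machines capable of running the jobs?
- Do your jobs work when run in this way (or can they be converted so that they do)?
A final key point is to take your time, as with any major change like this. Update your processing code to work with the new method and benchmark again. Consider setting up the processing server to run a virtual machine too; after all, the processing server will then be just another worker (only a very powerful one by comparison). Allow the new process to settle.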
Deployment
My advice is to pop into the office one weekend to perform all the installation and setup. Do this just before a two-week holiday and leave some other poor chap to deal with the consequences... maybe not...
Deploying a system like this needs to be slow. Despite being relatively simple to set up, this system will affect your entire office infrastructure (well, the digital one). Firstly, roll out to a couple of machines at a time and monitor network traffic and how the worker hosts perform on a day-to-day basis. You may need to alter your job configuration in response to your findings.
Once the system has settled with a few machines (let's say 10% of all office machines, i.e. 5), keep monitoring network traffic and host machine performance. Next, benchmark again: you should now be processing around 33% more jobs than in your first benchmarks. Check that this is so, or that you are at least in the right ballpark. If not, investigate what is going on before moving on. Repeat this cycle until you have all office machines running happily, without killing individual machine performance or grinding your network to a standstill.
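The "are we in the ballpark?" check can be sketched in a few lines. This is only an illustration in Python: the `rollout_check` function is hypothetical, the 33% expected gain is simply the figure from the example above, and the 10-point tolerance is an assumption of mine that you should tune to your own setup.

```python
def rollout_check(baseline_per_hour, current_per_hour,
                  expected_gain=0.33, tolerance=0.10):
    """Compare current throughput with the baseline benchmark.

    Returns (actual_gain, in_ballpark). `expected_gain` and `tolerance`
    are fractions, e.g. 0.33 means 33% more than the baseline.
    """
    if baseline_per_hour <= 0:
        raise ValueError("baseline throughput must be positive")
    actual_gain = (current_per_hour - baseline_per_hour) / baseline_per_hour
    in_ballpark = actual_gain >= expected_gain - tolerance
    return actual_gain, in_ballpark


# Hypothetical figures: 150,000 records/hour before, 195,000 after the rollout.
gain, ok = rollout_check(150_000, 195_000)
print(f"gain: {gain:.0%}, in ballpark: {ok}")
```

If the check fails, stop rolling out and look at network traffic and host performance before adding any more machines.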
Keep benchmarking at all times, even after all deployments are made. Check how new code updates affect the speed of your system, and check that all workers are reporting in and processing jobs. Slowly (very slowly) adjust your job configuration to get the best from your workers and network.
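Checking that workers are reporting in can be as simple as comparing each worker's last check-in time against a threshold. The sketch below is Python for illustration only; the `stale_workers` function, the IP addresses, and the 24-hour threshold are all assumptions (workers only run while their host is idle, so choose a threshold that suits your office's habits).

```python
from datetime import datetime, timedelta


def stale_workers(last_seen, now, max_silence=timedelta(hours=24)):
    """Return the workers that have not reported in within `max_silence`."""
    return sorted(
        worker for worker, seen in last_seen.items() if now - seen > max_silence
    )


# Hypothetical check-in times keyed by worker IP address.
last_seen = {
    "192.168.0.11": datetime(2024, 1, 8, 7, 30),
    "192.168.0.12": datetime(2024, 1, 6, 18, 0),
    "192.168.0.13": datetime(2024, 1, 8, 2, 15),
}
now = datetime(2024, 1, 8, 9, 0)
print(stale_workers(last_seen, now))
```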
Stop!
What if you want to stop your workers from running at some point? They are all out there running, regenerating, and trying their best to process data like hungry insects. The answer may seem obvious, but it's worth adding in case it is overlooked. Simply edit your processing script to include an exit(0), a die(), or some other statement that kills the processing job. This is an important reason why we always update to the latest processing script before any run!
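A slightly tidier variation on the same idea is a single switch at the top of the script, so that stopping the grid is a one-line change that every worker picks up on its next update. The sketch below is Python rather than PHP, and `WORKERS_ENABLED`, `process_jobs`, and `run_worker` are hypothetical names; a real script would pass the result on to its exit call.

```python
# Flip this to False and commit; workers pick it up on their next update.
WORKERS_ENABLED = True


def process_jobs():
    """Placeholder for the real processing; returns the number of jobs handled."""
    return 1


def run_worker(enabled=WORKERS_ENABLED):
    """Run one processing pass, or do nothing at all if the switch is off."""
    if not enabled:
        print("Processing disabled; leaving without taking a job.")
        return 0
    handled = process_jobs()
    print(f"Processed {handled} job(s).")
    return handled


run_worker()
```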
Demonstration System
In order to write this set of short articles I created a very small grid to demonstrate the technologies and methodologies. I read lots of articles and tutorials, and used various tools to set up and monitor what was going on. By no means have I gone out and saturated a whole office with traffic, nor have I had access to a regular staff member's PC to see how host performance was affected.
My demonstration system was very humble indeed. I used my regular desktop as the job control server. On this I installed MySQL server (set up as a master in replication), PHP, and SVN served through Apache (for access from the worker VM).
I then created a CentOS worker machine in VirtualBox on a 6-year-old Windows XP laptop. After copying the VM onto the machine, I set up the scheduled tasks as specified and let it go.
The virtual machine was set up with PHP, Subversion, and MySQL. I checked out a branch named 'worker' from my job control server's repository and made sure it could be updated using 'svn update'. Next I set up MySQL as a slave and checked that data was replicating from MySQL on the job control server down to the worker VM. After all this I set up the bash script and the cron job.
My processing script went along these lines (very simple stuff):
- Read in the name field
- Count the number of similar names in a table from the data source held on the VM
- Count the names as above, but after splitting the name by spaces (i.e. forename, middle name, surname)
- Repeat this process 1,000 times
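The steps above can be sketched roughly as follows. This is not my original PHP script: it is a Python illustration that uses an in-memory SQLite table as a stand-in for the replicated MySQL data source, and the `people` table, the sample names, and the choice of a substring match for "similar" are all assumptions.

```python
import sqlite3


def count_similar(conn, term):
    """Count rows whose name contains `term` (a simple substring match)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM people WHERE name LIKE '%' || ? || '%'", (term,)
    ).fetchone()
    return row[0]


def process_job(conn, name, repeats=1000):
    """Count matches for the full name and for each space-separated part.

    The work is repeated `repeats` times purely to generate load,
    as in the demonstration; the last result is returned.
    """
    result = {}
    for _ in range(repeats):
        result = {
            "full": count_similar(conn, name),
            "parts": {part: count_similar(conn, part) for part in name.split()},
        }
    return result


# Hypothetical data source standing in for the table replicated to the VM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("John Smith",), ("John Paul Smith",), ("Jane Smith",), ("Paul Jones",)],
)
print(process_job(conn, "John Smith", repeats=10))
```

The repeat count is what turns a trivial query into a job long enough to be worth distributing.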
Each job took approximately 20 minutes to run. At one point I opened several copies of the worker VM on the Windows laptop and watched the jobs being checked off against each of the worker IP addresses. At this point I also confirmed that replication restarted automatically.
Leaving the laptop to idle resulted in a worker starting to process jobs from the job control server. When laptop usage resumed there was a delay of about 30-60 seconds. This is a fair amount of time, and staff would need to be made aware that their machine may pause for a short while when they return to it. Newer machines may not pause for this long. The amount of processing performed by these machines during idle periods would more than outweigh staff members having to wait a short period (say 1 minute) on arriving at their machines of a morning, provided they were made aware of it. I frequently wait longer than this for a Windows Defender update to take place, and it is a useful time to grab a morning coffee!
Overall I feel confident that I have demonstrated the technologies that could be used to create such a system. I have shown that such a system does work on a (very) small scale and, with some more experimenting, could be scaled up to utilise the resources of an office's machines. If I don't get to the point of doing this myself, I would be very interested to know when someone else does.
Conclusions / Evaluation
The next obvious step would be to take a real-world example, deploy a system such as this within an office environment, and see what happens. Asking a business to commit to this without a trailblazing company to prove the technology and its effectiveness may be a little difficult. Grid/distributed computing is very popular in some circles and has some large applications (BOINC, SETI@Home, Folding@Home, etc.). I did not, however, find in my searches a simple, smaller-scale system like this that could be rolled out within an office environment.
I created an essentially free system using mostly open source software and tools available in almost any office. The technologies were demonstrated and shown to perform and work as expected. Hopefully I have shown that, with not much work and a very simple setup, you can deploy an office grid computing system that is powerful, cheap, and scalable all at the same time.
Once a system is up and running there is almost no end to the customisation and improvements you can make. For example, statistics and benchmarking can easily be added to show the worth of such a system every day. New machines can be added quickly and easily as and when they arrive, with upgrades to existing hardware bolstering your processing power.
I hope you've enjoyed reading this series of articles and that it's given you food for thought on running an office grid system. The solution presented here won't necessarily work in all situations, but it should be adaptable enough to let you get your data processing done using your own solution.
Please feel free to send me any comments, corrections, or improvements and I'll do my best to keep this article updated to match.