The Cache: Technology Expert's Forum
 
Author Topic: Azkaban  (Read 2584 times)
isthisthingon
Global Moderator
Lifer
« on: March 30, 2010, 04:29:45 PM »

http://sna-projects.com/azkaban/

I got the notification from the Apache Foundation today at 2:30 pm, so I'm only just looking into this myself. My initial thoughts: yum, drool, and can I have a side of servers with that nifty open source workflow scheduler? The email states: "Azkaban is LinkedIn's solution to running multiple, scheduled batch jobs (workflows) against our Hadoop grids. It supports dependency chains, a web UI, job upload capability, Pig, external-to-Hadoop steps, and more."

Here's the page, in case you fear the dreaded ITTO rick roll!

Quote
What is it?
A batch job scheduler can be seen as a combination of the Unix utilities cron and make. Batch jobs need to be scheduled to run periodically. They also typically have intricate dependency chains, for example dependencies on various data extraction processes or previous steps. Larger processes may have 50 or 60 steps, some of which may run in parallel and others of which must wait for one another. Combining all these processes into a single program lets you control the dependency management but leads to sprawling monolithic programs that are difficult to test or maintain. Simply scheduling the individual pieces to run at different times avoids the monolithic problem, but introduces many timing assumptions that are inevitably broken. Azkaban is a "workflow" scheduler that allows the pieces to be declaratively assembled into a single workflow and for that workflow to be scheduled to run periodically.

A good batch workflow system allows a program to be built out of small reusable pieces that need not know about one another. By declaring dependencies you can control sequencing. Other functionality available from Azkaban can then be layered on top of the job: email notifications of success or failure, resource locking, retry on failure, log collection, historical job runtime information, and so on.


Why was it made?
Schedulers are readily available (both open source and commercial), but tend to be extremely unfriendly to work with: they are basically bad GUIs grafted onto 20-year-old command line clients. We wanted something that made it reasonably easy to visualize job hierarchies and run times without the pain. Previous experience made it clear that a good batch programming framework can make batch programming easy and successful in the same way that a web framework can aid web development beyond what you can do with an HTTP library and sockets.


State of the project
We have been using Azkaban internally at LinkedIn for the last nine months or so, and have about a hundred jobs running in it, mostly Hadoop jobs or ETL of some sort. Azkaban is quite new as an open source project, though, and we are working now to generalize it enough to make it useful for everyone. Any patches, bug reports, or feature ideas are quite welcome. We have created a mailing list for this purpose.
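
For anyone who wants the "cron plus make" idea from the quote in concrete form, here is a rough sketch of the "make" half: jobs declare what they depend on, and a small runner works out the sequencing. This is not Azkaban's API or job file format. The job names and the `run_workflow` helper are made up for illustration, and it assumes Python 3.9+ for the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
dependencies = {
    "extract_users": set(),
    "extract_events": set(),
    "join_data": {"extract_users", "extract_events"},
    "build_report": {"join_data"},
    "send_email": {"build_report"},
}


def run_workflow(deps, run_job):
    """Run every job only after its dependencies; return the execution order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        # Everything returned by get_ready() has all dependencies finished,
        # so a real scheduler could run this batch in parallel.
        for job in sorted(sorter.get_ready()):
            run_job(job)
            order.append(job)
            sorter.done(job)  # unblocks jobs that were waiting on this one
    return order


order = run_workflow(dependencies, lambda job: print("running", job))
```

The jobs themselves never reference one another; only the dependency table does, which is the "small reusable pieces" point from the quote. The periodic scheduling (the "cron" half), retries, and notifications would be layered around `run_job`.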

I would love to change the world, but they won't give me the source code.
kurdt
Lifer
paha arkkitehti


« Reply #1 on: March 30, 2010, 10:14:19 PM »

Quote
Here's the page, in case you fear the dreaded ITTO rick roll!

Aren't you a wee bit old for rick rolling? For you I was imagining something more like a Frank Sinatra photo with a speech bubble saying "In my day..." ;)

I met god and he had nothing to say to me.
isthisthingon
Global Moderator
Lifer
« Reply #2 on: March 31, 2010, 12:10:48 AM »

Frank's like the Meaty Cheesy Boys in my book. All that newfangled shit gets my knickers right in a firm twist. Anyhoo, I'm 100% Bach, bitches.

I would love to change the world, but they won't give me the source code.