Load balancing and fail over with scheduler

Every programmer develops at least one scheduler or job in their programming lifetime. Nowadays writing a scheduler to get your job done is very simple, but it gets tricky once you start thinking about high availability or load balancing for it. It gets trickier still when you run several instances of your scheduler but only one of them may run at a time. A long time ago I used a database table lock to achieve this kind of leader election. Since around 2010, when ZooKeeper came into play, I have preferred ZooKeeper for high availability and scalability. To use ZooKeeper you need a cluster of at least 3 nodes, and you have to maintain that cluster. Our new customer refused to allow such an open source product in their environment, so I had to find an alternative. Quartz was the next choice. Quartz makes developing a scheduler easy and simple, and its clustering feature brings HA and scalability to the scheduler.
Quartz uses the JDBC-JobStore to store the jobs and to balance the load between different nodes. Please see below for the high level architecture of Quartz clustering.
Quartz Clustering Features:
1) Provides fail-over.
2) Provides load balancing.
3) Quartz's built-in clustering features rely upon database persistence via JDBCJobStore (described above).
4) Terracotta extensions to Quartz provide clustering capabilities without the need for a back-end database.
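To make the load balancing idea concrete, here is a toy model in plain Java. It is only a sketch and not how Quartz is implemented internally: the class ClusterFiringDemo, the SharedStore type and the method names are hypothetical, and an in-memory map stands in for the shared database. Several "nodes" race to claim each firing, and the atomic claim guarantees that every firing is executed by exactly one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: an in-memory map plays the role of the shared database.
class ClusterFiringDemo {

    // Hypothetical stand-in for the job store: remembers which node claimed each firing.
    static final class SharedStore {
        private final Map<Integer, String> claimedBy = new ConcurrentHashMap<>();

        // Atomically claim a firing; returns true only for the first node that asks.
        boolean tryClaim(int firingId, String nodeId) {
            return claimedBy.putIfAbsent(firingId, nodeId) == null;
        }
    }

    // Starts one thread per node; every node tries to claim every firing.
    // Returns how many times the "job" was actually executed.
    static int simulate(int nodes, int firings) {
        SharedStore store = new SharedStore();
        AtomicInteger executions = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            String nodeId = "node-" + n;
            Thread t = new Thread(() -> {
                for (int f = 0; f < firings; f++) {
                    if (store.tryClaim(f, nodeId)) {
                        executions.incrementAndGet(); // this node "runs the job"
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for nodes", e);
            }
        }
        return executions.get();
    }

    public static void main(String[] args) {
        // 3 nodes compete for 100 firings; each firing is executed exactly once.
        System.out.println("executions = " + simulate(3, 100)); // prints "executions = 100"
    }
}
```

Which node wins a given firing varies from run to run, which is roughly the effect you see in a cluster: the work is spread over the nodes, but no firing is executed twice.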
Let's start coding. First we need to prepare our DB.
In my case, it's Oracle.
GRANT create session TO quartztest;
I have just created a DB schema named quartztest in my Oracle database.
To create the database objects, download the Quartz distribution and run the SQL script for Oracle. You can also download the script from my GitHub repository.
After running the SQL script to prepare our DB, we are ready to start developing our high availability scheduler.
First, implement the Quartz Job interface as below:
package com.blu.scheduler;

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Created by shamim on 07/12/15.
 */
public class AltynJob implements Job {
    private final Logger logger = LoggerFactory.getLogger(AltynJob.class);

    // Called by Quartz on whichever node acquired the trigger
    @Override
    public void execute(JobExecutionContext jobExecutionContext) throws JobExecutionException {
        logger.info("Do something useful!!", jobExecutionContext);
    }
}
Now we need to create our Job and start the scheduler
package com.blu.scheduler;

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

/**
 * Created by shamim on 11/12/15.
 */
public class CreateJob {
    private static final Logger LOGGER = LoggerFactory.getLogger(CreateJob.class);
    private static final String JOB_NAME = "jobMedian";
    private static final String GROUP = "jobMedianGroup";
    private static final String TRIGGER_NAME = "trgMedian";
    private static final boolean isRecoverable = false;
    private static final Integer INTERVAL = 40; // in seconds

    private void create() throws SchedulerException {
        final Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        // create the job
        JobDetail jobMedian = newJob(AltynJob.class).withIdentity(JOB_NAME, GROUP)
                                                .requestRecovery() // ask scheduler to re-execute this job if it was in progress when the scheduler went down
                                                .build();
        // trigger: fire every INTERVAL seconds, forever
        SimpleScheduleBuilder scheduleBuilder = SimpleScheduleBuilder.simpleSchedule()
                                                .withIntervalInSeconds(INTERVAL)
                                                .repeatForever();

        Trigger trgMedian = newTrigger().withIdentity(TRIGGER_NAME, GROUP)
                                        .startNow().withSchedule(scheduleBuilder).build();
        LOGGER.info("Start the scheduler!!");
        // Schedule the job
        scheduler.scheduleJob(jobMedian, trgMedian);
        // Start firing triggers on this node
        scheduler.start();
    }

    public static void main(String[] args) {
        LOGGER.info("Create and Start the scheduler!!");
        try {
            new CreateJob().create();
        } catch (SchedulerException e) {
            LOGGER.error("Could not create or start the scheduler", e);
        }
    }
}
Note that the job should be created only once, by any one of the schedulers. Our job named jobMedian will be stored in the database.
Now we need the Quartz configuration to create and run the job:
# Configure Main Scheduler Properties

org.quartz.scheduler.instanceName = MyClusteredScheduler
org.quartz.scheduler.instanceId = AUTO

# Configure ThreadPool

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 1
#org.quartz.threadPool.threadPriority = 5

# Configure JobStore

org.quartz.jobStore.misfireThreshold = 60000

org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass =
org.quartz.jobStore.useProperties = false
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = QRTZ_

org.quartz.jobStore.isClustered = true
# Keep this interval well below the trigger interval (40 seconds in my case) so that nodes can check in and get the lock in time.
org.quartz.jobStore.clusterCheckinInterval = 10000

# Configure Datasources

org.quartz.dataSource.myDS.driver = oracle.jdbc.driver.OracleDriver
org.quartz.dataSource.myDS.URL = jdbc:oracle:thin:@localdomain:1521:DB11G
org.quartz.dataSource.myDS.user = quartztest
org.quartz.dataSource.myDS.password = quartztest
org.quartz.dataSource.myDS.maxConnections = 5
org.quartz.dataSource.myDS.validationQuery=select 0 from dual
One of the main properties is org.quartz.jobStore.isClustered = true, which tells the Quartz scheduler to run in cluster mode.
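Since a typo in these properties silently gives you a non-clustered scheduler, it can help to check the file before starting a node. The following is a small sketch in plain Java; the class ClusterConfigCheck and its methods are my own hypothetical helpers, not part of Quartz. It only looks at three things: isClustered is true, instanceId is AUTO (otherwise you have to keep it unique per node yourself), and the job store is not the in-memory RAMJobStore, which cannot be shared between nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: basic sanity checks for a clustered quartz.properties.
class ClusterConfigCheck {

    // Returns human-readable findings; an empty list means the basic settings look fine.
    static List<String> check(Properties p) {
        List<String> findings = new ArrayList<>();
        String clustered = p.getProperty("org.quartz.jobStore.isClustered", "").trim();
        if (!"true".equalsIgnoreCase(clustered)) {
            findings.add("org.quartz.jobStore.isClustered is not true");
        }
        String instanceId = p.getProperty("org.quartz.scheduler.instanceId", "").trim();
        if (!"AUTO".equals(instanceId)) {
            findings.add("instanceId is not AUTO; make sure it is unique on every node");
        }
        String jobStore = p.getProperty("org.quartz.jobStore.class", "").trim();
        if (jobStore.endsWith("RAMJobStore")) {
            findings.add("RAMJobStore keeps jobs in memory and cannot be clustered");
        }
        return findings;
    }

    // Parses properties text; in a real project you would load the file instead.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = parse(
                "org.quartz.scheduler.instanceId = AUTO\n"
              + "org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX\n"
              + "org.quartz.jobStore.isClustered = true\n");
        System.out.println("findings: " + check(p)); // prints "findings: []"
    }
}
```

The check is deliberately small. It cannot verify that all nodes use the same instanceName, because that needs the property files of every node.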
When you run the above code for the first time, it will create the job and run the scheduler in cluster mode. Every 40 seconds you will get the message "Do something useful!!" in your console.
If you run the class StartNode, it will start another scheduler instance.
Another very important setting is org.quartz.jobStore.clusterCheckinInterval = 10000. If it is too high or close to your trigger interval, load balancing may not work, because every scheduler instance has to get the lock to run its job. If the value is too small, it can hurt your database performance: imagine a lot of schedulers with 100 jobs each trying to get and release locks in your database every 2 seconds. How does the failover work? If I shut down one of my instances, after about 10 seconds one of the remaining instances detects that it went down and takes over its work.
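The failover timing described above can be pictured with a small model in plain Java. This is a simplified sketch, not the actual Quartz algorithm: the class CheckinMonitor, its methods and the grace period are assumptions of mine. Each node records the time of its last check-in, and a node counts as failed once its last check-in is older than the check-in interval plus the grace period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of failure detection through periodic check-ins.
class CheckinMonitor {
    private final long checkinIntervalMs; // e.g. clusterCheckinInterval = 10000
    private final long graceMs;           // assumed tolerance for slow check-ins
    private final Map<String, Long> lastCheckin = new TreeMap<>();

    CheckinMonitor(long checkinIntervalMs, long graceMs) {
        this.checkinIntervalMs = checkinIntervalMs;
        this.graceMs = graceMs;
    }

    // A live node calls this once per check-in interval.
    void checkin(String instanceId, long nowMs) {
        lastCheckin.put(instanceId, nowMs);
    }

    // Returns the instances whose last check-in is older than interval + grace.
    List<String> findFailed(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastCheckin.entrySet()) {
            if (nowMs - e.getValue() > checkinIntervalMs + graceMs) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        CheckinMonitor monitor = new CheckinMonitor(10_000, 2_000);
        monitor.checkin("A", 0);
        monitor.checkin("B", 0);
        monitor.checkin("A", 10_000); // B was shut down and never checks in again
        System.out.println(monitor.findFailed(11_000)); // prints "[]"
        System.out.println(monitor.findFailed(13_000)); // prints "[B]"
    }
}
```

The model also shows the trade-off: a smaller check-in interval detects a dead node sooner, but every check-in is a round trip to the database for every node.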
If you query the database table QRTZ_FIRED_TRIGGERS, you will find which instance acquired the lock and fired the trigger.
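Such a query could look like the sketch below, written with plain JDBC. The class FiredTriggersQuery and its methods are hypothetical helpers, and the column names (INSTANCE_NAME, TRIGGER_NAME, TRIGGER_GROUP, FIRED_TIME, STATE) assume the table layout created by the Quartz 2.x SQL scripts, so check them against your own schema. The table prefix is passed in so that it matches org.quartz.jobStore.tablePrefix.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: shows which instance fired the triggers of a group.
class FiredTriggersQuery {

    // Builds the SELECT for the given table prefix, e.g. "QRTZ_".
    static String sql(String tablePrefix) {
        return "SELECT INSTANCE_NAME, TRIGGER_NAME, TRIGGER_GROUP, FIRED_TIME, STATE"
             + " FROM " + tablePrefix + "FIRED_TRIGGERS"
             + " WHERE TRIGGER_GROUP = ?"
             + " ORDER BY FIRED_TIME DESC";
    }

    // Prints one line per row; the caller supplies an open JDBC connection.
    static void print(Connection conn, String tablePrefix, String triggerGroup) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql(tablePrefix))) {
            ps.setString(1, triggerGroup);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s fired %s.%s at %d (%s)%n",
                            rs.getString("INSTANCE_NAME"),
                            rs.getString("TRIGGER_GROUP"),
                            rs.getString("TRIGGER_NAME"),
                            rs.getLong("FIRED_TIME"),
                            rs.getString("STATE"));
                }
            }
        }
    }
}
```

With the setup from this post you would call print(connection, "QRTZ_", "jobMedianGroup") while the job is running.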
In summary, a Quartz scheduler cluster is easy to set up and run. Happy weekend.


