Thursday, September 22, 2016

Schedule a Pentaho job to run automatically

Got a job that needs to run every morning at the same time, or even every half hour? Pentaho Data Integration Server does have a built-in scheduling view, but if you are using Pentaho CE you may end up using Windows Task Scheduler to call a .bat file that will in turn call kitchen.bat to run your .kjb file. Clear as mud? Here we go ...

I have the world's simplest job called main.kjb. All it does is create a file called hello_world.txt, with an added timestamp before the file extension.

The batch file (.bat)

If I want to use Windows Task Scheduler to run this job, I need a very simple batch file. It needs to call kitchen.bat and point Kitchen at the job file that I want to run. It can also write a job log for me, at whatever level of verbosity I choose. Kitchen has good documentation at http://wiki.pentaho.com/display/EAI/Kitchen+User+Documentation.
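
As a rough idea of what such a batch file can look like, here is a minimal sketch. The install folder, job path, and log path are all made up for illustration, so swap in your own. The /file, /level, and /logfile options are the ones described in the Kitchen documentation linked above. The paths here contain no spaces; paths with spaces will need quoting.

```bat
@echo off
rem Hypothetical locations: replace with your own install, job, and log paths.
set KETTLE_DIR=C:\pentaho\data-integration
set JOB_FILE=C:\pentaho\jobs\main.kjb
set LOG_FILE=C:\pentaho\logs\main.log

rem Change to the install folder so kitchen.bat is found by name.
cd /d "%KETTLE_DIR%"

rem "call" hands control back to this script once kitchen.bat finishes.
call kitchen.bat /file:%JOB_FILE% /level:Basic /logfile:%LOG_FILE%

rem Pass Kitchen's exit code back to whoever started us (0 means success).
exit /b %ERRORLEVEL%
```

Returning the exit code is optional, but it lets the task scheduler record whether the job succeeded rather than just whether the batch file started.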



If you've never run Kitchen from the command line on your machine before, you will probably have to add the folder that contains kitchen.bat to your Path system environment variable. Search for "environment variables" in the Settings menu to find the dialog. You'll need the full path to the folder where your kitchen.bat lives.
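
One quick way to check that the change took effect is the built-in where command, run from a Command Prompt that you opened after editing the variable (windows that were already open keep the old Path).

```bat
rem Prints the full path to kitchen.bat if its folder is on Path;
rem otherwise reports that the file could not be found.
where kitchen.bat
```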


The environment variables dialog may look different depending on your version of Windows.

Once you've done this, you can test your .bat file by double-clicking it. This is a good check before setting it up in the task scheduler. 


The task scheduler

You can find the scheduler by searching for "Task Scheduler" on your machine. If you use the Create Basic Task wizard to set up your schedule, select the action "Start a program". But the program you are starting isn't Spoon, it's the .bat file that you created.

In the wizard, you can pick the time that the job will run. Elsewhere in the utility you can select fancier options, like having the task run every half hour between 9 am and 5 pm on weekdays.
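
If you would rather skip the wizard, Windows also ships a command-line tool called schtasks that can create the same kind of task. This is only a sketch of the simple once-a-day case. The task name and the location of the .bat file are invented for the example.

```bat
rem Hypothetical task name and .bat location: replace with your own.
rem Creates a task that runs the batch file every day at 06:00.
schtasks /Create /TN "Pentaho main job" /TR "C:\pentaho\jobs\run_main.bat" /SC DAILY /ST 06:00
```

The task shows up in the Task Scheduler window like any other, so you can still open it there to adjust the trigger.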

Let's start a program...

Here's where the .bat file goes...


And here's how the task looks in the schedule list, along with its task siblings.



If you enable history logging (Enable All Tasks History in the Actions pane), the task's History tab shows you a rough record of every action taken on your task.

The task scheduler also allows you to run the task on demand, if you would like. This means that you don't have to open your job file and risk inadvertently changing something.
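
The same on-demand run is available from the command line through schtasks. The task name below is a placeholder, so use whatever name you gave your own task.

```bat
rem Hypothetical task name: replace with the name of your own task.
rem Start the task right now, regardless of its schedule.
schtasks /Run /TN "Pentaho main job"

rem Show the task's status and next run time.
schtasks /Query /TN "Pentaho main job"
```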
