Thursday, August 11, 2016

Delete files from Confluence by date

Deleting files from Confluence one at a time is tedious, especially when there are dozens of them and you only want to delete the ones with specific attributes.

With the Confluence CLI (Command Line Interface) and Pentaho PDI, this is easy to automate. You can even schedule the Pentaho job to run at set intervals using Windows Task Scheduler or your favorite scheduling tool.

This is another small job, with one transformation in the middle that can pass multiple rows to the .bat file in the last step. In this example I'm deleting all files from a space that were posted before yesterday.


get list of files

The first entry in this job is a Shell script step that gets the list of attachments on the page called Your Title in space YOURSPACE. The list is written as a .csv file of file attributes to the location specified in the "General" tab.


select files to delete

At the heart of the job is a transformation that reads the list of file attributes, obtains yesterday's date from the System Info step, filters the rows, and then passes the matching rows back to the main job.



Here's a list of attributes that are in the attachmentlist.csv. There are plenty to use for filtering. We are going to be filtering on the Created field.


We get Yesterday from the System Info step:



Then we compare it with Created and send each row where we want it. The "Select values" steps that are the targets of the Filter step are there mainly for troubleshooting; they don't change the data.
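
For readers who want to see the filter logic outside of PDI, here is a minimal Python sketch of the same comparison. It is not what the transformation runs, just an illustration. The Created column comes from attachmentlist.csv as described above; the date format, the "Name" column, the sample rows, and the choice of midnight at the start of yesterday as the cutoff are all assumptions made for the example, so adjust them to match your own export.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumption: Created looks like "2016-08-09 14:30:00"; change this to match your CSV.
CREATED_FORMAT = "%Y-%m-%d %H:%M:%S"


def start_of_yesterday(now):
    """Midnight at the start of the day before `now`."""
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today - timedelta(days=1)


def split_rows(rows, now):
    """Return (to_delete, to_keep): rows created before yesterday vs. the rest."""
    cutoff = start_of_yesterday(now)
    to_delete, to_keep = [], []
    for row in rows:
        created = datetime.strptime(row["Created"], CREATED_FORMAT)
        (to_delete if created < cutoff else to_keep).append(row)
    return to_delete, to_keep


# Tiny in-memory stand-in for attachmentlist.csv (the "Name" column is hypothetical).
SAMPLE_CSV = """Name,Created
old_report.pdf,2016-08-08 09:15:00
recent_report.pdf,2016-08-10 16:40:00
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
to_delete, to_keep = split_rows(rows, now=datetime(2016, 8, 11, 12, 0, 0))
print([r["Name"] for r in to_delete])  # ['old_report.pdf']
```

The two lists play the role of the two "Select values" targets: one receives the rows to delete, the other receives everything else.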


removeAttachment.bat

The last entry in the job is a call to a batch file called removeAttachment.bat, which contains this command:




To get this to work, the entry has to copy the previous results to its arguments and execute once for every row.
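
To make "execute for every row" concrete, here is a rough Python sketch of the idea: each row coming out of the transformation becomes one call to the batch file, with the row's fields as the arguments. The build_calls helper and the "Name" field are hypothetical, used only for illustration; PDI does this for you when the two options above are enabled.

```python
def build_calls(rows, script="removeAttachment.bat", fields=("Name",)):
    """Build one argument list per row: the script followed by that row's field values."""
    return [[script] + [row[f] for f in fields] for row in rows]


# Hypothetical rows, standing in for the result of the filtering transformation.
calls = build_calls([{"Name": "old_report.pdf"}, {"Name": "older_report.pdf"}])
for call in calls:
    # Printed rather than executed; a real run would hand each list to the shell.
    print(call)
```

Inside the batch file, the first argument passed this way is available as %1.
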
And that's it! Confluence CLI is full of handy tools, and Pentaho makes them even more flexible.

