A couple of years ago I started delving into MySQL Events as a clean way to handle cron-like tasks (data aggregation, preprocessing, etc.). Overall I’ve been pleased with their performance. However, like any automated, scheduled process, eventually something will go wrong. When that happens you need to know when the issue started occurring, and then dig into the “why” so you can fix it.

Here’s how I track the performance of my scheduled Events. The general idea is to create a new table that stores a row of data about each scheduled Event that runs. Since a single Event can consist of multiple commands, I record a row for each command that is issued.

Log Table

CREATE TABLE `ScheduledEventLog` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `InsertTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `SourceScript` varchar(64) DEFAULT NULL,
  `StartTime` timestamp NULL DEFAULT NULL,
  `EndTime` timestamp NULL DEFAULT NULL,
  `RunDuration` int(11) DEFAULT NULL,
  `RowsAffected` int(11) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `InsertTime_idx` (`InsertTime`),
  KEY `RunDuration_idx` (`RunDuration`),
  KEY `SourceScript_idx` (`SourceScript`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Populating the Table

The trick here is to wrap each command in our scheduled Events with a couple of short statements. Prior to each aggregation, we store the database time in a session variable. After each aggregation, we calculate how long it took and record that in our log table, along with the number of rows affected.

DELIMITER $$

CREATE EVENT `DailyAggregations` 
ON SCHEDULE EVERY 1 DAY STARTS '2014-01-21 01:01:01' 
ON COMPLETION NOT PRESERVE ENABLE 
COMMENT 'Aggregates data into Logins_PerDayAndUser' 
DO BEGIN

/*** LOG START TIME ***/
SET @starttime := NOW();
/*** LOG START TIME ***/

/*** COMMAND ***/
INSERT INTO `SchemaName`.`Logins_PerDayAndUser`
(`TheDate`,`UserID`,`Logins`)
	SELECT DATE(`Time`) AS TheDate, UserID, COUNT(*) AS Logins
	FROM `SchemaName`.`Logins`
	WHERE `Time` BETWEEN CAST(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' 00:00:00') AS DATETIME)
		AND CAST(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' 23:59:59') AS DATETIME)
	GROUP BY TheDate, UserID
	ON DUPLICATE KEY UPDATE
		`Logins`=VALUES(`Logins`);
/*** COMMAND ***/

/*** LOG END TIME ***/
INSERT LOW_PRIORITY INTO `SchemaName`.`ScheduledEventLog`
(`SourceScript`, `StartTime`, `EndTime`, `RunDuration`, `RowsAffected`)
	SELECT 'DailyAggregations_1.1' AS `SourceScript`
	, CAST(@starttime AS DATETIME) AS `StartTime`
	, NOW() AS `EndTime`
	, TIMESTAMPDIFF(SECOND,CAST(@starttime AS DATETIME),NOW()) AS `RunDuration`
	, `RowsAffected`
	FROM (SELECT ROW_COUNT() AS `RowsAffected`) AS R1;
/*** LOG END TIME ***/



/*** LOG START TIME ***/
SET @starttime := NOW();
/*** LOG START TIME ***/

/*** COMMAND ***/
INSERT INTO `SchemaName`.`Logins_PerDayAndUser`
(`TheDate`,`UserID`,`Logins`)
	SELECT DATE(`Time`) AS TheDate, UserID, COUNT(*) AS Logins
	FROM `SchemaName`.`Logins`
	WHERE `Time` BETWEEN CAST(CONCAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), ' 00:00:00') AS DATETIME)
		AND CAST(CONCAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), ' 23:59:59') AS DATETIME)
	GROUP BY TheDate, UserID
	ON DUPLICATE KEY UPDATE
		`Logins`=VALUES(`Logins`);
/*** COMMAND ***/

/*** LOG END TIME ***/
INSERT LOW_PRIORITY INTO `SchemaName`.`ScheduledEventLog`
(`SourceScript`, `StartTime`, `EndTime`, `RunDuration`, `RowsAffected`)
	SELECT 'DailyAggregations_1.2' AS `SourceScript`
	, CAST(@starttime AS DATETIME) AS `StartTime`
	, NOW() AS `EndTime`
	, TIMESTAMPDIFF(SECOND,CAST(@starttime AS DATETIME),NOW()) AS `RunDuration`
	, `RowsAffected`
	FROM (SELECT ROW_COUNT() AS `RowsAffected`) AS R1;
/*** LOG END TIME ***/

END$$
DELIMITER ;
NOTE: `SourceScript` is just a text field used to store a combination of the Event name and a unique number so we can track down the performance of each individual aggregation. Also be aware that with INSERT ... ON DUPLICATE KEY UPDATE, ROW_COUNT() counts each updated row as 2 (and each newly inserted row as 1), so `RowsAffected` can exceed the number of source rows.
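As a starting point, here is a quick query sketch for eyeballing recent run times per command (it uses the `SchemaName`.`ScheduledEventLog` names from above; the 30-day window is an arbitrary choice, adjust to taste):

SELECT `SourceScript`
, COUNT(*) AS Runs
, AVG(`RunDuration`) AS AvgSeconds
, MAX(`RunDuration`) AS MaxSeconds
, SUM(`RowsAffected`) AS TotalRowsAffected
FROM `SchemaName`.`ScheduledEventLog`
WHERE `InsertTime` >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY `SourceScript`
ORDER BY AvgSeconds DESC;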

 

Now that we are storing data from each command within each Event, we have a lot of flexibility in tracking performance. I won’t go into depth, but here are a few ideas:

  • Trigger an email if any command takes longer than X seconds
  • Report an issue if a command takes more than twice its recent average duration
  • Query for missing logs to find commands that didn’t finish successfully
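As a sketch of the second idea, a query along these lines flags runs from the last day that took more than twice the trailing 30-day average for the same command (table and column names as above; the 2x multiplier and the windows are just placeholder thresholds):

SELECT recent.`SourceScript`, recent.`RunDuration`, hist.`AvgDuration`
FROM `SchemaName`.`ScheduledEventLog` AS recent
JOIN (
	SELECT `SourceScript`, AVG(`RunDuration`) AS `AvgDuration`
	FROM `SchemaName`.`ScheduledEventLog`
	WHERE `InsertTime` >= DATE_SUB(NOW(), INTERVAL 30 DAY)
	GROUP BY `SourceScript`
) AS hist ON recent.`SourceScript` = hist.`SourceScript`
WHERE recent.`InsertTime` >= DATE_SUB(NOW(), INTERVAL 1 DAY)
	AND recent.`RunDuration` > 2 * hist.`AvgDuration`;

Note that the historical average here includes the most recent runs themselves; for a stricter baseline you could exclude the last day from the subquery.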
