Rundeck との闘争
ウチの社では、Rundeck
をジョブ管理サーバーとして使用している。
各バッチサーバー上のバッチプログラムを、定周期で起動させるのが仕事なのだが
何故かRundeckを運用するサーバーの負荷が右肩上がりで、しかも定期的にCPUが高止まりする、という困った状態に。
これが何とか解消できた、という話。
そもそもRundeckとは
- ジョブスケジューラ
- WebのGUIを持っていて管理が楽
- オープンソースプロダクト(Java)
- 最新バージョンは 2.6.2 (ウチらが導入したのが2015/10くらいで2.5.2なのでメンテ頻度はまあまあ)
ウチらが取ってる構成(現状)
以下のような感じ。
そしてMySQLが過負荷に
ただジョブを定期的に起動させるだけなのだが・・・典型的にヤバい感じに。
グラフを見ればわかるのだが、負荷が落ち着く期間もあるものの、傾向的には右肩上がり。
ジョブは総数70もないので、今後が思いやられていた。
slow.log を見てみると
こんな感じのが大量に出ていた。(これに早く気付いていれば。)
select commands0_.workflow_commands_id as workflow1_15_0_, commands0_.workflow_step_id as workflow2_17_0_, commands0_.commands_idx as commands3_0_, workflowst1_.id as id1_16_1_, workflowst1_.version as version2_16_1_, workflowst1_.description as descript3_16_1_, workflowst1_.error_handler_id as error_ha4_16_1_, workflowst1_.keepgoing_on_success as keepgoin5_16_1_, workflowst1_.adhoc_execution as adhoc_ex7_16_1_, workflowst1_.adhoc_filepath as adhoc_fi8_16_1_, workflowst1_.adhoc_local_string as adhoc_lo9_16_1_, workflowst1_.adhoc_remote_string as adhoc_r10_16_1_, workflowst1_.arg_string as arg_str11_16_1_, workflowst1_.file_extension as file_ex12_16_1_, workflowst1_.interpreter_args_quoted as interpr13_16_1_, workflowst1_.script_interpreter as script_14_16_1_, workflowst1_.json_data as json_da15_16_1_, workflowst1_.node_step as node_st16_16_1_, workflowst1_.type as type17_16_1_, workflowst1_.job_group as job_gro18_16_1_, workflowst1_.job_name as job_nam19_16_1_, workflowst1_.node_filter as node_fi20_16_1_, workflowst1_.node_keepgoing as node_ke21_16_1_, workflowst1_.node_rank_attribute as node_ra22_16_1_, workflowst1_.node_rank_order_ascending as node_ra23_16_1_, workflowst1_.node_threadcount as node_th24_16_1_, workflowst1_.class as class6_16_1_ from workflow_workflow_step commands0_ inner join workflow_step workflowst1_ on commands0_.workflow_step_id=workflowst1_.id where commands0_.workflow_commands_id=2061677;
workflow_workflow_step
なるテーブルの定義を見てみると
| workflow_workflow_step | CREATE TABLE `workflow_workflow_step` ( `workflow_commands_id` bigint(20) DEFAULT NULL, `workflow_step_id` bigint(20) DEFAULT NULL, `commands_idx` int(11) DEFAULT NULL, KEY `FK_9pkey6k5fdo6worgquakkh7d1` (`workflow_step_id`), CONSTRAINT `FK_9pkey6k5fdo6worgquakkh7d1` FOREIGN KEY (`workflow_step_id`) REFERENCES `workflow_step` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=sjis |
indexらしきものがない!(PKさえない!)
explain とってみると、案の定。
> explain select commands0_.workflow_commands_id as workflow1_15_0_, commands0_.workflow_step_id as workflow2_17_0_, commands0_.commands_idx as commands3_0_, workflowst1_.id as id1_16_1_, workflowst1_.version as version2_16_1_, workflowst1_.description as descript3_16_1_, workflowst1_.error_handler_id as error_ha4_16_1_, workflowst1_.keepgoing_on_success as keepgoin5_16_1_, workflowst1_.adhoc_execution as adhoc_ex7_16_1_, workflowst1_.adhoc_filepath as adhoc_fi8_16_1_, workflowst1_.adhoc_local_string as adhoc_lo9_16_1_, workflowst1_.adhoc_remote_string as adhoc_r10_16_1_, workflowst1_.arg_string as arg_str11_16_1_, workflowst1_.file_extension as file_ex12_16_1_, workflowst1_.interpreter_args_quoted as interpr13_16_1_, workflowst1_.script_interpreter as script_14_16_1_, workflowst1_.json_data as json_da15_16_1_, workflowst1_.node_step as node_st16_16_1_, workflowst1_.type as type17_16_1_, workflowst1_.job_group as job_gro18_16_1_, workflowst1_.job_name as job_nam19_16_1_, workflowst1_.node_filter as node_fi20_16_1_, workflowst1_.node_keepgoing as node_ke21_16_1_, workflowst1_.node_rank_attribute as node_ra22_16_1_, workflowst1_.node_rank_order_ascending as node_ra23_16_1_, workflowst1_.node_threadcount as node_th24_16_1_, workflowst1_.class as class6_16_1_ from workflow_workflow_step commands0_ inner join workflow_step workflowst1_ on commands0_.workflow_step_id=workflowst1_.id where commands0_.workflow_commands_id=2061677; +----+-------------+--------------+------+------------------------------+------------------------------+---------+-------------------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+------+------------------------------+------------------------------+---------+-------------------------+---------+-------------+ | 1 | SIMPLE | workflowst1_ | ALL | PRIMARY | NULL | NULL | NULL | 2008680 | | | 1 | SIMPLE | commands0_ | ref | FK_9pkey6k5fdo6worgquakkh7d1 | FK_9pkey6k5fdo6worgquakkh7d1 | 9 | rundeck.workflowst1_.id | 1 | Using where | +----+-------------+--------------+------+------------------------------+------------------------------+---------+-------------------------+---------+-------------+
index を張ってみる
> ALTER TABLE workflow_workflow_step ADD INDEX workflow_commands_id(workflow_commands_id);
張った直後のリソース
劇的に収まったようだ。。。ε-(´。`;)ホッ
後は、ジョブが集中する月初、0時0分0秒の状態を見て問題なければこれで良さそう。