The performance DB - Some cool things from SMF 101 records
In the life of a DB2 DBA on z/OS, performance tuning can broadly be categorized as:
a. Reactive Tuning: This is generally triggered when the application reports slowness to the DBA, or when operations or capacity personnel observe a sudden spike in CPU time during a month-end that deviates from other month-ends and report it to the application team and/or the DBA. It could also be a situation where the situation manager has summoned the DBA to track down any database issues behind a slowness experienced by the business users.
If the "situation" is an active one, most shops these days arm the DBA with monitoring tools such as IBM's Omegamon, CA Insight, BMC Mainview or BMC Apptune. These let the DBA look at the thread details at a preliminary level to see whether the latency is due to the workload being swapped out of central storage or whether the DB2 CPU time is really a cause for concern, along with a plethora of other details such as I/O time, getpages, read efficiency and other read suspension times, to name a few.
If the "situation" is one from the past, it depends on how far back this level of detail is available on the online monitors. Sometimes the problem presents itself only on month-end executions, and whether it can still be examined depends on how much storage can be made available to keep history data on the monitors.
This is the gap that SMF 101 records generally fill. The SMF 101 datasets can be formatted to load the relevant accounting info into a performance database. Monitor vendors typically provide the DDL for the tables that hold this info, along with a process that extracts the SMF type 101 (accounting) records. The info can be summarized by the hour to roll up performance data for various workloads such as Batch (DB2 Call), TSO, CICS or Distributed connections.
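The hourly roll-up can be pictured with a small sketch. It assumes the accounting rows have already been fetched from the performance DB into Python dictionaries; the column names (end_time, conn_type, cpu_seconds) and the sample values are purely illustrative and do not reflect any vendor's table layout.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical accounting rows; names and values are made up for illustration.
ROWS = [
    {"end_time": "2024-01-31 23:05:10", "conn_type": "BATCH", "cpu_seconds": 12.5},
    {"end_time": "2024-01-31 23:40:02", "conn_type": "BATCH", "cpu_seconds": 2.5},
    {"end_time": "2024-01-31 23:59:59", "conn_type": "CICS", "cpu_seconds": 1.0},
    {"end_time": "2024-02-01 00:10:00", "conn_type": "BATCH", "cpu_seconds": 4.0},
]


def rollup_by_hour(rows):
    """Sum CPU seconds per (hour, connection type)."""
    totals = defaultdict(float)
    for row in rows:
        ts = datetime.strptime(row["end_time"], "%Y-%m-%d %H:%M:%S")
        hour = ts.replace(minute=0, second=0)  # truncate to the start of the hour
        totals[(hour, row["conn_type"])] += row["cpu_seconds"]
    return dict(totals)


if __name__ == "__main__":
    for (hour, conn_type), cpu in sorted(rollup_by_hour(ROWS).items()):
        print(hour, conn_type, cpu)
```

In practice the same grouping would more likely be done in SQL against the performance DB itself; the Python form is only meant to show the shape of the summary.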
This info can get as granular as program-level detail. This helps us track and trend the performance of a thread (identified by correlation ID) and check for spikes in the past, to determine whether this is a known issue or whether there is indeed an uptick that has started to grow and must be addressed as early as possible.
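One simple way to separate a known pattern from a new uptick is to compare the latest run against the median of earlier runs for the same correlation ID. The sketch below assumes the CPU history per correlation ID has already been pulled from the performance DB; the job names, the factor of 2 and the minimum of four runs are arbitrary assumptions to be tuned for your shop.

```python
from statistics import median


def find_spikes(history, factor=2.0, min_runs=4):
    """Flag correlation IDs whose latest CPU time exceeds factor x the prior median.

    history maps a correlation ID to CPU seconds per run, oldest to newest.
    """
    spikes = {}
    for corr_id, cpu in history.items():
        if len(cpu) < min_runs:
            continue  # not enough history to form a baseline
        baseline = median(cpu[:-1])  # every run except the latest
        latest = cpu[-1]
        if baseline > 0 and latest > factor * baseline:
            spikes[corr_id] = (baseline, latest)
    return spikes


if __name__ == "__main__":
    # Hypothetical month-end CPU seconds for two batch jobs.
    sample = {
        "PAYJOB01": [100, 110, 105, 108, 260],
        "GLJOB02": [50, 52, 49, 51, 53],
    }
    print(find_spikes(sample))
```

The median is used rather than the mean so that a single earlier outlier does not inflate the baseline.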
b. Proactive Tuning: Proactive tuning looks at the problem from a different perspective. This is when the application hasn't reported a problem and is operating under service level thresholds that are very liberal or non-existent. Close scrutiny of the top CPU consumers during month-end can help organizations see what is driving up their 4-hour rolling average and thereby engage personnel who can assist with tuning efforts.
This is easier said than done, and the premise rests on two pillars:
i.) Automated capture of the month-end top 10 consumers
If you have captured SMF 101 records into a performance DB, unload the top 10 or 20 overall CPU consumers between the last few days of the previous month and the first few days of the current month. Schedule this as a DSNTIAUL job that reports every month.
Sometimes a one-off check doesn't suffice. Having this in a scheduled DSNTIAUL job, with the report sent to the people footing the mainframe CPU bill for the enterprise, will help garner serious traction for a proactive tuning effort.
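The selection logic behind such a report can be sketched as follows. This is not the DSNTIAUL job itself, only an illustration of the month-end window and the top-N ranking; the three-day window on either side and the row fields (run_date, corr_id, cpu_seconds) are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def month_end_window(year, month, days_before=3, days_after=3):
    """Return (start, end) covering the last days of the previous month
    and the first days of the given month, both inclusive."""
    first = date(year, month, 1)
    return first - timedelta(days=days_before), first + timedelta(days=days_after - 1)


def top_consumers(rows, start, end, n=10):
    """Total CPU seconds per correlation ID inside the window, highest first."""
    totals = defaultdict(float)
    for row in rows:
        if start <= row["run_date"] <= end:
            totals[row["corr_id"]] += row["cpu_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical rows; RPTJOB03 runs mid-month and falls outside the window.
    rows = [
        {"run_date": date(2024, 1, 30), "corr_id": "PAYJOB01", "cpu_seconds": 300.0},
        {"run_date": date(2024, 2, 1), "corr_id": "PAYJOB01", "cpu_seconds": 200.0},
        {"run_date": date(2024, 1, 31), "corr_id": "GLJOB02", "cpu_seconds": 450.0},
        {"run_date": date(2024, 1, 15), "corr_id": "RPTJOB03", "cpu_seconds": 900.0},
    ]
    start, end = month_end_window(2024, 2)
    print(top_consumers(rows, start, end, n=10))
```

Keeping the window calculation in one place makes it easy to widen or narrow the definition of "month-end" without touching the ranking.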
ii.) Weigh the cost of tuning versus the cost burnt in CPU
More often than not, tuning is not single-pronged. There are multiple facets to it, and multiple teams have a vested interest in the candidate program in question. One must account, in a great level of detail, for whether the fix is a simple index to be added or a change in the SQL or application logic, and for how much testing effort is involved.
When a solution seems technically feasible, you must also weigh, depending on how 'agile' your organization is in embracing change, how much effort implementing the tuning fix would entail against how much it would save in CPU chargeback. This quantification weighs the cost of the solution against the cost incurred without it, and helps tremendously in directing a decision towards creating a project for the tuning fix.
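A back-of-the-envelope version of this comparison is a payback period: the one-time cost of the fix divided by the monthly chargeback it is expected to save. The sketch below assumes chargeback is billed per CPU second, which may not match your shop's model (many bill per CPU hour or per MSU), and all figures are invented.

```python
def payback_months(tuning_cost, cpu_seconds_saved_per_month, cost_per_cpu_second):
    """Months needed for CPU savings to repay the one-time tuning effort.

    Returns None when the fix saves nothing, since it would never pay back.
    """
    monthly_saving = cpu_seconds_saved_per_month * cost_per_cpu_second
    if monthly_saving <= 0:
        return None
    return tuning_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical: a 12,000 effort (analysis, change, testing) that saves
    # 8,000 CPU seconds a month at a chargeback rate of 0.25 per CPU second.
    print(payback_months(12000.0, 8000.0, 0.25))
```

A short payback period supports raising a tuning project; a payback of several years suggests the candidate is not worth the change effort, however bad the access path looks.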
As you may have observed, the SMF 101 record plays a vital role in both reactive and proactive tuning and serves as a strong record of evidence for the DBA to check trends, analyze workload anomalies and monitor the top CPU consumers during a month-end.
There are also other pieces of info you could glean from the SMF 101 records.
- Learning the nature of your connections: Batch / TSO / CICS / Distributed / Stored Procedure
- Identifying users or applications that have analytical workloads (typically connection type SERVER or PLANNAME = DISTSERV)
- Identifying applications that can benefit from High-Performance DBATs
- Monitor zIIP offloads and General Processor failovers
- Tracking info on what Plans are actively used and by which connections
- Checking the number of times a package was invoked under a specific correlation ID (for example, a DB2 batch program that is called once for every record read from the input file, potentially millions of times)
- Monitoring asynchronous reads, getpages and elapsed time per synchronous I/O
- SMF 100 also provides info that helps with checking page residency in bufferpools, which can assist with bufferpool tuning and with understanding the nature of application traffic on the objects assigned to that pool.
These are just a few examples.
What type of info do you retrieve using your SMF 101 data?
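A couple of the items in the list reduce to simple ratios once the counters are in a table. The sketch below derives elapsed time per synchronous I/O, getpages per synchronous I/O, and the share of zIIP-eligible CPU that ran on a general processor. Every field name here is a hypothetical stand-in for whatever your performance DB calls the corresponding accounting counter.

```python
def derived_metrics(rec):
    """Derive a few ratios from one (hypothetical) accounting summary row."""
    sync_io = rec["sync_io_count"]
    # zIIP-eligible work = what ran on a zIIP + what was eligible but ran on a GP
    ziip_eligible = rec["ziip_cpu_seconds"] + rec["ziip_eligible_on_gp_seconds"]
    return {
        "ms_per_sync_io": rec["sync_io_wait_seconds"] * 1000.0 / sync_io if sync_io else None,
        "getpages_per_sync_io": rec["getpages"] / sync_io if sync_io else None,
        "ziip_on_gp_pct": (
            100.0 * rec["ziip_eligible_on_gp_seconds"] / ziip_eligible
            if ziip_eligible
            else None
        ),
    }


if __name__ == "__main__":
    # Invented numbers for a single distributed workload summary.
    row = {
        "sync_io_count": 2000,
        "sync_io_wait_seconds": 4.0,
        "getpages": 50000,
        "ziip_cpu_seconds": 7.5,
        "ziip_eligible_on_gp_seconds": 2.5,
    }
    print(derived_metrics(row))
```

Ratios that cannot be computed because the denominator is zero come back as None rather than raising an error, so the function can be run over every row in a summary table.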