QA Process Accounting

From TeraGrid Wiki

Motivation

As part of the QA group deliverable of prioritizing the testing and debugging of the services we think are most relevant to users, we searched for sources of data we could use to assess how much usage a service was getting (i.e., how important it was to users) and how reliable it was. One source of usage information was the process accounting data being collected by TACC, NICS, and Purdue. Because user requirements and priorities change over time, the group felt it was important to create a sustainable process for collecting and analyzing usage data. We are currently investigating methods to automate the manual process accounting workflow, and we will continue to look at other ways to reduce manual usage data collection and reporting overall.

The acct/pacct Process Accounting

UNIX can log every command run by every user at the command line. This kind of logging is often called process accounting, and it records detailed information about every process that runs. The lastcomm or acctcom program displays the contents of this log file in a human-readable format.

Process accounting is often installed and controlled by the local security group at each site, who should know where the acct/pacct process accounting files are. For example, on each Kraken login node they are located in /var/account/, where each file name looks like pacct.20090417 (a date stamp in the form YearMonthDay). Another common location for these files is /var/adm/pacct.

This data collection process can be resource intensive and is not recommended for capturing only one type of statistic, such as Globus usage. Process accounting can take up a large amount of disk space and may cause performance degradation. It is therefore convenient to set up a script that periodically compresses the output and rotates the logs.
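The compression step described above could be sketched as follows. This is only an illustration under stated assumptions: the file naming (pacct.YYYYMMDD) is taken from the Kraken example, while the function name compress_old_logs and the keep_days parameter are hypothetical. Sites whose accounting files are named differently would need to adjust the pattern.

```python
import gzip
import re
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path

# Assumed naming scheme, taken from the Kraken example: pacct.YYYYMMDD
PACCT_NAME = re.compile(r"^pacct\.(\d{8})$")


def compress_old_logs(directory, keep_days=7, today=None):
    """Gzip date-stamped pacct files older than keep_days; return the new .gz paths."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    compressed = []
    for path in sorted(Path(directory).iterdir()):
        match = PACCT_NAME.match(path.name)
        if not match:
            continue  # not a date-stamped log (for example, already a .gz file)
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if stamp >= cutoff:
            continue  # still recent; leave it uncompressed
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original only after the copy succeeded
        compressed.append(target)
    return compressed
```

A call such as compress_old_logs("/var/account", keep_days=7) would typically be run from a root cron job. Files newer than the cutoff are left alone so that the log currently being written is never touched.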

In general, process accounting can be activated and its output viewed as follows:

Turn accounting on and write the output to /var/adm/pacct: /usr/lib/acct/accton /var/adm/pacct
View the contents of /var/adm/pacct: /bin/acctcom
Turn accounting off: /usr/lib/acct/accton

Note: accton may be installed in a different location on some systems, for example /usr/sbin/accton.
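Because the location of accton varies, an automation script may want to probe for it instead of hard-coding a path. The sketch below checks the two directories mentioned above and then falls back to the current PATH. The function name find_accton and the CANDIDATE_DIRS list are hypothetical and would need extending for other systems.

```python
import os
import shutil

# Both directories are mentioned in the text above; extend the list as needed.
CANDIDATE_DIRS = ["/usr/lib/acct", "/usr/sbin"]


def find_accton(candidate_dirs=CANDIDATE_DIRS):
    """Return the path to an executable accton, or None if it is not found."""
    for directory in candidate_dirs:
        path = os.path.join(directory, "accton")
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Fall back to whatever the current PATH provides
    return shutil.which("accton")
```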

Example Usage Reports

Some examples of process accounting reports are linked below by site:

Automating Usage Reports

The QA group recommends using the following scripts, written by Dave Carver (TACC), to automate process accounting.

Here is an example of the root cron job that TACC runs four times a day on each TG login node to gather process accounting statistics and make them available to Inca for further processing.

#!/bin/bash
#
# Extract ASCII text, keep only the globus commands, and remove selected userids
#
# Notice: This script must run as root to access the process accounting information
#
# Directory to store filtered process accounting files
#
DIR="/home/utexas/staff/inca/process_accounting"
#
# Where to get the process accounting data from the system
#
AC="/var/log/ac"
#
# Set up date and time to timestamp each file
#
DATE=$(date +%Y%m%d)
TIME=$(date +%H%M%S)
HOSTNAME=$(hostname)
#
# File with commands to extract
#
GLOBUS_COMMANDS="$DIR/globus-commands"
#
# File with users to exclude
#
USERS_EXCLUDE="$DIR/globus-users"
#
# Filtered output file for this host and run
#
OUT="$DIR/logs/$HOSTNAME-$DATE-$TIME"
#
lastcomm -f "$AC" | grep -f "$GLOBUS_COMMANDS" | grep -vf "$USERS_EXCLUDE" > "$OUT"
#
# Change ownership to allow inca to access filtered information
#
chown inca:G-80230 "$OUT"


The following Perl script parses the process accounting log file produced by lastcomm and generates a simple report.

usage: usage.pl < out-from-lastcomm > usage.report

#!/usr/bin/perl
#
# Parse process accounting log and generate a simple report showing a frequency for each
# command found and a list of users that account for more than 50% of the total frequency.
# Modified for RHEL 5 to understand the new S,F,D, and X flags and skip past these fields
#
# Usage
#       usage.pl < process.accounting.ascii.file > process.account.report
#

while(<STDIN>) {
   chomp;                         # Remove only the trailing newline
#  ($command, $user, $junk, $sec, $sec_junk, $dayofweek, $month, $day, $time ) = split;
   $command = substr $_,0,18;     # Get command first 18 characters
   $user = substr $_,24,8;        # Skip to column 24 for username
   $gcomand{$command} = $gcomand{$command} + 1;
   $guser{$user} = 1;
   $gcommanduser{$command}{$user} = $gcommanduser{$command}{$user} + 1;
}

foreach $command (sort keys %gcomand) {
    # Count the distinct users that ran this command
    $numberofusers = 0;
    foreach $user (sort keys %guser) {
         if ($gcommanduser{$command}{$user}) {
               $numberofusers = $numberofusers + 1;
         }
    }
    print "\n$command number of times = $gcomand{$command} number of users = $numberofusers\n";
    foreach $user (sort keys %guser) {
         if ($gcommanduser{$command}{$user}) {
               $percent = $gcommanduser{$command}{$user}*100/$gcomand{$command};
#
# Report users that account for more than 50.0% of total usage
#
               if ($percent > 50.0) {
                     print " $command $user number of times = $gcommanduser{$command}{$user} out of $gcomand{$command}";
                     printf "  %6.2f%%\n", $percent;
               }
         }
    }
}
 

Sample output report from usage.pl:

master$ cat usage.report
 
grid-proxy-info: number of times = 126 number of users = 1
       grid-proxy-info: envision: number of times = 126 out of 126  100.00%

gsissh: number of times = 4673 number of users = 1
       gsissh: envision: number of times = 4673 out of 4673  100.00%

Inca Reporter

The Inca reporter for process accounting data has two parts. The first, pacct_report.py, takes the process accounting logs generated by the kernel and digests them into a JSON-formatted report containing the per-user count of the processes. Its JSON-formatted configuration file allows specific UIDs to be excluded from the report and allows the report to be restricted to specified process names. It is intended to be run from cron with its output redirected to a file.

The second part, grid-tool-usage, takes a single argument, '--pacctjson=<path to pacct_report.py output>'. It is extremely simple in design: it loads the JSON and tallies the total usage and user count for each command.
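The tallying step can be illustrated with a short sketch. This is not the reporter's actual code, and the JSON layout is an assumption: it supposes a mapping from user name to a mapping of command name to count, which may differ from what pacct_report.py really writes. The function name tally_usage and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def tally_usage(report):
    """Return {command: (total_count, number_of_users)} for a per-user report."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for user, commands in report.items():
        for command, count in commands.items():
            if count > 0:
                totals[command] += count
                users[command].add(user)
    return {cmd: (totals[cmd], len(users[cmd])) for cmd in sorted(totals)}


# Hypothetical report text; the real pacct_report.py layout may differ.
sample = json.loads(
    '{"alice": {"gsissh": 3, "grid-proxy-info": 1}, "bob": {"gsissh": 2}}'
)
for command, (total, user_count) in tally_usage(sample).items():
    print(f"{command}: number of times = {total} number of users = {user_count}")
```

Keeping a set of users per command means a user is counted once no matter how many times the command was run.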

The GitHub repository for the reporter is at https://github.com/jmlowe/TG-QA-Process-Accounting-Reporter
