Tuesday, January 12, 2010

Optimizing your shell scripts

A few days ago I came across a couple of posts by Brock Noland (his blog is now offline) about splitting strings natively with the shell. You can find them here (Internet Archive) and here. Basically, he demonstrated that using shell builtins and variables is much more desirable than using outside programs, such as cut and awk, for the same purpose. The reason is speed: every time the shell calls an outside program, it forks a new process, and that takes time.
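
To make the difference concrete, here is a minimal sketch (not taken from the linked posts) that pulls fields out of a single record both ways. The sample record and the variable names are made up for illustration. The cut version starts an external program for every field it extracts, while the read and parameter-expansion versions stay inside the shell.

```shell
#!/bin/bash
# Made-up record in /etc/passwd format, used only for illustration.
record='daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# External program: the command substitution runs cut in a new process.
uid_cut=$(printf '%s\n' "$record" | cut -d: -f3)

# Builtin: read splits the record on ':' without starting another program.
# The IFS=: prefix applies to this one read command only.
IFS=: read -r username pp uid gid gecos hd shell <<EOF
$record
EOF

# Parameter expansion is enough when only the first or last field is needed.
first=${record%%:*}   # remove everything from the first ':' onward
last=${record##*:}    # remove everything up to the last ':'

printf '%s %s %s %s\n' "$uid_cut" "$uid" "$first" "$last"
```

All three approaches agree on the values; the only difference is how much work the system does to get them.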

I decided to tinker with this idea a bit more, to get a better impression of how much slower cut and awk really are. I made each of the following functions read through /etc/passwd 100 times. I am pretty sure some of you will be shocked by the speed difference between cut and awk on one side and the shell's native string splitting on the other. Here is my code, followed by the execution results:

Note: The following code is virtually identical to what is found on the pages I linked to above. I modified the first two functions to iterate more times, the third and fourth are essentially unchanged, and the final two are slight variations on the third and fourth, made to gain a small performance increase.

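#!/bin/bash
# Split each line of /etc/passwd with cut (one new process per field).
f_cut() {
   for i in {0..99}; do
      while read -r line; do
         # Pass the line as data, never as the printf format string.
         uid=$(printf '%s\n' "$line" | cut -d: -f3)
         if [[ $uid -gt 10 ]]; then
            shell=$(printf '%s\n' "$line" | cut -d: -f7)
            if [[ '/sbin/nologin' == "$shell" ]]; then
               printf '%s\n' "$line" | cut -d: -f1
            fi
         fi
      done </etc/passwd
   done
}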

# Same logic as f_cut, but with awk doing the field extraction.
f_awk() {
   for i in {0..99}; do
      while read -r line; do
         uid=$(printf '%s\n' "$line" | awk -F: '{print $3}')
         if [[ $uid -gt 10 ]]; then
            shell=$(printf '%s\n' "$line" | awk -F: '{print $7}')
            if [[ '/sbin/nologin' == "$shell" ]]; then
               printf '%s\n' "$line" | awk -F: '{print $1}'
            fi
         fi
      done </etc/passwd
   done
}

# Split with the shell itself: set IFS, then let word splitting fill $1..$7.
shell_1() {
   for i in {0..99}; do
      while read -r line; do
         oldifs="$IFS"
         IFS=:
         set -- $line
         IFS="$oldifs"
         if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]; then
            printf '%s\n' "$1"
         fi
      done </etc/passwd
   done
}

# Let read do the splitting directly into named variables.
shell_2() {
   for i in {0..99}; do
      while IFS=: read -r username pp uid gid gecos hd shell; do
         if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]; then
            printf '%s\n' "$username"
         fi
      done </etc/passwd
   done
}

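# Like shell_1, but IFS is saved, set, and restored once, outside the loops.
shell_3() {
   oldifs="$IFS"
   IFS=:
   for i in {0..99}; do
      while read -r line; do
         set -- $line
         # Quote "$7" so it is compared as a string, not matched as a pattern.
         if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]; then
            printf '%s\n' "$1"
         fi
      done </etc/passwd
   done
   IFS="$oldifs"
}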

# Like shell_2, but IFS is saved, set, and restored once, outside the loops.
shell_4() {
   oldifs="$IFS"
   IFS=:
   for i in {0..99}; do
      while read -r username pp uid gid gecos hd shell; do
         if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]; then
            printf '%s\n' "$username"
         fi
      done </etc/passwd
   done
   IFS="$oldifs"
}

printf "\n%s" "---Cut---"
time f_cut >/dev/null
printf "\n%s" "---Awk---"
time f_awk >/dev/null
printf "\n%s" "---Shell 1---"
time shell_1 >/dev/null
printf "\n%s" "---Shell 2---"
time shell_2 >/dev/null
printf "\n%s" "---Shell 3---"
time shell_3 >/dev/null
printf "\n%s" "---Shell 4---"
time shell_4 >/dev/null

Now we'll have a look at the time results:

---Cut---
real    1m10.278s
user    0m20.068s
sys     1m9.745s 

---Awk---
real    1m24.043s
user    0m25.170s
sys     1m21.238s

---Shell 1---
real    0m1.387s
user    0m1.282s
sys     0m0.100s

---Shell 2---
real    0m1.219s
user    0m1.109s
sys     0m0.104s

---Shell 3---
real    0m0.943s
user    0m0.852s
sys     0m0.090s

---Shell 4---
real    0m0.887s
user    0m0.786s
sys     0m0.097s

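First, notice that the functions with cut and awk each took over a minute to complete the work! Compare this with the results of Shell 1 and Shell 2 and you can see a massive difference: those two needed only a little over a second, which makes them roughly 50 to 70 times faster. Also notice where the time went. For cut and awk, almost all of it is sys time, which is what you would expect when the cost is dominated by creating new processes. The obvious lesson here, as Brock Noland pointed out in his post, is that cut and awk are not the best solution in a case like this.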

The last two functions, whose results are displayed in Shell 3 and Shell 4, are my modifications. I was able to shave enough off to get them both below a second in execution time (0.943s versus 1.387s for Shell 1, and 0.887s versus 1.219s for Shell 2). I did this by moving work out of the loops that did not need to be there: IFS is now saved, set, and restored once per function call instead of once for every line read. This saves a tiny bit of work for the processor on each iteration. On big jobs, this would add up to bigger time differences.
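
The same hoisting idea can be tried on sample data instead of the real /etc/passwd. The following is a minimal sketch under stated assumptions: the function name list_nologin, its file argument, and the sample records are all made up for illustration and are not part of the benchmark above.

```shell
#!/bin/bash
# list_nologin FILE (hypothetical helper): print every user in FILE whose
# UID is greater than 10 and whose shell is /sbin/nologin.
list_nologin() {
   oldifs=$IFS          # save IFS once, before the loop
   IFS=:                # set it once, instead of on every line read
   while read -r username pp uid gid gecos hd shell; do
      if [ "$uid" -gt 10 ] && [ "$shell" = /sbin/nologin ]; then
         printf '%s\n' "$username"
      fi
   done < "$1"
   IFS=$oldifs          # restore it once, after the loop
}

# Made-up sample data standing in for /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
nobody:x:99:99:Nobody:/:/sbin/nologin
EOF

result=$(list_nologin "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample file only ftp and nobody pass both tests: daemon has the right shell but a UID of 2, and alice has a high UID but a login shell.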

Don't use cut or awk for jobs like this.  Let the shell do the work.
