I decided to tinker with this idea a bit more, to get a better sense of how much slower cut and awk really are. I made each of the following functions iterate over the file 100 times. I am pretty sure some of you will be shocked by the speed difference between cut and awk on the one hand and the shell's native string splitting on the other.
Here is my code, followed by the execution results:
Note: The following code is virtually identical to what is found on the pages I linked to above. The first two functions I modified to iterate more times, the third and fourth are essentially unchanged, and the final two are changed from the two preceding, albeit slightly, to gain a small performance increase.
#!/bin/bash

f_cut()
{
    for i in {0..99}; do
        while read line; do
            uid=$(printf "$line" | cut -d: -f3)
            if [[ $uid -gt 10 ]]; then
                shell=$(printf "$line" | cut -d: -f7)
                if [[ '/sbin/nologin' == "$shell" ]]; then
                    printf "$line" | cut -d: -f1
                fi
            fi
        done </etc/passwd
    done
}

f_awk()
{
    for i in {0..99}; do
        while read line; do
            uid=$(printf "$line" | awk -F: '{print $3}')
            if [[ $uid -gt 10 ]]; then
                shell=$(printf "$line" | awk -F: '{print $7}')
                if [[ '/sbin/nologin' == "$shell" ]]; then
                    printf "$line" | awk -F: '{print $1}'
                fi
            fi
        done </etc/passwd
    done
}

shell_1()
{
    for i in {0..99}; do
        while read line; do
            oldifs="$IFS"
            IFS=:
            set -- $line
            IFS="$oldifs"
            if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == "$7" ]]; then
                printf "$1"
            fi
        done </etc/passwd
    done
}

shell_2()
{
    for i in {0..99}; do
        while IFS=: read username pp uid gid gecos hd shell; do
            if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == "$shell" ]]; then
                printf "$username"
            fi
        done </etc/passwd
    done
}

shell_3()
{
    oldifs="$IFS"
    IFS=:
    for i in {0..99}; do
        while read line; do
            set -- $line
            if [[ $3 -gt 10 ]] && [[ '/sbin/nologin' == $7 ]]; then
                printf "$1"
            fi
        done </etc/passwd
    done
    IFS="$oldifs"
}

shell_4()
{
    oldifs="$IFS"
    IFS=:
    for i in {0..99}; do
        while read username pp uid gid gecos hd shell; do
            if [[ $uid -gt 10 ]] && [[ '/sbin/nologin' == $shell ]]; then
                printf "$username"
            fi
        done </etc/passwd
    done
    IFS="$oldifs"
}

printf "\n%s" "---Cut---"
time f_cut >/dev/null

printf "\n%s" "---Awk---"
time f_awk >/dev/null

printf "\n%s" "---Shell 1---"
time shell_1 >/dev/null

printf "\n%s" "---Shell 2---"
time shell_2 >/dev/null

printf "\n%s" "---Shell 3---"
time shell_3 >/dev/null

printf "\n%s" "---Shell 4---"
time shell_4 >/dev/null
Now we'll have a look at the time results:
---Cut---
real    1m10.278s
user    0m20.068s
sys     1m9.745s

---Awk---
real    1m24.043s
user    0m25.170s
sys     1m21.238s

---Shell 1---
real    0m1.387s
user    0m1.282s
sys     0m0.100s

---Shell 2---
real    0m1.219s
user    0m1.109s
sys     0m0.104s

---Shell 3---
real    0m0.943s
user    0m0.852s
sys     0m0.090s

---Shell 4---
real    0m0.887s
user    0m0.786s
sys     0m0.097s
First, notice that the functions with cut and awk took over a minute each to complete the work! Most of that is sys time, which is consistent with the cost of starting a pipeline and an external process for every field lookup. Compare this with the results of Shell 1 and Shell 2 and you can see a massive difference: each of those finished in a little over a second. The obvious lesson here, as Brock Noland pointed out in his post, is that cut and awk are not the best solution in a case like this.
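To make the contrast concrete, here is a minimal sketch of both ways of pulling fields out of a single record. The record in `sample` is a made-up line in /etc/passwd format, not taken from a real system, and the variable names are only for illustration.

```shell
#!/bin/bash
# Hypothetical sample record in /etc/passwd format (an assumption for this sketch).
sample='svc:x:42:42:Example Service:/var/empty:/sbin/nologin'

# External-tool version: the command substitution and the pipeline
# start new processes just to extract one field.
uid_cut=$(printf '%s\n' "$sample" | cut -d: -f3)

# Builtin version: read splits the record on ":" inside the current shell,
# and gives us every field at once.
IFS=: read -r username pp uid gid gecos hd shell <<EOF
$sample
EOF

printf 'cut: %s  builtin: %s  user: %s  shell: %s\n' \
    "$uid_cut" "$uid" "$username" "$shell"
```

Both approaches produce the same UID; the builtin one simply gets there without leaving the shell, which is where the timing gap comes from when the work is repeated for every line.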
The last two functions, whose results are displayed in Shell 3 and Shell 4, are my modifications. I was able to shave enough off to get them both below a second in execution time. I did this by moving the IFS handling out of the loops: IFS is saved, set to a colon, and restored once per function call instead of once for every line read. This saves a tiny bit of work for the processor on each line. On big jobs, those savings add up to bigger time differences.
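The hoisting idea can be sketched on its own, outside the benchmark. The function name `list_nologin` and the temporary sample file are assumptions made for this example so that it runs the same way on any machine; it is not one of the timed functions above.

```shell
#!/bin/bash
# Hypothetical sample data standing in for /etc/passwd.
sample_file=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'svc:x:42:42:Example Service:/var/empty:/sbin/nologin' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash' >"$sample_file"

list_nologin() {
    # Save, set, and restore IFS once per call; nothing here repeats per line.
    oldifs=$IFS
    IFS=:
    while read -r username pp uid gid gecos hd shell; do
        if [ "$uid" -gt 10 ] && [ "$shell" = /sbin/nologin ]; then
            printf '%s\n' "$username"
        fi
    done <"$1"
    IFS=$oldifs
}

result=$(list_nologin "$sample_file")
rm -f "$sample_file"
printf '%s\n' "$result"
```

With the sample data, only the `svc` account has a UID above 10 and /sbin/nologin as its shell, so that is the only name printed.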
Don't use cut or awk for jobs like this. Let the shell do the work.