Monday, January 11, 2016

Profile your Python code

In many cases, when you write a Python script, you find that it takes a long time to run. You want to optimize it, but you don't know which part can be made faster. I found that line_profiler is a great tool for profiling your code line by line. It is also very easy to use; I use it almost every time I write new Python code.

Step 1 install
$ sudo pip install line_profiler

Step 2 add the @profile decorator to the functions you want to profile
Suppose I have a script called example.py with two functions that convert the output of the range function to a list of strings; the first uses an explicit loop and the second uses a list comprehension:
######################################################################
@profile
def example_function(myRange):
    # directly convert range to string list
    str_list = []
    for i in myRange:
        str_list.append(str(i))
     
@profile
def example_function2(myRange):
    # use list comprehension to convert range to string list
    str_list = [str(i) for i in myRange]
     
example_function(range(1000000))
example_function2(range(1000000))
######################################################################
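One thing to note about the script above: the name profile is not imported anywhere. kernprof injects it into the builtins when it runs the script with -l, so running the file with plain python raises a NameError. A minimal sketch of a fallback, assuming you want the same file to work both ways, is to define a do-nothing decorator when the name is missing. The return statement and the small range(5) input are added here only so the result can be checked; they are not part of the original example.

```python
try:
    profile  # defined by kernprof -l at run time
except NameError:
    def profile(func):
        """Fallback used when not profiling: return the function unchanged."""
        return func


@profile
def example_function(myRange):
    # directly convert range to string list
    str_list = []
    for i in myRange:
        str_list.append(str(i))
    return str_list


result = example_function(range(5))
print(result)
```

With this guard in place the script runs normally under python and is still profiled line by line under kernprof -l.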

Step 3 run kernprof script to profile it
$ kernprof -l -v example.py
Wrote profile results to example.py.lprof
Timer unit: 1e-06 s

Total time: 0.701441 s
File: example.py
Function: example_function at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def example_function(myRange):
     3                                               # directly convert range to string list
     4         1            2      2.0      0.0      str_list = []
     5   1000001       248127      0.2     35.4      for i in myRange:
     6   1000000       453312      0.5     64.6          str_list.append(str(i))

Total time: 0.416699 s
File: example.py
Function: example_function2 at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def example_function2(myRange):
    10                                               # use list comprehension to convert range to string list
    11   1000001       416699      0.4    100.0      str_list = [str(i) for i in myRange]
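
The columns can be cross-checked with a little arithmetic. Time is reported in timer units (1e-06 s here), Per Hit is Time divided by Hits, and % Time is the line's share of the function's total time. The sketch below only replays the numbers from the example_function table above; the variable names are made up for illustration.

```python
timer_unit = 1e-06  # seconds, from the "Timer unit" line of the output

# (line number, hits, time in timer units), copied from the example_function table
rows = [(4, 1, 2), (5, 1000001, 248127), (6, 1000000, 453312)]

# The "Total time" of a function is the sum of its per-line times
total_units = sum(time_units for _, _, time_units in rows)
total_seconds = total_units * timer_unit

summary = {}
for line_no, hits, time_units in rows:
    per_hit = round(time_units / hits, 1)               # "Per Hit" column
    percent = round(100 * time_units / total_units, 1)  # "% Time" column
    summary[line_no] = (per_hit, percent)
    print(line_no, per_hit, percent)

print("total seconds:", round(total_seconds, 6))
```

The results agree with the table: line 5 comes out at 0.2 per hit and 35.4 %, line 6 at 0.5 per hit and 64.6 %, and the total is 0.701441 s.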

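Conclusion:
Now you can see the time spent on each line of both functions. Under the profiler, the list comprehension version took about 0.42 s in total versus about 0.70 s for the explicit loop, which shows the benefit of using a list comprehension here.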
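
Line-by-line profiling adds some overhead to every profiled line, so it can be worth confirming the comparison with the standard library's timeit module, which times the functions without the profiler. This is a minimal sketch, assuming undecorated copies of the two functions; the names loop_version and comprehension_version and the smaller input size are choices made here, not part of the original script.

```python
import timeit


def loop_version(my_range):
    # explicit loop, as in example_function
    str_list = []
    for i in my_range:
        str_list.append(str(i))
    return str_list


def comprehension_version(my_range):
    # list comprehension, as in example_function2
    return [str(i) for i in my_range]


n = 100000  # smaller than the original 1000000 to keep the run short

# Take the best of 3 repeats, each running the function 5 times
loop_time = min(timeit.repeat(lambda: loop_version(range(n)), number=5, repeat=3))
comp_time = min(timeit.repeat(lambda: comprehension_version(range(n)), number=5, repeat=3))

print("loop:          %.4f s" % loop_time)
print("comprehension: %.4f s" % comp_time)
```

The absolute numbers depend on the machine and the Python version, so treat them as a sanity check on the profiler's ranking, not as a benchmark.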

Troubleshooting:
When I first installed the package and ran the script with kernprof, I got the following error:
NameError: name 'profile' is not defined

Then I found the solution at: https://github.com/rkern/line_profiler/pull/25

It seems the version I installed from pip does not contain this commit, so you need to apply it manually if you hit the same problem (this should no longer be necessary once the developers publish the fix on pip). For now, what you need to do is apply this commit: