Profiling and Optimizing Python
Reprofiling
The results were surprising. I was hoping for a 50 to 100 percent performance improvement. Here is a timed run of the optimized parser with no profiler:
jmjones@qiwi 9:25PM parser % time python aggregate_parser.py \
~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt
0.47s user 0.01s system 99% cpu 0.481 total
Here is the parser run with the hotshot profiler:
jmjones@qiwi 5:43PM parser % time python aggregate_parser.py \
~/svn/ediplex/example_data/big_edi.txt
python aggregate_parser.py ~/svn/ediplex/example_data/big_edi.txt
0.42s user 0.49s system 87% cpu 1.042 total
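The profiling hooks live inside the script itself. As a point of reference, here is a minimal sketch of the kind of wrapper that produces a stats file like hotshot_edi_stats; the wrapper is my assumption, not the article's actual source, and how main() receives the input file path is not shown, so calling it with no arguments is purely illustrative.

import hotshot

from aggregate_parser import main       # assumed import; main() appears in the profile output below

prof = hotshot.Profile("hotshot_edi_stats")   # log file later loaded with hotshot.stats
prof.runcall(main)                            # run the parser under the profiler
prof.close()                                  # flush the profile data to disk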
Without the profiler, this is a performance boost of about 7X over the pre-optimization code. Removing the Python interpreter startup time, library imports, and such from both sides of the equation, it was actually a performance boost of 10X. The timeit module shows a best time of 2.3 seconds for parsing the file before optimization and 0.22 seconds after. Here is the statistical data from running the parser:
In [1]: from hotshot import stats
In [2]: s = stats.load("hotshot_edi_stats")
In [3]: s.sort_stats("time").print_stats()
11005 function calls in 0.436 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1000 0.338 0.000 0.374 0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:32(is_valid_header)
7000 0.036 0.000 0.036 0.000 /usr/lib/python2.4/string.py:248(strip)
1001 0.035 0.000 0.428 0.000 aggregate_parser.py:57(parse_file)
1000 0.013 0.000 0.013 0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:15(__init__)
1 0.008 0.008 0.436 0.436 aggregate_parser.py:128(main)
1000 0.006 0.000 0.006 0.000 /home/jmjones/svn/ediplex/src/ediplex/parser/x12_parser.py:64(is_valid_footer)
1 0.000 0.000 0.000 0.000 aggregate_parser.py:122(get_canned_aggregate_parser)
1 0.000 0.000 0.000 0.000 aggregate_parser.py:15(__init__)
1 0.000 0.000 0.000 0.000 aggregate_parser.py:107(add_parser)
0 0.000 0.000 profile:0(profiler)
Out[3]: <pstats.Stats instance at 0xb798f4cc>
The profiler shows far fewer methods than before the optimization. The majority of the time is now spent in methods that actually do something rather than in no-ops or state transitions. All of the features that had no-op placeholders are gone and will run only when a user decides to call them.
The profiler also shows far fewer method calls. Before, two methods, body_seg() and segment(), were each called over 130,000 times. Function calls were fairly expensive in previous versions of Python--and I believe they still are, though to a lesser degree--which explains why 130,000 calls to even a no-op created a performance bottleneck.
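As a quick, self-contained illustration (not from the article) of that overhead, timing a loop that makes 130,000 calls to a no-op function against the same loop with the call removed shows the per-call cost directly:

# call_overhead.py -- illustrative only; Python 2 syntax to match the article
import timeit

def noop():
    pass

def with_calls():
    for i in range(130000):
        noop()             # pays Python's function-call overhead each iteration

def without_calls():
    for i in range(130000):
        pass               # same loop body, no call

print timeit.Timer("with_calls()",
                   "from __main__ import with_calls").timeit(number=10)
print timeit.Timer("without_calls()",
                   "from __main__ import without_calls").timeit(number=10)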
I heavily optimized the method that searches for the end of an EDI interchange (now called is_valid_footer()) but hardly touched the one that identifies the beginning of an interchange (now called is_valid_header()). is_valid_header() is now the biggest bottleneck: it takes 10 times longer than the next method in the list, which happens to be the standard library's string.strip(), and accounts for about 77 percent of the total processing time. The only section of is_valid_header() that does any real work is a for loop that iterates over range(106) and checks the header string for correctness. If I really wanted to optimize is_valid_header(), I would remove the for loop and either do the string check with a (shudder) regex or write a function in C and wrap it with SWIG. I may implement one of these optimizations just because of that nagging bottleneck (and for fun!), but 2,080 2K interchanges per second including Python interpreter startup and library imports, or 4,500 2K interchanges per second excluding them, is already faster than I had hoped to take this parser.
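To make the loop-versus-regex trade-off concrete, here is a hypothetical sketch; the actual validation rules in x12_parser.py are not shown in the article, so the check below (printable ASCII in a 106-character header that starts with "ISA") is invented purely for illustration:

# header_check_sketch.py -- hypothetical, not the real x12_parser code
import re

HEADER_LEN = 106

def check_with_loop(header):
    # one Python-level loop iteration and comparison per character
    if len(header) < HEADER_LEN or header[:3] != "ISA":
        return False
    for i in range(HEADER_LEN):
        if not (32 <= ord(header[i]) <= 126):   # illustrative rule only
            return False
    return True

# the same illustrative rule as a single compiled regex; the matching
# loop runs in C rather than as Python bytecode
HEADER_RE = re.compile(r"ISA[\x20-\x7e]{103}")

def check_with_regex(header):
    return HEADER_RE.match(header) is not None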
Just for the sake of comparison, here is the timeit run of the optimized code:
jmjones@qiwi 4:33PM parser % python aggregate_parser.py
~/svn/ediplex/example_data/edi_1000_interchanges.txt
[0.4770960807800293, 0.47198605537414551, 0.43927288055419922,
0.29295921325683594, 0.21883893013000488, 0.21800088882446289,
0.2185051441192627, 0.21764492988586426, 0.23419904708862305,
0.21747803688049316] :: 0.21747803688
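A list of timings followed by the best time is what timeit's repeat() produces; here is a minimal sketch of how output like that could be generated, with run_parse() standing in as a hypothetical wrapper around whatever the real script actually calls:

# timing_sketch.py -- run_parse() is a hypothetical stand-in for the real parsing call
import timeit

def run_parse():
    # hypothetical wrapper: parse the sample EDI file once
    pass

timer = timeit.Timer("run_parse()", "from __main__ import run_parse")
results = timer.repeat(repeat=10, number=1)   # ten runs, one parse each
print results, "::", min(results)             # the list of timings :: best time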
Conclusion
I have attempted to live by the "premature optimization is the root of all evil" maxim in the work surrounding this article. I spent very little time up front trying to predict where my code would be slow, and made very few concessions to preemptively make it run faster. I simply used the hotshot profiler, a tool in the Python standard library, found my bottlenecks, rearchitected my parser (which is probably a bit more extreme than what most folks will have to do), made some code changes, and saw a significant performance improvement. For any performance analysis of Python source code, the profiler is an indispensable tool.
Jeremy Jones is a software engineer who works for Predictix. His weapon of choice is Python.