Programmers typically place debuggers in the "uh oh" corner of their
toolboxes, somewhere between the network packet sniffer and the disassembler.
The only time we reach for a debugger is when something goes wrong or breaks
and our standard debugging techniques such as print statements (or better
yet, log messages) do not reveal the root of the problem. The Python standard
library contains an interactive source code debugger which suits "uh oh"
situations well.
The Python interactive source code debugger runs code in a controlled manner. It allows stepping through a piece of code one line at a time, walking up and back down call trees, setting break points, and using the power of the Python shell for various levels of introspection and control.
What's the big deal? I can do most of that by sprinkling print statements
through the source code (mostly resembling either
print "dir>>", dir() or print "variable_name>>",
variable_name) and running it again. While this is true, there are
issues of convenience and control.
Regarding convenience, sometimes it is much more convenient to drop in to a
debugger to see what is going on right in front of your eyes and poke at your
code while at a Python prompt rather than having to modify the code and rerun
it. What if you are trying to debug a database application in which the bug occurs after retrieving a set of data that took tens of seconds to retrieve?
Worse still, what if you have a bug in a computationally intense application
that occurs after processing several hours' worth of data? On the first run
of a program, you might nearly break even between the interactive
debugger and the print technique of debugging. But chances are
you will not have gathered enough data on the first run to solve the problem
successfully. The payback comes when the print technique would have taken
several runs and multiple print inserts into the source code to solve the problem.
With the debugger, you can do an exhaustive amount of information gathering and
analysis and, hopefully, solve the problem all at once.
Regarding control, which overlaps with convenience, debugging an application at a prompt, as opposed to modifying source code and rerunning it, provides an immediate level of control. Sometimes it is easier to figure out what is going on with a piece of code if you have live objects at your fingertips and can interact with them through a prompt, especially if you are using a powerful shell such as IPython. This is one of the general reasons Python is a powerful language: the interactive prompt provides immediate, interactive control over the objects living in a piece of code.
The pdb module contains the debugger. pdb contains
one class, Pdb, which inherits from bdb.Bdb. The
debugger documentation mentions six functions, which create an interactive
debugging session:
pdb.run(statement[, globals[, locals]])
pdb.runeval(expression[, globals[, locals]])
pdb.runcall(function[, argument, ...])
pdb.set_trace()
pdb.post_mortem(traceback)
pdb.pm()
Each of the six functions provides a slightly different mechanism for dropping a user into the debugger.
pdb.run(statement[, globals[, locals]])
pdb.run() executes the string statement under the
debugger's control. Global and local dictionaries are optional parameters:
    #!/usr/bin/env python
    import pdb

    def test_debugger(some_int):
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    if __name__ == "__main__":
        pdb.run("test_debugger(0)")
pdb.runeval(expression[, globals[, locals]])
pdb.runeval() is identical to pdb.run(), except
that pdb.runeval() returns the value of the evaluated string
expression:
    #!/usr/bin/env python
    import pdb

    def test_debugger(some_int):
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    if __name__ == "__main__":
        pdb.runeval("test_debugger(0)")
pdb.runcall(function[, argument, ...])
pdb.runcall() calls the specified function and
passes any specified arguments to it:
    #!/usr/bin/env python
    import pdb

    def test_debugger(some_int):
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    if __name__ == "__main__":
        pdb.runcall(test_debugger, 0)
pdb.set_trace()
pdb.set_trace() drops into the debugger at the point where execution
reaches the call:
    #!/usr/bin/env python
    import pdb

    def test_debugger(some_int):
        pdb.set_trace()
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    if __name__ == "__main__":
        test_debugger(0)
pdb.post_mortem(traceback)
pdb.post_mortem() performs postmortem debugging of the
specified traceback:
    #!/usr/bin/env python
    import pdb

    def test_debugger(some_int):
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    if __name__ == "__main__":
        try:
            test_debugger(0)
        except:
            import sys
            tb = sys.exc_info()[2]
            pdb.post_mortem(tb)
pdb.pm()
pdb.pm() performs postmortem debugging of the traceback
contained in sys.last_traceback:
    #!/usr/bin/env python
    import pdb
    import sys

    def test_debugger(some_int):
        print "start some_int>>", some_int
        return_int = 10 / some_int
        print "end some_int>>", some_int
        return return_int

    def do_debugger(type, value, tb):
        pdb.pm()

    if __name__ == "__main__":
        sys.excepthook = do_debugger
        test_debugger(0)
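A possible variation on this hook, sketched for Python 3: because sys.excepthook already receives the traceback as its third argument, the hook can pass it straight to pdb.post_mortem() instead of relying on sys.last_traceback. The names make_debug_hook and debug are hypothetical; the debug parameter exists only so the hook can be exercised here without opening a prompt.

```python
import pdb
import sys
import traceback


def make_debug_hook(debug=pdb.post_mortem):
    def hook(exc_type, exc_value, tb):
        # Print the usual traceback, then hand the traceback to the debugger.
        traceback.print_exception(exc_type, exc_value, tb)
        debug(tb)
    return hook


# Normal use would be: sys.excepthook = make_debug_hook()
seen = []
hook = make_debug_hook(debug=seen.append)
try:
    10 // 0
except ZeroDivisionError:
    hook(*sys.exc_info())

print(len(seen))
```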
Now that I've shown you how to get into a debugging session, it's time for a
simple example. The Python standard library
reference documentation includes the Python
debugger commands, but I will go over commands as I introduce them. The
following example script starts by calling f1(), which calls
f2(), which calls f3(), which calls
f4(), and then returns back up the chain. Running the script
immediately drops into a debugging session:
    #!/usr/bin/env python

    import pdb



    def f1(some_arg):
        print some_arg
        some_other_arg = some_arg + 1
        return f2(some_other_arg)

    def f2(some_arg):
        print some_arg
        some_other_arg = some_arg + 1
        return f3(some_other_arg)

    def f3(some_arg):
        print some_arg
        some_other_arg = some_arg + 1
        return f4(some_other_arg)

    def f4(some_arg):
        print some_arg
        some_other_arg = some_arg + 1
        return some_other_arg

    if __name__ == "__main__":
        pdb.runcall(f1, 1)
When I run this piece of code, I immediately get a (Pdb)
prompt, like this:
jmjones@bean:~/debugger $ python simple_debugger_example.py
> /home/jmjones/debugger/simple_debugger_example.py(8)f1()
-> print some_arg
(Pdb)
A Note on Commands: Most debugger commands have abbreviations. For example, you can use either
the full command step or its abbreviation s. This article shows the abbreviation in parentheses, as in (s)tep.
First, I wanted to step down into f2() using the
(s)tep command, and then see where I was with the
(l)ist command:
(Pdb) s
1
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(9)f1()
-> some_other_arg = some_arg + 1
(Pdb) s
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(10)f1()
-> return f2(some_other_arg)
(Pdb) s
--Call--
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(12)f2()
-> def f2(some_arg):
(Pdb) s
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(13)f2()
-> print some_arg
(Pdb) l
8 print some_arg
9 some_other_arg = some_arg + 1
10 return f2(some_other_arg)
11
12 def f2(some_arg):
13 -> print some_arg
14 some_other_arg = some_arg + 1
15 return f3(some_other_arg)
16
17 def f3(some_arg):
18 print some_arg
(Pdb)
The step command executes the next piece of code that it can
and returns a debugger prompt. If the next piece of code to execute is inside a
function, it steps inside the function. The list command shows
five lines above and five lines below the debugger's current position in the
code. The list command above shows that the debugger is on line 13
with the -> characters. Instead of repeating the
step command, I could have just pressed Enter, which repeats the
previous command.
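The stepping behavior just described can also be replayed from a script. This is a Python 3 sketch under a few assumptions: the commands come from a string passed as the stdin argument of pdb.Pdb instead of from the keyboard, the call chain is shortened to two functions, and the print calls are left out to keep the transcript short.

```python
import io
import pdb


def f1(some_arg):
    some_other_arg = some_arg + 1
    return f2(some_other_arg)


def f2(some_arg):
    some_other_arg = some_arg + 1
    return some_other_arg


# Three steps: to the return line of f1, into the call of f2, and onto
# the first line of f2's body. Then let the program run to completion.
commands = io.StringIO("step\nstep\nstep\ncontinue\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript,
                   readrc=False, nosigint=True)
result = debugger.runcall(f1, 1)

print(transcript.getvalue())
print(result)
```

The transcript should contain the same --Call-- marker seen in the session above at the moment the debugger enters f2().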
My next trick is to set a break point in f4() using the
(b)reak command, and continue to the break point using the
(c)ontinue command:
(Pdb) b f4
Breakpoint 1 at /home/jmjones/svn/articles/debugger/simple_debugger_example.py:22
(Pdb) c
2
3
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(23)f4()
-> print some_arg
(Pdb) l
18 print some_arg
19 some_other_arg = some_arg + 1
20 return f4(some_other_arg)
21
22 B def f4(some_arg):
23 -> print some_arg
24 some_other_arg = some_arg + 1
25 return some_other_arg
26
27 if __name__ == "__main__":
28 pdb.runcall(f1, 1)
(Pdb)
The break command creates break points that the debugger will
stop at when it encounters them. The continue command tells the
debugger to keep executing code until it hits a break point or the program finishes.
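The break-then-continue pattern can be scripted in the same way. This is a Python 3 sketch with the commands supplied as a string (an assumption made so it runs unattended). Setting a break point by function name requires the debugger to find the source file, so the sketch is meant to be saved and run as a script rather than pasted at a prompt.

```python
import io
import pdb


def f1(some_arg):
    return f2(some_arg + 1)


def f2(some_arg):
    return f3(some_arg + 1)


def f3(some_arg):
    return f4(some_arg + 1)


def f4(some_arg):
    some_other_arg = some_arg + 1
    return some_other_arg


# Set a break point on f4, run to it, print its argument, then carry on.
commands = io.StringIO("break f4\ncontinue\np some_arg\ncontinue\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript,
                   readrc=False, nosigint=True)
result = debugger.runcall(f1, 1)

print(transcript.getvalue())
print(result)
```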
I next issued a (w)here command:
(Pdb) where
/usr/local/python24/lib/python2.4/bdb.py(404)runcall()
-> res = func(*args, **kwds)
/home/jmjones/svn/articles/debugger/simple_debugger_example.py(10)f1()
-> return f2(some_other_arg)
/home/jmjones/svn/articles/debugger/simple_debugger_example.py(15)f2()
-> return f3(some_other_arg)
/home/jmjones/svn/articles/debugger/simple_debugger_example.py(20)f3()
-> return f4(some_other_arg)
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(23)f4()
-> print some_arg
(Pdb)
The where command prints out a stack trace, showing the call
tree from f1() to f4(). This can help you see how a
function is being called.
I then navigated up the stack trace with the (u)p command and
saw where I was with the list command:
(Pdb) u
> /home/jmjones/debugger/simple_debugger_example.py(20)f3()
-> return f4(some_other_arg)
(Pdb) u
> /home/jmjones/debugger/simple_debugger_example.py(15)f2()
-> return f3(some_other_arg)
(Pdb) u
> /home/jmjones/debugger/simple_debugger_example.py(10)f1()
-> return f2(some_other_arg)
(Pdb) u
> /usr/local/python24/lib/python2.4/bdb.py(404)runcall()
-> res = func(*args, **kwds)
(Pdb) u
*** Oldest frame
(Pdb) l
399 self.reset()
400 sys.settrace(self.trace_dispatch)
401 res = None
402 try:
403 try:
404 -> res = func(*args, **kwds)
405 except BdbQuit:
406 pass
407 finally:
408 self.quitting = 1
409 sys.settrace(None)
(Pdb)
The up command moves the debugger up a frame in the stack trace
to an older frame. In this example, it took me all the way up to the oldest
frame, which is in one of the debugger's own modules, bdb.py. This is because
the debugger (part of which is in bdb.py) is actually executing this
example script.
I navigated down the stack trace a few frames with the (d)own
command, which is the opposite of the up command:
(Pdb) d
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(10)f1()
-> return f2(some_other_arg)
(Pdb) d
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(15)f2()
-> return f3(some_other_arg)
(Pdb) l
10 return f2(some_other_arg)
11
12 def f2(some_arg):
13 print some_arg
14 some_other_arg = some_arg + 1
15 -> return f3(some_other_arg)
16
17 def f3(some_arg):
18 print some_arg
19 some_other_arg = some_arg + 1
20 return f4(some_other_arg)
(Pdb)
I moved down two frames into f2(), but what does it really
mean to be in a different frame? As an illustration, I've
printed out a couple of variables in two different frames:
(Pdb) print some_arg
2
(Pdb) d
> /home/jmjones/svn/articles/debugger/simple_debugger_example.py(20)f3()
-> return f4(some_other_arg)
(Pdb) l
15 return f3(some_other_arg)
16
17 def f3(some_arg):
18 print some_arg
19 some_other_arg = some_arg + 1
20 -> return f4(some_other_arg)
21
22 B def f4(some_arg):
23 print some_arg
24 some_other_arg = some_arg + 1
25 return some_other_arg
(Pdb) print some_arg
3
(Pdb)
I went from f2() down to f3() and printed out
some_arg in both of them. I saw 2 from f2() and 3
from f3(). What would happen if I were to step
forward? I went back up into f2() to see a clearer illustration:
(Pdb) u
> /home/jmjones/debugger/simple_debugger_example.py(15)f2()
-> return f3(some_other_arg)
(Pdb) l
10 return f2(some_other_arg)
11
12 def f2(some_arg):
13 print some_arg
14 some_other_arg = some_arg + 1
15 -> return f3(some_other_arg)
16
17 def f3(some_arg):
18 print some_arg
19 some_other_arg = some_arg + 1
20 return f4(some_other_arg)
(Pdb) s
4
> /home/jmjones/debugger/simple_debugger_example.py(24)f4()
-> some_other_arg = some_arg + 1
(Pdb) l
19 some_other_arg = some_arg + 1
20 return f4(some_other_arg)
21
22 B def f4(some_arg):
23 print some_arg
24 -> some_other_arg = some_arg + 1
25 return some_other_arg
26
27 if __name__ == "__main__":
28 pdb.runcall(f1, 1)
[EOF]
(Pdb)
If the debugger really were in f2(), stepping forward
should have taken it into f3() if it hadn't yet made the call to
f3(), or up into f1() if f3() had already
returned. Instead, it jumped to line 24 of f4(), the line after the one
where execution had actually stopped. Navigating up or down a
stack trace changes which frame's local namespace you are inspecting, but it
does not move the point of execution.
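The idea that every frame carries its own local namespace can be seen outside the debugger as well. The following Python 3 sketch uses inspect.currentframe() and the frame's f_back link to read the same variable name from two frames, much as the up command lets you do at the prompt. It assumes CPython, where frame objects are available.

```python
import inspect


def f2(some_arg):
    some_other_arg = some_arg + 1
    return f3(some_other_arg)


def f3(some_arg):
    current = inspect.currentframe()
    caller = current.f_back  # the frame the "up" command would select
    # Same name, two frames, two different values.
    return current.f_locals["some_arg"], caller.f_locals["some_arg"]


in_f3, in_f2 = f2(2)
print(in_f3, in_f2)
```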
Those are the majority of my most frequently used commands.
Now that you know what the debugger is, how to get it to run your code, and what the basic commands do, it's time to walk through a meatier example. The following code reads in a text file a line at a time, splits the line on white space, and converts the line to a dictionary with stringified word positions as its keys and the integer values of the words themselves as values for those keys:
    #!/usr/bin/env python

    import pdb
    import string
    import sys

    class ConvertToDict:
        def __init__(self):
            self.tmp_dict = {}
            self.return_dict = {}
        def walk_string(self, some_string):
            '''walk given text string and return a dictionary.
            Maintain state in instance attributes in case we hit an exception'''
            l = string.split(some_string)
            for i in range(len(l)):
                key = str(i)
                self.tmp_dict[key] = int(l[i])
            return_dict = self.tmp_dict
            self.return_dict = self.tmp_dict
            self.reset()
            return return_dict
        def reset(self):
            '''clean up'''
            self.tmp_dict = {}
            self.return_dict = {}
        def get_number_dict(self, some_string):
            '''do super duper exception handling here'''
            try:
                return self.walk_string(some_string)
            except:
                #if we hit an exception, we can rely on tmp_dict being a backup to the point of the exception
                return self.tmp_dict

    def main():
        ctd = ConvertToDict()
        for line in file(sys.argv[1]):
            line = line.strip()
            print "*" * 40
            print "line>>", line
            print ctd.get_number_dict(line)
            print "*" * 40

    if __name__ == "__main__":
        #pdb.runcall(main)
        main()
Note that I have pdb.runcall(main) commented out. This will
make it easy to drop into the debugger in a moment. Here is a simple example
input file:
jmjones@bean:~/debugger $ cat simple_example.data
1234 2345 3456 4567
9876 8765 7654 6543
Here is the output from running the script on that file:
jmjones@bean:~/debugger $ python example_debugger.py simple_example.data
****************************************
line>> 1234 2345 3456 4567
{'1': 2345, '0': 1234, '3': 4567, '2': 3456}
****************************************
****************************************
line>> 9876 8765 7654 6543
{'1': 8765, '0': 9876, '3': 6543, '2': 7654}
****************************************
Now, given the following input file:
jmjones@bean:~/debugger $ cat example_debugger.data
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4
1 2 3 4
the script produces this output:
jmjones@bean:~/debugger $ python example_debugger.py example_debugger.data
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
****************************************
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3}
****************************************
Something is obviously amiss. The first and second lines should be the same, as should the third and fourth lines. What to do? Start randomly
inserting print statements throughout the source file or fire up
the debugger? How about fire up the debugger? Uncomment the debugger line,
comment out the main() line in the source file, and then:
jmjones@bean:~/debugger $ python example_debugger.py example_debugger.data
> /home/jmjones/debugger/example_debugger.py(35)main()
-> ctd = ConvertToDict()
(Pdb)
This greeted me with a friendly debugger prompt. What next? I always like to
know where I am, so doing a list is a good start:
(Pdb) l
30 except:
31 #if we hit an exception, we can rely on tmp_dict
being a backup to the point of the exception
32 return self.tmp_dict
33
34 def main():
35 -> ctd = ConvertToDict()
36 for line in file(sys.argv[1]):
37 line = line.strip()
38 print "*" * 40
39 print "line>>", line
40 print ctd.get_number_dict(line)
At this point, I could either step through the file a line at a time and examine everything, or formulate a hypothesis and try to prove or disprove it. Formulating a hypothesis also helps to rule out some obvious dead ends. I thought about things I could rule out:
- get_number_dict() is not corrupting the input string, because
it is passing it straight to walk_string().
What could it be? The problem appears to be happening on either line 1 or 2 of the input file and causing problems as far as line 3. I wondered about things I might not have watched for:
- The code might not be resetting tmp_dict and/or
return_dict properly.
I decided to set break points at lines 14 (the first line of code in the
walk_string() method) and 18 (the first line of code after the
for loop in the walk_string() method), run through
execution of the script once while inspecting variables, and watch for
anomalies.
(Pdb) b 14
Breakpoint 1 at /home/jmjones/debugger/example_debugger.py:14
(Pdb) b 18
Breakpoint 2 at /home/jmjones/debugger/example_debugger.py:18
(Pdb) c
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
> /home/jmjones/svn/home/debugger/example_debugger.py(14)walk_string()
-> l = string.split(some_string)
To get my bearings, I listed where I was:
(Pdb) l
9 self.tmp_dict = {}
10 self.return_dict = {}
11 def walk_string(self, some_string):
12 '''walk given text string and return a dictionary.
13 Maintain state in instance attributes in case
we hit an exception'''
14 B-> l = string.split(some_string)
15 for i in range(len(l)):
16 key = str(i)
17 self.tmp_dict[key] = int(l[i])
18 B return_dict = self.tmp_dict
19 self.return_dict = self.tmp_dict
You can see I was at line 14, and you can also see the break points at lines
14 and 18. Just out of paranoia, I checked what was in some_string,
even though that printed out after the continue command:
(Pdb) print some_string
1 2 3 4 5 6 7 8 9 10
The continue command dropped me down to the next break point, at
line 18:
(Pdb) c
> /home/jmjones/svn/articles/debugger/example_debugger.py(18)walk_string()
-> return_dict = self.tmp_dict
At this point, self.tmp_dict has everything in it that it is
going to have:
(Pdb) print self.tmp_dict
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
This looked good. I continued and printed out the next input line:
(Pdb) c
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
****************************************
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
> /home/jmjones/svn/articles/debugger/example_debugger.py(14)walk_string()
-> l = string.split(some_string)
(Pdb) print some_string
1 2 3 4 5 6 7 8 9 10
The input line looked fine. I continued further, knowing that the debugger should stop at the break point set at line 18. Then I could run through a couple of diagnostics and see whether everything looked OK:
(Pdb) c
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
****************************************
****************************************
line>> 1 2 3 4
> /home/jmjones/svn/articles/debugger/example_debugger.py(14)walk_string()
-> l = string.split(some_string)
Huh? It dropped me back to line 14 rather than on 18, and I was on the next data input line. This means that the interpreter returned before it hit line 18, so it did not fully process the second line of data from the data file. I just found where the problem was, but not what was causing it. I had to run through it again, a little more carefully this time, in order to figure it out. Fortunately, I could extract a little information from this run to help out with the next run.
If walk_string() returned before it hit line 18, it could not have
done a proper reset(). What did self.return_dict and
self.tmp_dict contain just then?
(Pdb) print self.return_dict
{}
(Pdb) print self.tmp_dict
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
That's why the program had too much information in the return dictionary for
data line 3; it carried the dictionary self.tmp_dict with it from
the data line that went bad. I felt pretty confident that the problem occurred
between lines 14 and 18 when processing line 2 of the data file. I just didn't
know why--yet.
The only reason the interpreter would have returned between lines 14 and 18
is that it hit an exception. Because the debugger has a couple of nifty
postmortem functions, I modified my script to use one, fired it up to see where
it stopped, and started debugging from there. Here is the modified source file; pay special attention to the try/except block in the
get_number_dict() method:
    #!/usr/bin/env python

    import pdb
    import string
    import sys

    class ConvertToDict:
        def __init__(self):
            self.tmp_dict = {}
            self.return_dict = {}
        def walk_string(self, some_string):
            '''walk given text string and return a dictionary.
            Maintain state in instance attributes in case we hit an exception'''
            l = string.split(some_string)
            for i in range(len(l)):
                key = str(i)
                self.tmp_dict[key] = int(l[i])
            return_dict = self.tmp_dict
            self.return_dict = self.tmp_dict
            self.reset()
            return return_dict
        def reset(self):
            '''clean up'''
            self.tmp_dict = {}
            self.return_dict = {}
        def get_number_dict(self, some_string):
            '''do super duper exception handling here'''
            try:
                return self.walk_string(some_string)
            except:
                #modified exception handler - drop us into a debugger
                tb = sys.exc_info()[2]
                pdb.post_mortem(tb)
                #if we hit an exception, we can rely on tmp_dict
                #being a backup to the point of the exception
                return self.tmp_dict

    def main():
        ctd = ConvertToDict()
        for line in file(sys.argv[1]):
            line = line.strip()
            print "*" * 40
            print "line>>", line
            print ctd.get_number_dict(line)
            print "*" * 40

    if __name__ == "__main__":
        main()
I chose to kick off the postmortem debugger in the except clause of the
get_number_dict() method, because get_number_dict() is
the closest exception handler to where the exception must be occurring in the
walk_string() method. Here is the result of a run with a
list immediately after it for context:
jmjones@bean:~/debugger $ python example_debugger_pm.py example_debugger.data
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
****************************************
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
> /home/jmjones/debugger/example_debugger_pm.py(17)walk_string()
-> self.tmp_dict[key] = int(l[i])
(Pdb) list
12 '''walk given text string and return a dictionary.
13 Maintain state in instance attributes in case
we hit an exception'''
14 l = string.split(some_string)
15 for i in range(len(l)):
16 key = str(i)
17 -> self.tmp_dict[key] = int(l[i])
18 return_dict = self.tmp_dict
19 self.return_dict = self.tmp_dict
20 self.reset()
21 return return_dict
22 def reset(self):
This confirmed some of my earlier suspicions. It was hitting an
exception in the for loop. Now my goal was to figure out what
exception it hit and why:
(Pdb) for e in sys.exc_info(): print "EXCEPTION>>>", e
EXCEPTION>>> exceptions.AttributeError
EXCEPTION>>> Pdb instance has no attribute 'do_for'
EXCEPTION>>> <traceback object at 0x402ef784>
What? My code was not even calling a do_for() method or trying
to access any do_for attribute. This is a little gotcha that
hounded me for a bit as I was trying to print the last exception from within
this example code. You should be able to tell what is going on with a little
insight from the traceback module:
(Pdb) import traceback
(Pdb) traceback.print_stack()
File "example_debugger_pm.py", line 48, in ?
main()
File "example_debugger_pm.py", line 44, in main
print ctd.get_number_dict(line)
File "example_debugger_pm.py", line 33, in get_number_dict
pdb.post_mortem(tb)
File "/usr/local/python24/lib/python2.4/pdb.py", line 1009, in post_mortem
p.interaction(t.tb_frame, t)
File "/usr/local/python24/lib/python2.4/pdb.py", line 158, in interaction
self.cmdloop()
File "/usr/local/python24/lib/python2.4/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/usr/local/python24/lib/python2.4/cmd.py", line 218, in onecmd
return self.default(line)
File "/usr/local/python24/lib/python2.4/pdb.py", line 167, in default
exec code in globals, locals
File "<stdin>", line 1, in ?
Basically, the Python interpreter keeps track of any exceptions that running
code raises; it saves the last traceback when running code hits an exception.
The debugger is just another piece of code that the interpreter runs. When a
user feeds the debugger commands, those commands may raise exceptions within
the debugger itself. In this case, I gave the debugger a for
statement. It first tried to treat for as a debugger command by
looking up a do_for() method. That lookup raised the AttributeError shown
above, which then replaced my script's exception as the most recent one.
Such debugger exceptions can potentially pollute the interpreter with
unexpected exceptions and tracebacks that we users don't really want to see
when trying to debug our own code. Maybe there is a better way of writing a
debugger than combining the debugger code and the debugged code in the same
interpreter as shown here, but the Python team has produced a pretty tricky,
complex piece of code that otherwise works really well.
I started the debugging session over by quitting with the
(q)uit command, and restarted the debugger by kicking off the
script again:
(Pdb) q
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3}
****************************************
jmjones@bean:~/debugger $ python example_debugger_pm.py example_debugger.data
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
****************************************
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
> /home/jmjones/debugger/example_debugger_pm.py(17)walk_string()
-> self.tmp_dict[key] = int(l[i])
(Pdb)
Now, I could give a command to show the exception at this line:
(Pdb) !for e in sys.exc_info(): print "EXCEPTION", e
EXCEPTION exceptions.ValueError
EXCEPTION invalid literal for int(): 9
EXCEPTION <traceback object at 0x402ef0f4>
This is the same command as before, except I prepended it with a
! to tell the debugger that this is Python code, which it needs to
evaluate, rather than a debugger command. Because the debugger knows this is
Python code to evaluate, it doesn't try to execute a do_for()
method and generate an exception.
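The effect of the ! prefix can be replayed in a scripted session. This Python 3 sketch makes two assumptions: the commands are supplied as a string through the stdin argument of pdb.Pdb, and parse is a made-up stand-in for walk_string(). Because the local variable is named l, typing l[0] without the prefix would be read as the (l)ist command; with the prefix, the debugger evaluates it as Python and shows the value's repr.

```python
import io
import pdb


def parse(line):
    l = line.split()
    return l


# "next" runs the assignment; "!l[0]" is then evaluated as Python code.
commands = io.StringIO("next\n!l[0]\ncontinue\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript,
                   readrc=False, nosigint=True)
result = debugger.runcall(parse, "9\b9 10")

print(transcript.getvalue())
```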
The line self.tmp_dict[key] = int(l[i]) raised a ValueError
exception because it could not convert "9" to an int? That is
really weird. Sometimes, things aren't exactly what they seem, though.
What exactly was the input? Take a look:
(Pdb) !print l[i]
9
That looks pretty normal to me. When I didn't feed l[i] to
print, what happened?
(Pdb) !l[i]
'9\x089'
The mystery is pretty much over. The input data contained some funky values
that masked themselves. I did the same thing with some_string
(which was a single input line from the data file):
(Pdb) !print some_string
1 2 3 4 5 6 7 8 9 10
This looked pretty normal as well. Here it is when I don't
print it:
(Pdb) !some_string
'1 2 3 4 5 6 7 8 9\x089 10'
The \x08 character is \b, a backspace, so when the code prints out the input line, it prints 1 2 3 4 5 6 7 8 9,
backspaces over the 9, and then prints out 9 10. If you
ask the interpreter what the value of the input line is, it shows you the
string value--including the hex value of unprintable characters.
The ValueError exception is entirely expected in this
situation. Here is the result, at a debugger prompt, of trying to get the integer
value of the same kind of string:
(Pdb) int("9\b9")
*** ValueError: invalid literal for int(): 9
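The difference between printing a string and looking at its representation is easy to reproduce outside the debugger. A short Python 3 sketch, using the same kind of corrupt token (the exact wording of the error message differs from the Python 2 message shown above):

```python
token = "9\b9"

print(token)        # a terminal applies the backspace when displaying this
print(repr(token))  # the repr spells the backspace out as an escape

try:
    int(token)
except ValueError as error:
    message = str(error)
    print(message)
```

When a value looks fine but misbehaves, checking its repr is usually the quickest way to spot hidden characters.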
The problem with the example code boils down to a couple of things:
1. Improper exception handling
2. Improper cleanup
The first item (improper exception handling) caused the code to create a
dictionary of part of the corrupt input line. It created a dictionary with keys
of 0 through 7.
The second item (improper cleanup) caused the code to use the dictionary
that existed from the corrupt line (line 2 from the input data file) for the
following line (line 3 from the input data file). That is why the first
1 2 3 4 line contained dictionary keys of 0 through
7 rather than 0 through 3.
Interestingly, if the code handled the exception of converting a string to an
int properly, cleaning up would not have been an issue.
Here is a better version of the code, which does better exception handling and cleans up better in the case of a catastrophic error:
    #!/usr/bin/env python

    import pdb
    import string
    import sys

    class ConvertToDict:
        def __init__(self):
            self.tmp_dict = {}
            self.return_dict = {}
        def walk_string(self, some_string):
            '''walk given text string and return a dictionary.
            Maintain state in instance attributes in case we hit an exception'''
            l = string.split(some_string)
            for i in range(len(l)):
                key = str(i)
                try:
                    self.tmp_dict[key] = int(l[i])
                except ValueError:
                    self.tmp_dict[key] = None
            return_dict = self.tmp_dict
            self.return_dict = self.tmp_dict
            self.reset()
            return return_dict
        def reset(self):
            '''clean up'''
            self.tmp_dict = {}
            self.return_dict = {}
        def get_number_dict(self, some_string):
            '''do slightly better exception handling here'''
            try:
                return self.walk_string(some_string)
            except:
                #if we hit an exception, we can rely on tmp_dict
                #being a backup to the point of the exception
                return_dict = self.tmp_dict
                self.reset()
                return return_dict

    def main():
        ctd = ConvertToDict()
        for line in file(sys.argv[1]):
            line = line.strip()
            print "*" * 40
            print "line>>", line
            print ctd.get_number_dict(line)
            print "*" * 40

    if __name__ == "__main__":
        main()
The output from running it is:
jmjones@bean:~/debugger $ python example_debugger_fixed.py example_debugger.data
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': 9}
****************************************
****************************************
line>> 1 2 3 4 5 6 7 8 9 10
{'1': 2, '0': 1, '3': 4, '2': 3, '5': 6, '4': 5, '7': 8, '6': 7, '9': \
10, '8': None}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3}
****************************************
****************************************
line>> 1 2 3 4
{'1': 2, '0': 1, '3': 4, '2': 3}
****************************************
That looks much better. If the script cannot convert a string to an integer,
it puts None in the dictionary.
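For comparison, here is one way the same conversion might look in Python 3 if the dictionary is built in a local variable rather than in instance attributes. This is a sketch, not a drop-in replacement for the class above, and line_to_number_dict is a name made up for it. With no state surviving between calls, there is nothing to reset when a line goes bad.

```python
def line_to_number_dict(line):
    """Map stringified word positions to integers (None if not an integer)."""
    numbers = {}
    for position, word in enumerate(line.split()):
        try:
            numbers[str(position)] = int(word)
        except ValueError:
            # Same policy as the fixed script: record None for bad input.
            numbers[str(position)] = None
    return numbers


corrupt = line_to_number_dict("1 2 3 9\b9 10")
clean = line_to_number_dict("1 2 3 4")
print(corrupt)
print(clean)
```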
The Python debugger is an indispensable tool when you have a problem that is eluding other efforts to root it out. It is not an everyday tool, but when you need it, you need it.
Jeremy Jones is a software engineer who works for Predictix. His weapon of choice is Python.
Copyright © 2009 O'Reilly Media, Inc.