Which Table, Which Column?
by Jonathan Gennick, coauthor of Transact-SQL Cookbook08/22/2002
When I first learned SQL, I was enthralled by the idea that I could just declare what data I wanted and let the database software figure out how best to get that data. Let the computer do the work, right? That same attitude also influenced the way in which I wrote SQL statements: when I wrote a query involving multiple tables, I figured it was the database's job to sort out which column belonged to which table. For example:
SELECT emp_nm, dept_nm
FROM emp_mst, dept_mst
WHERE emp_dept_id = dept_id;
The database has all that metadata, so let the database figure out that
emp_nm and emp_dept_id are columns of emp_mst, and that dept_id and dept_nm are columns of dept_mst. In the arrogance and ignorance of my youth, this attitude made perfect sense, but I've since learned humility, and I've
learned to use table aliases consistently, too.
Many potential problems lurk when you do not fully qualify column names
using either table names or table aliases. In this article, I'm going to
focus on just one such problem recently brought to my attention by a perplexed
reader. To begin with, I have two tables, emp_mst and dept_mst, with the
following columns and data:
SQL>SELECT * FROM emp_mst;EMP_ID EMP_NM EMP_DEPT_ID -------- ------------------------- ----------- 1 Jonathan Gennick 1 2 Mayank Agarwal 2 SQL>SELECT * FROM dept_mst;DEPT_ID DEPT_NM DEPT_MGR_ID -------- -------------------------- ----------- 1 Accounts Payable 1 2 Data Processing 0
Let's say I wish to write a query that returns emp_mst data for all department
managers. My CEO needs this information for a presentation he's making
to the board of directors, and he asked me for it at the very last minute,
so I'm a bit distracted with other thoughts, but this is a simple enough
query, so I sit down and quickly write the following:
SELECT *
FROM emp_mst
WHERE emp_id IN (
SELECT emp_id
FROM dept_mst);
This looks good. I'll test my new query by executing it in SQL*Plus:
SQL> SELECT *
2 FROM emp_mst
3 WHERE emp_id IN (
4 SELECT emp_id
5 FROM dept_mst);
EMP_ID EMP_NM
-------- ------------------------------
1 Jonathan Gennick
2 Mayank Agarwal
Looks good! The query works. I quickly run the query again to generate a printed report, hand that off to my CEO, and away he goes to the board meeting to give his presentation. On his way out the door, he gives me an "atta-boy" for getting the report done so quickly; I go home and sleep that night with visions of promotions, raises, and year-end bonuses dancing in my head.
|
Related Reading Transact-SQL Cookbook |
The very next day I arrive at work only to get chewed out by my boss, and then later by the CEO himself. I gave the CEO a report full of bad data, and as a result his presentation to the board of directors was a complete flop. I'm embarrassed and humiliated. I feel like a two-bit junior programmer fresh from college, and I think I can forget about that promotion.
What happened? What went wrong? In my hurry to generate the report for
my CEO, I'd mistakenly referenced the wrong column in my subquery. I'd
intended to write the following subquery to return a list of manager IDs
from the dept_mst table:
SELECT dept_mgr_id FROM dept_mst
Instead, being a bit distracted, I'd written the following, incorrect subquery:
SELECT emp_id FROM dept_mst
Realizing now that I'd written emp_id where I'd intended to write dept_mgr_id, I sit in my cube, nursing my now-cold cup of coffee, wondering how I could
have made such an elementary blunder. Then I begin to wonder why my query
even worked to begin with. After all, I selected a column from dept_mst
that didn't exist in the dept_mst table. Why oh why did the database return
results? What business did the database have executing what was obviously
a bad query? I set my coffee cup aside, type in my subquery, and get the
following results:
SQL> SELECT emp_id FROM dept_mst;
SELECT emp_id FROM dept_mst
*
ERROR at line 1:
ORA-00904: "EMP_ID": invalid identifier
Sure enough, the database recognized that the emp_id column didn't exist
in the dept_mst table, and returned a corresponding error message. Well
<dirty words>! Why did the same SELECT not return an error when
used as a subquery when I generated that report for my CEO? Why did the
database not catch my mistake when it counted? Why indeed? The answer
to these questions lies in an understanding of the ANSI SQL standard's
prescribed behavior for resolving column names used in an SQL statement.
When we don't explicitly link column names to their corresponding table,
the database software must, in accordance with the ANSI SQL standard,
attempt to "divine" our intent. In the case of the query that
caused me so much grief, the database did the following:
-
Checked
dept_mst, the table listed in the subquery'sFROMclause, for anemp_idcolumn. Noemp_idcolumn was found indept_mst, so the database continued its search. -
Checked
emp_mst, found anemp_idcolumn there, and consequently presumed that I was referencing that column, theemp_mst.emp_idcolumn, from my subquery.
The database had no way of knowing that I intended to write dept_mgr_id
in the subquery, rather than emp_id. My intent was to write a non-correlated
subquery, executed just once, that returned a list of department manager
IDs from the dept_mst table. Because I didn't use table aliases to fully qualify
my column names, the database interpreted my subquery as a correlated
subquery linked to the outer query via the emp_id column. The resulting
subquery was executed once for each row retrieved by the outer query,
and generated erroneous results. Consequently, my outer query also returned
erroneous results. The outer query retrieved all records from emp_mst.
For the first record retrieved, emp_id was 1, so the following version
of the subquery was executed:
SELECT 1 FROM dept_mst
For the second emp_mst record, emp_id was 2, so the following subquery
was executed:
SELECT 2 FROM dept_mst
In each case, the subquery was selecting the current emp_id value from
dept_mst, so of course emp_id was in the list of values returned by the
subquery. Duh! For each row returned by the outer query, the IN predicate
returned TRUE, so the query returned all rows from emp_mst, when my intent
was to return rows only for department managers.
Understanding now why my query executed and returned incorrect data rather than generating an error message, I pour myself another cup of coffee and sit back to ponder what I can do in the future to reduce the risk of ever making such a mistake again. I come up with two practices to adopt:
-
Test each subquery independently of the main query.
Use table aliases to fully qualify column names in a query.
My subquery was very simple; so simple that I didn't see the need to test it independently of the main query. Not testing it independently was a mistake. Had I been a bit more rigorous in my approach, I would have tested the subquery apart from the main query, and the resulting error message would have clued me in to my blunder. In the future, I'll take care to avoid such "shortcuts".
See my Oracle Magazine article "An Incremental Approach to Developing SQL Queries" for more advice on incrementally developing and testing SQL queries.
The other practice I resolve to adopt is to use table aliases in any
query involving more than one table, to explicitly declare my intentions
as to which column is associated with which table. I intended, in my subquery,
to select an employee ID column from the dept_mst table. The problem was
that my memory failed me, and I thought emp_id was the correct column
name. Had I used table aliases to make my intentions clear, my query would
have failed straightaway:
SQL> SELECT *
2 FROM emp_mst e
3 WHERE e.emp_id IN (
4 SELECT d.emp_id
5 FROM dept_mst d);
SELECT d.emp_id
*
ERROR at line 4:
ORA-00904: "D"."EMP_ID": invalid identifier
The alias in front of emp_id in the subquery removes all doubt as to
which table I am selecting from, and the error message makes it painfully
clear that I used the wrong column name. Had I encountered this error,
I would have immediately realized my mistake, and would have used the
following, correct query to generate that report for my CEO:
SELECT *
FROM emp_mst e
WHERE e.emp_id IN (
SELECT d.dept_mgr_id
FROM dept_mst d);
As I sip the last of my coffee, I'm inspired with the idea of writing a story about my experience and posting it on the company Intranet for the benefit of my fellow developers. My boss reads the story and tells his boss. Word gets up to my CEO, who appreciates that I can learn from my mistakes, and he is especially pleased that I'm sharing my newfound knowledge. He even takes me to lunch. I drift off to sleep that night (again) with visions of promotions and bonuses dancing in my head ...
Jonathan Gennick is an O'Reilly Media, Inc. editor specializing in database and programming titles.
Return to ONLamp.com.
