|
What's in a Condition?by Jonathan Gennick, author of Oracle SQL*Plus Pocket Reference, 2nd Edition10/01/2002 |
Oracle9i Database introduced support for SQL92 join syntax. This
newly supported syntax allows you to explicitly specify join conditions in the
FROM clause of a SELECT statement. SQL92 join syntax has been supported by other databases for years, but is new to those of us who have been living in an
Oracle-centric world. Recently, while researching the behavior of this new (to Oracle) syntax, I had to significantly adjust my mental model of joins to incorporate a new understanding of just what a join condition is. I was
surprised and you might be too. Read on to learn of a subtle, but important-to-understand facet of this newly supported syntax.
The Old Way
Up until Oracle9i, you expressed a join by listing two tables in the FROM
clause of a SELECT statement and writing one or more join conditions in the
WHERE clause of that same statement. The join conditions served to link the two
tables together, and each condition referenced one column from each of the two tables involved in the join. For example, the following is a typical outer-join using the traditional Oracle syntax:
SQL> SELECT c.city_name, c.population, b.business_name
2 FROM up_city c, up_business b
3* WHERE c.city_name = b.city_name(+);
CITY_NAME POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming 6686 Da Yoopers Tourist Trap
Munising 2539
Sault Ste. Marie 16542 Museum Ship Valley Camp
This query returns a list of Upper Peninsula (of Michigan) cities and
businesses. The "(+)" following the b.city_name column in the WHERE clause makes the up_business table the optional table in the join (the b.city_name column refers to the up_business table). The results include one city, Munising, for which no businesses are known, but because this is an outer-join, and because
the up_business table is the optional table, Munising is listed anyway.
|
Related Articles: Learn about SQL92 |
I chose an outer-join for this example, because it's the SQL92 syntax's
impact on outer-joins that might surprise you. The join condition for this query
is implicit. You recognize the join condition by the fact that it links columns
from the two tables via a common column, but otherwise there's nothing in the
syntax to identify it as a join condition. It's no different than any other
condition you might find in the WHERE clause. Up until very recently, my mental
definition of a join condition went something like:
A join condition is any condition that uses a comparison operator (such as =) to link a column from one table to a column from another table.
This mental definition of mine came about because I've always worked with
joins that were specified implicitly. SQL92 joins are explicit, and that's key
to my new understanding of how joins work.
The New Way
The new syntax allows us to specify join conditions in the FROM clause of a
SELECT statement. Thus, I can rewrite the outer-join shown in the previous
section as follows:
SQL> SELECT c.city_name, c.population, b.business_name
2 FROM up_city c LEFT OUTER JOIN up_business b
3 ON c.city_name = b.city_name;
CITY_NAME POPULATION BUSINESS_NAME
-------------------- ---------- ------------------------
Ishpeming 6686 DA Yoopers Tourist Trap
Sault Ste. Marie 16542 Museum Ship Valley Camp
Munising 2539
No surprises here, not yet anyway. The results are in a different order, but
that doesn't matter as I didn't specify an ORDER BY clause. The data returned is
identical to that returned by the previous query. What is different is that the
join condition is explicit from the syntax of the statement. The ON clause
specifically identifies c.city_name = b.city_name as the join condition. This is
a key point to understand, as you'll soon see.
The Surprise
The surprise for me came when a reader asked whether it might be more
efficient to put what would otherwise be a non-join condition into the FROM
clause in the hopes that it might be able to eliminate rows earlier in the query
execution process, thus making the query more efficient. Consider the following
query, which is an extension of our previous query that restricts the results to
only those cities with populations 5,000 and higher:
SQL> SELECT c.city_name, c.population, b.business_name
2 FROM up_city c LEFT OUTER JOIN up_business b
3 ON c.city_name = b.city_name
4* WHERE c.population >= 5000;
CITY_NAME POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Ishpeming 6686 DA Yoopers Tourist Trap
Sault Ste. Marie 16542 Museum Ship Valley Camp
This query works just as expected. Now, let's try that reader's suggestion
and move the restriction on population into the FROM clause. By doing that, we
explicitly make c.population >= 5000 one of the join conditions. It's one
of the join conditions even though it doesn't link columns from two tables. Following are the results:
SQL> SELECT c.city_name, c.population, b.business_name
2 FROM up_city c LEFT OUTER JOIN up_business b
3 ON c.city_name = b.city_name
4* AND c.population >= 5000;
CITY_NAME POPULATION BUSINESS_NAME
-------------------- ---------- -------------------------
Munising 2539
Ishpeming 6686 DA Yoopers Tourist Trap
Sault Ste. Marie 16542 Museum Ship Valley Camp
Look at this: Munising is back in the game. Why? My query still contains the
condition: c.population >= 5000. Why does Munising, which clearly violates my
condition, still show up in the results? This behavior came as a complete and
utter shock to me. I was quite dumbfounded and convinced I'd uncovered a rather
serious bug in Oracle's software. But it was my mental model which needed
adjusting, not Oracle's software.
The Explanation
Remember my mental definition for join condition. I had thought that
the use of the ON clause versus the WHERE clause was mere eye-candy, and that
all conditions were treated identically no matter where they were placed in the
query. That's not the case. By moving the restriction on population from the
WHERE clause to the FROM clause, I made it one of the join conditions, thus changing the semantics of my query. To understand why that affected the results
the way it did, you need to understand conceptually how a query with an outer-join is executed:
- First, a Cartesian product is produced from the two tables involved in the
join. This results in all combinations of all rows. Thejoinconditions are then evaluated for each row in the Cartesian-product. - If the
joinconditions evaluate toTRUE, the row in question is retained in the result set. - If the
joinconditions evaluate toFALSE, the row is still retained. The columns from the non-optional table (up_cityin my case) are preserved, and the columns from the optional table (up_business) are set toNULL. - Any duplicate rows as a result of step 3 are ultimately removed.
- The
WHEREclause is evaluated, and any rows not satisfying the conditions in theWHEREclause are removed.
The key to why Munising shows up in the results lies in step 3. Join
conditions are explicit in the SQL92 syntax. By moving c.population >= 5000
into the FROM clause, I made that condition one of the join conditions, and
moved it's evaluation up from step 5 to step 3. Thus, in step 3, for my query,
Oracle is evaluating the following:
c.city_name = b.city_name AND c.population >= 5000
Clearly for Munising, with its population of only 2,539, this expression
evaluates to FALSE. However, it doesn't matter, because the expression is
evaluated as part of step 3. The row is still retained.
Updating My Mental Model
You can see that the ability to make join conditions explicit broadens the
scope of what can be considered a join condition. I had to update my mental
model of how joins work to include the following definition:
A join condition is any condition that I explicitly specify in the FROM clause, regardless of what columns, if any, it references. Join conditions may still be specified in the WHERE clause (for backwards compatibility), in which case they are those that use a comparison operator (such as =) to link a column from one table to a column from another table.
|
Related Reading
Oracle SQL Plus Pocket Reference |
It's worth pointing out that with an inner-join, it won't make a difference
if you move a condition from the WHERE clause to the FROM clause. That's because
an inner-join doesn't allow for optionality: all join conditions must evaluate
to TRUE or a row is rejected.
I don't know about the efficiency aspect of that one reader's question as to
whether it might sometimes be more efficient to evaluate a condition in the FROM
clause rather than the WHERE clause of a query. He hails from the DB2 world
where, he tells me, queries do sometimes execute faster if you push restrictions
into the FROM clause. I've run a few tests with Oracle, and always Oracle comes
up with the same execution plan each way. I'm keeping an open mind on the issue,
but at the same time my advice is not to play such games with your queries. Aim
for clarity. Don't obfuscate things by moving what should be a non-join
condition into the FROM clause.
New Possibilities
Now that I've aligned my mental model of joins to reality, I realize that
there is a benefit to the way things work. The ANSI committee wasn't sitting
around trying to make things confusing for us. Using Oracle's old syntax for
outer-joins, it would never have been possible for a condition such as
c.population >= 5000 to be ignored. That condition would never have been
considered a join condition, because it doesn't reference one column from each
table. Only using the SQL92 syntax do you have the choice of making it a join
condition. And choice is what's important.
Acknowledgments
My thanks to the reader from the DB2 world (you know who you are) who first
posed the question about pushing conditions into the FROM clause, which ended
up pushing me down the path of discovery. Thanks also to Joe Celko, author of SQL
for Smarties and former member of the ANSI SQL standards committee, for enlightening me as to the conceptual execution of an outer-join.
Jonathan Gennick is an O'Reilly Media, Inc. editor specializing in database and programming titles.
Return to the O'Reilly Network.
-
Good Article!!!
2006-04-19 12:27:05 Santosh_Gaikwad [View]
-
For too much text from small change in concept
2002-12-10 11:32:47 anonymous2 [View]
-
Why so long?
2002-10-24 09:45:30 anonymous2 [View]
-
Pushing the WHERE clause into the FROM clause
2002-10-07 12:23:49 anonymous2 [View]
-
Refreshing
2002-10-04 07:44:48 Donald Bales |
[View]
-
Refreshing
2002-10-04 15:42:31 Dave Rolsky |
[View]
-
This is a bit scary ...
2002-10-02 12:39:50 Dave Rolsky |
[View]
-
This is a bit scary ...
2002-10-03 05:55:48 John W. Adams |
[View]
-
Learning is nothing to be scared of
2002-10-03 08:09:16 Jonathan Gennick |
[View]
-
Learning is nothing to be scared of
2002-10-03 13:39:49 Dave Rolsky |
[View]




