# #34: SQL joins

In relational databases, data is kept in relations, commonly known as tables. Simplifying, when data is normalized, it’s not duplicated. For example, when storing books and authors, you don’t keep an author’s name next to a book record. Instead, you use a so-called foreign key that references the author in another table. Thanks to this level of indirection, books by the same author do not store repeated information. This has many benefits and one, huge drawback. In order to look up a book together with a corresponding author you must somehow correlate these two tables. This is called joining.

You can think of joining as taking all records from one table and finding a corresponding record in another table.
Typically, one column in one table must be equal to another column in the other table.
In our example `books`

table has an `author_id`

column.
That column must reference an existing record in the other table.
Typically, it’s a primary key, like `author.id`

.
Thinking imperatively, for each book the SQL engine must find an author for a given `author_id`

.
In SQL terms it’s called *inner join*.
It’s written as

```
SELECT *
FROM books JOIN authors ON books.author_id = authors.id
```

or with implicit syntax, simply:

```
SELECT *
FROM books, authors
WHERE books.author_id = authors.id
```

Now, there are some edge cases.
What if there’s a book with no author at all?
For example, the excellent “*Frankenstein*” novel was first published anonymously.
Mary Shelley revealed herself in the second edition.
In our database, the `author_id`

is NULL.
Inner join can’t find a corresponding author for “*Frankenstein*” so this book is entirely skipped.
This is where *outer joins* start to appear.
Replace `JOIN`

with `LEFT JOIN`

and you’ll see all books in the result.
Some of them, like “*Frankenstein*”, will have `NULL`

where author’s ID and name were expected.
In the same manner, you can try `RIGHT JOIN`

.
In that case you’ll end up with authors that never published any book.
Put differently, records in the `authors`

table, that are not referenced by any record in the `books`

table.

Another interesting side-effect.
If an author never published any novel, he or she will appear once.
The same applies for authors that published just one novel.
Like Margaret Mitchell, who wrote one masterpiece: “*Gone with the Wind*”.
But what about authors who published more than one book?
Well, they will appear as many times as many books they written.

There’s one more type of `JOIN`

worth mentioning.
`CROSS JOIN`

.
If you have some math background, this is the Cartesian product of two sets.
Without math background: each and every row from one table is mapped to each and every row of the second table.
This is especially useful combined with another technique: self joins.
Believe it or not, you can join a table with itself.
For example, imagine you are looking for pairs of books published the same year.
A `book`

table has a `published_year`

column.
But how do you find two books with the same `published_year`

value?
Simple!
Join every book with every other book first.
Just be careful, for a tiny library of one hundred books, this produces ten thousand pairs.
So an extra filtering condition is needed to only choose pairs with the same year.
For example:

```
select *
from books b1 join books b2
on b1.published_year = b2.published_year
and b1.id > b2.id
```

Aliases `b1`

and `b2`

are needed to distinguish between `books`

tables.
Moreover, a condition on ID excludes duplicates and self-joins.

Implementing fast joins in relational databases requires quite a lot of engineering and optimizations. But that’s a different story.

Thanks for listening, bye!