SQL Zoo Guide: SELECT within SELECT
Eager to practice SQL?
Introduction
SQL Zoo offers plenty of exercises with actual tables and a SQL editing box to code and run queries in. Today’s post will be part of a series of multiple blog posts that serve as a guide for completing the SQL exercises found in SQL Zoo. Assuming the reader already has a basic understanding of SQL, we’ll skip over the basic exercises and begin with the intermediate/advanced exercises.
The first exercise involves learning to use SELECT
within existing SELECT
queries. Below is a snippet of the table we’ll be working with for the following exercises.
world Table
name | continent | area | population | gdp |
---|---|---|---|---|
Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
Albania | Europe | 28748 | 2831741 | 12960000000 |
Algeria | Africa | 2381741 | 37100000 | 188681000000 |
Andorra | Europe | 468 | 78115 | 3712000000 |
Angola | Africa | 1246700 | 20609294 | 100990000000 |
… | … | … | … | … |
SQL execution order
Before we begin writing queries, it’s useful to think about the order in which SQL executes a query i.e.
Step 1: From |
Step 2: Join |
Step 3: Where |
Step 4: Group By |
Step 5: Having |
Step 6: Select |
Step 7: Distinct |
Step 8: Order By |
Having a clear understanding of how SQL executes queries helps the user organize their thoughts and think about the framework of the query about to be written.
#1. Bigger than Russia
1) List each country name where the population is larger than that of ‘Russia’.
For the first example, begin with fragmenting the requested data into multiple parts. The example can be split into four parts:
Pull countries from world table. |
Specifically name of country |
Condition: population of country > Russia |
Require population of Russia |
Let’s translate this in SQL terms and build a query.
SELECT name FROM world -- Step 1: FROM
WHERE population > -- Step 2: WHERE
(SELECT population FROM world -- Step 3: (Start of a new query, reset your steps!) SELECT/FROM
WHERE name ='Russia') -- Step 4: WHERE
By splitting the problem into separate parts, it becomes easier to work out the flow of logic needed to query the correct data.
Lets move on to the next example and apply a similar approach as above.
#2. Richer than UK
2) Show the countries in Europe with a per capita GDP greater than ‘United Kingdom’.
Fragment the example based on the data requested.
Pull countries from world table |
Specifically name of country |
Condition 1: Country from Europe |
Condition 2: Per capita GDP > UK |
Per capita GDP = GDP/Population |
Require per capita GDP of UK |
Build your query using the above fragemnts.
SELECT name FROM world -- Step 1: FROM(Note* only asked for list of counties, not per capita gdp)
WHERE continent = 'Europe' -- Step 2: WHERE (Condition 1)
AND gdp/population > -- Divide gdp by population to pull per capita GDP
(SELECT gdp/population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
WHERE name = 'United Kingdom') -- STEP 4: WHERE (Condition 2)
Despite adding an extra condition and requiring per capita gdp, the problem was relatively straightforward once broken down into individual parts.
#3. Neighbors of Argentina and Australia
3) List the name and continent of countries in the continents containing either Argentina or Australia. Order by name of the country.
As we progress through the examples, the queries will increase in complexity, however, remember to think about the order of execution and break the problem apart into well defined fragments.
Based on the requested data, we want the following:
Pull data from world table |
Specifically name of the country and its respective continent |
Condition 1: Countries that reside in the same continent as Argentina |
Condition 2: Countries that reside in the same continent as Australia |
Order by name of Country |
Build your query using the above fragments.
SELECT name, continent FROM world -- Step 1: FROM (Want both country and continent)
WHERE continent IN --Step 2: WHERE (*Note Use `IN` since we may return more than one result)
(SELECT continent FROM world -- Step 3:(Reset your steps!) SELECT/FROM
WHERE name IN ('Argentina', 'Australia')) -- Step 4: WHERE
ORDER BY name -- Step 5: ORDER BY
It’s important to note the use of IN
at step 2 and step 4 as opposed to =
which is a common mistake. Use IN
when dealing with multiple elements in your WHERE
function.
#4. Between Canada and Poland
4) Which country has a population that is more than Canada but less than Poland? Show the name and the population.
The requested data requires:
Pull data from world table |
Specifically name and population of country |
Condition 1: Population > Canada’s population |
Condition 2: Population < Poland |
Require population of Canada & Poland |
This results in the query below.
SELECT name, population FROM world -- -- Step 1: FROM (Want both country and population)
WHERE population > --Step 2: WHERE (Condition 1)
(SELECT population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
WHERE name = 'Canada') AND -- Separate conditions required
population < -- Condition 2
(SELECT population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
WHERE name = 'Poland') -- Step 4: WHERE
#5. Percentages of Germany
5) Germany (population 80 million) has the largest population of the countries in Europe. Austria (population 8.5 million) has 11% of the population of Germany. Show the name and the population of each country in Europe. Show the population as a percentage of the population of Germany. The format should be Name, Percentage
name | percentage |
---|---|
Albania | 3% |
… | … |
Pulling this data involves:
Pull data from world table |
Specifically name and population as a % of the population of Germany |
Requires specific formatting of data: % symbol, rounded to nearest integer |
Column name should be percentage |
Population as a % of Germany population = (pop./Germany’s pop.) x 100 |
Condition 1: Only countries from Europe |
The query involves pulling data on the population of Germany in order to return the population of each country within Europe as a percentage of Germany’s population. Up to this point, we have used the SELECT
function within the WHERE
condition. For this problem, we will need to use a separate SELECT
within the first SELECT
function.
The pulled data also requires a specific format. In order display as a percentage, the population of each country has to be divided by Germany’s population and then multiplied by 100. This has to be rounded to the nearest integer. Finally, as a visual preference, we will add a % to the returned data.
It is important to think about the specific order of formatting to achieve the wanted result. The last step in the formatting should correspond to the first function. Think of it as a house; build the frame work of the outside and work your way towards the middle.
% population |
rounded to nearest integer |
include % sign at the end |
The following query would be:
SELECT name, -- Requested data includes name
CONCAT( -- will join the result and % sign together
CAST( -- will format result as an integer
ROUND( -- will round result to the nearest integer
100*population/ -- population of each country multiplied by 100 and divided by...
(SELECT population FROM world -- population of Germany
WHERE name = 'Germany'),
0) -- The zero tells ROUND function to round to nearest ones place
AS INT), -- AS INT will tell CAST function to format result as an integer
'%') -- Joins results with % sign
AS 'percentage' -- Rename column
FROM world
WHERE continent = 'Europe' -- Condition: Only countries from Europe
#6. Bigger than every country in Europe
6) Which countries have a GDP greater than every country in Europe? [Give the name only.] (Some countries may have NULL
gdp values)
As usual, fragment the requested data.
Pull data from world table |
Specifically name of country |
Condition: Country gdp > all European country’s gdp |
Require gdp of all European countries |
Dealing with potential NULL values |
Since we’re dealing with a boolean that operates on multiple elements, the ALL
function will be useful for this query. It’s also important to note that we will need to add gdp > 0 condition since we’re dealing with potential NULL
values.
Building the query we get:
SELECT name FROM WORLD -- Require only name
WHERE gdp > ALL( -- Perform boolean function across all elements returned
SELECT gdp FROM world -- require gdp
WHERE gdp > 0 AND -- Prevents NULL values from returning, which cause errors
continent = 'Europe' -- Condition: European countries only
)
#7. Largest in each Continent
7) Find the largest country (by area) in each continent. Show the continent, the name and the area.
This problem provides an opportunity to work with correlated sub-queries. Correlated subqueries read every row in a table and compare values in each row against related data. It is mainly used to filter out specific results from the parent query. In layman terms, a correlated subquery answers a multipart question whose answer depends on the value in each row processed by the parent statement.
For the following problem, we want to select all the countries within each continent and within each continent, we want to compare those specific results and pull only the country with the greatest area.
The proper way to refer to the same table with an outer query and sub query is to use the following format: table x
& table y
and then refer to the columns with it’s respective table as x.column1
& y.column1
.
Using the above knowledge, let’s build a query that answers the above question. Begin by fragmenting the question.
Pull data from world table |
Specifically continent, name and area |
Largest country in EACH continent |
Require all countries from each continent |
Require area of each country from each continent |
Compare area of each country only with ALL of those within the same continent |
SELECT continent, name, area
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE area >= ALL( -- Outer query will pull country that satisfies boolean and is > all results
SELECT area FROM world y -- pulling from same table but need to distinguish from outer query
WHERE y.continent=x.continent -- Compares area of countries within the same continent
AND area > 0 -- Removes any null values
)
Correlated subqueries are an important skill to learn when beginning SQL as many complex queries are multi-part questions that rely on an outer query to solve an inner query.
#8. First Country of each continent (alphabetically)
8) List each continent and the name of the country that comes first alphabetically.
Here is another opportunity to practice correlated subqueries.
Fragmenting the problem, we have the following:
Pull data from world table |
Specifically continent, name |
First country in that appears in EACH continent alphabetically |
Require all countries from each continent |
Compare name of each country only with ALL of those within the same continent |
SELECT continent, name
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE name <= ALL( -- query will automatically pull countries by alphabetical order. Only the 1st country of each continent will satisfy the boolean
SELECT name FROM world y -- pulling from same table but need to distinguish from outer query
WHERE y.continent=x.continent -- Compares names of countries within the same continent only
)
#9 & #10 Difficult questions that utilize techniques not covered in prior sections.
9) Find the continents where all countries have a population <= 25000000. Then find the names of the countries associated with these continents. Show name, continent, and population.
As you become more comfortable with SQL, you will start to parse out the language and think of questions in terms of SQL functions/queries. In this example, looking specifically at the phrase “continents where ALL countries have a…” we should think about the framework for how we should build our query and recognize that a correlated subquery will be necessary since we require that all countries within EACH individual continent satisfy a specific condition.
Pull data from world table |
Specifically continent, name, and population |
Continents where all countries within have population > 25000000 |
Require name of all countries from each continent |
Require population of all countries within EACH individual continent |
SELECT name, continent, population
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE 25000000 >= ALL( -- limits query output only to those where inner correlated query satisfies the boolean
SELECT population FROM world y -- pulling data from same table but need to distinguish from outer query, specifically population
WHERE y.continent = x.continent -- Compares population of countries within the same continent only
AND population > 0 -- Removes any null values
)
10) Some countries have populations more than three times that of any of their neighbors(in the same continent). Give the countries and continents.
Similar to the previous problem, we’re comparing countries’ population with ALL of those from the same continent with the added condition that the country selected has 3 times the population as the rest of the countries specifically within the same continent.
Pull data from world table |
Specifically name and continent |
Specific condition where population where a country within each continent has 3x the population of all other countries within the same continent |
Require all countries from each continent |
Require population of all countries |
SELECT name, continent
FROM world x --Outer query that pulls all the relevant data from all countries
WHERE population > ALL( -- limits query output only to those where inner correlated query satisfies the condition
SELECT population*3 -- 3x population than all other countries condition
FROM world y
WHERE x.continent = y.continent AND -- compares population of countries within the same continent
population > 0 AND -- Removes any null values
y.name != x.name -- prevents the a country from comparing its population to itself
)
Hopefully this guide has been useful for learning more about using Select within Select queries and correlated queries. Stay tuned as more guides come out for future exercises!