SQL Zoo Guide: SELECT within SELECT

11 minute read

Eager to practice SQL?

Introduction

SQL Zoo offers plenty of exercises with actual tables and a SQL editing box to code and run queries in. Today’s post will be part of a series of multiple blog posts that serve as a guide for completing the SQL exercises found in SQL Zoo. Assuming the reader already has a basic understanding of SQL, we’ll skip over the basic exercises and begin with the intermediate/advanced exercises.

The first exercise involves learning to use SELECT within existing SELECT queries. Below is a snippet of the table we’ll be working with for the following exercises.

world Table

name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000

SQL execution order

Before we begin writing queries, it’s useful to think about the order in which SQL executes a query i.e.

Step 1: From
Step 2: Join
Step 3: Where
Step 4: Group By
Step 5: Having
Step 6: Select
Step 7: Distinct
Step 8: Order By

Having a clear understanding of how SQL executes queries helps the user organize their thoughts and think about the framework of the query about to be written.

#1. Bigger than Russia

1) List each country name where the population is larger than that of ‘Russia’.

For the first example, begin with fragmenting the requested data into multiple parts. The example can be split into four parts:

Pull countries from world table.
Specifically name of country
Condition: population of country > Russia
Require population of Russia

Let’s translate this in SQL terms and build a query.

SELECT name FROM world -- Step 1: FROM
WHERE population > -- Step 2: WHERE
    (SELECT population FROM world -- Step 3: (Start of a new query, reset your steps!) SELECT/FROM
    WHERE name ='Russia') -- Step 4: WHERE

By splitting the problem into separate parts, it becomes easier to work out the flow of logic needed to query the correct data.

Lets move on to the next example and apply a similar approach as above.

#2. Richer than UK

2) Show the countries in Europe with a per capita GDP greater than ‘United Kingdom’.

Fragment the example based on the data requested.

Pull countries from world table
Specifically name of country
Condition 1: Country from Europe
Condition 2: Per capita GDP > UK
Per capita GDP = GDP/Population
Require per capita GDP of UK

Build your query using the above fragemnts.

SELECT name FROM world -- Step 1: FROM(Note* only asked for list of counties, not per capita gdp)
WHERE continent = 'Europe' -- Step 2: WHERE (Condition 1)
AND gdp/population > -- Divide gdp by population to pull per capita GDP
    (SELECT gdp/population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
    WHERE name = 'United Kingdom') -- STEP 4: WHERE (Condition 2)

Despite adding an extra condition and requiring per capita gdp, the problem was relatively straightforward once broken down into individual parts.

#3. Neighbors of Argentina and Australia

3) List the name and continent of countries in the continents containing either Argentina or Australia. Order by name of the country.

As we progress through the examples, the queries will increase in complexity, however, remember to think about the order of execution and break the problem apart into well defined fragments.

Based on the requested data, we want the following:

Pull data from world table
Specifically name of the country and its respective continent
Condition 1: Countries that reside in the same continent as Argentina
Condition 2: Countries that reside in the same continent as Australia
Order by name of Country

Build your query using the above fragments.

SELECT name, continent FROM world -- Step 1: FROM (Want both country and continent)
WHERE continent IN --Step 2: WHERE (*Note Use `IN` since we may return more than one result)
    (SELECT continent FROM world -- Step 3:(Reset your steps!) SELECT/FROM
    WHERE name IN ('Argentina', 'Australia')) -- Step 4: WHERE
ORDER BY name -- Step 5: ORDER BY

It’s important to note the use of IN at step 2 and step 4 as opposed to = which is a common mistake. Use IN when dealing with multiple elements in your WHERE function.

#4. Between Canada and Poland

4) Which country has a population that is more than Canada but less than Poland? Show the name and the population.

The requested data requires:

Pull data from world table
Specifically name and population of country
Condition 1: Population > Canada’s population
Condition 2: Population < Poland
Require population of Canada & Poland

This results in the query below.

SELECT name, population FROM world -- -- Step 1: FROM (Want both country and population)
WHERE population > --Step 2: WHERE (Condition 1)
    (SELECT population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
    WHERE name = 'Canada') AND --  Separate conditions required
population < -- Condition 2
    (SELECT population FROM world -- Step 3: (Reset your steps!) SELECT/FROM
    WHERE name = 'Poland') -- Step 4: WHERE

#5. Percentages of Germany

5) Germany (population 80 million) has the largest population of the countries in Europe. Austria (population 8.5 million) has 11% of the population of Germany. Show the name and the population of each country in Europe. Show the population as a percentage of the population of Germany. The format should be Name, Percentage

name percentage
Albania 3%

Pulling this data involves:

Pull data from world table
Specifically name and population as a % of the population of Germany
Requires specific formatting of data: % symbol, rounded to nearest integer
Column name should be percentage
Population as a % of Germany population = (pop./Germany’s pop.) x 100
Condition 1: Only countries from Europe

The query involves pulling data on the population of Germany in order to return the population of each country within Europe as a percentage of Germany’s population. Up to this point, we have used the SELECT function within the WHERE condition. For this problem, we will need to use a separate SELECT within the first SELECT function.

The pulled data also requires a specific format. In order display as a percentage, the population of each country has to be divided by Germany’s population and then multiplied by 100. This has to be rounded to the nearest integer. Finally, as a visual preference, we will add a % to the returned data.

It is important to think about the specific order of formatting to achieve the wanted result. The last step in the formatting should correspond to the first function. Think of it as a house; build the frame work of the outside and work your way towards the middle.

% population
rounded to nearest integer
include % sign at the end

The following query would be:

SELECT name, -- Requested data includes name
    CONCAT( -- will join the result and % sign together
        CAST( -- will format result as an integer
            ROUND( -- will round result to the nearest integer
                100*population/ -- population of each country multiplied by 100 and divided by...
                    (SELECT population FROM world  -- population of Germany
                    WHERE name = 'Germany'),  
            0) -- The zero tells ROUND function to round to nearest ones place  
        AS INT), -- AS INT will tell CAST function to format result as an integer
    '%') -- Joins results with % sign
AS 'percentage' -- Rename column
FROM world
WHERE continent = 'Europe' -- Condition: Only countries from Europe

#6. Bigger than every country in Europe

6) Which countries have a GDP greater than every country in Europe? [Give the name only.] (Some countries may have NULL gdp values)

As usual, fragment the requested data.

Pull data from world table
Specifically name of country
Condition: Country gdp > all European country’s gdp
Require gdp of all European countries
Dealing with potential NULL values

Since we’re dealing with a boolean that operates on multiple elements, the ALL function will be useful for this query. It’s also important to note that we will need to add gdp > 0 condition since we’re dealing with potential NULL values.

Building the query we get:

SELECT name FROM WORLD -- Require only name
WHERE gdp > ALL( -- Perform boolean function across all elements returned
    SELECT gdp FROM world -- require gdp
    WHERE gdp > 0 AND -- Prevents NULL values from returning, which cause errors
    continent = 'Europe' -- Condition: European countries only
)

#7. Largest in each Continent

7) Find the largest country (by area) in each continent. Show the continent, the name and the area.

This problem provides an opportunity to work with correlated sub-queries. Correlated subqueries read every row in a table and compare values in each row against related data. It is mainly used to filter out specific results from the parent query. In layman terms, a correlated subquery answers a multipart question whose answer depends on the value in each row processed by the parent statement.

For the following problem, we want to select all the countries within each continent and within each continent, we want to compare those specific results and pull only the country with the greatest area.

The proper way to refer to the same table with an outer query and sub query is to use the following format: table x & table y and then refer to the columns with it’s respective table as x.column1 & y.column1.

Using the above knowledge, let’s build a query that answers the above question. Begin by fragmenting the question.

Pull data from world table
Specifically continent, name and area
Largest country in EACH continent
Require all countries from each continent
Require area of each country from each continent
Compare area of each country only with ALL of those within the same continent
SELECT continent, name, area
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE area >= ALL( -- Outer query will pull country that satisfies boolean and is > all results
    SELECT area FROM world y -- pulling from same table but need to distinguish from outer query
    WHERE y.continent=x.continent -- Compares area of countries within the same continent
    AND area > 0 -- Removes any null values  
)

Correlated subqueries are an important skill to learn when beginning SQL as many complex queries are multi-part questions that rely on an outer query to solve an inner query.

#8. First Country of each continent (alphabetically)

8) List each continent and the name of the country that comes first alphabetically.

Here is another opportunity to practice correlated subqueries.

Fragmenting the problem, we have the following:

Pull data from world table
Specifically continent, name
First country in that appears in EACH continent alphabetically
Require all countries from each continent
Compare name of each country only with ALL of those within the same continent
SELECT continent, name
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE name <= ALL( -- query will automatically pull countries by alphabetical order. Only the 1st country of each continent will satisfy the boolean
    SELECT name FROM world y -- pulling from same table but need to distinguish from outer query
    WHERE y.continent=x.continent -- Compares names of countries within the same continent only  
)

#9 & #10 Difficult questions that utilize techniques not covered in prior sections.

9) Find the continents where all countries have a population <= 25000000. Then find the names of the countries associated with these continents. Show name, continent, and population.

As you become more comfortable with SQL, you will start to parse out the language and think of questions in terms of SQL functions/queries. In this example, looking specifically at the phrase “continents where ALL countries have a…” we should think about the framework for how we should build our query and recognize that a correlated subquery will be necessary since we require that all countries within EACH individual continent satisfy a specific condition.

Pull data from world table
Specifically continent, name, and population
Continents where all countries within have population > 25000000
Require name of all countries from each continent
Require population of all countries within EACH individual continent
SELECT name, continent, population
FROM world x -- Outer query that pulls all the relevant data of all countries
WHERE 25000000 >= ALL( -- limits query output only to those where inner correlated query satisfies the boolean
    SELECT population FROM world y -- pulling data from same table but need to distinguish from outer query, specifically population
    WHERE y.continent = x.continent -- Compares population of countries within the same continent only
    AND population > 0 -- Removes any null values
)

10) Some countries have populations more than three times that of any of their neighbors(in the same continent). Give the countries and continents.

Similar to the previous problem, we’re comparing countries’ population with ALL of those from the same continent with the added condition that the country selected has 3 times the population as the rest of the countries specifically within the same continent.

Pull data from world table
Specifically name and continent
Specific condition where population where a country within each continent has 3x the population of all other countries within the same continent
Require all countries from each continent
Require population of all countries
SELECT name, continent
FROM world x --Outer query that pulls all the relevant data from all countries
WHERE population > ALL( -- limits query output only to those where inner correlated query satisfies the condition
        SELECT population*3 -- 3x population than all other countries condition
        FROM world y
        WHERE x.continent = y.continent AND -- compares population of countries within the same continent
            population > 0 AND -- Removes any null values
            y.name != x.name --  prevents the a country from comparing its population to itself
    )

Hopefully this guide has been useful for learning more about using Select within Select queries and correlated queries. Stay tuned as more guides come out for future exercises!

Tags:

Updated: