SQL Zoo Guide: SUM and COUNT
More SQL using SUM
and COUNT
!
Introduction
We continue where we left off from the previous post where we explored how to use SELECT
within SELECT
and correlated subqueries using the exercises found in SQL Zoo.
Below is a snippet of the table we’ll be working with for the following exercises.
world Table
name | continent | area | population | gdp |
---|---|---|---|---|
Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
Albania | Europe | 28748 | 2831741 | 12960000000 |
Algeria | Africa | 2381741 | 37100000 | 188681000000 |
Andorra | Europe | 468 | 78115 | 3712000000 |
Angola | Africa | 1246700 | 20609294 | 100990000000 |
… | … | … | … | … |
SUM
, COUNT
, MAX
DISTINCT
& ORDER BY
functions
Today we will be learning how to use various aggregate functions in addition to DISTINCT
and ORDER BY
. Before we begin working on the following exercises, it is important to know when and how to use each of these functions.
The COUNT
function counts the number of non-empty values within a column. We simply provide the name of the column within the function and the query should return the total number of non-empty rows. It’s important to note COUNT
does not take into account the actual values stored within each row and only cares if the row has a non-empty value.
The SUM
function on the other hand takes a column name and returns the sum of all values within the column. As opposed to the COUNT
function, the SUM
function does take into account the actual numerical value stored within each row.
The MAX
function returns the max numeric value within the specified column.
The DISTINCT
function removes duplicate observations and returns each unique observation within a column.
The ORDER BY
function sorts the result-set in ascending or descending order. By default, ORDER BY
sorts by ascending order. To sort by descending order, simply add the DESC
at the end after the column(s) by which you are sorting by.
#1. Total world population
1) Show the total population of the world.
The first exercise begins with the basic application of the SUM
function. Simply apply the SUM
function to the population
column which contains the population of all the countries in the world.
SELECT SUM(population)
FROM world
The query should return 7,118,632,738.
#2. List of continents
2) List all the continents - just once each.
The next exercise involves listing out all of the distinctive continents within the world
table. Note, similar to the previous exercise, the DISTINCT
function is applied at the beginning on the same line as the SELECT
function.
SELECT DISTINCT(continent)
FROM world
The query should return all 8 distinct continents within the world
table
#3. GDP of Africa
3) Give the total GDP of Africa.
Similar to our previous post on SQL Zoo, as the queries become more complicated, it helps to break the problem down into easy to understand fragments. Since we are taking into consideration the values stored within each observation (GDP), we will be using the SUM
function.
We require the following:
Pull GDP from world table |
Condition: Only countries from Africa |
Require the total GDP |
SELECT SUM(gdp)
FROM world
WHERE continent = 'Africa'
#4. Count the big countries
4) How many countries have an area of at least 1000000
This exercises requires us to count the total number of results that are returned after meeting the specified condition. Therefore, we require the COUNT
function.
Pull countries from world table |
Condition: Only countries with area > 1000000 |
Specifically, require the total number of observations returned |
SELECT COUNT(name)
FROM world
WHERE area > 1000000
The returned result should be 28 countries.
#5. Baltic states population
5) What is the total population of (‘Estonia’, ‘Latvia’, ‘Lithuania’)?
Similar to exercise #3, we are interested in the total sum but with an added condition.
Pull population from world table |
Condition: Only from the specified countries |
Require the total population of the specified countries |
SELECT SUM(population)
FROM world
WHERE name IN ('Estonia', 'Latvia', 'Lithuania')
Note: Remember to use IN
instead of =
when referencing multiple conditions.
Using GROUP BY
& HAVING
The following examples explore combining the GROUP BY
& HAVING
functions with the aggregate functions we have practiced with.
The GROUP BY
function is often used in conjuction with aggregate functions by applying the functions to groups of items that share the same values.
The HAVING
function is used alongside the GROUP BY
function in place of the WHERE
function as the WHERE
function cannot be used with aggregate functions.
#6. Counting the countries of each continent
6) For each continent, show the continent and number of countries.
For this exercise, we want to know how many total countries each continent has. Since the table doesn’t provide this data explicitly, we have to manipulate the table for the desired results.
Breaking this problem down, we have:
Pull continents and countries from world table |
Require total # of countries |
Require each continent |
Seperate the total # of countries by continent |
SELECT DISTINCT(continent),
COUNT(name) AS 'Total # of countries'
FROM world
GROUP BY continent
#7. Counting big countries in each continent
7) For each continent, show the continent and number of countries with populations of at least 10 million.
Similar to the prior problem, we have to manipulate the data to obtain the desired information. The first clue we have is “For each continent”. This immediately lets us know that we will group the data by continent. Splitting the problem into simpler parts we have the following:
Pull continents and countries from world table |
Require total # of countries |
Condition: Only countries with population > 10,000,000 |
Seperate the total # of countries by continent |
SELECT continent,
COUNT(name) AS 'total # of countries'
FROM world
WHERE population > 10000000
GROUP BY continent
#8. Counting big continents
8) List the continents that have a total population of at least 100 million.
The final problem may be tricky to those that haven’t familiared themselves with the HAVING
function. Normally, when we specify a condition, we use the WHERE
function. However, when we are applying conditions to a group, it’s important to distinguish between the two similar functions.
In this example, we are interested in the total population of each continent. However, we do not have that specific information. We have the population of each country but with some data manipulation, we can easily figure the population of each continent by grouping the countries and adding their total population together. Once this is done, we can specify the condition each continent must meet. In simpler terms we have:
Pull continents from world table |
Require the entire population of EACH continent |
Condition: Only return continents with a TOTAL population > 100,000,000 |
SELECT continent
FROM world
GROUP BY continent
HAVING SUM(population) > 100000000
This concludes our guide on learning how to use aggregate functions alongside with the GROUP BY
and HAVING
functions. Stay tuned for a future post on learning to use JOIN
functions!