Land Technologies
< all posts

SQL Window Functions

Posted by ChantelleM on June 29, 2021 · 4 min read datasqlpostgres

The data team often faces situations where standard aggregation functions, GROUP BY or DISTINCT ON won’t give us the results we need. To better explain this, we’ll cover a problem we recently solved using a window function. For this example we’ll use a similar table to the DISTINCT ON post – this table is called order_items

id|order_id|order_item_id|item_id|item_short_name|
--+--------+-------------+-------+---------------+
 1|       1|            1|      1|Toothpaste     |
 2|       1|            2|      1|Toothpaste     |
 3|       1|            3|      2|Bread          |
 4|       1|            4|      3|Milk           |
 5|       2|            1|      1|Toothpaste     |
 6|       3|            1|      2|Bread          |
 7|       4|            1|      3|Milk           |

Let’s look at the following data question: I want to find all items that only occur once in an order. The desired result would return the following, discounting the two toothpastes in order_id 1.

id|order_id|order_item_id|item_id|item_short_name|
--+--------+-------------+-------+---------------+
 3|       1|            3|      2|Bread          |
 4|       1|            4|      3|Milk           |
 5|       2|            1|      1|Toothpaste     |
 6|       3|            1|      2|Bread          |
 7|       4|            1|      3|Milk           |

We’ll begin by breaking the problem into two steps:

  1. COUNT the distinct combinations of order_id and item_id
  2. select rows where COUNT = 1

For step 1, even though we’re querying for distinct combinations of order_id and item_id, using the PSQL specific DISTINCT ON clause won’t give us the result we’re after. The following query helps demonstrate this:

  SELECT DISTINCT ON
    (order_id, item_id) *
  FROM
      order_items;

This query returns all distinct combinations of order_id and item_id, but there’s no way to combine DISTINCT ON with a COUNT and identify that two toothpastes have been added to order_id 1.

id|order_id|order_item_id|item_id|item_short_name
--+--------+-------------+-------+---------------+
 1|       1|            1|      1|Toothpaste     |
 3|       1|            3|      2|Bread          |
 4|       1|            4|      3|Milk           |
 5|       2|            1|      1|Toothpaste     |
 6|       3|            1|      2|Bread          |
 7|       4|            1|      3|Milk           |

This is because using DISTINCT ON, GROUP BY or a regular aggregation, like COUNT or SUM, will always reduce the results to a single row. Which works well when you want to find a single value. The big advantage of window functions is that they allow you to add information to every row while maintaining all the original rows. This makes it possible to then use the additional information to further filter the data.

So recapping on the steps above:

  1. COUNT the distinct combinations of order_id and item_id
  2. select rows where COUNT = 1

This is where a window function comes to the rescue 🙌

We’ll address the first step using the query below. In the query you’ll see a regular COUNT aggregation followed by an OVER clause. The OVER clause is how we define a window function. Inside the window function we also have a PARTITION BY clause, which partitions the window by the order_id and item_id combinations:

  SELECT
      *,
      COUNT(*) OVER (
          PARTITION BY order_id,
          item_id
      ) AS order_item_count
  FROM
      order_items;

Which give us the following results:

id|order_id|order_item_id|item_id|item_short_name|order_item_count|
--+--------+-------------+-------+---------------+----------------+
 1|       1|            1|      1|Toothpaste     |               2|
 2|       1|            2|      1|Toothpaste     |               2|
 3|       1|            3|      2|Bread          |               1|
 4|       1|            4|      3|Milk           |               1|
 5|       2|            1|      1|Toothpaste     |               1|
 6|       3|            1|      2|Bread          |               1|
 7|       4|            1|      3|Milk           |               1|

You’ll notice that all the original rows are returned and that an additional column, order_item_count, has been added. This column is a count of the rows returned in the order_id and item_id partitions, and it allows us to easily distinguish that there are two toothpastes in order_id 1.

Partition by

We can now move to step two and select the rows where count is equal to 1. To achieve this we’ll turn the previous query into a sub-query and use a WHERE clause to select where order_item_count equals 1:

  SELECT * FROM (
    SELECT
        *,
        COUNT(*) OVER (
            PARTITION BY order_id,
            item_id
        ) AS order_item_count
    FROM
        order_items
  ) AS t
  WHERE
    t.order_item_count = 1;

This completes the query and gives us the results we expect:

id|order_id|order_item_id|item_id|item_short_name|order_item_count|
--+--------+-------------+-------+---------------+----------------+
 3|       1|            3|      2|Bread          |               1|
 4|       1|            4|      3|Milk           |               1|
 5|       2|            1|      1|Toothpaste     |               1|
 6|       3|            1|      2|Bread          |               1|
 7|       4|            1|      3|Milk           |               1|

Hopefully this post helps explain the basics and demonstrates the power of window functions. There’s definitely more to learn, including value or ranking window functions and using ORDER BY or ROWS/RANGE clauses. You can read more about the different flavours of window functions here.

You can also check out our post on the PSQL specific DISTINCT ON, which might be more appropriate than a window function in some cases.


datasqlpostgres
Land Technologies

We are the engineers behind LandInsight and LandEnhance. We’re helping property professionals build more houses, one line of code at a time. We're a remote-first company with our tech team in Europe, and yes, we're hiring!

© 2021, Built with Gatsby
LAND TECHNOLOGIES LTD, m: WeWork c/o LandTech, 10 Devonshire Square, London, EC2M 4PL t: 0203 086 7855
© 2016 Land Technologies Ltd. Registered Company Number 08845300.