Clean SQL Tricks
I'm not great at analytical SQL, but when I try to treat my SQL the way I treat code by breaking out intermediate values, prioritizing tight feedback loops, and using clear constant values, I find that I'm able to write better, more maintainable SQL, faster. I consistently use a few techniques to make this happen:
create temporary table: creating lots of small, focused tables has been the most important technique to improve my SQL queries. Smaller queries are much simpler to understand, and by building these tables up one by one, subsequent queries that build on those tables can be much faster. If you often write large queries with lots of
joins, you're probably familiar with the struggle of debugging and maintaining these queries! (The temporary keyword is a Redshift keyword to create a table only for your current session)
fromto quickly test out behavior. You can run queries like
select extract('year' from GETDATE());or
select 1 != null, 1 is not nullto quickly check your understanding. There's no need for a
- Rather than writing constants multiple times, setting up a constants table, or a
with constantsclause, can make queries more maintainable. Realizing you updated a field in one spot but not in another can be frustrating!
with constants as (select '2020-01-01' as start, '2020-02-01' as end)(Shoutout to Ben Haley for showing me this trick!)
limitto explore data as you go. Building up complex analytics queries should be a pretty iterative process. Ideally, you create focused, easy-to-understand tables, query them to make sure their data looks like you expect, and then later combine those tables together to answer your question. Running lots of small queries using
limitto see if things are working correctly as you go can make the whole process a lot easier!