I've been scared of SQL (and relational databases in general) for a pretty long time. The syntax has always felt clunky to me, and I feel like I'm using technology from 25 years ago. And that's probably because I am: the idea of a relational database was first coined by a chap at IBM in June of 1970!

I decided to tackle a problem I'd previously failed at -- and done so pretty badly. After following Miguel Grinberg's excellent (if now slightly dated) Flask Mega-Tutorial, I wanted to add the functionality to follow/unfollow a given user's posts. I'd reasoned that I needed a Following table to manage who was following whom, but my understanding of relational databases was (and is!) pretty rough around the edges: the details of how to link up the Post, User and Following tables to return only the relevant posts completely eluded me.

With a fresh-ish mind, and a new install of PyCharm, I decided to have another crack at it. This time I didn't want to make the webapp first; I wanted to make sure my database design was solid before building any API endpoints.

In actual fact, I decided not to write any Python until the database was working -- uncharacteristic I know, but I wanted the satisfaction of doing it without any of the myriad of Flask frameworks to abstract away the details of relational databases.

I started off writing three files (to keep around, in case the database ever got deleted):

  • create_user.sql
  • create_post.sql
  • create_following.sql

Each one creates a table in the database (the actual data source being configured via PyCharm's magical UI), as per their name. These could easily have all been contained within one file -- and in fact, I'll probably make that happen -- but I wanted to see it all happen individually.

I worked out pretty quickly that I'd have to run create_user first on a newly-created database: otherwise, the Post and Following tables couldn't refer back to the User table in their foreign keys.

So, the User table design looks like this:

create table User (
  user_id integer primary key not null,
  username string not null,
  user_bio string not null
);

Simple enough: we create a table called User. In this table there's 3 columns:

  • user_id, used as a unique identifier for the individual user;
  • username, a string value; and
  • user_bio, another arbitrary string.

There's two modifications to user_id: firstly, it's the primary key of the table, which as far as I can tell just forces it to be a unique identifier; and secondly, it's not null. This probably isn't actually necessary, as I think primary keys are not null by default, but hey -- nothing's broken.

The Post table is pretty simple too:

create table Post (
  post_id integer not null primary key,
  user_id integer,
  content string,
  timestamp string,
  foreign key(user_id) references User
);

We have these fields:

  • The post_id, another primary key which we probably won't use for anything in particular could be used for implementing a Favourite or Share table!;
  • user_id, the ID of the user who wrote the post;
  • content, the body of the post as a string; and
  • timestamp, the time at which the post was written.

The last line makes it explicit that user_id is going to be matched up with the user id of the User table. I'm told by my good friend StackOverflow that this isn't actually necessary, but improves database integrity and prevents accidental deletions. Seems like good practice to me, so it's staying in.

Finally, we have the Following table:

create table Following (
  follower_id integer,
  followed_id integer,
  primary key (follower_id, followed_id),
  foreign key(follower_id) references User,
  foreign key(followed_id) references User
);

Again, the last two lines aren't actually necessary, but (I think) make sure we don't have any accidental deletions of users who are still being followed by someone.

The first two lines are simple enough: although a "Following" is slightly awkward linguistically, it just represents a single instance of one user following another. These are the kind of strange mental gymnastics we apparently have to go through given the absence of a "list" type in SQL.

The third line is slightly strange, and probably won't matter too much, but instead of creating a pointless "id" column for the primary key, I've decided to use a Composite Primary Key. It's a primary key composed of the two important fields -- follower_id and followed_id -- and it's provably unique, since any given user X is either following user Y or not!


That's all well and good, but the real problem I ran into when trying this last time was actually forming the query to retrieve all the posts that the current user should see. This was defined as "every post from every user that the current user is following".

Strangely enough, I didn't have to think too hard before working out the correct query this time! Maybe it's because I'd written everything in SQL instead of using a Python wrapper, but I felt like I understood the (admittedly pretty simple) database design enough to just write the query straight off the bat...

...so here it is!

select content from Post, Following, User
  where Post.user_id = Following.followed_id
  and Following.follower_id = ?;

The ? is a way (apparently the safest way!) to parameterize SQL queries, and will be populated by the current user's ID, which should be available to me in the webapp.

In order to test it out, I used a slight modification:

select content from Post, Following, User
  where Post.user_id = Following.followed_id
  and Following.follower_id = User.user_id
  and User.username = "barry";

Of course, before getting any results from that query, I had to populate the tables with some rows! PyCharm makes this incredibly easy with the included table editor, but just to brush up on my SQLite syntax I wrote some equivalent queries myself (which I'll be relying on if I write the actual webapp!):

  • insert into Following values (?, ?);
  • insert into User values (null, ?, ?);
  • insert into Post values (null, ?, ?, ?);

Again, the ?-syntax allows me to parameterize the queries from the webapp backend. In my testing, I replaced them with arbitrary values matching the required types.

The null argument to User and Post is an SQLite quirk I discovered by accident, which autoincrements any integer primary key if null is passed as a value. Hooray for smart design!

It's great to see that I can actually learn some basic SQL and use relational databases, even if I'll likely convert the raw SQL queries into the equivalent calls to some Flask framework. I'm relieved not to be so completely terrified of relational databases any more, and extremely glad to have solved a past problem!