No matter the project, from the smallest microservice to a large monolith, data is a key consideration. Even though schemaless NoSQL databases have become more common, data design has not become any less important.
A data store can take on many different faces and have many different implementations; however, without sufficient consideration it can become a costly part of the system.
Draw it out, it will save a lot of pain later.
I am wary of using the word database throughout this post as it implies a single instance. Your data can take many different shapes depending on what is most applicable, so “data store” is probably more appropriate.
This blog is written from the perspective of starting BetaBud, which uses a single DynamoDB table for now...
This blog attempts to be as transparent as possible about how BetaBud has been implemented. Here we will lift the lid on the data implementation, down to the level of detail you usually only see in tutorials.
Take a look here for the architecture design of BetaBud.
Velocity, Variety, and Volume are three important factors to consider when creating your data store.
Velocity is the rate of reads and writes to the store. Are they roughly proportional, or are there far more reads than writes? Consider the rate at which datasets change and are queried. Perhaps not all data is read frequently, or some of it never changes once written.
Variety alludes to how structured the data is: whether the structure is highly defined and highly related, or closer to a large BLOB. Additionally, consider whether the store needs to be aware of any nested data.
Volume is both the size of the total dataset and the size of each individual record.
The first step to structuring data is to think about the patterns for both reading and writing. By patterns, I mean: what form will the commands and queries take? Will we always be selecting a list of the same object type? Will we be selecting individual pieces of data? What values will we have to query with? What will a typical write command look like?
All of these considerations should focus on the most frequent operations: design for the 99% scenarios, then worry about the 1% cases later.
Let's consider the fundamental scenarios for the BetaBud minimum viable product (MVP) and the data we have access to at each point:
Type | Description | Available Data |
---|---|---|
Query | Querying the main feed - https://betabud.io | We know it is a list of forms |
Query | Querying your feed - https://betabud.io/myforms | We also know it is a list of forms but they belong to a given user |
Command | Creating a form - https://betabud.io/create | The given user and all the data about a not yet saved form |
Query | Loading a form | The user and the form ID from either the main feed or own feed data |
Command | Responding to a form | The user and the form ID from loading the page, as well as all of the responses |
Query | Loading the user’s data - for instance the number of tokens they have | The given user |
As mentioned above, I acknowledge this store does not need to be a silver bullet that remains constant throughout and is used exclusively. The store will change, and growth will allow it to be broken down and reconsidered.
Yet, with this concise set of initial MVP requirements, all could be achieved with a single store.
Honestly, this was a biased call from the start: I was always inclined to use DynamoDB. Perhaps a relational database would have been easier to query and would have reduced duplication (something I will explore later), but I wanted a store that was quick to set up and configure, easy to maintain, and with very low latency.
All the data above is highly related, so it made sense to store it all together in a single table. The table uses the generic names “PK” and “SK” for the partition key and sort key attributes. This means they can take any form without creating confusion.
This single table also reduces the complexity when interacting with the data store, especially when using batch read/write operations.
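To make that concrete, here is a sketch of what the table definition could look like. BetaBud's actual stack isn't stated in this post, so the AWS SDK for JavaScript (v3), the table name "BetaBud", and on-demand billing are all assumptions; only the generic "PK" and "SK" attribute names come from the design above.

```typescript
import { DynamoDBClient, CreateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Single table with generic partition ("PK") and sort ("SK") keys,
// so any item type can be stored without the key names implying one meaning.
await client.send(
  new CreateTableCommand({
    TableName: "BetaBud", // hypothetical table name
    AttributeDefinitions: [
      { AttributeName: "PK", AttributeType: "S" },
      { AttributeName: "SK", AttributeType: "S" },
    ],
    KeySchema: [
      { AttributeName: "PK", KeyType: "HASH" },  // partition key
      { AttributeName: "SK", KeyType: "RANGE" }, // sort key
    ],
    BillingMode: "PAY_PER_REQUEST", // on-demand suits an MVP with unknown traffic
  })
);
```

In practice the table would more likely be defined through infrastructure-as-code rather than an SDK call, but the shape is the same either way.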
A reason to consider splitting this table later could be a vastly differing rate of reads and writes per item type; perhaps one type needs higher provisioned throughput than the rest, or a single type requires a Global Secondary Index that the others don't. At the MVP stage this isn't the case.
Based on the above patterns, here is what the primary keys look like.
A form header item that includes any data required for the main feed
PK | SK |
---|---|
HEADER | 1712009988#username#formIdentifier |
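As a rough sketch of how the main feed could then be read (assuming the v3 document client and the hypothetical "BetaBud" table name from the earlier sketch), a single Query against the fixed "HEADER" partition is enough. The page size is illustrative.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Main feed: every form header item lives under the single "HEADER" partition.
const { Items: feed } = await doc.send(
  new QueryCommand({
    TableName: "BetaBud",
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": "HEADER" },
    ScanIndexForward: false, // descending sort key = newest forms first
    Limit: 25, // hypothetical page size
  })
);
```

Because the epoch-seconds prefix of the sort key is fixed width, lexicographic ordering of the string lines up with chronological ordering, which is what makes the descending query return the newest forms first.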
The form body item includes all data about a form such as the different questions
PK | SK |
---|---|
username | 1712009988#formIdentifier |
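One way the /myforms feed could be served from these keys, and I'm inferring this from the key design rather than from anything stated above, is to query the body items under the user's own partition. Again a sketch with the v3 document client and placeholder values.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const username = "some-user"; // placeholder: the logged-in user

// "My forms": all body items for this user, newest first.
const { Items: myForms } = await doc.send(
  new QueryCommand({
    TableName: "BetaBud",
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": username },
    ScanIndexForward: false, // SK is prefixed with an epoch timestamp
  })
);
```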
The form response item is created when another user responds to a form
PK | SK |
---|---|
RESPONSE#username | formIdentifier#respondeeUsername |
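Responding to a form could then be a single write of a response item. The variable names, the answer payload, and the condition expression guarding against duplicate responses are illustrative assumptions; the post only defines the key shape, and I'm reading the username in the partition key as the form owner's.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Placeholder values for the form owner, the form, the responder and their answers.
const formOwner = "form-owner";
const formId = "formIdentifier";
const respondee = "respondee-user";
const answers = { q1: "Love it", q2: 4 };

await doc.send(
  new PutCommand({
    TableName: "BetaBud",
    Item: {
      PK: `RESPONSE#${formOwner}`,
      SK: `${formId}#${respondee}`,
      answers,
    },
    // Optional guard: reject a second response from the same user to the same form.
    ConditionExpression: "attribute_not_exists(PK)",
  })
);
```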
A user token item that contains other data about the user, such as tokens earned and spent
PK | SK |
---|---|
USERTOKEN | username |
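Loading the user's data becomes an exact-key lookup, again sketched with the v3 document client and a placeholder username.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Fetch a single user's token item by its exact key.
const { Item: userTokens } = await doc.send(
  new GetCommand({
    TableName: "BetaBud",
    Key: { PK: "USERTOKEN", SK: "some-user" }, // "some-user" is a placeholder username
  })
);
```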
An entity relationship diagram showing how the different entities can be stored in one table
Time to Live (TTL) is a clever piece of DynamoDB functionality. At the table level, you nominate a numeric attribute holding an epoch timestamp to act as the TTL, and once that time has elapsed, DynamoDB deletes the item automatically (though not necessarily immediately).
Currently, there is only a TTL applied to the form header record within BetaBud. The absence of this header indicates that a form is archived and won't show in the main feed. This is currently set to happen after 28 days.
This in-built trigger functionality reduces the need for any scheduled process that would need to be implemented by another resource.
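For reference, here is roughly what that looks like in practice: enabling TTL is a one-off table setting pointing at a numeric epoch attribute, and each header item is stamped with its expiry when written. The attribute name "expiresAt" is my assumption; the post only states that headers expire after 28 days.

```typescript
import { DynamoDBClient, UpdateTimeToLiveCommand } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const doc = DynamoDBDocumentClient.from(client);

// One-off: tell the table which attribute holds the expiry time (epoch seconds).
await client.send(
  new UpdateTimeToLiveCommand({
    TableName: "BetaBud",
    TimeToLiveSpecification: { AttributeName: "expiresAt", Enabled: true },
  })
);

// When creating a form header, stamp it with an expiry 28 days out.
const twentyEightDays = 28 * 24 * 60 * 60;
await doc.send(
  new PutCommand({
    TableName: "BetaBud",
    Item: {
      PK: "HEADER",
      SK: "1712009988#username#formIdentifier", // illustrative key from the tables above
      expiresAt: Math.floor(Date.now() / 1000) + twentyEightDays,
    },
  })
);
```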
As mentioned, this is the MVP implementation of BetaBud. A data store is an evolving ecosystem as much as your front-end design is. The more consideration you give it upfront, the less pain you will have in the long run.
There is always the opportunity to perform a data migration later on, but let's try to reduce the headaches for our future selves with that initial design.