Advanced Design Patterns for DynamoDB

YOUTUBE 6yqfmXiZTlM Rick Houlihan presents a repeat of his 2018 talk at AWS re:Invent 2019

Technology Adoption Hype Curve, @4:33

This talk is referenced at the start of Data modeling with Amazon DynamoDB, which Nathan pointed me to when I started working at Centrapay.

I found a nice summary of this talk in "Takeaways from AWS re:Invent 2019’s Amazon DynamoDB Deep Dive: Advanced Design Patterns (DAT403)" jeremydaly.com

"It's really easy to do small things badly and think that you're doing ok" -- Houlihan, @4:12

Tradeoffs with NoSQL, @6:12

If you use a normalised data model you're going to have pain at scale.

NoSQL is only good for applications with repeatable access patterns. If I don't know how I'm going to access the data, then it's better to use an ad hoc query engine.

90% of the applications we write support OLTP business processes, which makes NoSQL very relevant to business today.

Overview of data structures in DynamoDB, @8:45

A table is an object repository. Every table must have a single attribute called the partition key; nothing else is required.

The biggest difference is that you can add a sort key. Together with the partition key, it uniquely identifies each object, and range queries can be applied to the sort key.
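To make the partition key and sort key idea concrete, here is a minimal in-memory sketch. It is not the DynamoDB API: the `Table` class, the `query_between` method and the `customer#`/`order#` key values are all hypothetical names made up for illustration. It only shows that items live in a partition, are ordered by sort key, and can be fetched with a range condition on that key.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class Table:
    """Toy model of a table: partitions holding items ordered by sort key."""

    def __init__(self):
        # partition key -> sorted list of sort keys
        self._sort_keys = defaultdict(list)
        # (partition key, sort key) -> item attributes
        self._items = {}

    def put(self, pk, sk, **attrs):
        # The (pk, sk) pair is the primary key, so a second put overwrites.
        if (pk, sk) not in self._items:
            insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = {"pk": pk, "sk": sk, **attrs}

    def query_between(self, pk, low, high):
        """Return items in one partition whose sort key is in [low, high]."""
        keys = self._sort_keys.get(pk, [])
        start = bisect_left(keys, low)
        end = bisect_right(keys, high)
        return [self._items[(pk, sk)] for sk in keys[start:end]]


table = Table()
table.put("customer#1", "order#2019-11-02", total=40)
table.put("customer#1", "order#2019-12-05", total=15)
table.put("customer#1", "order#2020-01-09", total=99)
table.put("customer#2", "order#2019-12-01", total=7)

# Range query on the sort key: customer 1's orders placed in December 2019.
december = table.query_between("customer#1", "order#2019-12-01", "order#2019-12-31")
print([item["sk"] for item in december])  # ['order#2019-12-05']
```

Putting the date inside the sort key is what turns "orders in December" into a cheap range scan of one partition rather than a filter over the whole table.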

Data in our table doesn't have to be homogeneous; it can be heterogeneous too. A comparison of the two can be found in the article Heterogeneous and Homogeneous mixtures. thoughtco.com

Our primary table holds a collection of objects that meet certain access patterns. We've joined objects into a partition and we're going to join these using indexes.

Secondary index types, @12:58

What we're trying to do in the design is create queries that fetch multiple objects in a single request. Grouping items in a partition means we don't have to do a lot of processing on the data.

Table updates are guaranteed to be replicated to Global Secondary Indexes (GSIs), with eventual consistency. The replication latency quoted is around 10ms, which is great for secondary access patterns.

Cassandra says don't use their indexes because they can't be consistent -- interesting claim.

To control costs, understand what you index. You can choose to project all, some or none of the attributes into your GSI. Projected attributes are duplicated into the index, which adds to the WCU cost of inserts to the table. Knowing how you project into your GSIs is a good way to understand your costs.

Read Capacity Units (RCU) and Write Capacity Units (WCU) are priced separately. Choosing your indexes means choosing how much of these resources you're consuming. Dynamo Pricing
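The effect of projection on write cost can be sketched with some rough arithmetic. This assumes the documented rule that one standard write unit covers an item of up to 1 KB, rounded up, and that each GSI write is charged on the size of the item as projected into that index. The function names and item sizes are hypothetical, and the sketch ignores updates that change index keys and transactional writes, so treat it as an estimate only.

```python
import math

WCU_ITEM_BYTES = 1024  # one standard write unit covers up to 1 KB


def write_units(size_bytes):
    """Write units for one standard write of an item of the given size."""
    return max(1, math.ceil(size_bytes / WCU_ITEM_BYTES))


def insert_cost(item_bytes, projected_bytes_per_gsi):
    """Rough write units for inserting one new item.

    projected_bytes_per_gsi lists the size of the item as projected into
    each GSI (keys only, some attributes, or everything).
    """
    table_cost = write_units(item_bytes)
    index_cost = sum(write_units(size) for size in projected_bytes_per_gsi)
    return table_cost + index_cost


item_size = 3500  # hypothetical 3.5 KB item
print(insert_cost(item_size, []))           # 4: table only
print(insert_cost(item_size, [item_size]))  # 8: one GSI projecting all attributes
print(insert_cost(item_size, [200]))        # 5: one GSI projecting keys only
```

In this example a keys-only projection adds one write unit per insert, while projecting everything doubles the cost.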

pk and sk are generic key names for the table; a common pattern is to add gsi1pk and gsi1sk as the keys of a GSI for a secondary lookup.
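A small sketch of what the generic key names buy you. The items, the `order#`/`line#`/`product#` prefixes and the `group` helper are hypothetical; the point is only that the same items can be grouped one way by pk/sk and another way by gsi1pk/gsi1sk, and that items without the index keys simply don't appear in the index.

```python
from collections import defaultdict

# Hypothetical items: an order and its line items share a partition in the
# base table, while gsi1pk/gsi1sk regroup the line items by product.
items = [
    {"pk": "order#100", "sk": "order#100", "status": "shipped"},
    {"pk": "order#100", "sk": "line#1", "gsi1pk": "product#A", "gsi1sk": "order#100"},
    {"pk": "order#100", "sk": "line#2", "gsi1pk": "product#B", "gsi1sk": "order#100"},
    {"pk": "order#101", "sk": "order#101", "status": "open"},
    {"pk": "order#101", "sk": "line#1", "gsi1pk": "product#A", "gsi1sk": "order#101"},
]


def group(items, pk_name, sk_name):
    """Group items by one key pair, skipping items that lack the keys."""
    partitions = defaultdict(list)
    for item in items:
        if pk_name in item and sk_name in item:
            partitions[item[pk_name]].append(item)
    for partition in partitions.values():
        partition.sort(key=lambda item: item[sk_name])
    return dict(partitions)


base_table = group(items, "pk", "sk")
gsi1 = group(items, "gsi1pk", "gsi1sk")

# One lookup returns the order and its line items together.
print([item["sk"] for item in base_table["order#100"]])  # ['line#1', 'line#2', 'order#100']
# The index answers a second access pattern: which orders contain product A?
print([item["gsi1sk"] for item in gsi1["product#A"]])     # ['order#100', 'order#101']
```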

Doing it wrong - Clustered access, an example of how you shouldn't use NoSQL, @19:04

When you don't spread the data out you get hot keys, which you can monitor in CloudWatch. You can't get that heatmap from CloudWatch, but you can get the underlying data.

To get the most out of DynamoDB throughput, create tables where the partition key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.
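A rough simulation of why the number of distinct partition key values matters. DynamoDB's real hash function and partition layout are internal, so md5 and the fixed count of 10 partitions here are stand-ins, and the key names are hypothetical. The sketch compares how much of the traffic ends up on the busiest partition for a key with three distinct values versus a key with ten thousand.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=10):
    """Map a partition key to a partition with a stable hash (md5 as a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def busiest_share(keys, partitions=10):
    """Fraction of all requests that land on the busiest partition."""
    counts = Counter(partition_for(key, partitions) for key in keys)
    return max(counts.values()) / len(keys)


requests = 10_000
# Few distinct values: every request uses one of three status codes.
low_cardinality = [f"status#{i % 3}" for i in range(requests)]
# Many distinct values: every request uses its own order id.
high_cardinality = [f"order#{i}" for i in range(requests)]

print(round(busiest_share(low_cardinality), 2))
print(round(busiest_share(high_cardinality), 2))
```

With three key values at most three partitions can ever be used, so the busiest one takes at least a third of the requests however many partitions exist. With many distinct keys the share should settle near one in ten.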

Doing it right - Uniform distinct access patterns, for better and cheaper performance, @20:10

Optimise for space by spreading access evenly over the key space, or optimise for time by spreading your requests evenly over time. Beware the thundering herd.
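Spreading requests over time can be sketched with random jitter. Jitter is one common way to avoid a thundering herd rather than something these notes attribute to the talk, and the client count, window length and function name are hypothetical. The sketch compares the peak requests per second when every client fires at once with the peak when each client waits a random offset inside the window.

```python
import random
from collections import Counter


def peak_per_second(start_times):
    """Largest number of requests that fall into the same one-second bucket."""
    buckets = Counter(int(t) for t in start_times)
    return max(buckets.values())


clients = 600
window_seconds = 60
rng = random.Random(42)  # fixed seed so the run is repeatable

# Thundering herd: every client fires at the start of the window.
herd = [0.0 for _ in range(clients)]
# Jittered: each client waits a random offset inside the window.
jittered = [rng.uniform(0, window_seconds) for _ in range(clients)]

print(peak_per_second(herd))      # 600, all in the same second
print(peak_per_second(jittered))  # far lower, roughly clients / window_seconds
```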

DynamoDB auto scaling helps when demand varies, and Amazon has other pricing options, such as on-demand, if that doesn't fit your usage.

DynamoDB also has a burst bucket: up to 5 minutes of unused capacity is kept in reserve for when you need to respond quickly. This means a sudden spike in load doesn't take down the database.
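The burst bucket can be modelled as a simple token bucket. This is a toy sketch under stated assumptions: unused capacity is banked for up to 300 seconds, and a spike draws on that bank before anything is throttled. The `BurstBucket` class and the numbers are hypothetical, and the real service manages burst capacity per partition without guaranteeing it.

```python
class BurstBucket:
    """Toy model: unused capacity accrues for up to `retain_seconds`."""

    def __init__(self, provisioned_per_second, retain_seconds=300):
        self.provisioned = provisioned_per_second
        self.max_burst = provisioned_per_second * retain_seconds
        self.burst = 0

    def tick(self, demand):
        """Process one second of demand; return the units throttled."""
        if demand <= self.provisioned:
            # Bank whatever was not used, up to the cap.
            unused = self.provisioned - demand
            self.burst = min(self.max_burst, self.burst + unused)
            return 0
        # Over the provisioned rate: draw the excess from the bucket.
        excess = demand - self.provisioned
        from_burst = min(excess, self.burst)
        self.burst -= from_burst
        return excess - from_burst


bucket = BurstBucket(provisioned_per_second=100)
for _ in range(60):  # a quiet minute at 20 units/second
    bucket.tick(20)
print(bucket.burst)  # 4800: 80 unused units * 60 seconds

# A spike of 500 units/second: 12 seconds are absorbed, then throttling starts.
throttled = [bucket.tick(500) for _ in range(15)]
print(throttled)
```

The quiet minute banks enough capacity to absorb twelve seconds of the spike; after that, the 400 units per second above the provisioned rate are throttled.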

SQL vs NoSQL design patterns, @25:27

Data modelling is about relationships. Knowing your ERD is still important with NoSQL, even when the model is expressed as keys and access patterns.