How not to design a database
by pompo500 on Oct.02, 2010, under Software
I’ve come across quite a few horrible database designs. Here are a few tips on how to build a better database world.
Drupal is a great example of this. It’s database design offends most of these rules I consider that make a good database.
COLUMN NAMES
This is actually the most imporant one in my opinion, where almost everybody gets this wrong.
Always prefix your column names with the table name, and with camelcase. Makes everything more readable.
Consider tables “products” and “categories”.
Products have
- ID
- Name
- Category ID
Categories:
- ID
- Name
If your tables are designed poorly then you have columns named products.ID and categories.ID. How do you SELECT products along with their category names? You have to say category = categories.id. First off, the name “category” suggests it refers to “categories” table, but in fact it references “products.category”. That’s dumb and un-intuitive.
Why not prefix your columns with their table name? Like ProductID, ProductName and so on.
SELECT * FROM products INNER JOIN categories ON ProductCategory = CategoryID
SELECT * FROM products INNER JOIN categories ON category = category.id
Which looks better? And the first saves you from column ambiquity errors. And if the database schema changes and introduces a new column name that exists in two tables and a JOIN would happen to get data from both, without table prefix. Shit hits the fan and your application breaks.
COLUMN TYPES
Choose the right data types that fit the content you are storing. If your columns are spec’d as varchar(100) then something is probably wrong.
There are many and justified cases to use variable length columns, but if the table does not specifically need it, fixed length rows are much faster to seek to. It’s only a simple multiply operation on fixed-length rows.
What’s a variable length row? It’s when a table contains at least one column whose data type is variable length. (varchar / tinytext / mediumtext / blob / ..)
CONTENT
What to store in the database? Only the data that that concerns you business logic. Not anything else, supporting functions for example.
Drupal is a great example of this. Drupal has 7 (seven) tables for caching purposes. Database should not handle cached data, rate limiter counts or anything of that nature.
INDEXES
Index the right stuff.
I won’t be explaining indexing here. Should not know of it enough, there are many guides and sites that explain it better and more wisely than I could. Go Google it.
CASCADES
Don’t use any “ON UPDATE CASCADE, ON DELETE CASCADE” magic. They may be of convenience but there are some cases where you really need handle more complex logic when deleting something that simple cascades couldn’t handle.
And anyways database should be just a stupid entity that stores the data you tell it to. It should not contain any business logic. And CASCADES just make things a bit more confusing, and software is quite complex as is already.
You might think they’re cool and you know them when you implement them, but when a new developer comes to the project they have to study the foreign references with great concentration to get to know how the database behaves on certain events. Like I said, database should be dumb and act just like you want it to (= only do things you are told to).
STORED PROCEDURES
Don’t use stored procedures. Same as previous, database is not part of your business logic, except only in the sense of storing your data. Unnecessary complexity. Harder to debug on errors. Vendor lock-ins. Database should be a dumb entity that can be switched if needed.
Don’t store images, files or serialized data inside your DB. It’s not cool. Filesystem is for images and files. Serialized data? You are probably doing something wrong already.