Matt Willard


Database Design for Mere Mortals

=DESIGN OBJECTIVES
design = arrange building blocks to achieve a desired result
-learn how to do it from an established methodology; don’t just guess; methodology makes it easier and faster
-DB should support required and ad-hoc info retrieval
-tables should be proper and effective
-data integrity should be imposed
-DB supports relevant business rules
-DB supports itself to future growth

THREE STEPS
-requirements analysis: examine business, talk with users, analyze system to see what they want to do
-data modeling: build a database structure using a modeling method to represent aspects of structure and relationships
-normalization: use special rules called “forms” to reduce bigger tables into smaller one; prevents redundancy in data

KEYS:
primary key value = ID of specific record in DB
primary key field = ID given table in DB
foreign key = copy of another table’s primary key in a second table: used for relationships
–link them up for views

DATA INTEGRITY:
table-level: no duplicate records in table; ID field is unique and never null
field-level: each field is valid and consistent and accurate
relationship-level: pairing between tables is sound and synchronizes when changes are made
business rules: any organization restrictions

=CONCEPTUAL OVERVIEW
-define mission statement (purpose of DB) and objectives (tasks performed by users against database data)
-analyze current database; review how it collects and presents data; interview users/management on how they use it every day
-create data structures: establish tables by IDing subjects database will track; then assign fields to tables based on subject’s characteristics; set primary key for field; document field specifications and data types
-determine/establish table relationships: ID relationships that exist between tables and connect via primary/foreign keys or linking tables; set them to 1:1, 1:M, M:M
-define business rules: do interviews to ID constraints and limitations that must be part of DB; set validation tables to ensure these rules
-establish views: interview people to find out how they work with respective data and create views to accommodate those situations and their viewing criteria
-review data integrity: make sure tables/fields fit their design specifications; make sure business rules are accounted for and relationships are valid and have integrity

=STARTING THE PROCESS
-get mission statement: succinct, specific, clear
-objectives: each one is a single general task, easy to understand, that defines what people are doing in the DB and using it for

=ANALYZING THE CURRENT DATABASE
Ask:
What types of data does the organization use?
How does the organization use that data?
How does the organization manage and maintain that data?

1. interview users/management to see how organization collects/presents data
2. create preliminary list of tables and fields
3. make any changes to prevent carrying over weaknesses of old DB
-DO NOT COPY THE STRUCTURE OF THE OLD DATABASE.-

-look at how data is collected: get samples of paper data, get screenshots of programs that interact with and collect data; check web pages sed
-see how info is presented: reports, web pages, presentations; review these and get samples/screenshots to see how it’s laid out and which ones add or use data
-interview users/management on how data is used: ask open-ended questions and pick out nouns that represent data characteristics; start and add to this list of subjects; follow up about subjects with specific questions to pick out their characteristics and details about the subject

QUESTIONS TO ASK USERS: what type of data they use, how they use it, types of info needed for their work; check out the samples and how they help the users
–information they need now and in the future to do their jobs
QUESTIONS TO ASK MANAGEMENT: info managers currently get, info they need to get, info they foresee themselves needing, perception of organization’s overall info requirements

MAKE A PRELIMINARY FIELD LIST
-review and refine list of characteristics: rename same name ones, refine ones that are the same, make sure all items are actual characteristics and break them down
-check samples for new characteristics and reference with List
-pull out calculated fields (results of expression, text or number) and put into calculated field list
-review lists with users/management to make sure they all fit well and have fields needed

=ESTABLISHING TABLE STRUCTURES
DEFINE PRELIMINARY TABLE LIST
-review preliminary field list: identify subjects from this list first
-then merge this subject list with your other subject list; resolve duplicate items, items that represent the same subject, then combine subjects with prelim field list
-then, use mission objectives to see if any other subjects need to be added

DEFINE FINAL TABLE LIST
-final table list has subjects, table type, and table description
-table types: data tables represent important organization table, linking table links two tables in many to many relationship, subset tables are related to data table’s subject in greater detail, validation table has static, data integrity data
–all of these will probably be data for now and change as needed
-make sure table names are: descriptive, accurate, clear, concise, full words

ASSOCIATE FIELDS WITH EACH TABLE
-assign the appropriate fields from your field list to tables

REFINE FIELDS
-make sure field names are good in a format similar to table names
-make sure fields are distinct, unique, with a singular value, can’t be broken down further
-multipart fields should be broken up into separate fields to help with later work as needed; each field should represent one value; keep cutting down on redundancy
-if there’s a lot of redundancy this way, use the field as a basis for a new table and related a field or two from another table to it

REFINE TABLE STRUCTURES
-you want to keep duplicate fields used to relate tables to each other to a minimum to cut down on redundant data
-use an ideal table to refine structures: an ideal table has a single subject (object or event), a primary key, no multivalue fields, no calculated fields, no duplicate fields outside of ones used to relate tables together
-you don’t need duplicate fields most of the time, do a JOIN; remove duplicates from tables as needed
-you might need to combined reoccurences of the same type of value into a multivalue field, and then resolve accordingly
-subset tables: some fields in a table don’t always have values; if they don’t fit an ideal table, might have to make subset tables

=KEYS
1. choose candidate keys that fit all elements:
not multipart, needs unique values, no nulls, doesn’t breach privacy (so no SSNs), not optional values, serves as uniqueID for each record and field; only modified in extreme circumstances
-usual candidates tend to be customer/employee/order IDs or similar
-you can also create a key if one doesn’t exist or could be stronger than other fields: again, usually an ID field is best for this
2. set primary key: from candidates, choose single-field key that incorporates table name; if a field isn’t exclusively identified by the key, might need to remove it
3. set other candidates as alternate keys

TABLE-LEVEL INTEGRITY
no duplicate records, key exclusively identifies each record, keys are unique values and not null

-next: do interviews: make sure appropriate subjects are represented, table names must be meaningful along with field names; make sure all appropriate fields are in table

=FIELD SPECIFICATIONS
field specifications are good for field-level integrity, overall data integrity, gets a clear understanding of the nature/purpose of the data
–be mindful of data type for the field and length; it’s important to confirm that you define the characters that can be entered in the field to confirm field integrity
–also consider input mask and display format for this
–it’s good to run through these field specifications to make sure all the fields are accurate

=TABLE RELATIONSHIPS\

  • why relationships are important
    -establishes connection between logically related tables
    -refines table structures
    -minimizes redundant data
    -draw data from multiple tables simultaneously

  • types of relationships
    ONE-TO-ONE
    only one record in first table is related to only one record in the second table

ONE-TO-MANY
single record is related to many records in the second table, but not the other way around
-a customer can have many orders, but a single order can only have one customer

MANY-TO-MANY
1 record in Table A: related to many records in Table B
1 record in Table B: related to many records in Table A
-A STUDENT can have many CLASSES.
-A CLASS can have many STUDENTS.

-self-referencing relationships refer to other records within the same table; can also be one-to-one, one-to-many, many-to-many

  • identifying relationships
    ask questions about how records can be associated or belong to other records to determine relationships from the perspective of the table you’re working on
    -you can use sample data to assist

  • establishing relationships
    use primarily and foreign keys to establish connection for 1:1 and 1:M
    -1:1: assign parent and child role to tables, put copy of parent’s primary key into child table (this makes it a FOREIGN key); label it FK; label primary keys PK
    -1:M: take primary key copy from “one” table and put in “many” table as foreign key
    –multivalued fields often gain these 1:M relationships
    -M:M: these use linking tables: table A and B are “one”, table AB is “many”
    1. linking table gets copies of primary keys from both tables: composite primary key
    2. name linking table that reflects relationship
    3. add linking table to final list
    –if it turns out that records are generating a lot of redundant data in a M:M sort of table, you may also need to add other fields to the linking table that cut down on said redundant data (use sample data to find these)

  • refining foreign keys
    make sure they have same name as the other primary key; uses a replication of the same field specifications;

  • establishing relationship characteristics
    -define deletion rule to protect from orphaned records
    DENY: keeps record, sets as inactive
    RESTRICT: if child records that are related exist in child table, it won’t delete parent record
    CASCADE: kill parent record and all related child records
    NULLIFY: delete parent record, updates foreign keys in child table to null
    SET DEFAULT: delete parent record, update child foreign key values to default value
    –go with restrict by default, and use rules based on question: if a parent record is deleted, what should happen to the child records?
    -participation: do you need a record in a table before entering records in the related table? (does an employee need to be assigned to a manager?) if so: mandatory, if not: optional
    -degree of participation: min/max number of records allowed to be associated (can be unlimited)

  • relationship-level integrity
    this is when
    –connection between two tables in relationship is wound, new records can be meaningfully added, records can be deleted without any adverse effect, there’s a meaningful record limit

=BUSINESS RULES
-imposed constraints on DB usually based on policies set by business
-these are established OUTSIDE of the DB design using its type
EXAMPLE: Students can’t have more than two instruments checked out at the same time.

DATABASE ORIENTED RULES: constraints inside of the DB (such as limited entry fields available for a state)
–FIELD SPECIFIC: rules on fields, like “oh this field needs to be able to take zip codes”
–RELATIONSHIP SPECIFIC: such as no less than 5 students but no more than 20
APPLICATION ORIENTED RULES: established outside of DB and more in design (such as: you can’t store someone’s calculated 15% discount within the table)

-make sure to test implemented rules and document them for later; test by doing CRUD and seeing if anything gets violated
-validation tables (lookup tables) can be used to satisfy rules when fields get limited (for example, in case categories need to remain consistent and can’t have new ones be invented by staff)

=VIEWS
a view is a virtual table, based on data with multiple tables: it’s basically a query
-it gets routinely refreshed each time it’s accessed, can be customized to the user’s needs, and it can protect the rest of the DB from access
THREE VIEW TYPES:
DATA VIEW: examine/manipulate data from single base table or multiple base tables
AGGREGATE VIEW: using one or more base tables, show calculated fields that group the data: sum, average, min, max, etc.
VALIDATION VIEW: use a sub table to enforce access based on permissions, business rules, and validation
-get with users and review notes to get ideas of likely needed business views
–with views, it’s a very good idea to use calculated fields, such as sum, or appending data types together; you can also use filters to limit data further to specify the view

=REVIEWING DATA INTEGRITY
Double check:
-no duplicate fields
-no calculated fields
-no multivalue fields
-no multipart fields
-no duplicate records
-each record has a primary key and these primary keys confirm to proper primary key elements

-each field is a proper ideal field and has field specifications

-make sure relationships are properly defined with correct deletion rules, proper type and degree of participation

-business rules should be properly defined and categorized

-views have all appropriate tables and fields and returns appropriate records

-double-checked that the documentation is there; good for review to answer questions and be used when implementing and modifying the database