Database Design Flashcards ionicons-v5-c

Data

Facts that can be recorded and stored using computer media

Information

Data processed to increase the knowledge of the people who use it

Database Management System (DBMS)

General-purpose software system that facilitates the processes of defining, creating, using, and maintaining databases

Database

An organized collection of logically related data.Includes data and metadata.

data

the actual values of interest

Metadata

Data that describes the properties or charac-teristics of other data (a means of providing context for data)

Data dictionary

location where definitions of data (metadata) are maintained

Data access

read, insert, update, and delete

Rows

unique instances or records of entity

Columns

store attribute data

Primary key

Enforces uniqueness for each record

Foreign keys

Define relationships between tables

Object-Relational Database Model

Combine:Ability of object technology to handle advanced relationship typesData integrity, reliability, and recovery features of the traditional relational modelRepresented by the most popular and powerful of modern database management systemsE.g., Oracle, IBM DB2, Microsoft SQL Server

Transaction Processing Systems (TPS)

Designed to support the processing of everyday operational business transactions (i.e., retrieving, adding, modifying, and deleting data)Represent the vast majority of database systemsAre the primary focus of this course

Decision Support Systems (DSS)

Specifically designed to aid managers in decision-making tasksContain data that has been accumulated over time to aid in trend analysis and forecastingData about subjects must be organized in such a way that it provides a unified, overall picture of all the important details about the subjects over timeData warehouses are the most common exampleData marts are subsets of data warehouses designed to support a part of an organization

Business rules

Can be used to define or constrain some aspect of a business's structure or processesDatabase systems should support business rules that are relevant to the specific functions supported by the system

Conceptual Data Modeling

goal is to arrive at an understanding of the principal data sources and data elements of interest to the business or organization, and the relationships between the data sources, in order to satisfy requirements for informationtool is the Entity-Relationship Diagram (ERD)

Logical Data Modeling

goal is to convert the conceptual model into a form that can be utilized to create an IS (e.g., a relational database)tool is the Relational Schema

Physical Data Modeling

goal is to specify all identifying & operational characteristics of the data that will be recorded in the information systemtools include the Data Dictionary and others

Entity-Relationship Diagram (ERD)

Entities: named as singular nounsAttributes: named as singular nounsRelationships: named as verbs or verb phrases

Entity

something of interest in the environment (e.g., person, place, object, event, concept) that is described by attributes and that will have numerous instances we wish to track

Attribute

A discrete data elementDescribes an entity (i.e., is a characteristic)Meaningful (for the system being modeled)Value may be required or optional

Simple

at the atomic, most basic level

Composite

a related group of attributesexample: Address (Street, City, State, Zip)

Single Valued

only one value per entity instance (e.g., Last Name, Date Of Birth)

Multivalued

multiple values per entity instance are possible (e.g., Degree, Club, Skill)

Derived

calculated, but not stored (e.g., Total)

Identifier

an attribute that uniquely identifies each entity instance (e.g., Social Security Number)can be one or more attributes (e.g., First Name, Middle Name, and Last Name)Create an identifier if there is no obvious identifying attribute (e.g., Part Number)Identifier of a dependent (weak) entity is its partial identifier combined with identifier of its owner

candidate key

any attribute that would qualify as an identifier (e.g., SSN and Employee ID)

derived attributes

represent values that are calculated when needed and not stored(e.g., values generated by a query for use on a report)These are only optionally shown on an ERDWhen shown, diagrammed within square bracketsExample: Acquisition Cost * (1 + Percent Markup) = Selling Price

A relationship:

is an association between entities:The degree of a relationship indicates the number of entities involved in the relationshipspecified as unary, binary, ternary, or higher (n-ary)The cardinality of a relationship describes the number of instances of one entity that may be associated with one instance of another entityspecified as either one or many (many = one or more)The optionality (modality) indicates whether partici-pation in the relationship is required or notspecified as either mandatory or optional

unary relationship

only one entity is involved.

binary relationship

two entities are involved.

ternary relationship

three entities are simultaneously involved.

Two Cases When you Must Use Associative Entities

Case 1: If it is necessary to assign an identifier to uniquely identify each occurrence of the M:M relationship between the original entities, then an associative entity must be created.(Entities have identifiers; relationships do not.)Case 2: In a ternary relationship, if the optionalities at all three entities are not identical, then an associative entity must be created.Otherwise, the ERD cannot depict the interactions between the entities without ambiguity.

Strong Entity

exists independently of other entitieshas its own unique identifier (shown w/single underline)represented with regular rectangle symbol

Dependent (Weak) Entity

dependent on a strong entity; cannot exist on its owndoes not have a unique identifier, only a partial identifier (shown w/double underline)represented with rectangle symbol with lines in corners

Identifying Relationship

links strong entities to dependent (weak) entitiesrepresented with diamond symbol with lines in corners

Enhanced Entity-Relationship Model

addressed shortcomings of the basic E-R modelnot universally agreed upon in some respectsintroduced the supertype/subtype structure

Two Processes to Develop Supertype/Subtypes

1. Generalization: The process of defining a more general entity type from a set of more specialized entity types (A "bottom-up" approach)2. Specialization: The process of defining one or more subtypes of a general entity based on distinguishing attributes or relationships (A "top-down" approach)

Supertype

(example: Employee)a generic entity that has a relationship with one or more subtypes

Subtype

a subgrouping of a supertype entity that is meaningful to an organization (not just possible)shares all attributes of its supertype, but also has unique attributes of its own and/or :has relationships with other entities distinct from those of other subtypes

Inheritance Rule

An instance of a subtype is also an instance of the supertypeSubtype entities inherit values of all attributes of the supertype

Rule Restricting the Use of Supertype/Subtypes

You may use this type of structure only when either (or both) of the following are present: 1. When there are attributes that apply to some (but not all) of the instances of an entity type2. When the instances of a subtype participate in a relationship unique to that subtype

Supertype/Subtype Relationships Rule

Relationships shown at the supertype level indicate that all subtypes will participate in the relationshipThe instances of a subtype may participate in a relationship unique to that subtype; in this situation, the relationship is shown at the subtype level

Supertype/Subtype Constraints

1. Completeness Constraint: Addresses the question of whether an instance of a supertype must also be a member of at least one subtype2. Disjointness Constraint: Addresses the question of whether an instance of a supertype may simultaneously be a member of two (or more) subtypes

Completeness Constraint:Two Possible Rules

1. Total Specialization Rule (Double-line notation)Specifies that each entity instance of the super-type must be a member of some subtype in the relationship(Example: all STUDENTS are either UNDERGRADUATE or GRADUATE students)2. Partial Specialization Rule (Single-line notation)Specifies that an entity instance of the super-type is allowed to not belong to any subtype (Example: FACULTY and STAFF are not the only possible types of EMPLOYEE)

Disjointness Constraint:Two Possible Rules

1. Disjoint Rule (Letter "d" notation)Specifies that if an entity instance is a member of one subtype, it cannot simultaneously be a member of any other subtype (Example: all PERSONS are either MALE or FEMALE)2. Overlap Rule (Letter "o" notation)Specifies that an entity instance can simultaneously be a member of two (or more) subtypes (Example: an ATHLETE can be both a RUNNER and a JUMPER)

Disjoint

a simple attribute with alternative values to indicate the possible subtypes

Overlap

a composite attribute whose subparts pertain to different subtypes; each subpart contains a boolean value (yes or no) to indicate whether or not the instance belongs to the associated subtype

E. F. Codd

published an article titled "A Relational Model of Data for Large Shared Data Banks" in the Communications of the ACM journal.

Relational Database Model Components

Data Structure, Data Manipulation, and Data Integrity

Data Structure

Data are organized in two-dimensional tables (also called relations) with columns and rows

Data Manipulation

Data stored in the tables may be manipulated through the use of a command language (Structured Query Language - SQL - was developed expressly for this purpose)

Data Integrity

Business rules may be defined that maintain the integrity of the data that is manipulated

Relational Database Model Constraints

Domain Integrity, Policy Integrity, Entity Integrity, and Referential Integrity

Domain Integrity

constrains allowable values for columns(e.g., data type, column size, maximum value, etc.)

Entity Integrity

constrains data operations to business rules (e.g., only managers may place vendor orders)

Referential Integrity

constrains a foreign key value to match a primary key value in a related tablemore on this concept to come

E-R Model

Entity, Attribute, and Instance

Relational Model

Table (Relation), Column (Attribute), and Row (Tuple)

Each table (relation)

in a given database has a unique name

Each column (attribute)

within a given table has a unique nameEvery column (attribute) is single-valuedThus, multivalued attributes require special treatment when designing relational tables

Every row (tuple)

in a table is unique

Primary Key (PK)

A column (or columns) whose value uniquely identifies or differentiates each row in a table(e.g., EmployeeID)Required for every table in a relational databaseComposite Key - a primary key made up of more than one column (e.g., FirstName + MiddleName + LastName)

Foreign Key (FK)

A column in one table that serves as the primary key of another table in the same database (thus serving as a link between the two tables)

candidate key

If a table has more than one column that provides a way of uniquely identifying the rows of the tableWhen there is more than one candidate key, one of them must be chosen to be the primary key of the table

alternate key

A candidate key that is not chosen to be the primary key

Foreign Keys

serves as a primary key in another tablemeans by which tables are linked together to repre-sent unary, binary, ternary, etc. relationships

Referential Integrity

involves the circumstance of referring to a specific data row in one table in the database based on the value stored in a column in another tableThis constraint states that a foreign key value can not be stored in a table unless a matching value can be found in the primary key of the related table

Referential Integrity

For every value of a foreign key there must be a primary key with that valueFor every value of CustomerID in the Order table there must be a matching value of CustomerID in the Customer tableThe primary key must exist before the foreign key can be defined

Logical Database Design

The process of arranging the entities and attributes of the conceptual data model of the business environment into the tables and columns of a relational database structure to serve that business in an information systemThe goal is to create well-structured tables (i.e., free of anomalies) that properly reflect the organization's business environment (see the following slides for an explanation of anomalies)

Anomalies

problems that are experienced when attempting to manipulate stored data

Well-Structured Tables

Tables that contains minimal redundancy and that allow users to insert, delete, and modify table rows without errors or inconsistencies are considered to be well-structuredWhen designing relational database tables, we seek to eliminate anomalies through the use of normalization

Insertion Anomalies

are experienced when we attempt to store a value for a column but cannot because the value of another column is unknowne.g., cannot add a new customer's information until an order number is ready to be entered (because OrderID column serves as the primary key for the table and cannot have null values)

Deletion Anomalies

are experienced when a value for a column we wish to keep is unexpectedly removed when a value for another column is deletede.g., cannot delete the sole order for a customer without deleting the only copy of the customer's information also

Update Anomalies

are experienced when changes to multiple rows of a table are needed to effect an update to a single value of a columne.g., cannot completely update a customer's address without changing it for every order placed by that customer

referential integrity arrows

linkages between the tablesused to connect the FK of one table to the PK of another table

Map Regular Entities to Tables

Composite Attributes: Use only their simple, component attributes Multivalued Attributes: Become a separate table with a foreign key taken from the table for the original entityDerived Attributes: Are not included in a relational schema (since, by definition, they represent data that are not stored, only calculated as needed)

Map Dependent (Weak) Entities

Becomes a separate table with a foreign key taken from the primary key of the table for the strong entity Primary key is composed of the partial identifier of the dependent entity plus the primary key from the table for the strong entity (thus, creating a composite PK)

Map Binary Relationships

One-to-Many - Primary key on the one side becomes a foreign key on the many side Many-to-Many - Create a new table; the primary key of the new table is typically a CPK comprised of (at least) the primary keys of the two entities involved in the relationshipOne-to-One - Primary key on the mandatory side becomes a foreign key on the optional side (if optionalities are asymmetric)

Map Associative Entities

Identifier Not Assigned Default primary key for the table formed for the associative entity is typically a composite PK composed of (at least) the primary keys of the two entitiesIdentifier Assigned May use if one exists that is natural and familiar to end-users Must use if the composite PK can not be made unique by adding intersection data

Map Unary (Recursive) Relationships

One-to-Many: Recursive foreign key in the same table (also true for unary One-to-One)Many-to-Many (e.g., bill of materials): Two tables result: One for the entity type One for an associative relation in which the primary key has two fields, both taken from the identifier of the original entity

Map Ternary (and n-ary) Relationships

One table for each original entity and one for the common relationship (associative entity) (i.e., a ternary relationship maps to a total of four tables)Table representing the associative entity has foreign keys to each entity in the relationshipPK of the table formed for the associative entity is typically a composite PK composed of (at least) the primary keys of the three entities

Map Supertype/Subtype Structures

Create a separate table for the supertype and each of the subtypesAssign common attributes, including subtype discriminator, to the supertype tableAssign to the subtype tables those attributes unique to each subtypeAssign to the subtype tables the primary key of the supertype table (which also functions as a FK referencing the supertype)

Data Normalization

A formal process for grouping attributes into tablesA tool to validate and improve logical designs so that they satisfy certain constraints to avoid unnecessary duplication of dataThe process of decomposing a table with anomalies into two or more, smaller, well-structured tables

Functional Dependency

A constraint between two attributes in which the value of one attribute is determined by the value of another attribute.The value of one attribute (the determinant) determines the value of another attribute A -----> B"A determines B" or "B is functionally dependent on A"

First Normal Form

Table has no multivalued attributes a table that has multivalued attributes is unnormalized in this context, a multivalued attribute is sometimes referred to as a repeating group

Second Normal Form

Table is in 1NF and has no partial functional dependencies (that is, every nonkey attribute is fully functionally dependent on the entire primary key)This means that every nonkey attribute must be determined by the entire primary key, not by only part of the primary key

Third Normal Form

Table is in 2NF and no transitive dependencies (functional dependencies between nonkey attributes)This means that no nonkey attribute should be able to determine another nonkey attribute

Removing Transitive Dependencies

For each nonkey attribute that is a determinant in a table, create a new table; that attribute becomes the primary key of the new tableMove all of the attributes that are functionally dependent on that determinant attribute from the old table to the new tableLeave the attribute that serves as the primary key in the new table in the old table to serve as a foreign key to allow the tables to be related

Third Normal Form

Important points about the 3NF structure are: It is free of unnecessary data redundancy All foreign keys appear where needed to logically tie together related tables It is the same structure that would have been derived from a properly drawn ERD of the same business environment Normalization can be used to check the structures of any relational tables, even those created through ERD conversions, for 3NF (e.g., to check for the possible existence of previously unidentified transitive dependencies)

Candidate Keys as Determinants

There is one exception to the rule that, in third normal form, nonkey attributes are not allowed to define other nonkey attributesThe rule does not hold if the defining nonkey attribute is a candidate key of the tableThus, candidate keys in a table may define other nonkey attributes without violating third normal form

Physical Database Design

Physical design involves enhancing and/or modifying a logical database design to improve the performance of the run-time environmentThis may involve adding additional structures (e.g., indexes) to the logical database design, and/or modifying the 3NF tables produced by the logical database designThus, the well-structured relational tables produced either by the conversion from ERDs or by the data normalization process form the starting point for physical database design

Physical database design inputs include

Data tables described in logical database design, Business environment requirements, Data characteristics, Application characteristics, Operational requirements, and Hardware and software characteristics

Business environment requirements

Response time and throughput requirements

Data characteristics

Data volume and volatility assessments

Application characteristics

Data requirements and application priorities

Operational requirements

Data security and backup & recovery concerns

Hardware and software characteristics

Computer, network, and DBMS characteristics

Throughput

Measure of how many queries from simultaneous users can be satisfied in a given period of time

Data volume assessment:

Depends on how database system stores data For rough database size, add storage sizes of all tables [table size = (size of row) x (# of rows)]

Data volatility:

This is how often stored data is updatedImpacts: Throughput and response time (the greater the need for updates, the greater throughput required) Extent of normalization (the greater the need for updates, the more important normalization becomes)

Application data requirements:

Understanding the information needed by appli-cations and how the information will be used

Operational Requirements

Ongoing activities relating to protecting data and data availability

Domain integrity

ensures data values are validEnforced with: Data types Nullability settings Check constraints Unique constraints Default constraints

Entity integrity

ensures table rows are uniqueEnforced with: Primary keys (values must be unique and non-null)

Referential integrity

Ensures that data references to other tables remain validEnforced typically by foreign key constraints

Policy integrity

Ensures that data adhere to business rules e.g., limiting checking account withdrawal to amount not greater than current balanceEnforced either through DBMS or by application