Lesson 2 Data Modeling Flashcards

Data Modeling

the first step in designing a database; the process of creating a specific data model for a determined problem domain

problem domain

a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed

data model

a relatively simple representation, usually graphical, of more complex real-world data structures; used in the database design phase of the Database Life Cycle.

entity

a person, place, thing, or event about which data will be collected and stored; represents a particular type of object in the real world, which means it is "distinguishable"—that is, each occurrence is unique and distinct

attribute

a characteristic of an entity

relationship

describes an association between entities

one-to-many relationship

A relationship between two tables in a database in which one record in the primary table can match many (zero, one, or many) records in the related table. Expressed as 1:M or 1..*
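
A minimal sketch of a 1:M relationship, using Python's built-in sqlite3 module. The painter and painting tables, their columns, and the sample rows are hypothetical and chosen only for illustration. The foreign key in the "many" table holds the common value that links each painting to its one painter.

```python
import sqlite3

# Hypothetical tables: one painter can have zero, one, or many paintings.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE painting ("
    " painting_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " painter_id INTEGER NOT NULL REFERENCES painter(painter_id))"
)
conn.executemany("INSERT INTO painter VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Dawn", 1), (11, "Dusk", 1)],
)

# Count the matching paintings for each painter (LEFT JOIN keeps painters with none).
counts = conn.execute(
    "SELECT p.name, COUNT(g.painting_id)"
    " FROM painter p LEFT JOIN painting g ON g.painter_id = p.painter_id"
    " GROUP BY p.painter_id ORDER BY p.painter_id"
).fetchall()
print(counts)
```

Here Ana matches two painting rows and Ben matches none, which covers the "many" and "zero" cases of the definition.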

many-to-many relationship

In databases, a relationship in which one record in Table A can relate to many matching records in Table B, and vice versa. Expressed as M:N or *..*

one-to-one relationship

In databases, a relationship in which each record in Table A can have only one matching record in Table B, and vice versa. Expressed as 1:1 or 1..1

constraint

restriction placed on the data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity
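
The GPA rule above can be sketched as a CHECK constraint. This is an illustrative example using Python's sqlite3 module; the student table and its columns are assumptions, not part of the original card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK clause encodes "GPA must be between 0.00 and 4.00".
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " gpa REAL NOT NULL CHECK (gpa BETWEEN 0.00 AND 4.00))"
)
conn.execute("INSERT INTO student VALUES (1, 3.5)")  # satisfies the rule

try:
    conn.execute("INSERT INTO student VALUES (2, 4.7)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, row_count)
```

The DBMS refuses the out-of-range row, so the invalid value never reaches the table. That is the sense in which constraints help ensure data integrity.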

business rule

describes the main and distinguishing characteristics of the data as viewed by the company

Relationships are

bidirectional

hierarchical model

An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.

segment

In the hierarchical data model, the equivalent of a file system's record type; a higher layer is perceived as the parent of the segment directly beneath it, which is called the child

network model

An early data model that represented data as a collection of record types in 1:M relationships; allows a record to have more than one parent
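
A rough sketch of the structural difference between the hierarchical and network models, written with plain Python dictionaries. All segment and record names (CAR, ASSEMBLY_A, INVOICE_7, and so on) are hypothetical.

```python
# Hierarchical model: every child segment has exactly one parent segment,
# so a plain child -> parent mapping is enough. All names are hypothetical.
hier_parent = {
    "ASSEMBLY_A": "CAR",
    "ASSEMBLY_B": "CAR",
    "PART_1": "ASSEMBLY_A",
}


def path_to_root(segment, parent_of):
    """Walk from a segment up to the root segment."""
    path = [segment]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


# Network model: a record may have more than one parent,
# so each child maps to a set of parents instead.
net_parents = {
    "INVOICE_7": {"CUSTOMER_3", "SALESREP_9"},
}

root_path = path_to_root("PART_1", hier_parent)
invoice_parent_count = len(net_parents["INVOICE_7"])
print(root_path, invoice_parent_count)
```

In the hierarchical mapping there is only one path from any segment to the root. The network mapping drops that restriction by letting a record have several parents.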

schema

A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other; the conceptual organization of the entire database as viewed by the database administrator

subschema

The portion of the database that interacts with application programs that actually produce the desired information from the data within the database

Data Manipulation Language (DML)

The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK

Data Definition Language (DDL)

The language that allows a database administrator to define the database structure, schema, and subschema
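
To make the DDL/DML distinction concrete, here is a small sketch using Python's sqlite3 module. The product table and its values are assumptions for illustration. CREATE TABLE defines structure (DDL), while INSERT, UPDATE, DELETE, SELECT, COMMIT, and ROLLBACK manipulate the data (DML, as listed on the DML card).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (hypothetical product table).
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")

# DML: add a row and make it permanent.
conn.execute("INSERT INTO product VALUES ('A1', 9.99)")
conn.commit()

# DML: change the row, then undo the change with ROLLBACK.
conn.execute("UPDATE product SET price = 0 WHERE sku = 'A1'")
conn.rollback()
price = conn.execute("SELECT price FROM product WHERE sku = 'A1'").fetchone()[0]

# DML: remove the row and commit the deletion.
conn.execute("DELETE FROM product WHERE sku = 'A1'")
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(price, remaining)
```

The rolled-back UPDATE leaves the committed price untouched, while the committed DELETE removes the row for good.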

relational database

A database that represents data as a collection of tables in which all data relationships are represented by common values in related tables. Introduced in 1970 by E. F. Codd of IBM in his landmark paper "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, June 1970, pp. 377-387).

relation

A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model; sometimes called a table

relational database management system (RDBMS)

a DBMS that organizes data in tables or relations; translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data

relational diagram

a graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities

end-user interface

allows the end user to interact with the data (by automatically generating SQL code)

collection of tables stored in the database

all data is perceived to be stored in tables. The tables simply "present" the data to the end user in a way that is easy to understand. Each table is independent. Rows in different tables are related by common values in common attributes

SQL engine

hidden from the end user, the SQL engine executes all queries, or data requests; SQL is said to be a declarative language that tells what must be done but not how

Entity Relationship Model

A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen

Entity Relationship Diagram

A diagram that depicts an entity relationship model's entities, attributes, and relationships

entity (ERM)

represented in the ERD by a rectangle, also known as an entity box

entity instance

Each row in the relational table

entity occurrence

Each row in the relational table

entity set

A collection of like entities

attribute (ERM)

particular characteristics of the entity

relationships (ERM)

describe associations among data

connectivity (ERM)

type of relationship between entities, classifications include 1:1, 1:M, and M:N; name of the relationship is usually an active or passive verb. For example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns many SKILLs, and an EMPLOYEE manages a STORE

Chen notation

In ____________________, relationships are represented by a diamond connected to the related entities through a relationship line.

Crow's Foot notation

In ___ a three-pronged symbol represents the "many" side of the relationship.

class diagram notation

the connectivities are represented by lines with symbols (1..1, 1..*); uses names on both sides of the relationship; part of UML

object-oriented data model (OODM)

both data and its relationships are contained in a single structure known as an object

object

described by its factual content, but unlike an entity, an object includes information about relationships between the facts within the object, as well as information about its relationships with other objects

Object-Oriented Database Management System (OODBMS)

stores the data and the procedures that act on those data as objects that can be automatically retrieved and shared; the underlying OODM is said to be a semantic data model

attribute (OODM)

describe the properties of an object

class (OODM)

a collection of similar objects with shared structure (attributes) and behavior (methods)

method (OODM)

represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior

class hierarchy

The organization of classes in a hierarchical upside down tree in which each parent class is a superclass and each child class is a subclass. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. See also inheritance.

Inheritance

the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses from the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON
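
The PERSON, CUSTOMER, and EMPLOYEE example can be sketched in Python. The specific attributes (address, credit_limit, job_title) and the change_name method are assumptions added for illustration.

```python
class Person:
    """Superclass: attributes and methods shared by all persons."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def change_name(self, new_name):
        # A method: behavior that every subclass inherits.
        self.name = new_name


class Customer(Person):
    """Subclass: inherits from Person and adds a hypothetical credit_limit."""

    def __init__(self, name, address, credit_limit):
        super().__init__(name, address)
        self.credit_limit = credit_limit


class Employee(Person):
    """Subclass: inherits from Person and adds a hypothetical job_title."""

    def __init__(self, name, address, job_title):
        super().__init__(name, address)
        self.job_title = job_title


customer = Customer("Ada", "1 Main St", 500)
customer.change_name("Ada L.")  # method defined only on Person
employee = Employee("Bob", "2 Oak Ave", "Clerk")
print(customer.name, employee.address)
```

Neither subclass defines change_name or the name and address attributes itself. Both receive them from the class above in the hierarchy.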

Unified Modeling Language (UML)

A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system

UML Class Diagram

used to represent data and its relationships within the larger UML object-oriented system's modeling language

extended relational data model (ERDM)

adds many of the OO model's features within the inherently simpler relational database structure; gave birth to a new generation of relational databases that support OO features such as objects (encapsulated data and methods), extensible data types based on classes, and inheritance; often described as an object/relational database management system (O/R DBMS)

object/relational database management system (O/R DBMS)

A DBMS based on the extended relational model. The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure.

Extensible Markup Language (XML)

A meta-language used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet
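
As a rough illustration of manipulating a document's data elements, the sketch below parses a made-up invoice with Python's standard xml.etree.ElementTree module. The element and attribute names (invoice, line, sku, qty, price) are assumptions, not a real invoice standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document exchanged between two systems.
invoice_xml = """
<invoice number="1001">
  <line sku="A1" qty="2" price="9.50"/>
  <line sku="B7" qty="1" price="20.00"/>
</invoice>
"""


def invoice_total(root):
    """Sum qty * price over every line element."""
    return sum(
        int(line.get("qty")) * float(line.get("price"))
        for line in root.findall("line")
    )


root = ET.fromstring(invoice_xml)
total_before = invoice_total(root)

# Manipulate one data element: change the quantity on the B7 line.
root.find("line[@sku='B7']").set("qty", "3")
total_after = invoice_total(root)
print(total_before, total_after)
```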

Big Data

refers to a movement to find new and better ways to manage large amounts of web and sensor-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost

3 Vs of Big Data

volume, velocity, variety

volume

the amounts of data being stored

velocity

refers not only to the speed with which data grows but also to the need to process this data quickly in order to generate information and insight

variety

the fact that the data being collected comes in multiple different data formats

Hadoop, MapReduce, NoSQL

Some of the most frequently used Big Data technologies

Hadoop

a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. It uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. It has several modules, but the two main components are the Hadoop Distributed File System (HDFS) and MapReduce

Hadoop Distributed File System (HDFS)

a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. In order to achieve high throughput, it uses the write-once, read-many model. This means that once the data is written, it cannot be modified

HDFS nodes

Three types: name node, data node, and client node

name node (HDFS)

stores all the metadata about the file system

data node (HDFS)

stores fixed-size data blocks (that could be replicated to other data nodes)

client node (HDFS)

acts as the interface between the user application and the HDFS
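
The three node roles can be mimicked in a few lines of Python. This is a toy, single-process sketch and not the Hadoop API. The block size, replica count, placement rule, and function names are all assumptions made for illustration.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny, real HDFS blocks are far larger
REPLICAS = 2    # hypothetical number of copies kept of each block


def store_file(name, payload, data_nodes, name_node):
    """Client role: split the payload into fixed-size blocks and place them."""
    blocks = [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]
    name_node[name] = []
    for idx, block in enumerate(blocks):
        block_id = f"{name}#{idx}"
        # Simple round-robin placement (an assumption, not HDFS's real policy).
        targets = [(idx + r) % len(data_nodes) for r in range(REPLICAS)]
        for t in targets:
            data_nodes[t][block_id] = block          # data nodes hold the blocks
        name_node[name].append((block_id, targets))  # name node holds metadata only


def read_file(name, data_nodes, name_node):
    """Client role: ask the name node where blocks live, then fetch them."""
    return b"".join(
        data_nodes[targets[0]][block_id] for block_id, targets in name_node[name]
    )


data_nodes = [{}, {}, {}]  # three data nodes, each a block_id -> bytes store
name_node = {}             # file name -> list of (block_id, data node indexes)
store_file("log.txt", b"hello hdfs", data_nodes, name_node)
restored = read_file("log.txt", data_nodes, name_node)
print(restored, len(name_node["log.txt"]))
```

The name node dictionary never stores file contents, only the metadata that records which data nodes hold which blocks.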

MapReduce

An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores; works with structured and nonstructured data

Map and Reduce

two main functions of MapReduce

map function

takes a job and divides it into smaller units of work

reduce function

collects all the output results generated from the nodes and integrates them into a single result set
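
The map and reduce functions can be sketched with a word count, the customary introductory example. This runs in a single Python process and only imitates the idea. It does not use Hadoop, and the function names and sample lines are assumptions.

```python
from collections import Counter
from functools import reduce


def split_job(lines, n_units):
    """Divide the job into smaller units of work (one per pretend node)."""
    return [lines[i::n_units] for i in range(n_units)]


def map_unit(unit):
    """Map step: each unit produces its own partial word counts."""
    counts = Counter()
    for line in unit:
        counts.update(line.lower().split())
    return counts


def reduce_results(partials):
    """Reduce step: integrate all partial outputs into one result set."""
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["big data big insight", "data at scale", "big scale"]
units = split_job(lines, 2)
partials = [map_unit(unit) for unit in units]
totals = reduce_results(partials)
print(totals)
```

In a real cluster each unit would run on a different node. Here the list comprehension stands in for that parallel step.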

NoSQL

a large-scale distributed database system that stores structured and unstructured data in efficient ways

key-value data model

based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values; also referred to as the attribute-value or associative data model

sparse data (NoSQL)

cases in which the number of attributes is very large but the number of actual data instances is low
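
A small sketch of why a key-value structure suits sparse data, using plain Python dictionaries. The attribute names and sample records are hypothetical. A wide, table-style row must reserve a slot for every attribute, whereas the key-value form stores only the pairs that actually have a value.

```python
# Hypothetical attribute list: many possible attributes, few actually filled in.
ATTRIBUTES = ["name", "phone", "fax", "pager", "twitter", "linkedin"]

# Table-style rows: every attribute needs a slot, even when it is empty (None).
wide_rows = [
    {"id": 1, "name": "Ada", "phone": None, "fax": None,
     "pager": None, "twitter": "@ada", "linkedin": None},
    {"id": 2, "name": "Bob", "phone": "555-0100", "fax": None,
     "pager": None, "twitter": None, "linkedin": None},
]

# Key-value form: the key is (entity id, attribute name); empty slots are not stored.
key_value = {
    (row["id"], attr): row[attr]
    for row in wide_rows
    for attr in ATTRIBUTES
    if row[attr] is not None
}

wide_cells = len(wide_rows) * len(ATTRIBUTES)
print(wide_cells, len(key_value))
```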

eventual consistency (NoSQL)

means that updates to the database will propagate through the system and eventually all data copies will be consistent; data is not guaranteed to be consistent across all copies of the data immediately after an update
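
Eventual consistency can be imitated with a toy simulation. The Cluster class below is an assumption made for illustration and does not model any particular NoSQL product. A write lands on one copy immediately and reaches the other copies only when the queued updates are propagated.

```python
from collections import deque


class Cluster:
    """Toy model: several data copies plus a queue of not-yet-applied updates."""

    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()

    def write(self, key, value):
        # The first copy is updated at once; the others are updated later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def propagate_one(self):
        # Apply a single queued update to its target copy.
        if self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

    def read_all(self, key):
        return [replica.get(key) for replica in self.replicas]


cluster = Cluster(3)
cluster.write("stock", 10)
before = cluster.read_all("stock")  # copies disagree right after the update

while cluster.pending:
    cluster.propagate_one()
after = cluster.read_all("stock")   # all copies agree once updates propagate
print(before, after)
```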

Intersection Data

data that describes the relationship between the two entities in a many-to-many relationship (for example, Quantity)

Associative Entity

An entity type that associates the instances of one or more entity types and contains attributes that are peculiar to the relationship between those entity instances; used in many-to-many relationships. Example: an associative entity can indicate a relationship between a salesperson and a product, specifically the fact that a particular salesperson has been involved in selling a particular product, and it includes any intersection data that describes this relationship
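
The salesperson and product example can be sketched with Python's sqlite3 module. Table names, column names, and sample rows are hypothetical. The sale table plays the role of the associative entity, and its quantity column holds the intersection data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (sp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative entity: one row per (salesperson, product) pairing.
CREATE TABLE sale (
    sp_id INTEGER NOT NULL REFERENCES salesperson(sp_id),
    prod_id INTEGER NOT NULL REFERENCES product(prod_id),
    quantity INTEGER NOT NULL,  -- intersection data
    PRIMARY KEY (sp_id, prod_id)
);
INSERT INTO salesperson VALUES (1, 'Kim'), (2, 'Lee');
INSERT INTO product VALUES (100, 'Lamp'), (200, 'Desk');
INSERT INTO sale VALUES (1, 100, 5), (1, 200, 2), (2, 100, 7);
""")

# One product relates to many salespersons ...
lamp_sellers = conn.execute(
    "SELECT s.name, x.quantity FROM sale x"
    " JOIN salesperson s ON s.sp_id = x.sp_id"
    " WHERE x.prod_id = 100 ORDER BY s.name"
).fetchall()

# ... and one salesperson relates to many products.
kim_products = conn.execute(
    "SELECT p.name FROM sale x"
    " JOIN product p ON p.prod_id = x.prod_id"
    " WHERE x.sp_id = 1 ORDER BY p.name"
).fetchall()
print(lamp_sellers, kim_products)
```

The M:N relationship is thus resolved into two 1:M relationships that meet in the sale table.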

unique identifier

a way of uniquely identifying each record in the database

abstract data type

Data type that describes a set of similar objects with shared and encapsulated data representation and methods. An abstract data type is generally used to describe complex objects. See also class.

Application Programming Interface (API)

Software through which programmers interact with middleware. An API allows the use of generic SQL code, thereby allowing client processes to be database server-independent

complex object

An object formed by several different objects in complex relationships. See also abstract data types

conceptual model

The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details

conceptual schema

A representation of the conceptual model, usually expressed graphically. See also conceptual model

external model

The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema

External Schema

The specific representation of an external view; the end user's view of the data environment

hardware independence

A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level

internal model

In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation; a representation of a database as "seen" by the DBMS. In other words, it requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.

internal schema

A representation of an internal model using the database constructs supported by the chosen database

object-oriented data model

A data model whose basic modeling structure is an object

object-oriented database management system (OODBMS)

Data management software used to manage data in an object-oriented database model

relational model

Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns)

superclass

In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors

table

A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model

Versioning

A property of an OODBMS that allows the database to keep track of the different transformations performed on an object