Lesson 2 Data Modeling Flashcards
Data Modeling
the first step in designing a database, refers to the process of creating a specific data model for a determined problem domain
problem domain
a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed
data model
a relatively simple representation, usually graphical, of more complex real-world data structures; used in the database design phase of the Database Life Cycle.
entity
a person, place, thing, or event about which data will be collected and stored; represents a particular type of object in the real world, which means it is "distinguishable"; that is, each occurrence is unique and distinct
attribute
a characteristic of an entity
relationship
describes an association between entities
one-to-many relationship
A relationship between two tables in a database in which one record in the primary table can match many (zero, one, or many) records in the related table. Expressed as 1:M or 1..*
many-to-many relationship
In databases, a relationship in which one record in Table A can relate to many matching records in Table B, and vice versa. Expressed as M:N or *..*
one-to-one relationship
In databases, a relationship in which each record in Table A can have only one matching record in Table B, and vice versa. Expressed as 1:1 or 1..1
constraint
restriction placed on the data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity
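A minimal sketch of how the GPA rule might be enforced, assuming an in-memory SQLite database accessed through Python's standard `sqlite3` module. The table and column names (`student`, `stu_id`, `gpa`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause encodes the rule
# "A student's GPA must be between 0.00 and 4.00."
conn.execute(
    "CREATE TABLE student ("
    " stu_id INTEGER PRIMARY KEY,"
    " gpa REAL CHECK (gpa BETWEEN 0.0 AND 4.0))"
)

conn.execute("INSERT INTO student (stu_id, gpa) VALUES (1, 3.25)")  # satisfies the rule

try:
    conn.execute("INSERT INTO student (stu_id, gpa) VALUES (2, 4.75)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The database itself refuses the out-of-range row, so the integrity rule holds no matter which application writes the data.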
business rule
describe the main and distinguishing characteristics of the data as viewed by the company
Relationships are
bidirectional
hierarchical model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.
segment
In the hierarchical data model, the equivalent of a file system's record type; a higher layer is perceived as the parent of the segment directly beneath it, which is called the child
network model
An early data model that represented data as a collection of record types in 1:M relationships; allows a record to have more than one parent
schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other; the conceptual organization of the entire database as viewed by the database administrator
subschema
The portion of the database that interacts with application programs that actually produce the desired information from the data within the database
Data Manipulation Language (DML)
The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK
Data Definition Language (DDL)
The language that allows a database administrator to define the database structure, schema, and subschema
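To make the DDL/DML split concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The `product` table and its columns are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure that will hold the data.
conn.execute("CREATE TABLE product (prod_code TEXT PRIMARY KEY, price REAL)")

# DML: manipulates the data stored in that structure.
conn.execute("INSERT INTO product VALUES ('P1', 9.99)")
conn.execute("UPDATE product SET price = 12.50 WHERE prod_code = 'P1'")
conn.commit()  # COMMIT: make the pending changes permanent

conn.execute("DELETE FROM product WHERE prod_code = 'P1'")
conn.rollback()  # ROLLBACK: undo the uncommitted DELETE

rows = conn.execute("SELECT prod_code, price FROM product").fetchall()
```

After the rollback, the row is still present with the committed price.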
relational database
A database that represents data as a collection of tables in which all data relationships are represented by common values in related tables. Introduced in 1970 by E. F. Codd of IBM in his landmark paper "A Relational Model of Data for Large Shared Data Banks" (Communications of the ACM, June 1970, pp. 377-387).
relation
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model; sometimes called a table
Each column represents an
attribute
relational database management system (RDBMS)
a DBMS that organizes data in tables or relations; translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data
relational diagram
a graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities
end-user interface
the interface that allows the end user to interact with the data (by automatically generating SQL code)
collection of tables stored in the database
all data is perceived to be stored in tables. The tables simply "present" the data to the end user in a way that is easy to understand. Each table is independent. Rows in different tables are related by common values in common attributes
SQL engine
hidden from the end user, the SQL engine executes all queries, or data requests; SQL itself is said to be a declarative language that tells what must be done but not how
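The three cards above can be illustrated together with a short sketch, assuming an in-memory SQLite database via Python's `sqlite3` module. The `agent` and `customer` tables and their contents are hypothetical. The two tables are independent, and their rows are related only through the common `agent_code` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT)")
conn.execute(
    "CREATE TABLE customer ("
    " cus_code INTEGER PRIMARY KEY, cus_name TEXT, agent_code INTEGER)"
)
conn.execute("INSERT INTO agent VALUES (501, 'Agent A')")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Customer X", 501), (2, "Customer Y", 501)],
)

# The query states WHAT is wanted (each customer with its agent);
# the SQL engine decides HOW to locate and combine the rows.
matches = conn.execute(
    "SELECT c.cus_name, a.agent_name"
    " FROM customer c JOIN agent a ON c.agent_code = a.agent_code"
    " ORDER BY c.cus_code"
).fetchall()
```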
Entity Relationship Model
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen
Entity Relationship Diagram
A diagram that depicts an entity relationship model's entities, attributes, and relationships
entity (ERM)
represented in the ERD by a rectangle, also known as an entity box
entity instance
Each row in the relational table
entity occurrence
Each row in the relational table
entity set
A collection of like entities
attribute (ERM)
particular characteristics of the entity
relationships (ERM)
describe associations among data
connectivity (ERM)
the type of relationship between entities; classifications include 1:1, 1:M, and M:N. The name of the relationship is usually an active or passive verb. For example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns many SKILLs, and an EMPLOYEE manages a STORE
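One common way to implement the 1:M connectivity "a PAINTER paints many PAINTINGs" is a foreign key on the "many" side. The sketch below assumes SQLite via Python's `sqlite3` module; the column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT)")
# The foreign key in PAINTING (the "many" side) points back to one PAINTER.
conn.execute(
    "CREATE TABLE painting ("
    " painting_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " painter_id INTEGER NOT NULL REFERENCES painter(painter_id))"
)

conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Painting X", 1), (11, "Painting Y", 1)],
)

try:
    # No painter 99 exists, so this painting would have no parent row.
    conn.execute("INSERT INTO painting VALUES (12, 'Painting Z', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]
```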
Chen notation
In ____________________, relationships are represented by a diamond connected to the related entities through a relationship line.
Crow's Foot notation
In ___ a three-pronged symbol represents the "many" side of the relationship.
class diagram notation
the connectivities are represented by lines with symbols (1..1, 1..*), uses names in both sides of the relationship; part of UML
object-oriented data model (OODM)
both data and its relationships are contained in a single structure known as an object
object
described by its factual content, but unlike an entity, an object includes information about relationships between the facts within the object, as well as information about its relationships with other objects
Object-Oriented Database Management System (OODBMS)
stores the data and the procedures that act on those data as objects that can be automatically retrieved and shared; the underlying OODM is said to be a semantic data model
attribute (OODM)
describe the properties of an object
class (OODM)
a collection of similar objects with shared structure (attributes) and behavior (methods)
method (OODM)
represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior
class hierarchy
The organization of classes in a hierarchical upside-down tree in which each parent class is a superclass and each child class is a subclass. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. See also inheritance.
Inheritance
the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses from the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON
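The PERSON, CUSTOMER, and EMPLOYEE example can be sketched in Python as follows. The specific attributes (`address`, `credit_limit`, `job_title`) and the sample values are assumptions chosen for illustration.

```python
class Person:
    """Superclass: attributes and methods shared by every subclass."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def change_name(self, new_name):
        """A method: behavior that acts on the object's own data."""
        self.name = new_name


class Customer(Person):
    def __init__(self, name, address, credit_limit):
        super().__init__(name, address)
        self.credit_limit = credit_limit  # hypothetical subclass-specific attribute


class Employee(Person):
    def __init__(self, name, address, job_title):
        super().__init__(name, address)
        self.job_title = job_title  # hypothetical subclass-specific attribute


emp = Employee("Pat", "1 Main St", "Clerk")
emp.change_name("Patricia")  # inherited from Person, not redefined in Employee
```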
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system
UML Class Diagram
used to represent data and its relationships within the larger UML object-oriented system's modeling language
extended relational data model (ERDM)
adds many of the OO model's features within the inherently simpler relational database structure; gave birth to a new generation of relational databases that support OO features such as objects (encapsulated data and methods), extensible data types based on classes, and inheritance; often described as an object/relational database management system (O/R DBMS)
object/relational database management system (O/R DBMS)
A DBMS based on the extended relational model. The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure.
Extensible Markup Language (XML)
A meta-language used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet
Big Data
refers to a movement to find new and better ways to manage large amounts of web and sensor-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost
3 Vs of Big Data
volume, velocity, variety
volume
the amounts of data being stored
velocity
refers not only to the speed with which data grows but also to the need to process this data quickly in order to generate information and insight
variety
the fact that the data being collected comes in multiple different data formats
Hadoop, MapReduce, NoSQL
Some of the most frequently used Big Data technologies
Hadoop
a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. It uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. It has several modules, but the two main components are the Hadoop Distributed File System (HDFS) and MapReduce
Hadoop Distributed File System (HDFS)
a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. To achieve high throughput, it uses the write-once, read-many model: once the data is written, it cannot be modified
HDFS nodes
three types: name node, data node, and client node
name node (HDFS)
stores all the metadata about the file system
data node (HDFS)
stores fixed-size data blocks (that could be replicated to other data nodes)
client node (HDFS)
acts as the interface between the user application and the HDFS
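The division of labor among the three node types can be pictured with a toy, single-process simulation. This is not the Hadoop API: the block size, replication factor, round-robin placement, and all function names below are assumptions made purely for illustration.

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny (real HDFS blocks are far larger)
REPLICATION = 2   # assumed number of copies kept of each block


def store_file(name, data, data_nodes, name_node):
    """Client-side write: split into blocks, replicate, record metadata."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    name_node[name] = []
    for idx, block in enumerate(blocks):
        block_id = f"{name}#{idx}"
        # Assumed placement policy: round-robin across the data nodes.
        targets = [(idx + r) % len(data_nodes) for r in range(REPLICATION)]
        for t in targets:
            data_nodes[t][block_id] = block             # data nodes hold the blocks
        name_node[name].append((block_id, targets))     # name node holds only metadata


def read_file(name, data_nodes, name_node):
    """Client-side read: ask the name node where blocks live, then fetch them."""
    return b"".join(
        data_nodes[targets[0]][block_id] for block_id, targets in name_node[name]
    )


data_nodes = [{}, {}, {}]   # three simulated data nodes
name_node = {}              # simulated metadata store
store_file("report.txt", b"write once", data_nodes, name_node)
restored = read_file("report.txt", data_nodes, name_node)
```

In this sketch the last block is shorter than `BLOCK_SIZE` because the sample data does not divide evenly.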
MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores; works with structured and unstructured data
Map and Reduce
two main functions of MapReduce
map function
takes a job and divides it into smaller units of work
reduce function
collects all the output results generated from the nodes and integrates them into a single result set
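The map and reduce roles described above can be sketched as a word count in plain Python. This runs in a single process and does not use Hadoop's MapReduce API; the function names and sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Unit of work for one simulated node: count words in its share of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Merge two partial results into one."""
    return left + right


lines = ["big data big insight", "data variety", "big volume"]

# Divide the job into smaller units of work: here, one line per simulated node.
chunks = [lines[i:i + 1] for i in range(len(lines))]

partials = [map_chunk(chunk) for chunk in chunks]      # map phase
totals = reduce(reduce_counts, partials, Counter())    # reduce phase: one result set
```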
NoSQL
a large-scale distributed database system that stores structured and unstructured data in efficient ways
key-value data model
based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values; also referred to as the attribute-value or associative data model
sparse data (NoSQL)
cases in which the number of attributes is very large but the number of actual data instances is low
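A toy in-memory sketch of the key-value idea and why it suits sparse data: each key stores only the attribute-value pairs that actually exist for it. The `put` and `get` helpers, the key format, and the sample values are all hypothetical.

```python
# Each key maps to its own set of attribute-value pairs.
store = {}


def put(key, attribute, value):
    """Record one attribute-value pair under the given key."""
    store.setdefault(key, {})[attribute] = value


def get(key, attribute, default=None):
    """Look up one attribute; absent attributes simply return the default."""
    return store.get(key, {}).get(attribute, default)


put("customer:1", "last_name", "Ramas")
put("customer:1", "phone", "555-0100")
put("customer:2", "last_name", "Dunne")   # no phone recorded, so nothing is stored for it

stored_pairs = sum(len(attrs) for attrs in store.values())
```

Unlike a fixed-column table, no placeholder is kept for the attribute that `customer:2` lacks.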
eventual consistency (NoSQL)
means that updates to the database will propagate through the system and eventually all data copies will be consistent; data is not guaranteed to be consistent across all copies of the data immediately after an update
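A deliberately simplified simulation of this behavior, assuming three replicas and a manually triggered propagation step. Real systems propagate updates in the background; the class and method names here are hypothetical.

```python
class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # updates not yet delivered: (replica_index, key, value)

    def write(self, key, value):
        # The first replica sees the update immediately; the others are queued.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Deliver every queued update so that all copies converge.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock:P1", 42)
stale = store.read(2, "stock:P1")   # replica 2 has not received the update yet
store.propagate()
fresh = store.read(2, "stock:P1")   # after propagation, all copies agree
```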
Intersection Data
data that describes the relationship between the two entities in a many-to-many relationship (for example, Quantity)
Associative Entity
An entity type that associates the instances of one or more entity types and contains attributes that are peculiar to the relationship between those entity instances; used in many-to-many relationships. Example: an associative entity can indicate a relationship between a salesperson and a product, specifically the fact that a particular salesperson has been involved in selling a particular product, and it includes any intersection data that describes this relationship
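The salesperson and product example could be modeled as below, assuming SQLite via Python's `sqlite3` module. The table names, column names, and sample rows are hypothetical; `quantity` plays the role of intersection data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (sp_id INTEGER PRIMARY KEY, sp_name TEXT);
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_name TEXT);

-- Associative entity: one row per (salesperson, product) pairing.
CREATE TABLE sales (
    sp_id INTEGER REFERENCES salesperson(sp_id),
    prod_id INTEGER REFERENCES product(prod_id),
    quantity INTEGER,               -- intersection data
    PRIMARY KEY (sp_id, prod_id)
);

INSERT INTO salesperson VALUES (1, 'Salesperson A'), (2, 'Salesperson B');
INSERT INTO product VALUES (100, 'Product X'), (200, 'Product Y');
INSERT INTO sales VALUES (1, 100, 5), (1, 200, 2), (2, 100, 7);
""")

# One product is sold by many salespeople, and one salesperson sells many products.
sold_x = conn.execute(
    "SELECT SUM(quantity) FROM sales WHERE prod_id = 100"
).fetchone()[0]
products_of_a = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE sp_id = 1"
).fetchone()[0]
```

The M:N relationship is thereby decomposed into two 1:M relationships that meet in the `sales` table.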
unique identifier
a way of uniquely identifying each record in the database
abstract data type
Data type that describes a set of similar objects with shared and encapsulated data representation and methods. An abstract data type is generally used to describe complex objects. See also class.
Application Programming Interface (API)
Software through which programmers interact with middleware. An API allows the use of generic SQL code, thereby allowing client processes to be database server-independent
complex object
An object formed by several different objects in complex relationships. See also abstract data types
conceptual model
The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details
conceptual schema
A representation of the conceptual model, usually expressed graphically. See also conceptual model
external model
The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema
External Schema
The specific representation of an external view; the end user's view of the data environment
hardware independence
A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level
internal model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation; representation of a database as "seen" by the DBMS. In other words, requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
internal schema
A representation of an internal model using the database constructs supported by the chosen database
object-oriented data model
A data model whose basic modeling structure is an object
object-oriented database management system (OODBMS)
Data management software used to manage data in an object-oriented database model
relational model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns)
superclass
In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors
table
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model
Versioning
A property of an OODBMS that allows the database to keep track of the different transformations performed on an object