[Database] Sharing my database final exam summary notes.

yunmap 2017. 7. 30. 22:13

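Database final exam summary

Hello. This time I'm sharing my summary notes for the final exam!

I normally keep these notes in an hwp file, and because I pasted them in as-is, the text feels a bit cramped.

I'll post it first and then edit it so it's easier to read.

I used Database System Concepts, 6th edition, and the exam covers everything before transactions.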

7. Entity-Relationship Model

design phases : request -> elicitation -> analysis (user requirement analysis) -> documentation (which tasks the data will be used for and what data is needed)

Logical design : grouping data by meaning. Find a "good" collection of relation schemas (deciding on the DB schema).

Business decision : what attributes should we record in the DB?

Computer science decision : what relation schemas should we have, and how should the attributes be distributed among the various relation schemas?

Physical design : deciding on the physical layout of the DB.

A data model provides a common notation, and the design is produced using that notation. -> data modeling

Avoiding the inefficient situation where one change forces the whole design to be revised -> conceptual data modeling

Entity : a thing or object in the enterprise that is distinguishable from other objects. (described by a set of attributes)

Relationship : an association among several entities.

use case diagram : a tool used before application development to analyze clearly which functions are needed. (It shows the interaction between users and the system.)

- representing how a business system interacts with its environment

- describe scenarios that describe the interaction between users of the system (actor) and the system itself.

- describe what a system does from the standpoint of an external observer. Emphasis is on what a system does (rather than how).

- scenario : an example of what happens when someone interacts with the system.

use case : summary of scenarios for a single task or goal.

- three components :

1. task (represents a feature needed in a software system)

2. actor (trigger the use case task to activate)

3. communication (show how the actors communicate with the use case)

actor : defined to make clear who uses the system. Actors are the ones scenarios are written for, and communication is what links an actor to the task it performs. (ex. human, hardware, external system, subsystem, time, time-based event)

Anything an actor uses while performing a task can become an entity. -> ER diagram. (attribute : a descriptive property of an entity)

Three basic concepts of the ER model : entity sets, relationship sets, attributes. ("Diagrammatic" means it can be expressed as a diagram.)

entity set : a set of entities of the same type that share the same properties (the set of all persons, companies, trees)

primary key : uniquely identifies each member of the set.

The entity 강재우 and the entity 박성빈 together form the professor entity set.

(student is an entity set whose entities share the attributes student ID and student_name.)

relationship : association among several entities

relationship set : mathematical relation among n>=2 entities, each taken from entity sets.

(A1, B1) ∊ relationship set

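A relationship can also bring new attributes of its own (descriptive attributes).

binary relationship : a relationship set that involves two entity sets (degree 2). Most relationship sets are binary.

mapping cardinality constraints : one to one, one to many, many to one, many to many (how entities of one set map to entities of the other)

An entity a in one set may not be connected to any entity in the other set.

complex attribute : attributes have several characteristics, and they can be classified as simple/composite, single-valued/multivalued, and derived/stored.

- simple/composite : a resident registration number looks like 970224-20xxxxx. Split as 970224/2/xxxxxx, the front part is the date of birth and the next digit is the gender, so the parts carry meaning of their own. That is composite. An attribute that cannot be split into meaningful parts is simple (something like an ID).

- single-valued/multivalued : a multivalued attribute can hold several values for one entity, just as a person can have two or three phone numbers.

- derived : can be computed from other attributes. From the resident registration number you can get date of birth, age, and gender. Written with parentheses, e.g. birth().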

domain : the set of the permitted values for each attribute

resident registration number

/ \

date of birth   gender

↑ This is how the composite attribute "resident registration number" is drawn. Date of birth and gender are its components.

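redundant attribute : remove attributes that are irrelevant or duplicated (for example, an attribute that a relationship set already expresses). When the design is converted back to tables, a removed attribute can reappear.

weak entity set : an entity set whose own attributes are not enough to form a primary key. Part of what identifies its entities is an attribute that belongs to another entity set, so without that foreign key you cannot tell which entity is which. The link to the other entity set is carried by a relationship instead of being stored as a redundant attribute.

For example, with section = {sec_id, semester, year}, sections of different courses can share the same sec_id, semester, and year, so a section cannot be told apart by its own attributes. That makes section a weak entity set. It needs the course_id of the course entity, so course is the identifying entity set, and (sec_id, semester, year) is the discriminator. (A weak entity set has no primary key of its own.)

-> A weak entity must be associated with an identifying entity, so the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies.

-> The association between the weak entity set and the identifying entity set is called the identifying relationship.

-> Also! In the relational schema, the primary key of the identifying entity set (course_id) is included as an attribute of the weak entity set, even though it is not drawn there in the diagram.

-> discriminator (partial key) : the set of attributes that distinguishes among the entities of a weak entity set that depend on the same identifying entity.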

A directed line (⟶) means one and an undirected line (-) means many. The arrow must point at the entity set.

A double line (=) means total participation: every entity in the entity set participates in at least one relationship in the relationship set.

For example, every student must have at least one associated instructor.

partial participation : some entities may not participate in any relationship in the relationship set.

(Like a bank customer who has no account.)

More detailed constraints can be written as X..Y, where X is the minimum cardinality and Y is the maximum cardinality.

X=1 means total participation, and Y=1 means the entity takes part in at most one relationship (the "to one" side). Y=* means no limit.

instructor	name of the entity set
ID	primary key
name	composite
first_name	component 1
middle_name	component 2
last_name	component 3
address	composite
street	component 1, itself composite
street_number	component 1-1
street_name	component 1-2
apt_number	component 1-3
city	component 2
state	component 3
{phone_number}	multivalued: can hold several values
date_of_birth	a plain (simple) attribute
age()	derived (computed from date_of_birth)

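A weak entity set is drawn with a double border, and its identifying relationship set is also drawn with a double outline. The discriminator of section is (sec_id, semester, year).

The partial key (discriminator) is marked with a dashed underline (-----). A schema for sec_course would contain only the primary-key attributes that section already has, so it is redundant.

The primary key of section is course_id, sec_id, semester, year.

When converting to a schema, composite attributes are replaced by their components. A schema is written like section(course_id, sec_id, semester, year), where the underlined attributes form the primary key.

A multivalued attribute M of an entity set E gets its own schema EM. For example (2222, 010-2050-3417) and (2222, 010-9252-4671): two phone numbers for the same ID. This is written as inst_phone = (ID, phone_number).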
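
The two reduction rules just described (a composite attribute is replaced by its components, and each multivalued attribute moves to its own schema together with the primary key) can be sketched in Python. This is only an illustration: the function name `reduce_entity`, its parameters, and the generated schema name `instructor_phone_number` are made up for this example, while the notes themselves call that schema inst_phone.

```python
def reduce_entity(name, key, simple, composite, multivalued):
    """Return {schema_name: [attributes]} for one entity set (toy sketch)."""
    attrs = list(key) + list(simple)
    for components in composite.values():
        # A composite attribute is replaced by its component attributes.
        attrs += list(components)
    schemas = {name: attrs}
    for m in multivalued:
        # A multivalued attribute M gets its own schema: primary key + M.
        schemas[f"{name}_{m}"] = list(key) + [m]
    return schemas


schemas = reduce_entity(
    "instructor",
    key=["ID"],
    simple=["date_of_birth"],
    composite={"name": ["first_name", "middle_name", "last_name"]},
    multivalued=["phone_number"],
)
for schema_name, attributes in schemas.items():
    print(schema_name, attributes)
```

The derived attribute age() is deliberately left out, since a derived attribute is computed rather than stored.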

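Why roles are needed : to mark which entity plays which part when the same entity set appears more than once in a relationship, as with a course and its prerequisite course.

redundancy of schemas : for a one-to-many relationship set, do not create a new relation. Add the primary key of the "one" side to the schema of the "many" side instead.

If it is one to one, either side can play the role of the "many" side.

(Attaching a foreign key to an existing entity's schema is much simpler. The foreign key must go on the "many" side.)

A weak entity set has no meaning without its strong entity set. -> existence dependency.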

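specialization : a top-down process that splits an entity set into more specific subgroups, which keeps the notation simple.

overlapping : an entity can be both A and B (it may belong to several lower-level entity sets). e.g. employee and student.

disjoint : an entity cannot be both A and B at once. e.g. instructor and secretary.

There are two ways to represent specialization as schemas.

1. Create a schema for the higher-level entity set and a schema for each lower-level entity set. (Each lower-level schema must include the primary key of the higher-level entity set plus its own local attributes.) The drawback is that fetching the information about, say, an employee requires accessing two relations: the lower-level schema and the higher-level schema.

2. Create a schema for every entity set that includes both local and inherited attributes (the attributes of the higher-level entity set). The drawback is that name, street, and city are stored redundantly for anyone who is both a student and an employee.

generalization : extracting what is common.

- bottom-up design process : combine entity sets that share the same features into a higher-level entity set.

Design constraints on specialization and generalization :

- completeness constraint : specifies whether an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within the generalization.

- total : every higher-level entity must belong to at least one of the lower-level entity sets.

- partial : it does not have to. (default)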

aggregation : allows relationships between relationships, treat relationship as an abstract entity, abstraction of relationship into new entity

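-> When represented as a schema, it must include the primary key of the aggregated relationship, the primary key of the associated entity set, and any descriptive attributes.

A relationship set is best thought of as describing an action that takes place between entities.

To convert a non-binary relationship to binary ones, replace the relationship with an entity set and connect it to each of the original entity sets with a separate binary relationship.

Note that the constraints also have to be revised after the conversion. Making the newly created entity set a weak entity set resolves this to some extent.

Five ER design issues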

1. use of an attribute or entity set to represent an object

2. use of a ternary relationship versus a pair of binary relationships.

3. use of a strong or weak entity set.

4. use of specialization/generalization - contributes to modularity in the design.

5. use of aggregation

8. Relational database design

Simply merging schemas tends to create a lot of repetition and waste storage space. -> decompose!

-> When decomposing, you have to decide which attributes belong together. (Functional dependencies guide this.)

Suppose we split inst_dept into instructor and department. There is a rule that in department(dept_name, building, budget), dept_name is a candidate key. This can be written as dept_name -> building, budget. => functional dependency. In inst_dept, dept_name is not a candidate key, so building and budget can be repeated. -> inst_dept must be decomposed.

However, decomposition is not always good. In a lossy decomposition, joining the decomposed relations produces spurious extra tuples, so information is lost and the original relation cannot be reconstructed. -> a lossless-join decomposition is required.

First normal form : a relation schema R is in first normal form if the domains of all attributes of R are atomic.

Theory

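1. If R is not in good form, decompose it. Every resulting relation must be in good form, and the decomposition must be a lossless-join decomposition.

2. functional dependency : like a key, the value of one set of attributes uniquely determines the value of another set of attributes.

a determines b (a -> b) if, for any two records, equal values in the a columns imply equal values in the b columns.

Once a dependency is declared to hold, it acts as a constraint: tuples that violate it cannot be added.

a is the determinant and b is the dependent.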
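
As a rough illustration of that definition, the following Python sketch checks whether one relation instance satisfies a -> b by testing that rows agreeing on a also agree on b. The helper name `satisfies_fd` and the sample rows are invented for this example.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # The first row with this lhs value fixes the required rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


inst_dept = [
    {"ID": 1, "dept_name": "Physics", "building": "Watson"},
    {"ID": 2, "dept_name": "Physics", "building": "Watson"},
    {"ID": 3, "dept_name": "Music", "building": "Packard"},
]

print(satisfies_fd(inst_dept, ["dept_name"], ["building"]))
print(satisfies_fd(inst_dept, ["building"], ["ID"]))
```

Note that this only tests a single instance. Whether a dependency holds on the schema is a statement about every legal instance, so it cannot be established from sample data alone.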

superkey : K is a superkey of R if and only if K -> R.

candidate key : K is a candidate key of R if K -> R and there is no a ⊂ K with a -> R.

trivial : a dependency that is satisfied by every instance of a relation.

a -> b is trivial when b ⊆ a.

"r satisfies F" means that a particular relation instance r satisfies the functional dependencies in F. "F holds on R" means that every legal relation on schema R satisfies F.

closure of functional dependencies : the other functional dependencies that can be derived from a set of functional dependencies F.

The set of all functional dependencies derived from F is called the closure of F. = F+

The closure is a superset of F.

3. multi-valued dependency : as in department ->-> staff and department ->-> phone number, one value can be associated with several values. Functional dependencies alone do not capture this meaning!

For BCNF and 3NF, know why each was introduced and some of the details.

BCNF (Boyce-Codd Normal Form) : for every dependency a -> b in F+, at least one of the following must hold: a -> b is trivial, or a is a superkey of R. (Every nontrivial determinant is a superkey.) -> Very little redundancy. However, decomposing into BCNF can lose functional dependencies, and it is not always possible to preserve them. So 3NF, a weaker normal form, is often used instead. -> It tolerates some redundancy.

3NF : likewise, for every dependency a -> b in F+, at least one of the following must hold.

a -> b is trivial

a is a superkey of R

each attribute in b - a is contained in a candidate key of R (each attribute may be in a different candidate key)

BCNF implies 3NF. The condition on the attributes of b - a was added to ensure dependency preservation.

Armstrong's Axioms :

1. if b ⊆ a, then a -> b (reflexivity)

2. if a -> b, then ra -> rb (augmentation)

3. if a -> b, and b -> r, then a -> r (transitivity)

-> sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold)

Additional rule :

1. if a -> b holds and a -> r holds, then a -> br holds (union)

2. if a -> br holds, then a -> b holds and a -> r holds (decomposition)

3. if a -> b holds and r b -> x holds, then a r -> x holds (pseudotransitivity)

These three rules can be derived from Armstrong's axioms.

If the closure a+ contains every attribute of the relation, a is a superkey. To be a candidate key it must also be irreducible (no proper subset is a superkey).

functional dependency : to check whether a -> b holds (that is, whether it is in F+), check whether b ⊆ a+.

Computing F+ : for every r ⊆ R, find r+, and for every s ⊆ r+, output r -> s.
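
The attribute closure a+ used in these checks can be computed with a simple fixed-point loop. The sketch below is one minimal way to do it in Python. The function names `closure`, `implies`, and `is_superkey` are my own, attributes are single letters so a string stands for a set of attributes, and the sample schema R = (A, B, C, G, H, I) with its dependencies is an assumed example.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """a -> b is in F+ exactly when b is a subset of a+."""
    return set(rhs) <= closure(lhs, fds)


def is_superkey(attrs, schema, fds):
    """attrs is a superkey when attrs+ contains every attribute of the schema."""
    return set(schema) <= closure(attrs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
R = "ABCGHI"

print(sorted(closure("AG", F)))
print(is_superkey("AG", R, F), is_superkey("A", R, F), is_superkey("G", R, F))
```

Here AG is a superkey while neither A nor G alone is, so AG is also a candidate key.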

canonical cover : sets of functional dependencies may have redundant dependencies that can be inferred from the others

A -> C is redundant in {A -> B, B -> C, A -> C}.

So a canonical cover of F is a minimal set of functional dependencies equivalent to F.

(It contains no redundant dependencies and no redundant parts of dependencies.)

Extraneous attributes : attributes that can be dropped without changing the closure.

1. For a -> b in F with A ∈ a : if (a - A)+ computed under F contains b, then A is extraneous in a.

2. For a -> b in F with A ∈ b : if a+ computed using only (F - {a -> b}) ∪ {a -> (b - A)} contains A, then A is extraneous in b.

A canonical cover of a given set is not necessarily unique, but its closure is always the same as that of F, and each left side in it is unique.

Computing a canonical cover :

1. Merge a -> b and a -> c into a -> bc.

2. Remove an extraneous attribute.

Repeat the steps above until nothing changes.
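
The merge-and-remove loop can be sketched in Python as follows. This is a minimal, unoptimized version under my own naming (`closure`, `find_extraneous`, `canonical_cover`), with single-letter attributes so a string stands for a set. Because a canonical cover is not unique, a different order of removals may return a different, equally valid cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_extraneous(fc):
    """Return (index, reduced_dependency) for one extraneous attribute, or None."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            # a is extraneous in lhs if (lhs - a)+ under F already contains rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                return i, (lhs - {a}, rhs)
        for a in sorted(rhs):
            # a is extraneous in rhs if lhs+ still contains a once a is dropped.
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, reduced):
                return i, (lhs, rhs - {a})
    return None


def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Step 1: union rule, merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]
        # Step 2: remove one extraneous attribute, then start over.
        removal = find_extraneous(fc)
        if removal is None:
            return fc
        index, reduced_fd = removal
        fc[index] = reduced_fd


cover = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```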

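How to check for a lossless-join decomposition (this is the condition to meet when the constraints are functional dependencies)

1. For R = (R1, R2), it is enough that R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+.

However, satisfying this does not mean the decomposition preserves dependencies.

(If some dependency can neither be checked within a single Ri nor be derived from those that can, dependency preservation has failed.)

To check dependency preservation, verify that (F1 ∪ F2 ∪ ... ∪ Fn)+ = F+, where Fi is the set of dependencies in F+ that involve only attributes of Ri.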
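
Both checks can be sketched in Python. The lossless test below is the binary-decomposition condition stated above. The preservation test does not build F+ explicitly; as a swapped-in technique it uses the common shortcut of growing each left side with closures restricted to one Ri at a time. The function names and the sample schema R = (A, B, C) with F = {A -> B, B -> C} are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 must be in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


def preserves_dependencies(parts, fds):
    """Check every lhs -> rhs in F using only closures restricted to one Ri."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                ri = set(part)
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F), preserves_dependencies(["AB", "BC"], F))
print(is_lossless_binary("AB", "AC", F), preserves_dependencies(["AB", "AC"], F))
```

The second decomposition is lossless but loses B -> C, which matches the remark that a lossless join does not guarantee dependency preservation.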

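How to check whether a nontrivial dependency a -> b violates BCNF

1. Compute a+. If it contains every attribute of R, a is a superkey of R and there is no violation.

If no dependency in F causes a violation, no dependency in F+ will either.

This shortcut is valid for R itself, but it does not always work for a relation Ri in a decomposition of R.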
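
A minimal Python sketch of this test, assuming we are checking R itself against the given F (so the shortcut applies). The name `bcnf_violations` and the attribute list for inst_dept are assumptions based on the inst_dept example earlier in these notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial dependencies in fds whose left side is not a superkey of schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)              # skip trivial dependencies
        and not schema <= closure(lhs, fds)      # left side must be a superkey
    ]


inst_dept = ("ID", "name", "salary", "dept_name", "building", "budget")
F = [
    (("ID",), ("name", "salary", "dept_name")),
    (("dept_name",), ("building", "budget")),
]

for lhs, rhs in bcnf_violations(inst_dept, F):
    print(lhs, "->", rhs)
```

Only dept_name -> building, budget is reported, which is why inst_dept has to be decomposed.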

Let's read on from p. 57.

10. Storage and file structure

Classification criteria : speed with which data can be accessed, cost per unit of data, reliability (data loss on power failure or system crash)

volatile : contents are lost when the power is switched off.

non-volatile : contents persist even when the power is switched off. (secondary, tertiary storage)

cache - main memory - flash memory - magnetic disk - optical disk - magnetic tapes

storage hierarchy
primary storage	fast but volatile. cache, main memory
secondary storage	on-line storage. non-volatile
tertiary storage	off-line storage. non-volatile. slow access time. (Music tapes count too; tape is sequential access.)

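RAID : stores redundant data (keeps an identical copy) across multiple disks.

File > record > field (the smallest unit of a record)

The easiest case : record length is fixed, each file holds records of one particular type, and different files are used for different relations.

1. If the record size is fixed at n, record i is stored starting at byte n*(i-1). A record could otherwise straddle a block boundary, so records are not allowed to cross blocks.

<When record i is deleted>

- compaction : shift every following record up by one slot.

- move the last record into position i.

- leave the records where they are and link the freed slot into a free list through the file header.

=> The free-list pointers live in the freed slots, and in-use records store no pointers, so this is space efficient.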
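
The free-list option can be sketched as a toy in-memory structure in Python. The class name `FixedLengthFile` and the record size of 53 bytes are assumptions for illustration, and a real file would store bytes on disk instead of Python tuples.

```python
class FixedLengthFile:
    """Toy in-memory file of fixed-length records with a free list."""

    def __init__(self, record_size):
        self.record_size = record_size
        self.slots = []        # each slot is ("rec", data) or ("free", next_free)
        self.free_head = None  # kept in the file header: first free slot, if any

    def byte_offset(self, i):
        # Record i (counting from 1) starts at byte n * (i - 1).
        return self.record_size * (i - 1)

    def insert(self, data):
        if self.free_head is None:
            self.slots.append(("rec", data))
            return len(self.slots)              # new record number (1-based)
        i = self.free_head
        self.free_head = self.slots[i - 1][1]   # follow the chain to the next free slot
        self.slots[i - 1] = ("rec", data)
        return i

    def delete(self, i):
        # The freed slot itself stores the pointer to the next free slot.
        self.slots[i - 1] = ("free", self.free_head)
        self.free_head = i


f = FixedLengthFile(record_size=53)
for name in ("A", "B", "C"):
    f.insert(name)
f.delete(2)
f.delete(1)
print(f.insert("D"), f.insert("E"), f.insert("F"))
print(f.byte_offset(3))
```

Deleted slots are reused in last-freed-first order, and the file only grows once the free list is empty.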

When record lengths differ

1. when several record types are stored in one file

2. when a record type has fields such as varchar, so records of the same type can have different lengths

3. when the record type allows repeating fields

In this case the fixed-length attributes are stored first, followed by the variable-length ones.

slotted page header : number of record entries, end of free space in the block, location and size of each record

Layout : header - free space - records

Records can be moved within the page, which requires updating the header.

Pointers from outside do not point directly at a record; they point at the record's entry in the header.
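
A slotted page can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class name `SlottedPage` is made up, the header is kept as a Python list instead of being packed into the block, and there is no check for running out of free space.

```python
class SlottedPage:
    """Toy slotted page: header entries in front, records packed at the end."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []        # header entries [offset, length]; offset None = deleted
        self.free_end = size   # end of free space; records grow down from the block end

    def insert(self, record):
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1   # slot number: the stable external pointer

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = [None, 0]
        # Compact: rewrite surviving records contiguously at the end of the block
        # and update their header entries. Slot numbers do not change.
        live = [(s, self.read(s)) for s in range(len(self.slots))
                if self.slots[s][0] is not None]
        self.free_end = len(self.data)
        for s, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[s] = [self.free_end, len(record)]


page = SlottedPage(size=32)
a = page.insert(b"alice")
b = page.insert(b"bob")
c = page.insert(b"carol")
page.delete(b)
print(page.read(a), page.read(c), page.slots[c])
```

After the delete, carol has moved inside the block, yet slot number c still finds it, which is the point of the indirection through the header.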

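organization of records in files
heap	a record can go anywhere there is space, but retrieval is hard.
sequential	choose a search key and store records in search-key order. Insertion and maintenance are hard, but lookup is easy.
hash	a hash function computed on an attribute of the record determines the block where the record is placed.
multitable clustering	related records are kept in the same block. -> records of different relations can be stored in the same file.

In a sequential file, each record points to the next record in search-key order. An insert can be done by updating the pointer chain.

multitable clustering file organization : records of different relations (with different lengths) are stored together in one file. This is handled by the DBMS, not specified by the user.

Records of the same relation can be linked together with pointer chains.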

buffer : the portion of main memory that stores copies of disk blocks.

buffer manager : the subsystem that allocates buffer space in main memory.

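Buffer replacement policy
LRU	when the buffer is full, replace the block that was least recently used.
pinned block	memory block that is not allowed to be written back to disk
toss immediate	once the final tuple of a block has been processed, free the space that block occupies right away.
MRU	system must pin the block currently being processed (when processing finishes it is unpinned and becomes the most recently used block, the first candidate for replacement).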
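
As a rough sketch of how LRU replacement and pinning fit together, here is a small buffer pool in Python built on `collections.OrderedDict`. The class name `BufferPool` and the `read_block` callback are hypothetical, and writing dirty blocks back to disk is left out. An MRU policy would pick its victim from the other end of the same ordering.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager with LRU replacement and pinned blocks."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # hypothetical callable: block_id -> contents
        self.frames = OrderedDict()    # block_id -> contents, least recently used first
        self.pinned = set()

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # now the most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]

    def _evict(self):
        # Scan from the least recently used end and skip pinned blocks.
        victim = next((b for b in self.frames if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all buffer blocks are pinned")
        del self.frames[victim]

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)


pool = BufferPool(capacity=2, read_block=lambda block_id: f"data{block_id}")
pool.get(1)
pool.get(2)
pool.get(1)      # block 1 becomes the most recently used
pool.get(3)      # the buffer is full, so block 2 is replaced
print(list(pool.frames))
```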

forced output : the block is written to disk even though its buffer space is not needed, so the data survives a crash.

License: Attribution, NonCommercial, NoDerivatives