DBMS CHAPTER-WISE QUESTIONS COLLECTION

GYAN WALLA



Unit 1: Database and Database Users

Q1: What are the advantages of using a Database Management System (DBMS) over a traditional file system? 

Solution

A Database Management System (DBMS) is software that allows users to create, manage, and manipulate databases efficiently.

It provides an organized and systematic way to store, retrieve, and maintain data compared to a traditional file-based system.

The advantages of using a Database Management System (DBMS) over a traditional file system are:

Data Redundancy Control:

DBMS minimizes data duplication by maintaining a single data repository shared among multiple applications.

Data Consistency:

Since redundancy is reduced, data remains uniform and accurate across the system.

Data Sharing:

Multiple users and applications can access the same database simultaneously in a controlled manner.

Data Integrity:

DBMS enforces rules and constraints to maintain the correctness and validity of data.

Data Security:

Provides user authentication, access control, and encryption to protect sensitive data.

Backup and Recovery:

Automatic backup and recovery features ensure data safety in case of system failure.

Data Independence:

Changes in data structure do not affect application programs, improving flexibility.

Efficient Query Processing:

Query languages like SQL make data retrieval faster and more convenient.

Concurrency Control:

DBMS manages simultaneous data access without conflicts or data loss.

Reduced Application Development Time:

Centralized management and predefined operations simplify application development.


Q2: What are the characteristics of the database approach? 

Solution

The characteristics of the database approach are:

Self-describing nature: Database stores both data and metadata together.

Program-data independence: Application programs are independent of data storage structure.

Multiple views: Different users can access customized views of the same data.

Data sharing and concurrency: Allows multiple users to access data simultaneously without conflict.

Centralized control: Ensures consistency, reduced redundancy, and controlled access.

Data integrity and security: Maintains accuracy, validity, and restricted data access.

Query language support: Provides easy data manipulation using SQL.

Data independence: Structure or storage changes do not affect application programs.

Backup and recovery: Protects data from loss due to failures.

Reduced data redundancy: Eliminates unnecessary data duplication across the system.


Q3: Who are the different types of database users ("actors on the scene") and what are their roles? 

Solution

The different types of database users ("actors on the scene") and their roles are:

Database Administrator (DBA): Manages database storage, security, backup, and performance.

Database Designers: Define data models, relationships, and database structure.

End Users: Use applications or query tools to retrieve and manipulate data.

Application Programmers: Develop programs that interact with the database.

System Analysts: Analyze business requirements and design suitable database solutions.

Data Entry Operators: Input and update data in the database through forms or interfaces.

Casual Users: Occasionally access data using ad-hoc queries or reports.

Sophisticated Users: Use advanced query languages and analysis tools for data exploration.





Unit 2: Database System – Concepts and Architecture


Q1: Explain the ANSI/SPARC three-schema architecture with a suitable diagram. 

Solution

The ANSI/SPARC three-schema architecture is a framework that separates the database into three levels to provide data abstraction and independence between users and the physical database.


1. Internal Level (Physical Schema):

Describes how data is physically stored in the database.

Deals with file structures, indexing, and access paths.


2. Conceptual Level (Logical Schema):

Represents the entire database structure for the organization.

Describes entities, relationships, and constraints without showing physical details.


3. External Level (View Schema):

Provides individual user views of the database.

Each user sees only the data relevant to them, improving security and simplicity.


Data Independence:

Logical Data Independence: Changes in the conceptual schema don’t affect external views.

Physical Data Independence: Changes in physical storage don’t affect the conceptual schema.


Q2: Define data independence and explain its types (logical and physical). 

Solution

Data Independence refers to the ability to modify a schema at one level of the database without affecting the schema at the next higher level.

It ensures that changes in data structure or storage do not affect application programs.


Types of Data Independence:


1. Logical Data Independence:

The ability to change the conceptual schema (like adding/removing fields or tables) without altering the external schema or user views.

Example: Adding a new attribute to a table should not require modifying existing application programs.


2. Physical Data Independence:

The ability to change the internal schema (like file organization, indexing, or storage structure) without affecting the conceptual schema.

Example: Changing from sequential file storage to indexed storage without altering the logical structure of the database.



Q3: Define the terms schema and instance in a DBMS with examples. 

Solution

Schema:

A schema is the logical structure or blueprint of a database that defines how data is organized.

It includes definitions of tables, attributes, relationships, and constraints.

Example: In a student database, the schema may define a table Student(RollNo, Name, Age, Address).


Instance:

An instance is the actual content of the database at a particular moment in time.

It represents the current state of data stored according to the schema.

Example: The rows currently stored in the Student table, such as (101, "Ram", 20, "Kathmandu"), form an instance of the Student schema.



Q4: What is data abstraction? Explain the three levels of data abstraction.
Solution
Data Abstraction:
Data abstraction is the process of hiding unnecessary details of data storage and representation from users.
It helps users interact with data without knowing the complexities of how data is stored or maintained.
It improves data security, simplicity, and independence.
Three Levels of Data Abstraction:

Physical Level (Lowest Level):
Describes how data is actually stored in memory or on disks.
Deals with data structures, file organization, and access paths.
Example: Storing student data in a binary file or B+ tree format.

Logical Level (Middle Level):
Describes what data is stored and the relationships among data.
It defines tables, attributes, and constraints.
Example: Student(RollNo, Name, Age, Address)

View Level (Highest Level):
Describes how data is viewed by users.
Provides different customized views for different users.
Example: A teacher may see only student names and grades, not addresses.

Summary:
Data abstraction hides complexity in layers:
Physical → Logical → View,
making database usage simple, secure, and flexible.



Q5: Classify or explain different types of data models. 
Solution
A data model defines the way data is represented, organized, and manipulated in a database system.
It provides a framework for describing the structure, constraints, and relationships of data.
Types of Data Models:

Conceptual Data Model:
Describes data at a high level of abstraction.
Focuses on how data is related and understood by users.
Example: Entity-Relationship (ER) Model — represents entities, attributes, and relationships.

Logical Data Model:
Describes how data is logically organized in the database.
Defines tables, columns, data types, and relationships.
Example: Relational Model (uses tables and keys).

Physical Data Model:
Describes how data is physically stored on storage media.
Involves indexing, file structure, and access methods.

Q6: Define the terms data, database, DBMS, database system, and database catalog. 
Solution
Data: 
Raw facts or figures that have no meaning by themselves until processed or interpreted.

Database:
A structured collection of related data stored in a way that allows easy access, management, and updating.

DBMS (Database Management System):
Software that allows users to define, create, maintain, and control access to databases efficiently.

Database System:
A combination of the database, DBMS software, and the associated applications that interact with the database.

Database Catalog:
A repository within the DBMS that stores metadata (data about data), including definitions of database objects, schemas, and user privileges.


Q7: Differentiate between Centralized and Client/Server Architectures for DBMS.
Solution

Centralized Architecture:
All DBMS functionality, the database itself, and application/user-interface processing reside on a single computer (often a mainframe), and users connect through simple terminals.

Client/Server Architecture:
Processing is divided over a network: client machines run application programs and user interfaces, while one or more server machines provide database services such as query processing and transaction management.

Key differences: a centralized system is a single point of processing (and of failure) and is harder to scale, whereas client/server distributes the processing load, scales more easily, and allows specialized database servers that clients reach through standard interfaces such as ODBC/JDBC.


Q8: What are the functions of a database administrator (DBA)? 
Solution
The functions of a database administrator (DBA) are
Database Design: Plans and defines the database structure and schema to meet organizational requirements.
Security Management: Implements user access controls, authentication, and authorization to protect data.
Performance Monitoring: Monitors and tunes database performance for efficient operation.
Backup and Recovery: Ensures regular backups and develops recovery strategies in case of data loss.
Data Integrity: Enforces rules and constraints to maintain accuracy and consistency of data.
User Support: Assists users and developers in using the database effectively.
Maintenance: Updates and patches DBMS software and manages storage resources.


Unit 3: Data Modeling Using the Entity-Relationship (ER) Model

Q1: Explain specialization and generalization, including constraints like the disjoint constraint. 

Solution

Specialization:

It is a top-down approach in ER modeling where a higher-level entity is divided into two or more lower-level entities based on some distinguishing attributes.

Helps in representing subclasses with specific properties.


Generalization:

It is a bottom-up approach where two or more lower-level entities are combined into a higher-level entity based on common attributes.

Reduces redundancy and simplifies the ER model.


Disjoint Constraint:

Ensures that an entity instance can belong to only one of the subclasses in specialization.

Prevents overlapping membership among subclasses.

Example:

Entity Employee can be specialized into Manager and Clerk (disjoint, since one employee cannot be both).

Entities Car and Truck can be generalized into Vehicle (combining common attributes like registration number, manufacturer).


Q2: What do you mean by entity type and entity set? Explain with an example.

Solution

Entity Type:

A collection of entities that share common attributes and have the same properties.

It defines the structure or blueprint for entities in a database.

Example:

Entity Type: Student with attributes Student_ID, Name, Age.


Entity Set:

The collection of all instances of a particular entity type at a given time.

Represents the actual data stored in the database for that entity type.


Entity Set: { (101, "Madhav", 20), (102, "Sita", 19), (103, "Ram", 21) } representing all students currently in the database.


Q3: Construct an ER diagram for an airline ticket booking system. The system should provide discounts based on the number of tickets bought, keep records of buyer visit frequency, and filter unwanted visitors. 

Solution


Q4: Construct an E-R diagram for an online course registration system where students register for courses online. 
Solution



Q5: What is a conceptual data model? Explain the different types of attributes used in an ER diagram. 
Solution
Conceptual Data Model:
A high-level representation of the structure of the database.
Focuses on entities, attributes, and relationships without considering how they are stored.
Helps in understanding and designing the database from a business perspective.

Types of Attributes in ER Diagram:
Simple/Atomic Attribute: Cannot be divided further (e.g., City).
Composite Attribute: Can be divided into sub-parts (e.g., Address → Street, City, State, Country).
Derived Attribute: Value can be derived from other attributes (e.g., Age from DOB).
Multi-valued Attribute: Can have multiple values for an entity (e.g., Phone_No).
Key Attribute: Uniquely identifies an entity (e.g., Roll_no).


Q6: Write a short note on the Weak entity type.
Solution
Weak Entity Type:
An entity that cannot be uniquely identified by its own attributes alone.
Depends on a strong (owner) entity for its identification.
Has a partial key (also called discriminator) that uniquely identifies it within the owner entity.
Represented by a double rectangle in ER diagrams.
Connected to the owner entity through an identifying relationship, shown as a double diamond.
Example: Dependent entity relies on Employee entity; Dependent_ID alone cannot identify a dependent, but (Employee_ID, Dependent_ID) can.

Q7: Explain how to reduce different types of entities and relationships of an ER diagram into tables. 
Solution
Regular (Strong) Entity:
Create a table with all its attributes.
The primary key of the entity becomes the table’s primary key.

Weak Entity:
Create a table with all its attributes.
Include the primary key of the owner entity as a foreign key.
The primary key of this table is a combination of the owner’s primary key and the weak entity’s partial key.

One-to-One (1:1) Relationship:
Include the primary key of one entity as a foreign key in the other entity’s table.
If participation is total, include the foreign key in the entity with total participation.

One-to-Many (1:N) Relationship:
Include the primary key of the ‘one’ side entity as a foreign key in the ‘many’ side entity’s table.

Many-to-Many (M:N) Relationship:
Create a separate table for the relationship.
Include the primary keys of both participating entities as foreign keys.
These foreign keys together become the primary key of the relationship table (see the SQL sketch after this list).

Multivalued Attributes:
Create a separate table for the multivalued attribute.
Include the primary key of the original entity as a foreign key.
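As an illustration, the M:N rule above can be realized in SQL roughly as follows (the Works_On relationship and its column names are assumed for the example):

CREATE TABLE WORKS_ON (
    EmployeeID INT,
    ProjectID INT,
    Hours DECIMAL(5,2),
    PRIMARY KEY (EmployeeID, ProjectID),
    FOREIGN KEY (EmployeeID) REFERENCES EMPLOYEE(EmployeeID),
    FOREIGN KEY (ProjectID) REFERENCES PROJECT(ProjectID)
);

The two foreign keys together form the primary key, exactly as the M:N rule requires.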

Q8: Explain aggregation with an example. 
Solution
Aggregation is a concept in ER modeling where a relationship itself is treated as an entity to participate in another relationship.
It is used when we want to model a higher-level abstraction from relationships between entities.
Aggregation is used to simplify modeling of complex relationships and maintain clarity in ER diagrams.

Purpose:
Helps represent complex relationships without losing information.
Makes it easier to handle relationships involving other relationships.

Example:
Consider entities Project and Employee connected by the relationship Works_On.
Suppose a Department oversees a Project.
Instead of directly linking Department to both Employee and Project, we aggregate the Works_On relationship into a higher-level entity called Assignment, and then connect Department to Assignment.

Q9: What are the components of an ER diagram? Explain the function of various symbols used and construct an ER diagram for a college library. 
Solution
Entity:
Represents a real-world object or concept.
Shown as a rectangle.
Example: Student, Employee.

Attributes:
Properties or characteristics of an entity or relationship.
Shown as ovals connected to the entity.
Types: Simple, Composite, Derived, Multivalued.
Example: Name, Date_of_Birth.

Key Attribute:
Uniquely identifies an entity in an entity set.
Shown as an underlined oval.
Example: Student_ID.

Relationship:
Represents an association between entities.
Shown as a diamond.
Example: Enrolls between Student and Course.

Degree of Relationship:
Number of entities involved in a relationship.
Types: Unary (1), Binary (2), Ternary (3).

Cardinality/Participation:
Defines how many instances of an entity participate in a relationship.
Shown as min-max notation or lines with symbols.
Example: One-to-One, One-to-Many, Many-to-Many.

Weak Entity:
Cannot be identified by its attributes alone; depends on a strong entity.
Shown as a double rectangle.
Connected to strong entity via double diamond relationship.



Q10: Explain Relationship and Relationship sets with an example.
Solution
Relationship:
A relationship is an association among two or more entities in a database.
Represents how entities are related to each other in real-world scenarios.
Example: A Student enrolls in a Course. Here, Enrolls is the relationship between Student and Course.

Relationship Set:
A relationship set is a collection of similar types of relationships.
Represents all instances of a particular relationship type in the database.
Example: The set of all enrollments of students in courses forms the Enrolls relationship set.

Properties of Relationship Sets:
Degree: Number of entities involved in the relationship.
Example: Binary relationship involves 2 entities (Student and Course).
Attributes: Relationships can have their own attributes.
Example: Enrollment_Date for Enrolls.
Participation: Defines whether entity participation is total or partial.

Summary:
A relationship shows one instance of association, while a relationship set shows all instances of that relationship type, helping to model real-world interactions efficiently.




Unit 4: The Relational Data Model and Relational Database Constraints

Q1: What do you mean by referential integrity? Why is it needed?

Solution:

Referential integrity ensures that a foreign key value in one table must match a primary key value in another table or be null.

It maintains consistency and correctness of relationships between tables.

Prevents orphan records, i.e., records referencing non-existent entries in related tables.


Purpose / Importance:

Ensures data consistency across related tables.

Prevents invalid data insertion, deletion, or updates.

Helps maintain accurate relationships in the database.

Supports reliable querying and reporting by preserving data integrity.



Example:

Table Student(StudentID, Name, DeptID)

Table Department(DeptID, DeptName)

DeptID in Student must exist in Department to maintain referential integrity.
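A minimal SQL sketch of this constraint, using the two tables above:

CREATE TABLE Department (
    DeptID INT PRIMARY KEY,
    DeptName VARCHAR(50)
);

CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(50),
    DeptID INT,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);

Here the DBMS rejects any Student row whose DeptID does not exist in Department (a NULL DeptID is still allowed).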


Conclusion:

Referential integrity is essential to maintain trustworthy and consistent relational data, avoiding errors caused by invalid references between tables.


Q2: Explain the fundamental characteristics of a relation and define the terms domain, attribute, tuple, and relation. 

Solution:

A relation is a table with rows and columns in a relational database.

It organizes data in a structured way for easy access, manipulation, and retrieval.

Relations follow specific rules to maintain data consistency and integrity.


Fundamental Characteristics of a Relation:

Rows are tuples: Each row represents a unique record.

Columns are attributes: Each column represents a property of the entity.

Atomic values: Each cell contains a single, indivisible value.

Unique tuples: No two rows are identical.

Order-independent: The order of rows and columns does not matter.


Domain: The set of all possible values that an attribute can take.

Example: Age attribute → domain = {1, 2, 3, …, 100}


Attribute: A named column in a table that represents a property of an entity.

Example: Name, Age, StudentID in a Student table


Tuple: A single row in a table representing a record of the entity.

Example: (101, "Hari", 20)


Relation: A table consisting of rows (tuples) and columns (attributes) that stores data in a structured way.

Example: Student(StudentID, Name, Age)


Conclusion:

These terms form the basic building blocks of a relational database, ensuring structured, consistent, and organized data storage.


Q3: Explain different types of database integrity. 

Solution:

Database integrity ensures that the data stored in a database is accurate, consistent, and reliable. It prevents invalid or inconsistent data from being entered.


Types of Database Integrity:


Entity Integrity:

Ensures that primary key values are unique and not null.

Prevents duplicate or missing records.

Example: In a Student table, StudentID (primary key) cannot be null or repeated.


Referential Integrity:

Ensures that foreign key values match primary key values in the referenced table.

Prevents orphan records.

Example: If a Course table references DeptID in the Department table, each DeptID in Course must exist in Department.


Domain Integrity:

Ensures that values of attributes belong to a predefined domain (valid data type and range).

Example: Age must be between 1 and 100.


User-Defined Integrity:

Rules defined by the user or organization to enforce business policies.

Example: Salary of an employee must be greater than minimum wage.
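Both domain and user-defined integrity can be declared with CHECK constraints; a brief sketch (the minimum-wage figure 15000 is an assumed value):

CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Age INT CHECK (Age BETWEEN 1 AND 100),
    Salary DECIMAL(10,2) CHECK (Salary > 15000)
);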


Conclusion:

Database integrity maintains accuracy, consistency, and reliability, ensuring that the data is trustworthy and meaningful for applications.


Q4: Explain the use of primary and foreign keys in a DBMS and the role of a foreign key.

Solution:

Keys are attributes used to uniquely identify records and maintain relationships between tables in a database.

Primary Key:

A primary key uniquely identifies each tuple (row) in a table.

Characteristics: Must be unique and not null.

Example: StudentID in Student table uniquely identifies each student.



Foreign Key:

A foreign key is an attribute in one table that references the primary key of another table.

Purpose: Maintains referential integrity between tables.

Example: DeptID in Course table references DeptID in Department table.



Role of Foreign Key:

Establishes a relationship between two tables.

Prevents orphan records (records with invalid references).

Ensures consistency and accuracy of data across related tables.

Supports cascading actions like update or delete to maintain integrity.


Conclusion:

Primary and foreign keys are essential for data integrity, uniqueness, and maintaining relationships between tables in a database.


Q5: Differentiate between Integrity and Security with an example. 

Solution:

Integrity:
Concerned with protecting data from accidental invalidity; it ensures accuracy, consistency, and validity of data.
Enforced through constraints such as primary keys, foreign keys, and CHECK conditions.
Example: A CHECK constraint that rejects a negative salary value.

Security:
Concerned with protecting data from unauthorized access, disclosure, or modification.
Enforced through authentication, authorization (e.g., GRANT/REVOKE), views, and encryption.
Example: Only users with the HR role are granted SELECT permission on the Salary column.


Unit 5: The Relational Algebra and Relational Calculus

Q1: What is tuple relational calculus? Explain with an example. 

Solution

Tuple Relational Calculus (TRC) is a non-procedural query language for relational databases.

It specifies what to retrieve rather than how to retrieve it.

Queries are expressed using variables that represent tuples of a relation.

TRC is based on first-order predicate logic.

Uses tuple variables to describe tuples in a relation.

Conditions are applied to select tuples satisfying certain properties.

The result is a set of tuples from the relation that meets the condition.

Provides high-level abstraction, independent of database storage or access methods.


Example:


Let Student(SID, Name, Age, Major) be a relation.

Query: “Find all students majoring in Computer Science.”

TRC Expression:


{ t | t ∈ Student ∧ t.Major = 'Computer Science' }

Meaning: “Retrieve all tuples t from Student where Major = Computer Science.”


Summary:

TRC is a declarative language that tells what to retrieve without specifying the procedure.

It is useful for expressing queries clearly and concisely.


Q2: Explain the relational algebra natural join (*) operation with an example.

Solution

Natural Join (⋈) is a relational algebra operation that combines two relations based on common attributes.

It returns a relation containing all combinations of tuples from both relations where the common attribute values match.

It automatically eliminates duplicate columns from the result.

Natural join is used to retrieve related data from multiple tables.

Combines two relations into a single relation.

Matching is done on all attributes with the same name in both relations.

Only tuples with equal values in the common attributes are included.

Reduces redundancy by removing duplicate columns automatically.

Example:
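
For instance, with two small hypothetical relations:

Student(SID, Name) = {(1, Ram), (2, Sita)}
Enrolled(SID, Course) = {(1, DBMS), (2, OS)}

Student ⋈ Enrolled matches tuples on the common attribute SID and gives:

(1, Ram, DBMS), (2, Sita, OS)

The shared column SID appears only once in the result.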


Q3: Explain the six fundamental operations of Relational Algebra with suitable examples. 
Given the schema:
TEACHER(TID, TName), SCHOOL(SID, SName), and SCHOOL_TEACHER(SID, TID, No_of_Period), write a Relational Algebra query to retrieve the TName and No_of_period of teachers who teach in "ABC" school. 
Solution
Relational Algebra is a procedural query language used to retrieve data from relations (tables).
It provides operations to manipulate and combine relations.
The six fundamental operations are Selection, Projection, Union, Set Difference, Cartesian Product, and Rename.
These operations form the basis of all relational queries.

Fundamental Operations:

Selection (σ):
Retrieves rows (tuples) that satisfy a condition.
Example: σ Major='CS'(Student) → All students with Major CS.

Projection (π):
Retrieves specific columns (attributes) from a relation.
Example: π Name, Major(Student) → Only Name and Major of students.

Union (∪):
Combines tuples from two relations with the same attributes, removing duplicates.
Example: R ∪ S → All tuples in R or S.

Set Difference (-):
Retrieves tuples in one relation but not in another.
Example: R - S → Tuples in R but not in S.

Cartesian Product (×):
Combines every tuple of one relation with every tuple of another.
Example: Student × Department → All possible student-department pairs.

Rename (ρ):
Changes the relation or attribute names.
Example: ρ NewStudent(Name, Major)(Student) → Rename Student relation.

Relational Algebra Query for Given Schema:

Schema:
TEACHER(TID, TName)
SCHOOL(SID, SName)
SCHOOL_TEACHER(SID, TID, No_of_Period)

Query: Retrieve TName and No_of_Period of teachers who teach in "ABC" school.

Step-wise Solution:
Select the school "ABC":
σ SName='ABC'(SCHOOL)

Join with SCHOOL_TEACHER:
σ SName='ABC'(SCHOOL) ⨝ SCHOOL.SID = SCHOOL_TEACHER.SID SCHOOL_TEACHER

Join with TEACHER to get names:
(σ SName='ABC'(SCHOOL) ⨝ SCHOOL.SID = SCHOOL_TEACHER.SID SCHOOL_TEACHER) ⨝ SCHOOL_TEACHER.TID = TEACHER.TID TEACHER

Project required columns:
π TName, No_of_Period((σ SName='ABC'(SCHOOL) ⨝ SCHOOL.SID = SCHOOL_TEACHER.SID SCHOOL_TEACHER) ⨝ SCHOOL_TEACHER.TID = TEACHER.TID TEACHER)

Q5: Explain the difference between the "Join" and "Natural Join" operations with an example. 
Solution
Join is a relational algebra operation that combines tuples from two relations based on a condition.
Natural Join is a special type of join that automatically matches columns with the same name in both relations.
Both are used to merge related data from multiple tables.
Helps in retrieving meaningful information from normalized databases.
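
Key difference: in a general (theta/equi) join the matching condition must be written explicitly and duplicate columns are retained, whereas in a natural join the condition is implicit on all same-named attributes and each common column appears only once.

Example (hypothetical relations Employee(EID, DeptID) and Dept(DeptID, DName)):
Equi join: Employee ⨝ Employee.DeptID = Dept.DeptID Dept keeps both DeptID columns.
Natural join: Employee * Dept matches on DeptID automatically and keeps a single DeptID column.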







Unit 6: SQL


Q1: For the Library schema: USER(UserID, User_name), USES(UserID, BookID), BOOK(BookID, Book_name, Author, PublisherID), PUBLISHER(PublisherID, Publisher_name), write SQL and Relational Algebra queries to:
Retrieve names of all books issued to the user "Hari".
Retrieve the names of all books published by "ABC publication". 
Count the number of books published by the author "Ram". 
Solution:
1. Retrieve names of all books issued to the user "Hari"

SQL Query:
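
SELECT B.Book_name
FROM USER U
JOIN USES S ON U.UserID = S.UserID
JOIN BOOK B ON S.BookID = B.BookID
WHERE U.User_name = 'Hari';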


Relational Algebra:
σ_User_name='Hari'(USER) ⨝ USER.UserID=USES.UserID USES ⨝ USES.BookID=BOOK.BookID BOOK
π_Book_name (Result)

Explanation:
Select the user "Hari".
Join USER → USES → BOOK to get issued books.
Project only the Book_name.
2. Retrieve the names of all books published by "ABC publication"

SQL Query:
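
SELECT B.Book_name
FROM BOOK B
JOIN PUBLISHER P ON B.PublisherID = P.PublisherID
WHERE P.Publisher_name = 'ABC publication';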




Relational Algebra:
σ_Publisher_name='ABC publication'(PUBLISHER) ⨝ PUBLISHER.PublisherID=BOOK.PublisherID BOOK
π_Book_name (Result)

Explanation:
Select the publisher "ABC publication".
Join with BOOK table to get books published by them.
Project Book_name.

3. Count the number of books published by the author "Ram"

SQL Query:
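
SELECT COUNT(BookID) AS TotalBooks
FROM BOOK
WHERE Author = 'Ram';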



Relational Algebra:
σ_Author='Ram'(BOOK)
Count(BookID)

Explanation:
Select all books where Author = 'Ram'.
Count the total number of books.
Q2: For the Banking (Loan) schema: Customer(CustomerID, CustomerName, Address), Borrows(CustomerID, LoanNumber), Loan(LoanNumber, LoanType, Amount), write SQL and Relational Algebra queries to:
Display the names of customers in "Lalitpur" in ascending order.
Count the total number of customers who have a loan.
Find the names of customers with a loan amount >= 500,000.
Find the average loan amount for each loan type.
Solution:
1. Display the names of customers in "Lalitpur" in ascending order

SQL Query:
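
SELECT CustomerName
FROM Customer
WHERE Address = 'Lalitpur'
ORDER BY CustomerName ASC;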



Relational Algebra:
σ_Address='Lalitpur'(Customer)
π_CustomerName (Result)
Sort(CustomerName)

Explanation:
Select customers whose Address = 'Lalitpur'.
Project only CustomerName.
Sort in ascending order.
2. Count the total number of customers who have a loan

SQL Query:
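
SELECT COUNT(DISTINCT CustomerID) AS TotalCustomers
FROM Borrows;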




Relational Algebra:
π_CustomerID(Borrows)
Count(CustomerID)

Explanation:
Select CustomerIDs from Borrows table.
Count distinct customers who have taken a loan.

3. Find the names of customers with a loan amount >= 500,000

SQL Query:




Relational Algebra:
σ_Amount>=500000(Loan) ⨝ Loan.LoanNumber=Borrows.LoanNumber Borrows
⨝ Borrows.CustomerID=Customer.CustomerID Customer
π_CustomerName (Result)

Explanation:
Select loans with Amount ≥ 500,000.
Join with Borrows to get customers.
Project CustomerName.

4. Find the average loan amount for each loan type

SQL Query:
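
SELECT LoanType, AVG(Amount) AS AvgAmount
FROM Loan
GROUP BY LoanType;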





Relational Algebra:
γ_LoanType; AVG(Amount) (Loan)

Explanation:
Group loans by LoanType.
Compute average Amount for each group.
Q3: For the Purchase schema: Customer(Cno, Cname), Purchase(Cno, Pid), Product(Pid, Pname, price, quantity), write SQL queries to:
Find the names of all products with a price of 1000.
Find the names of customers who purchased a 'Dell Laptop'.
Find the total number of products purchased by customer 'Ram'.
Increase the price of all products by 5%.
Solution:
1. Find the names of all products with a price of 1000

SQL Query:
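
SELECT Pname
FROM Product
WHERE price = 1000;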




Explanation:
Select Pname from Product table.
Condition: price = 1000.

2. Find the names of customers who purchased a 'Dell Laptop'

SQL Query:
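
SELECT C.Cname
FROM Customer C
JOIN Purchase Pu ON C.Cno = Pu.Cno
JOIN Product P ON Pu.Pid = P.Pid
WHERE P.Pname = 'Dell Laptop';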





Explanation:

Join Customer, Purchase, and Product tables.

Condition: Product name = 'Dell Laptop'.

Select Customer names who purchased it.

3. Find the total number of products purchased by customer 'Ram'

SQL Query:
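
SELECT COUNT(Pu.Pid) AS TotalProducts
FROM Customer C
JOIN Purchase Pu ON C.Cno = Pu.Cno
WHERE C.Cname = 'Ram';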




Explanation:
Join Customer and Purchase tables.
Filter Cname = 'Ram'.
Count Pid to get total products purchased.

4. Increase the price of all products by 5%

SQL Query:
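
UPDATE Product
SET price = price * 1.05;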




Explanation:
Update Product table.
Increase price by 5% for all products.

Q4: For the Banking (Account) schema: Customer(CustomerID, CustomerName, Address), Owns(CustomerID, AccountNumber), Account(AccountNumber, AccountType, Balance), write SQL and Relational Algebra queries to:
Display the names of customers in "Kathmandu".
Count the total number of customers.
Find the names of customers with a balance >= 100,000.
Find the average balance for each account type.
Solution:
1. Display the names of customers in "Kathmandu"

SQL Query:

SELECT CustomerName
FROM Customer
WHERE Address = 'Kathmandu';

Explanation:
Select CustomerName from Customer table.
Filter by Address = 'Kathmandu'.

2. Count the total number of customers

SQL Query:

SELECT COUNT(*) AS TotalCustomers
FROM Customer;

Explanation:
Use COUNT to find the total customers in the table.

3. Find the names of customers with a balance >= 100,000

SQL Query:

SELECT C.CustomerName
FROM Customer C
JOIN Owns O ON C.CustomerID = O.CustomerID
JOIN Account A ON O.AccountNumber = A.AccountNumber
WHERE A.Balance >= 100000;


Explanation:
Join Customer, Owns, and Account tables.
Filter accounts with Balance >= 100,000.

Select CustomerName.

4. Find the average balance for each account type

SQL Query:

SELECT AccountType, AVG(Balance) AS AvgBalance
FROM Account
GROUP BY AccountType;

Explanation:
Group Account table by AccountType.
Compute average balance for each type.

Q5: Explain Assertion and Triggers with an example.
Solution:
Assertion
An assertion is a database constraint that enforces a condition on one or more tables.
Ensures data integrity and prevents invalid data from being inserted or updated.
Applies to the entire database and is checked automatically by the DBMS.
Example:

CREATE ASSERTION check_salary
CHECK (NOT EXISTS (SELECT * FROM Employee WHERE Salary < 0));

Ensures no employee has a negative salary.

Trigger
A trigger is a procedure executed automatically when a specified event occurs on a table.
Maintains business rules, audit trails, and automatic updates.
Types: BEFORE INSERT/UPDATE/DELETE and AFTER INSERT/UPDATE/DELETE.

Example:

CREATE TRIGGER update_balance
AFTER INSERT ON Transaction
FOR EACH ROW
BEGIN
    UPDATE Account
    SET Balance = Balance + NEW.Amount
    WHERE AccountNumber = NEW.AccountNumber;
END;


Automatically updates Account balance after a new transaction.

Summary
Assertions enforce global constraints.
Triggers perform automatic actions on events.
Both ensure data integrity and consistency in the database.

Q6: Given the schema
TEACHER(TID, TName), SCHOOL(SID, SName, SPhone), and SCHOOL_TEACHER(SID, TID), write an SQL query to retrieve the TName, SName, and SPhone for "ABC" school.
Solution:
Retrieve teacher names, school names, and school phone for "ABC" school

SQL Query:

SELECT T.TName, S.SName, S.SPhone
FROM TEACHER T
JOIN SCHOOL_TEACHER ST ON T.TID = ST.TID
JOIN SCHOOL S ON ST.SID = S.SID
WHERE S.SName = 'ABC';


Explanation:
Join TEACHER and SCHOOL_TEACHER on TID.
Join SCHOOL_TEACHER and SCHOOL on SID.
Filter results where SName = 'ABC'.
Select the required columns: TName, SName, SPhone.

Q7: Create two tables
Courses(CID, Course, Dept) and HoD(Dept, Head) using SQL, including all constraints (Primary key, Foreign key, and Referential Integrity).
Solution:
Create the HoD table

SQL Query:

CREATE TABLE HoD (
    Dept VARCHAR(50) PRIMARY KEY,
    Head VARCHAR(50) NOT NULL
);


Explanation:
Dept is the primary key.
Head stores the name of the head of the department.
Ensures no NULL values for Head.
Create the Courses table

SQL Query:

CREATE TABLE Courses (
    CID INT PRIMARY KEY,
    Course VARCHAR(50) NOT NULL,
    Dept VARCHAR(50),
    FOREIGN KEY (Dept) REFERENCES HoD(Dept)
        ON UPDATE CASCADE
        ON DELETE CASCADE
);


Explanation:
CID is the primary key.
Course cannot be NULL.
Dept is a foreign key referencing HoD(Dept) to maintain referential integrity.
ON UPDATE CASCADE: Changes in HoD.Dept automatically update Courses.Dept.
ON DELETE CASCADE: Deleting a department in HoD deletes related courses.

Summary:
Two tables are created with primary and foreign keys.

Unit 7: Relational Database Design

Q1: What is normalization? Why is it required? Explain 1NF, 2NF, and 3NF with examples. 
Solution:
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
It ensures that each piece of data is stored only once and dependencies are properly managed.

Purpose/Need:
Eliminate data redundancy.
Avoid update, insert, and delete anomalies.
Maintain data consistency and improve query efficiency.

Advantages of Normalization
It eliminates data redundancy, i.e., duplication of data.
It enables faster sorting and index creation.
It simplifies the complex structure of tables.
It avoids loss of data by maintaining consistency and integrity.
It solves insertion, deletion, and updating anomalies.
It improves the performance of a system.
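
1NF (First Normal Form):
A relation is in 1NF if every attribute holds only atomic (indivisible) values and there are no repeating groups.
Example: Student(RollNo, Name, Phones) storing several phone numbers in one column violates 1NF; keeping one phone number per row (or in a separate Phone table) brings it to 1NF.

2NF (Second Normal Form):
A relation is in 2NF if it is in 1NF and every non-key attribute depends on the whole primary key, not on just a part of a composite key.
Example: In Enrolls(RollNo, CourseID, StudentName), StudentName depends only on RollNo, a part of the key {RollNo, CourseID}; moving it to Student(RollNo, StudentName) gives 2NF.

3NF (Third Normal Form):
A relation is in 3NF if it is in 2NF and has no transitive dependency (no non-key attribute depends on another non-key attribute).
Example: In Student(RollNo, DeptID, DeptName), DeptName depends on DeptID rather than directly on RollNo; moving it to Department(DeptID, DeptName) gives 3NF.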






Q2: What are informal design guidelines for relational schemas? 
Solution:
Informal design guidelines help in creating good relational database schemas to ensure data consistency and efficiency.

Guidelines:
Clear and meaningful attribute names:
Use simple, descriptive names for better understanding.

Avoid redundant data:
Redundancy causes update and storage anomalies.

Avoid NULL values as much as possible:
Too many NULLs waste space and complicate queries.

Use derived attributes carefully:
Avoid storing values that can be derived (e.g., Age from DOB).

Ensure lossless join decomposition:
When splitting a relation, ensure original data can be recovered.

Preserve dependencies:
Functional dependencies must remain valid after decomposition.

Q3: What is a functional dependency? Explain why we need them, their types (trivial/non-trivial), and the role of inference rules. 
Solution:
A functional dependency (FD) defines a relationship between two attributes in a relation.
If attribute A uniquely determines attribute B, it is written as A → B.

Need for Functional Dependency:
To identify primary keys.
To normalize relations and remove redundancy.
To maintain data consistency.

Types:

Trivial Dependency:
Occurs when the dependent attribute is part of the determinant.
Example: {RollNo, Name} → Name.

Non-Trivial Dependency:
Occurs when the dependent attribute is not part of the determinant.
Example: RollNo → Name.

Inference Rules (Armstrong’s Axioms):
Reflexivity: If Y ⊆ X, then X → Y.
Augmentation: If X → Y, then XZ → YZ.
Transitivity: If X → Y and Y → Z, then X → Z.
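
Example: From RollNo → DeptID and DeptID → DeptName, the transitivity rule infers RollNo → DeptName.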

Q4: Define Boyce-Codd Normal Form (BCNF) with an example. How is it different from 3NF? 
Solution:
BCNF is a stronger version of Third Normal Form (3NF) that eliminates all redundancy based on functional dependencies.
A relation is in BCNF if, for every non-trivial FD X → Y, X is a super key.
BCNF ensures higher normalization and stronger removal of redundancy than 3NF.

Example: Relation in 3NF but NOT in BCNF
Scenario: Student-Course-Instructor
Relation: TEACHING(StudentID, Course, Instructor)

Functional Dependencies:
StudentID, Course → Instructor
Instructor → Course (Each instructor teaches only one course)

Candidate Keys:
{StudentID, Course}
{StudentID, Instructor}

Prime Attributes: StudentID, Course, Instructor (all are part of candidate keys)
Check for 3NF:
FD: Instructor → Course

Left side (Instructor) is NOT a super key
But right side (Course) is a prime attribute ✓
Result: In 3NF ✓

Check for BCNF:
FD: Instructor → Course

Left side (Instructor) is NOT a super key ✗
Result: NOT in BCNF ✗

To bring the relation into BCNF, decompose it into two relations: (Instructor, Course) and (StudentID, Instructor).

Difference from 3NF: 3NF tolerates an FD X → Y where X is not a super key as long as Y is a prime attribute, while BCNF allows no such exception: every determinant must be a super key. Hence every BCNF relation is in 3NF, but not vice versa.





Q5: What is a normal form? Explain their types and discuss loss-less join decomposition.
Solution:
A normal form defines a level of normalization in a database to reduce redundancy and anomalies.
Each higher normal form builds upon the previous one.

Types of Normal Forms:

1NF (First Normal Form):
Atomic values; no repeating groups.

2NF (Second Normal Form):
No partial dependency.

3NF (Third Normal Form):
No transitive dependency.

BCNF (Boyce-Codd Normal Form):
Every determinant is a candidate key.

4NF (Fourth Normal Form):
No multi-valued dependencies.

5NF (Fifth Normal Form):
Removes redundancy caused by join dependencies.

Lossless Join Decomposition:
A decomposition is lossless if the original relation can be reconstructed from the decomposed tables without losing information.
Ensures that no data is lost during normalization.

Unit 8: Introduction to Transaction Processing Concepts and Theory


Q1: What is a schedule? Explain serializability and how you can test for it. 

Solution:

A schedule is the sequence of operations (read/write) from multiple transactions that shows how they are executed in a database system.

It defines the order of interleaved operations of concurrent transactions.

The goal of a schedule is to maintain consistency and isolation during concurrent execution.


Types of Schedules:


Serial Schedule:

All transactions are executed one after another, without overlapping.

Always consistent and free from conflicts.


Non-Serial Schedule:

Operations of multiple transactions are interleaved.

May cause conflicts or inconsistency if not properly controlled.


Serializability:

Serializability ensures that a non-serial schedule produces the same result as a serial schedule.

It is the main criterion for correctness in concurrent transaction execution.


Types of Serializability:


Conflict Serializability:

If a schedule can be transformed into a serial schedule by swapping non-conflicting operations, it is conflict-serializable.


Non-conflicting operations:

Read–Read on the same data item.


Conflicting operations:

Read–Write, Write–Read, or Write–Write on the same data item.


View Serializability:

Two schedules are view-equivalent if they produce the same final result and each read operation reads the same data value in both schedules.


Q2: Define a transaction and explain its desirable ACID properties. 
Solution:
A transaction is a logical unit of work in a database that consists of one or more operations (like read, write, update, delete) performed as a single sequence.
It transforms the database from one consistent state to another.
A transaction ensures reliable and consistent database operations.
The ACID properties guarantee that even in case of errors, failures, or concurrent access, the integrity and reliability of the database are maintained:

Atomicity: All operations of a transaction are completed, or none of them are.
Consistency: A transaction takes the database from one consistent state to another.
Isolation: Concurrent transactions do not interfere with each other; intermediate results remain hidden until commit.
Durability: Once a transaction commits, its changes survive any subsequent system failure.
Example:
Transferring money from Account A to Account B involves:
Read(A)
A = A – 100
Write(A)
Read(B)
B = B + 100
Write(B)
If any step fails, the whole transaction must abort and roll back the changes.
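
A minimal SQL sketch of the same transfer (the Account table and the exact transaction syntax are assumed here and vary slightly across DBMSs):

START TRANSACTION;
UPDATE Account SET Balance = Balance - 100 WHERE AccountNumber = 'A';
UPDATE Account SET Balance = Balance + 100 WHERE AccountNumber = 'B';
COMMIT;
-- If any statement fails, ROLLBACK undoes both updates.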


Q3: Explain conflict-serializability with an example.

Solution:

Serializability:

Serializability ensures that a non-serial schedule produces the same result as a serial schedule.

It is the main criterion for correctness in concurrent transaction execution.


Types of Serializability:


Conflict Serializability:

If a schedule can be transformed into a serial schedule by swapping non-conflicting operations, it is conflict-serializable.
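
Example: In the schedule S: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B), every pair of conflicting operations (those on the same data item with at least one write) has T1's operation before T2's. Swapping only the non-conflicting operations rearranges S into the serial schedule T1 followed by T2, so S is conflict-serializable.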


Q4: Explain schedules based on recoverability and serializability.
Solution:

A schedule is the sequence of operations (read/write) from multiple transactions that shows how they are executed in a database system.

It defines the order of interleaved operations of concurrent transactions.

The goal of a schedule is to maintain consistency and isolation during concurrent execution.


Types of Schedules:


Serial Schedule:

All transactions are executed one after another, without overlapping.

Always consistent and free from conflicts.


Non-Serial Schedule:

Operations of multiple transactions are interleaved.

May cause conflicts or inconsistency if not properly controlled.


Recoverability deals with whether a database can recover to a consistent state after a transaction failure.


Types of Schedules Based on Recoverability:

1. Recoverable Schedule

A schedule is recoverable if no transaction commits until all transactions whose changes it read have committed.

Rule: If T2 reads data written by T1, then T1 must commit before T2 commits.

Example: T1: W(X) → T2: R(X) → T1: Commit → T2: Commit (T2 commits only after T1 has committed).

2. Cascadeless Schedule (Avoids Cascading Rollback)

A schedule where transactions read only committed data (no dirty reads).

Rule: If T2 reads data written by T1, T1 must commit before T2 reads.

Example: T1: W(X) → T1: Commit → T2: R(X) → T2: Commit (T2 reads X only after T1 commits, so a failure of T1 can never cascade to T2).

3. Strict Schedule (Strictest)

A schedule where transactions can neither read nor write data written by uncommitted transactions.

Rule: If T1 writes X, no other transaction can read or write X until T1 commits/aborts.

Example: T1: W(X) → T1: Commit → T2: R(X) or T2: W(X) (no transaction reads or writes X until T1 has committed or aborted).

Q5: Write a short note on Transaction processing.

Solution:

 A transaction is a logical unit of work in a database that consists of one or more operations (like read, write, update, delete) performed as a single sequence.

It transforms the database from one consistent state to another.

A transaction ensures reliable and consistent database operations.


During its execution, a transaction passes through several well-defined states that ensure data consistency and reliability.
Transaction States
A transaction can exist in five main states:
1. Active
2. Partially Committed
3. Committed
4. Failed
5. Aborted
(a) Active State
Transaction starts execution.
All operations are being performed.
Can read/write data items.
✅ Example: Executing UPDATE account SET balance = balance - 500.

(b) Partially Committed State
Transaction has executed its final statement.
Changes are still in buffer (not yet permanent).
✅ Example: After last SQL command but before COMMIT confirmation.

(c) Committed State
All operations are successfully completed.
Changes are permanently saved in the database.
✅ Example: After COMMIT command is executed.

(d) Failed State
Transaction encounters an error or system crash before completion.
Data may be inconsistent at this stage.
✅ Example: Power failure or constraint violation occurs.

(e) Aborted State
Transaction is rolled back and all changes are undone.
Database is restored to its previous consistent state.
✅ Example: Executing ROLLBACK after failure.


Unit 9: Concurrency Control Techniques

Q1: Why do we need concurrency control? Discuss the two-phase locking (2PL) protocol, including its types (basic, conservative, strict, rigorous).

Solution:

Concurrency control ensures that multiple transactions can execute simultaneously without conflict or inconsistency.

It maintains the integrity and isolation of data when transactions overlap in time.

Without concurrency control, problems like lost updates, dirty reads, or inconsistent data can occur.


2. Need for Concurrency Control

Prevent Data Inconsistency: Ensures consistent database state during concurrent execution.

Avoid Conflicts: Prevents simultaneous access to the same data item.

Maintain Isolation: Each transaction should appear to execute alone.

Ensure Serializability: The final result should be the same as if transactions executed serially.


3. Two-Phase Locking (2PL) Protocol

A locking protocol that ensures serializability by dividing the transaction execution into two distinct phases.


4. Phases of 2PL


Growing Phase:

A transaction acquires locks but cannot release any.


Shrinking Phase:

A transaction releases locks but cannot acquire new ones.

➡️ Once a transaction releases its first lock, it cannot obtain any new locks.
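
Example: T1: lock-X(A); read/write A; lock-X(B); read/write B; unlock(A); unlock(B). All lock acquisitions (growing phase) finish before the first release (shrinking phase).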


5. Types of Two-Phase Locking

(a) Basic 2PL

Follows the two-phase rule (growing + shrinking).

Guarantees conflict serializability but may lead to deadlocks.


(b) Conservative (Static) 2PL

All locks are acquired before the transaction starts execution.

Prevents deadlocks, but may reduce concurrency.


(c) Strict 2PL

All exclusive (write) locks are held until commit or abort.

Prevents cascading rollbacks.

Ensures recoverable schedules.


(d) Rigorous 2PL

Both shared (read) and exclusive (write) locks are held until commit or abort.

Produces strict schedules, giving the strongest level of isolation among the 2PL variants.


Q2: Explain deadlock with an example. Discuss different deadlock prevention protocols, deadlock detection, and starvation.

Solution:

A deadlock occurs in a database system when two or more transactions are waiting indefinitely for each other to release locks.

It happens when each transaction holds a resource and waits for another resource locked by another transaction.

Deadlocks cause transactions to halt permanently, blocking system progress.


Example of Deadlock


Consider two transactions T1 and T2:

T1: Lock(A), then requests Lock(B)
T2: Lock(B), then requests Lock(A)

T1 waits for B, while T2 waits for A.

Neither can proceed → Deadlock occurs.


3. Deadlock Prevention Protocols

Deadlock prevention ensures that the system never enters a deadlock state by controlling how locks are acquired.


(a) Wait-Die Scheme (Non-Preemptive)

Uses timestamps to decide which transaction waits or aborts.

If an older transaction requests a lock held by a younger one → it waits.

If a younger transaction requests a lock held by an older one → it dies (aborts and restarts).


(b) Wound-Wait Scheme (Preemptive)

If an older transaction requests a lock held by a younger one → it wounds (forces abort) the younger transaction.

If a younger transaction requests a lock held by an older one → it waits.


(c) Timeout-Based Prevention

If a transaction waits longer than a specified time, it is aborted and restarted.


(d) Resource Ordering


Assign a fixed order to all data items.

Transactions must request locks in that order to avoid circular waits.


4. Deadlock Detection


Deadlock detection allows deadlocks to occur but uses a mechanism to identify and resolve them.

The system constructs a Wait-for Graph (WFG):

Nodes represent transactions.


Edges represent “waiting for” relationships.

If the graph contains a cycle, a deadlock exists.

One transaction in the cycle is aborted to break the deadlock.


5. Deadlock Recovery

Once a deadlock is detected:

Abort one or more transactions (usually the one with least progress).

Rollback the aborted transaction.

Restart the transaction later.


6. Starvation

Starvation occurs when a transaction never gets the required resource because others are repeatedly favored.

Common in deadlock prevention or priority-based systems.

Solution: Use fair scheduling (FIFO order) to ensure every transaction eventually executes.

Q3: Explain how the timestamp ordering protocol is used for concurrency control.
Solution:
Timestamp Ordering Protocol (TO Protocol) is a concurrency control method that ensures serializability of transactions based on their timestamps.
Every transaction is assigned a unique timestamp (TS) when it starts.
The protocol ensures that transactions execute in timestamp order — older transactions always get priority over newer ones.

Purpose
To maintain serializability without using locks.
To avoid deadlocks, since transactions are ordered and never wait.

Working Principle
Each data item (Q) has two timestamp values:
Read_TS(Q): The largest timestamp of a transaction that successfully read Q.
Write_TS(Q): The largest timestamp of a transaction that successfully wrote Q.
The protocol checks these values before allowing read or write operations.

Rules of Timestamp Ordering
(a) Read Operation (Read(Q))
If Transaction T wants to read Q:
If TS(T) < Write_TS(Q):
→ The read is rejected (T tries to read an outdated value).
→ T is rolled back.
Otherwise:
→ The read is allowed, and Read_TS(Q) is updated to max(Read_TS(Q), TS(T)).

(b) Write Operation (Write(Q))
If Transaction T wants to write Q:
If TS(T) < Read_TS(Q):
→ A younger transaction has already read the old value → write rejected, rollback T.
If TS(T) < Write_TS(Q):
→ A younger transaction has already written Q → write rejected, rollback T.
Otherwise:
→ The write is allowed, and Write_TS(Q) is updated to TS(T).
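
Example: Suppose TS(T1) = 5 and TS(T2) = 10, and T2 has already written Q, so Write_TS(Q) = 10. If T1 now requests Read(Q), then TS(T1) < Write_TS(Q), so the read is rejected and T1 is rolled back, because T1 would otherwise be reading a value written "in its future".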





Q4: What is the Granularity of data items, and how does it affect concurrency control? 
Solution:
The granularity of data items refers to the size or scope of the data items that are being accessed or modified in a database.
It determines whether the lock is applied to a large data unit (like a table) or a small one (like a record or field).
The choice of granularity affects system performance and concurrency.
Proper selection helps balance concurrency and overhead in transaction processing.

Key Points:

Fine Granularity:
Fine-grained data items are small, individual pieces of data that can be accessed or modified independently.
Allows high concurrency, as multiple transactions can access different parts of the same table.
Increases lock overhead, since more locks need to be managed.
Example: Locking individual rows in a table.

Coarse Granularity:
Coarse-grained data items are larger, more complex pieces of data that may be composed of multiple fine-grained data items.
Reduces lock management overhead.
Decreases concurrency, as fewer transactions can access data simultaneously.
Example: Locking an entire table during an update.

Granularity Hierarchy:
Locks can exist at multiple levels — database → table → page → record → field.
Hierarchical locking uses intention locks to manage multi-level granularity efficiently.

Effect on Concurrency Control:
Fine granularity → higher concurrency, more overhead.
Coarse granularity → lower concurrency, less overhead.
The DBMS must choose an optimal granularity based on workload and performance needs.

Example:
In a banking system, if each account record is locked individually (fine granularity), multiple users can update different accounts at once.
If the entire accounts table is locked (coarse granularity), only one transaction can update at a time.

Summary:
Granularity of data items determines the unit size of locking in a database.
It directly affects the trade-off between concurrency and system overhead in concurrency control.

Q5: Differentiate between a Binary lock and a Shared/Exclusive lock.
Solution:

Binary Lock:
Has only two states: locked (1) and unlocked (0).
At most one transaction can hold the lock on a data item at a time, even when both transactions only want to read.
Simple to implement but restricts concurrency unnecessarily.

Shared/Exclusive Lock:
Uses two lock modes: shared (S) for reading and exclusive (X) for writing.
Several transactions may hold shared locks on the same item at once, while an exclusive lock can be held by only one transaction and is incompatible with every other lock.
Allows much higher concurrency, especially for read-heavy workloads.

Unit 10: Database Recovery Techniques

Q1: Why is database recovery essential? Explain the recovery technique based on the immediate update.

Solution:

Database recovery is the process of restoring the database to a consistent state after a failure.

It ensures that all committed transactions are saved and uncommitted ones are undone.

Failures can occur due to system crashes, power loss, or transaction errors.

Recovery maintains the integrity and reliability of the database system.


Need / Importance of Database Recovery:

Ensures Consistency: Restores the database to a consistent state after failures.

Maintains Durability: Ensures that results of committed transactions are permanent.

Protects Data: Prevents loss or corruption of data.

Supports Reliability: Provides confidence that the system can recover from unexpected issues.


Immediate Update Recovery Technique:

In the Immediate Update technique, all changes made by a transaction are immediately written to the database (and log) before the transaction commits.

The Immediate Update technique allows updates before commit but relies on undo/redo logs to ensure the database returns to a consistent state after a crash.


Write-Ahead Logging (WAL):

Every update is first written to a log file before being applied to the database.

Ensures that recovery actions can be performed correctly after a crash.


Types of Actions:

UNDO: If a transaction fails before committing, its changes must be undone using the log.

REDO: If a transaction commits but the system fails before all changes are written to the database, those changes are redone from the log.


Recovery Process:

Identify transactions that were active, committed, or failed at the time of crash.

Undo changes of uncommitted transactions.

Redo changes of committed transactions using the log entries.


Advantages:

Ensures high data reliability even in case of system failure.

Changes can be recovered accurately using the log.


Disadvantages:

Requires frequent disk writes, increasing overhead.

Recovery process may take longer due to both undo and redo operations.


Example:

Suppose Transaction T1 updates Account A from ₹1000 to ₹800 and commits.

If the system crashes after the log is written but before the data is fully saved, recovery uses the log to redo T1’s update.

If T1 had not committed, its partial changes would be undone.


Q2: Why do we need database recovery? Discuss the shadow paging technique.

Solution:

Need / Importance of Database Recovery:

Ensures Consistency: Restores the database to a consistent state after failures.

Maintains Durability: Ensures that results of committed transactions are permanent.

Protects Data: Prevents loss or corruption of data.

Supports Reliability: Provides confidence that the system can recover from unexpected issues.


Shadow Paging Technique:


Shadow paging is a database recovery technique that avoids the need for logs.

It maintains two copies (pages) of the database — one current and one shadow.

The shadow copy always represents the consistent state of the database before the transaction starts.

It provides an atomic and crash-safe recovery method.

It is similar to the shadow copy technique: in shadow copy, a full copy of the original data is kept, and if anything goes wrong the database is corrected from that copy.

In shadow paging, however, changes are first made to the current (clone) pages; only if the transaction completes successfully does the current page table replace the shadow page table as the main database state.


Q3: Explain the deferred update approach in database recovery. 

Solution:

Deferred update is a database recovery technique where updates made by a transaction are not applied to the database until the transaction commits.

All updates are temporarily stored in a log file or buffer.

It follows the principle of "do after commit" — no changes are made to the database until it is certain the transaction will complete successfully.

This approach ensures easy recovery and data consistency.

It is also known as the NO-UNDO/REDO recovery algorithm, since the database is never changed before commit and therefore never needs undo.

Example:

It is like writing and testing code before saving: all the work is done and verified first, and only if it works is it saved, so the work is reflected permanently. Similarly, a transaction's updates become visible in the database only after commit; if the transaction fails before that, the database remains untouched.


Advantages:

Simple recovery — only redo is required, no need for undo.

Ensures data consistency and atomicity.

Useful when failures are frequent, as the database remains safe until commit.


Disadvantages:

Longer transaction time, since updates are delayed until commit.

Requires extra storage for maintaining logs.

Not suitable for real-time applications where immediate updates are needed.


Q4: What is Buffer Management in DBMS? Explain.

Solution:

Buffer Management in DBMS is the process of handling data pages that are temporarily stored in main memory (RAM) while being read from or written to disk.

It acts as a bridge between disk storage and main memory, improving data access speed.

A buffer pool is a reserved area in memory used to hold copies of database pages from disk.

When a query requests data, the DBMS first checks the buffer pool. If the data is found, it is called a buffer hit; otherwise, it’s a buffer miss, and the data is fetched from disk.

It helps in reducing disk I/O operations, which are slower compared to memory access.

The Buffer Manager decides which pages to keep in memory and which to replace when new pages need to be loaded.

Common page replacement algorithms include LRU (Least Recently Used), MRU (Most Recently Used), and the Clock Algorithm.

It ensures data consistency by managing dirty pages (pages that have been modified in memory but not yet written to disk).

Efficient buffer management leads to better system performance, faster query execution, and optimized resource usage.




Q5: What are Checkpoints in database recovery? How do they help? 

Solution:

A checkpoint is a mechanism in database recovery that records the current state of the database and transaction log at a specific point in time.

It helps the system reduce recovery time after a failure.

Checkpoints act as a snapshot of the database’s consistent state.

They are created periodically by the DBMS during normal operation.


Purpose of Checkpoints:

To minimize the amount of work needed during recovery.

To mark a safe point from which the system can restart after a crash.


Recovery Using Checkpoints:

During recovery, the system starts from the last checkpoint instead of the beginning of the log.

Only transactions that started after the last checkpoint are checked for undo/redo operations.


Example:

Suppose a checkpoint is created at 10:00 AM.

If a crash occurs at 10:05 AM, recovery starts from the 10:00 AM checkpoint, not from the beginning of the log — saving time and effort.


Q6: What are the different approaches to Database recovery? What should a log file maintain in log-based recovery?

Solution:

Database recovery is the process of restoring the database to a consistent state after a failure.

It ensures that committed transactions are saved and uncommitted ones are undone.

Different recovery approaches use logs, checkpoints, and copies to maintain data integrity.

The choice of approach depends on how and when updates are applied to the database.


1. Deferred Update Approach:

Updates are not applied to the database until the transaction commits.

All updates are stored in the log file first.

On commit → updates are written to the database.

On failure before commit → no undo is required (since the database wasn’t changed).

Only redo is needed for committed transactions.


2. Immediate Update Approach:

Updates are applied to the database immediately, even before the transaction commits.

A log entry is written before updating the database (Write-Ahead Logging). 

On failure →

Undo uncommitted transactions.

Redo committed transactions.

Ensures durability and consistency, but increases overhead.


3. Shadow Paging:

Maintains two copies of the database: a shadow copy (stable) and a current copy (active).

All updates are made to the current copy.

On commit → the current copy becomes the new shadow copy.

On failure → the shadow copy is used for recovery (no undo/redo needed).

Fast recovery, but requires extra storage.


4. Checkpoint-Based Recovery:

Periodically saves a snapshot (checkpoint) of the database and transaction log.

On failure → recovery starts from the last checkpoint instead of the beginning of the log.

Reduces recovery time significantly.


Log-Based Recovery

In log-based recovery, every change made to the database is recorded in a log file before it is applied to the database.

The log file helps perform undo and redo operations during recovery.


Log File Should Maintain:


Transaction Identifier (TID):

Unique ID for each transaction to track its operations.


Type of Operation:

Specifies the action, e.g., START TRANSACTION, UPDATE, COMMIT, or ABORT.


Data Item (Object):

The name or address of the data item being modified.


Old Value (Before Image):

The value of the data item before the update (used for undo).


New Value (After Image):

The value of the data item after the update (used for redo).


Timestamps:

Records the time of each operation for sequencing during recovery.
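
A sketch of what such log records might look like (the record format shown is illustrative, not a fixed standard):

[T1, START]
[T1, WRITE, X, old = 100, new = 80]
[T1, COMMIT]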
