Modern data systems are distributed, analytical and security-sensitive. PSC questions may ask distributed DBMS transparency, fragmentation/replication, two-phase commit, OLTP vs OLAP, data warehouse schemas, mining tasks and database security controls.

Engineering Definitions

Distributed database

Standard definition: A logically related database stored across multiple networked sites.

Exam meaning: Logical एउटै database तर data multiple sites मा राखिएको system।

Data warehouse

Standard definition: A subject-oriented, integrated, time-variant and non-volatile collection of data for decision support.

Exam meaning: Decision support/analysis का लागि historical integrated data store।

Data mining

Standard definition: The process of discovering useful patterns, relationships or knowledge from large datasets.

Exam meaning: ठूलो data बाट useful pattern/knowledge निकाल्ने process।

Database security

Standard definition: Controls that protect database confidentiality, integrity and availability.

Exam meaning: Database data र service लाई unauthorized access/change/failure बाट बचाउने controls।

Concept Teaching

Distributed databases improve availability and locality but introduce consistency and coordination issues. Data warehouses separate analytics from transaction systems. Data mining extracts patterns, while security ensures only correct users and operations are allowed.

Distributed DBMS Concepts

Distribution should be hidden where possible.

  • Location transparency hides where data is stored.
  • Replication transparency hides multiple copies.
  • Fragmentation transparency hides data partitioning.
  • Horizontal fragmentation splits rows.
  • Vertical fragmentation splits columns.
  • Replication improves availability but creates consistency challenges.

Commit and Replication

Distributed transactions need agreement.

  • Two-phase commit has prepare and commit/abort phases.
  • Coordinator asks participants to vote.
  • If all vote yes, coordinator commits; otherwise aborts.
  • 2PC can block if coordinator fails at critical time.
  • Synchronous replication gives stronger consistency with higher latency.
  • Asynchronous replication is faster but can lag.

Warehouse and OLAP

Warehouse systems support analysis rather than day-to-day transactions.

Concept Meaning Exam point
OLTP Operational transaction processing Current frequent updates
OLAP Analytical processing Aggregations and historical analysis
Star schema Fact table with dimension tables Simple warehouse schema
Snowflake schema Normalized dimensions Less redundancy, more joins
ETL Extract, transform, load Data warehouse pipeline

Data Mining Tasks

Mining tasks differ by goal.

Task Goal Example
Classification Predict class label Spam/not spam
Clustering Group similar records Customer segments
Association rule Find co-occurring items Market basket
Regression Predict numeric value Sales forecast
Anomaly detection Find unusual cases Fraud detection

Database Security

Database security is layered.

  • Authentication verifies identity.
  • Authorization controls permissions.
  • Roles simplify privilege management.
  • Views can limit data exposure.
  • Encryption protects data at rest/in transit.
  • Auditing records access and changes.
  • SQL injection is prevented using parameterized queries and input handling.

Engineering Mechanism

  • Distribute data using fragmentation/replication.
  • Route queries to relevant sites.
  • Coordinate distributed transaction using commit protocol.
  • Extract operational data to warehouse through ETL.
  • Analyze data with OLAP/mining methods.
  • Enforce security through authentication, authorization, encryption and audit.

Diagrams / Models To Draw

  • Draw distributed DB sites with replicated fragments.
  • Draw two-phase commit coordinator/participants.
  • Draw star schema fact and dimensions.
  • Draw ETL pipeline.
  • Draw role-based access control.

Formulas, Algorithms and Rules

  • 2PC phases = prepare/vote + commit/abort.
  • Warehouse properties = subject-oriented + integrated + time-variant + non-volatile.
  • Association rule metrics include support and confidence.
  • Security goals = confidentiality + integrity + availability.
Concept Purpose Tradeoff
Fragmentation Locality/parallelism Query complexity
Replication Availability/read performance Consistency overhead
2PC Atomic distributed commit Blocking risk
Warehouse Decision support ETL/storage cost
Mining Pattern discovery Privacy/false patterns
Security Protect data Usability/admin overhead

Exam Point

  • Differentiate OLTP and OLAP clearly.
  • Data warehouse definition is high-yield.
  • 2PC can block.
  • Replication is not same as backup.
  • SQL injection prevention is parameterized queries.
  • Mining task examples help subjective answers.

Worked Example

A national tax system may keep taxpayer data fragments near provinces but replicate summary records centrally. Distributed queries combine fragments, while a data warehouse stores historical collections for revenue trend analysis.

Subjective Answer Pattern

  • Define distributed database.
  • Explain fragmentation/replication/transparency.
  • Describe two-phase commit.
  • Explain warehouse and OLAP.
  • List data mining tasks.
  • Discuss security controls.

Common Engineering Mistakes

  • Confusing data warehouse with operational database.
  • Saying clustering needs class labels.
  • Ignoring 2PC blocking.
  • Treating replication as automatic consistency.
  • Only mentioning passwords for security.

MCQ Revision

  • What are two phases of 2PC?
  • OLAP is used for what?
  • Star schema contains what central table?
  • Classification predicts what?
  • What prevents SQL injection?
  • What is horizontal fragmentation?

Final Summary

  • Distributed DBMS stores data across sites with transparency goals.
  • Replication and fragmentation improve performance/availability but add complexity.
  • Warehouses support analytical decisions.
  • Mining discovers patterns.
  • Security protects confidentiality, integrity and availability.