Modern data systems are distributed, analytical and security-sensitive. PSC questions may ask distributed DBMS transparency, fragmentation/replication, two-phase commit, OLTP vs OLAP, data warehouse schemas, mining tasks and database security controls.
Engineering Definitions
Distributed database
Standard definition: A logically related database stored across multiple networked sites.
Exam meaning: Logical एउटै database तर data multiple sites मा राखिएको system।
Data warehouse
Standard definition: A subject-oriented, integrated, time-variant and non-volatile collection of data for decision support.
Exam meaning: Decision support/analysis का लागि historical integrated data store।
Data mining
Standard definition: The process of discovering useful patterns, relationships or knowledge from large datasets.
Exam meaning: ठूलो data बाट useful pattern/knowledge निकाल्ने process।
Database security
Standard definition: Controls that protect database confidentiality, integrity and availability.
Exam meaning: Database data र service लाई unauthorized access/change/failure बाट बचाउने controls।
Concept Teaching
Distributed databases improve availability and locality but introduce consistency and coordination issues. Data warehouses separate analytics from transaction systems. Data mining extracts patterns, while security ensures only correct users and operations are allowed.
Distributed DBMS Concepts
Distribution should be hidden where possible.
- Location transparency hides where data is stored.
- Replication transparency hides multiple copies.
- Fragmentation transparency hides data partitioning.
- Horizontal fragmentation splits rows.
- Vertical fragmentation splits columns.
- Replication improves availability but creates consistency challenges.
Commit and Replication
Distributed transactions need agreement.
- Two-phase commit has prepare and commit/abort phases.
- Coordinator asks participants to vote.
- If all vote yes, coordinator commits; otherwise aborts.
- 2PC can block if coordinator fails at critical time.
- Synchronous replication gives stronger consistency with higher latency.
- Asynchronous replication is faster but can lag.
Warehouse and OLAP
Warehouse systems support analysis rather than day-to-day transactions.
| Concept | Meaning | Exam point |
|---|---|---|
| OLTP | Operational transaction processing | Current frequent updates |
| OLAP | Analytical processing | Aggregations and historical analysis |
| Star schema | Fact table with dimension tables | Simple warehouse schema |
| Snowflake schema | Normalized dimensions | Less redundancy, more joins |
| ETL | Extract, transform, load | Data warehouse pipeline |
Data Mining Tasks
Mining tasks differ by goal.
| Task | Goal | Example |
|---|---|---|
| Classification | Predict class label | Spam/not spam |
| Clustering | Group similar records | Customer segments |
| Association rule | Find co-occurring items | Market basket |
| Regression | Predict numeric value | Sales forecast |
| Anomaly detection | Find unusual cases | Fraud detection |
Database Security
Database security is layered.
- Authentication verifies identity.
- Authorization controls permissions.
- Roles simplify privilege management.
- Views can limit data exposure.
- Encryption protects data at rest/in transit.
- Auditing records access and changes.
- SQL injection is prevented using parameterized queries and input handling.
Engineering Mechanism
- Distribute data using fragmentation/replication.
- Route queries to relevant sites.
- Coordinate distributed transaction using commit protocol.
- Extract operational data to warehouse through ETL.
- Analyze data with OLAP/mining methods.
- Enforce security through authentication, authorization, encryption and audit.
Diagrams / Models To Draw
- Draw distributed DB sites with replicated fragments.
- Draw two-phase commit coordinator/participants.
- Draw star schema fact and dimensions.
- Draw ETL pipeline.
- Draw role-based access control.
Formulas, Algorithms and Rules
- 2PC phases = prepare/vote + commit/abort.
- Warehouse properties = subject-oriented + integrated + time-variant + non-volatile.
- Association rule metrics include support and confidence.
- Security goals = confidentiality + integrity + availability.
| Concept | Purpose | Tradeoff |
|---|---|---|
| Fragmentation | Locality/parallelism | Query complexity |
| Replication | Availability/read performance | Consistency overhead |
| 2PC | Atomic distributed commit | Blocking risk |
| Warehouse | Decision support | ETL/storage cost |
| Mining | Pattern discovery | Privacy/false patterns |
| Security | Protect data | Usability/admin overhead |
Exam Point
- Differentiate OLTP and OLAP clearly.
- Data warehouse definition is high-yield.
- 2PC can block.
- Replication is not same as backup.
- SQL injection prevention is parameterized queries.
- Mining task examples help subjective answers.
Worked Example
A national tax system may keep taxpayer data fragments near provinces but replicate summary records centrally. Distributed queries combine fragments, while a data warehouse stores historical collections for revenue trend analysis.
Subjective Answer Pattern
- Define distributed database.
- Explain fragmentation/replication/transparency.
- Describe two-phase commit.
- Explain warehouse and OLAP.
- List data mining tasks.
- Discuss security controls.
Common Engineering Mistakes
- Confusing data warehouse with operational database.
- Saying clustering needs class labels.
- Ignoring 2PC blocking.
- Treating replication as automatic consistency.
- Only mentioning passwords for security.
MCQ Revision
- What are two phases of 2PC?
- OLAP is used for what?
- Star schema contains what central table?
- Classification predicts what?
- What prevents SQL injection?
- What is horizontal fragmentation?
Final Summary
- Distributed DBMS stores data across sites with transparency goals.
- Replication and fragmentation improve performance/availability but add complexity.
- Warehouses support analytical decisions.
- Mining discovers patterns.
- Security protects confidentiality, integrity and availability.