andre/UoL

Fork 0

Files

levdoescode 53ac6f4af0 Week 9 notes completed

2023-03-06 01:40:58 -05:00

4.3 KiB

Raw Permalink Blame History

Advanced RDBMS topics

Speeding things up

Using binary search on a sorted table:

Average case - O(log n)
Worst case - O(log n)
As fast as tree indexes
Use no extra space
Can only sort in one order (column)

Using indexes

Usually one of:

B-tree
Hash table

May also be spatial/geometric

May be held in memory, because it's much smaller than the table.

May optimize for disk structure

B-trees (balanced trees)

O(log n) in typical and worst case
Supports ranges (because we know the limits)
Support (some) approximate searching

Hash tables

O(1) in typical case
Don't scale with the size of the data
O(n) in (unusual) worst case
Hash algorithm may be expensive
Depending on the size of the dataset, B-trees may be fast enough even with worst O's
No approximation
No range-based retrieval - Things that are similar are placed in different places throughout the hash table (locality, sensitive hashing)

Query efficiency

The order of the statements in the query is important. This is normally done automatically by the DBMS.

Optimizing actions

Indexes and sorted tables can save on copying, searching and sorting.

Query strategy has a huge impact

Order of operations
Use of indexes
Making fresh indexes
Copying or reading data

How do we optimize? Let's see an example.

SELECT *
FROM MOvieLocation,
     Actors
WHERE Actor1=Name;

We can use the keyword EXPLAIN:

SELECT *
FROM MovieLocation,
     Actors
WHERE Actor1=Name;

Tells us which operations on which tables and on which order.

MovieLocations select_type: SIMPLE possible_keys: NULL (Actor Name is not the complete Key) rows: 1819 filtered: 100 (100% of the data needs to be looked at)
Actors select_type: SIMPLE possible_keys: NULL rows: 39175 filtered: 10 Extra: Using where; Using hash join

About 50ms

Making a change, creating an index on the MovieLocations table. How does it change things?

CREATE INDEX LocAct1
    ON  MovieLocations(Actor1);

Running EXPLAIN again:

Actors select_type: SIMPLE possible_keys: NULL rows: 39175 filtered: 100 Extra: Using where
MovieLocations select_type: SIMPLE possible_keys: LocAct1 rows: 7 filtered: 100 Extra: NULL

About 60ms

It goes to the Actors table first and only 7 rows.

Let's add an index to the Actors table:

CREATE INDEX ActorNames
    ON  Actors(Name);

Now runnign EXPLAIN

EXPLAIN SELECT *
FROM  MovieLocations,
      Actors
WHERE Actor1=Name;

It reverts to the original order:

MovieLocations select_type: SIMPLE possible_keys: LocAct1 rows: 1819 filtered: 100 Extra: Using where
Actors select_type: simple possible_keys: ActorNames rows: 1819 filtered: 100 Extra: NULL

About 5ms

There is also a EXPLAIN ANALYZE version of the command.

Removing the safety net - Denormalization

What is normalizaiton bias?

Normalization:

Can reduce disk reads
Can reduce integrity checks
Reduces storage use
Increases use of joins

We can denormalize Denormalization:

Merge tables to reduce joins
Effectively caches a joined SELECT
Reduces use of joins
Only sometimes faster

Example of denormalized table

CREATE TABLE MovieActors
(PRIMARY KEY
    (Title, Year, Location))
AS
SELECT Title, Year, Location, Name,
        DoB, Gender
FROM MovieLocations
     LEFT JOIN
     Actors
     ON Actor1=Name

Risky for very dynamic data. Duplicated data takes time and is risky.

For these issues, we can create a view:

Alternative cache:

CREATE VIEW SFMoviesAndActors
AS
SELECT Title, Year, Location, Name, Dob, Gender
FROM MovieLocations
    LEFT JOIN
    Actors
    ON Actor1=Name;

The views are dynamic, making queries on the view also expensive

Caching views

SQL standard: SNAPSHOW
aka Embovied or Materialized Views, are saved views.
Can update when data changes
We can treat the view as a table, add rows, etc. Set permissions to the view. When the parent table is update, it regenerates the view.
Not implemented in MySQL

Denormalizing

Databases are fast
Keyed based joins are optimized
Evaluate speed and efficiency before trying to optmize.

Conclusions

Relations Databases are:

powerful
fast
grounded in theory
adaptable
mature
based on standards(often)
best with tabular data
initially confusing

4.3 KiB Raw Permalink Blame History