Monday, August 10, 2020

Producing Inconsistent Read Phantoms for Illustration Purposes Only, then Considering the Real-world Implications


The default transaction isolation level in Oracle (and Snowflake, PostgresSQL, and many other databases) is statement-level read consistency.  

Two cases-in-point:
(1) a view referencing a function containing a 2nd sql statement is not by default read consistent. 
(2) a PLSQL process with two cursors is not by default read-consistent across cursors.

Let's try to fashion the simplest possible example to demonstrate. This example is for Oracle, but should be applicable to other targets without too much fuss.

Setup

-- cleanup prior
drop table person_sal;
drop table person;
drop function get_salary_amt;
drop view person_view;

-- create
create table person(
    person_no number,
    person_name varchar2(10),
    job_title varchar2(10),
    primary key(person_no)
);
create table person_sal(
    person_no number,
    salary_amt number,
    primary key (person_no),
    foreign key(person_no) references person(person_no)
);
create function get_salary_amt(n in number) return number is
    x number;
begin
    select salary_amt into x from person_sal where person_no = n;
    return x;
end;
/
create or replace view person_view as (
    select
        person_no,
        person_name,
        job_title,
        get_salary_amt(person_no) salary_amt
    from person
);

Test #1

A view referencing a function containing a SQL statement is not by default read-consistent.

-- t0 (initialize test data)
delete from person_sal;
delete from person;
insert into person(person_no, person_name, job_title)
    values (101, 'Sue','Trainee');
insert into person_sal(person_no, salary_amt)
    values (101, 17000);
commit;
select * from person_view;

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
end;
/

-- session 2 t2 (2nd process that changes data while 1st process is running)
-- Execute this in a 2nd sqlplus session during the 10 seconds the above
-- process is sleeping.
update person set job_title = 'CEO';
update person_sal set salary_amt = 250000;
commit;

-- tx (final state of data after both processes have ended)
select * from person_view;

 

Test #1 Sample Output

-- t0 (Sue is a trainee with a low salary)
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue          Trainee          17000

-- session 1 t1
--  Process 1 starts executing...
--  nothing to show yet

-- session 2 t2
--  Sue gets promoted to ceo with a much higher salary.
--  This happens while the session 1 long-running process is fetching
--  Sue's details.
1 row updated.
1 row updated.
Commit complete.

-- session 1 t3
-- Process 1 finishes and outputs: Note, that the output is a mix of two
--  different points in time.  job title is not yet updated, but salary is.
--  This is commonly called a "phantom".  The only two valid consistent states
--  for sue's details are: 101 Sue    Trainee    17000 (before the promotion) 101
--  Sue CEO 250000 (after the promotion) BAD!
101 Sue Trainee 250000


-- Why does this break? The view references a function containing a 2nd sql
-- statement. The 2nd sql statement is not read-consistent with the 1st (the
-- view sql).

-- either session final state:
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue          CEO         250000


Test #1: How to correct?

-- You use the SET TRANSACTION statement to begin a read-only or read-write
-- transaction, establish an isolation level, or assign your current transaction
-- to a specified rollback segment. Read-only transactions are useful for running
-- multiple queries against one or more tables while other users update the same
-- tables.

-- During a read-only transaction, all queries refer to the same snapshot of the
-- database, providing a multi-table, multi-query, read-consistent view. Other
-- users can continue to query or update data as usual.

--  corrected with explicit transaction control
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    set transaction read only name 'fetch people';
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
    commit;
end;
/

-- another way is to rewrite the view to join the two tables in pure sql. This
-- will eliminate the function and it's embed 2nd sql statement.

-- corrected with rewritten view
create or replace view person_view as (
    select
        p.person_no,
        p.person_name,
        p.job_title,
        ps.salary_amt
    from person p
        inner join person_sal ps
            on p.person_no = ps.person_no
);

-- There are other ways to correct as well. This is not an exhaustive treatment.

Test #2

A PL/SQL process with two cursors is not by default read-consistent.

-- the rest is same as above (including the BAD output)

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person;
    r1 cur1%rowtype;
    cursor cur2 is select * from person_sal;
    r2 cur2%rowtype;
begin
    open cur1;
    fetch cur1 into r1;
    dbms_lock.sleep(10);
    open cur2;
    fetch cur2 into r2;
    dbms_output.put_line(
        r1.person_no||' '||
        r1.person_name||' '||
        r1.job_title||' '||
        r2.salary_amt);
    close cur2;
    close cur1;
end;
/

I will leave example output and the correction as an exercise to the reader.

NOW CONSIDER THIS


What if the "problem" process ran 3 hours instead of a few seconds?  What if there were not just two, but 18 statements without read consistency in the long-running process?  What if the long-running process with read-consistency issues was processing not just one, but 65 thousand  records?  What if there were not one, but over 20 of these long-running processes?  What if you knew that at least 2 outbound interfaces were doing this and you suspect others are the same?  what if you spot-checked a few of many views, found several that do this, expect most or all do?

I have found this problem in sooooooo many systems. Sometimes, it feels like it can be found in every system if I'm willing to look carefully enough.

Do you feel safe?



No comments:

Post a Comment