Monday, August 10, 2020

Producing Inconsistent Read Phantoms for Illustration Purposes Only, then Considering the Real-world Implications


The default transaction isolation level in Oracle (and Snowflake, PostgresSQL, and many other databases) is statement-level read consistency.  

Two cases-in-point:
(1) a view referencing a function containing a 2nd sql statement is not by default read consistent. 
(2) a PLSQL process with two cursors is not by default read-consistent across cursors.

Let's try to fashion the simplest possible example to demonstrate. This example is for Oracle, but should be applicable to other targets without too much fuss.

Setup

-- cleanup prior
drop table person_sal;
drop table person;
drop function get_salary_amt;
drop view person_view;

-- create
create table person(
    person_no number,
    person_name varchar2(10),
    job_title varchar2(10),
    primary key(person_no)
);
create table person_sal(
    person_no number,
    salary_amt number,
    primary key (person_no),
    foreign key(person_no) references person(person_no)
);
create function get_salary_amt(n in number) return number is
    x number;
begin
    select salary_amt into x from person_sal where person_no = n;
    return x;
end;
/
create or replace view person_view as (
    select
        person_no,
        person_name,
        job_title,
        get_salary_amt(person_no) salary_amt
    from person
);

Test #1

A view referencing a function containing a SQL statement is not by default read-consistent.

-- t0 (initialize test data)
delete from person_sal;
delete from person;
insert into person(person_no, person_name, job_title)
    values (101, 'Sue','Trainee');
insert into person_sal(person_no, salary_amt)
    values (101, 17000);
commit;
select * from person_view;

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
end;
/

-- session 2 t2 (2nd process that changes data while 1st process is running)
-- Execute this in a 2nd sqlplus session during the 10 seconds the above
-- process is sleeping.
update person set job_title = 'CEO';
update person_sal set salary_amt = 250000;
commit;

-- tx (final state of data after both processes have ended)
select * from person_view;

 

Test #1 Sample Output

-- t0 (Sue is a trainee with a low salary)
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue          Trainee          17000

-- session 1 t1
--  Process 1 starts executing...
--  nothing to show yet

-- session 2 t2
--  Sue gets promoted to ceo with a much higher salary.
--  This happens while the session 1 long-running process is fetching
--  Sue's details.
1 row updated.
1 row updated.
Commit complete.

-- session 1 t3
-- Process 1 finishes and outputs: Note, that the output is a mix of two
--  different points in time.  job title is not yet updated, but salary is.
--  This is commonly called a "phantom".  The only two valid consistent states
--  for sue's details are: 101 Sue    Trainee    17000 (before the promotion) 101
--  Sue CEO 250000 (after the promotion) BAD!
101 Sue Trainee 250000


-- Why does this break? The view references a function containing a 2nd sql
-- statement. The 2nd sql statement is not read-consistent with the 1st (the
-- view sql).

-- either session final state:
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue          CEO         250000


Test #1: How to correct?

-- You use the SET TRANSACTION statement to begin a read-only or read-write
-- transaction, establish an isolation level, or assign your current transaction
-- to a specified rollback segment. Read-only transactions are useful for running
-- multiple queries against one or more tables while other users update the same
-- tables.

-- During a read-only transaction, all queries refer to the same snapshot of the
-- database, providing a multi-table, multi-query, read-consistent view. Other
-- users can continue to query or update data as usual.

--  corrected with explicit transaction control
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    set transaction read only name 'fetch people';
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
    commit;
end;
/

-- another way is to rewrite the view to join the two tables in pure sql. This
-- will eliminate the function and it's embed 2nd sql statement.

-- corrected with rewritten view
create or replace view person_view as (
    select
        p.person_no,
        p.person_name,
        p.job_title,
        ps.salary_amt
    from person p
        inner join person_sal ps
            on p.person_no = ps.person_no
);

-- There are other ways to correct as well. This is not an exhaustive treatment.

Test #2

A PL/SQL process with two cursors is not by default read-consistent.

-- the rest is same as above (including the BAD output)

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person;
    r1 cur1%rowtype;
    cursor cur2 is select * from person_sal;
    r2 cur2%rowtype;
begin
    open cur1;
    fetch cur1 into r1;
    dbms_lock.sleep(10);
    open cur2;
    fetch cur2 into r2;
    dbms_output.put_line(
        r1.person_no||' '||
        r1.person_name||' '||
        r1.job_title||' '||
        r2.salary_amt);
    close cur2;
    close cur1;
end;
/

I will leave example output and the correction as an exercise to the reader.

NOW CONSIDER THIS


What if the "problem" process ran 3 hours instead of a few seconds?  What if there were not just two, but 18 statements without read consistency in the long-running process?  What if the long-running process with read-consistency issues was processing not just one, but 65 thousand  records?  What if there were not one, but over 20 of these long-running processes?  What if you knew that at least 2 outbound interfaces were doing this and you suspect others are the same?  what if you spot-checked a few of many views, found several that do this, expect most or all do?

I have found this problem in sooooooo many systems. Sometimes, it feels like it can be found in every system if I'm willing to look carefully enough.

Do you feel safe?