Monday, August 10, 2020

Producing Inconsistent Read Phantoms for Illustration Purposes Only, then Considering the Real-world Implications


The default transaction isolation level in Oracle (and Snowflake, PostgreSQL, and many other databases) is READ COMMITTED, which provides only statement-level read consistency.

Two cases in point:
(1) A view referencing a function that contains a second SQL statement is not, by default, read-consistent with the view's own query.
(2) A PL/SQL process with two cursors is not, by default, read-consistent across the cursors.

Let's try to fashion the simplest possible example to demonstrate. This example is for Oracle, but should be applicable to other targets without too much fuss.

Setup

-- cleanup prior
drop table person_sal;
drop table person;
drop function get_salary_amt;
drop view person_view;

-- create
create table person(
    person_no number,
    person_name varchar2(10),
    job_title varchar2(10),
    primary key(person_no)
);
create table person_sal(
    person_no number,
    salary_amt number,
    primary key (person_no),
    foreign key(person_no) references person(person_no)
);
create function get_salary_amt(n in number) return number is
    x number;
begin
    select salary_amt into x from person_sal where person_no = n;
    return x;
end;
/
create or replace view person_view as (
    select
        person_no,
        person_name,
        job_title,
        get_salary_amt(person_no) salary_amt
    from person
);

Test #1

A view referencing a function containing a SQL statement is not by default read-consistent.

-- t0 (initialize test data)
delete from person_sal;
delete from person;
insert into person(person_no, person_name, job_title)
    values (101, 'Sue','Trainee');
insert into person_sal(person_no, salary_amt)
    values (101, 17000);
commit;
select * from person_view;

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
end;
/

-- session 2 t2 (2nd process that changes data while 1st process is running)
-- Execute this in a 2nd sqlplus session during the 10 seconds the above
-- process is sleeping.
update person set job_title = 'CEO';
update person_sal set salary_amt = 250000;
commit;

-- tx (final state of data after both processes have ended)
select * from person_view;

 

Test #1 Sample Output

-- t0 (Sue is a trainee with a low salary)
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue        Trainee         17000

-- session 1 t1
--  Process 1 starts executing...
--  nothing to show yet

-- session 2 t2
--  Sue gets promoted to ceo with a much higher salary.
--  This happens while the session 1 long-running process is fetching
--  Sue's details.
1 row updated.
1 row updated.
Commit complete.

-- session 1 t3
-- Process 1 finishes and outputs the line below. Note that the output is a
--  mix of two different points in time: the job title is not yet updated,
--  but the salary is. This is commonly called a "phantom". The only two
--  valid consistent states for Sue's details are:
--    101 Sue Trainee 17000    (before the promotion)
--    101 Sue CEO     250000   (after the promotion)
--  The mixed output below is BAD!
101 Sue Trainee 250000


-- Why does this break? The view references a function containing a 2nd sql
-- statement. The 2nd sql statement is not read-consistent with the 1st (the
-- view sql).

-- either session final state:
 PERSON_NO PERSON_NAM JOB_TITLE  SALARY_AMT
---------- ---------- ---------- ----------
       101 Sue        CEO            250000


Test #1: How to correct?

-- You use the SET TRANSACTION statement to begin a read-only or read-write
-- transaction, establish an isolation level, or assign your current transaction
-- to a specified rollback segment. Read-only transactions are useful for running
-- multiple queries against one or more tables while other users update the same
-- tables.

-- During a read-only transaction, all queries refer to the same snapshot of the
-- database, providing a multi-table, multi-query, read-consistent view. Other
-- users can continue to query or update data as usual.

--  corrected with explicit transaction control
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    set transaction read only name 'fetch people';
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
    commit;
end;
/

-- Another way is to rewrite the view to join the two tables in pure SQL. This
-- eliminates the function and its embedded second SQL statement.

-- corrected with rewritten view
create or replace view person_view as (
    select
        p.person_no,
        p.person_name,
        p.job_title,
        ps.salary_amt
    from person p
        inner join person_sal ps
            on p.person_no = ps.person_no
);

-- There are other ways to correct as well. This is not an exhaustive treatment.
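
-- One more example (a minimal sketch, not exhaustively tested here): run the
-- block in a SERIALIZABLE transaction. Serializable isolation gives
-- transaction-level read consistency, at the cost of possible ORA-08177
-- "can't serialize access" errors in transactions that also write.
declare
    cursor cur1 is select * from person_view;
    r cur1%rowtype;
begin
    set transaction isolation level serializable name 'fetch people';
    open cur1;
    dbms_lock.sleep(10);
    fetch cur1 into r;
    dbms_output.put_line(
        r.person_no||' '||
        r.person_name||' '||
        r.job_title||' '||
        r.salary_amt);
    close cur1;
    commit;
end;
/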

Test #2

A PL/SQL process with two cursors is not by default read-consistent.

-- the rest is the same as above (including the BAD output)

-- session 1 (long-running process with read consistency problems)
set serveroutput on size 999999;
declare
    cursor cur1 is select * from person;
    r1 cur1%rowtype;
    cursor cur2 is select * from person_sal;
    r2 cur2%rowtype;
begin
    open cur1;
    fetch cur1 into r1;
    dbms_lock.sleep(10);
    open cur2;
    fetch cur2 into r2;
    dbms_output.put_line(
        r1.person_no||' '||
        r1.person_name||' '||
        r1.job_title||' '||
        r2.salary_amt);
    close cur2;
    close cur1;
end;
/

I will leave example output and the correction as an exercise to the reader.
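
For checking your work, one possible correction (a minimal sketch along the same lines as the Test #1 fix) is to start a read-only transaction before opening either cursor, so both cursors read from the same snapshot:

set serveroutput on size 999999;
declare
    cursor cur1 is select * from person;
    r1 cur1%rowtype;
    cursor cur2 is select * from person_sal;
    r2 cur2%rowtype;
begin
    set transaction read only name 'fetch people';
    open cur1;
    fetch cur1 into r1;
    dbms_lock.sleep(10);
    open cur2;
    fetch cur2 into r2;
    dbms_output.put_line(
        r1.person_no||' '||
        r1.person_name||' '||
        r1.job_title||' '||
        r2.salary_amt);
    close cur2;
    close cur1;
    commit;
end;
/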

NOW CONSIDER THIS


What if the "problem" process ran 3 hours instead of a few seconds?  What if there were not just two, but 18 statements without read consistency in the long-running process?  What if the long-running process with read-consistency issues was processing not just one, but 65 thousand records?  What if there were not one, but over 20 of these long-running processes?  What if you knew that at least 2 outbound interfaces were doing this and you suspected others are the same?  What if you spot-checked a few of many views, found several that do this, and expect most or all do?

I have found this problem in sooooooo many systems. Sometimes, it feels like it can be found in every system if I'm willing to look carefully enough.

Do you feel safe?



Thursday, July 30, 2020

Incremental System Integration Approaches


This is a republish of an article I wrote twenty years ago, before the Agile Manifesto. Re-reading it now, in the post-"Lean, Agile, DevOps, Continuous Integration, Continuous Delivery" world, it still makes sense, just without all the latest buzzwords 😆.


For the teams that are effectively doing continuous integration today, phased integration may be a quaint reminder of the way things used to be, but you still have the same incremental integration strategy decisions to make; you just have much better tooling for taking most of the drudgery out of the integration work. For the rest of the software development community, which still struggles with quarterly releases, big-bang integration and Fragile methodologies, considering the disadvantages of phased integration is still quite relevant.




How you approach System Integration affects the order in which components are designed, coded, and tested. It also affects the ease with which you can test and debug them. Incremental integration comes in several varieties, and, unless the project is trivial, any one of them is better than phased integration. A good incremental approach assumes that parts of the course are uncharted, treats flexibility as important, and plans for responding to unexpected opportunities.


In engineering fields other than software, the importance of proper integration is well known. The Pacific Northwest saw a dramatic illustration of the hazards of poor integration when the football stadium at the University of Washington collapsed partway through construction. The structure wasn’t strong enough to support itself as it was being built. It doesn’t matter that it would have been strong enough by the time it was done; it needed to be strong enough at each step. If you integrate software in the wrong order, it’s hard to code, hard to test, and hard to debug. If none of it will work until all of it works, it can seem as though it will never be finished. It too can collapse under its own weight during construction – even though the finished product should work.

Because it’s done after project members have finished unit testing and in conjunction with system testing, integration is sometimes thought of as a testing activity. It’s complex enough, however, that is should be viewed as an independent activity. Here are some of the benefits you can expect from careful integration:

  • Easier defect diagnosis
  • Fewer defects
  • Less scaffolding
  • Shorter time to first working product
  • Shorter overall development schedules
  • Better customer relations
  • Improved morale
  • Improved chance of project completion
  • More reliable schedule estimates
  • More accurate status reporting
  • Improved code quality
  • Less documentation


Phased Integration

Until a few years ago, phased integration was the norm. It follows these well-defined steps:

1. Design, code, buy or reuse, test and debug each component. This step is called “unit development”.
2. Combine the components into one whopping-big system. This is called “system integration.”
3. Test and debug the whole system. This is called “system dis-integration.”

One problem with phased integration is that when the components in a system are put together for the first time, new problems inevitably surface and the causes of the problems could be anywhere. Since you have a large number of components that have never worked together before, the culprit might be poorly tested components, an error in the interface between two components, or an error caused by the interaction between two components. All components are suspect.

The uncertainty about the location of any specific problems is compounded by the fact that all the problems suddenly present themselves at once. This forces you to deal not only with the problems caused by interactions between components but with problems that are hard to diagnose because the problems themselves interact. For this reason, another name for phased integration is “big bang integration.”

Phased integration can’t begin until late in the project, after all the components have been unit-tested. When the components are finally combined and errors surface by the score, the project team immediately goes into panicky debugging mode rather than methodical error detection and correction.


Incremental Integration

In incremental integration, you write and test a program in small pieces and then combine the pieces one at a time. In this one-at-a-time approach to integration, you follow these steps:

1. Develop a small, functional part of the system. It can be the smallest functional part, the hardest part, or a key part. Thoroughly test and debug it. It will serve as a skeleton on which to hang the remaining parts of the system.
2. Design, code, test and debug a component.
3. Integrate the new component with the skeleton. Test and debug the combination of skeleton and new component. Make sure the combination works before you add any new components. If work remains to be done, repeat the process starting at step 2.



Benefits of Incremental Integration


Errors are easier to locate

When new problems surface during incremental integration, the new component is obviously at fault. Either its interface to the rest of the system contains an error or its integration with a previously integrated component produces an error. Either way, you know exactly where to look. Moreover, simply because you have fewer problems at once, you reduce the risk that multiple problems will interact or that one problem will mask another. The more interface errors you tend to have, the more this benefit of incremental integration will help in your projects. Since many projects spend up to 50 percent of their time debugging, maximizing debugging effectiveness by making errors easy to locate provides benefits in quality and productivity.

The system succeeds early in the project

When code is integrated and running, even if the system isn’t usable, it’s apparent that it soon will be. With incremental integration, the project team can see early results from their work, so their morale is better than when they suspect that their project will never draw its first breath. Management also has a more tangible sense of progress from seeing a system working with 50 percent of its capability than from hearing that coding is “99 percent complete.”

The units of the system are tested more fully

Integration starts early in the project. You integrate each component as it’s developed, rather than waiting for one magnificent binge of integration at the end. Components are unit-tested in both cases, but each component is exercised as part of the overall system more often with incremental integration than it is with phased integration.

Shorter development schedule

If integration is planned carefully, you can design part of the system while another part is being coded. This doesn’t reduce the total number of work hours required to develop the complete design and code, but it allows some work to be done in parallel, an advantage when calendar time is at a premium.


Incremental Integration Strategies

With phased integration, you don’t have to plan the order in which project components are built. All components are integrated at the same time, so you can build them in any order as long as they’re all ready by D-day.

With incremental integration, you have to plan more carefully. Most systems will call for the integration of some components before the integration of others. Planning for integration thus affects planning for construction; the order in which the components are constructed has to support the order in which they will be integrated.

Integration-order strategies come in a variety of shapes and sizes, and none is best in every case. The best integration approach varies from project to project and the best solution is always one that you create to meet the specific demands of a specific project. Knowing the points on the methodological continuum will give you insight into the possible solutions.

Top-Down Integration

In top-down integration, the component at the top of the hierarchy is written and integrated first. Stubs have to be written to exercise the top component. Then, as components are integrated from top down, stub components are replaced with real ones.
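
For a concrete, database-flavored illustration (a hypothetical sketch with made-up names): if a top-level reporting procedure calls a pricing function that has not been built yet, a throwaway stub can stand in for it so the top level can be integrated and exercised first.

-- hypothetical stub: a canned stand-in for a pricing function that does not
-- exist yet, so the top-level report that calls it can be integrated now
create or replace function get_price(p_item_no in number) return number is
begin
    return 9.99; -- fixed value; replaced later by the real implementation
end;
/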

An important aspect of top-down integration is that the interfaces between components must be carefully specified. The most troublesome errors to debug are not the ones that affect single components but those that arise from subtle interactions between components. Careful interface specification can reduce the problem. Interface specification isn’t especially an integration activity, but making sure that the interfaces have been specified well is.

An advantage of top-down integration is that the control logic of the system is tested relatively early. All the components at the top of the hierarchy are exercised a lot, so that big, conceptual design problems are exposed quickly.

Another advantage of top-down integration is that, if you plan it carefully, you can complete a partially working system early in the project. If the user-interface parts are at the top, you can get a basic interface working quickly and flesh out the details later. The morale of both customers and the project team benefits from getting something visible working early.

Top-down incremental integration also allows you to begin coding before the low-level design details are complete. Once the design has been driven down to a fairly low level of detail in all areas, you can begin implementation and integration of the routines at the higher levels without waiting.

In spite of these advantages, pure top-down integration usually involves disadvantages that are more troublesome than you’ll want to put up with.

Pure top-down integration leaves exercising the tricky low-level interfaces until last. If the low-level interfaces are buggy or have performance problems, you'd usually like to get to them long before the end of a project. It's not unusual for a low-level problem to bubble its way to the top of a system, causing high-level changes and reducing the benefit of earlier integration work. Minimize the bubbling problem through careful early unit testing and performance analysis of the routines that exercise the low-level interfaces.

Another problem with pure top-down integration is that it takes a lot of stubs to integrate from the top down. Many low-level components have not yet been integrated, which implies that a large number of stubs will be needed during intermediate steps in integration. Stubs are problematic in that, as test code, they are more likely to contain errors than the more carefully designed system code. Errors in the new stubs that support a new component defeat the purpose of incremental integration, which is to restrict the source of new errors to one new component.

Top-down integration is also nearly impossible to implement purely. In top-down integration done by the book, you start at the top level and then integrate all the routines at the second level. When you’ve integrated all the routines from the second level, and not before, you integrate the routines from the third level. The rigidity in pure top-down integration is completely arbitrary. Most people use a hybrid approach such as integrating from the top-down in sections instead.

Finally, you can’t use top-down integration if the collection of components doesn’t have a top.

Even though pure top-down integration isn’t workable, thinking about it will help you decide on a general approach. Some of the benefits and hazards that apply to a pure, top-down approach apply, less obviously, to a looser top-down approach, so keep them in mind.

Bottom-up Integration

In bottom-up integration, you write and integrate components at the bottom of the hierarchy first. You write test drivers to exercise the low-level components initially and add components to the test-driver scaffolding as they’re developed. As you add high-level components, you replace driver components with real ones.
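
In the same database-flavored terms (again a hypothetical sketch with made-up names), a test driver is just a throwaway block that feeds known inputs to a low-level routine and reports what comes back, standing in for the higher-level callers that do not exist yet.

-- hypothetical low-level routine, developed and integrated first
create or replace function calc_tax(p_amount in number) return number is
begin
    return round(p_amount * 0.10, 2); -- assumed flat 10% rate for the sketch
end;
/
-- throwaway test driver: exercises the low-level routine with known inputs
-- before any real higher-level callers exist
begin
    dbms_output.put_line('tax on 100: '||calc_tax(100));
    dbms_output.put_line('tax on 0:   '||calc_tax(0));
end;
/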

Bottom-up integration provides a limited set of advantages. It restricts the possible source of errors to the single component being integrated, so errors are easy to locate. Integration can start early in the project.

Bottom-up integration also exercises potentially troublesome low-level interfaces early. Since low-level limitations often determine whether you can meet the system's goals, making sure the low-level interfaces have had a full workout is worth the trouble.

The main problem with bottom-up integration is that it leaves integration of the major, high-level interfaces until last. If the system has conceptual design problems at the higher levels, the implementation activity won’t find them until all the detailed work has been done. If the design must be changed significantly, some of the low-level work might have to be discarded.

Bottom-up integration requires you to complete the design of the whole system before you start integration. If you don't, assumptions that needn't have controlled the design might end up deeply embedded in low-level code, giving rise to the awkward situation in which you design the high-level components to work around problems in the low-level ones. Letting low-level details drive the design of the higher-level components contradicts principles of information hiding and structured design. The problems of integrating higher-level components are small compared to the problems you'll have if you don't complete the design of high-level components before you begin low-level coding.

As with top-down integration, pure bottom-up integration is rare, and you can use a hybrid approach instead, integrating in sections.

Sandwich Integration

The problems with pure top-down and pure bottom-up integration have led some software integrators to recommend a sandwich approach. You first integrate the control components at the top of the hierarchy. Then you integrate the low-level components and widely-used utility components at the bottom. These high-level and low-level components are the bread of the sandwich. You leave the middle-level components until later. These make up the meat, cheese and tomatoes of the sandwich.

This approach avoids the rigidity of pure bottom-up or top-down integration. It integrates the often-troublesome components first and has the potential to minimize the amount of scaffolding you’ll need. It’s a realistic, practical approach.

Risk-Oriented Integration

Risk-oriented integration is also called “hard part first” integration. It’s like sandwich integration in that it seeks to avoid the problems inherent in pure top-down or pure bottom-up integration. Coincidentally, it also tends to integrate the components at the top and bottom first, saving the middle-level components for last. The motivation, however, is different.

In risk-oriented integration, you identify the level of risk associated with each component. You decide which will be the most challenging parts to implement, and you implement them first. Experience indicates that top-level interfaces are risky, so they are often at the top of the risk list. Low-level interfaces are also risky, so they’re also at the top of the risk list. In addition, you might know of components in the middle that will be challenging. Perhaps a component implements a poorly understood algorithm or has ambitious performance goals. Such components can also be identified as high risks and integrated relatively early.

The remainder of the code, the easy stuff, can wait until later. Some of it will probably turn out to be harder than you thought, but that’s unavoidable.

Feature-Oriented Integration

A final kind of incremental integration is integrating one feature at a time. The term “feature” refers to an identifiable function of the system you are integrating.

Many features must be implemented in multiple components, which puts a twist on the one-component-at-a-time aspect of incremental integration. When the feature to be integrated is bigger than a single component, the “increment” in incremental integration is bigger than a single component. This reduces the benefit of incrementalism a little in that it reduces your certainty about the source of new errors, but if you have thoroughly tested the new components that implement the new feature before you integrate them, that’s only a small disadvantage. You can use the incremental integration strategies recursively by integrating small pieces to form features and then incrementally integrating features to form a system.

You’ll usually want to start with a skeleton you’ve chosen for its ability to support other features. You can hang the reset of the features on the feature you integrate first. Components are added in “feature-trees,” hierarchical collections of routines that make up a feature. Integration is easier if a feature is relatively independent, perhaps calling the same low-level component as the routines for other features, but having no calls to middle-level code in common with other features.

Feature-oriented integration offers three main advantages. First, it eliminates scaffolding for virtually everything except low-level library components. The skeleton might need a little scaffolding, or some parts of the skeleton might not be operational until particular features have been added. When each feature has been hung on the structure, however, no additional scaffolding is needed. Since each feature is self-contained, each feature contains all the support code it needs.

The second main advantage is that each newly integrated feature brings out an incremental addition in functionality. This provides evidence that the project is moving steadily forward.

A third advantage is that feature-oriented integration works well with object-oriented design. Objects tend to map well to features, which makes feature-oriented integration a natural choice for object-oriented systems.

Pure feature-oriented integration is as difficult to pursue as pure top-down or pure bottom-up integration. Usually some of the low-level code must be integrated before significant features can be.


Conclusion

None of these approaches are robust procedures that you should follow methodically and then declare yourself to be done. Like software design approaches, they are heuristics more than algorithms, and rather than following any procedure dogmatically, you come out ahead by making up a unique strategy tailored to your specific project.

References

Adapted and summarized from the below works. Most content is from (2), chapter 7 and (3), Chapter 3.

(1) Hunt, Andrew and Thomas, David. The Pragmatic Programmer: From Journeyman to Master. Addison Wesley, 1999.
(2) Jones, Capers. Software Assessments, Benchmarks and Best Practices. Addison Wesley, 2000.
(3) McConnell, Steve. Code Complete. Microsoft Press, 1993.

Tuesday, July 14, 2020

The Unreasonable Effectiveness* of Pragmatic Automation

* apologies to Eugene Wigner



We all know that computers are great at doing repetitive tasks, but we often fail to realize that there are lots of ways we can use computers to make our own daily work more efficient. We can be a lot more effective in how we use our computers if we use them as tools for ourselves.

The shell is the primary way you interact with a computer once you want to do more than what visual user interfaces allow. Visual user interfaces are somewhat limited in what they can allow because you can only do the things that there are buttons and sliders and input fields for.

Textual tools are often built to be composable: providing many different ways to combine, program or automate them. Using the computer as a tool for ourselves starts with understanding a few basic things: how to use your shell, terminal, and a programming editor.

Many software developers are captives of button-pushing repetitive drudgery. The cobbler's children have no shoes.  As they are developing tools for others, they fail to use the computer effectively as a tool for themselves.

Past examples from my own observation:

  • tests are manual, results are Word docs of pasted screenshots
  • code is written in Word or Notepad
  • code is kept in a scattering of "secret" places that only a handful of people have access to, or not kept at all
  • documentation is separate from code, all hand-written, inaccessible/obfuscated in proprietary systems, or just missing
  • deployment is via a runbook Word doc: detailed step-by-step instructions for humans to execute

A wishlist for using the computer as a tool for myself: automating nearly all of the repetitive drudgery that goes into software development, leaving more time for the fun parts:

  • streamlined management of all your source, docs and related artifacts in a distributed fashion to track changes, revision history, rollback breaking changes, collaborate with team, isolate changes from main codebase, troubleshoot changes, and much more...
  • all aspects of which are managed across multiple versions of software
  • within a framework for project packaging and package dependencies
  • and a framework for unit testing, automated across all versions
  • and automated code coverage analysis
  • automated testing, continuous integration and continuous delivery across multiple environments
  • automated lint checks for code style, likely bugs, design flaws, import order
  • documentation of code in-place, automated generation of docs, automated checking for doc standards conformance
  • automated check of dependencies for known security vulnerabilities
  • automated decoration of untyped code with type annotations, then used for static and runtime type checking
  • auto-formatting of code on pre-commit
  • automatic test that in-line docs match code and docstring examples match documented example output
  • status tracking of continuous integration tests and doc tests with dashboard badges
  • automatically create release notes
  • automatically publish to a package repository for easy deployments/updates
  • automatically publish versioned docs to a document repository, auto-rebuilt on project update

Most shops use computers as tools to automate a small fraction of this list. Most of this list is just skipped and not done. The essential bits are handled with repetitive human drudgery.

I've recently been playing with Python frameworks that deliver every single bit of the ENTIRE wishlist above and it has provided much food for thought. I would love to apply the same to an end-to-end data engineering solution. Next step, become familiar with DBT!

get_wp project on GitHub: https://github.com/SteveRutledge/get_wp

Wednesday, July 10, 2019

Congratulations! We would like to make you a job offer!

I wrote a program to handle the DBSRC Code Librarian job. It took a while to go through the legal department to publish it as open source, but you can now see the code at https://github.com/SteveRutledge/dbsrc.

Here are my design notes:

(1) Introduction

This is a utility for comprehensive, flexible, performant DDL export of large database systems consisting of hundreds of thousands of DDL objects (or more).
This is not a complete utility, nor is it production-ready, so it comes with no warranties of any kind. It has not been productized or packaged for ease of use or understandability. It is simply a prototype to explore feasibility. Having said all that, I share it freely in case it may be useful to someone, somehow...

(2) The Problem

Incomplete database source code

Database source code for a complete build of a database system typically does not exist. Database changes are coded as incremental DDL for each release. The DDL for each release depends on the previous release's database. There exists no baseline DDL to "build from scratch".

Primitive source code management tools and processes

Teams use shared development database servers for development work. Like many conveniences in software development, a shared database is a tar pit waiting to fossilize a project: developers overwrite each other's changes, changes made on the database break the code in the webapp, etc.
Teams have beautiful tools that help us do source management in most coding environments (Java, .NET), but in database environments we are stuck with an arcane, time-intensive manual process that leaves much to be desired and offers very little by way of consistency or reproducibility.

(3) The Need

There are many different approaches to managing database source code, with varying processes and tools, but one fundamental capability is being able to simply export database source code from an existing database (a one-line example follows the list below).
How is this useful? You can:
  • Establish baseline database source code for a system that is missing database source code.
  • Re-baseline database system source code to incorporate a series of incremental changes into an updated baseline script.
  • Produce authoritative database source scripts for performant reverse-engineering into a data modeling tool.
  • Create an independent sandbox database from a DDL export of a shared database to support new R&D and development efforts.
  • Create a demonstration or training database source script.
  • Diff database source code over time.
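
At its simplest (a minimal sketch; 'TABLE' and 'PERSON' are just placeholders for an object type and object name you own), this capability is a single call to the built-in API:

select dbms_metadata.get_ddl('TABLE', 'PERSON') from dual;

The challenge, described next, is doing this comprehensively and quickly for an entire system rather than one object at a time.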

(4) The Challenge

There is a potentially overwhelming amount of unmanageable database source code. The size and complexity of large real-world database systems overwhelms conventional database source code management approaches and tools.
... which leads to the question:
What is the feasibility of a comprehensive, flexible, performant DDL export for large database systems numbering from thousands to hundreds of thousands of objects?
Is this even possible?
A viable solution must be:
  • comprehensive = ALL database object types: not just tables and views, but everything necessary for building the database for a system (grants, types, users, packages, roles, VPD policies, db links, etc.)
  • flexible = filterable to object subsets, with a variety of post-processing options to meet different build-target needs.
  • performant = DDL export in seconds to minutes, not hours to days.

(5) DDL Export Approaches: Considered Alternatives

(1) ERWin

  • Cancelled after running for 5 hours for DDL export of large & complex system.
  • Captive GUI (no effective way to automate)
  • Limited coverage of object types. ERwin is, at its core, a multi-rdbms data modeling tool, so it supports some logical concepts that are not relevant to DDL export and lacks support for some physical concepts specific to our rdbms technology and version.
  • ERwin does not know anything about DAC (data access language), so it is blind to object grants, system privileges, roles, role grants, VPD policies, etc. This is a significant part of the DDL export equation. Also, ERwin does not know anything about Oracle object and XML extensions: types, xmlschema, etc.
  • Need to purchase license.

(2) Oracle SQL Developer

  • Free.
  • Impossibly slow: one hour for 5% of the DDL export. The "one-at-a-time" design explains the slowness: it initially queries the sys.all_* dictionary views, then uses dbms_metadata.get_ddl and get_dependent_ddl one object at a time.
  • Captive GUI (no effective way to automate)
  • Limited GUI: although it leverages dbms_metadata to do the work, many of the underlying options are hidden (e.g., you cannot exclude partitioning).

(3) TOAD

  • Runs out of memory 5% through DDL export of large & complex system.
  • Captive GUI (no effective way to automate)
  • Need to purchase license.

(4) Oracle Change Management Pack

  • Would need significant dba buy-in and involvement for setup and ongoing admin of approach. Also, need to purchase license. Not attractive.

(5) Oracle-supplied dbms_metadata PL/SQL API, wrapped in custom scripts

(This is the selected approach)

  • A 3rd-party utility will be fundamentally less comprehensive and less compatible in the DDL it can export compared to leveraging the built-in Oracle database DDL export functionality. Built-in functionality is, by its nature, in compatibility lock-step with the database. It is also a critical core component leveraged by the database product itself to support the Data Pump utility, so one can expect a certain amount of quality and comprehensive functionality.
  • A 3rd-party client utility will be fundamentally less efficient than a database server-based approach. With a client, processing will occur remote to the database and across the network, while database server-based processing will occur within the database engine itself, enjoying a several-fold potential performance advantage.
  • Free (as in it "comes with" something we are already paying for).
  • Possibly feasible. Let's do a proof of concept.

(6) General Approach

Design Points

  • use large complex system as a test case
  • wrap oracle dbms_metadata supplied PL/SQL package with custom package to adapt to our needs
  • maximize leveraging of dbms_metadata by using heterogeneous database_export mode instead of one-object-at-a-time approaches (a rough sketch of the core export loop follows this list)
  • two logical parts: (1) export raw DDL from the db into a table of clobs; (2) post-process the DDL from the table of clobs and write it into SQL scripts
  • capture dbms_metadata transforms as table data
  • capture dbms_metadata filters as table data
  • capture post-processing logic as table data
  • assemble all transforms, filters and post-processing logic for a single system into a "configuration", stored as data.
  • use simple SQL scripts and batch files to execute
  • generate thorough script "documentation" comments to describe configuration set, dbsrc and post-processing details.
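
A rough idea of the core export loop (a minimal, untested sketch only; the real dbsrc package drives the filters, transforms, and post-processing from tables as described above, the filter and transform values shown here are simplified examples, and dbsrc_ddl_clob is a hypothetical staging table):

-- hypothetical staging table for the raw exported DDL
create table dbsrc_ddl_clob(ddl_text clob);

declare
    h   number;
    th  number;
    ddl clob;
begin
    -- open a heterogeneous export covering all object types in the database
    h := dbms_metadata.open('DATABASE_EXPORT');

    -- example filter: exclude an object path we do not want (simplified)
    dbms_metadata.set_filter(h, 'EXCLUDE_PATH_EXPR', q'[='STATISTICS']');

    -- example transform: emit plain DDL without storage clauses
    th := dbms_metadata.add_transform(h, 'DDL');
    dbms_metadata.set_transform_param(th, 'SEGMENT_ATTRIBUTES', false);

    -- fetch the DDL one document at a time and stash it in the table of clobs
    loop
        ddl := dbms_metadata.fetch_clob(h);
        exit when ddl is null;
        insert into dbsrc_ddl_clob(ddl_text) values (ddl);
    end loop;
    commit;
    dbms_metadata.close(h);
end;
/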

Observed performance? OK (not great, but acceptable).

Full DDL export of 16k objects from a 5-million-object system took ~70m:
  • exporting the DDL to clobs: ~60m
  • post-processing: ~45s
  • writing clobs to files: ~10m (this covers 5 scripts: create, create-ordered, drop, manifest, define; writing create alone takes ~4m)
Comprehensive and correct results? (tentative) YES

Successfully generated a build script. Success = executed against a vanilla Oracle db with no errors and no invalid objects!

learnings:

  • Custom post-processing was the most complex part. Without it, the raw exported DDL was not very useful.
  • No clear way to tell from the db which roles are used by which "systems" (roles are not schema-level objects), so that filter must be set up (at least once) manually.
  • Must also identify and filter out about 60 oracle built-in roles.
  • Very useful to transform default tablespace names to USERS when creating users unless you want to always also export all tablespaces.
  • must transform and parameterize user and dblink passwords. dbms_metadata DDL export API does not include these except for SYS. Very useful to make this flexible anyhow for many reasons.
  • Useful and safer to be able to rewrite dblinks pointing to production to instead point to dev.
  • useful to add "show errors" after stored code.
  • Actually quite tricky to manage the creation of private dblinks within the proper schemas. Possibly the most complex post-processing.
  • Needed to add the FORCE option to view creation, or inevitable cyclical dependencies caused unnecessary failures.
  • Useful to flexibly add/remove storage attributes
  • Useful to flexibly add/remove partition attributes
  • Needed to handle one dbms_metadata weirdness: redundant table creation for materialized views (mv).
  • Useful to flexibly choose include/exclude invalid objects in export
  • Need to handle the SQL*Plus 2499-character line limitation with expanded "select *" views, or else the DDL scripts just will not work in SQL*Plus.
  • Useful to be able to exclude malformed, unusable dblinks.
  • Needed to flexibly introduce and manage a custom prologue script.
  • Needed to flexibly introduce and manage a custom epilogue script.
  • To get clean build, needed to include post-build invalid object recompile across a set of schemas.
  • To get a clean build, need to flexibly choose to include dependent objects to the target object set.
  • There is very limited documentation on object_paths for set_transform. The object_paths I found that ultimately worked are completely different from those documented.
  • Undocumented dbms_metadata.set_transform limitation: If you explicitly include too many filter paths, then you blow a buffer somewhere and dbms_metadata fails with cryptic errors (ORA-06512: at "SYS.DBMS_METADATA", line 7909. ORA-06502: PL/SQL: numeric or value error: character string buffer too small), so I include only the minimal high-level paths and restate the filters more succinctly as "WHERE IN ()" expressions.
  • Undocumented dbms_metadata.set_transform limitation: If not excluded, the object path DATABASE_EXPORT/XS_SECURITY causes dbms_metadata to abort with cryptic errors (LPX-00210: expected '<' instead of 'C').
  • Undocumented dbms_metadata.set_transform limitation: When turning dbms_metadata back on the Oracle built-in objects, you must have the EXP_FULL_DATABASE role or you will fail with ORA-01031: insufficient privileges.
  • Undocumented dbms_metadata.set_transform limitation: You must exclude object_path DATABASE_EXPORT/SCHEMA/DVPS_STAGING_REALM_MEMBERSHIP or else dbms_metadata aborts with cryptic errors (ORA-01031: insufficient privileges) unless you have EXP_FULL_DATABASE role, which is not a documented requirement.
  • Undocumented dbms_metadata.set_transform limitation: You must exclude object_path DATABASE_EXPORT/SCHEMA/PASSWORD_HISTORY or else dbms_metadata aborts with cryptic errors, (ORA-00942: table or view does not exist) unless you have EXP_FULL_DATABASE role, which is not a documented requirement.

limitations:

  • This approach becomes more useful the larger the system is and the stricter the requirements are for correctness and completeness. For example, a one-time need for DDL for a few dozen tables used for ad-hoc analysis is better served with Toad, Oracle SQL Developer or ERWin. Only when you need a repeatable method for complete DDL generation for a large system is this approach worthwhile.
  • Fundamental complexity is hard to mitigate. A utility similar to this prototype will not be easy to understand, maintain and use except for a small subset of people with mastery of oracle, PL/SQL and SQL.
  • Even just sharing the output scripts will be enough to befuddle many, although that may be a more useful way to leverage the utility than expecting people to install and run the utility themselves. In practice, I have found that this is the most useful way of scaling usage of this utility (just share the output scripts and sit down with a colleague the first time they try to deploy them).
  • Like any custom-built utility, it will likely only survive until its builder and maintainer moves on.

(7) Data Model

Logical

Logical Data Model

Physical

Physical Data Model

(8) PL/SQL Package

DBSRC_MGR PL/SQL Package

(9) Files

Install

Install Files

Everything Else

Files

(10) Next Steps

  • continue use as a personal utility to further evaluate functionality and usefulness. Incremental development/bug fixes as time and attention allows.
  • socialize problem and solution approach as opportunities arise.
  • continue to evaluate other alternatives.

(11) Other Alternatives Not Pursued (Yet)

DBForge Schema Compare for Oracle (devart.com)
LiquiBase

(12) Interesting References