viktorianer

High Performance PostgreSQL for Rails: Clone and Replace using creation DDL statements (page 58-60)

@andatki

I have encountered a couple of issues in the sections explaining the cloning of tables without constraints, copying all rows, and recreating constraints using creation DDL statements. I believe these areas could benefit from additional clarity and completeness.

The area where I found some confusion is related to the statement:

A better way is to list out the constraint definitions as creation DDL statements. This makes for straightforward copy-and-paste to recreate them.

While the book does an excellent job explaining how to handle each constraint, sequence, and index individually, it does not provide a cohesive example of creating an equivalent constraint on the destination table from the source table's definition using creation DDL statements.

Including a comprehensive example that demonstrates the steps involved in extracting constraint definitions and applying them to the destination table would be extremely beneficial.
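
A minimal sketch of the kind of example meant here, not taken from the book. It assumes a source table rideshare.trips and an already-created destination table rideshare.trips_copy; both names, and the _copy suffix on the constraint names, are placeholders. The query reads each constraint definition from the pg_constraint catalog with pg_get_constraintdef and wraps it in an ALTER TABLE ... ADD CONSTRAINT statement:

```sql
-- Sketch: generate creation DDL for the destination table from the
-- constraints defined on the source table.
-- 'rideshare.trips' and 'rideshare.trips_copy' are assumed names.
SELECT format(
         'ALTER TABLE %s ADD CONSTRAINT %I %s;',
         'rideshare.trips_copy',          -- destination table (assumed)
         con.conname || '_copy',          -- constraint names must not clash with the source's index names
         pg_get_constraintdef(con.oid)    -- e.g. PRIMARY KEY (id)
       ) AS creation_ddl
FROM pg_constraint AS con
WHERE con.conrelid = 'rideshare.trips'::regclass
-- Descending contype puts primary key ('p') and unique ('u')
-- constraints ahead of foreign keys ('f').
ORDER BY con.contype DESC;
```

Each output row is one statement that can be copied and run after the rows have been copied. Foreign keys are emitted exactly as defined on the source, so they still reference whatever tables the source constraints reference and may need adjusting first.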

I appreciate your attention to these points and believe that addressing them will significantly enhance the clarity and usefulness of your book for readers.

Best regards,
Viktor

Marked As Solved

viktorianer

Dear Andrew @andatki,

I have found a working solution for the problem discussed in your book related to cloning tables without constraints, copying rows, and recreating constraints. I would like to share this solution, which may help other readers who are facing similar issues.

Solution: SCRUB_BATCHES Procedure

Below is the SCRUB_BATCHES procedure I created. For a given table, it walks the rows in batches by id, scrubs sensitive columns using predefined functions, and copies the rows into a new table with a _copy suffix:

-- Ensure the hstore extension is enabled
-- CREATE EXTENSION IF NOT EXISTS hstore WITH SCHEMA rideshare;
CREATE OR REPLACE PROCEDURE SCRUB_BATCHES(schema_name text, tablename text)
LANGUAGE plpgsql
AS $$
DECLARE
  current_id INT;
  max_id INT;
  batch_size INT := 1000;
  rows_inserted INT;
  column_list text;
  value_list text;
  attr_rec RECORD;
  -- Maps column-name fragments to scrub functions; additional pairs can follow.
  -- (SQL comments cannot go inside the hstore literal itself.)
  scrub_functions hstore :=   'name => SCRUB_NAME,
                              email => SCRUB_EMAIL,
                              secret => SCRUB_SECRET,
                              ssn => SCRUB_SSN';
  function_name text;
  key text;
  value text;
BEGIN
  -- Get the minimum and maximum IDs for the specified table
  EXECUTE format('SELECT MIN(id), MAX(id) FROM %I.%I', schema_name, tablename)
  INTO current_id, max_id;

  -- Loop over the table in batches of `batch_size`
  WHILE current_id IS NOT NULL AND current_id <= max_id LOOP
    -- Reset the column and value lists for each batch
    column_list := 'id';
    value_list := 'id';

    -- Retrieve the list of attributes for the specified table
    FOR attr_rec IN
      SELECT a.attname, col_description(a.attrelid, a.attnum) as comment
      FROM pg_attribute a
      WHERE a.attrelid = format('%I.%I', schema_name, tablename)::regclass
        AND a.attnum > 0
        AND NOT a.attisdropped
        AND a.attname NOT IN ('id') -- Exclude 'id' column from updates
    LOOP
      -- Determine the appropriate scrubbing function based on attribute name patterns
      function_name := NULL;
      FOR key, value IN SELECT * FROM each(scrub_functions)
      LOOP
        IF attr_rec.attname NOT IN ('id') AND attr_rec.attname NOT ILIKE '%_id'
            AND attr_rec.attname ILIKE '%' || key || '%' THEN
          function_name := value;
          EXIT;
        END IF;
      END LOOP;

      -- Append attribute to the column and value lists with the determined scrubbing function
      column_list := column_list || format(', %I', attr_rec.attname);

      IF function_name IS NOT NULL THEN
        value_list := value_list || format(', CASE WHEN %I.%I.%I IS NOT NULL THEN %s(%I.%I.%I) ELSE %I.%I.%I END',
                                            schema_name, tablename, attr_rec.attname,
                                            function_name, schema_name, tablename, attr_rec.attname,
                                            schema_name, tablename, attr_rec.attname);
      ELSE
        -- Default case if no specific scrubbing function is defined
        value_list := value_list || format(', %I.%I.%I', schema_name, tablename, attr_rec.attname);
      END IF;
    END LOOP;

    -- Execute the insert statement with the dynamically built column and value lists
    EXECUTE format('INSERT INTO %I.%I_copy (%s) SELECT %s FROM %I.%I WHERE id >= %L AND id < %L',
                    schema_name, tablename,
                    column_list,
                    value_list,
                    schema_name, tablename, current_id::bigint, (current_id + batch_size)::bigint);

    GET DIAGNOSTICS rows_inserted = ROW_COUNT;

    COMMIT;
    RAISE NOTICE 'Table: %, current_id: % - Number of rows inserted: %', tablename, current_id, rows_inserted;

    -- The upper bound of the batch is exclusive, so the next batch starts exactly there
    current_id := current_id + batch_size;
  END LOOP;
END $$;

While iterating over the current tables, the SCRUB_BATCHES procedure is called once per table:

CALL SCRUB_BATCHES(schema_name, table_rec.tablename);
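
The surrounding loop is not shown above. A minimal sketch of what it could look like, assuming the schema is named rideshare, that every table has an id column, and that a matching _copy table already exists for each one (all assumptions, not part of the original script):

```sql
-- Sketch: call SCRUB_BATCHES for every table in one schema.
-- Run outside an explicit transaction block so the COMMIT inside
-- the procedure is allowed.
DO $$
DECLARE
  schema_name text := 'rideshare';  -- assumed schema name
  table_rec RECORD;
BEGIN
  FOR table_rec IN
    SELECT t.tablename
    FROM pg_tables AS t
    WHERE t.schemaname = schema_name
      AND t.tablename NOT LIKE '%\_copy'  -- skip the destination tables
  LOOP
    CALL SCRUB_BATCHES(schema_name, table_rec.tablename);
  END LOOP;
END $$;
```

Because the procedure commits after every batch, rows copied before a failure stay in the _copy table; a rerun would need to start from an empty destination or resume from the last copied id.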

It can be easily changed to use a batched UPDATE statement, as explained in the book on page 64.

Issues with Existing Constraints and Indexes

Following the book's explanations, I was able to use this procedure on all tables. However, I still encountered issues with existing constraints, which cascade across both the old tables and the new, copied tables.

Example constraints:

       table_name         |        foreign_key        |                    pg_get_constraintdef
--------------------------+---------------------------+-----------------------------------------------
 trip_positions           | trip_positions_pkey_copy  | PRIMARY KEY (id)
 trip_positions           | fk_rails_9688ac8706_copy  | FOREIGN KEY (trip_id) REFERENCES trips_old(id)
 trip_positions_old       | trip_positions_pkey       | PRIMARY KEY (id)
 trip_positions_old       | fk_rails_9688ac8706       | FOREIGN KEY (trip_id) REFERENCES trips_old(id)
-- Additional constraints follow

Example indexes:

        indexname         |      tablename
--------------------------+--------------------
 trip_positions_pkey      | trip_positions_old
 trip_positions_pkey_copy | trip_positions
-- Additional indexes follow
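
The queries behind these two listings are not shown in the post. Listings of this shape can be produced from the system catalogs roughly as follows; this is a sketch that assumes the schema is named rideshare:

```sql
-- Constraints per table, with their definitions
SELECT con.conrelid::regclass AS table_name,
       con.conname            AS foreign_key,
       pg_get_constraintdef(con.oid)
FROM pg_constraint AS con
JOIN pg_namespace AS ns ON ns.oid = con.connamespace
WHERE ns.nspname = 'rideshare'   -- assumed schema name
  AND con.conrelid <> 0          -- skip domain constraints, which belong to no table
ORDER BY con.conrelid::regclass::text, con.conname;

-- Indexes and the tables they belong to
SELECT indexname, tablename
FROM pg_indexes
WHERE schemaname = 'rideshare'   -- assumed schema name
ORDER BY indexname;
```

Rerunning both queries after each rename or drop step makes it easy to check which _copy names and which references to _old tables are still left.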

After several hours of working on this script, I could not find a good way to remove these constraints and indexes automatically.

If you would like to see the full example, I can upload it to the forum for further discussion and review.

Thank you for your time and consideration.

Best regards,

Viktor

PS: I found the order of the chapters Performing Database Maintenance and Performing Updates in Batches illogical; I would expect them in the opposite order. I also missed a note that VACUUM cannot run in a transaction and cannot run in a function or a procedure, which is crucial for the provided examples.

Also Liked

viktorianer

I successfully resolved the issue with existing constraints and indexes! :tada: I now have a complete script that works across all tables and projects. It took me two full days to write, incorporating 40 functions, 1 procedure, and around 10 queries.

If anyone needs this script, feel free to reach out to me.

PS: @andatki I’d love to see it included in the book, as this was a missing piece that could benefit many readers.

Best regards,

Viktor
