Database Migrations with Alembic#

We use Alembic for database migrations. Alembic compares the database schema defined in the models to the current state of the database and generates revision files containing the changes needed to update the schema.

Docs: https://alembic.sqlalchemy.org/en/latest/index.html


Migration Workflow (Separated: DEV Test and PROD Deploy)#

Warning

Be extremely careful to never mix up DEV and PROD. DATABASE_URL is what determines which database you are operating on. Before any command that can change schema/data, verify: 1) DATABASE_URL points to the intended RDS instance (DEV or PROD) 2) alembic current matches the intended environment

A) DEV Migration + Testing (Steps 1–6)#

  1. Make your changes to the models

    Make your desired changes to the models in sds_data_manager/lambda_code/SDSCode/database/models.py.

  2. Create a revision (DEV ONLY)

    This compares the database RDS instance in aws to the DEV schema and generates a migration file with upgrade and downgrade.

    Important

    Never create revisions on PROD.

    export DATABASE_URL=dev_database_url
    alembic revision --autogenerate -m "description"
    

    This generates a new file in alembic/versions/. Always review the file before applying — Alembic cannot detect all changes. See what it misses here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect

  3. Preview the SQL (Dry Run)

    To see the SQL that would be run without actually applying it:

    export DATABASE_URL=dev_database_url
    alembic upgrade head --sql
    
  4. Apply migration to DEV

    export DATABASE_URL=dev_database_url
    alembic current
    alembic upgrade head
    
  5. Test on DEV

    Test your changes thoroughly to ensure the migration works as expected. Connect to the database on DataGrip and

    verify the schema changes.

  6. Downgrade DEV to verify downgrade path

    export DATABASE_URL=dev_database_url
    alembic downgrade -1
    

    Make sure the downgrade works as expected.

B) PROD Verification + Deployment (Steps 7–10)#

  1. Commit and push migration file + code changes

  2. Open PR, review, approve, merge

    First re-check DEV after merge:

    export DATABASE_URL=dev_database_url
    alembic current
    alembic upgrade head
    

    Then repeat Step 3 (Dry Run), but on PROD:

    Warning

    PROD safety gate (must all be true): - DATABASE_URL=prod_database_url - URL host/DB clearly match the PROD RDS instance - alembic current is the expected PROD revision

    export DATABASE_URL=prod_database_url
    alembic current
    alembic upgrade head --sql
    alembic upgrade head
    
  1. Post-deploy monitoring (DEV + PROD)

    Confirm both environments are healthy and at expected revision: run alembic current and functional checks.


Setup (This is performed once)#

Ensure you are in the root of the sdc-data-manager repo.

alembic init alembic

Edit alembic/env.py and add the following at the top:

# Override the URL from environment variable
# Do not hardcode the DATABASE_URL in the config file for security reasons
config.set_main_option("sqlalchemy.url", os.environ["DATABASE_URL"])

Import the database schema and point target_metadata to Base.metadata:

from sds_data_manager.lambda_code.SDSCode.database.models import Base
target_metadata = Base.metadata

Connecting to the Database#

The DATABASE_URL follows this format:

driver://username:password@host:port/database_name

We have different databases for DEV and PROD, so you will need to set the DATABASE_URL environment variable before running any Alembic commands.

To find the credentials: 1. Log into AWS account (dev or prod) 2. Go to Secrets ManagerSecretssdp-database-cred 3. Click Retrieve secret value 4. Construct the URL from those values 5. Verify the URL points to the intended RDS instance (DEV vs PROD)

Export the URL:

export DATABASE_URL=postgresql://username:password@host:port/database_name

Test the connection:

alembic current

Warning

Always double check which database your DATABASE_URL is pointing to before running any commands. DATABASE_URL is the environment boundary between DEV and PROD. Always test on DEV first before applying to PROD.

Downtime Considerations#

Most schema changes in PostgreSQL are non-blocking, but some operations will lock tables and cause downtime:

  • Adding a NOT NULL column without a default — locks the table

  • Dropping a column — locks the table briefly

  • Renaming a column or table — locks the table

  • Adding a non-concurrent index — locks the table for writes

  • Changing a column type — locks the table and may require a full rewrite

To minimize downtime on PROD, consider running heavy migrations during off-hours or using CONCURRENTLY for index operations where possible.


Troubleshooting#

Can’t locate revision identified by ‘xxxx’#

The database has a revision recorded that no longer has a corresponding file. This usually means a migration file was deleted after being applied. Connect directly to the database and clear the version table:

DELETE FROM alembic_version;

Then stamp to a known good revision:

alembic stamp head

Target database is not up to date#

There are pending migrations. Run alembic current to see where the DB is, then alembic upgrade head to apply.