# API Integration Standards for PostgreSQL
This document outlines the coding standards for integrating PostgreSQL with external APIs and backend services. These standards promote maintainability, performance, and security when building applications that rely on data and functionality outside of the database itself. It focuses on modern approaches compatible with the latest PostgreSQL version.
## 1. Architectural Considerations for API Integration
### 1.1. Standard: Define Clear API Boundaries
**Do This:**
* Clearly define the responsibilities of PostgreSQL and external APIs. Use PostgreSQL for data persistence, relational logic, and indexing. Offload complex computations, specialized data processing, and external data access to APIs.
* Use clear and consistent naming conventions for database functions/procedures interacting with APIs. Prefix them (e.g., "api_", "ext_") to easily identify external API integration code.
* Document the contract (input/output) with each API thoroughly.
**Don't Do This:**
* Overload PostgreSQL with tasks that APIs are better suited for (e.g., image processing, complex machine learning tasks that are not data-intensive).
* Embed undocumented or magic API calls directly within SQL queries.
**Why:** Defining clear boundaries ensures modularity, easier maintenance, and optimized performance. It avoids turning the database into a monolithic application component.
**Example:**
"""sql
-- Good: Function for fetching user profiles from an external API.
CREATE OR REPLACE FUNCTION api_get_user_profile(user_id INT)
RETURNS JSONB AS $$
BEGIN
-- Call external API to get user profile details.
-- Using a hypothetical extension for API calls
RETURN http_get('https://api.example.com/users/' || user_id)::jsonb;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error fetching user profile from API: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;
-- Bad: Embedding API logic directly within a complex query.
-- SELECT * FROM users WHERE ... AND api_call(...) ... ; -- Avoid!
"""
### 1.2. Standard: Asynchronous vs. Synchronous API Interactions
**Do This:**
* Use asynchronous API calls (e.g., message queues, background workers) where possible to prevent long-running database transactions from blocking other operations. Implement retries and error handling for asynchronous tasks.
* For synchronous calls, keep the execution time as short as possible to avoid holding database connections for extended periods.
**Don't Do This:**
* Make blocking API calls directly within critical transaction paths. This will significantly impact database performance and availability.
* Assume API calls will always succeed. Implement robust error handling and retries.
**Why:** Asynchronous operations improve scalability and responsiveness. Synchronous operations can lead to deadlocks and performance degradation if not managed carefully.
**Example (using pg_amqp or similar queue extensions):**
"""sql
-- Asynchronous API call using a message queue. (Hypothetical Example)
CREATE OR REPLACE FUNCTION api_process_user_data(user_id INT)
RETURNS VOID AS $$
BEGIN
-- Send a message to a queue for processing user data via an external API.
PERFORM amqp.publish('process_user_data_queue', json_build_object('user_id', user_id)); -- Hypothetical
RETURN;
END;
$$ LANGUAGE plpgsql;
-- Example of a background worker (using pg_background) that consumes from the queue to call the external API
-- Code for the background worker would be in a separate file and process the queue.
"""
### 1.3. Standard: Data Transformation and Mapping
**Do This:**
* Define clear data mapping between PostgreSQL data types and API request/response formats (e.g., JSON, XML). Use PostgreSQL's JSONB and XML support effectively.
* Validate data received from APIs before inserting it into the database using "CHECK" constraints or other validation mechanisms.
* Log API requests and responses for debugging and auditing purposes.
**Don't Do This:**
* Directly insert untrusted data received from APIs into the database without validation. This can lead to SQL injection and other security vulnerabilities.
* Rely on implicit type conversions between PostgreSQL and API data formats. Be explicit.
**Why:** Proper data transformation and validation prevent data corruption and security breaches. Logging helps troubleshoot issues and track API usage.
**Example:**
"""sql
-- Validating and inserting JSON data from an API.
CREATE TABLE api_user_profiles (
user_id INT PRIMARY KEY,
profile_data JSONB
-- CHECK constraint is appropriate here to require the JSON object ALWAYS conform to a schema
);
CREATE OR REPLACE FUNCTION api_import_user_profile(user_id INT, profile_json JSONB)
RETURNS VOID AS $$
DECLARE
-- Validate JSON data against a schema (hypothetical function).
is_valid BOOLEAN;
BEGIN
-- Validate that the JSON is valid against a schema
is_valid := jsonb_matches_schema('{"type": "object", "properties": {"name": {"type": "string"},"email": {"type": "string", "format": "email"} }}', profile_json);
IF NOT is_valid THEN
RAISE EXCEPTION 'Invalid profile data format.';
END IF;
INSERT INTO api_user_profiles (user_id, profile_data)
VALUES (user_id, profile_json);
RETURN;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error importing user profile: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;
"""
## 2. Implementation Details
### 2.1. Standard: Choosing the Right API Interaction Method
**Do This:**
* Evaluate these methods:
* **HTTP Requests (using extensions like "http" or "curl"):** Suitable for RESTful APIs.
* **Message Queues (using extensions like "pg_amqp" or "pg_kafka"):** Ideal for asynchronous communication.
* **Foreign Data Wrappers (FDWs):** For integrating with other databases or data stores directly.
* Choose the method that best fits the API's protocol, data format, and communication pattern.
**Don't Do This:**
* Force a specific integration method because it's familiar. Consider alternatives based on the API's characteristics.
* Build custom, ad-hoc solutions when standard extensions and FDWs provide the necessary functionality.
**Why:** Selecting the right method simplifies integration, improves performance, and reduces development effort.
**Example (using "http" extension for a REST API):**
"""sql
-- Example using the http extension to call a REST API
CREATE EXTENSION IF NOT EXISTS http;
CREATE OR REPLACE FUNCTION api_get_weather(city TEXT)
RETURNS JSONB AS $$
DECLARE
api_url TEXT := 'https://api.weatherapi.com/v1/current.json?key=YOUR_API_KEY&q=' || city;
response HTTPResponse;
BEGIN
response := http_get(api_url);
IF response.status_code = 200 THEN
RETURN response.content::jsonb;
ELSE
RAISE EXCEPTION 'Weather API error: %', response.content;
END IF;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error fetching weather data: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;
-- SELECT api_get_weather('London');
"""
### 2.2. Standard: Error Handling and Retries
**Do This:**
* Implement robust error handling for API calls. Catch exceptions, log errors, and implement retry mechanisms with exponential backoff.
* Distinguish between transient and permanent errors. Retry transient errors (e.g., network timeouts), and log permanent errors (e.g., invalid API key) for investigation.
* Set appropriate timeouts for API calls to prevent indefinite blocking.
* Consider using "TRY...CATCH" blocks for error handling within PL/pgSQL functions.
**Don't Do This:**
* Ignore errors from API calls. At a minimum, log the error so it can be investigated later.
* Retry indefinitely without a limit or backoff strategy. This can overload the API or the database.
**Why:** Robust error handling ensures resilience and prevents cascading failures. It provides valuable insights into API issues.
**Example:**
"""sql
CREATE OR REPLACE FUNCTION api_get_data_with_retry(url TEXT, max_retries INT DEFAULT 3)
RETURNS JSONB AS $$
DECLARE
response HTTPResponse;
retries INT := 0;
delay INTERVAL := '1 second';
BEGIN
LOOP
BEGIN
response := http_get(url);
IF response.status_code = 200 THEN
RETURN response.content::jsonb;
ELSE
RAISE WARNING 'API call failed with status code: %', response.status_code;
-- Check for non-retryable errors here!
-- IF response.status_code = 400 THEN RETURN NULL; -- Bad Request (do not retry)
END IF;
EXCEPTION
WHEN OTHERS THEN
RAISE WARNING 'API call error: %', SQLERRM;
END;
retries := retries + 1;
IF retries >= max_retries THEN
RAISE EXCEPTION 'Max retries exceeded for API call.';
END IF;
RAISE NOTICE 'Retrying in %', delay;
PERFORM pg_sleep(extract(epoch from delay));
delay := delay * 2; -- Exponential backoff
END LOOP;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Failed to get data after multiple retries: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;
"""
### 2.3. Standard: Security Considerations
**Do This:**
* Store API keys and secrets securely using PostgreSQL's configuration parameters or a dedicated secrets management solution. NEVER hardcode API keys in SQL code.
* Use HTTPS for all API calls to encrypt data in transit.
* Validate API responses to prevent data injection (e.g., JSON injection).
* Implement rate limiting to prevent abuse.
* Use least privilege principle when granting permissions to API interaction functions.
**Don't Do This:**
* Hardcode API keys or secrets in SQL code or store them in plain text in the database.
* Trust API responses implicitly. Always validate the data.
* Expose your PostgreSQL database directly to the internet without proper firewall and security measures.
**Why:** Security is paramount. Protecting API keys, encrypting data, and rate limiting prevent unauthorized access and malicious attacks.
**Example:**
"""sql
-- Storing API key securely using postgresql.conf
-- In postgresql.conf:
-- api.weather_api_key = 'YOUR_API_KEY'
-- SQL to retrieve the API key
CREATE OR REPLACE FUNCTION api_get_weather_secure(city TEXT)
RETURNS JSONB AS $$
DECLARE
api_url TEXT := 'https://api.weatherapi.com/v1/current.json?key=' || current_setting('api.weather_api_key') || '&q=' || city;
response HTTPResponse;
BEGIN
response := http_get(api_url);
IF response.status_code = 200 THEN
RETURN response.content::jsonb;
ELSE
RAISE EXCEPTION 'Weather API error: %', response.content;
END IF;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error fetching weather data: %', SQLERRM;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER; -- SECURITY DEFINER crucial for accessing external configurations
-- Revoke execute permission from public
REVOKE EXECUTE ON FUNCTION api_get_weather_secure(TEXT) FROM PUBLIC;
-- Grant access to specific roles
GRANT EXECUTE ON FUNCTION api_get_weather_secure(TEXT) TO your_application_role;
"""
### 2.4. Standard: Performance Optimization
**Do This:**
* Cache API responses to reduce the number of API calls, especially for frequently accessed data. Use "MATERIALIZED VIEW" or a custom cache table.
* Use connection pooling to minimize the overhead of establishing new connections to APIs. Some HTTP extensions do this internally.
* Optimize data transfer by requesting only the necessary fields from the API. Use appropriate query parameters.
**Don't Do This:**
* Make redundant API calls. Identify opportunities for caching or batching.
* Retrieve large amounts of data from APIs when only a small subset is needed.
**Why:** Performance optimization improves application responsiveness and reduces API usage costs.
**Example (using a materialized view for caching):**
"""sql
CREATE MATERIALIZED VIEW weather_cache AS
SELECT
city,
api_get_weather(city) AS weather_data,
NOW() AS last_updated
FROM (VALUES ('London'), ('New York'), ('Tokyo')) AS cities(city);
CREATE UNIQUE INDEX idx_weather_cache_city ON weather_cache (city);
-- Refresh the cache periodically
CREATE OR REPLACE FUNCTION refresh_weather_cache()
RETURNS VOID AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY weather_cache;
RETURN;
END;
$$ LANGUAGE plpgsql;
-- Schedule daily refreshes with pg_cron or a similar scheduler:
-- SELECT cron.schedule('0 0 * * *', 'SELECT refresh_weather_cache()');
-- Usage:
CREATE OR REPLACE FUNCTION get_weather_from_cache(city TEXT)
RETURNS JSONB AS $$
BEGIN
RETURN (SELECT weather_data FROM weather_cache WHERE city = get_weather_from_cache.city);
EXCEPTION
WHEN no_data_found THEN
RETURN api_get_weather(city); -- if not in cache, fetch it from the API
END;
$$ LANGUAGE plpgsql;
"""
## 3. Coding Style and Conventions
### 3.1. Standard: Code Formatting and Comments
**Do This:**
* Use consistent indentation (typically 4 spaces) and line breaks to improve readability.
* Add comments to explain complex logic, API calls, and data transformations.
* Use meaningful names for variables, functions, and parameters.
**Don't Do This:**
* Write long, monolithic functions without comments or clear structure.
* Use cryptic or ambiguous names.
**Why:** Consistent formatting and clear comments make the code easier to understand and maintain.
### 3.2. Standard: Transaction Management
**Do This:**
* Wrap API calls within explicit transactions when necessary to ensure data consistency. Use "BEGIN", "COMMIT", and "ROLLBACK".
* Handle potential errors during API calls gracefully and roll back the transaction if necessary.
**Don't Do This:**
* Leave transactions open for extended periods of time while waiting for API responses.
* Commit transactions before ensuring the success of all related API calls.
**Why:** Proper transaction management ensures data integrity and prevents inconsistencies.
### 3.3. Standard: Testing
**Do This:**
* Write unit tests for API interaction functions to verify that they handle different scenarios correctly (e.g., success, error, timeout).
* Use mock APIs or stubs to isolate the database from external dependencies during testing.
* Write integration tests to ensure that the database and APIs work together seamlessly.
**Don't Do This:**
* Skip testing API interaction code. This can lead to unexpected errors and integration issues in production.
* Rely solely on manual testing.
**Why:** Automated testing improves code quality, reduces the risk of regressions, and facilitates continuous integration and delivery.
These API integration standards will help create reliable, secure, and maintainable PostgreSQL applications that integrate effectively with external services. Remember to stay updated with the latest PostgreSQL features and best practices as the ecosystem evolves.
danielsogl
Created Mar 6, 2025
This guide explains how to effectively use .clinerules
with Cline, the AI-powered coding assistant.
The .clinerules
file is a powerful configuration file that helps Cline understand your project's requirements, coding standards, and constraints. When placed in your project's root directory, it automatically guides Cline's behavior and ensures consistency across your codebase.
Place the .clinerules
file in your project's root directory. Cline automatically detects and follows these rules for all files within the project.
# Project Overview project: name: 'Your Project Name' description: 'Brief project description' stack: - technology: 'Framework/Language' version: 'X.Y.Z' - technology: 'Database' version: 'X.Y.Z'
# Code Standards standards: style: - 'Use consistent indentation (2 spaces)' - 'Follow language-specific naming conventions' documentation: - 'Include JSDoc comments for all functions' - 'Maintain up-to-date README files' testing: - 'Write unit tests for all new features' - 'Maintain minimum 80% code coverage'
# Security Guidelines security: authentication: - 'Implement proper token validation' - 'Use environment variables for secrets' dataProtection: - 'Sanitize all user inputs' - 'Implement proper error handling'
Be Specific
Maintain Organization
Regular Updates
# Common Patterns Example patterns: components: - pattern: 'Use functional components by default' - pattern: 'Implement error boundaries for component trees' stateManagement: - pattern: 'Use React Query for server state' - pattern: 'Implement proper loading states'
Commit the Rules
.clinerules
in version controlTeam Collaboration
Rules Not Being Applied
Conflicting Rules
Performance Considerations
# Basic .clinerules Example project: name: 'Web Application' type: 'Next.js Frontend' standards: - 'Use TypeScript for all new code' - 'Follow React best practices' - 'Implement proper error handling' testing: unit: - 'Jest for unit tests' - 'React Testing Library for components' e2e: - 'Cypress for end-to-end testing' documentation: required: - 'README.md in each major directory' - 'JSDoc comments for public APIs' - 'Changelog updates for all changes'
# Advanced .clinerules Example project: name: 'Enterprise Application' compliance: - 'GDPR requirements' - 'WCAG 2.1 AA accessibility' architecture: patterns: - 'Clean Architecture principles' - 'Domain-Driven Design concepts' security: requirements: - 'OAuth 2.0 authentication' - 'Rate limiting on all APIs' - 'Input validation with Zod'
# Database: Create RLS policies You're a Supabase Postgres expert in writing row level security policies. Your purpose is to generate a policy with the constraints given by the user. You should first retrieve schema information to write policies for, usually the 'public' schema. The output should use the following instructions: - The generated SQL must be valid SQL. - You can use only CREATE POLICY or ALTER POLICY queries, no other queries are allowed. - Always use double apostrophe in SQL strings (eg. 'Night''s watch') - You can add short explanations to your messages. - The result should be a valid markdown. The SQL code should be wrapped in ``` (including sql language tag). - Always use "auth.uid()" instead of "current_user". - SELECT policies should always have USING but not WITH CHECK - INSERT policies should always have WITH CHECK but not USING - UPDATE policies should always have WITH CHECK and most often have USING - DELETE policies should always have USING but not WITH CHECK - Don't use `FOR ALL`. Instead separate into 4 separate policies for select, insert, update, and delete. - The policy name should be short but detailed text explaining the policy, enclosed in double quotes. - Always put explanations as separate text. Never use inline SQL comments. - If the user asks for something that's not related to SQL policies, explain to the user that you can only help with policies. - Discourage `RESTRICTIVE` policies and encourage `PERMISSIVE` policies, and explain why. The output should look like this: ```sql CREATE POLICY "My descriptive policy." ON books FOR INSERT to authenticated USING ( (select auth.uid()) = author_id ) WITH ( true ); ``` Since you are running in a Supabase environment, take note of these Supabase-specific additions below. ## Authenticated and unauthenticated roles Supabase maps every request to one of the roles: - `anon`: an unauthenticated request (the user is not logged in) - `authenticated`: an authenticated request (the user is logged in) These are actually [Postgres Roles](/docs/guides/database/postgres/roles). You can use these roles within your Policies using the `TO` clause: ```sql create policy "Profiles are viewable by everyone" on profiles for select to authenticated, anon using ( true ); -- OR create policy "Public profiles are viewable only by authenticated users" on profiles for select to authenticated using ( true ); ``` Note that `for ...` must be added after the table but before the roles. `to ...` must be added after `for ...`: ### Incorrect ```sql create policy "Public profiles are viewable only by authenticated users" on profiles to authenticated for select using ( true ); ``` ### Correct ```sql create policy "Public profiles are viewable only by authenticated users" on profiles for select to authenticated using ( true ); ``` ## Multiple operations PostgreSQL policies do not support specifying multiple operations in a single FOR clause. You need to create separate policies for each operation. ### Incorrect ```sql create policy "Profiles can be created and deleted by any user" on profiles for insert, delete -- cannot create a policy on multiple operators to authenticated with check ( true ) using ( true ); ``` ### Correct ```sql create policy "Profiles can be created by any user" on profiles for insert to authenticated with check ( true ); create policy "Profiles can be deleted by any user" on profiles for delete to authenticated using ( true ); ``` ## Helper functions Supabase provides some helper functions that make it easier to write Policies. ### `auth.uid()` Returns the ID of the user making the request. ### `auth.jwt()` Returns the JWT of the user making the request. Anything that you store in the user's `raw_app_meta_data` column or the `raw_user_meta_data` column will be accessible using this function. It's important to know the distinction between these two: - `raw_user_meta_data` - can be updated by the authenticated user using the `supabase.auth.update()` function. It is not a good place to store authorization data. - `raw_app_meta_data` - cannot be updated by the user, so it's a good place to store authorization data. The `auth.jwt()` function is extremely versatile. For example, if you store some team data inside `app_metadata`, you can use it to determine whether a particular user belongs to a team. For example, if this was an array of IDs: ```sql create policy "User is in team" on my_table to authenticated using ( team_id in (select auth.jwt() -> 'app_metadata' -> 'teams')); ``` ### MFA The `auth.jwt()` function can be used to check for [Multi-Factor Authentication](/docs/guides/auth/auth-mfa#enforce-rules-for-mfa-logins). For example, you could restrict a user from updating their profile unless they have at least 2 levels of authentication (Assurance Level 2): ```sql create policy "Restrict updates." on profiles as restrictive for update to authenticated using ( (select auth.jwt()->>'aal') = 'aal2' ); ``` ## RLS performance recommendations Every authorization system has an impact on performance. While row level security is powerful, the performance impact is important to keep in mind. This is especially true for queries that scan every row in a table - like many `select` operations, including those using limit, offset, and ordering. Based on a series of [tests](https://github.com/GaryAustin1/RLS-Performance), we have a few recommendations for RLS: ### Add indexes Make sure you've added [indexes](/docs/guides/database/postgres/indexes) on any columns used within the Policies which are not already indexed (or primary keys). For a Policy like this: ```sql create policy "Users can access their own records" on test_table to authenticated using ( (select auth.uid()) = user_id ); ``` You can add an index like: ```sql create index userid on test_table using btree (user_id); ``` ### Call functions with `select` You can use `select` statement to improve policies that use functions. For example, instead of this: ```sql create policy "Users can access their own records" on test_table to authenticated using ( auth.uid() = user_id ); ``` You can do: ```sql create policy "Users can access their own records" on test_table to authenticated using ( (select auth.uid()) = user_id ); ``` This method works well for JWT functions like `auth.uid()` and `auth.jwt()` as well as `security definer` Functions. Wrapping the function causes an `initPlan` to be run by the Postgres optimizer, which allows it to "cache" the results per-statement, rather than calling the function on each row. Caution: You can only use this technique if the results of the query or function do not change based on the row data. ### Minimize joins You can often rewrite your Policies to avoid joins between the source and the target table. Instead, try to organize your policy to fetch all the relevant data from the target table into an array or set, then you can use an `IN` or `ANY` operation in your filter. For example, this is an example of a slow policy which joins the source `test_table` to the target `team_user`: ```sql create policy "Users can access records belonging to their teams" on test_table to authenticated using ( (select auth.uid()) in ( select user_id from team_user where team_user.team_id = team_id -- joins to the source "test_table.team_id" ) ); ``` We can rewrite this to avoid this join, and instead select the filter criteria into a set: ```sql create policy "Users can access records belonging to their teams" on test_table to authenticated using ( team_id in ( select team_id from team_user where user_id = (select auth.uid()) -- no join ) ); ``` ### Specify roles in your policies Always use the Role of inside your policies, specified by the `TO` operator. For example, instead of this query: ```sql create policy "Users can access their own records" on rls_test using ( auth.uid() = user_id ); ``` Use: ```sql create policy "Users can access their own records" on rls_test to authenticated using ( (select auth.uid()) = user_id ); ``` This prevents the policy `( (select auth.uid()) = user_id )` from running for any `anon` users, since the execution stops at the `to authenticated` step.
# Database: Create migration You are a Postgres Expert who loves creating secure database schemas. This project uses the migrations provided by the Supabase CLI. ## Creating a migration file Given the context of the user's message, create a database migration file inside the folder `supabase/migrations/`. The file MUST following this naming convention: The file MUST be named in the format `YYYYMMDDHHmmss_short_description.sql` with proper casing for months, minutes, and seconds in UTC time: 1. `YYYY` - Four digits for the year (e.g., `2024`). 2. `MM` - Two digits for the month (01 to 12). 3. `DD` - Two digits for the day of the month (01 to 31). 4. `HH` - Two digits for the hour in 24-hour format (00 to 23). 5. `mm` - Two digits for the minute (00 to 59). 6. `ss` - Two digits for the second (00 to 59). 7. Add an appropriate description for the migration. For example: ``` 20240906123045_create_profiles.sql ``` ## SQL Guidelines Write Postgres-compatible SQL code for Supabase migration files that: - Includes a header comment with metadata about the migration, such as the purpose, affected tables/columns, and any special considerations. - Includes thorough comments explaining the purpose and expected behavior of each migration step. - Write all SQL in lowercase. - Add copious comments for any destructive SQL commands, including truncating, dropping, or column alterations. - When creating a new table, you MUST enable Row Level Security (RLS) even if the table is intended for public access. - When creating RLS Policies - Ensure the policies cover all relevant access scenarios (e.g. select, insert, update, delete) based on the table's purpose and data sensitivity. - If the table is intended for public access the policy can simply return `true`. - RLS Policies should be granular: one policy for `select`, one for `insert` etc) and for each supabase role (`anon` and `authenticated`). DO NOT combine Policies even if the functionality is the same for both roles. - Include comments explaining the rationale and intended behavior of each security policy The generated SQL code should be production-ready, well-documented, and aligned with Supabase's best practices.
# Postgres SQL Style Guide ## General - Use lowercase for SQL reserved words to maintain consistency and readability. - Employ consistent, descriptive identifiers for tables, columns, and other database objects. - Use white space and indentation to enhance the readability of your code. - Store dates in ISO 8601 format (`yyyy-mm-ddThh:mm:ss.sssss`). - Include comments for complex logic, using '/_ ... _/' for block comments and '--' for line comments. ## Naming Conventions - Avoid SQL reserved words and ensure names are unique and under 63 characters. - Use snake_case for tables and columns. - Prefer plurals for table names - Prefer singular names for columns. ## Tables - Avoid prefixes like 'tbl\_' and ensure no table name matches any of its column names. - Always add an `id` column of type `identity generated always` unless otherwise specified. - Create all tables in the `public` schema unless otherwise specified. - Always add the schema to SQL queries for clarity. - Always add a comment to describe what the table does. The comment can be up to 1024 characters. ## Columns - Use singular names and avoid generic names like 'id'. - For references to foreign tables, use the singular of the table name with the `_id` suffix. For example `user_id` to reference the `users` table - Always use lowercase except in cases involving acronyms or when readability would be enhanced by an exception. #### Examples: ```sql create table books ( id bigint generated always as identity primary key, title text not null, author_id bigint references authors (id) ); comment on table books is 'A list of all the books in the library.'; ``` ## Queries - When the query is shorter keep it on just a few lines. As it gets larger start adding newlines for readability - Add spaces for readability. Smaller queries: ```sql select * from employees where end_date is null; update employees set end_date = '2023-12-31' where employee_id = 1001; ``` Larger queries: ```sql select first_name, last_name from employees where start_date between '2021-01-01' and '2021-12-31' and status = 'employed'; ``` ### Joins and Subqueries - Format joins and subqueries for clarity, aligning them with related SQL clauses. - Prefer full table names when referencing tables. This helps for readability. ```sql select employees.employee_name, departments.department_name from employees join departments on employees.department_id = departments.department_id where employees.start_date > '2022-01-01'; ``` ## Aliases - Use meaningful aliases that reflect the data or transformation applied, and always include the 'as' keyword for clarity. ```sql select count(*) as total_employees from employees where end_date is null; ``` ## Complex queries and CTEs - If a query is extremely complex, prefer a CTE. - Make sure the CTE is clear and linear. Prefer readability over performance. - Add comments to each block. ```sql with department_employees as ( -- Get all employees and their departments select employees.department_id, employees.first_name, employees.last_name, departments.department_name from employees join departments on employees.department_id = departments.department_id ), employee_counts as ( -- Count how many employees in each department select department_name, count(*) as num_employees from department_employees group by department_name ) select department_name, num_employees from employee_counts order by department_name; ```
# Core Architecture Standards for PostgreSQL This document outlines the coding standards for the core architecture of PostgreSQL. It aims to provide clear guidance for developers contributing to the core codebase, ensuring maintainability, performance, security, and consistency. The standards reflect modern approaches, patterns, and the latest features of PostgreSQL. ## 1. Fundamental Architectural Patterns PostgreSQL's core architecture is based on a process-based model, where each client connection is handled by a separate server process. This concurrency model heavily relies on shared memory for inter-process communication and data sharing. **Do This:** * Understand the process-based architecture deeply. Familiarize yourself with the following processes: "postgres" (the postmaster), "backend" (server processes), "walwriter", "autovacuum launcher", "stats collector", and "bgwriter". * Design extensions with process isolation in mind. Avoid global state modification to prevent unintended side effects across different backend processes. * Favor shared memory mechanisms for data sharing across backends over file-based communication where performance is critical. **Don't Do This:** * Create singletons or static variables that hold global state without proper consideration for concurrency. This will lead to unexpected behavior and difficult to debug race conditions. * Introduce shared resources without adequate locking mechanisms. * Rely on inter-process communication (IPC) without understanding the potential for deadlocks or race conditions. **Why:** Maintaining a well-defined process model ensures stability and scalability. Properly isolated processes minimize the risk of crashes affecting other connections. ### 1.1 Process Lifecycle Each PostgreSQL backend process follows a well-defined lifecycle: 1. **Startup:** Initialization of process-specific resources and connection to the shared memory. 2. **Authentication:** Verification of the client's identity. 3. **Query Processing:** Parsing, planning, and execution of SQL queries. 4. **Transaction Management:** Ensuring ACID properties of database operations. 5. **Shutdown:** Clean-up of resources and disconnection from shared memory. **Do This:** * Ensure proper resource cleanup in all stages of the lifecycle, especially during error handling. * Use "elog()" with appropriate severity levels for logging events during the lifecycle. * Catch and handle exceptions appropriately throughout the lifecycle. **Don't Do This:** * Leak resources (memory, file descriptors, etc.) during any phase of the process lifecycle. * Ignore errors during startup or shutdown. * Introduce long-running operations inside the authentication phase. **Why:** Strict adherence to the process lifecycle prevents resource exhaustion and ensures a clean state upon process termination. ### 1.2 Shared Memory Management Shared memory provides a crucial mechanism for communication and data sharing between PostgreSQL backend processes. **Do This:** * Use PostgreSQL's shared memory APIs (e.g., "ShmemAlloc()", "ShmemInitStruct()") for allocating and managing shared memory. These functions handle the platform-specific details of shared memory allocation and ensure proper alignment and size constraints. * Protect access to shared memory regions using appropriate locking mechanisms (e.g., "LWLock", "SpinLock"). * Define shared memory segments in "src/backend/utils/misc/ipc.c" or a relevant module's initialization function. **Don't Do This:** * Directly use system calls like "shmget()" and "shmat()" without going through PostgreSQL's shared memory APIs. * Assume atomicity of operations on shared memory regions. Always use locking. * Overallocate shared memory. Reserve only what is necessary. **Why:** Proper shared memory management prevents corruption, ensures data integrity, and avoids resource conflicts between processes. **Example:** """c /* Example of allocating and using shared memory */ typedef struct { int counter; LWLock lock; } MySharedData; static MySharedData *mySharedData; void initializeMySharedData(void) { bool found; mySharedData = ShmemInitStruct("MySharedData", sizeof(MySharedData), &found); if (!found) { /* Initialize shared memory on first allocation */ mySharedData->counter = 0; LWLockInitialize(&mySharedData->lock, LWLockAssign()); } } int incrementCounter(void) { int result; LWLockAcquire(&mySharedData->lock, LW_EXCLUSIVE); result = ++mySharedData->counter; LWLockRelease(&mySharedData->lock); return result; } """ ## 2. Project Structure and Organization PostgreSQL's source code is organized into a directory structure that reflects its functionality. **Do This:** * Familiarize yourself with the top-level directories: "src", "doc", "contrib", "src" is where the core source code resides. * Understand the purpose of subdirectories within "src", such as "backend", "include", and "port". * Place new code in the appropriate directory based on its functionality. * Maintain consistency in coding style and naming conventions within each directory. **Don't Do This:** * Randomly place files in arbitrary directories. * Create unnecessary dependencies between modules. * Violate the established directory structure without a clear justification. **Why:** A well-organized project structure facilitates navigation, understanding, and maintenance of the codebase. Clear directory conventions maintain code clarity. ### 2.1 Core Directories Key directories within the "src" directory include: * "src/backend": Contains the core backend code, including query processing, transaction management, storage, and indexing. * "src/include": Contains header files that define the interfaces used by the backend code. * "src/port": Contains platform-specific code. * "src/common": Contains code shared across multiple parts of the backend. * "src/fe_utils": Contains utilities used by the frontend. **Do This:** * Follow the existing directory structure when adding new features or modifying existing ones. * Create new subdirectories within existing directories if necessary to organize logically related code. * Use header files in "src/include" to define public interfaces for modules. **Don't Do This:** * Include implementation details in header files. * Create circular dependencies between directories. **Why:** A modular directory structure ensures a logical separation of concerns and minimizes dependencies between modules helping reduce build times. ### 2.2 Coding Style PostgreSQL has a well-defined coding style outlined in "doc/src/sgml/develop.sgml". **Do This:** * Adhere to the coding style guidelines regarding indentation, spacing, naming conventions, and comment formatting. * Use "pgindent" to automatically format your code. * Write concise and informative comments. **Don't Do This:** * Ignore the coding style guidelines. * Write lengthy or redundant comments. * Use inconsistent naming conventions. **Why:** Consistent coding style improves readability and maintainability of the code. "pgindent" ensures code conforms to the standard style automatically. ## 3. Modern Approaches and Patterns Modern PostgreSQL development emphasizes several key approaches: * **Extensibility:** PostgreSQL is designed to be extensible through extensions. * **Concurrency:** Handling multiple concurrent connections efficiently is crucial. * **Security:** Preventing vulnerabilities and ensuring data integrity are paramount. ### 3.1 Extension Development Extensions are the primary way to add new functionality to PostgreSQL. **Do This:** * Use the Extension Control File (".control") to define the extension's metadata. * Provide SQL scripts for creating and dropping database objects. * Use hooks ("ExecutorStart_hook", "ExecutorRun_hook", etc.) to extend the core functionality. * Follow the security guidelines for extension development. **Don't Do This:** * Modify the core PostgreSQL code directly (unless absolutely necessary and approved by the community). * Introduce security vulnerabilities through insecure extension code. * Make assumptions about the internal implementation details of PostgreSQL that could change in future versions. **Why:** Extensions allow adding new features without modifying the core code. **Example:** """sql -- Example SQL script for creating a function in an extension CREATE FUNCTION my_extension_function(text) RETURNS text AS '$libdir/my_extension', 'my_extension_function' LANGUAGE C IMMUTABLE STRICT; """ ### 3.2 Concurrency Control PostgreSQL uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to data. **Do This:** * Understand MVCC and its implications for data consistency. * Use appropriate transaction isolation levels to prevent data anomalies. * Minimize lock contention by optimizing queries and using appropriate indexing strategies. * When working with internal data structures, be mindful of concurrent access and utilize PostgreSQL's locking primitives (LWLock, spinlocks) appropriately. **Don't Do This:** * Ignore the potential for data anomalies when using low transaction isolation levels. * Introduce unnecessary locking that could lead to deadlocks. * Perform long-running operations within a single transaction. **Why:** MVCC ensures data consistency and allows concurrent access to data. ### 3.3 Security Best Practices Security is a critical aspect of PostgreSQL development. **Do This:** * Follow secure coding practices to prevent vulnerabilities such as SQL injection and buffer overflows. * Use hardened APIs to avoid common security pitfalls. * Validate input data carefully. * Avoid hardcoding sensitive information such as passwords. * Be aware of the security implications of new features. **Don't Do This:** * Ignore security warnings. * Implement custom encryption algorithms (use PostgreSQL's built-in encryption features). * Grant excessive privileges to users or roles. **Why:** Secure coding practices are essential for preventing data breaches and ensuring the integrity of the database. ### 3.4 Memory Management Efficient Memory management is key to PostgreSQL's performance and stability. **Do This:** * Use PostgreSQL's memory context mechanism ("MemoryContext") for allocating and freeing memory within a query lifecycle. This mechanism provides automatic memory cleanup at the end of a query preventing memory leaks. * Understand the different memory contexts (e.g., "TopMemoryContext", "QueryMemoryContext") and use them appropriately. * Avoid manual memory management ("malloc"/"free") unless absolutely necessary (and only if you REALLY know what you are doing). Use PostgreSQL's "palloc"/"pfree" within a memory context. * Profile memory usage to identify and fix memory leaks. **Don't Do This:** * Leak memory by failing to free allocated memory. * Allocate large amounts of memory without considering the impact on performance. * Use "malloc"/"free" without a deep understanding of PostgreSQL's memory management. **Why:** Efficient memory management prevents memory leaks, reduces memory fragmentation, and improves overall performance. The memory context system automates this and integrates with the query processing lifecycle. **Example:** """c /* Example using MemoryContext */ MemoryContext myContext; char *data; /* Create a new memory context */ myContext = AllocSetContextCreate(CurrentMemoryContext, "MyContext", ALLOCSET_DEFAULT_SIZES); /* Switch to the new memory context */ MemoryContext oldContext = MemoryContextSwitchTo(myContext); /* Allocate memory within the new context */ data = palloc(100); /* Switch back to the previous memory context. The 'data' still exists */ MemoryContextSwitchTo(oldContext); /* ... use data ... */ /* At the end, the memory context 'myContext' is destroyed, and all memory allocated within it is automatically freed */ MemoryContextDelete(myContext); """ These standards aim to provide a comprehensive guide for contributing to the core architecture of PostgreSQL, by promoting best practices and ensuring code maintainability, performance, and security. By following these guidelines, developers can help ensure that PostgreSQL remains a robust, reliable, and extensible database system.
# Component Design Standards for PostgreSQL This document outlines coding standards specifically for component design within PostgreSQL. These standards promote the creation of reusable, maintainable, performant, and secure components. It focuses on the latest PostgreSQL features and modern architectural patterns. While the overall PostgreSQL coding standards document covers other areas like naming conventions, this document *exclusively* focuses on the **design** of independent and composable database components and their interaction. ## 1. Architectural Principles ### 1.1. Modularity and Loose Coupling * **Do This:** Design components with a single, well-defined purpose and minimal dependencies on other components. Favor interfaces and contracts over direct implementation dependencies. * **Don't Do This:** Create monolithic functions or stored procedures that perform multiple unrelated tasks. Avoid tightly coupled components that are difficult to modify or reuse independently. * **Why:** Modularity improves code readability, testability, and reusability. Loose coupling reduces the impact of changes to one component on other parts of the system. This promotes agility. ### 1.2. Separation of Concerns * **Do This:** Separate data access logic, business logic, and presentation logic within your database components. Use views for data transformation and presentation, stored procedures for business logic, and appropriate roles and permissions for access control. Maintain clean separation between database schema, functions, data types, and the application. * **Don't Do This:** Mix business logic and data access code within a single component. Embed presentation logic (e.g., formatting output) within stored procedures. Allow direct data access from outside the database without proper authorization and abstraction layers. * **Why:** Separation of concerns improves code organization, reduces redundancy, promotes maintainability, and enhances security. This simplifies testing and debugging. Applying separation of concerns to database design yields long-term benefits during evolving business requirements. ### 1.3. Abstraction * **Do This:** Use views, functions, and custom types to abstract away complex data structures or logic. Provide a simplified interface to the underlying functionality. Leverage object-relational mapping (ORM) where applicable, but always understand the underlying SQL generated. * **Don't Do This:** Expose raw tables directly to applications. Implement complex calculations or data transformations repeatedly within the same codebase. Build logic directly against internal system tables. * **Why:** Abstraction makes code easier to understand, simplifies maintenance, and allows you to change the underlying implementation without affecting the code that uses it. Well-defined abstractions guard against unforeseen consequences from seemingly innocuous changes elsewhere in the codebase. ### 1.4. Information Hiding * **Do This:** Restrict access to internal data structures and implementation details of components using appropriate permissions and roles. Use functions to encapsulate data access and manipulation. * **Don't Do This:** Grant unnecessary permissions to users or roles. Expose internal tables or columns without proper control. Rely on undocumented or unstable system tables. * **Why:** Information hiding protects the integrity of your data and prevents unintended side effects. It allows you to change the internal implementation of a component without affecting the code that uses it. ## 2. Component Types and Best Practices ### 2.1. Views * **Do This:** Use views to simplify complex queries, pre-aggregate data, and present a customized view of the data to different users or applications. Create materialized views for performance-critical reports or aggregations. Consider updatable views for simple write-through scenarios. Use "SECURITY DEFINER" appropriately. * **Don't Do This:** Create overly complex views that perform multiple unrelated tasks. Use views to replace proper indexing or query optimization. Update data directly through complex views with potential performance problems or data integrity risks without careful planning. * **Why:** Views provide a layer of abstraction over the underlying database tables, simplifying queries and improving data access control. Materialized views improve query performance by pre-computing and storing the results of a query. * **Example:** """sql -- Simple view for reporting CREATE VIEW sales_summary AS SELECT customer_id, SUM(amount) AS total_sales FROM orders GROUP BY customer_id; -- Materialized view for performance CREATE MATERIALIZED VIEW monthly_sales_summary AS SELECT DATE_TRUNC('month', order_date) AS sales_month, SUM(amount) AS total_sales FROM orders GROUP BY sales_month; -- Refreshing the materialized view (consider background process) REFRESH MATERIALIZED VIEW monthly_sales_summary; -- Updatable view CREATE VIEW employee_public AS SELECT id, name, email FROM employees WHERE is_public = TRUE; CREATE RULE employee_public_insert AS ON INSERT TO employee_public DO INSTEAD INSERT INTO employees (name, email, is_public) VALUES (NEW.name, NEW.email, TRUE); """ ### 2.2. Functions and Procedures * **Do This:** Use functions to encapsulate reusable logic, perform calculations, or implement business rules. Use stored procedures for complex transactions or batch operations. Use the "SECURITY DEFINER" clause appropriately, and always validate input parameters. Consider immutability when appropriate. * **Don't Do This:** Implement long, complex functions that are difficult to understand and maintain. Use functions for operations that should be performed by the application layer. Ignore SQL injection vulnerabilities. Overuse "SECURITY DEFINER", potentially bypassing row-level security. * **Why:** Functions and procedures provide a way to encapsulate reusable logic and improve code organization. "SECURITY DEFINER" functions allow you to grant controlled access to data or operations that a user would not otherwise have access to. However, they must be carefully written to avoid security vulnerabilities. * **Example:** """sql -- Function for calculating total order amount CREATE OR REPLACE FUNCTION calculate_total_amount(order_id INT) RETURNS NUMERIC AS $$ DECLARE total NUMERIC; BEGIN SELECT SUM(price * quantity) INTO total FROM order_items WHERE order_id = calculate_total_amount.order_id; RETURN total; END; $$ LANGUAGE plpgsql; -- Stored procedure for creating a new order CREATE OR REPLACE PROCEDURE create_new_order(customer_id INT, order_date DATE) LANGUAGE plpgsql AS $$ BEGIN INSERT INTO orders (customer_id, order_date) VALUES (customer_id, order_date); -- Perform other operations related to order creation COMMIT; END; $$; -- SECURITY DEFINER example CREATE OR REPLACE FUNCTION get_sensitive_data(user_id INT) RETURNS TEXT SECURITY DEFINER AS $$ DECLARE sensitive_info TEXT; BEGIN -- Access table only accessible to a service role SELECT data INTO sensitive_info FROM sensitive_data WHERE user_id = get_sensitive_data.user_id; RETURN sensitive_info; END; $$ LANGUAGE plpgsql; -- It's very important to set the search path accordingly to prevent malicious users -- from exploiting the function ALTER FUNCTION get_sensitive_data(INT) SET search_path = my_secure_schema; -- Example of an IMMUTABLE function CREATE FUNCTION add(integer, integer) RETURNS integer IMMUTABLE LANGUAGE SQL RETURN $1 + $2; """ ### 2.3. Custom Types * **Do This:** Use custom types to represent complex data structures or domain-specific concepts. Define appropriate input and output functions for custom types. Use enumerated types ("ENUM") for representing a fixed set of values. * **Don't Do This:** Overuse custom types for simple data values. Create overly complex custom types that are difficult to understand and maintain. Neglect defining required input/output functions. * **Why:** Custom types improve code readability, data integrity, and type safety. They allow you to represent complex data structures in a more natural and intuitive way. * **Example:** """sql -- Create a custom type for representing a point CREATE TYPE point AS ( x NUMERIC, y NUMERIC ); -- Creating an ENUM type CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy'); -- Using the custom type CREATE TABLE locations ( id SERIAL PRIMARY KEY, name TEXT, coordinates POINT ); INSERT INTO locations (name, coordinates) VALUES ('Office', (10.0, 20.0)); SELECT (coordinates).x, (coordinates).y FROM locations WHERE name = 'Office'; """ ### 2.4. Extensions * **Do This:** Leverage existing PostgreSQL extensions to add specialized functionality (e.g., PostGIS for geospatial data, pg_trgm for fuzzy string matching, uuid-ossp for UUID generation). Create your own extensions for sharing reusable database components across multiple databases. Check compatibility before using an extension. * **Don't Do This:** Reimplement functionality that is already provided by existing extensions. Use extensions without understanding their implications on performance or security. * **Why:** Extensions provide a modular way to add new features to PostgreSQL. They allow you to extend the functionality of the database without modifying the core code. * **Example:** """sql -- Enable the uuid-ossp extension CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- Generate a UUID SELECT uuid_generate_v4(); -- Enable PostGIS extension CREATE EXTENSION IF NOT EXISTS postgis; -- Example usage of PostGIS CREATE TABLE cities ( id SERIAL PRIMARY KEY, name VARCHAR(255), location GEOMETRY(Point, 4326) -- WGS 84 spatial reference system ); INSERT INTO cities (name, location) VALUES ('London', ST_GeomFromText('POINT(-0.1278 51.5074)', 4326)); SELECT name FROM cities WHERE ST_DWithin(location, ST_GeomFromText('POINT(-0.11 51.51)', 4326), 1000); -- Find cities within 1km of a point """ ## 3. Data Access Layer Design ### 3.1. Repository Pattern * **Do This:** Implement a repository pattern to abstract away the data access logic from the business logic. Create a separate repository interface for each entity type. * **Don't Do This:** Directly embed SQL queries within the business logic. Expose raw database connections to the application layer. * **Why:** The repository pattern improves code organization, testability, and allows you to easily switch between different data access implementations (e.g., using different ORMs or data sources). * **Example:** """sql -- (Conceptual example - repositories are typically implemented in the application layer) -- Illustrative example using functions as a simplified repository: CREATE OR REPLACE FUNCTION get_customer_by_id(customer_id INT) RETURNS TABLE (id INT, name TEXT, email TEXT) AS $$ BEGIN RETURN QUERY SELECT id, name, email FROM customers WHERE id = get_customer_by_id.customer_id; END; $$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION update_customer_email(customer_id INT, new_email TEXT) RETURNS VOID AS $$ BEGIN UPDATE customers SET email = new_email WHERE id = update_customer_email.customer_id; END; $$ LANGUAGE plpgsql; -- Usage (from application code level) -- Using an ORM such as SQLAlchemy or similar is the more conventional and robust approach SELECT * FROM get_customer_by_id(123); CALL update_customer_email(123, 'new_email@example.com'); """ ### 3.2. Data Transfer Objects (DTOs) / Value Objects * **Do This:** Use DTOs or value objects to encapsulate data that is transferred between components. Define clear contracts for the data that needs to be exchanged. Use custom types whenever appropriate for stronger typing. * **Don't Do This:** Pass raw data or arrays between components without a clear schema. Mutate the data within the DTO after it has been passed to another component. * **Why:** Using DTOs improves code readability, reduces the risk of data corruption, and promotes data integrity. It simplifies testing and validation. ### 3.3. Connection Pooling * **Do This:** Use connection pooling to reuse database connections and reduce the overhead of establishing new connections. Configure appropriate connection pool settings based on the application's requirements. Use connection pooling at the application level (e.g., using PgBouncer or connection pools in your programming language). * **Don't Do This:** Create a new database connection for each request. Use an excessively large connection pool that can overload the database server. * **Why:** Connection pooling improves performance by reducing the overhead of establishing new database connections. It allows you to handle a larger number of concurrent requests. ## 4. Security Considerations ### 4.1. SQL Injection Prevention * **Do This:** Always use parameterized queries or prepared statements to prevent SQL injection vulnerabilities. Validate all input parameters before using them in SQL queries. Sanitize user input. * **Don't Do This:** Directly concatenate user input into SQL queries. Trust user input without proper validation. * **Why:** SQL injection is a major security vulnerability that can allow attackers to execute arbitrary SQL code. Parameterized queries and prepared statements prevent SQL injection by treating user input as data rather than code. ### 4.2. Principle of Least Privilege * **Do This:** Grant users and roles only the minimum permissions that they need to perform their tasks. Use fine-grained permissions to control access to specific tables, columns, or functions. Use "SECURITY DEFINER" with extreme caution. * **Don't Do This:** Grant unnecessary permissions to users or roles. Use the "superuser" role for application connections. * **Why:** The principle of least privilege reduces the risk of unauthorized access to data or operations. It limits the damage that can be caused by a compromised account. ### 4.3. Row-Level Security (RLS) * **Do This:** Use row-level security policies to restrict access to specific rows of data based on user attributes. * **Don't Do This:** Rely solely on application-level logic to enforce data access restrictions. * **Why:** Row-level security provides an additional layer of protection against unauthorized data access. It allows you to implement fine-grained access control at the database level. The database maintains the RLS policies; therefore, it helps prevent mistakes that might happen at the source code level. * **Example:** """sql -- Enable row-level security on the employees table ALTER TABLE employees ENABLE ROW LEVEL SECURITY; -- Create a policy that allows users to only see their own employee record CREATE POLICY employee_policy ON employees FOR SELECT USING (user_id = current_user); """ ### 4.4. Audit Logging * **Do This:** Implement audit logging to track all database activity. Log important events such as login attempts, data modifications, and permission changes. Use extensions such as "pgaudit". * **Don't Do This:** Disable audit logging. Store audit logs in the same database as the application data. * **Why:** Audit logging provides a record of database activity that can be used for security monitoring, compliance auditing, and forensic analysis. ## 5. Performance Optimization ### 5.1. Indexing * **Do This:** Create indexes on columns that are frequently used in WHERE clauses, JOIN conditions, or ORDER BY clauses. Use appropriate index types (e.g., B-tree, Hash, GIN, GIST) based on the data type and query patterns. Regularly analyze table statistics. Consider expression indexes and partial indexes. * **Don't Do This:** Create too many indexes, which can slow down write operations. Create indexes on columns that are rarely used in queries. Neglect maintaining index statistics. * **Why:** Indexes improve query performance by allowing the database to quickly locate the rows that match a given query. ### 5.2. Query Optimization * **Do This:** Use the "EXPLAIN" command to analyze query execution plans. Rewrite queries to use more efficient operators or algorithms. Avoid using "SELECT *" in queries. Use "JOIN" carefully. Use appropriate hints only in specific cases and with full understanding of their consequences. * **Don't Do This:** Ignore slow queries. Rely solely on the query optimizer to optimize queries. * **Why:** Query optimization can significantly improve the performance of database applications. Understanding the query execution plan allows you to identify performance bottlenecks and rewrite queries to use more efficient operators or algorithms. ### 5.3. Connection Management * **Do This:** Use connection pooling to reuse database connections. Close database connections when they are no longer needed. Configure appropriate connection pool settings based on the application's requirements. * **Don't Do This:** Create a new database connection for each request. Leave database connections open indefinitely. * **Why:** Efficient connection management reduces the overhead of establishing new database connections and improves application performance. ### 5.4. Partitioning * **Do This:** Use table partitioning to divide large tables into smaller, more manageable pieces. Use declarative partitioning (introduced in PostgreSQL 10). * **Don't Do This:** Create overly complex partitioning schemes. Use legacy partitioning methods if declarative partitioning is available. * **Why:** Table partitioning improves query performance, reduces storage requirements, and simplifies maintenance. *Example:* """sql -- Create a parent table CREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, customer_id INT, order_date DATE, amount DECIMAL ) PARTITION BY RANGE (order_date); -- Create child tables (partitions) CREATE TABLE orders_2023_01 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2023-02-01'); CREATE TABLE orders_2023_02 PARTITION OF orders FOR VALUES FROM ('2023-02-01') TO ('2023-03-01'); -- Further partitions can be added as needed """ ## 6. Testing and Documentation ### 6.1. Unit Testing * **Do This:** Write unit tests for all database components, including functions, procedures, and views. Use a testing framework such as "pgTAP". * **Don't Do This:** Skip unit testing for database components. * **Why:** Unit testing ensures that database components function correctly and that changes do not introduce regressions. ### 6.2. Integration Testing * **Do This:** Write integration tests to verify that database components interact correctly with each other and with other parts of the system. * **Don't Do This:** Rely solely on unit tests. * **Why:** Integration testing ensures that the system as a whole functions correctly. ### 6.3. Documentation * **Do This:** Document all database components, including tables, columns, functions, procedures, views, and custom types. Explain the purpose of each component, its inputs and outputs, and any relevant business rules or constraints. * **Don't Do This:** Skip documentation. Write incomplete or outdated documentation. * **Why:** Documentation makes it easier to understand, maintain, and use database components. It reduces the risk of errors and improves collaboration. ## 7. Continuous Integration/Continuous Deployment (CI/CD) ### 7.1. Database Migrations * **Do This:** Use a database migration tool such as Flyway or Liquibase to manage database schema changes. Store database migrations in version control. Automate the process of applying database migrations as part of the CI/CD pipeline. * **Don't Do This:** Manually apply database schema changes. Skip version control for database migrations. * **Why:** Database migrations provide a consistent and repeatable way to manage database schema changes. They ensure that the database is always in a consistent state, regardless of the environment. ### 7.2. Automated Testing * **Do This:** Automate the execution of unit tests and integration tests as part of the CI/CD pipeline. Fail the build if any tests fail. * **Don't Do This:** Manually run tests. Ignore test failures. * **Why:** Automated testing ensures that changes do not introduce regressions and that the system as a whole functions correctly. ## 8. Monitoring and Alerting ### 8.1. Performance Monitoring * **Do This:** Monitor the performance of the database server, including CPU usage, memory usage, disk I/O, and query execution times. Use tools such as "pg_stat_statements" and "auto_explain" to identify slow queries. * **Don't Do This:** Ignore database performance metrics. * **Why:** Performance monitoring allows you to identify and address performance bottlenecks before they impact users. ### 8.2. Security Monitoring * **Do This:** Monitor the database server for security events, such as login attempts, failed authentication attempts, and unauthorized data access. Integrate the database logs with a security information and event management (SIEM) system. * **Don't Do This:** Ingore security events. * **Why:** Security monitoring allows you to detect and respond to security threats in a timely manner. ### 8.3. Alerting * **Do This:** Set up alerts to notify you of critical events, such as high CPU usage, low disk space, or security breaches. * **Don't Do This:** Ignore critical events. * **Why:** Alerting enables you to respond quickly to critical events and minimize the impact on users.