pg-index-health – a static analysis tool for your PostgreSQL database

Linda Hamilton

Jan 06, 2025 06:20 PM

Hello!

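Since 2019, I have been developing an open-source tool called pg-index-health that analyzes database structure and identifies potential problems. In one of my previous articles, I shared the story of how this tool came to be.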

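Over the years, pg-index-health has evolved and improved. In 2024, with the support of several contributors, I managed to resolve most of its remaining "growing pains" and bring the project to a state where it is ready for large-scale expansion.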

The growth of databases with the rise of microservices

I have been working with PostgreSQL since 2015, and this fascinating journey began at Tensor, a company based in Yaroslavl.

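Back in 2015, it was still the era of monoliths with huge databases and a large number of tables. As a rule, any change to the structure of such a database required mandatory approval from an architect or a development lead, who acted as the key knowledge holder. While this protected against most errors, it slowed down the process of making changes and did not scale at all.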

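Gradually, people began moving to microservices.
The number of databases grew significantly, while the number of tables in each database, conversely, decreased. Each team now started managing the structure of its own database independently. The centralized source of expertise disappeared, and database design errors began to multiply and spread from one service to another.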

The test pyramid and its shapes

Most of you have probably heard of the test pyramid. For monoliths, it has a fairly characteristic shape with a broad base of unit tests. For more details, I recommend Martin Fowler's article.

[Image: the test pyramid]

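Microservices have changed not only development approaches but also the appearance of the test pyramid. This shift was largely driven by the rise of containerization technologies (Docker, Testcontainers). Today the test pyramid is no longer a pyramid at all; it can take on very odd shapes. The best-known examples are the Testing Honeycomb and the Testing Trophy.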

[Image: the Testing Honeycomb and the Testing Trophy]

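The modern trend is to write as few unit tests that focus on implementation details as possible, and to prioritize component and integration tests that verify the actual functionality provided by the service.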

My personal favorite is the Testing Trophy. Its base is static code analysis, which is designed to guard against common errors.

The importance of static code analysis

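Static analysis of Java and Kotlin code is now common practice. For Kotlin services, the tool of choice is usually detekt. For Java applications, the range of available tools (often called linters) is broader. The main ones are Checkstyle, PMD, SpotBugs and Error Prone. You can read more about them in my previous article.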

It is worth noting that detekt and Checkstyle can also handle code formatting, effectively acting as formatters.

Static analysis of database migrations

Modern microservices often include database migrations, which create and update the database structure alongside the application code.

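In the Java ecosystem, the main tools for managing migrations are Liquibase and Flyway. Any change to the database structure must always be recorded in a migration. Even if changes are made manually during a production incident, a migration must be created later to apply those changes across all environments.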

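Writing migrations in plain SQL is best practice: it provides maximum flexibility and saves time compared to learning the XML dialect of tools like Liquibase. I touched on this in my article "Six tips for using PostgreSQL in functional tests".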

Validating the SQL code of migrations

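To validate the SQL code in migrations, I recommend SQLFluff, which is essentially the SQL equivalent of Checkstyle. This linter supports multiple databases and dialects (including PostgreSQL) and can be integrated into your CI pipeline. It offers more than 60 customizable rules that let you manage table and column aliases, the casing of SQL commands, indentation, the ordering of columns in queries, and much more.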

Compare a well-formatted query with a poorly formatted one:

-- well-formatted SQL
select
    pc.oid::regclass::text as table_name,
    pg_table_size(pc.oid) as table_size
from
    pg_catalog.pg_class pc
    inner join pg_catalog.pg_namespace nsp on nsp.oid = pc.relnamespace
where
    pc.relkind = 'r' and
    pc.oid not in (
        select c.conrelid as table_oid
        from pg_catalog.pg_constraint c
        where c.contype = 'p'
    ) and
    nsp.nspname = :schema_name_param::text
order by table_name;

-- poorly formatted SQL
SELECT pc.oid::regclass::text AS table_name, pg_table_size(pc.oid) AS table_size
FROM pg_catalog.pg_class  pc
JOIN pg_catalog.pg_namespace AS nsp
ON nsp.oid =  pc.relnamespace
WHERE pc.relkind = 'r'
and pc.oid NOT in (
  select c.conrelid as table_oid
  from pg_catalog.pg_constraint   c
  where    c.contype = 'p'
)
and nsp.nspname  = :schema_name_param::text
ORDER BY  table_name;

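Well-formatted SQL code is much easier to read and understand. Most importantly, code reviews no longer get bogged down in discussions about formatting preferences. SQLFluff enforces a consistent style and saves time.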

SQLFluff in action

This is what it looks like in a real pull request:

[Image: SQLFluff findings in a pull request]

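Here SQLFluff found a problem with the formatting of the returned values in the select statement: when only one column is returned, we do not put it on a separate line. The second issue is the wrong order of columns in the select result: we return simple columns first, and only then computed values. The third is the wrong case of "and" in the join statement: I prefer to write all queries in lowercase.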

For more examples of using SQLFluff, take a look at my open-source projects: one, two.

Analyzing database structure using metadata

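The structure of the database itself can also be checked. However, working with migrations for this purpose is very inconvenient: there can be a great many of them, a new migration may fix errors made in a previous one, and so on. As a rule, we are more interested in the final structure of the database than in its intermediate states.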

Using the information schema

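PostgreSQL (like many other relational databases) stores metadata about all objects and the relationships between them, and exposes it externally in the form of information_schema. We can use queries against information_schema to identify deviations, problems, or common errors (this is exactly what SchemaCrawler does).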

Since we work only with PostgreSQL, we can use the system catalogs (the pg_catalog schema) instead of information_schema; they provide more information about the internal structure of a particular database.

The cumulative statistics system

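In addition to metadata, PostgreSQL collects operational information about each database: which queries were executed, how they were executed, which access methods were used, and so on. The cumulative statistics system is responsible for collecting this data.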

By querying these statistics through system views and combining them with data from the system catalogs, we can:

identify unused indexes;
detect tables that lack adequate indexes.

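Statistics can be reset manually, and the date and time of the last reset are recorded in the system. Taking this into account is important for understanding whether the statistics can be trusted. For example, if you have business logic that runs once a month, quarter, or half-year, you need statistics collected over at least that interval.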
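To make this rule of thumb concrete, here is a minimal Java sketch. It assumes you have already read the time of the last reset (for example, from the stats_reset column of the pg_stat_database view) and that you know the longest business cycle you care about. The StatisticsWindow class and its method are hypothetical and are not part of pg-index-health.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

/** Hypothetical helper (not part of pg-index-health): have statistics been collected long enough? */
final class StatisticsWindow {

    private StatisticsWindow() {
    }

    /**
     * Returns true when at least {@code requiredWindow} has elapsed between the last
     * statistics reset and {@code now}, so that a job running once per window has had
     * a chance to leave its traces in the statistics.
     */
    static boolean isTrustworthy(Instant lastReset, Duration requiredWindow, Instant now) {
        Objects.requireNonNull(lastReset, "lastReset");
        Duration collected = Duration.between(lastReset, now);
        return collected.compareTo(requiredWindow) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-01-06T18:20:00Z");
        Duration quarter = Duration.ofDays(92); // a quarter, rounded up to 92 days
        // Reset 30 days ago: a quarterly job may not have run yet.
        System.out.println(isTrustworthy(now.minus(Duration.ofDays(30)), quarter, now)); // false
        // Reset 120 days ago: at least one full quarter has been observed.
        System.out.println(isTrustworthy(now.minus(Duration.ofDays(120)), quarter, now)); // true
    }
}
```

With a quarterly job, statistics that were reset a month ago are not yet enough to declare an index unused.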

If you use a database cluster, statistics are collected independently on each host and are not replicated within the cluster.

pg-index-health and its structure

I implemented the idea described above, analyzing database structure based on the metadata stored in the database itself, as a tool called pg-index-health.

My solution includes the following components:

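A set of checks in the form of SQL queries, kept in a separate repository (currently 25 checks). These queries are decoupled from the Java codebase and can be reused in projects written in other programming languages.
A domain model: a minimal set of classes that represent check results as objects.
The HighAvailabilityPgConnection abstraction for connecting to a database cluster consisting of several hosts.
Utilities for executing SQL queries and serializing the results into domain model objects.
A Spring Boot starter for quick and convenient integration of the checks into unit, component and integration tests.
A migration generator that can create corrective SQL migrations for the problems it finds.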

Types of checks

All checks (also called diagnostics) are divided into two groups:

Runtime checks (require statistics).
Static checks (do not require statistics).

Runtime checks

Runtime checks make sense only when executed on a live database instance in production. They require cumulative statistics and aggregate data from all hosts in the cluster.

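Consider a database cluster of three hosts: a primary, a secondary, and an asynchronous replica. Some services use clusters with a similar topology and run heavy read queries only on the asynchronous replica to balance the load. Such queries are usually not executed on the primary host, because they create additional load and hurt the latency of other queries.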

[Image: a cluster with a primary, a secondary and an asynchronous replica]

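As mentioned earlier, in PostgreSQL statistics are collected separately on each host and are not replicated within the cluster. As a result, you can easily end up in a situation where some indexes are used and needed only on the asynchronous replica. To reliably determine whether an index is needed, you have to run the check on every host in the cluster and aggregate the results.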
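Here is a minimal sketch of that aggregation for an "unused indexes" check. It assumes each host has already reported the set of indexes it considers unused (for instance, indexes with idx_scan = 0 in pg_stat_user_indexes on that host). Because statistics are not replicated, an index can be reported as unused only when every host agrees, so the per-host results are intersected. The class, host and index names are hypothetical; the code is not taken from pg-index-health.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical aggregation of per-host "unused index" results (not taken from pg-index-health). */
final class UnusedIndexAggregator {

    private UnusedIndexAggregator() {
    }

    /** Returns only the indexes that every host reports as unused (a set intersection). */
    static Set<String> unusedOnAllHosts(Map<String, Set<String>> unusedByHost) {
        Set<String> result = null;
        for (Set<String> hostResult : unusedByHost.values()) {
            if (result == null) {
                result = new TreeSet<>(hostResult); // sorted copy of the first host's result
            } else {
                result.retainAll(hostResult);        // drop indexes that are used on this host
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> unusedByHost = Map.of(
            "primary", Set.of("idx_orders_created_at", "idx_order_item_sku"),
            "secondary", Set.of("idx_orders_created_at", "idx_order_item_sku"),
            // a heavy report runs only on the replica and uses idx_order_item_sku there
            "async-replica", Set.of("idx_orders_created_at"));
        System.out.println(unusedOnAllHosts(unusedByHost)); // [idx_orders_created_at]
    }
}
```

Checking only the primary here would wrongly suggest dropping idx_order_item_sku, which the replica still relies on.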

Static checks

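Static checks do not require cumulative statistics and can be executed on the primary host immediately after the migrations are applied. Of course, they can also be run against a production database to get real-time data. Most checks are static, and they are especially useful in tests, because they help catch and prevent common errors during development.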

How to use pg-index-health

The main use case for pg-index-health is adding tests that verify the database structure to your testing pipeline.

For Spring Boot applications, you need to add the starter to your test dependencies:

dependencies {
    testImplementation("io.github.mfvanek:pg-index-health-test-starter:0.14.4")
}

Then add a standard test:

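import io.github.mfvanek.pg.core.checks.common.DatabaseCheckOnHost;
import io.github.mfvanek.pg.core.checks.common.Diagnostic;
import io.github.mfvanek.pg.model.dbobject.DbObject;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;

import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
@ActiveProfiles("test")
class DatabaseStructureStaticAnalysisTest {

    @Autowired
    private List<DatabaseCheckOnHost<? extends DbObject>> checks;

    @Test
    void checksShouldWork() {
        assertThat(checks)
            .hasSameSizeAs(Diagnostic.values());

        checks.stream()
            .filter(DatabaseCheckOnHost::isStatic)
            .forEach(c -> assertThat(c.check())
                .as(c.getDiagnostic().name())
                .isEmpty());
    }
}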

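In this test, all available checks are injected as a list. Then only the static checks are selected and executed against a real database running in a container with the migrations applied.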

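Ideally, every check should return an empty list. If a deviation appears when the next migration is added, the test will fail. The developer will be forced to pay attention to it and resolve the problem one way or another: either fix it in the migration or explicitly ignore it.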

False positives and adding exclusions

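It is important to understand that pg-index-health, like any other static analyzer, can produce false positives. In addition, some checks may not be relevant to your project. For example, documenting the database structure is considered good practice, and PostgreSQL allows adding comments to almost all database objects. In a migration, this might look like this: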

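create table if not exists demo.warehouse
(
    id bigint primary key generated always as identity,
    name varchar(255) not null
);

comment on table demo.warehouse is 'Information about the warehouses';
comment on column demo.warehouse.id is 'Unique identifier of the warehouse';
comment on column demo.warehouse.name is 'Human readable name of the warehouse';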

Your team, however, might agree not to do this. In that case, the results of the corresponding checks (TABLES_WITHOUT_DESCRIPTION, COLUMNS_WITHOUT_DESCRIPTION, FUNCTIONS_WITHOUT_DESCRIPTION) become irrelevant to you.

You can either exclude these checks completely:

@Test
void checksShouldWork() {
    assertThat(checks)
        .hasSameSizeAs(Diagnostic.values());

    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        // skip the description checks entirely
        .filter(c -> c.getDiagnostic() != Diagnostic.TABLES_WITHOUT_DESCRIPTION &&
            c.getDiagnostic() != Diagnostic.COLUMNS_WITHOUT_DESCRIPTION)
        .forEach(c -> assertThat(c.check())
            .as(c.getDiagnostic().name())
            .isEmpty());
}

or simply ignore their results:

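// Besides the imports of the standard test, this variant also needs:
// import org.assertj.core.api.ListAssert;
@Test
void checksShouldWork() {
    assertThat(checks)
        .hasSameSizeAs(Diagnostic.values());

    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        .forEach(c -> {
            final ListAssert<? extends DbObject> listAssert = assertThat(c.check())
                .as(c.getDiagnostic().name());
            switch (c.getDiagnostic()) {
                case TABLES_WITHOUT_DESCRIPTION, COLUMNS_WITHOUT_DESCRIPTION -> listAssert.hasSizeGreaterThanOrEqualTo(0); // ignored: any result passes

                default -> listAssert.isEmpty();
            }
        });
}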

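When introducing pg-index-health, you will often find that the database structure already has some deviations that you do not want to address right away. At the same time, the check is relevant, and disabling it is not an option. In such cases, it is best to pin all existing deviations explicitly in the test code: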

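// Besides the imports of the standard test, this variant also needs
// org.assertj.core.api.ListAssert, the static import
// org.assertj.core.api.InstanceOfAssertFactories.list,
// and the PgContext and Table classes from pg-index-health.
@Test
void checksShouldWorkForAdditionalSchema() {
    final PgContext ctx = PgContext.of("additional_schema");
    checks.stream()
        .filter(DatabaseCheckOnHost::isStatic)
        .forEach(c -> {
            final ListAssert<? extends DbObject> listAssert = assertThat(c.check(ctx))
                .as(c.getDiagnostic().name());

            switch (c.getDiagnostic()) {
                // known deviation: exactly this table is allowed to fail these checks
                case TABLES_WITHOUT_DESCRIPTION, TABLES_NOT_LINKED_TO_OTHERS ->
                    listAssert.hasSize(1)
                        .asInstanceOf(list(Table.class))
                        .containsExactly(
                            Table.of(ctx, "additional_table")
                        );

                default -> listAssert.isEmpty();
            }
        });
}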

Now I would like to take a closer look at the most frequently encountered problems.

Tables without a primary key

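Because of the specifics of the MVCC mechanism in PostgreSQL, a situation called bloat can occur, in which the size of a table (or index) grows rapidly due to a large number of dead tuples. This can happen, for example, because of long-running transactions or a one-time update of a large number of rows.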

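Garbage collection inside the database is handled by the autovacuum process, but it generally does not return the occupied disk space to the operating system. The only way to effectively reduce the physical size of a table is the VACUUM FULL command, which holds an exclusive lock for the entire operation. For large tables this can take several hours, which makes a full vacuum impractical for most modern services.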

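To deal with table bloat without downtime, third-party extensions such as pg_repack are often used. One of the mandatory requirements of pg_repack is a primary key or some other uniqueness constraint on the target table. The TABLES_WITHOUT_PRIMARY_KEY diagnostic helps detect tables without a primary key and prevents maintenance problems in the future.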

Below is an example of a table without a primary key. If this table becomes bloated, pg_repack will not be able to process it and will return an error.

create table if not exists demo.payment
(
    id bigint not null, -- column is not marked as primary key
    order_id bigint references demo.orders (id),
    status int not null,
    created_at timestamp not null,
    payment_total decimal(22, 2) not null
);
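The idea behind this diagnostic can be illustrated by mirroring the well-formatted catalog query from the SQLFluff section in plain Java: keep ordinary tables (relkind = 'r') whose oid never appears as the conrelid of a primary-key constraint (contype = 'p'). The records below are hypothetical, heavily simplified stand-ins for pg_class and pg_constraint rows, and the schema filter is omitted.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical in-memory mirror of a "tables without a primary key" catalog query. */
final class TablesWithoutPrimaryKey {

    /** Simplified pg_class row: relkind 'r' is an ordinary table, 'i' an index. */
    record PgClass(long oid, String name, char relkind) {
    }

    /** Simplified pg_constraint row: contype 'p' is a primary key, 'f' a foreign key. */
    record PgConstraint(long conrelid, char contype) {
    }

    private TablesWithoutPrimaryKey() {
    }

    /** Returns the names of ordinary tables that have no primary-key constraint, sorted by name. */
    static List<String> find(List<PgClass> classes, List<PgConstraint> constraints) {
        // the subquery: oids of tables that do have a primary key
        Set<Long> tablesWithPk = constraints.stream()
            .filter(c -> c.contype() == 'p')
            .map(PgConstraint::conrelid)
            .collect(Collectors.toSet());
        return classes.stream()
            .filter(c -> c.relkind() == 'r')               // pc.relkind = 'r'
            .filter(c -> !tablesWithPk.contains(c.oid()))  // pc.oid not in (...)
            .map(PgClass::name)
            .sorted()                                      // order by table_name
            .toList();
    }

    public static void main(String[] args) {
        List<PgClass> classes = List.of(
            new PgClass(1, "demo.orders", 'r'),
            new PgClass(2, "demo.payment", 'r'),
            new PgClass(3, "demo.orders_pkey", 'i'));
        List<PgConstraint> constraints = List.of(
            new PgConstraint(1, 'p'),  // demo.orders has a primary key
            new PgConstraint(2, 'f')); // demo.payment only has a foreign key
        System.out.println(find(classes, constraints)); // [demo.payment]
    }
}
```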

Duplicated indexes

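Our databases run on hosts with limited resources, and disk space is one of them. With Database-as-a-Service solutions, there is often a hard physical limit on the maximum database size that cannot be changed.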

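Each index on a table is a separate entity on disk. It takes up space and requires resources to maintain, which slows down inserts and updates. We create indexes to speed up searches or to guarantee the uniqueness of certain values. However, careless use of indexes can make their total size exceed the size of the useful data in the table itself. Therefore, a table should have as few indexes as possible, but enough to serve its purpose.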

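I have come across many cases where unnecessary indexes were created in migrations. For example, an index for the primary key is created automatically. While it is technically possible to index the id column manually as well, doing so makes no sense at all.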

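A similar situation occurs with unique constraints. When you mark a column (or a group of columns) with the unique keyword, PostgreSQL automatically creates a unique index for it. There is no need to create an additional index manually; doing so results in duplicated indexes. Such redundant indexes can and should be dropped, and the DUPLICATED_INDEXES diagnostic helps identify them.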
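A simplified sketch of how duplicates could be found: group the indexes of each table by their ordered list of key columns and report every group with more than one member. This is a hypothetical model for illustration only; a real check also has to compare the access method, expressions, predicates, collations and so on, all of which this sketch ignores. The table demo.buyer and the manual index names are made up; buyer_pkey and buyer_email_key follow PostgreSQL's default naming for primary keys and unique constraints.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified duplicate-index finder (not the pg-index-health implementation). */
final class DuplicatedIndexes {

    /** An index, described only by its table and its ordered key columns. */
    record Index(String table, String name, List<String> columns) {
    }

    /** Two indexes are treated as duplicates when their keys are equal. */
    private record Key(String table, List<String> columns) {
    }

    private DuplicatedIndexes() {
    }

    /** Returns groups (in first-seen order) of two or more indexes with the same table and columns. */
    static List<List<String>> find(List<Index> indexes) {
        Map<Key, List<String>> namesByKey = new LinkedHashMap<>();
        for (Index index : indexes) {
            namesByKey.computeIfAbsent(new Key(index.table(), index.columns()), k -> new ArrayList<>())
                .add(index.name());
        }
        return namesByKey.values().stream()
            .filter(names -> names.size() > 1)
            .toList();
    }

    public static void main(String[] args) {
        List<Index> indexes = List.of(
            new Index("demo.buyer", "buyer_pkey", List.of("id")),            // created by the primary key
            new Index("demo.buyer", "idx_buyer_id", List.of("id")),          // redundant manual index
            new Index("demo.buyer", "buyer_email_key", List.of("email")),    // created by a unique constraint
            new Index("demo.buyer", "idx_buyer_email", List.of("email")));   // redundant manual index
        System.out.println(find(indexes));
        // [[buyer_pkey, idx_buyer_id], [buyer_email_key, idx_buyer_email]]
    }
}
```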

Overlapping (intersected) indexes

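Most indexes are created on a single column. When query optimization begins, more complex indexes covering several columns may be added. This leads to situations where indexes exist on columns (A), (A, B) and (A, B, C). The first two indexes in such a series can often be dropped, because they are prefixes of the third (I recommend watching this video). Removing these redundant indexes can save a lot of disk space, and the INTERSECTED_INDEXES diagnostic is designed to detect such cases.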
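The prefix rule from the paragraph above can be sketched as follows: an index is reported when its ordered key columns are a leading prefix of another index on the same table. This is a deliberately simplified, hypothetical model and the library's own rules may differ. In particular, it ignores uniqueness, so a unique index on A would be flagged here even though it enforces a constraint and is not redundant. The table and index names are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified detector of indexes covered by a longer index (not the library's logic). */
final class IntersectedIndexes {

    record Index(String table, String name, List<String> columns) {
    }

    private IntersectedIndexes() {
    }

    /** True if {@code shorter} is a strictly shorter leading prefix of {@code longer}. */
    static boolean isPrefix(List<String> shorter, List<String> longer) {
        return shorter.size() < longer.size()
            && longer.subList(0, shorter.size()).equals(shorter);
    }

    /** Returns "redundant -> covering" pairs for indexes on the same table. */
    static List<String> find(List<Index> indexes) {
        List<String> result = new ArrayList<>();
        for (Index candidate : indexes) {
            for (Index other : indexes) {
                if (candidate.table().equals(other.table())
                    && isPrefix(candidate.columns(), other.columns())) {
                    result.add(candidate.name() + " -> " + other.name());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Index> indexes = List.of(
            new Index("demo.orders", "idx_a", List.of("a")),
            new Index("demo.orders", "idx_a_b", List.of("a", "b")),
            new Index("demo.orders", "idx_a_b_c", List.of("a", "b", "c")),
            new Index("demo.orders", "idx_b", List.of("b"))); // not a prefix of anything
        find(indexes).forEach(System.out::println);
        // idx_a -> idx_a_b
        // idx_a -> idx_a_b_c
        // idx_a_b -> idx_a_b_c
    }
}
```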

Foreign keys without indexes

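PostgreSQL allows creating foreign key constraints without a supporting index: referencing another table neither requires an index nor creates one automatically. In some cases this is not a problem and may never show up at all. Sometimes, however, it can lead to production incidents.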

Let's look at a small example (I am using PostgreSQL 16.6).

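We have an orders table and an order_item table, linked by a foreign key on the order_id column. A foreign key must always reference a primary key or some unique constraint, and this condition is met in our example.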

Let's fill the tables with data and collect statistics. We will add 100,000 orders: half of them will have two items, and the other half one item.

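If we try to retrieve the items of the order with ID=100, we should get 2 rows back. Since there is an index on the id column of the orders table, it seems this query should be fast.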

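However, if we examine the execution plan of this query, we will see a sequential scan of the table. We should also pay attention to the large number of pages that have to be read (the Buffers parameter).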

If we create an index on the foreign key column, the situation returns to normal.

The sequential scan disappears from the query plan, and the number of pages read drops significantly.

The FOREIGN_KEYS_WITHOUT_INDEX diagnostic lets you catch such cases early in development and prevent performance problems.
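The idea behind such a check can be sketched in a few lines: a foreign key is reported when no index on the same table starts with the foreign-key columns. In this hypothetical model the leading columns may appear in any order; treat that rule, the records and the index names as assumptions of the sketch rather than the library's exact logic. The constraint name follows PostgreSQL's default <table>_<column>_fkey pattern.

```java
import java.util.HashSet;
import java.util.List;

/** Hypothetical, simplified "foreign keys without index" check (not the pg-index-health implementation). */
final class ForeignKeysWithoutIndex {

    record ForeignKey(String table, String name, List<String> columns) {
    }

    record Index(String table, String name, List<String> columns) {
    }

    private ForeignKeysWithoutIndex() {
    }

    /** True if the index is on the same table and its first N columns are exactly the N FK columns, in any order. */
    static boolean supports(Index index, ForeignKey fk) {
        int n = fk.columns().size();
        return index.table().equals(fk.table())
            && index.columns().size() >= n
            && new HashSet<>(index.columns().subList(0, n)).equals(new HashSet<>(fk.columns()));
    }

    /** Names of foreign keys for which no supporting index exists. */
    static List<String> find(List<ForeignKey> foreignKeys, List<Index> indexes) {
        return foreignKeys.stream()
            .filter(fk -> indexes.stream().noneMatch(index -> supports(index, fk)))
            .map(ForeignKey::name)
            .toList();
    }

    public static void main(String[] args) {
        List<ForeignKey> fks = List.of(
            new ForeignKey("demo.order_item", "order_item_order_id_fkey", List.of("order_id")));
        List<Index> before = List.of(
            new Index("demo.order_item", "order_item_pkey", List.of("id")));
        System.out.println(find(fks, before)); // [order_item_order_id_fkey]

        List<Index> after = List.of(
            new Index("demo.order_item", "order_item_pkey", List.of("id")),
            new Index("demo.order_item", "idx_order_item_order_id", List.of("order_id")));
        System.out.println(find(fks, after)); // []
    }
}
```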

Should I create the indexes or not?

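It is important to remember about false positives: not every foreign key column needs an index. Try to estimate the approximate size of the table in production, and check your code for filtering, searching or joining on the foreign key column. If you are 100% sure the index is not needed, simply add it to the exclusions. If you are not sure, it is better to create the index (you can always drop it later).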

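I have often run into incidents where the database "slowed down" because of a missing index on a foreign key, but I have yet to see a single incident where the database "slowed down" because such an index existed. That is why I disagree with the point made in the Percona blog post that foreign key indexes should not be created from the start. That is a DBA approach. Do you have a dedicated DBA on your team?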

Null values in indexes

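By default, PostgreSQL includes null values in btree indexes, but they are usually not needed there. Each null is treated as distinct from every other null, so you cannot find rows with a null value using an ordinary equality condition. Most of the time it is better to exclude nulls from the index by creating a partial index on the nullable column, for example with where <column> is not null. The INDEXES_WITH_NULL_VALUES diagnostic helps detect such cases.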
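Here is a minimal sketch of that rule for a single table: an index is flagged if one of its key columns is nullable and its partial-index predicate does not exclude nulls for that column. The predicate is modelled simply as the set of columns it declares "is not null". The record and the index names are hypothetical, and this says nothing about how pg-index-health actually inspects predicates.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified check for indexes that store null values (not the library's implementation). */
final class IndexesWithNullValues {

    /**
     * An index described by its key columns and by the columns its partial-index
     * predicate guarantees to be non-null (e.g. "where warehouse_id is not null").
     */
    record Index(String name, List<String> columns, Set<String> notNullInPredicate) {
    }

    private IndexesWithNullValues() {
    }

    /** Flags indexes with at least one nullable key column that the predicate does not exclude. */
    static List<String> find(List<Index> indexes, Set<String> nullableColumns) {
        return indexes.stream()
            .filter(index -> index.columns().stream()
                .anyMatch(column -> nullableColumns.contains(column)
                    && !index.notNullInPredicate().contains(column)))
            .map(Index::name)
            .toList();
    }

    public static void main(String[] args) {
        Set<String> nullable = Set.of("warehouse_id"); // order_item.warehouse_id is nullable
        List<Index> indexes = List.of(
            new Index("idx_order_item_warehouse_id", List.of("warehouse_id"), Set.of()),
            new Index("idx_order_item_warehouse_id_not_null", List.of("warehouse_id"), Set.of("warehouse_id")),
            new Index("order_item_pkey", List.of("id"), Set.of()));
        System.out.println(find(indexes, nullable)); // [idx_order_item_warehouse_id]
    }
}
```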

Let's consider an example with orders and order items. The order_item table has a nullable column warehouse_id, which holds the warehouse ID.

Suppose we have several warehouses. Once an order is paid, we start assembling it. We will update the status of some orders and mark them as paid.

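Individual items of an order may ship from different warehouses according to an internal algorithm that takes into account logistics, stock, warehouse load and so on. After assigning a warehouse and updating the stock, we update the warehouse_id field (initially null) of each item in the order.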

We need to search by a specific warehouse ID to find out which items need to be picked and shipped. We take only paid orders within a certain time range.

The first solution might be a regular index on the warehouse_id column.

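If we create such an index, searching for the items of a specific warehouse will work without problems. It might seem that this index should also allow efficiently finding all items that have not yet been assigned a warehouse, by filtering records with the condition warehouse_id is null.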

However, if we look at the query execution plan, we will see a sequential scan there: the index is not used.

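Of course, this depends on the specific distribution of data in the test database. The warehouse_id column has low cardinality, meaning it contains few unique values, so an index on this column has low selectivity. Index selectivity is the ratio of the number of distinct indexed values (that is, the cardinality) to the total number of rows in the table: distinct / count(*). For example, a unique index has a selectivity of one.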
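The selectivity formula can be checked with a bit of arithmetic. The sketch below assumes that nulls are not counted as distinct values (as with SQL's count(distinct ...)) but that every row still takes an entry in a full index, and it measures a partial index over the entries it actually stores. The ten warehouse_id values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical helper illustrating index selectivity = distinct values / index entries. */
final class IndexSelectivity {

    private IndexSelectivity() {
    }

    /**
     * Number of distinct non-null values divided by the number of entries.
     * Nulls count as entries but not as distinct values (an assumption mirroring count(distinct ...)).
     */
    static double selectivity(List<Long> indexedValues) {
        if (indexedValues.isEmpty()) {
            return 0.0;
        }
        long distinct = indexedValues.stream().filter(Objects::nonNull).distinct().count();
        return (double) distinct / indexedValues.size();
    }

    public static void main(String[] args) {
        // warehouse_id of ten hypothetical order items; six have no warehouse yet
        List<Long> warehouseIds = Arrays.asList(1L, 2L, 1L, 2L, null, null, null, null, null, null);
        // full index: every row has an entry, including the nulls
        System.out.println(selectivity(warehouseIds)); // 0.2
        // partial index "where warehouse_id is not null": only the four assigned rows remain
        List<Long> assigned = warehouseIds.stream().filter(Objects::nonNull).toList();
        System.out.println(selectivity(assigned)); // 0.5
        // a unique column: every value is distinct
        System.out.println(selectivity(List.of(10L, 20L, 30L))); // 1.0
    }
}
```

With six of ten values null, the full index has a selectivity of 2 / 10 = 0.2, while a partial index over the four non-null rows has 2 / 4 = 0.5.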

We can increase the selectivity of the index by excluding null values and creating a partial index on the warehouse_id column.

We will immediately see this index in the query plan.

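If we compare the sizes of the two indexes, we will see a significant difference. The partial index is much smaller and is updated less often. With this index, we save disk space and improve performance.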

Query to get the size of the indexes
