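SQL DataFrame Integration

DbApiHook provides built-in integration with popular data analysis frameworks, so you can query a database and retrieve the results directly as a Pandas or Polars DataFrame. This removes the need to convert SQL query results into DataFrames by hand, which simplifies data workflows.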

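Pandas Integration

Pandas is a widely used library for data analysis and manipulation. The SQL hook lets you retrieve query results directly as a Pandas DataFrame, which is particularly useful for further data transformation, analysis, or visualization inside Airflow tasks.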

# Get complete DataFrame in a single operation
df = hook.get_df(
    sql="SELECT * FROM my_table WHERE date_column >= %s", parameters=["2023-01-01"], df_type="pandas"
)

# Get DataFrame in chunks for memory-efficient processing of large results
for chunk_df in hook.get_df_by_chunks(sql="SELECT * FROM large_table", chunksize=10000, df_type="pandas"):
    process_chunk(chunk_df)

To use this feature, install the pandas extra when installing this provider package. For installation instructions, see <index>.
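The placeholder syntax written in `sql` generally has to match the parameter style of the underlying database driver (the `paramstyle` attribute described in PEP 249): a sequence of parameters pairs with positional placeholders, while a mapping pairs with named ones. The following is a minimal sketch of that distinction using the standard-library `sqlite3` driver rather than the hook itself. The in-memory table, its rows, and the `:start` parameter name are assumptions made for illustration, and other drivers may expect `%s` / `%(name)s` instead of `?` / `:name`.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date_column TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2022-12-31"), (2, "2023-01-01"), (3, "2023-06-15")],
)

# Positional placeholder ("?") paired with a sequence of parameters.
positional_rows = conn.execute(
    "SELECT id FROM my_table WHERE date_column >= ? ORDER BY id",
    ["2023-01-01"],
).fetchall()

# Named placeholder (":start") paired with a mapping of parameters.
named_rows = conn.execute(
    "SELECT id FROM my_table WHERE date_column >= :start ORDER BY id",
    {"start": "2023-01-01"},
).fetchall()

# The driver advertises the placeholder style it expects.
print(sqlite3.paramstyle)
print(positional_rows, named_rows)
```

When a query fails with a parameter-binding error, checking the driver's `paramstyle` against the placeholders in the SQL string is usually a good first step.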

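Polars Integration

Polars is a modern, high-performance DataFrame library implemented in Rust with Python bindings. It is designed for speed and efficiency when working with large datasets. The SQL hook supports retrieving data directly as Polars DataFrames, which is particularly beneficial for performance-critical data processing tasks.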

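# Get complete DataFrame in a single operation.
# A dict of parameters needs a named placeholder; the exact syntax depends on
# the database driver (pyformat style, as used by e.g. psycopg2, is assumed here).
df = hook.get_df(
    sql="SELECT * FROM my_table WHERE date_column >= %(date_column)s",
    parameters={"date_column": "2023-01-01"},
    df_type="polars",
)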

# Get DataFrame in chunks for memory-efficient processing of large results
for chunk_df in hook.get_df_by_chunks(sql="SELECT * FROM large_table", chunksize=10000, df_type="polars"):
    process_chunk(chunk_df)

To use this feature, install the polars extra when installing this provider package. For installation instructions, see <index>.
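The chunked examples keep memory use bounded because only one batch of rows is held at a time while a running result is accumulated across batches. The sketch below illustrates that pattern with the standard-library `sqlite3` module and `fetchmany`. It is not how the hook implements `get_df_by_chunks`, and the `iter_chunks` helper, the table contents, and the chunk size are hypothetical names and values chosen for the example.

```python
import sqlite3
from collections.abc import Iterator


def iter_chunks(cursor: sqlite3.Cursor, chunksize: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most `chunksize` rows from an executed cursor."""
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


# Hypothetical table with 25 rows standing in for a large result set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO large_table VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 26)],
)

cursor = conn.execute("SELECT id, amount FROM large_table ORDER BY id")

# Accumulate a running total so no chunk needs to be kept after it is processed.
chunk_sizes = []
total_amount = 0.0
for chunk in iter_chunks(cursor, chunksize=10):
    chunk_sizes.append(len(chunk))
    total_amount += sum(amount for _, amount in chunk)

print(chunk_sizes, total_amount)
```

A `process_chunk` function for DataFrame chunks can follow the same shape: reduce each chunk to a small summary, or write it out, and avoid collecting the chunks in a list, since that would bring the full result back into memory.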