airflow.providers.google.cloud.transfers.sql_to_gcs

Base operator for SQL to GCS operators.

Classes

BaseSQLToGCSOperator

Copy data from SQL to Google Cloud Storage in JSON, CSV, or Parquet format.

Module Contents

class airflow.providers.google.cloud.transfers.sql_to_gcs.BaseSQLToGCSOperator(*, sql, bucket, filename, schema_filename=None, approx_max_file_size_bytes=1900000000, export_format='json', stringify_dict=False, field_delimiter=',', null_marker=None, gzip=False, schema=None, parameters=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, upload_metadata=False, exclude_columns=None, partition_columns=None, write_on_empty=False, parquet_row_group_size=100000, **kwargs)

Bases: airflow.models.BaseOperator

Copy data from SQL to Google Cloud Storage in JSON, CSV, or Parquet format.

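Parameters:

sql (str) – The SQL query to execute.
bucket (str) – The bucket to upload to.
filename (str) – The filename to use as the object name when uploading to Google Cloud Storage. Include {} in the filename so that the operator can inject a file number when the dump is split into several files because of its size.
schema_filename (str | None) – If set, the object name to use when uploading a .json file containing the BigQuery schema fields for the table that was dumped from the database.
approx_max_file_size_bytes (int) – This operator can split large table dumps into multiple files (see the note in the filename parameter above). This parameter lets developers specify the approximate size of each split file. Check https://cloud.google.com/storage/quotas for the maximum file size allowed for a single object.
export_format (str) – Desired format of the exported files (json, csv, or parquet).
stringify_dict (bool) – Whether to dump dictionary-type objects (such as JSON columns) as strings. Applies only to the CSV and JSON export formats.
field_delimiter (str) – The field delimiter to use for CSV files.
null_marker (str | None) – The null marker to use for CSV files.
gzip (bool) – Whether to compress files on upload (does not apply to the schema file).
schema (str | list | None) – The schema to use, if any. Should be a list of dicts or a string: pass a string when using a Jinja template, otherwise pass a list of dicts. For examples, see https://cloud.google.com/bigquery/docs/schemas#specifying_a_json_schema_file
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.
parameters (dict | None) – A dictionary of parameters that are substituted into the query at execution time.
impersonation_chain (str | collections.abc.Sequence[str] | None) – (Optional) A service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account (templated).
upload_metadata (bool) – Whether to upload the row count as blob metadata.
exclude_columns (set | None) – Set of columns to exclude from the transfer.
partition_columns (list | None) – List of columns to use for file partitioning. To use this parameter, the dataset must be sorted by partition_columns, which you do by including an ORDER BY clause in the SQL query. Files are uploaded to GCS as objects with a hive-style partitioning directory structure (templated).
write_on_empty (bool) – Optional parameter that specifies whether to write a file when the export returns no rows. Defaults to False, so no file is written if the export returns no rows.
parquet_row_group_size (int) – The approximate number of rows in each row group when using the parquet format. A larger row group size can reduce file size and improve read performance, but requires more memory while the operator runs. (Default: 100000)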
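How filename and approx_max_file_size_bytes interact can be pictured with a small standalone sketch. It is not the operator's implementation: split_into_files is a hypothetical helper, rows are assumed to be written as newline-delimited JSON, and the size check is assumed to happen after each row, which is why the limit is approximate rather than a hard cap. A tiny limit is used so the split is visible.

```python
import json
from typing import Iterable


def split_into_files(
    rows: Iterable[dict],
    filename: str,
    approx_max_file_size_bytes: int,
) -> dict[str, bytes]:
    """Hypothetical helper: pack newline-delimited JSON rows into numbered files.

    The size is checked after each row is appended, so a file can overshoot
    the limit by up to one row.
    """
    files: dict[str, bytes] = {}
    file_no = 0
    buffer = bytearray()
    for row in rows:
        buffer += (json.dumps(row) + "\n").encode("utf-8")
        if len(buffer) >= approx_max_file_size_bytes:
            # The "{}" placeholder in `filename` receives the file number.
            files[filename.format(file_no)] = bytes(buffer)
            file_no += 1
            buffer = bytearray()
    if buffer:
        # Flush the remaining rows as the last file.
        files[filename.format(file_no)] = bytes(buffer)
    return files


rows = [{"id": i, "name": f"user{i}"} for i in range(5)]
parts = split_into_files(rows, "export/users_{}.json", approx_max_file_size_bytes=60)
print(sorted(parts))  # ['export/users_0.json', 'export/users_1.json']
```

Without {} in the name, str.format would return the same object name for every split, so later files would overwrite earlier ones.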

template_fields: collections.abc.Sequence[str] = ('sql', 'bucket', 'filename', 'schema_filename', 'schema', 'parameters', 'impersonation_chain',...

template_ext: collections.abc.Sequence[str] = ('.sql',)

template_fields_renderers

ui_color = '#a0e08c'

sql

bucket

filename

schema_filename = None

approx_max_file_size_bytes = 1900000000

export_format = ''

stringify_dict = False

field_delimiter = ','

null_marker = None

gzip = False

schema = None

parameters = None

gcp_conn_id = 'google_cloud_default'

impersonation_chain = None

upload_metadata = False

exclude_columns = None

partition_columns = None

write_on_empty = False

parquet_row_group_size = 100000
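Why partition_columns requires the query to be sorted (ORDER BY) can be illustrated with itertools.groupby, which only merges adjacent rows that share a key. This is a sketch under stated assumptions: hive_partition_groups is a hypothetical helper, rows are plain dicts, and each partition prefix is assumed to take the hive form col=value/; the exact object names the operator produces may differ.

```python
from itertools import groupby
from operator import itemgetter


def hive_partition_groups(rows: list[dict], partition_columns: list[str]) -> list[tuple[str, list[dict]]]:
    """Hypothetical helper: split consecutive rows into hive-style prefixes."""
    key = itemgetter(*partition_columns)
    groups: list[tuple[str, list[dict]]] = []
    # groupby only merges adjacent rows with equal keys, hence the ORDER BY requirement.
    for values, group in groupby(rows, key=key):
        if len(partition_columns) == 1:
            values = (values,)  # itemgetter returns a scalar for a single column
        prefix = "".join(f"{col}={val}/" for col, val in zip(partition_columns, values))
        groups.append((prefix, list(group)))
    return groups


rows = [
    {"country": "DE", "year": 2023, "amount": 1},
    {"country": "DE", "year": 2024, "amount": 2},
    {"country": "US", "year": 2024, "amount": 3},
    {"country": "DE", "year": 2024, "amount": 4},
]
ordered = hive_partition_groups(sorted(rows, key=itemgetter("country", "year")), ["country", "year"])
unordered = hive_partition_groups(rows, ["country", "year"])
print([prefix for prefix, _ in ordered])
# ['country=DE/year=2023/', 'country=DE/year=2024/', 'country=US/year=2024/']
print(len(unordered))  # 4: country=DE/year=2024/ shows up twice
```

With unsorted input the same partition is emitted more than once, which is the situation the ORDER BY requirement avoids.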

execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
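execute is where the export runs: the query results are written to files in the configured export_format and uploaded to the bucket. For export_format='csv', the effect of field_delimiter, null_marker, and stringify_dict on each row can be approximated with the standard csv module. This is a hedged sketch, not the body of execute. write_csv is a hypothetical helper, and two behaviors are assumptions drawn from the parameter descriptions: None is replaced by null_marker, and dicts are serialized with json.dumps when stringify_dict is set.

```python
import csv
import io
import json
from typing import Any, Optional, Sequence


def write_csv(
    header: Sequence[str],
    rows: Sequence[Sequence[Any]],
    field_delimiter: str = ",",
    null_marker: Optional[str] = None,
    stringify_dict: bool = False,
) -> str:
    """Hypothetical helper: render rows as CSV text using the operator-style options."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=field_delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        out = []
        for value in row:
            if value is None and null_marker is not None:
                value = null_marker  # explicit marker instead of an empty field
            elif isinstance(value, dict) and stringify_dict:
                value = json.dumps(value)  # dict/JSON column dumped as a JSON string
            out.append(value)
        writer.writerow(out)
    return buf.getvalue()


print(write_csv(["id", "name", "attrs"], [[1, None, {"a": 1}]],
                field_delimiter="|", null_marker="NULL", stringify_dict=True))
```

When null_marker is left as None, csv.writer writes None as an empty field, so a null and an empty string are indistinguishable. Setting an explicit marker avoids that ambiguity.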

convert_types(schema, col_type_dict, row)

Convert values from the DBAPI into formats suitable for output.
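A row-level conversion like this pairs each value with its column name and looks up that column's type before converting it. The following is a standalone sketch, assuming as the signature suggests that schema is the sequence of column names, col_type_dict maps column names to BigQuery type names, and per-value work is delegated to convert_type. Here convert_type is a hypothetical stub, and the real method may pass further keyword arguments to it.

```python
import datetime
from typing import Any, Optional, Sequence


def convert_type(value: Any, schema_type: Optional[str], **kwargs: Any) -> Any:
    """Hypothetical stub: render TIMESTAMP values as ISO 8601 strings, pass others through."""
    if schema_type == "TIMESTAMP" and isinstance(value, datetime.datetime):
        return value.isoformat()
    return value


def convert_types(schema: Sequence[str], col_type_dict: dict, row: Sequence[Any]) -> list:
    """Convert one DBAPI row, looking up each column's type by name."""
    # Columns missing from col_type_dict are converted with schema_type=None.
    return [convert_type(value, col_type_dict.get(name)) for name, value in zip(schema, row)]


row = (7, datetime.datetime(2024, 1, 2, 3, 4, 5))
print(convert_types(["id", "created"], {"id": "INTEGER", "created": "TIMESTAMP"}, row))
# [7, '2024-01-02T03:04:05']
```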

abstract query()

Execute the DBAPI query.
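Because query is abstract, each concrete subclass supplies the database-specific call. The pattern can be sketched without Airflow using abc and the standard-library sqlite3 driver. BaseExporter and SqliteExporter are hypothetical stand-ins rather than Airflow classes, and returning a DBAPI cursor (whose description carries the column metadata) is an assumption about what subclasses hand back.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class BaseExporter(ABC):
    """Hypothetical stand-in for the abstract base: subclasses must implement query()."""

    def __init__(self, sql: str, parameters: Optional[dict] = None) -> None:
        self.sql = sql
        self.parameters = parameters or {}

    @abstractmethod
    def query(self):
        """Execute the DBAPI query and return a cursor."""

    def column_names(self) -> list[str]:
        cursor = self.query()
        # PEP 249: cursor.description has one entry per column, name first.
        return [col[0] for col in cursor.description]


class SqliteExporter(BaseExporter):
    """Hypothetical concrete subclass backed by the sqlite3 driver."""

    def __init__(self, conn: sqlite3.Connection, sql: str, parameters: Optional[dict] = None) -> None:
        super().__init__(sql, parameters)
        self.conn = conn

    def query(self) -> sqlite3.Cursor:
        cursor = self.conn.cursor()
        # Named placeholders such as :min_id are bound from `parameters` at execution time.
        cursor.execute(self.sql, self.parameters)
        return cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
exporter = SqliteExporter(conn, "SELECT id, name FROM users WHERE id >= :min_id", {"min_id": 2})
print(exporter.query().fetchall())  # [(2, 'b')]
```

Keeping the driver call behind one abstract method lets the shared export logic in the base class stay database-agnostic.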

abstract field_to_bigquery(field)

Convert a DBAPI field into BigQuery schema format.
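Each subclass decides how its driver's column metadata maps onto BigQuery's JSON schema entries (name, type, mode). A list of such entries is the kind of content that a file named by schema_filename holds. This is a minimal sketch that assumes the field is a PEP 249 cursor.description entry. TYPE_MAP is a hypothetical, deliberately small mapping, since real drivers expose their own type codes, and every column is treated as NULLABLE here.

```python
import json
from typing import Any, Sequence

# Hypothetical mapping from driver type codes to BigQuery column types.
TYPE_MAP = {
    "int": "INTEGER",
    "float": "FLOAT",
    "bool": "BOOLEAN",
    "datetime": "TIMESTAMP",
    "str": "STRING",
}


def field_to_bigquery(field: Sequence[Any]) -> dict[str, str]:
    # PEP 249 description entry: (name, type_code, display_size, internal_size,
    # precision, scale, null_ok). Unknown type codes fall back to STRING.
    name, type_code = field[0], field[1]
    return {"name": name, "type": TYPE_MAP.get(type_code, "STRING"), "mode": "NULLABLE"}


description = [
    ("id", "int", None, None, None, None, False),
    ("created_at", "datetime", None, None, None, None, True),
    ("payload", "json", None, None, None, None, True),
]
schema = [field_to_bigquery(f) for f in description]
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```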

abstract convert_type(value, schema_type, **kwargs)

Convert a value from the DBAPI into a format suitable for output.
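Concrete subclasses implement convert_type for their driver's value types. The rules below sketch what such an implementation might do when the goal is JSON-serializable output:

- Decimal becomes a string for NUMERIC/BIGNUMERIC columns, to keep precision, and a float otherwise.
- Dates and times become ISO 8601 strings.
- bytes become base64.
- dicts become JSON strings only when stringify_dict=True.

These rules, and the use of schema_type, are illustrative assumptions, not the behavior of any particular subclass.

```python
import base64
import datetime
import decimal
import json
from typing import Any, Optional


def convert_type(value: Any, schema_type: Optional[str], **kwargs: Any) -> Any:
    """Illustrative per-value conversion into JSON-friendly Python values."""
    if value is None:
        return None
    if isinstance(value, decimal.Decimal):
        # Keep exact digits for NUMERIC-like columns, otherwise fall back to float.
        return str(value) if schema_type in ("NUMERIC", "BIGNUMERIC") else float(value)
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, bytes):
        return base64.standard_b64encode(value).decode("ascii")
    if isinstance(value, dict) and kwargs.get("stringify_dict"):
        return json.dumps(value)
    return value


print(convert_type(decimal.Decimal("1.50"), "NUMERIC"))  # 1.50
print(convert_type(b"hi", "BYTES"))  # aGk=
```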