摘要:
pg的词法分析和语法分析使用的技术与mysql相同,此法分析使用bison,语法分析使用了flex。之后ANSI SQL句子被转换为查询树。

(图片来源网络,侵删)
Query 结构体保存了文本类型的 SQL,经过语法分析后的分析结果,Query 结构体的成员与 SOL语句的各个子句基本一一对应。SOL语法分析主要是把一条完整的文本字符串分解为查询树上具有不同含义的 Ouery结构体的成员,以便于程序理解SOL语句。如程序判断 hasSubLinks 值为 TRUE,则表示 SOL 语句中含有子链接操作。
本文对查询树进行一定的分析。

(图片来源网络,侵删)
相关上下文: 2024-05-08 postgres-调试及分析-记录-CSDN博客
原始的查询SQL:
SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a WHERE t2.b函数调用堆栈:
(gdb) bt #0 create_scan_plan (root=0x2ce5808, best_path=0x2cfb508, flags=2) at createplan.c:558 #1 0x00000000007dbaa4 in create_plan_recurse (root=0x2ce5808, best_path=0x2cfb508, flags=2) at createplan.c:410 #2 0x00000000007e2777 in create_mergejoin_plan (root=0x2ce5808, best_path=0x2cfbee8) at createplan.c:4376 #3 0x00000000007dc862 in create_join_plan (root=0x2ce5808, best_path=0x2cfbee8) at createplan.c:1067 #4 0x00000000007dbac0 in create_plan_recurse (root=0x2ce5808, best_path=0x2cfbee8, flags=1) at createplan.c:415 #5 0x00000000007db9ba in create_plan (root=0x2ce5808, best_path=0x2cfbee8) at createplan.c:346 #6 0x00000000007ecfbf in standard_planner (parse=0x2c16948, query_string=0x2c15718 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:407 #7 0x00000000007ecd2a in planner (parse=0x2c16948, query_string=0x2c15718 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:271 #8 0x000000000090572b in pg_plan_query (querytree=0x2c16948, query_string=0x2c15718 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847 #9 0x0000000000905869 in pg_plan_queries (querytrees=0x2ce5728, query_string=0x2c15718 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:939 #10 0x0000000000905bc0 in exec_simple_query (query_string=0x2c15718 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a;") at postgres.c:1133 #11 0x000000000090a0ef in PostgresMain (argc=1, argv=0x7fff70e94eb0, dbname=0x2c42428 "d1", username=0x2c10a58 "kevin") at postgres.c:4496 #12 0x0000000000857a54 in BackendRun (port=0x2c39e30) at postmaster.c:4530 #13 0x00000000008573c1 in BackendStartup (port=0x2c39e30) at postmaster.c:4252 #14 0x0000000000853b10 in ServerLoop () at postmaster.c:1745 #15 0x00000000008533c9 in PostmasterMain (argc=1, argv=0x2c0ea10) at postmaster.c:1417 #16 0x0000000000760270 in main (argc=1, argv=0x2c0ea10) at main.c:209查询树Query的结构体:
查询树(Query)又称语法分析树SOL语法分析模块对文本类型的 SOL 语句进行词法分析和语法分析后,得到的语法树。语法分析用于检查 SOL语句的语法(格式)是否正确:把SOL语句按语法进行格式分解(P的语法规则参见 src/backend/parser/gram.y文件)即可转化为查询树。
初步得到的查询树,还将进行语义处理。语义处理是检查 SOL语句中的对象是否符合数据库的用户模式(如表定义)以及对象表达的意义是否有效(如对一个数值型列做LIKE操作是无意义的)等。
查询语句通过语法分析器形成查询树传给查询优化器的planner 函数进行优化。
/***************************************************************************** * Query Tree *****************************************************************************/ /* * Query - * Parse analysis turns all statements into a Query tree * for further processing by the rewriter and planner. * * Utility statements (i.e. non-optimizable statements) have the * utilityStmt field set, and the rest of the Query is mostly dummy. * * Planning converts a Query tree into a Plan tree headed by a PlannedStmt * node --- the Query structure is not used by the executor. */ typedef struct Query { NodeTag type; CmdType commandType; /* select|insert|update|delete|utility */ QuerySource querySource; /* where did I come from? */ uint64 queryId; /* query identifier (can be set by plugins) */ bool canSetTag; /* do I set the command result tag? */ Node *utilityStmt; /* non-null if commandType == CMD_UTILITY */ int resultRelation; /* rtable index of target relation for * INSERT/UPDATE/DELETE; 0 for SELECT */ bool hasAggs; /* has aggregates in tlist or havingQual */ bool hasWindowFuncs; /* has window functions in tlist */ bool hasTargetSRFs; /* has set-returning functions in tlist */ bool hasSubLinks; /* has subquery SubLink */ bool hasDistinctOn; /* distinctClause is from DISTINCT ON */ bool hasRecursive; /* WITH RECURSIVE was specified */ bool hasModifyingCTE; /* has INSERT/UPDATE/DELETE in WITH */ bool hasForUpdate; /* FOR [KEY] UPDATE/SHARE was specified */ bool hasRowSecurity; /* rewriter has applied some RLS policy */ bool isReturn; /* is a RETURN statement */ List *cteList; /* WITH list (of CommonTableExpr's) */ List *rtable; /* list of range table entries */ FromExpr *jointree; /* table join tree (FROM and WHERE clauses) */ List *targetList; /* target list (of TargetEntry) */ OverridingKind override; /* OVERRIDING clause */ OnConflictExpr *onConflict; /* ON CONFLICT DO [NOTHING | UPDATE] */ List *returningList; /* return-values list (of TargetEntry) */ List *groupClause; /* a list of SortGroupClause's */ bool groupDistinct; /* is the group by clause distinct? */ List *groupingSets; /* a list of GroupingSet's if present */ Node *havingQual; /* qualifications applied to groups */ List *windowClause; /* a list of WindowClause's */ List *distinctClause; /* a list of SortGroupClause's */ List *sortClause; /* a list of SortGroupClause's */ Node *limitOffset; /* # of result tuples to skip (int8 expr) */ Node *limitCount; /* # of result tuples to return (int8 expr) */ LimitOption limitOption; /* limit type */ List *rowMarks; /* a list of RowMarkClause's */ Node *setOperations; /* set-operation tree if this is top level of * a UNION/INTERSECT/EXCEPT query */ List *constraintDeps; /* a list of pg_constraint OIDs that the query * depends on to be semantically valid */ List *withCheckOptions; /* a list of WithCheckOption's (added * during rewrite) */ /* * The following two fields identify the portion of the source text string * containing this query. They are typically only populated in top-level * Queries, not in sub-queries. When not set, they might both be zero, or * both be -1 meaning "unknown". */ int stmt_location; /* start location, or -1 if unknown */ int stmt_len; /* length in bytes; 0 means "rest of string" */ } Query;在执行堆栈中打印Query查询树:
(gdb) f 7 #7 0x00000000007ecd2a in planner (parse=0x134b9a8, query_string=0x134a598 "SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a WHERE t2.b修改配置文件 postgresql.conf 日志记录parse
1. 打开parse及相关的日志
debug_print_parse = on debug_print_rewritten = on debug_print_plan = on debug_pretty_print = on2. 日志文件
logging_collector = on # Enable capturing of stderr and csvlog log_directory = 'log' # directory where log files are written, log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, log_file_mode = 0600 # creation mode for log files,3. 日志文件位置
- 在 $PGDATA/log_directory 目录下
- $PGDATA为初始化时设置的data目录
parse后的Query结构体的内容:
2024-05-08 04:04:40.223 EDT [42965] LOG: parse tree: 2024-05-08 04:04:40.223 EDT [42965] DETAIL: {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :isReturn false :cteList :rtable ( {RTE :alias :eref {ALIAS :aliasname t1 :colnames ("a" "b") } :rtekind 0 :relid 16385 :relkind r :rellockmode 1 :tablesample :lateral false :inh true :inFromCl true :requiredPerms 2 :checkAsUser 0 :selectedCols (b 8 9) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals } {RTE :alias :eref {ALIAS :aliasname t2 :colnames ("a" "b") } :rtekind 0 :relid 16388 :relkind r :rellockmode 1 :tablesample :lateral false :inh true :inFromCl true :requiredPerms 2 :checkAsUser 0 :selectedCols (b 8 9) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals } {RTE :alias :eref {ALIAS :aliasname unnamed_join :colnames ("a" "b" "a" "b") } :rtekind 2 :jointype 1 :joinmergedcols 0 :joinaliasvars ( {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location -1 } {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location -1 } {VAR :varno 2 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 1 :location -1 } {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 2 :location -1 } ) :joinleftcols (i 1 2) :joinrightcols (i 1 2) :join_using_alias :lateral false :inh false :inFromCl true :requiredPerms 0 :checkAsUser 0 :selectedCols (b) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals } ) :jointree {FROMEXPR :fromlist ( {JOINEXPR :jointype 1 :isNatural false :larg {RANGETBLREF :rtindex 1 } :rarg {RANGETBLREF :rtindex 2 } :usingClause :join_using_alias :quals {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 2 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 1 :location 34 } {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 41 } ) :location 39 } :alias :rtindex 3 } ) :quals {OPEXPR :opno 97 :opfuncid 66 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 2 :location 52 } {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 59 :constvalue 4 [ 5 0 0 0 0 0 0 0 ] } ) :location 57 } } :targetList ( {TARGETENTRY :expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 7 } :resno 1 :resname a :ressortgroupref 0 :resorigtbl 16385 :resorigcol 1 :resjunk false } {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 7 } :resno 2 :resname b :ressortgroupref 0 :resorigtbl 16385 :resorigcol 2 :resjunk false } {TARGETENTRY :expr {VAR :varno 2 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 1 :location 7 } :resno 3 :resname a :ressortgroupref 0 :resorigtbl 16388 :resorigcol 1 :resjunk false } {TARGETENTRY :expr {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 2 :varattnosyn 2 :location 7 } :resno 4 :resname b :ressortgroupref 0 :resorigtbl 16388 :resorigcol 2 :resjunk false } ) :override 0 :onConflict :returningList :groupClause :groupDistinct false :groupingSets :havingQual :windowClause :distinctClause :sortClause :limitOffset :limitCount :limitOption 0 :rowMarks :setOperations :constraintDeps :withCheckOptions :stmt_location 0 :stmt_len 60 } 2024-05-08 04:04:40.223 EDT [42965] STATEMENT: SELECT * FROM t1 LEFT JOIN t2 ON t2.a = t1.a WHERE t2.b