Tuesday, March 22, 2016

Native Compilation Part-2 @ PGDay Asia 2016 Conference

The first PGDay Asia, held in Singapore, has finally concluded successfully. I was one of the organizers of this conference; it took months of hard work by many people, especially Sameer Kumar, to make it happen.

I also had the opportunity to present my paper on Native Compilation Part-2, a follow-up to the paper I presented at PGCon 2015 (procedure compilation is the new addition).

Summary of the talk (for full details, see Native Compilation at PGDay Asia 2016):

Why Native Compilation Is Required:

Looking at current hardware trends, I/O is no longer the bottleneck for the executor, mainly for the reasons below:
1. Increase in RAM size
2. Prevalence of high-speed SSDs
At the same time, though a lot of progress has happened on the hardware side, there has been little focus on improving the executor's CPU efficiency. Because of this, the current executor's biggest bottleneck is CPU efficiency, not I/O. So in order to tackle the CPU bottleneck without compromising on any feature, we need a mechanism that executes fewer instructions while still providing all the functionality.


What is Native Compilation:

Native compilation is a methodology to reduce CPU instructions by executing only the instructions specific to a given query/object, unlike interpreted execution. The steps are (a minimal sketch of steps 2 and 3 follows the list):
 1. Generate C code specific to the object/query.
 2. Compile the C code to generate a DLL and load it with the server executable.
 3. Call the specialized function instead of the generalized function.
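To make steps 2 and 3 concrete, below is a minimal sketch in C, assuming POSIX dlopen()/dlsym() for the loading part. The file name gen_tab_access.c and the function name slot_deform_mytab are purely hypothetical, and PostgreSQL itself would go through its own dynamic-loader routines rather than raw dlopen():

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Signature of the hypothetical generated access function. */
    typedef void (*deform_fn)(void *tuple, void *values, void *isnull);

    int main(void)
    {
        /* Step 2: compile the generated C file into a shared library (DLL). */
        if (system("cc -O2 -fPIC -shared -o gen_tab_access.so gen_tab_access.c") != 0)
            return 1;

        /* Load the DLL into the running process. */
        void *dll = dlopen("./gen_tab_access.so", RTLD_NOW);
        if (dll == NULL)
        {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        /* Step 3: look up the specialized function and call it
         * instead of the generic one. */
        deform_fn deform = (deform_fn) dlsym(dll, "slot_deform_mytab");
        if (deform == NULL)
        {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            return 1;
        }
        /* ... deform(tuple, values, isnull) wherever tuples are accessed ... */

        dlclose(dll);
        return 0;
    }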

Native compilation can be applied to any entity that is inefficient in terms of CPU, e.g.:
 1. Table
 2. Procedure
 3. Query

Table Native Compilation (a.k.a. Schema Binding):

Since most properties of a table remain the same once it is created, its data gets stored and accessed in the same pattern irrespective of the actual data. So instead of accessing tuples of a relation in a generic way, we create specialized access functions for that particular relation during its creation, and the same get used by every further query containing that table. This approach eliminates various comparisons (e.g. data-type checks, length checks, number-of-attributes checks) during tuple access. The more the deforming of tuples contributes to the total runtime of a query, the more performance improvement will be observed. Generation of the code can be done in 3 ways; details of each of them are available in the presentation, and a simplified sketch of the idea is shown below.
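The following toy C sketch contrasts generic and specialized tuple deforming. It is only an illustration of the idea, with a made-up tuple layout (header, null bitmap, and alignment are all elided) for a hypothetical table mytab(a int4 NOT NULL, b int4 NOT NULL), not the prototype's actual generated code:

    #include <stdint.h>
    #include <string.h>

    /* Toy tuple: just a payload; real heap tuples carry a header,
     * a null bitmap, and alignment padding. */
    typedef struct { char data[8]; } Tuple;

    /* Generic deforming: loop over the attributes, looking up the
     * per-attribute length (and, in reality, type, alignment, and
     * null bitmap) at run time for every tuple. */
    void slot_deform_generic(const Tuple *tup, const int16_t *attlen,
                             int natts, int32_t *values)
    {
        const char *p = tup->data;
        for (int i = 0; i < natts; i++)        /* per-attribute branching */
        {
            memcpy(&values[i], p, attlen[i]);  /* length fetched at run time */
            p += attlen[i];
        }
    }

    /* Specialized (schema-bound) deforming for mytab: the schema is known
     * at code-generation time, so offsets and lengths become constants and
     * the loop, the length lookups, and the null checks all disappear. */
    void slot_deform_mytab(const Tuple *tup, int32_t *values)
    {
        memcpy(&values[0], tup->data + 0, sizeof(int32_t));  /* column a */
        memcpy(&values[1], tup->data + 4, sizeof(int32_t));  /* column b */
    }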


Fig-1: Table compilation


Once a CREATE TABLE command is issued, a C file with all the specialized access functions is generated, which in turn gets compiled and loaded as a DLL. These loaded functions are then used by every SQL query accessing the compiled table.

Performance Improvement:

Below is the performance improvement observed on the standard TPC-H benchmark.
The system configuration was as follows:
SUSE Linux Enterprise Server 11 (x86_64), 2 sockets, 10 cores per socket
TPC-H configuration: default

Graph-1: TPC-H Performance

Table-1: TPC-H Performance 



From the above performance results, we observed a more than 70% reduction in CPU instructions and up to a 36% performance improvement on TPC-H queries.

Procedure Compilation:

There are two kinds of statements in any procedure:
 1. SQL query statements
 2. Non-SQL statements (e.g. loops, conditionals, simple expressions, etc.)

In this presentation, we focused only on the 2nd part. Whenever a procedure is created, we compile the PL/pgSQL function into a corresponding plain C function. While compiling:
 1. Transform the PL/pgSQL procedure signature into a normal C function signature.
 2. Transform all variable declarations into C variable declarations.
 3. Transform all other non-SQL statements into corresponding C statements.
 4. Transform all SQL statements into SPI_xxx calls, as below:
    if (plan == NULL)
    {
        SPIPlanPtr stmt;

        /* Prepare the query once and cache the plan for later executions. */
        stmt = SPI_prepare(modified_query, nargs, argtypes);
        SPI_keepplan(stmt);
        plan = stmt;
    }
    SPI_execute_plan(plan, values, nulls, false, row_count);
 5. Finally, compile the generated C function using a traditional compiler to generate a DLL, and link it with the server executable.
 6. On subsequent executions, the C function is called directly instead of the PL/pgSQL function (a hedged end-to-end sketch follows).
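As an end-to-end illustration of steps 1-6, here is a sketch, with made-up names, of a small PL/pgSQL function and the C function such a compiler might emit for it. The prototype's real generated code is not public, so this only shows the shape of the transformation using the standard fmgr/SPI interfaces:

    /* Hypothetical source procedure:
     *
     *   CREATE FUNCTION count_small(lim int) RETURNS bigint AS $$
     *   DECLARE
     *       n bigint;
     *   BEGIN
     *       SELECT count(*) INTO n FROM mytab WHERE a < lim;
     *       RETURN n;
     *   END;
     *   $$ LANGUAGE plpgsql;
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "executor/spi.h"
    #include "catalog/pg_type.h"

    PG_MODULE_MAGIC;
    PG_FUNCTION_INFO_V1(count_small_compiled);

    static SPIPlanPtr plan = NULL;          /* step 4: plan cached across calls */

    Datum
    count_small_compiled(PG_FUNCTION_ARGS)  /* step 1: C signature */
    {
        int32 lim = PG_GETARG_INT32(0);
        int64 n = 0;                        /* step 2: DECLARE -> C declaration */
        Datum values[1];
        char  nulls[1] = {' '};
        bool  isnull;

        SPI_connect();
        if (plan == NULL)                   /* step 4: prepare once, keep plan */
        {
            Oid argtypes[1] = {INT4OID};
            SPIPlanPtr stmt = SPI_prepare("SELECT count(*) FROM mytab WHERE a < $1",
                                          1, argtypes);
            SPI_keepplan(stmt);
            plan = stmt;
        }
        values[0] = Int32GetDatum(lim);
        SPI_execute_plan(plan, values, nulls, true, 0);
        if (SPI_processed > 0)
            n = DatumGetInt64(SPI_getbinval(SPI_tuptable->vals[0],
                                            SPI_tuptable->tupdesc, 1, &isnull));
        SPI_finish();

        PG_RETURN_INT64(n);                 /* step 6: invoked directly via fmgr */
    }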


Fig-2: Procedure Compilation

Fig-3: Compiled Procedure invocation

Fig-2 highlights at which step, and how, the procedure gets compiled. Once parsing of the procedure is done, all the information about it is available through a PLpgSQL_function pointer, which we can traverse statement by statement (much as the PL/pgSQL interpreter does) and generate the corresponding C code for each statement.
Fig-3 explains how the compiled function is invoked.

How the number of instructions is reduced:

Consider a statement such as x := 2 + y in a PL/pgSQL procedure. This statement is executed as a traditional SELECT command, i.e. "SELECT 2 + y". So every time the procedure runs, this query is executed as if it were a normal SQL query, and hence many CPU instructions are spent on it.
But if we convert this to a C statement, it executes directly as 2 + y and hence takes very few instructions compared to the original.
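As a hedged sketch (the function name is made up, and the real generated code may differ), the compiled form of such an assignment is just an ordinary C expression behind PostgreSQL's standard V1 calling convention:

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;
    PG_FUNCTION_INFO_V1(bump_compiled);

    /* Hypothetical generated code for "x := 2 + y": the interpreter would
     * evaluate this as the query "SELECT 2 + y" through the full SQL
     * machinery on every call; here it is a single integer addition. */
    Datum
    bump_compiled(PG_FUNCTION_ARGS)
    {
        int32 y = PG_GETARG_INT32(0);
        int32 x = 2 + y;        /* no parser, planner, or executor involved */

        PG_RETURN_INT32(x);
    }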

Performance Improvement:

Below is the performance improvement observed on the standard TPC-C benchmark.
The system configuration was as follows:
SUSE Linux Enterprise Server 11 (x86_64), 2 sockets, 10 cores per socket
TPC-C configuration: runMins=0, runTxnsPerTerminal=200000
checkpoint_segments = 100

Graph-2: TPC-C Performance

Table-2: TPC-C Performance

With just this basic compilation of procedures, we were able to get around a 23% performance improvement.

Query Compilation:

Though this was not part of my presentation, I would like to give some details on how to do it.
Unlike the above two kinds of compilation, in order to compile a query we need to use LLVM, because the compilation time is very important when it happens during query execution.
LLVM is an open-source compilation framework for very efficient and fast compilation. The LLVM compiler only understands IR (Intermediate Representation, something similar to assembly code). So before compiling with LLVM, we need to generate the IR form of the code, which can be done in either of two ways:
1. Generate C code and produce IR from it using Clang (LLVM's C front-end), or
2. Use the IR-Builder interface functions provided by LLVM to generate the code directly in IR form (a small sketch of this follows the steps below).
The steps to compile a query are as follows:
1. Generate the query plan as usual.
2. Choose the plan node(s) to be compiled (if the whole query is to be compiled, then all nodes) and generate specialized code for each such node, in either of the ways explained above.
3. Compile the generated code using LLVM.
4. The LLVM compilation returns an object handle, which can be attached to the corresponding node.
5. While executing a node, call the compiled object if one is stored for that node.
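Below is a minimal sketch of option 2, building IR for the earlier 2 + y expression directly with LLVM's C API (llvm-c/Core.h). Hooking the result into a plan node (steps 3-5) would additionally involve LLVM's JIT/execution-engine API, which is omitted here:

    /* Build with: cc irdemo.c $(llvm-config --cflags --ldflags --libs core) */
    #include <llvm-c/Core.h>

    int main(void)
    {
        /* Emit IR for: int32 add_two(int32 y) { return 2 + y; } */
        LLVMModuleRef  mod = LLVMModuleCreateWithName("compiled_expr");
        LLVMTypeRef    argtys[1] = { LLVMInt32Type() };
        LLVMTypeRef    fnty = LLVMFunctionType(LLVMInt32Type(), argtys, 1, 0);
        LLVMValueRef   fn = LLVMAddFunction(mod, "add_two", fnty);
        LLVMBuilderRef b  = LLVMCreateBuilder();

        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
        LLVMValueRef sum = LLVMBuildAdd(b, LLVMConstInt(LLVMInt32Type(), 2, 0),
                                        LLVMGetParam(fn, 0), "sum");
        LLVMBuildRet(b, sum);

        LLVMDumpModule(mod);        /* print the generated IR */
        LLVMDisposeBuilder(b);
        LLVMDisposeModule(mod);
        return 0;
    }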

Most of the benefit is expected from this query compilation. Once query compilation is done, even the SQL statements of a procedure can be compiled in the same way (perhaps nothing explicit will be required for this).

Conclusion:

Following the industry trend, we implemented two kinds of specialization, which resulted in up to 36% and 23% performance improvement on the standard TPC-H and TPC-C benchmarks respectively.


Any feedback/comments/queries are welcome.

Comments:

  1. Very interesting topic.
    Oracle has done something similar from 9i onwards; they called it NCOMP.
    http://www.oracle.com/technetwork/database/features/plsql/ncomp-faq-087606.html
    Up to 10g they used the C compiler on the system; the process was the same: PL -> C -> shared library.
    But from 11g onwards they ship a built-in compiler which is capable of directly converting PL code to native code.
    But the second part (query compilation) is something unique. That may add great value to Postgres.

    1. Thanks for replying.
      Yes, Oracle also does something similar. Any idea how much performance improvement was observed in Oracle with this approach?
      SQL Server 2014 (Hekaton) also does procedure compilation, along with the SQL queries inside procedures.

    2. Oracle's own internal benchmarks claim a 10 to 14 times reduction in "CPU time", even though the gain on the total time spent in the entire block of PL code was not that great.
      (Personally I don't trust their internal benchmarks because of past experiences.)
      Here I see an independent benchmark:
      http://logicalread.solarwinds.com/oracle-11g-native-compilation-mc02/#.VvS6K-J97Dc

      Moreover, PL code improvement is not so important in today's world, where far less code is developed in PL. There is a great shift towards object-oriented languages and frameworks for application development.
      The pre-optimized, stored execution plan is the great advantage of Oracle. Queries with unstable execution plans can simply be "profile"d. Your proposal for query compilation opens the door for the same.

    3. You are right. 10-15 times does not seem possible unless other optimizations are applied on top of native compilation.
      The performance readings in the independent benchmark you shared seem more reasonable.
      I got around a 23% improvement on standard TPC-C (though I feel it can be optimized further).
      Yes, query compilation along with LLVM is going to be a major technology shift, bringing performance improvements in multiples of X.

  2. Is this already implemented in Postgres?

    1. It is not yet implemented in core PostgreSQL.
      All experimentation/prototyping was done on PostgreSQL.

  3. Where is the source code and the test harness you used to test it?

    1. The source code has not been open-sourced yet. We have just a prototype to judge the potential of this technology, and we are happy with the way it has turned out.
      We used the standard TPC-H and TPC-C benchmarks for all of our testing.
      Please let me know if this answers your question.

    2. Not exactly. Running TPC-* is not a trivial setup, so "the test harness" would include steps to reproduce your results when pointed at a stock PostgreSQL database.

      You'll get more people interested in the code if you make it available as a patch against stock PostgreSQL. As it is now, people have no way to test it. If there's some issue around that, you might want to give external testers a way to get at an endpoint.

  4. Good to see my post being cited by Bruce Momjian (one of the core members of PostgreSQL) in his recent post:
    http://momjian.us/main/blogs/pgblog/2016.html#April_1_2016
