Available in versions: Dev (3.19) | Latest (3.18) | 3.17 | 3.16 | 3.15 | 3.14 | 3.13 | 3.12 | 3.11 | 3.10 | 3.9
Throttling
Applies to ✅ Open Source Edition ✅ Express Edition ✅ Professional Edition ✅ Enterprise Edition
When importing large data sets, it may be beneficial to explicitly define the optimal size for each:
- Bulk size: The number of rows that are sent to the server in one SQL statement. Defaults to 1.
- Batch size: The number of statements that are sent to the server in one JDBC statement batch. Defaults to 1.
- Commit size: The number of statement batches that are committed in one transaction. Defaults to 1.
All of the three types of throttling can be combined.
Not all RDBMS offer the same optimisation capabilities. Please refer to your database manual to learn how these tuning capabilities may affect your data import performance. Also, actual measurements may help improve these numbers. Do not optimise prematurely, or based on assumptions. Always measure if your optimisation has the desired effect!
The bulk size
Bulk data processing is important in SQL, which is a set based language. It is possible to express bulk INSERT
statements as well with INSERT .. VALUES or INSERT .. SELECT. Sophisticated RDBMS may use these statements to improve the disk block allocation process, because if an INSERT
statement has more than 1 row, the optimiser knows better how much space will be needed to store the incoming data. This approach also helps "clustering" inserted data (keeping data that is inserted at the same time in local disk block "clusters"), which may be beneficial if the same data is also frequently read, later on. While the benefit may be marginal on SSDs or other random access disks, it may be significant on HDDs.
There are 3 possible, mutually exclusive configurations of specifying the bulk size:
create.loadInto(BOOK) // Put all the data in a single bulk statement. .bulkAll() // Put up to 32 rows in a single bulk statement. .bulkAfter(32) // Do not put more than 1 row in a statement. .bulkNone() .loadCSV(inputstream) .fields(BOOK.ID, BOOK.AUTHOR_ID, BOOK.TITLE) .execute();
The batch size
Batch data processing allows for reducing the network traffic overhead, because it allows the JDBC driver to buffer bind values for several subsequent statement executions and send them all in one go.
There are 3 possible, mutually exclusive configurations of specifying the batch size:
create.loadInto(BOOK) // Execute all statements (bulk or not) in a single large statement batch. .batchAll() // Put up to 32 statements (bulk or not) in a single statement batch. .batchAfter(32) // Execute each statement (bulk or not) individually. .batchNone() .loadCSV(inputstream) .fields(BOOK.ID, BOOK.AUTHOR_ID, BOOK.TITLE) .execute();
The commit size
Committing a transaction can be a costly operation if done too often, or not often enough. If there are too many commits, this can lead to a lot of logging overhead on the server. If a too many changes are left uncommitted for too long, there may be too much locking in 2PL transaction models, or log contention in MVCC transaction models. An empirically discovered, optimal commit size that leads to committing e.g. 1000 rows (or 10000, or 100, please measure what works best for you) may produce best results.
There are 3 possible, mutually exclusive configurations of specifying the batch size:
create.loadInto(BOOK) // Commit all statements (batch, bulk, or not) in a single large transaction. .commitAll() // Put up to 32 statements (batch, bulk, or not) in a transaction. .commitAfter(32) // Commit each statement (batch, bulk, or not) in a transaction, just like commitAfter(1) .commitEach() // Do not commit any statement, leave committing to client code .commitNone() .loadCSV(inputstream) .fields(BOOK.ID, BOOK.AUTHOR_ID, BOOK.TITLE) .execute();
Feedback
Do you have any feedback about this page? We'd love to hear it!