Skip to content

Improve performance for tables with composite primary keys#1686

Open
coding-chimp wants to merge 4 commits into
github:masterfrom
Shopify:composite-union
Open

Improve performance for tables with composite primary keys#1686
coding-chimp wants to merge 4 commits into
github:masterfrom
Shopify:composite-union

Conversation

@coding-chimp
Copy link
Copy Markdown
Contributor

A Pull Request should be associated with an Issue.

Related issue: #1669

Description

This PR introduces an optimized query pattern for tables with composite primary keys. As described in the linked issue, gh-ost's queries against these tables can currently be very slow. We've seen a single query take up to 16 minutes, resulting in slow migrations that might not even finish when gh-ost falls behind in binlog streaming. The new query pattern has a constant query time (~30ms in our testing) regardless of the table's data shape. We're only introducing it for two-column composite keys because the complexity would grow exponentially with more columns.

We're introducing the new query pattern in BuildRangeInsertQuery, BuildUniqueKeyRangeEndPreparedQueryViaOffset, BuildUniqueKeyRangeEndPreparedQueryViaTemptable.

I've tested the changes against real data on a test DB, and the migration took ~12h to complete. In comparison, I'm currently running the same migration again without these changes, and it has only reached 15.6% after 22h. The current ETA is 87h.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Comment thread go/sql/builder.go
col2StartOp := string(minRangeComparisonSign)
fromClause := fmt.Sprintf("%s.%s force index (%s)", databaseName, originalTableName, uniqueKey)

if sameFirstColumnValue(rangeStartArgs, rangeEndArgs) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sameFirstColumnValue reports whether the first column's range start and end bounds are equal. When true, the range fits within a single col1 partition, and a simpler query must replace the 3-part UNION.

It's not just a performance shortcut, it's a correctness requirement. Look at what the UNION produces when col1_start == col1_end:

  • Part 1: col1 = x AND col2 > start_col2 - no upper bound on col2, so it includes all rows, including rows above end_col2
  • Part 2: col1 > x AND col1 < x - returns nothing
  • Part 3: col1 = x AND col2 <= end_col2 - no lower bound, so it includes rows below start_col2

The UNION ALL result is essentially every row where col1 = x. The outer ORDER BY ... LIMIT 1 OFFSET chunkSize-1 then picks the Nth row, which is the wrong boundary.

The simplified query fixes this:

WHERE col1 = ? AND col2 > ? AND col2 <= ?

Both bounds are explicit, and because col1 is pinned with equality, MySQL can use the composite index directly as a seek on (col1, col2).

Comment thread go/sql/builder.go
return result, explodedArgs, nil
}

func buildUniqueKeyRangeEndTwoColumnViaTemptable(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UNION optimization is arguably unnecessary for the ViaTemptable code path, since it'll only be called once at the end of the migration, when the query is unlikely to be slow. However, it provides consistency with ViaOffset and doesn't add much code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant