View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
4717 | Composr | core | public | 2021-11-18 01:00 | 2022-09-12 14:04 |
Reporter | Chris Graham | Assigned To | Guest | ||
Priority | normal | Severity | feature | ||
Status | new | Resolution | open | ||
Summary | 4717: GUIDs instead of auto-incrementing IDs | ||||
Description | In order to have content running across a geo-distributed cloud, i.e. outside of a single server farm, we need to move from auto-increment IDs. Otherwise to avoid ID conflicts we have to rely on a single shared database with high latency times. (Actually there is a fudge, to set different base values for auto-increment - but this is a horrible hack in my opinion and a recipe for big headaches). If we do this, it makes sense to standardise on the notion that most content types will have: GUID, Title, Moniker. So galleries for example would no longer have a 'name' field, that'd be replaced with the GUID too. Implementing this issue also allows syndicating content from one site to another, e.g. from a staging site to a live site. | ||||
Tags | Risk: Core rearchitecting , Roadmap: Over the horizon, Type: Cloudification, Type: Cross-cutting feature | ||||
Attach Tags | |||||
Time estimation (hours) | 32 | ||||
Sponsorship open | |||||
related to | 3046 | Not Assigned | Guest | Composr | Drop 32-bit support (on hold) |
related to | 4718 | Not Assigned | Guest | Composr | High quality staging server support (content push) |
related to | 3549 | Resolved | PDStig | Composr | Don't rely on ID sorting for these tables |
child of | 3792 | Not Assigned | Guest | Composr website (compo.sr) | Host on geo-distributed ARM cluster |
|
Also, look at the table_id_locking_start/table_id_locking_end functions. These will no longer be needed with UUIDs. |
|
We don't actually need to use a UUID (aka GUID) per se. I've done a deep dive into unique identifier formats. Features commonly available in different formats: 1) Sortable by time to establish (potentially) sub-millisecond order 2) Massive pool of possible values to reduce chance of collision at a global scale (e.g. for a social network, or otherwise large networks of machines communicating with each other across a common ID domain) 3) Identification of machine creating ID 4) Secure booking of IDs by a ticketing server if we do not trust nodes to generate appropriate IDs on their own 5) Speed of generation 6) Efficient insertion into B-tree indexes, always at end of latest record block 7) Efficient retrieval of recent records, always in the same few record block(s) 8) Hard to guess what new records may be being created. Most of these do not apply to Composr: 1) Ordering by the add_date columns should be fine for us (millisecond precision). 2) As we are scoped to a single site (and not a massive population-scale social networking site) we do not need a massive identifier pool. 3) We do not need this particularly, although might be a nice-to-have. 4) Over-complex, we can reasonably trust any server in a Composr cloud. 5) While this may have been an issue when UUIDs were first specified in the 80s, I don't think it's an issue nowadays. 6/7) This is important. We need the sequence to be ROUGHLY sequential. 8) Nice to have. If we can fit into 64 bits we can just do as an integer on a 64 bit machine, which makes things far easier for us, and far more efficient (no storing long strings). We can even avoid making changes to our database layer or remove auto_increment from the schema. We could simply do a wrap around query_insert calls to retrofit the existing code to create the IDs... $GLOBALS['SITE_DB']->query_insert('table', create_unique_db_id(['a' => 123])); create_unique_db_id could be a null-op unless we enable a new config option. In terms of the actual format: Bit 1: Always 1, as we always want positive numbers Bits 2-33 (32 bits): Seconds since 2022 (good for 2^32/60/60/24/365=136 years) Bits 34-64 (31 bits): Random component (good for 2^31=2147483648=2.1 billion unique values within any second, making collisions unlikely) Working out the exact chance of a collision after repeated trials using https://www.statology.org/binomial-distribution-calculator/ p, Probability of failure on a given trial: 1/2147483648=0.00000000046566129 n, Number of trials: 1000000 (1 million) k, Number of failures: 1 P(X>=1) = 0.00047 = 0.047% i.e. After 1 million IDs generated within a 1-second time window of another ID, there's only 0.047% of a collision. If we put in numbers for 1 billion trials, there's a 37% chance of a collision. I think that's about reasonable, failing once per billion records. References: https://en.wikipedia.org/wiki/Universally_unique_identifier https://medium.com/geekculture/the-wild-world-of-unique-identifiers-uuid-ulid-etc-17cfb2a38fce https://segment.com/blog/a-brief-history-of-the-uuid/ https://www.ietf.org/id/draft-peabody-dispatch-new-uuid-format-04.html https://stackoverflow.com/questions/56611375/innodb-clustered-index-performance-when-using-random-values-as-primary-key |
|
Actually we would need to update all the AUTO columns to be bigints. Add we should add an automated test to ensure create_unique_db_id is used whenever necessary, or perhaps a dev-mode check. |
|
It's also worth noting we could consider making the GUIDs mandatory, and then massively simplify the cms_merge importer - which would help a lot with long term maintenance. |
|
One thing to consider is that we will need to make sure all DB Drivers return the 'id' field for insert queries even if it is explicitly set (i.e. not coming from auto-increment). Currently this is not consistent, and I know for sure the XML DB driver does not. |
Date Modified | Username | Field | Change |
---|---|---|---|
2021-11-18 01:00 | Chris Graham | New Issue | |
2021-11-18 01:00 | Chris Graham | Tag Attached: Risk: Core rearchitecting | |
2021-11-18 01:00 | Chris Graham | Tag Attached: Type: Cloudification | |
2021-11-18 01:00 | Chris Graham | Tag Attached: Type: Cross-cutting feature | |
2021-11-18 01:01 | Chris Graham | Description Updated | |
2021-11-18 01:02 | Chris Graham | Relationship added | related to 4718 |
2021-11-18 01:03 | Chris Graham | Tag Attached: Roadmap: v12 | |
2021-11-18 01:04 | Chris Graham | Relationship added | related to 3549 |
2022-07-28 02:47 | Chris Graham | Relationship added | child of 3792 |
2022-08-25 21:57 | Chris Graham | Relationship added | related to 3046 |
2022-08-26 00:28 | Chris Graham | Note Added: 0007490 | |
2022-08-26 00:28 | Chris Graham | Note Added: 0007491 | |
2022-08-26 00:33 | Chris Graham | Note Added: 0007492 | |
2022-08-26 02:15 | Chris Graham | Note Edited: 0007491 | |
2022-08-26 16:34 | Chris Graham | Note Edited: 0007491 | |
2022-09-01 18:58 | Chris Graham | Note Added: 0007498 | |
2022-09-12 14:04 | Chris Graham | Note Added: 0007509 | |
2024-03-26 00:58 | PDStig | Tag Renamed | Roadmap: v12 => Roadmap: Over the horizon |