#6336 - Stats are still failing to generate on high-traffic sites
| Identifier | #6336 |
|---|---|
| Issue type | Major issue (breaks an entire feature) |
| Title | Stats are still failing to generate on high-traffic sites |
| Status | Open |
| Handling member | Deleted |
| Version | 11 beta8 |
| Addon | stats |
| Description | Despite optimisation, pre-processing of statistics is still failing on composr.app, probably because of all the guest traffic it has been receiving. We need to optimise even more. |
| Funded? | No |

Comments
My approach is to flatten the data that we store in the database. Instead of dumping serialized data into p_data for each bucket/interval, we will flatten out the keys, so that every key-value pair (data point) is stored as its own row in the database.
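To illustrate the flattening idea, here is a minimal sketch in Python (not the actual Composr code). It assumes the nested data for a bucket can be treated as nested dictionaries and that key parts are joined with `||`, as the LIKE pattern in this comment suggests. The function name `flatten` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple

SEPARATOR = "||"  # assumed separator, taken from the LIKE pattern in this comment


def flatten(data: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield one (flat_key, value) pair for every leaf of a nested dict."""
    for key, value in data.items():
        flat_key = f"{prefix}{SEPARATOR}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into nested groups, extending the key path
            yield from flatten(value, flat_key)
        else:
            # A leaf is a single data point, i.e. one database row
            yield flat_key, value


# Hypothetical bucket contents, for illustration only
bucket = {
    "views": {
        "guest": {"page_a": 120, "page_b": 95},
        "member": {"page_a": 14},
    }
}

rows = list(flatten(bucket))
for flat_key, value in rows:
    print(flat_key, value)
```

Each yielded pair would become its own row, rather than the whole structure being serialized into one p_data value.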
Pros:
- Much lower memory use, as we no longer select whole p_data dumps, which can easily be multiple MB each;
- The flat key structure means that we can select just the group of data points we want using LIKE `keys||to||select||%`, instead of loading entire dumps of data, running unserialize on them, and then finding the data points that we want. Since this is done directly in SQL, we can also select keys in batches (e.g., 100 at a time) to avoid running out of memory (OOM).
Cons:
- Many more rows in the database (but they will be smaller)
- Many more SQL queries involved (but mainly in the scheduler; graphs won't see much of an increase, because all the data points that they need are selected together with a wildcard LIKE statement)
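The batched prefix selection described above could look roughly like the following Python sketch, which uses an in-memory SQLite database purely for demonstration. The table name `stats_flat`, the columns `p_key` and `p_value`, and the function `select_prefix` are hypothetical and not the real schema. It also assumes each flat key is unique (the real table would presumably also identify the bucket/interval). Batching is done by paging on the key, and LIKE wildcards in the prefix are escaped so that it is matched literally.

```python
import sqlite3
from typing import Iterator, List, Tuple


def select_prefix(
    conn: sqlite3.Connection, prefix: str, batch_size: int = 100
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches of (key, value) rows whose key starts with `prefix`."""
    # Escape LIKE wildcards so that '%' and '_' in the prefix match literally
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    last_key = ""
    while True:
        batch = conn.execute(
            "SELECT p_key, p_value FROM stats_flat "
            "WHERE p_key LIKE ? ESCAPE '\\' AND p_key > ? "
            "ORDER BY p_key LIMIT ?",
            (escaped + "%", last_key, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        # Resume after the last key seen, so only one batch is held in memory
        last_key = batch[-1][0]


# Demonstration with hypothetical data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats_flat (p_key TEXT PRIMARY KEY, p_value TEXT)")
conn.executemany(
    "INSERT INTO stats_flat VALUES (?, ?)",
    [(f"views||guest||page_{i:03d}", str(i)) for i in range(250)]
    + [("views||member||page_000", "7")],
)

batches = list(select_prefix(conn, "views||guest||", batch_size=100))
print([len(b) for b in batches])
```

Paging on the key rather than using OFFSET keeps each query cheap even when a prefix matches many rows, although whether that matters here depends on the real indexes.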
I am testing the following changes:
Processing times have been reduced to about 4-5 minutes. I will continue to monitor the changes.