View Issue Details

IDProjectCategoryView StatusLast Update
5546Composrstatspublic2024-07-23 17:10
ReporterChris Graham Assigned ToGuest  
PrioritynormalSeverityfeature 
Status newResolutionopen 
Summary5546: cms_stats table takes too much space
DescriptionWhen I designed this table, I didn't realize varchar's reserve space for maximum char lengths, in order for record sizes to be regular.

This query reduces it by about 25%:
ALTER TABLE cms_stats MODIFY COLUMN the_page VARCHAR(100), MODIFY COLUMN session_id VARCHAR(15), MODIFY COLUMN operating_system VARCHAR(190);

(Composr's database schema check tool will complain about this, but I verified it is okay by checking field lengths in the existing data set)

For us to do this properly we'd need to align to Composr's meta-DB field lengths, possibly adding new ones. Or, gasp, allowing custom length values (lots of work).

Also, other DB systems store IPs as numeric values as it saves a lot of space. We could consider that.

All this said, I actually reduced table usage down to 1/25th the original size because I think the table needed to be re-optimised (changing field types did that as a side effect). So another consideration is to somehow run optimisations automatically. I'm not sure if there's a good way to do that on MySQL without downtime, but a few minutes of downtime every month or so may be okay to stop the site going down like 2 hours due to lack of disk space like it did for me this morning.
TagsRisk: Database change
Attach Tags
Time estimation (hours)4
Sponsorship open

Sponsor

Date Added Member Amount Sponsored

Relationships

related to 5740 Not AssignedGuest Analyse use of fields in database for possible optimisation 

Activities

PDStig

2024-01-23 13:20

administrator   ~8224

Last edited: 2024-01-23 13:21

Storing IPs as numerics might be problematic for v6 IP addresses considering the 32-bit limit on integers.

I think we could use a new custom Composr type for SESSION (VARCHAR(15)) or even just call it TOKEN to also include things like CSRF. It might come in handy especially for the privacy system to auto-anonymise those in the serialise method.

Anywhere else where we could, we could perhaps add a custom DB-meta type as well... perhaps ID_CODE for VARCHAR(100) (if there would never be a case a code name would be > 100 characters... ideally this should never happen as code names are often used in URLs).

I think we should generally scrub DB-Meta and adjust current types to save on space, just making sure for instance we're not using longtext on something that should be short_text. I noticed for example in v11 there were several username fields that were short_text when they should have been ID_TEXT. So v11 could get a meta look-over at least, and v12 implementation of new types.

PDStig

2024-06-07 20:11

administrator   ~8837

Upon further thought, I believe this should be tackled for v11. The stats table has been an ongoing issue for space in both v9 and v10. We should address it ASAP.

Chris Graham

2024-07-22 20:11

administrator   ~8864

"I think we should generally scrub DB-Meta and adjust current types to save on space, just making sure for instance we're not using longtext on something that should be short_text. I noticed for example in v11 there were several username fields that were short_text when they should have been ID_TEXT"

SHORT_TEXT is varchar(255)
ID_TEXT is varchar(80)

The maximum_username_length option is configurable. It should be capped at 255, and is not, so I'll fix that. Regardless, ID_TEXT is not long enough.

Using LONG_TEXT when we could get away with SHORT_TEXT is not necessarily good. LONG_TEXT uses fewer bytes due to variable-length encoding. But there's a performance penalty, and indexing penalty, to it.

I don't think in general we're in a bad spot. You can buy a 1TB SSD for under $70 now. The stats table is a particular concern because it just grows and grows, stores a lot, and it needs to be very indexable.
I'm untagging as v11 though because it's too late now, and it's not an order-of-magnitude kind of fix.

A SESSION field type does make sense. Reducing length of pages and operating_system in there would require a bit more thought.

Good point about ipv6.

Chris Graham

2024-07-22 20:25

administrator   ~8865

I stand corrected, f_members.m_username is actually an ID_TEXT. However, members from outside Composr could have longer usernames, so the inconsistency may be expected.

Add Note

View Status
Note
Upload Files
Maximum size: 32,768 KiB

Attach files by dragging & dropping, selecting or pasting them.
You are not logged in You are not logged in. This means you will not get any e-mail notifications. And if you reply, we will not know for sure you are the original poster of the issue.

Issue History

Date Modified Username Field Change
2024-01-14 15:21 Chris Graham New Issue
2024-01-14 15:21 Chris Graham Tag Attached: Roadmap: v12
2024-01-23 13:20 PDStig Note Added: 0008224
2024-01-23 13:20 PDStig Note Edited: 0008224
2024-01-23 13:21 PDStig Note Edited: 0008224
2024-03-26 00:58 PDStig Tag Renamed Roadmap: v12 => Roadmap: Over the horizon
2024-06-07 20:10 PDStig Tag Attached: Risk: Database change
2024-06-07 20:11 PDStig Tag Detached: Roadmap: Over the horizon
2024-06-07 20:11 PDStig Tag Attached: Roadmap: v11
2024-06-07 20:11 PDStig Note Added: 0008837
2024-07-22 20:11 Chris Graham Note Added: 0008864
2024-07-22 20:11 Chris Graham Tag Detached: Roadmap: v11
2024-07-22 20:25 Chris Graham Note Added: 0008865
2024-07-23 17:10 PDStig Relationship added related to 5740