View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
6160 | Composr | General / Uncategorised | public | 2025-03-02 22:39 | 2025-03-14 23:27 |
Reporter | jacobgkau | Assigned To | Guest | ||
Priority | normal | Severity | trivial | ||
Status | new | Resolution | open | ||
Product Version | 10.0.43 | ||||
Summary | 6160: Scalability considerations for block_side_stats | ||||
Description | Currently, the side stats block offers page hit counters for today, this week, and this month. The code for all three of these takes the current time, subtracts the given time interval, then does a COUNT in the MySQL table that logs hits.
Over the past few days, I've had a massive influx of new web scraping (which I understand many are dealing with this year as corporations attempt to train their AI models on public websites). My website, which actually only gets <100 valid visitors per day according to my Matomo analytics, has logged half a million hits in the last 24 hours, nearly 2 million hits over the last week, and nearly 5 million hits over the last month according to Composr's stats.
I started seeing my website fail to load altogether due to MariaDB's max connections limit being surpassed. Looking at the active processes in MariaDB, I saw that many of the running processes were simply selecting counts from the hit table-- mostly the "this month" query (which I could see via the timestamp used in the query). When I ran one of the "this month" queries manually, it took several seconds to complete. It makes sense that hundreds of simultaneous connections all attempting to do this (while the hits are still being updated, and thus MariaDB's internal cache constantly invalidated) would overwhelm the database server.
For the time being, I've mitigated this issue by disabling the "this month" counter (since that one took the longest to count), as well as at the network level by blocking the most egregious scraper in my firewall.
However, given that some websites might ideally get a massive amount of traffic, this incident's raised questions about the implementation of the stat counters. For me, the purpose of "Hits this week:" and "Hits this month:" is to provide a general order of magnitude. The specific number is interesting, but what people actually see when they glance at the number is mostly whether it's in the hundreds, thousands, etc.
Therefore, I don't think "Hits this week:" and "Hits this month:" actually need to be constantly updated (by taking a reading of "hits between X days ago from this very second" every time like Composr does now). Instead, it would be much more scalable if, for example, "Hits this week:" and "Hits this month:" calculated the number of hits over the last X days ONE time per day, then cached/stored that value. That way, every single website visitor that day would only have to select a single number from the database, rather than performing a COUNT of potentially millions of rows. This would cut down on database load and increase webpage load times. | ||||
Steps To Reproduce | 1. Enable the stats module and add block_side_stats to a page
2. Receive massive amounts of traffic from poorly designed web scrapers
3. Observe heavy database load causing website failure | ||||
Additional Information | I've checked and confirmed the code relevant to this hasn't changed yet in v11 or newer versions of v10 than what I'm running. | ||||
Tags | No tags attached. | ||||
|
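The proposal in the description (compute the "last X days" figure once per day, store it, and let every other page view read the stored number) can be illustrated with a minimal sketch. This is not Composr code: the function name `daily_hit_count`, the in-memory `$cache` array, and the plain array of timestamps standing in for the hit-log table are all assumptions made so the example is self-contained.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of Composr: returns the number of hits in the
 * last $days days, recomputing at most once per calendar day (UTC) and
 * serving the stored value otherwise.
 *
 * @param int[] $hitTimes Unix timestamps standing in for the hit-log table
 * @param array $cache    Stand-in for a persistent store (e.g. a single DB row)
 */
function daily_hit_count(array $hitTimes, int $days, int $now, array &$cache): int
{
    $today = intdiv($now, 86400); // day number since the Unix epoch
    $key = "hits_last_{$days}_days";

    // Cache hit: the value was already computed today, so no scan is needed.
    if (isset($cache[$key]) && $cache[$key]['day'] === $today) {
        return $cache[$key]['count'];
    }

    // Cache miss: do the expensive scan once (the COUNT query on a real site).
    $cutoff = $now - $days * 86400;
    $count = 0;
    foreach ($hitTimes as $t) {
        if ($t >= $cutoff && $t <= $now) {
            $count++;
        }
    }

    $cache[$key] = ['day' => $today, 'count' => $count];
    return $count;
}

$now = 20000 * 86400 + 3600; // 01:00 UTC on an arbitrary day
$hits = [$now - 100, $now - 2 * 86400, $now - 10 * 86400];
$cache = [];

$first = daily_hit_count($hits, 7, $now, $cache);       // scans the log
$hits[] = $now + 50;                                     // a new hit arrives
$second = daily_hit_count($hits, 7, $now + 60, $cache); // served from the cache
```

Both calls return 2 here: the second one ignores the new hit because the stored value is reused until the day number changes. That staleness (up to a day) is the trade-off the description accepts in exchange for replacing a large COUNT with a single-value read.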
Nitpick: I of course meant "decrease webpage load times" or "increase webpage performance" in my last sentence. It doesn't look like regular users are allowed to edit their own issues in this tracker? |
|
Hello, thank you for the report. Correct, only developers can edit issues. What you did was fine, commenting any corrections.
Blocks should have a cache on them to prevent situations like this. It sounds like either you have cache disabled on this block, or the cache is not working as it should. Can you double-check whether cache is enabled on the instance of the block? You should be able to do that in the WYSIWYG editor of the Comcode page, or in the zone editor if it's in a panel. If you are not using WYSIWYG and do not see any cache parameters defined in the Comcode for that block, then it is enabled (caching is on by default unless explicitly disabled).
Also, I would recommend enabling rate limiting on the site. Scraping is indeed a problem for a lot of Composr sites recently. You can enable rate limiting via yoursite/config_editor.php . |
|
The Cache drop-down was already set to "caching" in the block editor. The "Quick Cache" checkbox was not checked, however. The block text is just "[block]side_stats[/block]".
How exactly is the caching supposed to work? I did notice that if I simply refreshed my logged-in session or refreshed a private tab, the numbers did not update every single time. However, the numbers were different between my logged-in session and the private tab, so whatever caching's being done didn't seem very aggressive.
I don't think this is relevant to the issue, but in case it's somehow related to the caching not working properly (if that's the case): in the admin zone's settings for this block, with all the checkboxes for different stats the block can display, I'm missing the checkboxes for "Hits this week" and "Hits this month." I only have a checkbox for "Hits today." In order to disable "Hits this month," I had to go into my database and update the `activity_show_stats_count_page_views_this_month` setting manually (then clear the block cache via Composr's cleanup tools). I haven't yet figured out why that is; while troubleshooting, I found that I haven't edited the block template or any files under `sources` regarding this block. |
|
You indicated on the issue you are running version 10.0.43. Is there anything stopping you from upgrading to 10.0.50, the latest v10 version? I'm not 100% sure on the fixes made in that time period, but that would be a good first step to see if any of these issues resolve themselves. Note that version 10 isn't maintained anymore except for critical issues (development is happening mainly on v11 beta now), so it is unlikely I will fix this for v10 since a good workaround is to disable the stat. But I'm wondering if cache is fundamentally broken for you.
To answer your question about how cache works in blocks: most blocks (sources/blocks and sources_custom/blocks) have a function which defines their cache conditions (the code below comes from side_stats):

    /**
     * Find caching details for the block.
     *
     * @return ?array Map of cache details (cache_on and ttl) (null: block is disabled).
     */
    public function caching_environment()
    {
        $info = array();
        $info['cache_on'] = '';
        $info['ttl'] = 15;
        return $info;
    }

The cache_on parameter defines a string of something that can be evaluated / serialized and MD5 hashed by PHP (usually an array) to produce an identifier which determines on what to cache against (for example, if you have multiple instances of the same block but with different parameters, this ensures they don't share the same cache since they would produce different output given the different parameters). TTL is the time to live, in minutes, of how long the cache lasts.
Cache can be stored in multiple ways depending on your server environment. It might be persistent cache if you have a cache extension enabled on PHP, it might be in the database *_caches table, or it might be a file under caches/persistent. |
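The mechanism described in this note (an identifier derived by serializing and MD5-hashing the cache_on value, plus a TTL in minutes) can be illustrated with a minimal sketch. The functions `cache_identifier` and `cache_is_fresh` are hypothetical names chosen for this example and are not Composr APIs; Composr's real cache layer likely takes more inputs into account.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only, not Composr's implementation.
 * Builds a cache identifier by serializing the block name together with
 * its "cache_on" value and hashing the result with MD5.
 *
 * @param string $block   Block codename, e.g. 'side_stats'
 * @param mixed  $cacheOn Any serializable value (usually an array)
 */
function cache_identifier(string $block, $cacheOn): string
{
    return md5(serialize([$block, $cacheOn]));
}

/**
 * Returns true while an entry created at $createdAt (Unix time) is still
 * within its time to live, given in minutes.
 */
function cache_is_fresh(int $createdAt, int $ttlMinutes, int $now): bool
{
    return ($now - $createdAt) < $ttlMinutes * 60;
}

// Same block, different parameters: different identifiers, so no shared cache.
$a = cache_identifier('side_stats', ['param' => 'x']);
$b = cache_identifier('side_stats', ['param' => 'y']);
// Same block, same parameters: identical identifier, so the entry is reused.
$c = cache_identifier('side_stats', ['param' => 'x']);
```

With a TTL of 15, as in the side_stats snippet, `cache_is_fresh(0, 15, 899)` is true and `cache_is_fresh(0, 15, 900)` is false, so under this model each distinct identifier would trigger a recomputation at most once every 15 minutes.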
|
I spent some time tonight upgrading from 10.0.43 to 10.0.50 (it takes a while since I have to diff and merge changes into all of my _custom files to avoid other random breakage). The caching behavior still looks like what I described. I'm attaching a video that shows the numbers differing between a logged-in session and a private window, but persisting between refreshes within each window. From what you described, it doesn't sound like this particular cache should be specific to a user session-- is it more complex than you described, or is it not working how it's supposed to in my environment? |
|
For a bit more information about my environment, I do not have a `caches/persistent` folder, but I do have a `cms_cache` (not `caches`) table in my database. I noticed that one of the fields in that table is called `the_member`. It is `NULL` for some (multiple) rows where `cached_for` is `side_stats`. Other items that differ between them include `timezone` and `is_bot`. Basically, it seems like the web scrapers hammering my site over the weekend were somehow appearing as needing a different cache very often. (I did figure out while watching my logs that a lot of the requests were going through every single day of every member's calendar, and some of the URLs included `keep_su=test` for some reason; I went ahead and disabled the calendar add-on since nobody's ever used it on my site, along with turning on rate limiting like you suggested and taking other steps, but I'm still interested in how this block's queries were running so often during the "attack.") |
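The observation above, that rows for `side_stats` differ by `the_member`, `timezone`, and `is_bot`, suggests why a cache can still let many queries through. The sketch below is a hypothetical model, not Composr's code: it assumes each distinct combination of those three values gets its own cache entry, ignores TTL expiry, and counts how many requests would miss. The function name `count_cache_misses` and the request array shape are invented for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model: counts requests that would miss a cache whose entries
 * are distinguished by member, timezone and bot flag. How Composr actually
 * combines these columns is an assumption here.
 *
 * @param array[] $requests Items shaped like
 *                          ['member' => ?int, 'timezone' => string, 'is_bot' => bool]
 */
function count_cache_misses(array $requests): int
{
    $seen = [];
    $misses = 0;
    foreach ($requests as $r) {
        $key = md5(serialize([$r['member'], $r['timezone'], $r['is_bot']]));
        if (!isset($seen[$key])) {
            // First request for this combination has to run the COUNT queries.
            $seen[$key] = true;
            $misses++;
        }
    }
    return $misses;
}

$requests = [
    ['member' => null, 'timezone' => 'UTC',            'is_bot' => false],
    ['member' => null, 'timezone' => 'UTC',            'is_bot' => false], // same entry: hit
    ['member' => null, 'timezone' => 'UTC',            'is_bot' => true],
    ['member' => null, 'timezone' => 'America/Denver', 'is_bot' => false],
    ['member' => 2,    'timezone' => 'America/Denver', 'is_bot' => false],
];
$misses = count_cache_misses($requests);
```

Here four of the five requests miss. If scraper traffic varies any of the values that feed the cache key, the number of distinct entries (and therefore of expensive queries per TTL window) grows with that variety rather than staying at one.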
|
Oh, and I still do not have `Hits this week` or `Hits this month` in my admin zone GUI, only `Hits today`. The strings are in `lang/EN/global.ini`, and the settings are defined under `/sources/hooks/systems/config/` in `activity_show_stats_count_page_views_this_month.php` and `activity_show_stats_count_page_views_this_week.php` (which are not being overridden in `sources_custom`). Not sure what else is needed for them to show up in the admin zone GUI. |
|
I noticed while troubleshooting high disk usage today that my `errorlog.php` file was taking up 5.2GB (it appears that it's never been cycled out or truncated since the website was first installed in 2016). For well over a year (from 27-Nov-2023 until 03-Mar-2025 when I did the latest upgrade), the following log line was repeating constantly (sometimes more than once per second):
```
PHP Deprecated: Creation of dynamic property Self-learning_cache::$keys_initial is deprecated in /var/www/html/sources/caches.php on line 212
```
I'm assuming that could've had something to do with the issue I experienced? It seems to just be a warning, but would that code getting hit constantly imply something was going wrong before that in the caching process, for example? This is a VPS on an SSD, but I'm assuming the constant writing to disk could've also contributed to resource pressure. (After the upgrade, I'm currently receiving the following log line constantly instead:
```
PHP Deprecated: setcookie(): Passing null to parameter 5 ($domain) of type string is deprecated in /var/www/html/sources/integrator.php on line 121
```
Which I'm assuming, for now, is unrelated to this issue/block.) |
|
Yeah that's unrelated. You said you are running 10.0.43. You should upgrade to 10.0.50 because it fixes some of those deprecation issues. If you still get them then please add as a separate issue (and include your PHP version). Also there should be a setting to prune out the error logs every X days (I think it's under privacy but not 100% sure for v10; it's been months since I worked on v10 but will be flipping back over to it shortly) |
|
I told you that I already upgraded to 10.0.50 on 3/5. |
|
I'm not seeing the error log prune setting you're referring to (again, on 10.0.50). That is tangential to this particular tracker issue, though. |
Date Modified | Username | Field | Change |
---|---|---|---|
2025-03-02 22:39 | jacobgkau | New Issue | |
2025-03-02 22:46 | jacobgkau | Note Added: 0009835 | |
2025-03-02 22:55 | PDStig | Note Added: 0009836 | |
2025-03-03 00:10 | jacobgkau | Note Added: 0009838 | |
2025-03-03 00:39 | PDStig | Note Added: 0009839 | |
2025-03-03 00:39 | PDStig | Note Edited: 0009839 | |
2025-03-03 00:43 | PDStig | Note Edited: 0009839 | |
2025-03-05 05:33 | jacobgkau | File Added: 2025-03-04 22-31-18.mp4 | |
2025-03-05 05:33 | jacobgkau | Note Added: 0009841 | |
2025-03-05 05:52 | jacobgkau | Note Added: 0009842 | |
2025-03-05 05:59 | jacobgkau | Note Added: 0009843 | |
2025-03-13 21:21 | jacobgkau | Note Added: 0009859 | |
2025-03-14 14:53 | PDStig | Note Added: 0009861 | |
2025-03-14 23:22 | jacobgkau | Note Added: 0009871 | |
2025-03-14 23:27 | jacobgkau | Note Added: 0009872 |