#3797 - Cleanup queries so can_arbitrary_groupby/remove_duplicate_rows – Composr CMS: Your Data, Your Privacy, Your Control

This is a spacer post for a website comment topic. The content this topic relates to: #3797 - Cleanup queries so can_arbitrary_groupby/remove_duplicate_rows

By Guest posted 17th Nov 2019, 11:39 AM

Also look at the export_* tasks, and try and make sure they don't have to load all rows into memory before starting streaming out a spreadsheet.

By Guest posted 9th Feb 2020, 6:23 PM

This is more complex than I thought.

In many cases we are doing JOINs that can bind 0-to-many rows because we want to get some results back from the join. So we cannot simply switch to EXISTS or IN clauses.

A better solution is probably just to be more careful about what fields we SELECT, so that we're only selecting stuff covered in the GROUP BY clause. That will only work when we don't want to SELECT something we explicitly know might be duplicated up.

By Guest posted 9th Feb 2020, 6:25 PM

MySQL error message is like:
"Expression 28 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'cms.p.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by"

When doing "SELECT * FROM cms11_f_topics t JOIN cms11_f_posts p WHERE t.id=1 GROUP BY t.id;"

But this is fine:
"SELECT t.id FROM cms11_f_topics t JOIN cms11_f_posts p WHERE t.id=1 GROUP BY t.id;"

By Guest, By Guest, posted 9th Feb 2020, 7:45 PM

Unfortunately sometimes we really do want to just throw away extra matches. Here's an interesting query pattern that works...

"SELECT t.id FROM cms11_f_topics t JOIN cms11_f_posts p ON p.id=(SELECT MIN(p.id) FROM cms11_f_posts p WHERE t.id=p.p_topic_id);"

In this I am explicitly saying which of multiple rows to join to (the one with the lowest ID).

By Guest posted 10th Feb 2020, 10:23 AM

For reference, I am using all the following techniques to solve this...

IN
EXISTS
() subqueries
Joins selecting on a unique ID found via a minimiser/maximiser check
GROUP BY
DISTINCT * & COUNT(DISTINCT *)

Which one to use really depends on the situation.

By Guest posted 13th Feb 2020, 3:40 PM

This is now implemented. Turned out way more complex than I imagined, and took more like a week. Significant parts of the galleries code needed to be reworked, the XML DB implementation needed significant improvements to support the more sophisticated syntaxes, unit testing of that was needed, and work was needed to fix SQL compatibility on how we treat ORDER BY (anything in ORDER BY must also be selected). Generally a lot of stuff needed cleaning up, as I got into the guts of a lot of different systems and saw many issues.

The good news is some of the work towards #3354 was done in the gallery refactoring (as images and videos needed to be queried together, so it made sense to create an API for that, which will also work as an API for querying all content types at once).

Gabri	Points: 60 Voting power: 16.626 Voting power control percentage: 0.235%
Master Rat	Points: 15 Voting power: 12.925 Voting power control percentage: 0.182%
PDStig	Points: 10 Voting power: 27.666 Voting power control percentage: 0.390%

#3797 - Cleanup queries so can_arbitrary_groupby/remove_duplicate_rows

Leader-board Top Weekly Earners

Coming soon…

Statistics