View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
5555 | Composr alpha bug reports | General / Uncategorised | public | 2024-01-17 02:29 | 2025-02-27 22:54 |
Reporter | PDStig | Assigned To | PDStig | ||
Priority | normal | Severity | major | ||
Status | resolved | Resolution | fixed | ||
Summary | 5555: Privacy purging overhaul | ||||
Description | There are numerous massive issues with how the privacy purging system currently works (potential to delete data belonging to other members, anonymising on key fields, not knowing which member owns the specified content, etc). Overhaul the entire system as follows: ***Privacy hooks*** * New properties on database tables: - owner_id_field: ?string (field identifying the Member ID who owns the data / row) - username_fields: array[string] (fields where textual usernames are stored ID_TEXT format) * Modify properties: - member_id_fields -> additional_member_id_fields (and remove whichever field is going in the owner_id_field) * New functions: - is_owner($table_name, $row, $member_id, $username, $email_address) : bool (determines if the given identifiable information is the owner of the provided content row and thus can delete it; has a default parent method and can be overridden via privacy hooks) * logged mail messages: - We must find a way to match and anonymise emails from m_to_email despite it being in a serialised format. No exceptions; it's still user data and still subject to GDPR protections. Account for this. * f_warnings_punitive - Instead of using param fields for user data, add new fields specific for those (IP address, etc). And account for these in the hooks. * mentorr - Either mentor or mentee is owner; account for this in is_owner since owner_id_field only takes one field * points_ledger - default owner is sender_id, but if this is guest, then owner is actually recipient_id; account for this in is_owner. ***Modified search criteria behaviour*** * Username searches are now textual instead of converted to a member ID; searched within the newly-defined username_fields. * Provided username does not need to match an existing member account (we could have stray textual usernames from old accounts in the database) * If Member ID specified but not username, fill in username from member ID in criteria. * If username specified but not member ID, fill in member ID from specified username if one exists. * If both username and member ID specified, username could be an old username of the given member ID; allow username searches/matches except when username points to a member ID that is different from the one provided. * If securitylogging addon installed, use lookup to fill in missing data if member ID, e-mail, or username was provided. * If securitylogging addon not installed, use conventional forum driver methods to fill in missing info. ***Modified log_it behaviour*** * When purging, log_it could be adding additional records to actionlogs. Therefore, actionlogs needs to always be the last table purged. * Furthermore, log_it needs to forcefully run immediately instead of being registered as a shutdown function when running purge. ***Modified anonymise behaviour*** * For each row matched against the criteria, only fields which contain values equal to what is in the criteria will be anonymised (e.g. if there are multiple member fields, only the member field matching the given member ID will be anonymised) * If a field set to be anonymised is a key field, it cannot be anonymised (or we will get database errors). Assume then that since we matched against a key field (which means there won't be a duplicate), the individual owns that content. Delete the row entirely instead of anonymising the field (unless delete is not an allowed method. In that case, bail out on error [make sure unit tests pick up on hooks that could lead to this]). * If is_owner passes, then everything in additional_anonymise_fields is anonymised even if it does not match the "Others" criteria because we assume all of these fields belong to the owner. ***Modified delete behaviour*** * Rows can be deleted only if the new is_owner method returns true (usually, this means that the criteria has a member ID and it matches the owner_id_field; but if there is no owner_id_field, other criteria can be used to determine if, given the search criteria, we have a high confidence that individual owns the row / content.). Otherwise, the anonymise method will be used. The one exception is if the anonymise method cannot anonymise a field because it's a key field, and delete is an allowed method. In that case, delete is forced. * Ensure where applicable we are always using Composr's API methods to delete content instead of just deleting the database row. ***Modified download behaviour*** * Any fields defined as personal user data fields that do not match the given criteria are anonymised in the output as they probably belong to other members. * All fields in additional_anonymise_fields are anonymised unless is_owner passes or the individual field matches Others criteria. * Ensure that we are using human-understandable context to the best of our abilities. In serialise, make sure we are using _dereferenced on every custom Composr type (MEMBER, GROUP, etc). Wherever ID numbers are used, change them to something humans can understand (unless it's AUTO). Language codes should be do_lang'd. AUTO_LINK to provided content where applicable (or just specify in an __auto_link property the table.field which that ID points to). * Instead of one huge JSON file, generate a TAR archive (name is username_time); each table is its own JSON file (name is the table name). Do not include tables with no records. ***Modified UI*** * Include number of records matched for each table. ***Bug fixes*** * If no where clauses can be defined when making the sql, then do not run the query at all. Previously, we would run into cases where there might not be any where conditions given the search criteria, so we would select the whole table. That was a major bug! * Don't use contextual dates in download data. Use real dates / times. | ||||
Additional Information | Document everything! Regarding is_owner checks (default method): * If owner_id_field is specified and member ID was provided in criteria: - If member ID matches owner_id_field, they are owner. Else, continue. - If either owner_id_field is null or member ID was not provided in criteria, continue. * If there is exactly one username_fields defined, and username was provided in criteria: - If username matches the field, and owner_id_field is null, they are owner. Else, continue. - If username was not provided, or there are not exactly one username_fields defined, continue. * If there is exactly one email_fields defined, and email was provided in criteria: - If email matches exactly one member in the system, and owner_id_field is null, they are owner. - If email matches exactly one member in the system, and the matched member ID matches owner_id_field, they are owner. - Else, continue. * If any of the given criteria (member ID, username, or email address, in that order) matches any fields that have a database key, they are owner. * If we reach this point, then they are not owner. is_owner will NEVER use IP address or "Others" in ownership checking; we can never determine with strong confidence someone owns the content using those criteria. | ||||
Tags | Risk: Major rearchitecting , Roadmap: v11, Type: Legal compliance / Privacy | ||||
Attach Tags | |||||
Sponsorship open | |||||
related to | 5514 | Closed | PDStig | Composr testing platform / automation goals | Add a unit test for removal_default_handle_method and allowed_handle_methods in privacy hooks |
related to | 5550 | Resolved | PDStig | Composr | Point transaction tables must not be deleted by privacy system |
related to | 5584 | Assigned | Chris Graham | Composr | Handle catalogue fields in privacy system |
related to | 5583 | Resolved | PDStig | Composr | Allow downloading or purging of data from user profile |
related to | 5585 | Resolved | PDStig | Composr | Provide files in privacy download archive |
|
Also since some hooks use the Composr API to delete content, such actions could cause additional records to get added into action log. It's likely we already purged the action log by that point though, so that's a problem. Let's make action log the last table purged. EDIT: Actually, can't do that as log_it could be a shutdown function. We will have to re-run purge again in another task, but only on action log. EDIT 2: Action logs doesn't define user privacy data; maybe it should? EDIT 3: It does... in core. A bit confusing it's there and not actionlog. EDIT 4: I thought about using push/pop methods instead of making it more confusing with another task run for just actionlogs, but actionlogs allow anonymisation or deletion. So this won't work; we really do need to run it as a separate purge task. EDIT 5: Actually we probably could use push/pop to force log_it to log immediately instead of during shut-down. Then, we can just make actionlogs the last table purged instead of using another task. |
|
I still don't have a method yet for logged mail messages. This will require careful consideration. I'm going to put that in a separate issue. Same for allowing null on points_ledger sender_id and recipient_id |
Date Modified | Username | Field | Change |
---|---|---|---|
2024-01-17 02:29 | PDStig | New Issue | |
2024-01-17 02:29 | PDStig | Status | Not Assigned => Assigned |
2024-01-17 02:29 | PDStig | Assigned To | => user4172 |
2024-01-17 02:41 | PDStig | Additional Information Updated | |
2024-01-17 02:52 | PDStig | Description Updated | |
2024-01-17 02:53 | PDStig | Description Updated | |
2024-01-17 02:55 | PDStig | Description Updated | |
2024-01-17 02:57 | PDStig | Description Updated | |
2024-01-17 02:58 | PDStig | Description Updated | |
2024-01-17 03:02 | PDStig | Description Updated | |
2024-01-17 03:08 | PDStig | Description Updated | |
2024-01-17 03:12 | PDStig | Tag Attached: Roadmap: v11 | |
2024-01-17 03:12 | PDStig | Tag Attached: Type: Legal compliance / Privacy | |
2024-01-17 03:12 | PDStig | Tag Attached: Risk: Major rearchitecting | |
2024-01-17 03:17 | PDStig | Additional Information Updated | |
2024-01-17 03:17 | PDStig | Additional Information Updated | |
2024-01-17 03:18 | PDStig | Additional Information Updated | |
2024-01-17 03:19 | PDStig | Additional Information Updated | |
2024-01-17 03:19 | PDStig | Additional Information Updated | |
2024-01-17 22:17 | PDStig | Relationship added | related to 5514 |
2024-01-17 22:18 | PDStig | Relationship added | related to 5550 |
2024-01-17 22:30 | PDStig | Additional Information Updated | |
2024-01-17 22:31 | PDStig | Additional Information Updated | |
2024-01-17 22:35 | PDStig | Additional Information Updated | |
2024-01-17 22:40 | PDStig | Description Updated | |
2024-01-17 22:42 | PDStig | Description Updated | |
2024-01-17 22:44 | PDStig | Additional Information Updated | |
2024-01-17 22:45 | PDStig | Additional Information Updated | |
2024-01-17 22:58 | PDStig | Additional Information Updated | |
2024-01-17 22:59 | PDStig | Additional Information Updated | |
2024-01-17 23:01 | PDStig | Additional Information Updated | |
2024-01-20 21:12 | PDStig | Note Added: 0008218 | |
2024-01-20 21:15 | PDStig | Note Edited: 0008218 | |
2024-01-20 21:30 | PDStig | Note Edited: 0008218 | |
2024-01-20 22:35 | PDStig | Note Edited: 0008218 | |
2024-01-21 03:09 | PDStig | Note Edited: 0008218 | |
2024-01-21 19:57 | PDStig | Note Edited: 0008218 | |
2024-01-22 01:03 | PDStig | Status | Assigned => Resolved |
2024-01-22 01:03 | PDStig | Resolution | open => fixed |
2024-01-22 19:36 | PDStig | Note Added: 0008221 | |
2024-01-22 19:38 | PDStig | Note Edited: 0008221 | |
2024-01-22 19:48 | PDStig | Description Updated | |
2024-01-30 17:50 | PDStig | Relationship added | related to 5584 |
2024-01-30 17:50 | PDStig | Relationship added | related to 5583 |
2024-01-30 17:50 | PDStig | Relationship added | related to 5585 |