View Issue Details

IDProjectCategoryView StatusLast Update
5555Composr alpha bug reportsGeneral / Uncategorisedpublic2025-02-27 22:54
ReporterPDStig Assigned ToPDStig  
PrioritynormalSeveritymajor 
Status resolvedResolutionfixed 
Summary5555: Privacy purging overhaul
DescriptionThere are numerous massive issues with how the privacy purging system currently works (potential to delete data belonging to other members, anonymising on key fields, not knowing which member owns the specified content, etc). Overhaul the entire system as follows:

***Privacy hooks***

* New properties on database tables:
 - owner_id_field: ?string (field identifying the Member ID who owns the data / row)
 - username_fields: array[string] (fields where textual usernames are stored ID_TEXT format)

* Modify properties:
 - member_id_fields -> additional_member_id_fields (and remove whichever field is going in the owner_id_field)

* New functions:
 - is_owner($table_name, $row, $member_id, $username, $email_address) : bool (determines if the given identifiable information is the owner of the provided content row and thus can delete it; has a default parent method and can be overridden via privacy hooks)

* logged mail messages:
 - We must find a way to match and anonymise emails from m_to_email despite it being in a serialised format. No exceptions; it's still user data and still subject to GDPR protections. Account for this.

* f_warnings_punitive
 - Instead of using param fields for user data, add new fields specific for those (IP address, etc). And account for these in the hooks.

* mentorr
 - Either mentor or mentee is owner; account for this in is_owner since owner_id_field only takes one field

* points_ledger
 - default owner is sender_id, but if this is guest, then owner is actually recipient_id; account for this in is_owner.

***Modified search criteria behaviour***

* Username searches are now textual instead of converted to a member ID; searched within the newly-defined username_fields.
* Provided username does not need to match an existing member account (we could have stray textual usernames from old accounts in the database)
* If Member ID specified but not username, fill in username from member ID in criteria.
* If username specified but not member ID, fill in member ID from specified username if one exists.
* If both username and member ID specified, username could be an old username of the given member ID; allow username searches/matches except when username points to a member ID that is different from the one provided.
* If securitylogging addon installed, use lookup to fill in missing data if member ID, e-mail, or username was provided.
* If securitylogging addon not installed, use conventional forum driver methods to fill in missing info.

***Modified log_it behaviour***

* When purging, log_it could be adding additional records to actionlogs. Therefore, actionlogs needs to always be the last table purged.
* Furthermore, log_it needs to forcefully run immediately instead of being registered as a shutdown function when running purge.

***Modified anonymise behaviour***

* For each row matched against the criteria, only fields which contain values equal to what is in the criteria will be anonymised (e.g. if there are multiple member fields, only the member field matching the given member ID will be anonymised)
* If a field set to be anonymised is a key field, it cannot be anonymised (or we will get database errors). Assume then that since we matched against a key field (which means there won't be a duplicate), the individual owns that content. Delete the row entirely instead of anonymising the field (unless delete is not an allowed method. In that case, bail out on error [make sure unit tests pick up on hooks that could lead to this]).
* If is_owner passes, then everything in additional_anonymise_fields is anonymised even if it does not match the "Others" criteria because we assume all of these fields belong to the owner.

***Modified delete behaviour***

* Rows can be deleted only if the new is_owner method returns true (usually, this means that the criteria has a member ID and it matches the owner_id_field; but if there is no owner_id_field, other criteria can be used to determine if, given the search criteria, we have a high confidence that individual owns the row / content.). Otherwise, the anonymise method will be used. The one exception is if the anonymise method cannot anonymise a field because it's a key field, and delete is an allowed method. In that case, delete is forced.
* Ensure where applicable we are always using Composr's API methods to delete content instead of just deleting the database row.

***Modified download behaviour***

* Any fields defined as personal user data fields that do not match the given criteria are anonymised in the output as they probably belong to other members.
* All fields in additional_anonymise_fields are anonymised unless is_owner passes or the individual field matches Others criteria.
* Ensure that we are using human-understandable context to the best of our abilities. In serialise, make sure we are using _dereferenced on every custom Composr type (MEMBER, GROUP, etc). Wherever ID numbers are used, change them to something humans can understand (unless it's AUTO). Language codes should be do_lang'd. AUTO_LINK to provided content where applicable (or just specify in an __auto_link property the table.field which that ID points to).
* Instead of one huge JSON file, generate a TAR archive (name is username_time); each table is its own JSON file (name is the table name). Do not include tables with no records.

***Modified UI***

* Include number of records matched for each table.

***Bug fixes***

* If no where clauses can be defined when making the sql, then do not run the query at all. Previously, we would run into cases where there might not be any where conditions given the search criteria, so we would select the whole table. That was a major bug!
* Don't use contextual dates in download data. Use real dates / times.
Additional InformationDocument everything!

Regarding is_owner checks (default method):
* If owner_id_field is specified and member ID was provided in criteria:
 - If member ID matches owner_id_field, they are owner. Else, continue.
 - If either owner_id_field is null or member ID was not provided in criteria, continue.

* If there is exactly one username_fields defined, and username was provided in criteria:
 - If username matches the field, and owner_id_field is null, they are owner. Else, continue.
 - If username was not provided, or there are not exactly one username_fields defined, continue.

* If there is exactly one email_fields defined, and email was provided in criteria:
 - If email matches exactly one member in the system, and owner_id_field is null, they are owner.
 - If email matches exactly one member in the system, and the matched member ID matches owner_id_field, they are owner.
 - Else, continue.

* If any of the given criteria (member ID, username, or email address, in that order) matches any fields that have a database key, they are owner.

* If we reach this point, then they are not owner.

is_owner will NEVER use IP address or "Others" in ownership checking; we can never determine with strong confidence someone owns the content using those criteria.
TagsRisk: Major rearchitecting , Roadmap: v11, Type: Legal compliance / Privacy
Attach Tags
Sponsorship open

Sponsor

Date Added Member Amount Sponsored

Relationships

related to 5514 ClosedPDStig Composr testing platform / automation goals Add a unit test for removal_default_handle_method and allowed_handle_methods in privacy hooks 
related to 5550 ResolvedPDStig Composr Point transaction tables must not be deleted by privacy system 
related to 5584 AssignedChris Graham Composr Handle catalogue fields in privacy system 
related to 5583 ResolvedPDStig Composr Allow downloading or purging of data from user profile 
related to 5585 ResolvedPDStig Composr Provide files in privacy download archive 

Activities

PDStig

2024-01-20 21:12

administrator   ~8218

Last edited: 2024-01-21 19:57

Also since some hooks use the Composr API to delete content, such actions could cause additional records to get added into action log. It's likely we already purged the action log by that point though, so that's a problem. Let's make action log the last table purged.

EDIT: Actually, can't do that as log_it could be a shutdown function. We will have to re-run purge again in another task, but only on action log.

EDIT 2: Action logs doesn't define user privacy data; maybe it should?

EDIT 3: It does... in core. A bit confusing it's there and not actionlog.

EDIT 4: I thought about using push/pop methods instead of making it more confusing with another task run for just actionlogs, but actionlogs allow anonymisation or deletion. So this won't work; we really do need to run it as a separate purge task.

EDIT 5: Actually we probably could use push/pop to force log_it to log immediately instead of during shut-down. Then, we can just make actionlogs the last table purged instead of using another task.

PDStig

2024-01-22 19:36

administrator   ~8221

Last edited: 2024-01-22 19:38

I still don't have a method yet for logged mail messages. This will require careful consideration. I'm going to put that in a separate issue.

Same for allowing null on points_ledger sender_id and recipient_id

Issue History

Date Modified Username Field Change
2024-01-17 02:29 PDStig New Issue
2024-01-17 02:29 PDStig Status Not Assigned => Assigned
2024-01-17 02:29 PDStig Assigned To => user4172
2024-01-17 02:41 PDStig Additional Information Updated
2024-01-17 02:52 PDStig Description Updated
2024-01-17 02:53 PDStig Description Updated
2024-01-17 02:55 PDStig Description Updated
2024-01-17 02:57 PDStig Description Updated
2024-01-17 02:58 PDStig Description Updated
2024-01-17 03:02 PDStig Description Updated
2024-01-17 03:08 PDStig Description Updated
2024-01-17 03:12 PDStig Tag Attached: Roadmap: v11
2024-01-17 03:12 PDStig Tag Attached: Type: Legal compliance / Privacy
2024-01-17 03:12 PDStig Tag Attached: Risk: Major rearchitecting
2024-01-17 03:17 PDStig Additional Information Updated
2024-01-17 03:17 PDStig Additional Information Updated
2024-01-17 03:18 PDStig Additional Information Updated
2024-01-17 03:19 PDStig Additional Information Updated
2024-01-17 03:19 PDStig Additional Information Updated
2024-01-17 22:17 PDStig Relationship added related to 5514
2024-01-17 22:18 PDStig Relationship added related to 5550
2024-01-17 22:30 PDStig Additional Information Updated
2024-01-17 22:31 PDStig Additional Information Updated
2024-01-17 22:35 PDStig Additional Information Updated
2024-01-17 22:40 PDStig Description Updated
2024-01-17 22:42 PDStig Description Updated
2024-01-17 22:44 PDStig Additional Information Updated
2024-01-17 22:45 PDStig Additional Information Updated
2024-01-17 22:58 PDStig Additional Information Updated
2024-01-17 22:59 PDStig Additional Information Updated
2024-01-17 23:01 PDStig Additional Information Updated
2024-01-20 21:12 PDStig Note Added: 0008218
2024-01-20 21:15 PDStig Note Edited: 0008218
2024-01-20 21:30 PDStig Note Edited: 0008218
2024-01-20 22:35 PDStig Note Edited: 0008218
2024-01-21 03:09 PDStig Note Edited: 0008218
2024-01-21 19:57 PDStig Note Edited: 0008218
2024-01-22 01:03 PDStig Status Assigned => Resolved
2024-01-22 01:03 PDStig Resolution open => fixed
2024-01-22 19:36 PDStig Note Added: 0008221
2024-01-22 19:38 PDStig Note Edited: 0008221
2024-01-22 19:48 PDStig Description Updated
2024-01-30 17:50 PDStig Relationship added related to 5584
2024-01-30 17:50 PDStig Relationship added related to 5583
2024-01-30 17:50 PDStig Relationship added related to 5585