I'm really sorry to hear this happened yet again :(.
I am starting to suspect Mandrill at this point: no other users have reported this, and we now have a lock in place for the newsletter drip sender.
While I was fixing this bug, I observed about 30k emails in the queue. So the problem is definitely not in how mails enter the queue; if it were, we'd expect to see the duplicates already sitting in the queue at that point.
In sources/hooks/systems/cron/newsletter_drip_send.php we use 'newsletter_currently_dripping' as a lock. If it is set, the run skips the drip cycle. It is set when a drip cycle starts and only unset after the mails for that cycle have been deleted from the queue. That should make concurrency issues impossible: an overlapping run either sees the lock and skips, or is held up while the lock is being written, which achieves the same end.
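To make the lock behaviour described above concrete, here is a minimal toy model in Python. It is not the actual PHP from newsletter_drip_send.php: the class, method names and in-memory flag are all hypothetical stand-ins, and the sketch does not model whether the real check-and-set is atomic.

```python
class DripSender:
    """Toy model of a drip cycle guarded by a flag (all names hypothetical)."""

    def __init__(self, queue):
        self.queue = list(queue)         # pending newsletter mails
        self.currently_dripping = False  # stands in for 'newsletter_currently_dripping'
        self.sent = []

    def begin_cycle(self, batch_size):
        """Return the batch to send, or None if another cycle holds the lock."""
        if self.currently_dripping:
            return None                  # lock is set: skip this drip cycle
        self.currently_dripping = True   # set when the cycle starts
        return self.queue[:batch_size]

    def finish_cycle(self, batch):
        """Record the sends, delete them from the queue, then release the lock."""
        self.sent.extend(batch)
        del self.queue[:len(batch)]
        self.currently_dripping = False  # only unset after the deletion


sender = DripSender(["a", "b", "c", "d"])
first = sender.begin_cycle(2)    # cron run 1 starts and takes the lock
overlap = sender.begin_cycle(2)  # cron run 2 fires while run 1 is still sending
sender.finish_cycle(first)       # run 1 completes and releases the lock
second = sender.begin_cycle(2)   # the next run proceeds normally
sender.finish_cycle(second)
print(overlap, sender.sent)      # None ['a', 'b', 'c', 'd']
```

The overlapping run gets nothing to send, so each mail goes out once. The guarantee only holds if every path that sends mail goes through the same lock.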
You may want to check my code in case somehow I missed something, but I think it's sound.
I am considering what could be going wrong with Mandrill now.
The newsletter queue was feeding to the mail queue, and the mail queue had no locking.
I have made the newsletter queue send directly, which is what I had thought it was already doing.
I have also implemented locking on the mail queue, in case a lot of mail does end up being dumped into it.
The reason it only affects you is that Mandrill's latency considerably raises the probability of CRON jobs overlapping.
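As a rough illustration of why send latency matters, the following Python sketch simulates cron runs against a queue with and without a lock. It is a simplified model under assumed numbers (60-second cron interval, batches of 4, 8 queued mails), not a measurement of the real system, and every name in it is hypothetical.

```python
import heapq


def simulate(cron_interval, send_latency, batch_size, queue_len, runs, use_lock):
    """Return how many times each mail is sent across the simulated cron runs.

    Each run reads the first batch_size queued mails when it starts, spends
    send_latency seconds per mail, and deletes its batch only when it finishes.
    """
    queue = list(range(queue_len))
    send_counts = [0] * queue_len
    locked = False
    # Heap entries: (time, priority, seq, kind, batch).
    # Priority 0 makes a "finish" sort before a "start" at the same time.
    events = [(i * cron_interval, 1, i, "start", None) for i in range(runs)]
    heapq.heapify(events)
    seq = runs  # unique tie-breaker so tuples never compare kind or batch
    while events:
        time, _, _, kind, batch = heapq.heappop(events)
        if kind == "start":
            if use_lock and locked:
                continue  # another run holds the lock: skip this cycle
            locked = True
            batch = queue[:batch_size]
            for mail in batch:
                send_counts[mail] += 1
            finish_time = time + len(batch) * send_latency
            heapq.heappush(events, (finish_time, 0, seq, "finish", batch))
            seq += 1
        else:
            queue = [m for m in queue if m not in batch]  # delete the sent mails
            locked = False
    return send_counts


fast_no_lock = simulate(60, 10, 4, 8, 4, use_lock=False)  # each run takes 40s
slow_no_lock = simulate(60, 25, 4, 8, 4, use_lock=False)  # each run takes 100s
slow_locked = simulate(60, 25, 4, 8, 4, use_lock=True)
print(fast_no_lock, slow_no_lock, slow_locked)
```

With fast sends, each run finishes before the next one starts, so the missing lock never shows. Once a run takes longer than the cron interval, the next run reads the same undeleted mails and every mail is sent twice. Adding the lock brings it back to one send each.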
We are going to send another newsletter out tonight - will let you know how that goes.
Thanks
Ade