Remotely repair broken Malwarebytes endpoint
A walkthrough of my troubleshooting process
This article details my process for troubleshooting broken Malwarebytes antivirus installations across several affected endpoints at my organisation. Affected endpoints can display various symptoms, such as:
- endpoint missing from Nebula cloud console
- no reported protection service version
- enormous EDR backup folder that grows continually and never shrinks (when the EDR plugin is enabled)
The first broken endpoint was escalated to me because its hard disk was full. In organisations where staff turnover is high and there is no group policy or automated process to remove user profiles after x days without a logon, disks tend to fill up with stale profiles, so the usual fix was to manually remove the profiles of users who had left the company and call it a day.
When the users of the affected endpoint soon called back to say the disk was full again, I used TreeSize to get a better understanding of which directories were responsible. The culprit was
C:\ProgramData\Malwarebytes Endpoint Agent\Plugins\EDRPlugin\Backup.
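Where TreeSize is not to hand, the same "which folder is eating the disk" question can be roughly answered with a short script. The following is a minimal Python sketch, not the method used in this article: the function names `folder_sizes` and `largest` are my own, and it assumes Python is available on the machine doing the checking and that the account running it can read the folders in question.

```python
import os


def folder_sizes(root):
    """Return {subfolder_name: total_bytes} for each immediate subfolder of root."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # only report on real subfolders
            total = 0
            # os.walk silently skips directories it cannot list
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished or access was denied
            sizes[entry.name] = total
    return sizes


def largest(sizes, count=5):
    """Return the `count` biggest (name, bytes) pairs, largest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:count]
```

Calling `largest(folder_sizes(r"C:\ProgramData"))` would list the five biggest folders under that directory. Files the account cannot read are silently skipped, so the totals may undercount protected folders.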
After a brief web search, I found a relevant thread on the official support forum. The thread made it clear that deleting the files inside the folder was not an option, because they are protected by the antivirus agent. Further down the thread, there was a hint that the policy controlling retention of file backups was key to this behavior: if the policy wasn't applying correctly, the backups would never be culled and would eventually fill up the hard disk.
I logged in to the Nebula management console and found the endpoint online. I tried sending actions to the endpoint, such as running a scan or updating the endpoint version, but these would just sit as "Pending".
With no better lead, reinstalling the endpoint agent seemed the best course of action. However, the uninstall process hit an error midway through, leaving the agent in a half-installed state that was somehow more broken than before.
When checking remotely, I noticed the Malwarebytes Endpoint Agent service was no longer running and I couldn't start it.
After another web search, I came across a built-in tool called the Configuration Recovery Tool. This tool attempts to restore the endpoint to a previously working version, so I ran it on the endpoint remotely using PsExec.
psexec \\computer cmd
C:\Windows\system32> cd "C:\Program Files\Malwarebytes Endpoint Agent\"
C:\Program Files\Malwarebytes Endpoint Agent> ConfigurationRecoveryTool.exe MBCloudEA.exe
After approximately 10 minutes, the free space on the computer started to slowly increase, until after a few hours it was back to normal. The endpoint was restored and the backup retention policies had kicked in and culled the old files.
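To watch that recovery without repeatedly checking by hand, free space can be sampled from a script. This is a small Python sketch under stated assumptions: `free_gib` and `is_recovering` are names I made up, the 1 GiB threshold is arbitrary, and reading a remote disk through an administrative share such as `\\computer\C$` assumes that share is reachable with your credentials.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def is_recovering(samples, min_gain_gib=1.0):
    """True if free space grew by at least `min_gain_gib` from first to last sample."""
    return len(samples) >= 2 and (samples[-1] - samples[0]) >= min_gain_gib
```

A simple loop could append `free_gib(r"\\computer\C$")` to a list every few minutes and stop once `is_recovering` returns True, or raise a flag if nothing has changed after a few hours.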
More broken endpoints
Over the course of several weeks, more endpoints developed similar issues. When an endpoint didn't show up in the Nebula console, remotely running the Configuration Recovery Tool with MBCloudEA.exe as the argument would reconnect it to Nebula. Over the following few hours, the free space issue on the hard disk would then resolve itself.
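With several endpoints to fix, the PsExec step shown earlier can be wrapped in a loop. The Python sketch below is an illustration rather than a tested procedure. The tool path and the `MBCloudEA.exe` argument come from the commands above, but the function names are mine, and using PsExec's `-w` (working directory) option in place of the interactive `cd` is an assumption about how the tool behaves. I have not confirmed what the tool's exit codes mean, so they are only recorded, not interpreted.

```python
import subprocess

# Folder, tool and argument taken from the interactive commands in this article.
AGENT_DIR = r"C:\Program Files\Malwarebytes Endpoint Agent"
RECOVERY_TOOL = AGENT_DIR + r"\ConfigurationRecoveryTool.exe"
TOOL_ARGUMENT = "MBCloudEA.exe"


def recovery_command(computer):
    """Build the PsExec argument list for one endpoint."""
    # -w sets the remote working directory, standing in for the manual `cd`.
    return ["psexec", rf"\\{computer}", "-w", AGENT_DIR, RECOVERY_TOOL, TOOL_ARGUMENT]


def run_recovery(computers, runner=subprocess.run):
    """Run the recovery tool on each computer and return {computer: exit_code}."""
    results = {}
    for computer in computers:
        completed = runner(recovery_command(computer), check=False)
        results[computer] = completed.returncode
    return results
```

The `runner` parameter exists so the loop can be dry-run with a stand-in function before it is pointed at real machines, for example one that only prints each command.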
Malwarebytes support suggest doing a reinstall when there is no protection service version present. Doing this remotely would involve moving the endpoint to a group that has no tamper protection enabled, and then using the "EACmd" tool documented on this page. A remote installation using PsExec, SCCM or Group Policy should then be sufficient to get the endpoint fully restored.
Next steps: Automation
After a period of reacting to these issues, my next step is to put some automation in place so that I can proactively fix endpoints before their disks fill up. The general idea is to get a list of workstation objects from Active Directory and compare it to the endpoints present in Nebula. Any endpoints that are not present, or that have no reported protection service version, will be flagged as broken endpoints needing attention.
I've since found that a Malwarebytes community member has produced a comprehensive-looking script that checks health and pending software update status, runs the Configuration Recovery Tool if there are any issues, then checks the cloud console for confirmation.
At the very least, it will give me some inspiration for getting my own automated process in place!
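The comparison itself is simple once both lists are in hand. Here is a minimal Python sketch of the idea, assuming the Active Directory workstation names and the Nebula endpoints have already been exported by some other means. The function name `find_broken`, the input shapes and the sample hostnames are all hypothetical, and fetching the data from either system is deliberately left out.

```python
def find_broken(ad_workstations, nebula_endpoints):
    """Classify workstations that look broken.

    ad_workstations: iterable of hostnames from Active Directory.
    nebula_endpoints: dict of hostname -> reported protection service
        version (None or "" when no version is reported).
    """
    # Hostnames are compared case-insensitively.
    known = {name.lower(): version for name, version in nebula_endpoints.items()}
    missing = []
    no_version = []
    for name in sorted({host.lower() for host in ad_workstations}):
        if name not in known:
            missing.append(name)      # in AD but absent from Nebula
        elif not known[name]:
            no_version.append(name)   # in Nebula, no protection service version
    return {"missing": missing, "no_version": no_version}


# Made-up sample data, purely for illustration.
report = find_broken(
    ["PC-01", "pc-02", "PC-03"],
    {"pc-01": "1.2.0", "PC-02": ""},
)
print(report)
```

Both groups would then be candidates for the remote recovery steps described above, with the `no_version` group more likely to need a full reinstall.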
Thanks for reading this article. I'm open to any feedback on the troubleshooting process that I took or the solutions that I found. I'm new to public technical writeups, and hope to grow through consistency, reading other blogs and receiving feedback.