Filtering strings output - Gabriel Gonzalez

Tired of scrolling through endless walls of garbage when running strings on a binary? Same here. When you’re doing quick triage before diving into reverse engineering, most of what strings spits out is useless noise: random bytes that happen to form printable characters but don’t mean anything.

To make my life easier, I put together a small set of heuristics (chi-square, simple pattern checks, dictionary matching, etc.) to detect only the strings that actually look like meaningful C text. The idea is to surface the good stuff and hide the junk so you can focus on what matters instead of wasting time scrolling.
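To give a rough idea of what a chi-square check can look like, here is a minimal sketch. It is not the code from strings_english.py: the frequency table (approximate English letter frequencies), the function names, and both thresholds are assumptions picked for illustration.

```python
from collections import Counter

# Approximate relative letter frequencies in English text (assumed values,
# rounded; any reasonable table would work for this sketch).
ENGLISH_FREQ = {
    "a": 0.082, "b": 0.015, "c": 0.028, "d": 0.043, "e": 0.127,
    "f": 0.022, "g": 0.020, "h": 0.061, "i": 0.070, "j": 0.002,
    "k": 0.008, "l": 0.040, "m": 0.024, "n": 0.067, "o": 0.075,
    "p": 0.019, "q": 0.001, "r": 0.060, "s": 0.063, "t": 0.091,
    "u": 0.028, "v": 0.010, "w": 0.024, "x": 0.002, "y": 0.020,
    "z": 0.001,
}


def chi_square_per_letter(text):
    """Chi-square statistic against ENGLISH_FREQ, divided by letter count.

    Lower means the letter distribution is closer to English.
    Returns infinity when the string contains no ASCII letters.
    """
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    n = len(letters)
    chi2 = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        expected = freq * n
        chi2 += (counts.get(letter, 0) - expected) ** 2 / expected
    return chi2 / n


def looks_like_text(text, threshold=3.0, min_alpha_ratio=0.6):
    """Hypothetical decision rule: mostly letters and a low chi-square score.

    Both threshold values are arbitrary assumptions, not tuned numbers.
    """
    if not text:
        return False
    alpha = sum(1 for c in text if c.lower() in ENGLISH_FREQ)
    if alpha / len(text) < min_alpha_ratio:
        return False
    return chi_square_per_letter(text) < threshold
```

Letter statistics alone are unreliable on very short strings, since a handful of letters can never match the expected distribution well. That is one reason to combine a check like this with pattern and dictionary heuristics rather than rely on it alone.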

Usage is dead simple:

strings firmware.img | python3 ~/bin/strings_english.py

This filters out most of the annoying random output and keeps the strings that are likely to be relevant for analysis. Two small files are involved: the Python script itself and a lightweight dictionary that helps catch common identifiers like strstr.

https://github.com/ggonzalez/CyberSecurity-KnowledgeBase/blob/main/ReverseEngineering/strings_english.py

https://github.com/ggonzalez/CyberSecurity-KnowledgeBase/blob/main/ReverseEngineering/words_alpha.txt
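For a feel of how the dictionary side can work, here is a small sketch of word-list matching. It assumes the dictionary file holds one word per line; the function names, the token pattern, and the 0.5 ratio are hypothetical and not taken from the actual script.

```python
import re

# Runs of three or more ASCII letters; shorter runs are ignored because
# they match dictionary words too easily by chance (an assumed cutoff).
TOKEN_RE = re.compile(r"[A-Za-z]{3,}")


def load_words(path):
    """Load a word list, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def dictionary_ratio(text, words):
    """Fraction of letter tokens in text that appear in the word set."""
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in words)
    return hits / len(tokens)


def filter_lines(lines, words, min_ratio=0.5):
    """Yield lines whose dictionary ratio reaches min_ratio (assumed value)."""
    for line in lines:
        candidate = line.rstrip("\n")
        if dictionary_ratio(candidate, words) >= min_ratio:
            yield candidate
```

Hooked up to a pipe, this would amount to iterating over filter_lines(sys.stdin, load_words("words_alpha.txt")) and printing each result. Because the tokenizer splits on anything that is not a letter, a name like strstr_init is checked as the two tokens strstr and init.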
