Reading Binaries#
janus-llm
uses Ghidra to decompile and parse binary files. This allows the user to translate binaries to another programming language or to perform retrieval augmented generation (RAG) on their decompiled contents.
To read a binary as input, you’ll follow the same instructions as in the Quick Start documentation. However, you’ll first need to install Ghidra.
Then you’ll set the GHIDRA_INSTALL_PATH
environment variable to the location of the Ghidra installation.
export GHIDRA_INSTALL_PATH=/Users/mdoyle/programs/ghidra_10.4_PUBLIC
After setting the environment variable, you can use the janus
CLI to read binaries.
Adding to Chroma#
janus db add --input-dir janus/language/binary/_tests --input-lang binary binary-collection
Then we can peek at the collection we just created. You can see in the output that it decompiled the Hello World binary to C-like pseudocode and embedded that document in the embedding database:
janus db ls --peek binary-collection
Output:
Collection: binary-collection
ID: 04b71a0f-50d8-4061-8775-0b48b575601f
Metadata: {'date_created': '2024-02-01', 'time_created': '22:50:22.857642'}
Tenant: default_tenant
Database: default_database
Length: 1
Peeking at first entry:
{
'ids': ['566265ca-c197-11ee-ab3f-5accbc90a9b9'],
'embeddings': [0.07846622169017792, 0.06490962952375412, '...'],
'metadatas': [
{
'cost': 0,
'end_line': 8,
'hash': -7749008126979110064,
'original_filename': 'hello.bin',
'start_line': 1,
'tokens': 20,
'type': 'translation_unit'
}
],
'documents': ['undefined4 entry(void)\n\n{\n _printf("Hello, World!");\n return 0;\n}\n\n'],
'uris': None,
'data': None
}
Translating#
janus translate --input-lang binary --output-lang python --input-dir janus/language/binary/_tests --output-dir python-tests
Then we can cat
the translated code we just created with ChatGPT:
cat python-tests/hello.py
python-tests/hello.py
:
def entry():
print("Hello, World!")
return 0