Why Duke Nukem: Zero Hour of all games?
It's 100% decompiled to C, but not fully labelled yet. That means there are lots of auto-generated names all over the place. It would be interesting to see someone try to port it now, though.
Would LLMs be good at labelling, or would the risk of false-positives just waste more time than it saved?
I wish someone would run a proper study. In my experience, it helps flag patterns you may not be immediately familiar with, like CRC functions/tables. It also does a good job where no thinking is required, such as when you have partial information: "for(unk=0; unk<unk2; unk++) { unk3=players[unk]... }" - you know what the names are, you just need to do the boring part. For completely unknown code, it may get more interesting. But I'd at least like to see the suggestions. Fully decompiling something is long and boring work.
Seems like it would be pretty straightforward to fine-tune an LLM on code + asm pairs to help with reverse engineering.
Gillou68310 looks to have been a one person army for 99% of it, what an impressive show of dedication.
The Legend of Zelda: Twilight Princess has been getting farther along as well https://decomp.dev/zeldaret/tp
Would really like to know what makes a person (or group of people) invest the time and energy to do this? Is there a group of hobbyist gamers who work on titles they love? Is it about digital conservation?
This is how the text adventure/interactive fiction community started. Some hackers reverse-engineered Infocom's Z-machine, then built new languages and compilers so new games could be created.
I've spent a lot of time reverse-engineering vintage synthesizer firmware (which is a bit simpler than modern games). I did complete end-to-end annotations of these two vintage synth ROMs:
- https://github.com/ajxs/yamaha_dx7_rom_disassembly
- https://github.com/ajxs/yamaha_dx9_rom_disassembly
It started because I was just curious about how these devices actually worked. In the end I learned a lot of invaluable skills that really broadened my horizons as an engineer. I got a chance to talk to a handful of incredibly smart people too. The actual work can be a lot of fun. It's like piecing together a really large and technical jigsaw puzzle. In my case, it also led to me being able to release a fun firmware mod: https://github.com/ajxs/yamaha_dx97
In case anyone is curious about how I worked, I wrote a bit of a tutorial article: https://ajxs.me/blog/Introduction_to_Reverse-Engineering_Vin...
It can be a bit analogous to archaeology too. Even though in my case the DX7 is only 42 years old, that was an aeon ago in computing terms. You gain a bit of insight into how different engineers used to design and build things. Even though development for the N64 is fairly recent, from memory the console had some interesting constraints that made development tricky.
> the console had some interesting constraints that made development tricky
The ones that come to mind are the tiny 4KB texture cache, high memory latency (thanks Rambus), and inefficient RCP microcode. The N64 could have been so much more with a few architectural tweaks, but developers liked the PlayStation much better on account of its simplicity, despite it being technically inferior in most respects.
There are people who spend hours and hours analyzing bit-part characters in things like The Lord of the Rings (where did the Blue Wizards go? Who is Tom Bombadil?) or Star Wars. This is a similar fan obsession. Remember, "fan" comes from "fanatic".
I guess you’ve never kicked ass and chewed bubble gum
Maybe they just really love the game. This is a form of tribute.
I too have a beloved video game from my childhood: Mega Man Battle Network 2. That game changed my life. I learned English and became a programmer because of it. I have two physical copies of it in my collection, one of them factory sealed.
Sometimes I open the game in IDA and try to reverse engineer bits and pieces of it. I just want to understand the game. I don't have the time, the dedication or even the low level programming knowledge that these badass folks in the ROM hacking community have, but I still try it.
In addition to those categories, speedrunning glitch hunters tend to gravitate toward these projects as well. E.g. the Twilight Princess decomp was started primarily by and for the speedrunning community.
It's also the endgame for romhacking: once a game is fully decompiled, modders can go far beyond what was feasible by prodding the original binary. That can mean much more complicated gameplay mods, but also porting the engine to run natively on modern platforms, removing framerate limits, and so on.
Same. Is there a project page or anything that explains the context, the reasons, the history behind this? I bet it would be very interesting.
The Readme is too technical and misses a writeup on the soul of the project: Section 1, title. Section 2, already talking about Ubuntu and dependencies. Where is section "Why?" :-) ?
Based on the commit history, this has been one person's on-off project for 3 years. My guess is that they like this game and they were curious about how decomps come to fruition - and what better way to find out than to do it?
You climb a mountain because it's there. Different people have different mountains.
It's an interesting challenge, you can improve it or make it do X,Y,Z, you can add speedrunning or competition gaming features, solving puzzles gives a sense of accomplishment, a certain small group gives you social clout, etc.
Note: To use this repository, you must already own a copy of the game.
I used it just fine without one, I think you’re wrong.
I believe you are making a technical statement and the parent poster is making a legal one. You're both right, I guess.
The parent poster is not making a legal statement. They copied/pasted the first line of the Readme. I made the clarification that the note is a legal disclaimer, not a technical requirement, so people, including the parent poster, are not confused.
well, you better delete it within 24 hours then!
this is a legal disclaimer lol, not an actual requirement
How so?
Functionally, the README describes that providing a game copy is necessary for creating a build. This would make sense, since unless the sound, image, text, etc. assets are all baked into the code, those would have to come separately.
Legally, it makes even less sense. This is cleaned-up (?) and painstakingly byte-matched decompiler output (again based on the README), so it's unfortunately just plain illegal [0], disclaimers notwithstanding.
[0] as always, legality depends on jurisdiction - so as always, if in doubt, consult an actual lawyer
"A decompilation of Duke Nukem Zero Hour for N64.
Note: To use this repository, you must already own a copy of the game."
We all now do, of course
Usually these projects only contain a copy of the source code to build the binary. You still need the game assets like the levels and sounds to play the game.
Are LLMs well suited to this kind of reverse engineering?
You can definitely do a lot of relabeling that way. It may also be worth trying a loop of "fix until it matches the binary" for separate files... But I haven't seen anyone actually write it up.
There are attempts like this https://github.com/louisgthier/decompai that are related, but not quite the same as this project.
Edit: just gave it a go, and guessing reasonable variable names works extremely well when there's partial information already available. It's especially nice for things like naming all the counters and temporaries when you know what you're iterating over (so it's just boring, trivial manual work), but it can also figure out the meaning of larger patterns for function names.
Fairly well. They aren't perfect, but they save a lot of time.
They are also downright superhuman at recognizing common library functions or spotting well known algorithms, even if badly mangled by compilation and decompilation.
I'm using an agent to port a game. I have the source. It's not going well. Lots of self-inflicted rabbit holes, because the LLM doesn't want to port a lot of the libraries; it's too much work for one round. It does a lot of stubbing and makes assumptions, and that breaks the whole thing.
How did you approach it? Some specific harness? Planning?
I’ve not experimented but I thought they might be valuable for isolated variable / function renaming
Still [eagerly] waiting over here for Duke Nukem Forever!
..since how long? I've lost track (:
Not sure if it's a joke, but in case you missed the release - it's already out: https://store.steampowered.com/agecheck/app/57900/
That release is like the Matrix movie sequels.
There are no Matrix movie sequels I hear you say? ... Indeed.
As of earlier this year, it's been out longer than the time between it getting announced and released.
oh, this just makes me sad. Has it really been that long?