jart 4 days ago

Wow! This might actually make it possible for Actually Portable Executable to support running on Windows ARM. I'm already putting the ARM code inside all my binaries. There's just never been a way to encode that in the PE headers. But if my emulated WinMain() function for x86-64 could detect that it's being emulated and then simply ask a WIN32 API to jump to the ARM entrypoint instead, it'd be the perfect solution to my problems. I actually think I'm going to rush out and buy a Windows ARM computer right now.

  • yjftsjthsd-h 4 days ago

    And this kind of beautiful insanity is why you're one of my favorite developers of this era.

    Also,

    > I'm already putting the ARM code inside all my binaries.

    Wait, I thought CPU architecture was the one limitation that did affect APE - you mean on unix-likes APE binaries are already compatible across amd64 and aarch64?

    Edit: rereading https://justine.lol/cosmo3/ it does say that, doesn't it? And ARM64 listing "Windows (non-native)" just means that one platform uses emulation (for the next few hours or days, at least...). That's amazing :)

  • mappu 4 days ago

    I found the http://www.emulators.com/docs/abc_arm64ec_explained.htm article extremely helpful for understanding what the ABI is doing; you might like it too.

    • adzm 4 days ago

      This was a brilliant and informative read, deserving of its own post really. Thanks!

    • KerrAvon 4 days ago

      tl;dr: Microsoft reinvented the Mixed Mode Manager from Mac System 7.1.x circa 1993.

      • pjmlp 3 days ago

        Or the Windows 3.1 Enhanced Mode with V86, more likely.

  • justinfrankel 4 days ago

    If you have an Apple Silicon mac you can install Win11 in UTM and it works great for dev purposes. Can get the free builds via Windows Insider, too.

    • conradev 4 days ago

      I would recommend getting an official consumer build to test all of the latest consumer features like Copilot

      Parallels has a Microsoft partnership and has an official ARM64 image, which I was able to grab (and run in anything). I’m sure there are a lot more now, though!

  • lewurm 4 days ago

    > I actually think I'm going to rush out and buy a Windows ARM computer right now.

    If you have an Apple Silicon machine you can run a Windows Insider build via UTM in a VM.

  • zorgmonkey 4 days ago

    I'm pretty sure the API you'll want to detect that is IsWow64Process2.
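    A minimal sketch of what that check could look like, in C. It assumes, as I read Microsoft's documentation, that an x64 process emulated on ARM64 is not a WOW64 process, so IsWow64Process2 reports IMAGE_FILE_MACHINE_UNKNOWN for the process and IMAGE_FILE_MACHINE_ARM64 for the native machine. The helper names are hypothetical, the decision logic is split from the Win32 call so it can be tested anywhere, and the result is only meaningful in a build that targets x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the PE/COFF IMAGE_FILE_MACHINE_* values, renamed so
   they don't clash with the macros in <windows.h>. */
enum {
    kMachineUnknown = 0x0000,
    kMachineI386    = 0x014c,
    kMachineAmd64   = 0x8664,
    kMachineArm64   = 0xAA64,
};

/* Assumption: an emulated x64 process is not WOW64, so the process machine
   comes back UNKNOWN while the native machine is ARM64. Call this only from
   code compiled for x86-64. */
static bool x64_is_emulated_on_arm64(uint16_t process_machine,
                                     uint16_t native_machine)
{
    return process_machine == kMachineUnknown &&
           native_machine == kMachineArm64;
}

#if defined(_WIN32)
#include <windows.h>

/* Hypothetical wrapper that queries the current process. IsWow64Process2
   needs Windows 10 (version 1511 or later) and the Windows 10+ SDK headers;
   a failed call is treated as "not emulated". */
static bool running_emulated_on_arm64(void)
{
    USHORT process_machine = 0, native_machine = 0;
    if (!IsWow64Process2(GetCurrentProcess(), &process_machine,
                         &native_machine))
        return false;
    return x64_is_emulated_on_arm64(process_machine, native_machine);
}
#endif
```

    A 32-bit x86 process on the same ARM64 machine would report kMachineI386 as its process machine, which the helper deliberately treats as "not the x64-on-ARM64 case".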

    • szundi 4 days ago

      I love the 2s and Exes at the end of Windows API call names

      • formerly_proven 4 days ago

        wait/waitid/waitpid/wait3/wait4

        dup/dup2/dup3

        creat/open/openat/openat2

        cough

        • skissane 4 days ago

          Even more: clone3, __clone2 (only exists on Itanium), fchmodat2, preadv2, pwritev2, pipe2, sync_file_range2, mmap2 (only certain architectures; for x86, only 32-bit), renameat2, mlock2, faccessat2, epoll_pwait2

          My personal prediction is that sooner or later we'll see execveat2, to permit setting /proc/PID/comm when using execveat [0].

          I doubt we'll ever see clone4, because clone3 is passed a structure argument along with the structure size, so new fields can be supported just by increasing the structure size. If other syscalls had done that from the start, much of the 2/3/etc. would have been avoided. It is actually a very common practice on Windows (since NT); it has only much more recently been adopted in the Linux kernel.

          [0] see https://uapi-group.org/kernel-features/
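          A sketch of the size-versioned struct pattern described above, in C. The struct names and do_call() are hypothetical; the copying rules are modeled on Linux's copy_struct_from_user() helper, which clone3 and openat2 use: a smaller (older) struct gets its missing fields zeroed, and a larger (newer) one is accepted only if the bytes the kernel doesn't know about are all zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v0 and v1 layouts. Fields are only ever appended, so a
   v0 struct is a prefix of a v1 struct. */
struct args_v0 { uint64_t flags; uint64_t stack; };
struct args_v1 { uint64_t flags; uint64_t stack; uint64_t new_field; };

/* Hypothetical "syscall": the caller passes a pointer plus sizeof its struct.
   Returns 0 on success, -1 on error (real kernel code returns -EINVAL or
   -E2BIG here). */
static int do_call(const void *uargs, size_t usize, struct args_v1 *out)
{
    const size_t ksize = sizeof *out;
    if (usize < sizeof(struct args_v0))
        return -1;                     /* smaller than any published version */

    memset(out, 0, ksize);             /* fields the caller didn't send default to 0 */
    size_t known = usize < ksize ? usize : ksize;
    memcpy(out, uargs, known);

    /* A newer caller may send fields this "kernel" doesn't know about;
       accept that only if every one of those extra bytes is zero. */
    const unsigned char *extra = (const unsigned char *)uargs + known;
    for (size_t i = 0; i < usize - known; i++)
        if (extra[i] != 0)
            return -1;
    return 0;
}
```

          The zero check is what keeps this forward-compatible: an old kernel can safely ignore a new field only when the caller left it at its "off" value.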

        • phaedrus 4 days ago

          I work on a team that supports some equipment related to airplanes. An acronym for one piece of equipment that is decades old is "RCSU". When I got a support call talking about "RSCU", I assumed the person meant "RCSU".

          Nope. It turns out, when they made their next-generation piece of equipment, the vendor differentiated it by swapping the inner two letters in an already easy-to-say-wrong acronym.

          My reaction was, "WTF didn't they just call it the RCSU2?!"

    • orthoxerox 4 days ago

      IsWow64Process2ForRealThisTime

    • ale42 4 days ago

      wonder why they didn't call it IsWow64ProcessEx

  • dboreham 4 days ago

    > could detect that it's being emulated

    Down the rabbit hole...

    > buy a Windows ARM computer

    You can still get Surface Pro X (16G/LTE) on Amazon for $800

    • officeplant 4 days ago

      If you're lucky you can also find the Snapdragon ThinkPads for under $350 on Amazon.

      • adastra22 4 days ago

        Whoa

        • officeplant 3 days ago

          Unfortunately that time seems to have passed; the refurb units I can find now are back up to $650.

  • szundi 4 days ago

    Quickly create a donation page, you have this moment haha

userbinator 4 days ago

Windows 9x can run 16-bit realmode (V86), 16-bit protected mode, and 32-bit protected mode code in the same process by using different segment descriptors. Too bad amd64 wasn't compatible with that model, nor with the virtualisation features that came afterwards, or Intel could've made ARM32/64-mode segments a reality had they decided to add an ARM decoder to their microarchitecture.

  • st_goliath 4 days ago

    > ... 16-bit realmode (V86), 16-bit protected mode, and 32-bit protected mode code in the same process by using different segment descriptors...

    > ...Intel could've made ARM32/64-mode segments a reality...

    While I myself admire this particular breed of masochism, the direction that Intel currently wants to take is apparently quite the opposite.

    In May last year, they proposed X86S[1][2][3], which tosses out 16-bit support completely, along with 32-bit kernel mode (i.e. the CPU boots directly into 64-bit mode, and 32-bit code is only supported in ring 3).

    The proposal trims a lot of historical baggage, including fancy segmentation/TSS shenanigans, privilege rings 1 & 2, I/O port access from ring 3, non-flat memory models, etc., limiting the CPU to 64-bit kernel mode and 64- or 32-bit x86 user mode. With the requirement for 64-bit kernel mode, it effectively also removes unpaged memory access.

    [1] https://en.wikipedia.org/wiki/X86-64#X86S

    [2] https://www.intel.com/content/www/us/en/developer/articles/t...

    [3] https://news.ycombinator.com/item?id=36006446

    • trollbridge 4 days ago

      The TSS was always one of the most obnoxious aspects of the 80286 that stuck around much longer than it should have. On 386 or anything newer, using it was _slower_ than implementing it in software, yet you still needed them to implement task gates necessary for things like exceptions and interrupts.

      If anyone actually has a serious need to use ancient 16 bit software, emulators like 86Box work very well. Software that old doesn’t really need performance faster than, say, a Pentium 90, which 86Box has no trouble achieving on my M1 (ARM) MacBook.

      You can also use winevdm[1] on modern 64 bit Windows operating systems. I have this in production use for a niche case where someone can’t give up a particular 16 bit app, and I didn’t want to tangle with a VM for them.

      The technical details of making sure a modern CPU still functions exactly like an 80386, which in turn made sure it functioned like an 80286, when you fire up a 16 bit task on, say, 32-bit Windows 10 (or 64-bit with something like winevdm[1]) sound like a nightmare for a microcode engineer or QA tester.

      [1] https://github.com/otya128/winevdm

      • leeter 4 days ago

        Oh, it doesn't; AMD and Intel gave up on that a while back. v8086 mode might... but I'd guess it has quite a few errata. Everything else has most certainly changed. CPUs don't support the A20 gate, for example. Nor do they truly support real mode (they boot in 'unreal mode' now). If you want a 386-compatible, you're looking at ALi or DM&P CPUs that are basically Pentium/486/386 clones.

        I'd argue the break started with the Pentium Pro, at that point things shifted architecturally.

        • trollbridge a day ago

          The 80286 and 80386 never had special support for the "A20 gate". That was provided by (often slow) external circuitry.

          Some CPUs (I cannot remember which) built the A20 gate into the CPU itself to improve performance.
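          For reference, the arithmetic the A20 gate exists for, as a small C sketch (the function names are illustrative): a real-mode address is segment * 16 + offset, so FFFF:0010 and above spill past 1 MiB on a 286 or later. An 8086 wrapped those addresses back to 0; holding address line 20 low reproduces that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. The largest result,
   FFFF:FFFF, is 0x10FFEF, one bit past the 8086's 20 address lines. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* What reaches the bus: with the A20 gate disabled, address line 20 is
   forced low, so 0x100000..0x10FFEF alias 0x00000..0x0FFEF. */
static uint32_t physical_address(uint32_t linear, bool a20_enabled)
{
    return a20_enabled ? linear : (linear & ~(UINT32_C(1) << 20));
}
```

          With the gate enabled, the same 0x100000..0x10FFEF range becomes the extra ~64 KiB that DOS later used as the "high memory area".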

          The P6 was a complete implementation of the 80286 and 80386, Virtual 8086 mode, TSS, and all - you could boot DOS or an 80286 operating system on a P6 without any problems, although the design was not optimised to improve performance of 16-bit software. This was enough of a problem that they rolled back that design by the Celeron era because there were still a lot of people using 16-bit apps.

      • userbinator 4 days ago

        > On 386 or anything newer, using it was _slower_ than implementing it in software

        ...and thus it didn't get used, meaning Intel didn't make it faster, and so the vicious cycle continued.

        Hardware task switching could've made software simpler and more forward-compatible.

        Of course they eventually reinvented most of it with the virtualisation extensions anyway.

        • jlokier 4 days ago

          Actually it did get used. Linux and Windows used the x86 TSS for process context-switching for years.

          During that time, Linux had a limit on the number of processes, which was due to the maximum number of TSS entries that fit in the x86 GDT.

          Eventually the Linux kernel was changed to the more versatile context-switch method it uses today. Among other things, this change was important for thread performance, as thread context switches can skip the TLB flush. Same for kernel mode tasks. Software task switching also greatly increased the number of processes and threads that can be launched, from about 8000 (across all CPU cores) to millions.
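          Why the GDT caps things: a segment selector has only 13 bits of descriptor index, so a GDT holds at most 8192 entries, and a scheme that needs one TSS descriptor per task runs out around there. The usable number is lower, since the kernel also needs entries for its own segments. A small C sketch of the selector layout (the names are illustrative):

```c
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
    uint16_t index;  /* bits 15..3: descriptor number within the table */
    uint8_t  ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* bits 1..0: requested privilege level */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = (uint16_t)(sel >> 3);
    s.ti = (uint8_t)((sel >> 2) & 1u);
    s.rpl = (uint8_t)(sel & 3u);
    return s;
}

/* 13 index bits => at most 8192 descriptors in the GDT. */
enum { GDT_MAX_DESCRIPTORS = 1 << 13 };
```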

    • cesarb 4 days ago

      > the direction that Intel currently wants to take is apparently quite the opposite.

      It's not just Intel. It's clear that ARM is also going in the same direction, by allowing newer cores to be 64-bit (AArch64) only, dropping compatibility with the older 32-bit ARM ISA (actually three ISAs: traditional 32-bit ARM, Thumb, and Thumb2), and IIRC some manufacturers of ARM-based chips are already doing that.

      • leeter 4 days ago

        Allegedly there are already off-list SKUs from both AMD and Intel that don't support 16/32-bit code and boot up without the legacy bits. How far did they go with that? I don't know. I'd hope they removed the LDT etc. and reduced the GDT to just FS and GS (or just used the FS_BASE and GS_BASE MSRs).

    • userbinator 4 days ago

      > The proposal trims a lot of historical baggage

      All of that is a tiny amount of die area relative to the whole CPU. After all, a 386 has only 275k transistors.

      X86S is Stupid. Intel apparently forgot what made them worth choosing over competitors like ARM and now RISC-V. Non-compatible x86 makes little sense.

      ...and if they want to include the virtualisation extension, they still need to include that backwards-compatible functionality.

      • Symmetry 4 days ago

        A tiny amount of die area, a huge amount of engineering and validation effort. If segmentation issues can cause the register renamer to lose track of who owns a physical register, that's the sort of issue that's terrible to find and debug but which also can't be allowed in a real device. Intel has traditionally been able to just throw more engineers at the problem than their competitors, but I'm not sure that'll be the case going forwards.

      • whizzter 4 days ago

        Mainline OSes have been 64-bit for about 15-20 years by this point; the point is to trim the parts of x86 that aren't used when running a 64-bit OS.

        Notice that only 32-bit kernel mode (ring 0) is removed, not 32-bit user mode (ring 3), so even with this reduction your 64-bit Windows will still run clean 32-bit software built for Win95 in the '90s.

        Even today you need to run a virtualized 32-bit OS to run old 16-bit software. (The downside is that such a virtualized 32-bit OS would then need to be emulated instead of hardware-virtualized, if virtualization solutions support that at all.)

      • 15155 4 days ago

        > Intel apparently forgot what made them worth choosing over competitors like ARM

        People (myself and others I know) choose ARM chips because they don't absolutely mandate the purchase of sanctioned chipsets/other supporting components you don't have access to, impossible-to-obtain specs, etc.

  • Dwedit 4 days ago

    For x64, there's OTVDM to run Windows 3.1 applications.

sylware 4 days ago

Funny, I started to code some of my Linux x86_64 programs... using RV64 assembly (the new C), with a small in-process RV64 assembly interpreter.

Everything seems to converge more and more toward RISC-V these days.
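Not sylware's code, but a toy sketch in C of the core loop such an in-process interpreter might have. It decodes only two RV64I instructions (ADDI and ADD), runs straight-line code with no branches or memory access, and keeps x0 hardwired to zero; the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rv64 { uint64_t x[32]; };   /* integer registers x0..x31 */

/* Runs n instructions in order (pc is an array index, not a byte address).
   Returns false on the first instruction this toy doesn't support. */
static bool rv64_run(struct rv64 *cpu, const uint32_t *code, size_t n)
{
    for (size_t pc = 0; pc < n; pc++) {
        uint32_t insn   = code[pc];
        uint32_t opcode = insn & 0x7f;
        uint32_t rd     = (insn >> 7) & 0x1f;
        uint32_t funct3 = (insn >> 12) & 0x7;
        uint32_t rs1    = (insn >> 15) & 0x1f;
        uint32_t rs2    = (insn >> 20) & 0x1f;
        uint32_t funct7 = insn >> 25;

        if (opcode == 0x13 && funct3 == 0) {                /* ADDI */
            int64_t imm = (int64_t)(insn >> 20);
            if (imm & 0x800)
                imm -= 0x1000;                              /* sign-extend 12-bit immediate */
            cpu->x[rd] = cpu->x[rs1] + (uint64_t)imm;
        } else if (opcode == 0x33 && funct3 == 0 && funct7 == 0) { /* ADD */
            cpu->x[rd] = cpu->x[rs1] + cpu->x[rs2];
        } else {
            return false;
        }
        cpu->x[0] = 0;                                      /* writes to x0 are discarded */
    }
    return true;
}
```

For example, the words 0x00500093, 0x00700113, 0x002081B3 encode `addi x1, x0, 5`, `addi x2, x0, 7`, `add x3, x1, x2`, leaving 12 in x3.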

Symmetry 4 days ago

Sounds similar to what NVidia was doing with their Project Denver cores, using a mix of emulated ARM and native VLIW instructions with gradual compilation from one to another.

frozenport 4 days ago

Struggling with the use case.

It seems like this is when you have the source or the libs but choose to mix x86 and arm?

It would seem if you have the source etc you should just bite the bullet and port everything.

  • adamjs 4 days ago

    Two use-cases jump to mind:

    * Allows incremental porting of large codebases to ARM. (It's not always feasible to port everything at once-- I have a few projects with lots of hand-optimized SSE code, for example.)

    * Allows usage of third-party x64 DLLs in ARM apps without recompilation. (Source isn't always available or might be too much of a headache to port on your own.)

    • vsl 4 days ago

      3. Improve x64 emulation performance for everybody. Windows 11 on ARM ships system DLLs compiled as Arm64EC, which means x64 binaries run native ARM code, at least within system libraries.

    • ack_complete 4 days ago

      It's not worth using ARM64EC just for incremental porting -- it's an unusual mode with even less build/project support than Windows ARM64, and there are EC-specific issues like missing x64 intrinsic emulations and slower indirect calls. I wouldn't recommend it except for the second case with external x64 DLLs.

    • callalex 4 days ago

      At that point why trust the emulator over the port? Either you have sufficient tests for your workload or you don’t, anything else is voodoo/tarot/tea leaves/SWAG.

      • wtallis 4 days ago

        "Why trust the emulator?" sounds a lot like asking "why trust the compiler?". It's going to be much more widely-used and broadly-tested than your own code, and probably more thoroughly optimized.

      • szundi 4 days ago

        We might be lucky and the emulator guys might have enough testing

    • amelius 4 days ago

      > Allows incremental porting of large codebases to ARM. (It's not always feasible to port everything at once-- I have a few projects with lots of hand-optimized SSE code, for example.)

      Wouldn't it make more sense to have a translator that translates the assembly, instead of an emulator that runs the machine code?

    • frozenport 4 days ago

      Yeah but you need to port the SIMD before shipping anyways?

      So if you're doing incremental stuff might as well stub out the calls with "not implemented", and start filling them in.
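      A minimal sketch of that stub-then-fill-in approach in C, with hypothetical routine names: the unported routine still compiles and links but reports PORT_NOT_IMPLEMENTED at run time, so callers can be exercised before every SIMD kernel has been rewritten.

```c
#include <stddef.h>

/* Hypothetical result codes for a routine-by-routine port. */
enum port_status { PORT_OK = 0, PORT_NOT_IMPLEMENTED = -1 };

/* Already ported: a plain C version standing in for a hand-written SSE loop. */
static enum port_status sum_floats(const float *v, size_t n, float *out)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    *out = acc;
    return PORT_OK;
}

/* Not ported yet: keeps the final signature so callers build today. */
static enum port_status dot_floats(const float *a, const float *b,
                                   size_t n, float *out)
{
    (void)a; (void)b; (void)n; (void)out;
    return PORT_NOT_IMPLEMENTED;
}
```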

      • creshal 4 days ago

        The SIMD part will be emulated as normal, as far as I understand. So you can ship a first version with all-emulated code, and then incrementally port hotspots to native code, while letting the emulator handle the non-critical parts.

        At least in theory, we'll see how it actually pans out in practice.

  • selimnairb 4 days ago

    I feel like binary translation is a better approach. It’s a temporary workaround that allows users to use non-native programs while they are ported properly. ARM64EC seems like it will incentivize “eh that’s good enough” partial porting efforts that will never result in a full port, while making the whole system more complicated, with a larger attack surface (binary translation also makes the system more complicated, but it seems more isolated/less integrated with the rest of OS).

    • PaulHoule 4 days ago

      My understanding is that ARM64EC only makes sense in terms of binary translation. That is, the x64 bits get translated and the ARM bits don’t.

  • anaisbetts 4 days ago

    The use-case is huge apps that have a native plugin ecosystem, think Photoshop and friends. Regular apps will typically just compile separate x64 and ARM64 versions

  • doctorpangloss 4 days ago

    Yes, bite the bullet and port. Of course it makes no sense.

    These sorts of things are only conceived in conversations between two huge corporations.

    Like Microsoft needs game developers to build for ARM. There’s no market there. So their “people” author GPT-like content at each other, with a ratio of like 10 middlemen hours per 1 engineer hour, to agree to something that narratively fulfills a desire to build games for ARM. I can speculate endlessly how a conversation between MS and EA led to this exact standard but it’s meaningless, I mean both MS and EA do a ton of things that make no sense, and I can’t come up with nonsense answers.

    Anyway, so this thing gets published many, many months after it got on some MS PM’s boss’s partner’s radar. Like the fucking devices are out! It’s too late for any of this to matter.

    You can’t play Overwatch on a Snapdragon whatever (https://www.pcgamer.com/hardware/gaming-laptops/emulation-pr... ) End of story. Who cares what the ABI details are.

    Microsoft OWNS Blizzard and couldn’t figure this out. Whom is this for?

    • comex 4 days ago

      > Anyway, so this thing gets published many, many months after it got on some MS PM’s boss’s partner’s radar.

      Arm64EC is not new. It was released back in 2021.

Tempest1981 4 days ago

> requires the use of the Windows 11 SDK and is not available on Windows 10 on Arm.

So what should developers do re: Win10 users? Separate builds for them?

  • goosedragons 4 days ago

    Is it really even a big enough concern to think about them? Windows 10 on ARM lacks x64 emulation support and the devices never sold well. I can't imagine there's too too many Windows 10 on ARM devices hanging around still running Windows 10.

    • dmitrygr 4 days ago

      > Windows 10 on ARM lacks x64 emulation support

      The last build of win10 on arm supported x64 and many of us who do not want win11 still use it.

      • goosedragons 4 days ago

        Sort of. An insider build that was never fully released. Does that even get updates anymore?

        • dmitrygr 4 days ago

          Windows Update? Why would you volunteer for that experience‽‽‽ It is a VM in my MacBook. It needs no updates

      • NelsonMinar 4 days ago

        There must be literally tens of you.

  • pjmlp 4 days ago

    From Microsoft's point of view, ignore them after 2025, unless they pay big.

    In reality, yes, different builds, like it already happened with previous Windows versions.

  • AshamedCaptain 4 days ago

    The same thing you do for users of all previous failed windows on arm attempts?

    If you meant x86 Win10 users, you can use the Win11 SDK to target them.

  • TiredOfLife 4 days ago

    Only the first chip, the Snapdragon 835, can't run Windows 11. Starting with the Snapdragon 850, all are compatible.

    Snapdragon 835 is also horribly slow.

spullara 4 days ago

How is this different than what Apple did for the x86 -> ARM transition?

  • duskwuff 4 days ago

    Rosetta 2 operates on the process level -- on an Apple Silicon system, a process can run an ARM executable and run all ARM code, or can run an x86_64 executable and run all x86_64 code. ARM64EC allows processes to run a mixture of native and emulated code. Whether this is actually useful is debatable, but the option exists.
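    Because Rosetta 2 works per process, "am I translated?" has a single answer per process, and Apple documents a sysctl key, "sysctl.proc_translated", for asking it. A C sketch following that approach (the function name is mine; the non-Apple branch simply reports native). ARM64EC has no equivalent one-bit answer, since native and emulated code share the process.

```c
#if defined(__APPLE__)
#include <errno.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the process is translated by Rosetta 2, 0 if it runs
   natively, and -1 if that can't be determined. A missing key (ENOENT)
   means the system doesn't know about translation, i.e. native. */
static int process_is_translated(void)
{
#if defined(__APPLE__)
    int ret = 0;
    size_t size = sizeof ret;
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1)
        return errno == ENOENT ? 0 : -1;
    return ret;
#else
    return 0;   /* no Rosetta outside macOS */
#endif
}
```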

    • tedunangst 4 days ago

      Rosetta allows loading x86 plugins into arm apps.

      • duskwuff 4 days ago

        Source? My understanding is that cross-architecture plugins are handled out of process over XPC.

        • tedunangst 4 days ago

          That sounds right. I think I misread a source. Was just looking at this yesterday, but didn't look close enough.

  • anaisbetts 4 days ago

    ARM64EC is usually for stuff like plugins or really large apps - most people will simply compile an ARM64 and x64 version of their app

  • tedunangst 4 days ago

    It requires you to recompile your application a third time if you want to load x64 plugins, and then it becomes incompatible with arm plugins.

nomercy400 4 days ago

So is this Arm64EC Windows-only? Is it standardized?

If not, is this not just another target architecture? You cannot use it on arm64 architectures, and your app already supports x86.

  • ComputerGuru 4 days ago

    It’s nothing special; it’s ARM code compiled against the x64 ABI. The theory behind it is simple enough.
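    One visible consequence of "ARM code compiled with the x64 ABI": as far as I know, MSVC defines _M_X64 and _M_AMD64 in ARM64EC builds, alongside _M_ARM64EC (and not _M_ARM64), so code that branches on architecture macros has to test _M_ARM64EC first. A sketch (the function name is hypothetical; the GCC/Clang macros are there so it also compiles elsewhere):

```c
/* Reports which target this translation unit was compiled for. */
static const char *build_target(void)
{
#if defined(_M_ARM64EC)
    return "arm64ec";          /* must come before the _M_X64 test */
#elif defined(_M_X64) || defined(__x86_64__)
    return "x64";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "arm64";
#else
    return "other";
#endif
}
```

    Swapping the first two branches would silently route ARM64EC builds down the x64 path, which is exactly where EC-specific surprises like missing intrinsic emulations tend to show up.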