pjmlp 3 months ago

The main issue is that most of these tools, TLA+ included, live in another dimension: proving something in TLA+ doesn't mean that, when the algorithm actually gets implemented in C, the implementation still maps to what was proven.

Formal methods need to be like SPARK 2014 or DbC, actually part of the programming language, or else be able to generate code in the target language, interoperate with its library ecosystem, and produce output of good enough quality and performance that people won't want to throw away the generated code and rewrite it by hand.

  • throwawayFinX 3 months ago

    There are tools available for direct formal analysis of the system as implemented (and not “just” a model of it).

    With e.g. the C bounded model checker CBMC, your model language is also C. Frama-C is another approach (Hoare-like, axiomatic logic). Klee works on LLVM IR etc.

    Stronger proponents would perhaps suggest extracting the implementation from the model etc., but you don’t have to drink all the koolaid at once to get many of the benefits.

    And not all your software needs to be formalized either, to get many of the benefits.

    I’ved succesfully used CBMC and Klee on C code for verifying important parts of embedded safety/mission critical software on resource constrained HW (i.e. no OS, static allocation, single to double digit kilobytes of RAM etc.)

    Formal methods rock!

    • rramadass 3 months ago

      Can you recommend some good resources on these, over and above their main websites?

  • coldtea 3 months ago

    Proving that your algorithmic design is correct (to the degree the proof tool can assure) is already a big help - even if your implementation of the algorithm might still have bugs.

    At least now you know it's just bugs you need to fix, and not faulty logic to begin with.

    • pjmlp 3 months ago

      Faulty logic can only be ruled out up to a specific mathematical limit if the actual system constraints, hardware, and OS APIs aren't part of the model.

      • coldtea 3 months ago

        My comment was already about proofs "to a specific mathematical limit" still being useful and better than nothing, tho

  • jgalt212 3 months ago

    > doesn't mean when the algorithm gets actually implemented in C, it still maps to TLA+ as proven.

    So true, but you know what would be amazing and useful? A TLA+ spec transpiler that emits fuzzing code. That would probably be easier to build than a TLA+ spec transpiler that emits C.

  • bvrmn 3 months ago

    TLA+ helps you find errors in your algorithm. It's way easier than trying to model the algorithm in terms of the end implementation; too much complexity and state get introduced by the details.

    • pjmlp 3 months ago

      Abstract algorithms in the mathematical sense, not algorithms that depend on hardware and OS constraints.

      • MarceColl 3 months ago

        It can model that if you add it. You can model total or partial hardware failure, if you want and it's a case you care about; you can add steps that corrupt your data or kill machines at any time. TLA+ is just a way of defining state transitions and temporal properties; inside that, add the cases you need and it will handle them just fine.

        • pjmlp 3 months ago

          And then calling that C API, that Vulkan shader with a broken SPIR-V driver, or the automated router firmware update done overnight brings down all the hard verification work done in an external tool.

          • MarceColl 3 months ago

            So because a tool doesn't fix everything, it's best not to use it? Each tool has its place, its uses, and its limitations. You buy a car even if it doesn't cook your food. If you have multiprocess algorithms, distributed systems, or concurrency at all, TLA+ is a lifesaver for understanding and fixing fundamental issues in your system; if you care about a broken shader, it's just not your tool. When there is a bug, you can rule out a whole class of problems and check the implementation against a known-good specification.

            I wouldn't have thought I'd need to explain this to a professional developer; if you are waiting for the silver bullet that solves all your problems, you may as well go take care of goats.

            • pjmlp 3 months ago

              To pick up your example, it is more like getting a driver's license with only the theoretical written test, and only then driving a car for the very first time on the road with the newly received license.

              Yes, we should be open-minded, and the tools should also be improved so that developers in the trenches bother to learn and use them.

              • MarceColl 3 months ago

                You are clearly engaging in bad faith at this point, when you take the "car not being useful for everything" analogy and cherry-pick it into absurdity. For someone who says they're interested in distributed algorithms, it's clear you are not interested in being able to reason about them.

                EDIT: There was an edit adding the final part of the comment after I submitted mine above. Here is my response to the last part, since the comment was toned down a bit:

                Yes, we need to improve the tooling, and TLA+ can be improved a lot; I'd love for it to be able to reason about my code, and to have an easier bridge between both worlds. However, engaging in the discussion of how it can be improved without acknowledging the good and the usefulness it brings is just not productive in general, and it puts off people who could benefit from it from ever using it.

                TLA+ has a lot of value in some cases, and we should be acknowledging that!

      • bvrmn 3 months ago

        Yes. And they could contain errors even without the OS and hardware; practice clearly shows it. Proofs are easier to make at the same abstraction level, aren't they?

  • IshKebab 3 months ago

    I don't know if that's always the case. My understanding is that TLA+ is really designed to prove properties about algorithms. Knowing that complex algorithms are correct is very useful even if it doesn't provide a correct implementation in a real-world language.

    But TLA+ is a bit of an unusual proof language. I think most of them are like you describe: SPARK, Dafny, F*, CBMC, all the Rust ones, etc. I'm not sure about the interactive proof languages (Coq et al.); they are too complicated for my normal-sized brain.

    • pjmlp 3 months ago

      Yes, and making those languages (SPARK, Dafny, F*, CBMC, ...) more widespread, the kind of thing everyone expects out of the box in their favourite language, is how this can eventually gain adoption beyond a few select companies and academic research.

  • pyrale 3 months ago

    For that you would need a proper, well-grounded type system and a language that doesn’t stray from it.

    Some fp languages are close enough to fit that bill; Rust too is going down this path: its type system was built to make a proof, after all.

  • RaxcN 3 months ago

    I do not know TLA+, but there is an abundance of other theorem provers that can model C very closely, including overflow, assignments, etc.

    So closely, that when you transcribe such an algorithm to C, there is very little room for error, even if the C code isn't proven directly.

    • rramadass 3 months ago

      > there is an abundance of other theorem provers that can model C very closely

      Can you list some/all of them please? Would be helpful to others here.

  • scrubs 3 months ago

    "most of these tools like TLA+, live in another dimension, proving something in TLA+, doesn't mean..."

    This sentiment is sloppy and dismissive, with a tinge of obligation to bend to old-man wisdom. I don't buy it for a moment.

    Let's do better.

    Formal tools help suss out requirements, solved with high-level algorithms, for selected hard sub-issues of the problem under consideration. It's not code, to be sure. And this argument cuts both ways.

    It's also far from clear that a pile of code knows, expresses, implements, or communicates its requirements. Just because the code compiles doesn't necessarily mean squat either.

    So first thing: specification and implementation are both important and walk hand in hand. How tightly you hold onto one or both depends on several factors, such as: are you competent in formal tools (learning them is tough)? Are there really hard sub-problems, e.g. distributed computing, on the table? Etc.

    Second, I want to deal with risk ... you know ... connecting the dots end-to-end in software from customer to requirements to code. I'll motivate my point with an imaginary story.

    In Silicon Valley two companies A, B on opposite corners of the same block are having a big team meeting. The big boss is in with important news.

    Company A: We just got a huge new sale, 10x our normal size. They want our basic product but with a lot of extensions and customizations. Business will get you situated with it. Effective now, we're suspending our current work. All team leads and lead techs are required to meet with business to assess work planning, risk, task order, deliverables, and so on. We expect this work to run for 18 months. In that time we may abandon, re-prioritize, and clarify work as we coordinate with the customer. The customer may change their mind, etc. But as we know well, coding the product goes hand in hand with risk, discovering clearer requirements, and doing a better job of delivering them. We're equal to it.

    Company B: We just got a huge new sale, 10x our normal size. They want our basic product but with a lot of extensions and customizations. Business will get you situated with it. Effective now, we're suspending our current work. Part of this new work requires us to glue feedback from three controllers, plus sensors, in realtime to compute steering commands for the cockpit console. If we nail this, there's 100x more work for us. Given how crucial this aspect is, we'll use Lean and Spin to model requirements and work out some LTL to make sure we're on the right path before we code. We expect this work to run for 18 months minimum. In that time we may abandon, re-prioritize, and ...

    Now at this point the boss at company B is cut off with a bunch of whining from its developers:

    * formal live in another dimension

    * a good model doesn't mean right requirements

    * even if the model is right there's no proof it's implemented right

    * it'll take longer to do because we've added scope

    * etc..

    The bosses at companies A and B are both doing the right thing. Both companies have risk - a lot of it. But company A has engineers on staff who are preternaturally good at dealing with risk. Company B, unfortunately, doesn't really understand risk at all in terms of doing something about it.

    In this story company A never talked about formal tools. That's on purpose. I want to emphasize only that the thing to be impacted is risk: its level, its location, and sensible ways to manage it. Here formal is but a tool. It's not magic. It's just a tool. Had the boss at A mentioned formal tools, engineers would have rolled with it, seeing it as one more bullet in their armory.

  • immibis 3 months ago

    The main issue is that proofs take almost literally forever to write. You cut your productivity by a factor of 100-1000 if you want to prove things.

    • MarceColl 3 months ago

      Not my experience (with Alloy and TLA+; some FM are very, very hard and time-intensive): it takes me a day to get a decently sized proof of some pretty complex, difficult-to-test algorithms. The idea is not to use it for everything. But testing, thinking about, and ensuring the validity of my solution with TLA+ is a lot faster than without TLA+. Not only that, but if at any time in the future you need to add something to the system, you can add it in half an hour and make sure all your important invariants hold, without having to put yourself back in "paranoid mode" and remember everything about the system. Tests help, but some things are very, very hard to test properly.

      As everything it is a tradeoff, a lot of things do not deserve a proof, but for the ones that do you are a lot more productive.

      It also depends on the level of the proof. I want to make sure the idea is valid, and it takes some time to know what to abstract and what not to (it is a model, after all). So if you are aware of what you are testing, and at what level the tool and your proof are working, it is very beneficial.

    • rurban 3 months ago

      My formal proofs for C or C++ take almost no time to write at all, because I'm using the right tools. Some asserts, some __CPROVER_assume hints, and mostly limiting big for loops or recursions. If it works for 6, it should also work for 1000.

    • RaxcN 3 months ago

      The visible "productivity", as in manager-friendly LOC, goes down. The actual productivity goes up.

      Managers don't want that though, for reasons that Dijkstra already outlined.

      • xavxav 3 months ago

        I'm a researcher in formal verification; my thesis was building a tool to do this kind of stuff, and I agree with the grandparent (though I would say probably closer to a 5-10x slowdown, not 100x).

        Proofs are hard, and often not for interesting reasons; it's stuff like proving that you won't overflow a 2^64 counter which you only increment (aka something which won't happen for another couple billion years).

        Current tools are only useful for specific kinds of problems in specific domains; things where a life really depends on correctness. Outside of those cases, lightweight techniques provide much more bang for your buck imo.

robocat 3 months ago

I like the quote:

  “It would be well if engineering were less generally thought of, and even defined, as the art of constructing. In a certain important sense it is rather the art of not constructing; or, to define it rudely but not inaptly, it is the art of doing that well with one dollar, which any bungler can do with two after a fashion.” Arthur Wellington

Where is the boundary between finance and engineering? And engineering is also about making optimal tradeoffs in other dimensions (he mentions time, performance, scalability, sustainability, and efficiency).

  • erik_seaberg 3 months ago

    I've seen this stated "Any idiot can build a bridge that stands, but it takes an engineer to build a bridge that barely stands."

    • kazinator 3 months ago

      In reality, it's more like: any idiot can build a bridge that, in some places, is 5X as strong as it will ever need to be, in other places 2X as strong, and in a couple places only 0.8X as strong as it will ever need to be.

      It takes the engineer to build it so that every part is 1.25X as strong as it will ever need to be, consistently everywhere.

      To the idiot, that then looks like it is barely standing, because some parts are 4X weaker than he would have made them, based on bad intuition.

    • hn_throwaway_99 3 months ago

      In modern days we generally stand in awe of ancient structures that have stood for millennia.

      But in some sense, many ancient engineers had to drastically overbuild their bridges and other structures because they didn't have the tools to know what the "barely stands" threshold was, so they had to build for a big margin of error when they lacked the tools of precision.

      • bluGill 3 months ago

        Most of those old bridges are not suitable for modern needs. They are nice to look at, but cannot carry a truck, or can do so only at very low speeds. Or buildings without restrooms. Just adding lighting to old things often destroys the beauty people praise them for.

      • erik_seaberg 3 months ago

        Yeah, imagine digging up red clay, throwing it in a charcoal furnace, and then hammering it for a while. You can tell you're getting wrought iron, but would you know from one batch to the next whether it's good?

        • j16sdiz 3 months ago

          They did that with specification like "use red clay from this village" with "charcoal from that city" in this furnace.

          It is not scalable, but the quality is quite stable across batches

          • fmajid 3 months ago

            And then you have the rebar-free Roman lime-pozzolan concrete used in the Pantheon, still standing after 2000 years.

            • lesuorac 3 months ago

              Not a civil engineer, but it sounds like the Romans knew much better than "baked red clay strong".

              > https://en.wikipedia.org/wiki/Pantheon,_Rome#Structure

              > The stresses in the dome were found to be substantially reduced by the use of successively less dense aggregate stones, such as small pots or pieces of pumice, in higher layers of the dome. Mark and Hutchison estimated that, if normal weight concrete had been used throughout, the stresses in the arch would have been some 80% greater. Hidden chambers engineered within the rotunda form a sophisticated structural system.[55] This reduced the weight of the roof, as did the oculus eliminating the apex.

              That said, I'm guessing the Pantheon is in one piece (unlike the coliseum) because it's been used as a church for the past ~1600 years and presumably (similar to Notre Dame) it gets repair as-needed.

    • armchairhacker 3 months ago

      On the other hand, in many cases what's defined as the "bare minimum" ends up being more than expected. The corner-cutting approach ends up pricier than the thorough approach in the long run, because it must be constantly adjusted if not entirely re-done.

      e.g. building a bridge to support a city's normal traffic density, but the city ends up growing, and the way the bridge was designed makes adding reinforcement difficult. Or building a bridge whose materials start to melt at a certain temperature that's above what's expected for the region, but climate change raises the temperature over the threshold, so it needs a completely different material.

      Hence the article's point that software companies should invest in formal methods, because while formally proving something is a long process without much reward in the short term, it eliminates an entire class of bugs which, in some long-term projects, makes it worth the effort.

      • vlovich123 3 months ago

        Conversely, we also know that taking longer to build something scalable is more expensive than just building the cheaper thing that accomplishes the job now and solving the bigger problems when they become relevant, to the point where that choice can be the tipping point of success in the first place. In other words, that super-reliable bridge that can handle all the growth you throw at it can take too long and too much money, and result in a city that never has a growth problem in the first place.

        We know that’s true experientially both in CS (YAGNI) and in physical spaces (e.g. the Wright brothers constantly iterating with cheap prototypes until they figured out something that worked, and SpaceX doing similar things with their prototypes). Iteration = learning. In theory, formal methods let you iterate at a cheaper phase of the project. In practice, the methods themselves are very difficult to scale to large projects, and these days require you to build the project twice: once in the formal language, then hand-translated to a production language.

  • kmoser 3 months ago

    Even with an unlimited budget and unlimited time, the difference between good and bad engineering is that a bad engineer won't know where to apply their talents to ensure things won't go wrong in the future. While every discipline in the world is bound by budget and time constraints (even an armchair philosopher has a limited lifespan), that doesn't change the essence of the discipline itself.

    I think of engineering not just as designing and building things, but also knowing where the limitations are of the thing you're constructing, whether it's purely on paper, in a computer's memory, or from brick and mortar.

  • juancn 3 months ago

    I like that quote.

    For me, engineering is applied science for money. That's the main thing. Someone is paying for everything you do.

    We get into it because we care about the how more than the why, but we end up having to deal with the who is paying for this.

    The boundary is fuzzy, engineering is heavily influenced by finance, otherwise it's just a hobby.

    Formal methods are a great or an awful decision depending on whether or not they make economic sense for your product or service at your organization's current stage.

    Amazon EC2? A large, mature org, with products with extremely high fubar potential? Go wild.

    Your startup where market-fit is not proven? WTF? Are you high? use that money on something else.

    It's like unit testing: in general it makes great sense if the cost of maintaining the tests is lower than the opportunity cost of slowing down. If you're prototyping something that you'll likely throw away, it may be a bad idea. Once it proves value, you'd better build a decent test harness.

    Startups take technical debt because it's cheap when the market fit is not demonstrated (you pay it back later once you have succeeded somewhat).

    It's a rational decision.

    You don't need perfection if you're building the wrong thing, first you figure out if what you're building makes sense, then you decide in what to invest.

    On the other hand, if you're building a deep space probe, a large-scale distributed system for hire, or any other thing where a failure is BAD (with capital letters), the extra cost of ensuring correctness is well worth it.

    • hnthrowaway0328 3 months ago

      > The boundary is fuzzy, engineering is heavily influenced by finance, otherwise it's just a hobby.

      This rings true for me. I'm interested in learning things, e.g. how processes work in Windows NT kernel, or how to write a custom memory allocator, instead of creating a product. I do have some passion for a certain product but it is highly personal and won't be shared.

      Quoting David Cutler, "What I really wanted to do was work on computers, not apply them to problems". I wish I knew this when I was younger.

    • __MatrixMan__ 3 months ago

      I feel like money is too narrow here, but there does need to be some kind of budget involved. For instance, getting the energy output of a nuclear fusion reactor to exceed the energy input is an engineering problem. An energy budget, in that case.

    • drewcoo 3 months ago

      > engineering is applied science for money

      I thought that was science, not engineering! Money determines the subjects researched, how they're researched, and the results, right?

      • juancn 3 months ago

        In science the money is not tied to the utility of the research (or at least not directly tied, in most cases).

        It's possible to get funding to research something completely bonkers, in engineering it's less likely.

  • grumpyprole 3 months ago

    Absolutely, it is about trade-offs; sometimes bugs are just an inconvenience and time-to-market is more important. But bugs can result in lost business and even lost lives, for example the UK Post Office Horizon scandal. Horizon was built (bungled) on the cheap, but that saving must be dwarfed by the financial cost of the resulting scandal.

    • intelVISA 3 months ago

      Wasn't that system claimed to be 'infallible'? That outsourced sweatshop must know some CS secrets we don't.

  • kazinator 3 months ago

    You can kill people with bad finance and go to jail; same as engineering.

  • photochemsyn 3 months ago

    Depends on whether the systems you're working on have catastrophic failure modes or not. In terms of how the responsible parties behave in both realms, it comes down to whether the costs of catastrophic failure can be externalized or not (e.g. government bailouts for chaos in subprime mortgages or natural gas futures markets, Price-Anderson indemnity for nuclear reactor meltdowns). Miscalculation (see Boeing) can be disastrous.

    Now if you can't externalize the costs of catastrophic failure, that will be the fundamental constraint on your engineering / finance strategy. If you really don't want your network to be breached, you'll build it around security features, not try to hack them on after the fact. I like to think SpaceX, as a private company without the kind of political heavyweight backers that ULA enjoyed, thus put a lot more effort into avoiding the kind of problems ULA is currently having, for just that reason.

    Conclusion: take away the government safety nets, and both engineering and finance will perform better? Now about Silicon Valley Bank...

Almondsetat 3 months ago

In my opinion formal methods are what's missing from software engineering to make it a serious engineering discipline. In all other engineering fields you have to produce calculations and "proofs" that your design ought to work. In software engineering everything is basically overcomplicated handwaving.

  • vasco 3 months ago

    Software gets used for mission-critical things all the time, with the appropriate level of assurance. What you are complaining about is that apps made to show you the right advertisement, or websites for watching pictures of cats, are sometimes buggy and get deployed without proper tests.

    Meanwhile almost every medical device under the sun today has software, nuclear power plants run on software, space missions use software, airplanes fly themselves and land through software, etc.

    • ethanwillis 3 months ago

      It may be perceived that showing the right advertisements or pictures of cats isn't mission-critical, or is even just a toy. However, when massive amounts of personal information are being collected and shared by such applications, I would say they're pretty important.

    • davidgrenier 3 months ago

      Your very last example kinda supports the thesis up there, considering how it's been going.

      • hwayne 3 months ago

        Notoriously, the 737 Max bug that killed a bunch of people was a software patch over a serious mechanical design flaw, and Boeing decided it was easier to make the software engineers fix it than redesign the entire plane.

      • thebruce87m 3 months ago

        Incredibly well, statistically?

  • ozim 3 months ago

    My opinion is that too many people think software engineering is not a serious engineering discipline and romanticize other engineering too much.

    For software that really matters you have standards like MISRA C; you also have professional bodies like IEEE and ANSI, ISO standards for software development and infrastructure, and a lot of other standards and regulations you can follow, just like in all other fields. There is NIST for security, and I could probably go on and on. So it is not "overcomplicated hand-waving".

    But no one is going to deal with "implementing a formal approach" for "yet another CRUD app"; they will hire a whizz kid who knows the latest JS framework and be done with it, paying him 1/10th the cost of someone who knows more than that.

  • ffsm8 3 months ago

    In theory, your software itself provides the proof that it does what it claims. The proof would directly link to unit tests or integration tests at the meta level.

    Arguably, software engineering does this much more exhaustively than any other engineering field... with very varying output quality, from masterpieces like SQLite to your average B2B artifact that barely works despite having 99% coverage.

    • xigoi 3 months ago

      Tests are not a proof. Just like you can’t prove a mathematical statement by checking it for a million random numbers.

      • lucumo 3 months ago

        Tests are a basis for every engineering discipline. In software we have the luxury of being able to test each delivered project. Other engineered projects often only have their components tested, and then need to rely on calculations to determine the end result.

        Tests prove that something can handle the exact scenario that we're testing. The forces on a software system are just different, and in some ways less predictable, than those on e.g. a building. But in software the consequences of failure are often less catastrophic[1], and it's a lot cheaper to run a battery of tests.

        [1] And if they are not less catastrophic than a collapsing building, you'd better test it far more extensively than an average system. That's engineering too, making a trade-off between the costs of failure and the cost of preventing failure.

      • high_na_euv 3 months ago

        Incorrect.

        Proof by exhaustion, also known as proof by cases, proof by case analysis, complete induction or the brute force method, is a method of mathematical proof in which the statement to be proved is split into a finite number of cases or sets of equivalent cases, and where each type of case is checked to see if the proposition in question holds.

        • xigoi 3 months ago

          I know. Tests are not the same thing – usually they check only a subset of all possible cases.

          • high_na_euv 3 months ago

            Because usually people do not write critical/important software :)

            • xigoi 3 months ago

              That’s irrelevant. I was just responding to a comment saying that tests are a proof.

    • creshal 3 months ago

      > In theory, your software itself provides the proof that it does what it claims.

      If you manage to solve the halting problem first.

      • seethedeaduu 3 months ago

          People keep summoning the halting problem in unrelated contexts to handwave things away as impossible.

        • coldtea 3 months ago

          Except in this case, where it's 100% fitting.

          • auggierose 3 months ago

            It is only fitting if you think of providing the proof as a push button technology. That makes as much sense as thinking of providing the software itself as a push button technology.

            • coldtea 3 months ago

              It has nothing to do with the proof as a push button technology.

              Adding annotations to the code for example is not a workaround for the halting problem.

              (Working with a subset that's not Turing-complete is, but that doesn't cover the general case the parent calls for, and is not practical with current software development languages and practices.)

              • auggierose 3 months ago

                Not sure if we are talking about the same thing. If you know why your software is correct, you can write that down as a proof (if you have a language which allows you to do that). If you don't know why your software is correct, well, maybe you should not ship it, because maybe it isn't correct.

                So yes, it is all about push button technology here. If you are willing to put in the work to write down the proof yourself, instead of expecting a magical entity to generate it for you, then it is possible. All the halting problem says is that there is no such magical entity. You will need to know why your software is correct.

              • adrianN 3 months ago

                The halting problem is a worst case statement. For almost all functions a programmer encounters in their job it is trivial to solve.

                • Thiez 3 months ago

                  Exactly. In practice you always want each loop iteration to perform some action that brings you closer to some desired end state. That loop invariant, plus some notion of getting closer to the end (a variant), is usually the proof you need to prove termination. A loop where you don't have these things is almost certainly a bug.

                  The vast majority of loops iterate over some finite collection of elements and are thus guaranteed to terminate. Tricky cases are stuff like "I will repeat this operation until the output stops changing".

                  • coldtea 3 months ago

                    >Exactly. In practice you always want each loop iteration to perform some action that brings you closer to some desired end state. That loop invariant + some notion of getting closer to the end is usually the proof you need to prove termination.

                    If you're making Space Shuttle software from scratch, yes. In practice, for 99% of current mainstream development, you depend on piles upon piles of dependencies, which might or might not check those loop invariants correctly, or might have a (trivial to add) endless blocking timeout, a deadlock, and many other wonderful things.

                    Usually they're not even set in stone, you'll update to the next version, and if they introduced such an issue, there's lots of fun to be had.

                    Nobody doubted you can have formal proofs for algorithms, or checked software in certain conditions and for certain controlled environments (e.g. using MISRA C and such) (in fact, I support this in a comment elsewhere in this thread).

                    The objection is to the idea that this is applicable (or trivially applicable, and we just aren't doing it out of laziness or something) to regular software development.

          • ulbu 3 months ago

            how?

      • auggierose 3 months ago

        No, you just need to require that somebody who writes the software, also knows why the software halts, and documents the reason properly.

      • pyrale 3 months ago

        The halting problem is not an actual issue. It’s a darling of computer science classes, but if you need to prove that your program halts, it’s not that hard to write it in a language subset that is not Turing-complete.

      • adrianN 3 months ago

        The calculations you do to make sure that the bridge doesn’t collapse also require that the steel matched whatever properties your formula assumes. That doesn’t make the calculations useless.

      • ffsm8 3 months ago

        That doesn't need to be solved.

        Continuing the stream of inputs holds no meaning once you've tested every possible permutation (branch coverage), beyond checking for memory leaks. Every piece of software has a finite number of states it can hold; most developers just erroneously expect some states to be unlikely or impossible to reach.

      • riku_iki 3 months ago

        > If you manage to solve the halting problem first.

        there is a large slice of algorithms, which are possible to prove that they are not affected by halting problem. I think having tooling for this slice of algorithms is already a big win.

    • impossiblefork 3 months ago

      No, the proof is the proof that you generate with formal methods.

      Is exhaustive testing even possible in theory?

      • ffsm8 3 months ago

        Not just in theory but actually done:

        https://www.sqlite.org/testing.html

        • impossiblefork 3 months ago

          I don't agree that that's exhaustive testing. Full branch test coverage etc., is normal and sensible and it's great, but that's very different from having a proof that given certain preconditions, you will have certain postconditions.

          • ffsm8 3 months ago

            100% branch coverage means that you've fundamentally covered all potential states your software can enter.

            What you're asking for is literally only possible in theory and entirely pointless in the context of reality, because you cannot guarantee that a device is available at runtime, invalidating any proof. The same applies to any engineering project, software or otherwise.

            • akiarie 3 months ago

              Not even remotely.

                if (x > 0) {
                  /* code */
                } else {
                  /* more code */
                }
              
              Branch coverage just means that all the lines above are executed in some test. That's fundamentally different from dealing with all the different values that `x` could take on at run time.
              • ffsm8 3 months ago

                You're missing the point.

                Given your example, no software can ever handle this comparison for arbitrarily large numbers, as all CPUs have a maximum number they can compare.

                Just like you cannot prove that a bridge will be able to handle a given load. Engineering fundamentally can only make assurances within expectations. Your tests define these expectations, and by covering 100% branching coverage you've exhaustively explored all valid states your software can have.

                Do note I'm talking about actual 100% branching coverage here, not just 100% coverage of your own code, ignoring libraries and runtimes you've imported. That's way harder than you seem to think, and the reason why sqlite (the linked page) has so much more testing code than application code.

            • ratmice 3 months ago

              I agree with the other commenter that this is not even remotely true... given something like

              loop { if (A) { ... } if (B) { ... } }

              100% branch coverage means you've encountered both cases, but the loop could have reached them in any number of ways: a given iteration could take A && B, A && !B, !A && B, or !A && !B, and covering the behavior exhaustively would require all of them, possibly in every order, since the order in which branches are taken can matter.

              While formal methods give an exhaustive proof.
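
              A concrete sketch of that gap (example mine): two tests suffice for 100% branch coverage here, yet the one path that crashes is never executed.

```c
/* Two independent branches: only 4 branch outcomes, but 2^2 = 4 paths.
 * The tests f(1, 0) and f(0, 1) cover every branch outcome (each branch
 * taken once and skipped once), yet the A && B path, where x reaches 0
 * and the division faults, is never exercised. */
int f(int a, int b) {
    int x = 1;
    if (a) x += 1;  /* branch A */
    if (b) x -= 2;  /* branch B */
    return 10 / x;  /* divide-by-zero only on the A && B path */
}
```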

            • SassyBird 3 months ago

              > 100% branch coverage means that you've fundamentally covered all potential states your software can enter.

              No, it doesn’t.

                bool is_even(int n) {
                    return true;
                }
              
                bool test_is_even() {
                    return is_even(2);
                }
              
              100% test coverage. And yet.
              • ffsm8 3 months ago

                Are you seriously that big brain that you think that's smart?

                Your example software makes the claim that every integer the given runtime/interpreter/VM can handle will be assured as true. That is what was tested.

                Nobody said that any description of the software is always correct, just because a test was added that tests random stuff

                Also note that that's not 100% coverage. It merely covers your code, which builds upon other code.

                Feel free to actually read the given link. Your horizon might be expanded. Or don't, and keep your Dunning-Kruger worldview.

                • discreteevent 3 months ago

                  Those who dunning kruger should be careful lest they be dunning krugered themselves!

                  • ffsm8 3 months ago

                    I guess we're desperate to get an answer to the question of how dumb comment chains on this forum can get, and whether everyone involved is either mentally or actually in the /r/im-12-and-this-is-smart phase.

  • atoav 3 months ago

    > In software engineering everything is basically overcomplicated handwaving.

    No, it is worse than that. There are people who claim the proof is the code itself. Just look at it. These are the same kind of people claiming good code needs no documentation.

    That is like a bridge engineer saying they need no structural calculation because the stability of the bridge is obvious to a certain kind of bridge nerd if they look at it. Only that civil engineers figured out, after a few high-profile bridge collapses, that "trust me bro" isn't good enough, while software engineering people are still in the "trust him, he is a genius" phase of the field.

    • mrkeen 3 months ago

      > There are people who claim the proof is the code itself.

      Yet this is what emerges when you write a sufficiently detailed proof.

      The more detail you add to your TLA+ model, the more it looks like just another implementation (albeit untyped, so it's pretty easy to slip errors into).

      • atoav 3 months ago

        In the end, if you want the map to represent the landscape 100% you end up with another landscape.

        Granted. But engineering is about making sure we don't make mistakes and just like in math class a good way to assure that is to take two different paths and arrive at the same place.

      • szundi 3 months ago

        We should have a TLA+^2 then. Oh the complexity has to be expressed anyway? Uh Oh and users mostly rather suffer from bugs than give up complex desires and features? Uh We’re doomed haha

        • atoav 3 months ago

          Isn't that why we have modularization? In the end you can formally verify low-level stuff on a per-module basis. Knowing that your PNG decoder is formally verified, and therefore that it is less likely a user uploads a PNG and gets Remote Code Execution within that process context, isn't worth nothing. And if it had allowed attackers to get RCE, the number of mysterious crashes it produced with weird PNGs was probably also non-zero.

          In the end all security measures are a layered effort, and doing a bit of formal verification, a bit of unit testing, a bit of fuzzing etc. is always better than mentally aiming for 100% and not doing anything at all.

          The question regarding formal verification is: What is the code that is running in the critical domain? Someone who writes safe code will try to minimize the amount of code/executables/complexity in that space. And that domain isn't necessarily a single process, it can be a part in the code, or a process until it dropped privileges etc.

          If all your code is critical domain, it either means you failed to sanitize inputs, everything runs with yolo/root privileges, all of your customers share a single table in your database, your hardcoded admin password is 1234 — or maybe you write code that controls nuclear power plants, airplanes or similar things. But if it is the former, no amount of formal verification will save you anyway, because you wrote software like a monkey writes poems: by accident.

          If the idea of a critical domain is new to those reading this comment, or if you never thought about user privileges when writing an application, please read this as proof of the point that the field of software engineering still lacks in all sorts of places, and it is in all our interest, both as professionals and as users/potential victims, to improve the situation.

  • anal_reactor 3 months ago

    I was thinking about this, and to be honest, that's what I like about software. It needs to be a logical program, but there's also an artistic element to it. "Why did you name this variable like that? Because I was feeling like it"

  • cubefox 3 months ago

    > In all other engineering fields you have to produce calculations and "proofs" that your design ought to work.

    That seems obviously false, unless you understand ""proof"" in such a lax manner that software engineering has them as well.

nanolith 3 months ago

In my opinion, most developers should be using bounded model checking if available for their language / platform. This is certainly true for C, Rust, Java, and others.

I consider bounded model checking to be "formal methods lite". It provides most of the benefits at a lower cost of entry than using a proof assistant or building constructive proofs. Really, there's little added overhead. Perhaps 30% to 40% more time to build out the function contracts and model checking. Given that this overhead more or less prevents errors that would likely be introduced without it, I think it's a reasonable investment.

TLA+ is certainly related, since it uses an SMT solver at its base. I see it as useful for designing algorithms and protocols. A tool like CBMC or Kani provides similar guarantees at the source code level. It's not perfect, as currently CProver does not have direct threading support, but with a reasonable application of method shadowing and function contracts, even things like threading can be anticipated. Using a bounded model checker effectively means changing the design of software to work best with it. This is little different than using concepts like TDD or continuous integration.
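
A minimal sketch of what such a harness looks like (assuming CBMC; `nondet_int`, `__CPROVER_assume`, and the `__CPROVER` macro are CBMC conventions, guarded here so the file also compiles and runs as plain C):

```c
#include <assert.h>
#include <limits.h>

/* Contract: callers guarantee a + b cannot overflow; in return the
 * function guarantees the mathematical sum. The preconditions are
 * plain asserts, so they double as runtime checks outside CBMC. */
int checked_add(int a, int b) {
    assert(!(b > 0 && a > INT_MAX - b)); /* precondition: no overflow  */
    assert(!(b < 0 && a < INT_MIN - b)); /* precondition: no underflow */
    return a + b;
}

#ifdef __CPROVER
/* Under CBMC this checks the contract for every int pair satisfying
 * the assumptions, not just for sampled values. */
int nondet_int(void);
void harness(void) {
    int a = nondet_int(), b = nondet_int();
    __CPROVER_assume(b <= 0 || a <= INT_MAX - b);
    __CPROVER_assume(b >= 0 || a >= INT_MIN - b);
    assert(checked_add(a, b) == a + b);
}
#endif
```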

  • vlovich123 3 months ago

    In my experience traditional property checks are pretty difficult to write already (30-40%). I get the sense that a bounded model check would be even more expensive than that, probably into the 2-3x range if not more. I’m talking about meaningfully complex logic, not very simple things.

    • rtpg 3 months ago

      It would be interesting to have a workbook of what people consider valuable examples of issues we are trying to solve. Like property checks are sometimes easy to write, when your property aligns well with property checking models! But then time-based stuff like TLA+ ends up working way better, sometimes.

      There are plenty of canonical examples out there for resolving some issues with types, and having a bunch of one-pagers on issues people hit that people might or might not want to tackle with some flavor of formal method.

    • nanolith 3 months ago

      You can write these checks as assertions in your regular source language. It's no more difficult than writing runtime parameter checks, really.

      There are some complexities, to be fair, but these are mostly around the performance of the model checker. Some things are easy to check, and other things, like loops and recursion, are harder to check. However, this is a matter of optimization, and with practice, this becomes quite easy to deal with.

      • vlovich123 3 months ago

        As I said, I use property checks as much as possible, but they're still cumbersome, and the complexity and difficulty scale non-linearly with the complexity of the interface; some properties are easy to write, whereas others are so difficult I can't imagine how to write a generic property for all situations, versus resorting to boundary conditions I think of. The challenge with the model checker is that there are all sorts of constructs it can't test. For example, Kani can't handle await expressions, which is a huge obstacle for any async Rust code.

        • nanolith 3 months ago

          Threading and async are an issue with the current CProver core. But, much of that can be simulated by writing helper functions that get shadowed during the model checking.

          It's simply not possible to make a bounded model checker work on arbitrary code, so instead, the code should be factored to work within the constraints of the model checker. The result is safer code, even if the style is different.

          • vlovich123 3 months ago

            I never said threading. I use a single-threaded runtime. But the lack of any async support even though it’s basically syntactic sugar at that point makes it a non-starter. There’s some amount of “make your code testable” that’s valid, but completely rewrite how your application is written and thus you can’t use the frameworks you need to is a bit much. At some point the testing framework has to also meet you where you are.

            • nanolith 3 months ago

              kani certainly could be extended to have better async / await support. But, I think this is a larger engineering problem. The value of code that has been verified using an existing model checking tool is greater than the value of code using a particular framework and hoping that Some Day there will exist a tool that will work with that framework.

              I'd wrap libraries that were problematic to build function contracts around before I punted on model checking. Hell, I'd fork and refactor these APIs to work with lightweight fibers before I gave up on model checking. The utility of something like async / await is small potatoes compared to the utility of having code that has even light formal verification in place.

  • siscia 3 months ago

    In Java it would be something like JBMC?

    • nanolith 3 months ago

      Yep. JBMC is part of the CProver / CBMC family.

kazinator 3 months ago

Non-software engineering doesn't excessively use formal methods. Only where required. (Like where a system is tightly optimized.)

For instance, in electronic engineering, you don't use the most accurate model of a diode at all times. Sometimes it's just a one-way valve. Sometimes, it's just one-way valve with a 0.7V drop (if silicon).

In mass production, you will not get the accurate parts needed for the most formal model to be justifiable, and the cost of those parts would not be justified in most of the circuitry.

erik_seaberg 3 months ago

I learned some TLA+ and started reading about Coq, but I was pretty disappointed to learn of the "verification gap" between the algorithm to be delivered in a normal programming language and the manually-restated algorithm that passed the checker. We need to make a checker's input language ergonomic enough for daily use and then "synthesize" something that runs efficiently while still being known correct, or make a checker understand everyday programs (which, not being so minimalist, probably have a much larger space of reachable states to check).

  • brabel 3 months ago

    Yep, I have the same impression. Anything that proves something about stuff that's not the actual running code is nearly useless to me. The code will be changed every single day by developers who may not be familiar with that code. The verification must work on that implementation code.

    As I see it, the only way to do it properly is to embed the proof in the code like we embed types in the code. For example, we've been able to get rid of NullPointerException completely by using the Checker Framework; all we need to do is manually annotate all values that may be null (everything else is assumed non-null, and if you try to assign null to it, the compiler won't compile it):

        @Nullable String getSomething();
    
    The annotation is not a layer on top of the code, it is code. The compiler can check it (with appropriate extensions).

    I even played with adding support for dependent types, I am convinced the compiler extension would allow me to do that, but unfortunately I have other things to do.

  • ukuina 3 months ago

    This is my biggest gripe with verification. I am hoping that LLMs help with inferring intent from a subroutine, then asking the programmer if the inferred intent is complete and correct, and then automatically writing/updating a verification harness.

    • eschaton 3 months ago

      What makes you think autocomplete on steroids with no internal reasoning capability or representation of knowledge would be at all useful for inferring intent? Or have you been fooled into thinking an LLM is something like Eurisko/Cyc?

      • bubblyworld 3 months ago

        Personally I'm with the parent poster - I use LLMs to help me with intent in new codebases I don't understand yet all the time, and empirically they seem to understand it pretty well. Useful, especially when you don't have good documentation on hand.

        • eschaton 3 months ago

          Boy it sure is useful that you can trust code you don’t understand to have clear variable names and comments for the LLM to key off of.

          • bubblyworld 3 months ago

            Clear variable names and comments aren't a requirement at all.

            It sounds to me like you have a philosophical problem with LLMs, which is something I don't think we can debate in good faith. I can just share my experience, which is that they are excellent tools for this kind of thing. Obvious caveats apply - only a fool would rely entirely on an LLM without giving it any thought of their own.

            • eschaton 3 months ago

              I don’t have a philosophical problem with LLMs. I have a problem with treating LLMs as something other than what they are: Predictive text generators. There’s no understanding beneath that which informs the generation, just compression techniques that arise as part of training. Thus I wouldn’t trust them for anything except churning out plausibly-structured text.

      • pxeger1 3 months ago

        Because ideally good code should make the intent obvious from the names and comments, so inferring a full description should really just be an autocomplete task.

        • eschaton 3 months ago

          The topic is using an LLM to learn a codebase one doesn’t understand. Does that sound like a codebase that has names and comments from which a full description could be inferred?

    • bubblyworld 3 months ago

      Have a look at Eureka, they use a combo of LLMs and basic evolutionary methods to automate solving reinforcement learning problems. What's interesting is that the only human input is the code for the environment of the RL problem and some fixed scaffolding prompts for the GA.

      Sounds like it could be a good starting point for something like this.

  • yencabulator 3 months ago

    It would be lovely if Kani and/or Prusti could cope with more realistic code, but they are where my current hopes lie for eliminating the verification gap. The proof is built from (very!) simple functions written in a subset of Rust. If that gets you all the way to the end result, then there was no gap.

    https://github.com/model-checking/kani

    http://prusti.org/

ChrisMarshallNY 3 months ago

I worked for a [Japanese] corporation, that took “formal methods” into overdrive.

They made really, really good hardware, but it was extremely painful for us software schlubs.

That said, I don’t know if it’s really possible to do really big stuff, with a team, unless you have some degree of formality.

Best practices are usually a good place to start, and I would suggest that the degree of necessary formality, is inversely proportional to the experience of the team. If you have a lot of really experienced engineers, in a mature team that has been together a long time, the formality is still there, but doesn’t need to be written down.

I usually work alone, or as the sole technical person, in a diverse team. I’m really experienced, and don’t write much down.

But I’m also really, really formal. It just doesn’t look like it.

  • spankalee 3 months ago

    Formal methods is much more specific than formality and writing things down. There's a degree of proof that you achieve with formal methods that you don't even with design documents, reviews, and the usual tests.

    • ChrisMarshallNY 3 months ago

      Agreed, but also, “writing things down,” was a bit of a euphemism for the many, many aspects of a formal approach.

      The main issue is that we can get so wrapped up in the process, that we fail to get anything done.

      It’s a question of balance.

  • nextos 3 months ago

    There are relatively lightweight methods which can be less taxing. For some domains, where requirements are frozen, iterative refinement of specifications to code can be easy and enjoyable. See for instance: https://raisetools.github.io/

    My hope is that code assistants can make all this much easier by facilitating synthesis: you write the specification, and ML gets you the code. I have a fairly good toy prototype doing such a thing for a subset of a common functional programming language. Obviously, the difficulty is to scale it and make it practical.

  • kccqzy 3 months ago

    It doesn't seem like you are describing formal methods. You seem to be describing plain old bureaucracy.

    • ChrisMarshallNY 3 months ago

      Not gonna argue. I know that, for many folks, these days, anything that imposes guardrails and boundaries, is considered “bureaucracy.”

      It’s funny, but there’s a heck of a lot of structure, in today’s software industry. It’s just not called “process,” or “formal methods.” It’s just “this is how everyone does it.”

      Pretty much a requirement, when your workforce leaves, every 18 months.

      • yencabulator 3 months ago

        I think you still haven't grasped what formal methods really means.

        Think proving your software correct with math, not "project management forces us to write documents".

        Formal methods is very much not "how everyone does it", currently.

        • ChrisMarshallNY 3 months ago

          Why are you arguing?

          BTW: that corporation was very much into “proving right with math.” The Japanese really like math, and this corporation probably has hundreds of patents, describing testing and validation methodology.

          Watts Humphrey once wrote a book that described a personal formal methodology. Can’t remember the name, but it was a really big deal, in the 1980s. I think it went a bit overboard, but it worked well, for folks that followed it completely.

          • yencabulator 3 months ago

            You keep saying things that don't sound like anything at all like what TLA+ is.

            Humphrey's Personal Software Process is just another project management methodology; nothing to do with this topic.

            • ChrisMarshallNY 3 months ago

              OK. You're right. I don't know what I'm talking about, and you win the Internet.

              Cool. Have a great day!

  • jxramos 3 months ago

    Does it come out instead in PR reviews?

    • ChrisMarshallNY 3 months ago

      If I work alone, there aren’t any PRs.

      But I do spend a great deal of time reviewing my work; often going back, and tweaking and testing, for days.

      My testing code tends to dwarf my implementation code.

hresvelgr 3 months ago

I think we might be looking at this backwards. There is a great talk by Jack Rusher [1] on how really slow iteration loops and being too disconnected from running programs create a desire for theorem proving, because it becomes the faster way to arrive at a correct solution.

I am of the belief that we need to return to modifying live environments instead of sending code through a lengthy build pipeline. Instead of trying to make live environments secure and durable we got scared and now we have modern CI/CD.

[1] https://www.youtube.com/watch?v=8Ab3ArE8W3s

  • yencabulator 3 months ago

    Good luck trying to get something like Paxos correct by poking at it in a REPL.

    • hresvelgr 3 months ago

      Not what I'm advocating and certainly not REPL. When I'm talking about a live environment, I mean something _like_ Pharo [1]. I'm arguing against long compile and deploy loops, not formal verification.

      [1] https://pharo.org/

zeroCalories 3 months ago

I do often employ the simple whiteboard methods described, but I've never found the energy to learn and use stuff like TLA+. What fields do people find them useful in?

  • Jtsummers 3 months ago

    My only time using TLA+ "in anger" was on an embedded system. There was a problem in the hardware portion and our boss wanted proof it was the hardware and not our software. He didn't accept any of our evidence that it was a hardware flaw, partially because our tests weren't failing 100% of the time. I used TLA+ and ended up with a handful of traces which we were able to recreate with the hardware and some small programs (remove everything our system actually did, just use the busted data bus) to demonstrate the failure, they always failed instead of failing only 90% of the time (shouldn't have been needed, it was obvious the hardware was broken). The TLA+ model started off being a reasonable fidelity model of the specified hardware bus, and then I started messing with it (in a deliberate fashion, altering state transitions) until I got traces and invariant violations similar to the real-world hardware.

    I've used Alloy for similar things, but only after the fact not during the investigation and as a way to learn Alloy.

    I'd use them both again (TLA+ especially) if I could convince people it was worth the time, but it can be difficult to get the time to spend on it at work while also meeting other work obligations.

    I've used TLA+ for some personal projects involving concurrent and distributed systems (toys, nothing notable just playing). I used TLA+ to demonstrate that my design worked as I intended, and then started writing code based on the model.

    • tonyarkles 3 months ago

      Yeah, I've done something similar twice for embedded systems. First one was modelling a BLE protocol between a mobile device and a piece of hardware to confirm that the protocol the vendor provided would definitively lose messages in specific circumstances. The second was to model the LoRaWAN protocol as a state machine and prove that the SM model I'd put together couldn't deadlock; the C implementation of it modelled the TLA+ model very closely. I was honestly pretty shocked that after getting the first version of the implementation done there was exactly one bug in it and it was a typo (used the wrong variable somewhere) and not a logic error. Other than that the implementation basically worked perfectly first try and just kept running indefinitely.

  • chrsig 3 months ago

    I've been learning Alloy[0], which I think is probably more practical for most software engineers. I'm using it with the goal of documenting and decomposing a legacy system that I've been working with for a number of years.

    It's more geared towards the static elements of the system, and writing a spec feels quite like writing a relational database schema -- mostly because they're both expressions of relationships.

    It seems like a pretty natural path to write a spec in alloy before building out some crud interface.

    I think there's a huge tooling gap though. Alloy, TLA+ and friends all seem to be very jvm/UI centric, really targeting a user who is working in the editors the language ships with.

    I'd really love to see more tooling to run model checkers in a CI pipeline, or generate stubs and test cases for various languages. If that were in place, I think there would be more stickiness to formal methods.

    I've started writing a Go parser for alloy, in part to learn it better, in part because I know go and want to write some tooling that I think go would be a good implementation language for. I don't know if anything will come of it, but at least having a parser would allow for things like a formatter or code generator. If I were feeling ambitious, I might try to implement a model checker.

    [0] https://alloytools.org/

    • photonthug 3 months ago

      > tooling to run model checkers in a ci pipeline

      Helpful to escape the alloy UI:

      https://github.com/elo-enterprises/docker-alloy-cli

      • chrsig 3 months ago

        My hero!

        I looked into what it would take to tackle it from the source code. There were some barriers, like the jars not really being published to Maven, but the biggest was just the time & energy investment to pick up Java again.

        I'll definitely be looking into this project and seeing if there's anything I can contribute back.

  • 01HNNWZ0MV43FF 3 months ago

    I find my biggest problem is interfacing with apis from complex dependencies I can't control, usually OSes and gui libs. I assume there isn't a ton formal methods can do for that unless I set up a VM I can tightly control?

    • buescher 3 months ago

      At that point you're probably blowing up your verification/validation space to where it might be intractable, in both a machine and a human sense. People have done research work in formal methods all the way down to the assembly level, of course, but could you even write the correctness properties for the software you're describing?

      Where you can use these methods - think of the part of your program's behavior you can describe with the article's "whiteboard" methods mentioned in the post above - truth tables, decision tables (! TIL), state machines/statecharts. If you can formulate it that way, not only is it easier to reason about and to test, but if you can also work out what would make it correct, you can run it through an automated checker.
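
      For instance (my example), a state machine written as a decision table: the whole behavior is one array, so the correctness property "every transition lands in a valid state" can be reviewed, and checked exhaustively, at a glance.

```c
typedef enum { IDLE, RUNNING, DONE, NSTATES } state_t;
typedef enum { EV_START, EV_FINISH, EV_RESET, NEVENTS } event_t;

/* Decision table: rows are states, columns are events. Exhaustive by
 * construction; no transition can be forgotten or fall outside the table. */
static const state_t next_state[NSTATES][NEVENTS] = {
    /*            EV_START  EV_FINISH  EV_RESET */
    /* IDLE    */ { RUNNING, IDLE,      IDLE },
    /* RUNNING */ { RUNNING, DONE,      IDLE },
    /* DONE    */ { DONE,    DONE,      IDLE },
};

state_t step(state_t s, event_t e) { return next_state[s][e]; }
```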

verisimi 3 months ago

> What is TLA+?

> TLA+ is a formal language for specifying systems used in industry and academia to verify complex distributed and concurrent systems. Among others, the TLA+ methodology is successfully applied at Amazon Web Services, Microsoft, and Oracle.

https://conf.tlapl.us/home/

treprinum 3 months ago

"Beware of bugs in the above code; I have only proved it correct, not tried it. " -- Donald Knuth