clusterhacks 4 days ago

Several comments mention the data immutability of Datomic as a plus and I just wanted to say you can totally make a plain-old-RDBMS table append-only and get those benefits. I'm sure this is commonly done.

I did it with a timestamp on the tables that was captured at insert time. All reads were against views of the tables that were defined such that the only tuples returned were the "most recent" tuples by appropriate data fields and max(timestamp). "Deleted" records were just indicated by a flag.

This preserved the ability to see the full history for a tuple from creation, through all mutations, all the way to deletion. This scaled reasonably well up to low millions of tuples on a normal, single database server. But it was for an internal project, so the number of clients hammering at it was quite low.
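A minimal sketch of the scheme described above, using SQLite for illustration (the table, view, and column names are invented for the example, not from the original system):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Every write is an INSERT; nothing is ever UPDATEd or DELETEd.
    CREATE TABLE contacts_log (
        id      INTEGER NOT NULL,           -- logical entity key
        name    TEXT,
        deleted INTEGER NOT NULL DEFAULT 0, -- "deletes" are flagged rows
        ts      INTEGER NOT NULL            -- captured at insert time
    );

    -- Reads go through a view that keeps only the newest row per id
    -- and hides entities whose latest version carries the deleted flag.
    CREATE VIEW contacts AS
        SELECT id, name, ts
        FROM contacts_log c
        WHERE ts = (SELECT MAX(ts) FROM contacts_log WHERE id = c.id)
          AND deleted = 0;
""")

db.execute("INSERT INTO contacts_log (id, name, ts) VALUES (1, 'Ada', 100)")
db.execute("INSERT INTO contacts_log (id, name, ts) VALUES (1, 'Ada L.', 200)")          # an "update"
db.execute("INSERT INTO contacts_log (id, name, ts) VALUES (2, 'Bob', 150)")
db.execute("INSERT INTO contacts_log (id, name, deleted, ts) VALUES (2, 'Bob', 1, 300)") # a "delete"

print(db.execute("SELECT id, name FROM contacts").fetchall())        # [(1, 'Ada L.')]
print(db.execute("SELECT COUNT(*) FROM contacts_log").fetchone()[0]) # 4 -- full history kept
```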

  • diggan 4 days ago

    > Several comments mention the data immutability of Datomic as a plus and I just wanted to say you can totally make a plain-old-RDBMS table append-only and get those benefits. I'm sure this is commonly done.

    Absolutely, you can also store an append-only log to a txt file on disk and have your API backend recreate the "current" data from that append-only log.

    I guess people mention it as a plus because it's out-of-the-box default behavior for temporal databases like Datomic or XTDB, which are optimized for these types of queries and have years of work behind them.

    Just as a fun aside and only slightly related: I discovered the other day that MariaDB supports (bi)temporal tables out-of-the-box too! Probably the only (FOSS) SQL database that does so? https://mariadb.com/kb/en/temporal-tables/

  • panick21_ 4 days ago

    This works to a limited extent and leads to a huge complexity explosion as soon as you go beyond a single table. Going down this route will soon eat much of your complexity budget.

    I worked on a hospital information system that did this for all forms, and parts of the tables had complex self-referential links. Getting the actual history out of that thing was SQL hell.

    • clusterhacks 4 days ago

      I agree that this approach would be pretty unwieldy for a large/complex backend schema. It worked quite well for a smaller, in-house application and set of requirements. I think I probably would be perfectly happy using the approach again for our in-house applications if requirements for audit/history preservation needed to be built into a data model. I don't have a good feeling for when I might say "this data model is too large/complex for this approach." I might instead think more about what and how many subsystems are going to have direct access to the RDBMS as a cut-off?

      Hospital/medical research information systems (my day job is in the backends of these apps) seem to use backends with many compromises and poor db designs bolted on. I have also dealt with the horror of electronic data capture forms. My most recent headache has been a poor data model that ultimately wraps each form in a very ugly JSON blob. I've never seen anything that so completely lacks any residue of design . . .

  • parhamn 4 days ago

    Sure, most things are representable in a table. It's very tricky to get these things to perform well, though. Every query needs the additional filters/aggregations, and every index needs to be smart and partial. E.g. for the deleted case you need all your indexes to be partial, with CREATE INDEX ... WHERE deleted_at IS NULL, since a giant boolean filter can quickly include a large percentage of your data, making the indexes useless.
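    SQLite also supports partial indexes, so the point above can be sketched there (table and index names are invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        customer   INTEGER NOT NULL,
        deleted_at INTEGER            -- NULL while the row is live
    )
""")

# Index only the live rows: soft-deleted rows never enter the index,
# so it stays small even when most of the table is history.
db.execute("CREATE INDEX idx_orders_live ON orders (customer) WHERE deleted_at IS NULL")

db.execute("INSERT INTO orders VALUES (1, 42, NULL)")  # live
db.execute("INSERT INTO orders VALUES (2, 42, 999)")   # soft-deleted

# The planner can only use the partial index when the query repeats
# the WHERE deleted_at IS NULL predicate.
query = "SELECT id FROM orders WHERE customer = 42 AND deleted_at IS NULL"
print(db.execute(query).fetchall())                          # [(1,)]
print(db.execute("EXPLAIN QUERY PLAN " + query).fetchall())  # shows idx_orders_live in use
```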

    I tried this once many years ago and it became quite a headache. Direct SQL access to the database to perform management queries outside your application becomes a headache too.

  • jacques_chester 4 days ago

    This description isn't too far from bitemporal tables, which are one of my favorite obscure technologies.

    I think though that one distinction is that immutability in this case requires cooperation from the client, a commitment not to modify existing records, as compared to the database enforcing it.

    • clusterhacks 4 days ago

      I didn't mention it, but in this particular service, no client actually had SQL access to the database. There were create/read/update/delete service functions that clients used instead.

      The timestamp for a tuple was returned by reads, so when a client wanted to update/delete a tuple, the service functions all required the client to provide the timestamp from that read. If the timestamp argument wasn't the most recent, the actual database insert would fail and the client had to deal with that.

      Importantly, this particular service had only a few internal clients who almost 100% "owned" the tuples the client worked with. So there wasn't a lot of contention for any specific tuple by multiple clients.
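      A sketch of that compare-on-write flow, with SQLite standing in for the real service's database and all names invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes_log (id INTEGER, body TEXT, ts INTEGER)")

def read(db, id_):
    # Reads return the current body *and* its timestamp.
    return db.execute(
        "SELECT body, ts FROM notes_log WHERE id = ? ORDER BY ts DESC LIMIT 1",
        (id_,)).fetchone()

def update(db, id_, new_body, seen_ts, now):
    # Append a new version only if the caller's timestamp is still current.
    cur = db.execute("""
        INSERT INTO notes_log (id, body, ts)
        SELECT ?, ?, ?
        WHERE ? = (SELECT MAX(ts) FROM notes_log WHERE id = ?)
    """, (id_, new_body, now, seen_ts, id_))
    return cur.rowcount == 1   # False means the client read stale data

db.execute("INSERT INTO notes_log VALUES (1, 'v1', 100)")
body, ts = read(db, 1)
assert update(db, 1, 'v2', ts, 200)      # succeeds: ts was still current
assert not update(db, 1, 'v3', ts, 300)  # fails: a newer version exists
```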

      • jacques_chester 4 days ago

        > There were create/read/update/delete service functions that clients used instead.

        Oh! I haven't seen this pattern in a long time - not since I worked on Oracle Application Express apps. In a cousin comment I noted that the bulk of app developers don't think about how to use the database to their advantage.

    • Mister_Snuggles 4 days ago

      This could be enforced in the schema via triggers and/or security permissions. Cooperation from the client is not required.

      EDIT: Oracle has append-only tables, and can also use "blockchain" to verify integrity. See the IMMUTABLE option on CREATE TABLE[0]. PostgreSQL doesn't appear to have append-only tables, so using security and/or triggers seems to be the only option there.
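      The trigger idea is straightforward to sketch; here it is in SQLite (which raises an IntegrityError via RAISE(ABORT)), with the table name invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, entry TEXT);

    -- Triggers reject mutation regardless of what the client tries.
    CREATE TRIGGER audit_log_no_update
        BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;

    CREATE TRIGGER audit_log_no_delete
        BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")

db.execute("INSERT INTO audit_log (entry) VALUES ('created')")  # inserts still work
try:
    db.execute("UPDATE audit_log SET entry = 'tampered'")
except sqlite3.IntegrityError as e:
    print(e)  # audit_log is append-only
```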

      [0] https://docs.oracle.com/en/database/oracle/oracle-database/2...

      • mike_hearn 4 days ago

        More to the point Oracle has flashback queries:

            SELECT ... AS OF <timestamp or logical clock time>
        
        So you can query the database as of any point in the past, making manual work to implement the same feature with custom columns redundant.

        Oracle can also show you a record of transactions made and what SQL to run to undo them, including dependency tracking between transactions.
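        For comparison, the "manual work" being made redundant is essentially an as-of read that caps the timestamps it will consider. A toy version on a hand-rolled history table (names invented, SQLite for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prices_log (sku TEXT, price INTEGER, ts INTEGER)")
db.executemany("INSERT INTO prices_log VALUES (?, ?, ?)", [
    ("widget", 100, 10),
    ("widget", 120, 20),
    ("widget", 150, 30),
])

def price_as_of(db, sku, as_of_ts):
    # Newest version at or before the requested point in time.
    row = db.execute("""
        SELECT price FROM prices_log
        WHERE sku = ? AND ts <= ?
        ORDER BY ts DESC LIMIT 1
    """, (sku, as_of_ts)).fetchone()
    return row[0] if row else None

print(price_as_of(db, "widget", 25))  # 120: the value as of t=25
```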

        • fulafel 2 days ago

          This sounds like proprietary syntax for SQL:2011 temporal queries. Apparently supported by e.g. MS SQL Server and MariaDB.

      • jacques_chester 4 days ago

        You're correct, I overlooked triggers. Though that may be a bridge too far for some folks: triggers are only really comfortable for people who are deep into RDBMSes. For lots of app developers the ORM is the limit of the world.

        • Mister_Snuggles 4 days ago

          ORMs could offer a unique advantage by allowing the user to describe an append-only table and generating the required triggers (or the appropriate CREATE TABLE options). They'd also be able to include helpers to make working with the table easier - like defaulting to selecting current rows only, or an easy way to specify that you want rows as-of a certain point in time.

          I'm not sure if any ORMs actually support this though.

  • codr7 4 days ago

    I would recommend going full event sourcing if you're going to put in that kind of effort into preserving history.

    Assuming you don't need real time access to data from several points in time, that is.

    Just store the events with timestamps and json in a separate table, I usually build a hierarchy with parent event refs to simplify auditing.

    As an added bonus, the app is now completely command driven.

    Getting historic data for a specific point in time means starting with an empty database and replaying earlier events, alternatively storing undo information in events.
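    The scheme described above can be sketched in a few lines; the event names and payloads are invented for the example:

```python
import json

# Events as they would sit in a separate table: (timestamp, json payload).
events = [
    (100, json.dumps({"type": "account_opened", "id": "a1"})),
    (200, json.dumps({"type": "deposited", "id": "a1", "amount": 50})),
    (300, json.dumps({"type": "deposited", "id": "a1", "amount": 25})),
]

def replay(events, up_to):
    """Start from empty state and apply events in timestamp order."""
    state = {}
    for ts, payload in sorted(events):
        if ts > up_to:
            break
        ev = json.loads(payload)
        if ev["type"] == "account_opened":
            state[ev["id"]] = 0
        elif ev["type"] == "deposited":
            state[ev["id"]] += ev["amount"]
    return state

print(replay(events, up_to=250))  # {'a1': 50} -- state at a past point in time
print(replay(events, up_to=999))  # {'a1': 75} -- current state
```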

    • mananaysiempre 4 days ago

      What about Apache Samza[1], which is specifically built to maintain database-shaped views on top of event logs—do people run it? Or is the infra overhead from it requiring Kafka and a Java runtime and so on too much?

      [1] https://samza.apache.org/

      • codr7 2 days ago

        No experience.

        But the thing is, I usually already have a perfectly capable database, any extra infra better pull its weight and then some.

  • michaelteter 4 days ago

    The timestamps are just the tip of the iceberg. The value, as I see from the YT talk, is that additions (and atomic groups of additions) can be recognized as one event, complete with who did it and additional metadata.

    • mike_hearn 4 days ago

      Isn't that just logging transactions? The talk says Datomic is fundamentally different, but I don't see anything on his list of features at the end that's not been available for years in commercial databases.

      https://oracle-base.com/articles/11g/flashback-and-logminer-...

      • refset 4 days ago

        Datomic is built around a simple log of fully serial transactions. It doesn't have MVCC or other such magic to coordinate concurrent writers, and this means (1) read-only queries don't require open transactions and (2) the whole stack can have mechanical sympathy with this accrete-only design, affording massive efficiency and scaling potential. Capabilities may look similar on the surface but that's where the similarity ends. Take a look at slides 13 & 14: https://qconsf.com/sf2012/dl/qcon-sanfran-2012/slides/RichHi...

        Datomic is much more similar to Delta Lake or Apache Iceberg than to a classic RDBMS like Oracle in that respect - but then that analogy severely undervalues its information modelling qualities.

      • 15155 4 days ago

        Yes, Oracle has had proper/comparable time travel for years.

        Datomic is interesting with the Datalog interface combined with first-class temporality. One can likely achieve everything similarly in Oracle (but your pocketbook won't like it.)

  • hcarvalhoalves 4 days ago

    You definitely can, but the ergonomics of doing that on top of tables vs. on top of a triple-store data model are very different.

  • pnathan 4 days ago

    I've done that; I set it up so that the table was the record stream, then a materialized view with the current state would be refreshed asynchronously when an insert happened. Fast as lightning. Wouldn't do it by default for everything tho.

  • ilkhan4 4 days ago

    Yeah, and this is probably sufficient for most common cases if we're being honest.

    Immutability and bitemporal querying are nice features in Datomic, but the trade-offs for most teams are an unfamiliar query language, unfamiliar runtime/hosting requirements, unknown performance footguns, little if any integration with 3rd party tools, and (until recently) licensing costs. If it were me, I'd probably instead deal with the headache and complexity of adding triggers/permissions and audit tables to Postgres to get that functionality, since all of those other things are solved there.

    • refset 4 days ago

      As it happens there's a talk happening next week at PGConf NYC on time travel and system-time versioning in Postgres by the DBOS team: https://postgresql.us/events/pgconfnyc2024/schedule/session/...

      But system-time is only half the story. There has been chatter over the years on the PG mailing lists, but there doesn't seem to be much momentum currently towards adding full SQL:2011 bitemporal support to Postgres. Adding support in a way that feels as natural as using Datomic seems unlikely.

    • stefcoetzee 4 days ago

      I think Datomic is unitemporal, as opposed to bitemporal.

  • null_investor 4 days ago

    That doesn't work that well though. Datomic is much more efficient at doing this.

michaelteter 4 days ago

Datomic (as presented in this video) seems like a great thing. But if it's so great, why are so few people talking about it? Is it because it only exists for Clojure, and few people know or talk about Clojure?

I love Clojure as a language, and I have wanted to use it in production; but there are so few opportunities other than solo building - where you really have to climb a steep wall due to the Clojurist mentality of /Frameworks Bad! Choose all your own libraries good!/. Instead I have found myself happy and productive in the Elixir/BEAM world, of course initially because of Phoenix.

And like many people, I have just been accepting the destructive, non-historic approach to data management with typical RDBMSs. If there were a Datomic for Elixir, I'd probably use it.

Must Datomic (or a data management approach like it) be tightly coupled with Clojure?

  • augustl 4 days ago

    I've used Datomic from both Kotlin and Groovy (!)

    I presented on Datomic at KotlinConf too, with some live coding starting around the 31 minute mark https://www.youtube.com/watch?v=hicQvxdKvnc

    You probably need to be on the JVM, as the peer library (i.e. the "good one", where you embed the full query engine and data fetching directly into your business logic) is so far only implemented for the JVM.

    I suppose 10+ years of weird license models and a hefty price tag haven't helped. Datomic turned free (but still proprietary) in 2023 though. But why Datomic isn't more widely adopted is a huge mystery to me...

    • foobarian 4 days ago

      > But why Datomic isn't more widely adopted is a huge mystery to me...

      I've been hearing about Datomic on and off over the years, and I never saw a clear answer to the simple question: what is it? What does it do?

      Instead I saw it most often mentioned as an example of a successful Clojure project, and now that's how I think of it. Seems like poor marketing/branding if there ever was an intentional attempt at it.

      • diggan 4 days ago

        > I've been hearing about Datomic on and off over the years, and I never saw a clear answer to the simple question: what is it? What does it do?

        I'm guessing most people, like me, first heard about both Clojure and Datomic from one of Hickey's famous talks about software development. Coming from one of those, I guess it's a bit easier to understand what the various concepts and words mean from Datomic's website for example.

        I guess the easiest way to put it: Datomic is an append-only database (deletions via "retraction" records) that lets you query through time (also called "Bitemporal modeling" with fancier words).

        The features/benefits page I think is pretty clear (https://www.datomic.com/benefits.html) but again, might just be my familiarity speaking.

        • refset 4 days ago

          > that lets you query through time (also called "Bitemporal modeling" with fancier words)

          Datomic is actually only 'uni-temporal', it provides a database-wide, immutable "system time" (aka "transaction time") versioning + very effective as-of querying. This naturally falls out of the "Epochal Time Model" (see Deconstructing the Database, 2012). However there is no particular built-in support for any further mutable time dimension, see: https://vvvvalvalval.github.io/posts/2017-07-08-Datomic-this...

          System-time is still very powerful though. The way I think of the difference is: system-time versioning is mainly useful for debugging and auditing (and of course for the horizontal read scaling), whereas valid-time versioning is useful for timestamp-based reporting (or other forms of in-application time-travel and modelling 'truth' in business data) where the system-time timestamps are not the timestamps end users are directly interested in, see: https://tidyfirst.substack.com/p/eventual-business-consisten...

        • foobarian 4 days ago

          I am aware of the basics but it's hard to find details on "why not Postgres" or other standard solution for basically event sourcing.

          • diggan 3 days ago

            Event sourcing is usually something you'd have to implement yourself, it's "just" a pattern.

            Datomic et al, meanwhile, give you a bunch of useful features out-of-the-box.

            Yes, you can build a query system that allows you to see data at specific points in time with event sourcing, but you likely have to implement that yourself. Compare that to temporal databases, where it's (usually) just a parameter you pass along with your query.

    • michaelteter 4 days ago

      Thanks. I’ll check out the video.

      At this point I’m pretty addicted to BEAM and the Elixir ecosystem though. (And LiveView!).

      • macintux 4 days ago

        I doubt it's still the case, but once upon a time Riak (written primarily in Erlang) was one of the preferred storage backends for Datomic.

  • seanc 4 days ago

    I used Datomic for a few years at a job.

    In addition to the cost and proprietary nature, I'd say there are two reasons:

    1) New and different things come with risk, and many folks are risk averse, especially in groups

    2) More concretely, it is very easy to write slow queries in Datomic, and it can be a struggle to diagnose why the order of your datalog statements matters so much

  • apwell23 4 days ago

    It's closed-source software that's VERY expensive and supported by one tiny company somewhere.

    Try selling that to management as the place where you want to keep their data.

    • armincerf 4 days ago

      Ok but XTDB has a lot in common with Datomic and is open source but still hardly widely used. I think this is partly because most people don't consider 'temporality' as a feature a database should offer; rather, they believe temporal problems should be solved via proper schema design and application logic. Additionally, using datalog (or any esoteric query language that isn't SQL) locks you out of many battle-tested tools that enterprises rely on.

      The XTDB team has pivoted towards a SQL-first approach (though still supports datalog) and now 'only' has the 'But why not just use Postgres' problem to solve.

      Having personally moved from trying to use Postgres for everything (including lots of timeseries data with Timescale) to a dedicated and relatively unknown DB built purely for the purpose I want it for (QuestDB), I am all for more people trying to build databases that do specific things better than Postgres. However, it will be very difficult to create something that does literally everything Postgres can do but better, which probably makes Postgres the sensible choice for the majority of applications.

      • refset 4 days ago

        > this is partly because most people don't consider 'temporality' as a feature a database should offer

        Not sure on your definition of 'people', but I think every business ultimately wants solid auditing and reporting capabilities across their IT systems.

        These concerns are only increasing in importance as new regulations demand stronger data provenance, but their implementation shouldn't be reliant on the process of "proper schema design" to get things right first time.

        Databases built for the modern world should be making this stuff bulletproof and easy.

        (I work on XTDB - and if Postgres already supported temporal tables I possibly wouldn't!)

        • armincerf 4 days ago

          Oh, I totally agree that 'people' (which I guess refers to any potential user of a DBMS) often do need temporality (or even bitemporality), but they don't consider that the database should have this baked in, or they just don't consider it much at all.

          I'm fully on board with XTDB and similar solutions for this reason. Most people still gravitate towards Postgres and similar databases without giving much thought to these temporal challenges, even though those options lack a robust solution for the temporal issues that so many systems demand.

    • jdminhbg 4 days ago

      "One tiny company" is Nubank, with a $70B market cap: https://finance.yahoo.com/quote/NU/

      • pjlegato 4 days ago

        Nubank _as a company_ does not support Datomic. It's a bank.

        _A few employees_ they acquihired support Datomic.

        If those few employees quit, get fired, get transferred to some other project, etc., then no more Datomic support.

        • jdminhbg 4 days ago

          Nubank uses Datomic internally. They’re not just gonna throw their hands up and quit if someone retires.

          It’s perfectly reasonable to not want to use a DB that isn’t open source but Datomic isn’t a side project that’s on the edge of not existing.

      • apwell23 4 days ago

        hey boss lets have our database be supported by a ... brazilian bank. prbly not a winning proposal.

  • wry_discontent 4 days ago

    It's because Datomic has a confusing new model, and suffers from the same beginner unfriendliness as the rest of Clojure.

    It's a language aimed at experienced old school programmers, and doesn't aim to be friendly to younger less experienced programmers. They needed a better marketing team for the product.

    • jwr 4 days ago

      > It's a language aimed at experienced old school programmers, and doesn't aim to be friendly to younger less experienced programmers.

      I think Rich Hickey had a point when he said "[musical] instruments are made for people who can play them". I don't see people complaining about the piano or the saxophone being difficult for beginners. They are.

  • cutler 4 days ago

    There are plenty of Clojure frameworks. Electric Clojure is the latest but Kit with HugSQL serves all my needs.

    • michaelteter 4 days ago

      Electric looks awesome, but their stern warning of “we are building this for ourselves, and if you get value that’s great” gave me pause.

      If it reaches a point of being documented and intended for general public use, then I’ll definitely try a project with it.

      • dustingetz 4 days ago

        That's right, I have a blog post cooking up about why Electric is for experts today. A major factor in this is because, like with Clojure, the users aren't paying us, so the documentation you want cannot yet afford to exist. Another is performance - to get Electric to purr you have to understand what you are asking the computer to do (I've seen the stuff senior engineers type with their AI codegen tools, that approach is simply not viable here, at least not yet). The net impact of these two factors is that if Electric is not obviously the exact thing you know you must have—i.e., you are already succeeding or have the possibility of succeeding with something else—there is high risk that your adoption will not succeed, leaving you frustrated and unhappy! Failed projects do neither of us any good, that is a recipe for a damaged brand.

        A bonus third factor is that the demos we've been cooking up internally—that we haven't revealed yet—are so f%cking incredible that everyone is going to be motivated to use and learn it anyway because Electric yields value that is previously unseen and unavailable anywhere else. So I am simply setting healthy expectations for success. For example, we just built the 80% that matters of the “sync engine” value prop in two weeks and 100 LOC. Implementing it in userland requires 1 LOC per query. With differential network traffic for over the wire O(1) remote incremental collection maintenance! for free! And the pattern works with any database!

      • diggan 4 days ago

        > Electric looks awesome, but their stern warning of “we are building this for ourselves, and if you get value that’s great” gave me pause.

        Clojure (the core language) is developed in exactly the same way, Rich Hickey is pretty forthcoming with the approach they take. So if a framework with that approach gives you pause, probably Clojure the language should do the same. Relevant: https://gist.github.com/richhickey/1563cddea1002958f96e7ba95...

        • fulafel 2 days ago

          Clojure is not developed in the same way - it's really conservative about backwards compatibility and puts a lot of thought into existing users. Electric is still making backwards-incompatible changes, in contrast.

          You're probably thinking about the open source community developed project vs "it's not a democracy" aspect where Clojure is definitely in the latter camp.

        • michaelteter 4 days ago

          But I know Clojure is in use in quite a few places, and it has quite a few really capable people supporting it in one way or another. Plus it has good documentation and several books teaching it. That's very different from a slick (and impressive) framework built by one company for themselves.

          Also, code written for Electric is very specific to Electric. But Clojure is just functional-first Lisp. It's very easy to rewrite in another language, even an imperative language. There wouldn't be a huge paradigm shift translating Clojure to any of half a dozen popular languages.

  • kragen 4 days ago

    unlike clojure, datomic is proprietary software, so you'd be a fool to choose to invest your time in learning how to use it unless you're rich hickey

    investing your time in learning how it works so you can clone it might be worthwhile

  • fulafel 4 days ago

    I think the takeoff was hampered by it being paid software at the time when it generated the most excitement, and then the "free to use but proprietary" phase coincided with XTDB. (And then for XTDB, its v1 -> v2 reboot dampened that one.)

augustl 4 days ago

Statistically (and from experience) I'm probably the weird one here, but I cannot fathom why Datomic isn't more popular.

I get that postgres is a good default in many cases, and I don't expect SQL to die tomorrow. But there are _so many_ apps (most/all backoffice apps I've worked on for example) with 10s or 100s of transactions per second at most, that would love to have the data available directly inside your business logic, and where both business logic and devops would improve by many orders of magnitude by having a full transaction log of all changes done to your data over time.

Is it _just_ because Datomic is different and people don't get it, and that preconceived notions makes you think Datomic is something it isn't?

Here's to the crazy ones!

  • koito17 4 days ago

    Professional Clojure dev for some time. Here is what prevents me from using Datomic.

    - no Docker image; still distributed as a tarball. Although com.datomic/local exists, it only provides the Client API, so it's mostly suited towards mocking Datomic Cloud.

    - Datomic Cloud is designed around AWS; applications wanting to use Datomic Cloud must also design themselves around AWS

    - Datomic On-Prem does not scale the same way Datomic Cloud does (e.g. every database has an in-memory index; the transactor and *all peers* need to keep them all in-memory)

    - No query planner whatsoever. In databases like XTDB, the order of :where clauses doesn't matter since the engine is able to optimize them. In Datomic, swapping two :where clauses can transform a 10ms query into a 10s query.

    In addition to the above four points, I strongly believe the following points prevent others from using Datomic.

    - Writing Datomic queries outside of Clojure (e.g. Java) requires writing everything in strings, which feels awful no matter what.

    - If you are not using a JVM-based language, then there is no choice but the REST API for interaction. The REST API is orders of magnitude slower than the official Clojure and Java clients.

    - Too much tooling exists around the pgwire protocol and various dialects of SQL. Datomic obviously does not fit into either of these categories of tooling.

    - Applications like DBeaver do not support Datomic at all. The best you can do is access the datomic_kvs table from a database using JDBC storage engine.
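    The :where-ordering point above can be illustrated without Datomic at all. With no query planner, clauses run in the order written, so putting the most selective clause first shrinks the intermediate result set every later clause must scan. A toy Python model (invented data, with a counter standing in for real index work):

```python
# 100k people, of whom only 100 match the selective predicate.
people = [{"id": i, "country": "NO" if i % 1000 == 0 else "US"}
          for i in range(100_000)]

def query(clauses):
    """Apply predicates strictly in the order given, like an unplanned
    datalog query, counting how many rows each clause has to examine."""
    candidates, work = people, 0
    for pred in clauses:
        work += len(candidates)
        candidates = [p for p in candidates if pred(p)]
    return candidates, work

is_norwegian = lambda p: p["country"] == "NO"   # matches 100 rows
is_even_id   = lambda p: p["id"] % 2 == 0       # matches 50,000 rows

# Same result either way, wildly different cost:
_, fast = query([is_norwegian, is_even_id])  # 100,000 + 100 rows examined
_, slow = query([is_even_id, is_norwegian])  # 100,000 + 50,000 rows examined
print(fast, slow)  # 100100 150000
```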

    • refset 4 days ago

      > In databases like XTDB, the order of :where clauses doesn't matter since the engine is able to optimize them

      Rich's observation was that query optimizers often get things wrong in ways that are hard to predict or control, but he's not fundamentally opposed to their use. That said, building a decent optimizer is a huge undertaking and I think they took the right decision to not attempt to bundle that sort of complexity into the original vision for Datomic otherwise they might never have shipped.

      The state-of-the-art commercial engine for query optimization and execution algorithms in this 'triple' space is probably https://relational.ai/

      • marcinzm 4 days ago

        As I see it, Rich's stance is that of an expert in the database who doesn't need to deliver business features using the database. New users are not experts, and even experienced users who work for companies have pressure to deliver features. You can get initial popularity by targeting these types of expert users working on more experimental products, but long-term growth and popularity require targeting the other 99.9% of users. I've seen one company adopt Datomic because of this type of user and then, a couple of years later, rip it out because as it grew its developers were no longer of this type.

        • refset 4 days ago

          The original RDBMS vision was very explicitly for the users (both developers and analysts) to not have to be experts in their own database in order to achieve useful work, and without needing to think about procedural/3GL code from the get-go. In the intervening years query optimization has gotten a lot better, and hardware shifts have only worked in favour of this vision, but there's still a lot of work to be done before databases are truly "self-driving": https://www.cs.cmu.edu/~pavlo/blog/2018/04/what-is-a-self-dr...

          Until that's the case I can understand why people are tempted to bypass this traditional RDBMS wisdom, especially if they have a very strong conception about their data models, access patterns, and need for scale (e.g. see also Red Planet Labs 'Rama').

          • kolme 4 days ago

            If you don't have RDBMS wisdom and program an application around a RDBMS, your application is going to suck.

            No ORM or framework is going to save you.

        • closeparen 4 days ago

          In my experience, structuring a query to execute efficiently requires some basic software-engineering thinking about what's going on, while convincing a query planner to do the right thing requires deep expertise in the query planner.

          • refset 4 days ago

            > convincing a query planner to do the right thing requires deep expertise in the query planner

            The advantage is that that can be somebody else's job though, and ideally (eventually) an AI's job.

    • panick21_ 4 days ago

      I kind of agree, I find it was never that easy to set up and use. I also found the documentation to be quite limited on specific points I wanted to know.

      They showed off a JOOQ-like Java API once, but as far as I can see it was never released. That is crazy to me: making it work amazingly well from Java and friends would seem to be an absolute no-brainer. That alone made it basically impossible to adopt. Going from SQL/JOOQ to strings was just not going to happen.

      They focused so much on Datomic Cloud, and that just isn't where most people are going to deploy, especially in the age of Kubernetes and Docker. It's kind of crazy that there were no official Docker images and things like that.

      So even though I love Datomic conceptually, and once you have it set up with Clojure it's pretty awesome, I would hesitate to really use it for a larger project.

      I would really love if NuBank simply open-sourced it.

    • cloogshicer 4 days ago

      As someone without Clojure experience but eyeing Datomic from the sidelines, thank you for the detailed answer, super interesting!

    • cfiggers 4 days ago

      Would you reach for XTDB instead of Datomic?

    • JB024066 4 days ago

      >No query planner whatsoever. In databases like XTDB, the order of :where clauses doesn't matter since the engine is able to optimize them. In Datomic, swapping two :where clauses can transform a 10ms query into a 10s query.

      Datomic has query-stats https://docs.datomic.com/reference/query-stats.html
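
      To make the ordering point concrete, here's a sketch in Clojure against a made-up schema (:person/email unique, :person/order, :order/total are hypothetical attributes; `db` is a Datomic database value). Datomic evaluates :where clauses top to bottom, so leading with the most selective clause is what keeps the first version fast:

      ```clojure
      ;; Fast: the unique-email clause binds ?p to a single entity
      ;; first, so only that person's orders are ever touched.
      (d/q '[:find ?total
             :in $ ?email
             :where
             [?p :person/email ?email]
             [?p :person/order ?o]
             [?o :order/total ?total]]
           db "alice@example.com")

      ;; Slow: the first clause scans every :order/total datom in the
      ;; database before the later clauses narrow anything down.
      (d/q '[:find ?total
             :in $ ?email
             :where
             [?o :order/total ?total]
             [?p :person/order ?o]
             [?p :person/email ?email]]
           db "alice@example.com")
      ```

      query-stats is the tool for spotting which clause is doing the scanning.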

  • jwr 4 days ago

    Clojure developer here. I make a living from my SaaS, which is written in Clojure.

    Reasons why I don't use Datomic:

    * I've been burned in the past by a not-very-popular closed-source expensive database developed by a small company (specifically, Franz Inc's AllegroCache used with AllegroCL).

    * I don't actually want to preserve all history. I am worried about performance.

    * The data model doesn't fit my use case. I know my usage patterns and queries very well, so I am better served by KV stores.

    * None of the storage engines is a good fit for me.

    In case you wonder, I've been using RethinkDB and I am now moving to FoundationDB. I need a distributed KV store with strict serializability (https://jepsen.io/consistency/models/strict-serializable) consistency, running on my hardware.

    In other words, not every database is a good fit for every application. The "just use Postgres!" crowd is wrong, and so is anybody who says "just use Datomic".

  • Skinney 4 days ago

    Part of the reason is that it's been a commercial database until recently. I also think it hasn't helped that it's closely associated with Clojure (for natural reasons), which might make Java developers turn away from it.

    It's a shame, though. Datomic is a marvel.

    • kragen 4 days ago

      quite aside from whether or not it's 'commercial', it's proprietary

    • amelius 4 days ago

      Yeah, the cost is probably the biggest reason. There are probably a lot of in-house-developed Datomic clones out there (I know of one that was developed even before Datomic was a thing).

  • Arcanum-XIII 4 days ago

    Pricing killed it for my experimental project.

    Of course the complexity of having to set a cluster early on doesn’t help for small shops.

    Then there’s the complexity of learning Datalog - I love it, but when I see people already struggling with SQL, I’m not confident about using it generally!

    • augustl 4 days ago

      It's free now - but I suppose 10+ years of being both closed source and paid didn't do wonders for adoption.

      What part needs a cluster in your experience? The Datomic deployments I've been involved with have been running on a single server, with a single instance of backing storage etc.

      Datalog is interesting indeed... Back when I first started using Datomic in 2012 I had just fundamentally decided to use it, and it took probably a week before the query language "clicked", i.e. to be able to actually compose my own queries and not just copy paste my way to something that works.

      • kragen 4 days ago

        to clarify, it's not free as in 'free software'; it's free as in 'free beer'

        • OJFord 4 days ago

          In reply to 'its pricing killed it for me', that was not ambiguous.

  • ndr 4 days ago

    I love Clojure and would love to use Datomic. The fact that it is not open source pushes me away. The risk of being locked in is far too great for anything serious. There's no Valkey move if someone captures Datomic.

    • Arjuna144 4 days ago

      Absolutely true!

      "No Valkey move", I like it! That should become its own meme!

      Being free but not open source makes no sense at all. It is just our little ego that wants to "keep" what we think of as "ours" to ourselves.

      Give all you have to all and progress will go so much faster. Others may learn from the implementation of Datomic and make something better. So your ego is hurt, but humanity wins!!

      Give all you have, and it will never be enough! Give anyway!

  • intrepidpar 4 days ago

    Datomic is not open-source, is it? And I think the licensing model was a bit odd. Those definitely don't help.

    • sswezey 4 days ago

      Correct. It's free, but not open-source.

  • pjmlp 4 days ago

    I guess because those of us that are fine with Datomic also don't have any issues going into IBM, Oracle or Microsoft's RDBMS portfolio of database products.

  • benrutter 4 days ago

    My best guess is Datomic's coupling to the JVM makes it great for clojure developers (or java, scala etc) but not even considered by python or javascript shops, which make up the vast majority.

    If python and java had better interop, I think the story might be a little different. My experience is that java is always a little stubborn as "child code", so libraries like pyspark often involve notoriously difficult setup.

  • imhoguy 4 days ago

    The tabular data model, and thus the relational one, has been in use since the dawn of known civilization [0]. Anything that can't be described and literally touched as a table causes discomfort, and will for centuries to come. This is why CSV is still the king of data transfer.

    Data outlives logic. Simple data structures survive complex data structures.

    And when you tie data retrieval tightly to complex logic and a specific technology, one day you may not be able to simply retrieve it.

    ~~~

    "ODE is a database system and environment based on the object paradigm. It offers one integrated data model for both database and general purpose manipulation. The database is defined, queried and manipulated in the database programming language O++ which is based on C++. O++ borrows and extends the object definition facility of C++, called the class. Classes support data encapsulation and multiple inheritance. We provide facilities for creating persistent and versioned objects, defining sets, and iterating over sets and clusters of persistent objects. We also provide facilities to associate constraints and triggers with objects. This paper presents the linguistic facilities provided in O++ and the data model it supports." 1989, [1]

    Do you see the similarities? It even started similarly, as a research project. And it is gone now, both logic and data, likely gone with the last program using O++ - and so it will be with Clojure.

    [0] https://www.datafix.com.au/BASHing/2020-08-12.html [1] https://dl.acm.org/doi/abs/10.1145/66926.66930

    • panick21_ 4 days ago

      And multiple tables, connected, make graphs. Almost nothing is a single table. If you want a single flat table in Datomic, you can have it. In fact, conceptually it's literally just one big flat table.

      > And when you hardly tie data retrieval with complex logic and specific technology then one day you may not be able to simply retrieve it.

      You don't know how Datomic actually works, do you?

  • TacticalCoder 4 days ago

    > I get that postgres is a good default in many cases ...

    And Datomic can run on top of Postgres anyway, right? So it's not as if the knowledge wasn't reusable: not for the queries themselves, but for everything about managing the database. Or am I wrong here?

    • augustl 4 days ago

      Yes - if I understand you correctly :) Datomic writes opaque blobs of index data to existing storage, such as Postgres or DynamoDB or a handful of others. But you can't meaningfully query that data directly; it's just a bunch of compressed Datomic-specific index data, with no domain structure or query power etc.

  • emmanueloga_ 4 days ago

    “Simple” comes from simplex (Latin for “not braided”), while “complex” means “braided together.” Datomic? Well, it feels like a whole lot of braiding going on. Datomic: complex made hard?

    Datomic requires a storage backend (e.g., Cassandra, DynamoDB, MySQL, etc.), transactors, peers, and a separate transaction log... also, it appears the few real users of Datomic are primarily running it in AWS. Trying to run it elsewhere likely introduces more complications and uncertainties. Good luck getting support, too—you might have to wait for Rich Hickey to finish his hammock nap (ba-dum-tss)

    While Datomic's immutability, time-travel queries, and Datalog query language are conceptually cool, they seem like niche features. My guess is that in 99% of the cases SQL can do the job just fine. Also, Rich Hickey may not have braids, but those curls come pretty close.

    Without deep pockets for high-RAM servers, many tend to steer clear of Java. If Datomic were packaged as a single binary with fewer dependencies (and no JVM requirement), it could attract more users. As it stands, it's a complex setup with limited benefits, primarily suited for specific use cases, I think (financial auditing / historical data analysis?).

    ---

    COLOPHON

    (from the Greek word κολοφών [kolophōn], meaning "summit" or "finishing touch")

    I don’t have real-world experience with Datomic; I played around with Datascript and ultimately steered away from Datomic after perceiving a low ROI given its complexity.

    I hope my fellow fans of Clojure and the venerable 'Simple Made Easy' talk can forgive this little attempt at humor at Rich Hickey's expense! :-p

    • augustl 4 days ago

      > Without deep pockets for high-RAM servers

      I just got a dedicated server on Hetzner to test out some things. It's $70/month with 64gb RAM and a CPU that builds a complex C++ thingie in 11 minutes where my laptop spends about an hour :D

      Scaling is of course not trivial, but the same set of backoffice apps I've worked with throughout the years that would be a good fit for Datomic have a working set for the database much smaller than 64GB.

      > they seem like niche features

      That's the thing, though. Maybe it's because I've used Datomic a bunch. When I'm on projects that use a SQL db, a handful of problems are just fundamentally solved in Datomic, and none of the super knowledgeable devs who know SQL in and out are even aware that they are problems.

      Some examples:

      - What caused this column of this row to end up with this value?

      - Oh no, we were down at 3am, but when the first person investigated at 7am (oh, those backoffice SLAs...) everything worked fine and nobody knows what the db state was at 3am

      - When we wrote value X, what other values did we also write at the same time?

      - We need to carefully write a transaction that reads some data, writes some data, and reads some more data, and hopefully lock correctly, and hopefully we understood the isolation level (that we probably didn't set anyway) correctly and...
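
      For those who haven't used Datomic, the first three are roughly one-liners against the peer API. A sketch, assuming a connection `conn` and a made-up :account/balance attribute (`account-id` and `tx-id` are placeholders):

      ```clojure
      (require '[datomic.api :as d])

      ;; "What caused this value?" - the history view keeps every
      ;; assertion and retraction, tagged with its transaction.
      (d/q '[:find ?v ?tx ?added
             :in $ ?e
             :where [?e :account/balance ?v ?tx ?added]]
           (d/history (d/db conn)) account-id)

      ;; "What was the state at 3am?" - as-of gives you the database
      ;; value as of any past instant.
      (d/as-of (d/db conn) #inst "2024-06-01T03:00:00")

      ;; "What else was written at the same time?" - every datom
      ;; carries its ?tx, and the transaction is itself an entity.
      (d/q '[:find ?e ?a ?v
             :in $ ?tx
             :where [?e ?a ?v ?tx]]
           (d/db conn) tx-id)
      ```

      (The fourth - careful read-write transactions - falls out of the transactor serializing all writes, so there are no isolation levels to get wrong.)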

      Which makes me think I must be the crazy one...

      • emmanueloga_ 3 days ago

        The easier debugging does sound super appealing, and the immutability and append-only design sounds really cool, especially if you don't have that many transactions going on! I tried Terminus in this space, and there's also Dolt. [1]

        How about the append-only/size side of things? I'm guessing if you use Dynamo or Cassandra, you basically forget about it (except when it comes to the bill ...). Is trimming the data straightforward if you don't have that much storage?

        --

        1: https://www.dolthub.com/blog/2022-03-21-immutable-database/#...

    • dimitar 4 days ago

      Datomic Pro doesn't need AWS; I don't see what complications it can cause. It has some helpers for running it in AWS (like a bucket to export to CloudWatch), but none of them are mandatory. And if your use case is small you can run with dev storage (https://www.h2database.com/html/main.html), pretty much the same way you would use SQLite.

      It might be slightly harder to get started with, but then the simplicity comes in when it is time to solve common business problems. A trivial example would be - we have this nice db, now our clients want reports. You run your reporting as a separate clojure process, it doesn't impact production at all, without needing to setup reporting databases and log-shipping.

    • watt 4 days ago

      "complex made hard" would be great. it seems you have missed the point of the talk: complex is always easy; complex is always what gravity pulls you towards.

  • dustingetz 4 days ago

    Rich said (in a random podcast, which was not widely circulated - possibly Cognicast) that their mission with Datomic Cloud was for it to be the easiest way to get started with Clojure. He even used the word “Easy”! (unless I misremember; this needs to be checked). Anyway, Cloud had consumption-based pricing, and developers, companies, and even hobbyists are all accustomed to paying for managed cloud infrastructure. To me this seems like a viable strategy, and devs would have forgiven the previous licensing issues -- we just want to get paid to play with cool toys at work, you gotta work the money stuff out with the bosses.

    The problem with Cloud was that it didn’t deliver on easy. It was not the Heroku they envisioned, rather it was a scary AWS amalgamation that required deep AWS knowledge to debug their CloudFormation templates to get it to work. This is not even really their fault!, as CloudFormation is a dumpster fire, there are much better cloud orchestration tools now. And while AWS's first few products were rock solid (S3, EC2, DynamoDB), around this time is when AWS turned on the enterprise growth machine and pivoted into box checking and everything cloud began to turn to quicksand underneath their feet.

    On top of cloud quicksand, Datomic Cloud had technical scalability challenges which they brute force engineered through, motivating the architectural shift away from the easy synchronous/local entity api to the annoying async/remote client api, and then back again sorta with Ions -- but at the cost of, like, 3 years of product roadmap, during which time Typescript exploded in popularity, the serverless hype cycle began to break and SQL started making a comeback.

    Also, the Java API had poor adoption. Selling predominantly into Clojure commercial users isn’t a great market, so consequently: no VC, small team, no sales/marketing/devrel - at a time when the startup boom was gaining momentum and developer infra was getting funded for the first time, so competitors were spending tons of money on beautiful docs, marketing, full-time twitter accounts, etc.

    I still think there are elements of an amazing product here, but the business window for a Clojure database has kinda closed - because Clojure itself has lost momentum. Datomic Cloud at its essence is a Clojure hosting platform; the business is Clojure itself. I am very bullish on Clojure, I think it never found its killer app, and once found, once a money vector is opened, a bunch of cash (like, $100M†) will make all these problems go away and deliver on Clojure's mission to bring "simple made easy" to mass market application development, which is desperately needed. Maybe Hyperfiddle & Electric?

    † Paraphrasing a Materialize press release, “it takes $100M to build a production ready database” -- and for the record, Materialize isn't doing too well, nobody seems to know what it is for and the VCs replaced the CEO earlier this year. Full quote: "Why did we raise $100M? Put quite simply, we believe this is the order of magnitude of investment that it takes to build a production-ready database. Databases are notoriously hard to get right, and we do not intend to cut any corners."

    In the end, Cognitect chose to spend their lives doing something hard that matters, and I deeply respect that.

    • panick21_ 4 days ago

      > To me this seems like a viable strategy,

      To me this seemed insane. The idea that I would use some pricey consumption-based cloud thing rather than a systemd service or docker container locally to do most of my development is crazy. I don't want to deal with Amazon accounts, authentication, VPCs and all that other jazz just to start a tiny project.

      Even outside of the other stuff you mentioned.

      > Also, the Java API had poor adoption.

      They also put no effort into it. We use JOOQ, and I don't see why a JOOQ-like API for Datomic wasn't doable. With the existing Java API, no wonder no Java shop would use it.

      Not having a simple Docker container I could run and connect to a Spring Boot project within 10 minutes, so that my Java colleagues could use it, made it a complete non-starter.

      > In the end, Cognitect chose to spend their lives doing something hard that matters, and I deeply respect that.

      Truly building something that really, really matters requires large adoption. And it seems to me every move they made was the opposite.

      I can understand not going open-source, but honestly, to get real adoption, real wide traction, you need to be open and well integrated into Java/PHP/JS and Python. And it seems to me they never really cared much about that at all.

      • diggan 4 days ago

        > Truly building something that really, really matters requires large adoption. And it seem to me every move they made was the opposite.

        > but honestly, to get really adoption, real wide traction

        > they never really cared about that much at all.

        It's all a matter of perspective. Rich has been really upfront that both Clojure and Datomic are products of his solutions to particular problems he experienced.

        Datomic does really, really matter, even with the "small" adoption it has, for me - even if I haven't used it myself a lot. And who are any of us to say what "truly matters" when it comes to how we spend our time? Clearly, Datomic does matter, otherwise these people wouldn't have spent a decade building it, so it does matter on some level.

        Maybe that doesn't match up to your "truly, really, really matters" imagination, but it feels kind of weird to reach the conclusion that Datomic doesn't matter, based on what you believe to be impactful.

        Ideas can live on beyond what the original projects carry, which is clearly the case with Datomic, and with basically any project (so far) Rich decided to work on.

    • augustl 4 days ago

      > I still think there’s elements of an amazing product here

      Interesting framing. The people behind Clojure and Datomic aren't known for being amazing at scaling and shipping products (i.e. marketing and all that jazz).

      This is also not me judging them for it, I haven't built and shipped a Datomic, much less marketed it.

      • dustingetz 4 days ago

        They shipped and scaled Clojure!

createaccount99 2 days ago

I didn't get the part about running the queries against the production database. How exactly was it not overloading the database?

And the point he makes about storage capacity seems off; it's often the cost of db storage that's the problem. Why would I pay to store data I'll never need?

smj-edison 4 days ago

Does Datomic have similar features to TerminusDB? I've heard about Datomic before but never took the time to dive in.