Learning From The LND Bug On Lightning – Bitcoin Magazine
This is an opinion editorial by Shinobi, a self-taught educator in the Bitcoin space and tech-oriented Bitcoin podcast host.
On October 9, 2022, Burak from Bitmatrix (a swap tool built on the Liquid Network) created and broadcast a transaction to the main Bitcoin network, spending a UTXO with a Tapscript multisig with a 998-of-999 threshold. This transaction had 998 individual signatures in the witness field, and was almost 0.1 MB in size, and kind of hilariously, reused the exact same public key for every one of the 999 participants in the multisig. This transaction caused a massive disruption for the Lightning Network by exposing a bug in LND and btcd (an alternative client for the Bitcoin network).
The entire purpose of making this transaction was to demonstrate the improved scalability of multisignature scripts that Taproot has enabled. Even without using Schnorr-signature based MuSig protocols, Taproot can enable much larger multisig participant sets than prior versions of Bitcoin Script. This can be a bit of a nuanced discussion in regards to the previous size limitation of multisig if you dive into all the possible ways you can construct multisig with Bitcoin Script, so for the sake of simplicity I am going to simply discuss the previous limits applying to Pay-to-script-hash (P2SH) and Pay-to-witness-script-hash (P2WSH) multisig constructions. When it comes to the standard way to do a P2SH multisig, the maximum size limit of participants is only 15, and in the case of the standard P2WSH multisig the maximum size is 20. These limits are because of how big a script is allowed to be using these different script ops, and limitations in how many processing operations are allowed to be done in the scope of a single script. Violating either of these limits renders a transaction invalid.
With the implementation of Taproot, these script size limits were completely removed, meaning the only limits with Taproot script size are the block size limit itself. This is where the problem comes in regarding LND and btcd. The consensus rules implemented in btcd correctly removed these limits in regards to script size, but the problem is the code base for peer-to-peer communication also implemented checks on script size to add a double layer of defense for node operators. Blocks and transactions would go through a sort of “pre-consensus” consensus validation before even making it to the core consensus code that performs proper validation, the logic being that double checking things adds extra layers of defense against invalid blocks or transactions. This code was not properly updated to remove the script size limits, continuing to enforce prior script limits for SegWit against Taproot transactions. So while the actual consensus code itself would have properly validated this very large Taproot transaction, the block containing it was never actually passed from the peer-to-peer validation into the actual consensus validation logic, meaning that all btcd nodes stalled at the block including Burak’s transaction.
Why did this affect LND, given that many people run Bitcoin Core underneath their LND instance? It is because LND uses the same code btcd does to receive and process blocks. So even if your LND node was running on top of Bitcoin Core, which would have properly validated the relevant block and not stalled, your LND instance would have refused to accept that block and stalled even though your main chain node continued progressing properly.
This bug was very quickly patched, and to my knowledge was not actively exploited in a way that led to any harm, but this left open every LND node on the Lightning Network to potential theft of funds in channels unless they were using an external watchtower. Because the node was stalled at that block, it did not have a real time view of the blockchain, and in the event that a channel counterparty had submitted an old channel state to the blockchain it would have been completely unaware of it and unable to respond with the appropriate penalty transaction to secure the user’s funds. This was a very serious bug that put a massive percentage of the bitcoin on the Lightning Network at risk of theft unless users were manually patching and updating their nodes themselves, or personally monitoring their channels to be able to respond manually in the event of a closure with an outdated state. I must say that the vast majority of non-technical node operators would probably not have been able to do so.
Thankfully this issue was not widely exploited, but had this been discovered in the codebase before Burak’s transaction was pushed to the blockchain, this could have been intentionally exploited by bad actors in a very tactical way. An individual, or a group of people, could have very easily opened a large number of channels on the network and swapped all of the money in those channels back to themselves on-chain through a submarine swap, leaving all of the funds in the channel on the other side, and then submitted a large Taproot transaction like Burak did, immediately closing out their channels using an outdated state. The victims would not even be aware of it, and even if they were, given the relatively low technical competence of many node operators, it is very likely that most people would not have been able to respond in time to manually correct the issue with a penalty transaction.
This bug highlights two important issues to consider. Firstly, multiple independent implementations of Bitcoin nodes can be very dangerous. Thankfully, almost no one runs btcd as a node for anything serious, so the effect this had on the base Bitcoin network was something that could be completely ignored, except for a very tiny handful of individuals whose nodes simply stalled out. If miners had been running btcd, this could have very easily resulted in a chainsplit on the Bitcoin network that would have taken all btcd operators off on a minority chain that would have required manual intervention to correct. The second issue is that in the case of second layers above the main network, implementations of consensus checks should be done very carefully. This is a tricky issue, because while any Lightning node running on top of a Bitcoin full node could in theory simply outsource 100% of this validation to that node, not all Lightning nodes do make use of their own trusted full node. That is unlikely to change — many users will in all likelihood continue to operate nodes in such a manner, so to some degree checks on some or all of the Bitcoin consensus rules must be also supported in Lightning implementations as well.
Going forward I hope this is a wake-up call to how important it is to ensure that consensus validation checks are all in sync with each other across software in this space, as without that synchronicity between everything there isn’t actually a singular coherent Bitcoin network. Everyone should be very happy that this did not result in a massive exploit across the entire network, but people should be aware of how serious this issue could have been had things not played out the way they did.
This is a guest post by Shinobi. Opinions expressed are entirely their own and do not necessarily reflect those of BTC Inc or Bitcoin Magazine.