When is it appropriate to run a node locally?

This part of the spec says that you shouldn’t connect to an external node.

Running a full node on our platform is a rather involved process, generally requiring special hardware, staking, and constantly updating your code to retain your network access. What’s more, a node is randomly assigned to a network shard and can only act as a client to other shards.

Generally, people will interact with the system through a client that can talk to any node. This client lets them construct transactions and cryptographically verify which transactions have actually been run.

Could you talk me through what exactly this rule is trying to prevent, so I can work out how best to apply it to our system?

Welcome to the community, @david.md!

Looking at the docs you linked, I see that there really isn’t any explanation for why this is considered a “mistake” :man_facepalming:. Happy to provide the context.

We run a full node (also known as “p2p node”, “archival node”, “consensus node”) for every asset we support at Coinbase. Our security team considers it an “integration blocker” to rely on some external party (some external node) for connectivity to the network. Some of the major reasons for this view (not comprehensive):

  • We don’t want to be dependent on the reliability or security practices of a third party. We want to peer with other nodes ourselves. For example, if the node we connect to has lax security practices and ends up getting eclipsed, we have no way of knowing that this is happening.
  • We don’t want to trust a third party to propagate our transactions to other nodes. It is very easy for some malicious third party to swallow all of our broadcasts and report that they were broadcast.
  • We don’t want to trust a third party to serve correct/unmanipulated block data. As you may imagine, we are heavily dependent on the correctness/security of block data both for recognizing deposits and confirming withdrawals (as you will see in the Rosetta Spec, we don’t trust the node to reconcile balances or know about private keys either). We’d much rather run a cluster of independently connected nodes that we can cross-check to get a higher level of confidence about data correctness.
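
As a rough illustration of the cross-checking idea in the last point (a sketch only, not a description of our actual tooling): ask several independently peered nodes for the block hash at the same height and only accept a hash that enough of them agree on. The function name, the node labels, and the `min_agree` threshold are all hypothetical, and how the hashes are fetched from each node is left out.

```python
from collections import Counter
from typing import Mapping, Optional


def cross_check_block_hash(
    reported: Mapping[str, Optional[str]], min_agree: int
) -> str:
    """Return the block hash that at least `min_agree` nodes reported.

    `reported` maps a node label to the hash that node returned for a
    given height, or None if the node did not answer. Raises RuntimeError
    when no hash reaches the required level of agreement.
    """
    # Count only the nodes that actually answered.
    votes = Counter(h for h in reported.values() if h is not None)
    if not votes:
        raise RuntimeError("no node returned a block hash")

    block_hash, count = votes.most_common(1)[0]
    if count < min_agree:
        raise RuntimeError(
            f"only {count} node(s) agree on {block_hash}; need {min_agree}"
        )
    return block_hash
```

With a single external node there is nothing to compare against, so a check like this is only possible when you run (or at least independently peer) more than one node.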

Could you elaborate on what this means?

Does it have some cryptographic guarantee that an entire block is authentic (I’m no crypto wiz, but this sounds tough)? This seems impossible to provide if all data is served by another party, since the client would have to make an assumption somewhere based on the validator set that the third party provided. (The only cryptographic guarantees I’m familiar with are ones where we have proof that some validator signed some block.)
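
To make the concern concrete, here is a minimal sketch of the kind of check I have in mind. It assumes a BFT-style chain where a block counts as authentic once validators holding more than two-thirds of the voting power have signed it; that threshold, the function name, and the data shapes are my assumptions, not something taken from your platform. Signature verification itself is assumed to have happened already and is not shown.

```python
from typing import Mapping, Set


def has_quorum(trusted_power: Mapping[str, int], verified_signers: Set[str]) -> bool:
    """Check whether verified signers hold more than 2/3 of the voting power.

    `trusted_power` maps validator id -> voting power and must come from a
    source the client already trusts. `verified_signers` holds the ids of
    validators whose signatures over the block were already verified.
    Signers that are not in the trusted set are ignored.
    """
    total = sum(trusted_power.values())
    signed = sum(
        power for validator, power in trusted_power.items()
        if validator in verified_signers
    )
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return total > 0 and 3 * signed > 2 * total
```

The check is only as good as `trusted_power`: if that map is served by the same third party that serves the block, a malicious node could supply a validator set of its own making and the check would still pass.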