Handle missed blocks for `/block`

Hello!

I’m trying to handle missed blocks for the /block method. If no block is produced at a given index, that index is missed. Missed blocks don’t have any information stored, so querying them doesn’t return any information such as a hash, index, etc.

What would be a good way to handle these blocks?

When I run rosetta-cli check:data --configuration-file rosetta-cli-conf/mainnet/config.json, missed blocks are not counted as blocks (neither synced nor orphaned). And if I mark them as non-existent and return an error, the rosetta-cli syncing process stops.

Could you please tell me how to mark these blocks as orphaned, for example? Or what check does rosetta-cli use to determine whether a block is orphaned?

Hi @duset!

Great question! We addressed this in version 1.4.2 of the spec. From the documentation for BlockResponse:

As a result of the consensus algorithm of some blockchains, blocks can be omitted (i.e. certain block indices can be skipped). If a query for one of these omitted indices is made, the response should not include a Block object. It is VERY important to note that blocks MUST still form a canonical, connected chain of blocks where each block has a unique index. In other words, the PartialBlockIdentifier of a block after an omitted block should reference the last non-omitted block.

If you are interested, you can also take a look at the discussion on a similar question here. :slightly_smiling_face:

Thanks!


TL;DR, your block response should be a 200 with:

{
  "block": null
}

or

{}
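
If it helps, here’s a rough Go sketch of what that can look like on the server side, assuming the rosetta-sdk-go types package. storedBlock, isOmitted, and blockAt are hypothetical placeholders for however your node stores blocks, and the sketch assumes the request is made by index:

package services

import (
	"context"

	"github.com/coinbase/rosetta-sdk-go/types"
)

// storedBlock is a hypothetical view of whatever your node stores per index.
type storedBlock struct {
	Index     int64
	Hash      string
	Timestamp int64
}

// isOmitted and blockAt are hypothetical helpers backed by your node's data.
func isOmitted(index int64) bool       { return false /* query your node */ }
func blockAt(index int64) *storedBlock { return nil /* query your node */ }

// BlockAPIService is a stand-in for your /block service implementation.
type BlockAPIService struct{}

func (s *BlockAPIService) Block(
	ctx context.Context,
	req *types.BlockRequest,
) (*types.BlockResponse, *types.Error) {
	// Sketch assumes the request is by index, not by hash.
	index := *req.BlockIdentifier.Index

	// Omitted index: return a 200 with no Block object ("block": null).
	if isOmitted(index) {
		return &types.BlockResponse{}, nil
	}

	blk := blockAt(index)

	// Walk back past omitted indices so ParentBlockIdentifier points at the
	// last NON-omitted block and the chain of identifiers stays connected.
	parentIndex := index - 1
	for parentIndex > 0 && isOmitted(parentIndex) {
		parentIndex--
	}
	parent := blockAt(parentIndex)

	return &types.BlockResponse{
		Block: &types.Block{
			BlockIdentifier:       &types.BlockIdentifier{Index: blk.Index, Hash: blk.Hash},
			ParentBlockIdentifier: &types.BlockIdentifier{Index: parent.Index, Hash: parent.Hash},
			Timestamp:             blk.Timestamp,
			Transactions:          []*types.Transaction{},
		},
	}, nil
}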

Hey, thank you! That’s helpful!

Yes, that helped. Thank you!

Now I’ve run into a problem where some blocks are missed by rosetta-cli check:data: for example, it checks block 46950, skips 46951-46952, and then continues with 46953, so it missed 2 blocks. Even so, if I use view:block for those indexes, it works fine.

Please let me know if this is a known issue. In the meantime, I’ll continue investigating the problem.

Thanks!

Could you share some rosetta-cli logs from when you observed this skip? I’m not aware of any known issues around our skip block handling (we are currently using it to test a few other chains that skip blocks).

I was playing around with your implementation (nice work btw!) and everything looked fine to me. For example, block 6 correctly references block 4 (when block 5 is skipped):

{
    "block": {
        "block_identifier": {
            "index": 6,
            "hash": "336fa27c77036379a5764bce7b208b3cffeea8b0e8cda4efb2893259646403d4"
        },
        "parent_block_identifier": {
            "index": 4,
            "hash": "5c6b778730425b5f102675563ed4495e2392ffec3d98df32a1f92502a2cb8d2f"
        },
        "timestamp": 1606824095000,
        "transactions": null,
        "metadata": {
            "epoch": 0
        }
    }
}

I also checked all blocks between 46949-46953 and didn’t see anything that looked off! Maybe the rosetta-cli logs are misleading?

Hey, to check the skip I just printed out some debug messages and went through the logs. As far as I could tell, some of the blocks were not printed out.

But in general the issue is the following:
Once check:data is started, rosetta-cli begins syncing blocks, and at some point it stops at a certain block and continuously re-checks that same block (±1 index). While the rosetta-cli sync height keeps growing, all of the subsequent blocks are counted as orphaned, and this goes on in an infinite loop.

Here is a concrete example:
[STATS] Blocks: 379689 (Orphaned: 124292)
[PROGRESS] Blocks Synced: 259462/273242
At block 259462, rosetta-cli tries to check the same block continuously.
While the number of blocks added to the [STATS] count keeps increasing, all of them are stored as Orphaned.

I checked block number 259462: its parent hash is correct and the parent block does in fact exist.

The problem can also be recreated if I clear all the beacon-chain data and rosetta-cli data. For example, when the beacon block height is 50000 and I start check:data, it goes into that infinite loop at a block height of around 50000. Even clearing the rosetta-cli data doesn’t change anything; the break appears at the same block height (50000).

I spent some time digging in here and was able to sync to 100k+ without any issue (as long as the node was synced to a height higher than where the rosetta-cli was fetching):

[STATS] Blocks: 100768 (Orphaned: 0) Transactions: 0 Operations: 0 Accounts: 0 Reconciliations: 0 (Inactive: 0, Exempt: 0, Skipped: 0, Coverage: 0.000000%)
[PROGRESS] Blocks Synced: 103034/104127 (Completed: 98.950320%, Rate: 219.182609/second) Time Remaining: 4s Reconciler Queue: 0 (Last Index Checked: -1)

However, when I got close to the locally synced tip I noticed an interesting pattern related to Orphaned (using the log_blocks config):

{
    "network": {
        "blockchain": "Ethereum 2.0",
        "network": "Mainnet"
    },
    "data_directory": "cli-data",
    "http_timeout": 10,
    "tip_delay": 300,
    "max_retries": 5,
    "max_online_connections": 1000,
    "compression_disabled": true,
    "memory_limit_disabled": true,
    "data": {
        "initial_balance_fetch_disabled": true,
        "historical_balance_enabled": false,
        "reconciliation_disabled": true,
        "inactive_discrepency_search_disabled": true,
        "balance_tracking_disabled": true,
        "log_blocks": true,
        "end_conditions": {
            "tip": true
        }
    }
}

When close to tip, the rosetta-cli seems to almost always orphan a block on the epoch boundary even though the hashes don’t change.

For example (block 104288 is in epoch 3259 and 104287 is in epoch 3258):

Add Block 104288:388b8895e91120459a5a18ab11d3f5d27b77587c6808b9f9e33c9cee6dec1e92 with Parent Block 104287:23ac83fd1689e2b3da382eca9325d49bb0bdb3e3c0bbf7c4fc61d68008d5e7a1
Remove Block 104288:388b8895e91120459a5a18ab11d3f5d27b77587c6808b9f9e33c9cee6dec1e92
Add Block 104288:388b8895e91120459a5a18ab11d3f5d27b77587c6808b9f9e33c9cee6dec1e92 with Parent Block 104287:23ac83fd1689e2b3da382eca9325d49bb0bdb3e3c0bbf7c4fc61d68008d5e7a1

Strangely, this sort of behavior does not happen when we are far from tip. I’m wondering if Prysm is doing something weird with status responses when we are close to the synced tip? I’m still investigating but wanted to share my initial findings.

Thank you very much. I’ve been trying to replace the highest block with the finalized block, which is 2 epochs behind the highest, thinking that would help, but had no luck either.

Could you please explain, or point me to the source code for, how and which blocks are considered orphaned?

We will orphan a block when its ParentBlockIdentifier does not match the BlockIdentifier of the last synced block. You can find the logic for this here:

Just to offer some further help: if the /network/status endpoint returns a BlockIdentifier lower than the last synced BlockIdentifier, it will also cause the tip to be orphaned.
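
Roughly, the two conditions look like this (a simplified paraphrase in terms of rosetta-sdk-go types, not the actual syncer code):

package sketch

import "github.com/coinbase/rosetta-sdk-go/types"

// shouldOrphan sketches the parent check described above: if the newly
// fetched block's ParentBlockIdentifier doesn't match the last synced
// block, the last synced block is removed ("orphaned") and re-synced.
func shouldOrphan(lastSynced *types.BlockIdentifier, fetched *types.Block) bool {
	return fetched.ParentBlockIdentifier.Hash != lastSynced.Hash ||
		fetched.ParentBlockIdentifier.Index != lastSynced.Index
}

// tipBehindLastSynced sketches the second condition: if /network/status
// reports a current block index below the last synced index, the tip will
// also end up being orphaned.
func tipBehindLastSynced(statusCurrent, lastSynced *types.BlockIdentifier) bool {
	return statusCurrent.Index < lastSynced.Index
}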

Hopefully that helps!

Hey, I’ve been trying to debug this case, but got some unexpected results :joy:
After cleaning all the data folders and restarting the process once the eth2 node had synced, all the checks passed successfully. The only issue is that I didn’t fix anything…

We :heart: non-deterministic bugs!!

This is similar to the behavior I observed…I think there is something weird going on somewhere in /network/status while the node is syncing. When reaching tip, did it keep trying to “orphan” or did it look ok?

I think you are right; it seems like this problem appears only while the node is syncing. This time I kept checking that the node stayed fully synced while Rosetta indexed all the blocks, and since it was indeed fully synced, the check was successful and it didn’t “orphan” any of the blocks.