Create Publication

We are looking for publications that demonstrate building dApps or smart contracts!
See the full list of Gitcoin bounties that are eligible for rewards.

Tutorial Thumbnail
Intermediate · 30 minutes

Indexer For Blockchain Analysis

This tutorial will go over a few examples of using the ‘indexer’ to calculate Algorand blockchain metrics. It will cover client instantiation, data extraction, data processing and metrics calculations.

Requirements

Required

  • Python >= 3.6
  • Some python IDE. Since this tutorial is a step-by-step tutorial, using Jupyter notebook/ipython is recommended
  • py-algorand-sdk
  • Indexer:
    there are two options to use the indexer:
    1. An infrastructure service (like purestake, algorand-explorer, etc.) - this is a simple solution. It will provide you read access to the indexer using the sdk.
    2. install indexer - Installing the indexer locally will provide you access to the Algorand blockchain data independently of external providers. It might take a few days to install, since the indexer needs to catchup with the Algorand blockchain from its genesis.

Optional

  • Virtual environment

Background

The Algorand blockchain can process transactions at high throughput. Parsing the blocks to find the information of interest might be time-consuming. The indexer provides convenient and quick access to the data recorded on the Algorand blockchain.

In this tutorial, we will go over the data extraction process by getting the data of all transactions from the first week of August 2020. We will go over how to process the results and apply some metrics.

Steps

1. Environment Setup

Install all dependencies in your environment:

pip install pandas
pip install dateutil
pip install py-algorand-sdk

2. Imports and Instantiations

First, lets import all packages needed for this tutorial.

import os
import json
from time import sleep
from datetime import timezone

from dateutil.rrule import HOURLY, rrule
from dateutil.parser import parse
import pandas as pd

from algosdk.v2client import indexer

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Now, we need to instantiate the indexer client. There are 3 fields we need to provide - indexer_token, indexer_address, and headers (optional).

Info

If you are using an infrastracture service, the information can be found after login. For example, on purestake.io:
EditorImages/2020/09/03 00:59/purestake.png

Tip

Save your API key as an environment variable for easy access

indexer_token = os.environ.get('TOKEN')
myindexer = indexer.IndexerClient(indexer_token=indexer_token, indexer_address='https://mainnet-algorand.api.purestake.io/idx2')

3. Transactions Extraction

In the function below we extract all transactions recorded on the blockchain between start_time and end_time.

def get_txn_response(start_time, end_time):
    """
    Returns all transactions added to the blockchain between 'start_time' and 'end_time'

    """
    # The indexer expacting time inputs to be in RFC 3339 format
    start_time = start_time.astimezone(timezone.utc).isoformat('T')
    end_time = end_time.astimezone(timezone.utc).isoformat('T')

    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 transactions at each request.
    while numtx > 0:
        response = myindexer.search_transactions(start_time=start_time, end_time=end_time,
                                                 next_page=nexttoken, limit=1000)
        transactions = response['transactions']
        responses += transactions
        numtx = len(transactions)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

Note

The indexer provider might have some limitations on the number of requests and their duration.
In this example, we split the week’s worth of data into chunks of 6 hours to avoid exceeding the request processing time limit.

Try using other chunk sizes

start = "20200801T000000"
end = "20200808T000000"

# get a list of dates between start date and end date with 6 hours intervals
dates = list(rrule(HOURLY,
                   dtstart=parse(start),
                   until=parse(end),
                   interval=6))

Now, we can loop over dates and combine all the data:

Tip

If you have memory limitations, save each chunk separately and use a parallel computing library (like spark, dask, etc.) for computations.

responses = []
for start_time, end_time in zip(dates[:-1], dates[1:]):
    responses += get_txn_response(start_time, end_time)
df = pd.DataFrame(responses)
df.head()

asset-transfer-transactionclose-rewardsclosing-amountconfirmed-roundfeefirst-validgenesis-hashgenesis-ididintra-round-offsetlast-validnotereceiver-rewardsround-timesendersender-rewardssignaturetx-typepayment-transactiongroupleasekeyreg-transactionasset-config-transactioncreated-asset-index
0{'amount': 31000, 'asset-id': 2512768, 'close-...00819518110008195123wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO...08196123gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW...01596254404SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH...0{'sig': 'X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZ...axferNaNNaNNaNNaNNaNNaN
1NaN00819518110008195127wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN...18196127gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0{'sig': 'P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQ...pay{'amount': 201000, 'close-amount': 0, 'receive...NaNNaNNaNNaNNaN
2{'amount': 100, 'asset-id': 2513338, 'close-am...00819518110008195127wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB...28196127gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0{'sig': 'jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1R...axferNaNNaNNaNNaNNaNNaN
3{'amount': 0, 'asset-id': 2512768, 'close-amou...00819518110008195134wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS...38196134gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW...01596254404BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...0{'sig': 'FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpe...axferNaNNaNNaNNaNNaNNaN
4{'amount': 100, 'asset-id': 2513338, 'close-am...00819518110008195123wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.02S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW...48196123gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0{'sig': '3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4...axferNaNNaNNaNNaNNaNNaN

4. Preprocessing the Dataset

Notice that some of the columns in the table above have fields of type ‘dict’. Making calculations on nested fields complicated and messy. In this step, we will parse those fields into new columns.

df = pd.concat([df.drop(['asset-transfer-transaction'], axis=1),
                df['asset-transfer-transaction'].apply(pd.Series).add_suffix('-asset-txn').drop(['0-asset-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['payment-transaction'], axis=1), 
                df['payment-transaction'].apply(pd.Series).add_suffix('-pay-txn').drop(['0-pay-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['asset-config-transaction'], axis=1),
                df['asset-config-transaction'].apply(pd.Series).add_suffix('-asst-cnfg-txn').drop(['0-asst-cnfg-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['signature'], axis=1), 
                df['signature'].apply(pd.Series).add_suffix('-sig').drop(['0-sig'], axis=1, errors='ignore')], axis=1)

# format the unix seconds to human readble date format
df['date'] = pd.to_datetime(df['round-time'], unit='s').dt.date

df.head()

close-rewardsclosing-amountconfirmed-roundfeefirst-validgenesis-hashgenesis-ididintra-round-offsetlast-validnotereceiver-rewardsround-timesendersender-rewardstx-typegroupleasekeyreg-transactioncreated-asset-indexamount-asset-txnasset-id-asset-txnclose-amount-asset-txnclose-to-asset-txnreceiver-asset-txnsender-asset-txnamount-pay-txnclose-amount-pay-txnclose-remainder-to-pay-txnreceiver-pay-txnasset-id-asst-cnfg-txnparams-asst-cnfg-txnsig-sigmultisig-siglogicsig-sigdateday
000819518110008195123wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO...08196123gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW...01596254404SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH...0axferNaNNaNNaNNaN31000.02512768.00.0NaNIWPA7EBIWKXLROZRFTNQKYKYCAMDSHJQL6QKEMQAO6DKXT...NaNNaNNaNNaNNaNNaNNaNX73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZlqXc+Zuba...NaNNaN2020-08-01Saturday
100819518110008195127wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN...18196127gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0payNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN201000.00.0NaNMGY5TCZEUVOINTRXXPEHFQ3OTBVPOQGMMPANECG2WHASVY...NaNNaNP0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQtf8VVvtwl...NaNNaN2020-08-01Saturday
200819518110008195127wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB...28196127gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0axferNaNNaNNaNNaN100.02513338.00.0NaNEBDDABI4F7IWCPLHMICNYKUCAEPCBFP5HXKUITS4SSSQKG...NaNNaNNaNNaNNaNNaNNaNjNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1RYvJbfOWMQ...NaNNaN2020-08-01Saturday
300819518110008195134wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.0NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS...38196134gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW...01596254404BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...0axferNaNNaNNaNNaN0.02512768.00.0NaNBXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...NaNNaNNaNNaNNaNNaNNaNFbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpeDassKN0pt...NaNNaN2020-08-01Saturday
400819518110008195123wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=mainnet-v1.02S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW...48196123gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW...0159625440462FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...0axferNaNNaNNaNNaN100.02513338.00.0NaNM2UEUR6EP3JSI2BGGPXEXUIOTTJMNL7A6VSPE357N2CBFA...NaNNaNNaNNaNNaNNaNNaN3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4PpRrUFfnp...NaNNaN2020-08-01Saturday

5. Calculate Metrics

Now that we have the processed table, we calculate some metrics and get some insights on the Algorand blockchain.

Total number of transactions

len(df)

4687303

Types of transactions

df['tx-type'].value_counts()

axfer     3599820
pay       1087417
acfg           57
keyreg          9
Name: tx-type, dtype: int64

Where axfer is asset transfer, pay is payment, acfg is asset config and keyreg is key registration.
More on transaction types can be found here.

Total amount of algos transfered during this week

df['amount-pay-txn'].sum()

251299842396859.0

Distribution of transactions over days of the weeks

df['day'] = pd.to_datetime(df['date']).dt.day_name()
df['day'].value_counts().plot()

EditorImages/2020/09/03 01:42/days_plot.png

Unique assets transfered

df['asset-id-asset-txn'].nunique()

64

df['asset-id-asset-txn'].value_counts()

2512768.0    1699636
2513338.0    1237993
2513746.0     661888
312769.0          87
163650.0          25
438840.0          22
1942776.0         14
604.0             14
438839.0          12
...
137489.0           1
Name: asset-id-asset-txn, dtype: int64

The transactions table provides us with asset ids but doesn’t have the asset name. In the next step we extract more data to learn more about the transactions.

6. Extract Assets

Just like with the transactions extraction, we create a function to loop over the requests.

def get_txn_response():
    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 assets at each request.
    while numtx > 0:
        response = myindexer.search_assets(next_page=nexttoken)
        assets = response['assets']
        responses += assets
        numtx = len(assets)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

assets = pd.DataFrame(get_txn_response())
assets.head()

indexparams
05{'clawback': 'L226FSG3LTZR4V2MI5M4SDKJSF5HP2TQ...
16{'clawback': 'QYQYXRJ7FLQCRRGE3WH5CIAAL56LLMIN...
2235{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...
3236{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...
4237{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...

As in the preprocessing above, we parse the column with the dict fields:

assets = pd.concat([assets.drop(['params'], axis=1),
                    assets['params'].apply(pd.Series)], axis=1)
assets.head()

And now, we combine the assets value count from above with the assets names.

assets_counts = pd.DataFrame(df['asset-id-asset-txn'].value_counts()).join(assets[['index', 'name', 'unit-name']].set_index('index'))

assets_counts.head()

asset-id-asset-txnnameunit-name
2512768.01699636YouNow Pending PropsxPROPS
2513338.01237993Camfrog Pending PropsxPROPS
2513746.0661888Paltalk Pending PropsxPROPS
312769.087Tether USDtUSDt
163650.025Asia Reserve Currency CoinARCC

pd.DataFrame(assets_counts.groupby('unit-name')['asset-id-asset-txn'].sum()).nlargest(10, 'asset-id-asset-txn').plot()

EditorImages/2020/09/03 02:00/assets_plot.png

Conclusion

There is a lot of information recorded on the blockchain and a lot of fascinating insights that can be derived from it. The indexer provides an easy tool to access this data.

Learn more on how to use the indexer and don’t be shy to share your insights with us