Intermediate · 30 minutes

Indexer For Blockchain Analysis

This tutorial will go over a few examples of using the ‘indexer’ to calculate Algorand blockchain metrics. It will cover client instantiation, data extraction, data processing and metrics calculations.

Anya Gilad Algorand

September 08, 2020

Python

简体中文

Requirements

Required

Python >= 3.6
Some python IDE. Since this tutorial is a step-by-step tutorial, using Jupyter notebook/ipython is recommended
py-algorand-sdk
Indexer:
there are two options to use the indexer:
1. An infrastructure service (like purestake, algorand-explorer, etc.) - this is a simple solution. It will provide you read access to the indexer using the sdk.
2. install indexer - Installing the indexer locally will provide you access to the Algorand blockchain data independently of external providers. It might take a few days to install, since the indexer needs to catchup with the Algorand blockchain from its genesis.

Optional

Virtual environment

Background

The Algorand blockchain can process transactions at high throughput. Parsing the blocks to find the information of interest might be time-consuming. The indexer provides convenient and quick access to the data recorded on the Algorand blockchain.

In this tutorial, we will go over the data extraction process by getting the data of all transactions from the first week of August 2020. We will go over how to process the results and apply some metrics.

1. Environment Setup

Install all dependencies in your environment:

pip install pandas
pip install dateutil
pip install py-algorand-sdk

2. Imports and Instantiations

First, lets import all packages needed for this tutorial.

import os
import json
from time import sleep
from datetime import timezone

from dateutil.rrule import HOURLY, rrule
from dateutil.parser import parse
import pandas as pd

from algosdk.v2client import indexer

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Now, we need to instantiate the indexer client. There are 3 fields we need to provide - indexer_token, indexer_address, and headers (optional).

Info

If you are using an infrastracture service, the information can be found after login. For example, on purestake.io:
EditorImages/2020/09/03 00:59/purestake.png

Tip

Save your API key as an environment variable for easy access

indexer_token = os.environ.get('TOKEN')
myindexer = indexer.IndexerClient(indexer_token=indexer_token, indexer_address='https://mainnet-algorand.api.purestake.io/idx2')

3. Transactions Extraction

In the function below we extract all transactions recorded on the blockchain between start_time and end_time.

def get_txn_response(start_time, end_time):
    """
    Returns all transactions added to the blockchain between 'start_time' and 'end_time'

    """
    # The indexer expacting time inputs to be in RFC 3339 format
    start_time = start_time.astimezone(timezone.utc).isoformat('T')
    end_time = end_time.astimezone(timezone.utc).isoformat('T')

    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 transactions at each request.
    while numtx > 0:
        response = myindexer.search_transactions(start_time=start_time, end_time=end_time,
                                                 next_page=nexttoken, limit=1000)
        transactions = response['transactions']
        responses += transactions
        numtx = len(transactions)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

Note

The indexer provider might have some limitations on the number of requests and their duration.
In this example, we split the week’s worth of data into chunks of 6 hours to avoid exceeding the request processing time limit.

Try using other chunk sizes

start = "20200801T000000"
end = "20200808T000000"

# get a list of dates between start date and end date with 6 hours intervals
dates = list(rrule(HOURLY,
                   dtstart=parse(start),
                   until=parse(end),
                   interval=6))

Now, we can loop over dates and combine all the data:

Tip

If you have memory limitations, save each chunk separately and use a parallel computing library (like spark, dask, etc.) for computations.

responses = []
for start_time, end_time in zip(dates[:-1], dates[1:]):
    responses += get_txn_response(start_time, end_time)
df = pd.DataFrame(responses)
df.head()

	asset-transfer-transaction	confirmed-round	fee	first-valid	genesis-hash	genesis-id	id	intra-round-offset	last-valid	note	round-time	sender	signature	tx-type	payment-transaction	group	lease	keyreg-transaction	asset-config-transaction	created-asset-index
0	{'amount': 31000, 'asset-id': 2512768, 'close-...	8195181	1000	8195123	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO...	0	8196123	gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW...	1596254404	SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH...	{'sig': 'X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZ...	axfer	NaN	NaN	NaN	NaN	NaN	NaN
1	NaN	8195181	1000	8195127	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN...	1	8196127	gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	{'sig': 'P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQ...	pay	{'amount': 201000, 'close-amount': 0, 'receive...	NaN	NaN	NaN	NaN	NaN
2	{'amount': 100, 'asset-id': 2513338, 'close-am...	8195181	1000	8195127	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB...	2	8196127	gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	{'sig': 'jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1R...	axfer	NaN	NaN	NaN	NaN	NaN	NaN
3	{'amount': 0, 'asset-id': 2512768, 'close-amou...	8195181	1000	8195134	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS...	3	8196134	gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW...	1596254404	BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...	{'sig': 'FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpe...	axfer	NaN	NaN	NaN	NaN	NaN	NaN
4	{'amount': 100, 'asset-id': 2513338, 'close-am...	8195181	1000	8195123	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW...	4	8196123	gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	{'sig': '3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4...	axfer	NaN	NaN	NaN	NaN	NaN	NaN

4. Preprocessing the Dataset

Notice that some of the columns in the table above have fields of type ‘dict’. Making calculations on nested fields complicated and messy. In this step, we will parse those fields into new columns.

df = pd.concat([df.drop(['asset-transfer-transaction'], axis=1),
                df['asset-transfer-transaction'].apply(pd.Series).add_suffix('-asset-txn').drop(['0-asset-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['payment-transaction'], axis=1), 
                df['payment-transaction'].apply(pd.Series).add_suffix('-pay-txn').drop(['0-pay-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['asset-config-transaction'], axis=1),
                df['asset-config-transaction'].apply(pd.Series).add_suffix('-asst-cnfg-txn').drop(['0-asst-cnfg-txn'], axis=1, errors='ignore')], axis=1)

df = pd.concat([df.drop(['signature'], axis=1), 
                df['signature'].apply(pd.Series).add_suffix('-sig').drop(['0-sig'], axis=1, errors='ignore')], axis=1)

# format the unix seconds to human readble date format
df['date'] = pd.to_datetime(df['round-time'], unit='s').dt.date

df.head()

	confirmed-round	fee	first-valid	genesis-hash	genesis-id	id	intra-round-offset	last-valid	note	round-time	sender	tx-type	group	lease	keyreg-transaction	created-asset-index	amount-asset-txn	asset-id-asset-txn	close-amount-asset-txn	close-to-asset-txn	receiver-asset-txn	sender-asset-txn	amount-pay-txn	close-amount-pay-txn	close-remainder-to-pay-txn	receiver-pay-txn	asset-id-asst-cnfg-txn	params-asst-cnfg-txn	sig-sig	multisig-sig	logicsig-sig	date	day
0	8195181	1000	8195123	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	RPQ7BVPQQENXXJDFLYCMSVFU7QV6Z44LDIIAVISBUHSXCO...	0	8196123	gqZwVHhuSWTZJGI0OGY1NTUwLTA4NGEtNDZkNS05OGUxLW...	1596254404	SN3FSQVLLNWSI3I7QCMQEQN6L42OW5RP4FNM6BNOZHBRKH...	axfer	NaN	NaN	NaN	NaN	31000.0	2512768.0	0.0	NaN	IWPA7EBIWKXLROZRFTNQKYKYCAMDSHJQL6QKEMQAO6DKXT...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	X73FojlwcRlV8A8A9bWvWPKq/6i5qxBOb/xQZlqXc+Zuba...	NaN	NaN	2020-08-01	Saturday
1	8195181	1000	8195127	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	BGGD5TM6IFZBTDCDCNGUJ4PBDQFAG5RZTATKB7T34LGVKN...	1	8196127	gqZwVHhuSWTZJDgzZmU0MmQ5LTg2MjYtNDI0NC1hZDczLT...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	pay	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	201000.0	0.0	NaN	MGY5TCZEUVOINTRXXPEHFQ3OTBVPOQGMMPANECG2WHASVY...	NaN	NaN	P0qA6tng2r6e1aSwoG2sNKQEXjoGWTua+YykQtf8VVvtwl...	NaN	NaN	2020-08-01	Saturday
2	8195181	1000	8195127	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	HHFNE2KELVHPL4TGI6B6PXDIZQ6ROGFH7NOUDCJKCRBHXB...	2	8196127	gqZwVHhuSWTZJDI4NDE1NmM2LTQxNzAtNDhmNy1hMmFkLT...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	axfer	NaN	NaN	NaN	NaN	100.0	2513338.0	0.0	NaN	EBDDABI4F7IWCPLHMICNYKUCAEPCBFP5HXKUITS4SSSQKG...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	jNSYv/8dclBfXkJU2JEDJIVj8rlcnLZRLdd1RYvJbfOWMQ...	NaN	NaN	2020-08-01	Saturday
3	8195181	1000	8195134	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	NS4Z2S3NCKMVZSY6VX4CKE7QTN7N5JLT7AVYQFJSRVRLUS...	3	8196134	gqZwVHhuSWTZJDY0ZThhOGUyLTBhMTItNDdlNC04N2Q3LW...	1596254404	BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...	axfer	NaN	NaN	NaN	NaN	0.0	2512768.0	0.0	NaN	BXUZYY5XN2PFLBJE3BX2RDMQLQUW5ZUWX4ZCNDUL6QMXJZ...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	FbhxqPDlZxDcaAELShez+rXNQuTR4rRVP4vpeDassKN0pt...	NaN	NaN	2020-08-01	Saturday
4	8195181	1000	8195123	wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=	mainnet-v1.0	2S7TV7FI4TMPO7GXZI55PPW3FQE2PJVQ7F34LCS3C6Z7NW...	4	8196123	gqZwVHhuSWTZJGJmYjlhNGY4LTNmODgtNDQ1My04YWY5LW...	1596254404	62FMJKL2BDSYWSF3RYYZHAXA5HICQ57NFZCWWNM4ZJBYSV...	axfer	NaN	NaN	NaN	NaN	100.0	2513338.0	0.0	NaN	M2UEUR6EP3JSI2BGGPXEXUIOTTJMNL7A6VSPE357N2CBFA...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	3QqczHlaOjymMeoEZK/ULKSbXv8pYP59AJqt4PpRrUFfnp...	NaN	NaN	2020-08-01	Saturday

5. Calculate Metrics

Now that we have the processed table, we calculate some metrics and get some insights on the Algorand blockchain.

Total number of transactions

len(df)

Types of transactions

df['tx-type'].value_counts()

axfer     3599820
pay       1087417
acfg           57
keyreg          9
Name: tx-type, dtype: int64

Where axfer is asset transfer, pay is payment, acfg is asset config and keyreg is key registration.
More on transaction types can be found here.

Total amount of algos transfered during this week

df['amount-pay-txn'].sum()

251299842396859.0

Distribution of transactions over days of the weeks

df['day'] = pd.to_datetime(df['date']).dt.day_name()
df['day'].value_counts().plot()

EditorImages/2020/09/03 01:42/days_plot.png

Unique assets transfered

df['asset-id-asset-txn'].nunique()

df['asset-id-asset-txn'].value_counts()

2512768.0    1699636
2513338.0    1237993
2513746.0     661888
312769.0          87
163650.0          25
438840.0          22
1942776.0         14
604.0             14
438839.0          12
...
137489.0           1
Name: asset-id-asset-txn, dtype: int64

The transactions table provides us with asset ids but doesn’t have the asset name. In the next step we extract more data to learn more about the transactions.

6. Extract Assets

Just like with the transactions extraction, we create a function to loop over the requests.

def get_txn_response():
    nexttoken = ""
    numtx = 1

    responses = []

    # Retrieve up-to 1000 assets at each request.
    while numtx > 0:
        response = myindexer.search_assets(next_page=nexttoken)
        assets = response['assets']
        responses += assets
        numtx = len(assets)
        if numtx > 0:
            # pointer to the next chunk of requests
            nexttoken = response['next-token']
    return responses

assets = pd.DataFrame(get_txn_response())
assets.head()

	index	params
0	5	{'clawback': 'L226FSG3LTZR4V2MI5M4SDKJSF5HP2TQ...
1	6	{'clawback': 'QYQYXRJ7FLQCRRGE3WH5CIAAL56LLMIN...
2	235	{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...
3	236	{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...
4	237	{'clawback': 'QA75IQ76F6H2T55G65BY7BPLF5QNWSLT...

As in the preprocessing above, we parse the column with the dict fields:

assets = pd.concat([assets.drop(['params'], axis=1),
                    assets['params'].apply(pd.Series)], axis=1)
assets.head()

And now, we combine the assets value count from above with the assets names.

assets_counts = pd.DataFrame(df['asset-id-asset-txn'].value_counts()).join(assets[['index', 'name', 'unit-name']].set_index('index'))

assets_counts.head()

	asset-id-asset-txn	name	unit-name
2512768.0	1699636	YouNow Pending Props	xPROPS
2513338.0	1237993	Camfrog Pending Props	xPROPS
2513746.0	661888	Paltalk Pending Props	xPROPS
312769.0	87	Tether USDt	USDt
163650.0	25	Asia Reserve Currency Coin	ARCC

pd.DataFrame(assets_counts.groupby('unit-name')['asset-id-asset-txn'].sum()).nlargest(10, 'asset-id-asset-txn').plot()

EditorImages/2020/09/03 02:00/assets_plot.png

Conclusion

There is a lot of information recorded on the blockchain and a lot of fascinating insights that can be derived from it. The indexer provides an easy tool to access this data.

Learn more on how to use the indexer and don’t be shy to share your insights with us

Create Publication

Indexer For Blockchain Analysis

Requirements

Background

Steps

1. Environment Setup

2. Imports and Instantiations

3. Transactions Extraction

4. Preprocessing the Dataset

5. Calculate Metrics

Total number of transactions

Types of transactions

Total amount of algos transfered during this week

Distribution of transactions over days of the weeks

Unique assets transfered

6. Extract Assets

Conclusion