Beyond traditional data: Leveraging alternative data in banking

Financial Services Data and Analytics | April 2024

In recent times, due to the explosion of the digital economy, the rise of FinTechs and favourable Government policies, India’s financial landscape has changed considerably. New-age products and business models have emerged, with banks collaborating with different partners to deliver products and services such as buy now, pay later (BNPL), co-lending models, co-branded cards and beyond-banking services.

In 2022, the global alternative data market was valued at USD 4 billion, and the Banking, Financial Services and Insurance (BFSI) sector led with a 15% market share (i.e. nearly USD 660 million). Owing to the variety and predictive nature of alternative data sources, and rising demand for them, the market is expected to reach USD 273.01 billion in 2032, growing at a compound annual growth rate (CAGR) of nearly 52.6% from 2023 to 2032.1
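As a quick consistency check on these figures, the sketch below computes the growth rate implied by moving from about USD 4 billion (2022) to USD 273.01 billion (2032), which is ten compounding years. It is plain arithmetic. The only assumption is that the rounded USD 4 billion is the base the cited CAGR was computed from.

```python
# Implied compound annual growth rate (CAGR) between two market-size estimates.
# Inputs are the figures quoted above, in USD billion; 2022 -> 2032 is 10 compounding years.

def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


implied = cagr(start_value=4.0, end_value=273.01, years=10)
print(f"Implied CAGR: {implied:.1%}")
```

The implied rate comes out at roughly 52.5% a year, which is close to the cited 52.6%. The small gap is consistent with rounding of the base-year value.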

Banks’ decision making primarily relies on traditional data sources such as core banking systems, cards and credit bureaus, and therefore offers limited strategic advantage in a competitive business landscape. To complement data-driven decision making, many banks and regulators are gradually moving towards data partnerships with advanced tech stacks. These partnerships could help with supervision, increase sales, enhance customer experience, improve operational efficiency, reduce non-performing assets, and facilitate new product development, market research and external benchmarking. In this paper, our primary focus is on customer-centric activities such as lead generation, next-best offer (NBO), agri lending, early warning systems, fraud detection and branch optimisation.

Although there are many alternative data sources, we have identified key players and grouped them into seven major categories: income proxy, spend patterns, risk and compliance, geography/location/internet of things (IoT), customer behaviour, aggregators and thought leadership, and Government data. We have further identified the key performance indicators (KPIs) that can be derived from these data sources, and how advanced analytics models can be built across five major themes: strategy, marketing, pre-screening, underwriting and collections. This is backed by industry use cases from across the globe, followed by the approach to be taken for these use cases.

Lastly, we outline how banks can start their alternative data journey with four key steps: ideation (identifying business value), decision making (selecting relevant data sources and vendors), foundation (running a successful proof of concept) and adoption (scaling up). We then describe what the functional architecture could look like, with banks combining existing datasets and alternative datasets to deliver the desired outcomes.

Need for alternative data

Due to the significant rise of the digital economy and increased data consumption, customer behaviour in India has changed drastically. To meet customers’ needs and keep up with a dynamically changing market, organisations need to find a way to capture a variety of data across the customer journey. Banks are therefore using alternative data to make strategic decisions across distinct functions, including customer profiling, sensing customer behaviour, combating fraud, running effective targeted marketing campaigns and refining decision-making processes.

While traditional decision making involves leveraging internal data along with some external data like Reserve Bank of India (RBI) policy, credit bureau and market research, traditional data sources do not always provide the complete picture and could miss crucial information.

This is where alternative data sources come in. Banks can learn more about their customers’ lifestyles, hobbies and objectives by examining data from sources such as social media, online activity and purchase history. Alternative data can help with granular customer segmentation and hyper-personalisation. Using this information, banks can identify and propose the right cross-selling and upselling opportunities. For instance, a financial institution might offer travel insurance, or a credit card with travel incentives, to a consumer who frequently interacts with travel-related content on social media and has made transactions on travel-related apps.
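The travel example above can be expressed as a simple rule. The sketch below is a minimal illustration under stated assumptions. It assumes two hypothetical aggregated signals per customer and illustrative thresholds. A production system would more likely use a trained propensity model than fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class CustomerSignals:
    # Hypothetical aggregated signals; field names are illustrative only.
    travel_content_interactions_90d: int  # social media interactions with travel content
    travel_app_transactions_90d: int      # transactions on travel-related apps


def travel_offers(signals: CustomerSignals,
                  min_interactions: int = 10,
                  min_transactions: int = 2) -> list[str]:
    """Return travel-related cross-sell offers only when both signals clear their thresholds."""
    engaged = signals.travel_content_interactions_90d >= min_interactions
    transacting = signals.travel_app_transactions_90d >= min_transactions
    if engaged and transacting:
        return ["travel insurance", "travel rewards credit card"]
    return []


print(travel_offers(CustomerSignals(25, 3)))  # engaged and transacting
print(travel_offers(CustomerSignals(25, 0)))  # engaged but no travel purchases
```

Requiring both signals reflects the example in the text: interest shown on social media alone is weaker evidence than interest backed by actual travel transactions.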

Alternative data encourages the creation of new products and services by spotting trends in social media, improves benchmarking by assessing consumer sentiment, and facilitates branch selection using geolocation information. For instance, if alternative data shows rising customer interest in sustainable investing, banks can envisage relevant products that better appeal to changing customer tastes.

Alternative data such as utility payments and online activity can strengthen customer eligibility assessment, improve fraud detection through analysis of online behaviour, and enhance know your customer (KYC) checks. For instance, abrupt changes in a person’s online activity or demographic details may trigger additional scrutiny when they apply for credit. Such data provides a more comprehensive view of a person’s online activities and browsing patterns, which helps banks craft more effective fraud prevention techniques.
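One simple way to flag an "abrupt change" is to compare the latest value of a behavioural metric with that customer's own history. The sketch below uses a z-score rule for this. The metric and the threshold of 3 standard deviations are hypothetical; an example metric would be a weekly count of logins from new devices. Real fraud systems typically combine many such signals in a trained model.

```python
from statistics import mean, stdev


def flag_abrupt_change(history: list[float], current: float, z_threshold: float = 3.0) -> bool:
    """Flag `current` if it sits more than `z_threshold` standard deviations from the
    customer's historical baseline (a simple z-score anomaly rule)."""
    if len(history) < 2:
        return False  # not enough history to establish a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any deviation is a change
    return abs(current - mu) / sigma > z_threshold


# Hypothetical weekly counts of logins from new devices for one applicant.
weekly_new_device_logins = [5, 6, 5, 7, 6, 5]
print(flag_abrupt_change(weekly_new_device_logins, 30))  # sudden spike
print(flag_abrupt_change(weekly_new_device_logins, 6))   # in line with history
```

A flag here would not decline the application by itself. It would route the case for the additional scrutiny described above.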

While the Government has introduced many initiatives to boost financial inclusion – Jan-Dhan Yojana, Mudra Yojana, Kisan Credit Card – the lack of relevant data has historically been a major obstacle for banks to extend credit to the unserved population. For instance, banks can leverage alternative sources like telecom, social network and transaction data for existing to bank (ETB) customers to understand their needs and behavioural traits to offer a personalised NBO.

For new to bank (NTB) customers, the need for alternative data rises with the perceived risk of the customer: the higher the risk, the greater the need to leverage alternative data for better underwriting. Depending on the credit history and customer persona, banks could use a combination of alternative and traditional data points to provide a holistic solution.

Banks should primarily focus on how they can leverage alternative data to underwrite customers with poor or no credit history.

The financial services industry has seen tremendous growth in use cases involving alternative data in recent years. Alternative data helps banks and other financial institutions derive insights about businesses, and gives them a strong platform for accelerating growth, innovation and risk management.

With data sources growing exponentially, financial institutions have been devising diverse ways to gather this data in order to unlock new insights.

figure 1

Alternative data sources

Driven by technological advancements and backed by favourable policies, many players have created enterprise solutions that provide a variety of alternative data. Banks have been increasingly using a mix of traditional and alternative data sources, such as taxation, credit bureau, weather forecasts and RBI policy, for lending, policy making and underwriting decisions. Data sources can be categorised on multiple parameters, such as usage, ownership (private or Government), pricing (paid, free or freemium) and delivery (standalone or aggregator).

Based on the features exhibited and use in banking value chain, alternative data can be classified into seven major categories, as outlined below.

  1. Income proxy: Income is one of the key parameters for determining customer segment and lifetime value. Where income details are missing, banks can create an income proxy from utilities and payments data, along with data from other categories.
  2. Spend patterns: Spend patterns help banks to understand the needs of a customer, along with their channel preference and risk appetite. When a customer has multiple bank accounts with no specific bank preference, it becomes difficult to identify the spend pattern from internal data. To strengthen insights on spends, banks can use e-commerce data from travel and shopping apps, along with investment and insurance data.
  3. Risk and compliance: Risk and compliance data sources will help in flagging and filtering high-risk customers who may potentially default, be fraudulent or not comply with anti-money laundering (AML) norms. Primary use cases involve underwriting, financial crime and collections. Banks can leverage traditional credit bureaus along with other new-age players to mitigate the risk.
  4. Geography/location/IoT: Geography and sensor data help identify location-based trends and insights for an individual as well as a cohort. Primary use cases include customer segmentation, NBO and underwriting. Examples include understanding customers’ location and demographics from telecom data, and using drones, sensors and satellite imagery for corporate and agri loans.
  5. Customer behaviour: Customer behaviour data helps in understanding customer traits and personality. Field visits and surveys that use psychometric questions can be used to understand customer ethics, along with their household condition. For customers with no credit history, banks can leverage field visits or surveys while social network data can be used to understand customers’ preferences.
  6. Aggregators and thought leadership: Aggregators help in collating data from multiple players and provide a one-stop solution for a variety of use cases. Thought leadership can be used to understand future trends of the industry and align the overall strategy of the banks.
  7. Government data: Government data from multiple ministries helps in understanding macro-economic factors and formulating the overall strategy for banks. Furthermore, banks can use this data to validate declarations by customers – e.g. work experience, litigation, bankruptcy and tax.
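To make the income proxy idea concrete, here is one simple way a proxy could be derived from payments data. The approach treats credits from counterparties that pay in regularly (such as an employer) as income, and takes the median monthly total. This is a minimal sketch under stated assumptions. The input format, the recurrence rule and the three-month minimum are all assumptions, not a method prescribed by the categories above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical input: each credit is (month as "YYYY-MM", counterparty, amount in INR).
Credit = tuple[str, str, float]


def income_proxy(credits: list[Credit], min_months: int = 3) -> float:
    """Median monthly total of credits from counterparties seen in >= min_months distinct months."""
    # Identify recurring payers (e.g. an employer) by how many distinct months they credit.
    months_by_payer: dict[str, set[str]] = defaultdict(set)
    for month, payer, _amount in credits:
        months_by_payer[payer].add(month)
    recurring = {p for p, months in months_by_payer.items() if len(months) >= min_months}

    # Sum only the recurring credits per month, ignoring one-off inflows.
    monthly_totals: dict[str, float] = defaultdict(float)
    for month, payer, amount in credits:
        if payer in recurring:
            monthly_totals[month] += amount
    return median(monthly_totals.values()) if monthly_totals else 0.0


credits = [
    ("2024-01", "EMPLOYER_X", 150_000.0),
    ("2024-02", "EMPLOYER_X", 150_000.0),
    ("2024-02", "FRIEND_Y", 20_000.0),   # one-off transfer, excluded
    ("2024-03", "EMPLOYER_X", 152_000.0),
]
print(income_proxy(credits))
```

The median makes the proxy robust to a single unusually large or small month, such as a bonus or a delayed salary credit.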

Using various alternative data sources, financial institutions can gain useful insights into macro-economic trends, customer behaviour and associated risks, and enhance their day-to-day business metrics by tracking their overall performance.

Table 1 lists illustrative KPIs for each category, along with the data and analytics-led initiatives associated with them. These KPIs help determine which aspects need focus to drive the requisite use cases.

Table 1. KPIs and their use

Income proxy
Illustrative KPIs:
  • Transaction patterns across different channels and merchants
  • Proxy for salary
Description: This includes tracking customers’ payment patterns, transactional data, demographic profile and utility payments. This data helps determine the creditworthiness of customers.
Usage:
  • Underwriting
  • Customer segmentation
  • Cross-sell and upsell
  • Campaign analytics

Spend patterns
Illustrative KPIs:
  • Investment portfolio amount across different instruments
  • Travel patterns of the customer
  • Spending trend on e-commerce sites
Description: This includes assessing customers’ spend patterns, covering investment data, demographic profile, travel patterns and similar data. This data can be used to serve customers according to their loan types and to help determine their creditworthiness.
Usage:
  • Customer segmentation
  • Cross-sell and upsell
  • Campaign analytics

Risk and compliance
Illustrative KPIs:
  • Screening against sanctions and fraud/negative lists
  • Same KYC document used by different people
  • Number of tradelines across products
  • Number of inquiries/applications/rejections
  • Payment history
  • Age of credit history
  • Percentage of available credit limit in use
Description: These KPIs help banks understand whether a customer has been involved in any fraudulent or AML activities. The customer’s credit history shows their repayment pattern, application frequency and credit usage.
Usage:
  • Fraud and AML models
  • Underwriting

Geography/location/IoT
Illustrative KPIs:
  • Size and pattern of farmland or manufacturing facility
  • Pre-season baseline sales risk/estimates
  • Season progress at village and field level
  • Stress warnings and sales impact
  • Mobile/data usage
  • GPS location
  • Roaming details
  • Financial SMS scraping
  • Top-up and recharge patterns
Description: Telecom data includes raw data from calls, GPS and SMS. Other devices provide point-of-sale (PoS) and sensor data, including IoT devices, satellite imagery, drones, CCTV and other sensors that provide raw data. These are primarily used in the agriculture and manufacturing sectors, where the focus is on providing micro-loans or collateral-based loans to farmers.
Usage:
  • Underwriting of agri facility
  • Customer segmentation
  • Cross-sell and upsell
  • Campaign analytics
  • Fraud and AML models
  • Collection models

Customer behaviour
Illustrative KPIs:
  • Income details
  • Work history
  • Activity on social network
  • Religious or political inclination
  • Locality of customer
  • Electronic appliances and lifestyle products
  • Personality (ethics, morality, beliefs)
Description: Customer behaviour is captured using unstructured or semi-structured data, for example by scanning social networks for behavioural patterns and assessing survey results.
Usage:
  • Underwriting
  • Segmentation
  • Cross-sell and upsell
  • Campaign analytics

Aggregators and thought leadership
Illustrative KPIs:
  • Key themes/trends
  • Key risks and mitigation
  • Global outlook
  • Case studies
Description: This covers market research published by large organisations through research papers, white papers and newsletters. Aggregators provide diverse datasets on a single platform.

Government database
Illustrative KPIs:
  • Expenses based on tax information
  • Salary proxy based on Employees’ Provident Fund Organisation (EPFO) deductions
  • Purchase and sales data
  • Percentage of investments on total salary
  • Phone and electricity bills
Description: This includes Government sites, the Central Registry of Securitisation Asset Reconstruction and Security Interest of India (CERSAI), electricity bills, customers’ tax information and other user-specific data. This data could be used to understand customer demographics and determine creditworthiness.
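Two of the risk and compliance KPIs in Table 1 can be computed directly once the underlying data is available. The first is the percentage of the available credit limit in use. The second is the same KYC document being used by different people. The sketch below is a minimal illustration. The record layout (customer ID, document ID) is hypothetical. In practice, document identifiers would usually be normalised or hashed before matching.

```python
from collections import defaultdict


def credit_utilisation(outstanding: float, limit: float) -> float:
    """Percentage of the available credit limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * outstanding / limit


def shared_kyc_documents(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each KYC document ID used by more than one customer to those customer IDs.

    `records` holds hypothetical (customer_id, document_id) pairs.
    """
    users_by_document: dict[str, set[str]] = defaultdict(set)
    for customer_id, document_id in records:
        users_by_document[document_id].add(customer_id)
    return {doc: ids for doc, ids in users_by_document.items() if len(ids) > 1}


print(credit_utilisation(45_000, 100_000))
print(shared_kyc_documents([("C1", "DOC-A"), ("C2", "DOC-A"), ("C3", "DOC-B")]))
```

High utilisation is typically read as a sign of credit stress. A document shared across customers is a candidate for fraud investigation rather than proof of fraud.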
figure 2
  • Utilities (electricity/gas/internet) and payments: Bharat Bill Pay System, payment aggregators, UPI/cards/wallet, payment gateways
  • E-commerce (travel and shopping): Travel Troops/Easy Trip Planners, Paytm/Amazon, Amazon/Flipkart, Snapdeal
  • Insurance, investments and account aggregators: LIC/MLI/IPru, Acko, CAMS, CRIF, Perfios
  • Fraud/AML/KYC and credit bureau: Jocata, Jumio, Authbridge, FICO, Equifax, Experian, CRIF
  • Telecom/smartphone and satellite/drone/IoT: Airtel, Jio, Vi, Digitap, MTNL, BSNL, Satsure, Cropin, BharatAgri
  • Field visit/survey and social network: questionnaire, household inspection, behavioural traits, FB, WhatsApp, Instagram, Google, news, OTT
  • Market research/thought leadership and overall data aggregators: Nielsen, Gartner, strategy consulting, Perfios, Karza, Digitap, Vayana Network
  • Taxation and CERSAI; corporate, employment and litigation: personal tax, corporate tax, GST, CERSAI, stamp duty, Ministry of Corporate Affairs, Insolvency and Bankruptcy Board of India, litigation, EPFO
  • Economic and sector outlook, census: Ministry of Statistics, RBI, IBEF, NITI Aayog, IMD

Use of alternative data

Globally, many banks and regulators have leveraged alternative data to create in-house risk scorecards, understand customers better, and gauge customer sentiment, market risks and more.

  1. Axis Bank leveraged data from sources such as bureau, liability, drone, telecom and farmland data to create use cases around income estimation models, rural lending (in-house scorecards) and new customer lending. This has significantly reduced the rate of bad loans and the associated risks.2
  2. Union Bank of the Philippines uses alternative data to better understand the customers, improve operational efficiencies and manage risks. Their AI-powered alternative credit-scoring solution helps to increase financial inclusion.3
  3. Bank of Italy uses natural language processing (NLP) solutions and social media intelligence to analyse posts, market mood and public sentiment, and to assess the bank’s underlying financial policy.4
  4. The Autorité des Marchés Financiers (AMF), the financial regulator of Quebec, Canada, has developed a tool that applies NLP topic modelling to customer complaint and social media data to extract and organise frequently discussed contextual topics. The tool helps the AMF better understand customers’ underlying issues.5
  5. RBI is exploring alternative data for regulatory supervision, with indicative use cases such as sectoral sensitivity, risks from markets, complaint analysis, social media analytics, examination of related-party/circuitous transactions and an employee stress monitor.6
  6. Wells Fargo has signed a data exchange agreement with FinTech Envestnet | Yodlee, which will allow seamless data exchange using application programming interfaces (APIs).7

With the increasing use of alternative data sources, conventional decision-making processes may slowly fade away. Let’s now look at how alternative data interventions are spread across the banking value chain.

Alternative data interventions in banking

Banks are using and intend to use alternative data across the banking value chain, right from lead sourcing to account management. Alternative data can primarily be leveraged across five major themes as follows:

  1. Strategy – new product development, benchmarking, identifying new branches/offices
  2. Marketing – customer segmentation, cross-selling/upselling, campaigns
  3. Pre-screening – fraud/AML checks, minimum eligibility, KYC
  4. Underwriting – exhaustive checks to determine credit/fraud risk, rate of interest, tenure
  5. Collections and account monitoring – fraud/AML monitoring, probability of default, days past due (DPD)/NPA

Some use cases which consume alternative data extensively have been highlighted below. These use cases are presented in the context of the themes highlighted above and are expected to assist banks to come up with new strategies to win customers and increase revenue.

figure 3

Incorporating alternative data in banking models

It is vital for banks to determine ways to incorporate alternative data and develop suitable models. Some such models are briefly described below.

Agri loans are advances provided to farmers to help them finance seasonal farming operations or allied ventures, such as raising livestock, cultivating fish, or buying land or farm equipment, in addition to buying fertilisers and seeds for crops. Using alternative data, banks could build an agricultural underwriting model that assesses loans to farmers based on profile data and other inputs (such as drone imagery and field visits), as outlined below.

Approach

  1. Sourcing: Acquire traditional and alternative data sources from credit bureau, payments data and partnership data from different vendors.
  2. Aggregation: Create agri data mart and map customer tags based on certain behavioural traits and transaction patterns.
  3. Analytics: Implement an underwriting analytics engine along with business rules, covering:
    1. customer profiling
    2. statistical analysis
    3. borrower risk evaluation
    4. applying the business rule engine to calculate the updated risk score.
  4. Underwriting decision: Communicate the application status to the customer.

Case study

Rohit Sharma, aged 30, from Jalgaon, applied for a farm loan of INR 5 lakh. His past records were studied in detail. His farmland, local environmental conditions and crop condition were also examined through drones, farm visits and other sources. A risk score was derived from these attributes. The score cleared the qualifying mark, and the loan of INR 5 lakh was approved.

In some cases, a loan may be only partially approved because certain criteria lower the customer’s risk score. For instance, suppose a region is prone to floods and sees periodic flooding every year. This would still affect the borrower’s risk profile, even if all other attributes, such as crop health and past records, are sound. Had that been the case for Rohit, the approved amount could have been INR 2 lakh instead of the requested INR 5 lakh.
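The approach and case study above can be sketched as a weighted statistical score followed by a business rule that can cap the sanctioned amount. This is a minimal sketch with hypothetical inputs. The input fields, weights, qualifying score and flood cap are all illustrative and not calibrated. The 40% cap was chosen only so that the example mirrors the INR 5 lakh to INR 2 lakh illustration.

```python
from dataclasses import dataclass


@dataclass
class AgriApplication:
    # Hypothetical inputs; scores are normalised to 0..1 (1 = strongest).
    requested_inr: float
    repayment_history: float   # e.g. from bureau and payments data
    crop_health: float         # e.g. from drone or satellite imagery
    land_verification: float   # e.g. from a field visit
    flood_prone_region: bool   # e.g. from historical weather/geography data


# Illustrative weights, threshold and cap; not calibrated values.
WEIGHTS = {"repayment_history": 0.4, "crop_health": 0.35, "land_verification": 0.25}
QUALIFYING_SCORE = 60.0
FLOOD_CAP_FRACTION = 0.4


def risk_score(app: AgriApplication) -> float:
    """Weighted statistical score on a 0-100 scale (higher means lower risk)."""
    return 100.0 * sum(weight * getattr(app, attr) for attr, weight in WEIGHTS.items())


def underwrite(app: AgriApplication) -> tuple[float, float]:
    """Return (risk score, approved amount in INR); an amount of 0 means declined."""
    score = risk_score(app)
    if score < QUALIFYING_SCORE:
        return score, 0.0
    # Business rule: a regional flood hazard caps the sanctioned amount (partial approval).
    if app.flood_prone_region:
        return score, app.requested_inr * FLOOD_CAP_FRACTION
    return score, app.requested_inr


print(underwrite(AgriApplication(500_000, 0.9, 0.8, 1.0, flood_prone_region=False)))
print(underwrite(AgriApplication(500_000, 0.9, 0.8, 1.0, flood_prone_region=True)))
```

Keeping the hazard as a separate rule, rather than folding it into the weights, makes the partial approval explainable. The borrower qualifies on their own attributes, and the cap reflects a regional risk outside their control.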

figure 4

To ensure that time and effort are spent nurturing the most promising leads, organisations use lead scoring models. These models help banks classify or rank leads on varied attributes and data points that signal a lead’s inclination to buy the product. Complementing internal data with alternative data can improve score classification. This model categorises leads based on scores derived from demographic and behavioural characteristics and other factors, such as spend pattern and income proxy. The approach for a lead scoring model using alternative data is outlined below:

Approach

  1. Sourcing: Acquire traditional and alternative data sources from digital footprints data, spend pattern, demographics and partnership data from different vendors.
  2. Aggregation: Pre-process clickstream data, create a lead scoring mart and identify potential leads based on behavioural traits and transaction patterns.
  3. Model implementation:
    1. Real-time online models: Implement a machine learning model for lead conversion propensity. A lead allocation engine then assigns each lead to an agent with the appropriate profile.
    2. Offline models: Develop customer segmentation model to extract customer persona.
  4. Analytics: Provide real-time campaigns and dashboards to agents with detailed customer information.

Case study

Rahul Sharma, aged 31, a resident of Mumbai, spends INR 40,000 monthly and has an annual income of INR 22 lakh. He has been looking at properties on third-party sites. His online searches show interest in home loan eligibility and low-interest banks in Mumbai. Based on the model’s lead score, he would be categorised in the ‘Hot’ group.
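The conversion-propensity step of the approach above can be illustrated with a logistic score and score bands, applied to a profile like Rahul’s. This is a sketch, not the paper’s model. The feature names, weights, bias and Hot/Warm/Cold cut-offs are hypothetical; a real model would learn its weights from historical conversion data.

```python
import math

# Hypothetical features and weights; a real model would be trained on conversion history.
WEIGHTS = {
    "searched_property_sites": 1.2,
    "searched_loan_eligibility": 1.5,
    "compared_lender_rates": 0.8,
    "income_lakh_per_year": 0.05,
}
BIAS = -3.0


def conversion_propensity(lead: dict[str, float]) -> float:
    """Logistic score in (0, 1) from a linear combination of lead features."""
    z = BIAS + sum(weight * lead.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def lead_bucket(propensity: float) -> str:
    """Map propensity to a lead category using illustrative cut-offs."""
    if propensity >= 0.7:
        return "Hot"
    if propensity >= 0.4:
        return "Warm"
    return "Cold"


rahul = {
    "searched_property_sites": 1,
    "searched_loan_eligibility": 1,
    "compared_lender_rates": 1,   # looked at low-interest banks
    "income_lakh_per_year": 22,
}
print(lead_bucket(conversion_propensity(rahul)))
```

Separating the propensity score from the bucketing lets the business adjust the Hot/Warm/Cold cut-offs to agent capacity without retraining the model.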

figure 5

The NBO model is becoming widely popular with banks as a way to provide a more personalised experience to their customers. It is an advanced analytics-driven engagement model that consumes customer-related, telecom and third-party data to predict the next best product to offer the customer. This model recommends NBOs based on customer lifestyle segmentation scores derived from SMS information, demographic data and other attributes. The approach for the NBO model using alternative data is highlighted below:

Approach

  1. Sourcing: Acquire traditional and alternative data sources from internal system, telecom data, customer demographics and partnership data from different vendors.
  2. Aggregation: Create cross-sell data mart with all the relevant data points after pre-processing the data.
  3. Analytics: Implement an NLP-based segmentation engine, using techniques such as tokenisation, stemming and topic identification, to obtain customer lifestyle scores. These scores would be augmented with campaign scores derived from a recommendation model built on the centralised data.
  4. Cross-sell: An NBO will be rolled out to the customer based on the final scores.
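The analytics and cross-sell steps above can be sketched end to end. First, tokenise and stem SMS text and map keywords to lifestyle topics. Then blend the resulting lifestyle scores with campaign scores and rank the products. This is a deliberately simplified sketch. The crude suffix stripper stands in for a real stemmer such as Porter's. The small keyword lexicon stands in for a proper topic model. The lexicon, the product mapping, the campaign scores and the 50/50 blend are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical lexicon: stemmed keyword -> lifestyle topic (stand-in for a topic model).
TOPIC_LEXICON = {
    "rent": "housing", "flat": "housing", "property": "housing", "brok": "housing",
    "car": "vehicle", "fuel": "vehicle", "showroom": "vehicle",
}
# Hypothetical mapping from lifestyle topic to product.
TOPIC_TO_PRODUCT = {"housing": "housing loan", "vehicle": "vehicle loan"}


def stem(token: str) -> str:
    """Very crude suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def lifestyle_scores(messages: list[str]) -> dict[str, float]:
    """Tokenise and stem messages, count topic hits, and return each topic's share of hits."""
    hits = Counter()
    for message in messages:
        for token in re.findall(r"[a-z]+", message.lower()):
            topic = TOPIC_LEXICON.get(stem(token))
            if topic:
                hits[topic] += 1
    total = sum(hits.values())
    return {topic: count / total for topic, count in hits.items()} if total else {}


def rank_offers(messages: list[str], campaign_scores: dict[str, float],
                weight: float = 0.5) -> list[str]:
    """Blend lifestyle and campaign scores per product and rank products by the result."""
    lifestyle = lifestyle_scores(messages)
    final = {}
    for product, campaign in campaign_scores.items():
        topic_score = sum(s for t, s in lifestyle.items() if TOPIC_TO_PRODUCT.get(t) == product)
        final[product] = weight * topic_score + (1 - weight) * campaign
    return sorted(final, key=final.get, reverse=True)


sms = [
    "Your rent of Rs 20000 has been paid",
    "New flats nearby: book a site visit with our broker",
    "Fuel surcharge waiver on your card",
]
campaigns = {"housing loan": 0.6, "personal loan": 0.5, "vehicle loan": 0.2}
print(rank_offers(sms, campaigns))
```

Products with no matching lifestyle topic, such as the personal loan here, can still be offered on the strength of their campaign score. That mirrors how a secondary offer can follow the top recommendation.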

Case study

Ajay Verma, 29, is based in Bangalore and has an annual income of INR 24 lakh. His average monthly spend is INR 50,000, and he pays a monthly rent of INR 20,000. Third-party and search-related data indicate that he is looking for permanent accommodation. Customer lifestyle segmentation scores were then calculated, along with the campaign scores. Housing loan had the highest score, followed by other products (personal, vehicle, etc.). On this basis, a housing loan, followed by a personal loan, would be offered to the customer.

figure 6

Charting the roadmap for your alternative data journey

When organisations start looking at alternative data use cases, it is important to address the multiple challenges involved and to chart out an ideal path. A successful alternative data journey starts with the ideation stage, in which one understands the benefits of alternative data and how it supplements existing data sources. Banks’ strategy and IT teams should carefully communicate their vision to create buy-in from business teams. They should then translate the vision into a target future state and bring together perspectives to define a target-state operating model. The next step is to define a roadmap with crucial milestones and align the vision and roadmap to a business case.

In parallel, banks should work on a proof of concept (POC) to find opportunities to learn, experiment and create short-term value. They should start by deciding which data points to select and evaluating the vendors that can supply them.

To take full advantage of the POC, banks must build a minimum level of capability maturity. This means setting up a basic technology stack and having the right workforce to handle exploratory data analysis and unstructured data, build analytics models and make sense of the data.

After a few iterations and campaigns, banks should review the return on investment (RoI) to capture potential business gains and identify future partnerships. Once the foundation is laid successfully, banks can scale up by stabilising their people, process and technology; building a robust and scalable architecture; implementing agile methodology; and collaborating with multiple vendors for various data sources.

figure 7

The functional architecture shown in Figure 8 illustrates how banks can integrate their internal/existing data warehouse with new alternative datasets. In the preliminary phase, banks should identify and measure which alternative data sources would yield the best RoI, and prioritise those sources.

Based on their current-state architecture, banks need to augment existing data platforms or create new ones to obtain a consolidated view that is scalable, agile and resilient to new data sources. At the design stage, banks should design ingestion frameworks, loading strategies, and reconciliation (recon) and audit frameworks that enable alternative datasets to be onboarded. Furthermore, the data platform should be set up and operated according to best practices, along with business and analytics data marts.
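A recon framework typically checks each ingested batch against control totals supplied with it. The sketch below shows one minimal form of such a check. It assumes that each vendor batch arrives with a hypothetical control record stating the row count and the sum of an amount field. It also produces a timestamped audit record. The field names and the rounding tolerance are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReconResult:
    """Audit record comparing a vendor's control totals with what was actually loaded."""
    source: str
    expected_rows: int
    loaded_rows: int
    expected_total: float
    loaded_total: float
    checked_at: str
    tolerance: float = 0.01  # hypothetical rounding tolerance on amount totals

    @property
    def passed(self) -> bool:
        return (self.expected_rows == self.loaded_rows
                and abs(self.expected_total - self.loaded_total) <= self.tolerance)


def reconcile(source: str, control_rows: int, control_total: float,
              loaded: list[dict], amount_field: str = "amount") -> ReconResult:
    """Build a recon/audit record for one ingestion batch."""
    return ReconResult(
        source=source,
        expected_rows=control_rows,
        loaded_rows=len(loaded),
        expected_total=control_total,
        loaded_total=sum(float(row[amount_field]) for row in loaded),
        checked_at=datetime.now(timezone.utc).isoformat(),
    )


batch = [{"amount": 100.0}, {"amount": 250.5}]
result = reconcile("vendor_a", control_rows=2, control_total=350.5, loaded=batch)
print(result.passed, result.checked_at)
```

Persisting every `ReconResult`, whether it passed or failed, gives the audit trail. Failed batches can then be quarantined before they reach the business and analytics data marts.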

figure 8

*Customer consent is taken for each source to use data for analytics and selling. 
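Consent per source and per purpose can be enforced as a gate in front of the analytics layer. The sketch below is a minimal illustration of that idea. The registry structure and the purpose names are hypothetical; "analytics" and "selling" are taken from the note above. In practice, consent would come from the bank's consent management system or from account aggregator consent artefacts.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Hypothetical store: customer_id -> data source -> set of permitted purposes."""
    grants: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def grant(self, customer_id: str, source: str, purpose: str) -> None:
        self.grants.setdefault(customer_id, {}).setdefault(source, set()).add(purpose)

    def revoke(self, customer_id: str, source: str, purpose: str) -> None:
        self.grants.get(customer_id, {}).get(source, set()).discard(purpose)

    def allowed(self, customer_id: str, source: str, purpose: str) -> bool:
        return purpose in self.grants.get(customer_id, {}).get(source, set())


def filter_by_consent(records: list[dict], registry: ConsentRegistry, purpose: str) -> list[dict]:
    """Keep only records whose customer consented to this source being used for `purpose`."""
    return [r for r in records if registry.allowed(r["customer_id"], r["source"], purpose)]


registry = ConsentRegistry()
registry.grant("C1", "telecom", "analytics")
records = [
    {"customer_id": "C1", "source": "telecom"},
    {"customer_id": "C2", "source": "telecom"},   # no consent recorded
    {"customer_id": "C1", "source": "social"},    # consent covers telecom only
]
print(filter_by_consent(records, registry, "analytics"))
```

Checking consent per source rather than per customer means a customer can allow telecom data for analytics while withholding social network data.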

In their nascent stage, banks can reap maximum benefits if they focus on high-priority analytics use cases across the five major themes along with the consumption of analytics model output. Relevant operating procedures should be defined, and downstream integration should be done via digital marketing, customer relationship management (CRM) tools, outbound call centres, relationship managers, underwriters and investigators.

Conclusion

With advancements in technology and an evolving BFSI sector, alternative data helps banks gain a strategic advantage. In this paper, we explored a variety of alternative data sources, from freely available data such as RBI policy to specialised farmland data.

One of the key challenges for banks is knowing how to start using alternative data and which parameters to consider to ensure a holistic approach to new ways of banking. Our six-step iterative framework is designed to help banks chart their alternative data journey.

However, there is no ‘one-size-fits-all’ formula: what works for one bank may not work for another. In essence, leveraging alternative data relies on three key pillars: people, process and technology.

People: Using alternative data in day-to-day decision making will require a significant shift from the traditional mindset. Banks will need to ensure that people are on board with the change and willing to adopt augmented, data-driven decision making. Banks should partner with vendors who provide RoI and are the right fit for future growth, and build a team that can make sense of alternative data and derive the expected value.

Process: Integrating alternative data will require changes in existing processes to consume the data and curate insights. This will necessitate having the right processes in place in order to evaluate and select relevant data sources or vendors.

Technology: To ensure that alternative data adds value to businesses, it is important to focus on data accuracy and correctness, and on selecting and building suitable analytical models. Moreover, banks need reliable, scalable and agile technologies that integrate seamlessly with existing systems, with minimal trade-offs and high RoI.

Authors

Deepak Kanojia, Krunal Sampat, Vaibhav Jain, Riddhi Ruparelia

About PwC

At PwC, our purpose is to build trust in society and solve important problems. We’re a network of firms in 151 countries with over 360,000 people who are committed to delivering quality in assurance, advisory and tax services. Find out more and tell us what matters to you by visiting us at www.pwc.com.

PwC refers to the PwC network and/or one or more of its member firms, each of which is a separate legal entity. Please see www.pwc.com/structure for further details.

© 2024 PwC. All rights reserved. 

Contact us

Mukesh Deshpande

Partner – Financial Services Data and Analytics, PwC India


Samir Shah

Managing Director, PwC India

