With the rapid increase in data volumes and complexity, it is becoming more challenging to represent data simply using structured relational databases. In this edition’s topic of the month, we discuss the benefits of adopting NoSQL databases such as the graph database. According to Gartner, ‘by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organisation.’
A key feature of graph databases is the ability to store, query and view relationships between interconnected datasets in an easy and scalable manner. Firms are evolving their customer engagement strategies by using graph databases to gain deeper business insights that help drive new sales, reduce costs and build closer relationships with customers.
The newsletter also includes industry news and updates on financial service organisations leveraging data to improve operational efficiencies and customer experiences. It also highlights key measures introduced by regulatory bodies for increasing financial inclusion. Happy reading!
To use data for business analytics, the data must be stored in a structured format that is easy to query. The data model therefore plays an important role in architecting any data store. Graph databases are no different, except that the initial diagram drawn on the whiteboard to represent the entities and their relationships is exactly what the graph data model looks like in reality. Hence, graph databases are said to be whiteboard-friendly. Let’s explore this further with an example of a banking data model with typical associated entities like bank, branch, account and customer.
A graph database has four elements: node, relationship, label and property.
Node: A node is synonymous with an entity and forms the fundamental unit of the graph data model.
Property: Nodes contain properties that store relevant data points. These properties serve as the metadata to better understand the data entity.
Label: Once the nodes are identified, labels can be added (if needed) to group the nodes. Labels are an optional component of the graph.
Relationship: The relationships between nodes form a critical component of a graph database. Associations defined between two nodes help to identify the data nodes that are related. The fundamental rule of a graph database is ‘no broken links’. A relationship always has a start and an end node – deleting a node also deletes the relationships that are linked to it.
In the above example, the nodes defined are bank, branches, customers, accounts and products. Insurance and loans are services provided by the bank and can be grouped under products. Individual relationships are defined, such as banks providing loans and insurance. The nodes – bank and branch – have defined properties that store additional information about the entities.
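The four elements described above can be sketched in plain Python. This is a minimal in-memory illustration, not how any graph database product is implemented. The class names (`Node`, `Relationship`, `PropertyGraph`), the relationship types and the property values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)        # optional grouping, e.g. {"Product"}
    properties: dict = field(default_factory=dict)  # data points describing the entity


@dataclass
class Relationship:
    start: str     # node_id of the start node
    rel_type: str  # hypothetical type name, e.g. "HAS_BRANCH"
    end: str       # node_id of the end node


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def relate(self, start, rel_type, end):
        # 'No broken links': both the start and the end node must exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both start and end nodes must exist")
        self.relationships.append(Relationship(start, rel_type, end))

    def delete_node(self, node_id):
        # Deleting a node also deletes the relationships linked to it.
        del self.nodes[node_id]
        self.relationships = [
            r for r in self.relationships if node_id not in (r.start, r.end)
        ]


graph = PropertyGraph()
graph.add_node("bank", labels=["Bank"], name="Example Bank")
graph.add_node("branch", labels=["Branch"], city="Mumbai")
graph.add_node("loan", labels=["Product"])
graph.add_node("insurance", labels=["Product"])
graph.relate("bank", "HAS_BRANCH", "branch")
graph.relate("bank", "PROVIDES", "loan")
graph.relate("bank", "PROVIDES", "insurance")

graph.delete_node("loan")  # the PROVIDES relationship to "loan" is removed with it
```

The `Product` label groups loans and insurance without changing either node, and `delete_node` keeps the graph free of dangling relationships.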
Based on the example depicted above, the following are the differences between graph and relational data modelling:
Components and use cases | Graph data model | Relational data model |
---|---|---|
When to use? | | |
Data entities | | |
Attributes | | |
Relationships | | |
Modelling | | |
Querying | | |
In a graph database, the relationships between data entities are stored as data points. Thus, relationships are persistent and not computed using complex joins in every data cycle as in relational databases. This allows for significant improvements in query performance.
In relational data modelling, as the volume and complexity of data in systems increase, query performance can degrade, with joins and complex queries taking longer to compute their outputs.
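The contrast between stored and computed relationships can be shown with a small sketch. The tables, names and functions below are hypothetical, and the nested-loop join is a deliberate simplification: real relational engines use indexes and query planners, so this illustrates the idea rather than measuring either technology.

```python
# Relational style: the relationship is recomputed by matching keys at query time.
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
accounts = [
    {"account_id": "A1", "customer_id": 1},
    {"account_id": "A2", "customer_id": 1},
    {"account_id": "A3", "customer_id": 2},
]


def accounts_by_join(name):
    # Naive nested-loop join: scans both tables on every query.
    return [
        a["account_id"]
        for c in customers
        for a in accounts
        if c["name"] == name and c["customer_id"] == a["customer_id"]
    ]


# Graph style: the relationship is stored once, alongside the node it starts from.
owns = {"Asha": ["A1", "A2"], "Ravi": ["A3"]}


def accounts_by_traversal(name):
    # Follows the stored links directly; no key matching at query time.
    return owns.get(name, [])
```

Both functions return the same accounts for a given customer. The difference is where the work happens: the join repeats the matching on every call, while the traversal reads links that were persisted when the data was written.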
Graph databases do not require a rigid, predefined schema. This facilitates a more flexible approach which easily accommodates additions to the data model as new requirements come in – without compromising the existing model.
Business requirements are constantly evolving, and relational data models can sometimes be too inflexible to handle these ad hoc changes. Augmenting a relational data model involves adding new attributes and entities to the existing tabular structure and force-fitting new relationships into the pre-defined relationships and structures.
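As a rough illustration of this flexibility, the sketch below adds a new kind of entity and relationship to a small graph without touching what is already there. The node identifiers, the `Nominee` label and the `HAS_NOMINEE` relationship type are assumptions invented for the example.

```python
# Existing model: one customer who owns one account.
nodes = {
    "c1": {"label": "Customer", "name": "Asha"},
    "a1": {"label": "Account", "type": "savings"},
}
edges = [("c1", "OWNS", "a1")]

existing_edges = list(edges)  # snapshot, used only to show that nothing changes

# New requirement: track nominees on accounts.
# A new label and a new relationship type are simply added to the graph.
nodes["n1"] = {"label": "Nominee", "name": "Ravi"}
edges.append(("a1", "HAS_NOMINEE", "n1"))

# The original nodes and relationships are untouched.
unchanged = edges[: len(existing_edges)] == existing_edges
```

In a tabular model, the same requirement would typically mean a new table and a new foreign key, which is the kind of structural change the paragraph above describes.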
Graph databases are not ideal for processing or analysing large amounts of operational data. Although these databases enable speedy retrieval of data with an emphasis on every relationship being correctly defined in the model, transactional systems having simpler relationships and data that can easily be stored in rationalised tables are better queried using a relational model.
There is no standard query language that can be used for graph databases. The language is platform-dependent, which means that one needs to learn a new query language in order to use the database.
Graph databases are not as widely used as relational databases. Owing to their considerably smaller user base, it can be difficult to find support when issues arise.
There are many graph database options available. Some of these include: Neo4j, ArangoDB, Amazon Neptune, DataStax, Dgraph, FlockDB, OrientDB, Titan, IBM Graph and TigerGraph. The features and use cases for a few of the most popular graph databases in use today are outlined below.
Feature | Neo4j | TigerGraph | Amazon Neptune |
---|---|---|---|
Description | | | |
Initial release | 2007 | 2017 | 2017 |
Licence | Open source as well as commercial | Commercial | Commercial |
Supported programming languages and operating system (OS) | | | |
Advantages | | | |
In a banking data distribution model, it is difficult to analyse large volumes of data across different channels to gain insights. To easily answer questions such as ‘what data is consumed by end users’ or ‘how different datasets are related’, a leading global bank used a graph database to store such metadata. This knowledge base describes the flow of data and attributes from different systems to the end users. Attributes such as client description, legal names and number of end users who viewed certain data outputs form part of the dataset. The graph would include nodes such as consumer, channel, datasets and services. Moreover, it would allow parsing through the integrated datasets and masters, which would significantly improve access to information and increase user engagement.
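A question such as ‘what data is consumed by end users’ amounts to a traversal over this kind of metadata graph. The following sketch assumes a hypothetical set of lineage links (`feeds`) and uses a breadth-first search to list everything downstream of a given dataset. None of the system or consumer names come from the bank in question.

```python
from collections import deque

# Hypothetical lineage links: each key feeds the items in its list.
feeds = {
    "core_banking": ["client_master"],
    "client_master": ["reporting_service", "mobile_channel"],
    "reporting_service": ["risk_analyst"],
    "mobile_channel": ["retail_customer"],
}


def downstream(start):
    """Return everything reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in feeds.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


consumers_of_client_master = downstream("client_master")
```

Here `consumers_of_client_master` lists the services and channels first, followed by the end users they serve, which is the flow of data the knowledge base is meant to describe.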
While fraud detection is important for risk mitigation, banks need to ensure that they do not unnecessarily inconvenience their customers. Thus, financial institutions need to ensure that the number of false positive notifications is reduced. Data links between known fraudulent credit card applications and new applications can be found using graph analytics. Financial institutions can use this to quickly shut down fake cards, reveal fraud rings and spot difficult-to-notice tendencies. In such a graph, the nodes captured would be new credit card applications, known fraudulent applications and their relationships.
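One simple way to find such links is to treat applications that share a contact detail as connected and then look at which groups contain a known fraudulent application. The sketch below does this with a union-find structure. The applications, the attributes compared and the linking rule are all assumptions for illustration, and a production system would weigh many more signals.

```python
from collections import defaultdict

# Hypothetical applications; app_1 is a known fraudulent application.
applications = {
    "app_1": {"phone": "555-0101", "address": "12 Hill Rd"},
    "app_2": {"phone": "555-0101", "address": "9 Lake St"},
    "app_3": {"phone": "555-0199", "address": "9 Lake St"},
    "app_4": {"phone": "555-0142", "address": "3 Park Ave"},
}
known_fraud = {"app_1"}

parent = {app: app for app in applications}


def find(app):
    # Walk up to the root of the group, compressing the path on the way.
    while parent[app] != app:
        parent[app] = parent[parent[app]]
        app = parent[app]
    return app


def union(a, b):
    parent[find(a)] = find(b)


# Link any two applications that share a phone number or an address.
first_seen = {}
for app, attrs in applications.items():
    for key, value in attrs.items():
        if (key, value) in first_seen:
            union(app, first_seen[(key, value)])
        else:
            first_seen[(key, value)] = app

clusters = defaultdict(set)
for app in applications:
    clusters[find(app)].add(app)

# New applications that sit in the same group as a known fraudulent one.
suspicious = set()
for members in clusters.values():
    if members & known_fraud:
        suspicious |= members - known_fraud
```

Note that `app_3` shares nothing directly with `app_1`; it is flagged only through `app_2`. Indirect links of this kind are what make fraud rings hard to spot in row-by-row checks.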
A common practice for ensuring compliance with AML legislation and governance requirements is known as ‘know your customer’ (KYC). Here, graph databases and analytics are used to gather and analyse large datasets and to discover relationships between people, organisations and transactions. Financial services companies use this technique to expose illicit conduct and abide by evolving government regulations. Such a graph would include the following nodes – customers, their transactional history and audit information to track any irregularities.
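Discovering relationships of this kind often comes down to asking whether, and how, a customer is connected to a flagged party. The sketch below runs a hop-limited breadth-first search over a handful of hypothetical links. The entity names, relationship types and `max_hops` default are assumptions, and the links are treated as undirected purely for discovery.

```python
from collections import deque

# Hypothetical links between customers, organisations and people.
links = [
    ("customer_a", "TRANSFERRED_TO", "shell_co"),
    ("shell_co", "DIRECTED_BY", "person_x"),
    ("person_x", "TRANSFERRED_TO", "sanctioned_org"),
    ("customer_b", "TRANSFERRED_TO", "grocer"),
]

neighbours = {}
for start, _rel_type, end in links:
    neighbours.setdefault(start, set()).add(end)
    neighbours.setdefault(end, set()).add(start)  # undirected for discovery


def shortest_chain(source, target, max_hops=4):
    """Return the shortest chain of entities linking source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == max_hops:
            continue  # do not extend chains beyond the hop limit
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


chain = shortest_chain("customer_a", "sanctioned_org")
```

The returned chain names every intermediary between the customer and the flagged party, which gives a reviewer something concrete to audit. A customer with no such connection yields `None`.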
The increase in the number of credit applications that are not tracked adequately by credit bureaus poses a threat to risk assessment and monitoring. For financial institutions, determining a customer’s eligibility for products such as loans, mortgages and other lines of credit presents both risks and possibilities. Financial institutions must therefore use all available data to make an informed, timely assessment of a customer’s creditworthiness or else they run the risk of losing market share. A graph database is a useful option that enables financial institutions to perform deep analytics in cases where data from multiple dimensions and sources must be connected in order to obtain the complete picture. For this example, the graph would include nodes such as customers, their credit card application history, and individual assets and liability details to help in determining the credit scores.
Graph databases are highly flexible and preferred when working with interconnected data. However, they are use-case specific: unlike relational data models, graph data models do not accommodate enterprise-wide data storage, and thus can’t be used for complete management information systems (MIS) and key performance indicator (KPI) reporting. That said, a graph database is one of the most suitable options when deep analytics is required on multidimensional, related or interconnected datasets. While we are constantly using machine learning (ML) techniques to understand the relationships between data at different levels, graph databases can help ML tools to probe further into the relationships between data and databases.
Bank of Baroda joined major banks in leveraging the account aggregator (AA) framework for digital loans. Existing and new customers can apply for loans in a paperless manner across digital modes like the mobile banking app and the bank website. The AA platform will improve customer convenience, while the bank can mitigate risks through fraud monitoring.
Max Life Insurance has launched MediCheck, a real-time analytics platform that can detect inaccurate and faulty medical reports at the issuance stage, which would reduce claims repudiation. MediCheck also gives a real-time health score to its customers, which can help underwriters in taking decisions.
Cygnet has received in-principle approval from the RBI to operate as an AA. The AA entity facilitates the collection of information pertaining to a customer from financial information providers. Customer consent is required for information sharing, and AAs are regulated by the RBI.
Aegon became an early adopter of paperless data-driven underwriting, moving away from the traditional ways. This has advantages in terms of improving straight-through processing for customers and operational efficiencies for the organisation.
Deutsche Bank is collaborating with Synthesized to leverage data and accelerate the adoption of artificial intelligence and ML-driven client insights, while ensuring data privacy and security. The key advantages are: enhanced performance and durability of ML models through more unbiased training data for model testing; shorter data collection cycles, which improve the time-to-value of ML models; ideas for legal data monetisation using irreversible datasets; and quicker development cycles that shorten the time to market.
The CTO of Godrej Capital discussed how cloud technology helped them with scalability, cost effectiveness, high availability and maximum uptime. She also discussed how big data frameworks and data lakes have enabled swifter processing of structured and unstructured data, and how using NeoLAP (an offering for subject matter experts) has enabled Godrej Capital to assess multiple forms of income and thus provide loans with larger ticket sizes.
The IRDAI has given its approval to Bima Sugam, which is a digital marketplace hosting all life and non-life insurance policies. All the regulated insurers will be part of Bima Sugam by January 2023. All sales, services and claims functions will be carried out through this digital channel. The IRDAI has mandated the dematerialisation of all new life and non-life insurance policies by the end of this year, i.e. by December. An exposure draft has also been proposed by the IRDAI to allow private equity firms to invest in any insurers as promoters, provided that the firms have completed ten years of operations and the funds raised are USD 500 million or more.
Based on Inc42’s latest report ‘state of Indian FinTech ecosystem Q3 2022 – InFocus: Neobanks’, digital lending is set to grow at a compound annual growth rate (CAGR) of 22% with a USD 1.3 trillion market opportunity by 2030. In Q2 2022, the digital lending segment witnessed the highest fund inflow with 50% share, followed by investment tech at 14%. Changing regulatory norms are a major hurdle for the digital lending ecosystem.
The non-banking financial companies’ contribution to the agricultural credit portfolio is 4%. One of the primary reasons for such low engagement is the lack of credit information. The CCFR will synergise the credit information capability from TransUnion CIBIL and crop parameters based on geospatial data from SatSure, providing a more accurate credit assessment as the output.
SEBI has approved the full acquisition of L&T Investment Management Limited by HSBC Asset Management. L&T Investment Management Limited is a wholly owned subsidiary of L&T Finance Holdings Limited, and the investment manager of the L&T Mutual Fund. After the transaction is complete, L&T Mutual Fund’s schemes will either be moved to HSBC Mutual Fund, or be combined with specific HSBC mutual fund schemes.
Acknowledgements: This newsletter has been researched and authored by Aniket Borse, Anuj Jain, Arpita Shrivastava, Dhananjay Goel, Gaurav Ramakant Shirude, Harshit Singh, Krunal Sampat, Nisha Nair and Samir Shah.