Displaying current trends versus historical trends in reports and dashboards is a common need across enterprise functions, and it forces data architects to weigh several design patterns. A key question is whether to compute historical trends in the database itself, or to store base-level data in an in-memory reporting component (typically a cube) and compute trends in real time. Each design has its own advantages and disadvantages with respect to flexibility and cost. What if there were a specially engineered database, optimised to store and retrieve time-series data, measure change over time and aggregate over long periods of data? In this newsletter, we provide a concept note on such a database: the time series database (TSDB).
This edition of the newsletter also includes news snippets on new initiatives across sectors, the most interesting being the announcements made by the Insurance Regulatory and Development Authority of India (IRDAI) over the past couple of months. Execution will be key to realising the full potential of these initiatives, but the zeal shown by the regulator and the industry is commendable.
A TSDB is a database optimised for storing and retrieving time-stamped, or time-series, data. Temporal ordering, a key characteristic of time-series data, organises events in the order in which they occur and arrive for processing. Time-series data can be used to look backward and measure change, or to look forward and predict future change. TSDBs can be deployed in cloud, on-premises or hybrid setups. Two architecture patterns are typical: a standalone, specialised TSDB that receives data from one or more data collector endpoints, or an extension to an existing relational database management system (RDBMS) or NoSQL database, designed specifically for time-series analysis and computation. Many TSDB options are currently available: RDBMS options such as TimescaleDB and QuestDB, NoSQL options such as Amazon Timestream and Prometheus, and standalone TSDBs such as InfluxDB. An abstract model of time-series data attempts to answer the following questions:
Name (who): Describes the subject that produces the data, which can be a person, a monitoring metric, or an object.
Tags (who): Additional information to describe/classify the ‘name’.
Timestamp (when): Time is the primary, fixed axis of time-series data and is the key attribute that distinguishes it from other data.
Location (where): Used for locating the monitored object, which is described by one or more tags.
Values (what): The value or status corresponding to the data. Multiple values or statuses can be provided, which are not necessarily numeric.
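To make the five-part model concrete, the following is a minimal Python sketch of a single data point. The class `Point`, its field names and the sample metric are hypothetical illustrations, not any product's schema; real TSDBs use their own models (InfluxDB's line protocol, for example, splits a point into a measurement, tags, fields and a timestamp).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Union

# Values need not be numeric: a reading may be a number, a flag or a status.
Value = Union[float, int, bool, str]


@dataclass
class Point:
    """One observation in the five-part model: name, tags, timestamp, location, values."""
    name: str                                           # who: subject or metric producing data
    timestamp: datetime                                 # when: the fixed time axis
    values: dict[str, Value]                            # what: one or more values or statuses
    tags: dict[str, str] = field(default_factory=dict)  # who: extra classification of the name
    location: Optional[tuple[float, float]] = None      # where: (latitude, longitude)

    def series_key(self) -> tuple:
        """Points with the same name and tag set belong to the same series."""
        return (self.name, tuple(sorted(self.tags.items())))


# Usage with a made-up CPU metric from a hypothetical host.
p = Point(
    name="cpu_usage",
    timestamp=datetime(2020, 1, 1, 8, 0, tzinfo=timezone.utc),
    values={"percent": 72.5, "status": "ok"},
    tags={"host": "web-01", "region": "ap-south"},
    location=(19.076, 72.8777),
)
key = p.series_key()  # ('cpu_usage', (('host', 'web-01'), ('region', 'ap-south')))
```

Grouping by name plus sorted tags is one common way to decide which points form a single series; the sort makes the key independent of the order in which tags were supplied.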
TSDBs treat the five attributes above as ‘indexes’ on which data is loaded, organised, compressed and retrieved for operations such as aggregation or downsampling. Earlier, TSDBs supported only four such indexes: metric (name), tags, timestamp and values. With the advent of geospatial-temporal analysis, a fifth parameter, ‘location’, was added, extending the usefulness of TSDBs to fields such as analysing customer growth in specific geographical areas over a period.
Below is a snapshot of data as stored in a TSDB.
Beyond SQL query support, the most significant characteristic of TSDBs is that they provide functions and features specific to time-series data that exploit the benefits of temporal indexing. These features fall broadly into four categories:
Aggregates: A query can target a single time series or multiple time series. For range queries or queries over multiple series, the results are downsampled, grouped and aggregated to give the user a holistic view of the data and the insights within it, e.g. aggregate window, aggregate rate.
Selectors: These functions select specific datasets and records from one or more time series, as required. Results can be returned based on the highest/lowest average, count, rate, etc., within a selected window, e.g. quantile, highest average/lowest average.
Transformations: These functions extract and transform time-dependent values in ways that other types of database cannot offer without significant coding effort. Examples include running derivatives, moving averages over a count or a timeframe, and exponential or time-weighted moving averages, e.g. double EMA (exponential moving average), timed moving average.
Geotemporal: This is one of the most distinctive features of TSDBs. It is typically achieved by mapping each incoming location to an S2 cell (S2 is a mathematical scheme that projects the Earth's spherical 3D surface onto flat 2D cells that computers can index), e.g. geo grid filter, geo shape data.
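As a rough illustration of the first three categories, the Python sketch below implements one simple function from each: a windowed aggregate (downsampling), a "highest average" selector and a double EMA transformation. The function names and sample data are hypothetical, loosely modelled on the function names listed above; real TSDBs differ in details such as how windows are aligned and how an EMA is seeded (here it is seeded with the first value, for simplicity).

```python
from datetime import datetime, timedelta
from statistics import mean


# --- Aggregates: downsample raw points into fixed windows ---
def aggregate_window(points, every, fn=mean, origin=datetime(1970, 1, 1)):
    """Group (timestamp, value) pairs into windows of width `every` and apply fn.

    A point falls in the window starting at origin + k * every, where k is the
    floor of (timestamp - origin) / every. Returns [(window_start, result)].
    """
    buckets = {}
    for ts, value in points:
        start = origin + ((ts - origin) // every) * every
        buckets.setdefault(start, []).append(value)
    return sorted((start, fn(vals)) for start, vals in buckets.items())


# --- Selectors: pick the series with the highest average over a window ---
def highest_average(series, n=1):
    """Return the keys of the n series whose values have the highest mean."""
    return sorted(series, key=lambda key: mean(series[key]), reverse=True)[:n]


# --- Transformations: exponential and double exponential moving averages ---
def ema(values, n):
    """EMA with smoothing factor alpha = 2 / (n + 1), seeded with the first value."""
    alpha = 2 / (n + 1)
    out = []
    for x in values:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def double_ema(values, n):
    """DEMA = 2 * EMA(x) - EMA(EMA(x)); it lags the input less than a plain EMA."""
    first = ema(values, n)
    second = ema(first, n)
    return [2 * a - b for a, b in zip(first, second)]


# Usage with small, made-up inputs.
readings = [
    (datetime(2020, 1, 1, 0, 5), 10.0),
    (datetime(2020, 1, 1, 0, 40), 20.0),
    (datetime(2020, 1, 1, 1, 10), 30.0),
]
hourly = aggregate_window(readings, timedelta(hours=1))
# [(datetime(2020, 1, 1, 0, 0), 15.0), (datetime(2020, 1, 1, 1, 0), 30.0)]

cpu = {"web-01": [70, 80, 90], "web-02": [20, 30, 40], "web-03": [50, 55, 60]}
busiest = highest_average(cpu, n=2)  # ['web-01', 'web-03']

smoothed = double_ema([1.0, 2.0, 3.0], n=3)  # [1.0, 1.75, 2.75]
```

Because each window start is computed by floor division from a fixed origin, every point maps to exactly one window; this is the same idea a TSDB relies on when it downsamples by time.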
We shall now explore how a TSDB's inbuilt functions simplify generating the output for the following problem statement, compared with an RDBMS:
Which six-hour periods, with buckets aligned to 08:00 on 1 January 2020, saw the most logins from users on tablet devices over the last week?
TSDB (TimescaleDB) | RDBMS (Microsoft SQL Server) |
---|---|
`SELECT time_bucket('6 hours', login_timestamp, TIMESTAMP '2020-01-01 08:00:00') AS device_bucket, device_type, count(*) AS logins_by_device FROM user_logins WHERE login_timestamp > now() - INTERVAL '1 week' AND device_type = 'tablet' GROUP BY device_bucket, device_type ORDER BY logins_by_device DESC;` | Pseudo query/logic: |
`time_bucket()` and `interval` are built into the TSDB and directly express the problem statement | No such special function is available (SQL Server only added `DATE_BUCKET` in its 2022 release), so the code is lengthy and complex and needs developer support |
Puts the power to solve simple to complex problem statements in the hands of the end user, without special technical support | The end user needs help from a specialised technical resource to design a query that solves the problem statement |
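To show what the TimescaleDB query above computes, the sketch below reproduces its logic in memory with Python. It is not how TimescaleDB implements `time_bucket()`; the input shape (a list of `(login_timestamp, device_type)` tuples), the sample rows and the chosen `now` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def time_bucket(width: timedelta, ts: datetime, origin: datetime) -> datetime:
    """Start of the `width`-sized bucket containing ts, with buckets aligned to origin.

    Buckets start at origin + k * width for integer k; negative k covers
    times before the origin.
    """
    return origin + ((ts - origin) // width) * width


def busiest_tablet_buckets(logins, now, origin):
    """Count last-week tablet logins per 6-hour bucket, busiest bucket first.

    `logins` is a hypothetical list of (login_timestamp, device_type) tuples.
    """
    width = timedelta(hours=6)
    week_ago = now - timedelta(weeks=1)
    counts = Counter(
        time_bucket(width, ts, origin)
        for ts, device_type in logins
        if ts > week_ago and device_type == "tablet"
    )
    return counts.most_common()  # like ORDER BY logins_by_device DESC


logins = [
    (datetime(2020, 1, 2, 9, 0), "tablet"),
    (datetime(2020, 1, 2, 13, 59), "tablet"),
    (datetime(2020, 1, 2, 14, 0), "tablet"),
    (datetime(2020, 1, 2, 9, 30), "phone"),     # wrong device type: ignored
    (datetime(2019, 12, 30, 9, 0), "tablet"),   # older than one week: ignored
]
result = busiest_tablet_buckets(
    logins, now=datetime(2020, 1, 8, 8, 0), origin=datetime(2020, 1, 1, 8, 0)
)
# [(datetime(2020, 1, 2, 8, 0), 2), (datetime(2020, 1, 2, 14, 0), 1)]
```

The origin argument is what shifts the bucket boundaries to 02:00, 08:00, 14:00 and 20:00 rather than to midnight-aligned boundaries.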
We shall explore another problem statement, again comparing how simply a TSDB and an RDBMS generate the output:
How can vehicle movement be traced to find how many trucks exit Los Angeles each day, and the total weight (tonnage) they carry out?
TSDB (TimescaleDB with PostGIS) | RDBMS (Microsoft SQL Server) |
---|---|
`SELECT time_bucket('1 day', time) AS day, count(*) AS trucks_exiting, sum(weight) AS tonnage FROM vehicle_movements WHERE ST_Within(last_location, ST_Polygon((SELECT geom FROM cities WHERE name = 'los angeles'), 4326)) AND NOT ST_Within(current_location, ST_Polygon((SELECT geom FROM cities WHERE name = 'los angeles'), 4326)) GROUP BY day ORDER BY day DESC LIMIT 30;` | No equivalent built-in feature available |
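The query above combines daily time bucketing with a spatial test: a truck counts as exiting if its last location was inside the city polygon and its current location is not. The Python sketch below mirrors that logic, using a standard ray-casting point-in-polygon test in place of PostGIS's `ST_Within`. The rectangular "Los Angeles" polygon, the record fields and the sample movements are hypothetical, and coordinates are treated as flat (longitude, latitude) pairs, which is only reasonable for small areas.

```python
from collections import defaultdict
from datetime import datetime


def point_in_polygon(x, y, polygon):
    """Ray casting: count polygon edges crossed by a ray from (x, y) towards +x.

    An odd count means the point is inside. Points exactly on an edge may be
    classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def daily_exits(movements, city, limit=30):
    """Per day: trucks that moved from inside to outside `city`, and their total weight.

    Mirrors GROUP BY day ORDER BY day DESC LIMIT 30 from the SQL above.
    """
    per_day = defaultdict(lambda: [0, 0.0])  # day -> [trucks_exiting, tonnage]
    for m in movements:
        last_lon, last_lat = m["last_location"]
        cur_lon, cur_lat = m["current_location"]
        was_inside = point_in_polygon(last_lon, last_lat, city)
        is_inside = point_in_polygon(cur_lon, cur_lat, city)
        if was_inside and not is_inside:
            totals = per_day[m["time"].date()]
            totals[0] += 1
            totals[1] += m["weight"]
    rows = sorted(((day, count, tons) for day, (count, tons) in per_day.items()), reverse=True)
    return rows[:limit]


# Hypothetical rectangle roughly around Los Angeles, as (longitude, latitude).
la = [(-118.7, 33.7), (-118.1, 33.7), (-118.1, 34.3), (-118.7, 34.3)]
movements = [
    {"time": datetime(2020, 1, 1, 10), "last_location": (-118.24, 34.05),
     "current_location": (-117.5, 34.0), "weight": 12.0},    # exits
    {"time": datetime(2020, 1, 1, 15), "last_location": (-118.30, 34.00),
     "current_location": (-116.9, 33.9), "weight": 8.5},     # exits
    {"time": datetime(2020, 1, 2, 9), "last_location": (-117.5, 34.0),
     "current_location": (-118.24, 34.05), "weight": 10.0},  # enters: ignored
    {"time": datetime(2020, 1, 2, 11), "last_location": (-118.20, 34.10),
     "current_location": (-119.0, 34.2), "weight": 5.0},     # exits
]
result = daily_exits(movements, la)
# [(datetime.date(2020, 1, 2), 1, 5.0), (datetime.date(2020, 1, 1), 2, 20.5)]
```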
Over the last three years, adoption of TSDBs for business use cases has grown rapidly, as organisations seek to harness the evolving nature of data and drive more real-time use cases for solving complex problems. TSDBs have shown particular promise in the following use cases:
IoT applications and use cases
Internet of things (IoT) devices and wearables generate enormous volumes of time-series data, which can increasingly be analysed to uncover hidden patterns related to health, safety, financial activity, etc. Clients can generate revenue from this data by offering subscriptions to real-time alerts.
Analysing and predicting customer characteristics
Data generated from customers’ online activities can be analysed in a TSDB to understand spending patterns, predict customer characteristics such as shopping behaviour, and set up campaigns and alerts that attract customers to a client’s ecosystem.
Anomaly detection
Temporal ordering allows the user to analyse time-series information to compare current data with historical data, detect anomalies and generate real-time alerts, or visualise historical trends. These anomalies and historical trends can further be used to strengthen existing mechanisms and to train artificial intelligence (AI) models for more resilient and seamless automation. For example, in ATMs and banking apps, temporal analysis can help learn normal user activity and detect malicious anomalies, allowing the bank to better protect consumers from fraud and cyberattacks.
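One simple way to compare current data with historical data is a rolling z-score: each new value is scored against the mean and standard deviation of the values just before it. The sketch below is an illustrative stand-in, not a description of any bank's fraud system; the window size, the threshold of 3 standard deviations and the sample hourly login counts are hypothetical choices.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations
    away from the mean of the preceding `window` values.

    The first `window` values have no history and are never flagged. If the
    history has zero spread, any change from it is flagged.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]          # historical data
        mu, sigma = mean(history), pstdev(history)
        current = values[i]                     # current data
        if sigma == 0:
            if current != mu:
                flagged.append(i)
        elif abs(current - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly login counts for one user; hour 6 contains a burst.
hourly_logins = [10, 12, 11, 13, 12, 11, 60, 12]
anomalies = zscore_anomalies(hourly_logins)  # [6]
```

A spike stays in the trailing window for the next few points and inflates the standard deviation, which is one reason production systems often prefer robust statistics such as the median.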
Financial trend analysis
Storing and sorting data in a time-series format allows stockbrokers and traders to analyse previous stock market trends and use them for predictive modelling and forecasting. Autonomous trading algorithms, in turn, continuously collect data on how the markets are changing in order to optimise returns in both the short and long term.
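As a small example of analysing previous trends, the sketch below computes simple moving averages (SMA) and reports where a short-term average crosses a long-term one, a classic trend signal. The window lengths (3 and 5) and the price series are hypothetical, and the code illustrates the technique rather than a trading strategy.

```python
def sma(values, n):
    """Simple moving average of the last n values; None until n values exist."""
    return [
        None if i + 1 < n else sum(values[i + 1 - n:i + 1]) / n
        for i in range(len(values))
    ]


def crossovers(prices, short=3, long=5):
    """Indices where the short SMA crosses above ("up") or below ("down") the long SMA."""
    fast, slow = sma(prices, short), sma(prices, long)
    signals = []
    for i in range(1, len(prices)):
        if fast[i - 1] is None or slow[i - 1] is None:
            continue  # not enough history yet
        before = fast[i - 1] - slow[i - 1]
        after = fast[i] - slow[i]
        if before <= 0 < after:
            signals.append((i, "up"))
        elif before >= 0 > after:
            signals.append((i, "down"))
    return signals


# Hypothetical closing prices: flat, a rally, then a sell-off.
prices = [10, 10, 10, 10, 10, 12, 14, 16, 12, 8, 6]
signals = crossovers(prices)  # [(5, 'up'), (9, 'down')]
```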
Medical insurance tracking
Tracking each aspect of patient data (age, admission or discharge status, days to recovery, etc.) during the pandemic helped us understand how the daily counts are arrived at, allowing us to better analyse trends, accurately report totals and act. Such detail and analysis during the pandemic influenced public policy in cities and towns, as well as insurance premiums, allowing the Government and organisations to adapt and make informed decisions.
TSDBs are relatively new compared with NoSQL databases and conventional RDBMSs, so it is worth analysing the differences between, and offerings of, each type of database engine.
Factors | TSDB | RDBMS | NoSQL |
---|---|---|---|
Key features | Columnar data storage, indexing on time and a design specialised for time-series data management | Known for a highly robust, structured data storage framework | Highly adaptive to dynamic data generation and highly scalable |
Data storage method | Append only | Insert/update | Insert/update |
Data purging | Auto deletion of records post aggregation, available out of the box | Custom routines to be built for data purging | Custom routines to be built for data purging |
Scalability and durability | Designed for high scalability and durability | Limited scalability | Highly scalable |
Compression and storage | Highly optimised for storage due to efficient compression ratios | Limited compression support | Better compression support than RDBMS |
Analytical function support | Comes with inbuilt analytical functions/aggregates for unleashing analytical insights | Stored procedures, functions or extensions can be used for analytical use cases | No inbuilt support for analytics; external code (e.g. Python/R/Scala) can be used to generate analytical outcomes |
Nature of data | Data stored in structured/unstructured format | Data stored in structured format | Data stored in semi-structured/unstructured format |
Querying support | Basic/advanced querying levels depend on the product and underlying architecture | Highly optimised for simple to complex querying | Query support varies by product and is generally less expressive than SQL |
Use cases | High time-precision applications, e.g. IoT, time-based application monitoring and optimisation, real-time data streaming, etc. | Design of robust enterprise-level solutions and applications | Analytics and reporting for unstructured data generated from sources like social media |
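The "auto deletion of records post aggregation" row can be illustrated with a small in-memory sketch: raw points older than a retention period are summarised into per-bucket rollups and then dropped, while recent points stay at full resolution. The function name, the 7-day retention and the 1-day bucket are hypothetical; products such as TimescaleDB do this declaratively (continuous aggregates plus a retention policy that drops old chunks) rather than with application code.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # alignment point for buckets (an arbitrary choice)


def rollup_and_purge(raw, now, retention, bucket):
    """Summarise raw points older than `retention` into `bucket`-wide rollups and
    drop those raw points, keeping recent data at full resolution.

    raw: list of (timestamp, value) pairs.
    Returns (kept_raw, rollups), where rollups maps bucket start -> (count, mean).
    """
    cutoff = now - retention
    expired, kept = {}, []
    for ts, value in raw:
        if ts < cutoff:
            start = EPOCH + ((ts - EPOCH) // bucket) * bucket
            expired.setdefault(start, []).append(value)
        else:
            kept.append((ts, value))
    rollups = {
        start: (len(vals), sum(vals) / len(vals))
        for start, vals in sorted(expired.items())
    }
    return kept, rollups


raw = [
    (datetime(2020, 1, 1, 6), 4.0),
    (datetime(2020, 1, 1, 18), 6.0),
    (datetime(2020, 1, 2, 12), 10.0),
    (datetime(2020, 1, 9, 12), 7.0),
]
kept, rollups = rollup_and_purge(
    raw, now=datetime(2020, 1, 10), retention=timedelta(days=7), bucket=timedelta(days=1)
)
```

Because a TSDB stores data append-only and partitioned by time, dropping expired data usually means discarding whole time partitions at once, which is much cheaper than deleting rows one by one.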
Transerve, a new-age FinTech, is helping NBFCs assess the credit risk of customers based on their addresses/home locations. Areas/regions are classified by risk, and borrowers from those regions are accordingly classified as low- or high-risk debtors. Additional checks would be implemented for high-risk debtors.
Union Bank of India has debuted in the metaverse with its Uni-verse virtual lounge, which will showcase the bank’s products and services around current account and savings account (CASA), loans, Government welfare schemes and digital initiatives. The bank has also created an open banking sandbox to collaborate with FinTechs and innovate banking products.
Technology and big data are enabling Thailand’s top asset managers to study the non-linear and dynamic aspects of markets that provide new opportunities. They can now reclassify funds in ways that are less susceptible to market capitalisation and provide do-it-yourself (DIY) products to the market.
Kotak General Insurance has partnered with Inspektlabs to detect vehicle damage with the help of AI. Customers will now upload a video of their vehicle for policy renewal; the process is fully automated, and an inspection report of the damage is generated once the video is uploaded. According to the company, this will aid underwriting, prevent fraud, save time and cost, and improve customer satisfaction.
Max Life Insurance is leveraging technology and data to improve its operational efficiency and reduce human intervention. Its in-house product, Shield, has helped automate and digitalise 75% of its non-term business and has saved almost INR 800 crore by minimising fraud. Shield uses an AI-based model to catch fraudulent policies at the issuance stage instead of rejecting them at the claims stage.
Almost 70% of all customer interactions in the banking, financial services and insurance (BFSI) industry take place via calls. Convin has launched a new platform for its BFSI clients that supports multichannel customer conversations and automates call reports, providing analysis of call performance, call outcomes and sentiment, along with an alert mechanism to avoid inappropriate calls. It is expected to improve customer experience, increase agent productivity and help develop better sales strategies.
As per a note published by one of the leading rating agencies, the AUM of NBFCs is expected to grow in FY23.[1] NBFC AUM grew by 9.5% in FY22, and the report expects the growth rate for FY23 to fall between 9–11%.
As per the draft, regulated entities would not be required to take prior approval from the RBI before availing IT/ITeS services. However, banks and non-banking financial companies (NBFCs) would have to ensure that their use of these services does not diminish their ability to fulfil their obligations to customers or impede supervision by the supervisory authorities.
To revive the economy, the finance ministry has asked public sector banks to explore co-lending opportunities with FinTechs. It has also asked them to focus on technology and data analytics to accelerate their lending, while keeping a check on fraud through IT security and cyber security systems.
IRDAI intends to increase insurance penetration in India by improving customer experience and expanding insurers’ targets. Towards this, IRDAI has allowed both life and non-life insurers to launch products without prior approval and has given them state-wise targets, expecting greater innovation and customisation in the industry. The guidelines also include creating a national health exchange to settle claims, standardising and benchmarking hospitals, regulating high vehicle insurance rates and formulating guidelines for a single policy covering multiple vehicles of one customer.
Acknowledgements: This newsletter has been researched and authored by Aman Mann, Aniket Borse, Anuj Jain, Arpita Shrivastava, Dhananjay Goel, Fenil Thakkar, Harshit Singh, Krunal Sampat, Mamta Kumawat and Shyam Mishra.