A software developer’s perspective on open banking’s product reference data
How we apply business rules to unlock this data’s potential and power the Stryd Product Repository.
In my role as a software developer, I’ve been heavily involved in building Stryd’s digital home loan products, including the flagship Stryd Product Repository for mortgages.
The repository is used by mortgage brokers, aggregators and lenders to scan the bank market for the most competitive deals, and we update its home loan pricing and risk criteria daily.
At the time of writing, Stryd is populating the repository with data for 1,600+ home loans from 80+ bank brands.
To do that, we pull Product Reference Data (PRD) provided by banks in accordance with their data holder obligations under Australia’s open banking regime, which is part of the Consumer Data Right.
Open banking’s product data is a treasure trove of information! It covers the key features of the lender’s products, such as interest rate, comparison rate, constraints, fees and features.
Additionally, PRD is a faster, more accurate and more sustainable way of obtaining this information than web-scraping or receiving static product files from lenders.
However, using it does present us with some challenges.
Unlocking the potential of better quality Product Reference Data
At Stryd, we’ve discovered usability issues with PRD from more than half of the banks.
From my point of view as a lead developer, some of the more specific challenges we have encountered when consuming this data include:
- volatility given the source is treated as a live document
- discrepancies between lender and brand names
- inconsistency of percent values for loan-to-valuation ratios (LVRs)
- non-standard disclosure of key information, such as LVR and minimum loan amount, in the additionalInfo field rather than in the rate tiers.
In this blog, I’ll discuss each of these challenges in more detail, describe how Stryd has handled them, and give you an idea of the level of workaround required.
If you’d like to dive in deeper, there’s additional information, with examples, in our April 2024 white paper and PRD scorecard, Democratising Data for the Good of Consumers, which describes how to unlock the potential of better quality product data.
We cannot remediate every issue we encounter: whether we can depends on the data type and on whether we can find the correct information elsewhere.
In the absence of a reliable workaround for a key input like interest rate, to take just one example, we have no option other than to exclude that lender’s product from our repository.
As a result of that exclusion, the lender could lose business in the competitive home loan market, as their product will not be discoverable in the Stryd Product Repository.
Challenge 1 - Treating Product Reference Data as a “live” document
In our experience, lenders make changes to their home loan products approximately monthly.
However, between PRD API requests a few seconds apart, it's not unusual to see the bank’s product document:
- completely changing the order of lending tiers and features
- returning the rate numbers to different amounts of precision
- changing the “last update” field even if nothing in the document changed.
Treating the product reference document as live information introduces volatility and increases complexity unnecessarily, for both the bank and the user.
Rather than treating PRD as a live document, lenders would do better to treat it as a stable document that is updated as and when there are material product changes.
For our part as a consumer of the API, we need to build in financial logic to determine whether each product is the same as, or different from, the one returned by the previous request.
To deal with some of these challenges, we have implemented rules and a methodology to identify whether a product is genuinely new or has simply had a productId change. We have also implemented rules that try to match old rate information with new rate information when lenders alter the order of lending tiers, so that we compare like with like on each update.
Our rules standardise the data presented and approve changes only when the lastUpdated field has changed.
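To make this concrete, here is a minimal Python sketch of one way to compare two fetches of the same product. It is illustrative only, not Stryd’s production rule set, and it assumes products arrive as parsed JSON with CDR-style field names (lastUpdated, lendingRates, rate, comparisonRate). It drops the volatile lastUpdated timestamp, ignores the order of lists such as lending tiers and features, and normalises rate precision before hashing the result.

```python
import hashlib
import json
from decimal import Decimal
from typing import Any

# Assumed field names, modelled on the CDR banking product schema.
VOLATILE_KEYS = {"lastUpdated"}         # may change when nothing else has
RATE_KEYS = {"rate", "comparisonRate"}  # may be sent at varying precision


def _canonical(value: Any, key: str = "") -> Any:
    """Recursively put a product document into an order-independent form."""
    if isinstance(value, dict):
        return {
            k: _canonical(v, k)
            for k, v in sorted(value.items())
            if k not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        items = [_canonical(v) for v in value]
        # Sort entries by their JSON text so tier and feature order is ignored.
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    if key in RATE_KEYS and isinstance(value, str):
        # "0.0599" and "0.059900" normalise to the same string.
        return str(Decimal(value).normalize())
    return value


def fingerprint(product: dict) -> str:
    """Hash of the canonical document; equal hashes mean no material change."""
    text = json.dumps(_canonical(product), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_material_change(previous: dict, current: dict) -> bool:
    """Accept a change only if lastUpdated moved AND the content differs."""
    return current.get("lastUpdated") != previous.get("lastUpdated") and (
        fingerprint(current) != fingerprint(previous)
    )
```

With this approach, two responses that differ only in tier order, rate precision or timestamp produce the same fingerprint. The trade-off is that sorting every list also discards order in any array where order might genuinely matter, so a real rule set would need to be more selective.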
Challenge 2 - Lender and brand name discrepancies
There are a lot of inconsistencies in lender and brand names across the product reference APIs, the consumer APIs, and the branding APIs.
Ideally, the lender name would consistently and precisely match the name that's been registered in the branding APIs. However, many lenders use different names for their brand in each of these APIs.
Some of the notable inconsistencies include:
- extra spaces in the lender names
- various capitalisations
- shortened versions of the brand name
- long-form brand names
- brand names returned as "null", a blank string, or even a single letter.
As a result, if a lender updates their PRD and changes their brand name, it looks like the lender has introduced a new product, which makes tracking actual product changes over time unnecessarily difficult. Likewise, it can be challenging to match a consumer's account data with product data.
To mitigate this, we have created an extensive library of rules that standardise the many permutations of lender names.
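As a hypothetical illustration of what one of these rules might look like, the Python sketch below cleans up whitespace and capitalisation, treats placeholder values such as "null", an empty string or a single letter as missing, and maps known variants to one canonical name. The alias table is invented for the example, and the fallback argument assumes you already hold the brand name registered for the data holder in the branding APIs.

```python
import re
from typing import Optional

# Hypothetical alias table: normalised variant -> canonical brand name.
# These entries are invented examples, not real lender data.
BRAND_ALIASES = {
    "example bank": "Example Bank",
    "example bank limited": "Example Bank",
    "exb": "Example Bank",
}


def normalise_brand(raw: Optional[str], fallback: str) -> str:
    """Return a canonical brand name, using `fallback` when `raw` is unusable."""
    if raw is None:
        return fallback
    # Collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", raw).strip()
    key = cleaned.casefold()  # case-insensitive comparison key
    # Placeholders such as "null", "" or a single letter carry no information.
    if key in {"", "null"} or len(key) == 1:
        return fallback
    # Map known variants; otherwise keep the cleaned-up name for review.
    return BRAND_ALIASES.get(key, cleaned)
```

Unrecognised names are returned in cleaned-up form rather than guessed at, so they can be reviewed and added to the alias table.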
Challenge 3 - LVR consistency with percent values
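The Data Standards Body has specified that percentage values must use the "RateString" representation, in which a rate of 100% is represented by the value 1.0 (so an 80% LVR would be 0.8).
For LVR tier amounts where the unitOfMeasure is a percentage, however, the specification is less clear about whether the values should follow the RateString convention.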
Consequently, lenders are interpreting the data standards in different ways.
Some lenders continue to follow the RateString representation, while others use whole percentages (80 rather than 0.8). In the absence of a precise specification, both are reasonable interpretations of the standards.
The upshot is that approximately 14% of lenders that provide LVR use the RateString and the remaining 86% use whole percentages. So the user of the data must work out which convention applies and convert the data if needed.
We have developed a number of rules to identify which representation a lender has used and convert it to the RateString to ensure consistency in the repository.
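A minimal Python sketch of this kind of rule follows. The function name and the threshold are assumptions for illustration, not Stryd’s actual rules: because no realistic LVR tier ends at 1%, it treats any bound greater than 1 as a whole percentage and divides by 100, and otherwise assumes the value is already a RateString fraction. Decimal is used instead of float so that 80 becomes exactly 0.8.

```python
from decimal import Decimal
from typing import Optional, Tuple


def normalise_lvr_bounds(
    minimum: Decimal, maximum: Optional[Decimal]
) -> Tuple[Decimal, Optional[Decimal]]:
    """Return LVR tier bounds as RateString-style fractions (1.0 == 100%).

    Hypothetical helper: the "> 1 means whole percent" rule is an assumption.
    """
    values = [v for v in (minimum, maximum) if v is not None]
    # Any bound above 1 is taken as evidence of whole percentages (80 == 80%).
    if any(v > 1 for v in values):
        scale = Decimal(100)
        return (
            minimum / scale,
            maximum / scale if maximum is not None else None,
        )
    # Otherwise assume the lender already followed the RateString convention.
    return minimum, maximum
```

If the tier values are read with json.loads(..., parse_float=Decimal, parse_int=Decimal), they reach this function as Decimal values. A tier bounded by exactly 0 and 1 remains ambiguous in principle; this sketch reads it as 0 to 100%.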
Challenge 4 - Relegating LVR and loan amounts to the additionalInfo field
There is a formal BankingProductRateTierV3 structure specified in the data standards for lenders to provide minimum and maximum loan amount as well as LVR tier information.
However, many lenders do not include LVR tiers in their PRD. Rather, they send this information as a long text string in the "additionalInfo" field.
In some cases, lenders have misunderstood the nuanced meaning of “optional” and have not disclosed LVR tiers, despite the standards clearly specifying that “optional” properties should be implemented if the data is available.
Where lenders do provide this information in the additionalInfo field, it might read: "For borrowers with a Loan to Value Ratio of 80 to 90% and a loan amount of 100k or more."
It is also not uncommon for the same lender to use a completely different format for another product, for instance: "LVR >=80 <=90 with loan value > $100000.00". In such instances, the information is not only in the wrong field but also in an inconsistent format!
So the product information cannot be used as provided by the lender because the LVR and loan amount information are not in the expected fields and will not show correctly without remediation.
Our workaround at Stryd is a set of rules that deconstruct and reinterpret this text so it is usable, applying multiple transformations and cross-checks.
We also validate the results manually to ensure accuracy, because we have found a significant number of instances where the LVR information in a lender’s PRD does not match the product information on the lender’s website.
And, in some cases, we have no option other than to exclude the lender’s product from our repository.
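To illustrate the kind of deconstruction involved, here is a hypothetical Python sketch that recognises the two example phrasings quoted above and turns them into structured values. Real additionalInfo text varies far more than this, so each pattern here stands in for many rules, and the names are invented for the example. Anything that does not match is returned as None so it can be routed to manual review or excluded. For simplicity, the sketch does not distinguish a strict bound ("> $100000.00") from an inclusive one ("or more").

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ParsedTier:
    lvr_min: Decimal             # RateString-style fraction, e.g. 0.8 for 80%
    lvr_max: Decimal
    loan_min: Optional[Decimal]  # minimum loan amount in dollars


def _dollars(text: str) -> Decimal:
    """Convert '100k', '$100000.00' or '$100,000' to a Decimal dollar amount."""
    text = text.replace("$", "").replace(",", "").strip().rstrip(".").lower()
    if text.endswith("k"):
        return Decimal(text[:-1]) * 1000
    return Decimal(text)


# One pattern per phrasing; a real rule set needs many more.
PATTERNS = [
    re.compile(
        r"loan to value ratio of (?P<lo>\d+(?:\.\d+)?) to (?P<hi>\d+(?:\.\d+)?)%"
        r".*?loan amount of (?P<amt>\$?[\d,.]+k?) or more",
        re.IGNORECASE,
    ),
    re.compile(
        r"LVR\s*>=\s*(?P<lo>\d+(?:\.\d+)?)\s*<=\s*(?P<hi>\d+(?:\.\d+)?)"
        r".*?loan value\s*>=?\s*(?P<amt>\$?[\d,.]+k?)",
        re.IGNORECASE,
    ),
]


def parse_additional_info(text: str) -> Optional[ParsedTier]:
    """Extract LVR bounds and minimum loan amount from free text, if possible."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return ParsedTier(
                lvr_min=Decimal(match["lo"]) / 100,
                lvr_max=Decimal(match["hi"]) / 100,
                loan_min=_dollars(match["amt"]),
            )
    return None  # unrecognised phrasing: route to manual review
```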
Lessons learned for improving Australia’s Consumer Data Right (CDR)
The Stryd Product Repository is the culmination of significant time, investment and work that has gone into creating thousands of rules, exclusions and transformations, as well as manual validation.
To date, we have written approximately 2,000 lines of code and 17,700 lines of rules to make the data usable, and implemented 318 filters to remove bad or irrelevant data.
A key lesson learned is that there is a significant discrepancy between lenders in terms of the quality of their PRD. There can also be a big difference in PRD quality between the products of the same lender.
In its current state, PRD cannot simply be passed straight through to consumers. It requires tools like Stryd, with sophisticated rules, to safeguard the quality of data for the end user.
Curious to find out more?
We’re here to help! To discover more about Stryd and how it works, you can get in touch with us by email at sales@stryd.au or set up a demo at a time that suits you.