
What is data quality and how to turn messy transaction information into actionable insights? 

By Michal Maliarov, 8 min read


Introduction

Every transaction, every single tap of a credit card, creates a stream of zeros and ones, and the ability to extract meaningful insights from that data is a key differentiator for banks and fintech companies. Data enrichment, especially in the payment sector, means enhancing transactional data with additional information to give a deeper understanding of customer behavior, reduce fraud, and ultimately improve the overall payment ecosystem.

When it comes to data enrichment services, three fundamental qualities need to be balanced to achieve the best results. In this article, we explore Coverage, Accuracy, and Information richness, and show how closely the three depend on one another.

Coverage

Coverage, measured as the percentage of transaction volume for which a solution returns results, is a critical parameter. In the fintech environment it also follows the Pareto principle (80% of consequences come from 20% of causes): a handful of major merchants drive a significant portion of transactions. Striking a balance is key: reaching 80-90% coverage ensures that resources are not wasted on marginal increases, maintaining efficiency while upholding the other essential qualities. Note that coverage is measured for each data point separately.

Example:

Out of 1,000,000 transactions...

  • 85% of transactions have a clean merchant name (e.g., original merchant name "HFB ECO IKEA 069 OLDENBUR" --> clean merchant name "IKEA")
  • 75% of transactions have a merchant logo (512x512 px, optimized for circular cut and visually checked by humans to ensure high quality)
  • 95% of transactions have a correct merchant category
  • 70% of transactions have been enriched with location details, including a full address and GPS coordinates of the shop where the transaction occurred (not the headquarters)
  • 60% of transactions have been enriched with the merchant's URL.
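
Because coverage is measured per data point, it helps to compute it field by field instead of as a single number. The following is a minimal Python sketch, assuming enriched transactions arrive as dictionaries in which a missing enrichment is `None`. The field names and the sample records are hypothetical and chosen only for illustration; they are not a real API schema.

```python
# Hypothetical enriched records; None means the field could not be enriched.
transactions = [
    {"clean_name": "IKEA", "logo": "ikea.png", "category": "House And Garden", "location": None},
    {"clean_name": "ABOUT YOU", "logo": None, "category": "Shopping", "location": None},
    {"clean_name": None, "logo": None, "category": "Food and Drink", "location": None},
    {"clean_name": "Tesco", "logo": "tesco.png", "category": "Groceries", "location": "shop address"},
]

# Assumed field names, one per enriched data point.
FIELDS = ("clean_name", "logo", "category", "location")


def coverage_by_field(records, fields):
    """Return, for each field, the share of records that carry a value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is not None) / total
        for field in fields
    }


coverage = coverage_by_field(transactions, FIELDS)
for field, share in coverage.items():
    print(f"{field}: {share:.0%}")
```

Reporting one figure per field makes it harder for a strong category coverage to hide a weak location or logo coverage.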

Accuracy

Accuracy, the percentage of enriched transactions that are enriched correctly, is the hallmark of data quality, since even a small error can lead to significant consequences, such as reputation loss for the entire company. While it might be tempting to boost coverage at the expense of accuracy, the potential fallout from inaccurate enrichment, such as increased customer disputes or flawed analytical models, emphasizes the need for a meticulous approach. A company that focuses on coverage at the expense of accuracy loses valuable data, which has a negative impact on every banking feature built on it. In short: 75% Coverage with 99.9% Accuracy always beats 99.9% Coverage with 4% false data.
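
To make the trade-off concrete, the two profiles from the paragraph above can be turned into absolute numbers. This is a rough sketch under two assumptions that the article does not state: the sample is 1,000,000 transactions, and "4% false data" means that 4% of the enriched transactions are wrong. The function name is illustrative.

```python
def enrichment_outcome(total, coverage, error_rate):
    """Split a transaction volume into enriched, correct and wrong counts.

    coverage   -- share of all transactions that receive an enrichment
    error_rate -- share of enriched transactions that are wrong (assumed meaning)
    """
    enriched = round(total * coverage)
    wrong = round(enriched * error_rate)
    return {"enriched": enriched, "correct": enriched - wrong, "wrong": wrong}


TOTAL = 1_000_000  # assumed sample size

careful = enrichment_outcome(TOTAL, coverage=0.75, error_rate=0.001)
greedy = enrichment_outcome(TOTAL, coverage=0.999, error_rate=0.04)

print("75% coverage, 99.9% accuracy:", careful)
print("99.9% coverage, 4% false data:", greedy)
```

Under these assumptions the first profile shows customers 750 wrong enrichments, while the second shows 39,960, roughly 53 times as many. Each of those is a potential dispute or a corrupted input to an analytical model, which is the cost the paragraph warns about.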

Example:

Merchant name – Is the correct merchant assigned?

Location

Is the detected location the exact address, wider area, or address of headquarters? Location verification ensures that enriched locations correspond to the actual shop where the transaction occurred. This is crucial as some providers might display the address of the merchant's headquarters or, in uncertain cases, provide a very general location, such as GPS coordinates of the Municipal District of Prague 8.

Examples:

A typical mistake is assigning an address to e-commerce or online services (Microsoft, Uber, Google, Netflix). In these cases, not returning any address is the correct enrichment.

Brand logo

What is the quality of the enriched logo and is it correct?

How to correctly display the merchant logo on the transaction details and what errors are common.

Category

How accurate is the categorization?

There are at least 26 possible category and tag variants for the single MCC 5411 category. Tapix provides 25 merchant categories and 500+ store-level category tags.

Information richness

The level of detail provided in the enriched information often dances on the boundary with accuracy. This aspect is subjective, dependent on specific use-cases and sensitivity to the level of detail required. Whether it's transaction recognition behind payment gates, GPS locations, or logo quality, a higher level of detail typically leads to less confusion and, consequently, better results. It's a delicate balance that should be evaluated based on individual needs and use-cases.

  • Merchant name differentiation involves efforts to find the correct merchant behind payment gateways. For example, the original merchant name "PAYPAL *ABOUTYOUSEC" should be correctly identified as "ABOUT YOU," not "PayPal."
  • Tags supplement merchant categories, helping specify transaction types. For example, a transport company might be categorized as 'Travel,' but specific tags might include bus and plane tickets, bike-sharing, or taxi services.
  • Accurate shop recognition involves differentiating between sub-entities, such as Google, Google Play, Google Workspace, Google Ads, and Google Cloud. Similar distinctions are made for entities like Amazon, Amazon Prime, and Amazon Music; and Tesco, Tesco Gas Station, Tesco Mobile, and Esso.
  • Localized merchant names account for variations in how merchant names are presented in different geographical locations.
  • Precise categorization goes beyond Merchant Category Codes (MCC) and considers more detailed categories, especially when a merchant has multiple fields of business. For instance, IKEA may be categorized as "House And Garden" when purchasing home products and as "Food and Drink" when having lunch at the IKEA food court.
  • EcoTrack raises carbon literacy and helps people make eco-friendly decisions and more sustainable choices when shopping.
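
The merchant name differentiation item above can be illustrated with a small sketch. It assumes the gateway appears as a prefix separated from the merchant by an asterisk, as in the "PAYPAL *ABOUTYOUSEC" example. The prefix list and the alias table are hypothetical stand-ins; a production system would need far larger, maintained data, and this is not a description of how Tapix does it.

```python
import re

# Hypothetical list of gateway prefixes; only PayPal comes from the example above.
GATEWAY_PATTERN = re.compile(r"^\s*(PAYPAL)\s*\*\s*(.+)$", re.IGNORECASE)

# Hypothetical alias table mapping raw descriptors to clean merchant names.
KNOWN_MERCHANTS = {"ABOUTYOUSEC": "ABOUT YOU"}


def resolve_merchant(raw_name):
    """Separate the payment gateway from the end merchant, if a gateway is present."""
    match = GATEWAY_PATTERN.match(raw_name)
    if match is None:
        # No known gateway prefix: treat the whole string as the merchant descriptor.
        return {"gateway": None, "merchant": raw_name.strip()}
    gateway = match.group(1).upper()
    remainder = match.group(2).strip()
    # Fall back to the raw remainder when the descriptor is not in the alias table.
    merchant = KNOWN_MERCHANTS.get(remainder.upper().replace(" ", ""), remainder)
    return {"gateway": gateway, "merchant": merchant}


print(resolve_merchant("PAYPAL *ABOUTYOUSEC"))
print(resolve_merchant("HFB ECO IKEA 069 OLDENBUR"))
```

Keeping the gateway as a separate field, instead of discarding it, lets a downstream consumer show the end merchant to the customer while still knowing how the payment was routed.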

Know the difference

Logo Coverage: Addressing the logo coverage requires a nuanced understanding of the situation. Small businesses, often without logos, impact the feasibility of achieving 70-80% logo enrichment. In such cases, placeholder pictograms may be utilized, but it's crucial not to directly compare them to instances where logos are present. Additionally, recognizing online transactions to the level of payment gateways versus understanding the end recipient is crucial for proper coverage. For instance, transactions may be recognized at the gateway level (e.g., PayPal), but the true end recipient may differ.

GPS Location: Similarly, the difference between the GPS location of the shop, the merchant's headquarters, or the middle of a city street is an important detail. Recognizing these nuances, especially in online transactions, guards against artificially inflated coverage claims. For instance, a transaction conducted online may be given the GPS location of the company headquarters, but this does not reflect the location of the actual purchase. Understanding these subtleties is critical for delivering precise and reliable data.
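
One way to see how much a coverage claim depends on these distinctions is to count it twice: once accepting any returned location, and once accepting only shop-level locations. The sketch below assumes each enriched transaction carries a `location_level` label with the values "shop", "area", "headquarters", or `None`. That label and the sample data are hypothetical.

```python
# Hypothetical enrichment results; location_level describes how precise the location is.
results = [
    {"id": "t1", "location_level": "shop"},
    {"id": "t2", "location_level": "shop"},
    {"id": "t3", "location_level": "headquarters"},
    {"id": "t4", "location_level": "area"},
    {"id": "t5", "location_level": None},
]


def location_coverage(records, accepted_levels):
    """Share of records whose location precision is one of the accepted levels."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if record["location_level"] in accepted_levels)
    return hits / len(records)


# Claimed coverage: any location counts, however imprecise.
claimed = location_coverage(results, accepted_levels={"shop", "area", "headquarters"})
# Strict coverage: only the actual shop counts.
strict = location_coverage(results, accepted_levels={"shop"})

print(f"claimed: {claimed:.0%}, strict: {strict:.0%}")
```

The same split applies to logos: counting placeholder pictograms separately from real logos keeps the two figures from being compared directly.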

PoC as a cornerstone

In decision-making processes, relying solely on claimed coverage can be misleading. Running a Proof of Concept (PoC) is a reliable safeguard: each solution enriches the same sample of transactions within the same deadline. This approach allows for a direct and representative comparison based on evidence instead of claims, and supports a holistic evaluation of solution metrics. The PoC is a topic of its own that we will explore in detail in future articles.
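
A PoC comparison of this kind can be scored with very little code once a manually verified sample exists. The following sketch assumes a ground-truth mapping from transaction ID to the correct merchant name and one output mapping per provider, where a missing ID means the provider returned nothing. The provider names and all data are invented for illustration.

```python
# Hypothetical manually verified sample: transaction ID -> correct merchant name.
ground_truth = {"t1": "IKEA", "t2": "ABOUT YOU", "t3": "Tesco", "t4": "Google Play"}

# Hypothetical provider outputs; a missing ID means no enrichment was returned.
provider_a = {"t1": "IKEA", "t2": "ABOUT YOU", "t3": "Tesco"}
provider_b = {"t1": "IKEA", "t2": "PayPal", "t3": "Tesco", "t4": "Google"}


def evaluate(truth, output):
    """Return (coverage, accuracy) of a provider output against verified labels.

    coverage -- share of sample transactions that received a result
    accuracy -- share of returned results that match the verified label
    """
    returned = 0
    correct = 0
    for tx_id, expected in truth.items():
        result = output.get(tx_id)
        if result is None:
            continue
        returned += 1
        if result == expected:
            correct += 1
    coverage = returned / len(truth) if truth else 0.0
    accuracy = correct / returned if returned else 0.0
    return coverage, accuracy


for name, output in (("provider A", provider_a), ("provider B", provider_b)):
    coverage, accuracy = evaluate(ground_truth, output)
    print(f"{name}: coverage {coverage:.0%}, accuracy {accuracy:.0%}")
```

In this invented sample, provider B covers every transaction but stops at the gateway ("PayPal") and at the parent brand ("Google"), so its higher coverage comes with lower accuracy. Exact string matching is a simplification; a real evaluation would need agreed rules for what counts as a match.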

Conclusion

Understanding the delicate interplay between coverage, accuracy, and information richness is what makes a good business tick. Striking the right balance ensures that data enrichment services not only meet but exceed expectations. As Tapix continues to enrich data for prominent companies, the emphasis on achieving this trifecta of qualities remains central to our commitment to delivering the highest value.

About author

Michal Maliarov, an enthusiastic writer who loves to talk about fintech, AI and the mobile tech market.

Michal Maliarov

Senior insider

A creative enthusiast who has spent half of his life in the technology industry. Passionate about fintech, AI, and the mobile tech market. Navigating the thin line between the worlds of media and advertising for over 10 years, where he feels most at home.
