Cr to our bsnss is the intprtn of senta nces lik thes one.
Our data sources – Kiranas & Supermarkets
Singularium’s core business revolves around the process of acquiring, standardizing and interpreting data from
the last mile of the FMCG market. Data is sourced on the ground in Bangalore, from Modern Retail through cloud
services, and from Kiranas through imaging.
This data has been used to build an accurate, near real-time market intelligence platform covering products,
schemes & throughputs in a dynamic industry.
Complications arising from erratic product names
Product identification is crucial for interpreting raw data or building a value-added layer of analytics. Data from both Modern Retail and Kiranas makes product identification difficult, because product names are often shortened, garbled, incomplete or misspelt. When data-driven systems cannot aggregate these variants into a single product, errors compound when aggregating across retailers, and in some cases even within a single retailer. Because a single SKU can carry multiple barcodes, a retailer without consistent product names may struggle with even basic inventory management.
Further, manually mapping various datasets to a catalogue was slow, inaccurate and
expensive, especially with catalogues spanning hundreds of thousands of lines.
Examples of unstructured names in FMCG
- CLINIC PS&L SHP 175M instead of Clinic Plus Strong & Long Shampoo 175ml
- BRT GOOD DAY CC 90g instead of Britannia Good Day Chocolate Chip Cookies 90g
- Misspellings of brand names such as Lifebuoy and Biotique are commonplace
Figure 1: Britannia Item Names from a PoS DataBase
Figure 2: HUL Item Names from a Kirana Purchase
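One common first step for names like these is to normalize them before any matching: lowercase the text, expand known abbreviations and standardize unit suffixes. The sketch below is a minimal illustration under stated assumptions, not JESTR's actual pre-processing. The abbreviation table `ABBREVIATIONS`, the unit aliases (including reading "M" as "ml") and the function `normalize` are hypothetical, with the entries taken from the two examples above.

```python
import re

# Hypothetical abbreviation table built from the examples above;
# a real table would need to be curated per brand and category.
ABBREVIATIONS = {
    "brt": "britannia",
    "shp": "shampoo",
    "cc": "chocolate chip",
    "ps&l": "plus strong & long",
}

# A quantity followed by a unit suffix, e.g. "175m", "90g", "500gm".
UNIT_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(ml|m|g|gm|kg|l)$")
# Assumed aliases: "m" is read as millilitres, "gm" as grams.
UNIT_ALIASES = {"m": "ml", "gm": "g"}


def normalize(name: str) -> str:
    """Lowercase a raw item name, expand known abbreviations and standardize units."""
    out = []
    for token in name.lower().split():
        match = UNIT_PATTERN.match(token)
        if match:
            qty, unit = match.groups()
            out.append(qty + UNIT_ALIASES.get(unit, unit))
        else:
            # Unknown tokens are kept unchanged for the matching stage.
            out.append(ABBREVIATIONS.get(token, token))
    return " ".join(out)


print(normalize("CLINIC PS&L SHP 175M"))
print(normalize("BRT GOOD DAY CC 90g"))
```

Dictionary expansion only covers abbreviations that have already been catalogued; misspellings such as those of Lifebuoy or Biotique need a fuzzy comparison instead.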
The algorithmic solution with over 90% mapping accuracy
The JESTR algorithm maps erratic product names to their closest matches in a master catalogue spanning hundreds of thousands of items, with over 90% accuracy when the top 3 matched candidates are considered.
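Top-3 accuracy is read here as the share of input strings whose correct catalogue item appears among the first three candidates returned. A minimal sketch of that metric follows; the function name and the toy data are hypothetical placeholders, not the actual evaluation set or its results.

```python
from typing import Dict, List


def top_k_accuracy(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int = 3) -> float:
    """Fraction of inputs whose expected catalogue item is among the first k candidates."""
    if not truth:
        return 0.0
    hits = sum(
        1 for query, expected in truth.items()
        if expected in ranked.get(query, [])[:k]
    )
    return hits / len(truth)


# Toy example with placeholder labels, not real matching results.
ranked = {
    "input-1": ["item-a", "item-b", "item-c"],
    "input-2": ["item-d", "item-e", "item-f", "item-g"],
}
truth = {"input-1": "item-c", "input-2": "item-g"}
print(top_k_accuracy(ranked, truth))  # 0.5: item-g is only 4th for input-2
```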
Sample results, matched against a Britannia master catalogue of 110 items
input string | mapped text |
---|---|
brt gd c chip | britannia good day chocochip cookies |
dalyfrshcrd | britannia daily fresh curd |
ragi nc | britannia nutri choice essentials ragi cookies |
- JESTR mimics the way a human would match strings. The algorithm first transforms the input text and then applies a reductive matrix methodology to score candidates for similarity, taking industry-specific jargon into account.
- Algorithms built purely on ML were found to be less accurate, because new products arrive constantly and data sources deviate significantly in how they structure product names. Built on general principles rather than machine learning, JESTR can handle countless variations of input and does not require tf-idf based mechanisms for scoring or mapping.
Figure 3: jestr – brt gd c chip
Figure 4: jestr – dalyfrshcrd
Figure 5: jestr – ragi nc
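The source does not spell out JESTR's scoring, so the following is only a hypothetical sketch of what a reductive matrix approach could look like. It builds a similarity matrix between query tokens and candidate tokens, then repeatedly takes the best remaining cell and strikes out its row and column. The token scorer (an in-order, same-first-letter abbreviation test), the no-space fallback `compact_score` and all function names are assumptions for illustration; the three-item catalogue is taken from the sample table.

```python
from typing import List, Tuple

# Illustrative catalogue: the three mapped names from the sample table.
CATALOGUE = [
    "britannia good day chocochip cookies",
    "britannia daily fresh curd",
    "britannia nutri choice essentials ragi cookies",
]


def is_subsequence(short: str, long: str) -> bool:
    """True if every character of `short` appears in `long` in the same order."""
    it = iter(long)
    return all(ch in it for ch in short)


def token_score(q: str, c: str) -> float:
    """Assumed scorer: exact match is 1.0; an in-order abbreviation sharing the first letter scores 0.5 to 1.0."""
    if q == c:
        return 1.0
    if q and c and q[0] == c[0] and is_subsequence(q, c):
        return 0.5 + 0.5 * len(q) / len(c)
    return 0.0


def matrix_score(query: str, candidate: str) -> float:
    q_tokens, c_tokens = query.lower().split(), candidate.lower().split()
    if not q_tokens or not c_tokens:
        return 0.0
    # One row per query token, one column per candidate token.
    matrix = [[token_score(q, c) for c in c_tokens] for q in q_tokens]
    rows, cols = set(range(len(q_tokens))), set(range(len(c_tokens)))
    total = 0.0
    # Reduce the matrix: take the best remaining cell, then strike out its row and column.
    while rows and cols:
        best, r, c = max((matrix[i][j], i, j) for i in rows for j in cols)
        if best <= 0.0:
            break
        total += best
        rows.remove(r)
        cols.remove(c)
    return total / len(q_tokens)


def compact_score(query: str, candidate: str) -> float:
    """Fallback for inputs typed without spaces, e.g. 'dalyfrshcrd'."""
    q, c = "".join(query.lower().split()), "".join(candidate.lower().split())
    if q and c and is_subsequence(q, c):
        return 0.5 + 0.5 * len(q) / len(c)
    return 0.0


def top_k(query: str, catalogue: List[str], k: int = 3) -> List[Tuple[str, float]]:
    scored = [(name, max(matrix_score(query, name), compact_score(query, name))) for name in catalogue]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for query in ["brt gd c chip", "dalyfrshcrd", "ragi nc"]:
    print(query, "->", top_k(query, CATALOGUE, k=1)[0][0])
```

On the three sample inputs this toy scorer ranks the expected item first, but it ignores initials such as "nc" for "nutri choice" and scores "ragi nc" on "ragi" alone.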
Enhancing searches, chatbots and semantic analysis
In a world of text searches (Google / Bing / Maps) and product discovery (Amazon / Flipkart / Swiggy / Just-Dial / Zomato), JESTR could help narrow the scope of a user's query and interpret it correctly.
It could also be a useful tool for dataset and header mapping when combining data from multiple sources, especially for firms dealing with large datasets.
Cron jobs used by e-commerce sites need to be backed by rigorous catalogue mapping to ensure the selections are mapped accurately.
AI chatbots are fairly commonplace, and JESTR could help accelerate their training phase. In some cases, it could also be used to interpret a user's text messages (SMS / WhatsApp / chat etc.).
With enhancements such as phonetic mapping, syllable evaluation and n-gram dictionaries, the program could be a significant addition to text pre-processing for the semantic evaluation of sentences.
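As a small illustration of the n-gram direction (not a description of JESTR), a character-bigram Dice coefficient gives partial credit to misspellings that share most of their letter pairs with the correct word. The function names are hypothetical, and the misspellings below are illustrative inputs for the brand names cited earlier, not taken from real data.

```python
from typing import Set


def char_bigrams(word: str) -> Set[str]:
    """Set of overlapping two-character substrings, e.g. 'lux' -> {'lu', 'ux'}."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigram sets: 2|A & B| / (|A| + |B|)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        # Strings shorter than two characters have no bigrams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


print(dice_similarity("Lifebuoy", "Lifebouy"))  # 8/14, about 0.571
print(dice_similarity("Biotique", "Biotiqe"))   # 10/13, about 0.769
```

Using sets rather than bigram counts keeps the sketch short; repeated letter pairs are counted once.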