India car database changelog

Changelog for https://www.teoalida.com/cardatabase/india/

In 2015 about 8% of my website traffic came from India, and every year dozens of Indians ask me if I can make an automobile database for their country. For a long time I stayed away from doing this, for 3 reasons:
– Lack of a reliable data source comparable with AutoKatalog magazine (Europe) and the Edmunds.com website (America).
– The poverty of a third-world country (some customers offered me sums of money that make me cry).
– Bad experiences with some Indians: many people ask for what they need but only a small fraction of them are actually willing to pay, so the market size is difficult to estimate and the sales potential is easy to overestimate. Some also had an aggressive attitude, wanting me to do the work for free and pay afterwards only “if they like the work”, or asked me to do unwanted work outside my field of experience. All this left me undecided whether I should serve anyone in India.

Copy-pasting data from websites would have required dozens of hours, too much effort for the offers received. Alternatively I could pay a programmer to scrape data automatically from a website, but with the small sums of money offered by potential customers, I would have needed at least 10-20 sales to cover the cost (freelance programmers charge a few hundred dollars for scraping services).

The decision to build an Indian car database came in August 2015. After finishing a new phase in the development of the European databases, which kept me busy between 25 July and 15 August, I got some free time and learned to scrape data from websites myself. This significantly reduces the time investment needed to create the India car database, bringing it to the point of profitability. By coincidence, the Carwale website was redesigned between 14 and 18 August 2015, according to www.archive.org.

In just one week, I was contacted by 3 new potential customers asking for a list of cars in India. 2 of them insisted that I scrape data from a website, and one was against automated scraping. If you’re against scraping, please suggest an alternative way to get the data!

Cars list of updates

Between 2015 and 2017 I ran the scraper on make pages to get model URLs, then on every model URL to get version URLs, then on every version URL to get specifications. Each update removed all data from the previous update and replaced it with new data, so all cars were updated (including prices) to the current month.
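The three-level crawl described above (make pages, then model pages, then version pages) can be sketched roughly as follows. This is only an illustration, not the actual scraper: the URLs, the in-memory `FAKE_SITE` dictionary and the helper names `get_links` and `get_specs` are all hypothetical stand-ins for downloading a page and extracting data from it.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "site": each page URL maps to the links found on it.
FAKE_SITE: Dict[str, List[str]] = {
    "/maruti": ["/maruti/swift", "/maruti/alto"],
    "/maruti/swift": ["/maruti/swift/lxi", "/maruti/swift/vxi"],
    "/maruti/alto": ["/maruti/alto/std"],
}


def get_links(url: str) -> List[str]:
    """Stand-in for: download the page and extract the links on it."""
    return FAKE_SITE.get(url, [])


def get_specs(url: str) -> Dict[str, str]:
    """Stand-in for: download a version page and extract its specifications."""
    return {"url": url, "version": url.rsplit("/", 1)[-1]}


def full_rescrape(
    make_urls: List[str],
    links: Callable[[str], List[str]] = get_links,
    specs: Callable[[str], Dict[str, str]] = get_specs,
) -> List[Dict[str, str]]:
    """Make pages -> model URLs -> version URLs -> one row per version."""
    rows = []
    for make_url in make_urls:
        for model_url in links(make_url):
            for version_url in links(model_url):
                rows.append(specs(version_url))
    return rows


# A full update simply replaces last month's rows with this result.
database = full_rescrape(["/maruti"])
```

Because every run rebuilds the whole table, this approach needs no duplicate detection, but it only works while every version page is still reachable from the make pages.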

In February 2017 the Carwale website removed (hid) the URLs leading to discontinued models, so my database contains valuable data that you can no longer get from Carwale yourself. I kept updating the database by getting the version URLs of new cars only, adding them to the existing data, comparing the unique ID number in each URL and deleting duplicates, then scraping all version URLs (new and discontinued) to get specifications, including the current price, or the last recorded price for discontinued cars.
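A minimal sketch of merging new version URLs into the existing list by unique ID. The old Carwale URL format is not shown here, so the pattern below (digits at the very end of the URL, after a hyphen) is purely an assumption for illustration, as are the example URLs.

```python
import re
from typing import Dict, Iterable, List, Optional

# ASSUMPTION: the unique ID is a run of digits at the end of the URL.
ID_PATTERN = re.compile(r"-(\d+)/?$")


def url_id(url: str) -> Optional[str]:
    """Return the trailing numeric ID of a URL, or None if there is none."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


def merge_version_urls(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Keep one URL per ID: existing URLs first, then unseen new ones."""
    by_key: Dict[str, str] = {}
    for url in list(existing) + list(new):
        key = url_id(url) or url  # fall back to the URL itself if no ID
        if key not in by_key:
            by_key[key] = url
    return list(by_key.values())


# Two different cars with the same name are kept apart by their IDs.
existing_urls = ["/maruti/swift/vxi-1001/", "/maruti/swift/vxi-1002/"]
new_urls = ["/maruti/swift/vxi-1002/", "/maruti/swift/zxi-1003/"]
merged = merge_version_urls(existing_urls, new_urls)
```

Here `merged` holds three URLs: both same-named `vxi` versions survive because their IDs differ, and the repeated `1002` entry is dropped.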

In November 2017 Carwale removed the unique ID from each URL, which was the ONLY way to distinguish multiple cars with exactly the same name. All car URLs were changed and redirected to new URLs without the ID number at the end. In 10 cases the old version URL redirects to 404 Not Found, and in 197 cases the old version URL redirects to the wrong car (multiple old URLs redirect to the same new URL because of identical model names), making it impossible for me to re-scrape old cars for updates without risking the loss of model versions.

The only way to update the database now is to run the scraper on new cars only, add the data to the New & Old cars database, and use an Excel formula to identify duplicate URLs and delete them. I assume the remaining URLs are cars launched in the last month and add them at the bottom of the database. I add new cars each month, but can no longer update the data of older cars (for example the price, which changes often). This is not 100% reliable: if Carwale changes or corrects a model name, this results in a different URL and I will add it to the database as a new model, and if a model is discontinued and replaced by a new model with the same name and URL, the new one will not be included.
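The URL-based duplicate check can be sketched in a few lines. This is not the Excel formula itself, just an equivalent illustration, and the example URLs are made up to show both failure modes described above.

```python
from typing import Iterable, List


def find_new_urls(database_urls: Iterable[str], scraped_urls: Iterable[str]) -> List[str]:
    """Return scraped URLs not already in the database, in scrape order."""
    known = set(database_urls)
    new_urls = []
    for url in scraped_urls:
        if url not in known:
            known.add(url)  # also guards against repeats within one scrape
            new_urls.append(url)
    return new_urls


# Hypothetical data: "citi" was a typo that the source site later corrected.
database_urls = ["/honda/city/v/", "/honda/citi/zx/"]
scraped_urls = ["/honda/city/v/", "/honda/city/zx/", "/kia/seltos/htk/"]
to_append = find_new_urls(database_urls, scraped_urls)
```

With this data `to_append` contains `/honda/city/zx/` (a corrected name wrongly treated as a new car) and `/kia/seltos/htk/` (a genuinely new car), while a new generation reusing `/honda/city/v/` would be skipped. A URL alone cannot tell these cases apart.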

In 2019 Carwale chose to concatenate multiple specifications into a single field (such as cubic centimetres, cylinders, valves and camshaft), causing inconsistencies in my database between old and new cars. Since old cars are no longer shown on the Carwale website, I cannot scrape data again for ALL cars as I did in 2015-2017, so my database’s quality is at risk if Carwale continues to make changes to the website (if you purchase the “new cars only” database, don’t worry, it is consistent).
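One possible way to bring such a concatenated field back into separate columns is to pull each component out with a regular expression. The exact text Carwale uses is not given here, so the sample string and the column names below are assumptions for illustration only.

```python
import re
from typing import Dict, Optional


def split_engine_field(text: str) -> Dict[str, Optional[str]]:
    """Split a combined engine description into separate columns.

    ASSUMPTION: the field looks like
    "1197 cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC".
    Components that are not found are left as None.
    """
    patterns = {
        "displacement_cc": r"(\d+)\s*cc\b",
        "cylinders": r"(\d+)\s*Cylinders?\b",
        "valves_per_cylinder": r"(\d+)\s*Valves?\b",
        "camshaft": r"\b(DOHC|SOHC)\b",
    }
    result: Dict[str, Optional[str]] = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        result[column] = match.group(1).upper() if match else None
    return result


engine = split_engine_field("1197 cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC")
```

Splitting only helps for rows scraped after the change. Older rows already have separate columns, so the two generations of data still need to be checked against each other.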

~500 models, 2855 versions (998 in production), ~1000 KB – 25 August 2015 (25 columns). 3 eurocents/model = 85.65 euro.
Dimensions database: 3 eurocents/model / dimensions (6 columns).

~500 models, 2904 versions (partial update), 5393 KB – 28 October 2015 (176 columns).

528 models, 3121 versions (1044 in production), 5969 KB – 22 December 2015 (178 columns), 4 eurocents/model = 124.84 euro.
After the 3rd sale I re-launched the database in 4 formats: Make & Model (310 models), Dimensions (515 models), Basic Specs (2 eurocents/model), Full Specs & Features (4 eurocents/model).

549 models, 3214 versions, unreleased – 1 February 2016.

557 models, 3254 versions, unreleased – 1 March 2016.

547 models, 3290 versions (1078 in production), 6474 KB – 14 March 2016 (179 columns), 4 eurocents/model = 131.60 euro.

561 models, 3303 versions, unreleased – 1 April 2016.

564 models, 3354 versions, unreleased – 1 May 2016.

569 models, 3404 versions (1045 in production), 6556 KB – 1 June 2016.
Price capped at 120 euro, no further increases.
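The prices quoted in the entries above match the number of versions multiplied by the per-unit rate (2855 × 3 eurocents = 85.65 euro, 3121 × 4 eurocents = 124.84 euro, 3290 × 4 eurocents = 131.60 euro). A small sketch of that arithmetic, assuming the 120 euro cap is simply a maximum applied to the result; the function names are hypothetical.

```python
from typing import Optional


def database_price_cents(versions: int, rate_cents: int,
                         cap_cents: Optional[int] = None) -> int:
    """Price in eurocents: versions x rate, optionally limited by a cap."""
    price = versions * rate_cents
    if cap_cents is not None:
        price = min(price, cap_cents)
    return price


def format_euro(cents: int) -> str:
    """Format an integer number of eurocents as 'E.CC euro'."""
    return f"{cents // 100}.{cents % 100:02d} euro"


august_2015 = format_euro(database_price_cents(2855, 3))
december_2015 = format_euro(database_price_cents(3121, 4))
march_2016 = format_euro(database_price_cents(3290, 4))
# ASSUMPTION: from June 2016 the cap replaces any higher computed price.
june_2016 = format_euro(database_price_cents(3404, 4, cap_cents=12000))
```

Working in whole eurocents avoids floating-point rounding when multiplying by rates such as 0.03.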

574 models, 3432 versions (1098 in production), 6877 KB – 1 July 2016.
Make & Model 319 models, Dimensions 472 models. Added ID column.

576 models, 3469 versions (1082 in production), 7312 KB – 1 August 2016 (183 columns).
Basic specs and No specs now also include prices. The Status (production/discontinued) column was removed because prices do this job.

579 models, 3509 versions (1101 in production), 7414 KB – 1 September 2016.

582 models, 3555 versions (1100 in production), 7125 KB (except colors) – 1 October 2016.

596 models, 3615 versions (1118 in production), 7634 KB – 1 November 2016 (186 columns).
Status column added back, prices removed from No specs. Image URL added. One customer told me to scrape a used cars website, and this way I found additional discontinued makes that were missing from my database: Mahindra-Renault (Logan and Sandero models that aren’t listed under either Mahindra or Renault), also Chrysler, Maini, Maybach and Willys, a total of 9 models and 50 versions.

605 models, 3661 versions (1115 in production), 7769 KB – 1 December 2016.

610 models, 3680 versions (1128 in production), 7824 KB – 1 January 2017 (188 columns).
Added car class and body style, two columns added in October as a custom package for a specific customer and now offered to all customers.
No specs (5 columns) 505 KB, Basic specs (26 columns) 1357 KB.
Dimensions 535 models (10 columns) 169 KB, added car class and body style.
Make & Model 353 models (4 columns) 76 KB, added status and car class.

613 models, 3725 versions (1165 in production), 8002 KB – 1 February 2017.

619 models, 3795 versions (1147 in production), 8084 KB – 4 March 2017.

3843 versions (1155 in production), 8505 KB – 1 April 2017.

3945 versions (1122 in production), 9004 KB – 1 May 2017. URL column added.

3955 versions (1145 in production), 9315 KB – 2 June 2017. Some changes in the source website caused the file size to increase.

3980 versions (1140 in production), 9108 KB – 1 July 2017.

4016 versions (1144 in production), 9236 KB – 1 August 2017.

4062 versions (1168 in production), 9605 KB – 1 September 2017.

4097 versions (1179 in production), 9670 KB – 1 October 2017.

4144 versions (1192 in production), 10251 KB – 1 November 2017.

4181 versions (1161 in production), 10397 KB – 1 December 2017.

4218 versions (1180 in production), 11371 KB – 1 January 2018.

4246 versions (1134 in production), ???? KB – 1 February 2018.

4274 versions (1127 in production), 11544 KB – 1 March 2018.

4296 versions (1125 in production), 11596 KB – 1 April 2018.

4332 versions (1146 in production), 11702 KB – 1 May 2018.

4372 versions (1141 in production), 11832 KB – 1 June 2018.

4401 versions (1139 in production), 11884 KB – 1 July 2018.

4418 versions (1135 in production), 11939 KB – 1 August 2018.

4480 versions (1145 in production), ? KB – 1 September 2018.

4511 versions (1160 in production), ? KB – 1 October 2018.

4562 versions (1168 in production), ? KB – 1 November 2018.

4616 versions (1176 in production), ? KB – 1 December 2018.

4639 versions (1180 in production), 12663 KB – 1 January 2019.

4683 versions (1178 in production), ? KB – 1 February 2019.

4721 versions (1177 in production), ? KB – 1 March 2019.

4756 versions (1168 in production), 12982 KB – 1 April 2019.

4803 versions (1158 in production), 13137 KB – 1 May 2019.

4850 versions (1171 in production), 13271 KB – 1 June 2019.

4900 versions (1214 in production), 13431 KB – 1 July 2019.

4929 versions (1172 in production), ? KB – 1 August 2019.

4983 versions (1200 in production), ? KB – 1 September 2019.

5038 versions (1225 in production), 13782 KB – 12 October 2019.

5046 versions (1227 in production), 13782 KB – 1 November 2019.

5066 versions (1226 in production), 13808 KB – 3 December 2019.

5093 versions, new cars only: 288 models, 1247 versions, 14109 KB – 1 January 2020, added 6 more columns.

5155 versions, new cars only: 292 models, 1236 versions, 14259 KB – 1 February 2020.

5236 versions, new cars only: 294 models, 1231 versions, ? KB – 7 March 2020.

5298 versions, new cars only: 289 models, 1136 versions, 14639 KB – 1 April 2020.

1 May 2020: 5364 versions, new cars only: 280 models, of which 208 models have 996 versions. The adoption of BS6 norms on 1 April ended production of numerous models, thus the version count dropped.

1 June 2020: 5398 versions, new cars only: 277 models, of which 201 models have 940 versions.

1 July 2020: 5436 versions, new cars only: 276 models, of which 193 models have 933 versions.

1 August 2020: 5? versions, new cars only: 248 models, of which 171 models have 833 versions.

In August 2020 the Carwale website was redesigned, and I spent a few hours editing the scraper’s XPath expressions. The Carwale page source no longer includes Make and Model separated from Version, so the only place to get this information was the URL, which does not have correct capitalization.
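A rough sketch of recovering Make and Model from a URL and repairing the capitalization. The URL layout (`/<make>-cars/<model>/`) and the `NAME_FIXES` table are assumptions made for this example. Simple title-casing gets most names right but needs manual overrides for acronyms.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical manual overrides for names that title-casing gets wrong.
NAME_FIXES: Dict[str, str] = {"bmw": "BMW", "mg": "MG"}


def slug_to_name(slug: str) -> str:
    """Turn a lowercase URL slug like 'maruti-suzuki' into 'Maruti Suzuki'."""
    words = slug.replace("-", " ").strip()
    return NAME_FIXES.get(words, words.title())


def make_and_model(url: str) -> Tuple[str, str]:
    """Extract (make, model) from a URL.

    ASSUMPTION: the path looks like /<make>-cars/<model>/...
    """
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) < 2:
        raise ValueError(f"URL has no make/model segments: {url}")
    make_slug = segments[0]
    if make_slug.endswith("-cars"):
        make_slug = make_slug[: -len("-cars")]
    return slug_to_name(make_slug), slug_to_name(segments[1])


swift = make_and_model("https://www.carwale.com/maruti-suzuki-cars/swift/")
x1 = make_and_model("https://www.carwale.com/bmw-cars/x1/")
```

Hyphenated names are a known weak spot of this approach: a slug such as `e-tron` would come out as `E Tron`, so any such model needs its own entry in the overrides table.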

In October 2020 Carwale re-added (some of) the discontinued models, allowing me to re-scrape ALL cars and not just the ones currently in production, but this resulted in only 3317 model versions. I emailed update notifications to 30+ customers asking which they preferred: (1) continue adding new cars to the existing database each month, bearing the risk of the inconsistencies and duplicates described above, plus possibly even more inconsistencies in the future if Carwale redesigns its website again, or (2) start a new database containing new and discontinued cars, ONLY those currently shown on Carwale (2000 cars fewer), with consistent data in each column and without duplicates.

Of the 30+ past customers emailed, only 2 replied, both choosing option 2. Another 2 new customers also chose option 2. So in December 2020 I redesigned the database according to customer preference and the current design of Carwale, solving the inconsistency problems.

January 2021 had 1301 production cars and 2441 discontinued cars showing on Carwale, and I added 2570 cars from the 2020 database that do not currently exist on Carwale.

March 2021 has 1074 production cars and 2512 discontinued cars, and I added 2806 cars from the 2020 database that do not currently exist on Carwale (I copy-pasted all URLs from the 2020 database into the current database and deleted duplicates). The huge variations in the number of cars indicate that Carwale is changing URLs over time, and because Carwale has not shown a unique ID for each car since 2017, offering you a complete and duplicate-free database has become HELL.

1061 versions in production, 3609 versions including discontinued, 6748 versions including deleted – 22 May 2021

1072 versions in production, 3497 versions including discontinued, 7797 versions including deleted – 1 August 2021

1101 versions in production, 3576 versions including discontinued, 7846 versions including deleted – 1 September 2021

1236 versions in production, 3878 versions including discontinued, 8371 versions including deleted – 1 November 2021

Due to frequent and unexpected changes to the Carwale website starting in 2021, keeping the updates going now requires paying third-party programmers, and low sales in India (compared with the EU & US) make this project no longer viable, especially because there is NO GUARANTEE that Carwale won’t block scrapers completely tomorrow. I could start scraping data from Cardekho instead (with my own tools, without paying programmers), but I was afraid that offering a new database incompatible with the Carwale-based database would make old customers unhappy.

In May 2022 one of my customers claimed that he had made a scraper for Carwale and would send it to me, but he only kept me waiting. In July 2022 a freelancer from Egypt, whom I had previously hired for other projects, came back online, so I asked him if he could make a scraper for Carwale. Thanks to him I was able to resume updates for the India Car Database, and he found 30 additional columns not included in my previous updates. A BETA database was published on 2 August 2022 so that my customers could check it for errors. I asked him to make some changes, such as adding the price as a numerical value without Lakh and Crore, but he has disappeared, so the next updates will be in the same format as August. I am looking for another programmer who is an expert in Python.
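The requested change, a numerical price without Lakh and Crore, amounts to multiplying by 100,000 (1 Lakh) or 10,000,000 (1 Crore). A minimal sketch, assuming price strings such as “₹ 5.99 Lakh” or “Rs. 1.2 Crore”; the actual text in the database may differ, and the function name is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# 1 Lakh = 100,000 rupees; 1 Crore = 10,000,000 rupees.
UNITS = {"lakh": Decimal(100_000), "crore": Decimal(10_000_000)}

# A number (optionally with thousands commas and decimals), then the unit.
PRICE_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(lakh|crore)", re.IGNORECASE)


def price_to_rupees(text: str) -> Optional[int]:
    """Convert e.g. 'Rs. 5.99 Lakh' to 599000; return None if not parseable."""
    match = PRICE_PATTERN.search(text)
    if not match:
        return None
    amount = Decimal(match.group(1).replace(",", ""))
    return int(amount * UNITS[match.group(2).lower()])


hatchback = price_to_rupees("₹ 5.99 Lakh")
luxury = price_to_rupees("Rs. 1.2 Crore")
```

`Decimal` is used instead of `float` so that values like 5.99 Lakh convert to exactly 599000 rather than a number that is off by a rounding error.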

1126 models, 8369 versions – 02 Aug 2022
1131 models, 8503 versions – 12 Nov 2022
1176 models, 8742 versions – 26 Mar 2023
1177 models, 9359 versions – 25 Jun 2023
1178 models, 9376 versions – 1 Aug 2023
1179 models, 9591 versions – 4 Dec 2023
1179 models, 9686 versions – 15 Mar 2024

Each time I ran Mohamed’s Python scraper on the Carwale website I got a few more rows, giving the impression that everything was OK. But customers reported missing new cars. Even though I have no experience in writing Python code, I inspected his code and found that he had set up the scraper to request https://www.carwale.com/webapi/carmakesdata/getcarmakes/?type=used&module=2&year=2023 and try each year from 1998 to 2023. No wonder it was not picking up any 2024 models. 1980s models were missing from the database for the same reason. I have now set the range to 1900-2099, and this increased the number of rows from 9693 to 10973.
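The effect of the hard-coded year range can be illustrated without touching the network. The sketch below only builds the list of request URLs from the endpoint quoted above; it is a reconstruction for illustration, not Mohamed’s actual code, and the function name is hypothetical.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://www.carwale.com/webapi/carmakesdata/getcarmakes/"


def make_list_urls(first_year: int, last_year: int) -> List[str]:
    """Build one request URL per year, both end years included."""
    urls = []
    for year in range(first_year, last_year + 1):
        query = urlencode({"type": "used", "module": 2, "year": year})
        urls.append(f"{BASE_URL}?{query}")
    return urls


old_urls = make_list_urls(1998, 2023)  # original range: stops before 2024
new_urls = make_list_urls(1900, 2099)  # widened range

missed_2024 = not any(url.endswith("year=2024") for url in old_urls)
covers_2024 = any(url.endswith("year=2024") for url in new_urls)
covers_1985 = any(url.endswith("year=1985") for url in new_urls)
```

The widened range means 200 requests per run instead of 26, most of them for years with no cars, but it no longer needs editing every January.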

1276 models, 10973 versions – 4 April 2024
1291 models, 11099 versions – 16 May 2024
1298 models, 11202 versions – 13 July 2024

Bikes list of updates

24 makes, 247 models – January 2016 (initial release).
25 makes, 271 models – October 2016.
Sometime in 2017 I discovered that Bikewale has links to discontinued models, which increased the model count to over 600. I did 2 more updates in 2017 but did not track them.
27 makes, 647 models, specs for 426 models – April 2018.

In April 2019 I made a new scraper for Bikewale, adding individual versions to the Indian bikes database (in previous editions, if a bike had multiple versions, the database contained only the base version).

36 makes, 782 models, 1214 versions, specs available for 927 models – 4 April 2019.
41 makes, 845 models, 1219 versions – 13 October 2019.
41 makes, 857 models, 1271 versions – 29 January 2020.
41 makes, 866 models, 1301 versions – 1 May 2020.
41 makes, 937 models, 1386 versions – 2 August 2020.
43 makes, 947 models, 1491 versions – 20 March 2021.
49 makes, 978 models, 1569 versions – 26 May 2021 (temporary, not published).
49 makes, 979 models, 1570 versions – 4 June 2021.
50 makes, 1006 models, 1619 versions – 1 September 2021.
50 makes, 1088 models, 1691 versions – 12 January 2022.
58 makes, 1122 models, 1871 versions – 10 November 2022.
58? makes, 1151 models, 1925 versions – 26 March 2023.
58? makes, 1213 models, ? versions – 26 June 2023.
66 makes, 1322 models – 1 April 2024. I spent a few days rewriting XPaths in C# because the Bikewale website changed a lot. The database now has more columns, but I was no longer able to extract color information, nor to scrape each version individually, so the 2024 update has 1 row per model. Maybe in the next update I will figure out a way to have 1 row per version, like previous updates.
67 makes, 1351 models – 14 July 2024.

2 comments

  1. HI,
    Came across your page. Am working on a startup idea and in need of database for past and present cars in India covering make, model, specs and features.
    Pls contact me.
    Thanks and regards,
    Atul
    #9911515335

    1. I can see that you started a chat on 7 January but haven’t asked any question. What answer did you expect?

      Feel free to buy any of my databases, what are you waiting for? Do you have any questions before purchasing? Send a chat message again if you need my help.
