Progress Dashboard

Tracking the progress as I resurrect abandoned CoolRunning.com race results

CoolRunning.com came online sometime in the mid-1990s and by the early 2000s was a widely used, free site for race directors in the New England region to promote road races and post results. In early 2020, Active.com announced it was buying CoolRunning and, with that, retiring the CoolRunning site. Active stated that results would not be migrated to Active's own results repository. Active announces CoolRunning Shutdown

Fortunately, all is not lost! Thanks to web-crawler sites like the Internet Archive's Wayback Machine, there are saved copies, aka captures, of many, but alas not all, of these abandoned race results. Using the Wayback Machine API and the wget utility, I have downloaded these page captures where available. I then reformat the result file HTML where possible, removing extraneous content, and build a new race result catalog/database.

I started the conversion effort with my home state of Massachusetts - the state with the most postings (~30K of ~65K race postings, ~45%). I'll then work my way through the rest of the Northeast region (~75%) and later the entire U.S., based on post population. Postings pre-1999 did not have consistent naming conventions, formatting or folder structure. These conversions will require more time and effort and will be done later as time dictates.

Not all result files have been archived. There are near-zero results captured from 2019. A majority of the 2018 postings have no capture after August. Also, secondary result files for a race were only sporadically captured. Secondary result sets are separate webpages, typically including results for multi-distance events, age group winners, team scoring, race stories, photos etc. I'm projecting that about 75% of the posted result content can be restored. The graphs and tables on this page illustrate the race posting demographics and conversion progress.

The scope of this effort will be strictly the race results, stories and images posted by the race directors. Any of the content authored by CoolRunning will not be restored. For example: race calculators, articles and site graphics/images.

Results found here on this site are 'as is'. No corrections will be made.

- R. Landry April 2024

CoolRunning results were mostly organized within a two-digit year and state URL path structure, although postings pre-1999 have less structure. In general there is a unique webpage listing the races by state and year. The URL for 2006 Massachusetts results looked like this: www.coolrunning.com/results/06/ma.shtml
Opening this file's HTML source, you could find the race date, race name and URL of each posted race.
While there is some variation, especially pre-1999, each race had a shortened filename based on race date and name.
The URL to the 2006 New Bedford Half Marathon results looked like this: www.coolrunning.com/results/06/ma/Mar19_29thNe_set1.shtml
The primary race result page had a filename ending like ..._set1.shtml, and secondary result sets were named ..._set2.shtml, ..._set3.shtml and so on.
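
To make the naming scheme concrete, here is a minimal sketch that splits a result URL into its parts. The regular expression is my own reading of the pattern described above (two-digit year, state folder, month and day, shortened name, set number); it is not CoolRunning's actual specification, and pre-1999 files may not match it. The names ResultFile and parse_result_url are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ResultFile(NamedTuple):
    year: str        # two-digit year from the path, e.g. "06"
    state: str       # state folder from the path, e.g. "ma"
    month: str       # month abbreviation from the filename, e.g. "Mar"
    day: int         # day of month from the filename, e.g. 19
    stub: str        # shortened race name, e.g. "29thNe"
    set_number: int  # 1 = primary result page, 2 and up = secondary sets


# Assumed layout: /results/<yy>/<state>/<Mon><dd>_<stub>_set<n>.shtml
_PATTERN = re.compile(
    r"/results/(?P<year>\d{2})/(?P<state>[a-z]{2})/"
    r"(?P<month>[a-z]{3})(?P<day>\d{1,2})_(?P<stub>.+)_set(?P<set>\d+)\.shtml$",
    re.IGNORECASE,
)


def parse_result_url(url: str) -> Optional[ResultFile]:
    """Return the parsed parts of a result URL, or None if it does not fit."""
    # Tolerate backslash-separated paths as well as normal URLs.
    match = _PATTERN.search(url.replace("\\", "/"))
    if match is None:
        return None
    return ResultFile(
        year=match["year"],
        state=match["state"].lower(),
        month=match["month"],
        day=int(match["day"]),
        stub=match["stub"],
        set_number=int(match["set"]),
    )
```

A state listing page such as www.coolrunning.com/results/06/ma.shtml does not fit the pattern, so the function returns None for it; that makes it easy to separate listing pages from result pages.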

CoolRunning was shut down around February 2020, but it was late in 2019 when race directors started noticing issues posting results.
With the Wayback Machine search (CDX) API you can query on a full or partial URL path, returning the Wayback URLs of the zero, one or many historical captures. Based on the available data and files captured, Wayback's web crawler appears to have stopped capturing site pages right around August 2018.
The following HTTP request returns the captures found for 2014 Massachusetts results.
https://web.archive.org/cdx/search/cdx?url=www.coolrunning.com/results/14/ma/*&output=json&limit=100000&matchtype=prefix&filter=statuscode:200&page=0

[
[
"urlkey",
"timestamp",
"original",
"mimetype",
"statuscode",
"digest",
"length"
],
[
"com,coolrunning)/results/14/ma/apr10_srrthu_set1.shtml",
"20141122183412",
"http://www.coolrunning.com/results/14/ma/Apr10_SRRThu_set1.shtml",
"text/html",
"200",
"PH6CO2NRFMIMNYUAAM4TDAYJSXIWRQEK",
"6493"
],
[
"com,coolrunning)/results/14/ma/apr12_20than_set1.shtml",
"20141122182801",
"http://www.coolrunning.com/results/14/ma/Apr12_20thAn_set1.shtml",
"text/html",
"200",
"NKRAAOOJQ4XUUSZOFYRIORB63YZTVCCX",
"30343"
],
[
"com,coolrunning)/results/14/ma/apr12_2ndann_6_set1.shtml",
"20141122134645",
"http://www.coolrunning.com/results/14/ma/Apr12_2ndAnn_6_set1.shtml",
"text/html",
"200",
"QSCESWHKF24NIDDHGH3TFMN4HGJ3WX4D",
"7915"
],
        
From this response you need to parse out each unique URL and find its most recent capture using the timestamp, then write these as line items to a text file like this:
http://web.archive.org/web/20141122183412/http://www.coolrunning.com/results/14/ma/Apr10_SRRThu_set1.shtml
http://web.archive.org/web/20141122182801/http://www.coolrunning.com/results/14/ma/Apr12_20thAn_set1.shtml
http://web.archive.org/web/20141122134645/http://www.coolrunning.com/results/14/ma/Apr12_2ndAnn_6_set1.shtml
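
The query and selection steps above can be sketched as follows. This is an illustration rather than the code actually used for this project: the function names are hypothetical, the query parameters are copied from the request shown earlier, and the selection assumes the first row of the JSON response is the header row, as in the sample output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(year, state):
    """Build the capture query for one year/state folder (same parameters as above)."""
    params = {
        "url": f"www.coolrunning.com/results/{year}/{state}/*",
        "output": "json",
        "limit": 100000,
        "matchtype": "prefix",
        "filter": "statuscode:200",
        "page": 0,
    }
    # Keep "/", ":" and "*" unescaped so the URL reads like the hand-written one.
    return CDX_ENDPOINT + "?" + urlencode(params, safe="/:*")


def fetch_rows(query_url):
    """Download and decode the JSON response (performs a network request)."""
    with urlopen(query_url) as response:
        return json.load(response)


def latest_capture_urls(rows):
    """Return one Wayback URL per unique page, using its most recent capture."""
    if not rows:
        return []
    header, *records = rows
    key_i = header.index("urlkey")
    ts_i = header.index("timestamp")
    orig_i = header.index("original")
    latest = {}
    for record in records:
        key = record[key_i]
        # Timestamps are fixed-width digit strings, so string comparison
        # orders them chronologically.
        if key not in latest or record[ts_i] > latest[key][ts_i]:
            latest[key] = record
    return [
        f"http://web.archive.org/web/{rec[ts_i]}/{rec[orig_i]}"
        for rec in latest.values()
    ]


def write_search_list(urls, path):
    """Write one capture URL per line, ready to hand to wget."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
```

Grouping on the urlkey column rather than on the original URL means that captures of the same page recorded with different letter case or host prefix collapse into a single entry.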
        
This file is then passed to the wget program as an argument to kick off the download of the requested captures:
-x -nH --cut-dirs=3 --input-file=\wayback\search\searchlist-14.txt
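
For reference, the same invocation can be assembled from a script. The sketch below only wraps the options shown above and assumes wget is installed and on the PATH; the function names are hypothetical. Per the wget manual, -x forces creation of a local directory hierarchy, -nH leaves out the host-name directory, --cut-dirs=3 drops the first three remote directory components, and --input-file reads the URLs to fetch from the given file.

```python
import subprocess
from typing import List


def wget_command(list_file: str) -> List[str]:
    """Build the wget argument list for a file of capture URLs."""
    return [
        "wget",
        "-x",                         # force creation of directories
        "-nH",                        # no host-name directory
        "--cut-dirs=3",               # skip the first three remote path components
        f"--input-file={list_file}",  # read URLs from this file, one per line
    ]


def download_captures(list_file: str) -> None:
    """Run wget and raise CalledProcessError if it exits with an error."""
    subprocess.run(wget_command(list_file), check=True)
```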
        


The captured results are a bit ugly. They contain many links, images and ad placements embedded in the captured page's HTML source. These links and resources/images may no longer be available, and when they are available they may take time to load.
I started downloading (100K+ pages and supporting files/images) and converting files back around 2021.
I then wrote code to parse through each state and year file, collecting the race dates, names and result URLs to assemble a new catalog of the results that were available.
Next, I spun through each result file, pulling apart and reformatting the result pages. The actual result content was conveniently tagged at the start and end, which made stripping out the unwanted stuff easy enough. Remapping and identifying dead result set URLs and resource images took a little more work. Links or images located within the CoolRunning site were downloaded and relinked when available, or removed when not.
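
Stripping a page down to the tagged content can be sketched like this. The two marker strings are placeholders that I made up for illustration; the real start and end tags in the CoolRunning pages are not reproduced here, so substitute whatever the captured HTML actually contains.

```python
from typing import Optional

# Hypothetical markers, NOT the real CoolRunning tags.
START_MARKER = "<!-- RESULTS START -->"
END_MARKER = "<!-- RESULTS END -->"


def extract_results(html: str,
                    start_marker: str = START_MARKER,
                    end_marker: str = END_MARKER) -> Optional[str]:
    """Return the text between the markers, or None if either marker is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end].strip()
```

Returning None rather than an empty string when a marker is missing makes it easy to flag pages that need a different conversion rule, such as older results with more varied page syntax.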

As of April 2024 this is still a work in progress. I'm still improving the convert/reformat logic as needed, particularly for older results where the page syntax was more varied. I also only just started validating my work: making sure all the available captures were downloaded; all result sets and content links were successfully remapped when available or removed when not; result content was not inadvertently damaged or lost in the conversion process; and finally, ensuring all converted files and linked resources have been properly uploaded to the new site.
View the Race Results Repository

Posted Counts: Count of race links found by parsing through each state and year listing page. Incomplete; a best guess based on available information, since captures of state listing pages stopped around Aug 2018.

Capture count: Count of races where the result page URL parsed from the state page returned a capture from the Wayback query API that was successfully downloaded.

Convert Count: Count of races whose result page was successfully converted and uploaded to the new site.