A Metric We Recommend You Avoid: Total Ad Counts

Posted on September 22, 2021 by Greg Chmura

If you use online jobs data for labor market insights, one metric in particular may be more misleading than helpful: the total ad count.

With Chmura’s online job ads dataset, called Real-Time Intelligence (RTI), we aim to provide a clear and reliable picture of what's happening in the job market. For that reason, we do not show total ad counts in RTI, because their value is questionable. Furthermore, as described below, the harder we tried to capture an accurate and complete total count, the more the quality of the entire dataset would suffer.

Online job ads appear on three basic types of websites: the company’s own webpage; direct aggregator sites, where the business knowingly promotes its job ads, often by paying a third party to list them; and indirect aggregator sites, which show ads from multiple businesses, not necessarily with those companies' knowledge.

If the same ad shows up on fourteen different websites, the unique, deduplicated count would be “one” unique ad. However, the total count would be “fourteen” since it shows up in fourteen different places.
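A minimal Python sketch of the difference between the two counts follows. The record fields, the sample sites, and the choice of (employer, title, location) as the deduplication key are hypothetical assumptions for illustration, not RTI's actual matching rules.

```python
from typing import NamedTuple

class AdPosting(NamedTuple):
    """One observed copy of a job ad on one website (hypothetical schema)."""
    employer: str
    title: str
    location: str
    site: str

def dedup_key(ad: AdPosting) -> tuple[str, str, str]:
    # Hypothetical key: copies that agree on these fields are treated as one ad.
    return (ad.employer, ad.title, ad.location)

def total_count(ads: list[AdPosting]) -> int:
    # Every copy counts, so one ad on fourteen sites counts fourteen times.
    return len(ads)

def unique_count(ads: list[AdPosting]) -> int:
    # Copies that share a key collapse into a single ad.
    return len({dedup_key(ad) for ad in ads})

# The same ad observed on fourteen different (hypothetical) sites.
ads = [AdPosting("Acme Corp", "Data Analyst", "Richmond, VA", f"site{i}.example")
       for i in range(1, 15)]

print(total_count(ads))   # 14
print(unique_count(ads))  # 1
```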

Some might suggest that this total count of fourteen is a helpful metric. If you are looking for accurate labor market insights, it is not.

If Chmura were able to limit its search to companies' own webpages and direct aggregator sites, we could see how many different places a business is deliberately advertising a job online.[1] If that were possible, the count might arguably correlate with the effort a company is putting into getting the job filled.

Unfortunately, in practice, it is not always possible to distinguish between direct and indirect aggregators.

Moreover, indirect aggregators are so numerous that their copies significantly inflate the total ad count. The number of sites where a job appears says less about the effort a company is putting into filling that job than about the online job board environment.

Quality is another major factor. Including ads from any and all aggregators in Chmura's dataset would compromise its integrity by introducing numerous inaccuracies.

While the accuracy, consistency, and completeness of ads can be good or bad on any site, these factors are typically more problematic on indirect aggregator sites, perhaps because employers do not have control over the ad content on these sites.

For example, instead of the full job description appearing, a site may show only the first few sentences. Or, the location of the workplace may be altered if the aggregator site follows its own protocols rather than strictly following the original presentation from the employer. Another issue is that an indirect aggregator can leave an ad up long after the employer has removed the post from its own website, making the ad appear to still be active even though it is not.

These may not sound like large problems, but each alteration has consequences for the data, and none of them are positive: wrong locations, missing job requirements, and incorrect ad durations all reduce accuracy.

Combined, these variations also sabotage deduplication: copies of the same ad no longer match one another, so they may be counted as separate ads, which ultimately produces inaccurate unique job counts.
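To illustrate, consider a deliberately simplified Python sketch in which copies are matched only when their employer, title, location, and description agree after light text cleanup. This exact-match rule, the field names, and the sample values are hypothetical assumptions chosen for clarity; they are not Chmura's deduplication method.

```python
def normalize(text: str) -> str:
    # Light cleanup only: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())

def dedup_key(ad: dict[str, str]) -> tuple[str, ...]:
    # Hypothetical exact-match key over fields that aggregators tend to alter.
    return tuple(normalize(ad[f]) for f in ("employer", "title", "location", "description"))

original = {
    "employer": "Acme Corp",
    "title": "Data Analyst",
    "location": "Richmond, VA",
    "description": "Analyze labor market data. Requires SQL and Python. Hybrid schedule.",
}

# A faithful copy, e.g. on a direct aggregator: identical content.
direct_copy = dict(original)

# A copy on an indirect aggregator: description truncated, location remapped.
aggregator_copy = dict(original,
                       location="Henrico, VA",
                       description="Analyze labor market data.")

ads = [original, direct_copy, aggregator_copy]
unique_ads = {dedup_key(ad) for ad in ads}
print(len(unique_ads))  # 2, although all three copies describe one job
```

Any matcher, exact or fuzzy, has less to work with when copies of the same ad disagree on the very fields used to match them.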

Derivative Metrics

Since Chmura does not recommend the use of total ad counts for labor market analysis, we likewise do not recommend the use of any derivative metrics.

For example, some would propose using the ratio of total counts to unique counts as a measure of the intensity with which an employer is attempting to fill a position. Since the total count is questionable as described above, a ratio such as this would be equally problematic.
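A small numeric sketch in Python shows how the ratio inherits the problem. It assumes, hypothetically, an employer that deliberately posts one ad in three places (its own page plus two direct aggregators) and a varying number of indirect aggregator copies that it has no part in.

```python
def intensity_ratio(total: int, unique: int) -> float:
    # The proposed "intensity" metric: total copies per unique ad.
    return total / unique

unique_ads = 1
deliberate_sites = 3  # hypothetical: employer's own page plus two direct aggregators

for indirect_copies in (0, 5, 11):
    total = deliberate_sites + indirect_copies
    print(indirect_copies, intensity_ratio(total, unique_ads))
# prints:
# 0 3.0
# 5 8.0
# 11 14.0
```

The employer's effort is identical in all three cases; only the number of indirect copies changes, yet the ratio rises from 3 to 14.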

The large number of indirect aggregators distorts these metrics, which can lead anyone using them to misleading conclusions. We therefore provide only deduplicated ad counts, which give our clients a more reliable and useful dataset for their applications.

Webpages Not Used in RTI

Chmura's dataset captures job ads from more than 40,000 websites. Some sites, however, are deliberately not used.

Sites with, for example, incomplete job description text or unreliable location information are omitted. These tend to be aggregators whose job ads, for the most part, are already captured from other sites. Any site that does not pass Chmura's quality assessment is excluded from the final dataset.

Please reach out to Chmura if you’d like to learn more about our data or economic consulting services.

[1] “Job” in the singular is being used here for ease of reference. This usage implies one job per one ad. While this may be the most usual case, in actuality, job ads frequently are placed to fill more than one position (and also may not result in any hires, such as when no suitable candidate is found).

This blog reflects Chmura staff assessments and opinions with the information available at the time the blog was written.