Google Flights is a useful platform with which to find the cheapest flights for a given trip. However, a user like me could find it difficult to peruse prices for return flights across multiple dimensions simultaneously. I capture information from the platform’s “Date Grid” and perform some very simple eyeballing.
This document was last updated on 2024-08-04.
Google Flights is a useful platform with which to find the cheapest flights for a given trip. The site offers many tools and levers for the user to fine-tune their search. However, a user like me could find it difficult to peruse prices for return flights across multiple dimensions simultaneously, specifically a) departure or return dates, and b) trip duration. Prices are surprisingly sensitive to slight changes in either dimension. To address this problem in my search for cheap tickets between Cape Town International (CPT) and Brussels Airport (BRU), I exploit and capture information from the platform’s “Date Grid”. This data is compiled, cleaned and investigated using a few simple techniques. This has aided my search for return trips I should be tracking.
Date grid
Figure 2 shows the Date grid tool. Using its Departure and Return toggles, I elicit as many date combinations (grids) as possible between 13 October and 16 November. From my initial starting point, I toggle either Return or Departure in increments of 7 days. For example, for a given week of Return dates, I shift the grid left or right (i.e. along Departure dates) by a week, whilst staying within the bounds of 13 October and 16 November. I do the same for Return dates, but then keep Departure dates fixed. I take a screenshot of each resulting grid. Naturally, there was plenty of overlap between the contents of the screenshots.
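The post doesn't state the grid's dimensions; the sketch below assumes a 7 x 7 grid (seven Departure by seven Return dates), which fits the 7-day toggling increments. It illustrates why overlapping screenshots produce duplicate cells and how collecting cells into a set counts each (Departure, Return) pair once. `grid_cells` and `collect` are hypothetical helpers, and the two grid origins are a toy example.

```python
from datetime import date, timedelta

WINDOW = (date(2023, 10, 13), date(2023, 11, 16))  # search bounds from the post

def grid_cells(dep_start, ret_start, size=7):
    """(Departure, Return) pairs shown in one hypothetical size x size Date grid."""
    deps = [dep_start + timedelta(days=i) for i in range(size)]
    rets = [ret_start + timedelta(days=j) for j in range(size)]
    return {(d, r) for d in deps for r in rets if d < r}

def collect(grid_origins, window=WINDOW):
    """Union of cells from every grid, clipped to the window; overlapping cells count once."""
    lo, hi = window
    cells = set()
    for dep_start, ret_start in grid_origins:
        cells |= {(d, r) for d, r in grid_cells(dep_start, ret_start) if lo <= d and r <= hi}
    return cells

# Toy example: same Return week, Departure weeks overlapping by four days
origins = [(date(2023, 10, 22), date(2023, 11, 5)), (date(2023, 10, 25), date(2023, 11, 5))]
g1, g2 = (grid_cells(*o) for o in origins)
print(len(g1), len(g2), len(g1 & g2), len(collect(origins)))  # 49 49 28 70
```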
Screenshots were pasted into a single .docx file, which was subsequently saved as a .pdf. Using this free Image To CSV Converter, I tabulated the prices for all of the date combinations captured in the screenshots. The resulting .csv file needed some cleaning, which was surprisingly time-consuming. Most importantly, I removed duplicate entries and illogical date combinations (“no flights”).
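The converter's actual output format isn't documented here, so the sketch below assumes a hypothetical CSV with `depart`, `return` and `price` columns, where unavailable cells read "no flights". It shows one way to perform the two cleaning steps with Python's standard `csv` module: drop "no flights" cells and keep only the first occurrence of each (Departure, Return) pair.

```python
import csv
import io

def clean_rows(csv_text):
    """Drop "no flights" cells and repeated (depart, return) pairs from converter output.

    Assumes hypothetical columns depart, return, price, e.g. "R10356" or "no flights".
    """
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        price = row["price"].strip()
        if price.lower() == "no flights":
            continue  # illogical or unavailable date combination
        key = (row["depart"].strip(), row["return"].strip())
        if key in seen:
            continue  # same cell captured in an overlapping screenshot
        seen.add(key)
        # Remove the currency prefix and any thousands separators before converting
        amount = int(price.lstrip("R").replace(",", "").replace(" ", ""))
        cleaned.append({"depart": key[0], "return": key[1], "price": amount})
    return cleaned

# Toy input, not the actual converter output
sample = """depart,return,price
2023-10-22,2023-11-01,R10356
2023-10-22,2023-11-01,R10356
2023-11-01,2023-10-22,no flights
2023-10-24,2023-11-06,"R10,356"
"""
cleaned = clean_rows(sample)
print(cleaned)
```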
By now it should be obvious to the reader that this approach is far from optimal. An API approach or, at the very least, a Date grid with more rows and columns would have made my life quite a bit easier. I am not suggesting that other approaches are impossible; rather, this was my approach, which emerged haphazardly and through some path dependence from using Google Flights as my starting point. It seems that the Google Flights API was shut down some years ago, and I did not attempt to find any suitable alternatives. I also tried web-scraping the data, but gave up fairly quickly when I could not easily determine the appropriate HTML attributes and nodes to use, given that the Date grid is generated dynamically. Please feel free to recommend more sensible paths.
The aforementioned screenshots were captured on 21 September 2023 and therefore offer only a static snapshot of prices that are inherently dynamic. After the removal of duplicate and “no flight” entries, the resulting data is presented below.
There are 595 potential combinations of dates in the range \([2023/10/13;\ 2023/11/16]\), that is, all unique combinations of Departure and Return dates where Departure precedes Return. Furthermore, trips should be no shorter than 7 days and no longer than 21 days. Filtering the data in this fashion leaves 315 potential combinations.
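These counts can be reproduced by enumerating the 35 calendar days in the window. Note that 595 corresponds to Departure strictly before Return; allowing same-day returns would give 630 combinations. A minimal sketch using only the standard library:

```python
from datetime import date, timedelta

START, END = date(2023, 10, 13), date(2023, 11, 16)
days = [START + timedelta(days=i) for i in range((END - START).days + 1)]

# Unique (Departure, Return) pairs with Departure strictly before Return
pairs = [(d, r) for d in days for r in days if d < r]

# Keep trips lasting between 7 and 21 days inclusive
eligible = [(d, r) for d, r in pairs if 7 <= (r - d).days <= 21]

# For comparison: also allowing same-day returns (Departure <= Return)
with_same_day = sum(1 for d in days for r in days if d <= r)

print(len(days), len(pairs), len(eligible), with_same_day)  # 35 595 315 630
```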
Of those remaining combinations, 28 price observations are missing, i.e. the observations I failed to capture with screenshots. As illustrated in Figure 4, I did not capture the grid of Departures in \([2023/10/13;\ 2023/10/19]\) with Returns in \([2023/10/20;\ 2023/10/26]\). I neglected these observations deliberately, because I did not deem it feasible to depart during the week of 13 to 19 October, particularly given the improbability of obtaining a Schengen visa on such short notice. Ultimately, the dataset encompasses prices for 287 different trips.
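The figure of 28 follows from the trip-length rule: within the uncaptured block of seven Departure by seven Return dates, only pairs at least 7 days apart are eligible. A short check, where `date_range` is a helper introduced for illustration:

```python
from datetime import date, timedelta

def date_range(start, end):
    """All dates from start to end inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

deps = date_range(date(2023, 10, 13), date(2023, 10, 19))
rets = date_range(date(2023, 10, 20), date(2023, 10, 26))

# Cells of the uncaptured grid that also meet the 7-to-21-day trip rule
missing = [(d, r) for d in deps for r in rets if 7 <= (r - d).days <= 21]

print(len(missing), 315 - len(missing))  # 28 287
```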
Table 1 presents the cheapest possible return trips by departure date and trip length, given my stylised criteria. At the time of sampling, the lowest price of any trip was R10356, and 18 different trips of varying departure dates and durations were priced at this level.
| Depart | Return | Trip (days) |
|---|---|---|
| 2023-10-15 | 2023-11-01 | 17 |
| 2023-10-15 | 2023-11-03 | 19 |
| 2023-10-17 | 2023-11-01 | 15 |
| 2023-10-17 | 2023-11-03 | 17 |
| 2023-10-17 | 2023-11-06 | 20 |
| 2023-10-22 | 2023-11-01 | 10 |
| 2023-10-22 | 2023-11-03 | 12 |
| 2023-10-22 | 2023-11-06 | 15 |
| 2023-10-22 | 2023-11-08 | 17 |
| 2023-10-22 | 2023-11-10 | 19 |
| 2023-10-22 | 2023-11-11 | 20 |
| 2023-10-24 | 2023-11-01 | 8 |
| 2023-10-24 | 2023-11-03 | 10 |
| 2023-10-24 | 2023-11-06 | 13 |
| 2023-10-24 | 2023-11-08 | 15 |
| 2023-10-24 | 2023-11-10 | 17 |
| 2023-10-24 | 2023-11-11 | 18 |
| 2023-10-24 | 2023-11-13 | 20 |

Prices as of 21 September 2023, in ZAR.
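Identifying the trips in Table 1 amounts to filtering the cleaned data for every row tied at the minimum price and computing each trip's length. A sketch assuming rows of hypothetical `(depart, return, price)` tuples; the first two toy rows are taken from Table 1, while the third row's price is invented for illustration and is not from the dataset.

```python
from datetime import date

def cheapest_trips(rows):
    """Return the minimum price and every trip tied at that price, with length in days.

    `rows` is a list of (depart, return, price) tuples with ISO date strings (hypothetical format).
    """
    lowest = min(price for _, _, price in rows)
    ties = []
    for dep, ret, price in rows:
        if price == lowest:
            days = (date.fromisoformat(ret) - date.fromisoformat(dep)).days
            ties.append((dep, ret, days))
    return lowest, sorted(ties)

rows = [
    ("2023-10-15", "2023-11-01", 10356),
    ("2023-10-24", "2023-11-01", 10356),
    ("2023-10-20", "2023-11-01", 12990),  # toy price, not from the dataset
]
print(cheapest_trips(rows))
```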
Figure 5 presents the distribution of prices by trip duration and a corresponding linear regression of price on trip duration. There is a statistically significant \((t = -4.326)\) inverse relationship between price and trip duration \((\beta = -91.81)\): each additional day is associated with a price roughly R92 lower. However, variation in duration alone explains only a small portion of the variation in price \((R^2 = 0.058)\).
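The post does not show how the regression in Figure 5 was estimated. The sketch below computes a simple OLS fit of price on duration with conventional (homoskedastic) standard errors, which is one standard way to obtain a slope, its t-statistic and R-squared. The toy inputs are illustrative only and will not reproduce the figures above.

```python
import math

def simple_ols(x, y):
    """Regress y on x (with intercept); return the slope, its t-statistic and R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)  # total sum of squares
    se_beta = math.sqrt(sse / (n - 2) / sxx)  # conventional standard error of the slope
    return beta, beta / se_beta, 1 - sse / sst

# Toy data only; with the real dataset, x would be trip length in days and y the price
beta, t_stat, r_squared = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
print(round(beta, 3), round(t_stat, 3), round(r_squared, 3))  # 1.9 7.181 0.963
```

A negative slope, as in Figure 5, means longer trips are associated with lower prices, while a low R-squared means duration alone leaves most of the price variation unexplained.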
Figure 6 presents the price distribution of return trips by week of departure. At first glance, prices decrease as the time between data collection (21 September) and departure increases, which accords with the notion that trips booked on shorter notice are likely to be more expensive. It is doubtful, however, that this inverse relationship is statistically significant (see Figure 7). The week beginning 30 October appears to be an exception, possibly because surge pricing applies during the first week of November. In all cases, the cheapest trips in the dataset lie in the extreme lower tails of each week’s price distribution. In other words, the cheapest tickets, the ones I am most interested in, are exceptional cases.
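Grouping trips by week of departure and measuring lead time are both simple date operations. The sketch below assumes weeks start on Monday, which is consistent with the “week beginning 30 October” (30 October 2023 was a Monday); `week_start`, `lead_time` and `prices_by_week` are hypothetical helpers, and the rows in the test are toy values.

```python
from collections import defaultdict
from datetime import date, timedelta

COLLECTED = date(2023, 9, 21)  # date the screenshots were captured

def week_start(d):
    """Monday of the week containing d (weeks assumed to start on Monday)."""
    return d - timedelta(days=d.weekday())

def lead_time(d):
    """Days between data collection and departure."""
    return (d - COLLECTED).days

def prices_by_week(rows):
    """Group (departure_date, price) pairs by the Monday of the departure week."""
    groups = defaultdict(list)
    for dep, price in rows:
        groups[week_start(dep)].append(price)
    return {week: sorted(prices) for week, prices in sorted(groups.items())}

print(week_start(date(2023, 11, 1)), lead_time(date(2023, 10, 13)))  # 2023-10-30 22
```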
This post has enabled me to take a closer look at some of the useful tools and features Google Flights has to offer. In particular, the platform is able to present the user with extraordinarily cheap tickets for a given return trip in a given time span. However, when users’ convictions about time frames and trip duration are loosely held, they may find it difficult to search for and compare promising options that vary across both dimensions. I have tried, albeit clunkily, to overcome this issue by capturing trip data across multiple date grids. I have uncovered the cheapest trips given my idiosyncratic preferences and performed some cursory analyses. I show that prices are quite sensitive to slight changes in these parameters and that longer trips tend to command lower prices. Moreover, departure dates appear to matter: a shorter lead time is likely to entail higher prices, and some departure dates are subject to higher demand and (presumably) surge pricing. The sample, of course, is limited in size and only representative of a single route for a specific period spanning slightly more than a month. Next steps: APIs, time-varying prices and improved coverage.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/WihanZA/wihan_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Marais (2023, Sept. 25). Wihan Marais: CPT to BRU: A (Very) Cursory Google Flights Analysis. Retrieved from https://www.wihanza.com/posts/2023-09-25-cpt-to-bru-a-very-cursory-google-flights-analysis/
BibTeX citation
@misc{marais2023cpt, author = {Marais, Wihan}, title = {Wihan Marais: CPT to BRU: A (Very) Cursory Google Flights Analysis}, url = {https://www.wihanza.com/posts/2023-09-25-cpt-to-bru-a-very-cursory-google-flights-analysis/}, year = {2023} }