Assessment of Applicability of Wi-Fi Analytics in Studies of Urban Public Transport Passenger Flow (Moscow Case Study)

The advantages and disadvantages of existing tools for calculating passenger flow are shown using the example of the city of Moscow. The objective of the research was to assess the possibility of using Wi-Fi data as a tool for analysing passenger flow. The authors used two types of Wi-Fi scanners and a tool they developed to analyse the collected data. The primary results of the study demonstrate that Wi-Fi data can be applied in practice to analyse passenger flow. The described empirical studies, particularly the data received from the portable Wi-Fi scanner, have shown that more than 20% of mobile devices in urban public transport and the metro are used with Wi-Fi enabled, which is clearly not enough to obtain the results necessary for a comprehensive and detailed analysis of passenger flows. Nevertheless, the accumulating data make it possible to forecast general passenger flow. A portable Wi-Fi scanner does not make it possible to cover a large part of the surveyed territory in real time (stops of urban public transport, locations where passengers enter the metro, etc.). Stationary Wi-Fi scanners could increase the amount of data and, accordingly, significantly refine the results obtained. This enhancement could also be achieved by extending the tool for studying passenger flow to urban railways, i.e., in the case of Moscow, to Moscow Central Circle and Moscow Central Diameters, as those routes provide Wi-Fi access at stations and in coaches. Data collected from Wi-Fi scanners can complement other data sources, such as fare validation, automatic systems of passenger flow monitoring, and data obtained from cellular operators. For this reason, further research in the field of Wi-Fi analytics, along with development of the technology of existing data sources for passenger flow monitoring, may result in better calculation of passenger flow.


INTRODUCTION
Correct planning of urban transport infrastructure and organisation of public spaces, transport interchange hubs, routes of urban public transport, car sharing, and taxis, as well as appropriate decision-making, require the most accurate and relevant source data, regardless of which methods of data analysis are subsequently used. This issue is especially important for megacities, given the scale of these planning and organisation tasks.
The territory of the Moscow agglomeration encompasses a huge number of routes served by different types of urban public transport, such as bus, trolleybus, tram, electric bus, and metro, including Moscow Central Circle (hereinafter referred to as MCC) and Moscow Central Diameters (hereinafter referred to as MCD), which are an integral part of urban mass transit.
In 2018, State Unitary Enterprise «Moscow Metro» carried more than 2.5 billion passengers 1, and State Unitary Enterprise «Mosgortrans» transported more than 1.28 billion passengers 2. These statistics demonstrate a high load on urban public transport and call for daily analysis of passenger traffic, as well as development of an origin-destination trip matrix.
Conventional tools for primary data collection comprise ticket validators, video sensors of passenger traffic (automatic systems of passenger flow monitoring), video surveillance, and others. With the advancement of digitalisation, new tools have emerged, such as big data analysis. One of the most promising areas is the use of Wi-Fi technologies. The use of Wi-Fi technologies for analysing passenger flow is currently developing rapidly.
Most of the numerous articles on Wi-Fi analytics deal not with passenger flows but with pedestrian flows (for example, in transport interchange hubs (TIH) or in passenger terminals of airports and their transit zones). To measure and analyse the flow of pedestrians and the paths of their movements (origin-destination trips), it is possible to use sensor technologies, such as Wi-Fi scanners, which make it possible not only to understand the volumes of travel, but also to reveal the locations where a trip starts and ends (the latter is a particularly difficult task). The authors of the studied works tested various Wi-Fi scanners to detect the maximum possible number of electronic devices, i.e., depersonalised pedestrians [1]. The applicability of technical solutions for studying pedestrian flow was well demonstrated in terms of data collection. Some algorithms have also been developed for filtering and analysing data collected from electronic devices [2]. Thus, the research studied focused on the technical aspect of implementing monitoring, in particular on the use of filters to eliminate noise from various electronic devices, while the average walking time of pedestrians in an underpass was estimated [3; 4]. However, such a filter does not accurately calculate the number of moving pedestrians. Other researchers have developed a system for analysing pedestrian traffic using Wi-Fi packet sensors [5] and have shown how a Wi-Fi detection system can be implemented and what difficulties arise in designing and managing such a system on both small and large scales [6]. Researchers [7] presented MAC address data as an effective tool for tracking and analysing the spatial and temporal dynamics of pedestrian behaviour in a shared space.
A combined Bluetooth and Wi-Fi system has been developed, and its performance evaluated, for detecting anonymous MAC addresses of devices over short distances at fixed locations [8]. The possibility of revealing pedestrian flows by analysing patterns of detection of surrounding Bluetooth devices has also been studied [9].
Several studies have also pointed to problems associated with methods of filtering Wi-Fi data [10; 11]. Some studies have been inconclusive due to the lack of quantitative measurements. Others have compared methods of observing passenger traffic, including such parameters as the number of passengers on board and the numbers boarding and disembarking at each stop, based on filtering results [12−14]. The apparent discrepancies between surveillance data and Wi-Fi filtering results indicate significant errors caused by strict threshold filtering methods. Therefore, an accurate and efficient way of separating passenger and non-passenger MAC address data is highly needed.
A co-author of this research, N. Alekseev, in collaboration with William H. K. Lam, Professor of Hong Kong Polytechnic University, wrote an article [15] describing research on counting pedestrians within the territory of Hong Kong Polytechnic University. According to the study, Wi-Fi scanners detected active Wi-Fi devices for approximately 32 to 40% of the actual number of pedestrians who passed. The article concluded that, to predict pedestrian flows, it is necessary to collect more data from Wi-Fi scanners for different periods of time (summer/winter/autumn/spring, working day/weekend).
The objective of the work is to analyse the application of Wi-Fi data to clarify the matrix of origin-destination trips between stopping points of urban public transport. The task of the work is to create and test an algorithm for calculating passenger flow in public transport using a new method of data collection, namely a Wi-Fi scanner.

Types of Data Sources on Passenger Flow in the City of Moscow
Currently, the Automatic system of fare control (ASKP) of Mosgortrans [Moscow City Transport] and the fare control system of State Unitary Enterprise «Moscow Metro» are in operation in Moscow metropolitan area. These systems control passenger travel with single transport payment cards (for example, the Troika travel card), which give access to the services of all urban public transport facilities and the metro. Some people also use social cards issued to pensioners, students, and schoolchildren. To pay for travel, passengers present their cards to validators installed in the passenger compartments of city public transport vehicles and at the entrances to Moscow Metro stations. Each validator and each card has its own unique identification number, which makes it possible to calculate the number of passengers using public transport on a daily basis and to analyse their travel.
Given the widespread use of single transport payment cards and social cards, the data obtained can be used to calculate passenger flow. The data make it possible to determine the number of passengers entering urban public transport and the metro and, in the future, to analyse and forecast passenger flows in Moscow metropolitan area. The disadvantage of calculating passenger traffic using data from validators is the lack of data on exits from public transport and the metro, and on the locations of passenger transfers (this problem applies specifically to the metro). It is possible to predict a passenger's exit point based on the next point of entry into urban public transport or the metro, but an error is possible, since the prospective passenger can leave the metro at one station and enter it at another one. For example, using validation to determine the trips of couriers who use the metro during their journeys is a daunting task. Moreover, since the system of «forced» single entrance to urban public transport vehicles [the need to pass only through the turnstile that validates the ticket or a transport payment card] was abandoned, many passengers do not pay the fare, i.e., do not use a single transport payment card or social card, and this can lead to even greater errors in calculating passenger flow. JSC Tsentralnaya PPK [Central commuter railway] has a similar system and, accordingly, similar problems.
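The prediction of an exit point from the next entry point, described above, can be illustrated with a minimal sketch. It assumes validation records reduced to hypothetical `(card_id, timestamp, stop)` tuples; the function name `infer_exits` and the record layout are illustrative and do not reproduce the actual ASKP data format.

```python
from collections import defaultdict


def infer_exits(validations):
    """Infer exit stops from the next validation of the same card.

    validations: iterable of (card_id, timestamp, stop) tuples, where
    timestamp is any sortable value (for example, minutes since midnight).
    Returns a list of (card_id, entry_stop, inferred_exit_stop_or_None).
    """
    by_card = defaultdict(list)
    for card_id, timestamp, stop in validations:
        by_card[card_id].append((timestamp, stop))

    trips = []
    for card_id, events in by_card.items():
        events.sort()
        for i, (_, stop) in enumerate(events):
            # Assumption: the passenger left the previous vehicle near the
            # stop where the card is validated next. The last trip has no
            # following validation, so its exit stays unknown (None).
            next_stop = events[i + 1][1] if i + 1 < len(events) else None
            trips.append((card_id, stop, next_stop))
    return trips


if __name__ == "__main__":
    records = [("A", 480, "Stop 1"), ("A", 1080, "Stop 7"), ("B", 500, "Stop 3")]
    for trip in infer_exits(records):
        print(trip)
```

The sketch makes the weakness named in the text visible: a passenger who exits at one station and re-enters at another is assigned the wrong exit, and an unpaid trip never appears in the records at all.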
The Automatic system of passenger flow monitoring (ASMPP) is another source of data. This system is equipped with video sensors located above the doorways of public transport vehicles. The video sensors count the number of passengers entering and leaving a vehicle. The disadvantage of this system is that it does not identify the passenger and does not show who entered public transport, when they entered, and where they left. Moreover, not all urban public transport vehicles are equipped with such systems. ASMPP is installed on individual vehicles and individual routes on an irregular basis to determine approximate passenger traffic. There is also a problem of synchronising ASMPP data with GLONASS/GPS tracks of urban public transport.
Another source of data on passenger travel is data received from mobile operators. The «Geosocial Analytics» (GSA) project is based on the data of mobile operators. Its purpose is to collect data on the population, on the dynamics of its travel, and on the load on the transport infrastructure by analysing the load on the cellular networks of mobile operators. With the help of this project, the locations where the population concentrates and the movements of cellular subscribers are being studied. The spatial resolution of the data is 500 m by 500 m, which is rather coarse when considering short trips. Table 1 shows the advantages and disadvantages of existing data sources for calculating passenger flow.
Considering the sources of data on passenger flows described above, the authors draw attention to the fact that there is no data source that would allow determining the exit point from urban public transport (from a bus, tram, trolleybus, electric bus) and from the metro, including MCD and MCC. It is for this reason that the authors considered the possibility of using a new type of data source: data obtained from Wi-Fi scanners.

Wi-Fi Analytics as a Data Source. Types of Wi-Fi Scanners
It is reported that more than 80% of people use at least one mobile Wi-Fi device in their daily life [16]. Thus, Wi-Fi based passenger flow estimation has great potential to become a more reliable method than existing tools.
A single Wi-Fi network called MT FREE operates on the territory of Moscow metropolitan area and is used to increase the attractiveness of urban public transport. This Wi-Fi network comprises Wi-Fi routers installed in all the vehicles, at public transport stops, and in metro coaches. The network is free for all residents and guests of Moscow metropolitan area, but it carries advertising content. To disable advertising content, the user is prompted to make a monthly payment. The only condition for using this free Wi-Fi network is registration of an MT FREE account.
Bearing in mind the extensive Wi-Fi network in Moscow metropolitan area and the relevant expertise developed for other purposes and locations, the authors hypothesised that Wi-Fi data could be used to determine the entry and exit points of prospective passengers.
As part of the study, the authors carried out research on calculating passenger flow using two Wi-Fi scanners on public transport, in particular, at public stops, in buses, and in metro coaches.
For the study, two Wi-Fi scanners with the function of detecting electronic devices were used. Each Wi-Fi scanner has its own detection area, which depends on the power of the Wi-Fi antenna, measured in decibels. An electronic device is identified by its unique MAC address. The MAC address is a unique identifier provided by the manufacturer for each electronic device and is a six-byte number (LL:LL:LL:XX:XX:XX), in which the first three bytes (LL:LL:LL) identify the manufacturer of the electronic device.
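The six-byte structure described above can be made concrete with a short sketch that splits an address into its manufacturer part (LL:LL:LL) and device part (XX:XX:XX). The function name `split_mac` and the sample address are hypothetical and serve only as an illustration.

```python
def split_mac(mac):
    """Split a MAC address into (manufacturer prefix, device part).

    Accepts colon- or hyphen-separated notation and returns both parts in
    lower-case colon notation, e.g. ("00:1a:2b", "3c:4d:5e").
    """
    octets = mac.strip().replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a six-byte MAC address: {mac!r}")
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hexadecimal
    return ":".join(octets[:3]), ":".join(octets[3:])


if __name__ == "__main__":
    # Made-up address used only for demonstration.
    prefix, device = split_mac("00:1A:2B:3C:4D:5E")
    print(prefix, device)
```

Grouping detections by the manufacturer prefix is one possible way to separate classes of equipment, although the source does not state that the authors did so.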
The Wi-Fi scanner reports a Received Signal Strength Indicator (RSSI) for each detected device. The area where the electronic device is located can be determined from the signal strength. Since, presumably, an electronic device with Wi-Fi enabled is used by a passenger of urban public transport, we can determine the movement tracks (origin-destination trip matrix) of prospective passengers, namely the places where they start and end their travel.
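One common way to relate signal strength to distance is the log-distance path loss model, $\mathrm{RSSI}(d) = \mathrm{RSSI}(d_0) - 10\,n\,\log_{10}(d/d_0)$. The sketch below inverts it for $d_0 = 1$ m. The source does not say which model, if any, the authors used, and the reference level of -40 dBm, the path loss exponent of 3.0, and the 10 m zone radius are assumed values that would need calibration for a real bus or metro coach.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=3.0):
    """Estimate the distance to a device from its RSSI.

    Inverts the log-distance path loss model with a 1 m reference distance.
    Both default parameters are assumptions, not measured values.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def in_detection_zone(rssi_dbm, max_distance_m=10.0, **model_params):
    """Return True if the estimated distance lies within the zone radius."""
    return estimate_distance_m(rssi_dbm, **model_params) <= max_distance_m


if __name__ == "__main__":
    for rssi in (-50.0, -70.0, -90.0):
        print(rssi, round(estimate_distance_m(rssi), 1), in_detection_zone(rssi))
```

In practice RSSI fluctuates strongly inside vehicles, so such an estimate is better suited to coarse zoning (inside or outside the vehicle) than to precise positioning.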
The first Wi-Fi scanner was received for testing from a distributor in the Russian Federation. The manufacturer was Libelium (Spain), and the model was Meshlium Xtreme. This device is large in size and is primarily intended for static use.
The principle of operation of a Wi-Fi scanner is to search for active Wi-Fi devices within its range. The Wi-Fi scanner scans active Wi-Fi devices (smartphones, laptops, tablets, printers, MFPs, etc.) within its range and in response receives the MAC addresses of the scanned devices. An example of raw data from the Meshlium Xtreme Wi-Fi scanner is given in Table 2.
This Wi-Fi scanner has built-in memory, which makes it possible to record the received data and then upload them to a local server.
Although this Wi-Fi scanner also has a Bluetooth scanner, the Bluetooth scanner was not used in this work.
The second Wi-Fi scanner was purchased from an organisation engaged in Wi-Fi analytics in the Russian Federation, namely, in analysing the behaviour of potential buyers in shopping centres. This organisation mainly uses Wi-Fi scanners that are permanently installed in shopping centres.
To solve the problem posed by the researchers, a portable Wi-Fi scanner was created. This portable Wi-Fi scanner consists of the following elements:
1. TP-LINK TL-MR3020 v3.2: a Wi-Fi router reconfigured as a Wi-Fi scanner.
2. Powerbank mi 20000: a portable power supply for the Wi-Fi scanner and GSM modem.

Research Description
The passengers were counted in two ways:
• Full-scale field counting (visual counting of the number of passengers on the bus, and of passengers entering and leaving the bus at specific stops, with reference to time).
• Scanning Wi-Fi devices at stops and inside public transport, with time reference, using a portable Wi-Fi scanner.
The initial data received were transferred to a Framework created on the basis of Excel. The Framework has the following filtering stages:
1. Removing «noise». Noise means all Wi-Fi devices detected within the range of a Wi-Fi scanner that are routers or similar equipment, since routers and other similar devices cannot be considered prospective passengers.
2. Removing MAC addresses that have only one or two time stamps. The Wi-Fi scanner makes it possible to scan the area in real time. In practice, to detect movement of prospective passengers, at least 5 detections at different points in time are required.
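The two filtering stages above can be sketched in Python as follows. The authors' Framework is Excel-based and its internals are not described, so this is only an illustration: the function `filter_detections`, the `(mac, timestamp)` record layout, and the `infrastructure_macs` set used to recognise routers are all assumptions.

```python
from collections import defaultdict


def filter_detections(detections, infrastructure_macs=frozenset(), min_detections=3):
    """Apply the two filtering stages to raw scanner records.

    detections: iterable of (mac, timestamp) pairs.
    infrastructure_macs: lower-case MAC addresses known to belong to routers
        and similar fixed equipment (how this list is built is an assumption).
    min_detections: minimum number of distinct time stamps to keep a device;
        3 drops devices seen only once or twice, 5 matches the stricter
        requirement for detecting movement.
    Returns {mac: sorted list of distinct time stamps}.
    """
    stamps = defaultdict(set)
    for mac, timestamp in detections:
        mac = mac.lower()
        if mac in infrastructure_macs:
            continue  # stage 1: remove «noise» (routers and similar devices)
        stamps[mac].add(timestamp)

    # Stage 2: remove devices with too few distinct time stamps.
    return {
        mac: sorted(times)
        for mac, times in stamps.items()
        if len(times) >= min_detections
    }


if __name__ == "__main__":
    raw = [("aa:aa:aa:00:00:01", t) for t in range(5)]
    raw += [("bb:bb:bb:00:00:02", 0), ("bb:bb:bb:00:00:02", 1)]
    raw += [("cc:cc:cc:00:00:03", t) for t in range(10)]
    print(filter_detections(raw, infrastructure_macs={"cc:cc:cc:00:00:03"}))
```

Calling the function with `min_detections=5` keeps only devices that satisfy the stricter movement criterion mentioned in the text.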

Pic. 5. Data from visual inspection and Wi-Fi scanner on November 28, 2019 (bus route No. 249).

The data obtained from urban public transport (bus) are shown in Tables 3−7 and in Pic. 3−8, and those from the metro in Table 8 and Pic. 9.
The data in the above tables show that, on average, more than 20% of prospective passengers have Wi-Fi enabled on their mobile devices, which makes it possible to determine the origin-destination trip matrix for a certain number of «passengers» (detected Wi-Fi devices) both in urban public transport and in the metro.
Pic. 1 shows a graph of the origin-destination trip matrix obtained as a result of analysing the data received from a Wi-Fi scanner in a metro coach. These results were obtained exclusively from a «portable» Wi-Fi scanner. Comparing the visual data (a count of the passengers in the coach) with the Wi-Fi data revealed that, on average, more than 20% of the passengers present in the coach travel with Wi-Fi enabled on their mobile device.
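The comparison of visual counts with Wi-Fi detections amounts to a simple ratio, which could in principle also be used to scale Wi-Fi counts up to an approximate passenger number. The sketch below shows both steps. The function names and the sample numbers are hypothetical and are not taken from the study's tables, and the scaling step assumes that the share of Wi-Fi-enabled devices is stable across trips.

```python
def wifi_share(wifi_devices, counted_passengers):
    """Share of visually counted passengers detected as Wi-Fi devices."""
    if counted_passengers <= 0:
        raise ValueError("counted_passengers must be positive")
    return wifi_devices / counted_passengers


def expand_to_passengers(wifi_devices, share):
    """Scale a Wi-Fi device count to an approximate passenger count.

    Assumes the share measured on surveyed trips also holds on the trip
    being estimated, which is a strong simplification.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return wifi_devices / share


if __name__ == "__main__":
    # Made-up numbers: 12 filtered Wi-Fi devices among 50 counted passengers.
    share = wifi_share(12, 50)
    print(share)
    print(round(expand_to_passengers(18, share)))
```

With a share of only about one fifth, small errors in the Wi-Fi count are multiplied roughly fivefold in the estimate, which is consistent with the authors' caution about the sufficiency of the data.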
Another feature of the Wi-Fi scanner is that a mobile device does not need to be connected to the MT FREE network for data to be received: it is enough to have Wi-Fi enabled on the mobile device.
According to the primary analysis of the study results, it might be concluded that the data may be insufficient. For this reason, the researchers suggest that, to improve the results obtained, it is necessary to use the installation scheme for Wi-Fi scanners shown in Pic. 11.
In ideal conditions, it is necessary to install one Wi-Fi scanner in the bus, one at a public transport stop, one at the metro entrance, one in a metro coach, and one at the exit from the metro. The suggested placement of devices will significantly improve the results of studying passenger flow.

Pic. 10. The origin-destination trip matrix of Wi-Fi devices («prospective» passengers) between stations Bulvar Dmitriya Donskogo and … (Serpukhovsko-Timiryazevskaya metro line).

Also, to eliminate the need for the full-scale field method, it is necessary to use data from ASMPP and from the navigation terminals installed in urban public transport, which will determine the date and time of arrival of a vehicle at an urban public transport stop and the actual number of passengers entering and leaving the vehicle.
As a result, we will receive data on the date, time, and MAC address of the device, together with data from the navigation terminal. By processing the received data, we will be able to determine at which moment and at which stop a device (MAC address) entered and exited. This will allow a more detailed analysis of passenger flow.
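The matching of device detections to stop arrival times described above could look roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: the names `boarding_stop`, `alighting_stop`, and `od_matrix` are hypothetical, time stamps are plain numbers (for example, minutes), and a device is assumed to board at the last stop reached before its first detection and to leave at the first stop reached after its last detection.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def boarding_stop(first_seen, arrivals):
    """Last stop the vehicle reached at or before the first detection."""
    times = [t for t, _ in arrivals]
    i = bisect_right(times, first_seen) - 1
    return arrivals[i][1] if i >= 0 else None


def alighting_stop(last_seen, arrivals):
    """First stop the vehicle reached at or after the last detection."""
    times = [t for t, _ in arrivals]
    i = bisect_left(times, last_seen)
    return arrivals[i][1] if i < len(arrivals) else None


def od_matrix(detections_by_mac, arrivals):
    """Count origin-destination pairs of detected devices.

    detections_by_mac: {mac: list of detection time stamps}.
    arrivals: list of (arrival_time, stop_name) sorted by arrival_time,
        as could be derived from navigation terminal data.
    """
    matrix = Counter()
    for stamps in detections_by_mac.values():
        origin = boarding_stop(min(stamps), arrivals)
        destination = alighting_stop(max(stamps), arrivals)
        # Skip devices whose trip cannot be placed between two known stops.
        if origin is not None and destination is not None and origin != destination:
            matrix[(origin, destination)] += 1
    return matrix


if __name__ == "__main__":
    arrivals = [(0, "A"), (5, "B"), (10, "C"), (15, "D")]
    detections = {"m1": [1, 3, 6, 8], "m2": [6, 7, 11, 13], "m3": [1, 2]}
    print(od_matrix(detections, arrivals))
```

The input `detections_by_mac` is meant to be data that have already passed the noise and minimum-detection filters, so that passers-by near a stop do not enter the matrix.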
We also state that all the data obtained were used exclusively within the framework of research activities, without reference to specific passengers. The MAC addresses of devices are not personal data; they are technical data. All data received from Wi-Fi scanners were used exclusively to determine the entry and exit points of specific MAC addresses at public stops. All data are anonymous. The anonymous nature is associated with the use of MAC addresses as identifiers. MAC addresses are not associated with any particular user's account or mobile phone, nor with any particular vehicle. Besides, the mode of «active» search for a Wi-Fi network on a mobile device is the choice of each specific prospective passenger.

CONCLUSIONS
The results obtained demonstrate the prospects for development of Wi-Fi technology. An increasing number of prospective passengers will use urban public transport, switching from private cars to public transport in Moscow and Moscow region. This trend is supported by adoption of new tools aimed at increasing the attractiveness of urban public transport, such as adaptive traffic light control, i.e., priority for public transport. Accordingly, it can be assumed that many more passengers will use urban public transport and the metro, and that an increasing number of passengers will connect to free Wi-Fi networks. The authors observe an increasing number of passengers consuming Internet content on mobile devices during their daily trips on public transport, using mobile Internet (GPRS) or Wi-Fi networks. At present, from the data received from the portable Wi-Fi scanner, it is possible to draw the preliminary conclusion that more than 20% of mobile devices in urban public transport and the metro are used with Wi-Fi turned on, which is not enough for a clear assessment of passenger flow. Nevertheless, the accumulating data make it possible to forecast passenger flow.

Pic. 11. Block diagram of placement of Wi-Fi scanners.
The researchers believe that a portable Wi-Fi scanner does not make it possible to cover a large part of the surveyed territory in real time (stops of urban public transport, locations where passengers enter the metro, etc.). Stationary Wi-Fi scanners could increase the amount of data and, accordingly, significantly refine the results obtained.
The researchers also suggest that, besides studying passenger flow in urban public transport, research should be conducted at stations and in the coaches of MCC and MCD, which also have Wi-Fi networks.
Based on the results obtained, the researchers conclude that, to obtain a complete picture of passenger flow, data from Wi-Fi scanners can complement other data sources, such as validation, ASMPP, and data from cellular operators. For this reason, the researchers believe that further research in the field of Wi-Fi analytics, combined with development of the technology of existing data sources for monitoring passenger traffic, may lead to better results in calculating passenger flow.