
Identifying factors that contribute to collision avoidance behaviours while walking in a natural environment

To address our research questions, we filmed natural walking behaviours along a paved path and used deep learning computer algorithms and subjective observation to analyze pedestrian interactions. The steps are illustrated in Fig. 1. Below we describe each step in detail.

Fig. 1

Illustration of the steps involved in data collection and analysis. Each step is described in the “Methods” section.

Data collection

We filmed pedestrians walking (i.e., bi-directional pedestrian flow) along a 3.5–3.8 m wide urban path at English Bay, Vancouver, BC, Canada (Fig. 2). A Sony FDR-AX43 UHD 4K Handycam camcorder secured on a tripod recorded behaviour at 60 Hz, at an angle from an elevated position overlooking the path. This elevated angle ensured that the pedestrian tracking algorithms would work. We selected this path for its high foot traffic and its location: a confined path with few obstacles. The length of the path defined the study area and was between 18 and 25 m, depending on the camera position and zoom level on a given collection day. The path afforded bi-directional pedestrian movement. We collected data between noon and 5 pm during the peak pedestrian traffic months of June, July, and August of 2020 and 2021. The consistency in the time and month of data collection helped maintain a uniform pattern in pedestrian activity, contributing to the accuracy of our findings. Pedestrians were not aware of being filmed, and thus we were able to collect natural, real-life behaviour.

Fig. 2

A realistic sketch of the environment and frame of one of the videos. The area of interest (points 1–4) and coordinates for the pixel-to-cm conversion (points 5–7) are superimposed for illustrative purposes; these points are normally selected from the first video frame (see “Methods” for details about how these points are used). Note that the study area was the length of the path and not the area of interest (rectangle). A bounding box with an ID number surrounds each walking/standing pedestrian. The colour of these boxes varies based on how close pedestrians are to each other.

Because we filmed at a public place, there was no reasonable expectation of privacy. In addition, the research team did not interact with any pedestrian, nor collect any personal data. Thus, we did not require consent. The Office of Research Ethics at Simon Fraser University approved the study protocol (#30000463), and all methods were carried out in accordance with relevant guidelines and regulations.

Preparation of videos for analysis

To make the kinematic analysis easier and more intuitive, we transformed videos into a bird’s-eye view (described in more detail in the next section). Specifically, we created a perspective transformation matrix in Python (version 3.5.2) to use later during the video analysis. In this step of the analysis, we selected four points from the first frame of the video, forming a rectangular area of interest (AOI; see Fig. 2), that served as inputs to the transformation. We used the bottom left corner of the AOI as the origin of the new coordinate system (0, 0). Subsequently, we selected three points with a known distance between each to use for converting pixels to cm during the video analysis (see Fig. 2). Finally, we selected a point on the bottom left corner of any obstacles present (e.g., bench, garbage can, light pole).
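The transformation step above can be sketched as follows. This is a minimal numpy-only illustration of building a perspective (homography) matrix from four selected AOI corners and mapping a point into the bird’s-eye view; the pixel coordinates are hypothetical placeholders, and the actual pipeline may use an equivalent OpenCV routine instead.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Homography H mapping each src corner to its dst corner (DLT method)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A, found via SVD.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, x, y):
    """Map one pixel coordinate into the bird's-eye view."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical AOI corners selected on the first frame, ordered
# bottom-left, bottom-right, top-right, top-left (oblique camera view).
src = [(210.0, 980.0), (1650.0, 940.0), (1400.0, 420.0), (380.0, 440.0)]
# Target rectangle: the bottom-left AOI corner becomes the origin (0, 0).
dst = [(0.0, 700.0), (1200.0, 700.0), (1200.0, 0.0), (0.0, 0.0)]
H = perspective_matrix(src, dst)
```

With four point pairs the mapping is exact, so each clicked corner lands precisely on its target corner.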

Frame-by-frame tracking: identification of pedestrians and one-on-one pedestrian interactions

To detect pedestrians for further analysis, we processed the raw video footage using a deep learning algorithm called FairMOT, which is based on the YOLOv5 architecture17. FairMOT is specifically designed to identify pedestrians in videos and track their movements throughout the recorded period. We used a pre-trained version of this algorithm and customized it to suit the requirements of our project. This customization included embedding the algorithm’s pedestrian tracking in our pipeline, converting distances, and extracting kinematic data. One of the key capabilities of FairMOT is its ability to track multiple pedestrians simultaneously, allowing us to extract each pedestrian’s individual walking trajectory. The algorithm analyzes video frames using an encoder-decoder network: the encoder compresses the input, and the decoder reconstructs features relevant to pedestrian detection. This generates a feature map that highlights elements indicative of human presence, allowing for efficient detection and tracking of pedestrians.

In each frame, all the pixels that make up pedestrian(s) are identified, and the algorithm then creates a distinct bounding box, with a unique ID, around each separate pedestrian (see Fig. 2). This bounding box not only serves to distinguish between individuals but also provides a means to measure distances between them. Subsequently, we transform the lower left corner of each bounding box to a bird’s-eye view using the perspective transformation matrix created earlier. Finally, for each point representing a bounding box, we convert the pixel coordinates to cm using the following equations:

$$L_{cm}=\frac{L_{pix}}{L_{1}-L_{2}}\times 180$$

(1)

$$W_{cm}=\frac{W_{pix}}{W_{1}-W_{3}}\times 180$$

(2)

where \(L_{1}-L_{2}\) represents the difference in pixel length (y-coordinate) between two of the selected points (see Fig. 2) that corresponds to 180 cm in the real world. Similarly, \(W_{1}-W_{3}\) represents the difference in pixel width (x-coordinate) between two of the selected points (see Fig. 2) that corresponds to 180 cm in the real world. \(L_{pix}\) and \(W_{pix}\) represent the position in pixels of a point in the video, and \(L_{cm}\) and \(W_{cm}\) represent the position in cm of that point.
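Eqs. (1) and (2) amount to a per-axis scale factor. A small sketch, with hypothetical pixel coordinates for the reference points:

```python
# Hypothetical pixel coordinates of the reference points from Fig. 2;
# each pair spans 180 cm in the real world.
L1, L2 = 640.0, 418.0   # y-coordinates of the length reference points
W1, W3 = 533.0, 300.0   # x-coordinates of the width reference points

def pixels_to_cm(w_pix, l_pix):
    """Apply Eqs. (1) and (2) to one bird's-eye-view point."""
    l_cm = l_pix / (L1 - L2) * 180.0   # Eq. (1)
    w_cm = w_pix / (W1 - W3) * 180.0   # Eq. (2)
    return w_cm, l_cm
```

Here a point 222 px up and 233 px across from the origin maps to (180 cm, 180 cm), since each reference pair spans exactly 180 cm.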

To track each pedestrian across frames, we used JDETracker from the FairMOT library. This method compares the locations of pedestrians in consecutive frames. If the coordinates of an individual (i.e., the point representing the bounding box) in the current frame are within a specified proximity to the coordinates of the same person in the previous frame, they are considered to belong to the same pedestrian.
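JDETracker itself also uses appearance embeddings; the proximity rule described above can be illustrated with a simplified, purely positional greedy matcher. The threshold and coordinates are hypothetical.

```python
import math

def associate(prev, curr, max_dist=40.0):
    """Greedy nearest-neighbour matching between consecutive frames.

    prev: {pedestrian_id: (x, y)} from the previous frame
    curr: list of (x, y) detections in the current frame
    Returns {pedestrian_id: (x, y)}; unmatched detections get new IDs.
    """
    next_id = max(prev, default=0) + 1
    assigned, out = set(), {}
    for pid, (px, py) in prev.items():
        best, best_d = None, max_dist
        for i, (x, y) in enumerate(curr):
            d = math.hypot(x - px, y - py)
            if i not in assigned and d < best_d:
                best, best_d = i, d
        if best is not None:
            assigned.add(best)
            out[pid] = curr[best]
    # Detections with no nearby predecessor start new tracks.
    for i, pt in enumerate(curr):
        if i not in assigned:
            out[next_id] = pt
            next_id += 1
    return out
```

A detection within `max_dist` of a previous position keeps that pedestrian’s ID; anything else is treated as a newly entered pedestrian.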

After detecting and tracking all pedestrians in the video, we manually searched for one-on-one pedestrian interactions. We use the term interaction to denote an occurrence where one pedestrian walks past another pedestrian on the path; we did not observe any collisions. For each interaction, we recorded the pedestrian IDs and the time at which they walked past each other (i.e., crossing time) for later analyses.

Path deviation identification and ML separation

To address our primary question as to which factors influence whether a person deviates from their initial walking trajectory, we first had to determine whether and when one or both pedestrians deviated. We used a change in the mutual medial-lateral (ML) separation between the two pedestrians. Specifically, we calculated the mean and standard deviation of the ML separation between the two pedestrians from 1 s before to 0.5 s after the crossing time. We then worked backwards in time from the moment the two pedestrians crossed each other to detect a point of deviation. A point of deviation was identified when the ML separation between the two pedestrians fell more than 2 standard deviations below the average ML separation. This approach allowed us to identify the time closest to the crossing at which the two pedestrians increased their ML separation to avoid a collision while approaching each other. It also ensured that we detected the point of deviation related to the pedestrians of interest rather than deviations that might have occurred earlier due to interactions with other individuals. We also calculated the ML separation when the two pedestrians were 8 m apart for use as a predictor variable in our statistical analyses, since all deviations occurred when the two pedestrians were less than 8 m apart in the anterior-posterior direction (see Results section). Furthermore, we calculated the ML separation when the two pedestrians walked past each other (i.e., ML separation at crossing) for use as a response variable in our statistical analyses (see Fig. 3a).
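A minimal sketch of this backward search, under the reading that a deviation frame is one where the ML separation drops more than 2 SD below the mean of the [-1 s, +0.5 s] window around the crossing. The frame rate matches the 60 Hz recording; the input series is hypothetical.

```python
import numpy as np

def point_of_deviation(ml_sep, crossing_idx, fps=60):
    """Latest frame at or before the crossing where the ML separation
    falls more than 2 SD below its mean over [-1 s, +0.5 s] around
    the crossing; returns None if no such frame exists."""
    lo = max(crossing_idx - fps, 0)
    hi = min(crossing_idx + fps // 2, len(ml_sep))
    window = np.asarray(ml_sep[lo:hi], dtype=float)
    threshold = window.mean() - 2 * window.std()
    for i in range(crossing_idx, -1, -1):   # walk backwards in time
        if ml_sep[i] < threshold:
            return i
    return None
```

Searching backwards from the crossing returns the deviation point closest to the crossing, mirroring the rationale above of ignoring earlier deviations caused by other pedestrians.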

Fig. 3

Illustration of medial-lateral (ML) separation and designated scene area for crowd size calculations. (a) ML separation at different time points. (b) Designated scene area where we determined crowd size. We defined crowd size as the number of pedestrians in this area (excluding the two approaching pedestrians of interest). Illustrations are not drawn to scale.

Calculation of crowd size

Because all deviations occurred when the two pedestrians were less than 8 m apart in the anterior-posterior direction (see Results section), we used an 8 m distance as a reference point for calculating crowd size. To calculate the crowd size, we identified the precise time when two pedestrians were 8 m apart from each other. At this moment, we counted the number of pedestrians within the designated scene. The designated scene included the 8 m distance between pedestrians, plus an additional 2 m behind each pedestrian, resulting in a total of 12 m in the anterior-posterior direction (see Fig. 3b). In the ML direction, the designated scene included the ML distance between the two pedestrians, plus an additional 0.8 m on the outer side of each pedestrian (see Fig. 3b). We included the extra area around each pedestrian because it provided sufficient space to capture pedestrians who were slightly outside the direct line between the two individuals but still close enough to theoretically influence deviation behaviour. By defining the scene dimensions in this manner, we ensured that our crowd size calculations captured a comprehensive view of the surrounding pedestrians involved in the interaction.
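The designated-scene count reduces to a rectangle test in bird’s-eye coordinates. A sketch assuming positions are already converted to cm (x = medial-lateral, y = anterior-posterior); the example positions are hypothetical.

```python
def crowd_size(ped_positions, p1, p2, ap_pad=200.0, ml_pad=80.0):
    """Count pedestrians inside the designated scene (cf. Fig. 3b).

    p1, p2: positions (cm) of the two interacting pedestrians (excluded).
    ap_pad: 2 m behind each pedestrian (anterior-posterior direction);
    ml_pad: 0.8 m outside each pedestrian (medial-lateral direction).
    """
    x_lo = min(p1[0], p2[0]) - ml_pad
    x_hi = max(p1[0], p2[0]) + ml_pad
    y_lo = min(p1[1], p2[1]) - ap_pad
    y_hi = max(p1[1], p2[1]) + ap_pad
    return sum(
        x_lo <= x <= x_hi and y_lo <= y <= y_hi
        for (x, y) in ped_positions
        if (x, y) not in (p1, p2)
    )
```

With the two pedestrians 8 m apart, the padded rectangle spans 12 m anterior-posteriorly, as described above.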

Subjective assessment of interactions

Three people from the laboratory, who were unaware of the study purpose, reviewed the videos and answered a comprehensive 14-question questionnaire about each one-on-one pedestrian interaction. The odd number of raters ensured that we had a tiebreaker in cases where two people disagreed on an answer. The questionnaire captured aspects of pedestrian interactions that could not be quantified through our kinematic data alone, such as the approximate age of the pedestrians, potential mobility constraints, distractions like phone usage, and reasons for deviating from a straight line even when far from another person. Each rater reported their confidence in their answers on a scale from 0 to 100%. We only used responses in later analyses where the average confidence level across all raters exceeded 75%. The questionnaire is included in the Supplementary Information.

Some of the questionnaire questions had more categories than we opted to use in our analyses. We originally used many categories to provide a thorough account of each interaction; however, to increase counts in certain categories and simplify analyses, we reduced the number of categories. Specifically, we reduced the age variable to two categories: pedestrians within the same age group and pedestrians of different age groups. Age groups included children (< 12 years old), adolescents or young adults (13–20 years old), early adulthood (21–40 years old), middle adulthood (41–60 years old), and older adults (60+ years old). Although we asked about a variety of obstacles close enough to affect behaviour (e.g., light pole, bench, stationary person), we used two categories for the presence of an obstacle: yes and no. In addition, we reduced the mobility constraint variable to two categories: yes and no. We defined a mobility constraint as an object held or used while walking, excluding a cell phone; options on the questionnaire included a cane, a two- or four-wheel walker, pushing a bicycle, stroller, or wheelchair, and walking an animal. Because the raters reported difficulty identifying sex from the videos, we removed this variable from further analyses. To assess inter-rater reliability of the questionnaire factors used in our analyses (pedestrian age group, distracted pedestrians, presence of obstacles, and mobility constraints), we randomly selected 50 one-on-one pedestrian interactions from our database. We found 100% rater agreement (Fleiss’ Kappa = 1) for each factor. Table 1 provides an overview of all factors used in the statistical analyses.
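Fleiss’ kappa for this kind of reliability check can be computed from a matrix of per-item rating counts. This is a generic implementation with a toy ratings matrix, not the study’s data.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a (n_items, n_categories) matrix of rating
    counts, assuming the same number of raters for every item."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                    # raters per item
    p_cat = counts.sum(axis=0) / counts.sum()    # category proportions
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                           # observed agreement
    P_e = (p_cat ** 2).sum()                     # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

When all three raters agree on every item, as in the study’s 50-interaction check, the observed agreement is 1 and kappa is exactly 1.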

Table 1 Factors used in the statistical analyses and their categories.

Statistical analyses

To determine which factors can predict whether one or both pedestrians deviate during their interaction, we used a multiple logistic regression model. We included the binary response variable, deviation (yes/no; reference = no deviation), and the following predictor variables: ML separation at 8 m, crowd size at 8 m, age group, distraction, presence of obstacles, and mobility constraint.
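The model above is fitted in R (see below); as an illustration of multiple logistic regression itself, here is a pure-numpy Newton-Raphson (IRLS) fit. The data and predictor layout are hypothetical placeholders, not the study’s variables.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Multiple logistic regression via Newton-Raphson (IRLS).

    X: (n, p) predictor matrix (an intercept column is added here);
    y: (n,) binary response (e.g., 1 = deviation, 0 = no deviation).
    Returns the coefficient vector [intercept, b1, ..., bp].
    """
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        W = p * (1 - p)                           # IRLS weights
        # Newton step: beta += (X^T W X)^{-1} X^T (y - p)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta
```

In practice the equivalent R call would be `glm(deviation ~ ..., family = binomial)`; the sketch is only meant to show what that model estimates.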

In a series of secondary analyses, we first focussed only on interactions with deviations to determine which factors can predict the time of path deviation relative to the pedestrians walking past each other (i.e., time to pass from deviation). In a linear regression model, we included time to pass from deviation as the response variable (log transformed to ensure normality) and ML separation at 8 m, crowd size at 8 m, age group, distraction, and mobility constraints as predictor variables. We excluded the obstacle variable in this analysis because there were too few cases. Subsequently, we sought to determine which factors can predict the ML separation between the two pedestrians at the time of crossing (including cases with and without path deviations). We used a linear regression model, with ML separation (at time of crossing) as the response variable and age group, distraction, mobility constraints, presence of obstacles, and crowd size at 8 m distance between pedestrians as predictor variables.

We used RStudio version 2023.06.1 + 524 (R version 4.3.1) and an alpha level of 0.05 for all statistical analyses. We used the plot_model function of the sjPlot package version 2.8.15 to create plots of the results. We also re-ran each regression model with only the significant predictors and show these results alongside the full model.
