I decided it was finally time to do some real signal processing, so I thought I would start off by trying to count the number of steps in one of my daily routines: walking to Real Foods in the morning to get a coffee and a bagel. I walked naturally across my apartment to establish a baseline signal, with both my Flex and my logger on my right arm. The Flex recorded 21 steps in this period. I then walked out of the apartment, down about 10 steps, along a gently downward-sloping street for about 1.5 blocks, and up a flight of about 10 steps into Real Foods, where I got a cup of coffee and ordered my bagel. I stood around for a while waiting, and then reversed the process after checkout. After the initial calibration steps, I recorded 517 additional steps on my Flex, which equated to 0.24 miles, as I undertook the following trip:
I then plotted the data (Real Foods Data) my logger had recorded at a sample rate of 25 Hz, as shown below:
Plot of Logger Data From a Quick Trip to Real Foods
The first yellow bar shows the calibration steps, and the second and third bars show the trips to and from Real Foods, respectively. My first thought was to run a script from a previous post on the calibration step data to see how it performed. I did this from time t = 43 to t = 60, with the minimum distance between peaks set at d = 10 samples, and produced the following plot:
myfirstpeakdetector.m Running From Time t = 43 to t = 60
The simple algorithm (which is incredibly sensitive to how the input data range is defined) calculated the following number of peaks (i.e. step count):
While the Y and Z axes are close to the expected value of 21, the X axis seems to be an outlier, which drags the average value higher. I won’t get into a discussion about potential causes for this, since the real goal here was simply to find a data range within the calibration steps that was suitable to use as a sample data signal. I chose to use t = 45 to t = 55, which seems like it should contain a very crisp, uniform signal for all three axes.
I wanted to run a cross correlation of the sample data range against all the data for each axis, in the hope that I could use periods of high correlation to window a peak detection algorithm, thereby only counting peaks which matched the pattern of a series of steps. I created a script, correlationplotter.m, which uses the xcorr function to sweep the data from each axis from t = 45 to t = 55 across all the data from that axis, and then plots the correlation (blue) on top of the original data (green). It produced the following plot:
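For anyone who wants to try the sweep outside MATLAB, here is a minimal Python sketch of the idea. It is not correlationplotter.m: the function names, the synthetic 2 Hz "walking" signal, and the choice to compute only non-negative lags (xcorr returns negative lags too) are all assumptions made for illustration.

```python
import math

def sliding_xcorr(signal, template):
    """Raw cross-correlation of `template` against `signal` at each
    non-negative lag where the template fits entirely inside the signal."""
    n, m = len(signal), len(template)
    return [sum(signal[lag + k] * template[k] for k in range(m))
            for lag in range(n - m + 1)]

def scale_to_signal(corr, signal):
    """Rescale the correlation so its largest magnitude matches the
    signal's largest magnitude (handy for plotting one over the other)."""
    peak_corr = max(abs(c) for c in corr)
    peak_sig = max(abs(s) for s in signal)
    if peak_corr == 0:
        return list(corr)
    return [c * peak_sig / peak_corr for c in corr]

FS = 25  # sample rate in Hz, as in the post

# Synthetic stand-in data (NOT the logger data): 4 s rest, 8 s walking, 4 s rest.
walk = [math.sin(2 * math.pi * 2 * i / FS) for i in range(8 * FS)]
signal = [0.0] * (4 * FS) + walk + [0.0] * (4 * FS)

# Use a 2 s slice from inside the walking stretch as the sample data range.
template = signal[5 * FS:7 * FS]

corr = scale_to_signal(sliding_xcorr(signal, template), signal)
best_lag = max(range(len(corr)), key=lambda i: corr[i])
print("best lag (samples):", best_lag)
```

On this toy signal the correlation is zero while the template only overlaps the rest period and reaches its maximum at lags inside the walking stretch, which is the behaviour the windowing idea relies on.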
Correlation Between All Data and a Data Sample from t=45 to t=55 Plotted on Top of the Original Data for Each Axis
The X axis appears to have good data. The correlation output is high (its magnitude has been scaled to match the original signal magnitude) during the sample data period, as well as during the two heavy periods of walking to and from the store. The relationship between the two data sets from t=550 to t=650 is truly textbook, and gave me great confidence in the methodology being used.
Unfortunately, this methodology does not seem to hold up as well for the Y and Z axes, in which the relationship between the correlation and the original data seems to be the opposite of what it should be: it is lower during periods of heavy walking, and higher during periods of random motion or little movement. This could possibly be explained by the fact that with my arm in the normal walking position, i.e. hanging at my side, the X axis is the one most likely to experience a force that is directly related to the number of steps, as my arm is moved up and down when the arch of my foot expands and contracts as part of my stride. I will move on with an analysis solely of the X axis, but if anyone can spot an error in my correlation methodology for the other two axes, I would greatly appreciate a comment on what I’m doing wrong.
I created a script, correlationwindower.m, which performs a cross correlation of the data from an axis with a sample data set selected by a user-definable range. The script then creates a windowing vector by converting every data point in the correlation vector to either a 0 or a 1, based on whether its value is below or above a user-definable threshold, expressed as a percentage of the maximum value in the correlation data set (after scaling). This windowing vector is then multiplied by the original data set to remove all values which do not have a decent correlation with the sample data set. It then performs a peak detection on this new windowed data, where peaks must be greater than the mean of all non-zero values, and have a minimum distance between them of a user-definable number of samples. The input window for the script looks like this:
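The windowing and peak-detection steps described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not correlationwindower.m: the function names are made up, the correlation vector is assumed to be already scaled and aligned sample-for-sample with the signal, the example correlation is a hand-made stand-in rather than a real xcorr output, and the greedy tallest-first rule for enforcing the minimum peak distance is one plausible choice among several.

```python
import math

def correlation_window(corr, threshold_pct):
    """1 where the correlation exceeds threshold_pct percent of its maximum, else 0."""
    cutoff = threshold_pct / 100.0 * max(corr)
    return [1 if c > cutoff else 0 for c in corr]

def detect_peaks(data, min_distance):
    """Indices of local maxima above the mean of the non-zero values,
    keeping the tallest first and dropping any peak closer than
    min_distance samples to one already kept."""
    nonzero = [v for v in data if v != 0]
    if not nonzero:
        return []
    min_height = sum(nonzero) / len(nonzero)
    candidates = [i for i in range(1, len(data) - 1)
                  if data[i] > min_height
                  and data[i] > data[i - 1] and data[i] >= data[i + 1]]
    kept = []
    for i in sorted(candidates, key=lambda i: data[i], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)

FS = 25  # sample rate in Hz, as in the post

# Synthetic stand-in data (NOT the logger data): a 1 g style offset with a small
# wobble at rest, and a 2 Hz swing for 5 s of walking (10 cycles).
rest = [1.0 + 0.05 * math.sin(0.9 * i) for i in range(4 * FS)]
walk = [1.0 + 0.5 * math.sin(2 * math.pi * 2 * i / FS) for i in range(5 * FS)]
signal = rest + walk + rest

# Hand-made stand-in for a scaled, aligned correlation: high only while walking.
corr = [0.1] * len(rest) + [1.0] * len(walk) + [0.1] * len(rest)

window = correlation_window(corr, threshold_pct=60)
windowed = [s * w for s, w in zip(signal, window)]
peaks = detect_peaks(windowed, min_distance=10)
print("detected peaks:", len(peaks))
```

Because the wobble during the rest periods is multiplied by zero before peak detection, only the ten walking cycles contribute peaks in this toy example.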
Input Window for correlationwindower.m
The windowed signal looks like this:
X Axis Signal Windowed by Its Correlation With a Sample Dataset
And the script outputs the detected peaks plotted on top of the original, “un-windowed” signal:
Original Signal With Detected Peaks
As you can see, for the input parameters shown in the previous screenshot of the inputs window, the script detected 583 peaks, which is about 13% more than we expected; not too bad considering all the non-cyclical motion that was going on. I thought it was also worth performing a brief sensitivity analysis around both the correlation windowing threshold value and the minimum distance between peaks. The table below shows the absolute percentage difference between the number of steps the script calculated for different values of those two parameters and the 517 steps the Flex recorded. All parameter pairs with an absolute difference of 13% or less have been highlighted, to show that inputting those parameters would have been at least as accurate as the data shown above:
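The metric in the table is simple enough to sketch in a few lines of Python. The 583 and 517 figures are from the post; everything else here (the function names, the parameter grid, and especially `toy_counter`, which is a made-up placeholder and not the real step counter or real results) is an assumption for illustration.

```python
def abs_pct_diff(detected, reference):
    """Absolute percentage difference between a detected count and a reference count."""
    return abs(detected - reference) / reference * 100.0

def sensitivity_table(count_steps, min_distances, thresholds, reference):
    """Evaluate count_steps(min_distance, threshold_pct) over a parameter grid and
    return {(min_distance, threshold_pct): absolute percentage difference}."""
    return {(d, t): abs_pct_diff(count_steps(d, t), reference)
            for d in min_distances for t in thresholds}

# The headline number from the post: 583 detected peaks vs 517 Flex steps.
print(round(abs_pct_diff(583, 517), 1))

def toy_counter(min_distance, threshold_pct):
    # Placeholder arithmetic only, so the sweep has something to call.
    # A real sweep would rerun the windowed peak detection on the logger data.
    return 600 - 5 * min_distance - threshold_pct // 2

table = sensitivity_table(toy_counter, [8, 10, 12], [50, 60, 70], reference=517)

# Parameter pairs that do at least as well as the 13% figure would be highlighted.
highlighted = sorted(pair for pair, diff in table.items() if diff <= 13.0)
```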
Sensitivity Analysis Around Minimum Peak Distance and Windowing Threshold
The sensitivity analysis shows that there is a broad tolerance centered roughly on a minimum peak distance of 10 samples and a 60% windowing cutoff threshold. This is probably a good thing, although it would be nice to expand the map further. Overall, I was happy with the first real signal processing that I’ve done so far, and I look forward to trying the script methodology out on other data sets in the future.
UPDATE: I just realized that my algorithm is actually more accurate than I had previously thought. The 583 detected peaks include ALL the peaks in the sample, when what should be compared to the Fitbit’s 517 measured steps are just the peaks which occurred after the calibration set was recorded. Removing the peaks that occurred before t = 55s brings the total detected peaks down by 30 to 553, which is only a 7% difference from what the Fitbit calculated. The sensitivity table above is now invalid because it includes these early peaks, but I’m not going to bother to recalculate it because it’s scotch thirty.
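For completeness, here is a small Python sketch of the correction described in the update: drop peaks that fall before the end of the calibration period, then recompute the difference. The counts (583, 30, 517), the 55 s cutoff and the 25 Hz sample rate come from the post; the function name and the example peak indices are hypothetical.

```python
FS = 25                 # sample rate in Hz, as in the post
CALIBRATION_END_S = 55  # peaks before this time belong to the calibration walk

def peaks_after(peak_indices, cutoff_s, fs=FS):
    """Keep only peaks whose sample index falls at or after cutoff_s seconds."""
    cutoff_index = cutoff_s * fs
    return [i for i in peak_indices if i >= cutoff_index]

# Hypothetical peak indices, just to show the filter; 55 s * 25 Hz = sample 1375.
example_peaks = [1100, 1200, 1400, 2000]
print(peaks_after(example_peaks, CALIBRATION_END_S))

# The arithmetic from the update, using the counts reported in the post.
detected_total = 583
calibration_peaks = 30
flex_steps = 517
detected_trip = detected_total - calibration_peaks
pct_diff = abs(detected_trip - flex_steps) / flex_steps * 100.0
print(detected_trip, round(pct_diff, 1))
```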