## Coffee Run

I decided it was finally time to do some real signal processing, so I thought I would start off by trying to count the number of steps in one of my daily routines: walking to Real Foods in the morning to get a coffee and a bagel. I walked naturally across my apartment to establish a baseline signal, with both my Flex and my logger on my right arm. The Flex recorded 21 steps in this period. I then walked out of the apartment, down about 10 steps, down a gently downward sloping street for about 1.5 blocks, up a flight of about 10 steps into the Real Foods, got a cup of coffee and ordered my bagel. I stood around for a while waiting, and then reversed the process after check out.  After the initial calibration steps, I recorded 517 additional steps on my Flex which equated to 0.24 miles as I undertook the following trip:

I then plotted the data (Real Foods Data) my logger had recorded at a sample rate of 25 Hz, as shown below:

Plot of Logger Data From a Quick Trip to Real Foods

The first yellow bar shows the calibration steps, and the second and third bars show the trips to and from Real Foods, respectively. My first thought was to run a script from a previous post for the calibration step data to see how it performed.  I did this from time t = 43 to t = 60, with the minimum distance between samples set at d = 10, and produced the following plot:

myfirstpeakdetector.m Running From Time t = 43 to t = 60

The simple algorithm (which is incredibly sensitive to the input data range is defined) calculated the following number of peaks (i.e. step count):

Algorithm Output

While the Y and Z axes are close to the expected value of 21, the X axis seems to be an outlier, which drags the average value higher. I won’t get into a discussion about potential causes for this, since the real goal here was simply to find a data range within the calibration steps that was suitable to use as a sample data signal. I chose to use t = 45 to t = 55, which seems like it should contain a very crisp, uniform signal for all three axes. I wanted to run a cross correlation of the sample data range versus all the data for each axis, with hoped that I could use periods of high correlation to window a peak detection algorithm, thereby only counting peaks which matched the pattern of a series of steps. I created a script correlationplotter.m which uses the xcorr function to sweep data from each axis from t = 45 to t=55 across all the data from that axis, and then plot the correlation (blue) on top of the original data (green). It produced the following plot:

Correlation Between All Data and a Data Sample from t=45 to t=55 Plotted Ontop of the Original Data for each Axis

The X Axis appears to have good data. The correlation output is high (magnitude has been scaled to match original signal magnitude) during the sample data period, as well as during the two heavy period of walking to and from the store. The relationship between the two data sets from t=550 to t=650 is truly text book, and gave me great confidence in the methodology being used Unfortunately, this methodology does not seem to hold up as well for the Y and Z axes, in which the relationship between the correlation and the original data seems to be the opposite of what it should be; it is lower during periods of heavy walking, and higher during periods of random motion or little movement. This could possibly be explained by the fact that with my arm in the normal walking position, i.e. hanging to my side, the X axis is most likely to experience a force that is directly related to the number of steps, as my arm is moved up and down when the arch of my foot expands and contracts as part of my stride. I will move on with an analysis solely of the X axis, but if anyone can spot an error in my correlation methodology for the other two axes, I would greatly appreciate a comment on what I’m doing wrong.

I created a script correlationwindower.m, which performs a cross correlation of the data from an axis with a sample data set selected by a user definable range. The script then creates a windowing vector by converting every data point in the correlation vector to either a 0 or a 1 based on whether its value is greater or less than a user definable threshold, expressed as a percentage of the maximum value in the correlation data set (after scaling). This windowing vector is then multiplied by the original data set to remove all values which do not have a decent correlation with the sample data set. It then performs a peak detection on this new windowed data, where peaks must be greater than the mean of all non-zero values, and have a minimum distance between them of a user definable number of samples. The input window for the script looks like this:

Input Window for correlationwindower.m

The windowed signal looks like this:

X Axis Signal Windowed by It’s Correlation With a Sample Dataset

And the script outputs the detected peaks plotted on top of the original, “un-windowed” signal:

Original Signal With Detected Peaks

As you can see, for the input parameters shown in the previous screenshot of the inputs window, the script detected 583 peaks, which is about 13% more than we expected; not too bad considering all the non-cyclical motion that was going on. I thought it was also worth performing a brief sensitivity analysis around both the correlation windowing threshold value and the minimum distance between peaks. The table below shows the absolute percentage different between the number of steps the script calculated given different values for the aforementioned parameters, and the 517 steps the Flex recorded. All parameter pairs with a 13% or less absolute difference have been highlighted, to show that inputting those parameters would have been at least as accurate as the data shown above:

Sensitivity Analysis Around Minimum peak distance and Windowing Threshold

The sensitivity analysis shows that there is a broad tolerance centered roughly around the 10 samples, 60% cut off threshold for the windowing. This is probably a good thing, although it would be nice to expand the map further. Overall, I was happy with the first real signal processing that I’ve done so far, and I look forward to trying the script methodology out on other data sets in the future.

UPDATE: I just realized that my algorithm is actually more accurate than I had previously thought. The 583 detected peaks includes ALL the peaks in the sample, when what should be compared to the Fitbit’s 517 measured peaks are just the peaks which occurred after calibration set was recorded. Removing the peaks that occurred before t = 55s brings the total detected peaks down by 30 to 553, which is only a 7% difference from what the Fitbit calculated. The sensitivity table above is now invalid because it includes these early peaks, but I’m not going to bother to recalculate it because it’s scotch thirty.

## A Good Night’s Rest

I wanted to compare actual accelerometer data to the sleep report from my Fitbit Flex, so for the past several weeks I’ve been sleeping with both the flex and the data logger on my left hand. I chose a night where I felt like I had slept very well the next morning, and recorded the following screenshot from the FitBit app:

Screenshot of the Sleep Pattern screen from the Fitbit App

I then compiled data from the logger, which was tedious because I was sampling at 25Hz with a max file size of 90,000 samples. So, I had to combine many files in excel to produce a single data set for the accelerometer output which had 885,265 data points (Raw Data)… In retrospect, this was a little overkill, as there is nothing in the body that is happening so fast that you need to sample 25 times a second! Once this data was imported into Matlab, I was able to plot it versus time, where time was given in seconds since the beginning of the recording period. I wanted to plot the data versus the hour of the day however, and it took me a while to figure out the ultimately trivial commands. I first used the command ttime= datestr(t/86400+datenum(2013,10,14,22,54,14)) to create a date string which would combine the seconds elapsed with the start time to produce absolute rather than relative time measurements.  I then converted that string back into numbers using the command tdate = datenum(ttime), and plotted all the data versus this new time variable. Finally, I changed the formatting of time ticks to just show the hour of the day using the datetick(‘x’,’HH PM’) command. I also created a vector based on the FitBit sleep data, which I had to manually grab from the app by sliding my finger around the plot and noting the start and stop times of “restlessness”. This was then overlaid ontop of the accelerometer data (as shown below), with an arbitrary magnitude.

Plot of Raw Output Data Versus FitBit Data

Needless to say, I was very surprised by this data. First, there was much more movement than I had anticipated given the relatively stable reading from the Fitbit. Second, I was surprised at the low correlation between the data from the Fitbit and the logger. Without doing any quantitative analysis, it is easy to see that Restless Period (RP) 1 is due to a flurry of motion, which most likely represents me trying to fall asleep. RP2 seems to correlate with the spike in logger data which occurs just after it, and the discrepancy in timing could be due to the fact that FitBit quantizes data into 1 minute intervals, as well as differences in timing between the two devices. RP3 corresponds nicely with a large spike in the Y Axis, however the subsequent movements between 3am and 7am do not trigger any “restlessness” events, and there does not appear to be a clear trigger for restlessness based on either magnitude or duration of movement, which is somewhat perplexing. Perhaps FitBit’s algorithm is more nuanced, and takes into account the predicted sleep cycle. For example, the extended period of stillness from about 1230am to 130am might have registered as the first REM cycle, such that motion which occurred 4 hours latter in the next cycle was ignored because the algorithm predicted that I would be in REM sleep once again. It should also be noted that I was not sleeping alone during this data collection cycle, but as the data shows, there was nothing going on but sleeping because there are no high frequency, sinusoidal movements…. Giggity.

## 100 Step Trial

In order to test the logger in a real world situation I needed to mount it. I chose the wrist as the test location, so that its data could be easily compared to that from a FitBit or other activity tacker. I bought several wrist bands from Amazon, and used electrical tape to secure it to \$5 nylon band. I thought this would only be a trial solution, but despite the clunkyness of the mounted logger, it actually turned out really well and there was no need to improve upon the mount.

Mounted Logger (Overhead View) Alongside FitBit Flex

Mounted Logger (SideView) Alongside FitBit Flex

Once mounted on my left wrist, I had to decide what to do with my arm while walking. I chose to hold my phone in the monitored hand and walk with my arm stretched out in front of me as shown below. I wanted to keep the step signal as simple as possible and avoid the introduction of any swaying arm motion back and forth at this point. Figured holding my phone would give it additional inertia and help damp out any swaying back and forth (in addition to allowing me to see my fitbit step count).

Hand Position While Walking

With my arm held steady as shown above, I recorded a brief sample of data at 25Hz while standing still in order to see what the baseline measurements for the logger mounted in this location were. I produced the plot below, which includes the average values for the X, Y and Z axes from time t = 5 to t = 17.

3 Axis Plot From Mounted Sensor – (Stationary, i.e. no steps)

The change in the forces at the beginning and end of the plot makes sense, as I had to activate the logger with a magnet while looking at it as you would a watch, and then roll my wrist over to hold the phone as shown in the previous photo. I then had to roll it back to turn the logger off with the magnet at the end of the plot, which took a bit longer. I was initially concerned that the sum of the three highlighted forces did not equal 1, because in a previous post , where the device was sitting at rest on a table, the sum of the all three forces was always approximately equal to 1. I then realized, of course, that is the vector sum of the forces which should be equal to 1, i.e. sqrt(X^2+Y^2+Z^2) == 1. This calculation was not necessary in the first post because the static force of gravity was always parallel to an axis. With fractional components of the gravitational force in each axis as in the above, however, it is extremely important, and the vector sum of 1.0152 for the three values above gave me sufficient confidence in the device to proceed.

With my hand positioned as described, I walked exactly 100 steps at a brisk pace along a flat gravel road while wearing flip flops. My FitBit Flex recorded 101 steps, which is pretty close, and may actually be the more accurate number as I could have miscounted somewhere. I analyzed the raw data (DataPost2) in Matlab, scaled it, and produced the following plot with the mean values for each axis from t = 29 to t = 84 seconds superimposed:

3 Axis Plot From Mounted Sensor – (100 Steps)

The first thing that jumps out in the plot above is that the X and Z axes have flipped position. I think this must be due to a slightly different hand position while walking, which is supported by the very different mean values that the axes are experiencing. Although there was also motion introduced into the data here, this is probably not the primary cause because even during the period of non-motion which occurs after the steps from about t = 85 to t = 110 seconds, the X and Z axis are still flipped. Regardless, trying to compare the two plots in this post based on something as vague as my general arm position is not a good use of time – I will just assume the data logger is correct.

As part of my first real analysis of accelerometer data, I thought I would just do a simple peak detector to see if I could get some code running in Matlab. I created a Matlab script (myfirstpeakfinder) which to accomplish this, which generally runs as follows:

1. Load raw data into Matlab; scale to G units
2. Run the script, which will prompt the user for the start and stop time of the period they would like to analyze in the data, as well as for the accelerometer sampling frequency.
3. The script then uses the findpeaks function built into matlab, and passes it inputs around the minimum peak height and the minimum distance between peaks (‘MINPEAKHEIGHT’ and ‘MINPEAKDISTANCE’). I used the mean value of the data for a given axes in the given time frame as the minimum height (so as not to double count cycles), and set the minimum peak distance to 10 samples. In the future, it would be interesting to run a FFT on the data, figure out what the average frequency of my walk is, and then have that data inform the minimum distance between steps. But let’s just keep it simple for now….
4. The peak data are then plotted back onto the original input data, which have been cropped by the start and stop times.
5. The number of Peaks, aka “Steps”, are then displayed for the data from each axis, along with an average of all the axes.

By selecting only the cleanest portion of the data, from time t = 29 to t =  84, and keeping the min distance between peaks at 10 samples, I was able to generate the following plot:

3 Axis Plot From Mounted Sensor – (100 Steps, Analyzed with myfirstpeakdetector.m)

The diamonds indicate detected peaks, and the the data for each axis were as follows:

Matlab Output (Better To Be Lucky Than Smart!)

Now, there are obviously many issues with the algorithm, but it felt pretty damn good to achieve the same accuracy as my flex (+/- 1 Step) on the very first trial! Of course, taking a larger bite of data completely blows up the script, and inputting a minimum peak distance other than the one I serendipitous happened to input on the first trial dramatically decreases the accuracy. But – it’s a start. The next thing I would like to do is create an aggregate metric for motion in excess of 1g, and use that to inform the peak detector… more to come on that. But for now, I thought I would also include a sensitivity analysis showing how the average number of steps calculated from t=29 to t=84 is effected by the minimum step distance changing over a series of values:

Sensitivity Analysis: Minimum Number of Samples Between Steps versus Average Number of Steps Calculated

## First Test of the X8M-3 Self Contained Data Logger

I purchased a pair of self contained data loggers, which have a 3 axis accelerometer, 3 axis magnetometer, 2GB of internal storage, and a 250 mAh 3.7v Li-po battery. They measure about 50mm X 26mm X 12mm, and are surprisingly light at just around 10g (not including the housing).

Gulf Coast Data Concepts X8M-3 Data Logger

The device is recharged through a micro-USB port (its sole interface), and when connected to a PC it appears as a flash drive. Inside, there is a folder containing recorded data files, a primitive Java application which can help with data analysis and scheduling, and a few text files which can be edited to adjust configuration parameters (like sampling frequency). After reading the device manual (GCDC_X8M-3_User_Manual), I configured the device to sample continuously at 25Hz using a modified configuration file:

config.txt File

With the settings downloaded, it was time to take the XM8-3 for a spin. While recording, I moved the device so that each one of its axes were parallel to the force of gravity for about 10 seconds, and generated about 900 points of raw data (DataPost1). I then scaled the accelerometer data to g units by dividing each data point by 1024 ( no need to call Curtis Jackson) , and used Matlab to generate a plot:

3 Axis Accelerometer Test

The data was generally as expected. In each position, one axis should have experienced 1g while the other two axes experienced no force. Minor variations can be explained by the surface the device was sitting on during the test not being completely level, which is compounded by the fact that faces of the case with seams bulge out in the middle. This might have impacted the readings in which the X and Y axes were parallel to gravity. In any case, the stated zero-g offset accuracy for the device was +/- .15g for the X and Y axes and +/- .25g for the Z axis, and the data are roughly within those bounds. More important than device accuracy, however, is its precision, since we will be most interested in relative motion in future experiments.  Form time t = 2 seconds to t = 12, the standard deviation of the measurements in each axes was calculated: X = 0.012g Y = 0.011g Z = 0.012g. This tight clustering shows high precision in the device.

X8M-3 with Axes Labels