HomeSense is a Methodological Research Project supported by the National Centre for Research Methods (NCRM). It aims to provide a better understanding of the implications of using modern digital sensors in social research. Before those implications can be fully understood, however, methodological issues such as finding meaning in sensor data have to be refined. This was the focus of a paper presented by Dr Jie Jiang at the International Conference on Future Networks and Distributed Systems (ICFNDS’17) in Cambridge in July this year.
The paper took as its case study the data analysis of one testbed home in the early stages of the fieldwork. It dealt with finding meaning in sensor-generated data and proposed methods for comparing data generated by sensors with self-reported time-use data generated by participants.
Since one of the implications of using sensors for social research is that, in due course, activities could be recognised automatically, it also proposed a method for modelling a range of activities recorded by sensors.
While the same approach was being applied in an extended version of the study with data from more households added to the mix, publication of the paper in the Proceedings of ICFNDS’17 provided an opportunity for some of its authors – Jie Jiang, Riccardo Pozza, Kristrún Gunnarsdóttir – to sit down and discuss the approach being taken by the HomeSense project.
The team also discussed the extension of the study to three households, and that work is available for download. Using Sensors to Study Home Activities (authors’ copy, freely available for fair use) is the latest HomeSense paper and has been accepted for publication in the Journal of Sensor and Actuator Networks.
Establishing a framework to evaluate agreement between two different sources of data
The extended paper reports on a pilot study of three households that used the same data-analytic methods. According to Jie Jiang, it provides “a framework to evaluate agreement between different sources of data that recognises activities of people’s daily lives.”
Social research studies usually ask participants to reflect on their actions and to describe or write them down after the event. To identify any inconsistencies, researchers can apply measures that pinpoint participants’ actual activities. They can do this either by triangulating methods or by establishing what is referred to here as ‘ground truth’.
“As we don’t yet know whether digital sensors will be useful for research, or how to use this method, we need to compare the traditional and new methods”, added Jie.
“New methods need to be introduced gradually, so understanding the similarities and differences between sensor-generated data and traditional empirical methods of understanding human activities is an important challenge for the project.”
The purpose of the conference and journal papers wasn’t to report the ‘truth’ of the activities in question. Instead, they set out to quantify the differences in the interpretations that the two methods yielded.
Studies that use sensor-generated data typically rely on a known reference, said Jie. “Such studies seek to establish their ‘ground truth’ by using cameras, or giving participants a predefined list of tasks to do at particular times.”
So the key consideration is not the exact timing of activities. It is the methodological question of how far the recorded sensor data agree or disagree with the corresponding entries participants wrote in their time-use diaries. The papers address this by proposing a framework for evaluating the extent of that agreement.
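To make the idea of a ‘degree of agreement’ more concrete, here is a minimal Python sketch. It assumes that both sources have already been reduced to one activity label per 10-minute slot. It then computes two standard measures: raw percentage agreement, and Cohen’s kappa, which discounts the agreement expected by chance alone. The function names and labels are hypothetical. These measures are chosen here for illustration and are not necessarily the ones used in the HomeSense papers.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of time slots in which the two sources report the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability that two independent sources with these
    # label frequencies would happen to pick the same label for a slot.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both sources use one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six 10-minute slots (illustrative only).
diary = ["cooking", "cooking", "eating", "eating", "away", "away"]
sensor = ["cooking", "cooking", "cooking", "eating", "away", "away"]
print(round(percent_agreement(diary, sensor), 3))  # → 0.833
print(round(cohens_kappa(diary, sensor), 3))       # → 0.75
```

Kappa comes out lower than raw agreement because some matches would be expected by chance. Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.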
Kristrún added, “In this work, which uses time-use diaries and sensor-generated data, you ease off on the expectation of ground truth in order to create a framework that explores the agreement between them.”
“That’s unique”, she added.
Advantages of sensor-generated data for social science
“Actually, there are many methods for recognising activities from sensor-generated data”, said Jie.
Moreover, no matter how diligent the participant, self-reported data is unlikely to be entirely accurate. Participants carry the burden of recalling and recording their actions every 10 minutes or so, whereas sensor-generated data, in this example, is recorded automatically every 3 seconds.
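Comparing the two sources requires a common time base: at one reading every 3 seconds, a 10-minute diary slot spans 600 / 3 = 200 sensor readings. The sketch below is a minimal illustration of that alignment step. It assumes readings arrive as hypothetical `(seconds_since_start, value)` pairs and that averaging is an acceptable way to summarise each slot. Neither assumption comes from the HomeSense papers.

```python
from collections import defaultdict

SAMPLE_PERIOD_S = 3   # sensor sampling interval given in the text
SLOT_S = 10 * 60      # diary resolution of roughly 10 minutes


def to_slots(readings, slot_s=SLOT_S):
    """Average (timestamp_s, value) readings into fixed-length time slots.

    Returns a dict mapping slot index -> mean value of the readings in that slot.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in readings:
        slot = int(t // slot_s)
        sums[slot] += value
        counts[slot] += 1
    return {slot: sums[slot] / counts[slot] for slot in sorted(sums)}


print(SLOT_S // SAMPLE_PERIOD_S)  # readings per slot → 200

# Hypothetical 20 minutes of readings: quiet for 10 minutes, then louder.
readings = [(t, 40.0 if t < 600 else 60.0) for t in range(0, 1200, SAMPLE_PERIOD_S)]
print(to_slots(readings))  # → {0: 40.0, 1: 60.0}
```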
“There’s also the advantage in practical terms of overcoming the privacy concerns of using cameras to obtain ground truth”, added Riccardo.
“Not many people would want to have cameras monitor all their household activities at all times because they might be doing activities they don’t want you to see.”
Data model based on mean-shift clustering and change point analysis
Going deeper into the model, Riccardo said, “We applied machine learning techniques for pattern recognition to identify patterns in the data and assign them to the subject’s activities of daily living, that is, what people do at home, such as eating, cooking or playing with a computer.”
Instead of evaluating the model against some ‘ground truth’, the researchers chose to train a model to derive meaning from the data.
“We extracted features from raw data via mean-shift clustering, as a way of taking the sensor-generated data and ‘clustering’ it around its average.”
“Clustering is an area of unsupervised machine learning, based on dividing data into groups without requiring knowledge about what each group represents.”
“Mean-shift clustering proved very effective for grouping sensor values around their mean values, to eliminate spurious fluctuations in the data and more meaningfully describe the features in the data.”
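The following is a minimal, from-scratch Python sketch of one-dimensional mean-shift clustering with a flat kernel. Each reading is repeatedly moved to the mean of all readings within a chosen bandwidth until it stops moving. Readings that settle on the same point form a cluster. Strictly speaking, mean shift settles on the densest points (modes) of the data, but for tight groups of readings these lie close to the group mean. The bandwidth, the merge rule and the readings are all hypothetical. The HomeSense analysis may well have used a different kernel, bandwidth or library implementation.

```python
def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift on scalar sensor readings.

    Each reading is shifted repeatedly to the mean of all readings within
    `bandwidth` of its current position until it moves less than `tol`.
    Returns (centres, labels), where labels[i] indexes into centres.
    """
    modes = []
    for x in values:
        m = x
        for _ in range(max_iter):
            # The window is never empty: the new mean always lies within
            # `bandwidth` of at least one reading in the previous window.
            window = [v for v in values if abs(v - m) <= bandwidth]
            new_m = sum(window) / len(window)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)

    # Merge modes that converged to within half a bandwidth of each other.
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if abs(m - c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels


readings = [40.1, 39.8, 40.3, 41.0, 60.2, 59.7, 61.1, 60.5]  # hypothetical dB values
centres, labels = mean_shift_1d(readings, bandwidth=5.0)
print([round(c, 1) for c in centres])  # → [40.3, 60.4]
print(labels)                          # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Libraries such as scikit-learn provide a general implementation (`sklearn.cluster.MeanShift`). The version here is only meant to show the mechanics.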
“For example, a sound level sensor would record changes in measured decibels, but only an average value over a time window is needed to recognise the presence or absence of noise, and from that to infer that a room is occupied, such as when someone is cooking a meal in the kitchen.”
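The averaging idea in the sound-level example can be sketched in a few lines of Python. The sketch assumes one decibel reading every 3 seconds, so a window of 20 readings spans one minute. It also uses a hypothetical 50 dB threshold. Neither value comes from the HomeSense study. The single loud reading in the second minute shows how averaging over a window suppresses a spurious fluctuation that a per-reading threshold would flag.

```python
def window_means(samples, window):
    """Mean of each consecutive, non-overlapping window of `window` samples.

    A trailing partial window is ignored.
    """
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def noise_present(samples, window=20, threshold_db=50.0):
    """Flag each window whose average level exceeds a hypothetical threshold.

    With one reading every 3 seconds, window=20 covers one minute.
    """
    return [m > threshold_db for m in window_means(samples, window)]


# Hypothetical minutes: quiet, quiet with one loud reading (e.g. a door), cooking.
quiet = [35.0] * 20
spike = [35.0] * 19 + [80.0]
cooking = [55.0, 58.0, 62.0, 57.0] * 5
print(noise_present(quiet + spike + cooking))  # → [False, False, True]
```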
“We could produce a model recognising what the data revealed, using machine learning, and train the model to be more reliable by presenting it with loads of data over time.”
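The heading of this section also mentions change point analysis, which the interview does not go into. As a rough illustration only, the sketch below finds the single index at which splitting a series into two segments minimises the total squared deviation of each segment from its own mean. This is one textbook formulation and not necessarily the one used in HomeSense. The noise levels are hypothetical.

```python
def best_change_point(series):
    """Return the index k that best splits `series` into series[:k] and series[k:].

    'Best' means the split that minimises the summed squared deviation of each
    segment from its own mean. Returns None if the series is too short to split.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    if len(series) < 2:
        return None
    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical per-minute kitchen noise levels: quiet, then cooking starts.
levels = [36, 35, 37, 36, 58, 60, 57, 59]
print(best_change_point(levels))  # → 4
```

A single split is the simplest case. Several change points can be found by applying the same search recursively to each segment, an approach known as binary segmentation.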
New combination of methods for social research
“Actually”, said Jie, “there are many similar methods we could use to recognise activities from sensor-generated data”.
“Also, both human subjects and learning models might misreport activities, and people simply might not want to report some (more sensitive) activities.”
“And which method is more accurate is a very difficult question to answer… but we only need to know to what extent we can use sensor-generated data.”
The feedback received by Jie at ICFNDS’17, from an audience of mainly engineers, naturally homed in on the technology rather than what it could be used for.
“They asked about the technical methodology, such as the Hidden Markov Models, rather than the social science, and about the types of sensors used in the study”.
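For readers unfamiliar with Hidden Markov Models, a brief explanation may help. An HMM treats the activity at each moment as a hidden state that can only be observed indirectly, here through a sensor reading. The Viterbi algorithm then recovers the most likely sequence of hidden states. The sketch below is purely illustrative. The two states, the quiet/noisy observations and every probability are invented, and this is not the model used in HomeSense.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete HMM (log-space Viterbi)."""
    # best[s]: log-probability of the most likely path ending in state s so far.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best = best
        best, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximises the path probability into s.
            prev = max(states, key=lambda p: prev_best[p] + math.log(trans_p[p][s]))
            best[s] = prev_best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Hypothetical two-state model of a kitchen; all numbers are illustrative.
states = ("empty", "occupied")
start_p = {"empty": 0.6, "occupied": 0.4}
trans_p = {
    "empty": {"empty": 0.9, "occupied": 0.1},
    "occupied": {"empty": 0.2, "occupied": 0.8},
}
emit_p = {
    "empty": {"quiet": 0.9, "noisy": 0.1},
    "occupied": {"quiet": 0.3, "noisy": 0.7},
}
obs = ["quiet", "quiet", "noisy", "noisy", "quiet", "noisy"]
print(viterbi(obs, states, start_p, trans_p, emit_p))
# → ['empty', 'empty', 'occupied', 'occupied', 'occupied', 'occupied']
```

The quiet reading in the fifth slot is still decoded as ‘occupied’. Under these invented transition probabilities, a brief lull is more likely than someone leaving and coming straight back.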
Looking ahead to further possibilities of the HomeSense fieldwork and the data the project can generate, Jie continued, “We’ve now started to include more sensors, including activating Bluetooth connectivity… and we’re still recruiting more households to do more research, to further explore the method of using digital sensors rather than draw conclusions about activities we can recognise.”
Rounding up, Kristrún characterised the implications of this work as a whole as purely methodological, “which for the scholarly community is significant; it’s an end in itself for pushing methodological developments forward.”