Big data, migration and human mobility

IOM’s Global Migration Data Analysis Centre (GMDAC) and the European Commission Knowledge Centre on Migration and Demography (KCMD) launched the Big Data for Migration Alliance (BD4M).

The term “big data” includes anonymized data that are generated by users of mobile devices, internet-based platforms, or by digital sensors and meters, for example, satellite imagery. With about 5.16 billion unique mobile users, and around 4.57 billion active internet users around the world (We Are Social – Hootsuite, 2020), such “digital traces” present an enormous opportunity to complement traditional sources of migration data and improve knowledge of various aspects of migration. This is all the more relevant in light of the current data gaps and the need to monitor progress towards the migration-related targets in the Sustainable Development Goals (SDGs). The potential of these innovative sources, however, comes with significant challenges.

Big data are usually understood as data generated automatically by users of mobile phones, social media, internet platforms and applications, as well as via digital sensors and meters. Such data are stored in real time in large databases, usually owned by private companies – be it mobile phone operators, providers of social media platforms or other internet-based services. However, big data are not only “big” because of their volume; the speed (“velocity”) at which they are generated and the complexity (“variety”) of the information are also considered as distinguishing features of this kind of data (Hilbert, 2013).

Big data are different from data based on traditional household surveys as they do not refer to a random sample of individuals but to the totality of the population using, for instance, mobile phones or internet-based platforms, and these data are accessible in real time (Hilbert, 2014). Big data also differ from traditional data because of the specific technical and analytical methods required to extract meaningful insights from them and transform these data into “value” (de Mauro, Greco and Grimaldi, 2016). Letouzé (2015) distinguishes between big data as data, or “‘digital’ translation of human actions, interactions and transactions picked up by digital devices and services,” and big data as “an ecosystem of data, human and technical capacities and communities” producing and using such information for decision-making.

Recent developments

Over the past few years, a fast-growing number of projects and applications have demonstrated the potential of using various types of big data sources – such as mobile phone, social media, or satellite data – to improve the understanding of phenomena related to global migration and human mobility. Initial scepticism – not least within the statistical community (UN Global Working Group on Big Data for Official Statistics, 2016) – has given way to a recognition of the value of these innovative data sources in complementing traditional data sources and methodologies of migration statistics. Data innovation, including big data, is now the subject of various task forces and working groups (Ibid., 2020; Eurostat Big Data Task Force, 2020; UN Global Pulse, 2020), and is mentioned in key global migration policy frameworks, such as the Global Compact for Safe, Orderly and Regular Migration (GCM).

The Big Data for Migration Alliance (BD4M) – a joint initiative of IOM’s GMDAC with the EU Commission’s Joint Research Centre (JRC) – is compiling many of the pioneering projects in this field in a Data Innovation Directory (DID). In cooperation with a number of international partners, the DID offers up-to-date insights about projects, initiatives and applications of new data sources and innovative methodologies in migration and human mobility, so as to facilitate access to existing knowledge in this rapidly evolving field. Noteworthy examples include the following:

• Mobile phone Call Detail Records (CDRs) have been used to track internal displacement following natural disasters, such as the Nepal earthquake, or the spread of diseases, such as COVID-19 (Wilson et al., 2016; Flowminder & Ghana Statistical Services, 2020; Pepe et al., 2020). While CDR data are usually more helpful for identifying internal migration patterns, they could also be used to measure international migration at the sub-regional level, particularly when combined with other sources. For instance, the combination of CDRs with satellite data can help to map movements between cross-border communities (Sorichetta, 2017); CDRs coupled with census statistics can contribute to understanding patterns of refugee integration (Boy et al., 2019); and CDRs combined with geolocated social media data and official labour force statistics can help to assess the social integration of migrants in destination countries.

• Geo-located social media activity, such as on Twitter and Facebook, has been used to infer international migration flows and stocks, also disaggregated by age, sex, and skill level or sector of occupation, based on user self-reported information (Zagheni, Kiran and State, 2014; Patel, 2017; Gendronneau, 2019). For instance, during the COVID-19 pandemic, Facebook’s Disease Prevention Maps provided data on population distribution and movement on a daily basis, available for further analysis of the disease outbreak (Maas et al., 2020). Overall, the number of active social media users globally in April 2020 reached 3.8 billion (We Are Social and Hootsuite, 2020), of which 2.6 billion were Facebook users alone (Statista, 2020). The popularity of these platforms, together with the geotagged information that can be extracted from them, can be leveraged to study mobility patterns.

• Social media data can also be used to provide a kind of “real-time census” at the national or global level at a certain point in time (Zagheni, Weber and Gummadi, 2017; Spyratos et al., 2019). Data from the Facebook advertising platform, for instance, can yield information on a number of characteristics of users, such as their (self-reported) age, sex, their “home country” and current country of residence, educational background, sector of occupation and personal interests. In early 2018, Spyratos et al. were able to accurately identify the increase in the number of Venezuelan migrants (the number of active monthly users classified by Facebook as ‘expats’) in Spain, a trend confirmed in the Spanish official statistics. Social media content can also be used to analyse public sentiment toward migrants and refugees, and how opinions on social media can become polarized (Natale, 2017; UN Global Pulse and UNHCR, 2017).

• Repeated logins to the same website and IP addresses from e-mail sending activity have been used to estimate international mobility patterns and users’ likelihood to move to another country (Zagheni and Weber, 2012; State et al., 2013). Self-reported information on the sex and age of users also made it possible to estimate migration rates by sex and age group. Online search data may also be helpful to forecast (forced) migration, as shown in projects that compare Google Trends data with numbers of arrivals of asylum-seekers and migrants in Europe and in Australia (Connor, 2017; UN Global Pulse, 2014). Similarly, the Google Trends Index (GTI) – derived from the Google search engine, used by over a billion people worldwide – for migration-related search terms can be exploited to measure migration intentions from a certain country and predict subsequent emigration flows (Böhme, Gröger and Stöhr, 2018). The European Asylum Support Office’s Early Warning and Preparedness System (EPS) is using a combination of Google Trends data and traditional data sources to detect changes in country of origin contexts and forecast asylum applications in the EU.

• Artificial Intelligence (AI) and Machine Learning can support projects and applications that seek to better understand migration-related phenomena in numerous ways. For instance, UNHCR’s Project Jetson uses AI to compute an index that makes it possible to produce short-term predictions of expected migration flows in Somalia based on key variables, such as commodity market prices, rainfall and violent conflicts. Further, AI can complement and enhance human expertise in interpreting satellite imagery to identify internal displacement or infrastructure damage after natural disasters (Quinn et al., 2018). In Uganda, radio content was collected and analysed with machine learning to understand public attitudes toward refugees in the country (Quinn & Hidalgo-Sanchez, 2017).
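
To make the CDR example above more concrete, the following is a minimal sketch of one commonly described heuristic: treat the region where a subscriber is most often seen at night as their usual location, and count subscribers whose usual location changes after an event. It is not the method of any of the cited studies. The record layout, the `EVENT_DAY` value, the night-time window and all function names are assumptions made for illustration, and the records are invented.

```python
from collections import Counter, defaultdict

# Hypothetical, already anonymized CDR rows: (user_id, day, hour, region).
records = [
    ("u1", 1, 22, "A"), ("u1", 2, 23, "A"), ("u1", 8, 22, "B"), ("u1", 9, 21, "B"),
    ("u2", 1, 22, "A"), ("u2", 3, 2, "A"), ("u2", 8, 23, "A"),
    ("u3", 2, 21, "C"), ("u3", 9, 22, "C"), ("u3", 9, 14, "A"),
]
EVENT_DAY = 5  # assumed day of the disaster


def is_night(hour):
    """Assumed night-time window: 20:00 to 05:59."""
    return hour >= 20 or hour < 6


def modal_night_region(rows):
    """Region seen most often at night, or None if there is no night activity."""
    counts = Counter(region for _, _, hour, region in rows if is_night(hour))
    return counts.most_common(1)[0][0] if counts else None


def displacement_flows(rows, event_day):
    """Count users whose modal night-time region differs before and after the event."""
    before, after = defaultdict(list), defaultdict(list)
    for row in rows:
        user, day = row[0], row[1]
        (before if day < event_day else after)[user].append(row)

    flows = Counter()
    # Only users observed in both periods can be compared.
    for user in before.keys() & after.keys():
        origin = modal_night_region(before[user])
        destination = modal_night_region(after[user])
        if origin and destination and origin != destination:
            flows[(origin, destination)] += 1
    return flows


flows = displacement_flows(records, EVENT_DAY)
print(dict(flows))
```

In this toy data only `u1` changes night-time region (from A to B); the daytime visit of `u3` to region A is ignored. A real analysis would work on aggregated flows rather than individual records, in line with the privacy concerns discussed later in this article.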

Data sources

Big data sources that have so far been used in migration-related studies can be grouped under three broad categories (Global Migration Group, 2017):

  • Mobile-phone-based – e.g. call records or mobile money transfers.
  • Internet-based – e.g. social media or use of search engines.
  • Sensor-based – e.g. Earth Observation Data (satellite imagery).

The infographic below shows the specific types of sources.

Data strengths and limitations

The advantages of using new data sources for the analysis of migration-related aspects are linked to their potential to fill some of the gaps in traditional data sources and methods. Although national governments and the international community have made progress on migration statistics, traditional data sources have inherent limitations: national population censuses are costly and infrequent, migrants may be hard to sample in household surveys, and they may be undercounted in administrative records if they are not able to access services in the host country. The increased availability of digital records presents an opportunity to address some of the knowledge gaps around migration and mobility, especially given their timeliness, the frequency at which information can be updated, their wide coverage (of all users of mobile devices and internet-based platforms), and the level of detail they can provide.

Big data may be particularly useful to study patterns of temporary or circular migration, which are hard to measure through traditional sources and methods, or to anticipate migration trends. They can also contribute to more timely monitoring of public opinion or media discourse on migration than public opinion surveys allow, for instance. Another advantage is that such data are generated at no additional cost and can be obtained at a lower cost than data from traditional sources – depending on the willingness of data holders to share data or the insights these can generate. Combining information extracted from traditional and innovative data sources can provide evidence on aspects of migration we currently have limited knowledge of, such as integration prospects of recently arrived migrants in a country, fluid forms of migration that fall outside the UN definition of temporary or permanent migrants, or future migration movements.

The opportunities offered by big data are met by some significant challenges:

Ethical and privacy issues: There are confidentiality and ethical issues in using data automatically generated by individuals, often without their informed consent, as well as civil liberties concerns due to the risks of using such data for surveillance purposes, which are particularly serious in contexts of irregular migration and forced displacement. Adequate legislative and regulatory frameworks are needed to safeguard the confidentiality of the information and ensure the ethical use of data. In 2019, UNESCO embarked on a two-year process to elaborate a global standard-setting instrument on the ethics of AI. Furthermore, the EU’s Agency for Fundamental Rights (FRA) is working on a project titled “Artificial Intelligence, Big Data and Fundamental Rights”, which assesses the advantages and disadvantages, in terms of fundamental human rights, of using artificial intelligence, machine learning and big data for public policy and business purposes. The project aims to produce fundamental rights guidelines and recommendations on using artificial intelligence for policy. IOM was one of the first international organizations to adopt its own Data Protection Principles, and is affiliated with the International Data Responsibility Group (IDRG), a global network of experts and organizations working on principles and standards required for guiding the data revolution in the context of humanitarian action and sustainable development. IOM also supported the Signal Program on Human Security and Technology of the Harvard Humanitarian Initiative, which produced core ethical obligations for information activities in humanitarian contexts.

Big data are inherently biased: Users of social media or mobile phones are not necessarily representative of the population at large. Specifically, differences in internet access or use of mobile devices and social media platforms by level of economic development, sex, age and urban/rural areas are still significant. Research is ongoing to address the methodological challenges associated with such (“self-selection”) bias and results obtained so far look promising (Spyratos et al., 2018; Zagheni, Weber, and Gummadi, 2017; Hughes et al., 2016). Understanding the measurement error inherent in big data sources is helpful to increase the predictive capacity of models based on such sources, and facilitate sensible use of big data for decision-making.
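
One simple way to see how self-selection bias arises, and how it can be partly reduced, is post-stratification: platform-based shares are computed within demographic groups and then reweighted by each group's size in a reference source such as a census. The sketch below illustrates the general idea only; it is not the calibration approach of the studies cited above. All figures are invented, the function names are hypothetical, and the sketch assumes that, within each age group, migrants and non-migrants are equally likely to use the platform.

```python
# Hypothetical strata: census population, platform users, and platform users
# classified as migrants. All numbers are invented for illustration.
strata = {
    "18-29": {"population": 1_000_000, "users": 800_000, "migrant_users": 80_000},
    "30-49": {"population": 2_000_000, "users": 1_000_000, "migrant_users": 50_000},
    "50+": {"population": 3_000_000, "users": 600_000, "migrant_users": 12_000},
}


def naive_estimate(groups):
    """Apply the overall migrant share among platform users to the whole population."""
    users = sum(g["users"] for g in groups.values())
    migrants = sum(g["migrant_users"] for g in groups.values())
    population = sum(g["population"] for g in groups.values())
    return migrants / users * population


def post_stratified_estimate(groups):
    """Apply each group's migrant share to that group's census population."""
    return sum(
        g["migrant_users"] / g["users"] * g["population"] for g in groups.values()
    )


print(round(naive_estimate(strata)))
print(round(post_stratified_estimate(strata)))
```

With these invented figures the naive estimate is about 355,000 migrants, while the post-stratified estimate is 260,000. The gap arises because younger people, who in this example are both more likely to use the platform and more likely to be classified as migrants, are over-represented among users.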

Technical, analytical and legal challenges: Some of the challenges are due to difficulties in accessing data – held by private or state actors – or using data for research purposes; inadequate infrastructure and data management and security systems; and methodological difficulties in extracting meaning from huge, complex and “noisy” volumes of data. There are also issues of continuity of data, considering the rapid pace of technological change and innovation, and difficulties in gaining an overall picture of which big data sources or innovative methods can yield useful insights for policy, due to the proliferation of pilot applications and the absence of systematic services in this area. In this sense, the development of innovative “public-private partnerships” for data exchange and collaboration, such as “Data Collaboratives” (Verhulst, 2015), should be incentivized to make progress in this area.

As a way to concretely explore how to harness new data sources for migration analysis and policymaking, IOM’s Global Migration Data Analysis Centre (GMDAC) and the European Commission Knowledge Centre on Migration and Demography (KCMD) launched the Big Data for Migration Alliance (BD4M). While a series of initiatives exist at the UN- and EU-level focused on data innovation for sustainable development, such as the UN Global Pulse, the UN Data Innovation Lab, and the UN Global Working Group (GWG) on Big Data for Official Statistics, there was still no unit specifically tasked with leveraging new data sources in the field of migration and human mobility – hence the idea to create a dedicated Alliance.

The BD4M is a network of individuals and organizations across sectors that aims to a) explore the potential of new data sources and the combination of traditional and innovative methodologies for the analysis of migration and its relevance for policymaking; b) ensure the ethical use of data and the protection of individuals’ privacy; c) promote and facilitate new forms of partnerships across the business, policy and scientific communities; and d) support peer-to-peer learning, including by facilitating the sharing of good practices and by building capacities on migration data innovation. Plans to create the Alliance were announced in the follow-up to the expert workshop “Big Data and alternative data sources on migration: From case-studies to policy support”, jointly organized by the KCMD and GMDAC in Ispra. More information about the BD4M is available on the BD4M platform (www.data4migration.org), hosted by The Governance Lab (GovLab) at New York University Tandon School of Engineering.