---
title: "README"
author: LABERTONIERE Dahliane
date: 15/12/2025
---

# GENERAL INFORMATION

1. Project title: Fast mapping in hominids


2. Authors:
	A. Author 1 
		Name: Labertoniere Dahliane
		Role: Postdoctoral researcher
		Institution: ENS-PSL
		Address: 29 rue d'Ulm, 75005 Paris, France
		Email: dahliane.labertoniere@ens.psl.eu
		
	B. Author 2 
		Name: Wilson Vanessa A. D.
		Role: Lecturer
		Institution: University of Hull
		Address: Cottingham Road, Hull, HU6 7RX, United Kingdom
		Email: vanessa.Wilson@hull.ac.uk
		
	C. Author 3 
		Name: Pascual-Guàrdia Carla
		Role: Research assistant
		Institution: Université de Neuchâtel
		Address: Institut de biologie, Avenue de Bellevaux 51, 2000 Neuchâtel, Switzerland
		Email: carla.pascual@proton.me
		
	D. Author 4 
		Name: Skoruppa Katrin
		Role: Professeure ordinaire
		Institution: Université de Neuchâtel
		Address: Institut de sciences logopédiques, Pierre-à-Mazel 7, 2000 Neuchâtel, Switzerland
		Email: katrin.skoruppa@unine.ch
		
	E. Author 5 
		Name: Zuberbühler Klaus
		Role: Professeur ordinaire
		Institution: Université de Neuchâtel
		Address: Institut de biologie, Avenue de Bellevaux 51, 2000 Neuchâtel, Switzerland
		Email: klaus.zuberbuehler@unine.ch

3. Linked publication (join DOI if possible): https://link.springer.com/article/10.1007/s10071-025-01974-x
https://doi.org/10.1007/s10071-025-01974-x

4. Date of publication: 01/07/2025

5. Discipline: Comparative cognition - Psycholinguistics

6. Topic keywords: Fast mapping, Word learning, Language evolution, Comparative cognition, Language acquisition, Meaning, Mental representation

7. Funding: This research was supported by funding from the Institute of Speech and Language Therapy (K.S.), the University of Neuchâtel (K.S., K.Z.),
the National Centre for Competence in Research ‘Evolving Language’ (SNSF agreement number 51NF40_180888, K.Z.), and a Top-Up Grant (grant number N603-18-01, V.A.D.W., K.Z.).


## DATA OVERVIEW

1. Number of files and datasets: two CSV datasets ('ape_clean.csv', 'human_clean.csv') and one archive of stimulus pictures ('picturecollection.zip')

2. For each file, briefly describe what it contains (data, documentation, code...):
- ape_clean.csv: cleaned eye-tracking data for the Basel zoo apes
- human_clean.csv: cleaned eye-tracking data for the Neuchâtel human participants
- picturecollection.zip: pictures of the stimuli used in the eye-tracking experiment
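
As a minimal sketch, the two datasets could be loaded in R as shown below. It assumes the files are comma-separated with a header row, which this README does not state; the helper name `read_clean` is hypothetical. The demonstration runs on a tiny made-up file so that it works without the real data.

```r
# Hypothetical helper: read one of the cleaned datasets.
# Assumptions (not confirmed by this README): comma separator and header row.
# Missing values are coded as NA, as documented for both datasets.
read_clean <- function(path) {
  read.csv(path, header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)
}

# Self-contained demonstration on a stand-in file with made-up values
demo_path <- tempfile(fileext = ".csv")
writeLines(c("Sujet,Bloc,PreNamingR", "2,1,0.5", "2,4,NA"), demo_path)
demo <- read_clean(demo_path)
str(demo)

# With the real files, the calls would presumably look like:
# ape   <- read_clean("ape_clean.csv")
# human <- read_clean("human_clean.csv")
```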

### METHODOLOGICAL INFORMATION

1. Date of data collection: 
Apes: 2022; humans: April 2024

2. Geographic location of data collection (precise if specific to each file): 
For the ape data: Basel zoo, Switzerland; for the human data: Université de Neuchâtel, Switzerland
 
3. Methods used for collection and/or generation of data: 
Apes: Tobii Spectrum eye-tracker with Tobii Pro Lab; humans: Tobii Pro X3-120 eye-tracker with Matlab

4. Methods used to process the data: Preprocessing (cleaning) script in Matlab

5. Instrument and/or software needed to read and handle the data: R or other statistical analysis software

6. Quality-assurance procedures performed on the data: looking times in the areas of interest (AOIs) were recorded automatically during the tests.
The proportion of target looks was computed for the pre- and post-naming phases of each trial.
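
The proportion of target looks can be illustrated with a short R sketch. This is not the authors' preprocessing script (which was written in Matlab): it assumes that the ratio is the looking time to the target image divided by the summed looking time to both images, and it borrows column names from ape_clean.csv with made-up values. The function name `target_ratio` is hypothetical.

```r
# Hypothetical helper: proportion of looking time spent on the target.
# Assumption: ratio = target / (left + right). Trials with no looks to either
# image give NA rather than a division by zero.
target_ratio <- function(left, right, target_pos) {
  target <- ifelse(target_pos == 1, left, right)  # 1 = left, 2 = right
  total <- left + right
  ifelse(total > 0, target / total, NA_real_)
}

# Made-up looking times for three trials
toy <- data.frame(
  Pos_Image_Test = c(1, 2, 1),
  StimGauchePre  = c(300, 100, 0),
  StimDroitPre   = c(100, 300, 0)
)
toy$PreNamingR_check <- target_ratio(toy$StimGauchePre, toy$StimDroitPre,
                                     toy$Pos_Image_Test)
print(toy)
```

Comparing such a recomputed column with the stored PreNamingR and PostNamingR values is one way to check whether this assumed definition matches the one used for the dataset.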


#### DATAFILE-SPECIFIC INFORMATION: ape_clean.csv

1. Number of variables: 28

2. Variable List:
- Sujet: subject number
- ExpDate: date of data collection
- Bloc: block
- Trial: 1 = first round of 3 exposures; 2 = second round
- Passage: 1 = first test; 2 = second test
- Pos_Image_Test: target position (1 = left; 2 = right)
- Contexte: 1 = variable; 2 = invariant
- ImageG: name of left image
- ImageD: name of right image
- StimDroitPre: pre-naming looking time to right image
- StimDroitPost: post-naming looking time to right image
- StimGauchePre: pre-naming looking time to left image
- StimGauchePost: post-naming looking time to left image
- StimGDPre: pre-naming looking time to left + right images
- StimGDPost: post-naming looking time to left + right images
- MotieDPre: pre-naming looking time to right side of the screen
- MoitieDPost: post-naming looking time to right side of the screen
- MoitieGPre: pre-naming looking time to left side of the screen
- MoitieGPost: post-naming looking time to left side of the screen
- PreNamingR: pre-naming ratio of looks to the target
- PostNamingR: post-naming ratio of looks to the target
- PreNamingRDemin: pre-naming ratio of looks to the side of the target
- PostNamingRDemi: post-naming ratio of looks to the side of the target
- OnTarget_0: location of the first look at the start of the post-naming phase (0 = on the screen but not on stimuli; 1 = on target; 2 = on distractor; 3 = not on screen; 99 = error)
- OnTarget_367: same coding as OnTarget_0, but for the first look after 367 ms of post-naming
- Valid: 1 = looking time to the screen > 800 ms; 0 = < 800 ms
- targWd: name of the target sound
- dist: name of the distractor sound

3. Missing data codes: NA

4. Remarks: 
The 'Contexte' variable contains erroneous data. To get the correct values, run this code:

```
# Recode Contexte from the block number:
# blocks 1 and 4 are the variable condition (1), all other blocks the invariant condition (2)
data$Contexte <- ifelse(data$Bloc %in% c(1, 4), 1, 2)
```

The number of exposures can only be determined from the experimental date. Only repetitions up to number 6 are taken into account in the analyses.
The following code assigns the correct number of exposures for each participant:

```
# When inspecting the data, sort by ExpDate, then Bloc, then Sujet.
# This puts the blocks in order (Bloc 1 first, ordered by date) and the trials in the right order for each subject.
data$nb_rep <- NA
data$nb_rep[c(328, 329, 1, 2, 343, 344, 7, 8, 352, 353, 19, 20, 364, 365, 30, 31, 34, 35, 54, 55, 62, 63, 94, 95, 98, 99, 118, 119, 114, 115, 376, 377, 150, 151, 148, 149, 388, 389, 174, 175, 170, 171, 400, 401, 402, 403, 196, 197, 416, 417, 212, 213, 216, 217, 236, 237, 240, 241, 260, 261, 280, 281, 308, 309, 312, 313, 428, 429, 440, 441, 455, 456)] <- 1
data$nb_rep[c(330, 331, 3, 4, 345, 346, 9, 10, 354, 355, 21, 366, 367, 32, 33, 36, 37, 56, 57, 64, 65, 96, 97, 100, 101, 120, 121, 116, 117, 378, 379, 152, 153, 154, 155, 390, 391, 176, 177, 172, 173, 404, 405, 198, 199, 418, 419, 214, 215, 218, 219, 238, 239, 242, 243, 262, 263, 282, 283, 310, 311, 314, 315, 430, 431, 442, 443, 457, 458)] <- 2
data$nb_rep[c(335, 336, 5, 6, 347, 348, 11, 12, 356, 357, 22, 23, 368, 369, 38, 39, 42, 43, 66, 67, 74, 75, 102, 103, 106, 107, 122, 123, 126, 127, 380, 381, 158, 159, 156, 157, 393, 182, 183, 178, 179, 406, 407, 204, 205, 420, 421, 220, 221, 224, 225, 244, 245, 248, 249, 268, 269, 292, 293, 316, 317, 320, 321, 432, 433, 447, 448, 459, 460)] <- 3
data$nb_rep[c(337, 338, 349, 350, 13, 14, 358, 359, 24, 25, 370, 371, 40, 41, 44, 45, 68, 69, 76, 77, 104, 105, 108, 109, 130, 131, 128, 129, 382, 383, 160, 161, 162, 163, 394, 395, 184, 185, 180, 181, 408, 409, 206, 207, 422, 423, 222, 223, 226, 227, 246, 247, 250, 251, 270, 271, 294, 295, 318, 319, 322, 323, 434, 435, 449, 450, 461, 462)] <- 4
data$nb_rep[c(339, 340, 351, 15, 16, 360, 361, 26, 27, 372, 373, 46, 47, 78, 79, 90, 91, 110, 111, 124, 125, 136, 137, 384, 385, 166, 167, 164, 165, 396, 397, 188, 189, 186, 187, 410, 411, 208, 209, 424, 425, 228, 229, 232, 233, 252, 253, 256, 257, 264, 265, 300, 301, 324, 325, 436, 437, 451, 452)] <- 5
data$nb_rep[c(341, 342, 17, 18, 362, 363, 28, 29, 374, 375, 48, 49, 80, 81, 92, 93, 112, 113, 138, 139, 386, 387, 168, 169, 398, 399, 190, 191, 412, 413, 210, 211, 426, 427, 230, 231, 234, 235, 254, 255, 258, 259, 266, 267, 302, 303, 326, 327, 438, 439, 453, 454)] <- 6
data$nb_rep[c(50, 51, 86, 87, 414, 415, 272, 273)] <- 7
data$nb_rep[c(52, 53, 88, 89, 274, 275)] <- 8
# A few trials have no exposure number: they must be removed because of various bugs (no sound, loud music in the background...)
data <- data[!is.na(data$nb_rep), ]
```
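
Since only repetitions up to number 6 enter the analyses, a filter along the following lines could be applied once `nb_rep` has been filled in. This is a sketch on a made-up data frame; only the `nb_rep` and `Sujet` column names come from the dataset.

```r
# Toy stand-in for the ape data after nb_rep has been assigned
toy <- data.frame(Sujet = c(1, 1, 2, 2), nb_rep = c(1, 6, 7, 8))

# Keep exposures 1 to 6; which() also drops any row where nb_rep is NA
toy_analysis <- toy[which(toy$nb_rep <= 6), ]
print(toy_analysis)
```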

Some apes started a block on one day and continued it on another day. Only data from the first day are taken into account in the analyses.
To filter these data, run this code:

```
data$Day <- NA
for (i in 1:nrow(data)) {
  if (data$Sujet[i] == 1){
    if (data$ExpDate[i] == '01_06_22' || data$ExpDate[i] == '02_06_22' || data$ExpDate[i] == '16_08_22' || data$ExpDate[i] == '17_08_22'){ 
      data$Day[i] <- 1
    }
    if (data$ExpDate[i] == '25_08_22'|| data$ExpDate[i] == '06_09_22' || data$ExpDate[i] == '08_09_22'){
      data$Day[i] <- 2
    }
  }
  if (data$Sujet[i] == 2){
    if (data$ExpDate[i] == '31_08_22' || data$ExpDate[i] == '02_09_22' || data$ExpDate[i] == '06_09_22'){
      data$Day[i] <- 1
    }
    else {
      data$Day[i] <- 2
    }
  }
  if (data$Sujet[i] == 3){
    if (data$ExpDate[i] == '25_05_22' || data$ExpDate[i] == '26_05_22' || data$ExpDate[i] == '01_06_22' || data$ExpDate[i] == '17_08_22'){
      data$Day[i] <- 1
    }
    else if (data$ExpDate[i] == '02_09_22' || data$ExpDate[i] == '06_09_22' || data$ExpDate[i] == '24_08_22' || data$ExpDate[i] == '13_09_22'){
      data$Day[i] <- 3
    }
    else {
      data$Day[i] <- 2
    }
  }
  if (data$Sujet[i] == 4){
    # Note: for Adira, block 2 on 7 and 9 September is counted as a single day, as noted above. There is no block 3 (second day) on 12 September (it was removed above).
    if (data$ExpDate[i] == '07_09_22' || data$ExpDate[i] == '09_09_22' || data$ExpDate[i] == '12_09_22'){
      data$Day[i] <- 1
    }
    else{
      data$Day[i] <- 2
    }
  }
  if (data$Sujet[i] == 5){
    data$Day[i] <- 1
  }
}
data <- dplyr::filter(data, Day == 1)  # filter() comes from the dplyr package
```


#### DATAFILE-SPECIFIC INFORMATION: human_clean.csv

1. Number of variables: 20

2. Variable List:
- Sujet: subject number
- ExpDate: date of data collection
- Phase: 2 = test phase
- Bloc: block
- Trial: number of exposures (1 = first round of 3 exposures; 2 = 6 exposures; 3 = 9 exposures; etc.)
- Passage: 1 = first test; 2 = second test
- Pos_Image_Test: target position (1 = left; 2 = right)
- Contexte: 1 = variable; 2 = invariant
- ImageG: name of left image
- ImageD: name of right image
- PreTotRegard: total looking time to the screen pre-naming
- PostTotRegard: total looking time to the screen post-naming
- PreNamingR: pre-naming ratio of looks to the target
- PostNamingR: post-naming ratio of looks to the target
- OnTarget_0: location of the first look at the start of the post-naming phase (0 = on the screen but not on stimuli; 1 = on target; 2 = on distractor; 3 = not on screen; 99 = error)
- OnTarget_367: same coding as OnTarget_0, but for the first look after 367 ms of post-naming
- Valid: 1 = looking time to the screen > 800 ms; 0 = < 800 ms
- targWd: name of the target sound
- dist: name of the distractor sound
- Salience: salience of the target compared to the distractor in pre-naming

3. Missing data codes: NA

4. Remarks: Subject 1 is not a real participant (it was a test of the setup) and should be excluded.
The 'Contexte' variable contains erroneous data. To get the correct values, run this code:

```
# Recode Contexte from the block number:
# blocks 1 and 4 are the variable condition (1), all other blocks the invariant condition (2)
data$Contexte <- ifelse(data$Bloc %in% c(1, 4), 1, 2)
```
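
Excluding the setup-test subject can be sketched as below, on a made-up data frame that reuses the Sujet and Valid column names. Dropping invalid trials (Valid = 0) as well is an assumption here, not something this README prescribes.

```r
# Toy stand-in for human_clean.csv (values are made up)
toy <- data.frame(Sujet = c(1, 2, 2, 3), Valid = c(1, 1, 0, 1))

# Drop subject 1, the test of the setup
toy <- toy[toy$Sujet != 1, ]

# Assumption: analyses keep only trials with enough looking time (Valid == 1)
toy_valid <- toy[which(toy$Valid == 1), ]
print(toy_valid)
```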

#### DATAFILE-SPECIFIC INFORMATION: collection of stimuli pictures

1. Number of files: 20 pictures

2. Remarks:
- Block 1 (variable condition): Pair 1 (Img_01_1 and Img_01_2, all backgrounds)
- Block 2 (invariant condition): Pair 2 (Img_02_1_2 and Img_02_2_2)
- Block 3 (invariant condition): Pair 3 (Img_03_1_4 and Img_03_2_4)
- Block 4 (variable condition): Pair 4 (Img_04_1 and Img_04_2, all backgrounds)

Explanation of picture names: Img_block_number_background (block number, picture number within the pair, background number)
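
The naming scheme can be unpacked programmatically. The sketch below splits a name on underscores, assuming the fields always come in the order block, number, background, and that the background field may be absent (as in Img_01_1). The function name `parse_picture_name` is hypothetical, and since the file extension is not given here, any extension is stripped generically.

```r
# Hypothetical helper: split a picture name into its components
parse_picture_name <- function(name) {
  stem <- tools::file_path_sans_ext(basename(name))  # drop any extension
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  data.frame(
    name       = name,
    block      = as.integer(parts[2]),
    number     = as.integer(parts[3]),
    # Background is NA when the name has no fourth field
    background = if (length(parts) >= 4) as.integer(parts[4]) else NA_integer_
  )
}

pictures <- do.call(rbind, lapply(c("Img_01_1", "Img_02_1_2", "Img_03_2_4"),
                                  parse_picture_name))
print(pictures)
```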

##### SHARING/ACCESS INFORMATION

1. Licence of reuse: Open Access

2. Special considerations regarding data reuse (if any): cite the original article and inform the first author. 

3. Recommended citation for the dataset: 
Labertoniere, D., Wilson, V. A. D., Pascual-Guàrdia, C., Skoruppa, K., & Zuberbühler, K. (2025). Fast mapping in hominids - dataset.

4. Data repository where the data is stored: libra.unine.ch