
How to choose the right transfer learning protocol? A qualitative analysis in a controlled set-up

Gerace, Federica; Doimo, Diego; Sarao Mannelli, Stefano; Saglietti, Luca; Laio, Alessandro
2024

Abstract

Transfer learning is a powerful technique that enables model training with limited amounts of data, making it crucial in many data-scarce real-world applications. Typically, transfer learning protocols require first transferring all the feature-extractor layers of a network pretrained on a data-rich source task, and then adapting only the task-specific readout layers to a data-poor target task. This workflow rests on two main assumptions: first, that the feature maps of the pre-trained model are qualitatively similar to those that would have been learned with enough data on the target task; second, that the source representations of the last hidden layers are always the most expressive. In this work, we demonstrate that this is not always the case and that the largest performance gain may be achieved when smaller portions of the pre-trained network are transferred. In particular, we perform a set of numerical experiments in a controlled setting, showing how the optimal transfer depth depends non-trivially on the amount of available training data and on the degree of source-target task similarity, and that it is often convenient to transfer only the first layers. We then propose a strategy to detect the most promising source task among the available candidates. This approach compares the internal representations of a network trained entirely from scratch on the target task with those of the networks pre-trained on the potential source tasks.
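To make the two ingredients of the abstract concrete, here is a minimal PyTorch sketch: transferring only the first k layers of a pretrained network, and comparing internal representations to rank candidate source tasks. This is not the authors' code; the MLP architecture, all function names, and the choice of linear CKA as the similarity metric are illustrative assumptions.

```python
# Illustrative sketch only: the architecture, names, and the use of linear
# CKA as the representation-similarity metric are assumptions, not the
# paper's released code.
import torch
import torch.nn as nn

def make_mlp(sizes):
    """Fully connected network: feature-extractor layers plus a final readout."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))  # task-specific readout
    return nn.Sequential(*layers)

def transfer_first_k(source, target, k, freeze=True):
    """Copy the first k Linear layers of `source` into `target`.

    k is the transfer depth: k = number of hidden layers recovers the
    standard "transfer the whole feature extractor" protocol, while
    smaller k transfers only the early layers.
    """
    copied = 0
    for src_mod, tgt_mod in zip(source, target):
        if isinstance(src_mod, nn.Linear) and copied < k:
            tgt_mod.load_state_dict(src_mod.state_dict())
            if freeze:
                for p in tgt_mod.parameters():
                    p.requires_grad = False
            copied += 1
    return target

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_samples, dim).

    One common choice for comparing internal representations; the paper's
    own similarity measure may differ.
    """
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    num = (X.T @ Y).norm() ** 2
    den = (X.T @ X).norm() * (Y.T @ Y).norm()
    return (num / den).item()

# Rank candidate source networks by how similar their early-layer
# representations are to those of a network trained from scratch on the
# (data-poor) target task, then transfer from the best candidate.
if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(256, 32)                      # probe inputs (synthetic)
    scratch = make_mlp([32, 64, 64, 10])          # stand-in for the from-scratch net
    candidates = {name: make_mlp([32, 64, 64, 10]) for name in ["src_A", "src_B"]}
    with torch.no_grad():
        ref = scratch[1](scratch[0](x))           # post-activation of first layer
        scores = {name: linear_cka(ref, net[1](net[0](x)))
                  for name, net in candidates.items()}
    best = max(scores, key=scores.get)
    target_net = transfer_first_k(candidates[best], make_mlp([32, 64, 64, 10]), k=1)
    # ...then train the remaining (unfrozen) layers on the target data.
```

In this sketch, sweeping k from 0 to the number of hidden layers reproduces the family of protocols whose optimum the paper studies, and the source whose early-layer representations score highest against the from-scratch target network would be the preferred candidate under this (assumed) metric.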
Files in this record:

File: 2385_How_to_choose_the_right_t.pdf
Access: open access
Description: article
Type: Publisher's PDF (publisher's layout)
License: Creative Commons
Size: 1.25 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4068076