We now want to highlight some of the actual research that scientists have done to identify and demonstrate the adaptive effects of TEs in Drosophila, which followed procedures similar to the ones you have been learning in these modules. We have attached a summary of some of these international investigations that may be of interest to you, so you can test yourself as a young scientist.

As you can read, TE insertions can have an adaptive effect through a variety of molecular mechanisms: they can generate new or more abundant gene transcripts, they can inactivate genes, or they can increase specific transcripts in certain tissues. After you finish reading, you will see that there are still very few TEs whose adaptive effects are known, so there is a lot of work left to do. Who’s in?

Annex I: Current information about candidate adaptive TEs in Drosophila melanogaster

 

How can we distinguish mutations that are adaptive from those that are not?

If a mutation causes flies to have more descendants, there will be more flies with that mutation in each generation, so its frequency in the population will increase. TEs present in the genome at high frequencies are therefore very good candidates to work with. Remember, though, that frequency alone cannot distinguish adaptive TEs from neutral TEs whose frequency has increased by chance.

As we already know, when there is no recombination, cell control mechanisms are very relaxed, so let’s exclude from the candidates all TEs located in these kinds of regions. In other words, let’s remove the TEs located in zero-recombination regions of the genome. We might lose some adaptive TEs, but the risk is worth it.

We are going to select a subgroup of TEs from our data list, which we have previously imported into R.

In R, the function we need is called “subset”. This is how it works:

new_list <- subset(data_file, condition_variable)

Have a look at the real code in the code file opened in the text editor.

REMEMBER: conditions in R are:

  • Equal to: ==
  • Not equal to: !=
  • Greater than: >
  • Less than: <
  • For multiple conditions at the same time, you can use & (and) or | (or)
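To see these operators in action, here is a minimal sketch on a tiny invented table. The data frame te_data and its columns (te_id, recombination, frequency) are assumptions made up for this illustration; the names in your real data file may be different, and we assume here that frequencies are stored as percentages.

```r
# Toy data frame: names and values are invented for illustration only.
te_data <- data.frame(
  te_id = c("TE_a", "TE_b", "TE_c", "TE_d"),
  recombination = c(0, 1.2, 3.4, 0),   # assumed recombination rate
  frequency = c(50, 5, 30, 2),         # assumed to be a percentage
  stringsAsFactors = FALSE
)

# Keep only the rows where recombination is not equal to zero.
with_recombination <- subset(te_data, recombination != 0)

# Combine two conditions with & (both must be TRUE for a row to be kept).
frequent_with_recombination <- subset(
  te_data,
  recombination != 0 & frequency > 10
)

# nrow() counts how many TEs are left in each subset.
nrow(with_recombination)
nrow(frequent_with_recombination)
```

Counting the rows with nrow() is a handy way to answer the "how many TEs" questions in the tests.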

 

4.1. TEST YOURSELF

Generating Candidate Lists of Adaptive TEs

1. How many TEs are in genome regions with non-zero recombination?

 

2. Remember, once you take a subgroup from the original file, each following subset is taken from the previous subgroup you obtained, not from the original file.

How many TEs are in regions with recombination and are present at high frequency (more than 10%) in at least one of the 5 populations?
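The idea of taking one subset from the previous one can be sketched like this. Again, te_data and its columns (recombination, freq_pop1, freq_pop2) are hypothetical names with invented values, and only two populations are shown to keep the example short; your data file has its own column names and five populations.

```r
# Toy data frame: names and values are invented for illustration only.
te_data <- data.frame(
  te_id = c("TE_a", "TE_b", "TE_c", "TE_d", "TE_e"),
  recombination = c(0, 1.2, 3.4, 2.1, 0.8),
  freq_pop1 = c(60, 5, 30, 2, 8),    # assumed percentage in population 1
  freq_pop2 = c(55, 12, 1, 3, 9),    # assumed percentage in population 2
  stringsAsFactors = FALSE
)

# Step 1: subset taken from the original table.
step1 <- subset(te_data, recombination != 0)

# Step 2: subset taken from step1, NOT from te_data.
# The | operator keeps a TE if it is frequent in at least one population.
step2 <- subset(step1, freq_pop1 > 10 | freq_pop2 > 10)

nrow(step1)
nrow(step2)
```

Note that TE_a is very frequent in both populations, but it never reaches step2 because it was already removed in step 1.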

3. We want to detect the adaptive “out-of-Africa” TEs: the flies originated under tropical African conditions and then had to adapt to the cold and arid conditions of Europe. In this case, which populations will we have to compare?

4. Keep in mind that with R, we can get a subset from the original file, taking into account more than one condition:

new_list <- subset(last_list_data_file, condition_variable | condition_variable & condition_variable)

You can check it in the code file opened in the text editor. The threshold at which we consider a TE to be at low or high frequency is a hot topic. Here, we consider a TE to be at low frequency if less than 10% of the population carries it, and at high frequency if more than 10% of the population carries it. By using the subset function, you can generate a list of candidate adaptive “out-of-Africa” TEs. In this case, how many candidate TEs do we have for the “out-of-Africa” adaptation? (Remember they should be in regions with recombination.)
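When you mix & and | in one condition, keep in mind that R evaluates & before |, so parentheses can change the result. The sketch below shows the difference on invented data. The columns freq_africa, freq_out1 and freq_out2 are hypothetical names, and reading "out-of-Africa candidate" as "low frequency in the African population and high frequency in at least one other population" is our assumption for this illustration, not necessarily the exact condition used in the code file.

```r
# Toy data frame: names and values are invented for illustration only.
te_data <- data.frame(
  te_id = c("TE_a", "TE_b", "TE_c", "TE_d"),
  freq_africa = c(2, 50, 3, 40),   # assumed percentage in an African population
  freq_out1 = c(30, 40, 4, 5),     # assumed percentage in another population
  freq_out2 = c(5, 45, 20, 60),    # assumed percentage in another population
  stringsAsFactors = FALSE
)

# Without parentheses, R reads this as: out1 | (out2 & africa).
no_parens <- subset(
  te_data,
  freq_out1 > 10 | freq_out2 > 10 & freq_africa < 10
)

# With parentheses: (out1 | out2) & africa.
# High in at least one other population AND low in the African one.
with_parens <- subset(
  te_data,
  (freq_out1 > 10 | freq_out2 > 10) & freq_africa < 10
)

no_parens$te_id
with_parens$te_id
```

In this toy table, TE_b is frequent everywhere, including the African population. It slips into no_parens but is correctly left out of with_parens, so when in doubt, add the parentheses.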

5. We consider the TEs located inside genes or in nearby gene regulatory regions to be the most important. TEs selected this way are more likely to affect the nearby gene, so we avoid including neutral TEs in our lists. In Drosophila melanogaster, promoter and regulatory regions are considered to lie within 1 kb of the gene.

Remember that in our data file, gene distance is measured in bp and not in kb. Remember also the box in module 2, which shows how to transform the number and get the information you want.

How many candidate TEs do we have for the “out-of-Africa” adaptation (regions with recombination) that are less than 1 kb from a gene?
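Since 1 kb equals 1000 bp, the distance filter can be sketched as follows. The column name gene_distance_bp is a hypothetical name, and we assume here that a distance of 0 means the TE is inside a gene; check how your data file encodes this before relying on it.

```r
# Toy data frame: names and values are invented for illustration only.
te_data <- data.frame(
  te_id = c("TE_a", "TE_b", "TE_c", "TE_d"),
  gene_distance_bp = c(0, 250, 1000, 4800),  # assumed: 0 = inside a gene
  stringsAsFactors = FALSE
)

# Convert the limit from kb to bp so it matches the units of the data.
max_distance_kb <- 1
max_distance_bp <- max_distance_kb * 1000

# Keep the TEs that are less than 1 kb (1000 bp) from a gene.
near_genes <- subset(te_data, gene_distance_bp < max_distance_bp)

nrow(near_genes)
```

Because the condition uses < and not <=, a TE exactly 1000 bp from a gene (TE_c here) is not kept.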

4.2. Changing the parameters to check how the results vary

The 10% threshold is an arbitrarily chosen value: we are assuming that TEs present at frequencies higher than 10% could be relevant for the population. However, how does changing the threshold affect the number of candidate TEs?

Let’s check what happens if we modify the threshold value. How many candidate TEs do we have (those in regions with recombination and inside or near genes) with the new threshold?

Have a look at the examples in the code file.
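If you want to compare several thresholds in one go, one possible approach is to repeat the same subset for each value and count the rows. This is only a sketch on invented data with hypothetical column names (freq_pop1, freq_pop2), not the code from the code file.

```r
# Toy data frame: names and values are invented for illustration only.
te_data <- data.frame(
  te_id = c("TE_a", "TE_b", "TE_c", "TE_d", "TE_e"),
  freq_pop1 = c(60, 12, 30, 2, 18),   # assumed percentages
  freq_pop2 = c(55, 5, 45, 3, 9),
  stringsAsFactors = FALSE
)

# Thresholds to compare, in percent.
thresholds <- c(10, 15, 40)

# For each threshold, count the TEs above it in at least one population.
counts <- sapply(thresholds, function(threshold) {
  nrow(subset(te_data, freq_pop1 > threshold | freq_pop2 > threshold))
})
names(counts) <- paste0(thresholds, "%")

counts
```

In this toy table the number of candidates drops as the threshold rises, which is the kind of pattern to look for in the real data.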

4.2. TEST YOURSELF

Change the parameters to check how results vary

1. How does the candidate list of adaptive TEs change when you change the threshold value to 15% and to 40%?

How do we know what the proper parameters are? We can’t. We can instead generate a hypothesis and validate it. Validation is the work done in the lab, and it is highly important. Once we have a candidate list of TEs, we have to check in the lab if we are right. What we learn from this will be useful for future analysis.

4.3. Check if the validated TEs are on the list

The last step is to check our candidate list against the TEs already thought to be responsible for the fly’s adaptation to its new environments outside Africa. Lab work takes many years, but some discoveries have already been made. We can look for those TEs already shown to be adaptive in Drosophila, and check how many of them are in our lists.

In Annex I, you can read about 6 TEs already validated in Drosophila melanogaster. Once you get the “FBti” ID of each one from the text, you can look it up in our data file to check whether it is in our “out-of-Africa” candidate TE list.
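Instead of searching for each ID by eye, you could let R compare the two lists with the %in% operator. The sketch below uses placeholder IDs ("FBti_example_1" and so on) and a hypothetical column name te_id; replace them with the real FBti IDs from Annex I and the real column name in your candidate list.

```r
# Toy candidate list: IDs are placeholders, not real FBti identifiers.
candidates <- data.frame(
  te_id = c("FBti_example_1", "FBti_example_2", "FBti_example_3"),
  stringsAsFactors = FALSE
)

# IDs of validated TEs, typed in by hand from the annex (placeholders here).
validated_ids <- c("FBti_example_2", "FBti_example_9")

# %in% returns TRUE for each validated ID that appears in the candidate list.
found <- validated_ids[validated_ids %in% candidates$te_id]

# setdiff() gives the validated IDs that are missing from the candidate list.
missing_ids <- setdiff(validated_ids, candidates$te_id)

length(found)
missing_ids
```

The validated TEs that end up in missing_ids are just as interesting as the ones that are found, because they show what our filters are losing.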

4.3. TEST YOURSELF

1. Considering a 10% frequency threshold, how many of those TEs are in our list?

As you can see, by selecting a small number of populations (5 in this case), we are “losing” a considerable number of adaptive TEs. Most probably, not all of the 55 candidate TEs will be adaptive, and we know that there are more adaptive TEs than those in our lists.

Scientific results are cumulative: the more we learn, the better we can understand what is going on in nature. New hypotheses emerge from previous results, and that’s how knowledge continues to grow.

There is still a lot of work to do, and hopefully, with your work, you are helping us to move forward.
