Generating Candidate Lists of Adaptive TEs

How many TEs are located in genome regions with non-zero recombination?


2. Remember that once you have taken a subset from the original file, each following subset should be taken from the previous subset you obtained, not from the original file.
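As a minimal sketch of this chaining, the example below uses a small made-up data frame. The column names (te_id, recomb_rate, freq_pop1, freq_pop2) and all values are assumptions for illustration only; they are not the names used in the course data file.

```r
# Toy data; column names and values are hypothetical, not those of the course file.
te_data <- data.frame(
  te_id       = c("TE1", "TE2", "TE3", "TE4"),
  recomb_rate = c(0.0, 1.2, 2.5, 0.8),
  freq_pop1   = c(0.50, 0.05, 0.20, 0.02),
  freq_pop2   = c(0.30, 0.15, 0.01, 0.03)
)

# First subset: taken from the original data (non-zero recombination).
te_recomb <- subset(te_data, recomb_rate > 0)

# Second subset: taken from te_recomb, NOT from te_data.
# Keeps TEs above 10% frequency in at least one of the two toy populations.
te_recomb_high <- subset(te_recomb, freq_pop1 > 0.10 | freq_pop2 > 0.10)

# nrow() counts the TEs remaining in each subset.
nrow(te_recomb)
nrow(te_recomb_high)
```

Because the second call starts from te_recomb, TE1 (high frequency but zero recombination) cannot re-enter the list.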

How many TEs are in regions with recombination and are present at high frequency (more than 10%) in at least one of the 5 populations?

3. We want to detect adaptive “out-of-Africa” TEs. These flies originated under tropical African conditions and then had to adapt to the cold and arid conditions of Europe. In this case, which populations will we have to compare?

4. Keep in mind that in R we can take a subset using more than one condition at once:

new_list <- subset(last_list_data_file, condition_variable | condition_variable & condition_variable)

You can check it in the code file opened in the text editor. The threshold at which we consider a TE to be low frequency or high frequency is a point of debate. Here, we consider a TE low frequency if less than 10% of the population carries it, and high frequency if more than 10% of the population carries it. By using the subset function, you can generate a list of candidate adaptive “out-of-Africa” TEs. In this case, how many candidate TEs do we have for the “out-of-Africa” adaptation? (Remember that they should be in regions with recombination.)
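When combining conditions, note that R evaluates & before |, so parentheses matter. The sketch below illustrates this on made-up data. The column names (freq_africa, freq_eur1, freq_eur2), the values, and the choice of comparing one African column against two European columns are all assumptions for illustration, not the layout of the course data file.

```r
# Toy data; column names and values are hypothetical.
te_recomb <- data.frame(
  te_id       = c("TE1", "TE2", "TE3", "TE4"),
  freq_africa = c(0.02, 0.40, 0.05, 0.01),
  freq_eur1   = c(0.30, 0.50, 0.04, 0.08),
  freq_eur2   = c(0.05, 0.60, 0.03, 0.25)
)

# Assumed criterion: low frequency (< 10%) in Africa AND
# high frequency (> 10%) in at least one European population.
ooa_candidates <- subset(
  te_recomb,
  freq_africa < 0.10 & (freq_eur1 > 0.10 | freq_eur2 > 0.10)
)

# Without parentheses, a | b & c is read as a | (b & c),
# which lets TE2 through even though it is common in Africa.
no_parens <- subset(
  te_recomb,
  freq_eur1 > 0.10 | freq_eur2 > 0.10 & freq_africa < 0.10
)

nrow(ooa_candidates)
nrow(no_parens)
```

Writing the parentheses explicitly, even where they are not strictly needed, makes the intended grouping easier to check.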

5. We consider TEs located inside genes or in nearby gene regulatory regions to be the most important. By focusing on them, the TEs in our lists are more likely to affect the nearby gene, and we avoid including neutral TEs. In Drosophila melanogaster, promoter and regulatory regions are considered to lie within 1 kb of the gene.

Remember that in our data file, gene distance is measured in bp and not in kb. Also remember the box in module 2, which shows how to convert the number and get the information you want.
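The unit conversion can be sketched as follows, using 1 kb = 1000 bp. The data frame, the column name gene_distance_bp, and the convention that a distance of 0 means the TE is inside a gene are assumptions for illustration only.

```r
# Toy data; column names and values are hypothetical.
# gene_distance_bp is in base pairs; 0 is assumed to mean "inside a gene".
ooa_candidates <- data.frame(
  te_id            = c("TE1", "TE2", "TE3"),
  gene_distance_bp = c(0, 850, 4200)
)

# Convert the 1 kb threshold to bp so it matches the units in the data.
kb_threshold <- 1
bp_threshold <- kb_threshold * 1000  # 1 kb = 1000 bp

# Keep only candidates closer than 1 kb to a gene.
near_gene <- subset(ooa_candidates, gene_distance_bp < bp_threshold)
nrow(near_gene)
```

Converting the threshold rather than the data column leaves the original distances untouched.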

How many candidate TEs do we have for the “out-of-Africa” adaptation (in regions with recombination) that are located less than 1 kb from a gene?