
Android Forensics: APK Downgrades Part 3 Creating Synthetic Test Data
2 Jul 2025
7 minutes
Part 3: Creating Synthetic Test Data
Disclaimer and Prerequisites
This post discusses the generation of fake test data, which includes but is not limited to names, addresses, email addresses, birthdays, and locations. All you need is a web browser and a test device to load the fake data onto. The information I am using is NOT associated with a real person; it was created using the tools mentioned in this blog post. If you are attempting this, DO NOT use your own or anyone else’s personally identifiable information (PII).
Introduction
Welcome to Part 3 of my Android Forensics: APK Downgrades series! In my last post, we unlocked the bootloader and rooted our Google Pixel 6a. The next step is to create fake test data that we will load onto the device. To learn how to do this, I referred to a blog post and took a free course, both provided by Hexordia! Links for both can be found in the References at the end of this post.

Why is test data important?
Fake/test data could be needed for a variety of reasons. I ran into a perfect example during the setup of my device. After I initially unlocked and rooted my device, I realized I had jumped the gun. In order to download Termux from the Google Play Store, you’re required to sign in using a Google account. Can you guess my mistake? I used my personal email. This action, although it may seem small, would be captured once I took a full file system acquisition. As you can imagine, there are probably other steps that would ask for further personal information, which I would like to avoid giving.
Fortunately, since this is a project, I can start over by factory resetting the phone, re-unlocking the bootloader, rooting the device, and then populating it with the fake data I generated using the tools mentioned below. Generating fake data before starting this process would probably have helped me avoid small mistakes like this.
Another reason to consider: imagine you were conducting research over an extended period of time. If you didn’t want anyone to know it was you, you’d probably use something like a sock puppet account. A sock puppet account is a fake account built on a “false online identity used for deceptive purposes.” Assume you made a sock puppet account for LinkedIn. Something to consider is your digital footprint. An account that was just created would seem more suspicious than one that has been around for a couple of years. Likewise, an account that has made a reasonable number of comments looks more believable than one with no comments at all. It depends on the situation, but a solid digital footprint is important for coming off as “realistic” as possible, and test/fake data is a big part of building one.
Hexordia’s blog post provides another great reason, from the perspective of someone working with data from a real case. Real investigations require parsing through data to collect information. Data from real cases holds PII, which is something you absolutely don’t want to distribute or do “testing” with. This becomes especially tricky with third-party applications, since you don’t know what is being shared. In a situation like that, test data becomes a lifesaver.

The Guidelines for Dataset Development by OSAC
These guidelines are a very good resource for anyone who is developing and using datasets in digital forensics. They cover an extensive range of topics related to dataset creation. This includes but is not limited to: the documentation, planning, preparation, and creation of datasets, the use of existing datasets, limitations, and more!
One example of how useful they were: I came across a list of tips for creating a successful profile. One of the points was to “Create an email with an account that allows for anonymity and doesn’t require a phone number or 2FA to start (i.e., Protonmail). Then use that email address as the second factor for other accounts.” As I mentioned earlier, I made the mistake of using my personal email address in the Google Play Store. After reading this, before even attempting to unlock and reboot my device again, I made sure I had a Protonmail email address ready for when I go through that process again.
This is, of course, a very TL;DR description of these guidelines, but whether you are following my series to conduct your own project, are a digital forensics practitioner, or are just interested overall, I highly suggest reading them!
The full document is linked in the References at the end of this post.
What tools are available?
Fake Name Generator
Fake Name Generator is a free tool that generates fake but realistic names and biographical information. You can choose from 37 countries and 31 languages, which broadens your options depending on what kind of “profile” you’re creating. This is one of the more realistic fake data generators I’ve used for this project. Any basic information you would need is given: for example, phone numbers, SSNs, physical characteristics, financial and employment information, vehicle information, and online details (like what browser user agent you’re using!). If you are looking for even more specific/random information to associate with your fake user (e.g. answers to security questions), I would use the next tool I mention, Mockaroo, in addition to Fake Name Generator.

Mockaroo
Mockaroo is a free tool that can generate large amounts of realistic test data which can be easily downloaded and loaded into a testing environment. It has many data types that you can choose from AND it can support more than 5000 records per file. For project purposes, not only can it help create fake PII, but it can immediately give me information such as pet names, car models, hospitals, etc., that could be used for something like security questions when creating a fake profile!
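To give a rough idea of how a CSV export could be put to use, here is a minimal Python sketch that reads the rows and pulls out security-question answers for one fake persona. The column names (first_name, last_name, pet_name, car_model, hospital) and the sample rows are hypothetical; a real export contains whatever fields you chose when building the schema.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV export.
# The column names are assumptions, not a fixed Mockaroo format.
SAMPLE_EXPORT = """first_name,last_name,pet_name,car_model,hospital
Jane,Doe,Biscuit,Civic,Mercy General
Alex,Rivera,Pepper,Outback,St. Luke's
"""


def load_personas(csv_text):
    """Parse CSV text into a list of dicts, one per fake persona."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def security_answers(persona):
    """Map common security questions to fields from one persona row."""
    return {
        "First pet's name?": persona["pet_name"],
        "First car?": persona["car_model"],
        "Hospital where you were born?": persona["hospital"],
    }


personas = load_personas(SAMPLE_EXPORT)
answers = security_answers(personas[0])
for question, answer in answers.items():
    print(f"{question} {answer}")
```

For a real file, you would pass the file's contents to load_personas instead of the sample string. Keeping the answers next to the persona they belong to makes it easier to stay consistent when several accounts ask the same questions.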



GenerateData
GenerateData is a free and open-source project that was created specifically to generate fake data. You can choose the specific data types you want as well as export format. This is yet another tool that is extremely useful for people who need large data sets of information. However, unlike the correlations we saw between types of data from Mockaroo, the information that GenerateData outputs isn’t exactly realistic.
For example, in many cases, people include their name in their email address (e.g. someone named Jane Doe having janedoe[@]gmail[.]com or jdoe[@]gmail[.]com as an email address). We saw realistic correlations like this in the data from Mockaroo. But if we look at the second image below, we see discrepancies (e.g. Alec Dodson having the email aliquam[@]yahoo[.]ca). Again, this data is fake, so that makes sense, but if you’re looking for something more realistic, this may not be the tool for you.
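If you want to spot these mismatches across a larger dataset instead of eyeballing it, a small check like the one below could help. This is only a sketch of a simple heuristic I am assuming here (does the part before the @ contain the first or last name?), not something any of these tools provide. The function name is made up for illustration, and the sample values are the ones from the paragraph above.

```python
import re


def email_matches_name(first, last, email):
    """Heuristic: True if the email's local part contains the first or last name.

    Also works on defanged addresses such as janedoe[@]gmail[.]com,
    because everything except letters is stripped from the local part.
    """
    local = email.split("@", 1)[0].lower()
    local = re.sub(r"[^a-z]", "", local)
    return first.lower() in local or last.lower() in local


# Sample records taken from the examples in the text above.
records = [
    ("Jane", "Doe", "janedoe[@]gmail[.]com"),
    ("Jane", "Doe", "jdoe[@]gmail[.]com"),
    ("Alec", "Dodson", "aliquam[@]yahoo[.]ca"),
]

for first, last, email in records:
    verdict = "looks consistent" if email_matches_name(first, last, email) else "mismatch"
    print(f"{first} {last} <{email}>: {verdict}")
```

Real people use nicknames and unrelated handles too, so a mismatch is not proof that a record is unrealistic, and very short names can match by accident. It is just a quick way to flag rows worth a second look.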


ThisPersonDoesNotExist
ThisPersonDoesNotExist is a website that generates realistic images of fake people using AI. Refreshing the page will cause a new image to be generated. This is a good source for an image for a fake user but it is important to remember that this is only one picture. If you were looking to populate an Instagram sock puppet account, for example, you wouldn’t be able to get other pictures of this person. These pictures are like one-time headshots. If you’d like to know more, The Guidelines for Dataset Development by OSAC discusses this very concept of “Profile Enrichment” in further detail.

It would be best to go through a couple of images and look for a realistic picture. If you refresh the page a couple of times, you might find that there are some images that look… off. If you want to create a realistic social media account, you probably don’t want a picture like the one below XD

Conclusion
I hope you enjoyed this post! All the tools mentioned in this post are tools I used myself for this project. When creating test data for a project or legitimate casework, always remember to follow best practices to ensure the integrity of your work. In my next blog post, I’ll be sharing what the results of using these tools looked like for me, as well as going over full file system acquisitions. Until next time!

References
- https://www.hexordia.com/blog/creating-synthetic-test-data?rq=test%20data
- https://learn.hexordia.com/courses/Creating-Mobile-Test-Data-66eae235a2b4ef2d6b79cef3
- https://www.nist.gov/system/files/documents/2022/12/15/OSAC-DE-Guidelines%20for%20Dataset%20Development.pdf
- https://espysys.com/blog/simplifying-osint-profiling/