```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include <ctime>

using namespace std;

// Function to generate a large dataset of random integers
vector<int> generateDataset(size_t size, int minValue, int maxValue) {
    vector<int> dataset(size);
    for (size_t i = 0; i < size; ++i) {
        dataset[i] = minValue + rand() % (maxValue - minValue + 1);
    }
    return dataset;
}

// Function to calculate the mean of a dataset
double calculateMean(const vector<int>& dataset) {
    double sum = 0;
    for (int value : dataset) {
        sum += value;
    }
    return sum / dataset.size();
}

// Function to calculate the standard deviation of a dataset
double calculateStandardDeviation(const vector<int>& dataset, double mean) {
    double sum = 0;
    for (int value : dataset) {
        sum += (value - mean) * (value - mean);
    }
    return sqrt(sum / dataset.size());
}

int main() {
    srand(static_cast<unsigned>(time(0))); // Seed for random number generation

    size_t datasetSize = 1000000; // Size of the dataset
    int minValue = 0;             // Minimum value in the dataset
    int maxValue = 100;           // Maximum value in the dataset

    // Generate the dataset
    vector<int> dataset = generateDataset(datasetSize, minValue, maxValue);

    // Calculate mean and standard deviation
    double mean = calculateMean(dataset);
    double stdDev = calculateStandardDeviation(dataset, mean);

    // Print results
    cout << "Dataset Size: " << datasetSize << endl;
    cout << "Mean: " << mean << endl;
    cout << "Standard Deviation: " << stdDev << endl;

    return 0;
}
```
Explanation
- Headers:
  - `<iostream>`: For input and output operations.
  - `<vector>`: For using the `std::vector` container.
  - `<cmath>`: For mathematical functions like `sqrt()`.
  - `<cstdlib>`: For the `rand()` and `srand()` functions.
  - `<ctime>`: For the `time()` function, used to seed the random number generator.
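- Function `generateDataset`:
  - Parameters:
    - `size`: Number of elements in the dataset.
    - `minValue` and `maxValue`: Inclusive lower and upper bounds of the random integers.
  - Functionality:
    - Fills a `vector<int>` of length `size` with values computed as `minValue + rand() % (maxValue - minValue + 1)`, so every value lies in the range `[minValue, maxValue]`.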
- Function `calculateMean`:
  - Parameters:
    - `const vector<int>& dataset`: The dataset for which to calculate the mean.
  - Functionality:
    - Computes the mean (average) of the dataset by summing all values and dividing by the number of elements.
- Function `calculateStandardDeviation`:
  - Parameters:
    - `const vector<int>& dataset`: The dataset for which to calculate the standard deviation.
    - `double mean`: The mean of the dataset, as returned by `calculateMean`.
  - Functionality:
    - Computes the population standard deviation: the square root of the average of squared deviations from the mean. Because it divides by the number of elements N rather than N - 1, it is not the sample standard deviation.
- Main Function:
  - Initialization:
    - Seeds the random number generator with the current time.
    - Defines the dataset size (1,000,000 elements) and the value range (0 to 100).
    - Generates the dataset.
  - Processing:
    - Calculates the mean and standard deviation of the dataset.
  - Output:
    - Prints the dataset size, mean, and standard deviation.
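`generateDataset` relies on `rand() % (maxValue - minValue + 1)`, which is simple but slightly favors smaller values whenever the range size does not evenly divide `RAND_MAX + 1`, and the quality of `rand()` itself is implementation-defined. Below is a minimal sketch of an alternative built on the standard `<random>` header (`std::mt19937` with `std::uniform_int_distribution`), a swapped-in technique rather than what the program above uses. The function name `generateDatasetMt` and its `seed` parameter are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical alternative to generateDataset using <random> instead of rand().
// std::uniform_int_distribution yields values in the closed range
// [minValue, maxValue] without the modulo bias of rand() % range.
std::vector<int> generateDatasetMt(std::size_t size, int minValue, int maxValue,
                                   unsigned seed) {
    assert(minValue <= maxValue);  // the distribution requires a <= b
    std::mt19937 engine(seed);     // same seed => same engine sequence
    std::uniform_int_distribution<int> dist(minValue, maxValue);

    std::vector<int> dataset;
    dataset.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        dataset.push_back(dist(engine));
    }
    return dataset;
}
```

For example, `generateDatasetMt(datasetSize, minValue, maxValue, 42u)` could replace the `generateDataset` call in `main`, which would also make the `srand` call unnecessary. A fixed seed makes runs repeatable on a given standard library, though the exact values may differ between library implementations because the distribution algorithms are not specified bit-for-bit.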
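In formula form, `calculateMean` and `calculateStandardDeviation` compute the following, where $N$ is `dataset.size()` and $x_i$ are the values. The small dataset $\{2, 4, 4, 4, 5, 5, 7, 9\}$ in the second line is only an illustrative worked example, not data from the program:

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
$$

$$
\mu = \frac{40}{8} = 5, \qquad
\sigma = \sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = 2
$$

As a sanity check for the program's own settings: the integers 0 to 100 drawn uniformly have an expected mean of 50 and a population standard deviation of $\sqrt{(101^2 - 1)/12} = \sqrt{850} \approx 29.15$, so with one million samples the printed values should land close to these (the slight modulo bias of `rand()` aside).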
Notes:
- Big Data Simulation: This example simulates the processing of a large dataset by performing statistical calculations.
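- Efficiency: The program keeps the entire dataset in memory (one million `int` values, roughly 4 MB with a 4-byte `int`) and makes two passes over it. Datasets too large to fit in memory call for single-pass or streaming algorithms, or for dedicated big data libraries and frameworks.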
- Extensions: The program can be extended with additional statistical operations, data transformations, or integration with actual big data processing frameworks.
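One way to avoid storing the dataset and making two passes is Welford's online algorithm, which updates the mean and the sum of squared deviations as each value arrives. This is a swapped-in technique, not what the program above does; the class name `RunningStats` and its methods are hypothetical, shown as a minimal sketch that matches the program's population (divide-by-N) definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical single-pass accumulator (Welford's online algorithm):
// tracks the running mean and the sum of squared deviations (m2_)
// without keeping the individual values.
class RunningStats {
public:
    void add(double x) {
        ++count_;
        double delta = x - mean_;                      // deviation from old mean
        mean_ += delta / static_cast<double>(count_);  // update running mean
        m2_ += delta * (x - mean_);                    // uses the updated mean
    }

    std::size_t count() const { return count_; }

    double mean() const {
        assert(count_ > 0);  // mean of an empty stream is undefined
        return mean_;
    }

    // Divides by N (not N - 1) to match calculateStandardDeviation.
    double populationStdDev() const {
        assert(count_ > 0);
        return std::sqrt(m2_ / static_cast<double>(count_));
    }

private:
    std::size_t count_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations from the running mean
};
```

In the original program, each generated value could be passed straight to `add` instead of being stored in a `vector`, so memory use would no longer grow with the dataset size.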